Creating a Visualization Library in Mathematica

Status: First draft; written in a hurry. Publishing since I want to get better at publishing incremental work and nobody reads these things anyways.

I recently came back to using Mathematica as my tool of choice for early-stage prototyping and data exploration. Unlike many other language ecosystems, Mathematica comes with a lot built in, including many data visualization functions. However, plots are scattered throughout the landscape without a high-level system unifying them: every new chart lives in its own world, and there’s nothing resembling The Grammar of Graphics unifying them.

So I decided to see how much work it would take to create a more uniform and flexible plotting system like Plot, my current tool of choice for browser-based visualizations, in Mathematica. It’s been a few days and I’ve made some progress and have the following picture to show for it:

Which was made by this code:

plot[
 dot[
  "x" -> data[[All, 1]],
  "y" -> data[[All, 2]],
  "stroke" -> data[[All, 5]]]]

The library works along very similar lines to Observable Plot, which is the existing library I’m most familiar with, and directly inspired this effort.

Technical notes

The following was written in something of a hurry, and I plan to refine it later.

The basic idea is that a plot is a collection of marks overlaid atop each other. (This plot has one mark, dot.)

A plot encodes meaning through visual channels, such as x or y or color. Each mark provides data to some number of channels. Based on this data, together with preferences provided by the user, the plot function comes up with an encoding for each channel that maps from the data domain to the visual domain (eg. taking a category label such as “a” and turning it into a color, or taking a numeric data value and turning it into a position along the x-axis.

The usability of the library depends strongly on the amount of deduction the library can make for you, and its ability to add explanatory marks to the plot, such as legends and axes, so that the resulting picture can be made sense of.

An interesting point that I had not sufficiently appreciated is the importance of representing scales in a way that allows you to make these explanatory marks. For example, the library needs to be able to tell when a log scale is used so that it can generate logarithmic axis ticks. Another example is that in Plot, the “shorthand” field specification syntax is what allows the library to know how to label your axes. A possibility in Mathematica is to use controlled evaluation for the same purpose.

This also raises some questions about evaluation order. I was hoping to use regular marks to build the axis legends, but had to figure out a solution to the time-ordering problem: Normal marks determine the set of inputs for each channel, from which scales for each channel are inferred, and ticks generated. But axis marks need to know the tick values. My current solution is to allow a mark to be a function which will be invoked with scale information. Any points plotted by such marks will not be used to determine the domain of the plot.

Another Plot design point that I hadn’t fully appreciated is that one reason it uses a z channel to allow eg. the line mark to make multiple lines is because sometimes one data column wants to be plotted as multiple curves, eg. with categorical time series, where you don’t want the user to need to reshape their data to calling your mark function.

My implementation so far is about 250 lines of code, and doesn’t yet include the ability to make bar charts, does not include faceting, or support for something like Plot’s z channel. I’m also not yet sure about the best way to support interactive selection, highlighting, and animated transitions. But you can use continuous, ordinal, and categorical encodings to plot text, dots, and lines, with scale and tick inference and a composable mark system.

We’ll see if my current design survives the addition of any of these new features. Part of the pleasure of using such a high-level programming language is that rewrites are cheap: when you know what you want to say, and if performance isn’t too much of a concern, you can express yourself very concisely.

References

Plot – My favorite plotting library, from the folks behind d3.

Wolfram Videos: ggplot: A Grammar of Graphics for the Wolfram Language – I came across this while writing this note. Haven’t watched it yet.

Appendix: Code

(* Names of all channels *)
channelNames = {"x", "y", "r", "color", "opacity", "fx", "fy"};
(* Mapping from option names to channel names *)
optionChannels = <|"x" -> "x", "x1" -> "x", "x2" -> "x", "y" -> "y", 
   "y1" -> "y", "y2" -> "y", "r" -> "r", "stroke" -> "color", 
   "fill" -> "color", "opacity" -> "opacity", 
   "strokeOpacity" -> "opacity", "fillOpacity" -> "opacity", 
   "fx" -> "fx", "fy" -> "fy"|>;

(* Converts its argument to a list, if it isn't one already *)
toList[x_] := If[ListQ[x], x, {x}]

(* "Broadcasts" scalar values in an association to lists, so the \
result is tabular in structure. *)
toTable[assoc_] := 
 AssociationThread[
  Keys[assoc] -> Map[toList, Transpose[Thread[Values[assoc]]]]]

(* Normalization functions to rescale a domain to [0, 1] in the case \
of continuous and ordinal, or to [0, n] in the case of indexed *)
continuous[domain_] := Rescale[#, MinMax[domain]] &
indexed[domain_] := 
 AssociationThread[domain -> Range[Length[domain]]]
ordinal[domain_] := 
 indexed[domain]/Length[domain] - 1/(2 Length[domain])

(* Map an association to another association: <| k->v |> -> <| \
k->f[k,v ]|> *)
(* This is a bit confusing, but have a look at its usage in mark \
render functions. *)
kvMap[f_, assoc___] := 
 AssociationMap[First[#] -> f[First[#], Last[#]] &, assoc]

(* Applies the appropriate scale function to each option in assocs *)
scaleMap[scale_, assocs__] := kvMap[
  Map[scale[optionChannels[#]]["apply"], #2] &,
  toTable[Association[assocs]]]

(* Returns a list of k->v where k is the channel name and v is a list \
of values *)
markChannelValues[mark_] := 
 mark // List // Prepend[optionChannels] // KeyIntersection // 
     Values // Transpose // MapApply[#1 -> toList[#2] &] // 
  KeyDrop[Keys[mark["scales"]]]

(* Dot *)
render[dot, opts_, scale_, ctx_] := Module[{
      m = 
    scaleMap[scale, <|"x" -> 0, "y" -> 0|>, 
     KeyTake[opts, {"x", "y", "r", "stroke"}]]},
    MapThread[
   Style[Circle[#1, #2], #3] &, {Transpose[
     ctx["dims"]*{m["x"], m["y"]}], m["r"], m["stroke"]}]]
dot[opts___] := Association[Options[dot], opts]
Options[dot] = {"type" -> dot, "fillOpacity" -> 0.1, 
   "strokeOpacity" -> 0.5, "r" -> 1, "stroke" -> Automatic, 
   "scales" -> <||>};

(* Line *)
render[line, opts_, scale_, ctx_] := Module[{
   m = scaleMap[scale, KeyTake[opts, {"x", "y", "stroke"}]]},
    (* For now we draw a single curve with a single color. *)
    Style[Line[Transpose[ctx["dims"]*{m["x"], m["y"]}]], 
   First[m["stroke"]]]]
line[opts___] := Association[Options[line], opts]
Options[line] = {"type" -> line, "x" -> 0, "y" -> 0, 
   "fillOpacity" -> 0.3, "stroke" -> Automatic, "scales" -> <||>};

(* Text *)
render[text, opts_, scale_, ctx_] := Module[{
   m = toTable[Association[
      scaleMap[scale, KeyTake[opts, {"x", "y", "stroke"}]],
      KeyTake[opts, {"dx", "dy"}],
      <|"align" -> opts["align"]|>]]},
    MapThread[
   Style[Text[#1, #2, #3], #4] &,
   {opts["text"], 
    Transpose[ctx["dims"]*{m["x"], m["y"]} + {m["dx"], m["dy"]}], 
    m["align"], m["stroke"]}]]
text[opts___] := Association[Options[text], opts]
Options[text] = {"type" -> text, "x" -> 0, "y" -> 0, "dx" -> 0, 
   "dy" -> 0, "align" -> Center, "stroke" -> Automatic, "scales" -> <||>};

(* Link *)
render[links, opts_, scale_, ctx_] := Module[{
   m = Association[
     scaleMap[scale, 
      KeyTake[opts, {"x1", "y1", "x2", "y2", "stroke"}]],
     KeyTake[opts, {"dx1", "dx2", "dy1", "dy2"}]],
    p1, p2, xy
   },
    p1 = 
   Transpose[ctx["dims"]*{m["x1"], m["y1"]} + {m["dx1"], m["dy1"]}];
    p2 = 
   Transpose[ctx["dims"]*{m["x2"], m["y2"]} + {m["dx2"], m["dy2"]}];
    Thread[Style[Map[Line, Thread[List[p1, p2]]]], m["stroke"]]]
Options[links] = {"type" -> links, "x1" -> 0, "y1" -> 0, "x2" -> 0, 
   "y2" -> 0, "dx1" -> 0, "dx2" -> 0, "dy1" -> 0, "dy2" -> 0, 
   "stroke" -> Automatic, "scales" -> <||>};
links[opts___] := Association[Options[links], opts]

(* Axes are given as function marks, which accept the scales as an \
argument, and whose values do not inform the scale domains. *)
(* This solves the order-of-operations problem where the domain is \
inferred from marks, and wants to be "visualized" by axis marks. *)
(* We still need a way to signal to Plot to not add its own axes if \
the user provided some; maybe with some magic options. *)
xAxis[] :=
 If[KeyExistsQ[#, "x"] && Length[#["x"]["domain"]] > 0,
   {text["x" -> #["x"]["ticks"], "y" -> 0, 
     "text" -> Map[NumberForm[#, {Infinity, 1}] &, #["x"]["ticks"]], 
     "dy" -> -18, "stroke" -> Black, "align" -> Top, 
     "scales" -> <|"y" -> <|"apply" -> Identity|>, 
       "color" -> <|"apply" -> Identity|>|>],
    links["x1" -> #["x"]["ticks"], "x2" -> #["x"]["ticks"], "y1" -> 0,
      "y2" -> 0, "dy1" -> -12, "dy2" -> -16, 
     "scales" -> <|"y" -> <|"apply" -> Identity|>, 
       "color" -> <|"apply" -> (Black &)|>|>]}, {}] &

yAxis[] := 
 If[KeyExistsQ[#, "y"] && 
    Length[#["y"]["domain"]] > 0, {text["x" -> 0, 
     "y" -> #["y"]["ticks"], 
     "text" -> Map[NumberForm[#, {Infinity, 1}] &, #["y"]["ticks"]], 
     "dx" -> -20, "stroke" -> Black, "align" -> Right, 
     "scales" -> <|"x" -> <|"apply" -> Identity|>, 
       "color" -> <|"apply" -> Identity|>|>],
    links["x1" -> 0, "x2" -> 0, "y1" -> #["y"]["ticks"], 
     "y2" -> #["y"]["ticks"], "dx1" -> -12, "dx2" -> -16, 
     "scales" -> <|"x" -> <|"apply" -> Identity|>, 
       "color" -> <|"apply" -> (Black &)|>|>]}, {}] &

(* Default color schemes *)
indexedColorScheme := 
 Module[{colors = 
    Map[Interpreter["StructuredColor"], {"#1f77b4", "#ff7f0e", 
      "#2ca02c", "#d62728", "#9467bd", "#8c564b", "#e377c2", 
      "#7f7f7f", "#bcbd22", "#17becf"}]}, colors[[#]] &]
ordinalColorScheme := 
 Blend[Map[
    Interpreter["StructuredColor"], {"#f7f4f9", "#e7e1ef", "#d4b9da", 
     "#c994c7", "#df65b0", "#e7298a", "#ce1256", "#980043", 
     "#67001f"}], #] &

(* Scale inference fills in the subcomponents of a channel scale \
based on the channel, whether the values are numeric, and \
already-specified subcomponents. *)

(* Base case *)
inferScale[channelName_, isNumeric_, {dom_, norm_, out_}] := {dom, 
  norm, out}
(* Default "out" for the radius channel *)
inferScale["r", isNumeric_, {dom_, norm_, Automatic}] := {dom, 
  norm, (2 + 5 #) &}
(* Default domain for the "opacity" channel *)
inferScale["opacity", 
  isNumeric_, {Automatic, norm_, out_}] := {{0, 1} &, norm, out}
(* Default domain and normalization for channels with all numeric \
values *)
inferScale[channelName_, 
  True, {Automatic, Automatic, out_}] := {MinMax, continuous, out}
(* Default domain and normalization for channels with nonnumeric \
values *)
inferScale[channelName_, 
  False, {Automatic, Automatic, out_}] := {Union, ordinal, out}
inferScale["color", False, {Automatic, Automatic, out_}] := {Union, 
  indexed, out}
(* Default continuous|ordinal color scheme *)
inferScale["color", 
  isNumeric_, {dom_, norm : continuous | ordinal, Automatic}] := {dom,
   norm, ordinalColorScheme}
(* Default indexed color scheme *)
inferScale["color", 
  isNumeric_, {dom_, norm : indexed, Automatic}] := {dom, norm, 
  indexedColorScheme}
(* Default normalization is continuous *)
inferScale[channelName_, isNumeric_, {dom_, Automatic, out_}] := {dom,
   continuous, out}
(* Default out transform is Identity *)
inferScale[channelName_, 
  isNumeric_, {dom_, norm_, Automatic}] := {dom, norm, Identity}

(* Tick inference figures out the ticks for a scale based on the \
properties of its subcomponents. *)
inferTicks[in_, norm : ordinal | indexed, domain_] := domain
inferTicks[in : Log10, norm : continuous, domain_] := 
 Range[Floor[in[First[domain]]], Ceiling[in[Last[domain]]], 1/4.]
inferTicks[in_, norm : continuous, {lo_, hi_}] := 
 Range[lo, hi, (hi - lo)/4.]

(* Remove Null elements from a list, since it's a common mistake to \
call plot[mark1, mark2, ] with a trailing comma. *)
removeNulls[xs_] := xs /. Null -> Sequence[]

(* The plot function accepts a list of marks, and returns a graphics \
object. *)
Options[plot] = {"dims" -> {100, 100}, 
   "margins" -> {{50, 10}, {50, 10}}, "metaMarks" -> ({} &)};
plot[marks___, OptionsPattern[]] := Module[{
   dims = OptionValue["dims"],
   margins = OptionValue["margins"],
   markList = 
    Join[{xAxis[], yAxis[]}, removeNulls[Flatten[{marks}]]],
   markTypes,
   channelValues,
   channelNumericQ,
   scales,
   markScales,
   filledMarks,
   renderedMarks
   },
  channelValues = 
   Merge[Map[markChannelValues, Select[markList, AssociationQ]], 
    Catenate];
  channelNumericQ = Map[AllTrue[NumericQ], channelValues];
  scales = Association[
    <|"x" -> <|"domain" -> {}, "ticks" -> {}, "apply" -> (0 &)|>, 
     "y" -> <|"domain" -> {}, "ticks" -> {}, "apply" -> (0 &)|>|>,
    channelValues // kvMap[Module[{
         channel = #, values = #2, in = Identity, dom, norm, out, 
         domain},
        {dom, norm, out} =
         
         ConstantArray[Automatic, 3] //. 
          opts_ :> inferScale[channel, channelNumericQ[channel], opts];
        domain = dom[Map[in, values]];
        <|"domain" -> domain, "ticks" -> inferTicks[in, norm, domain],
          "apply" -> Composition[out, norm[domain], in]|>
        ] &]
    ];
  markList = 
   FixedPoint[Replace[Flatten[#], f_Function :> f[scales], {1}] &, 
    markList];
  markScales = 
   markList // Map[Merge[{scales, #scales}, Merge[Last]] &];
  markTypes = Comap[markList, "type"];
  renderedMarks = 
   Thread[render[markTypes, markList, markScales, <|"dims" -> dims|>]];
  Graphics[
   Translate[renderedMarks, Map[First, margins]],
   ImageSize -> dims + Map[Total, margins],
   PlotRange -> Transpose[{{0, 0}, dims + Map[Total, margins]}]]]

raw = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];
data = raw // Map[Append[First[#], Last[#]] &];

plot[
 dot[
  "x" -> data[[All, 1]],
  "y" -> data[[All, 2]],
  "stroke" -> data[[All, 5]]]]