DataGraph Version 4.3 was released on January 20, 2018. This version contains many additions and improvements, including a new Color ramp variable.
In this post, we will compare the new Color ramp to the existing Color scheme variable.
Using Color ramps and schemes
Both Color ramps and Color schemes can be used for fill or line colors in several commands including: Points, Plots, Bar, and Pivot. In fact, any command that allows you to select a Color scheme will also work with the new Color ramp.
DataGraph has included Color schemes for a while now. Color schemes allow you to specify discrete colors for a range of values or categories.
In contrast, Color ramps are continuous color schemes. To compare let’s consider the following graphic that was created in DataGraph. We first created this graphic using a Color scheme, as discussed in a previous post.
The Color scheme is used for the fill color in a Bar command. To use a Color ramp, select the name of the ramp in the same location that you would normally select a Color scheme.
Below, you see the Color scheme on the left, called “My Scheme”, and the Color ramp on the right, called “My Ramp”. These are both located in the Global variables list.
In the Color scheme, a discrete color is specified for each numerical range. With the new Color ramp, you must specify a starting and ending color, along with the numerical range associated with the ramp.
In this case, both options produce an almost identical image; however, using the Color ramp makes it somewhat more straightforward to explore different color options, as we only have two color tiles to change.
You can also add additional lines to the Color ramp to utilize multiple colors.
Below, you can see the result of using this new ramp.
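To make the difference between the two variables concrete, here is a minimal Python sketch. It is not how DataGraph is implemented; the function names, the clamping of out-of-range values, and the use of plain linear interpolation on RGB values are all assumptions made for illustration. A scheme picks one fixed color per numerical range, while a ramp blends between the colors you specify.

```python
from bisect import bisect_right


def scheme_color(value, edges, colors):
    """Discrete scheme: one fixed color per numerical range.

    edges  -- sorted bin boundaries, e.g. [0, 10, 20, 30]
    colors -- one RGB tuple per bin (len(edges) - 1 entries)
    """
    # Locate the bin that contains value; values outside the
    # boundaries are clamped to the first or last bin (an assumption).
    i = bisect_right(edges, value) - 1
    i = max(0, min(i, len(colors) - 1))
    return colors[i]


def ramp_color(value, stops):
    """Continuous ramp: linear interpolation between (value, RGB) stops.

    Two stops give a simple start-to-end ramp; extra stops play the
    role of the additional lines you can add to a ramp.
    """
    stops = sorted(stops)
    if value <= stops[0][0]:
        return stops[0][1]
    if value >= stops[-1][0]:
        return stops[-1][1]
    for (v0, c0), (v1, c1) in zip(stops, stops[1:]):
        if v0 <= value < v1:
            f = (value - v0) / (v1 - v0)
            return tuple(round(a + f * (b - a)) for a, b in zip(c0, c1))
```

With a ramp, changing the look of the whole graphic means editing only the stop colors, which is why exploring color options takes fewer steps than editing every bin of a scheme.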
Creating Color ramps and schemes
Color ramps can only be created from the Other menu in the global variables section of DataGraph.
Color schemes can also be created from this menu or using shortcuts from within a command. If you have never created a Color scheme before, check out the example in this video, using shortcuts to quickly create a color scheme variable: DataGraph 4.1 News: New Color Schemes in the Beta
Color ramp legend
Note that displaying a color ramp on a graph is done with the new Color ramp legend. See the on-line Help for more information on using this new command.
A recent comment on the app store caught our attention and got us thinking a bit. One user characterized DataGraph as “Powerful, customizable, and a bit quirky”.
Certainly we like being called powerful and customizable, but we weren’t sure quirky was a good thing. This comment got me thinking about what may have seemed different or quirky to me when I first started using DataGraph about five years ago.
“Powerful, customizable, a bit quirky”
by zfirst, App Store Reviewer
Prior to DataGraph, I had a lot of experience using spreadsheets and databases for organizing and processing data. I used spreadsheets for many years but found the freeform nature of spreadsheets was a problem with larger datasets. Moving to databases solved many of the problems I had with spreadsheets, but the table structure could be limiting.
The way data is organized in DataGraph is very different from spreadsheets and databases. In general, DataGraph is more structured than a spreadsheet, but more flexible than a database.
In this post, we will compare DataGraph to spreadsheets and databases, with a focus on how data is organized and some benefits of this approach.
Spreadsheets and DataGraph
At first glance, the DataGraph data table looks similar to a spreadsheet. There are columns of data with a header; however, the data table is very different from a spreadsheet.
In spreadsheets, you have a lot of flexibility. You can have white space surrounding your data, and put formulas in any cell you like. You can highlight a range of cells and move them to a new location.
In DataGraph, your data is much more structured. Data is organized in columns with names you specify. Rows get added when you add more data. You manipulate data in terms of entire columns or rows, rather than cells or cell ranges.
This leads us to the concept of ‘granularity’. You can think of granularity as the smallest unit that you use to refer to data. In a spreadsheet, the granularity is a single cell. In DataGraph, the granularity is the column.
Using columns as the fundamental unit for storing data makes formulas easier to create and understand. No need to have each cell contain individual formulas. Calculations are done on entire columns.
DataGraph also contains a number of column properties for commonly used calculations. For example, the following video demonstrates how to calculate a running sum for a column using the ‘isum’ property of the column.
As demonstrated above, the Running Sum column is calculated by adding an Equation column to the column definition list on the left side of the screen. The equation is simply the name of the column followed by ‘.isum’. Using this approach is fast and ensures that every row is calculated correctly, even when we add more data.
Compare this to a spreadsheet where a running sum equation for the example sheet above would have the following equations: D3=C3, D4=C4+D3, D5=C5+D4, and D6=C6+D5. If you added more data you also have to add more equations.
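For readers who think in code, the contrast can be sketched in a few lines of Python. This is only an analogy for the column-wide calculation, not DataGraph's internals; the column name and the numbers are made up for illustration.

```python
from itertools import accumulate


def running_sum(column):
    """Running sum over an entire column, defined once for all rows."""
    return list(accumulate(column))


# Hypothetical 'Amount' column.
amount = [5, 3, 8, 2]
totals = running_sum(amount)

# Adding a row needs no new formula: the same definition is re-applied.
amount.append(4)
updated_totals = running_sum(amount)
```

The definition is written once against the column, so there is no per-row formula to copy or forget when the data grows.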
Of course, there are some things that you can do in a spreadsheet that are difficult or impossible in DataGraph. If you want each cell to have different color shading or outlines, you cannot do this in DataGraph. DataGraph is decidedly geared toward visualizing large data sets in graphical form, rather than in table view.
Databases and DataGraph
For larger datasets, databases have an advantage over spreadsheets. Databases, specifically relational databases, impose a table structure. This can be an improvement over the freeform approach of spreadsheets; however, not all data fits well into this format. Moving datasets in and out of databases can also be a challenge.
Databases are similar to DataGraph in that calculations are also done on entire columns of data and you can specify column names. The big difference is that DataGraph does not impose a specific table structure.
In databases, data is organized in tables where every column in the table has to have the same number of rows. In DataGraph, data is organized in groups that can be viewed simultaneously, and every column can have its own length.
Many databases also have a relational structure, where tables have a relationship based on some key variables. In DataGraph, data is organized in groups that can be nested in multiple levels, creating a hierarchical approach to managing your data.
There are huge benefits to having a hierarchical approach. You can nest data in multiple levels to create a data structure that best suits your data.
Similar to a relational database, you can make connections between data groups using a Map column, which allows you to look up items in a list based on a key value. This approach is similar to a query, but there is no need to compose and run an SQL statement. See the DataGraph on-line help for more information on Map columns.
Data groups also make it easy to replace data used in a graph. In the following example, there are two data groups with columns called ‘Day’ and ‘Amount’. Note how the data is updated by simply dragging and dropping the group's contents on the command that is plotting the data.
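The idea of a key-based lookup between two groups can be sketched in Python as follows. The function name, its parameters, and the sample data are hypothetical, and the actual Map column may handle missing or duplicate keys differently; see the on-line help for its real behavior.

```python
def map_column(keys, lookup_keys, lookup_values, missing=None):
    """For each key, return the matching value from a lookup list.

    keys          -- the column to translate (any length)
    lookup_keys   -- key column of the other group
    lookup_values -- value column of the other group (same length as lookup_keys)
    missing       -- value used when a key is not found (an assumption)
    """
    table = dict(zip(lookup_keys, lookup_values))
    return [table.get(key, missing) for key in keys]


# Hypothetical groups: a short list of codes and a longer data column.
codes = ["NC", "CA", "MI"]
names = ["North Carolina", "California", "Michigan"]
measured_in = ["CA", "CA", "NC", "MI", "TX"]

full_names = map_column(measured_in, codes, names)
```

Note that the two groups have different lengths, which is exactly the situation that does not fit a single fixed table.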
Using groups makes it easy to reuse commands and graphics. If your data changes it can easily be updated with a simple drag and drop.
The Design of DataGraph
The foundational structure of a program will determine which functionality is easy or difficult to implement and understand. DataGraph is not trying to clone a spreadsheet program or be a new type of database.
DataGraph has a unique approach that includes the strengths of spreadsheets and databases. In particular, DataGraph has a design structure that makes many operations that are difficult to express in spreadsheets more intuitive, especially for large datasets.
So yes – DataGraph is different and some aspects of the program may seem unusual, unexpected, and even a bit quirky. To end this post, I thought it would be fitting to provide the full review that inspired it.
“This app is a godsend after struggling with the many data-display shortcomings of Excel and Numbers. It lets you customize nearly every aspect of your outputs, and with a refreshing layers-style interface for controlling and formatting each visual element. As other reviewers have noted, the interface elements are quirky and often nonstandard. But the rewards of spending a bit of time to master this app are vast.”
A common question from new users of DataGraph is, “How do I specify the axis range for a graph?” In many programs, you set an axis range by specifying both a minimum and maximum value; however, a problem with this approach is that your graph may only look good for a particular data set.
DataGraph takes a more flexible approach, allowing you to control the behavior of the axis range, without having to explicitly set a minimum and maximum value. As a result, you can create graphs that can be used with new data, without having to change the axis settings.
In this blog post, we will describe the how and why of axis settings in DataGraph and present some examples to illustrate how they work.
Modifying the Axis Range in DataGraph
One of the strengths of DataGraph is that you can easily replace or modify the data used in a command. Consistent with this approach, the settings for modifying an axis range allow you to influence the range without having to explicitly specify both a minimum and maximum value. That said, if you want to restrict a graph to a particular range, you can do that as well.
To modify the default axis range, open the detail view in the Axis settings.
In the detail view, you will see three settings that allow you to modify the axis ranges: (1) Include, (2) Padding, and (3) Restrict. The default values for these settings are shown below.
In the Include field, you can enter values that you want to make sure are included in the axis range. For example, if you wanted a graph to span, at a minimum, from -10 to 10, you would set Include to “-10,10”.
The Padding is set to avoid points lying directly on the axis box. When this is set to “Nice Value”, DataGraph will increase the range by a small amount beyond the minimum and maximum values in the data, and then pick the nearest tick mark.
Restrict ensures the axis does not go beyond a particular range, and requires a minimum and a maximum value. By default, Restrict is set from negative infinity to positive infinity, or all values. Restrict is applied last, and might therefore overwrite the range specified in the Include field.
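The order in which the three settings act can be summarized in a small Python sketch. This is a simplified model, not DataGraph's algorithm: in particular, the way "Nice Value" padding chooses tick marks is not reproduced here, so `pad` is just a fixed amount added to each end, and all names are hypothetical.

```python
import math


def axis_range(data, include=(), pad=0.0, restrict=(-math.inf, math.inf)):
    """Return (low, high) for one axis.

    include  -- extra values the range must cover
    pad      -- amount added to each end (simplified stand-in for Padding)
    restrict -- (minimum, maximum) limits the range may never exceed
    """
    # 1. Include: treat the included values as if they were data.
    values = list(data) + list(include)
    low, high = min(values), max(values)

    # 2. Padding: widen the range so points do not sit on the axis box.
    low, high = low - pad, high + pad

    # 3. Restrict: applied last, so it can cut into the Include range.
    low = max(low, restrict[0])
    high = min(high, restrict[1])
    return low, high
```

For data spanning 3 to 7, `axis_range([3, 7], include=(-10, 10))` gives `(-10, 10)`, while adding `restrict=(-math.inf, 5)` trims the result to `(-10, 5)`, illustrating how Restrict can override Include.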
Using the Include Setting
The first example comes from Ott and Mendenhall’s Understanding Statistics (1990), in which the last chapter is entitled “Communicating the results of a statistical test”. In this chapter, a classic example is given, showing how the axis range can significantly impact how we interpret a given set of data.
They provide a data set containing the consumer price index (CPI) for the months Jan to Jun, plotted here using the default axis settings in DataGraph. The slope of the line seems to indicate a significant rise in the CPI over time.
Now let’s see what the graph looks like when the y-axis starts at zero.
The CPI is still rising but it does not appear as dramatic when compared to the first graph.
Including the Origin
In the first graph above, the y-axis range is based on the default settings. In the second graph, we added “0” to Include in y.
This effectively includes “0” in the dataset that the program uses to determine the axis range.
Including a Range
For the graph above where we adjusted the y-axis to include 0, let’s say we also wanted the y-axis to end at exactly 250.
Below are three ways to modify the Axis settings that would result in the above graph. The settings that have been modified from the default values are highlighted.
In Option 1, we added 230 to the Include field, which is slightly higher than the maximum value in the data. With the additional padding, the y-axis range is extended to 250, just where we want it. Option 2 sets the range at exactly 0 to 250 and removes the Padding. Option 3 sets the Include and the Restrict to the same value.
The difference between these options becomes evident when we add another Plot command with a new data set.
Option 3 is the most restrictive approach and does not allow the axis range to grow with the new data. Option 1 and 2 are not restricted, so both will grow with the data; however, Option 1 allows the program to add padding to prevent the points from overlapping the axis box.
Using the Restrict Setting
The Restrict setting can also be thought of as a way to crop a graph or zoom in on a particular region of data. You can change Restrict in the Axis settings or interactively on the graph itself, as discussed below.
Consider the following data where a signal is being tracked over time.
Now consider the same data set with a single outlier.
Restrict can be used to focus the graphic on the data. Here we set Restrict to a maximum y-value of 25.
You can also click and drag directly on the graph. The program will then highlight the region you are selecting.
When you release the mouse, the graph will be limited to the highlighted region and the Restrict setting is modified according to your selection.
Removing the Restrict Settings
If you no longer want to restrict the range, change the setting back to infinity (∞) in the Restrict field. DataGraph understands the infinity symbol (Option-5 on the keyboard) or you can type the variable name, ‘inf’.
As a shortcut, you can also click the expand icon, as shown below. The expand icon is only visible when the graph has a Restrict setting for either the x or the y axis.
Also, hovering over an axis that has a Restrict setting will cause the following display to pop up, containing the same icon in the top left.
This pop-up includes a bar representing the entire range of the underlying data. The portion of the bar filled in white corresponds to the data range that is shown on the graph. This is also an interactive element, in that you can drag the white bar and change the range of data displayed.
You can go back and forth between setting Restrict by typing directly in the Axis settings or zooming to a value directly from the graph.
Using Global Variables
Note that both Include and Restrict can be set using global variables. In this example, we use global variables to pull the maximum x and y (i.e., CountX and MaxY) from the data and use them in the Include settings.
Next, the data is masked on time to create the animation.
Use the links below to download the files used to create these graphs and the above animation.
I use animation all the time in DataGraph; it is a lot of fun and relatively straightforward. If you have not tried creating animations before, then read on.
In this post, I’ll describe the basics of the animation variable, which makes it all possible. The animation variable is further explained using a simple example, animating a function. A more complicated example is also given where data is animated over time, based on a user question.
Overview of the Animation Variable
The key to creating an animation is to use the animation variable. Settings for the animation variable are located on the bottom left corner of the program, below the column definition list (click the definition icon on right side of the toolbar to view).
By default, the entry for Animate is set to ‘t’, which is the default name of the animation variable. If you are not familiar with what each of these entries represents, you can hover over each item to get a short description or tool tip.
You can change any of these values, the Range, the Duration, and even the name of the variable. You can also type in the current value for ‘t’ just left of the play button.
To have the value for ‘t’ change over time, hit the play button on the bottom right. Based on the default settings, the value of ‘t’ will increase from 0 to 1 over a duration of 10 seconds. You can also stop the variable from changing at any time or use the slider to vary the value, as shown in the short video below.
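If it helps to see the idea as code, the following Python sketch models how the value of ‘t’ could depend on elapsed playback time. The function name is hypothetical, and the linear progression is an assumption; the post only states that ‘t’ goes from 0 to 1 over 10 seconds with the default settings.

```python
def animation_value(elapsed, start=0.0, end=1.0, duration=10.0):
    """Value of the animation variable after `elapsed` seconds.

    Defaults mirror the default settings: Range 0 to 1, Duration 10 seconds.
    Assumes the value moves linearly and stops at the end of the Range.
    """
    # Fraction of the animation completed, clamped to [0, 1].
    fraction = min(max(elapsed / duration, 0.0), 1.0)
    return start + fraction * (end - start)
```

Changing the Range or the Duration in the settings corresponds to changing `start`, `end`, or `duration` here; dragging the slider corresponds to choosing the value directly.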
Example: Animating a function
Just about anywhere that you can enter a number in DataGraph, you can also enter a variable that represents a number. The animation variable is just like any other variable and it can be used in a wide variety of ways.
The example I have here shows a simple animation of a function.
The animation was created using a Function command. By default, the Function command has a Range from 0 to 1, where the Range refers to the x-range that the function is evaluated over. Notice that we changed the Range to go from 0 to ‘t’, where ‘t’ is our animation variable.
We also had to edit the axis settings to always include the minimum and maximum x and y values for our function. If we did not include these values the axis range would change as the animation was running.
The following one-minute video demonstrates how to create this animation. Notice that, initially, the axis range is changing as the value for ‘t’ is varied. After we edited the Axis settings as shown above, the axis range for the graph no longer varies during the animation.
In the above video, we are able to continue customizing our graphic, changing the line color and line width, while the animation is running.
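The reason the axis settings matter can be sketched in Python. This is only an illustration: the function `square` is a hypothetical stand-in for whatever function you animate, and the helper names are made up. Drawing the function from 0 to ‘t’ means the visible y-values grow with ‘t’, so a range derived from the visible data moves, while a range computed once over the whole animation stays put.

```python
def sample_function(f, t, x_start=0.0, steps=50):
    """Evaluate f at evenly spaced x values from x_start to t."""
    if t <= x_start:
        return [(x_start, f(x_start))]
    xs = [x_start + (t - x_start) * i / steps for i in range(steps + 1)]
    return [(x, f(x)) for x in xs]


def y_range(points):
    """Smallest and largest y value among the sampled points."""
    ys = [y for _, y in points]
    return min(ys), max(ys)


def square(x):
    # Hypothetical function being animated.
    return x * x


# A range taken from the visible part of the curve changes with t ...
moving = [y_range(sample_function(square, t)) for t in (0.25, 0.5, 1.0)]

# ... while a range computed over the full animation is constant.
fixed = y_range(sample_function(square, 1.0))
```

Entering the minimum and maximum values in the Include field plays the role of `fixed` here: the axis no longer depends on how much of the curve is currently drawn.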
Exporting a Video
If you want to export your animation, you can click the small QuickTime icon just above the play button. DataGraph creates .mov files. The current format is a lossless file that you can upload directly to YouTube or use in programs like PowerPoint.
If the file is large and you would like to modify the compression settings, we recommend using Handbrake. We are also working on adding the option to create .mp4 files directly from DataGraph.
Example: Animating Population Data
We recently had a help request on how to animate a graph that a user had created that contained population data from 1971 to 2016 for the country of Australia.
For this example, we modified the Range of the animation variable over the years of population data in the data set (1971-2016) and selected the Integer check box.
To animate the bar graph, the animation variable was used in a mask in a Bar command. The locations of the Label and Region commands were also animated.
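Conceptually, a mask driven by the animation variable acts like a filter on the rows. The Python sketch below assumes the mask keeps every row whose year is at or before the current value of ‘t’; the exact mask condition used in the file may differ, the function name is hypothetical, and the numbers are placeholders rather than real population figures.

```python
def visible_rows(years, values, t):
    """Rows that pass a mask of the form 'year <= t'.

    t is truncated to a whole number, mirroring the Integer check box.
    """
    current_year = int(t)
    return [(y, v) for y, v in zip(years, values) if y <= current_year]


# Placeholder data, NOT real population figures.
years = [1971, 1972, 1973, 1974]
values = [10, 11, 12, 13]

frame_start = visible_rows(years, values, 1971)
frame_later = visible_rows(years, values, 1973.6)
```

As ‘t’ moves through the Range, more rows pass the mask and more bars appear on the graph.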
We created two DataGraph Demo videos to show the details of how we animated the population data along with the annotations on the graph (i.e., labels, highlighted regions).
Last week, I watched speakers at the Women in Data Science conference (WiDS) at Stanford via a local meet-up here in North Carolina. The meeting was a world-wide event with speakers simulcast around the globe.
One talk that particularly interested me was on Data Visualization given by Miriah Meyer, a professor at the University of Utah. You can watch her talk at 55:00 minutes into the livestream. She gives some great examples about the insights we can gain by visualizing data.
During the Q&A session, Dr. Meyer was asked about the next great challenge in visualization. She responded that, “… It turns out we still don’t have good tools for non-programmers to use to create very rich and unique visualizations.”
…we still don’t have good tools
for non-programmers … to create very
rich and unique visualizations…
– Miriah Meyer, PhD
DataGraph is not a programming language, but it definitely has a programming attitude. I strongly believe that DataGraph can help bridge this gap to allow non-programmers, and programmers alike, to create rich data visualizations. The program gives you (1) virtually complete control of what you see in a graphic, and (2) a visual interface for combining commands and creating graphics.
For this post, I wanted to provide a couple of examples created in DataGraph that would be hard or impossible to do in a standard graphing program.
The first example is inspired by a second comment by Dr. Meyer that people working in data journalism or designing infographics are, “…somewhat limited in what they can do with data, unless they get some programming skills.”
I posted an example a few months ago to illustrate how DataGraph can be used to create infographics that are data driven. On the left is an infographic that was made by the CDC. On the right is my re-creation of this graphic using DataGraph.
In the CDC version, the U.S. is given the same color as New Zealand and Canada, despite its value being almost twice as large.
In the DataGraph version, the U.S. color is much darker than the other countries, which makes sense given the underlying data. To achieve this look, the color scheme is set up with a color ramp using bins, such that the same color is used for data between 2 and 3, 3 and 4, 4 and 5, … and so on.
Admittedly, this is a relatively simple graphic but the cool thing here is that the graphic is entirely data-driven. If I decide to add more countries or update the data when a new report comes out, I don’t have to recreate my graphic.
The next example is an animation, inspired by the well-known statistician Hans Rosling, who sadly passed away this week from pancreatic cancer.
This particular video shows the relationship between fertility and life expectancy for 100 years of data in one minute, using data downloaded from Gapminder.com. Each moving point represents a country where the size is scaled by population and the color is determined by the continent.
Rosling used similar animations in a number of highly watched TED talks that brought data to life in a unique way. The one I have linked to here has over 11 million views! Clearly, well worth the watch.
On January 30th, we released the newest version of DataGraph, version 4.2. This version has some great features, many of which were requested by our users. These include:
confidence intervals in regressions,
a new Bracket command, and
improved color scheme options for larger data sets.
One of my personal favorites is the ability to extract x and y locations from the Label, Bracket, Region, and Range commands. This adds an interesting level of interactivity to graphics.
In the short video above, note how both graphs vary based on the location of the arrow, which we can drag around. To create this interactivity, the location of the arrow is extracted as a global variable. The value for the arrow location can then be used in other commands.
For a more detailed demo, check out the following video. Watch here or on our YouTube Channel.
We recently added a new time variable to the DataGraph Beta version.
You can add this variable using the Other drop-down menu in the Global variables section of DataGraph.
We used the new time variable to create a DataGraph file that you can use to count down to the New Year. The file is set up to show a ‘Ball Drop’ in New York and San Francisco. You can also select your own time zone.
Click below to download the file and try out the new time variable. Note that you must use the Beta for this file to work.
The most recent update to the DataGraph Beta includes new color scheme options and updates to the Plots command.
In the current release version of DataGraph (version 4.1), you can get suggested color schemes that depend on the number of items in your list. These suggested schemes were limited to 12 items or fewer.
In the most recent Beta, there are two new color schemes, a Rainbow and a Gray scale color scheme, that can be created for longer lists. You can also easily create custom color schemes that interpolate between two colors.
The Plots command is only in the Beta, and the most recent Beta version has given this command a significant facelift. The Plots command allows you to quickly create multiple line graphs and vary the color of each.
The above image is featured in one of our Mac App Store screenshots. It includes sideways histograms of monthly temperature data from three U.S. cities: Greensboro, NC; San Francisco, CA; and Flint, MI.
We were recently asked how to create this graphic. Although DataGraph has a Histogram command, it only creates a single histogram at a time. To create a series of sideways histograms, we are actually using the Box command, often used to create box-and-whisker plots.
First, let me show you what this data looks like using a Plot command for one of the cities. These are average daily temperatures over 10 years.
Plotting the same data using a Box command, where the Values are set to the ‘temperature’ and the Position is set to ‘month’, results in the following image.
To create the sideways histograms, go into the detail view of the Box command and modify the Type drop-down menu from ‘Whisker’ to ‘Probability’. Now you have sideways histograms!
Note that these are slightly different representations of your data when compared to the Histogram command, as each sideways histogram is scaled to the same height.
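To illustrate what "scaled to the same height" means, here is a rough Python sketch. It assumes that each position (month) gets its own histogram and that the bin counts are divided by that histogram's largest count; the bin width, the function name, and the sample temperatures are all hypothetical, and the Box command may bin and scale differently.

```python
import math
from collections import defaultdict


def scaled_histograms(positions, values, bin_width=5.0):
    """One histogram per position, each scaled so its tallest bin is 1.0."""
    # Group the values by position (for example, by month).
    groups = defaultdict(list)
    for position, value in zip(positions, values):
        groups[position].append(value)

    result = {}
    for position, group in groups.items():
        # Count how many values fall into each bin of width bin_width.
        counts = defaultdict(int)
        for value in group:
            bin_start = math.floor(value / bin_width) * bin_width
            counts[bin_start] += 1
        # Scale by the largest count so every histogram has the same peak.
        peak = max(counts.values())
        result[position] = {b: c / peak for b, c in sorted(counts.items())}
    return result


# Hypothetical temperatures for two months.
months = [1, 1, 1, 2, 2]
temperatures = [50, 52, 61, 70, 71]
histograms = scaled_histograms(months, temperatures)
```

Because each histogram is scaled by its own peak, a month with few readings looks as tall as a month with many, which is the difference from the raw counts shown by a regular histogram.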
To create a graphic with all three cities, we used three Box commands and added a fill to each. In the Axis settings, we also set the X-tick marks drop-down box to ‘Categories’ and set the Labels to a column with the name of each month.
You can see why Mark Twain said, “The coldest winter I ever spent was a summer in San Francisco”, as this west-coast city does not warm up in the summer, when compared to locations like Flint or Greensboro. Although, I wonder whether or not he ever spent a winter in Flint?