In any data science methodology, it is essential to know the data at hand. Exploring data is a crucial step in solving a problem, and this article briefly explores why.
Exploration involves a thorough investigation of the data and the interpretation of hidden insights. One of the most effective ways of understanding your data is visual analytics. Visualisations are effective because they carry a lot of information in a small space: their graphic elements add meaning and help convey a strong message or insight.
Data visualisation aims at:
- Effective data examination and cleansing
- Testing hypotheses and assumptions
- Exploring data to understand it inside out
- Presentation and communication of insights and trends from data
The most commonly used visualisation techniques include bar graphs, line graphs, histograms, scatter plots and box plots. Most of the time these visuals work well and convey the necessary information to the audience. Visualising the same data with a few uncommon graphs can sometimes open up new angles for interpreting it. Below are a few uncommon visualisation techniques that let both the visualiser and the audience look at data differently.
Small multiples (trellis plot)
Plotting small multiples on a grid, often called a trellis or facet plot, is one of the simplest ways of dealing with multivariate data. One graph is drawn for each value of the same variable, and the graphs are placed side by side on a grid. The individual graphs can be line, bar or scatter plots. This makes it easier to compare and evaluate how the data differs across the values of that variable.
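The grid layout described above can be sketched with matplotlib's `subplots`. This is a minimal illustration under stated assumptions: the region names and sales figures are made up, and shared axes are one reasonable choice for keeping the panels comparable, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; drop this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: monthly sales for four made-up regions.
rng = np.random.default_rng(0)
months = np.arange(1, 13)
regions = ["North", "South", "East", "West"]
sales = {region: rng.integers(50, 150, size=12) for region in regions}

# One panel per value of the grouping variable; shared axes keep panels comparable.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
for ax, region in zip(axes.flat, regions):
    ax.plot(months, sales[region], marker="o")
    ax.set_title(region)

# Label only the outer edges of the grid to avoid clutter.
for ax in axes[-1]:
    ax.set_xlabel("Month")
for ax in axes[:, 0]:
    ax.set_ylabel("Sales")
fig.tight_layout()
# fig.savefig("small_multiples.png") would write the grid to disk.
```

Sharing the y-axis makes magnitudes directly comparable across panels; if the groups differ greatly in scale, independent axes may show within-group shape better.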
Polar area diagram
A polar area diagram looks similar to a pie chart. Where a pie chart encodes the magnitude of a value in the angle of its slice, the polar area diagram encodes it in how far each sector extends from the centre. Every sector has the same angle, so only the radius varies. The chart is drawn in polar coordinates, which is where the name comes from.
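A minimal sketch of the idea, assuming made-up monthly values: bars drawn on a polar axis with equal angular width behave like the sectors described above. Here the radius encodes the value directly, following the description; some versions of this chart instead use the square root of the value as the radius so that sector area is proportional to the value.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; drop this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical categories and values.
categories = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
values = np.array([12, 30, 45, 22, 60, 38])

n = len(values)
width = 2 * np.pi / n          # every sector gets the same angle
angles = np.arange(n) * width  # starting angle of each sector

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
# Bar height on a polar axis is the distance the sector extends from the centre.
bars = ax.bar(angles, values, width=width, align="edge", edgecolor="white")

# Put each category label in the middle of its sector.
ax.set_xticks(angles + width / 2)
ax.set_xticklabels(categories)
```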
Hexbin map
Hexbin maps are usually a good fit for spatial data. A hexbin map is similar to a scatter plot on a map, except that the points are grouped into hexagonal bins so that dense clusters are combined. The result looks like a pixelated representation of the points. Colour or texture can then encode another quantity, typically the number of points or the magnitude of a variable within each hexagon. Overlaying the plot on a geographic map gives more context and insight into the data.
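The binning step can be sketched with matplotlib's `hexbin`. The coordinates below are randomly generated stand-ins for longitude and latitude, and no basemap is drawn; overlaying the hexagons on a real map would need a mapping library, which is outside this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; drop this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical point locations clustered around two made-up centres.
rng = np.random.default_rng(1)
lon = np.concatenate([rng.normal(-0.10, 0.05, 2000), rng.normal(0.10, 0.03, 1000)])
lat = np.concatenate([rng.normal(51.50, 0.03, 2000), rng.normal(51.60, 0.02, 1000)])

fig, ax = plt.subplots()
# Group points into hexagons; colour encodes how many points fall in each one.
# mincnt=1 hides empty hexagons so a map underneath would stay visible.
hb = ax.hexbin(lon, lat, gridsize=30, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax, label="Points per hexagon")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```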
Stream graph
A stream graph is a variation of a stacked area graph. While the stacked area graph plots its values relative to a fixed straight axis, the stream graph displaces its values around a changing central axis. The thickness of each area signifies the magnitude of the variable, which gives the graph the impression of a flow.
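One way to sketch this is matplotlib's `stackplot` with a non-zero baseline. The three series are synthetic, and `baseline="wiggle"` is only one of the available options; `baseline="sym"` centres the stack symmetrically around zero instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; drop this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: three smooth, strictly positive series over 50 time steps.
x = np.arange(50)
rng = np.random.default_rng(2)
labels = ["Series A", "Series B", "Series C"]
y = np.vstack([10 + 5 * np.sin(x / 5 + phase) + rng.random(50)
               for phase in (0.0, 1.5, 3.0)])

fig, ax = plt.subplots()
# A "wiggle" baseline shifts the stack around a moving centre line
# instead of anchoring it to y = 0 as a plain stacked area chart does.
layers = ax.stackplot(x, y, baseline="wiggle", labels=labels)
ax.legend(loc="upper left")
ax.set_xlabel("Time step")
```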
Spider chart (Radar chart)
A spider chart is effective for displaying three or more quantitative variables on axes that start from the same point. The magnitude of each variable is marked on its axis, and the marks are connected by lines. Drawn across multiple axes arranged in a circle, these connected lines look like a spider's web, which is why it is also called a web chart. It makes it easier to compare values across different categories of related variables.
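A spider chart can be sketched on a polar axis by spacing the variables evenly around the circle and repeating the first point at the end so the outline closes. The products, variable names and scores here are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; drop this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical scores (1 to 5) for two made-up products on five variables.
labels = ["Speed", "Power", "Range", "Comfort", "Price"]
scores = {"Product A": [4, 3, 5, 2, 4], "Product B": [3, 5, 2, 4, 3]}

n = len(labels)
angles = np.linspace(0, 2 * np.pi, n, endpoint=False).tolist()
angles_closed = angles + angles[:1]  # repeat the first angle to close the polygon

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for name, vals in scores.items():
    closed = vals + vals[:1]
    ax.plot(angles_closed, closed, label=name)  # the connecting lines
    ax.fill(angles_closed, closed, alpha=0.2)   # light fill to show the shape

ax.set_xticks(angles)
ax.set_xticklabels(labels)
ax.legend(loc="upper right")
```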
Parallel coordinates plot
This graph is well suited to multivariate numerical data. It helps compare several variables at once and makes the relationships between them easier to see. Each variable is assigned a vertical axis, and all the axes are placed parallel to each other, each with its own scale. Every record is then plotted as a point on each axis, and the points are connected with a line so that one line represents one record.
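The construction above can be sketched by hand in matplotlib. The records are made up, and rescaling every variable to the range 0 to 1 is an assumption made here so that the axes can share one plot; it stands in for giving each axis its own scale.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; drop this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical records: one row per record, one column per variable.
columns = ["height", "weight", "age", "score"]
data = np.array([
    [170, 65, 30, 88],
    [160, 55, 25, 92],
    [180, 80, 40, 75],
    [175, 72, 35, 81],
], dtype=float)

# Rescale each variable to [0, 1] so every vertical axis spans its own min..max.
mins, maxs = data.min(axis=0), data.max(axis=0)
scaled = (data - mins) / (maxs - mins)

x = np.arange(len(columns))
fig, ax = plt.subplots()
for record in scaled:
    ax.plot(x, record, marker="o")            # one connected line per record
for xi in x:
    ax.axvline(xi, color="grey", linewidth=1)  # one vertical axis per variable

ax.set_xticks(x)
ax.set_xticklabels(columns)
ax.set_yticks([0, 1])
ax.set_yticklabels(["min", "max"])
```

Lines that cross between two neighbouring axes hint at a negative relationship between those variables, while roughly parallel lines hint at a positive one.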
Bubble map
A bubble map is essentially a scatter plot on a map, which makes it a good fit for spatial data. Bubbles are plotted at the geographic locations of interest, and the size of each bubble depends on the magnitude of the variable being examined. It helps to make the bubbles partly transparent so that the map remains visible behind them.
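A rough sketch of the bubble layer alone, assuming invented site coordinates and values: a scatter plot whose marker size scales with the variable and whose markers are semi-transparent. No basemap is drawn here; placing the bubbles on an actual map would need a mapping library.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; drop this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sites with made-up coordinates and a made-up magnitude.
sites = ["Site A", "Site B", "Site C", "Site D"]
lon = np.array([-3.2, 0.1, 2.4, 5.0])
lat = np.array([55.9, 51.5, 48.9, 52.4])
magnitude = np.array([120, 900, 450, 60])

# Marker size is given as an area in points^2, so scaling it linearly
# keeps bubble area proportional to the value.
sizes = 2000 * magnitude / magnitude.max()

fig, ax = plt.subplots()
# alpha < 1 keeps whatever is drawn underneath visible through the bubbles.
sc = ax.scatter(lon, lat, s=sizes, alpha=0.5, edgecolor="black")
for name, x0, y0 in zip(sites, lon, lat):
    ax.annotate(name, (x0, y0), ha="center", va="center")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```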
Bullet graph
A bullet graph is a variation of the bar chart with more context added through extra visual elements. It best suits data where a comparison is involved and performance metrics are analysed. The key data is represented by the main bar in the middle, known as the 'featured measure'. A small line perpendicular to the main bar is called the 'comparative measure'. When the main bar touches or crosses this perpendicular line, the goal has been reached.
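The elements described above can be sketched with horizontal bars and a short vertical line per row. The metrics, values, targets and the grey background bands (a common way of adding context, assumed here) are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe; drop this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical performance metrics on a 0 to 100 scale.
metrics = ["Revenue", "Profit", "Satisfaction"]
actual = np.array([82, 64, 91])  # main bar (featured measure)
target = np.array([75, 70, 90])  # perpendicular line (comparative measure)
bands = [100, 75, 50]            # assumed upper limits of the background ranges

y = np.arange(len(metrics))
fig, ax = plt.subplots(figsize=(7, 3))

# Widest band first so the narrower, darker bands are drawn on top of it.
for limit, shade in zip(bands, ["0.9", "0.75", "0.6"]):
    ax.barh(y, limit, height=0.6, color=shade)

bars = ax.barh(y, actual, height=0.2, color="black")           # the main bar
ax.vlines(target, y - 0.3, y + 0.3, colors="red", linewidth=2)  # the target line

ax.set_yticks(y)
ax.set_yticklabels(metrics)

# A goal is reached when the main bar touches or crosses the target line.
goal_met = actual >= target
```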
Circular packing
When the data is hierarchical, circular packing visualises it effectively. The data points on every level of the tree are drawn as circles inside the circle of their parent node, so each subtree is grouped within its own enclosing circle. Every circle can therefore contain other, smaller circles, and the size of a circle depends on the relative magnitude of the data points on the same tree level.
This graph works best when it is interactive. Smaller circles inside the larger ones often have no labels, so it becomes difficult to tell which circle represents which value.
For example, in an interactive chart of this kind, clicking on a circle such as Asia zooms in and reveals the details of the selected data point.
These are a few of the less commonly known statistical graphs. If you found these visualisations interesting, do check out the scatter plot matrix (SPLOM), windrose, doughnut chart, mosaic plot, heat map, QQ plot and connected scatter plot.