Chris Harrison’s “Word Associations Visualizing Google’s Bi-Gram Data” turns a large text dataset into graphics that are both artistic and informative. The project draws on Google’s bigram data, that is, counts of how often pairs of adjacent words appear together on the web.
Here is how Harrison describes the way the project processes the data: “Each of [the rays] represent a different tendency of use (ranging from 0 to 100% in 4% intervals). Words are sorted by decreasing frequency within each ray. I render as many words as can fit onto the canvas. There is a nice visual analogy at play – the ‘lean’ of each ray represents the strength of the tendency towards one of the two terms. As in the word spectrum visualization, font size is based on a inverse power function (uniquely set for each visualization, so you can’t compare across pieces).”
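To make that description concrete, here is a minimal Python sketch of the three steps Harrison mentions: computing each word's tendency toward one of the two terms, grouping words into rays at 4% intervals, and sizing text with an inverse power function. This is not Harrison's code. The function names, the rounding to the nearest ray, the exponent, the size limits, and the sample counts are all assumptions made up for illustration.

```python
from collections import defaultdict

RAY_STEP = 4  # percent per ray, taken from Harrison's description


def tendency_percent(count_a, count_b):
    """Percent of a word's pairings that are with term A rather than term B."""
    return 100.0 * count_a / (count_a + count_b)


def assign_rays(counts):
    """Group words into rays and sort each ray by decreasing frequency.

    counts maps word -> (count with term A, count with term B).
    Rounding to the nearest 4% ray is an assumption, not Harrison's stated rule.
    """
    rays = defaultdict(list)
    for word, (a, b) in counts.items():
        if a + b == 0:
            continue  # a word with no occurrences has no tendency
        ray = int(tendency_percent(a, b) / RAY_STEP + 0.5) * RAY_STEP
        rays[ray].append((word, a + b))
    for words in rays.values():
        words.sort(key=lambda item: item[1], reverse=True)
    return dict(rays)


def font_size(rank, max_size=48.0, exponent=0.5, min_size=6.0):
    """Inverse power function of a word's 1-based rank (hypothetical parameters)."""
    return max(min_size, max_size * rank ** -exponent)


# Made-up counts for term A = "hot" and term B = "cold"; not real bigram data.
sample_counts = {
    "turkey": (4, 96),
    "water": (520, 480),
    "tea": (52, 48),
}
rays = assign_rays(sample_counts)
```

With these invented numbers, "turkey" lands on the ray nearest the "cold" side, while "water" and "tea" share a ray near the middle, with "water" listed first because it is more frequent.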
Words closer to one side are used more often with that term (so we can see from the graphic that ‘turkey’ is closely associated with cold, while ‘water’ is used almost evenly with both cold and hot). I enjoy looking at the graphic and seeing which words usually ‘belong’ together. I imagine something like this is used for predictive text search.
I would also like to see something like this for a different dataset, where instead of gathering word frequencies from the web, you present the terms to many people and ask them to say the first word that comes to mind.
This project is a continuation of his previous Word Spectrum project, which looks more like a typical word cloud. Both use the same dataset, but I prefer this one to ‘Spectrum’ because it is more readable. He has more infographic projects on his site, and I also thought his Wikipedia Top 50 Visualization was interesting to look at.