My April plan is to make a final project that attempts to visualize someone’s browsing history as a city. It is difficult to say how far I’ll get with this (especially since all my other courses have ramped up in difficulty, so I haven’t been able to think much about this course in the past two weeks). At the very least, following my talk with Golan, I would like to spend a substantial amount of time parsing a person’s browsing history and finding meaningful representations of it. More than half of this project is figuring out what the data means, what the trends and frequencies are, and overall what this says about the person. I’d like to focus my efforts on classifying and describing a person’s browsing history, at a high level, using as few descriptors as possible. From there I can work out the visuals, such as how I want to display elements of the city (buildings, trees, roads, etc.), but it is alright even if I do not get to this stage. I want to spend more time on planning and on carefully understanding and analyzing the data I’m working with, rather than jumping to the visual side of things.
Some approaches I may try for the data analysis:
- Keep a dictionary of good/bad sites, and check each URL to see if its site is in the dictionary.
- Look for trends in the data. Is a site being visited day after day, or just once?
- Extract the HTML from the page and use NLP to classify a URL if its site is not in the dictionary.
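
To make the first two ideas a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption on my part: the `SITE_LABELS` entries are placeholder domains and labels, the function names are made up for illustration, and the history is assumed to already be parsed into `(url, datetime)` pairs, which is not necessarily how a real browser exports it.

```python
from collections import defaultdict
from datetime import datetime
from urllib.parse import urlparse

# Hypothetical dictionary of known sites; domains and labels are placeholders.
SITE_LABELS = {
    "github.com": "good",
    "stackoverflow.com": "good",
    "facebook.com": "bad",
}


def domain_of(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def label_site(url):
    """Look the URL's domain up in the dictionary; 'unknown' if it is absent."""
    return SITE_LABELS.get(domain_of(url), "unknown")


def days_visited(history):
    """Map each domain to the set of distinct calendar days it was visited."""
    days = defaultdict(set)
    for url, timestamp in history:
        days[domain_of(url)].add(timestamp.date())
    return days


def recurring_domains(history, min_days=2):
    """Return domains seen on at least `min_days` different days, sorted."""
    return sorted(
        domain
        for domain, seen in days_visited(history).items()
        if len(seen) >= min_days
    )


# Made-up sample history, assumed to be (url, datetime) pairs.
history = [
    ("https://www.github.com/a", datetime(2016, 4, 1, 9, 0)),
    ("https://github.com/b", datetime(2016, 4, 2, 10, 0)),
    ("https://example.org/", datetime(2016, 4, 1, 11, 0)),
]

for url, _ in history:
    print(url, "->", label_site(url))
print("visited on more than one day:", recurring_domains(history))
```

Anything that comes back as `unknown` would be a candidate for the third idea (classifying the page from its HTML), which this sketch does not attempt.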