Data Visualization and Analysis
Tuesday, Thursday, 8:30-9:45 AM
LGRT room 121
Instructor:
Dr. Ali Sarvghad
Office hours: Monday 3-4:30 pm
Other times, by appointment only.
Teaching Assistant:
Office hours: TBD
Information visualization is an area of research that helps people analyze and understand data using visualization techniques. The multi-disciplinary area draws from other areas of science, including human-computer interaction, data science, psychology, and art to develop new visualization methods and understand how (and why) they are effective.
Information visualization methods are applied to data from many different application domains, including:
- Political reporting and forecasting – as seen on TV and in the papers in election season.
- News reporting – look at the interactive visualizations used by the New York Times, Wall Street Journal, Slate, etc.
- Social science and economic data, such as census and other surveys, and micro and macroeconomic trends.
- Social networking and web traffic, to understand patterns of communication.
- Business intelligence and business dashboards – to forecast sales trends, understand competitive marketplace positions, allocate resources, manage production and logistics.
- Text analysis – to determine trends and relationships for literary analysis and for information retrieval.
- Criminal investigations – to portray the relationships between events, people, places and things.
- Performance analysis of computer networks and systems.
- Software engineering – developing, debugging and maintaining software.
- Bioinformatics, to understand DNA, gene expressions, systems biology.
Course Objectives
- Learn the principles involved in information visualization
- Understand the wide variety of information visualizations and know what visualizations are appropriate for various types of data and for different goals
- Develop skills in critiquing different visualization techniques in the context of user goals and objectives
- Learn how to implement compelling information visualizations
Textbooks
- Visualization Analysis and Design, Tamara Munzner, CRC Press, ISBN 9781466508910. Principles and paradigms of visualization design.
- Interactive Data Visualization for the Web, Scott Murray, O’Reilly Media, ISBN 9781449339739. All about D3, the programming tool we will be using for homework and projects.
Grading will be based on assignments, midterm exam, class participation, and a final project. Final course grades may be curved (but not always). Grading weights are:
|Paper Reading & Discussion||15%|
Details about the assignments can be found under the Assignments tab.
Lectures: Mondays & Wednesdays
Course Schedule (Lectures, midterm, due dates)
|1/21||Course overview & Intro to InfoVis|
|1/23||Data & task abstraction, DSM, Exercise: data & task abstraction|
|1/30||Validation, Marks & Channels, rules of thumb|
|2/11||Tables, color, Exercise: 2N/BR/CP|
|2/18||No class – Monday Schedule|
|2/20||D3: set up, drawing with SVG|
|2/25||Visualizing spatial data, Networks, & Trees, Exercise|
|2/27||D3: Making basic charts|
|3/3||Handle data complexity: manipulate, Facet, Reduce, Exercise|
|3/10||D3: scales, axes||Project proposal due|
|3/24||Peer review of proposals|
|3/26||Designing Vis UIs||Update proposal|
|3/31||Guest lecture: smart interface, TBD|
|4/2||D3: interactivity, layouts|
|4/7||User evaluation, statistical analysis|
|4/9||D3: project work|
|4/16||Guest lecturer: Digital Civics, Prof. Narges Mahyar|
|4/23||D3: project work|
Please make sure that you have set up D3 before attending the first lab. You can find comprehensive setup instructions in Chapter 4 of the Murray book. There are also numerous online resources that can help you with the setup.
|Week||Date||Topic||Suggested Reading (Murray Book)|
|1||1/25||D3 set up||Lab 1_Slides|
||2/1||Drawing with SVG||Ch. 6|
||2/8||Making simple charts (Bar, line, scatter plot)||Ch. 6|
||3/15||Spring Recess. No Labs.|
Homework Assignments (HW)
These individual assignments will help you develop your knowledge of design principles for Information Visualization. Each assignment is due at the start of class on its due date. Unless stated otherwise, submit your work via Moodle.
Recall that HW assignments are worth 30% of your overall grade, broken down as follows:
- HW1: 8%
- HW2: 8%
- HW3: 8%
- HW4: 6%
(Assignments are adapted from Alex Endert’s InfoVis course: http://va.gatech.edu/courses/cs4460/)
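As a quick check of the arithmetic, the sketch below confirms that the four homework weights add up to 30 and shows how individual scores would translate into points of the overall grade. The function name and the example scores are hypothetical and only illustrate the weighting; they are not part of the course's grading tools.

```python
# Homework weights as percentages of the overall course grade (from the list above).
HW_WEIGHTS = {"HW1": 8, "HW2": 8, "HW3": 8, "HW4": 6}


def homework_contribution(scores):
    """Return the percentage points of the overall grade earned from homework.

    `scores` maps each assignment name to a fraction between 0 and 1.
    """
    return sum(HW_WEIGHTS[name] * scores[name] for name in HW_WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {"HW1": 0.9, "HW2": 1.0, "HW3": 0.75, "HW4": 1.0}

total_weight = sum(HW_WEIGHTS.values())          # should equal the stated 30%
earned = homework_contribution(example_scores)   # points out of those 30

print(f"Homework is worth {total_weight}% of the grade; earned {earned:.1f} points.")
```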
Homework 1: Data Exploration and Analysis
The purpose of this assignment is to provide you with some experience of exploring and analyzing data without using an information visualization system. Below is a data set (that can be imported into Excel, or any other data viewer you want to use) about cereals. You should explore and analyze this data using Excel or simply by hand (drawing pictures is fine), but do not use any visualization tools. Also, you should avoid the visualization and charting functionality of Excel for the purpose of this assignment. Your goal here is to perform an exploratory analysis of the data set, to better understand the data set and its characteristics, and to develop insights about the cereal data.
Submission: What you turn in should consist of four things.
- List (bullet list of items) five analytics queries or questions that a person may have about this data set. These would be questions that an analyst examining the data might be pondering.
- List (bullet list of items) five “insights”, chunks of knowledge, or deeper questions that you either encountered or gained while exploring the data. An insight could be some understanding of the data and its characteristics that is not obvious or intuitive. It is something that most people might not realize initially. Note that an insight or knowledge chunk may simply be a deeper question that arose in your mind while exploring the data, even if your analysis was not sufficient to answer it.
- Write one paragraph about the process you used to do the exploration and analysis. Did you load the data into Excel, work manually, or do both? What did you do in Excel? Did you draw pictures? Did you take notes? What did you take notes on? What did you draw? This paragraph should be a general description of your analytic workflow.
- Write one paragraph about challenges or problems that you encountered in doing the analysis this way. Did anything limit or frustrate you? If nothing did, perhaps there was something that was more difficult than you thought it should be. Nothing is perfect, so you should be able to list some potential issues here.
To sum up, your assignment should have two bullet lists of five items followed by two paragraphs.
Grading: We will evaluate the quality of the insights you listed and the detail given for the process you went through. We are looking for things that we find interesting or perhaps unexpected. This is subjective. For the second and third parts, we will evaluate if you did what the assignment asked.
Cereals data (xls format): a1-cereals
The data set should be pretty self-explanatory. The Manufacturer is a one letter code with the expected mapping (Q-Quaker Oats, P-Post, G-General Mills, K-Kelloggs, R-Ralston Purina, N-Nabisco) and Type is C (cold) or H (hot).
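If you use a scripting language as your data viewer, the one-letter codes described above can be decoded with a small lookup table. The following is a minimal sketch: the two dictionaries come from the mapping stated above, while the function name and the sample rows are made up for illustration and are not taken from the actual spreadsheet.

```python
from collections import Counter

# Code-to-name mappings as described in the data set notes above.
MANUFACTURERS = {
    "Q": "Quaker Oats",
    "P": "Post",
    "G": "General Mills",
    "K": "Kelloggs",
    "R": "Ralston Purina",
    "N": "Nabisco",
}
CEREAL_TYPES = {"C": "cold", "H": "hot"}


def decode_row(manufacturer_code, type_code):
    """Translate the one-letter codes into readable labels."""
    return MANUFACTURERS[manufacturer_code], CEREAL_TYPES[type_code]


# Hypothetical (manufacturer, type) pairs, NOT rows from the actual data set.
sample_rows = [("K", "C"), ("G", "C"), ("Q", "H"), ("K", "C")]

decoded = [decode_row(m, t) for m, t in sample_rows]

# Count how many sample cereals each manufacturer has (a tabular summary, not a chart).
per_manufacturer = Counter(name for name, _ in decoded)
print(per_manufacturer.most_common())
```

Counting and tabulating like this stays within the spirit of the assignment, since it produces numbers rather than charts.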
Homework 2: Visual Design
The purpose of this assignment is to provide you with practice and experience designing the appearance of data tables and basic visual charts. Below are two Excel spreadsheets. For the first (Part 1), you should create a table that presents its information as clearly and informatively as possible. Keep in mind the basic chart principles we covered in class.
For the second (Part 2), design a visual chart that does the same. Think about the data in each spreadsheet and what an analyst looking at that data would care about. You are allowed to derive new variables (attributes) that are combinations of the given ones, but you cannot make up totally new variables and values.
To create and render your designs, you can use colored pencils/markers if you’d like. You can also design, lay out, and draw your ideas in a computer tool such as Illustrator, PowerPoint, or Photoshop, but you cannot use those tools to do any of the design for you. That is, tools that are not allowed include: Tableau, ggplot, Spotfire, Numbers, Excel, etc. Again, you don’t need a tool for this; hand-drawn is fine. If you want to use a tool, it should be used only as a drawing tool. The ideas behind the design should be yours.
Submission: Scan or take a picture of your table and graph designs and submit to Moodle.
Grading: We will evaluate the effectiveness and design aspects of your creations, how well and how clearly they can answer a variety of questions about the data. Of course, this is subjective, but we will look for tables and graphs that apply the design recommendations discussed in class and in our readings.
Part 1 dataset: Performance of sales representatives (xls format)
Part 2 dataset: Performance of different company departments over year (xls format)
Homework 3: Use and Critique Tableau
Use and critique Tableau – an Information Visualization System that does not require programming. This assignment will familiarize you with a full-featured InfoVis system – Tableau – which will be introduced in class.
The goals of the assignment are for you to learn the capabilities provided by Tableau (it is one of the best commercial systems), learn the basic visualization methods that it provides and assess its utility in analyzing data.
Groups of 2 are allowed for this assignment! You can write a report on this homework by yourself, or you can do it with a partner (which I encourage, it will be more fun and you will learn more). Note only groups of 2 are allowed, no larger. If you write with a partner, you will both receive the same grade. You may ask others for help with downloading and figuring out how to use Tableau. The paper and its ideas should be developed by you or by your two-person team.
The assignment has four parts:
1. Gain familiarity with Tableau – Familiarize yourself with the visualization techniques and the user interfaces during the class presentation, and via online videos at http://www.tableausoftware.com/learn/training
2. Examine the data sets – Browse several data sets to decide which one to use for the rest of this assignment. Decide on one, and then use the system to explore it further.
3. Develop three interesting questions about the selected data set – put yourself in the shoes of a data analyst, and think about all the different kinds of analysis tasks that a person might want to perform. For instance, someone working with breakfast cereal data might have analysis tasks like:
• Find all the information on Cocoa Pebbles.
• Identify the cereal with the least fat that is also high in fiber.
• What is the distribution of carbohydrates in the cereals?
• Does high fat mean high calories?
• Which of the following three kinds of cereal is best for people on a diet?
Do NOT make all of your questions be about correlations or min or max values.
4. Write a report in two parts:
Part 1 – List your three questions and answers, along with a screenshot showing the visualization you used to answer each question. One page per question – screenshot and narrative. Each question should be answered with a different visualization – so three different visualizations (and not just different data overlaid on a map as can be done in Gapminder).
Part 2 – Critique the system. What are the system’s strengths and weaknesses? For what kinds of user tasks is the system particularly well suited? Focus more here on the visualization techniques as opposed to the particular user interface quirks, though you should feel free to comment on UI aspects when they are particularly good or bad. Describe the characteristics of the UI using the concepts and terminology you have learned in class. This second part should be close to 2 pages.
Submission: Your document should be in PDF format and is limited to a maximum of 5 pages, no cover sheet. Use Times Roman 12 point type with normal margins, 1.5 line spacing. Submit the paper via Moodle. If you worked with a partner, both of you are required to submit to Moodle and ensure both of your names are on it.
Homework 4: Draw a Graph
The purpose of this assignment is to give you an appreciation of just how challenging it is to lay out a graph (network) in the plane. Below is an adjacency matrix for an undirected graph. The nodes are labeled along both sides (1-10). Inside the matrix, a 1 indicates an edge, 0 means there is no edge.
Your objective here is to come up with a positioning for all the vertices such that an aesthetically pleasing graph drawing results. Please draw the graph using a standard technique: vertices are represented by some kind of glyph such as a circle, square, etc. with the vertex number inside. Edges are simply lines drawn between vertices. Follow those basics, then you are free to embellish beyond that.
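To make the matrix convention concrete, the sketch below reads an adjacency matrix into an edge list and computes a plain circular placement of the vertices. It assumes a small, made-up 4-vertex matrix (not the assignment's 10-vertex matrix), and the function names are hypothetical. A circular layout is only a baseline to improve on, not the expected answer.

```python
import math


def edges_from_adjacency(matrix):
    """List each undirected edge once as a (u, v) pair with 1-based labels, u < v."""
    n = len(matrix)
    return [
        (i + 1, j + 1)
        for i in range(n)
        for j in range(i + 1, n)  # upper triangle only: the matrix is symmetric
        if matrix[i][j] == 1
    ]


def circular_layout(n, radius=1.0):
    """Place n vertices evenly on a circle; returns {label: (x, y)}."""
    return {
        k + 1: (
            radius * math.cos(2 * math.pi * k / n),
            radius * math.sin(2 * math.pi * k / n),
        )
        for k in range(n)
    }


# Hypothetical 4-vertex example, NOT the matrix from the assignment.
example = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]

edges = edges_from_adjacency(example)
positions = circular_layout(len(example))

for u, v in edges:
    print(f"edge {u}-{v}: {positions[u]} to {positions[v]}")
```

Listing the edges first makes it easier to spot high-degree vertices and clusters, which usually guide a better hand-drawn placement than the circle does.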
Submission: Take a picture of (or scan) the piece of paper you drew your graph on and submit it to Moodle. In addition, submit 1 or 2 paragraphs describing your design process and the method or algorithm you used to create the graph. Put your name on the page with your description of your method, not on the drawing page.
This is just a short HW, so don’t spend too much time or thought on it. (It turns out that you could spend the rest of your life on it.) If you follow the instructions, you’ll receive full credit.
Final Project
The idea of the project is to take the knowledge and background that you are learning this semester about Information Visualization and put it to good use in a new, creative effort. A real key to the project, however, is to select a data set that people will find interesting and intriguing. Even better would be to select a data set with a clearly identified set of “users” or “analysts” who care deeply about that data. Select a topic that people want to know more about! I cannot emphasize strongly enough the importance of your topic and data set. Use sketches and other low-effort ways to think through ideas for different datasets. Remember interaction, and the questions that users of the data have (and even questions that your team has of the data). Think about the suite of data visualizations that the NY Times has created over the past few years, a few examples of which are listed below:
- Political candidates’ names in speeches
- Netflix Queues in areas
- Movie revenues
- Casualties of war
- Buy or rent?
- Olympic medals
No matter what topic you choose, I am expecting a high-quality project. This project accounts for the largest share of your grade in this course (40%) and will require a significant amount of time and effort. In particular, I’m seeking creative projects showcasing interesting ideas. A good project should consist of visualization designs and a software artifact that implements the designs. Interaction is key in information visualization, and it is difficult to understand the interaction issues in your project without a running system. I am explicitly not expecting user testing and evaluation. Ideally, I would like your efforts to be innovative and to result in some form of potential publication in venues and styles similar to those of the papers that we have read throughout the semester.
You should develop a web-deployable system so that your system can be shown to everyone in the world, and use D3 for visualizing data! Arguments will be entertained for using different visualization toolkits, but in general, D3 is preferred. Using a different toolkit must be approved by the professor before you start writing any code.
1- Selection of data and problem (5%):
You (i.e., each group) need to submit a 1-page document describing the selected data and the problem that you want to solve. You do not need to have a clear solution to the problem yet but can propose your high-level ideas.
Your selection of data+problem must be approved before you can proceed. Make sure that the dataset is in the public domain (or that you can get the owner’s consent to use it for class and share it with ) and of reasonable size (not too small or super gigantic).
2- Initial project description (15%):
This is a 2-3 page document listing project members and describing the topic to be addressed and the data sources/formats. You should address the following questions: What is the problem being addressed? Who would be interested in understanding this data better (users)? What would they want to see in the data (tasks)? Where is the data coming from and what are its characteristics? Why is this a visualization problem that cannot be solved with ML, statistical analysis, etc.?
3- Proposed Solutions (30%):
This is a 4-5 page document describing a number of different design ideas for your problem. I want to see a variety of design ideas sketched out well enough so that other people can understand them and provide feedback and comments. The following points will help you think about your designs and show the basis on which we will evaluate your work:
- Is the visualization an effective representation of the data?
- Is it based on the fundamentals discussed in this course?
- Does the visualization support different analytical tasks that your users need to do?
- Is the visualization creative and does it illustrate some new ideas? (While it is not necessary to invent some new visualization technique for the project, designs that illustrate creativity and new thinking will generally be viewed positively. Of course, innovation cannot be a total substitute for utility.)
4- Implementation & presentation (50%):
You are required to implement a system that works, i.e., it reads in the data and presents an interactive visualization of the data. Note that I’m not asking for a bug-free, commercial piece of software. However, any functionality that you demonstrate should work.
Tips for a Successful Project
It is extremely important to select an interesting problem with data that some group of people will care deeply about. I cannot stress enough how vital it is to start with interesting data. Find some topic that almost everyone cares about (e.g., baby names, feature films, traffic in Boston, flight delays, the stock market, weather, etc.; THERE’S DATA ALL AROUND YOU!) or that some subset of people really cares about (e.g., sports data, politics, personal health, etc.). Consider combining different data sets to produce a new composite data set of special interest. Such a fusion of data often creates a dataset that people want to learn about. Remember that it often takes time and effort to “fuse” multiple data types, so make sure you pick them wisely (i.e., they should support the questions that you want people to be able to answer using your tool).
Two possible styles of successful visualization projects (definitely the space is not limited to these two):
In the first style, the group creates a visualization system that has only one view/representation, but this representation is new and creative. Here, you should focus on designing an innovative new visual representation. The actual user interface may have different components or pieces, but it should be tightly integrated. The real focus here is on creativity and innovation, and the novel representation of the information. These projects emphasize the mappings between the data (and characteristics/variables of the data) and visual encodings, glyphs, and metaphors.
The second type of successful project employs multiple coordinated views where each view may use some well-known visualization techniques, perhaps customized a little for this problem. The emphasis in this type of project is to create a sound, functional system implementation that clearly can be of help for data analysis and understanding. It is important in this type of project to have coordinated views that work well together and provide different perspectives on the data. This type of project does not have the same level of visualization innovation as the first, but it comes together in a strong system implementation, including well-designed user interactions that allow users to explore the data and progress through their task to answer the questions they may have of the data.
D3 Resources
http://www.youtube.com/watch?v=8jvoTV54nXw – nice overview and run-through video/talk
http://alignedleft.com/tutorials/d3/ – thorough d3 tutorials from an academic instructor and the author of the open O’Reilly book, “Interactive Data Visualization for the Web” (look for the free preview link for the actual book draft)
http://sightlinevis.com/ – many d3 examples
https://www.youtube.com/user/d3Vienno/videos?view=0&flow=grid – many tutorial videos by d3Vienno
http://www.cs171.org/2015/resources/ – list of d3 resources from Harvard CS 171 class
https://github.com/mbostock/d3/wiki/Tutorials – big list of resources from the author of D3
https://github.com/mbostock/d3/wiki/API-Reference – well-done D3 documentation
http://www.d3noob.org – free ebook with lots of tips and tricks, actively updated
http://www.jeromecukier.net/wp-content/uploads/2012/10/d3-cheat-sheet.pdf – cheat sheet for D3, also see parent site for blog posts
https://groups.google.com/forum/?fromgroups=#!forum/d3-js – D3 Google group
http://bost.ocks.org/mike/selection/ – Guide to understanding selections, key part of D3.
http://benclinkinbeard.com/talks/2012/NCDevCon/ – A talk, with interactive examples and code snippets, explaining d3
http://www.udacity.com/course/data-visualization-and-d3js--ud507 – d3.js Udacity Course
http://bl.ocks.org/curran/3a68b0c81991e2e94b19 – Responsive Visualizations (Resizing)
http://bl.ocks.org/hubgit/raw/9133448/ – Nesting CSV Data
http://bost.ocks.org/mike/nest/ – Nesting Visualization Elements
http://www.visualcinnamon.com/blog – Creative Tutorials from Nadieh Bremer