Data Visualization and Analysis (COMPSCI 590V) - Spring 2018

Monday, Wednesday, Friday 1:30-2:15 pm
Hasbrouck Laboratory room 134

Instructor:
Dr. Ali Sarvghad
asarv@cs.umass.edu
CICS 324A
Office Hours: (will be posted)

Teaching Assistants:
TBA

Overview

Information visualization is an area of research that helps people analyze and understand data using visualization techniques. This multidisciplinary area draws on other fields, including human-computer interaction, data science, psychology, and art, to develop new visualization methods and to understand how (and why) they are effective.

Information visualization methods are applied to data from many different application domains, including:

  • Political reporting and forecasting – as seen on TV and in the papers in election season.
  • News reporting – look at the interactive visualizations used by the New York Times, Wall Street Journal, Slate, etc.
  • Social science and economic data, such as census and other surveys, and micro and macroeconomic trends.
  • Social networking and web traffic, to understand patterns of communication.
  • Business intelligence and business dashboards – to forecast sales trends, understand competitive marketplace positions, allocate resources, manage production and logistics.
  • Text analysis – to determine trends and relationships for literary analysis and for information retrieval.
  • Criminal investigations – to portray the relationships between events, people, places and things.
  • Performance analysis of computer networks and systems.
  • Software engineering – developing, debugging and maintaining software.
  • Bioinformatics, to understand DNA, gene expression, and systems biology.

Course objectives

  • Learn the principles involved in information visualization
  • Understand the wide variety of information visualizations and know what visualizations are appropriate for various types of data and for different goals
  • Develop skills in critiquing different visualization techniques in the context of user goals and objectives
  • Learn how to implement compelling information visualizations

Required texts

  • Visualization Analysis and Design, Tamara Munzner, CRC Press, ISBN 9781466508910. Principles and paradigms of visualization design.
  • Interactive Data Visualization for the Web, Scott Murray, O’Reilly Media, ISBN 9781449339739. All about D3, the programming tool we will be using for homework and projects. 

Grading

Grading will be based on assignments, a midterm exam, class participation, and a final project. Final course grades may be curved. Grading weights are:

Assignments 30%
Midterm Exam 20%
Class Participation 10%
Final Project 40%
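
As a rough illustration of how these weights combine, the sketch below computes a weighted course score in Python. The component scores are made-up values on a 0-100 scale, and the function name `weighted_score` is hypothetical. Any curve applied to final grades is not modeled.

```python
# Grading weights from the syllabus, as fractions of the final grade.
WEIGHTS = {
    "assignments": 0.30,
    "midterm": 0.20,
    "participation": 0.10,
    "final_project": 0.40,
}


def weighted_score(scores):
    """Combine per-component scores (0-100) into one weighted score.

    `scores` maps each key of WEIGHTS to a score. This helper is an
    illustrative assumption, not the official grade computation.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up example scores, not real data.
example = {
    "assignments": 90,
    "midterm": 80,
    "participation": 100,
    "final_project": 85,
}
print(round(weighted_score(example), 2))
```

Because the final project carries the largest weight, a few points gained or lost there move the total more than the same change in any other component.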

 

Details about the assignments can be found under the Assignments tab.

Lectures: Mondays & Wednesdays

Labs: Fridays (Please see Lab Schedule for more details)


Course Schedule (Lectures, midterm, due dates)

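| Week | Date | Topic | Suggested Reading (Munzner book) | Due Material | Final Project Milestones |
|------|------|-------|----------------------------------|--------------|--------------------------|
| 1 | 1/23 | Course overview | | | |
| | 1/25 | Introduction | Ch. 1 | | |
| 2 | 1/28 | Data & task abstraction | Ch. 2 & 3 | | Form groups |
| | 1/30 | | | | |
| | 2/1 | Lab | | | |
| 3 | 2/4 | Marks & channels | Ch. 5 | | Select data & problem |
| | 2/6 | | | | |
| | 2/8 | Lab | | | |
| 4 | 2/11 | Visualizing tabular data | Ch. 7 | | |
| | 2/13 | | | | |
| | 2/15 | Lab | | | |
| 5 | 2/18 | No class – Presidents’ Day | | | |
| | 2/20 | Color | Ch. 10 | | |
| | 2/22 | Lab | | | |
| 6 | 2/25 | Visualizing networks & trees | Ch. 9 | | Solution proposal |
| | 2/27 | | | | |
| | 3/1 | Lab | | | |
| 7 | 3/4 | Four levels of validation | Ch. 4 | | |
| | 3/6 | Midterm Exam | | | |
| | 3/8 | Lab | | | |
| 8 | 3/11 | Spring Recess. No class. | | | |
| | 3/13 | | | | |
| | 3/15 | | | | |
| 9 | 3/18 | Visualizing spatial data | Ch. 8 | | |
| | 3/20 | | | | |
| | 3/22 | Lab | | | |
| 10 | 3/25 | Manipulate views | Ch. 12 | | |
| | 3/27 | | | | |
| | 3/29 | Lab | | | |
| 11 | 4/1 | Interaction | Ch. 11 | | |
| | 4/3 | | | | |
| | 4/5 | Lab | | | |
| 12 | 4/8 | Rules of thumb | Ch. 6 | | |
| | 4/10 | | | | |
| | 4/12 | Lab | | | |
| 13 | 4/15 | No class – Patriots’ Day | | | |
| | 4/17 | Project presentation | | | Presentations start |
| | 4/19 | | | | |
| 14 | 4/22 | Project Presentation | | | |
| | 4/24 | | | | |
| | 4/26 | | | | |
| 15 | 4/29 | Project Presentation | | | |
| | 5/1 | | | | |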

 

Labs: Fridays

Please make sure that you have set up D3 before attending the first lab. You can find comprehensive instructions for D3 setup in Chapter 4 of the Murray book. There are also numerous resources online that can help you with the setup.

 


Lab Schedule

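| Week | Date | Topic | Suggested Reading (Murray Book) |
|------|------|-------|---------------------------------|
| 2 | 2/1 | Drawing with SVG | Ch. 6 |
| 3 | 2/8 | Making simple charts (bar, line, scatter plot) | Ch. 6 |
| 4 | 2/15 | Scales | Ch. 7 |
| 5 | 2/22 | Axes | Ch. 8 |
| 6 | 3/1 | Interactivity | Ch. 10 |
| 7 | 3/8 | Layouts | Ch. 11 |
| 8 | 3/15 | Spring Recess. No Labs. | |
| 9 | 3/22 | Project work | |
| 10 | 3/29 | Project work | |
| 11 | 4/5 | Project work | |
| 12 | 4/12 | Project work | |
| 13 | 4/19 | Project work | |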

 

Homework Assignments (HW)

These individual assignments will help you develop your knowledge of design principles for Information Visualization. Each assignment is due by the start of class on its due date. Unless otherwise stated, submissions must be made via Moodle.

Recall that HW assignments are worth 30% of your overall grade, broken down as follows:

  • HW1: 8%
  • HW2: 8%
  • HW3: 8%
  • HW4: 6%

(Assignments are adapted from Alex Endert’s InfoVis course: http://va.gatech.edu/courses/cs4460/)

Homework 1: Data Exploration and Analysis

The purpose of this assignment is to provide you with some experience of exploring and analyzing data without using an information visualization system. Below is a data set (that can be imported into Excel, or any other data viewer you want to use) about cereals. You should explore and analyze this data using Excel or simply by hand (drawing pictures is fine), but do not use any visualization tools. Also, you should avoid the visualization and charting functionality of Excel for the purpose of this assignment. Your goal here is to perform an exploratory analysis of the data set, to better understand the data set and its characteristics, and to develop insights about the cereal data.

Submission: What you turn in should consist of four things.

  1. List (bullet list of items) five analytics queries or questions that a person may have about this data set. These would be questions that an analyst examining the data might be pondering.
  2. List (bullet list of items) five “insights”, chunks of knowledge, or deeper questions that you either encountered or gained while exploring the data. An insight could be some understanding of the data and its characteristics that is not obvious or intuitive, something that most people might not realize initially. Note that an insight or knowledge chunk may simply be a deeper question that arose in your mind while exploring the data, even if your analysis was not sufficient to answer it.
  3. Write one paragraph about the process you used to do the exploration and analysis. Did you load the data into Excel, work manually, or do both? What did you do in Excel? Did you draw pictures? Did you take notes? What did you take notes on? What did you draw? This paragraph should be a general description of your analytic workflow.
  4. Write one paragraph about challenges or problems that you encountered in doing the analysis this way. Did anything limit or frustrate you? If nothing did, perhaps there was something that was more difficult than you thought it should be. Nothing is perfect, so you should be able to list some potential issues here.

So, to sum up, your assignment should have two bullet lists of five items followed by two paragraphs.

Grading: We will evaluate the quality of the insights you listed and the detail given for the process you went through. We are looking for things that we find interesting or perhaps unexpected. This is subjective. For the second and third parts, we will evaluate if you did what the assignment asked.

Cereals data (xls format): a1-cereals

The data set should be pretty self-explanatory. The Manufacturer is a one letter code with the expected mapping (Q-Quaker Oats, P-Post, G-General Mills, K-Kelloggs, R-Ralston Purina, N-Nabisco) and Type is C (cold) or H (hot).

Homework 2: Visual Design

The purpose of this assignment is to provide you with practice and experience designing the appearance of data tables and basic visual charts. Below are two Excel spreadsheets. For the first (Part 1), you should create a table that presents its information as clearly and informatively as possible. Keep in mind the basic chart principles we covered in class.

For the second (Part 2), design a visual chart that does the same. Think about the data in each spreadsheet and what an analyst looking at that data would care about. You are allowed to derive new variables (attributes) that are combinations of the given ones, but you cannot make up totally new variables and values.

To create and render your designs, you can use colored pencils/markers if you’d like. You can also design, lay out, and draw your ideas in a computer tool such as Illustrator, PowerPoint, or Photoshop, but you cannot use those tools to do any of the design for you. That is, tools that are not allowed include Tableau, ggplot, Spotfire, Numbers, Excel, etc. Again, you don’t need a tool for this; hand-drawn is fine. If you want to use a tool, use it only as a drawing tool. The ideas behind the design should be yours.

Submission: Scan or take a picture of your table and graph designs and submit them to Moodle.

Grading: We will evaluate the effectiveness and design aspects of your creations, how well and how clearly they can answer a variety of questions about the data. Of course, this is subjective, but we will look for tables and graphs that apply the design recommendations discussed in class and in our readings.

Part 1 dataset: Performance of sales representatives (xls format)
Part 2 dataset: Performance of different company departments over year (xls format)

Homework 3: Use and Critique Tableau

Use and critique Tableau – an Information Visualization System that does not require programming. This assignment will familiarize you with a full-featured InfoVis system – Tableau – which will be introduced in class.

The goals of the assignment are for you to learn the capabilities provided by Tableau (it is one of the best commercial systems), learn the basic visualization methods that it provides and assess its utility in analyzing data.

Groups of 2 are allowed for this assignment! You can write the report for this homework by yourself, or you can do it with a partner (which I encourage; it will be more fun and you will learn more). Note that only groups of 2 are allowed, no larger. If you write with a partner, you will both receive the same grade. You may ask others for help with downloading and figuring out how to use Tableau, but the paper and its ideas should be developed by you or by your two-person team.

The assignment has four parts:

1. Gain familiarity with Tableau – Familiarize yourself with the visualization techniques and the user interfaces during the class presentation, and via online videos at http://www.tableausoftware.com/learn/training

2. Examine the data sets – Browse several data sets to decide which one to use for the rest of this assignment. Decide on one, and then use the system to explore it further.

3. Develop three interesting questions about the selected data set – put yourself in the shoes of a data analyst, and think about all the different kinds of analysis tasks that a person might want to perform. For instance, someone working with breakfast cereal data might have analysis tasks like:

• Find all the information on Cocoa Pebbles.

• Identify the cereal with the least fat that is also high in fiber.

• What is the distribution of carbohydrates in the cereals?

• Does high fat mean high calories?

• Which of the following three kinds of cereal is best for people on a diet?

Do NOT make all of your questions about correlations or min/max values.

4. Write a report in two parts.

Part 1: List your three questions and answers, along with a screenshot showing the visualization you used to answer each question. Use one page per question, containing the screenshot and narrative. Each question should be answered with a different visualization, so three different visualizations in total (and not just different data overlaid on a map, as can be done in Gapminder).

Part 2: Critique the system. What are the system’s strengths and weaknesses? For what kinds of user tasks is the system particularly well suited? Focus more here on the visualization techniques as opposed to the particular user interface quirks, though you should feel free to comment on UI aspects when they are particularly good or bad. Describe the characteristics of the UI using the concepts and terminology you have learned in class. This second part should be close to 2 pages.

Submission: Your document should be in PDF format and is limited to a maximum of 5 pages, with no cover sheet. Use Times Roman 12-point type with normal margins and 1.5 line spacing. Submit the paper via Moodle. If you worked with a partner, both of you are required to submit and to ensure that both of your names are on it.

Homework 4: Draw a Graph

The purpose of this assignment is to give you an appreciation of just how challenging it is to lay out a graph (network) in the plane. Below is an adjacency matrix for an undirected graph. The nodes are labeled along both sides (1-10). Inside the matrix, a 1 indicates an edge, 0 means there is no edge.

Your objective here is to come up with a positioning for all the vertices such that an aesthetically pleasing graph drawing results. Please draw the graph using a standard technique: vertices are represented by some kind of glyph such as a circle, square, etc. with the vertex number inside. Edges are simply lines drawn between vertices. Follow those basics, then you are free to embellish beyond that.
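
Before sketching, it can help to turn the matrix into a plain list of edges and vertex degrees, since high-degree vertices are often reasonable candidates for the middle of a drawing. The following is a minimal Python sketch, assuming a small hypothetical 4-vertex matrix (not the assignment's matrix) and hypothetical helper names `edges_from_matrix` and `degrees`:

```python
def edges_from_matrix(matrix):
    """Return the edges (i, j) with i < j of an undirected graph.

    `matrix` is a square list of lists of 0/1 values. Vertices are
    numbered from 1 to match the labels along the matrix sides.
    Diagonal entries (self-loops) are ignored.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("adjacency matrix must be square")
    edges = []
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: undirected graph
            if matrix[i][j] != matrix[j][i]:
                raise ValueError("matrix is not symmetric")
            if matrix[i][j] == 1:
                edges.append((i + 1, j + 1))
    return edges


def degrees(matrix):
    """Map each vertex label to its number of incident edges."""
    counts = {v: 0 for v in range(1, len(matrix) + 1)}
    for u, v in edges_from_matrix(matrix):
        counts[u] += 1
        counts[v] += 1
    return counts


# Hypothetical example matrix, NOT the one from the assignment.
example = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(edges_from_matrix(example))
print(degrees(example))
```

This only lists the structure of the graph. Choosing where each vertex goes on the page is still the design work the assignment asks for.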

Submission: Take a picture of (or scan) the piece of paper you drew your graph on and submit it to Moodle. In addition, submit 1 or 2 paragraphs describing your design process and the method or algorithm you used to create the graph. Put your name on the page with your description of your method, not on the drawing page.

This is just a short HW, so don’t spend too much time or thought on it. (It turns out that you could spend the rest of your life on it.) If you follow the instructions, you’ll receive full credit.

Final Project

This document describes the semester project for the course. Students should work on a project in teams of 3-4 people. Expectations will NOT be adjusted according to group size. Each group MUST be a combination of graduate and undergraduate students.

The idea of the project is to take the knowledge and background that you are learning this semester about Information Visualization and put it to good use in a new, creative effort. A real key to the project, however, is to select a data set that people will find interesting and intriguing. Even better would be to select a data set with a clearly identified set of “users” or “analysts” who care deeply about that data. Select a topic that people want to know more about! I cannot emphasize strongly enough the importance of your topic and data set. Use sketches and other low-effort methods to think through ideas for different datasets. Remember interaction, and the questions that users of the data have (and even questions that your team has of the data). Think about the suite of data visualizations that the NY Times has created over the past few years, a few examples of which are listed below:

No matter what topic you choose, I am expecting a high-quality project. This project accounts for the largest share of your grade in this course (40%) and will require a significant amount of time and effort. In particular, I’m seeking creative projects showcasing interesting ideas. A good project should consist of visualization designs and a software artifact that implements the designs. Interaction is key in information visualization, and it is difficult to understand the interaction issues in your project without a running system. I am explicitly not expecting user testing and evaluation. Ideally, your efforts will be innovative and could lead to a publication in venues and styles similar to those of the papers that we have read throughout the semester.

You should develop a web-deployable system so that your system can be shown to everyone in the world, and use D3 for visualizing data! Arguments will be entertained for using different visualization toolkits, but in general, D3 is preferred. Using a different toolkit must be approved by the professor before you start writing any code.

Important Milestones (5) and Due Dates 

Will be updated.

D3 Resources

http://www.youtube.com/watch?v=8jvoTV54nXw – nice overview and run-through video/talk
http://alignedleft.com/tutorials/d3/ – thorough d3 tutorials from an academic instructor and the author of the open O’Reilly book, “Interactive Data Visualization for the Web” (look for the free preview link for the actual book draft)
http://sightlinevis.com/ – many d3 examples
https://www.youtube.com/user/d3Vienno/videos?view=0&flow=grid – many tutorial videos by d3Vienno
http://www.cs171.org/2015/resources/ – list of d3 resources from Harvard CS 171 class
https://github.com/mbostock/d3/wiki/Tutorials – big list of resources from the author of D3
https://github.com/mbostock/d3/wiki/API-Reference – well-done D3 documentation
http://www.d3noob.org – free ebook with lots of tips and tricks, actively updated
http://www.jeromecukier.net/wp-content/uploads/2012/10/d3-cheat-sheet.pdf – cheat sheet for D3, also see parent site for blog posts
https://groups.google.com/forum/?fromgroups=#!forum/d3-js – D3 Google group
http://bost.ocks.org/mike/selection/ – Guide to understanding selections, key part of D3.
http://benclinkinbeard.com/talks/2012/NCDevCon/ – A talk, with interactive examples and code snippets, explaining d3
http://www.udacity.com/course/data-visualization-and-d3js--ud507 – d3.js Udacity Course
http://bl.ocks.org/curran/3a68b0c81991e2e94b19 – Responsive Visualizations (Resizing)
http://bl.ocks.org/hubgit/raw/9133448/ – Nesting CSV Data
http://bost.ocks.org/mike/nest/ – Nesting Visualization Elements
http://www.visualcinnamon.com/blog – Creative Tutorials from Nadieh Bremer