Data Visualization and Exploration – Spring 2021
Data Visualization and Analysis
Spring 2021 Tuesday, Thursday, 8:30-9:45 AM
Instructor: Dr. Ali Sarvghad
asarv@cs.umass.edu, CICS 344
Virtual office hours: Monday 2:30 – 3:30 pm. Zoom link
Teaching Assistant: Joshua Levine
joshualevine@umass.edu
Virtual office hours: Friday 1:30-3 pm. Zoom link
Course delivery policy for Spring 2021
Welcome to the Data Visualization and Exploration course (590V). Due to the COVID-19 pandemic, the course will be offered remotely. Lectures will be pre-recorded and provided asynchronously, and discussion sessions will be synchronous. We will record and publish discussions for those of you who cannot attend the live sessions due to a major time difference. Please see the class schedule for more details about the lectures and discussion topics.
Course overview
Information visualization is an area of research that helps people analyze and understand data using visualization techniques. The multi-disciplinary area draws from other areas of science, including human-computer interaction, data science, psychology, and art, to develop new visualization methods and understand how (and why) they are effective.
Information visualization methods are applied to data from many different application domains, including:
- Political reporting and forecasting – as seen on TV and in the papers in the election season.
- News reporting – look at the interactive visualizations used by the New York Times, Wall Street Journal, Slate, etc.
- Social science and economic data, such as census and other surveys, and micro and macroeconomic trends.
- Social networking and web traffic – to understand patterns of communication.
- Business intelligence and business dashboards – to forecast sales trends, understand competitive marketplace positions, allocate resources, and manage production and logistics.
- Text analysis – to determine trends and relationships for literary analysis and information retrieval.
- Criminal investigations – to portray the relationships between events, people, places, and things.
- Performance analysis of computer networks and systems.
- Software engineering – developing, debugging, and maintaining software.
- Bioinformatics, to understand DNA, gene expressions, systems biology.
Course objectives
- Learn the principles involved in information visualization
- Understand the wide variety of information visualizations and know what visualizations are appropriate for various types of data and for different goals
- Develop skills in critiquing different visualization techniques in the context of user goals and objectives
- Learn how to implement compelling information visualizations
Recommended text
The following textbooks are strongly recommended for this course. Particularly, we will closely follow Tamara Munzner’s book:
- Visualization Analysis and Design, Tamara Munzner, CRC Press, ISBN 9781466508910. Principles and paradigms of visualization design.
- Interactive Data Visualization for the Web, Scott Murray, O’Reilly Media, ISBN 9781449339739. All about D3, the programming tool we will be using for homework and projects.
Evaluation
Grading will be based on the project deliverables, midterm exam, class participation, and final project demo. Final course grades may be curved (but not always). Grading weights are:
Midterm exam | 25% |
Homework & Discussion | 20% |
Readings | 10% |
Course project & deliverables | 45% |
Project details can be found under the Project tab.
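As a rough illustration of how the grading weights above combine, here is a minimal sketch. The function name, the component keys, and the example scores are hypothetical, and any curving is not modeled.

```javascript
// Grading weights from the table above, as fractions of the course grade.
const WEIGHTS = {
  midterm: 0.25,
  homeworkAndDiscussion: 0.20,
  readings: 0.10,
  project: 0.45,
};

// Hypothetical helper: combines per-component scores (each on a 0-100 scale)
// into a weighted course score. Curving is not modeled here.
function courseScore(scores) {
  let total = 0;
  for (const [component, weight] of Object.entries(WEIGHTS)) {
    if (!(component in scores)) {
      throw new Error(`Missing score for component: ${component}`);
    }
    total += weight * scores[component];
  }
  return total;
}

// Example with made-up scores.
const exampleScore = courseScore({
  midterm: 80,
  homeworkAndDiscussion: 90,
  readings: 100,
  project: 85,
});
console.log(exampleScore.toFixed(2));
```

Because the project carries the largest weight, a change in the project score moves the total more than the same change in any other component.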
Course Schedule (Lectures, midterm, due dates)
Week | Date | Activity | Topic | Homework | Reading | Project Due dates |
1 | 2/2 | Lecture | Course introduction | |||
2/4 | Lecture | Data and task abstraction | HW-1: data and task abstraction | P1: Bridging From Goals to Tasks with Design Study Analysis Reports | ||
2 | 2/9 | Discussion | HW-1 (due 8th) | |||
2/11 | Lecture | D3: set up, drawing with SVG | HW-2: drawing with SVG | Project: groups (12th), HW-2 (15th) | ||
3 | 2/16 | Discussion | HW-2 (due 15th) | |||
2/18 | Lecture | Marks & Channels | HW-3: Marks & channels | P2: Crowdsourcing graphical perception | Project: dataset | |
4 | 2/23 | Discussion | HW-3 (due 22nd) | |||
2/25 | Lecture | Color Maps – Visualizing tabular data | HW-4: colormaps & tabular data | P3: Task-Based Effectiveness of Basic Visualizations | Project: user and tasks | |
5 | 3/2 | Discussion | HW-4 (due 1st) | |||
3/4 | Lecture | D3: making basic charts, scales, axes | HW-5: practice building charts with D3 | |||
6 | 3/9 | Discussion | HW-5 (due 8th) | Project: data & task abstraction | ||
3/11 | Lecture | Visualizing networks and Trees | P4: NodeTrix: A Hybrid Visualization of Social Networks | |||
7 | 3/16 | MIDTERM | ||||
3/18 | No Class | No Class | ||||
8 | 3/23 | Lecture | Handle data complexity | P5: Graph-Theoretic Scagnostics | ||
3/25 | Discussion | Project: design | ||||
9 | 3/30 | Lecture | D3: interactivity, layouts | |||
4/1 | Discussion | HW-6 (due before discussion) | ||||
10 | 4/6 | Lecture | Uncertainty visualization | P6: Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data | ||
4/8 | Lecture | Evaluation techniques in Visualization | P7: A Systematic Review on the Practice of Evaluating Visualization | |||
11 | 4/13 | Lecture | Storytelling with data | |||
4/15 | Cancelled | Cancelled | ||||
12 | 4/20 | No class | No class | |||
4/22 | Guest lecture: Cindy Xiao | Cognitive biases and visual data analysis | ||||
13 | 4/27 | Guest lecture: Narges Mahyar | Application example: Digital civics | |||
4/29 | Final Project Demo | |||||
14 | 5/4 | Final Project Demo |
Project overview
The course project carries 45% of the overall course grade. This is a group project (unless the course instructor approves individual work). Expectations will NOT be adjusted according to group size.
Groups
Each group MUST include both grad and undergrad students. The preferred size of a group is 3. Groups smaller than 2 or larger than 4 will not be allowed (except under special circumstances and with the instructor’s approval).
- You can start looking for teammates as early as the first day of classes. The deadline for having a team is Feb 12, the last day of add/drop. Reach out to us before the deadline if you have difficulty finding a team.
Project proposal (45%)
- Data selection (10%)
- Users and tasks identification (10%)
- Data and task abstraction (10%)
- Design (15%)
- Scroll down to see details.
User evaluation (Optional- Bonus 5%)
- Report of methodology, data analysis, and findings
- Scroll down to see details.
Final demo (50%)
- Live demo of the implemented data visualization tool
- Scroll down to see details.
Final report (5%)
- Implementation details
- Evaluation results (if any)
- Scroll down to see details.
Late and missing submission policy
- Late submission = 50% loss of project deliverable’s mark
- No submission = 0
- Scroll down to see details.
This project accounts for 45% of your grade in this course and will require a significant amount of time and effort. In particular, I’m seeking creative projects showcasing interesting ideas related to current challenges and issues we face today, such as the impact of the pandemic on different aspects of our lives, minorities and underrepresented communities and groups, and social equity and justice. No matter what topic you choose, I am expecting a high-quality project. A good project should consist of visualization designs and a software artifact that implements the designs. Interaction and data manipulation are key to information visualization, and it is difficult to understand the interaction issues in your project without a running system. Ideally, I would like your efforts to be innovative and to result in a potential publication in venues, and in a style, similar to those of the papers that we will read throughout the semester.
You should develop a web-deployable system so that your system can be shown to everyone in the world, and use D3 for visualizing data! Arguments will be entertained for using different visualization toolkits, but in general, D3 is preferred. Using a different toolkit must be approved by the professor before you start writing any code.
Two examples of previous projects can be found here and here.
Project details
The idea of the project is to take the knowledge and background that you are learning this semester about Information Visualization and put it to good use in a new, creative effort. A real key to the project, however, is to select a data set that people will find interesting and intriguing. Even better would be to select a data set with a clearly identified set of “users” or “analysts” who care deeply about that data. Select a topic that people want to know more about! I cannot emphasize strongly enough the importance of your topic and data set.
Project proposal
The project proposal is a document that you will gradually complete. The proposal has four major components, listed below:
1- Data selection (10%). Due on: 2/18
You start your project by selecting a dataset and a problem. You can use your own data (from a school project, self-quantified, etc.) or find and select publicly available data online:
- BYOD (Bring Your Own Data)
- you (or your teammates) have your own data to analyze such as:
- thesis/research topic
- personal interest
- dovetail with another course (sometimes works, but timing may be tricky)
- FDOI (Find Data of Interest)
If you select a tabular dataset online, it should contain a minimum of 10 variables and 500 records. We will consider smaller datasets with the instructor’s approval.
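A quick way to check a candidate tabular dataset against these minimums is sketched below. It assumes the data has already been loaded into an array of plain objects, one per record, with one key per variable; the function name and the generated sample data are hypothetical.

```javascript
// Minimum dataset size stated above.
const MIN_VARIABLES = 10;
const MIN_RECORDS = 500;

// Hypothetical helper: counts records and distinct variables (object keys)
// in an array of row objects and reports whether both minimums are met.
function checkDatasetSize(rows) {
  const variables = new Set();
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      variables.add(key);
    }
  }
  return {
    records: rows.length,
    variables: variables.size,
    meetsMinimum:
      rows.length >= MIN_RECORDS && variables.size >= MIN_VARIABLES,
  };
}

// Made-up sample: 500 records with 10 variables named v0..v9.
const sampleRows = Array.from({ length: 500 }, (_, i) => {
  const row = {};
  for (let j = 0; j < 10; j++) {
    row[`v${j}`] = i * j;
  }
  return row;
});
const sizeReport = checkDatasetSize(sampleRows);
console.log(sizeReport);
```

A dataset that falls short on either count would need the instructor's approval.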
The deliverable in this part is a document with information about the data set, such as personal or public, the name of the dataset, and a one-paragraph description of data origin and high-level semantics.
All documents submitted for this course should be single-spaced in a 12-point font.
2- Users and tasks identification (10%). Due on: 2/25
In this part, you identify and describe the intended user(s) who need to explore, analyze, and make sense of the data. These descriptions can be based on real users or personas (what is a persona?). We encourage you to consider 3-4 different users with varying interests in the data to enrich the space of exploration (and tool design). Each of these users needs to understand and analyze the data from a (slightly to very) different perspective. Remember that the tool you design will need to accommodate these users and their analytical needs. You will create a user scenario for each identified user (what is a user scenario?). Each scenario includes the following:
- Background – who are your users (including their knowledge base and skillset/s)?
- Motivations – what goals do they want to achieve?
- Tasks – what must they do to reach those goals?
- Context of use – how will they encounter your design?
- Challenges – when they try to use it, what can get in their way (e.g., signal loss)?
The deliverable in this part is user scenarios. For each identified user, you will write a one-paragraph user scenario.
3- Data and task abstraction (10%). Due on: 3/9
In this part of the proposal, you will apply data and task abstraction techniques that will be covered in lectures to describe and transform the data and user tasks from user space to visualization design space.
The deliverables of this part are the data and task abstractions (1-3 pages each).
4- Design (15%). Due on: 3/25
In this part, you will propose the design of the visualization tool.
The deliverable is a 3-5 page document explaining why and how your proposed design will help the (previously identified) users achieve their data analysis goals. This document should include medium- or high-fidelity prototypes of your solution (what is prototyping?).
User Evaluation
Due to the current pandemic and difficulty of performing user studies, user evaluation is optional. Performing a user evaluation, however, carries a 5% bonus mark towards the overall project assessment. We will cover common methods of visualization evaluation in lectures. The evaluation must be completed before the final project demo and reported in the final report.
Final Demo. Due on: 4/29
Final project demos will be held online during the last two or three classes (depending on the number of teams). Each group will be given 12 minutes to present their work. The 12 minutes should be roughly broken down as follows:
- a short slideshow, presenting data, user, and tasks (~5 mins)
- live demo of the tool (~5 mins)
- question and answer (~2 mins)
The presentation date and order will be assigned randomly. Therefore, you need to make sure that your tool is ready before the first date when presentations start.
Demos will be evaluated according to these high-level criteria:
- The match between the defined user tasks and the tool’s functionalities
- Design of effective and efficient visualizations that match data type and support user tasks
- Supporting interactivity
- Supporting data manipulation and transformation
- Supporting exploration
Final Report. Due date: 5/14
The final report will provide implementation details of the tool such as architecture and technologies used. This will be a 1-2 page document. In the case of any user evaluations, you should add 1-2 pages to the final report and describe the methodology used, participants, description of collected data, and your high-level findings.
Late and missing submission policy
Late submissions are allowed for the project deliverables; however, you will lose 50% of the mark allocated to the specific deliverable. A missing submission results in the total loss of the deliverable’s mark.
If your late or missing submission is due to a legitimate reason such as illness or an emergency, contact the TA or the course instructor as soon as possible to consider and assess your case.
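The deliverable weights and the late policy can be combined as in the sketch below. This is only one reading of the policy: it assumes a late deliverable keeps half of the mark it would otherwise have earned and that the evaluation bonus is simply added on top. The function names, status strings, and example fractions are hypothetical.

```javascript
// Deliverable weights (percent of the project mark) from the project overview.
const DELIVERABLES = {
  dataSelection: 10,
  usersAndTasks: 10,
  dataAndTaskAbstraction: 10,
  design: 15,
  finalDemo: 50,
  finalReport: 5,
};
// Optional user evaluation bonus (percent), added on top (assumption).
const EVALUATION_BONUS = 5;

// submission: { status: "onTime" | "late" | "missing", fraction: 0..1 }
// fraction is the share of the deliverable's mark earned before any penalty.
function deliverableMark(weight, submission) {
  if (!submission || submission.status === "missing") {
    return 0; // no submission = 0
  }
  const earned = weight * submission.fraction;
  // Assumed reading of "50% loss": a late deliverable keeps half its mark.
  return submission.status === "late" ? earned * 0.5 : earned;
}

function projectMark(submissions, evaluationFraction = 0) {
  let total = 0;
  for (const [name, weight] of Object.entries(DELIVERABLES)) {
    total += deliverableMark(weight, submissions[name]);
  }
  return total + EVALUATION_BONUS * evaluationFraction;
}

// Made-up example: design is late, the final report is never submitted.
const onTime = (fraction) => ({ status: "onTime", fraction });
const exampleProjectMark = projectMark({
  dataSelection: onTime(0.9),
  usersAndTasks: onTime(0.9),
  dataAndTaskAbstraction: onTime(0.9),
  design: { status: "late", fraction: 0.8 },
  finalDemo: onTime(0.9),
});
console.log(exampleProjectMark.toFixed(1));
```

Under these assumptions, the late design deliverable contributes 6 of its 15 points and the missing report contributes none.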
Tips for a Successful Project
It is extremely important to select an interesting problem with data that some group of people will care deeply about. I cannot stress enough how vital it is to start with interesting data. Find some topic that almost everyone cares about or that some subset of people really cares about. Consider combining different data sets to produce a new composite data set of special interest. Such a fusion of data often creates a dataset that people want to learn about. Remember that this often takes time and effort to “fuse” multiple data types, so you want to make sure you pick them wisely (i.e., they should be in support of the questions that you want people to be able to answer using your tool).
Two possible styles of successful visualization projects (the space is certainly not limited to these two):
In the first style, the group creates a visualization system that has only one view/representation, but this representation is new and creative. Here, you should focus on designing an innovative new visual representation. The actual user interface may have different components or pieces, but it should be tightly integrated. The real focus here is on creativity, innovation, and the novel representation of the information. These projects emphasize the mappings from the data (and characteristics/variables of the data) to visual encodings, glyphs, and metaphors.
The second style of successful project employs multiple coordinated views, where each view may use a well-known visualization technique, perhaps customized a little for this problem. The emphasis in this type of project is on creating a sound, functional system implementation that can clearly help with data analysis and understanding. It is important in this type of project to have coordinated views that work well together and provide different perspectives on the data. This type of project does not have the same level of visualization innovation as the first, but it comes together in a strong system implementation, including well-designed user interactions that allow users to explore the data and progress through their tasks to answer the questions they may have about the data.
Example projects
Required Reading (10%)
Regularly during the term, you will be assigned research papers to read that relate to the topics covered in class. The papers will be posted on Moodle. All students MUST read each paper and post a summary of the research on the associated Moodle forum. In addition to the summary, graduate students MUST post at least one question/critique about the paper and answer others’ questions/critiques. This is optional for undergrads, though we encourage them to participate in discussions around the papers. Each summary is due one week after the date on which the paper is posted.
Your summary MUST include the following:
- Explain the problem that the paper investigates. Why is it important? Whom does it affect?
- How do the authors propose to investigate/solve this problem?
- Why and how can the proposed solution address the problem?
- How do they evaluate their solution? What methodology do they use in their evaluation?
- What are the most important findings of their evaluation?
We will read all of your summaries and questions/critiques. However, we will NOT provide you with feedback unless your work is below the bar or is missing.
You can find many guidelines online that describe how to read and summarize a research paper. Here is an example from Harvard. You can also talk to me or the TA if you have any questions about reading and summarizing research papers.
Late and missing submission policy
Late submission is allowed for readings; however, you will lose 50% of the mark allocated to the reading. A missing submission results in the total loss of the reading’s mark.
If your late or missing submission is due to a legitimate reason such as illness or an emergency, contact the TA or the course instructor as soon as possible to consider and assess your case.
Reading links
Reading 1: Bridging From Goals to Tasks with Design Study Analysis Reports
Reading 2: Crowdsourcing graphical perception
Reading 3: Task-Based Effectiveness of Basic Visualizations
Reading 4: NodeTrix: A Hybrid Visualization of Social Networks
Reading 5: Graph-Theoretic Scagnostics
Reading 6: Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data
Reading 7: A Systematic Review on the Practice of Evaluating Visualization
Homework
Throughout the course, you will be given 6 homework assignments, which you will complete individually and submit on Moodle. The assignments cover topics from the visualization and D3 lectures. We will discuss answers to the homework in the discussion sessions. Each homework is due before its discussion session. See the class schedule for details. We will mark and return your homework to you after the discussion session.
Late and missing submission policy
Late submission is allowed for homework; however, you will lose 50% of the mark allocated to the homework. A missing submission results in the total loss of the homework’s mark.
If your late or missing submission is due to a legitimate reason such as illness or an emergency, contact the TA or the course instructor as soon as possible to consider and assess your case.
Midterm
The midterm will be on March 16th. It will be a take-home exam with a 24-hour return time. Questions will be open-ended and will require you to draw on the knowledge you have gained so far in the course.
Late and missing submission policy
Late submission is NOT allowed for the midterm exam.
If you miss the midterm due to a legitimate reason such as illness or an emergency, contact the TA or the course instructor as soon as possible to consider and assess your case.
D3 Resources
http://www.youtube.com/watch?v=8jvoTV54nXw – nice overview and run-through video/talk
http://alignedleft.com/tutorials/d3/ – thorough d3 tutorials from an academic instructor and the author of the open O’Reilly book, “Interactive Data Visualization for the Web” (look for the free preview link for the actual book draft)
https://www.youtube.com/user/d3Vienno/videos?view=0&flow=grid – many tutorial videos by d3Vienno
http://www.cs171.org/2015/resources/ – list of d3 resources from Harvard CS 171 class
https://github.com/mbostock/d3/wiki/Tutorials – big list of resources from the author of D3
https://github.com/mbostock/d3/wiki/API-Reference – well-done D3 documentation
http://www.d3noob.org – free ebook with lots of tips and tricks, actively updated
https://groups.google.com/forum/?fromgroups=#!forum/d3-js – D3 Google group
http://bost.ocks.org/mike/selection/ – Guide to understanding selections, key part of D3.
http://bcc-talks.surge.sh/2012/NCDevCon/#/ – A talk, with interactive examples and code snippets, explaining d3
http://www.udacity.com/course/data-visualization-and-d3js--ud507 – d3.js Udacity Course
http://bl.ocks.org/curran/3a68b0c81991e2e94b19 – Responsive Visualizations (Resizing)
http://bl.ocks.org/hubgit/raw/9133448/ – Nesting CSV Data
http://bost.ocks.org/mike/nest/ – Nesting Visualization Elements
http://www.visualcinnamon.com/blog – Creative Tutorials from Nadieh Bremer