Data Visualization and Exploration – Spring 2022

Tuesday, Thursday, 8:30-9:45 AM, LederCA301

Instructor: Dr. Ali Sarvghad

asarv@cs.umass.edu,  CICS 344

Office hours & location: To come

TA: dgraymullen@umass.edu

Office hours & location: To come.

 

Course overview

Information visualization is an area of research that helps people analyze and understand data using visualization techniques. The multi-disciplinary area draws from other areas of science, including human-computer interaction, data science, psychology, and art, to develop new visualization methods and understand how (and why) they are effective.

Information visualization methods are applied to data from many different application domains, including:

  • Political reporting and forecasting – as seen on TV and in the papers in the election season.
  • News reporting – look at the interactive visualizations used by the New York Times, Wall Street Journal, Slate, etc.
  • Social science and economic data, such as census and other surveys, and micro and macroeconomic trends.
  • Social networking and web traffic to understand patterns of communication
  • Business intelligence and business dashboards – to forecast sales trends, understand competitive marketplace positions, allocate resources, and manage production and logistics.
  • Text analysis – to determine trends and relationships for literary analysis and information retrieval.
  • Criminal investigations – to portray the relationships between events, people, places, and things.
  • Performance analysis of computer networks and systems.
  • Software engineering – developing, debugging, and maintaining software.
  • Bioinformatics – to understand DNA, gene expression, and systems biology.

Course objectives

  • Learn the principles involved in information visualization
  • Understand the wide variety of information visualizations and know what visualizations are appropriate for various types of data and for different goals
  • Develop skills in critiquing different visualization techniques in the context of user goals and objectives
  • Learn how to implement compelling information visualizations

Recommended text

The following textbooks are strongly recommended for this course. In particular, we will closely follow Tamara Munzner’s book:

  • Visualization Analysis and Design, Tamara Munzner, CRC Press, ISBN 9781466508910. Principles and paradigms of visualization design.
  • Interactive Data Visualization for the Web, Scott Murray, O’Reilly Media, ISBN 9781449339739. All about D3, the programming tool we will be using for homework and projects.

Evaluation

Grading will be based on the project deliverables, midterm exam, class participation, and final project demo.  Final course grades may be curved (but not always). Grading weights are:

Week Date Topic Activity Assignment Reading
1
2
3
4
5
6
7
8
9
10
11
12
13
14

Project overview

The course project carries 45% of the overall course grade. This is a group project (unless the course instructor approves individual work). Expectations will NOT be adjusted according to group size.

  • Groups

    • Each group MUST include both grad and undergrad students. The preferred size of a group is 3. Groups smaller than 2 or larger than 4 will not be allowed (except under special circumstances and with the instructor’s approval).
    • You can start looking for teammates as early as the first day of classes. The deadline for having a team is Feb 12, the last day of add/drop. Reach out to us before the deadline if you have difficulty finding a team.
  • Project proposal (45%)

    • Data selection (10%)
    • Users and tasks identification (10%)
    • Data and task abstraction (10%)
    • Design (15%)
    • Scroll down to see details.
  • User evaluation (optional, 5% bonus)

    • Report of methodology, data analysis, and findings
    • Scroll down to see details.
  • Final demo (50%)

    • Live demo of the implemented data visualization tool
    • Scroll down to see details.
  • Final report (5%)

    • Implementation details
    • Evaluation results (if any)
    • Scroll down to see details.
  • Late and missing submission policy

    • Late submission = 50% loss of project deliverable’s mark
    • No submission = 0
    • Scroll down to see details.

This project accounts for 45% of your grade in this course and will require a significant amount of time and effort. In particular, I’m seeking creative projects showcasing interesting ideas related to challenges and issues we face today, such as the impact of the pandemic on different aspects of our lives, minorities and underrepresented communities and groups, and social equity and justice. No matter what topic you choose, I am expecting a high-quality project. A good project should consist of visualization designs and a software artifact that implements the designs. Interaction and data manipulation are key to information visualization, and it is difficult to understand the interaction issues in your project without a running system. Ideally, I would like your efforts to be innovative and to result in work that could potentially be published in venues and styles similar to those of the papers we will read throughout the semester.

You should develop a web-deployable system so that it can be shown to everyone in the world, and use D3 to visualize the data. Arguments will be entertained for using different visualization toolkits, but in general, D3 is preferred. Use of a different toolkit must be approved by the professor before you start writing any code.

Two examples of previous projects can be found here and here.


Project details

The idea of the project is to take the knowledge and background that you are learning this semester about Information Visualization and put it to good use in a new, creative effort. A real key to the project, however, is to select a data set that people will find interesting and intriguing. Even better would be to select a data set with a clearly identified set of “users” or “analysts” who care deeply about that data. Select a topic that people want to know more about! I cannot emphasize strongly enough the importance of your topic and data set.

Project proposal

The project proposal is a document that you will gradually complete. The proposal has four major components, listed below:

1. Data selection (10%). Due on: 2/18

You start your project by selecting a dataset and a problem. You can use your own data (from a school project, self-quantified, etc.) or find and select publicly available data online:

  • BYOD (Bring Your Own Data)
    • you (or your teammates) have your own data to analyze such as:
      • thesis/research topic
      • personal interest
      • dovetail with another course (sometimes works, but timing may be tricky)
  • FDOI (Find Data of Interest)
    • many existing datasets on the internet (e.g., Kaggle, UN Data)
    • can be tricky to determine reasonable analysis tasks that users may want to do

If you select a tabular dataset online, it should contain at least 10 variables and 500 records. We will consider smaller datasets upon approval by the instructor.
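
One quick way to check a candidate dataset against these minimums is a short script. The sketch below is only an illustration, assuming the data is a CSV file with a single header row; the function name meets_size_requirement and the in-memory sample are hypothetical, and you are free to check your data any other way.

```python
import csv
import io

MIN_VARIABLES = 10   # minimum number of columns stated above
MIN_RECORDS = 500    # minimum number of rows stated above


def meets_size_requirement(csv_file):
    """Return (n_variables, n_records, ok) for an open CSV file with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])                    # first row holds the variable names
    n_records = sum(1 for row in reader if row)  # count data rows, skipping blank lines
    n_variables = len(header)
    ok = n_variables >= MIN_VARIABLES and n_records >= MIN_RECORDS
    return n_variables, n_records, ok


# Hypothetical in-memory dataset: 10 columns, 500 rows.
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow([f"var{i}" for i in range(10)])
for r in range(500):
    writer.writerow([r] * 10)
sample.seek(0)

print(meets_size_requirement(sample))  # (10, 500, True)
```

For a file on disk, you could pass open("your_data.csv", newline="") instead of the in-memory sample (the file name here is a placeholder).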

The deliverable in this part is a document with information about the data set, such as personal or public, the name of the dataset, and a one-paragraph description of data origin and high-level semantics.

All documents submitted for this course should be single-spaced in a 12-point font.

2. Users and tasks identification (10%). Due on: 2/25

In this part, you identify and describe the intended user(s) who need to explore, analyze, and make sense of the data. These descriptions can be based on real users or personas (what is a persona?). We encourage you to consider 3-4 different users with varying interests in the data to enrich the space of exploration (and tool design). Each of these users needs to understand and analyze the data from (slightly to very) different perspectives. Remember that the tool you design will need to accommodate these users and their analytical needs. You will create a user scenario for each identified user (what is a user scenario?). Each scenario includes the following:

  • Background – who are your users (including their knowledge base and skill sets)?
  • Motivations – what goals do they want to achieve?
  • Tasks – what must they do to reach those goals?
  • Context of use – how will they encounter your design?
  • Challenges – when they try to use it, what can get in their way (e.g., signal loss)?

The deliverable in this part is user scenarios. For each identified user, you will write a one-paragraph user scenario.

3. Data and task abstraction (10%). Due on: 3/9

In this part of the proposal, you will apply data and task abstraction techniques that will be covered in lectures to describe and transform the data and user tasks from user space to visualization design space.

The deliverables of this part are data and task abstractions (1-3 pages each).

4. Design (15%). Due on: 3/25

In this part, you will propose the design of the visualization tool.

The deliverable is a 3-5 page document explaining why and how your proposed design will help the (previously identified) users achieve their data analysis goals. This document should include medium- or high-fidelity prototypes of your solution (what is prototyping?).


User Evaluation

Due to the current pandemic and the difficulty of performing user studies, user evaluation is optional. Performing a user evaluation, however, carries a 5% bonus mark towards the overall project assessment. We will cover common methods of visualization evaluation in lectures. The evaluation must be completed before the final project demo and reported in the final report.


Final Demo. Due on: 4/29

Final project demos will be held online during the last two or three classes (depending on the number of teams). Each group will be given 12 minutes to present their work. The 12 minutes should be roughly broken down as follows:

  • a short slideshow, presenting data, user, and tasks (~5 mins)
  • live demo of the tool (~5 mins)
  • question and answer (~2 mins)

The presentation date and order will be assigned randomly. Therefore, you need to make sure that your tool is ready before the first date when presentations start.

Demos will be evaluated according to these high-level criteria:

  1. The match between the defined user tasks and the tool’s functionalities
  2. Design of effective and efficient visualizations that match data type and support user tasks
  3. Supporting interactivity
  4. Supporting data manipulation and transformation
  5. Supporting exploration

Final Report. Due on: 5/14

The final report will provide implementation details of the tool such as architecture and technologies used. This will be a 1-2 page document. In the case of any user evaluations, you should add 1-2 pages to the final report and describe the methodology used, participants, description of collected data, and your high-level findings.


Late and missing submission policy

Late submissions are allowed for the project deliverables; however, you will lose 50% of the mark allocated to the specific deliverable. A missing submission results in the total loss of the deliverable’s mark.

If your late or missing submission is due to a legitimate reason such as illness or an emergency, contact the TA or the course instructor as soon as possible so that we can assess your case.
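
To make the arithmetic concrete, the sketch below combines the project weights listed in the project overview with the late penalty. It is an illustration, not the official grading procedure: it assumes that a late deliverable keeps half of the mark it earned, that a missing deliverable scores zero, and that the optional user-evaluation bonus is simply added to the project total. The names project_score and WEIGHTS and the example scores are hypothetical.

```python
# Component weights within the project, in percent, as listed in the project overview.
WEIGHTS = {
    "data_selection": 10,
    "users_and_tasks": 10,
    "abstraction": 10,
    "design": 15,
    "final_demo": 50,
    "final_report": 5,
}
BONUS_WEIGHT = 5     # optional user evaluation
LATE_PENALTY = 0.5   # a late deliverable loses 50% of its mark


def project_score(scores, late=(), bonus_score=None):
    """Weighted project score in percent.

    scores: fraction earned (0.0 to 1.0) per deliverable; a missing key counts as no submission.
    late: names of deliverables that were submitted late.
    bonus_score: fraction earned on the optional user evaluation, or None if not done.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        earned = scores.get(name, 0.0) * weight
        if name in late:
            earned *= 1 - LATE_PENALTY  # assumption: the earned mark is halved
        total += earned
    if bonus_score is not None:
        total += bonus_score * BONUS_WEIGHT  # assumption: the bonus is simply added
    return total


# Hypothetical team: 90% on every deliverable, with the design document submitted late.
scores = {name: 0.9 for name in WEIGHTS}
print(round(project_score(scores, late={"design"}), 2))  # 83.25
```

In this example, the late design document costs the team 6.75 of the 13.5 points it would otherwise have earned for that deliverable.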


Tips for a Successful Project

It is extremely important to select an interesting problem with data that some group of people will care deeply about. I cannot stress enough how vital it is to start with interesting data. Find some topic that almost everyone cares about or that some subset of people really cares about. Consider combining different data sets to produce a new composite data set of special interest. Such a fusion of data often creates a dataset that people want to learn about. Remember that it often takes time and effort to “fuse” multiple data types, so make sure you pick them wisely (i.e., they should support the questions that you want people to be able to answer using your tool).

Two possible styles of successful visualization projects (the space is certainly not limited to these two):

In the first style, the group creates a visualization system that has only one view/representation, but this representation is new and creative. Here, you should focus on designing an innovative new visual representation. The actual user interface may have different components or pieces, but it should be tightly integrated. The real focus here is on creativity, innovation, and the novel representation of the information. These projects emphasize the mappings from the data (and characteristics/variables of the data) to visual encodings, glyphs, and metaphors.

The second style of successful project employs multiple coordinated views, where each view may use some well-known visualization technique, perhaps customized a little for this problem. The emphasis in this type of project is on creating a sound, functional system implementation that can clearly help with data analysis and understanding. It is important in this type of project to have coordinated views that work well together and provide different perspectives on the data. This type of project does not have the same level of visualization innovation as the first, but it comes together in a strong system implementation, including well-designed user interactions that allow users to explore the data and progress through their tasks to answer the questions they may have of the data.


Example projects

Project on the Food Environment Atlas

Project on the OSMI Mental Health Data

To come.

Required Reading (10%)

Regularly during the term, you will be assigned research papers to read that relate to topics covered in class. The papers will be posted on Moodle. All students MUST read the paper(s) and post a summary of each on the associated Moodle forum. In addition to the summary, graduate students MUST post at least one question/critique about the paper and answer others’ questions/critiques. This is optional for undergrads, though we encourage them to participate in discussions around the papers. Summaries are due one week after the date on which the paper is posted.

Your summary MUST include the following:

  • Explain the problem that the paper investigates. Why is it important? Whom does it affect?
  • How do the authors propose to investigate/solve this problem?
  • Why and how can the proposed solution address the problem?
  • How do they evaluate their solution? What methodology do they use in their evaluation?
  • What are the most important findings of their evaluation?

We will read all of your summaries and questions/critiques. However, we will NOT provide you with feedback unless your work is below the bar or is missing.

You can find many guidelines online that describe how to read and summarize a research paper. Here is an example from Harvard. You can also talk to me or the TA if you have any questions about reading and summarizing research papers.

Late and missing submission policy

Late submission is allowed for readings; however, you will lose 50% of the mark allocated to the reading. A missing submission results in the total loss of the reading’s mark.

If your late or missing submission is due to a legitimate reason such as illness or an emergency, contact the TA or the course instructor as soon as possible so that we can assess your case.

Reading links

Reading 1: Bridging From Goals to Tasks with Design Study Analysis Reports

Reading 2: Crowdsourcing graphical perception

Reading 3: Task-Based Effectiveness of Basic Visualizations

Reading 4: NodeTrix: A Hybrid Visualization of Social Networks

Reading 5: Graph-Theoretic Scagnostics

Reading 6: Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data

Reading 7: A Systematic Review on the Practice of Evaluating Visualization