Data Visualization and Exploration – Spring 2022

[vc_row][vc_column][vc_tta_tabs style=”modern” shape=”round” color=”black” active_section=”1″][vc_tta_section title=”Course overview” tab_id=”1641229816402-d8bc63be-6317″][vc_column_text]

Tuesday, Thursday, 8:30-9:45 AM, LederCA301

Instructor: Dr. Ali Sarvghad

asarv@cs.umass.edu,  CICS 344

Office hours & location: Thursdays, 10-12, LGRC A217G

Other times, by appointment, Zoom.

TA: Declan Gray-Mullen

dgraymullen@umass.edu

Office hours & location: Tuesdays 3pm – 4pm, Thursdays 11:30am – 12:30pm, LRTC T220

Course overview

Information visualization is an area of research that helps people analyze and understand data using visualization techniques. The multi-disciplinary area draws from other areas of science, including human-computer interaction, data science, psychology, and art, to develop new visualization methods and understand how (and why) they are effective.

Information visualization methods are applied to data from many different application domains, including:

  • Political reporting and forecasting – as seen on TV and in the papers in the election season.
  • News reporting – look at the interactive visualizations used by the New York Times, Wall Street Journal, Slate, etc.
  • Social science and economic data, such as census and other surveys, and micro and macroeconomic trends.
  • Social networking and web traffic – to understand patterns of communication.
  • Business intelligence and business dashboards – to forecast sales trends, understand competitive marketplace positions, allocate resources, and manage production and logistics.
  • Text analysis – to determine trends and relationships for literary analysis and information retrieval.
  • Criminal investigations – to portray the relationships between events, people, places, and things.
  • Performance analysis of computer networks and systems.
  • Software engineering – developing, debugging, and maintaining software.
  • Bioinformatics, to understand DNA, gene expressions, systems biology.

Course objectives

  • Learn the principles involved in information visualization
  • Understand the wide variety of information visualizations and know what visualizations are appropriate for various types of data and for different goals
  • Develop skills in critiquing different visualization techniques in the context of user goals and objectives
  • Learn how to implement compelling information visualizations

Recommended text

The following textbooks are strongly recommended for this course. In particular, we will closely follow Tamara Munzner’s book:

  • Visualization Analysis and Design, Tamara Munzner, CRC Press, ISBN 9781466508910. Principles and paradigms of visualization design.
  • Interactive Data Visualization for the Web, Scott Murray, O’Reilly Media, ISBN 9781449339739. All about D3, the programming tool we will be using for homework and projects.

Evaluation

Grading will be based on the course project (deliverables and final demo), the midterm exam, readings and discussions, and in-class exercises. Final course grades may be curved, but this is not guaranteed. Grading weights are:

Course project (including deliverables and demo) 45%
Midterm exam 25%
Readings and discussions 15%
In-class exercises 15%
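
As a rough illustration of how the weights above combine, here is a minimal sketch in Python. The weights come from the list above; the function name, the dictionary keys, and the example scores are hypothetical, and the sketch ignores any curve that may be applied to final grades.

```python
# Weights taken from the grading list above; names and keys are illustrative.
COURSE_WEIGHTS = {
    "project": 0.45,    # course project (deliverables and demo)
    "midterm": 0.25,    # midterm exam
    "readings": 0.15,   # readings and discussions
    "exercises": 0.15,  # in-class exercises
}


def course_grade(scores):
    """Return the weighted course percentage for component scores on a 0-100 scale."""
    missing = set(COURSE_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(COURSE_WEIGHTS[name] * scores[name] for name in COURSE_WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {"project": 90, "midterm": 80, "readings": 100, "exercises": 70}
print(course_grade(example_scores))
```

This is only an uncurved estimate; the official grade is whatever the instructor computes.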

[/vc_column_text][/vc_tta_section][vc_tta_section title=”Schedule” tab_id=”1641229816402-1d727827-c7b5″][vc_column_text]

Week Date Topic Activity To do for next class Reading Due
1 1/25 Course introduction
1/27 Data & Task Abstraction Lecture Start forming groups Bridging From Goals to Tasks with Design Study Analysis Reports
2 2/1 Marks & Channels Lecture Select dataset, identify users Crowdsourcing graphical perception
2/3 D3: set up, drawing with SVG Lecture
3 2/8 In-class exercise: data & task abstraction Exercise Final project group names
2/10 In-class exercise: marks & channels Exercise
4 2/15 Color; in-class exercise: designing colormaps Lecture & Exercise There is more to encodings than meets the eye
Visualizing tabular data Lecture
5 2/22 No class – Monday schedule
2/24 In-class exercise: visualizing tabular data Exercise Task-Based Effectiveness of Basic Visualizations
6 3/1 D3: making basic charts, scales, axes Lecture
3/3 Visualizing networks and trees Lecture Final project deliverable 1: Data & Task abstraction report
7 3/8 Review and Q&A Review
3/10 Midterm exam
8 3/15 No class – Spring Recess
3/17
9 3/22 Visualization Dashboards Lecture
3/24 Interaction Lecture Final project deliverable 2: Dashboard prototype
10 3/29 D3: interactivity, layouts Lecture
In-class exercise: networks & trees Exercise
11 4/5 Handling data complexity Lecture
4/7 Uncertainty visualization Lecture
12 4/12 Text visualization Guest lecture: Mahmood Jasim
4/14 Data visualization in civics Guest lecture: Narges Mahyar
13 4/19 Evaluation techniques in data visualization Lecture
4/21 Storytelling with data Lecture
14 4/26 Project presentation
4/28 Project presentation
15 5/3 Project presentation

[/vc_column_text][/vc_tta_section][vc_tta_section title=”Project” tab_id=”1641229833331-4e438430-7f2e”][vc_column_text]

Project overview

The course project carries 45% of the overall course grade. This is a group project (unless the course instructor approves individual work). Expectations will NOT be adjusted according to group size.

  • Groups

    • Each group MUST include both graduate and undergraduate students. The preferred group size is 3. Groups smaller than 2 or larger than 4 will not be allowed (except under special circumstances and with the instructor’s approval).
    • You can start looking for teammates as early as the first day of classes. The deadline for having a team is Feb 8, the last day of add/drop. Reach out to us before the deadline if you have difficulty finding a team.
  • Deliverable 1: (40%) 

    • Description of the dataset (5%)
    • User identification (10%)
    • Tasks identification (10%)
    • Data and task abstraction (15%)
    • Scroll down to see details.
  • Deliverable 2: (15%)
    • Dashboard prototype (medium or high-fidelity)
    • Scroll down to see details.
  • Final demo (45%)

    • Live demo of the implemented data visualization dashboard
    • Scroll down to see details.
  • User evaluation (optional, 5% bonus)

    • Report of methodology, data analysis, and findings
    • Scroll down to see details.
  • Late and missing submission policy

    • Late submission = 50% loss of project deliverable’s mark
    • No submission = 0
    • Scroll down to see details.

This project accounts for 45% of your grade in this course and will require a significant amount of time and effort. In particular, I’m seeking creative projects showcasing interesting ideas related to challenges and issues we face today, such as the impact of the pandemic on different aspects of our lives, minorities and underrepresented communities and groups, and social equity and justice. No matter what topic you choose, I am expecting a high-quality project and a functioning visualization dashboard.

You can use any technology and programming language that enable you to build and run your dashboard. The choice of technology will not have an impact on the evaluation of your work.

An example of previous projects can be found here.


Project details

The idea of the project is to take the knowledge and background that you are learning this semester about Information Visualization and put it to good use in a new, creative effort. A real key to the project, however, is to select a data set that people will find interesting and intriguing. Even better would be to select a data set with a clearly identified set of “users” or “analysts” who care deeply about that data. Select a topic that people want to know more about! I cannot emphasize strongly enough the importance of your topic and data set.

Deliverable 1. Due on 2/24

This deliverable will be a PDF document containing the following sections. All documents submitted for this course should be single-spaced in a 12-point font.

1 – Description of the dataset (5%).

You start your project by selecting a dataset. You can use your own data (from a school project, self-quantified, etc.) or find and select publicly available data online:

  • BYOD (Bring Your Own Data)
    • you (or your teammates) have your own data to analyze such as:
      • thesis/research topic
      • personal interest
      • dovetail with another course (sometimes works, but timing may be tricky)
  • FDOI (Find Data of Interest)
    • many existing datasets on the internet (e.g., Kaggle, UN Data)
    • can be tricky to determine reasonable analysis tasks that users may want to do

If you select a tabular dataset online, it should have at least 8 variables and 1,000 records.
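
One quick way to check a candidate tabular dataset against this minimum is sketched below in Python. It assumes the data is a CSV file whose first row is a header, treats columns as variables and non-empty rows as records, and uses hypothetical names throughout; it is an illustration, not a course-provided tool.

```python
import csv
import io

MIN_VARIABLES = 8    # minimum number of columns, from the requirement above
MIN_RECORDS = 1000   # minimum number of data rows, from the requirement above


def meets_size_requirement(csv_file):
    """Check an open CSV file (assumed to have a header row) against the minimum size."""
    reader = csv.reader(csv_file)
    header = next(reader, [])                    # first row holds the variable names
    n_records = sum(1 for row in reader if row)  # count data rows, skipping blank lines
    return len(header) >= MIN_VARIABLES and n_records >= MIN_RECORDS


# Hypothetical in-memory data standing in for a real file: 8 columns, 1000 rows.
sample = io.StringIO(
    ",".join(f"var{i}" for i in range(8)) + "\n"
    + "\n".join(",".join("0" for _ in range(8)) for _ in range(1000))
)
print(meets_size_requirement(sample))
```

For a real file, the same function could be called on `open("your_data.csv", newline="")`, where the file name is a placeholder.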

The data description is a short (1-2 paragraph) document with information about the dataset, such as whether it is personal or public, its name, and what the data is about.

2 – User identification (10%).

In this part, you identify and describe the intended user(s) who need to explore, analyze, and make sense of the data. These descriptions can be based on real users or personas (what is a persona?). We encourage you to consider 3-4 different users with varying interests in the data to enrich the space of exploration (and tool design). Each of these users needs to understand and analyze the data from a (slightly to very) different perspective. Remember that the tool you design will need to accommodate these users and their analytical needs. You will create a user scenario for each identified user (what is a user scenario?). Each scenario includes the following:

  • Background – who are your users (including their knowledge base and skillset/s)?
  • Motivations – what goals do they want to achieve?
  • Tasks – what must they do to reach those goals?
  • Context of use – how will they encounter your design?
  • Challenges – when they try to use it, what can get in their way (e.g., signal loss)?

User identification should be 3-4 (or more if needed) pages with information about the users.

3 – Task identification (10%).

Identify tasks that your different users would want to carry out while exploring the dataset. At this point, you will describe the tasks in narrative form, such as “User A wants to understand the trends of climate change across various geographical locations and correlate the observed changes with different natural phenomena and human activities”. For each user identified in the previous step, identify 3-4 tasks.

Task identification should be 3-4 (or more if needed) pages with information about the tasks.

4 – Data and task abstraction (15%).

In this part of deliverable 1, you will apply data and task abstraction techniques that will be covered in lectures to describe and transform the data and user tasks from user space to visualization design space. 

Data and task abstraction should be 3-4 (or more if needed) pages.

What to submit: a PDF containing your work for parts 1-4. Each group will submit a single document that includes all members’ names.


Deliverable 2. Due on 3/24

Prototype Design (15%).

In this part, you will propose and prototype your visualization dashboard. The prototype can be low/medium fidelity (What is prototyping-1?  and what is prototyping-2?).

What to submit: The deliverable is a PDF document of 3-5 pages (more if needed) that includes images of your prototype along with descriptions that help us understand why and how your proposed design will enable the (previously identified) users to achieve their data analysis tasks.


User Evaluation (optional, 5% bonus)

Perform a user evaluation of your dashboard. We will cover common methods of visualization evaluation in lectures. The evaluation must be completed before the final project demo and reported in a separate PDF document.


Final Demo (tentatively starting on 4/26)

Final project demos will be held online during the last two or three classes (depending on the number of teams). Each group will be given 15 minutes to present their work. The 15 minutes should be roughly broken down as follows:

  • a short slideshow, presenting data, user, and tasks (~3 mins)
  • live demo of the tool (~10 mins)
  • question and answer (~2 mins)

The presentation date and order will be assigned randomly. Therefore, you need to make sure that your tool is ready before the first date when presentations start.

Demos will be evaluated according to these high-level criteria:

  1. The match between the defined user tasks and the tool’s functionalities
  2. Design of effective and efficient visualizations that match data type and support user tasks
  3. Supporting interactivity
  4. Supporting data manipulation and transformation
  5. Supporting exploration

Late and missing submission policy

Late submissions are allowed for the project deliverables; however, you will lose 50% of the mark allocated to the specific deliverable. A missing submission results in the total loss of the deliverable’s mark.

If your late or missing submission is due to a legitimate reason such as illness or an emergency, contact the TA or the course instructor as soon as possible to consider and assess your case.


Tips for a Successful Project

It is extremely important to select an interesting problem with data that some group of people will care deeply about. I cannot stress enough how vital it is to start with interesting data. Find some topic that almost everyone cares about or that some subset of people really cares about. Consider combining different data sets to produce a new composite data set of special interest. Such a fusion of data often creates a dataset that people want to learn about. Remember that this often takes time and effort to “fuse” multiple data types, so you want to make sure you pick them wisely (i.e., they should be in support of the questions that you want people to be able to answer using your tool).

Two possible styles of successful visualization projects (the space is certainly not limited to these two):

In the first style, the group creates a visualization system that has only one view/representation, but this representation is new and creative. Here, you should focus on designing an innovative new visual representation. The actual user interface may have different components or pieces, but it should be tightly integrated. The real focus here is on creativity and innovation, and the novel representation of the information. These projects emphasize the mappings from the data (and characteristics/variables of the data) to visual encodings, glyphs, and metaphors.

The second type of successful project employs multiple coordinated views, where each view may use some well-known visualization techniques, perhaps customized a little for this problem. The emphasis in this type of project is on creating a sound, functional system implementation that clearly can be of help for data analysis and understanding. It is important in this type of project to have coordinated views that work well together and provide different perspectives on the data. This type of project does not have the same level of visualization innovation as the first, but it comes together in a strong system implementation, including well-designed user interactions that allow users to explore the data and progress through their tasks to answer the questions they may have about the data.[/vc_column_text][/vc_tta_section][vc_tta_section title=”Midterm” tab_id=”1641229835700-827a8144-5927″][vc_column_text]

Midterm (25%)

The midterm carries 25% of the overall course assessment and will be on March 10th. It will be an in-class exam. Questions will be open-ended, requiring you to draw on the knowledge you have gained so far in the course to answer.

Missed exam policy

If you miss the midterm due to a legitimate reason such as illness or an emergency, contact the TA or the course instructor as soon as possible to consider and assess your case.[/vc_column_text][/vc_tta_section][vc_tta_section title=”Reading & Discussion” tab_id=”1641229855520-2993fdb6-dc7b”][vc_column_text]

Required Reading (15%)

Regularly during the term, you will be assigned research papers to read on topics that have been covered in class. The papers will be posted on Moodle. All students MUST read each paper and post a summary of the research on the associated Moodle forum. In addition to the summary, graduate students MUST post at least one question/critique about the paper and answer others’ questions/critiques. This is optional for undergrads, though we encourage them to participate in discussions around the papers. Summaries are due one week after the date on which the paper is posted.

Your summary MUST include the following:

  • Explain the problem that the paper investigates. Why is it important? Whom does it affect?
  • How do the authors propose to investigate/solve this problem?
  • Why and how can the proposed solution address the problem?
  • How do they evaluate their solution? What methodology do they use in their evaluation?
  • What are the most important findings of their evaluation?

We will read all of your summaries and questions/critiques. However, we will NOT provide you with feedback unless your work is below the bar or is missing.

You can find many guidelines online that describe how to read and summarize a research paper. Here is an example from Harvard. You can also talk to me or the TA if you have any questions about reading and summarizing research papers.

Late and missing submission policy

Late submission is allowed for readings; however, you will lose 50% of the mark allocated to the reading. A missing submission results in the total loss of the reading’s mark.

If your late or missing submission is due to a legitimate reason such as illness or an emergency, contact the TA or the course instructor as soon as possible to consider and assess your case.

Reading links

Reading 1: Bridging From Goals to Tasks with Design Study Analysis Reports

Reading 2: Crowdsourcing graphical perception

Reading 3: Task-Based Effectiveness of Basic Visualizations

Reading 4: NodeTrix: A Hybrid Visualization of Social Networks

Reading 5: Graph-Theoretic Scagnostics

Reading 6: Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data

Reading 7: A Systematic Review on the Practice of Evaluating Visualization[/vc_column_text][/vc_tta_section][/vc_tta_tabs][/vc_column][/vc_row]