Evaluating Causal Discovery Algorithms with Software Systems


To obtain realistic data for evaluating causal discovery algorithms, we designed and executed factorial experiments on three large-scale software systems: PostgreSQL, the Java Development Kit (JDK), and web server infrastructure. Each domain is characterized by three classes of variables: covariates, treatments, and outcomes. Under the factorial design, outcomes were measured for every combination of subject and treatment settings, so the dataset contains many records for each subject. To support a variety of data transformations, we ran multiple trials of each factorial experiment.
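
The record structure implied by a factorial design can be sketched as follows. This is a minimal illustration, not the code used to run the experiments: the function name `factorial_design`, the subject labels, and the treatment names `t_a` and `t_b` are all hypothetical.

```python
from itertools import product

def factorial_design(subjects, treatment_levels, n_trials):
    """Yield one record skeleton per (subject, treatment combination, trial)."""
    names = list(treatment_levels)
    for subject in subjects:
        # product() enumerates every combination of treatment levels.
        for combo in product(*(treatment_levels[n] for n in names)):
            for trial in range(n_trials):
                record = {"subject": subject, "trial": trial}
                record.update(zip(names, combo))
                yield record

# Hypothetical example: two subjects, two binary treatments, two trials.
design = list(factorial_design(
    subjects=["s1", "s2"],
    treatment_levels={"t_a": [0, 1], "t_b": [0, 1]},
    n_trials=2,
))
print(len(design))  # 2 subjects * 4 combinations * 2 trials = 16
```

Each subject appears once per treatment combination and trial, which is why the resulting datasets hold many records per subject.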

PostgreSQL

We collected a sample of StackOverflow's database, and gathered a sample of user-authored queries from their query explorer. We executed each query against our sample of the database, varying database configuration and monitoring query execution.

11,252 subjects
3 treatments
7 outcomes
10 covariates

JDK

We downloaded a sample of Maven-enabled Java projects from GitHub. We compiled and ran the unit tests of each project, varying JDK options and monitoring runtime behavior.

473 subjects
3 treatments
5 outcomes
5 covariates

Networking

We identified a pool of websites using a small web crawl. We then executed several web requests against each site, varying request parameters and monitoring output.

2,599 subjects
3 treatments
5 outcomes
1 covariate


PostgreSQL Details

Field Name | Category | Description
url | subject identifier | Identifies the source of the query
trial | trial identifier |
index_level | treatment | Level of indexing employed. 0: no indexing; 1: indexing on primary keys and foreign keys only; 2: indexing on primary keys, foreign keys, and other commonly referenced fields
page_cost | treatment | Estimated disk access cost provided to the Postgres query planner. 0 corresponds to the smallest disk access cost, 3 to the largest.
memory_level | treatment | Amount of working memory provided to Postgres, in increasing increments from 0 to 2
local_written_blocks | outcome | Number of blocks written to temporary tables and indices
temp_written_blocks | outcome | Number of blocks written to short-term working memory
shared_hit_blocks | outcome | Number of regular table blocks hit in the cache
temp_read_blocks | outcome | Number of blocks read from short-term working memory
local_read_blocks | outcome | Number of blocks read from temporary tables and indices
runtime | outcome | Runtime of the query in milliseconds
shared_read_blocks | outcome | Number of blocks read from regular tables and indices
rows | covariate | Number of rows produced by the query
creation_year | covariate | Year the query was created on the StackOverflow site
num_ref_tables | covariate | Number of tables referenced by the query
num_joins | covariate | Number of joins employed by the query
num_group_by | covariate | Number of "group by" clauses employed by the query
queries_by_user | covariate | Number of other queries written by the author of this query
length_chars | covariate | Length of the query in characters
total_ref_rows | covariate | Total number of rows of all tables referenced in the query
local_hit_blocks | covariate | Number of temporary table and temporary index blocks hit in the cache
favorite_count | covariate | Number of times another StackOverflow user has marked this query as a favorite
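
Because every query is run under every treatment combination, within-subject contrasts can be read directly from the records. The sketch below shows one such transformation under stated assumptions: the helper `paired_effect` is hypothetical, the example rows are invented for illustration, and the records are assumed to be dictionaries keyed by the field names above.

```python
from collections import defaultdict
from statistics import mean

def paired_effect(records, treatment, low, high, outcome, others):
    """Average within-subject difference outcome(high) - outcome(low),
    matching records on subject, trial, and the remaining treatments."""
    by_key = defaultdict(dict)
    for r in records:
        key = (r["url"], r["trial"]) + tuple(r[o] for o in others)
        by_key[key][r[treatment]] = r[outcome]
    # Keep only matched pairs where both treatment levels were observed.
    diffs = [v[high] - v[low] for v in by_key.values() if low in v and high in v]
    return mean(diffs) if diffs else None

# Invented rows for illustration only; not taken from the dataset.
rows = [
    {"url": "q1", "trial": 0, "index_level": 0, "page_cost": 0, "memory_level": 0, "runtime": 120.0},
    {"url": "q1", "trial": 0, "index_level": 2, "page_cost": 0, "memory_level": 0, "runtime": 80.0},
    {"url": "q2", "trial": 0, "index_level": 0, "page_cost": 0, "memory_level": 0, "runtime": 50.0},
    {"url": "q2", "trial": 0, "index_level": 2, "page_cost": 0, "memory_level": 0, "runtime": 30.0},
]
effect = paired_effect(rows, "index_level", 0, 2, "runtime", ["page_cost", "memory_level"])
print(effect)  # mean of (80 - 120) and (30 - 50)
```

Matching on the other treatments and the trial keeps each difference a like-for-like comparison for the same query.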

JDK Details

Field Name | Category | Description
repo_name | subject identifier | Name of the GitHub repository containing the experimentation code
trial | trial identifier |
debug | treatment | Indicates whether debug symbols were requested during compilation
obfuscate | treatment | Indicates whether a code obfuscator was run on the final JAR file
parallelgc | treatment | Indicates whether parallel garbage collection was employed during execution (instead of serial garbage collection)
num_bytecode_ops | outcome | Number of bytecode instructions in the compiled code
total_unit_test_time | outcome | Number of seconds required to execute unit tests
allocated_bytes | outcome | Number of bytes allocated during execution of unit tests
jar_file_size_bytes | outcome | Size of JAR file after compilation (and possibly obfuscation)
compile_time_ms | outcome | Number of milliseconds to compile the source
source_ncss | covariate | Number of non-comment source statements in the source code
test_classes | covariate | Number of Java classes in the unit test source
test_functions | covariate | Number of functions in the unit test source
test_ncss | covariate | Number of non-comment source statements in the test source
test_javadocs | covariate | Number of JavaDoc comments in the test source
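
Since each factorial experiment was repeated over multiple trials, one simple transformation is to average an outcome across trials for each subject and treatment combination. The following is a rough sketch only: the helper `average_over_trials` is hypothetical, the rows are invented, and the 0/1 coding of the treatments is an assumption rather than something stated in the table.

```python
from collections import defaultdict
from statistics import mean

def average_over_trials(records, outcome,
                        treatments=("debug", "obfuscate", "parallelgc")):
    """Collapse repeated trials to one mean outcome per
    (subject, treatment combination)."""
    groups = defaultdict(list)
    for r in records:
        key = (r["repo_name"],) + tuple(r[t] for t in treatments)
        groups[key].append(r[outcome])
    return {key: mean(values) for key, values in groups.items()}

# Invented rows for illustration; treatments assumed to be coded 0/1.
rows = [
    {"repo_name": "example/repo", "trial": 0, "debug": 0, "obfuscate": 0,
     "parallelgc": 0, "compile_time_ms": 1000},
    {"repo_name": "example/repo", "trial": 1, "debug": 0, "obfuscate": 0,
     "parallelgc": 0, "compile_time_ms": 1200},
    {"repo_name": "example/repo", "trial": 0, "debug": 1, "obfuscate": 0,
     "parallelgc": 0, "compile_time_ms": 1500},
]
averaged = average_over_trials(rows, "compile_time_ms")
```

Averaging trades trial-level variation for a less noisy outcome per configuration; analyses that need the variation should keep the individual trial records instead.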

Networking Details

Field Name | Category | Description
url | subject identifier | Web address requested
trial | trial identifier |
mobile_user_agent | treatment | Indicates whether the site was requested with a mobile user agent
proxy | treatment | Indicates whether the site was requested through a proxy server
compression | treatment | Indicates whether the request advertised support for compression
html_attrs | outcome | Number of HTML attributes in the response
html_tags | outcome | Number of HTML tags in the response
elapsed | outcome | Elapsed time between request and response in seconds
decompressed_content_length | outcome | Length of the response content, in bytes, after decompression
raw_content_length | outcome | Length of the response content, in bytes, before decompression
server.class | covariate | Category of web server issuing the response