Evaluating Causal Discovery Algorithms with Software Systems
To obtain realistic data for evaluating causal discovery algorithms, we designed and executed factorial experiments on three large-scale software systems: PostgreSQL, the Java Development Kit, and web server infrastructure. Each domain is characterized by three classes of variables: covariates, treatments, and outcomes. Under the factorial design, outcomes were measured for every combination of subject and treatment assignment, so each dataset contains many records for the same subject. We also performed multiple trials of each factorial experiment, which permits a variety of data transformations.
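As a rough illustration of what a full factorial design implies for the shape of the data, the sketch below enumerates one run per subject, treatment combination, and trial. The function name, the subject names, the 0/1 encoding of the JDK treatments, and the number of trials are assumptions made for the example, not details taken from the experiments themselves.

```python
from itertools import product


def factorial_design(subjects, treatment_levels, n_trials):
    """Yield one record per (subject, treatment combination, trial)."""
    names = list(treatment_levels)
    for subject in subjects:
        # Cartesian product over the levels of every treatment
        for levels in product(*(treatment_levels[n] for n in names)):
            for trial in range(n_trials):
                yield {"subject": subject, "trial": trial, **dict(zip(names, levels))}


# Hypothetical input: two subjects, three binary treatments, two trials
runs = list(
    factorial_design(
        subjects=["repo_a", "repo_b"],
        treatment_levels={"debug": [0, 1], "obfuscate": [0, 1], "parallelgc": [0, 1]},
        n_trials=2,
    )
)
print(len(runs))  # 2 subjects * 8 combinations * 2 trials = 32
```

Every subject appears under every treatment combination, which is why each dataset holds many records per subject.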
PostgreSQL
We collected a sample of StackOverflow's database and gathered a sample of user-authored queries from its query explorer. We executed each query against our sample of the database, varying database configuration and monitoring query execution.
11,252 subjects
3 treatments
7 outcomes
10 covariates
JDK
We downloaded a sample of Maven-enabled Java projects from GitHub. We compiled and ran the unit tests of each project, varying JDK options and monitoring runtime behavior.
473 subjects
3 treatments
5 outcomes
5 covariates
Networking
We identified a pool of websites using a small web crawl. We then executed several web requests against each site, varying request parameters and monitoring output.
2,599 subjects
3 treatments
5 outcomes
1 covariate
PostgreSQL Details
Field Name | Category | Description |
---|---|---|
url | subject identifier | Identifies the source of the query |
trial | trial identifier | |
index_level | treatment | Indicates the level of indexing employed. 0: no indexing; 1: indexing on primary keys and foreign keys only; 2: indexing on primary keys, foreign keys, and other commonly referenced fields |
page_cost | treatment | Indicates the estimated disk access cost provided to the Postgres query planner. 0 corresponds to the smallest disk access cost, 3 corresponds to the largest disk access cost. |
memory_level | treatment | The amount of working memory provided to Postgres, in increasing increments from 0 to 2 |
local_written_blocks | outcome | The number of blocks written to temporary tables and indices |
temp_written_blocks | outcome | The number of blocks written to short-term working memory |
shared_hit_blocks | outcome | Number of regular table blocks hit in the cache |
temp_read_blocks | outcome | Number of blocks read from short-term working memory |
local_read_blocks | outcome | Number of blocks read from temporary tables and indices |
runtime | outcome | Runtime of the query in milliseconds |
shared_read_blocks | outcome | Number of blocks read from regular tables and indices |
rows | covariate | The number of rows produced by the query |
creation_year | covariate | Year the query was created on the StackOverflow site. |
num_ref_tables | covariate | Number of tables referenced by the query |
num_joins | covariate | Number of joins employed by the query |
num_group_by | covariate | Number of “group by” clauses employed by the query |
queries_by_user | covariate | The number of other queries written by the author of this query |
length_chars | covariate | Length of the query in characters |
total_ref_rows | covariate | Total number of rows of all tables referenced in the query |
local_hit_blocks | covariate | Number of temporary table and temporary index blocks hit in the cache |
favorite_count | covariate | Number of times another StackOverflow user has marked this query as a favorite |
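
Because every query is run under every treatment level, one possible transformation is a within-subject contrast: average an outcome over trials for each subject and treatment level, then take the difference between two levels. The sketch below shows this for `runtime` and `index_level` using only the standard library. The helper names and the four records are invented for illustration and are not values from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_outcome(records, outcome, keys):
    """Average an outcome over all records that share the given key fields."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r[outcome])
    return {k: mean(v) for k, v in groups.items()}


def within_subject_effect(records, treatment, outcome, low, high, subject_key="url"):
    """Per-subject difference in mean outcome between two treatment levels."""
    means = mean_outcome(records, outcome, [subject_key, treatment])
    subjects = {s for s, _ in means}
    return {
        s: means[(s, high)] - means[(s, low)]
        for s in subjects
        if (s, low) in means and (s, high) in means
    }


# Invented records for a single query (hypothetical values)
records = [
    {"url": "q1", "trial": 0, "index_level": 0, "runtime": 120.0},
    {"url": "q1", "trial": 1, "index_level": 0, "runtime": 100.0},
    {"url": "q1", "trial": 0, "index_level": 2, "runtime": 30.0},
    {"url": "q1", "trial": 1, "index_level": 2, "runtime": 50.0},
]
effects = within_subject_effect(records, "index_level", "runtime", low=0, high=2)
print(effects)  # {'q1': -70.0}
```

Averaging within a subject also averages over the other treatments; in a balanced factorial design each level of `index_level` is paired with the same set of other treatment settings, so the contrast compares like with like.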
JDK Details
Field Name | Category | Description |
---|---|---|
repo_name | subject identifier | Name of the GitHub repository containing the experimentation code |
trial | trial identifier | |
debug | treatment | Indicates whether debug symbols were requested during compilation |
obfuscate | treatment | Indicates whether a code obfuscator was run on the final JAR file |
parallelgc | treatment | Indicates whether parallel garbage collection was employed during execution (instead of serial garbage collection) |
num_bytecode_ops | outcome | Number of bytecode instructions in the compiled code |
total_unit_test_time | outcome | Number of seconds required to execute unit tests |
allocated_bytes | outcome | Number of bytes allocated during execution of unit tests |
jar_file_size_bytes | outcome | Size of JAR file after compilation (and possibly obfuscation) |
compile_time_ms | outcome | Number of milliseconds to compile the source |
source_ncss | covariate | Number of non-comment source statements in the source code |
test_classes | covariate | Number of Java classes in the unit test source |
test_functions | covariate | Number of functions in the unit test source |
test_ncss | covariate | Number of non-comment source statements in the test source |
test_javadocs | covariate | Number of JavaDoc comments in the test source |
Networking Details
Field Name | Category | Description |
---|---|---|
url | subject identifier | Web address requested |
trial | trial identifier | |
mobile_user_agent | treatment | Indicates if the site was requested with a mobile user agent |
proxy | treatment | Indicates if the site was requested through a proxy server |
compression | treatment | Indicates if the request advertised that compression is supported |
html_attrs | outcome | Number of HTML attributes in the response |
html_tags | outcome | Number of HTML tags in the response |
elapsed | outcome | Elapsed time between request and response in seconds |
decompressed_content_length | outcome | Length of the response content, in bytes, after decompression |
raw_content_length | outcome | Length of the response content, in bytes, before decompression |
server.class | covariate | Category of web server issuing the response |