Evaluating Causal Discovery Algorithms with Software Systems
To obtain realistic data for evaluating causal discovery algorithms, we designed and executed factorial experiments on three large-scale software systems: PostgreSQL, the Java Development Kit, and web server infrastructure. Each domain is characterized by three classes of variables: covariates, treatments, and outcomes. Under the factorial design, outcomes were measured for every combination of subject and treatment levels, so each subject contributes many records to the dataset. To support a variety of data transformations, we performed multiple trials of each factorial experiment.
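
As a rough illustration of the factorial design described above, the sketch below enumerates every run for a toy experiment. The subject names, the choice of two treatments, and the trial count are made up for illustration; the level ranges follow the PostgreSQL field descriptions.

```python
from itertools import product

# Hypothetical subjects and trial count, for illustration only.
subjects = ["query_a", "query_b"]
treatments = {
    "index_level": [0, 1, 2],
    "memory_level": [0, 1, 2],
}
n_trials = 2


def factorial_runs(subjects, treatments, n_trials):
    """List every (subject, treatment combination, trial) run of a full factorial design."""
    names = list(treatments)
    runs = []
    for subject, levels, trial in product(
        subjects, product(*treatments.values()), range(n_trials)
    ):
        run = {"subject": subject, "trial": trial}
        run.update(zip(names, levels))
        runs.append(run)
    return runs


runs = factorial_runs(subjects, treatments, n_trials)
print(len(runs))  # 2 subjects * 9 treatment combinations * 2 trials = 36
```

Each subject appears once per treatment combination and trial, which is why the datasets contain many records for the same subject.
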
PostgreSQL
We collected a sample of StackOverflow’s database and gathered a sample of user-authored queries from its query explorer. We executed each query against our database sample, varying the database configuration and monitoring query execution.
11,252 subjects
3 treatments
7 outcomes
10 covariates
JDK
We downloaded a sample of Maven-enabled Java projects from GitHub. We compiled and ran the unit tests of each project, varying JDK options and monitoring runtime behavior.
473 subjects
3 treatments
5 outcomes
5 covariates
Networking
We identified a pool of websites using a small web crawl. We then executed several web requests against each site, varying request parameters and monitoring output.
2,599 subjects
3 treatments
5 outcomes
1 covariate
PostgreSQL Details
| Field Name | Category | Description |
|---|---|---|
| url | subject identifier | Identifies the source of the query |
| trial | trial identifier | |
| index_level | treatment | Indicates the level of indexing employed. 0: no indexing; 1: indexing on primary keys and foreign keys only; 2: indexing on primary keys, foreign keys, and other commonly referenced fields |
| page_cost | treatment | Indicates the estimated disk access cost provided to the Postgres query planner. 0 corresponds to the smallest disk access cost, 3 corresponds to the largest disk access cost. |
| memory_level | treatment | The amount of working memory provided to Postgres, in increasing levels from 0 to 2 |
| local_written_blocks | outcome | The number of blocks written to temporary tables and indices |
| temp_written_blocks | outcome | The number of blocks written to short-term working memory |
| shared_hit_blocks | outcome | Number of regular table blocks hit in the cache |
| temp_read_blocks | outcome | Number of blocks read from short-term working memory |
| local_read_blocks | outcome | Number of blocks read from temporary tables and indices |
| runtime | outcome | Runtime of the query in milliseconds |
| shared_read_blocks | outcome | Number of blocks read from regular tables and indices |
| rows | covariate | The number of rows produced by the query |
| creation_year | covariate | Year the query was created on the StackOverflow site. |
| num_ref_tables | covariate | Number of tables referenced by the query |
| num_joins | covariate | Number of joins employed by the query |
| num_group_by | covariate | Number of “group by” clauses employed by the query |
| queries_by_user | covariate | The number of other queries written by the author of this query |
| length_chars | covariate | Length of the query in characters |
| total_ref_rows | covariate | Total number of rows of all tables referenced in the query |
| local_hit_blocks | covariate | Number of temporary table and temporary index blocks hit in the cache |
| favorite_count | covariate | Number of times another StackOverflow user has marked this query as a favorite |
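
Because each query is run over multiple trials, one possible transformation is to collapse the trials into a single value per subject and treatment combination. The sketch below takes the median `runtime` across trials. It is only an illustration: the records are made-up values using the field names from the table above, and the median is one choice among several reasonable aggregates.

```python
from collections import defaultdict
from statistics import median

# Made-up records (not real measurements) using the PostgreSQL field names.
records = [
    {"url": "q1", "trial": 0, "index_level": 0, "page_cost": 0, "memory_level": 0, "runtime": 120.0},
    {"url": "q1", "trial": 1, "index_level": 0, "page_cost": 0, "memory_level": 0, "runtime": 100.0},
    {"url": "q1", "trial": 2, "index_level": 0, "page_cost": 0, "memory_level": 0, "runtime": 110.0},
    {"url": "q1", "trial": 0, "index_level": 2, "page_cost": 0, "memory_level": 0, "runtime": 40.0},
    {"url": "q1", "trial": 1, "index_level": 2, "page_cost": 0, "memory_level": 0, "runtime": 44.0},
]

# Subject identifier plus the three treatments define one cell of the design.
KEY_FIELDS = ("url", "index_level", "page_cost", "memory_level")


def collapse_trials(records, outcome):
    """Return the median of `outcome` across trials for each design cell."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in KEY_FIELDS)
        groups[key].append(record[outcome])
    return {key: median(values) for key, values in groups.items()}


collapsed = collapse_trials(records, "runtime")
for key, value in collapsed.items():
    print(key, value)
```
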
JDK Details
| Field Name | Category | Description |
|---|---|---|
| repo_name | subject identifier | Name of the GitHub repository containing the experimentation code |
| trial | trial identifier | |
| debug | treatment | Indicates whether debug symbols were requested during compilation |
| obfuscate | treatment | Indicates whether a code obfuscator was run on the final JAR file |
| parallelgc | treatment | Indicates whether parallel garbage collection was employed during execution (instead of serial garbage collection) |
| num_bytecode_ops | outcome | Number of bytecode instructions in the compiled code |
| total_unit_test_time | outcome | Number of seconds required to execute unit tests |
| allocated_bytes | outcome | Number of bytes allocated during execution of unit tests |
| jar_file_size_bytes | outcome | Size of JAR file after compilation (and possibly obfuscation) |
| compile_time_ms | outcome | Number of milliseconds to compile the source |
| source_ncss | covariate | Number of non-comment source statements in the source code |
| test_classes | covariate | Number of Java classes in the unit test source |
| test_functions | covariate | Number of functions in the unit test source |
| test_ncss | covariate | Number of non-comment source statements in the test source |
| test_javadocs | covariate | Number of JavaDoc comments in the test source |
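
Since every project is observed under every treatment setting, the effect of a treatment can be estimated within each subject by comparing that subject's own outcomes. The sketch below does this for `debug` and `jar_file_size_bytes`. The values are made up, and the 0/1 encoding of `debug` is an assumption rather than something stated in the table.

```python
from collections import defaultdict
from statistics import mean

# Made-up records (not real measurements); `debug` is assumed to be coded 0/1.
records = [
    {"repo_name": "repo_a", "debug": 0, "jar_file_size_bytes": 1000},
    {"repo_name": "repo_a", "debug": 1, "jar_file_size_bytes": 1200},
    {"repo_name": "repo_b", "debug": 0, "jar_file_size_bytes": 5000},
    {"repo_name": "repo_b", "debug": 1, "jar_file_size_bytes": 5600},
]


def within_subject_effect(records, subject_field, treatment, outcome):
    """Per subject: mean outcome with the treatment on minus mean outcome with it off."""
    by_subject = defaultdict(lambda: {0: [], 1: []})
    for record in records:
        by_subject[record[subject_field]][record[treatment]].append(record[outcome])
    return {
        subject: mean(arms[1]) - mean(arms[0])
        for subject, arms in by_subject.items()
        if arms[0] and arms[1]  # skip subjects missing either arm
    }


effects = within_subject_effect(records, "repo_name", "debug", "jar_file_size_bytes")
print(effects)
```

Averaging within a subject across the other treatments and trials holds the subject's covariates fixed by construction, which is what makes such a contrast usable as a reference when scoring a causal discovery algorithm.
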
Networking Details
| Field Name | Category | Description |
|---|---|---|
| url | subject identifier | Web address requested |
| trial | trial identifier | |
| mobile_user_agent | treatment | Indicates if the site was requested with a mobile user agent |
| proxy | treatment | Indicates if the site was requested through a proxy server |
| compression | treatment | Indicates if the request advertised support for compression |
| html_attrs | outcome | Number of HTML attributes in the response |
| html_tags | outcome | Number of HTML tags in the response |
| elapsed | outcome | Elapsed time between request and response in seconds |
| decompressed_content_length | outcome | Length of the response content, in bytes, after decompression |
| raw_content_length | outcome | Length of the response content, in bytes, before decompression |
| server.class | covariate | Category of web server issuing the response |
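
Before analysis, it can be useful to confirm that each site and trial really covers the full factorial grid, since web requests can fail. The sketch below reports missing treatment combinations. It assumes the three treatments are coded 0/1, which the table does not state, and the records are synthetic.

```python
from itertools import product

TREATMENTS = ("mobile_user_agent", "proxy", "compression")


def missing_cells(records, levels=(0, 1)):
    """Map each (url, trial) to the treatment combinations that have no record."""
    expected = set(product(levels, repeat=len(TREATMENTS)))
    seen = {}
    for record in records:
        key = (record["url"], record["trial"])
        combo = tuple(record[name] for name in TREATMENTS)
        seen.setdefault(key, set()).add(combo)
    return {
        key: sorted(expected - combos)
        for key, combos in seen.items()
        if expected - combos
    }


# Synthetic records: a full 2x2x2 grid for one site with one cell removed.
records = [
    {"url": "http://example.com", "trial": 0,
     "mobile_user_agent": m, "proxy": p, "compression": c}
    for m, p, c in product((0, 1), repeat=3)
    if (m, p, c) != (1, 1, 0)
]

gaps = missing_cells(records)
print(gaps)
```
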