Supporting Data Workers To Perform Exploratory Programming

Krishna Subramanian, Ilya Zubarev, Simon Voelker, and Jan Borchers
In Extended Abstracts of CHI '19, ACM, LBW2511:1--LBW2511:6

Data science is an open-ended task in which exploratory programming is a common practice. Data workers often need faster and easier ways to explore alternative approaches to obtaining insights from data, and this rapid exploration frequently compromises code quality. To understand how well current IDEs support this exploratory workflow, we conducted an observational study with 19 data workers. In this paper, we present two significant findings from our analysis that highlight issues faced by data workers: (a) code hoarding and (b) excessive task switching and code cloning. To mitigate these issues, we provide design recommendations based on existing work and propose augmenting IDEs with an interactive visual plugin. This plugin parses source code to identify and visualize high-level task details. Data workers can use the resulting visualization to better understand and navigate the source code. As a realization of this idea, we present HypothesisManager, an add-in for RStudio that identifies and visualizes the hypotheses that a data worker is testing for statistical significance through her source code.
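
To make the idea of parsing source code for hypotheses more concrete, here is a minimal sketch in Python. It is not the HypothesisManager implementation, which the abstract does not describe in detail. It assumes that a hypothesis appears in R code as a call to a statistical test with a formula of the form `dv ~ iv`. The list of test functions, the name `find_hypotheses`, and the regular-expression approach are all illustrative assumptions; a real add-in would more likely work on R's parse tree.

```python
import re

# Hypothetical list of R test functions to look for. The abstract does not
# say which calls HypothesisManager recognizes; this is an assumption.
TEST_FUNCTIONS = ("t.test", "wilcox.test", "aov", "kruskal.test")

# Matches calls such as `t.test(speed ~ layout, data = d)` and captures the
# function name, the dependent variable, and the independent variable.
# The lookbehind prevents matching inside longer names (e.g. `my.t.test`).
CALL_PATTERN = re.compile(
    r"(?<![\w.])(?P<func>" + "|".join(re.escape(f) for f in TEST_FUNCTIONS) + r")"
    r"\s*\(\s*(?P<dv>[\w.]+)\s*~\s*(?P<iv>[\w.]+)"
)


def find_hypotheses(r_source):
    """Return one record per statistical test call found in R source text."""
    hypotheses = []
    for line_no, line in enumerate(r_source.splitlines(), start=1):
        # Drop R comments. This is naive: it ignores '#' inside strings.
        code = line.split("#", 1)[0]
        for match in CALL_PATTERN.finditer(code):
            hypotheses.append({
                "line": line_no,
                "test": match.group("func"),
                "dv": match.group("dv"),
                "iv": match.group("iv"),
            })
    return hypotheses


if __name__ == "__main__":
    example = (
        "d <- read.csv('x.csv')\n"
        "t.test(speed ~ layout, data = d)\n"
        "# wilcox.test(err ~ layout)\n"
        "summary(aov(time ~ device, data = d))\n"
    )
    for h in find_hypotheses(example):
        print(f"line {h['line']}: {h['test']} tests whether {h['iv']} affects {h['dv']}")
```

Each record carries the source line number, so a visualization built on top of it could link every listed hypothesis back to its location in the script, which is the kind of navigation aid the abstract proposes.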