This study evaluates your experience using an experimental software framework for collaborative data science. The framework, Ballet, supports collaborative feature engineering for processing dirty tabular datasets.
Do you want to use your data science skills for good? Collaborative, open-source projects that create a machine learning model could have a significant impact in civic technology, social sciences, public health, and more. Your task will be to write a feature definition that can be used to predict personal income from raw survey responses to the US Census American Community Survey. The model built from features submitted by the community can then be used to optimize administration of the survey, direct public policy interventions, assist empirical researchers, and more. Researchers will analyze the collaborative model to better understand how data scientists work together.
This task is expected to take between 30 minutes and 2 hours. Ideally, you will have basic experience with Python programming and data science development. You will also be asked to complete a short survey about your experience, and researchers may contact you afterward for a short interview.
If you complete the study you will be entered in a raffle for a small Amazon gift card as a token of thanks for your time and effort.
Your participation in this MIT study is voluntary. You will not be compensated for your participation. You may withdraw from the study at any time without consequences by ceasing your participation.
You can also choose to contribute to the GitHub project without signing up or filling out one or both surveys, as if you had not participated in the study.
The study is open until October 3, 2020 at 11:59 p.m. ET, though contributions to the GitHub project are accepted before, during, and after the study.
You may be contributing to a public source code repository on GitHub. Your activity on GitHub will be public and may be reviewed by the investigators. You can choose to create a new and anonymous GitHub account for the purposes of participating in the study to avoid associating your personal information with your public activity and survey responses. If so, please sign up here (you may need to log out of your normal GitHub account or open a private browsing session).
We may also record minimal telemetry data from the Ballet software to understand how its components are used. This data will be kept completely confidential. You may opt out of telemetry data collection on the signup form.
Your survey responses will be kept completely confidential, anonymized, and stored securely. Any publication or discussion of your data will be in aggregate or, in the case of free-text responses, anonymized.