Collaborative Data Science Study

Introduction to Ballet:

This study evaluates your experience using an experimental software framework for collaborative data science. The framework, Ballet, supports collaborative feature engineering for processing dirty tabular datasets.

Why should you participate?

Do you want to use your data science skills for good? Collaborative, open-source projects that create a machine learning model could have a significant impact in civic technology, social sciences, public health, and more. Your task will be to write a feature definition that can be used to predict personal income from raw survey responses to the US Census American Community Survey. The model built from features submitted by the community can then be used to optimize administration of the survey, direct public policy interventions, assist empirical researchers, and more. Researchers will analyze the collaborative model to better understand how data scientists work together.

What’s involved in the study?

This task is expected to take between 30 minutes and 2 hours. Basic experience with Python programming and data science development is preferred. You will also be asked to complete a short survey about your experience and may additionally be contacted for a short interview with the researchers.

If you complete the study you will be entered in a raffle for a small Amazon gift card as a token of thanks for your time and effort.

Details & Procedures:

Your participation in this MIT study is voluntary. Apart from the gift card drawing described below, you will not be compensated for your participation. You may withdraw from the study at any time without consequences by ceasing your participation.

If you choose to proceed, the steps are as follows:

  1. Fill out a signup form about your background and experience with data science.
  2. Join a project on GitHub that aims to predict personal income from raw survey responses to the US Census American Community Survey. There you will create, validate, and submit a feature definition. You will be provided with resources to learn about this process. You are asked to continue working until you contribute a feature that is accepted into the project, a process that is estimated to take 30 to 90 minutes. You may continue working even after contributing an accepted feature.
  3. Complete a short survey about your experience in the study once you have finished your participation. You will be sent a link to this survey by email 24-48 hours after enrolling. After you complete this survey, you may be entered in a drawing for a $25 Amazon gift card as thanks for your participation.
  4. You may be contacted directly by the study investigators to participate in a short video interview about your experience.

You can also choose to contribute to the GitHub project without signing up for the study, or without filling out one or both surveys. In that case, your contribution will be treated as if you had not participated in the study.

Timeline:

The study is open until October 3, 2020 at 11:59 p.m. ET, though contributions to the GitHub project are accepted before, during, and after the study.

Confidentiality:

You may be contributing to a public source code repository on GitHub. Your activity on GitHub will be public and may be reviewed by the investigators. You can choose to create a new and anonymous GitHub account for the purposes of participating in the study to avoid associating your personal information with your public activity and survey responses. If so, please sign up here (you may need to log out of your normal GitHub account or open a private browsing session).

We may also record minimal telemetry data from the Ballet software in order to understand how its components are used. This data will be kept completely confidential. You may opt out of this telemetry data collection on the signup form.

Your survey responses will be kept completely confidential, anonymized, and stored securely. Any publication or discussion of your data will be in aggregate or, in the case of free-text responses, anonymized.

Click “Proceed to Study” to acknowledge these conditions and begin.

  • Complete the signup form
  • Click the link to begin
  • Follow the task instructions on the GitHub README page