Projects

Sintel

Sintel is an ecosystem for time series tasks, including featurization, anomaly detection, forecasting, and predictive maintenance. Its tools can be customized for particular domains, such as predictive maintenance for the wind energy industry.

Sibyl

Sibyl is a highly configurable system for developing usable ML applications in any domain. Sibyl creates explanations of ML models that audiences of all kinds, including those without ML experience, can understand quickly.

Deep Mining

A tool that automatically tunes the entire data processing pipeline. It standardizes the pipeline abstractions and builds and tests several hyperparameter selection and optimization methods. This saves a significant amount of time and effort, because small differences in hyperparameters can significantly affect the performance of a particular pipeline.

Feature Hub

A tool that makes feature engineering easier, faster, and more effective by collaboratively crowdsourcing feature creation via an open-source web notebook. FeatureHub gathers, tests, collates and extracts features submitted by users, and uses them to train relevant machine learning models.

Auto Tune Models (ATM)

A cloud-based modeling system that provides an easier, faster way to find the best possible model for a given prediction problem. ATM uses bandit-based and Gaussian process learning to choose among modeling methodologies and to pinpoint which parameters and hyperparameters to use.
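
To illustrate what bandit-based selection among methodologies can look like, here is a minimal sketch using UCB1, one standard bandit algorithm. It is not ATM's implementation: the function names, the three model families, and the `fake_evaluate` scorer (a stand-in for training a model and returning its cross-validation score) are all assumptions made for this example, and the Gaussian process step is not shown.

```python
import math
import random


def ucb1_select(counts, totals, t):
    """Pick the arm with the highest UCB1 score; try each untried arm first."""
    for arm, n in counts.items():
        if n == 0:
            return arm
    return max(
        counts,
        key=lambda a: totals[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )


def run_bandit(evaluate, arms, budget):
    """Spend `budget` evaluations across `arms`; return pull counts and the best arm by mean score."""
    counts = {a: 0 for a in arms}
    totals = {a: 0.0 for a in arms}
    for t in range(1, budget + 1):
        arm = ucb1_select(counts, totals, t)
        totals[arm] += evaluate(arm)
        counts[arm] += 1
    best = max(arms, key=lambda a: totals[a] / counts[a])
    return counts, best


# Hypothetical scorer: pretend each model family has a fixed average score plus noise.
rng = random.Random(0)
TRUE_MEAN = {"svm": 0.70, "random_forest": 0.85, "knn": 0.60}


def fake_evaluate(family):
    # Clamp to [0, 1] so the score behaves like an accuracy.
    return min(1.0, max(0.0, rng.gauss(TRUE_MEAN[family], 0.05)))


counts, best = run_bandit(fake_evaluate, list(TRUE_MEAN), budget=300)
print(counts, best)
```

Each model family is an arm. The selection rule balances trying families that have scored well so far against families that have been tried only a few times, so the evaluation budget drifts toward the most promising family without abandoning the others too early.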

Sense ML

A cloud-based master-worker architecture that can process IoT data from segmentation all the way through the formation of a machine learning example. In the SenseML platform, some steps, such as pre-processing, are automated, while others are surfaced strategically to allow data scientists to input their expertise.

TRANE

A language for describing prediction problems over relational datasets, as well as a system that allows data scientists to specify problems in that language. TRANE automatically generates prediction problems for temporal datasets and produces labels for supervised learning. Its goal is to streamline the machine learning problem-solving process.
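
As a rough illustration of producing labels for a prediction problem over temporal data, the sketch below treats a problem as a cutoff time, a prediction window, and an aggregation over the events that fall inside that window. This is not TRANE's language or API: `make_labels`, the event fields, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def make_labels(events, cutoff, window, aggregate):
    """For each entity seen before `cutoff`, aggregate its events in [cutoff, cutoff + window)."""
    end = cutoff + window
    known = {e["entity"] for e in events if e["time"] < cutoff}
    future = {entity: [] for entity in known}
    for e in events:
        if e["entity"] in known and cutoff <= e["time"] < end:
            future[e["entity"]].append(e["amount"])
    return {entity: aggregate(values) for entity, values in future.items()}


# Hypothetical transaction log: one row per purchase.
events = [
    {"entity": "a", "time": datetime(2020, 1, 5), "amount": 10.0},
    {"entity": "a", "time": datetime(2020, 2, 3), "amount": 25.0},
    {"entity": "a", "time": datetime(2020, 2, 20), "amount": 5.0},
    {"entity": "b", "time": datetime(2020, 1, 9), "amount": 7.0},
    {"entity": "b", "time": datetime(2020, 4, 1), "amount": 50.0},
]

cutoff = datetime(2020, 2, 1)
window = timedelta(days=30)

# Two different prediction problems over the same data.
total_spend = make_labels(events, cutoff, window, sum)
is_active = make_labels(events, cutoff, window, lambda values: len(values) > 0)
print(total_spend, is_active)
```

Swapping the aggregation function changes the prediction problem without touching the data, which is the kind of variation a problem-description language can enumerate automatically.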

Automatic Data Elements Linking (ADEL)

A tool that generates a relational data schema and identifies other salient properties for a data model without meta information. It detects the type of each data field, the primary key that identifies each row, and relationships between different data entities.
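
The three detection tasks can be sketched with simple heuristics. This is not ADEL's algorithm: the helper names, the column-oriented table layout, and the sample tables are assumptions for this example, and a real tool would need to handle noise, composite keys, and coincidental matches.

```python
def infer_type(values):
    """Return 'int', 'float', or 'str' for a column of raw string values."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue  # this cast failed on some value; try the next, looser type
    return "str"


def candidate_keys(table):
    """Columns whose values are all non-empty and unique, so they could serve as a primary key."""
    return [
        col for col, values in table.items()
        if all(v != "" for v in values) and len(set(values)) == len(values)
    ]


def foreign_key_candidates(child, parent, parent_key):
    """Child columns whose values all appear in the parent's key column."""
    key_values = set(parent[parent_key])
    return [col for col, values in child.items() if set(values) <= key_values]


# Hypothetical tables, stored as column name -> list of raw strings.
customers = {
    "customer_id": ["1", "2", "3"],
    "name": ["Ann", "Bob", "Ann"],
}
orders = {
    "order_id": ["10", "11", "12", "13"],
    "customer_id": ["1", "1", "3", "2"],
    "total": ["9.5", "20", "3.25", "20"],
}

types = {col: infer_type(vals) for col, vals in orders.items()}
keys = candidate_keys(orders)
links = foreign_key_candidates(orders, customers, "customer_id")
print(types, keys, links)
```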

Synthetic Data Vault (SDV)

SDV enables data scientists to sidestep data-sharing concerns and expand the pool of possible problem participants by generating synthetic data. By learning a generative model that accounts for dependence and relationships, the SDV creates new data that resembles the original set statistically, formally, and structurally.
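
To make "a generative model that accounts for dependence" concrete, here is a deliberately small sketch: it fits a two-column Gaussian model (means, standard deviations, and one correlation) and samples new rows that preserve that correlation. It does not use the SDV library, and SDV's models are considerably richer. The function names and the sample data are assumptions for this example.

```python
import math
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Population Pearson correlation between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def fit(xs, ys):
    """Learn the parameters of a bivariate Gaussian from two columns."""
    return {"mx": mean(xs), "my": mean(ys),
            "sx": pstdev(xs), "sy": pstdev(ys), "r": pearson(xs, ys)}


def sample(model, n, rng):
    """Draw n synthetic rows whose columns share the learned correlation."""
    r = model["r"]
    resid = math.sqrt(max(0.0, 1 - r * r))  # guard against rounding when |r| is near 1
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mx"] + model["sx"] * z1
        y = model["my"] + model["sy"] * (r * z1 + resid * z2)
        rows.append((x, y))
    return rows


# Hypothetical "real" data: age in years and income in thousands.
age = [23, 35, 41, 52, 29, 60, 47, 33]
income = [31, 48, 55, 70, 36, 75, 66, 40]

model = fit(age, income)
synthetic = sample(model, 2000, random.Random(0))
synth_age = [row[0] for row in synthetic]
synth_income = [row[1] for row in synthetic]
print(round(model["r"], 3), round(pearson(synth_age, synth_income), 3))
```

None of the synthetic rows is copied from the original table, yet the relationship between the two columns carries over, which is what makes such data useful to share in place of the original.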