Ellis Valentiner
Summary
Accomplished Staff Data Scientist with 10 years of experience wrangling complex and messy data, training machine learning models, and building data-based solutions at early-stage startups. Highly skilled in developing production machine learning models and cloud-native services.
Education
Carnegie Mellon University
Master’s in Statistical Practice
Pittsburgh, PA
2013 - 2012
University of Minnesota Morris
B.A. in Psychology and Liberal Arts for the Human Services
Morris, MN
2012 - 2007
Experience
Staff/Lead Data Scientist
Virtual Facility
Remote
present - Apr. 2022
- Staff Data Scientist at a seed-stage proptech startup that enables commercial facilities management teams to better understand, control, predict, and reduce their operational risk.
- Design, develop, and deploy software, heuristics, and machine learning models to extract key information from facilities management alarms.
- Orchestrate real-time and batch workflows to extract, transform, load, and analyze building automation system data using modern data stack tools (meltano, dbt, BigQuery, Looker).
- Implement and maintain CI/CD pipelines using GitHub Actions workflows on self-hosted runners and GitOps-based declarative deployments using Argo CD for Kubernetes workloads.
- Perform regular Postgres database maintenance, query optimization, migrations, and upgrades.
Senior Machine Learning Engineer/Team Lead
Groundspeed Analytics (acq. Insurance Quantified 2023)
Ann Arbor, MI
Apr. 2022 - Nov. 2019
- Lead Machine Learning Engineer at a pre-IPO insurtech startup that extracts information from commercial insurance documents and delivers clean, normalized, useful representations.
- Developed document classification, layout analysis, sequence classification, and information extraction models for end-to-end automated processing using classical methods like gradient boosting classifiers and fine-tuned transformers like BERT.
- Managed a team of 4 machine learning engineers through one-on-ones, feedback, coaching, and delegation.
- Responsible for production machine learning infrastructure deployed using GitOps methodology with Terraform, Argo CD, Emissary-ingress, and Seldon Core.
Senior Data Scientist
Powerley
Royal Oak, MI
Nov. 2019 - June 2016
- Early data scientist for a pre-IPO IoT startup building a home energy management platform.
- Reduced user energy consumption by engineering data-based solutions to deliver personalized insights.
- Identified energy use for major appliances by developing disaggregation algorithms in R and Python, and provided users with individualized feedback on strategies to reduce their household energy use.
- Developed a thermostat schedule recommendation engine to balance costs and comfort based on household patterns and preferences.
- Improved query performance by implementing an Apache Spark ETL job to repartition 15.2 TB of high-resolution energy usage data.
Data Scientist/Senior Data Scientist
FarmLogs (acq. Bushel 2021)
Ann Arbor, MI
June 2016 - Nov. 2014
- Early data scientist at a pre-IPO, YC W12 agtech startup developing software to improve farm operations.
- Improved rainfall monitoring by identifying high temporal and spatial resolution data sources and implementing a data ingest pipeline.
- Worked closely with product and backend engineering to develop decision support tools to improve crop management practices based on machine learning, computer vision, and statistical modeling.
Statistician, Intermediate
University of Michigan
Ann Arbor, MI
Nov. 2014 - May 2013
- Statistician for a research group studying effects of the built environment on child and maternal health outcomes.
- Conducted statistical analyses using R, SAS, and Stata to investigate geographic and individual factors associated with adverse health outcomes among women and children.
- Published 3 peer-reviewed articles on health data with an environmental and geographic focus using unsupervised learning (clustering) and multi-level spatial regression models.
Skills
- Machine Learning: NumPy, pandas, scikit-learn, TensorFlow, PyTorch, fast.ai, spaCy
- LLM/AI: LangChain, Ollama, Claude, OpenAI, BERT
- Databases: PostgreSQL, Amazon Redshift, Google BigQuery, dbt, meltano, Looker
- Geospatial: QGIS, PostGIS, GeoPandas, shapely, PyProj, GDAL, wgrib2
- Languages: Python, Go, SQL, Julia, R
- Cloud: AWS, GCP, Kubernetes, Terraform, Helm/Kustomize, Argo CD, Emissary