Ellis Valentiner
Summary
Accomplished Staff Data Scientist and Team Lead with 10 years of experience building data-based solutions at early-stage startups. Highly skilled in developing machine learning models and cloud-native services.
Education
Carnegie Mellon University
Master’s in Statistical Practice
Pittsburgh, PA
2012 - 2013
University of Minnesota Morris
B.A. in Psychology and Liberal Arts for the Human Services
Morris, MN
2007 - 2012
Experience
Staff Data Scientist/Team Lead
Virtual Facility
Remote
Apr. 2022 - present
- Staff Data Scientist at a seed-stage proptech startup that enables commercial facilities management teams to better understand, control, predict, and reduce their operational risk.
- Lead a nimble team responsible for data collection, storage, movement, transformation, aggregation, and analysis.
- Transitioned existing data operations to use modern data stack tools.
- Develop heuristic rules and machine learning models to process unstructured facilities management alarms by extracting key information and automatically associating related alarms.
- Designed and implemented CI/CD pipelines using GitHub Actions workflows on self-hosted runners.
- Set up declarative, GitOps-based deployment using Argo CD for Kubernetes workloads.
- Led Postgres major version upgrade migrations with minimal downtime and optimized database parameters, yielding lower costs and improved performance.
- Work closely with product and engineering teams to identify user pain points, design solutions, and implement features.
Senior Machine Learning Engineer/Team Lead
Groundspeed Analytics (acq. Insurance Quantified 2023)
Ann Arbor, MI
Nov. 2019 - Apr. 2022
- Lead Machine Learning Engineer at a pre-IPO insurtech startup that extracts information from commercial insurance documents and delivers clean, normalized, useful representations.
- Managed a team of 4 machine learning engineers through one-on-ones, feedback, coaching, and delegation.
- Responsible for production machine learning infrastructure deployed using GitOps methodology with Terraform, Argo CD, Emissary-ingress, and Seldon Core.
- Developed document classification, layout analysis, sequence classification, and information extraction models for end-to-end automated processing.
Senior Data Scientist
Powerley
Royal Oak, MI
June 2016 - Nov. 2019
- Early data scientist for a pre-IPO, IoT startup building a home energy management platform.
- Reduced user energy consumption by engineering data-based solutions to deliver personalized insights.
- Identified energy use for major appliances by developing disaggregation algorithms in R and Python, and provided users with individualized strategies to reduce their household energy use.
- Developed a thermostat schedule recommendation engine to balance costs and comfort based on household patterns and preferences.
- Improved query performance by implementing an Apache Spark ETL job to repartition 15.2 TB of high-resolution energy usage data.
Data Scientist/Senior Data Scientist
FarmLogs (acq. Bushel 2021)
Ann Arbor, MI
Nov. 2014 - June 2016
- Early data scientist at a pre-IPO, YC W12 agtech startup developing software to improve farm operations.
- Improved rainfall monitoring by identifying high temporal and spatial resolution data sources and implementing a data ingest pipeline.
- Worked closely with product and backend engineering to develop decision support tools to improve crop management practices based on machine learning, computer vision, and statistical modeling.
Statistician, Intermediate
University of Michigan
Ann Arbor, MI
May 2013 - Nov. 2014
- Statistician for a research group studying effects of the built environment on child and maternal health outcomes.
- Conducted statistical analyses using R, SAS, and Stata to investigate geographic and individual factors associated with adverse health outcomes among women and childhood populations.
- Published 3 peer-reviewed articles on health data with an environmental and geographic focus using unsupervised learning (clustering) and multi-level spatial regression models.
Skills
- Machine Learning: NumPy, pandas, scikit-learn, TensorFlow, PyTorch, fast.ai, spaCy
- Databases: PostgreSQL, Amazon Redshift, Google BigQuery, dbt, meltano, Looker
- Geospatial: QGIS, PostGIS, GeoPandas, shapely, PyProj, GDAL, wgrib2
- Languages: Python, Golang, SQL, R
- Cloud: AWS, GCP, Kubernetes, Terraform, Helm, Argo CD, Emissary