(Also available as a PDF. This page is generated by parsing the LaTeX source for the PDF with a bunch of wild regexes. If you see any conversion errors, please let me know.)
Experienced in big data platforms and pipelines in Spark, Hadoop MapReduce, and Scalding; large-scale, zero-downtime data and service migrations; scaling services to support rapid user growth; monorepo build tools like Bazel and Pants; developer experience; geolocation problems of all sorts; and high-QPS full-text indexing and retrieval.
New York, NY, 2011–2024
Authored dozens of terabyte- to petabyte-scale, production-critical pipelines in Spark, Scalding, and Hadoop MapReduce. Performed zero-downtime upgrade of large monorepo with hundreds of pipelines from AWS EMR 5 to 6, Hadoop 2 to 3, Spark 2 to 3, and many transitive dependencies. Wrote a compatibility layer to enable the migration of hundreds of pipeline jobs to Airflow from a legacy workflow scheduling system. Led migration of data pipelines from on-prem to AWS EMR.
Migrated our core data collection from MongoDB to a custom, hybrid solution with ⅓ the operational cost. Built numerous core features of the Foursquare and Swarm apps enjoyed by millions of users, including the Stickers achievement system, the weekly Leaderboard, the Year in Review, a personalized Trivia system, and social sharing features.
Migrated a 2.5M LOC monorepo used by 100 developers to the Bazel build system, lowering CI build times from 3 hours to 20 minutes. Upgraded 2M LOC Scala codebase from Scala 2.11 to 2.12. Led upgrade of 200k LOC Python codebase from Python 2 to 3. Enabled type checking in several Python repos and refactored them to an error-free state. Experienced with Docker, Jenkins, GitHub Actions, and AWS CodeBuild.
Built and maintained the point-of-interest search service used by Instagram, Uber, and others for several years, and scaled the service from 2k QPS to 40k QPS. Designed and built ML-based ranking that improved precision@1 from 50% to 80%. Performed zero-downtime migration of search datastore from Solr to Elasticsearch.
Cambridge, MA, 2007–2010
Built Strongspace, an online file sharing and backup service written in Ruby on Rails. Implemented billing, user management, and a web-based file browser. Primary Objective-C programmer of ExpanDrive, a remote file system over FTP/SFTP for macOS.
Shaw, Blake, Jon Shea, Siddhartha Sinha, and Andrew Hogue. “Learning to rank for spatiotemporal search.” In Proceedings of the sixth ACM international conference on Web search and data mining, pp. 717–726. 2013. dl.acm.org/doi/abs/10.1145/2433396.2433485