Software Engineer

Hi, I'm Scott.

Software Engineer @ Anyscale

M.S. EECS @ UC Berkeley


San Francisco, CA

scott.lee.3898@gmail.com

Resume (PDF)

in/scottjlee98

About Me

I'm Scott, a Bay Area native, tea connoisseur, and turtle enthusiast. Currently, I'm a software engineer at Anyscale, contributing to open-source Ray. In the past, I've also had the pleasure of working at other amazing companies like Lyft, Rubrik, and Brilliant. I fill my free hours by sipping tea, playing video games, climbing rocks, and trying new restaurants.

I'm passionate about designing thoughtful solutions to complex problems. My current academic and industry interests include:
artificial intelligence (machine learning, LLMs, computer vision)
large-scale data engineering (TB-scale data engineering, data for AI/ML)
the intersection of technology and education

Education & Skills

Education
University of California, Berkeley 🐻

M.S. EECS (2020)
B.A. Computer Science (2019)

Skills

Python & Libraries: Ray, PyTorch, TensorFlow, vLLM, HuggingFace, LangChain
Other Languages & Frameworks: Apache Spark, Arrow, Airflow; SQL, Go, Java, R, JavaScript
Domain Knowledge: Computer Vision, Ad Ops (Google/Meta)

Experience

Anyscale (2022 - Present)

Software Engineer (Ray Data)

• Contributor to the open-source Ray project (industry-standard Python library for distributed computing) as a core developer for Ray Data.
• Developed core features for Ray Data General Availability, including the execution plan optimizer, data ingestion for distributed training, and observability.
• Engineering lead for new LLM workloads at Anyscale, such as text embedding generation and LLM batch inference.
• Published several blog posts and presented a talk at Ray Summit to publicize and share our work.



RISELab (2018 - 2020)

Graduate Researcher

• Research areas: Computer Vision (Explainability, Few-Shot), Medical Imaging (EKG)
• Key work: Neural-Backed Decision Trees (ICLR 2021)

Lyft (2020 - 2022)

Software Engineer (Growth Platforms)

• Redesigned two major components of the existing infrastructure for automated driver acquisition, efficiently scaling marketing spend up from its COVID-shutdown level to $5MM+/month across three paid media channels.
• Drove multi-quarter, mission-critical initiatives directly impacting key team OKRs, partnering with numerous engineers and scientists in a highly cross-functional environment; leveraged and augmented the team’s core database of 30+ tables to plan the team’s short-term strategy and long-term roadmap.

UC Berkeley (2017 - 2020)

Head Teaching Assistant

• Managed a team of 50 TAs, 60 tutors, and 150 lab assistants to run a 1,300-student intro data science course.
• Led key infrastructure projects to support scaling across the data science department, including streamlining assignment development, the autograding system, cheating detection, and large-scale course logistics.

Projects

Fido

A Slackbot packed with features to assist teaching staff members, including roster lookup, Piazza paging, and groupshouts.



BerkeleyTime

An augmented course catalog used by over 30,000 UC Berkeley students that provides data on courses, enrollment trends, grade distributions, and more.

Neural-Backed Decision Trees (ICLR 2021)

Improving explainability for deep learning image classification using a decision tree-based structure.


Object-Focused Edge Detection

A general method for modifying edge detection algorithms to produce edge maps that focus on one or a few individual objects in an image.

Contact Me

I'm currently located in San Francisco, CA. If you want to grab a cup of tea or reach me otherwise, email is the preferred method. Thanks!