
Projects & Blog

Data Science, MLOps, and Cloud Engineering

Taking Python to Production - Udemy Course

#Udemy #git #CI/CD #Testing #Clean Code #Python #semver

Highly-rated Udemy course covering the fundamentals of software engineering.

Many data scientists and junior engineers come from an academic background in statistics or another quantitative field, without the software engineering skills needed to bring their ideas to production or collaborate effectively.

This course has helped hundreds of students become effective citizens of the software engineering community. These skills have elevated the self-sufficiency of my co-workers and laid a foundation for us to do advanced MLOps.

Course Page

Soon-to-be-Official VS Code Extension for ClearML

#ClearML #TypeScript #VS Code #SSH #AWS EC2

Remote workstations are amazing! Especially for data science.

From your laptop, you can SSH into a more powerful machine to get tremendous gains in productivity.

Here is a LinkedIn post of mine listing several benefits and comparing different remote workstation offerings.

Metaflow by Outerbounds is an open-source MLOps tool with a closed-source VS Code extension that allows you to connect to remote workstations.

I organized a hackathon that created a clone of Outerbounds' VS Code extension for another tool called ClearML, allowing anyone to host their own on-prem or cloud data science workstations without Kubernetes and connect to them in one click.

🎉 ClearML offered to officially adopt the extension and take over maintenance. They also complimented our clean TypeScript code.

GitHub VS Code Marketplace

Elevated Analytics - CRM Analytics Startup

#AWS CDK #Spark #Dashboards #Batch ETL #Quicksight #AWS Step Functions #Athena

What: We offer analytics via SaaS for Salesforce and other CRMs as well as mentorship and training on best practices for tracking an organization's sales process.

This is ideal for groups who have not hired a full-time analyst but want affordable, high-quality visibility into their sales funnel.

Who: I started this company in 2022 together with Ryan Gardner, whom I met through Rootski, and two friends Ryan had worked with, Nate Roberts and Colin Toyn.

These are great co-founders. I built out data flows and a warehouse to ingest CRM data from customers, Ryan built the BI layer on top of that, and Nate and Colin have been networking to find clients. Nate, Ryan, and Colin are full-time sales analysts, so they are the subject matter experts.

Self-hostable Minecraft Server Platform-as-a-Service

#AWS CDK #serverless

Over the course of December 2022, I rallied several strangers and friends around the cause of saving Christmas 🎄 for the Riddoch family cousins. Seven software engineers pulled several super late nights.

We produced a high-quality, self-hostable, serverless, secure, nearly free-to-run Minecraft server platform-as-a-service (PaaS) that anyone can deploy into their own AWS account using AWS CDK.

🎉 This project pushes the boundary of what you can do with AWS CDK. Some of the AWS CDK core developers were really impressed!

pip install awscdk-minecraft to get started!

PyPI GitHub

Organized successful worldwide AI/data hackathon

I co-organize the Utah chapter of the MLOps.community Meetup. Through this Meetup, I organized the most successful hackathon we've ever had!

  • 60 in-person attendees, 20 remote
  • 2 sponsors (BENlabs and Nerd United)
  • $2,500 in prizes
  • 9 project demo videos, 90 seconds each (see video)

Platform Engineering: Create/deploy a product in minutes

#projen #TemplatesAreEvil #Pulumi #GitHub Actions

This is an open-source proof of concept (POC) I built before implementing the same idea at work.

What if, in under 5 minutes, you could create a new repository with permissions, CI/CD, build secrets, and a boilerplate app that deploys to the cloud in one click?

Seriously, this is the pinnacle of platform engineering!

www.rootski.io: Deep learning SaaS for studying Russian

#PyTorch #MLFlow #React #TypeScript #MaterialUI #FastAPI

v1.0.0 - Containers and SQL
#Terraform #DockerSwarm #Postgres #Bamboo Server #Bitbucket Pipelines #Traefik

v2.0.0 - Serverless and NoSQL
#AWS CDK #AWS Lambda #DynamoDB #GitHub Actions #AWS API Gateway #AWS Cognito #OAuth 2.0 #Sphinx

Russian words are often looong, but there are only ~300 word roots which make up the most common words.

You can break up the word саморазмораживающийся (self-defrosting) like this:

само (self-) раз (un-) мораж (frost) ивающийся (-ing)

Rootski uses a deep learning transformer model to break Russian words into roots. If a breakdown is obviously wrong, users can submit their own. The GIF to the left shows the breakdown for "выходить", which means "to exit".

I worked on Rootski for four years.

I mentored more than 20 people from all over the world in exchange for help building it out.

I ran Rootski like a startup. I

  • Recorded a 10-hour YouTube playlist for onboarding junior developers to the codebase and tools.
  • Deployed a knowledge base generated with an advanced Sphinx setup.
  • Gave total strangers access to my AWS account, which was paid for with my personal credit card, and trusted my IAM roles to keep costs from exploding.
  • Automated the creation of ClickUp tickets for new contributors which guided them through onboarding in a structured, self-serve way.
  • Recruited contributors through LinkedIn with posts like this one and made a lot of friends in the process.

In the end, I acknowledged that although I had learned a lot from building Rootski, there were other projects I wanted to work on more.

Rootski is a fantastic reference project for anyone who wants to learn to build a scalable, secure, stable SaaS product at low cost with modern tools.

Top Contributors


Eric Riddoch

🧑‍🏫 💻

Isaac Robbins

💻 🚇

Josh Abrahamsen

🚇

Ryan Gardner

💼

Joe Drapeau

💻

Ethan Walker

💻

Isaac Z Tai

👀

Adam Lenning

♿️

rootski.io docs.rootski.io GitHub organization

Rootski mobile app (abandoned 😢)

#ReactNative #TypeScript #SQLite #PyTorch #ONNX

My original plan for Rootski was to make it an offline-first, cross-platform mobile app. Two things caused me to rewrite it for the web:

  1. After surveying many potential users, I found that most would prefer to use it on their computers.
  2. While I was able to export the PyTorch model with ONNX, I would have had to write native Java/Kotlin and Swift code to run inference on Android and iOS.

It was sad to leave this project behind, but I learned React in the process, which has been super valuable.

MLOps & Observability: First to ever send BentoML logs, metrics, & traces to NewRelic

#BentoML #NewRelic #FastAPI #OpenTelemetry

BentoML is a cutting-edge tool for high-performance model serving.

I did extensive testing and struggled to send BentoML metrics and traces to NewRelic.

I ultimately succeeded in creating an experiment (shown in the screenshot; link to code below). I created a FastAPI app instrumented with NewRelic that called a BentoML API instrumented with OpenTelemetry, which in turn called another FastAPI app, again with NewRelic instrumentation. I used the AWS OTel Collector to send the BentoML traces to NewRelic.

🎉 It worked! My example shows that NewRelic and OpenTelemetry traces are fully compatible and that BentoML can be monitored with no code changes!

🎉 I also submitted an issue which BentoML implemented, so that failed HTTP requests now return a trace ID in their response headers. This makes it possible to look up the logs of failed requests to find the root cause!

GitHub

Book about Linux and the command line

#Python #Docker #GhostScript #CSS #MkDocs #Chromedriver #PDFSpecification

A lot of my friends in data science feel insecure about their software development skills because they didn't study "Software Engineering" or "Computer Science" in school.

I have mentored and pair-programmed with many people to help them learn Linux, git, OOP, test-driven(ish) design, and other key software skills. After doing this, I decided to put it all in one place by writing a book that assumes nothing but knowledge of Python syntax.

Update: I've stopped writing the book and am making a Udemy course instead. The chapters that are here turned out nicely!

The Path After Python (Preview)

Eric the Vast - Fitness blog with data!

#MkDocs #Bokeh #GoogleSheets #Javascript #Docker #HTML #CSS

Behold the latest in fitness tracking technology™️

I read a book called Atomic Habits which helped me realize that the reason I often fail to reach my long-term fitness goals is that skipping workouts is not painful in the short term.

So, I wrote up a fitness contract stating that if I ever miss a workout, I must pay my family and friends $750!

I blog my progress monthly and embed interactive dashboards populated by my workout log in Google Sheets, so my friends have total visibility into everything I eat and all the workouts I do.

Eric the Vast!

Docker for data scientists - meetup talk

#Docker

Deploying code to production can be as easy as running it on your own machine.

In this presentation, I give a very visual introduction to using Docker to wrap apps into tidy containers so that they can run seamlessly in any environment.

Thompson sampling with cats!

#TypeScript #NodeJS #D3.js #Python

A fun demonstration of how Thompson sampling can achieve better results than A/B testing when deciding which version of a product to deploy.

GitHub See the Cats!
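A minimal sketch of the idea, assuming a Beta-Bernoulli model where each product variant either converts a user or doesn't. The conversion rates passed in are hypothetical and only simulate users; each round draws a plausible rate for every variant from its posterior and shows the variant with the highest draw, so traffic shifts toward the winner instead of staying split evenly as in a fixed A/B test. This is an illustration, not the code from the linked repository.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling over product variants.

    true_rates are hypothetical conversion probabilities used only to
    simulate user responses; the algorithm never reads them directly.
    Returns how many times each variant was shown.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k  # observed conversions per variant
    failures = [0] * k   # observed non-conversions per variant
    pulls = [0] * k
    for _ in range(n_rounds):
        # Sample a conversion rate per variant from its Beta(1 + s, 1 + f)
        # posterior (uniform prior) and show the variant with the best draw.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        choice = max(range(k), key=lambda i: samples[i])
        pulls[choice] += 1
        # Simulate whether the user converted on the chosen variant.
        if rng.random() < true_rates[choice]:
            successes[choice] += 1
        else:
            failures[choice] += 1
    return pulls


pulls = thompson_sampling([0.05, 0.10, 0.30], n_rounds=5000)
print(pulls)  # most of the traffic should end up on the third variant
```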

Spotify analysis with linear regression

#Python #SpotifyAPI

An analysis of several thousand songs on Spotify. It examines which features of a song are most correlated with popularity.

I check the LINE assumptions, run LASSO regression to eliminate noisy features, and then do a brute force model selection to find the features most predictive of popularity.

Analysis
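The brute-force model selection step can be sketched as follows, assuming an ordinary least squares fit for every feature subset and BIC as the selection criterion (the criterion used in the original analysis isn't stated). The feature names and synthetic data are hypothetical stand-ins for the Spotify dataset. With p features there are 2^p - 1 subsets to fit, which is why pruning noisy features with LASSO first keeps the search tractable.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y, names):
    """Fit OLS on every non-empty subset of columns; return the names of the lowest-BIC subset."""
    n, p = X.shape
    best_bic, best_names = np.inf, ()
    for k in range(1, p + 1):
        for cols in combinations(range(p), k):
            # Design matrix: an intercept column plus the chosen feature columns.
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            # Gaussian BIC up to an additive constant.
            bic = n * np.log(rss / n) + A.shape[1] * np.log(n)
            if bic < best_bic:
                best_bic, best_names = bic, tuple(names[c] for c in cols)
    return best_names


# Synthetic data: "popularity" depends on columns 0 and 2 only, plus noise.
rng = np.random.default_rng(0)
names = ["danceability", "energy", "loudness", "tempo"]  # hypothetical feature names
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
chosen = best_subset(X, y, names)
print(chosen)
```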

Dominating a writing assignment with Python

#Python

For a writing class group assignment, we were pretending to be consultants for Pluralsight's social media strategy. Our paper was the usual fluff, except for THIS.

I scraped the company Facebook pages of a few different e-learning platforms and showed that Udemy (yellow line) had way more engaged followers than Pluralsight (blue line). We blew our professor's mind and we all got A's 🤣.

Cringy Paper

Advanced Topics in Applied Mathematics

Facial recognition

#Jupyter Notebook #Python

Algorithm that uses eigenfaces to solve the computer vision problem of face recognition.

Given an image of a face, this algorithm finds the closest match in the data set with respect to the Euclidean norm.
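The usual eigenfaces recipe can be sketched with NumPy, assuming the training faces are already flattened into the rows of a matrix: subtract the mean face, take the top right-singular vectors of the centered data as the eigenfaces, and compare faces by the Euclidean distance between their projections. The random "faces" below are placeholders rather than a real face dataset, and `n_components=10` is an arbitrary choice.

```python
import numpy as np


def fit_eigenfaces(faces, n_components):
    """faces: (n_images, n_pixels) array of flattened grayscale images."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of Vt are the principal directions ("eigenfaces") of the data.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    basis = Vt[:n_components]
    weights = centered @ basis.T  # each training face projected into face space
    return mean_face, basis, weights


def closest_match(image, mean_face, basis, weights):
    """Index of the training face nearest to `image` in eigenface space (Euclidean norm)."""
    projected = (image - mean_face) @ basis.T
    return int(np.argmin(np.linalg.norm(weights - projected, axis=1)))


rng = np.random.default_rng(0)
faces = rng.random((20, 64))  # 20 placeholder 8x8 "images", flattened
mean_face, basis, weights = fit_eigenfaces(faces, n_components=10)
query = faces[7] + rng.normal(scale=0.01, size=64)  # a slightly noisy copy of face 7
print(closest_match(query, mean_face, basis, weights))
```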

Fourier transform

#Python #Jupyter Notebook

Implementation, explanation, and several applied examples of the famous Fast Fourier Transform.

This algorithm decomposes a signal via a linear transformation from the time domain to the frequency domain in O(n log n) time.

I have a rigorous understanding of the mathematics behind this process.
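A minimal recursive radix-2 Cooley-Tukey sketch, assuming the input length is a power of two: splitting the signal into even- and odd-indexed samples gives two half-size DFTs, and combining them costs O(n) work per level, so the recurrence T(n) = 2T(n/2) + O(n) gives O(n log n). This is a teaching version, not the notebook's code; in practice `numpy.fft.fft` is the tool to reach for.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^{-2*pi*i*k/n} combines the two half-size DFTs.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# A pure tone at 3 cycles per 16 samples should peak at frequency bins 3 and 13.
signal = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
magnitudes = [abs(c) for c in fft(signal)]
print([round(m, 6) for m in magnitudes])
```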

Classical machine learning problem

#Python

Implementation of a K-Dimensional Tree classifier used to correctly identify the digits 0-9 from thousands of images.
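A compact sketch of a k-d tree used as a 1-nearest-neighbor classifier, assuming feature vectors are equal-length tuples (for digit images, one value per pixel). Each level splits on one coordinate at the median, and the search skips a subtree whenever the splitting plane is farther away than the best match found so far. The toy 2-D points and labels are hypothetical.

```python
from collections import namedtuple

Node = namedtuple("Node", "point label axis left right")


def build(points, labels, depth=0):
    """Build a k-d tree, splitting on the median of one coordinate per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    mid = len(order) // 2
    left, right = order[:mid], order[mid + 1:]
    return Node(
        points[order[mid]], labels[order[mid]], axis,
        build([points[i] for i in left], [labels[i] for i in left], depth + 1),
        build([points[i] for i in right], [labels[i] for i in right], depth + 1),
    )


def nearest(node, target, best=None):
    """Return (squared_distance, label) of the stored point closest to target."""
    if node is None:
        return best
    d = sum((a - b) ** 2 for a, b in zip(node.point, target))
    if best is None or d < best[0]:
        best = (d, node.label)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only search the far side if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, target, best)
    return best


train = [(0, 0), (1, 1), (0, 1), (9, 9), (8, 9), (9, 8)]
labels = ["low", "low", "low", "high", "high", "high"]
tree = build(train, labels)
print(nearest(tree, (1, 0))[1], nearest(tree, (7, 8))[1])  # low high
```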

Page rank algorithm

#Python #NetworkX

The PageRank algorithm was central to Google's success. Before PageRank, search results on the web were totally disorganized.

I implemented the algorithm myself and used it to rank websites from a dataset maintained by Stanford University. Then, I imported the optimized NetworkX version to build a March Madness bracket.
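The core of the algorithm is a power iteration, sketched here under common assumptions: a damping factor of 0.85, and pages with no outgoing links spreading their rank evenly over every page. The four-page link graph is hypothetical, not the Stanford dataset; NetworkX's `networkx.pagerank` is the optimized equivalent.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration. `links` maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped rank equally among the pages it links to.
        for page, targets in links.items():
            for t in targets:
                new[t] += damping * rank[page] / len(targets)
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print({page: round(r, 4) for page, r in ranks.items()})
```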

Markov chains

Yoda Speak

A Jedi's strength flows from this kind?

To question, no try.

A Jedi Knight with you be.

The Chosen One the Chancellor, he will be.

Encircle them we must, then divide.

#Python

I created a Markov chain from Master Yoda's speech to simulate a classical Natural Language Processing problem.

Because each word in a Markov chain depends only on the word before it (the current state), the results are nonsensical and fun to read.
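A first-order, word-level Markov chain can be sketched in a few lines: record which words follow each word in the corpus, then generate text by repeatedly sampling a successor of the current word only. The three-sentence corpus and the `<start>`/`<end>` markers are assumptions for illustration, not the original Yoda dataset.

```python
import random
from collections import defaultdict


def build_chain(sentences):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = ["<start>"] + sentence.split() + ["<end>"]
        for current, nxt in zip(words, words[1:]):
            chain[current].append(nxt)
    return chain


def generate(chain, rng, max_words=20):
    """Random walk from <start>; each step looks only at the current word."""
    word, out = "<start>", []
    for _ in range(max_words):
        word = rng.choice(chain[word])  # duplicates in the list weight common successors
        if word == "<end>":
            break
        out.append(word)
    return " ".join(out)


corpus = [  # a tiny made-up corpus, not the original Yoda dataset
    "A Jedi Knight you will be.",
    "Encircle them we must.",
    "Strong with the Force you are.",
]
chain = build_chain(corpus)
sentence = generate(chain, random.Random(1))
print(sentence)
```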

Software Development

Family map on Android

#REST API #Android #SQLite #Java

The Family Map Android app gives users a visual way to view and explore their own personal origins.

My web server generates family trees for each user and serves them to the client to be displayed on a world map.

Custom DOMO tiles - data pipeline tool

#DOMO API #R #Python #Cronjob

Your company may amass information from B2B services such as Facebook Advertising, LinkedIn, Salesforce, etc. DOMO provides an all-in-one data warehousing and analytics platform to handle this data.

This script can download one or more datasets from a DOMO instance, run the data through any R or Python script, and then push it back into DOMO. When scheduled, this serves as an easy data pipeline.

Algorithms and Data Structures

Pacman breadth-first search - HackerRank

#Python

One of my favorites. Pacman chose to use breadth-first search to find the food.
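A breadth-first search sketch, assuming a HackerRank-style grid of strings where `%` marks a wall and the outer border is entirely walls (so no bounds checks are needed). Because BFS explores cells in order of distance from the start, the first time it reaches the food it has found a shortest path, and the parent map reconstructs it. The maze is made up for illustration.

```python
from collections import deque


def bfs_path(grid, start, goal):
    """Shortest path on a walled grid where '%' is a wall; returns cells or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((-1, 0), (0, -1), (0, 1), (1, 0)):  # up, left, right, down
            nr, nc = r + dr, c + dc
            if grid[nr][nc] != "%" and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


maze = [
    "%%%%%%%",
    "%P--%.%",
    "%-%-%-%",
    "%-----%",
    "%%%%%%%",
]
path = bfs_path(maze, (1, 1), (1, 5))  # from 'P' to the food '.'
print(len(path) - 1, "moves:", path)
```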

100+ HackerRank challenges

#Python #Javascript #Java #C++ #Regex #SQL

Python's standard library doesn't have a sorted set data structure, but Java does (TreeSet). HackerRank challenges have helped me discover which tools best suit each problem in coding competitions.
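For comparison, a minimal stand-in can be built on the standard library's `bisect` module. It is only a sketch: lookups are O(log n), but inserts and removals shift list elements and cost O(n), whereas Java's TreeSet (a red-black tree) does all three in O(log n). The third-party `sortedcontainers` package provides a fuller `SortedSet`.

```python
import bisect


class SortedSet:
    """Minimal sorted set backed by a Python list kept in order with bisect."""

    def __init__(self, items=()):
        self._items = sorted(set(items))

    def add(self, x):
        i = bisect.bisect_left(self._items, x)
        if i == len(self._items) or self._items[i] != x:
            self._items.insert(i, x)  # O(n): later elements shift right

    def discard(self, x):
        i = bisect.bisect_left(self._items, x)
        if i < len(self._items) and self._items[i] == x:
            del self._items[i]

    def __contains__(self, x):
        i = bisect.bisect_left(self._items, x)
        return i < len(self._items) and self._items[i] == x

    def ceiling(self, x):
        """Smallest element >= x (like TreeSet.ceiling), or None."""
        i = bisect.bisect_left(self._items, x)
        return self._items[i] if i < len(self._items) else None

    def __iter__(self):
        return iter(self._items)


s = SortedSet([5, 1, 3])
s.add(4)
s.add(3)  # duplicate, ignored
s.discard(1)
print(list(s), s.ceiling(2))
```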

Kevin Bacon BFS

In [1]: graph = bfs.MovieGraph()
In [2]: graph.path_to_actor("Robert Downey Jr.", "Kevin Bacon")
Out [2]:
['Robert Downey Jr.', 'Avengers: Infinity War (2018)', 'Josh Brolin', 'Hollow Man (2000)', 'Kevin Bacon']
#Python

An overpowered solution to the Kevin Bacon problem with a data set scraped from IMDB. How many movies are between any actor and Kevin Bacon?

There is one actor (Josh Brolin) between Robert Downey Jr. and Kevin Bacon, which gives Robert a Bacon number of 2.
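The search can be sketched as a BFS over a bipartite graph whose nodes are actors and movies, with an edge between each movie and each member of its cast. This is a hypothetical standalone version, not the `MovieGraph` class from the session above; it assumes no actor shares a name with a movie title, and the two-movie dataset only mirrors that session's output. The Bacon number is the number of movies on the shortest path.

```python
from collections import defaultdict, deque


def build_graph(movies):
    """Bipartite adjacency between actors and movies; `movies` maps title -> cast list."""
    graph = defaultdict(set)
    for title, cast in movies.items():
        for actor in cast:
            graph[title].add(actor)
            graph[actor].add(title)
    return graph


def path_to_actor(graph, source, target):
    """Breadth-first search returning the alternating actor/movie path, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def bacon_number(path):
    # Paths alternate actor, movie, actor, ...; each movie is one link.
    return (len(path) - 1) // 2


movies = {
    "Avengers: Infinity War (2018)": ["Robert Downey Jr.", "Josh Brolin"],
    "Hollow Man (2000)": ["Josh Brolin", "Kevin Bacon"],
}
graph = build_graph(movies)
path = path_to_actor(graph, "Robert Downey Jr.", "Kevin Bacon")
print(path, bacon_number(path))
```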

Binary search / AVL Tree

#Python #C++

Implementation and benchmark comparison of insertion, removal, and lookup functions on a binary search tree and an AVL tree.
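A compact sketch of AVL insertion and lookup (removal is omitted for brevity): after each recursive insert the node's height is recomputed, and if its two subtree heights differ by more than one, a single or double rotation restores balance. Sorted input degrades a plain binary search tree into a chain of height n, while an AVL tree's height stays below roughly 1.44 log2(n).

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def height(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(root, key):
    """Insert key into the AVL subtree rooted at `root`; return the new subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    else:
        return root  # ignore duplicates
    update(root)
    balance = height(root.left) - height(root.right)
    if balance > 1:  # left-heavy
        if key > root.left.key:  # left-right case: rotate the child first
            root.left = rotate_left(root.left)
        return rotate_right(root)
    if balance < -1:  # right-heavy
        if key < root.right.key:  # right-left case: rotate the child first
            root.right = rotate_right(root.right)
        return rotate_left(root)
    return root


def contains(root, key):
    while root:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


root = None
for k in range(1, 1001):  # sorted input: the worst case for an unbalanced BST
    root = insert(root, k)
print(height(root), contains(root, 500), contains(root, 1001))
```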

Web Development

This very website 😎

#HTML #CSS #Javascript #Bootstrap #MkDocs #Docker

This website has come a long way over the years. If you'd like to learn how to make your own, reach out to me! If you're willing to develop yourself and learn what you need, I'm willing to show you my tooling.

I want to thank my professional hero, Andrew Carr, for inspiring me to do projects like this and show them to the world. I've been riding his wave of pure-genius career ideas for years and it's high time I put his name on this website. He's blessed my life so much and deserves every bit of his massive success ❤.

Dropbot - Chrome automator

#Javascript #Chrome #Python

Writing bots and web scrapers is useful, but can be a pain. Tools such as the Selenium library rely on developers to cleverly locate the important HTML elements of a page.

The Dropbot Chrome extension makes this easy, and even lets you export custom bot scripts as JSON objects for use in any programming language.

Vision therapy Tetris

#Javascript

I have a lazy eye. I took a 3-day challenge to learn JavaScript and made this vision therapy game for kids with the same condition.

Wearing an eye patch is a good way to exercise a non-dominant eye, but it doesn't train our two eyes to work together. With red and blue 3D glasses, one eye can see only the red blocks while the other sees only the blue blocks, so the eyes must coordinate. This is a fun way for kids to do their vision therapy exercises and train their eyes to work together.

Research lab website proposal

#Angular

This is a skeleton of the IDeA Labs' (Information & Decision Algorithms Laboratories) website, written as a single-page web application in Angular 5.

It uses routing to associate different states with URLs to simulate having multiple pages.

Further Fun Projects

3-day Java challenge

#Java

I wrote this in 3 days with almost zero prior Java experience.