
TradeRev Is a Machine Learning Vehicle Appraisal / Auctioning System

In this episode of Running in Production, Amit Jain goes over building an auctioning system that uses machine / deep learning and is powered by Flask and Python. It’s all hosted on AWS and has been up and running since mid-2011.

Amit goes over a few machine learning libraries, refactoring a 100k+ line monolith into microservices without any automated tests, the importance of machine learning accuracy, using a bunch of AWS services to deploy a large site, treating your infrastructure as code and more.

Show Notes

  • 3:58 – Amit led a team of ~10 R&D engineers responsible for Data Science / ML
  • 4:33 – Roughly 1,000 cars a day are traded, with 8-10k auctions / bids per day
  • 5:15 – Motivation for using Flask and Python
  • 6:55 – Scikit-Learn and TensorFlow for machine / deep learning
  • 7:39 – Did things start off with multiple microservices or was it a monolith early on?
  • 9:41 – There are about 80,000 to 120,000 lines of code across 200-300+ Python files
  • 10:14 – The huge refactor to microservices was done without automated tests initially
  • 11:11 – After the refactor, there’s now 86% test coverage, which is enough to be confident
  • 12:24 – Flask-RESTPlus is the main library used to build their RESTful APIs
  • 12:43 – Other notable libraries were gunicorn and boto3 (AWS SDK for Python)
  • 13:05 – Locust is an open source load / performance testing tool
  • 13:40 – With machine learning, speed is important but accuracy is even more important
  • 15:30 – gunicorn is very compact, performant and easy to configure
  • 16:28 – Most caches were in memory, and they used Amazon DynamoDB
  • 17:09 – The primary database is MySQL running on Amazon RDS
  • 18:04 – SQLAlchemy is used on the Python side as an ORM
  • 19:29 – Docker is sort of being used in development
  • 21:02 – The platform runs on AWS with Lambda, API Gateway and AWS Fargate with ECS
  • 22:24 – What is AWS Fargate and what does it allow you to do?
  • 23:48 – Scaling with Fargate while using auto scaling policies and configuration
  • 26:28 – Taking advantage of the cloud and setting up load balancing with configuration
  • 28:04 – How do you deal with secrets when using Fargate / ECS?
  • 30:02 – What about logging and metrics? Are you exclusively using all of AWS’ services?
  • 31:12 – What about error reporting, such as getting notified when an error happens?
  • 31:34 – The deploy process from development to production (includes CI / CD with Jenkins)
  • 33:26 – A walkthrough of how the different AWS services come together
  • 36:54 – Terraform is being used to manage the infrastructure as code (valuable tool)
  • 40:04 – Database backups were performed by the DevOps team
  • 40:41 – Best tips? Start slow and expect failures, also don’t chase perfection
  • 42:14 – You can find Amit on Twitter at @ml_amit and on LinkedIn
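The episode mentions that gunicorn is compact and easy to configure. To illustrate what that configuration can look like, here is a minimal sketch of a `gunicorn.conf.py` file, which is plain Python where each setting is a module-level variable. Every value below is an illustrative assumption, not TradeRev’s actual setup. The worker count follows the common "(2 x CPU cores) + 1" starting point from the gunicorn docs.

```python
# gunicorn.conf.py: a minimal sketch of a gunicorn configuration file.
# All values are illustrative assumptions, not TradeRev's real settings.
import multiprocessing

# Address and port gunicorn listens on (e.g. behind a load balancer).
bind = "0.0.0.0:8000"

# Common starting point from the gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Synchronous worker class (gunicorn's default).
worker_class = "sync"

# Kill and restart a worker that stays silent longer than this many seconds,
# e.g. a request stuck on a slow model prediction.
timeout = 30

# Recycle each worker after roughly this many requests to limit memory growth;
# the jitter keeps all workers from restarting at the same moment.
max_requests = 1000
max_requests_jitter = 100

# "-" sends access and error logs to stdout/stderr, where a container
# log driver (such as CloudWatch Logs on ECS) can pick them up.
accesslog = "-"
errorlog = "-"
loglevel = "info"
```

You would then start the server with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical `module:variable` pointing at the Flask application object. Keeping the settings in a file rather than in CLI flags makes them easy to version alongside the rest of the code.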


May 11, 2020
