Marek Suppa

Serverless deployment of (large) NLP models

How do you fit a 400MB NLP model into a 250MB serverless function? Learn the model distillation and dependency tricks that make it possible.

#1 about 9 minutes

Exploring practical NLP applications at Slido

Slido uses several NLP-powered features to improve the user experience, including keyphrase extraction, sentiment analysis, and similar question detection.

#2 about 4 minutes

Choosing serverless for ML model deployment

Serverless was chosen for its ease of deployment and minimal maintenance, but it introduces challenges like cold starts and strict package size limits.

#3 about 8 minutes

Shrinking large BERT models for sentiment analysis

Knowledge distillation is used to train smaller, faster models like TinyBERT from a large, fine-tuned BERT base model without significant performance loss.
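
To make the idea concrete, here is a minimal sketch of the classic soft-target distillation loss, in which a student is trained to match the teacher's temperature-softened output distribution. It is an illustration under assumptions, not the training setup from the talk: the logits, the temperature value, and the function names are all hypothetical, and TinyBERT-style training typically adds further terms that are not shown here.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's soft labels to the student's.

    Scaling by temperature squared is the common convention that keeps
    gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(p_teacher, p_student)
        if t > 0
    )
    return kl * temperature ** 2


# Hypothetical logits for a three-class sentiment task.
teacher = [4.0, 1.0, -2.0]
close_student = [3.5, 1.2, -1.8]   # roughly agrees with the teacher
far_student = [-2.0, 1.0, 4.0]     # disagrees with the teacher

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

A student that agrees with the teacher gets a loss near zero, while one that disagrees gets a much larger loss, which is the signal that pulls the small model toward the large one during training.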

#4 about 8 minutes

Building an efficient similar question detection model

Sentence-BERT (SBERT) provides an efficient alternative to standard BERT for semantic similarity, and knowledge distillation helps create smaller, deployable versions.
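
The efficiency comes from encoding each sentence once into a fixed-size vector and then comparing vectors, rather than running every pair of sentences through the model together. The sketch below shows only the comparison step, using cosine similarity. The three-dimensional vectors, the example questions, and the 0.8 threshold are made up for illustration; real sentence embeddings have hundreds of dimensions and come from the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def find_similar(query_vec, stored, threshold=0.8):
    """Return (text, score) pairs at or above the threshold, best first."""
    scored = [
        (text, cosine_similarity(query_vec, vec))
        for text, vec in stored.items()
    ]
    hits = [(text, score) for text, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical precomputed embeddings of questions already asked.
stored_questions = {
    "When will the recording be available?": [0.9, 0.1, 0.0],
    "Is this session being recorded?": [0.6, 0.5, 0.1],
    "What is the Wi-Fi password?": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a newly submitted question.
new_question_vec = [0.85, 0.2, 0.05]

matches = find_similar(new_question_vec, stored_questions)
for text, score in matches:
    print(f"{score:.3f}  {text}")
```

Because the stored vectors can be computed ahead of time, a new question costs one model call plus cheap vector arithmetic, which suits a function with tight latency and memory budgets.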

#5 about 3 minutes

Using ONNX Runtime for lightweight model inference

The large PyTorch library is replaced with the much smaller ONNX Runtime to fit the model and its dependencies within AWS Lambda's package size limits.
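
Whether a swap like this is enough comes down to the unpacked size of everything shipped with the function. As a rough sketch, the helper below adds up the files in a build directory and compares the total with the 250MB figure quoted in the talk summary. The function names and the throwaway demo directory are hypothetical, and the check ignores anything the platform may count differently.

```python
import os
import tempfile

LAMBDA_UNZIPPED_LIMIT_MB = 250  # figure quoted in the talk summary


def directory_size_mb(path):
    """Total size of all regular files under path, in MiB."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # skip symlinks
                total_bytes += os.path.getsize(file_path)
    return total_bytes / (1024 * 1024)


def fits_in_package(path, limit_mb=LAMBDA_UNZIPPED_LIMIT_MB):
    """True if the directory's contents stay within the size limit."""
    return directory_size_mb(path) <= limit_mb


# Demo on a throwaway directory holding one 2 MiB placeholder file.
with tempfile.TemporaryDirectory() as build_dir:
    placeholder = os.path.join(build_dir, "model.bin")
    with open(placeholder, "wb") as handle:
        handle.write(b"\0" * (2 * 1024 * 1024))
    print(f"{directory_size_mb(build_dir):.1f} MiB")
    print(fits_in_package(build_dir))
```

Running a check like this against the real build directory, model file included, shows how much of the budget each dependency consumes before anything is uploaded.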

#6 about 3 minutes

Analyzing serverless ML performance and cost-effectiveness

Increasing the RAM allocated to a Lambda function also speeds up inference, since AWS allocates CPU in proportion to memory; for uneven workloads this can make serverless more cost-effective than a dedicated server.
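
The trade-off can be put into numbers with a back-of-the-envelope model: Lambda bills for requests plus compute time multiplied by configured memory, while a dedicated server costs the same however busy it is. The sketch below is only that, a sketch. Every price, the server cost, and the latencies are placeholder values chosen for easy arithmetic, not real AWS pricing or measurements from the talk.

```python
def lambda_monthly_cost(requests, seconds_per_request, memory_gb,
                        price_per_gb_second, price_per_request):
    """Pay-per-use cost: compute time times memory, plus a per-request fee."""
    compute = requests * seconds_per_request * memory_gb * price_per_gb_second
    return compute + requests * price_per_request


def break_even_requests(server_monthly_cost, seconds_per_request, memory_gb,
                        price_per_gb_second, price_per_request):
    """Monthly request count at which Lambda costs as much as the server."""
    cost_per_request = (
        seconds_per_request * memory_gb * price_per_gb_second
        + price_per_request
    )
    return server_monthly_cost / cost_per_request


# Placeholder prices, NOT actual AWS rates.
PRICE_PER_GB_SECOND = 0.00002
PRICE_PER_REQUEST = 0.0000002
SERVER_MONTHLY_COST = 50.0  # hypothetical always-on instance

# Hypothetical scenario: doubling memory roughly halves inference time.
small = lambda_monthly_cost(100_000, 1.0, 1.0,
                            PRICE_PER_GB_SECOND, PRICE_PER_REQUEST)
large = lambda_monthly_cost(100_000, 0.5, 2.0,
                            PRICE_PER_GB_SECOND, PRICE_PER_REQUEST)
threshold = break_even_requests(SERVER_MONTHLY_COST, 0.5, 2.0,
                                PRICE_PER_GB_SECOND, PRICE_PER_REQUEST)

print(f"1 GB, 1.0 s per request: {small:.2f}")
print(f"2 GB, 0.5 s per request: {large:.2f}")
print(f"break-even requests per month: {threshold:,.0f}")
```

Under these assumptions the larger configuration costs the same while answering twice as fast, and serverless stays cheaper than the server as long as monthly traffic remains below the break-even count, which is the situation an uneven workload tends to produce.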

#7 about 3 minutes

Key takeaways for deploying NLP models serverlessly

Successful serverless deployment of large NLP models requires aggressive model size reduction, lightweight inference libraries, and an understanding of the platform's limitations.
