Ram Vegiraju

585 Followers


Pinned

About Me — Ram Vegiraju

My Top Medium Stories — Subscribe Here & Join My Newsletter — Hey everyone, thank you for taking the time to visit my page! I just wanted to give a quick introduction and some background about myself in this article, as well as share some of my top-performing articles. I recently graduated from the University of Virginia in 2021 and moved out…

Personal

2 min read


Published in AWS in Plain English · 8 hours ago

Integrating LangChain with SageMaker JumpStart to Operationalize LLM Applications

Building LLM-Driven Workflows — Large Language Models (LLMs) continue to take the world by storm. Hosting these models is a challenging task, as we've explored in my previous articles. The next challenge is operationalizing these hosted LLMs within larger real-world applications. To solve these two problems, we have a pair of respective tools that…

AWS

7 min read
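A minimal sketch of the pattern this article describes: LangChain talks to a SageMaker-hosted LLM through a content handler that serializes the prompt into the JSON the endpoint expects and parses the generated text back out. The payload schema below (`"inputs"` / `"generated_text"`) is an assumption for illustration — it varies by the JumpStart model you deploy.

```python
import json

# LangChain's SagemakerEndpoint LLM delegates request/response shaping to a
# content handler. These two functions show the shaping logic in plain Python;
# the JSON schema here is a hypothetical example, not any specific model's.

def transform_input(prompt: str, model_kwargs: dict) -> bytes:
    """Serialize a prompt into the JSON body the endpoint expects."""
    return json.dumps({"inputs": prompt, "parameters": model_kwargs}).encode("utf-8")

def transform_output(response_body: bytes) -> str:
    """Pull the generated text out of the endpoint's JSON response."""
    payload = json.loads(response_body.decode("utf-8"))
    return payload[0]["generated_text"]

# Round-trip the shaping logic locally -- no endpoint required.
body = transform_input("What is SageMaker?", {"max_new_tokens": 64})
fake_response = json.dumps([{"generated_text": "A managed ML platform."}]).encode("utf-8")
print(transform_output(fake_response))  # -> A managed ML platform.
```

With LangChain installed, logic like this would live on a content-handler object passed to the SageMaker endpoint wrapper, so the rest of the chain never sees the endpoint's wire format.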


Published in Towards Data Science · Sep 22

Host Hundreds of NLP Models Utilizing SageMaker Multi-Model Endpoints Backed By GPU Instances

Integrate Triton Inference Server With Amazon SageMaker — In the past we've explored SageMaker Multi-Model Endpoints (MME) as a cost-effective option for hosting multiple models behind a single endpoint. While hosting smaller models on MME is possible with CPU-based instances, as these models grow larger and more complex, GPU compute may sometimes be necessary. …

AWS

7 min read
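The key mechanic behind a Multi-Model Endpoint is that one endpoint serves many model artifacts, and the caller selects which artifact to load via the `TargetModel` parameter of `invoke_endpoint`. A sketch of how that request is assembled (the endpoint and artifact names are hypothetical):

```python
import json

def build_invoke_args(endpoint_name: str, target_model: str, payload: dict) -> dict:
    """Build the kwargs for the sagemaker-runtime invoke_endpoint call.

    With an MME, TargetModel names a .tar.gz artifact under the endpoint's
    S3 model prefix; SageMaker lazily loads it onto the instance on first use.
    """
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,  # e.g. one of hundreds of NLP models
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }

args = build_invoke_args("nlp-mme-gpu", "sentiment-en.tar.gz", {"inputs": "great movie"})
# In a real call: boto3.client("sagemaker-runtime").invoke_endpoint(**args)
```

Only the request construction is shown here; the actual `boto3` call is left commented out since it requires a live endpoint.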


Published in Towards Data Science · Sep 14

Deploying PyTorch Models with Nvidia Triton Inference Server

A flexible, high-performance model serving solution — Machine Learning's (ML) value is truly realized in real-world applications when we arrive at model hosting and inference. It's hard to productionize ML workloads without a highly performant model-serving solution that helps your model scale up and down. What is a model server, and what is model serving…

Pytorch

7 min read
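For context on what serving a PyTorch model with Triton involves: Triton reads the model from a model repository alongside a `config.pbtxt` describing its inputs and outputs. A minimal sketch for a TorchScript image classifier (model name, batch size, and shapes are illustrative, not from the article):

```
name: "resnet_pt"
platform: "pytorch_libtorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3, 224, 224]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [1000]
  }
]
```

Triton's PyTorch backend expects the positional `INPUT__0`/`OUTPUT__0` naming for TorchScript models, and the serialized model itself sits next to this file as `model_repository/resnet_pt/1/model.pt`.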


Published in AWS in Plain English · Aug 24

Four Different Ways to Host Large Language Models on Amazon SageMaker

Pick the option that makes the most sense for your use case — Amazon SageMaker is a platform for advanced Machine Learning model hosting. As Generative AI continues to expand at a rapid rate, so do the challenges of hosting these large models. Many factors make hosting these models particularly difficult, ranging from model size to being able to…

AWS

9 min read


Published in Towards Data Science · Jul 14

Deploying Large Language Models With HuggingFace TGI

Another way to efficiently host and scale your LLMs with Amazon SageMaker — Large Language Models (LLMs) continue to soar in popularity, with a new one released nearly every week. As the number of these models increases, so do the options for hosting them. In my previous article we explored how we could utilize DJL Serving within Amazon SageMaker…

AWS

5 min read


Published in Towards Data Science · Jun 16

Debugging SageMaker Endpoints With Docker

An Alternative To SageMaker Local Mode — A pain point when getting started with SageMaker Real-Time Inference is that it can be hard to debug. When creating an endpoint, there are a number of ingredients you need to make sure are baked properly for a successful deployment. Proper file structuring of model artifacts depending on the Model…

AWS

6 min read


Published in Towards Data Science · Jun 7

Deploying LLMs On Amazon SageMaker With DJL Serving

Deploy BART on Amazon SageMaker Real-Time Inference — Large Language Models (LLMs) and Generative AI continue to take over the Machine Learning and general tech space in 2023. With the LLM expansion has come an influx of new models that continue to improve at a stunning rate. While the accuracy and performance of these models are incredible, they…

AWS

8 min read


Published in Towards Data Science · May 18

Deploying Cohere Language Models On Amazon SageMaker

Scale and Host LLMs on AWS — Large Language Models (LLMs) and Generative AI are accelerating Machine Learning growth across various industries. With LLMs, the scope of Machine Learning has reached incredible heights, but this has also been accompanied by a new set of challenges. The size of LLMs leads to difficult problems in both the Training…

AWS

7 min read


Published in AWS in Plain English · May 10

Deploy An MLOps Pipeline With Training, Model Registry, and Batch Inference

Harness SageMaker Pipelines With Batch Inference — MLOps continues to be a popular topic in the Machine Learning world, and for good reason. As ML continues to mature, it has become crucial that we apply Software Engineering best practices to build reusable, end-to-end workflows tailored for Data Science. A tool we explored in my previous article…

Machine Learning

7 min read



Passionate about AWS & ML

Following
  • Tim Denning
  • ILLUMINATION
  • The PyCoach
  • Dr Mehmet Yildiz
  • Bex T.

See all (81)
