Ram Vegiraju in AWS in Plain English: Deploying Transformers ONNX Models on Amazon SageMaker (Achieve High-Scale Performance Utilizing Triton Inference Server With SageMaker Real-Time Inference), Mar 13
Ram Vegiraju in Towards Data Science: Optimized Deployment of Mistral 7B on Amazon SageMaker Real-Time Inference (Utilize Large Model Inference Containers Powered by DJL Serving & Nvidia TensorRT), Feb 21
Ram Vegiraju in Towards Data Science: Building a Multi-Purpose GenAI-Powered Chatbot (Utilize SageMaker Inference Components to Work With Multiple LLMs Efficiently), Feb 7
Ram Vegiraju in Towards Data Science: Deploying Large Language Models With SageMaker Asynchronous Inference (Queue Requests for Near Real-Time Applications), Jan 27
Ram Vegiraju in Towards Data Science: Building an LLMOps Pipeline (Utilize SageMaker Pipelines, JumpStart, and Clarify to Fine-Tune and Evaluate a Llama 7B Model), Jan 18
Ram Vegiraju in Towards AWS: MLOps With SageMaker Pipelines Step Decorator (An End-to-End Example of Feature Engineering, Training, and Inference Simplified With New SageMaker Pipelines Features), Jan 11
Ram Vegiraju in Towards Data Science: Hosting Multiple LLMs on a Single Endpoint (Utilize SageMaker Inference Components to Host Flan & Falcon in a Cost- and Performance-Efficient Manner), Jan 11
Ram Vegiraju in AWS in Plain English: re:Invent 2023 AI/ML Launches (My Personal Overview of Some of the Key Launches This Year), Dec 4, 2023
Ram Vegiraju in AWS in Plain English: Integrating LangChain With SageMaker JumpStart to Operationalize LLM Applications (Building LLM-Driven Workflows), Oct 2, 2023
Ram Vegiraju in Towards Data Science: Host Hundreds of NLP Models Utilizing SageMaker Multi-Model Endpoints Backed by GPU Instances (Integrate Triton Inference Server With Amazon SageMaker), Sep 22, 2023
Ram Vegiraju in AWS in Plain English: Four Different Ways to Host Large Language Models on Amazon SageMaker (Pick the Option That Makes the Most Sense for Your Use Case), Aug 24, 2023
Ram Vegiraju in Towards Data Science: Deploying Large Language Models With HuggingFace TGI (Another Way to Efficiently Host and Scale Your LLMs With Amazon SageMaker), Jul 14, 2023
Ram Vegiraju in Towards Data Science: Debugging SageMaker Endpoints With Docker (An Alternative to SageMaker Local Mode), Jun 16, 2023
Ram Vegiraju in Towards Data Science: Deploying LLMs on Amazon SageMaker With DJL Serving (Deploy BART on Amazon SageMaker Real-Time Inference), Jun 7, 2023
Ram Vegiraju in Towards Data Science: Deploying Cohere Language Models on Amazon SageMaker (Scale and Host LLMs on AWS), May 18, 2023
Ram Vegiraju in AWS in Plain English: Deploy an MLOps Pipeline With Training, Model Registry, and Batch Inference (Harness SageMaker Pipelines With Batch Inference), May 10, 2023
Ram Vegiraju in Towards Data Science: Deploying Multiple Models With SageMaker Pipelines (Applying MLOps Best Practices to Advanced Serving Options), Mar 23, 2023
Ram Vegiraju in Towards Data Science: Deploying SageMaker Endpoints With Terraform (Infrastructure as Code With Terraform), Mar 14, 2023
Ram Vegiraju in Towards Data Science: Load Testing Simplified With SageMaker Inference Recommender (Test TensorFlow ResNet50 on SageMaker Real-Time Endpoints), Mar 7, 2023
Ram Vegiraju in Towards Data Science: Load Testing SageMaker Multi-Model Endpoints (Utilize Locust to Distribute Traffic Weight Across Models), Feb 24, 2023