In AWS in Plain English by Ram Vegiraju, "Deploying Transformers ONNX Models on Amazon SageMaker: Achieve High Scale Performance Utilizing Triton Inference Server With SageMaker Real-Time Inference" (Mar 13, 2024).
In Towards Data Science by Ram Vegiraju, "Optimized Deployment of Mistral 7B on Amazon SageMaker Real-Time Inference: Utilize large model inference containers powered by DJL Serving & Nvidia TensorRT" (Feb 21, 2024).
In Towards Data Science by Ram Vegiraju, "Building a Multi-Purpose GenAI-Powered Chatbot: Utilize SageMaker Inference Components to Work with Multiple LLMs Efficiently" (Feb 7, 2024).
In Towards Data Science by Ram Vegiraju, "Deploying Large Language Models with SageMaker Asynchronous Inference: Queue Requests for Near-Real-Time Applications" (Jan 27, 2024).
In Towards Data Science by Ram Vegiraju, "Building an LLMOps Pipeline: Utilize SageMaker Pipelines, JumpStart, and Clarify to Fine-Tune and Evaluate a Llama 7B Model" (Jan 18, 2024).
In Towards AWS by Ram Vegiraju, "MLOps With SageMaker Pipelines Step Decorator: An End-to-End Example of Feature Engineering, Training, and Inference Simplified with New SageMaker Pipelines Features" (Jan 11, 2024).
In Towards Data Science by Ram Vegiraju, "Hosting Multiple LLMs on a Single Endpoint: Utilize SageMaker Inference Components to Host Flan & Falcon in a Cost & Performance Efficient Manner" (Jan 11, 2024).
In AWS in Plain English by Ram Vegiraju, "re:Invent 2023 AI/ML Launches: My personal overview of some of the key launches this year" (Dec 4, 2023).
In AWS in Plain English by Ram Vegiraju, "Integrating LangChain with SageMaker JumpStart to Operationalize LLM Applications: Building LLM-Driven Workflows" (Oct 2, 2023).
In Towards Data Science by Ram Vegiraju, "Host Hundreds of NLP Models Utilizing SageMaker Multi-Model Endpoints Backed By GPU Instances: Integrate Triton Inference Server With Amazon SageMaker" (Sep 22, 2023).
In AWS in Plain English by Ram Vegiraju, "Four Different Ways to Host Large Language Models on Amazon SageMaker: Pick the option that makes the most sense for your use case" (Aug 24, 2023).
In Towards Data Science by Ram Vegiraju, "Deploying Large Language Models With HuggingFace TGI: Another way to efficiently host and scale your LLMs with Amazon SageMaker" (Jul 14, 2023).
In Towards Data Science by Ram Vegiraju, "Debugging SageMaker Endpoints With Docker: An Alternative To SageMaker Local Mode" (Jun 16, 2023).
In Towards Data Science by Ram Vegiraju, "Deploying LLMs On Amazon SageMaker With DJL Serving: Deploy BART on Amazon SageMaker Real-Time Inference" (Jun 7, 2023).
In Towards Data Science by Ram Vegiraju, "Deploying Cohere Language Models On Amazon SageMaker: Scale and Host LLMs on AWS" (May 18, 2023).
In AWS in Plain English by Ram Vegiraju, "Deploy An MLOps Pipeline With Training, Model Registry, and Batch Inference: Harness SageMaker Pipelines With Batch Inference" (May 10, 2023).
In Towards Data Science by Ram Vegiraju, "Deploying Multiple Models with SageMaker Pipelines: Applying MLOps best practices to advanced serving options" (Mar 23, 2023).
In Towards Data Science by Ram Vegiraju, "Deploying SageMaker Endpoints With Terraform: Infrastructure as Code With Terraform" (Mar 14, 2023).
In Towards Data Science by Ram Vegiraju, "Load Testing Simplified With SageMaker Inference Recommender: Test TensorFlow ResNet50 on SageMaker Real-Time Endpoints" (Mar 7, 2023).
In Towards Data Science by Ram Vegiraju, "Load Testing SageMaker Multi-Model Endpoints: Utilize Locust to Distribute Traffic Weight Across Models" (Feb 24, 2023).