CUSTOMER STORIES
FriendliAI can serve a wide range of generative AI use cases
South Korea's leading telecom provider, SK Telecom, operates its LLMs reliably and cost-efficiently, without self-management, by using Friendli Dedicated Endpoints.
Upstage’s Solar LLM efficiently handles ~100k tokens per day of translation traffic via Friendli Dedicated Endpoints, leading to a significant reduction in GPU costs.
NextDay AI’s personalized character chatbots process ~0.5 trillion tokens per month, incurring high H100 GPU costs. By using Friendli Container, they instantly cut their GPU costs by more than 50%.
PARTNERSHIPS
Easily use Friendli Suite with our partner technology
HOW TO USE
Three ways to run generative AI models with Friendli Engine:
01
Friendli Container
Serve generative AI models with Friendli Engine in your GPU environment
Learn more
02
Friendli Dedicated Endpoints
Run generative AI models on managed infrastructure without self-management
Learn more
03
Friendli Serverless Endpoints
Call fast and affordable API for open-source generative AI models
Learn more
Read more from our blogs
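As a rough illustration of the Serverless Endpoints option, the sketch below shows what an API call could look like. This is a hedged example assuming an OpenAI-style chat completions interface; the base URL, route, model name, request schema, and token environment variable are all illustrative assumptions, not confirmed details. Consult the official FriendliAI documentation for the actual API.

```python
# Hypothetical sketch of calling a Friendli Serverless Endpoint over HTTP.
# The base URL, model name, schema, and token env var are illustrative
# assumptions -- check the official Friendli docs for the real values.
import json
import os
import urllib.request

API_BASE = "https://api.friendli.ai/serverless/v1"  # assumed endpoint


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat-completions payload (assumed schema)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def call_endpoint(payload: dict) -> dict:
    """POST the payload to the (assumed) chat completions route."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('FRIENDLI_TOKEN', '')}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_chat_request(
        "mixtral-8x7b-instruct",  # assumed model identifier
        "Summarize Friendli Serverless Endpoints in one sentence.",
    )
    print(json.dumps(payload, indent=2))
```

Because the interface is OpenAI-compatible in this sketch, an existing OpenAI client library could also be pointed at the endpoint by overriding its base URL.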
Are you ready to build and deploy
your generative AI application effortlessly?
Get started free
1. Performance compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150. Evaluation conducted by FriendliAI.
2. Prices are based on running Friendli Container and vLLM on a CoreWeave A100 80GB GPU.
3. The price of the competing service is $2.21 per hour.
4. Performance of Friendli Container compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150, mean request per second = 0.5. Evaluation conducted by FriendliAI.
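For context, the benchmark settings in footnote 4 imply a steady traffic level that can be checked with quick arithmetic, using only the numbers stated there:

```python
# Back-of-the-envelope check of the traffic implied by footnote 4's settings:
# mean 500 input + 150 output tokens per request, at 0.5 requests per second.
MEAN_INPUT_TOKENS = 500
MEAN_OUTPUT_TOKENS = 150
REQUESTS_PER_SECOND = 0.5
SECONDS_PER_DAY = 86_400

tokens_per_request = MEAN_INPUT_TOKENS + MEAN_OUTPUT_TOKENS   # 650
tokens_per_day = tokens_per_request * REQUESTS_PER_SECOND * SECONDS_PER_DAY
print(f"{tokens_per_day:,.0f} tokens/day")  # 28,080,000 tokens/day
```

That works out to roughly 28 million tokens per day at the benchmarked load, which gives a sense of scale when comparing against the customer volumes cited above.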