
Supercharge
building and serving
generative AI

GROUNDBREAKING PERFORMANCE

8.9× Cheaper with Friendli Container [1][2]

11.0× Cheaper with Friendli Dedicated Endpoints [1][3]

10.7× Higher Throughput [1]

6.2× Lower Latency [4]
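As a sanity check on the headline pricing, footnote 3 below prices the competitive service at $2.21 per hour; combined with the "11.0× cheaper" figure for Friendli Dedicated Endpoints, that implies an effective cost of roughly $0.20 per hour:

```python
# Implied effective hourly cost, derived from footnote 3 and the
# "11.0x Cheaper" headline figure for Friendli Dedicated Endpoints.
competitor_price_per_hour = 2.21  # footnote 3: competitive service price
cost_ratio = 11.0                 # headline: "11.0x Cheaper"

implied_price = competitor_price_per_hour / cost_ratio
print(f"~${implied_price:.2f}/hour")  # prints "~$0.20/hour"
```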

CUSTOMER STORIES

FriendliAI can serve a wide range of generative AI use cases

View all use cases
SKT

South Korea’s leading telecom provider, SK Telecom, operates its LLMs reliably and cost-efficiently, without self-management, using Friendli Dedicated Endpoints.

Upstage

Upstage’s Solar LLM efficiently handles ~100k tokens per day of translation traffic via Friendli Dedicated Endpoints, leading to a significant reduction in GPU costs.

NextdayAI

NextDay AI’s personalized character chatbots process ~0.5 trillion tokens per month, incurring high H100 GPU costs. By using Friendli Container, they instantly cut their GPU costs by more than 50%.

  • Nacloud
  • Animus
  • Upstage
  • Tunib
  • ScatterLab
  • Enuma
  • SKT
  • NextdayAI

PARTNERSHIPS

Easily use Friendli Suite with our partner technology

AWS
Nvidia
Azure
MongoDB

INTEGRATIONS

Seamlessly build and deploy LLM agents with our integrations

Get started free

Hugging Face

TUTORIAL

Weights & Biases

TUTORIAL

LangChain

DOCS

LiteLLM

DOCS

HOW TO USE

Three ways to run generative AI models with Friendli Engine:

01

Friendli Container

Serve generative AI models with Friendli Engine in your GPU environment

Learn more

02

Friendli Dedicated Endpoints

Build and run generative AI models on autopilot

Learn more

03

Friendli Serverless Endpoints

Call fast and affordable APIs for open-source generative AI models

Learn more
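Calling a Serverless Endpoint can look like the following sketch, which assumes an OpenAI-compatible chat-completions interface; the base URL and model id are placeholders to verify against the Friendli documentation before use.

```python
# Hypothetical sketch of calling a Friendli Serverless Endpoint via an
# OpenAI-compatible chat-completions API. BASE_URL and MODEL are
# assumptions -- check the Friendli docs for the current values.
import json
import os
import urllib.request

BASE_URL = "https://api.friendli.ai/serverless/v1"  # assumed, verify in docs
MODEL = "meta-llama-3.1-8b-instruct"                # placeholder model id


def build_chat_request(prompt: str, token: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request without sending it."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    token = os.environ.get("FRIENDLI_TOKEN", "")
    req = build_chat_request("Hello!", token)
    # Send with urllib.request.urlopen(req) once the token and URL are set.
    print(req.full_url)
```

Because the request is built separately from being sent, the payload can be inspected or logged before any network call is made.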

Are you ready to build and deploy your generative AI application effortlessly?

Get started free

1. Performance compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150. Evaluation conducted by FriendliAI.
2. Prices are based on running Friendli Container and vLLM on a CoreWeave A100 80GB GPU.
3. The price of the competitive service is $2.21 per hour.
4. Performance of Friendli Container compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150, mean requests per second = 0.5. Evaluation conducted by FriendliAI.