
Supercharge
building and serving
generative AI

GROUNDBREAKING PERFORMANCE

8.9× Cheaper with Friendli Container [1][2]

11.0× Cheaper with Friendli Dedicated Endpoints [1][3]

10.7× Higher Throughput [1]

6.2× Lower Latency [4]
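As a sanity check on the headline pricing, footnote 3 below prices the competitive service at $2.21 per hour; combined with the "11.0× cheaper" figure for Friendli Dedicated Endpoints, that implies an effective cost of roughly $0.20 per hour:

```python
# Implied effective hourly cost, derived from footnote 3 and the
# "11.0x Cheaper" headline figure for Friendli Dedicated Endpoints.
competitor_price_per_hour = 2.21  # footnote 3: competitive service price
cost_ratio = 11.0                 # headline: "11.0x Cheaper"

implied_price = competitor_price_per_hour / cost_ratio
print(f"~${implied_price:.2f}/hour")  # prints "~$0.20/hour"
```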

CUSTOMER STORIES

FriendliAI can serve a wide range of generative AI use cases

View all use cases
SKT

South Korea’s leading telecom provider, SK Telecom, operates its LLMs reliably and cost-efficiently, without self-management, using Friendli Dedicated Endpoints.

Upstage

Upstage’s Solar LLM efficiently handles ~100k tokens per day of translation traffic via Friendli Dedicated Endpoints, leading to a significant reduction in GPU costs.

NextdayAI

NextDay AI’s personalized character chatbots process ~0.5 trillion tokens per month, incurring high H100 GPU costs. By using Friendli Container, they instantly cut their GPU costs by more than 50%.

  • Nacloud
  • Animus
  • Upstage
  • Tunib
  • ScatterLab
  • Enuma
  • SKT
  • NextdayAI

PARTNERSHIPS

Easily use Friendli Suite with our partner technology

AWS
Nvidia
Azure
MongoDB

INTEGRATIONS

Seamlessly build and deploy LLM agents with our integrations

Get started free

Hugging Face

TUTORIAL

Weights & Biases

TUTORIAL

LangChain

DOCS

LiteLLM

DOCS

HOW TO USE

Three ways to run generative AI models with Friendli Engine:

01

Friendli Container

Serve generative AI models with Friendli Engine in your GPU environment

Learn more

02

Friendli Dedicated Endpoints

Build and run generative AI models on autopilot

Learn more

03

Friendli Serverless Endpoints

Call fast and affordable APIs for open-source generative AI models

Learn more
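Calling a Serverless Endpoint can look like the following sketch, which assumes an OpenAI-compatible chat-completions interface; the base URL and model id are placeholders to verify against the Friendli documentation before use.

```python
# Hypothetical sketch of calling a Friendli Serverless Endpoint via an
# OpenAI-compatible chat-completions API. BASE_URL and MODEL are
# assumptions -- check the Friendli docs for the current values.
import json
import os
import urllib.request

BASE_URL = "https://api.friendli.ai/serverless/v1"  # assumed, verify in docs
MODEL = "meta-llama-3.1-8b-instruct"                # placeholder model id


def build_chat_request(prompt: str, token: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request without sending it."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    token = os.environ.get("FRIENDLI_TOKEN", "")
    req = build_chat_request("Hello!", token)
    # Send with urllib.request.urlopen(req) once the token and URL are set.
    print(req.full_url)
```

Because the request is built separately from being sent, the payload can be inspected or logged before any network call is made.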

Are you ready to build and deploy your generative AI application effortlessly?

Get started free

1. Performance compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150. Evaluation conducted by FriendliAI.
2. Prices are based on running Friendli Container and vLLM on a CoreWeave A100 80GB GPU.
3. The price of the competitive service is $2.21 per hour.
4. Performance of Friendli Container compared to vLLM on a single NVIDIA A100 80GB GPU running AWQ-ed Mixtral 8x7B from Mistral AI with the following settings: mean input token length = 500, mean output token length = 150, mean requests per second = 0.5. Evaluation conducted by FriendliAI.