Browse models supported by Friendli Engine
Organization | Model name | Model description
---|---|---
Mistral AI | Mixtral-8x22B-v0.1 | A sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B. |
Mistral AI | Mixtral-8x22B-Instruct-v0.1 | The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1. |
Mistral AI | Mixtral-8x7B-v0.1 | The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. |
Mistral AI | Mixtral-8x7B-Instruct-v0.1 | The Mixtral-8x7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x7B-v0.1 Sparse Mixture of Experts model. |
Mistral AI | Mixtral-8x7B-v0.1-fp8 | The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. |
Hugging Face H4 | zephyr-orpo-141b-A35b-v0.1 | Zephyr is a series of language models that are trained to act as helpful assistants, and is a fine-tuned version of Mixtral-8x22B-v0.1. A Mixture of Experts (MoE) model with 141B total parameters and 39B active parameters. |
Mistral AI | Mistral-7B-Instruct-v0.2 LoRA, FP8 | The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2. |
Mistral AI | Mistral-7B-Instruct-v0.1 LoRA | The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model using a variety of publicly available conversation datasets. |
Mistral AI | Mistral-7B-v0.1 LoRA | A 7B transformer model, fast-deployed and easily customisable. Small, yet very powerful for a variety of use cases. |
Mistral AI | Mistral-7B-Instruct-v0.3 LoRA | The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3, which extends the Mistral-7B-v0.2 vocabulary to 32768 tokens. |
Mistral AI | Mistral-Nemo-Instruct-2407 LoRA | The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size. |
Mistral AI | Mistral-Nemo-Base-2407 LoRA | The Mistral-Nemo-Base-2407 Large Language Model (LLM) is a pretrained generative text model of 12B parameters trained jointly by Mistral AI and NVIDIA. It significantly outperforms existing models smaller or similar in size. |
Mistral AI | Mistral-Large-Instruct-2407 LoRA | Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities. |
Mistral AI | Mathstral-7B-v0.1 LoRA | Mathstral 7B is a model specializing in mathematical and scientific tasks, based on Mistral 7B. |
Cohere For AI | c4ai-command-r-plus | C4AI Command R+ is an open-weights research release of a 104 billion parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. |
Cohere For AI | c4ai-command-r-v01 | C4AI Command-R is a research release of a 35 billion parameter highly performant generative model. Command-R is a large language model with open weights optimized for a variety of use cases including reasoning, summarization, and question answering. |
Google | codegemma-7b-it LoRA | CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models and are available as a 7 billion pretrained variant that specializes in code completion and code generation tasks, a 7 billion parameter instruction-tuned variant for code chat and instruction following and a 2 billion parameter pretrained variant for fast code completion. |
Google | codegemma-7b LoRA | CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models and are available as a 7 billion pretrained variant that specializes in code completion and code generation tasks, a 7 billion parameter instruction-tuned variant for code chat and instruction following and a 2 billion parameter pretrained variant for fast code completion. |
Google | gemma-7b-it LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Google | gemma-7b LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Google | gemma-2-9b LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Google | gemma-2-27b LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Google | gemma-2-9b-it LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Google | gemma-2-27b-it LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Databricks | dbrx-instruct | DBRX Instruct is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. DBRX Instruct specializes in few-turn interactions. |
Databricks | dbrx-base | DBRX Base is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. |
Meta | Llama-2-7b-hf LoRA | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | Llama-2-7b-chat-hf LoRA, FP8 | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | Llama-2-13b-hf LoRA | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | Llama-2-13b-chat-hf LoRA, FP8 | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | Llama-2-70b-hf LoRA | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | Llama-2-70b-chat-hf LoRA, FP8 | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | CodeLlama-7b-hf LoRA | Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding. |
Meta | CodeLlama-13b-hf LoRA | Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding. |
Meta | CodeLlama-34b-hf LoRA | Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding. |
Meta | CodeLlama-70b-hf LoRA | Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding. |
Meta | Meta-Llama-3-8B LoRA, FP8 | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
Meta | Meta-Llama-3-8B-Instruct LoRA, FP8 | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
Meta | Meta-Llama-3-70B LoRA, FP8 | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
Meta | Meta-Llama-3-70B-Instruct LoRA, FP8 | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-8B LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-8B-Instruct LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-70B LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-70B-Instruct LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-405B LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-405B-Instruct LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
OpenLM Research | open_llama_13b LoRA | OpenLLaMA is an open reproduction of LLaMA. |
LMSYS | vicuna-13b-v1.3 LoRA | Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. |
01.AI | Yi-6B LoRA | The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. Targeted as bilingual language models and trained on a 3T multilingual corpus, the Yi series models have become some of the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. |
01.AI | Yi-34B LoRA | The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. Targeted as bilingual language models and trained on a 3T multilingual corpus, the Yi series models have become some of the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. |
WizardLM Team | WizardLM-13B-V1.0 LoRA | WizardLM is a large language model developed by fine-tuning the LLaMA model using a novel method called Evol-Instruct. This method generates large amounts of complex instruction data automatically, reducing the need for manual data creation. Starting with an initial set of instructions, Evol-Instruct iteratively rewrites them to increase their complexity. The resulting instruction data is then used to fine-tune the LLaMA model, creating WizardLM. |
WizardLM Team | WizardLM-13B-V1.2 LoRA | WizardLM is a large language model developed by fine-tuning the LLaMA model using a novel method called Evol-Instruct. This method generates large amounts of complex instruction data automatically, reducing the need for manual data creation. Starting with an initial set of instructions, Evol-Instruct iteratively rewrites them to increase their complexity. The resulting instruction data is then used to fine-tune the LLaMA model, creating WizardLM. |
WizardLM Team | WizardLM-70B-V1.0 LoRA | WizardLM is a large language model developed by fine-tuning the LLaMA model using a novel method called Evol-Instruct. This method generates large amounts of complex instruction data automatically, reducing the need for manual data creation. Starting with an initial set of instructions, Evol-Instruct iteratively rewrites them to increase their complexity. The resulting instruction data is then used to fine-tune the LLaMA model, creating WizardLM. |
WizardLM Team | WizardMath-7B-V1.0 LoRA | WizardMath enhances the mathematical reasoning abilities of Llama-2 by applying the WizardLM Team's Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. |
WizardLM Team | WizardMath-70B-V1.0 LoRA | WizardMath enhances the mathematical reasoning abilities of Llama-2 by applying the WizardLM Team's Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. |
WizardLM Team | WizardMath-7B-V1.1 LoRA | WizardMath enhances the mathematical reasoning abilities of Llama-2 by applying the WizardLM Team's Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. |
WizardLM Team | WizardCoder-15B-V1.0 LoRA | WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. |
WizardLM Team | WizardCoder-33B-V1.1 LoRA | WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. |
WizardLM Team | WizardCoder-Python-13B-V1.0 LoRA | WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. |
WizardLM Team | WizardCoder-Python-34B-V1.0 LoRA | WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. |
BigCode | starcoder2-15b | StarCoder2-15B is a 15B parameter model trained on 600+ programming languages from The Stack v2, with opt-out requests excluded. |
BigCode | starcoder2-7b | StarCoder2-7B is a 7B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. |
BigCode | starcoder2-3b | StarCoder2-3B is a 3B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. |
BigCode | starcoder | The StarCoder models are 15.5B parameter models trained on 80+ programming languages from The Stack (v1.2), with opt-out requests excluded. The model uses Multi Query Attention, a context window of 8192 tokens, and was trained using the Fill-in-the-Middle objective on 1 trillion tokens. |
BigCode | gpt_bigcode-santacoder | This is the same model as SantaCoder but it can be loaded with transformers >=4.28.1 to use the GPTBigCode architecture. |
MosaicML | mpt-7b LoRA | MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML. |
MosaicML | mpt-7b-storywriter LoRA | MPT-7B-StoryWriter-65k+ is a model designed to read and write fictional stories with super long context lengths. It was built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. |
MosaicML | mpt-30b LoRA | MPT-30B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML. |
Technology Innovation Institute | falcon-7b | Falcon-7B is a 7B parameters causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license. |
Technology Innovation Institute | falcon-40b | Falcon-40B is a 40B parameters causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license. |
BigScience | bloom | BLOOM is an autoregressive Large Language Model (LLM), trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources. As such, it is able to output coherent text in 46 languages and 13 programming languages that is hardly distinguishable from text written by humans. BLOOM can also be instructed to perform text tasks it hasn't been explicitly trained for, by casting them as text generation tasks. |
BigScience | bloomz | BLOOMZ is a model capable of zero-shot following human instructions in dozens of languages. It is fine-tuned with BLOOM pretrained multilingual language models on a cross-lingual task mixture (xP3), thus enabling it to generalize cross-lingually to unseen tasks and languages. |
Facebook | opt-66b | OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. |
Facebook | opt-iml-max-30b | OPT-IML (OPT + Instruction Meta-Learning) is a set of instruction-tuned versions of OPT, on a collection of ~2000 NLP tasks gathered from 8 NLP benchmarks, called OPT-IML Bench. |
EleutherAI | gpt-j-6b LoRA | GPT-J 6B is a transformer model trained using Ben Wang's Mesh Transformer JAX. "GPT-J" refers to the class of model, while "6B" represents the number of trainable parameters. |
EleutherAI | gpt-neox-20b | GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile using the GPT-NeoX library. Its architecture intentionally resembles that of GPT-3, and is almost identical to that of GPT-J 6B. Its training dataset contains a multitude of English-language texts, reflecting the general-purpose nature of this model. |
EleutherAI | pythia-12b | The Pythia model suite was deliberately designed to promote scientific research on large language models, especially interpretability research. |
Databricks | dolly-v2-12b | Databricks' dolly-v2-12b is an instruction-following large language model trained on the Databricks machine learning platform and licensed for commercial use. Based on pythia-12b, Dolly is trained on ~15k instruction/response fine-tuning records (databricks-dolly-15k) generated by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. dolly-v2-12b is not a state-of-the-art model, but it does exhibit surprisingly high-quality instruction-following behavior not characteristic of the foundation model on which it is based. |
StabilityAI | stablelm-tuned-alpha-7b | StableLM-Tuned-Alpha is a suite of 3B and 7B parameter decoder-only language models built on top of the StableLM-Base-Alpha models and further fine-tuned on various chat and instruction-following datasets. |
Microsoft | phi-1_5 | The language model Phi-1.5 is a Transformer with 1.3 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts. |
Microsoft | phi-2 | Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). |
Microsoft | Phi-3-mini-4k-instruct | The Phi-3-Mini-4K-Instruct is a 3.8 billion parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data with a focus on high-quality and reasoning-dense properties. |
Microsoft | Phi-3-mini-128k-instruct | The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. |
Qwen | Qwen1.5-7B | Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. |
Qwen | Qwen1.5-7B-Chat | Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. |
Qwen | Qwen2-7B | Qwen2 is the new series of Qwen large language models. The Qwen2 release includes base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, as well as a Mixture-of-Experts model. |
Qwen | Qwen2-7B-Instruct | Qwen2 is the new series of Qwen large language models. The Qwen2 release includes base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, as well as a Mixture-of-Experts model. |
Qwen | Qwen2-72B | Qwen2 is the new series of Qwen large language models. The Qwen2 release includes base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, as well as a Mixture-of-Experts model. |
Qwen | Qwen2-72B-Instruct | Qwen2 is the new series of Qwen large language models. The Qwen2 release includes base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, as well as a Mixture-of-Experts model. |
OpenAI community | gpt2 | GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on raw texts only, with no human labelling of any kind, using an automatic process to generate inputs and labels from those texts (which is why it can use lots of publicly available data). More precisely, it was trained to guess the next word in sentences. |
OpenAI community | gpt2-xl | GPT-2 XL is the 1.5B parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective. |
Google | flan-t5-base | Compared to T5 with the same number of parameters, FLAN-T5 has been fine-tuned on more than 1000 additional tasks covering more languages. |
Google | flan-t5-xl | Compared to T5 with the same number of parameters, FLAN-T5 has been fine-tuned on more than 1000 additional tasks covering more languages. |
Google | flan-t5-xxl | Compared to T5 with the same number of parameters, FLAN-T5 has been fine-tuned on more than 1000 additional tasks covering more languages. |
Facebook | blenderbot-3B | The BlenderBot series has made progress in combining conversational skills — like personality, empathy and knowledge — incorporating long-term memory, and searching the internet to carry out meaningful conversations. |
xAI | grok-1 | Base model trained on a large amount of text data, not fine-tuned for any particular task. 314B parameter Mixture-of-Experts model with 25% of the weights active on a given token. Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023. |
Snowflake | snowflake-arctic-instruct | Snowflake Arctic is a top-tier enterprise-focused LLM that pushes the frontiers of cost-effective training and openness. |
Snowflake | snowflake-arctic-base | Snowflake Arctic is a top-tier enterprise-focused LLM that pushes the frontiers of cost-effective training and openness. |
LG AI Research | EXAONE-3.0-7.8B-Instruct | EXAONE-3.0-7.8B-Instruct is a powerful generative language model specifically designed for bilingual (English and Korean) tasks. The model was pre-trained with 8T curated tokens and post-trained with supervised fine-tuning and direct preference optimization. |
Stability AI | stable-diffusion-xl-base-1.0 | SDXL is a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. |
Runway | stable-diffusion-v1-5 | The Stable-Diffusion-v1-5 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned on 595k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. |
Friendli Engine supports FP8 quantization in all models and AWQ (4-bit quantization) in most models, with a few exceptions such as GPT-J.
The list above is not exhaustive. The current version of Friendli Engine supports direct loading of safetensors checkpoints for the following Hugging Face transformers model architectures (a quick way to check a checkpoint's architecture is sketched after the list):
ArcticForCausalLM
BlenderbotForConditionalGeneration
BloomForCausalLM
CohereForCausalLM
DbrxForCausalLM
ExaoneForCausalLM
FalconForCausalLM
GPT2LMHeadModel
GPTBigCodeForCausalLM
GPTJForCausalLM
GPTNeoXForCausalLM
Gemma2ForCausalLM
GemmaForCausalLM
Grok1ForCausalLM
LlamaForCausalLM
MPTForCausalLM
MistralForCausalLM
MixtralForCausalLM
OPTForCausalLM
Phi3ForCausalLM
PhiForCausalLM
Qwen2ForCausalLM
Starcoder2ForCausalLM
T5ForConditionalGeneration
If your model does not belong to one of the above model architectures, please contact us for support.
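As a quick sanity check before deploying, you can inspect a checkpoint's Hugging Face config and compare its declared architecture against the list above. The snippet below is a minimal sketch using the Hugging Face transformers library; the SUPPORTED_ARCHITECTURES set simply mirrors the list in this document (it is not returned by any API), and the example repository ID is for illustration only.

```python
from transformers import AutoConfig

# Architectures from the list above; mirrors this document, not an API response.
SUPPORTED_ARCHITECTURES = {
    "ArcticForCausalLM", "BlenderbotForConditionalGeneration", "BloomForCausalLM",
    "CohereForCausalLM", "DbrxForCausalLM", "ExaoneForCausalLM", "FalconForCausalLM",
    "GPT2LMHeadModel", "GPTBigCodeForCausalLM", "GPTJForCausalLM", "GPTNeoXForCausalLM",
    "Gemma2ForCausalLM", "GemmaForCausalLM", "Grok1ForCausalLM", "LlamaForCausalLM",
    "MPTForCausalLM", "MistralForCausalLM", "MixtralForCausalLM", "OPTForCausalLM",
    "Phi3ForCausalLM", "PhiForCausalLM", "Qwen2ForCausalLM", "Starcoder2ForCausalLM",
    "T5ForConditionalGeneration",
}

def is_supported(repo_id: str) -> bool:
    """Return True if the checkpoint declares an architecture from the list above."""
    config = AutoConfig.from_pretrained(repo_id)  # reads config.json from the Hub
    architectures = getattr(config, "architectures", None) or []
    return any(arch in SUPPORTED_ARCHITECTURES for arch in architectures)

if __name__ == "__main__":
    # gpt2 declares GPT2LMHeadModel, so this should print True.
    print(is_supported("gpt2"))
```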
HOW TO USE
Three ways to run generative AI models with Friendli Engine:
Friendli Container: Serve generative AI models with Friendli Engine in your GPU environment.
Friendli Serverless Endpoints: Call fast and affordable APIs for open-source generative AI models (see the minimal client sketch below).
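Friendli Serverless Endpoints exposes an OpenAI-compatible chat completions API, so an existing OpenAI client can be pointed at it. The sketch below is a minimal example under stated assumptions: the base URL, the FRIENDLI_TOKEN environment variable, and the model identifier are placeholders for illustration; confirm the exact values in the Friendli documentation.

```python
import os

from openai import OpenAI  # pip install openai

# Assumed values for illustration; verify the base URL, token variable,
# and model ID against the Friendli Serverless Endpoints documentation.
client = OpenAI(
    api_key=os.environ["FRIENDLI_TOKEN"],               # hypothetical env var holding your Friendli token
    base_url="https://api.friendli.ai/serverless/v1",   # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # hypothetical model ID from the library above
    messages=[{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```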