Browse models supported by Friendli Engine
Organization | Model name | Model description
---|---|---
Mistral AI | Mixtral-8x22B-v0.1 | A sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B. |
Mistral AI | Mixtral-8x22B-Instruct-v0.1 | The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1. |
Mistral AI | Mixtral-8x7B-v0.1 | The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. |
Mistral AI | Mixtral-8x7B-Instruct-v0.1 | The Mixtral-8x7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x7B-v0.1 Sparse Mixture of Experts model. |
Mistral AI | Mixtral-8x7B-v0.1-fp8 | The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. |
Hugging Face H4 | zephyr-orpo-141b-A35b-v0.1 | Zephyr is a series of language models that are trained to act as helpful assistants, and is a fine-tuned version of Mixtral-8x22B-v0.1. A Mixture of Experts (MoE) model with 141B total parameters and 39B active parameters. |
Mistral AI | Mistral-7B-Instruct-v0.2 LoRA, FP8 | The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2. |
Mistral AI | Mistral-7B-Instruct-v0.1 LoRA | The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model using a variety of publicly available conversation datasets. |
Mistral AI | Mistral-7B-v0.1 LoRA | A 7B transformer model, fast-deployed and easily customisable. Small, yet very powerful for a variety of use cases. |
Mistral AI | Mistral-7B-Instruct-v0.3 LoRA | The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3, which extends the Mistral-7B-v0.2 vocabulary to 32768 tokens. |
Mistral AI | Mistral-Nemo-Instruct-2407 LoRA | The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size. |
Mistral AI | Mistral-Nemo-Base-2407 LoRA | The Mistral-Nemo-Base-2407 Large Language Model (LLM) is a pretrained generative text model of 12B parameters trained jointly by Mistral AI and NVIDIA. It significantly outperforms existing models smaller or similar in size. |
Mistral AI | Mistral-Large-Instruct-2407 LoRA | Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities. |
Mistral AI | Mathstral-7B-v0.1 LoRA | Mathstral 7B is a model specializing in mathematical and scientific tasks, based on Mistral 7B. |
Cohere For AI | c4ai-command-r-plus | C4AI Command R+ is an open-weights research release of a 104 billion parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. |
Cohere For AI | c4ai-command-r-v01 | C4AI Command-R is a research release of a 35 billion parameter highly performant generative model. Command-R is a large language model with open weights optimized for a variety of use cases including reasoning, summarization, and question answering. |
Google | codegemma-7b-it LoRA | CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models and are available as a 7 billion pretrained variant that specializes in code completion and code generation tasks, a 7 billion parameter instruction-tuned variant for code chat and instruction following and a 2 billion parameter pretrained variant for fast code completion. |
Google | codegemma-7b LoRA | CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models and are available as a 7 billion pretrained variant that specializes in code completion and code generation tasks, a 7 billion parameter instruction-tuned variant for code chat and instruction following and a 2 billion parameter pretrained variant for fast code completion. |
Google | gemma-7b-it LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Google | gemma-7b LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Google | gemma-2-9b LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Google | gemma-2-27b LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Google | gemma-2-9b-it LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Google | gemma-2-27b-it LoRA | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
Databricks | dbrx-instruct | DBRX Instruct is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. DBRX Instruct specializes in few-turn interactions. |
Databricks | dbrx-base | DBRX Base is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. |
Meta | Llama-2-7b-hf LoRA | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | Llama-2-7b-chat-hf LoRA, FP8 | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | Llama-2-13b-hf LoRA | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | Llama-2-13b-chat-hf LoRA, FP8 | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | Llama-2-70b-hf LoRA | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | Llama-2-70b-chat-hf LoRA, FP8 | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. |
Meta | CodeLlama-7b-hf LoRA | Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding. |
Meta | CodeLlama-13b-hf LoRA | Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding. |
Meta | CodeLlama-34b-hf LoRA | Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding. |
Meta | CodeLlama-70b-hf LoRA | Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This model is designed for general code synthesis and understanding. |
Meta | Meta-Llama-3-8B LoRA, FP8 | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
Meta | Meta-Llama-3-8B-Instruct LoRA, FP8 | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
Meta | Meta-Llama-3-70B LoRA, FP8 | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
Meta | Meta-Llama-3-70B-Instruct LoRA, FP8 | Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-8B LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-8B-Instruct LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-70B LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-70B-Instruct LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-405B LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
Meta | Meta-Llama-3.1-405B-Instruct LoRA | The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. |
OpenLM Research | open_llama_13b LoRA | OpenLLaMA is an open reproduction of LLaMA. |
LMSYS | vicuna-13b-v1.3 LoRA | Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. |
01.AI | Yi-6B LoRA | The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. Targeted as bilingual language models and trained on a 3T multilingual corpus, the Yi series models have become some of the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. |
01.AI | Yi-34B LoRA | The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. Targeted as bilingual language models and trained on a 3T multilingual corpus, the Yi series models have become some of the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. |
WizardLM Team | WizardLM-13B-V1.0 LoRA | WizardLM is a large language model developed by fine-tuning the LLaMA model using a novel method called Evol-Instruct. This method generates large amounts of complex instruction data automatically, reducing the need for manual data creation. Starting with an initial set of instructions, Evol-Instruct iteratively rewrites them to increase their complexity. The resulting instruction data is then used to fine-tune the LLaMA model, creating WizardLM. |
WizardLM Team | WizardLM-13B-V1.2 LoRA | WizardLM is a large language model developed by fine-tuning the LLaMA model using a novel method called Evol-Instruct. This method generates large amounts of complex instruction data automatically, reducing the need for manual data creation. Starting with an initial set of instructions, Evol-Instruct iteratively rewrites them to increase their complexity. The resulting instruction data is then used to fine-tune the LLaMA model, creating WizardLM. |
WizardLM Team | WizardLM-70B-V1.0 LoRA | WizardLM is a large language model developed by fine-tuning the LLaMA model using a novel method called Evol-Instruct. This method generates large amounts of complex instruction data automatically, reducing the need for manual data creation. Starting with an initial set of instructions, Evol-Instruct iteratively rewrites them to increase their complexity. The resulting instruction data is then used to fine-tune the LLaMA model, creating WizardLM. |
WizardLM Team | WizardMath-7B-V1.0 LoRA | WizardMath enhances the mathematical reasoning abilities of Llama-2 by applying the WizardLM Team's Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. |
WizardLM Team | WizardMath-70B-V1.0 LoRA | WizardMath enhances the mathematical reasoning abilities of Llama-2 by applying the WizardLM Team's Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. |
WizardLM Team | WizardMath-7B-V1.1 LoRA | WizardMath enhances the mathematical reasoning abilities of Llama-2 by applying the WizardLM Team's Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. |
WizardLM Team | WizardCoder-15B-V1.0 LoRA | WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. |
WizardLM Team | WizardCoder-33B-V1.1 LoRA | WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. |
WizardLM Team | WizardCoder-Python-13B-V1.0 LoRA | WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. |
WizardLM Team | WizardCoder-Python-34B-V1.0 LoRA | WizardCoder empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. |
BigCode | starcoder2-15b | StarCoder2-15B is a 15B parameter model trained on 600+ programming languages from The Stack v2, with opt-out requests excluded. |
BigCode | starcoder2-7b | StarCoder2-7B is a 7B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. |
BigCode | starcoder2-3b | StarCoder2-3B is a 3B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. |
BigCode | starcoder | The StarCoder models are 15.5B parameter models trained on 80+ programming languages from The Stack (v1.2), with opt-out requests excluded. The model uses Multi Query Attention, a context window of 8192 tokens, and was trained using the Fill-in-the-Middle objective on 1 trillion tokens. |
BigCode | gpt_bigcode-santacoder | This is the same model as SantaCoder but it can be loaded with transformers >=4.28.1 to use the GPTBigCode architecture. |
MosaicML | mpt-7b LoRA | MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML. |
MosaicML | mpt-7b-storywriter LoRA | MPT-7B-StoryWriter-65k+ is a model designed to read and write fictional stories with super long context lengths. It was built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. |
MosaicML | mpt-30b LoRA | MPT-30B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML. |
Technology Innovation Institute | falcon-7b | Falcon-7B is a 7B parameters causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license. |
Technology Innovation Institute | falcon-40b | Falcon-40B is a 40B parameters causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license. |
BigScience | bloom | BLOOM is an autoregressive Large Language Model (LLM), trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources. As such, it is able to output coherent text in 46 languages and 13 programming languages that is hardly distinguishable from text written by humans. BLOOM can also be instructed to perform text tasks it hasn't been explicitly trained for, by casting them as text generation tasks. |
BigScience | bloomz | BLOOMZ is a model capable of zero-shot following human instructions in dozens of languages. It is fine-tuned with BLOOM pretrained multilingual language models on a cross-lingual task mixture (xP3), thus enabling it to generalize cross-lingually to unseen tasks and languages. |
Facebook | opt-66b | OPT was first introduced in Open Pre-trained Transformer Language Models and first released in metaseq's repository on May 3rd 2022 by Meta AI. |
Facebook | opt-iml-max-30b | OPT-IML (OPT + Instruction Meta-Learning) is a set of instruction-tuned versions of OPT, on a collection of ~2000 NLP tasks gathered from 8 NLP benchmarks, called OPT-IML Bench. |
EleutherAI | gpt-j-6b LoRA | GPT-J 6B is a transformer model trained using Ben Wang's Mesh Transformer JAX. "GPT-J" refers to the class of model, while "6B" represents the number of trainable parameters. |
EleutherAI | gpt-neox-20b | GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile using the GPT-NeoX library. Its architecture intentionally resembles that of GPT-3, and is almost identical to that of GPT-J 6B. Its training dataset contains a multitude of English-language texts, reflecting the general-purpose nature of this model. |
EleutherAI | pythia-12b | The Pythia model suite was deliberately designed to promote scientific research on large language models, especially interpretability research. |
Databricks | dolly-v2-12b | Databricks' dolly-v2-12b is an instruction-following large language model trained on the Databricks machine learning platform and licensed for commercial use. Based on pythia-12b, Dolly is trained on ~15k instruction/response fine-tuning records (databricks-dolly-15k) generated by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. dolly-v2-12b is not a state-of-the-art model, but it does exhibit surprisingly high-quality instruction-following behavior not characteristic of the foundation model on which it is based. |
StabilityAI | stablelm-tuned-alpha-7b | StableLM-Tuned-Alpha is a suite of 3B and 7B parameter decoder-only language models built on top of the StableLM-Base-Alpha models and further fine-tuned on various chat and instruction-following datasets. |
Microsoft | phi-1_5 | The language model Phi-1.5 is a Transformer with 1.3 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts. |
Microsoft | phi-2 | Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). |
Microsoft | Phi-3-mini-4k-instruct | The Phi-3-Mini-4K-Instruct is a 3.8 billion parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data with a focus on high-quality and reasoning-dense properties. |
Microsoft | Phi-3-mini-128k-instruct | The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. |
Qwen | Qwen1.5-7B | Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. |
Qwen | Qwen1.5-7B-Chat | Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. |
Qwen | Qwen2-7B | Qwen2 is the new series of Qwen large language models. The Qwen2 release includes base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, as well as a Mixture-of-Experts model. |
Qwen | Qwen2-7B-Instruct | Qwen2 is the new series of Qwen large language models. The Qwen2 release includes base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, as well as a Mixture-of-Experts model. |
Qwen | Qwen2-72B | Qwen2 is the new series of Qwen large language models. The Qwen2 release includes base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, as well as a Mixture-of-Experts model. |
Qwen | Qwen2-72B-Instruct | Qwen2 is the new series of Qwen large language models. The Qwen2 release includes base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, as well as a Mixture-of-Experts model. |
OpenAI community | gpt2 | GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on raw texts only, with no human labelling of any kind, using an automatic process to generate inputs and labels from those texts (which is why it can use lots of publicly available data). More precisely, it was trained to guess the next word in sentences. |
OpenAI community | gpt2-xl | GPT-2 XL is the 1.5B parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is a pretrained model on English language using a causal language modeling (CLM) objective. |
Google | flan-t5-base | Compared to T5 with the same number of parameters, FLAN-T5 has been fine-tuned on more than 1000 additional tasks covering more languages. |
Google | flan-t5-xl | Compared to T5 with the same number of parameters, FLAN-T5 has been fine-tuned on more than 1000 additional tasks covering more languages. |
Google | flan-t5-xxl | Compared to T5 with the same number of parameters, FLAN-T5 has been fine-tuned on more than 1000 additional tasks covering more languages. |
Facebook | blenderbot-3B | The BlenderBot series has made progress in combining conversational skills — like personality, empathy and knowledge — incorporating long-term memory, and searching the internet to carry out meaningful conversations. |
xAI | grok-1 | Base model trained on a large amount of text data, not fine-tuned for any particular task. 314B parameter Mixture-of-Experts model with 25% of the weights active on a given token. Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023. |
Snowflake | snowflake-arctic-instruct | Snowflake Arctic is a top-tier enterprise-focused LLM that pushes the frontiers of cost-effective training and openness. |
Snowflake | snowflake-arctic-base | Snowflake Arctic is a top-tier enterprise-focused LLM that pushes the frontiers of cost-effective training and openness. |
LG AI Research | EXAONE-3.0-7.8B-Instruct | EXAONE-3.0-7.8B-Instruct is a powerful generative language model specifically designed for bilingual (English and Korean) tasks. The model was pre-trained with 8T curated tokens and post-trained with supervised fine-tuning and direct preference optimization. |
Stability AI | stable-diffusion-xl-base-1.0 | SDXL is a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. |
Runway | stable-diffusion-v1-5 | The Stable-Diffusion-v1-5 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned on 595k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. |
Friendli Engine supports FP8 quantization in all models and AWQ (4-bit quantization) in most models, with a few exceptions such as GPT-J.
The list above is not exhaustive. The current version of Friendli Engine supports direct loading of safetensors checkpoints for the following Hugging Face transformers model architectures (a quick way to check a checkpoint's architecture is sketched after the list):
ArcticForCausalLM
BlenderbotForConditionalGeneration
BloomForCausalLM
CohereForCausalLM
DbrxForCausalLM
ExaoneForCausalLM
FalconForCausalLM
GPT2LMHeadModel
GPTBigCodeForCausalLM
GPTJForCausalLM
GPTNeoXForCausalLM
Gemma2ForCausalLM
GemmaForCausalLM
Grok1ForCausalLM
LlamaForCausalLM
MPTForCausalLM
MistralForCausalLM
MixtralForCausalLM
OPTForCausalLM
Phi3ForCausalLM
PhiForCausalLM
Qwen2ForCausalLM
Starcoder2ForCausalLM
T5ForConditionalGeneration
If your model does not belong to one of the above model architectures, please contact us for support.
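As a quick sanity check before deploying, you can inspect a checkpoint's Hugging Face config and compare its declared architecture against the list above. The snippet below is a minimal sketch using the Hugging Face transformers library; the SUPPORTED_ARCHITECTURES set simply mirrors the list in this document (it is not returned by any API), and the example repository ID is for illustration only.

```python
from transformers import AutoConfig

# Architectures from the list above; mirrors this document, not an API response.
SUPPORTED_ARCHITECTURES = {
    "ArcticForCausalLM", "BlenderbotForConditionalGeneration", "BloomForCausalLM",
    "CohereForCausalLM", "DbrxForCausalLM", "ExaoneForCausalLM", "FalconForCausalLM",
    "GPT2LMHeadModel", "GPTBigCodeForCausalLM", "GPTJForCausalLM", "GPTNeoXForCausalLM",
    "Gemma2ForCausalLM", "GemmaForCausalLM", "Grok1ForCausalLM", "LlamaForCausalLM",
    "MPTForCausalLM", "MistralForCausalLM", "MixtralForCausalLM", "OPTForCausalLM",
    "Phi3ForCausalLM", "PhiForCausalLM", "Qwen2ForCausalLM", "Starcoder2ForCausalLM",
    "T5ForConditionalGeneration",
}

def is_supported(repo_id: str) -> bool:
    """Return True if the checkpoint declares an architecture from the list above."""
    config = AutoConfig.from_pretrained(repo_id)  # reads config.json from the Hub
    architectures = getattr(config, "architectures", None) or []
    return any(arch in SUPPORTED_ARCHITECTURES for arch in architectures)

if __name__ == "__main__":
    # gpt2 declares GPT2LMHeadModel, so this should print True.
    print(is_supported("gpt2"))
```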
HOW TO USE
Three ways to run generative AI models with Friendli Engine:
Friendli Container: Serve generative AI models with Friendli Engine in your GPU environment.
Friendli Serverless Endpoints: Call fast and affordable APIs for open-source generative AI models (see the minimal client sketch below).
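Friendli Serverless Endpoints exposes an OpenAI-compatible chat completions API, so an existing OpenAI client can be pointed at it. The sketch below is a minimal example under stated assumptions: the base URL, the FRIENDLI_TOKEN environment variable, and the model identifier are placeholders for illustration; confirm the exact values in the Friendli documentation.

```python
import os

from openai import OpenAI  # pip install openai

# Assumed values for illustration; verify the base URL, token variable,
# and model ID against the Friendli Serverless Endpoints documentation.
client = OpenAI(
    api_key=os.environ["FRIENDLI_TOKEN"],               # hypothetical env var holding your Friendli token
    base_url="https://api.friendli.ai/serverless/v1",   # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # hypothetical model ID from the library above
    messages=[{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```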