Fluent's local model support is limited to MLX-converted models. These models generally offer better performance at a fraction of the memory cost.
Still, there are limits on which MLX models can be loaded into Fluent. Currently supported architectures are:
- Mistral
- Llama
- Phi
- Phi3
- Phi MoE
- Gemma
- Gemma 2
- Qwen 2
- Qwen 3
- Qwen 3 MoE
- StarCoder 2
- Cohere
- OpenELM
- InternLM2
- Granite
- MiMo
- GLM4
- AceReason
This list will be updated regularly as Fluent gains support for more architectures.
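One way to check a local model against the list above before attempting to load it is to inspect the `model_type` field in the model's Hugging Face-style `config.json`, which MLX-converted models carry alongside their weights. The sketch below is illustrative only: the identifier strings are guesses at the usual `model_type` values for these architectures, not Fluent's actual internal list, and `is_supported` is a hypothetical helper, not a Fluent API.

```python
import json
from pathlib import Path

# Illustrative guesses at the model_type identifiers for the architectures
# listed above; not Fluent's actual internal list.
SUPPORTED_MODEL_TYPES = {
    "mistral", "llama", "phi", "phi3", "phimoe",
    "gemma", "gemma2", "qwen2", "qwen3", "qwen3_moe",
    "starcoder2", "cohere", "openelm", "internlm2",
    "granite", "mimo", "glm4", "acereason",
}

def is_supported(model_dir: str) -> bool:
    """Return True if the model folder's config.json declares a supported model_type."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.exists():
        return False
    with open(config_path) as f:
        config = json.load(f)
    return config.get("model_type", "").lower() in SUPPORTED_MODEL_TYPES
```

Running a check like this before loading lets you fail fast with a clear message instead of hitting an error partway through model initialization.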