| Model Type | Description | Example Models |
| --- | --- | --- |
| Autoregressive Models | Generate text by predicting the next token sequentially (see the first sketch below this table) | GPT series, Claude, LLaMA |
| Masked Language Models (MLMs) | Predict masked tokens using surrounding context (see the second sketch below) | BERT, RoBERTa |
| Seq2Seq Models | Encode input and decode output for tasks like translation (see the third sketch below) | T5, BART, MarianMT |
| Hybrid Models | Combine transformers with retrieval or memory modules | RETRO, RAG |
| Multimodal Models | Handle text, image, audio, or video inputs | GPT-4 Vision, Gemini, Kosmos |
| Instruction-Tuned Models | Fine-tuned to follow commands and structured prompts | FLAN-T5, Alpaca, Zephyr |
| Domain-Specific Models | Tailored for specialized fields like healthcare or finance | BioBERT, FinBERT |
| Open-Source Models | Freely available for customization and research | LLaMA 2, Mistral, Falcon |
| Efficient Small-Scale Models | Optimized for edge devices or low-resource environments | TinyLlama, Phi-2, Gemma |