QLoRA: Efficient Finetuning of Quantized LLMs

The key innovation behind QLoRA is its ability to backpropagate gradients through a frozen, 4-bit quantized pretrained language model into Low-Rank Adapters (LoRA). This cuts memory usage enough to finetune a 65B-parameter model on a single 48 GB GPU while preserving full 16-bit finetuning performance. The resulting model family, named Guanaco, outperforms all previously released models on the Vicuna benchmark, reaching 99.3% of ChatGPT's performance level after only 24 hours of finetuning on a single GPU.
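
To make this concrete, here is a minimal sketch of a QLoRA-style setup using the Hugging Face transformers, peft, and bitsandbytes libraries. The base model name and the LoRA hyperparameters are illustrative assumptions, not the paper's exact configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization, as introduced in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the pretrained model with frozen, 4-bit quantized weights.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # example base model; any causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training and attach trainable LoRA adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                # adapter rank (illustrative value)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], # which projections get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # only the LoRA weights require gradients
```

From here, the wrapped model can be passed to an ordinary training loop or trainer; gradients flow through the frozen 4-bit weights into the adapters alone.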

Read More

LoRA: Low-Rank Adaptation of Large Language Models

The core idea behind LoRA is to freeze the pretrained model weights and introduce trainable rank-decomposition matrices into each layer of the Transformer architecture. Because the update to a weight matrix is constrained to a low-rank product of two small matrices, only those matrices need gradients and optimizer state, which makes adaptation far more efficient and cost-effective. Compared to fine-tuning GPT-3 175B with Adam, LoRA reduces the number of trainable parameters by a factor of 10,000 and GPU memory requirements by a factor of 3.
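
A minimal PyTorch sketch of the idea follows; the class name LoRALinear, the rank, and the alpha value are illustrative choices, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # A starts small and random, B starts at zero, so the adapted layer
        # initially computes exactly the same function as the pretrained one.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + (alpha / r) * B A x
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

# Example: wrap a 768 -> 768 projection with a rank-8 adapter.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
out = layer(torch.randn(4, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 768 = 12288, versus 768 * 768 = 589824 frozen weights
```

The parameter count at the end illustrates the savings: with rank 8, the adapter trains roughly 2% as many weights as the frozen layer it augments, and the gap widens as layers grow.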

Read More