How to Fine-Tune an LLM from Hugging Face

Large Language Models (LLMs) owe their versatility and remarkable performance to the transformer architecture and enormous amounts of training data. LLMs are usually general-purpose rather than trained for a single task. For example, GPT-4 can handle language translation, text generation, question answering, and many other tasks.

For specific applications, such as a healthcare chatbot or translation for an underrepresented language, we need a specialized model. Luckily, one of the most powerful features of LLMs (and other transformer-based models) is their ability to adapt. Instead of training a model from scratch, we can take an existing LLM and fine-tune it on task-specific training data.

Fine-tuning is crucial in the domain of LLMs, and there are many methods for it. We will therefore dedicate a couple of blog posts to the topic and compare fine-tuning with other methods, such as prompt engineering and RAG. This first post explores fine-tuning with the Hugging Face transformers library, while the next one will focus on using OpenAI to fine-tune a general-purpose model.

# LLMs Fine-tuning

Fine-tuning can be either full or partial. Because LLMs are so large, updating every weight is often impractical on typical hardware, so Parameter-Efficient Fine-Tuning (commonly known as PEFT) is a common alternative: it trains only a small number of parameters while the rest stay frozen. Since PEFT is well supported in the Hugging Face ecosystem, we will use a Hugging Face model for this purpose.
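
To make the saving concrete, here is a rough sketch of the arithmetic behind LoRA, one popular PEFT method (the one that `LoraConfig` configures). Instead of updating a full d x k weight matrix, LoRA trains two low-rank factors of shapes d x r and r x k. The layer size and rank below are illustrative assumptions, not Falcon-7b's actual dimensions.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for LoRA factors of shape d x r and r x k."""
    return r * (d + k)


# Hypothetical layer size and rank, chosen only for illustration.
d, k, r = 4096, 4096, 16

full = full_params(d, k)
lora = lora_params(d, k, r)
ratio = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r={r}): {lora:,} trainable parameters")
print(f"LoRA trains {ratio:.2%} of the layer's weights")
```

The smaller the rank `r`, the fewer parameters are trained, at the cost of a less expressive update.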

# Load the Pre-trained Model

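Hugging Face has a whole ecosystem of libraries. The code in this post uses several of them: transformers (models and tokenizers), peft (parameter-efficient fine-tuning), datasets (loading data from the hub), and trl (the SFTTrainer), along with bitsandbytes for quantization.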

For the pre-trained model, we can use any open-source model. Here, we will use Falcon-7b for its relatively small size and strong performance.

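import torch
from datasets import load_dataset  # used below to load the training data
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer  # used below for supervised fine-tuning

modelID = "tiiuae/falcon-7b"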

# Prepare the Dataset

SFTTrainer (the supervised fine-tuning trainer from the trl library, which we will use shortly for training) can work directly with datasets from the Hugging Face hub, so we can take advantage of the many datasets available there. Here, we will use the OpenAssistant Guanaco dataset.

Note:

In case the dataset you want to use for fine-tuning is unavailable on the Hugging Face hub, you can upload it using your account.

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
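
If you bring your own data, one option is to mirror the layout used here: one `text` column per example. The sketch below writes prompt/response pairs to a JSON Lines file. The helper name, the file name, and the `### Human:` / `### Assistant:` template are assumptions modeled on the Guanaco dataset's convention, so adapt them to your data.

```python
import json
import os
import tempfile


def to_text_record(prompt: str, response: str) -> dict:
    """Merge one prompt/response pair into a single-"text"-column record."""
    return {"text": f"### Human: {prompt}### Assistant: {response}"}


# Hypothetical examples, for illustration only.
pairs = [
    ("What is fine-tuning?", "Adapting a pre-trained model to a specific task."),
    ("What does PEFT stand for?", "Parameter-Efficient Fine-Tuning."),
]
records = [to_text_record(prompt, response) for prompt, response in pairs]

# Write one JSON object per line (JSON Lines).
path = os.path.join(tempfile.gettempdir(), "my_sft_data.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A file like this can then be loaded locally with `load_dataset("json", data_files=path, split="train")` or uploaded to the hub.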

# Modify the Model According to the Requirements

In addition to partial fine-tuning, we can also use quantization to further reduce the weights’ size:

quantizationConfig = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(modelID, quantization_config=quantizationConfig)
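
As a back-of-the-envelope check on what 4-bit loading saves, the sketch below estimates the memory needed for the weights alone. It assumes roughly 7 billion parameters and ignores quantization constants, layers that stay in higher precision, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7e9  # approximate parameter count, assumed for illustration

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights:  ~{nf4_gib:.1f} GiB")
```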

We will load the tokenizer too:

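tokenizer = AutoTokenizer.from_pretrained(modelID)
# Reuse the end-of-sequence token for padding. Adding a brand-new '<PAD>' token
# would also require resizing the model's embeddings to match the tokenizer.
tokenizer.pad_token = tokenizer.eos_token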

Note:

Large Language Models (LLMs) require significant computational power for loading and fine-tuning. I used Google Colab Pro with an A100 GPU for this model.

# Fine-tune the Model

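Now we are ready to initialize our trainer. SFTTrainer's constructor takes several arguments, including:

  • model: the pre-trained model
  • train_dataset: the fine-tuning dataset
  • dataset_text_field: the name of the dataset column that holds the training text ("text" for this dataset)
  • max_seq_length: the maximum sequence length
  • tokenizer: the tokenizer for the text
  • peft_config: the adapter configuration used for parameter-efficient fine-tuning

# A 4-bit quantized model cannot be trained directly, so LoRA adapters are
# trained on top of it. These hyperparameters are illustrative assumptions.
loraConfig = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Note: newer trl releases move dataset_text_field, max_seq_length, and
# packing into SFTConfig; the call below follows the older signature.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    peft_config=loraConfig,
    packing=True,
)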
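
The `packing=True` argument tells the trainer to concatenate short examples into fixed-length blocks rather than padding each one to `max_seq_length`. The toy function below illustrates the idea on lists of token IDs. It is a simplified sketch, not trl's implementation, and the token IDs, block size, and EOS ID are made up.

```python
from typing import List


def pack_sequences(sequences: List[List[int]], block_size: int, eos_id: int) -> List[List[int]]:
    """Concatenate sequences (each followed by EOS) and cut the stream into full blocks."""
    stream: List[int] = []
    for seq in sequences:
        stream.extend(seq)
        stream.append(eos_id)  # mark the boundary between examples
    n_full = len(stream) // block_size  # leftover tokens at the end are dropped
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


# Three tokenized examples of different lengths (hypothetical token IDs).
examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = pack_sequences(examples, block_size=4, eos_id=0)

for block in blocks:
    print(block)
```

Every block has the same length and contains no padding, so no compute is spent on pad tokens.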

And we can train it now:

trainer.train()

Once the model is trained, we can test it by running inference. First, we wrap the model and tokenizer in a text-generation pipeline:

from transformers import pipeline

# Reuse the tokenizer loaded earlier. The model is already loaded,
# so device placement does not need to be repeated here.
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

For example, we ran it with this prompt:

sequences = generator(
    "Arguably, the most delicious fruit on this planet is cashew (in raw form). Found in Brazil and other tropical regions, its taste is unparalleled. What do you think, Sam? \n Sam:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")

And we got an output like this (it will differ in your case due to the stochastic nature of the language models):

Result: Arguably, the most delicious fruit on this planet is cashew (in raw form). Found in Brazil and other tropical regions, its taste is unparalleled. What do you think, Sam?
Sam: I'd say it's delicious. It has a slightly nutty-sweet flavor but with a very pleasant, buttery taste and a creamy, smooth texture. It's one of my favorite fruits and I can eat it by the handful.
What's the best place you've ever been?
Sam: Oh, this is a difficult question. It's a very difficult answer.
It's a difficult answer?
Sam: I've been in quite a few countries, but my favorite place is probably the Galapagos. I spent a few weeks there and loved the scenery.
What's the worst place you've ever been?
Sam: I think probably the most awful place I've ever seen is Berlin.
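
The variation between runs comes from `do_sample=True` combined with `top_k=10`: at each step, the next token is drawn at random from the 10 most likely candidates. The toy sketch below shows the idea with a hand-written score table. The words and scores are invented, and this is not the transformers implementation.

```python
import math
import random
from typing import Dict


def top_k_sample(logits: Dict[str, float], k: int, rng: random.Random) -> str:
    """Keep the k highest-scoring tokens, softmax over them, and draw one."""
    top = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:k]
    max_logit = top[0][1]
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(score - max_logit) for _, score in top]
    tokens = [token for token, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token scores after the word "is".
logits = {"delicious": 3.0, "tasty": 2.5, "sweet": 2.0, "awful": -1.0, "purple": -3.0}

rng = random.Random(0)
samples = [top_k_sample(logits, k=3, rng=rng) for _ in range(1000)]

# Only the three highest-scoring words can ever be drawn.
print(sorted(set(samples)))
```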

# Real-world Applications of LLMs Fine-tuning

Fine-tuning is changing how industries use AI, making it more affordable and user-friendly. Retrieval-Augmented Generation (RAG) adds a retrieval step, and its cost, to every request, whereas fine-tuning customizes an open-source model once, with no recurring retrieval expense. You keep full control of the model and do not need a separate retrieval infrastructure. Models such as FinGPT and the fine-tuned variants of PaLM show how powerful and flexible fine-tuned models can be:

  • Customer Service Automation: By fine-tuning models for a specific customer service domain, developers can create chatbots that don't just mimic general conversation but understand and respond to queries in the context of their business. This approach avoids reliance on external resources and provides 24/7 customer support that understands the business's terminology and its customers' questions, improving the overall customer experience and satisfaction.

  • Language Translation Services: Through fine-tuning, developers can specialize language models in translation tasks, bypassing the generic one-size-fits-all approach. This helps break down language barriers more effectively in international business and travel, without the recurring cost of external APIs.

  • Personalized Education: Fine-tuning makes it possible to create AI-powered platforms that adjust the learning materials to fit the speed and learning style of each student. By owning the model, educational institutions can continuously evolve and adapt the learning material without additional costs, making education more personalized and impactful.

These examples show that fine-tuning LLMs can be applied to solve real-world problems and enhance everyday life. The adaptability and efficiency of fine-tuned models promise even more innovative applications in the future.

# Conclusion

The capabilities of LLMs can be tailored to a specific task by fine-tuning them on a specialized dataset. This has become easier with the Hugging Face libraries and with techniques such as PEFT and quantization. With the proliferation of open-source LLMs like Llama, Vicuna, Falcon, Aya, and many others, LLM fine-tuning is becoming easier and more affordable.

Nowadays, many organizations are developing AI applications using the APIs of Large Language Models (LLMs), where vector databases play a significant role by offering efficient storage and retrieval of contextual embeddings. MyScale is a vector database designed specifically for AI applications, with cost, accuracy, and speed in mind. It is easy for developers to adopt because it only requires SQL to interact with.

Fine-tuning plays a pivotal role in optimizing Large Language Models (LLMs), offering diverse methodologies for this endeavor. Stay tuned for our upcoming blog, where we'll explore fine-tuning a general-purpose model with OpenAI.

Related article: Outperforming Specialized Vector Databases with MyScale

If you have any feedback or suggestions, please reach out to us on MyScale Discord.
