If you haven’t already, sign up for The AI Exchange Newsletter where we’ll continue posting helpful resources like this!
<aside>
📝 This is a practical guide on the different methods of customizing pre-trained AI models to better serve your specific use case.
Created by: The AI Exchange Team
</aside>
What is a pre-trained model?
- GPT stands for Generative Pre-Trained Transformer.
- “Generative” means the model can generate new content or data that is similar to, or based on, the set of examples and patterns it was trained on.
- “Pre-trained” because the model has already been taught how to do its task. GPT’s specific task is to predict the next most likely token (or word) given the words that precede it (see the short sketch after this list).
- “Transformer” is the architecture of the model. A transformer model is a neural network that learns context and thereby meaning by tracking relationships in sequential data, like the words in this sentence.
- If you’re interested, here are some great visuals on the nerdy details behind how GPT was trained.
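To make “predict the next most likely token” concrete, here is a minimal sketch using the open-source GPT-2 model through Hugging Face’s transformers library (not GPT-3 itself, but the same underlying idea); the prompt text is purely illustrative:

```python
# A minimal sketch of next-token prediction with the open-source GPT-2 model.
# GPT-3 works the same way conceptually, but is only available through OpenAI's API.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # a score for every token in the vocabulary

next_token_id = logits[0, -1].argmax().item()  # most likely next token
print(tokenizer.decode([next_token_id]))       # e.g. " Paris"
```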
Why should you use a pre-trained model?
- Significant benefits:
- Reduces computation costs
- Reduces your carbon footprint
- Allows you to use state-of-the-art models without having to train one from scratch
- If you were to train a model like GPT-3 from scratch, it would require enormous amounts of compute, time, and money
- And you can leverage all of that knowledge + more with OpenAI’s relatively cheap API
- Even their most powerful and expensive model release, “text-davinci-003”, costs only $0.02 per 1K tokens (or ~750 words) - see the rough cost estimate below
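As a back-of-the-envelope illustration of that pricing (assuming the ~750 words per 1K tokens rule of thumb above; real token counts depend on the exact text):

```python
# Rough cost estimate for text-davinci-003 at $0.02 per 1K tokens.
# The 750-words-per-1K-tokens figure is an approximation, not an exact conversion.
PRICE_PER_1K_TOKENS = 0.02
WORDS_PER_1K_TOKENS = 750

def estimate_cost(num_words: int) -> float:
    tokens = num_words / WORDS_PER_1K_TOKENS * 1000
    return tokens / 1000 * PRICE_PER_1K_TOKENS

print(f"${estimate_cost(7500):.2f}")  # ~10K tokens of text -> about $0.20
```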
How can GPT be customized to your use case?
Although GPT-3 is crazy powerful out of the box and can perform a wide variety of natural language tasks, you’ll get better responses when you customize GPT to your use case.
Zero-shot learning (customize via prompt)
- Because GPT was trained on so much data, the model can perform most tasks without needing any examples added to the input (or prompt). This is called zero-shot learning.
- For a lot of use cases, such as classification, sentiment, and language translation, zero-shot learning works perfectly well and a customized GPT model is not needed (see the sketch after this list).
- Zero-shot prompts can be customized in a variety of ways through different tactics such as asking GPT for a specific writing style and tone, asking GPT to generate a specific length, etc. See our Ultimate Prompt Engineering Guide for prompt design tips - https://bald-neighbor-4a9.notion.site/The-ultimate-prompt-engineering-guide-for-text-generation-7367bdf074d04f9e8a9a63ba5a42b45a
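To make this concrete, here is a minimal sketch of a zero-shot call using the (pre-1.0) openai Python package; the model name, prompt, and parameters are illustrative placeholders to adapt to your own use case:

```python
# A minimal zero-shot sketch: no examples are included in the prompt;
# the instruction alone defines the task.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = (
    "Classify the sentiment of this review as Positive, Negative, or Neutral.\n\n"
    "Review: The checkout process was quick and the support team was lovely.\n"
    "Sentiment:"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=5,
    temperature=0,
)
print(response["choices"][0]["text"].strip())  # e.g. "Positive"
```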
Few-shot learning (customize via examples)
- Few-shot learning is a way to use a few task-specific examples (generally fewer than 10) as added context in the input (the prompt), letting the language model learn how to perform well on that specific task from the prompt alone.
- With this method, customizing GPT is extremely cheap (just the additional token cost of your few-shot examples within the prompt) and can be significantly more effective at a specific task than zero-shot learning (see the sketch after this list).
- In this process, however, there is no updating of the model weights and therefore no change to the model itself. You always have to include the few-shot examples in your prompt to keep getting the same task performance.
- If you’re interested in the mechanics of why few-shot learning works, check out this paper.
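Here is a minimal sketch of the same setup with a few-shot prompt, again using the (pre-1.0) openai Python package; the example tickets and labels are made up for illustration. Note that the labeled examples live entirely inside the prompt, which is why they must be re-sent on every request:

```python
# A minimal few-shot sketch: a handful of labeled examples sit directly in the
# prompt, and the model completes the final, unlabeled one.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = (
    "Classify each support ticket as Billing, Bug, or Feature request.\n\n"
    "Ticket: I was charged twice this month.\n"
    "Category: Billing\n\n"
    "Ticket: The export button crashes the app.\n"
    "Category: Bug\n\n"
    "Ticket: It would be great to have a dark mode.\n"
    "Category: Feature request\n\n"
    "Ticket: My invoice shows the wrong company name.\n"
    "Category:"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=5,
    temperature=0,
)
print(response["choices"][0]["text"].strip())  # e.g. "Billing"
```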