If you haven’t already, sign up for The AI Exchange Newsletter where we’ll continue posting helpful resources like this!

<aside> 📝 This is an intermediate-level tutorial that teaches you how to customize GPT using OpenAI’s fine-tuning API endpoint with Python. To get started, you’ll need:

- Working knowledge of Python
- An OpenAI account and API key

</aside>

# Background

- Let’s say you have a TikTok account for your **dog-walking business**, and each week you share tips and tricks about owning a dog with your audience in the form of short, fun videos!
- Here’s an example script from a video we’ve posted in the past:
    
    > *If you're frequently walking your dog in the dark, here are three tips to keep you safe at night.
    Just because it's dark doesn't mean your dog doesn't have to go, and walking around alone at night can pose some real safety risks if you're not careful!
    Our first tip is to wear reflective clothing, and put a reflective collar on your dog, so that you and your furry friend are safe from cars.
    Our second tip is to go unplugged. It's best to avoid using your phone or listening to music while walking your dog at night.
    And third, try to find streets that have street lights or businesses that are still open so that you are not walking completely in the dark.
    We hope this helps you stay safe while you're walking your dog at night!
    Follow for more tips and tricks!*
    > 
- You’ve decided that drafting scripts for these videos every week takes too long. You’ve tried using ChatGPT to write scripts for you, and even added few-shot examples to your prompts, but honestly the quality just doesn’t cut it: the scripts don’t *sound like your brand*!
- Enter fine-tuning. To “teach” GPT our brand’s voice and style, we’ll fine-tune GPT on past TikTok video transcripts as our training data.
- This tutorial will show you how to:
    - Structure your training data (a sketch of the example format follows this list)
    - Fine-tune your own custom GPT model
    - Query that model to generate new TikTok scripts
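
For a rough picture of where we’re headed: the fine-tuning endpoint this tutorial uses learns from prompt/completion pairs. Below is a minimal sketch of what a single training example built from one of our past scripts might look like; the prompt wording, the `\n\n###\n\n` separator, and the ` END` stop sequence are illustrative conventions we’ve chosen, not API requirements.

```python
# One hypothetical training example (prompt/completion pair) built from a past video.
# The separator and stop-sequence strings below are illustrative conventions only.
example = {
    "prompt": "Write a TikTok script about: walking your dog safely at night\n\n###\n\n",
    "completion": (
        " If you're frequently walking your dog in the dark, here are three tips "
        "to keep you safe at night. ... Follow for more tips and tricks! END"
    ),
}
```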

# Let’s get started

[Here’s an example Google Colab notebook for you to follow along!](<https://colab.research.google.com/drive/18v-_JWLRYdPEhFgc6hiZjcSLtJ2CewVL?usp=sharing>)

### 1. Collect or create your training data

- We have already downloaded and transcribed all of our brand’s TikTok videos and will use these transcripts as our training data.
- The transcription process is out of scope for this tutorial; however, we used [Repurpose.io](<http://Repurpose.io>) to download the TikTok audio and [Rev.ai](<http://Rev.ai>), an audio transcription service with an API, to transcribe it.
- OpenAI recommends using at least a hundred examples and has found that each doubling of the dataset size leads to a linear increase in model quality.
- For this fine-tuning tutorial, we used 101 TikTok script transcriptions as our training data.

### 2. Convert your training data to a CSV file

- If your transcription data is in an Excel or Google sheet, simply export it to CSV (File → Download → Comma-separated values).
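
If your transcripts are instead sitting in a folder of plain-text files, here’s a minimal sketch of building that CSV with pandas. The folder name, file layout, and prompt wording are assumptions for illustration; the key point is one prompt/completion row per past script, matching the example format shown earlier.

```python
import glob
import os

import pandas as pd

rows = []
# Hypothetical layout: one transcript per file, e.g. transcripts/night-walking-safety.txt
for path in glob.glob("transcripts/*.txt"):
    topic = os.path.splitext(os.path.basename(path))[0].replace("-", " ")
    with open(path, encoding="utf-8") as f:
        script = f.read().strip()
    # One prompt/completion pair per past video script.
    rows.append({
        "prompt": f"Write a TikTok script about: {topic}",
        "completion": script,
    })

pd.DataFrame(rows).to_csv("tiktok_scripts.csv", index=False)
```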

### 3. Open a new Google Colab notebook

### 4. Install OpenAI’s Python package directly within the notebook

```python
!pip install openai
```
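
With the package installed, the rest of the notebook will need your OpenAI API key. Here’s a minimal sketch, assuming the pre-1.0 `openai` package installed above and an `OPENAI_API_KEY` environment variable (or Colab secret) that you’ve set yourself:

```python
import os

import openai

# Read the key from an environment variable rather than pasting it into the notebook.
openai.api_key = os.getenv("OPENAI_API_KEY")
```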