Podcast notes – ChatGPT goes prime time! – Practical AI (Daniel and Chris)

Practical AI 206: ChatGPT goes prime time! – Listen on Changelog.com

Hosts: Daniel Whitenack (Data scientist), Chris Benson (Lockheed Martin)

All about ChatGPT

Chris – feels collaborative, like having a partner

Lots of structuring in the output – bulleted lists, paragraphs

Humans get things wrong / incomplete all the time – yet we’re holding AI to a higher standard

Shifting to more open access – maybe in response to open source AI products like Stable Diffusion

Chris – expect to see more “fast followers” to ChatGPT soon

TECHNICALS

GPT language models – it’s a “causal language model” not a “masked language model”
Trained to predict the next word in a sequence, based on all previous words
Auto-regressive – predicts next thing based on all previous things, and so forth
That’s why the text develops as if a human were typing
Few shot learning – can change answer style based on your questions and prompts
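
A minimal sketch of the auto-regressive loop described above, not how GPT is actually built. The names train_toy_model, predict_next and generate are made up for illustration, and the toy "model" only looks at the last word, whereas a real causal language model conditions on every previous word. The point is the loop: each predicted word is appended to the context and fed back in.

```python
from collections import Counter, defaultdict


def train_toy_model(corpus):
    """Count which word follows which (a bigram stand-in for a real language model)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, context):
    """Return the most frequent follower of the last context word, or None."""
    # Assumption: this toy only uses context[-1]; GPT uses the whole context.
    followers = counts.get(context[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, prompt, max_new_words=5):
    """Grow the text one word at a time, feeding each prediction back in."""
    context = prompt.split()
    for _ in range(max_new_words):
        nxt = predict_next(counts, context)
        if nxt is None:
            break  # nothing learned for this context, stop early
        context.append(nxt)  # the output becomes part of the next input
    return " ".join(context)
```

Usage: generate(train_toy_model(["a b c d"]), "a") extends the prompt word by word until the toy model has nothing left to predict.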

Zero shot = ask the model to handle a task or input it has not seen before, with no examples in the prompt
Few shot = provide a small number of example inputs / outputs in the prompt to guide the model
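
A rough illustration of the difference at the prompt level. build_prompt and the "Input:/Output:" layout are hypothetical choices for this sketch, not a format the podcast or OpenAI prescribes. With no examples the prompt is zero shot; passing a few (input, output) pairs makes it few shot.

```python
def build_prompt(task, query, examples=()):
    """Assemble a prompt: task description, optional worked examples, then the query."""
    parts = [task]
    for text, answer in examples:
        # Each worked example shows the model the answer style we want.
        parts.append(f"Input: {text}\nOutput: {answer}")
    # The final block is left open for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


zero_shot = build_prompt("Classify the sentiment.", "Great episode")
few_shot = build_prompt(
    "Classify the sentiment.",
    "Great episode",
    examples=[("Loved it", "positive"), ("Too long", "negative")],
)
```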

ChatGPT – trained with “reinforcement learning from human feedback” (RLHF)
Human preference is key part of this

How does it scale?
Human feedback is expensive
3 steps:
1. Pre-train a language model (aka a “policy”) – not based on human feedback
2. Gather human preference data to train a reward model – outputs prediction of human preference
3. Fine tune (1) based on (2)

eg for ChatGPT,
(1) is GPT3.5
(2) sample outputs from (1), add human preference labels, and train a “reward model” on them
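
One common way to turn preference labels into a training signal is a pairwise loss: the reward model should score the output the human preferred above the one they rejected. The notes do not say which loss ChatGPT uses, so treat this as an assumption for illustration; pairwise_preference_loss is a made-up name and the rewards are plain floats rather than model outputs.

```python
import math


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-diff))."""
    diff = reward_preferred - reward_rejected
    # Small when the preferred output already scores higher, large when it scores lower.
    return math.log1p(math.exp(-diff))
```

Usage: pairwise_preference_loss(2.0, 0.0) is smaller than pairwise_preference_loss(0.0, 2.0), so minimizing it pushes the reward model toward agreeing with the human label.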

GPT3 is 100B+ parameters
ChatGPT reward model is 6B parameters

For (2), goal is to reduce harm by adding human feedback into the loop

For (3), score output according to (2), and penalize the fine-tuned model if it strays too far from (1)
Try only to make small iterative changes (adiabatic)
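
A sketch of how "score with (2), penalize drift from (1)" can be combined into a single number. The log-probability difference used as the drift term and the weight beta are assumptions borrowed from common RLHF write-ups, not details given in the episode; penalized_reward is a hypothetical helper.

```python
def penalized_reward(reward_score, logprob_tuned, logprob_pretrained, beta=0.1):
    """Reward model score minus a penalty for drifting away from the pre-trained model."""
    # How much more likely the fine-tuned model finds this output than (1) does.
    drift = logprob_tuned - logprob_pretrained
    # A larger beta keeps the fine-tuned model closer to (1), i.e. smaller steps.
    return reward_score - beta * drift
```

Usage: with identical log-probabilities the penalty is zero and the result is just the reward model score; as the fine-tuned model drifts, the same score is worth less.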

What’s next?
Open research questions – architecture and process for (2) haven’t been fully optimized, a lot to explore there
Will be new language models coming (eg, GPT4, Microsoft, Google) – trying different (1) and (2)

Chris Albon tweet:

Sci-fi got it wrong.

We assumed AI would be super logical and humans would provide creativity.

But in reality it’s the opposite. Generative AI is good at getting an approximately correct output, but if you need precision and accuracy you need a human.