Practical AI 206: ChatGPT goes prime time! – Listen on Changelog.com
Hosts: Daniel Whitenack (Data scientist), Chris Benson (Lockheed Martin)
All about ChatGPT
Chris – feels collaborative, like having a partner
Lots of structuring in the output – bulleted lists, paragraphs
Humans get things wrong / incomplete all the time – yet we’re holding AI to a higher standard
Shifting to more open access – maybe in response to open source AI products like Stable Diffusion
Chris – expect to see more “fast followers” to ChatGPT soon
TECHNICALS
GPT language models – it’s a “causal language model,” not a “masked language model”
Trained to predict next word in sequence of words, but based on all previous words
Auto-regressive – predicts next thing based on all previous things, and so forth
That’s why the text develops as if a human were typing
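The auto-regressive loop above can be sketched in a few lines – here with a toy bigram table standing in for the model (real causal LMs like GPT condition on all previous words, not just the last one; the table and words are made up for illustration):

```python
import random

# Toy stand-in for a causal language model: maps a word to its
# possible next words. Purely illustrative.
BIGRAMS = {
    "the": ["cat", "dog"],
    "cat": ["sat", "ran"],
    "dog": ["ran"],
    "sat": ["down"],
    "ran": ["away"],
}

def generate(prompt, max_new_words=4, seed=0):
    """Auto-regressive generation: predict the next word, append it,
    then feed the extended sequence back in as context."""
    random.seed(seed)
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = BIGRAMS.get(words[-1])
        if not candidates:
            break  # no known continuation; stop generating
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate("the"))
```

Because each word is emitted one at a time and fed back in, the output appears word by word – which is why ChatGPT’s text develops as if a human were typing.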
Few shot learning – can change answer style based on your questions and prompts
Zero shot = model gets only the task / input, with no examples of how to do it
Few shot = provide a small number of inputs / prompts to guide the model
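The zero-shot vs. few-shot difference is just how the prompt is built – e.g. for a sentiment task (the task, labels, and example texts here are made up for illustration):

```python
# Hypothetical prompt templates contrasting zero-shot and few-shot prompting.

def zero_shot_prompt(text):
    # Zero-shot: only the task description and the new input.
    return (
        "Classify the sentiment as positive or negative.\n"
        f"Text: {text}\nSentiment:"
    )

def few_shot_prompt(examples, text):
    # Few-shot: a handful of labeled examples precede the new input,
    # guiding both the answer style and the label vocabulary.
    demos = "\n".join(f"Text: {t}\nSentiment: {s}" for t, s in examples)
    return (
        "Classify the sentiment as positive or negative.\n"
        f"{demos}\nText: {text}\nSentiment:"
    )

examples = [("I loved it", "positive"), ("Terrible service", "negative")]
print(few_shot_prompt(examples, "Not bad at all"))
```

Same model either way – the few extra examples in the prompt are what steer the style of the answer.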
ChatGPT – trained with “reinforcement learning from human feedback” (RLHF)
Human preference is key part of this
How does it scale?
Human feedback is expensive
3 steps:
1. Pre-train a language model (aka a “policy”) – not based on human feedback
2. Gather human preference data to train a reward model – outputs prediction of human preference
3. Fine tune (1) based on (2)
eg for ChatGPT,
(1) is GPT-3.5
(2) outputs data based on (1), add human labels of preference, and train a “reward model”
GPT-3 is 175B parameters
ChatGPT reward model is 6B parameters
For (2), goal is to reduce harm by adding human feedback into the loop
For (3), will penalize if it strays too far from (1), and score output according to (2)
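The step (3) objective can be sketched as reward minus a KL-style penalty – the penalty grows as the fine-tuned policy drifts from the pre-trained model (1), which is what keeps changes small; numbers and the scalar-probability simplification below are made up for illustration (in practice this is per-token inside a PPO-style update):

```python
import math

def rlhf_objective(reward, p_policy, p_pretrained, beta=0.1):
    """Score for one sampled response in RLHF fine-tuning.

    reward       -- scalar from the reward model (step 2)
    p_policy     -- probabilities the fine-tuned policy assigned to
                    the sampled tokens (illustrative scalars)
    p_pretrained -- probabilities the frozen pre-trained model (step 1)
                    assigned to the same tokens
    beta         -- strength of the drift penalty
    """
    # KL-style estimate: sum of log-ratios over the sampled tokens.
    kl_estimate = sum(math.log(p / q) for p, q in zip(p_policy, p_pretrained))
    # High reward is good; drifting far from the pre-trained model is penalized.
    return reward - beta * kl_estimate

# If the policy still matches the pre-trained model exactly,
# the penalty is zero and the score is just the reward.
print(rlhf_objective(1.0, [0.5, 0.4], [0.5, 0.4]))  # → 1.0
```

Raising the policy’s token probabilities above the pre-trained model’s makes the log-ratios positive and the score drop – the “penalize if it strays too far” behavior in the notes.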
Try only to make small iterative changes (adiabatic)
What’s next?
Open research questions – (2) architecture and process hasn’t been fully optimized, lot to explore there
Will be new language models coming (eg, GPT4, Microsoft, Google) – trying different (1) and (2)
Chris Albon tweet:
—
Sci-fi got it wrong.
We assumed AI would be super logical and humans would provide creativity.
But in reality it’s the opposite. Generative AI is good at getting an approximately correct output, but if you need precision and accuracy you need a human.
—