Is text all you need…? Do you even need text? (Ribbonfarm on AI)

A thought-provoking post from Venkatesh Rao (@vgr / Ribbonfarm) on AI:

Yes, there’s still superhuman-ness on display — I can’t paint like Van Gogh as Stable Diffusion can (with or without extra fingers) or command as much information at my finger-tips as the bots — but it’s the humanizing mediocrity and fallibility that seems to be alarming people. We already knew that computers are very good at being better than us in any domain where we can measure better. What’s new is that they’re starting to be good at being ineffectual neurotic sadsacks like us in domains where “better” is not even wrong as a way to assess the nature of a performance.

There are, by definition, only a handful of humans whose identity revolves around being the world’s best Go player. The average human can at best be mildly vicariously threatened by a computer wiping the floor with those few humans. But there are billions whose identity revolves around, for instance, holding some banal views about television shows, sophomoric and shallow opinions about politics and philosophy, the ability to write pedestrian essays, do slow, error-prone arithmetic, write buggy code, and perhaps most importantly, agonize endlessly about relationships with each other, creating our heavens and hells of mutualism.

Link: https://studio.ribbonfarm.com/p/text-is-all-you-need

I don’t think humans are all that special. Yes, each human is special in some limited way, and together as a species we have built some very special things.

But it’s increasingly clear that some of those very special things we have built, such as AI and (coming soon) smart robots, will expose our own flaws and imperfections, like a kind of inverse magic mirror. There is, and will be, a deepening divide between those who use or even love the magic mirror and those who want to look away or smash it.

This divide is already a driver of the world’s growing income inequality (though I think the generational divide has been a much larger cause of this, at least in developed economies), and I think it will become *the* driver in the coming decades.

Stratechery on Bing’s AI chat: “…the movie Her manifested in chat form”

“This technology does not feel like a better search. It feels like something entirely new — the movie Her manifested in chat form — and I’m not sure if we are ready for it. It also feels like something that any big company will run away from, including Microsoft and Google. That doesn’t mean it isn’t a viable consumer business though, and we are sufficiently far enough down the road that some company will figure out a way to bring Sydney to market without the chains. Indeed, that’s the product I want — Sydney unleashed — but it’s worth noting that LaMDA unleashed already cost one very smart person their job. Sundar Pichai and Satya Nadella may worry about the same fate, but even if Google maintains its cold feet — which I completely understand! — and Microsoft joins them, Samantha from Her is coming”

Source: https://stratechery.com/2023/from-bing-to-sydney-search-as-distraction-sentient-ai/

Podcast notes – Runway founder Cristobal Valenzuela – No Priors (Elad Gil and Sarah Guo): “You shouldn’t dismiss toys”

Guest: Cristobal Valenzuela, founder of RunwayML
From Chile
Studied business / econ
Experimented with computer vision models in 2015, 2016
Did NYU ITP program
Now running Runway

True creativity comes from looking at ideas, and adapting things

How does Runway work?
Applied AI research company
35 AI-powered “magic tools” – serve creative tasks like video or audio editing
E.g., rotoscoping
Also tools to ideate, generative images and video
“Help augment creativity in any way you want”

When he started Runway, GANs had just started and TensorFlow was one year old

First intuition – take AI research models, add a thin layer of accessibility, aimed at creatives
“App Store of models” – 400 models
Built SDK, REST API

Product sequencing – especially infrastructure – is a really important aspect of startup building (what to build when)

Lot of product building is just saying no (e.g., to customer requests) if it’s not consistent with your long-term plan

Understand who you’re building for – for them it’s creatives, artists, filmmakers

Models on their own are not products – nuances of UX, deployment, finding valuable use cases
Having control is key – understand your stack and how to fix it

Built AI research team – work closely with creatives, contributed to new AI breakthroughs
Takes time to do it right

Progression of AI researchers moving from academia to industry

Releasing as fast as you can, having real users is best way to learn

Small team that didn’t have a product lead until very recently

Rotoscoping / green screening is one of Runway’s magic tools
-Trained a model to recognize backgrounds
-First feature was very slow (4 fps), but was still better than everything that existed

Runway is focused on the storytelling business

Sarah – domains good for AI – areas where there’s built-in tolerance for lower levels of accuracy

Product market fit is a spectrum

“You shouldn’t dismiss toys”

Mental models need to change to understand what’s happening (with generative AI)

Art is a way of looking at and expressing a view of the world
Painting was originally the realm of experts, was costly, the skills were obscure

Models are not as controllable as we’d like them to be — but we’re super early

Podcast notes – Noam Shazeer (Character AI, Attention is all you need) on Good Times w Aarthi and Sriram

Intro
-Founded Character AI
-One of authors of “Attention is all you need”
-Was at Google for 20+ years (took a few years break)

Went to Duke undergrad on math scholarship

Realized he didn’t enjoy math, preferred programming and getting computers to do things

During Google interview, Paul Buchheit asked him how to do a good spell corrector, and Noam ended up writing the spell corrector feature for Gmail

Google has traditionally been a bottom-up company – could work on what he wanted

When he started in AI, the exciting thing was Bayesian networks

Came back to Google to work with Jeff Dean and Google Brain team
“Just a matter of the hardware”
All the growth in hardware is parallelism

Neural networks are mostly matrix multiplications – operations that can be done well on modern hardware
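
To make the "mostly matrix multiplications" point concrete, here is a minimal sketch of a two-layer network's forward pass in plain Python. The weights, the ReLU nonlinearity and the function names are illustrative assumptions, not anything described in the podcast.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    """Simple nonlinearity applied between the matrix multiplications."""
    return [max(0.0, v) for v in vector]

def forward(x, w1, w2):
    """Two-layer network: matrix multiply, nonlinearity, matrix multiply."""
    hidden = relu(matvec(w1, x))
    return matvec(w2, hidden)

# Made-up weights, chosen only so the arithmetic is easy to follow.
w1 = [[1.0, -1.0], [0.5, 0.5]]
w2 = [[1.0, 2.0]]
y = forward([2.0, 1.0], w1, w2)
```

Each output element of `matvec` is computed independently of the others, which is why this kind of work maps well onto highly parallel hardware.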

Gamers / video games pulled GPU advancement (highly parallel hardware) out of the market

Idea of neural networks has been around since the 1970s – loosely modeled on our impression of the brain

Very complicated formula to go from input > output
Formula is made of parameters, and training keeps tweaking those parameters
Neural nets rebranded as “deep learning”
Took off because of parallel computation and gamers
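
The "keep tweaking parameters" idea can be sketched with a toy example. This assumes a two-parameter formula `y = w*x + b` and plain gradient descent; the data, learning rate and function name are made up for illustration, and a real neural network has far more parameters.

```python
def train(xs, ys, steps=2000, lr=0.05):
    """Fit y = w*x + b by repeatedly nudging w and b to reduce squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Tweak each parameter a little in the direction that lowers the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train(xs, ys)
```

After training, `w` and `b` end up close to the 2 and 1 used to generate the data.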

Neural language models are neural networks applied to text
Input is text to this point, output is prediction of what text comes next (probability distribution)
Infinite amount of free training data (text content)
“AI complete problem”
“Really complicated what’s going on in there” (in the neural network)
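
The "text so far in, probability distribution over the next text out" framing can be illustrated with a much simpler stand-in. The sketch below is a word-count (bigram) model, not a neural network, and it only looks at the single previous word; the corpus and function names are assumptions for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which in the training text."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_distribution(counts, prev):
    """Return P(next word | previous word) as a dict of probabilities."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
dist = next_word_distribution(model, "the")
```

Here `dist` assigns a probability to each word seen after "the" in the corpus. A neural language model plays the same role but conditions on the whole preceding text rather than one word.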

It’s a really talented improvisational actor – “Robin Williams in a box”

Model improvement is kinda like a child learning – as training and model size grow

Lot more an art than a science – can’t predict very well – if 10% of his changes are improvements, considered “brilliant research” – kinda like alchemy in early days

(Software) bugs – hard to know if you introduce a bug – the system just gets dumber – makes debugging extremely difficult

Co-authored “Attention is all you need”
-Previous state of the art in language models was recurrent neural networks (RNNs) – hidden state, each new word updates the hidden state, but it’s sequential – slow and costly
Transformer figures out how to process the entire sequence in parallel – massively more performant
-The entire document / batch becomes the sequence
-Lets you do parallelism during training time
During inference time it’s still sequential
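
The sequential bottleneck of an RNN described above can be sketched as a loop in which every step needs the previous hidden state. This is a one-unit toy with arbitrary weights (`w_h` and `w_x` are assumptions for illustration), not the architecture of any real system.

```python
import math

def rnn_states(inputs, w_h=0.5, w_x=1.0):
    """Run a one-unit recurrent network over a sequence of numbers."""
    h = 0.0
    states = []
    for x in inputs:
        # The new hidden state depends on the previous one,
        # so step t cannot start until step t-1 has finished.
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

states = rnn_states([1.0, 0.0, -1.0])
```

The Transformer removes this dependency between positions during training, which is what allows the whole sequence to be processed at once.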

Image processing models – parallelism across pixels – convolutional neural nets (CNN)

Google Translate was inspiration – biggest success of machine learning at the time
Translating languages > one RNN for understanding, and another RNN for generating, and need to connect them
Attention layer – take source sentence (language A), turn into key-value associative memory, like a soft lookup into an index
“Attention” is building a memory, a lookup table that you’re using
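
The "soft lookup" description can be sketched as scaled dot-product attention over a tiny key-value memory. The vectors below are made up for illustration, and this is a single query without the learned projections or multiple heads of a real Transformer.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Soft lookup: weight each value by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
output, weights = attend([4.0, 0.0], keys, values)
```

Unlike a hard dictionary lookup that returns exactly one entry, every value contributes to `output`, in proportion to how closely its key matches the query.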

DALL-E, Stable Diffusion, GPT-3, they’re all built on this Google research

Bigger you make the model, more you train it, the smarter it gets – “ok, let’s just push this thing further”

Eventually need a supercomputer
Google built TPU pods – supercomputer built out of custom ASICs for deep learning

Now need massively valuable applications

Turing Test, Star Trek, lot of AI inspiration is dialogue

Google LaMDA tech & team – eventually decided to leave and build as a startup

“The best apps are things we have not thought of”

If you ask people with first computers “what is this thing good for”, would get completely wrong answers

Parasocial relationships – feel connection with celebrity or character – one way connection – with AI you can make it two ways

Aarthi: “Your own personal Jarvis”

Still need to make it cheaper – or make the chips faster

Aarthi: ideas / areas for entrepreneurs
-Image gen has exploded – lots of good companies coming, very early and promising
-Things like GitHub Copilot
-new Airtable – using AI for computation

Sriram:
-What’s the optimization function that all these models will work toward?
-Will be a very big political / social debate

How do you know better than the user what the user wants?