Recent startup, tech, AI, crypto learnings: Industrial Revolution freed people from using muscles, AI will free people from using brain

“Learning always wins,” said Jones. “The history of AI reflects the reality that it always works better to have a model learn something for itself rather than have a human hand-engineer it. The deep learning revolution itself was an example of this, as we went from building feature detectors by hand to letting neural networks learn their own features. This is going to be a core philosophy for us at Sakana AI, and we will draw on ideas from nature including evolution to explore this space.”

Catastrophic forgetting: a scary name for when a model, during fine-tuning, forgets some of the base knowledge it learned in pre-training. If you run into this, there are a few ways to mitigate it.
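One common mitigation (alongside lower learning rates, adapters such as LoRA, or regularizers like EWC) is rehearsal: mixing a slice of the original pre-training data back into each fine-tuning batch so the model keeps seeing its old distribution. A minimal sketch of the batching side, with hypothetical data lists:

```python
import random

def mixed_batches(finetune_data, pretrain_data, replay_ratio=0.25,
                  batch_size=8, seed=0):
    """Yield fine-tuning batches with a fraction of replayed pre-training
    examples mixed in, so the model keeps seeing its original distribution."""
    rng = random.Random(seed)
    n_replay = int(batch_size * replay_ratio)
    n_new = batch_size - n_replay
    for start in range(0, len(finetune_data), n_new):
        batch = finetune_data[start:start + n_new]          # new task data
        batch += rng.sample(pretrain_data,                  # replayed data
                            min(n_replay, len(pretrain_data)))
        rng.shuffle(batch)
        yield batch
```

The `replay_ratio` here is an illustrative knob, not a recommended value; in practice it is tuned per task.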

Launch memecoin (no roadmap, just for fun) → Raise capital → Form a tribalistic community early on → Build apps/infrastructure → Continually add utility to the memecoin without making false promises or publishing roadmaps

One developer already created a Slack workspace where he and his friend hang out with a group of bots that have different personalities, interests, and skills.

In reality, Navboost has a specific module entirely focused on click signals.
The summary of that module defines it as “click and impression signals for Craps,” one of the ranking systems. As we see below, bad clicks, good clicks, last longest clicks, unsquashed clicks, and unsquashed last longest clicks are all considered as metrics. According to Google’s “Scoring local search results based on location prominence” patent, “Squashing is a function that prevents one large signal from dominating the others.” In other words, the systems are normalizing the click data to ensure there is no runaway manipulation based on the click signal.
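Google hasn't published the actual squashing function, but any saturating function has the property the patent describes: it grows with the signal yet is bounded, so one large signal can't dominate the combination. An illustrative sketch using x / (1 + x) (the specific function is an assumption, not Google's):

```python
def squash(x):
    """A saturating 'squash': roughly linear for small signals but
    bounded by 1, so no single signal can dominate the others.
    (Illustrative only; Google's actual function isn't public.)"""
    return x / (1.0 + x)

def combine(signals):
    """Combine click signals after squashing each one, so a manipulated
    runaway signal contributes at most ~1 to the total."""
    return sum(squash(s) for s in signals)
```

A signal of 1,000,000 clicks still contributes less than 1 to the combined score, while two honest signals of 1 each contribute 0.5 apiece.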

Industrial Revolution largely freed people from using brawn
AI will largely free people from using brain

Unfortunately, there are no solid predictions we can make about this stage. At the end of the day the startup just has to be lucky enough to start close enough, and navigate optimally enough, to hit its first discovery before the company disintegrates from lack of funding or team morale. The process can be as fast as a few months or as long as a decade.

Six of the eight web companion products bill themselves as “uncensored,” which means users can have conversations or interactions with them that may be restricted on platforms like ChatGPT. Users largely access these products via mobile web, as opposed to desktop — though almost none of them offer apps. On average, 75 percent of traffic to the uncensored companion tools on our web list comes from mobile.

🍰 Only 4 out of 70+ projects I ever did made money and grew
📉 >95% of everything I ever did failed
📈 My hit rate is only about ~5%
🚀 So…ship more
— @levelsio

Vitalik said L3s are good for customization (L2s for scaling), and for specific kinds of scaling

It’s inspiring to know that at any moment in time there is an infinite number of true statements for new startups to discover and further expand our collective system. Gödel’s theorem is not really about our limits: it’s about possibilities always waiting to be discovered. The process is certainly hard and alien to us.

No nation has ever become the major power without a clear lead in technology, both civilian and military. From the Roman legions, to the naval powers of Portugal, Spain and Great Britain, to Germany in World War I and the US post-World War II, great power status was achieved by those nations that were able to harness their technological advantage for holistic development of their civilian and military capabilities.

This holds at a higher level of conceptual abstraction: looking near a feature related to the concept of “inner conflict”, we find features related to relationship breakups, conflicting allegiances, logical inconsistencies, as well as the phrase “catch-22”. This shows that the internal organization of concepts in the AI model corresponds, at least somewhat, to our human notions of similarity. This might be the origin of Claude’s excellent ability to make analogies and metaphors.

Language differences mean that Chinese firms really are in the hot seat for developing domestic AI products. OpenAI’s most recent version of ChatGPT, GPT-4o, has real issues in China. MIT Technology Review reported that its Chinese token-training data is polluted by spam and porn websites.

Metaplanet becomes Japan’s top-performing stock this week, hitting a +50% daily gain limit for two consecutive days. The company plans to increase its authorized shares by 300% to acquire more BTC for its reserves.

Community is made of people; culture is made up of shared memes. Community can be transient; culture is much more persistent. “Community” can be formed with a free airdrop; culture can only be formed with a sustained commitment to creating a common story.

Every memecoin is an exquisitely precise ad, a self-measuring barometer of attention: the price jumps if people talk about the memecoin and drops if they don’t

McLuhan believed transformative new technologies, like the stirrup or printing press, extend a man’s abilities to the point where the current social structure must change to accommodate it. Just as the car created the Interstate Highway System, the suburb, and the oil industry, so the stirrup helped create a specialized weapon system (knights) that required land and pasture to support it and provide for training and material.

Wall Street is not going to stand idly by while Tether makes more money than Goldman Sachs.

Terminator: In three years, Cyberdyne will become the largest supplier of military computer systems. All stealth bombers are upgraded with Cyberdyne computers, becoming fully unmanned. Afterwards, they fly with a perfect operational record. The Skynet funding bill is passed. The system goes online on August 4th, 1997. Human decisions are removed from strategic defense. Skynet begins to learn at a geometric rate. It becomes self-aware 2:14 AM, Eastern time, August 29th. In a panic, they try to pull the plug.

“Hyperscalers”, which are all looking to create a full stack with an AI model powerhouse at the top and hardware that powers it underneath: OpenAI (models) + Microsoft (compute), Anthropic (models) + AWS (compute), Google (both), and Meta (increasingly both, via doubling down on its own data center buildout).

The Stability AI founder’s recent decision to step down in order to start “decentralizing” his company is one of the first public hints at that. He had previously made no secret of his plans to launch a token in public appearances, but only after the successful completion of the company’s IPO – which sort of gives away the real motives behind the anticipated move.

An additional limitation of transformer models is their inability to learn continuously. Today’s transformer models have static parameters. When a model is trained, its weights (the strength of the connections between its neurons) are set; these weights do not update based on new information that the model encounters as it is deployed in the world.

All of this equipment and these processes consume large amounts of energy. A large fab might demand 100 megawatts of power, or 10% of the capacity of a large nuclear reactor. Most of this energy is used by the process tools, the HVAC system, and other heating/cooling systems. The demands for power and water are severe enough that some fabs have been canceled or relocated when local utilities can’t guarantee supply.

If we considered things in “capital cost per component” terms, and considered transistors as individual components, semiconductor fabs are actually probably among the cheapest manufacturing facilities.
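A back-of-the-envelope version of that claim, with all figures as loudly illustrative assumptions rather than sourced numbers:

```python
# Illustrative arithmetic only -- every figure below is an assumption:
fab_cost_usd = 20e9            # assume a $20B leading-edge fab
wafers_per_month = 50_000      # assumed wafer output
transistors_per_wafer = 1e13   # assumed: hundreds of dies x ~10B transistors each
years = 5                      # assumed depreciation window

total_transistors = wafers_per_month * 12 * years * transistors_per_wafer
cost_per_transistor = fab_cost_usd / total_transistors
# Under these assumptions, capital cost per transistor is a tiny
# fraction of a cent -- hard for any other "per component" factory to match.
```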

Build for where models will be in 1-2 years, not where they are today. Bake the challenges of inference at scale into your roadmap. And don’t just think in terms of prompting one mega model and getting an answer back. Plan for the iterative systems design, engineering, and monitoring work needed to make your AI product the proverbial “10x better” than existing alternatives.

Ethereum’s ICO returns were 1.5x higher than those available on market. Solana’s seed round returns were 10x higher than those available on market. OP’s seed round returns were 30x higher than those available on market.

Across every major ETH NFT project, more than 3/4 of all NFTs haven’t traded once in 2024.
-95% of punks
-93% of world of women
-87% of BAYC
-87% of MFers
are just sitting in wallets through this year’s moves.

We have to start by understanding the really important parts and building that core functionality first, then building additional features around that core. When you’re building consumer products, getting serious leverage in the marketplace (distribution) is the most important first order goal, so you need to accomplish this as quickly as possible and then shift gears to build second generation features.

Every VC fund with a consumer investing team is one foot in, one foot out of consumer. Even when startups hit the desired milestones & metrics, investors are still unclear which to bet on because the past decade of consumer investing hasn’t yielded many big wins, barriers to entry are low, and AI makes the future of human-tech interaction uncertain.

Angel investing, especially with small checks, is only good for two things: 1) getting into contractual friendships with founders you respect 2) building a track record for being a full-time venture capitalist (raising a fund or joining one).

Mustafa Suleyman has argued that the real Turing Test that matters is whether a given AI can go off and earn $100,000 for you on the internet. I would argue the test that’s more relevant — and consequential — is whether an AI can empty your inbox.

From dataset Google doc memo:
“Few know how to train efficient models” meant “Few know how to craft informative datasets.”

All the consumer graphics cards on the Internet could not compete with a mere thousand GPUs in a supercomputer.

Data cleaning, data curation, and data synthesis do not have this problem: dataset creation is a series of (mostly) parallel operations. This makes dataset creation perfectly suited to distributed computation, as one finds on AI blockchains. We can build good datasets together.

Web2 sports betting losing market share to memecoins

Fabs must limit vibrations to several orders of magnitude below the threshold of perception, while simultaneously absorbing 100 times the mechanical energy and moving 50 times the air flow of a conventional building.

An interesting phenomenon evident in blockchain ecosystems is that the networks with the stickiest communities are the ones where a broad base of developers and users had an opportunity to benefit financially from their participation. Think Ethereum and Solana, which have two of the strongest developer communities: their native tokens were publicly available at a price far below the current value. In contrast, ecosystems where network tokens launch at a highly efficient market price tend to struggle to retain a passionate community of developers and users, to the long-term detriment of the ecosystem.

Bitcoin surpasses 1 billion confirmed transactions, averaging over 178,000 transactions per day since its launch in 2009.
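A quick sanity check of that average (the milestone date here is an approximation):

```python
from datetime import date

genesis = date(2009, 1, 3)     # Bitcoin genesis block
milestone = date(2024, 5, 5)   # approximate date the 1-billionth tx was reported
days = (milestone - genesis).days
avg_per_day = 1_000_000_000 / days
# ~5,600 days in, 1 billion transactions works out to roughly 178k/day,
# matching the quoted figure.
```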

We are relatively cheaper and don’t bill by the hour. We get more done. We hire and fire firms. CEOs trust _us_.
As a result, the number of in-house lawyers has grown at 7.5x the rate of other kinds of lawyers over the last 25 years. The role of “product counsel” boomed, just like the role of product manager in this time.
Today Google employs 828 “product counsel.” That’s more than all but the biggest law firms.

The number one predictor of job retention is whether an employee has a friend at work

We don’t sell saddles essay
-The best — maybe the only? — real, direct measure of “innovation” is change in human behaviour. In fact, it is useful to take this way of thinking as definitional: innovation is the sum of change across the whole system, not a thing which causes a change in how people behave. No small innovation ever caused a large shift in how people spend their time and no large one has ever failed to do so.
-Because the best possible way to find product-market fit is to define your own market.

Transformers’ fundamental innovation, made possible by the attention mechanism, is to make language processing parallelized, meaning that all the words in a given body of text are analyzed at the same time rather than in sequence.
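A minimal NumPy sketch of that parallelism: scaled dot-product attention scores every position against every other position in a single matrix multiply, rather than scanning the sequence token by token. (Toy dimensions; no masking or multi-head logic.)

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: all positions attend to all
    positions in one matrix multiply -- no sequential scan."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # (seq, seq) all pairs at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                   # weighted mix of values

seq_len, d_model = 5, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))
out = attention(x, x, x)   # self-attention over the whole sequence in parallel
```

Because `scores` covers every token pair simultaneously, the computation maps onto GPU matrix hardware, which is exactly what made transformers so much faster to train than recurrent models.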

I’ve been making chatbots since the days of AI Dungeon, and have seen the cycle multiple times. A new site appears with low censorship and free content generation. It grows a user base, starts introducing more censorship, raises prices, and before long it becomes unusable and people move on to the next one. Poe has been around for longer than most and I’m only seeing improvements on it. Plus it’s operated by Quora, which I think will give it added sustainability.

Friend.tech is Uniswap for social tokens

Steve Jobs figured out that “you have to work hard to get your thinking clean to make it simple.” – Taleb

I eventually think these open-source LLMs will beat the closed ones, since there are more people training and feeding data to the model for the shared benefit.
Especially because these open-source models can be 10x cheaper than GPT-3, or even 20x cheaper than GPT-4, when running on Hugging Face (or locally, essentially free: you just pay for electricity and the GPU)

In a 1985 interview Wozniak posited: “The home computer may be going the way of video games, which are a dying fad” – alluding to the 1983 crash in the video game market. Wozniak continued:
“for most personal tasks, such as balancing a check book, consulting airline schedules, writing a modest number of letters, paper works just as well as a computer, and costs less.”

He seemed well aware of the heretical nature of his statements, telling a reporter: “Nobody at Apple is going to like hearing this, but as a general device for everyone, computers have been oversold”and that “Steve Jobs is going to kill me when he hears that.”

Bonus (Reality Check): What Are The Odds You Get Acquired Within 5 Years for a Good Price? Around 1%-1.5% by Jason Lemkin
Data on 3,067 startups founded in 2018. The takeaway: It’s the second 5 years where the real value starts to compound. Startups are a long game

The subset of parameters is chosen according to which parameters have the largest (approximate) Fisher information, which captures how much changing a given parameter will affect the model’s output. We demonstrate that our approach makes it possible to update a small fraction (as few as 0.5%) of the model’s parameters while still attaining similar performance to training all parameters.
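A simplified sketch of that selection rule, using the common diagonal approximation where per-parameter Fisher information is estimated by the squared gradient (the 0.5% fraction and array shapes here are illustrative, not the paper's exact setup):

```python
import numpy as np

def top_fisher_mask(grads, fraction=0.005):
    """Approximate per-parameter Fisher information as the squared
    gradient, then mask in only the top `fraction` of parameters."""
    fisher = grads ** 2                          # diagonal Fisher approximation
    k = max(1, int(fraction * fisher.size))
    threshold = np.sort(fisher.ravel())[-k]      # k-th largest value
    return fisher >= threshold                   # boolean mask of params to update

rng = np.random.default_rng(0)
grads = rng.normal(size=10_000)                  # stand-in for real gradients
mask = top_fisher_mask(grads, fraction=0.005)    # ~0.5% of parameters selected
```

During fine-tuning, gradient updates would then be multiplied by this mask so that only the high-Fisher parameters move.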

AI learnings 1: AI = infinite interns

All below are copy-pasted from original sources, all mistakes mine! :))

We extensively used prompt engineering with GPT-3.5 but later discovered that GPT-4 was so proficient that much of the prompt engineering proved unnecessary. In essence, the better the model, the less you need prompt engineering or even fine-tuning on specific data.

Harder benchmarks emerge. AI models have reached performance saturation on established benchmarks such as ImageNet, SQuAD, and SuperGLUE, prompting researchers to develop more challenging ones. In 2023, several challenging new benchmarks emerged, including SWE-bench for coding, HEIM for image generation, MMMU for general reasoning, MoCa for moral reasoning, AgentBench for agent-based behavior, and HaluEval for hallucinations.

If you’re training an LLM with the goal of deploying it to users, you should prefer training a smaller model well into the diminishing-returns part of the loss curve.

When people talk about training a Chinchilla-optimal model, this is what they mean: training a model that matches their estimates for optimality. They estimated the optimal model size for a given compute budget, and the optimal number of training tokens for a given compute budget.
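As a rough sketch of what those estimates imply: the widely cited approximations are that training compute is C ≈ 6·N·D (N parameters, D tokens) and that the Chinchilla-optimal data budget is roughly 20 tokens per parameter. Under those assumptions:

```python
def chinchilla_optimal(compute_flops):
    """Rough Chinchilla rule of thumb: C ~= 6 * N * D with D ~= 20 * N,
    so N ~= sqrt(C / 120). An approximation of the paper's fitted
    scaling laws, not the exact fit."""
    n_params = (compute_flops / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# e.g. a 1e23-FLOP budget comes out to roughly a 29B-parameter model
# trained on roughly 580B tokens:
n, d = chinchilla_optimal(1e23)
```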

However, when we talk about “optimal” here, what is meant is “the cheapest way to obtain a given loss level, in FLOPS.” In practice, though, most of us don’t care about that answer! It is exactly the answer you care about if you’re a researcher at DeepMind/FAIR/AWS who is training a model with the goal of reaching a new SOTA so you can publish a paper and get promoted. If you’re training a model with the goal of actually deploying it, the total cost is going to be dominated by the inference cost. This has two implications:

1) there is a strong incentive to train smaller models which fit on single GPUs

2) we’re fine trading off training time efficiency for inference time efficiency (probably to a ridiculous extent).

Chinchilla implicitly assumes that the majority of the total cost of ownership (TCO) for an LLM is the training cost. In practice, this is only the case if you’re a researcher at a research lab who doesn’t support products (e.g. FAIR/Google Brain/DeepMind/MSR). For almost everyone else, the amount of resources spent on inference will dwarf the amount of resources spent during training.

There is no cost- or time-effective way to do useful online training on a highly distributed architecture of commodity hardware. This would require a big breakthrough that I’m not aware of yet. It’s why FANG spends more money than all the liquidity in crypto to acquire expensive hardware, network it, maintain data centers, etc.

A reward model is subsequently developed to predict these human-given scores, guiding reinforcement learning to optimize the AI model’s outputs for more favorable human feedback. RLHF thus represents a sophisticated phase in AI training, aimed at aligning model behavior more closely with human expectations and making it more effective in complex decision-making scenarios
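The reward-model fitting step described above is commonly trained with a pairwise (Bradley-Terry style) loss on human preference comparisons: push the score of the human-preferred response above the rejected one. A minimal sketch, assuming scalar reward scores have already been produced for the two responses:

```python
import numpy as np

def pairwise_loss(r_chosen, r_rejected):
    """Bradley-Terry style loss for reward-model training:
    -log(sigmoid(r_chosen - r_rejected)). Small when the model scores
    the human-preferred response higher than the rejected one."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# A reward model that agrees with the human label incurs a small loss:
agree = pairwise_loss(1.2, -0.3)
disagree = pairwise_loss(-0.3, 1.2)
```

Gradients of this loss train the reward model; the resulting scores then serve as the reward signal for the reinforcement-learning phase.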

Lesson 3: improving latency with a streaming API and showing users variable-speed typed words was actually a big UX innovation of ChatGPT

Lesson 6: vector databases, and RAG/embeddings are mostly useless for us mere mortals
I tried. I really did. But every time I thought I had a killer use case for RAG / embeddings, I was confounded.
I think vector databases / RAG are really meant for Search. And only search. Not search as in “oh – retrieving chunks is kind of like search, so it’ll work!”, but real Google-and-Bing search

There are fundamental economic reasons for that: between GPT-3 and GPT-3.5, I thought we might be in a scenario where the models were getting hyper-linear improvement with training: train it 2x as hard, and it gets 2.2x better.
But that’s not the case, apparently. Instead, what we’re seeing is logarithmic. And in fact, per-token latency and cost are growing exponentially for incremental improvements.

Bittensor is still in its infancy. The network boasts a dedicated, almost cult-like community, yet the overall number of participants remains modest: roughly 50,000 active accounts. The most bustling subnet, SN1, dedicated to text generation, has about 40 active validators and over 990 miners.

Mark Zuckerberg, CEO of Meta, remarks that after they built machine learning algorithms to detect obvious offenders like pornography and gore, their problems evolved into “a much more complicated set of philosophical rather than technical questions.”

AI is, at its core, a philosophy of abundance rather than an embrace of scarcity.

AI thrives within blockchain systems, fundamentally because the rules of the crypto economy are explicitly defined, and the system allows for permissionlessness. Operating under clear guidelines significantly reduces the risks tied to AI’s inherent stochasticity. For example, AI’s dominance over humans in chess and video games stems from the fact that these environments are closed sandboxes with straightforward rules. Conversely, advancements in autonomous driving have been more gradual. The open-world challenges are more complex, and our tolerance for AI’s unpredictable problem-solving in such scenarios is markedly lower

generative model outputs may ultimately be best evaluated by end users in a free market. In fact, there are existing tools available for end users to compare model outputs side-by-side as well as benchmarking companies that do the same. A cursory understanding of the difficulty with generative AI benchmarking can be seen in the variety of open LLM benchmarks that are constantly growing and include MMLU, HellaSwag, TriviaQA, BoolQ, and more – each testing different use cases such as common sense reasoning, academic topics, and various question formats.

This is not getting smaller. There’s not gonna be less money in generative AI next year, it’s a very unique set of circumstances, AI + crypto is not going to have less capital in a year or two. – Emad re: AI+crypto

Benefits of AI on blockchain
CODSPA = composability, ownership, discovery, safety, payments, alignment

The basis of many-shot jailbreaking is to include a faux dialogue between a human and an AI assistant within a single prompt for the LLM. That faux dialogue portrays the AI Assistant readily answering potentially harmful queries from a User. At the end of the dialogue, one adds a final target query to which one wants the answer.

At the moment when you look at a lot of data rooms for AI products, you’ll see a TON of growth — amazing hockey sticks going 0 to $1M and beyond — but also very high churn rates

Vector search is foundational for retrieval-augmented generation (RAG) architectures because it provides the ability to glean semantic value from the datasets we have and, more importantly, to continually add context to those datasets, augmenting the outputs to be more and more relevant.
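At its core, that retrieval step is nearest-neighbor search over embeddings. A minimal cosine-similarity sketch, with toy 2-D vectors standing in for real embeddings:

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Retrieve the k documents whose embeddings are most similar
    (by cosine similarity) to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity to every document
    return np.argsort(-sims)[:k]      # indices of the k best matches

# Toy corpus: docs 0 and 1 point roughly the same direction as the query.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
idx = top_k(np.array([1.0, 0.05]), docs, k=2)
```

In a real RAG pipeline the retrieved chunks are then stuffed into the prompt as added context; production systems use approximate-nearest-neighbor indexes rather than this brute-force scan.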

From Coinbase report on AI+Crypto

Nvidia’s February 2024 earnings call revealed that approximately 40% of their business is inferencing, and Satya Nadella made similar remarks in the Microsoft earnings call a month prior in January, noting that “most” of their Azure AI usage was for inferencing.

The often touted blanket remedy that “decentralization fixes [insert problem]” as a foregone conclusion is, in our view, premature for such a rapidly innovating field. It is also preemptively solving for a centralization problem that may not necessarily exist. The reality is that the AI industry already has a lot of decentralization in both technology and business verticals through competition between many different companies and open source projects.

AI = infinite interns

We struggle to align humans — how do we align AI?

One of the most important trends in the AI sector (relevant to crypto-AI products), in our view, is the continued culture around open sourcing models. More than 530,000 models are publicly available on Hugging Face (a platform for collaboration in the AI community) for researchers and users to operate and fine-tune. Hugging Face’s role in AI collaboration is not dissimilar to relying on GitHub for code hosting or Discord for community management (both of which are used widely in crypto).

We estimate the market share in 2023 was 80%–90% closed source, with the majority of share going to OpenAI. However, 46% of survey respondents mentioned that they prefer or strongly prefer open source models going into 2024

Enterprises still aren’t comfortable sharing their proprietary data with closed-source model providers out of regulatory or data security concerns—and unsurprisingly, companies whose IP is central to their business model are especially conservative. While some leaders addressed this concern by hosting open source models themselves, others noted that they were prioritizing models with virtual private cloud (VPC) integrations.

That’s because 2 primary concerns about genAI still loom large in the enterprise: 1) potential issues with hallucination and safety, and 2) public relations issues with deploying genAI, particularly into sensitive consumer sectors (e.g., healthcare and financial services)

The killer app comes first (in both crypto and AI)

In crypto there’s a constant chicken and egg debate of apps versus infrastructure. Which is more important? Which accrues more value? As an entrepreneur, which should I build?

For me, this 2018 article effectively settles the debate: https://www.usv.com/writing/2018/10/the-myth-of-the-infrastructure-phase/

The answer: It depends what part of the cycle we’re in.

But when you look at the history of general technologies, the killer app comes first. The infrastructure follows.

For example, light bulbs (the app) were invented before there was an electric grid (the infrastructure). You don’t need the electric grid to have light bulbs. But to have the broad consumer adoption of light bulbs, you do need the electric grid, so the breakout app that is the light bulb came first in 1879, and then was followed by the electric grid starting 1882. (The USV team book club is now reading The Last Days Of Night about the invention of the light bulb).

You could say a series of technological breakthroughs (eg, the right filaments, the right glass container) enabled the first “killer app” (💡💡💡) which then incentivized the infra.

Another example:

Planes (the app) were invented before there were airports (the infrastructure). You don’t need airports to have planes. But to have the broad consumer adoption of planes, you do need airports, so the breakout app that is the airplane came first in 1903, and inspired a phase where people built airlines in 1919, airports in 1928, and air traffic control in 1930, only after there were planes.

Same pattern here: a series of new technologies (lightweight engines, proper control mechanisms) enabled the first “killer app” (🛫🛫🛫) which then incentivized the infra.

Crypto’s first killer app is right under our noses: Bitcoin itself.

The killer app was Bitcoin! And what it represents: a sovereign store of value tied to an uncensorable payment network.

Satoshi’s technology breakthrough enabled the killer app (Bitcoin) which has now enabled more than a decade of crypto infrastructure buildout, from alternative Layer 1s to smart contracts to new blockchain primitives.

In generative AI, I think a similar pattern is also unfolding:

ChatGPT was the first AI killer app. The lightbulb moment. 100M+ users within months of launch and one of the fastest growing consumer apps of all time.

ChatGPT opened investors’ eyes, blew users’ minds, and now everyone from Google to SoftBank to the CCP is spending billions ($7 trillion??) to build and buy AI infrastructure.

And steadily and surely, much of this infrastructure investment and innovation will make AI better, faster, and cheaper. Then more killer apps will be built atop all the GPUs, foundation models, and SDKs. Which then begets more infra. And the cycle continues.

8 thought leaders on the intersection of AI + crypto — Fred Wilson: “AI and Web3 are two sides of the same coin. AI will help make web3 usable for mainstream applications and web3 will help us trust AI”

I posted the original thread here:

Fred Wilson

AI and Web3 are two sides of the same coin. AI will help make web3 usable for mainstream applications and web3 will help us trust AI. Together they will lead to a more powerful, more resilient, more trusted, and more equitable Internet

https://avc.xyz/what-will-happen-in-2024

Vitalik Buterin

It’s a reasonable question: crypto and AI are the two main deep (software) technology trends of the past decade, and it just feels like there must be some kind of connection between the two. It’s easy to come up with synergies at a superficial vibe level: crypto decentralization can balance out AI centralization, AI is opaque and crypto brings transparency, AI needs data and blockchains are good for storing and tracking data.

https://vitalik.eth.limo/general/2024/01/30/cryptoai.html

Arthur Hayes

“Any company that can be attacked in the analogue human legal field will be attacked by those who believe a for-company-profit AI implementation used their data without payment,” he continued. “It is an impossible problem to solve — how do you adequately pay every entity for their data?”

“The only way to create AIs as economic entities is for the ownership to be dispersed wide and far, such that there is no single centralized structure to attack in the traditional legal arena,” he added. “The market will quickly come to realize the entire lifecycle of an AI must be decentralized, which will in turn benefit networks such as Ethereum. Ethereum is the most robust decentralized computer in existence, and I fully expect it to peer power the future AI / human economy.”

https://www.theblock.co/post/271501/bitmex-co-founder-arthur-hayes-joins-decentralized-ai-platform-ritual

Casey Caruso

Since computational strength grows with resource consolidation, AI naturally fosters centralization, where those with more computing power progressively dominate. This introduces a risk to our rate of innovation. I believe decentralization and Web3 stand as contenders to keep AI open.

https://www.caseycaruso.com/thoughts/decentralized-ai

Crypto, Distilled

Divides the web3 AI stack into: Agents; AI analytics; Authentication; Privacy; Data; Compute; Models

Blockchain = provable fairness; AI = unparalleled productivity

https://x.com/DistilledCrypto/status/1753300276298289169?s=20

Travis Kling

AI is a clear opportunity for crypto, but I am wary about crypto’s ability to execute on that opportunity this cycle

https://twitter.com/Travis_Kling/status/1753455596462878815

Varun Mathur

Centralized AI entities have consolidated immense power and achieved regulatory capture, and with their growing network effects, there now exists a period of at most a year before they cannot be competed against. The world they present to users is that of biased and limited interfaces, where the $20/month “pro” features are far outside the reach of, say, a college student in India.

https://twitter.com/varun_mathur/status/1754305144630440089

Binance Research

Funding for AI-related web3 projects surged in 2023, reaching US$298M. This is more than the collective funding amount raised for AI projects from 2016 to 2022, at US$148.5M.

Areas of note: DePIN; Zero Knowledge; Consumer dapps; Data analytics

https://www.binance.com/en/research/analysis/ai-x-crypto-latest-data-and-developments/

Venkatesh Rao

AI+blockchains point to a dystopia of impersonal and faceless interchangeable-parts humanity that’s more industrial than the industrial age.

https://studio.ribbonfarm.com/p/brains-chains-and-vibemobiles

The English language will increase its dominance in an AI world

Language is itself a technology, and like many technologies, it exhibits a classic network effect: each additional speaker of a language increases that language’s utility for all other speakers. The more “users” who speak and write English, the more valuable it is to know and use English in just about all affairs.

One obvious example is in software programming. Though there is a lot of symbolic and mathematical notation in programming languages, most would agree that English is head-and-shoulders more valuable to know (relative to the 2nd or 3rd most popular language) if you want to be a good programmer. It’s better for troubleshooting, for reading documentation, for scouring StackOverflow for copy-paste code, and now for getting ChatGPT or CoPilot to write code for you.

My belief is that as AI proliferates, English will only increase its lead. English is already in the lead with 1.4B speakers (though this number varies significantly depending on how you measure fluency), and Mandarin Chinese is second at 1.1B.

Why?

AI models need data. English comprises a majority of the available online training data. It helps that the largest economy in the world (the US) and the most populous country in the world (India, which depending on your reference, surpassed China’s population this year), are both English markets.

The largest content-generating internet platforms — from Google to Facebook to Twitter to Wikipedia and on and on — are dominated by English speakers. An AI model’s output quality is directly correlated with the quantity of its training data, and there is simply more English data available than in any other language, including Mandarin Chinese. Thus GPT-4 and LLaMA and so forth are “smartest” in English.

There are multiple reasons why Mandarin Chinese lags behind, beyond just the fact that the breakthrough innovations in AI research and productization happened first in the US and UK. Among these reasons are the Great Firewall, the highly regulated and controlled nature of Chinese data, and China’s pervasive digital censorship (for example, there are more than 500 words alone that can’t be used on many Chinese UGC websites because they are perceived as unfavorable nicknames for President Xi Jinping).

Thus Chinese online training data lags English in both quantity and likely quality. There are also some reasons related to the languages themselves, where English is a more explicit language and Chinese more contextual.

English’s initial data lead is a self-reinforcing feedback loop: the more that people use English to interact with services like CharacterAI and ChatGPT, the more data the LLMs have to refine and improve (in English), leaving other languages in the dust, especially long-tail ones like Icelandic or Khmer.

As AI agents increasingly interact with each other, I’m guessing they will develop their own unique protocols for AI-to-AI communication. Not dissimilar to how computers communicate via highly structured network requests, only more complex and perhaps unique. AI will eventually create its own AI lingua franca. However, it’s also necessary that some human-readable component be built into this AI-ese (because at a minimum, developers will want to know where to debug and fix errors). English will likely be chosen for that AI-to-AI interface.

Of course, AI is an amazing and broad innovation that will benefit speakers of all languages. It will help to preserve and distribute rarer languages, and enable faster and better language-to-language education and translation. Whether you speak Vietnamese or Icelandic, there will be an AI model for you. I’m simply arguing that these secondary languages won’t be anywhere NEAR as good as the leading English models, and I would venture that even if English isn’t your first or even second language, you will probably still get better results using broken English to interact with ChatGPT than, say, French.

I could be very wrong here. As with any emerging technology, second and third order effects are by their nature unpredictable and chaotic. And the technology still has a long way to evolve and mature. Let’s see how it all plays out. I’m especially curious about what kinds of AI-to-AI communications will emerge, whether exposed through a human-readable interface or otherwise.

Ok that’s it, over and out good sers and madams! OpenAI wow!