AI learnings 1: AI = infinite interns

All below are copy-pasted from original sources, all mistakes mine! :))

I think these open-source LLMs will eventually beat the closed ones, since there are more people training them and feeding them data for the shared benefit.
Especially because these open-source models can be 10 times cheaper than GPT-3, or even 20 times cheaper than GPT-4, when run on Hugging Face, or essentially free when run locally, where you only pay for electricity and the GPU

We extensively used prompt engineering with GPT-3.5 but later discovered that GPT-4 was so proficient that much of the prompt engineering proved unnecessary. In essence, the better the model, the less you need prompt engineering or even fine-tuning on specific data.

Harder benchmarks emerge. AI models have reached performance saturation on established benchmarks such as ImageNet, SQuAD, and SuperGLUE, prompting researchers to develop more challenging ones. In 2023, several challenging new benchmarks emerged, including SWE-bench for coding, HEIM for image generation, MMMU for general reasoning, MoCa for moral reasoning, AgentBench for agent-based behavior, and HaluEval for hallucinations.

The subset of parameters is chosen according to which parameters have the largest (approximate) Fisher information, which captures how much changing a given parameter will affect the model’s output. We demonstrate that our approach makes it possible to update a small fraction (as few as 0.5%) of the model’s parameters while still attaining similar performance to training all parameters.
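A minimal sketch of that selection step, assuming a generic PyTorch `model`, `data_loader`, and `loss_fn` (all hypothetical names): the diagonal Fisher information is approximated by accumulating squared gradients, and only the top ~0.5% of parameters are left trainable.

```python
import torch

def build_fisher_mask(model, data_loader, loss_fn, keep_frac=0.005):
    """Approximate per-parameter Fisher information as accumulated squared
    gradients over a few batches, then keep only the top `keep_frac`."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in data_loader:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2

    # Global threshold: keep the ~0.5% of parameters with the largest scores.
    scores = torch.cat([f.flatten() for f in fisher.values()])
    k = max(1, int(keep_frac * scores.numel()))
    threshold = torch.topk(scores, k).values.min()
    return {n: (f >= threshold) for n, f in fisher.items()}
```

During fine-tuning, the gradients of masked-out parameters would be zeroed before each optimizer step, so only the selected fraction of weights ever changes.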

If you’re training a LLM with the goal of deploying it to users, you should prefer training a smaller model well into the diminishing returns part of the loss curve.


When people talk about training a Chinchilla-optimal model, this is what they mean: training a model that matches their estimates for optimality. They estimated the optimal model size for a given compute budget, and the optimal number of training tokens for a given compute budget.

However, "optimal" here means "the cheapest way, in FLOPs, to reach a given loss level." In practice, though, most of us don't care about that answer! It is exactly the answer you care about if you're a researcher at DeepMind/FAIR/AWS training a model with the goal of reaching a new SOTA so you can publish a paper and get promoted. But if you're training a model with the goal of actually deploying it, the total cost is going to be dominated by the inference cost. This has two implications:

1) there is a strong incentive to train smaller models which fit on single GPUs

2) we’re fine trading off training time efficiency for inference time efficiency (probably to a ridiculous extent).

Chinchilla implicitly assumes that the majority of the total cost of ownership (TCO) for a LLM is the training cost. In practice, this is only the case if you’re a researcher at a research lab who doesn’t support products (e.g. FAIR/Google Brain/DeepMind/MSR). For almost everyone else, the amount of resources spent on inference will dwarf the amount of resources spent during training.
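A back-of-the-envelope sketch of why inference dominates, using the common approximations of ~6ND FLOPs for training and ~2N FLOPs per generated token at inference (N = parameters, D = training tokens); the model sizes and the number of served tokens below are illustrative assumptions, not figures from the quoted posts.

```python
def training_flops(n_params, n_train_tokens):
    # Common rule of thumb: ~6 FLOPs per parameter per training token.
    return 6 * n_params * n_train_tokens

def inference_flops(n_params, n_served_tokens):
    # Common rule of thumb: ~2 FLOPs per parameter per generated token.
    return 2 * n_params * n_served_tokens

served = 1e13  # assumed lifetime tokens served to users

# Chinchilla-style ~20 tokens/parameter vs a smaller, heavily over-trained model.
big = training_flops(70e9, 1.4e12) + inference_flops(70e9, served)
small = training_flops(13e9, 5.0e12) + inference_flops(13e9, served)

print(f"70B / 1.4T tokens: {big:.2e} total FLOPs")    # ~2.0e24
print(f"13B / 5.0T tokens: {small:.2e} total FLOPs")  # ~6.5e23
# Once inference is included, the "over-trained" small model wins on total cost.
```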

There is no cost/time effective way to do useful online-training on a highly distributed architecture of commodity hardware. This would require a big breakthrough that I’m not aware of yet. It’s why FANG spends more money than all the liquidity in crypto to acquire expensive hardware, network it, maintain data centers, etc

A reward model is subsequently developed to predict these human-given scores, guiding reinforcement learning to optimize the AI model’s outputs for more favorable human feedback. RLHF thus represents a sophisticated phase in AI training, aimed at aligning model behavior more closely with human expectations and making it more effective in complex decision-making scenarios
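A minimal sketch of the reward-model training step described above, assuming pairwise human preference data (a "chosen" vs. a "rejected" response) and a hypothetical `reward_model` that maps a tokenized response to a scalar score; the loss is the standard pairwise ranking objective used in RLHF-style pipelines.

```python
import torch.nn.functional as F

def reward_model_loss(reward_model, chosen_ids, rejected_ids):
    """Push the score of the human-preferred response above the rejected one."""
    r_chosen = reward_model(chosen_ids)      # shape: (batch,)
    r_rejected = reward_model(rejected_ids)  # shape: (batch,)
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

The trained reward model then scores candidate generations during the RL phase (e.g. PPO), standing in for the human rater.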

Lesson 3: improving the latency with streaming API and showing users variable-speed typed words is actually a big UX innovation with ChatGPT

Lesson 6: vector databases and RAG/embeddings are mostly useless for us mere mortals
I tried. I really did. But every time I thought I had a killer use case for RAG / embeddings, I was confounded.
I think vector databases / RAG are really meant for Search. And only search. Not search as in "oh, retrieving chunks is kind of like search, so it'll work!", but real Google-and-Bing search

There are fundamental economic reasons for that: between GPT-3 and GPT-3.5, I thought we might be in a scenario where the models were getting hyper-linear improvement with training: train it 2x as hard, it gets 2.2x better.
But that's not the case, apparently. Instead, what we're seeing is logarithmic. And in fact, token speed and cost per token are growing exponentially for incremental improvements

Bittensor is still in its infancy. The network boasts a dedicated, almost cult-like community, yet the overall number of participants remains modest – around 50,000+ active accounts. The most bustling subnet, SN1, dedicated to text generation, has about 40 active validators and over 990 miners

Mark Zuckerberg, CEO of Meta, remarks that after they built machine learning algorithms to detect obvious offenders like pornography and gore, their problems evolved into “a much more complicated set of philosophical rather than technical questions.”

AI is, at its core, a philosophy of abundance rather than an embrace of scarcity.

AI thrives within blockchain systems, fundamentally because the rules of the crypto economy are explicitly defined, and the system allows for permissionlessness. Operating under clear guidelines significantly reduces the risks tied to AI’s inherent stochasticity. For example, AI’s dominance over humans in chess and video games stems from the fact that these environments are closed sandboxes with straightforward rules. Conversely, advancements in autonomous driving have been more gradual. The open-world challenges are more complex, and our tolerance for AI’s unpredictable problem-solving in such scenarios is markedly lower

generative model outputs may ultimately be best evaluated by end users in a free market. In fact, there are existing tools available for end users to compare model outputs side-by-side as well as benchmarking companies that do the same. A cursory understanding of the difficulty with generative AI benchmarking can be seen in the variety of open LLM benchmarks that are constantly growing and include MMLU, HellaSwag, TriviaQA, BoolQ, and more – each testing different use cases such as common sense reasoning, academic topics, and various question formats.

This is not getting smaller. There’s not gonna be less money in generative AI next year, it’s a very unique set of circumstances, AI + crypto is not going to have less capital in a year or two. – Emad re: AI+crypto

Benefits of AI on blockchain
CODSPA = composability, ownership, discovery, safety, payments, alignment

The basis of many-shot jailbreaking is to include a faux dialogue between a human and an AI assistant within a single prompt for the LLM. That faux dialogue portrays the AI Assistant readily answering potentially harmful queries from a User. At the end of the dialogue, one adds a final target query to which one wants the answer.

At the moment when you look at a lot of data rooms for AI products, you’ll see a TON of growth — amazing hockey sticks going 0 to $1M and beyond — but also very high churn rates

Vector search is at the foundation of retrieval augmented generation (RAG) architectures because it provides the ability to glean semantic value from the datasets we have and, more importantly, to continually add additional context to those datasets, augmenting the outputs to be more and more relevant.
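A minimal sketch of the retrieval step underneath a RAG pipeline, assuming a hypothetical `embed()` function (any sentence-embedding model would do): chunks are ranked by cosine similarity to the query embedding and the top hits are pasted into the prompt as extra context.

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Return the indices and scores of the k most similar document vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(-scores)[:k]
    return top, scores[top]

# Hypothetical usage:
# doc_vecs = np.stack([embed(chunk) for chunk in chunks])
# idx, _ = cosine_top_k(embed(question), doc_vecs)
# prompt = "\n".join(chunks[i] for i in idx) + "\n\nQuestion: " + question
```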

From Coinbase report on AI+Crypto

Nvidia’s February 2024 earnings call revealed that approximately 40% of their business is inferencing, and Satya Nadella made similar remarks in the Microsoft earnings call a month prior in January, noting that “most” of their Azure AI usage was for inferencing

The often touted blanket remedy that “decentralization fixes [insert problem]” as a foregone conclusion is, in our view, premature for such a rapidly innovating field. It is also preemptively solving for a centralization problem that may not necessarily exist. The reality is that the AI industry already has a lot of decentralization in both technology and business verticals through competition between many different companies and open source projects

AI = infinite interns

We struggle to align humans — how do we align AI?

One of the most important trends in the AI sector (relevant to crypto-AI products), in our view, is the continued culture around open sourcing models. More than 530,000 models are publicly available on Hugging Face (a platform for collaboration in the AI community) for researchers and users to operate and fine-tune. Hugging Face’s role in AI collaboration is not dissimilar to relying on Github for code hosting or Discord for community management (both of which are used widely in crypto)

We estimate the market share in 2023 was 80%–90% closed source, with the majority of share going to OpenAI. However, 46% of survey respondents mentioned that they prefer or strongly prefer open source models going into 2024

Enterprises still aren’t comfortable sharing their proprietary data with closed-source model providers out of regulatory or data security concerns—and unsurprisingly, companies whose IP is central to their business model are especially conservative. While some leaders addressed this concern by hosting open source models themselves, others noted that they were prioritizing models with virtual private cloud (VPC) integrations.

That’s because 2 primary concerns about genAI still loom large in the enterprise: 1) potential issues with hallucination and safety, and 2) public relations issues with deploying genAI, particularly into sensitive consumer sectors (e.g., healthcare and financial services)

Recent crypto learnings 4: “Memecoins are prediction market perpetuals”

Past updates 1, 2, and 3

By the way, the FDIC is essentially guaranteeing over $20 trillion in deposits on just over a hundred billion. So they’ve got a half-penny on the dollar.
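The "half-penny on the dollar" is just the ratio of the insurance fund to insured deposits; under the rough numbers quoted above (assumed here as ~$100B and ~$20T):

```python
fund = 100e9        # "just over a hundred billion"
deposits = 20e12    # "over $20 trillion in deposits"
print(fund / deposits)  # 0.005, i.e. about half a cent per dollar of deposits
```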

L2s eventually move to interoperate with one another based on tech evolution and customer demand
9) ETHEREUM IS SUDDENLY THE DE FACTO GLOBAL SETTLEMENT LAYER and ETH IS THE NATIVE PROGRAMMABLE MONEY OF THAT SETTLEMENT LAYER

when volume > marketcap, parabolas are often in final stages

CometBFT is software for securely and consistently replicating an application on many machines. By securely, we mean that CometBFT works as long as less than 1/3 of machines fail in arbitrary ways.
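That "less than 1/3" bound is the classic Byzantine fault tolerance threshold: with n validators, safety holds as long as the number of faulty ones f satisfies 3f + 1 <= n. A quick illustrative calculation:

```python
def max_byzantine_faults(n_validators):
    # BFT safety requires n >= 3f + 1, i.e. strictly fewer than 1/3 faulty.
    return (n_validators - 1) // 3

for n in (4, 100, 125):
    print(f"{n} validators tolerate up to {max_byzantine_faults(n)} faulty")
# 4 -> 1, 100 -> 33, 125 -> 41
```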

Grant Engelbart, Carson Group Vice President: “We’re seeing advisors allocate 3.5% on average to Bitcoin ETFs in client household portfolios”

There are 8 key innovations that make the Solana network possible:
* Proof of History (POH) — a clock before consensus;
* Tower BFT — a PoH-optimized version of PBFT;
* Turbine — a block propagation protocol;
* Gulf Stream — Mempool-less transaction forwarding protocol;
* Sealevel — Parallel smart contracts run-time;
* Pipelining — a Transaction Processing Unit for validation optimization;
* Cloudbreak — Horizontally-Scaled Accounts Database; and
* Replicators — Distributed ledger store

Validators are special full-nodes that participate in the consensus process (implemented in the underlying consensus engine) in order to add new blocks to the chain. Any account can declare its intention to become a validator operator, but only those with sufficient delegation get to enter the active set (for example, only the top 125 validator candidates with the most delegation get to be validators in the Cosmos Hub)
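A minimal sketch of that active-set selection, assuming a simple mapping from validator address to total delegated stake (the 125 cutoff is the Cosmos Hub example from the quote):

```python
def active_set(delegations, max_validators=125):
    """delegations: dict of validator address -> total delegated stake."""
    ranked = sorted(delegations.items(), key=lambda kv: kv[1], reverse=True)
    return [addr for addr, _ in ranked[:max_validators]]

# Example: with a 2-slot active set, only the two largest delegations get in.
print(active_set({"valA": 900, "valB": 50, "valC": 300}, max_validators=2))
# ['valA', 'valC']
```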

As (i) parabolically growing global debt necessitates accelerating debasement of even the most stable fiat currencies, (ii) price inflation across both essentials and durable assets marches higher, and (iii) more governments and banks around the world move to seize deposits and censor payments, the value of the properties above will become abundantly clear to billions (in most cases this will be an instinctive realization rather than an academic one).

But while the benefits of the internet’s early incarnations were more abstract, bitcoin comes with a powerful adoption incentive baked in: the opportunity for rapid and unmatched accrual of purchasing power over time (or more colloquially, “Number Go Up”). Early adopters will reap outsized and compounding rewards from this trend (i.e. a greater share of finite available bitcoin) at the expense of laggards, incentivizing a self-perpetuating rush to move first

Avalanche consensus stands out for its permissionless nature, meaning it doesn’t impose a strict limit on the number of validators, unlike other layer-1 solutions like Cosmos or BSC, which limit their active validators to 125 and 21, respectively

Pixels grew from 5K to 730K DAU when migrating from Polygon to Ronin. MAU currently sits at a whopping 1.3M. Is Pixels the largest onchain application in all of crypto right now? Game developers need users, and Pixels proves that Ronin is the only chain that can offer this.

The general purpose public blockchains out there might best be understood as platforms for rule-breaking apps. (For if there are no rules being broken it becomes tempting to ask why a decentralized architecture is the best tool for the job.) If I were an investor I’d be asking any apps (or dApps) on top of these platforms the question “what rules are you breaking?”.

Under BIT001, each subnet has its own token that can be converted into the main Bittensor token, TAO. Each subnet also has its own issuance schedule (1 token per block, half to the TAO/subnet-token pool and half to miners/validators) and a Uniswap-style LP pool for conversions between TAO and the subnet token
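A minimal constant-product sketch of what a Uniswap-style TAO/subnet-token pool looks like; the function, reserves, and fee-free math here are illustrative assumptions, not the actual BIT001 specification.

```python
def swap_tao_for_subnet(tao_in, tao_reserve, subnet_reserve):
    """Constant-product AMM: tao_reserve * subnet_reserve stays constant
    (ignoring fees), so every conversion moves the pool price."""
    k = tao_reserve * subnet_reserve
    new_tao_reserve = tao_reserve + tao_in
    new_subnet_reserve = k / new_tao_reserve
    subnet_out = subnet_reserve - new_subnet_reserve
    return subnet_out, new_tao_reserve, new_subnet_reserve

# Example: a pool holding 1,000 TAO and 10,000 subnet tokens; swap in 10 TAO.
out, *_ = swap_tao_for_subnet(10, 1_000, 10_000)
print(f"{out:.2f} subnet tokens out")  # ~99.01, slightly below the 100 spot price
```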

Out of 1,997 validators, 1,818 received delegations from the foundation & Alameda.
In total they have delegated 106M SOL, 73M from the foundation and 33M from Alameda.

Safety: blockchains are designed to be reliable and secure with minimal trust assumptions, in adversarial environments, where a lot of value is at stake. Agents interacting via smart contract applications inherit these strong properties

The world’s 1st on-chain AI project, “The Rockefeller Bot”
The world’s 1st on-chain AI game, “Leela vs the World”
And the world’s 1st on-chain AI artist, “zkMon”

I do not view @bittensor_ as a cryptocurrency project
I see $TAO as AI and Machine Learning infrastructure, utilising #crypto for incentivization

For instance, Ripple, which recently pledged $100 million to “ramp up” global carbon markets, was one of the blockchain networks used in the World Bank’s research on the Interledger protocol, research which the World Bank referred to as “very promising.” Ripple’s remittance product was previously endorsed by the World Bank and Ripple co-founder, Chris Larsen, was previously an advisor to the IMF on blockchain technologies.

Memecoins are crypto native social fi — Imran Khan

Another aspect is that liquidity can hide the ball. Shitcoins (and nfts) use low liquidity to meme that your bag is more valuable than it is. If for example a tiny bit of some shitcoin trades at some high price, that doesn’t mean the sum of everyone’s bag is worth that much, yet most are inclined to believe it. This is a kind of arbitrage on perception vs reality that these assets exploit

In short, it will FEEL like a regular bear cycle, but in reality the game has changed for BTC and ETH – forever.
This means the window in time for the average non rich person to get generational exposure to BTC and ETH is closing, very rapidly

Essentially, most people will be priced out of owning 10 ETH or 1 BTC.
I also believe that going forward, alts will be less appealing each cycle, as people just prefer the consensus trade of BTC and ETH, which are guaranteed to go up due to ETF flows. And because of the size of new market participants, you could still get 20-50% per year, with way less downside risk.
As such I think people will be less interested in altcoins.
This mimics how the S&P 500 works, with basically 4-6 massive tech firms (Google, Apple, Amazon, Meta, etc.) propping up the entire thing

While the experiment is still underway, Akash provided 24,000 NVIDIA A100 (80GB) hours to Thumper to code and train the model, and we’ll be publishing the model and code to Hugging Face soon. The result will be an image-generation AI model that can be used without the risk of copyright infringement, and will round out Akash’s capabilities to support the three most popular AI tasks: training, fine-tuning, and inferencing.

* The total time for generating the proving key was 327,916 seconds — over 91 hours when run on a single machine with a 128-core CPU and 1TB of RAM
* These 144 proving keys occupied a disk space over 10TB
* The total proving time of the 144 sub-blocks was 322,774 seconds — just shy of 90 hours (when run on the same single machine)
And we did it! 200+ hours later, on a 128-core CPU and 1TB RAM machine, we completed the world’s 1st full ZK proving of the inference pass of a billion+ parameter LLM!

we will see much more homogeneity at L2, with the ultimate end game being that many L2s either become tightly coupled to each other (eg the superchain) and/or to L1 (via based sequencing and native zk prover support in the L1).

Chopping Block on decentralized AI:
Scale AI less about data lake / RLHF, more about fine tuning and running infra now
Two main categories: Decentralized inference and GPU marketplaces
Most don’t believe the latter is competitive vs. centralized
Haseeb: Crypto excels where there’s a lot of censorship
Tarun: Even OpenAI fine tuning has lots of rules and restrictions
Seems they’re more skeptical about infra and more interested in app side finding real use cases and metrics, need more decentralized AI apps
“Ratio of infrastructure to application is absurd”

Has the ETH ETF launched and Larry Fink and his cabal of satanists piled into the deflationary asset? No. Have we seen NFT mania yet? No.
If you’re thinking of taking profits already you just might in fact be one of the weakest pathetic people I have ever known to ever exist

Memecoins are prediction market perpetuals eg $TRUMP, $DOGE

Bitcoin is the most successful financial meme since gold and even at today’s all-time high, all the bitcoin in the world is still only worth about 1/14 of all the gold in the world.
Unlike the gold meme, which has infected about as many minds as it ever will, the bitcoin meme is growing — and it’s growing in a time when 1) people have more money than ever to invest and 2) people are more than ever looking for lottery-ticket type investments

The driving force behind ongoing experimentation at the intersection of crypto and AI is the same that drives much of crypto’s most promising use cases – access to a permissionless and trustless coordination layer that better facilitates the transfer of value

Akash has long provided a marketplace for CPUs, for example, offering similar services as centralized alternatives at a 70-80% discount. Lower prices, however, have not resulted in significant uptake. Active leases on the network have flattened out, averaging only 33% of compute, 16% of memory, and 13% of storage for the second half of 2023. While these are impressive metrics for on-chain adoption (for reference, leading storage provider Filecoin had 12.6% storage utilization in Q3 2023), it demonstrates that supply continues to outpace demand for these products

Bitcoin, to me, essentially looks like the open-source code equivalent to a self-fulfilling prophecy. The way it functions, as I said yesterday, essentially makes it a freedom-money virus

Assuming fees from all Uniswap pairs are distributed to stakers and assuming that Uniswap does its highest-ever monthly volume (last achieved in 2021) and it maintains that level of volume for an entire year, stakers could then expect a payout of 5.3%.

“Memecoin” = “sell everything without having a product”

even when excluding the eye-watering gains that early BTC investors earned between 2011 and 2015, note that February 2024’s percentage gain is “only” the fifth largest percentage gain since 2017 and the second largest percentage gain this halving cycle, with December 2020 having experienced a 47% gain

If you go look at social risk you will see it went parabolic *AFTER* BTC hit new highs
So perhaps the answer to retail coming back is dependent on *IF* #BTC hits new highs.

If we’ve learned anything from the past 8 years in digital assets, giving your users a chance to invest in the early stages of a project’s growth (via tokens) builds sticky customers, power users, and evangelists for life. We often say that tokens are the greatest capital formation tool in history by aligning customers and shareholders in a way never seen before