# Verification Queue
Claims tagged [UNVERIFIED] across the wiki. Review periodically and resolve by either verifying against primary sources or upgrading the tag to [UNVERIFIABLE: reason].
| Claim | Source File | What Would Verify It | Status |
|---|---|---|---|
| "The age of scaling is over" — pre-training on static internet data has reached diminishing returns | wiki/summaries/dwarkesh-ilya-sutskever-2.md |
Find original Dwarkesh Podcast transcript/video of Ilya Sutskever interview (~Nov 2025) and confirm exact quote | Verified (2026-04-06) — interview confirmed (Nov 25, 2025); claims corroborated by multiple independent secondary sources (EA Forum, The Neuron, LangCopilot, Artificial Intelligence Monaco) |
| Progress now requires "age of research": new algorithmic breakthroughs, synthetic data, models that learn from deployment | wiki/summaries/dwarkesh-ilya-sutskever-2.md |
Same Dwarkesh/Sutskever transcript — confirm this framing is Sutskever's | Verified (2026-04-06) — "moving from age of scaling to age of research" confirmed; consistent across all secondary sources |
| Models exhibit "jagged generalization" — solving graduate-level problems but failing basic reasoning; RL risks benchmark overfitting | wiki/summaries/dwarkesh-ilya-sutskever-2.md |
Same transcript — confirm Sutskever uses "jagged generalization" phrasing | Verified (2026-04-06) — "jagged generalization" terminology confirmed consistently across secondary sources |
| RL consuming increasing compute relative to pre-training but yields only modest learning gains | wiki/summaries/dwarkesh-ilya-sutskever-2.md |
Same transcript — confirm specific claim about RL compute vs. gains | Verified (2026-04-06) — RL compute vs. gain critique confirmed across secondary sources |
| SSI structured research-first; alignment/safety as core design constraints; goal of AI "caring for sentient life" | wiki/summaries/dwarkesh-ilya-sutskever-2.md |
Same transcript — confirm SSI structure and "caring for sentient life" quote | Verified (2026-04-06) — SSI research-first structure and alignment-as-design-constraint confirmed across secondary sources |
| A "specific, currently unknown machine learning principle" needed for robust generalization | wiki/summaries/dwarkesh-ilya-sutskever-2.md |
Same transcript — confirm exact phrasing | Verified (2026-04-06) — unknown ML principle framing confirmed across secondary sources |
| 100× compute scaling might move the needle but would not fundamentally transform capabilities | wiki/summaries/dwarkesh-ilya-sutskever-2.md |
Same transcript — confirm 100× claim and context | Verified (2026-04-06) — 100× compute limitation claim confirmed across secondary sources |
| Superintelligence deployment should be gradual, with systems learning from real-world use | wiki/summaries/dwarkesh-ilya-sutskever-2.md |
Same transcript — confirm deployment thesis | Verified (2026-04-06) — gradual deployment with continual learning confirmed across secondary sources |
| Current AI progress relies on massive expert labor (PhDs writing Q&A, behavioral cloning) not genuine autonomous learning | wiki/summaries/dwarkesh-thoughts-on-ai-progress-dec-2025.md |
Access dwarkesh.com/p/thoughts-on-ai-progress-dec-2025-video or find transcript; confirm expert labor framing | Open |
| Simply scaling RL atop LLMs will not quickly produce AGI; "critical core" of generalizable learning is missing | wiki/summaries/dwarkesh-thoughts-on-ai-progress-dec-2025.md |
Same Dwarkesh blog/video — confirm "critical core" argument | Open |
| Robotics as litmus test: humans operate new hardware with minimal practice; AI requires thousands of handcrafted RL tasks | wiki/summaries/dwarkesh-thoughts-on-ai-progress-dec-2025.md |
Same source — confirm robotics comparison | Open |
| ~60% probability of AGI by 2040 | wiki/summaries/dwarkesh-thoughts-on-ai-progress-dec-2025.md |
Same source — confirm exact probability and year | Open |
| "Moderately bearish" short-term, "explosively bullish" long-term | wiki/summaries/dwarkesh-thoughts-on-ai-progress-dec-2025.md |
Same source — confirm exact phrasing | Open |
| Scaling laws uncertain whether they yield improvements toward general intelligence or plateau | wiki/summaries/dwarkesh-thoughts-on-ai-progress-dec-2025.md |
Same source — confirm scaling law uncertainty claim | Open |
| Robotics is primarily an algorithms problem, not hardware or data | wiki/summaries/dwarkesh-thoughts-on-ai-progress-dec-2025.md |
Same source — confirm algorithms-first framing | Open |
| Founders use Claude Code to build full applications from natural language, reducing dev time from months to days | wiki/summaries/forbes-vibe-code-revenue-stream.md |
Access forbes.com article directly and confirm specific claims about Claude Code usage and timelines | Verified 2026-04-06 — full article text provided; claims confirmed with direct quotes |
| Evan simultaneously runs multiple startups (patty.com, brooke.com, racingminds.com) without writing traditional code | wiki/summaries/forbes-vibe-code-revenue-stream.md |
Same Forbes article — confirm Evan example and specific startup names | Verified 2026-04-06 — Evan G. quote and startup names confirmed in article |
| Cost and risk of product experimentation drops dramatically with vibe coding | wiki/summaries/forbes-vibe-code-revenue-stream.md |
Same Forbes article — confirm cost/risk framing | Verified 2026-04-06 — article states "cost of being wrong dropped to almost nothing" |
| Claude Code, Wispr Flow, and Replit include "one-click deploy" features | wiki/summaries/forbes-vibe-code-revenue-stream.md |
Same Forbes article — confirm specific tools and deploy features mentioned | Partially Verified 2026-04-06 — Wispr Flow mentioned for dictation; article does not claim "one-click deploy" for these tools |
| Monetization strategies: micro-SaaS, service automation, freelance vibe coding services | wiki/summaries/forbes-vibe-code-revenue-stream.md |
Same Forbes article — confirm monetization categories | Verified 2026-04-06 — examples confirm these patterns (bananacam.ai subscriptions, etc.) |
| AI-generated code can contain vulnerabilities or maintenance challenges | wiki/summaries/forbes-vibe-code-revenue-stream.md |
Same Forbes article — confirm risk/limitation discussion | Unverifiable 2026-04-06 — article does not discuss code quality risks or vulnerabilities; this was likely from secondary sources |
| Vibe coding allows founders to leverage existing audience, expertise, and customer feedback | wiki/summaries/forbes-vibe-code-revenue-stream.md |
Same Forbes article — confirm leverage framing | Verified 2026-04-06 — article explicitly frames vibe coding for founders with existing audiences/expertise |
| Agents implement "continual learning" without weight updates via persistent memory injected into context | wiki/summaries/langchain-continual-learning-for-ai-agents.md |
Access blog.langchain.com/continual-learning-for-ai-agents/ directly and confirm core thesis | Verified (2026-04-06) — primary source confirms three-layer framework (Model/Harness/Context); context-layer learning without weight updates confirmed |
| LangChain models memory using COALA paper types: procedural, semantic, episodic | wiki/summaries/langchain-continual-learning-for-ai-agents.md |
Same LangChain article — confirm COALA paper reference and three memory types | Unverifiable (2026-04-06) — primary source does not reference COALA taxonomy; article uses Model/Harness/Context framework; claim was from secondary sources |
| Agent Builder implements memory as virtual filesystem; agents read/write own instruction files | wiki/summaries/langchain-continual-learning-for-ai-agents.md |
Same LangChain article — confirm Agent Builder filesystem implementation | Unverifiable (2026-04-06) — Agent Builder not referenced in primary source; context layer uses CLAUDE.md/SOUL.md config file patterns (Claude Code, OpenClaw) |
| LangSmith converts traces to test datasets; "LLM as judge" for automated scoring | wiki/summaries/langchain-continual-learning-for-ai-agents.md |
Same LangChain article — confirm LangSmith trace-to-test and LLM judge features | Unverifiable (2026-04-06) — LangSmith confirmed for trace collection; test dataset conversion and LLM-as-judge framing not in this article |
| DeepAgents (March 2026) is open-source harness with planning, filesystem, sub-agents, context summarization | wiki/summaries/langchain-continual-learning-for-ai-agents.md |
Same LangChain article + DeepAgents GitHub repo — confirm release date and feature set | Partially Verified (2026-04-06) — Deep Agents confirmed as production-ready model-agnostic base harness supporting context-layer learning; specific feature list from DeepAgents README not explicitly stated in article |
| LangSmith Fleet manages agent fleets with identity, permissions, audit trails | wiki/summaries/langchain-continual-learning-for-ai-agents.md |
Same LangChain article — confirm Fleet product details | Unverifiable (2026-04-06) — LangSmith Fleet not referenced in this article; claim was from secondary sources |
| Production feedback cycle (agent acts → user corrects → agent updates instructions) requires no re-deployment | wiki/summaries/langchain-continual-learning-for-ai-agents.md |
Same LangChain article — confirm feedback loop description | Partially Verified (2026-04-06) — in-the-hot-path context updates and offline dreaming confirmed; specific no-redeployment feedback cycle framing differs from original claim |
| Agents are "80% plumbing, 20% model" | wiki/summaries/nates-newsletter-agent-blind-spots.md |
Access natesnewsletter.substack.com article (may require subscription) and confirm exact ratio claim | Open |
| 12 foundational components most teams miss (tool registry, memory, orchestration, observability, security, etc.) | wiki/summaries/nates-newsletter-agent-blind-spots.md |
Same Nate's Newsletter article — confirm exact list of 12 components | Open |
| Claude Code's bash tool has an 18-module security architecture | wiki/summaries/nates-newsletter-agent-blind-spots.md |
Same article — confirm 18-module security claim | Open |
| Most teams focus exclusively on prompt engineering and model selection while ignoring plumbing | wiki/summaries/nates-newsletter-agent-blind-spots.md |
Same article — confirm team focus observation | Open |
| Gap between demo-grade and production-grade agents is almost entirely infrastructure, not model capability | wiki/summaries/nates-newsletter-agent-blind-spots.md |
Same article — confirm infrastructure gap thesis | Open |
| Sutskever's "age of scaling is over" assertion (referenced in connections) | wiki/connections.md |
Resolve via dwarkesh-ilya-sutskever-2.md verification above | Verified (2026-04-06) — resolved via dwarkesh-ilya-sutskever-2.md verification; [UNVERIFIED] tags removed from connections.md |
| "Vibe coding" phenomenon (referenced in connections) | wiki/connections.md |
Resolve via forbes-vibe-code-revenue-stream.md verification above | Verified 2026-04-06 — resolved via forbes-vibe-code-revenue-stream.md verification |
| LangChain's "continual learning" framing (referenced in connections) | wiki/connections.md |
Resolve via langchain-continual-learning-for-ai-agents.md verification above | Verified (2026-04-06) — resolved via langchain-continual-learning-for-ai-agents.md re-ingestion; connections updated with verified three-layer framework |
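For the LangChain rows above, the mechanism under review is "continual learning" at the context layer: learned behavior lives in persistent instructions that are re-read on every run, not in updated weights. The sketch below is illustrative only, not LangChain's implementation; the file name `agent_instructions.md` and the `call_model` placeholder are assumptions for demonstration.

```python
# Illustrative sketch of context-layer "continual learning" (not LangChain's code).
# Learned behavior lives in a plain instructions file that is re-read on every run,
# so a user correction takes effect on the next call with no weight update and no
# re-deployment. `call_model` is a hypothetical stand-in for any chat-completion API.
from pathlib import Path

MEMORY_FILE = Path("agent_instructions.md")  # hypothetical persistent memory file


def call_model(system_prompt: str, user_message: str) -> str:
    """Placeholder for a real LLM call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError("wire in your provider's chat API here")


def load_instructions() -> str:
    # Context layer: inject whatever has been learned so far into the prompt.
    if MEMORY_FILE.exists():
        return MEMORY_FILE.read_text(encoding="utf-8")
    return "You are a helpful agent."


def run_agent(task: str) -> str:
    # 1. Agent acts, with its accumulated instructions in context.
    return call_model(system_prompt=load_instructions(), user_message=task)


def record_correction(correction: str) -> None:
    # 2. User corrects -> 3. agent (or harness) appends the lesson to its own
    # instruction file. The next run_agent() call sees it; model weights never
    # change and nothing is re-deployed.
    with MEMORY_FILE.open("a", encoding="utf-8") as f:
        f.write(f"\n- {correction}")
```

This mirrors the CLAUDE.md/SOUL.md-style config-file pattern noted in the status column: the "memory" is simply a file the harness prepends to the prompt, which is why the verified claim concerns the context layer rather than the model layer.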