Intro: AI, the internet—and why you might not need all of it
Every cycle, a technology arrives that’s supposed to change everything. In the late ’90s it was the internet. Today it’s “AI.” The pattern is familiar: a handful of companies get a 10x advantage by aligning the tech with a business model the tech uniquely enables; most everyone else gets a 10% lift and a higher table-stakes bar.
That’s not cynicism—it’s operational reality. Your job isn’t to “get AI.” Your job is to decide whether AI rewires your economics or simply improves your workflow, and to invest accordingly.
This piece makes that decision easier. I’ll keep the useful parts of the history, correct a few popular myths, and give you a practical framework to find your Amazon moment—if you have one—and to run low-risk pilots if you don’t (yet).
I. The internet lesson (keep the good history, lose the hype)
We remember Amazon because it’s unforgettable. What we forget is survivorship bias: for every internet-native juggernaut there were thousands of businesses whose economics barely moved. The neighborhood florist that put hours on a website didn’t become a platform; they met an expectation. That matters today because AI is following the same arc.
The point isn’t to dismiss the web (or AI). It’s to copy the logic of what worked:
- Amazon’s advantage was structural. National reach, searchable long-tail inventory, and logistics that scaled with traffic. The internet didn’t just make the old store cheaper; it made a different store possible.
- Most businesses adopted deliberately. A site, online ordering, better targeting, better analytics. All valuable. Rarely transformative.
Moral: Tech is transformative when it changes the constraints of your business model. Otherwise it’s a tool—important, sometimes urgent, but still a tool.
II. What actually changed in AI (and what didn’t)
“AI” is an umbrella. Most business value today flows from machine learning (ML)—algorithms that learn from data—drawing on applied statistics, optimization, linear algebra, probability, and systems.
A few inflection points explain the current moment:
- 2012: AlexNet makes deep learning practical. Convolutional neural networks had existed for decades, but AlexNet (Krizhevsky, Sutskever, and Hinton at the University of Toronto) showed that training on GPUs with ReLU activations, heavy data augmentation, and dropout could scale vision models on ImageNet. The breakthrough wasn’t the invention of backpropagation (that was decades old); it was engineering and compute coming into alignment, unlocking a step change in accuracy.
- 2017: Transformers change language modeling. “Attention Is All You Need” (Vaswani et al., Google) replaced recurrence with self-attention, enabling models that turn messy text into dense vectors (embeddings) and predict likely continuations over long contexts. That architecture powers today’s LLMs (GPT, Claude, Llama, Gemini).
- Modern generative models. GANs and diffusion models learn to generate realistic samples through optimization (a generator trained against a discriminator, or a model learning to reverse a gradual noising process), not through random mutation. The result: convincing synthetic text, images, and audio.
What this means for business boils down to two superpowers:
- Turn unstructured text into structure. Case notes, tickets, emails, reviews → embeddings, entities, clusters, sentiment—searchable and comparable.
- Retrieve and reason over your own content. Retrieval-augmented generation (RAG) lets an LLM draft answers with citations from your policies, SOPs, and knowledge base, under human approval.
And a crucial caveat: LLMs are probabilistic. That’s a feature in brainstorming, summarization, triage, routing, and first-draft work. It’s a risk for tasks demanding verified ground truth—unless you attach retrieval, guardrails, and checks.
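To make the first superpower concrete, here is a minimal sketch of embedding a handful of tickets and finding “similar issues.” It assumes the open-source sentence-transformers package and a small local model; a hosted embedding API slots in the same way.

```python
# Minimal sketch: embed free-text tickets and surface "similar issues".
# Assumes the sentence-transformers package (pip install sentence-transformers);
# a hosted embedding API would work the same way.
import numpy as np
from sentence_transformers import SentenceTransformer

tickets = [
    "Customer cannot reset password after the latest update",
    "Password reset email never arrives",
    "Invoice totals are wrong when discounts are applied",
]

model = SentenceTransformer("all-MiniLM-L6-v2")              # small, local embedding model
vectors = model.encode(tickets, normalize_embeddings=True)   # one vector per ticket

query = model.encode(["login reset link not working"], normalize_embeddings=True)
scores = vectors @ query.T                                   # cosine similarity (vectors are normalized)

for ticket, score in sorted(zip(tickets, scores.ravel()), key=lambda t: -t[1]):
    print(f"{score:.2f}  {ticket}")
```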
III. A short, useful primer (no fluff)
- Embeddings: Numeric vectors that place similar concepts near each other. Great for semantic search, deduping, clustering, and “show me similar issues.”
- RAG (Retrieval-Augmented Generation): Fetch relevant internal docs first, then generate with citations. It improves accuracy, traceability, and compliance.
- Fine-tuning vs. prompting: Fine-tune for narrow, repetitive tasks on your data; prompt + tools for broader tasks. Often you start with prompts/RAG and fine-tune later.
- Guardrails: Evaluation sets, content filters, rate-limits, approval gates, and logging. These make AI operationally safe for customers and regulators.
That’s enough vocabulary to make good decisions.
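To ground that vocabulary, here is a minimal RAG sketch: embed the question, pull the closest passages, and ask the model to answer only from them, with numbered citations. The embed and call_llm helpers are hypothetical stand-ins for whatever embedding and chat APIs you already use.

```python
# Minimal RAG sketch: retrieve first, then generate with citations.
# `embed` and `call_llm` are hypothetical stand-ins for your embedding / chat APIs.
import numpy as np

def answer_with_citations(question, passages, embed, call_llm, k=3):
    """Answer `question` using only the top-k `passages`, citing them by number."""
    doc_vecs = np.array([embed(p) for p in passages])
    q_vec = np.array(embed(question))
    # Cosine similarity between the question and each passage.
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
    )
    top = sims.argsort()[::-1][:k]
    context = "\n".join(f"[{i+1}] {passages[idx]}" for i, idx in enumerate(top))
    prompt = (
        "Answer the question using ONLY the sources below. "
        "Cite sources like [1]. If the sources do not answer it, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt), [passages[idx] for idx in top]
```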
IV. The 10x vs. 10% test (use this before you buy anything)
Ask these four questions:
- Where is your “long tail”?
Do you have a huge, varied problem space—support tickets, SKUs, case notes, compliance checks—where patterns are invisible at human scale? That’s where ML finds leverage you’re currently leaving on the table.
- What’s your most valuable unstructured data?
If your best insights live in free text (notes, transcripts, reviews), embeddings + extraction can turn them into metrics you can manage.
- When is probabilistic “good enough”?
Drafting copy, clustering feedback, summarizing policy changes, triaging queues—all benefit from “good enough, quickly.” Legal citations, financial postings, and safety-critical steps do not, unless you add retrieval and human checks.
- Is the upside truly 10x—or just 10%?
Be honest: will this change acquisition cost, gross margin, throughput, or risk by an order of magnitude? If it’s incremental, it’s still valuable; fund it like an optimization, not an existential pivot.
If you can’t articulate the 10x, treat AI as a tool in your kit. If you can, pick a thin slice and build it now.
V. Practical starting points (that won’t burn trust)
A) Unstructured-data audit (2–3 weeks)
Inventory your text sources. Embed a representative sample spanning a few months of data. Stand up a demo of semantic search, clustering, and entity extraction on your own content. Deliverables: a live demo, labeled examples, and a ranked backlog of use cases with measured signal.
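As a sketch of the clustering step in that audit (assuming scikit-learn and an array of embeddings you have already computed), grouping a sample of notes into themes and ranking them by volume might look like this:

```python
# Minimal sketch of the audit's clustering step: group embedded notes into themes
# and rank the themes by volume. Assumes scikit-learn is installed and `vectors`
# is an (n_notes, dim) array from your embedding model.
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

def rank_themes(vectors: np.ndarray, notes: list, n_themes: int = 8):
    km = KMeans(n_clusters=n_themes, random_state=0, n_init=10).fit(vectors)
    counts = Counter(km.labels_)
    themes = []
    for label, count in counts.most_common():
        # Pick the note closest to the cluster center as a readable exemplar.
        members = np.where(km.labels_ == label)[0]
        center = km.cluster_centers_[label]
        exemplar = members[np.argmin(np.linalg.norm(vectors[members] - center, axis=1))]
        themes.append({"theme": label, "notes": count, "example": notes[exemplar]})
    return themes  # ranked backlog: biggest themes first
```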
B) One thin-slice pilot (6–10 weeks)
Pick a single workflow with clear metrics—e.g., case-note normalization to standard fields, policy Q&A with citations, or ticket triage with deflection. Define success up-front (latency, accuracy, deflection rate, hours saved). Deliver to a small user group with opt-out and logging.
C) Cost and risk guardrails from day one
Cache prompts/responses, limit model calls, cap context, and attach retrieval filters. Add human approvals for actions that move money, touch customers, or update records of truth. Measure quality against a simple internal eval set (gold answers) before rollout.
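A minimal sketch of those day-one guardrails, with a hypothetical call_llm stand-in: cache responses keyed on the exact prompt, cap context size, and refuse record-of-truth updates without a named approver.

```python
# Minimal guardrail sketch: cache responses, cap context, gate risky actions.
# `call_llm` is a hypothetical stand-in for your model client.
import hashlib

_cache = {}
MAX_CONTEXT_CHARS = 12_000  # crude proxy for a token cap

def cached_call(prompt: str, call_llm) -> str:
    prompt = prompt[:MAX_CONTEXT_CHARS]                 # cap context before it hits the model
    key = hashlib.sha256(prompt.encode()).hexdigest()   # cache key = hash of the exact prompt
    if key not in _cache:
        _cache[key] = call_llm(prompt)
    return _cache[key]

def apply_action(action: dict, approved_by=None) -> None:
    """Refuse to touch money, customers, or records of truth without a named approver."""
    if action.get("writes_record") and not approved_by:
        raise PermissionError("Human approval required before updating records of truth")
    # ... perform the approved action here ...
```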
VI. What AI is great at for small orgs (and what to skip for now)
Great now (especially for sub-150-employee orgs):
- Normalize free-text into structure. Map session notes, support emails, or sales call summaries into standard fields (topics, outcomes, risk flags) you can report on.
- Policy and SOP retrieval with citations. Reduce time spent hunting for the “right page” and cut misinterpretations.
- Queue triage and routing. Cluster similar cases; route to the right team; draft first responses your staff approves or edits.
- Doc drafting and summarization. First-pass emails, FAQs, release notes, internal updates. Humans still review.
- Data hygiene. Detect duplicates, near-duplicates, and mismatched records across systems.
Often premature (until you have more maturity):
- “Agentic” systems changing records without review. Add approvals first; remove later if metrics prove safety.
- Full app rewrites “to be AI-native.” Start with the pain point; integrate; measure; then refactor.
- Training a foundation model from scratch. You don’t need this. Use vendor models + RAG + small fine-tunes.
VII. The LLM “hallucination” problem (and what to do about it)
Hallucinations are a symptom of probabilistic generation without verification. Reduce risk by:
- Retrieval with citations (RAG) for fact-based answers.
- Task decomposition: first retrieve → then answer; or first classify → then choose a template → then fill the template.
- Evaluation sets: a small, evolving set of “gold” questions and documents that you re-test before each change.
- Human-in-the-loop: approvals for high-impact actions; feedback loops to collect misses and retrain prompts/rules.
- Clear UX: visible citations, “confidence” hints (via simple heuristics), and one-click escalation to a human.
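A minimal sketch of the evaluation-set idea above, assuming a hypothetical answer_with_rag pipeline and a few illustrative gold questions you would replace with your own:

```python
# Minimal eval-set sketch: gold questions with must-appear facts or citations,
# re-run before every prompt, model, or index change.
# `answer_with_rag` is a hypothetical stand-in for your pipeline; the gold cases
# below are illustrative placeholders.

GOLD = [
    {"question": "What is the travel reimbursement deadline?",
     "must_contain": ["30 days", "expense report"]},
    {"question": "Who approves contracts over $10k?",
     "must_contain": ["executive director"]},
]

def run_evals(answer_with_rag) -> float:
    passed = 0
    for case in GOLD:
        answer = answer_with_rag(case["question"]).lower()
        if all(snippet.lower() in answer for snippet in case["must_contain"]):
            passed += 1
        else:
            print(f"MISS: {case['question']}")
    score = passed / len(GOLD)
    print(f"Eval pass rate: {score:.0%}")
    return score  # gate rollouts on this number
```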
VIII. Data realities (that make or break your ROI)
Before you expect a 10x payoff, check the plumbing:
- Where does the text live? Tickets, notes, PDFs, chats, email—can you access it legally and consistently?
- PII and compliance. Decide early what stays on-prem, what can go to a vendor’s EU/US region, and where you need hashing/redaction.
- Schemas and IDs. If you can’t join notes to customers/cases reliably, you’ll struggle to measure impact.
- Feedback capture. Add simple thumbs-up/down or “useful/not useful” so you can close the loop and improve.
If this list is shaky, start with retrieval + read-only insights. If it’s strong, push further into automation.
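On the PII point above, here is a minimal sketch of redaction plus pseudonymization before text leaves your environment. The regexes are illustrative, not exhaustive; production redaction deserves a dedicated PII tool and review.

```python
# Minimal PII sketch: redact obvious identifiers and hash stable IDs before
# text leaves your environment. The regexes are illustrative, not exhaustive.
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

def redact(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def pseudonymize(customer_id: str, salt: str) -> str:
    """Stable pseudonym so records can still be joined without exposing the real ID."""
    return hashlib.sha256((salt + customer_id).encode()).hexdigest()[:16]

print(redact("Call Dana at (555) 123-4567 or dana@example.org"))
# -> Call Dana at [PHONE] or [EMAIL]
```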
IX. A straight talk on cost, contracts, and lock-in
- Model choice: Start with a capable hosted model; swap later if cost/latency becomes a constraint. Avoid exotic features that lock you in on day one.
- Observability: Log prompts, responses, retrieval docs, latency, and cost per request. Without this you can’t improve.
- Evaluation: Keep a tiny but representative test set and run it before each change (model, prompt, index).
- Pricing and portability: Long-lived assets (embeddings, indexes, eval sets) matter more than today’s model SKU. Design so you can change models with minimal pain.
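A sketch of the observability bullet above: one JSON line per model call with the prompt, response, retrieved document IDs, latency, and a rough cost estimate. The per-token price is a placeholder; use your vendor’s real rates. call_llm is again a hypothetical stand-in.

```python
# Minimal observability sketch: one JSON line per model call with prompt,
# response, retrieved doc IDs, latency, and a rough cost estimate.
# The per-1k-token price is a placeholder; substitute your vendor's real rates.
import json
import time

PRICE_PER_1K_TOKENS = 0.002  # placeholder

def logged_call(prompt, call_llm, doc_ids=(), log_path="llm_requests.jsonl"):
    start = time.time()
    response = call_llm(prompt)
    record = {
        "ts": start,
        "latency_s": round(time.time() - start, 3),
        "prompt": prompt,
        "response": response,
        "retrieved_docs": list(doc_ids),
        # Rough token estimate (~4 chars per token) for trend-level cost tracking.
        "est_cost_usd": round(len(prompt + response) / 4 / 1000 * PRICE_PER_1K_TOKENS, 6),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return response
```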
X. Case sketches (how this looks when it works)
Case-note normalization (youth services)
Problem: Valuable insights trapped in free-text mentoring notes.
Approach: Embeddings + a light extractor map each note to 6–10 standard fields (topic, people involved, sentiment, follow-up).
Outcome: Supervisors get weekly roll-ups by theme and risk; staff get search that actually understands meaning, not keywords.
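A minimal sketch of the “light extractor” in this case, assuming a hypothetical call_llm client and a fixed field list (in production you would validate the output and route failures to human review):

```python
# Minimal sketch of the "light extractor": map one free-text note to fixed fields.
# `call_llm` is a hypothetical stand-in for your model client.
import json

FIELDS = ["topic", "people_involved", "sentiment", "follow_up_needed"]

def extract_fields(note: str, call_llm) -> dict:
    prompt = (
        "Extract the following fields from the mentoring note as JSON with exactly "
        f"these keys: {FIELDS}. Use null when a field is not mentioned.\n\n"
        f"Note: {note}"
    )
    raw = call_llm(prompt)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "unparseable", "raw": raw}   # send to a human review queue
    return {key: data.get(key) for key in FIELDS}     # keep only the expected keys
```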
Policy Q&A with citations (nonprofit operations)
Problem: Staff burn time finding reimbursement, travel, or HR rules.
Approach: Segment and index policies; RAG with strict retrieval filters; show top-3 citations.
Outcome: Faster answers with traceability; fewer “I thought I remembered” mistakes.
Support triage (software vendor)
Problem: Long queue, repetitive issues across versions.
Approach: Cluster tickets by similarity; route to the right owner; draft first responses; deflect to known fixes when safe.
Outcome: Lower time-to-first-touch, fewer handoffs, measurable deflection on “known knowns.”
These aren’t moonshots. They’re compounding efficiencies that free humans for the judgment calls that still need them.
XI. The 90-minute test (use it to say no faster)
If a use case can’t pass these in a single working session, pause:
- Data exists and is reachable.
- Metric is obvious (time saved, accuracy vs. baseline, deflection rate, cycle time).
- Risk controls are clear (what can go wrong and how we prevent it).
- Thin slice is small (2–6 weeks to pilot with a small group).
If all four are true, green-light a pilot. If not, fix the gap or move on.
XII. How I help (typically sub-150-employee orgs)
- Roadmap Sprint (2–3 weeks): Quantify 10x vs. 10%, identify 1–2 high-leverage use cases, outline budget/ROI, and shortlist vendors.
- Unstructured-data audit: Mine tickets/notes/emails with embeddings + entity extraction; deliver the dashboards your ops team actually uses.
- Low-risk pilot (6–10 weeks): One workflow with clear metrics and human-in-the-loop guardrails, shipped to a small user group.
- Governance: Privacy design, evaluation sets, model/tool selection, and cost controls you can sustain without a full-time AI team.
If you’re unsure whether AI is your Amazon moment or your Yellow Pages moment, we can pressure-test it in 45 minutes. I’ll tell you where AI can deliver step-change results—and where it’s smarter to wait.
XIII. Bottom line
AI will be the new internet—for some. For many, it’s an excellent tool that belongs in the kit, not in the business model. The difference isn’t how loudly you buy the buzzwords; it’s whether the technology changes your constraints.
Use the 10x vs. 10% test. Start where your data gives you leverage. Pilot small, measure honestly, and keep humans in the loop where it matters. When you do have that Amazon moment, you’ll recognize it—and you’ll be ready.