AI Vocabulary
// AI PM Accelerator

Your 8-Week Learning Sprint

From digital PM to AI PM: curated, time-boxed, and built for practitioners who learn by doing.

Overall Progress 0 / 22 modules
01
Weeks 1–2
ML/AI Fundamentals Literacy
Build the vocabulary engineers use: tokens, context windows, fine-tuning vs. RAG, embeddings. You'll stop nodding and start engaging.
🔧 Apply It: Open a system prompt you've written
Open a system prompt you've written, or pull one from a public project or Claude.ai, and annotate each section: which parts use few-shot examples, chain-of-thought instructions, role framing, or output formatting? Try labeling them with comments. This makes abstract concepts concrete and will change how you write prompts going forward.
💭 Reflect: Write 2–3 sentences
How does understanding tokens and context windows change how you'd architect a multi-turn conversation in one of your apps?
~5 hrs / week
How LLMs Actually Work
Conceptual Foundation · Week 1
→ Your connection: Every prompt in every AI product runs through these mechanics. Understanding attention and context windows explains why your system prompts behave the way they do.
Resources
  • Video
    Andrej Karpathy: 1-hour talk, Stanford-level clarity
    Purpose Karpathy is the single best explainer of LLM internals for smart non-engineers. This talk has no equations and covers everything from tokens to emergent behavior.
    After this You can explain, in your own words, what a token is, why LLMs "predict" rather than "think," and what a context window actually limits.
    https://www.youtube.com/watch?v=zjkBMFhNj_g
    ~60 min
  • Video
    Andrej Karpathy, Microsoft Build 2023: 45-minute practical deep dive
    Purpose Karpathy walks through the full GPT training pipeline (pretraining → supervised fine-tuning → reward modeling → RLHF), then shows why certain prompting strategies work, grounded in how the model was actually built. Bridges theory to practice.
    After this You'll understand why chain-of-thought and few-shot prompting often outperform zero-shot, explained through the lens of training rather than heuristics alone. This is the mental model engineers use when they say "the model was trained on this pattern."
    https://www.youtube.com/watch?v=bZQun8Y4L2A
    ~45 min
  • Article
    Stephen Wolfram: deep but accessible long-form
    Purpose Wolfram builds from first principles (probability, neural nets, training) without assuming technical background. It's long, but the first half alone is worth it.
    After this You can explain why LLM outputs are probabilistic, not deterministic, which is a key concept when talking to engineers about reliability and testing.
    https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
    ~45 min
  • Podcast
    Lex Fridman Podcast (YouTube): listen at 1.25x, first 45 min
    Purpose Hearing Karpathy speak conversationally fills in the texture that his formal talk misses: his mental model of where AI is headed, and what PMs should actually care about.
    After this You'll have a clearer intuition for how an AI researcher thinks about "intelligence" vs. pattern matching, a distinction that comes up constantly in PM product conversations.
    https://www.youtube.com/watch?v=cdiD-9MMpb0
    ~45 min
  • Doc
    Anthropic: read the model cards and capability notes
    Purpose Gives you the concrete vocabulary for the models you're already using: Haiku vs. Sonnet vs. Opus tradeoffs, context limits, and capability differences.
    After this You can explain to a stakeholder why you chose Sonnet over Haiku for a given feature, using cost, latency, and capability as the framework.
    https://docs.anthropic.com/en/docs/about-claude/models/overview
    ~20 min
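The "predict rather than think" and "probabilistic, not deterministic" ideas in the resources above can be made concrete with a minimal sketch of next-token sampling. The vocabulary, the scores, and the function names below are invented for illustration; a real model scores tens of thousands of tokens with a neural network rather than a hand-written list.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, scores, temperature=1.0, rng=random):
    """Pick one token at random, weighted by its probability."""
    probs = softmax(scores, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores a model might assign after the prompt "The sky is"
vocab = ["blue", "clear", "falling", "green"]
scores = [3.0, 2.0, 0.5, 0.1]

rng = random.Random(0)  # seeded only so the demo is repeatable
samples = [sample_next_token(vocab, scores, temperature=1.0, rng=rng) for _ in range(1000)]
counts = {tok: samples.count(tok) for tok in vocab}
print(counts)
```

Because each token is drawn at random from a weighted distribution, the same prompt can produce different continuations on different runs. Lowering the temperature concentrates probability on the top-scoring token, which is one reason engineers treat temperature as a reliability setting.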
RAG vs Fine-Tuning vs Prompt Engineering
Core Tradeoffs · Week 1
→ Your connection: If you've evaluated RAG tools or vector databases for a product, this module fills in the "why" behind those choices and gives you the framework to explain the tradeoffs to an engineering partner.
Resources
  • Article
    Anyscale: clarifies the most common PM misconception
    Purpose Most PMs (and many engineers) think fine-tuning is how you "teach" a model new information. This article corrects that directly and explains what fine-tuning actually does well.
    After this You'll never again propose fine-tuning when RAG is the right answer, and you can explain why in a single sentence.
    https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts
    ~20 min
  • Video
    LangChain: 15-minute conceptual walkthrough
    Purpose Visual diagrams of a retrieval pipeline make the abstract concept click. The key mental model is the chain itself: a query becomes an embedding, the embedding drives a vector search, the search results become context, and the context shapes the answer.
    After this You can sketch a RAG architecture on a whiteboard and explain each step (retrieval, augmentation, generation) to someone unfamiliar with it.
    https://www.youtube.com/watch?v=T-D1OfcDW1M
    ~15 min
  • Doc
    Anthropic: covers chain-of-thought, few-shot, and system prompts
    Purpose The authoritative reference for the model you're already building on. Unlike generic prompt guides, this reflects how Claude specifically responds to different techniques.
    After this You can name and intentionally apply at least five prompt engineering techniques, and explain the tradeoffs among them when talking to an engineer.
    https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
    ~30 min
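The three RAG steps named above (retrieval, augmentation, generation) can be sketched in a few lines. This is a toy under stated assumptions: the `embed` function is a bag-of-words counter standing in for a learned embedding model, so it matches shared words rather than meaning, and the documents, query, and function names are invented. The generation step is left as a prompt string rather than a real model call.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Retrieval: rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, context_docs):
    """Augmentation: place the retrieved text in front of the question."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

documents = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Premium plans include priority support.",
]
query = "How long do refunds take?"
top_docs = retrieve(query, documents, k=1)
prompt = build_prompt(query, top_docs)  # Generation: this string would be sent to the LLM
print(prompt)
```

Note that the model's weights never change here. New facts reach the model through the prompt, which is the practical difference between RAG and fine-tuning.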
Embeddings, Vectors & Semantic Search
Technical Depth · Week 2
→ Your connection: Directly relevant to any AI product that involves search, recommendations, or retrieval. Understanding embeddings means understanding why vector search can find relevant results that keyword search misses.
Resources
  • Article
    Simon Willison: clearest PM-friendly explanation available
    Purpose Willison writes for developers but explains concepts at a level a smart non-engineer can act on. His embedding explainer uses concrete examples that make the math unnecessary.
    After this You can explain to a stakeholder why "search by meaning" is different from keyword search, and when that difference matters for a product feature.
    https://simonwillison.net/2023/Oct/23/embeddings/
    ~25 min
  • Doc
    Pinecone: read the conceptual overview, skip the API docs
    Purpose This fills in the why behind the tool: what indexes are, how similarity search works, why dimensions matter.
    After this You can write a requirements doc for a vector search feature that an engineer could act on, specifying index size, similarity metric, and update frequency.
    https://docs.pinecone.io/guides/get-started/overview
    ~20 min
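To illustrate "search by meaning" versus keyword search, here is a small sketch using cosine similarity, one common similarity metric. The three-dimensional vectors are hand-made stand-ins for real embeddings (which typically have hundreds or thousands of dimensions), and all names and values are assumptions chosen for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-dimensional vectors standing in for real embeddings (illustrative values only).
embeddings = {
    "affordable automobile": [0.9, 0.1, 0.2],
    "cheap car": [0.85, 0.15, 0.25],
    "banana bread recipe": [0.05, 0.9, 0.3],
}

def keyword_match(query, text):
    """Keyword search: true only if the two strings share a word."""
    return bool(set(query.split()) & set(text.split()))

def semantic_search(query, corpus):
    """Rank every other entry by cosine similarity to the query's vector."""
    q = corpus[query]
    others = [(text, cosine_similarity(q, vec)) for text, vec in corpus.items() if text != query]
    return sorted(others, key=lambda pair: pair[1], reverse=True)

results = semantic_search("cheap car", embeddings)
print(keyword_match("cheap car", "affordable automobile"))
print(results[0][0])
```

"cheap car" and "affordable automobile" share no words, so keyword matching fails, yet their vectors point in nearly the same direction. That gap is the product-level difference between the two search styles.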
Cost, Latency & Model Selection Tradeoffs
PM Decision-Making · Week 2
→ Your connection: You've been choosing Claude Sonnet as your engine across all apps. This module gives you the framework to defend that choice, or to know when Haiku or Opus is the right call for a specific feature.
Resources
  • Article
    Latent Space: tokens/sec, TTFT, and what matters for UX
    Purpose Latency is one of the top user experience levers in AI products. This article explains TTFT (time to first token), throughput, and why streaming changes the perception of speed, all of which are relevant to your Vercel deployments.
    After this You can spec latency requirements for an AI feature (e.g., "TTFT under 800 ms, streaming enabled") and explain to engineers why those numbers matter for UX.
    https://www.latent.space/p/why-your-llm-is-slow
    ~20 min
  • Podcast
    Practical AI Podcast: ep. 256, ~35 min
    Purpose A PM-specific episode covering how to have credible technical conversations with ML engineers without being an engineer yourself. Covers model selection, build vs. buy, and scoping.
    After this You'll have a concrete vocabulary for what PMs are expected to know vs. delegate when working with ML teams.
    https://changelog.com/practicalai/256
    ~35 min
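A latency spec such as "TTFT under 800 ms" only helps if it can be measured. The sketch below times any token iterator; `fake_token_stream` is a made-up stand-in for a streaming model response, and its delays are arbitrary numbers chosen for the demo.

```python
import time

def fake_token_stream(first_token_delay=0.05, per_token_delay=0.01, tokens=("Hello", ",", " world")):
    """Stand-in for a streaming LLM response; the delays are made-up numbers."""
    time.sleep(first_token_delay)
    for tok in tokens:
        yield tok
        time.sleep(per_token_delay)

def measure_stream(stream):
    """Return (time to first token, total time, tokens per second) for any token iterator."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token has arrived
        count += 1
    total = time.perf_counter() - start
    return ttft, total, (count / total if total > 0 else 0.0)

ttft, total, tps = measure_stream(fake_token_stream())
meets_spec = ttft is not None and ttft < 0.8  # the "TTFT under 800 ms" example from the text
print(f"TTFT={ttft:.3f}s total={total:.3f}s tokens/sec={tps:.1f} meets_spec={meets_spec}")
```

TTFT and total time are reported separately because streaming improves the first number without changing the second: the user sees output sooner even though the full response takes just as long.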
02
Weeks 3–4
Agentic AI & Systems Thinking
How multi-step agents are orchestrated, where they fail, and how to define what "done" looks like. The skills that separate an AI PM from a feature PM.
🔧 Apply It: Sketch an agentic workflow as an agent spec
Choose a recurring task that could benefit from automation, such as a weekly brief, a research summary, or a data report. Write a one-page agent spec: what tools does it need (search, calendar, email, databases)? What's the orchestration order? What happens if one tool fails? What does the output look like? This is a real engineering artifact you could hand to a developer.
💭 Reflect: Write 2–3 sentences
What's the difference between an AI feature and an AI agent? Where does an app you've worked with or built sit on that spectrum, and what would it take to push it toward agentic?
~5 hrs / week
What Is an Agent? Mental Models for PMs
Conceptual · Week 3
→ Your connection: Any multi-step task that involves tools, decisions, and handoffs is a candidate for an agentic workflow. This module gives you the vocabulary to spec one properly with a developer.
Resources
  • Article
    Anthropic: the definitive framework for agentic system design
    Purpose Anthropic's own thinking on what makes agents work and, more importantly, what makes them fail. Required reading for anyone building on Claude.
    After this You can distinguish between workflows and agents, explain why human-in-the-loop matters for reliability, and spec appropriate checkpoints in an agentic system.
    https://www.anthropic.com/research/building-effective-agents
    ~30 min
  • Podcast
    Latent Space Podcast: the defining episode on where the field is headed
    Purpose This episode popularized the term "AI Engineer" and defined what it means to build on top of foundation models rather than training them. It's the single best framing of where AI product work is going.
    After this You'll be able to articulate the difference between ML Engineer, AI Engineer, and AI PM, and where your builder profile fits.
    https://www.latent.space/p/ai-engineer
    ~50 min
  • Article
    Lilian Weng (OpenAI): canonical overview, read sections 1–3
    Purpose Weng's post is the most-cited reference on agent architecture. It covers planning, memory, and tool use in a way that maps directly to how engineers think about building them.
    After this You can hold a whiteboard conversation with an engineer about agent components (planning, short- vs. long-term memory, and tool integration) without losing the thread.
    https://lilianweng.github.io/posts/2023-06-23-agent/
    ~40 min
Tool Use, Function Calling & MCP
Technical Pattern · Week 3
→ Your connection: If you've connected an AI app to any external service (calendar, email, databases), you've worked with something like this. This module explains the protocol so you can scope new integrations with precision instead of intuition.
Resources
  • Doc
    Anthropic: read the overview + best practices sections
    Purpose Function calling is how agents take actions in the world. Understanding the spec (how tools are defined, how they are called, and how results are returned) lets you write accurate specs for tool integrations.
    After this You can write a tool definition in plain English that an engineer could translate directly into a JSON schema: name, description, parameters, and expected return.
    https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview
    ~25 min
  • Doc
    Anthropic MCP Docs: the spec, written accessibly
    Purpose MCP is becoming the standard way AI apps connect to external services. Since you're already using it with Monday and Gmail, understanding the protocol lets you reason about what's possible vs. what requires custom work.
    After this You can evaluate whether a new integration can use an existing MCP server or needs custom tool definitions, which is a meaningful scoping distinction.
    https://modelcontextprotocol.io/introduction
    ~20 min
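As an illustration of what "name, description, parameters" looks like once an engineer writes it down, here is a sketch of a tool definition as a Python dictionary. The top-level keys (`name`, `description`, `input_schema`) follow the shape described in Anthropic's tool use documentation; the tool itself, its parameters, and the `missing_arguments` helper are hypothetical and exist only for this example.

```python
import json

# Hypothetical tool: the name, parameters, and wording are invented for illustration.
calendar_tool = {
    "name": "get_calendar_events",
    "description": "Return the user's calendar events between two dates.",
    "input_schema": {
        "type": "object",
        "properties": {
            "start_date": {"type": "string", "description": "First day, as YYYY-MM-DD"},
            "end_date": {"type": "string", "description": "Last day, as YYYY-MM-DD"},
            "max_results": {"type": "integer", "description": "Upper bound on events returned"},
        },
        "required": ["start_date", "end_date"],
    },
}

def missing_arguments(tool, arguments):
    """List required parameters that a proposed tool call left out."""
    required = tool["input_schema"].get("required", [])
    return [name for name in required if name not in arguments]

print(json.dumps(calendar_tool, indent=2))
print(missing_arguments(calendar_tool, {"start_date": "2025-01-06"}))
```

The description fields matter as much as the types: they are the only guidance the model has about when to call the tool and what to pass, so a PM-written plain-English spec maps almost one-to-one onto them.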
Orchestration Frameworks: LangGraph & CrewAI
Framework Literacy · Week 4
→ Your connection: When an agent needs to pull from multiple sources simultaneously (email, calendar, APIs, databases), an orchestration framework handles the sequencing. Knowing the names and tradeoffs helps you scope it right.
Resources
  • Video
    LangChain: visual walkthrough of stateful multi-agent graphs
    Purpose LangGraph uses graph metaphors (nodes and edges) to represent agent state and transitions. Seeing it visually makes the "state machine" concept click without needing to write code.
    After this You can describe a multi-step agent workflow as a graph (inputs, decision nodes, tool calls, output), which is exactly how engineers will think about implementing it.
    https://www.youtube.com/watch?v=4EXOmWeqXRc
    ~20 min
  • Podcast
    Practical AI Podcast: honest about real deployment failures
    Purpose Most agent content shows the happy path. This episode focuses on failure modes in production (loops, tool errors, context overflow, cost explosions), which is what PMs need to plan for.
    After this You can write a risk section in an agent spec that covers the top 5 failure modes and what mitigation looks like for each.
    https://changelog.com/practicalai/284
    ~40 min
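The "workflow as a graph" idea, and the loop failure mode mentioned above, can be sketched in plain Python. This is not the LangGraph or CrewAI API; it is a hand-rolled state machine with invented node names and placeholder data, meant only to show nodes, edges, shared state, and a step limit.

```python
def fetch_email(state):
    state["emails"] = ["Q3 budget approved"]  # placeholder data instead of a real API call
    return "summarize"  # the returned name is the edge to follow next

def summarize(state):
    state["summary"] = f"{len(state['emails'])} email(s): " + "; ".join(state["emails"])
    return "review"

def review(state):
    # Decision node: loop back if the summary is empty, otherwise finish.
    return "done" if state["summary"] else "fetch_email"

NODES = {"fetch_email": fetch_email, "summarize": summarize, "review": review}

def run_graph(start, state, max_steps=10):
    """Walk the graph node by node; max_steps guards against runaway loops."""
    current, path = start, []
    while current != "done":
        if len(path) >= max_steps:
            raise RuntimeError(f"Stopped after {max_steps} steps; possible loop")
        path.append(current)
        current = NODES[current](state)
    return path

state = {}
path = run_graph("fetch_email", state)
print(path, state["summary"])
```

The `max_steps` cap is the kind of detail worth naming in an agent spec: without it, a decision node that keeps routing backward can run indefinitely and drive up cost.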
Failure Modes: Hallucination, Drift & Context Loss
Risk Literacy · Week 4
→ Your connection: Every AI feature has failure modes. Knowing the taxonomy lets you write better acceptance criteria and edge case specs.
Resources
  • Article
    Galileo: systematic guide to agent failure modes and mitigations
    Purpose Gives you a named taxonomy: not just "it hallucinated" but intrinsic vs. extrinsic hallucination, and factual vs. faithfulness errors. Names enable better bug reports and acceptance criteria.
    After this When you see a bad AI output, you can categorize the failure type and write a targeted fix rather than just saying "the AI got it wrong."
    https://galileo.ai/blog/agent-failure-modes-guide
    ~20 min
  • Article
    Hamel Husain: bridges failure modes into eval design
    Purpose A warm-up read before Sprint 3. Hamel argues that most AI products ship with no systematic quality checks, and shows why that's a product management failure as much as an engineering one.
    After this You'll arrive at Sprint 3 already convinced that evals aren't optional, with a mental model of why they matter for products you own.
    https://hamel.dev/blog/posts/evals/
    ~25 min
03
Weeks 5–6
Evals & Quality Measurement
The #1 gap in AI product teams. Learn to define, design, and track LLM quality: the equivalent of writing a great A/B test spec, but for AI.
🔧 Apply It: Red-team an AI product this week
Spend 30 minutes actively trying to break an AI product, either one you've built or a public one you use regularly. Try edge inputs, contradictory instructions, multi-language queries, extremely vague prompts. Document every failure you find: what failed, what type of failure it was, and what a fix might look like. This is a real eval exercise.
💭 Reflect: Write 2–3 sentences
For one AI feature you've worked with or want to build: what would "good output" actually mean? How would you measure it without a human reviewing every response?
~4 hrs / week
What Are Evals and Why They're Non-Negotiable
Conceptual Foundation · Week 5
→ Your connection: If your AI product generates text, recommendations, or decisions, how do you know if they're good? Evals are how you answer that systematically and earn engineering trust when shipping AI features.
Resources
  • Article
    Hamel Husain: the essential PM read on eval strategy
    Purpose Hamel has run evals at scale at companies like Airbnb and GitHub. This post distills what actually works vs. what sounds good in theory, and it is written for practitioners, not academics.
    After this You can write an eval plan for a single AI feature: what you're measuring, how you're measuring it, and what threshold means "good enough to ship."
    https://hamel.dev/blog/posts/evals/
    ~30 min
  • Podcast
    MLOps Community: hear Hamel explain his thinking conversationally
    Purpose Reading Hamel is good; hearing him think out loud is better. He goes off-script into examples that don't appear in the articles, including how PMs specifically can drive eval culture.
    After this You'll have a more nuanced view of the human judgment layer in evals: when automated scoring is reliable and when it misleads you.
    https://mlops.community/watch/evals-with-hamel-husain/
    ~45 min
  • Doc
    Anthropic: how to evaluate Claude-based products specifically
    Purpose Anthropic's own framework for evals is tuned for Claude's behavior, including how to handle ambiguity, instruction following, and refusal rates. More practical than generic LLM eval guides.
    After this You can set up a basic eval suite for a Claude-powered feature using Anthropic's recommended test categories.
    https://docs.anthropic.com/en/docs/test-and-evaluate/eval-overview
    ~20 min
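An eval plan as described above (what you measure, how you measure it, and what threshold means "good enough to ship") can be sketched as a tiny harness. Everything here is illustrative: `fake_model` returns canned answers in place of a real LLM call, the substring check is the simplest possible scorer, and the 90% threshold is an arbitrary example.

```python
def fake_model(prompt):
    """Stand-in for a real LLM call; canned answers keep the sketch runnable offline."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
        "Who wrote Hamlet?": "I am not sure.",
    }
    return canned.get(prompt, "")

# Each test case pairs an input with a phrase the answer must contain.
test_cases = [
    {"input": "What is the capital of France?", "must_contain": "Paris"},
    {"input": "What is 2 + 2?", "must_contain": "4"},
    {"input": "Who wrote Hamlet?", "must_contain": "Shakespeare"},
]

def run_eval(model, cases, threshold=0.9):
    """Score every case, then compare the pass rate with the ship threshold."""
    results = []
    for case in cases:
        output = model(case["input"])
        passed = case["must_contain"].lower() in output.lower()
        results.append({"input": case["input"], "passed": passed})
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return {"pass_rate": pass_rate, "ship": pass_rate >= threshold, "results": results}

report = run_eval(fake_model, test_cases, threshold=0.9)
print(f"pass rate {report['pass_rate']:.0%}, ship: {report['ship']}")
```

Substring checks only suit objective criteria. For subjective quality such as tone or helpfulness, the scorer would be replaced by human review or a carefully validated automated judge, while the surrounding structure stays the same.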
Automated vs Human Evals: When to Use Each
Framework · Week 5
→ Your connection: For any product at scale (multiple languages, high query volume, diverse users), human eval alone doesn't work. Knowing when to automate and how is a key PM lever and a real skill gap in most product orgs.
Resources
  • Article
    Eugene Yan: covers eval, guardrails, and quality patterns
    Purpose Yan synthesizes patterns from building LLM systems at Amazon. His eval section is the clearest breakdown of when automated scoring is trustworthy and when it isn't.
    After this You can design an eval strategy that combines automated checks for objective criteria and human review for subjective quality, with a clear decision rule for which applies.
    https://eugeneyan.com/writing/llm-patterns/
    ~35 min
Red-Teaming Your Own AI Product
Applied Practice · Week 6
→ Your connection: This week's apply-it task lives here. Red-team any AI product you have access to. Try to break it. Document what you find. This is the practical deliverable.
Resources
  • Article
    Anthropic: how systematic red-teaming is structured
    Purpose Anthropic's own red-teaming methodology, the same process used on Claude before every major release. Gives you a structured approach rather than just "try weird things."
    After this You can run a structured red-team session on one of your apps, covering the main attack categories: prompt injection, jailbreaking, edge inputs, and adversarial users.
    https://www.anthropic.com/research/red-teaming-language-models-to-reduce-harms
    ~25 min
  • Article
    Learn Prompting: injection, jailbreaking, and edge cases
    Purpose A practical catalog of the actual techniques used to break AI products: injection attacks, goal hijacking, prompt leaking. Knowing the attack vectors helps you defend against them.
    After this You can identify at least three prompt injection vectors in your own apps and write guardrail instructions to address them.
    https://learnprompting.org/docs/prompt_hacking/intro
    ~30 min
Eval Tooling: Promptfoo, RAGAS, LangSmith
Tool Literacy · Week 6
→ Your connection: You don't need to implement these, but knowing they exist and what they do means you can have an informed conversation with an eng team about quality infrastructure, which is a real differentiator for an AI PM.
Resources
  • Doc
    Promptfoo: open-source eval framework, read the overview only
    Purpose Promptfoo is the most accessible eval tool for teams building on LLMs. Reading the intro shows you what a real eval config looks like: test cases, scoring criteria, threshold definitions.
    After this You can write a one-page eval spec for an AI feature that an engineer could implement using Promptfoo in under a day.
    https://www.promptfoo.dev/docs/intro/
    ~15 min
  • Doc
    RAGAS: faithfulness, answer relevancy, context precision, context recall
    Purpose RAGAS is the standard eval framework for RAG pipelines. Knowing these four metrics helps you measure whether your retrieval is actually helping the output, which is essential for any product using RAG for grounding.
    After this You can spec a RAG eval using RAGAS metrics, explaining to an engineer what faithfulness means and why low context recall indicates a retrieval problem, not a generation one.
    https://docs.ragas.io/en/latest/concepts/metrics/index.html
    ~20 min
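To see why low context recall points at retrieval rather than generation, here is a deliberately simplified version of the idea. This is not the RAGAS implementation, which relies on an LLM to judge whether each reference statement is supported by the retrieved context; plain substring matching is swapped in so the sketch runs offline. The facts, chunks, and function name are invented.

```python
def context_recall(reference_facts, retrieved_chunks):
    """Fraction of reference facts that appear somewhere in the retrieved text."""
    if not reference_facts:
        return 0.0
    haystack = " ".join(retrieved_chunks).lower()
    found = [fact for fact in reference_facts if fact.lower() in haystack]
    return len(found) / len(reference_facts)

# Facts a correct answer needs, versus what the retriever actually returned.
reference_facts = ["5 business days", "original payment method"]
retrieved_chunks = ["Refunds are processed within 5 business days."]

recall = context_recall(reference_facts, retrieved_chunks)
print(recall)
```

Only one of the two needed facts was retrieved, so the model could not have produced a complete grounded answer no matter how well it writes. The fix belongs in the retriever (chunking, index coverage, number of results), not in the prompt.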
04
Weeks 6–7
Data & Metrics for AI Products
AI products need different success metrics than traditional software. Learn to instrument, track, and communicate them to stakeholders who don't speak AI.
🔧 Apply It: Define 3 success metrics for one of your apps
Pick any AI feature you've worked on or want to build. Define exactly three success metrics: one for output quality, one for user engagement, and one for business value. For each: what would you measure, how would you instrument it, and what threshold means "this feature is working"? Write it as if you're presenting to a product leadership team.
💭 Reflect: Write 2–3 sentences
Think of a product you work on or know well. What's a metric you track today that would need to change if you added an AI feature? What new metric would replace or complement it?
~4 hrs / week
AI Product Metrics That Actually Matter
Framework · Week 6
→ Your connection: For any AI output (a strategy report, a recommendation, a summary), "did this actually help?" is hard to measure. This module gives you a proxy-metrics framework (follow-up queries, session depth, return usage) that tends to correlate with real value delivered.
Resources
  • Article
    Lenny's Newsletter: PM-first framework for AI metrics
    Purpose Lenny interviews PMs from Notion, GitHub, and Linear about how they measure AI feature success. The patterns across companies reveal what metrics actually hold up vs. what looks good in a dashboard.
    After this You can build a simple AI feature metrics framework: leading indicators (engagement), lagging indicators (retention/revenue), and guardrail metrics (quality floor).
    https://www.lennysnewsletter.com/p/measuring-the-impact-of-ai-features
    ~25 min
  • Podcast
    Lenny Rachitsky: interviews AI PMs from top companies
    Purpose Lenny consistently gets candid answers about AI product development from PMs at Figma, Notion, and others. This episode focuses specifically on what metrics and measurement approaches they use in practice.
    After this You'll have real examples from working PMs of how they present AI feature impact to leadership, with concrete language you can adapt for your own product context.
    https://www.lennyspodcast.com/how-to-build-great-ai-products-without-being-an-ai-expert/
    ~55 min
Instrumentation: What to Log and Why
Technical · Week 7
→ Your connection: Right now your Vercel apps have minimal logging. Understanding what to capture (token counts, latency, user corrections, session signals) is how you build a real feedback loop into your apps.
Resources
  • Article
    Honeycomb: what to instrument in production AI apps
    Purpose Honeycomb pioneered observability tooling, and their LLM guide is the most practical breakdown of what to log, what those logs tell you, and how to use them to debug quality issues.
    After this You can write a logging requirements spec for a new AI feature: what events to capture, what fields to include, and how that data feeds into your eval and metrics dashboards.
    https://www.honeycomb.io/blog/observability-for-llms
    ~20 min
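A logging requirements spec becomes concrete once the fields are written down. The sketch below defines one possible event record covering the signals named above (token counts, latency, user corrections, session). The field names and values are suggestions made up for this example, not a standard schema or any vendor's format.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict, field

@dataclass
class LLMCallEvent:
    """One log record per model call; the field names are suggestions, not a standard."""
    feature: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    user_edited_output: bool = False  # a simple "user correction" signal
    session_id: str = ""
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)

def to_log_line(event):
    """Serialize as one JSON object per line, which most log pipelines can ingest."""
    return json.dumps(asdict(event), sort_keys=True)

event = LLMCallEvent(
    feature="weekly_brief",
    model="example-model",
    input_tokens=1200,
    output_tokens=350,
    latency_ms=2140.5,
    user_edited_output=True,
    session_id="session-123",
)
line = to_log_line(event)
print(line)
```

Logging token counts and latency on every call is what later makes cost and TTFT dashboards possible, and the `user_edited_output` flag is a cheap behavioral proxy for output quality.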
Communicating AI Metrics to Stakeholders
PM Communication · Week 7
→ Your connection: Translating "accuracy rate" and "hallucination rate" into business language stakeholders can act on is one of the highest-leverage skills an AI PM can develop.
Resources
  • Article
    Reforge: hallucination rate, task completion, correction frequency
    Purpose Reforge is where senior PMs go to level up. This piece specifically addresses the translation problem: how to take technical AI quality signals and convert them into metrics that resonate with a business audience.
    After this You can present an AI quality scorecard to a non-technical stakeholder using business outcomes (user correction rate → support ticket cost, task completion → feature ROI) rather than model metrics.
    https://www.reforge.com/blog/measuring-ai-product-quality
    ~20 min
A/B Testing AI Features: What Changes
Experimentation · Week 7
→ Your connection: Traditional A/B tests assume deterministic outputs. AI features are not deterministic: every response varies. This module covers how to run valid experiments when outputs are probabilistic.
Resources
  • Article
    Netflix: real-world A/B testing challenges at scale
    Purpose Netflix runs some of the most rigorous experimentation in the industry. Their LLM experimentation write-up is honest about what breaks (variance, sample size requirements, metric instability) and how to address it.
    After this You can design an A/B test for an AI feature that accounts for output variance, including the right success metric, minimum sample size, and guardrail conditions.
    https://netflixtechblog.com/experimentation-with-llms-an-overview-of-challenges-and-strategies-b37bf28fc4a2
    ~25 min
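"Minimum sample size" can be estimated before the test runs. The sketch below uses the standard two-proportion sample size formula, which assumes a binary success metric (for example, task completed or not) and independent users; it is a planning approximation, not the method from the Netflix write-up. The baseline and expected rates are invented.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Users needed in each arm to detect the difference between two success rates."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)  # desired statistical power
    variance = baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    effect = expected_rate - baseline_rate
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical goal: lift task completion from 60% to 65%.
n = sample_size_per_arm(0.60, 0.65)
print(n)
```

The required sample grows with the inverse square of the effect size, so halving the lift you want to detect roughly quadruples the users needed. Noisy AI outputs tend to shrink the measurable effect, which is one reason these tests often need more traffic than a deterministic UI change.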
Product Analytics & AI Unit Economics
Analytics · Week 7
→ Your connection: You can ship an AI feature, but can you prove it's working and paying for itself? This module covers the two lenses every AI PM needs post-launch: behavioral telemetry (what users actually do with outputs) and unit economics (whether token costs scale with revenue).
Resources
  • Article
    PostHog: behavioral telemetry patterns for AI features
    Purpose PostHog works with hundreds of AI companies and distills which product analytics events actually signal quality and engagement for AI features: copy-paste rates, regeneration requests, edit distance, and thumbs up/down as behavioral proxies for trust.
    After this You can draft an analytics telemetry spec, defining exactly which user interactions to instrument, what payload to send per event, and how each metric maps to a product health question.
    https://posthog.com/blog/ai-metrics
    ~20 min
  • Article
    a16z: unit economics framework for AI-powered products
    Purpose a16z analyzes how token costs, model efficiency, and usage patterns combine to determine whether an AI feature is margin-accretive or a loss leader. The framing (cost per intelligence unit vs. value delivered) is the mental model every AI PM needs when making model selection and tier decisions.
    After this You can model the unit economics of any AI feature: estimate monthly API burn from token usage per session × volume, compare against revenue per user, and identify the model tier or caching strategy that makes the math work at scale.
    https://a16z.com/the-marginal-cost-of-intelligence/
    ~25 min
  • Doc
    Anthropic: authoritative input/output token pricing for the Claude family
    Purpose Unit economics analysis requires real numbers. This page shows per-token pricing across Haiku, Sonnet, and Opus tiers, the raw cost structure you need to build a credible model for any Claude-powered feature at different usage volumes.
    After this You can estimate the monthly API cost for a feature at 10k, 100k, and 1M sessions, and present a concrete cost/revenue case to finance or leadership when scoping AI feature investment.
    https://www.anthropic.com/pricing
    ~15 min
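The "token usage per session × volume" estimate above is simple arithmetic, sketched here for the three volumes mentioned. The model names and per-million-token prices are placeholders invented for the example, not real pricing; substitute current figures from the provider's pricing page. The token counts per session are likewise assumptions.

```python
# Placeholder prices in USD per million tokens; substitute current figures from the pricing page.
PRICE_PER_MILLION = {
    "small-model": {"input": 1.00, "output": 5.00},
    "large-model": {"input": 10.00, "output": 50.00},
}

def monthly_cost(model, sessions, input_tokens_per_session, output_tokens_per_session):
    """Monthly API spend = sessions x tokens per session x price per token."""
    price = PRICE_PER_MILLION[model]
    input_cost = sessions * input_tokens_per_session * price["input"] / 1_000_000
    output_cost = sessions * output_tokens_per_session * price["output"] / 1_000_000
    return input_cost + output_cost

for sessions in (10_000, 100_000, 1_000_000):
    cost = monthly_cost("small-model", sessions, input_tokens_per_session=2_000, output_tokens_per_session=500)
    print(f"{sessions:>9,} sessions: ${cost:,.2f}")
```

Input and output tokens are priced separately because providers typically charge more for output. Comparing the same workload across model tiers shows how much of the budget is a model-selection decision rather than a usage problem.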
05
Week 8
Developer Empathy + Portfolio Framing
Cap the learning sprint. Deepen your code intuition, formalize your AI PM narrative, and map what to build or write next.
🔧 Apply It: Find a public AI repo PR on GitHub
Go to github.com/anthropics/anthropic-sdk-python or a public LangChain repo. Find a recent merged PR. Read the diff. Try to understand: what changed, why it probably changed, and what the before/after behavior difference is. You don't need to understand every line; you're building the skill of reading code directionally.
💭 Reflect: Your capstone question
In one paragraph: what is your unique angle as an AI PM? What do you know or can you do that most AI PMs coming from traditional backgrounds cannot? This is the first draft of your thesis.
~5 hrs total
Reading Code Like a PM: PRs, Diffs & Logic Flows
Developer Empathy · Week 8
→ Your connection: Your vibe-coding practice already gives you intuition here. The goal is to formalize it: being able to look at a GitHub PR diff and understand what changed and why is the key credibility signal with engineering teams.
Resources
  • Article
    Lenny's Newsletter: the credibility signals that matter most
    Purpose Lenny interviews engineering leads about what makes a PM they love working with. The consistent theme: it's not about writing code, it's about respecting how engineers think, communicate, and estimate.
    After this You can identify two or three specific behaviors you'll change in how you work with engineers, stated concretely rather than abstractly.
    https://www.lennysnewsletter.com/p/how-to-work-with-engineers
    ~20 min
  • Podcast
    SE Daily: how engineers actually use AI tools day-to-day
    Purpose Listening to how engineers describe their own AI-assisted workflow helps you understand what friction they face, and where a PM who understands both sides can add unique value.
    After this You'll be able to speak credibly in a room with engineers about AI coding tools: what they're good at, where they fail, and how that affects development velocity estimates.
    https://softwareengineeringdaily.com/2024/01/15/ai-assisted-development/
    ~40 min
  • Course
    Harvard: the goal is reading code, not writing it
    Purpose CS50P is the most accessible intro to Python that exists. Weeks 0–2 cover functions, variables, conditionals, and loops, which is enough to read logic in a codebase without needing to write any yourself.
    After this You can read a Python function, understand what it does, and ask a precise clarifying question rather than needing an engineer to explain every line in plain English.
    https://cs50.harvard.edu/python/2022/
    ~3 hrs
The AI PM Job Market in 2025–26
Career Intel · Week 8
→ Your connection: Understand where the roles are concentrated (foundation model labs, AI-native startups, enterprise AI embeds) and how to position your background as a differentiator, not a detour.
Resources
  • Article
    Lenny's Newsletter: role types, required skills, how to break in
    Purpose The most-referenced career guide for AI PM transitions. Lenny maps the different flavors of "AI PM" (platform, feature, product) and what background fits each best.
    After this You can articulate which type of AI PM role fits your background best and what 1–2 specific gaps you'd need to close for each target company type.
    https://www.lennysnewsletter.com/p/how-to-become-an-ai-pm
    ~25 min
Framing Your Portfolio as an AI PM
Career Positioning · Week 8
→ Your connection: Side projects, shipped tools, prototypes, open-source contributions: these aren't hobbies, they're portfolio evidence. This module is about framing your work as deliberate AI PM experience, not side projects.
Resources
  • Article
    LinkedIn Talent Blog: narrative framing for non-linear paths
    Purpose Many strong PM backgrounds are non-linear: a domain pivot, self-taught technical skills, or experience in an adjacent field. This guide is specifically about turning non-linear paths into a coherent narrative that hiring managers find compelling rather than confusing.
    After this You have a clear "throughline" sentence that connects your past to your present to your AI PM future, usable in interviews, your LinkedIn about section, and stakeholder conversations.
    https://www.linkedin.com/business/talent/blog/talent-acquisition/how-to-tell-a-compelling-career-story
    ~15 min
Capstone Deliverable
Write a 300-word "AI PM thesis": your point of view on where AI product management is headed and your unique angle. This becomes your LinkedIn summary, your interview opener, and your pitch to any team or stakeholder.
What to Build or Write Next
Next Steps · Week 8
→ Your connection: The fastest way to solidify all areas is to build a full-stack agentic workflow from scratch. It touches RAG, orchestration, prompt infrastructure, metrics, and developer empathy all at once.
Resources
  • Article
    Latent Space: curated list for staying current after Week 8
    Purpose A living reference for staying current after this sprint ends, covering the papers, tools, and communities that actually move the field, filtered for practitioners over academics.
    After this You have a recurring reading list and community to stay current, so the 8 weeks of learning don't go stale in month 3.
    https://www.latent.space/p/ai-engineer-2025-papers-to-know
    ~20 min
Rapid Prototyping: Vibe Coding for PMs
Hands-On · Week 8
→ Your connection: The fastest way to validate an AI concept is to build a working prototype yourself. This module teaches you to use tools like Cursor, Claude Code, and Vercel to stand up a functional app that tests your system prompt or RAG pipeline with real users, before writing a single engineering ticket.
Resources
  • Doc
    Cursor: AI-native code editor purpose-built for vibe coding workflows
    Purpose Cursor is a tool many PMs use for autonomous prototyping: you describe what you want in plain English and Cursor writes and edits the code. The docs walk through the basics of the chat interface, inline editing, and how to ask for changes without knowing syntax.
    After this You can open a project in Cursor, describe a feature or fix in plain English, and iterate with the AI until the result is right, without needing to understand the underlying syntax.
    https://docs.cursor.com/get-started/migrate-from-vscode
    ~20 min
  • Doc
    Vercel: the fastest path from system prompt to deployed prototype
    Purpose The Vercel AI SDK gives PMs a single abstraction for calling Claude, GPT-4, or Gemini from a web app, with streaming, tool use, and RAG patterns pre-built. Understanding what the SDK provides tells you exactly what to ask Cursor or Claude Code to scaffold for you.
    After this You can use Cursor to scaffold an app using the Vercel AI SDK, wire up a system prompt you've designed, deploy to Vercel, and share a live URL for user testing, in under 30 minutes.
    https://sdk.vercel.ai/docs/introduction
    ~25 min
  • Article
    Simon Willison: the clearest thinker on AI-assisted development
    Purpose Willison precisely defines the appropriate scope of vibe coding: fast, disposable, hypothesis-testing code, not production systems. His framing helps PMs understand what they should and shouldn't build autonomously, and how to hand off validated prototypes to engineering.
    After this You have a clear mental model for when to vibe code (validate a hypothesis, test a system prompt, build a demo) vs. when to write a spec for engineering (production scale, security requirements, maintainable codebase).
    https://simonwillison.net/2025/Mar/19/vibe-coding/
    ~15 min
06
Weeks 9–10 (Bonus Sprint)
Prompt Infrastructure & AI Workflow
The skill area most AI PMs ignore. Treat prompts as code: versioned, reusable, documented. This is where your vibe-coding practice becomes a professional differentiator.
🔧 Apply It: Build a prompt library for your apps
Create a single markdown file (prompts.md) that documents every system prompt across the AI apps you've built or work with. For each: record the version, what it does, what techniques it uses, and what you'd change next. This is your first prompt registry, the foundation of treating prompts as infrastructure rather than one-off text.
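One way to keep the registry consistent is to generate prompts.md from structured entries instead of editing it by hand. The sketch below is a minimal illustration, assuming Python 3.9 or later. The PromptEntry fields and both function names are hypothetical; they simply mirror the four things the exercise asks you to record for each prompt.

```python
from dataclasses import dataclass, field


@dataclass
class PromptEntry:
    """One system prompt in the registry (field names are illustrative)."""
    name: str
    version: str
    purpose: str
    techniques: list[str] = field(default_factory=list)
    next_change: str = ""


def render_entry(entry: PromptEntry) -> str:
    """Render one registry entry as a markdown section for prompts.md."""
    techniques = ", ".join(entry.techniques) or "none recorded"
    lines = [
        f"## {entry.name} (v{entry.version})",
        f"- What it does: {entry.purpose}",
        f"- Techniques: {techniques}",
        f"- Change next: {entry.next_change or 'nothing planned'}",
    ]
    return "\n".join(lines)


def render_registry(entries: list[PromptEntry]) -> str:
    """Join all entries into the full text of prompts.md."""
    body = "\n\n".join(render_entry(e) for e in entries)
    return "# Prompt Registry\n\n" + body + "\n"
```

Writing the result of render_registry to prompts.md and committing it gives you a diffable record of every prompt change. A hand-written file with the same four fields per prompt works just as well; the structure matters more than the tooling.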
💭 Reflect: Write 2–3 sentences
What would a "skill file" look like for an AI feature you own or want to build? What context would it need to include, and what behaviors would it enforce across every session?
~4 hrs / week
System Prompt Design as Engineering Discipline
Core Skill · Week 9
→ Your connection: You write system prompts for every app you build. This module reframes that work as a professional discipline: not "writing instructions" but designing a contract between the user, the model, and the product's intent.
Resources
  • Article
    Anthropic: the authoritative guide for Claude specifically
    Purpose Most prompt guides are model-agnostic. This one is tuned for how Claude specifically interprets system prompts (persona, context, task structure, and output formatting), with concrete examples.
    After this You can audit any of your existing system prompts against Anthropic's best practices and identify 3+ specific improvements: tighter persona definition, clearer output format, better edge case handling.
    https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts
    ~25 min
  • Article
    DAIR.AI: techniques for multi-step prompt design
    Purpose Complex tasks fail when crammed into a single prompt. Prompt chaining breaks work into steps, where each prompt takes the output of the previous one as input. This is the architectural pattern behind many high-quality AI workflows.
    After this You can redesign a single-prompt feature in one of your apps as a prompt chain, specifying each step, its input, its output, and how it feeds the next stage.
    https://www.promptingguide.ai/techniques/prompt-chaining
    ~20 min
  • Podcast
    Latent Space Podcast: the best framing of where prompting is actually headed
    Purpose This episode cuts through the "prompt engineering is over" discourse and explains what actually matters: system design over clever wording, infrastructure over one-off prompts, and why this is now a real engineering discipline.
    After this You'll have a clear mental model of what "prompt infrastructure" means vs. just "writing prompts", and you can articulate that distinction to a hiring manager or colleague.
    https://www.latent.space/p/prompt-engineering-is-dead
    ~50 min
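The prompt chaining pattern from the DAIR.AI resource above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any library's API: run_chain, fake_model, and the {input} placeholder convention are all hypothetical names, and the "model" is a stand-in function so the example runs without an API key. In a real app, the model argument would wrap your LLM provider's client.

```python
from typing import Callable

# A "model" here is any function from prompt text to response text.
# In a real app this would wrap an LLM API call; that wrapper is not shown.
Model = Callable[[str], str]


def run_chain(model: Model, templates: list[str], initial_input: str) -> list[str]:
    """Run each template in order, feeding each output into the next step.

    Every template must contain the placeholder {input}.
    Returns the output of every step so intermediate results can be inspected.
    """
    outputs: list[str] = []
    current = initial_input
    for template in templates:
        prompt = template.format(input=current)  # inject the previous output
        current = model(prompt)
        outputs.append(current)
    return outputs


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call: returns the prompt in upper case."""
    return prompt.upper()


if __name__ == "__main__":
    steps = ["Summarize: {input}", "Translate: {input}"]
    for i, result in enumerate(run_chain(fake_model, steps, "hello"), start=1):
        print(f"step {i}: {result}")
```

Returning every intermediate output, not only the last one, is a deliberate choice here: when a chain produces a bad final answer, you can see which step went wrong.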
Skill Files, Context Docs & Reusable Prompt Architecture
Workflow Systems · Week 10
→ Your connection: You've already been using skill/MD files and project-level system prompts in your vibe-coding workflow. This module formalizes that practice, turning it from intuition into a repeatable architecture you can explain and teach.
Resources
  • Doc
    Anthropic: how Claude Code uses CLAUDE.md and project context
    Purpose This doc explains exactly how Claude Code's memory system works: CLAUDE.md files, project-level instructions, and how context hierarchies are resolved. This is the production version of what you've been building instinctively.
    After this You can design a CLAUDE.md architecture for any of your apps, specifying what goes at project level vs. file level vs. session level, and why that separation matters.
    https://docs.anthropic.com/en/docs/claude-code/memory
    ~20 min
  • Article
    Brex Engineering: a real company's internal prompt standards
    Purpose Brex open-sourced their internal prompt engineering guide, the actual document their engineers use to maintain consistency across AI features. It's the closest thing to a real-world template for prompt infrastructure at scale.
    After this You can write a one-page prompt standards doc for your own apps, covering naming conventions, versioning, testing requirements, and documentation format.
    https://github.com/brexhq/prompt-engineering
    ~30 min
  • Article
    Simon Willison: treating prompts like code in a real workflow
    Purpose Willison shows how to apply software engineering practices to prompt management: Git versioning, change tracking, regression testing. Since you already use GitHub, this maps directly to your current workflow.
    After this You have a working approach for versioning your prompts in GitHub, with commit messages that explain what changed and why, just like real code.
    https://simonwillison.net/2023/Jun/8/gpt-version-control/
    ~20 min
  • Podcast
    Prompt Engineering Podcast: deep dive on reusable prompt design
    Purpose A practitioner-focused episode on how to structure system prompts for reuse across features (modular sections, variable injection, conditional logic) rather than writing one monolithic block per product.
    After this You can refactor one of your app system prompts into modular components (a persona block, a context block, a task block, and an output format block), each independently editable.
    https://www.youtube.com/watch?v=T9aRN5JkmL8
    ~45 min
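The modular structure described in the last resource (a persona block, a context block, a task block, and an output format block, with variable injection and conditional logic) can be sketched with Python's standard-library string.Template. Everything specific here is an assumption for illustration: the block wording, the variable names, the escalation rule, and the build_system_prompt function are hypothetical and not taken from the episode.

```python
from string import Template

# Each block is independently editable. The wording is placeholder text.
PERSONA = Template("You are $assistant_name, a support assistant for $product.")
CONTEXT = Template("The user is on the $plan plan.")
TASK = Template("Answer the user's question about $topic.")
OUTPUT_FORMAT = "Reply in at most three short paragraphs."
ESCALATION = "If the user asks about refunds, direct them to a human agent."


def build_system_prompt(variables: dict[str, str], include_escalation: bool = False) -> str:
    """Assemble a system prompt from modular blocks.

    Variable injection: $names in each block are filled from `variables`.
    Conditional logic: the escalation block is added only when requested.
    A missing variable raises KeyError, so gaps fail loudly instead of silently.
    """
    blocks = [
        PERSONA.substitute(variables),
        CONTEXT.substitute(variables),
        TASK.substitute(variables),
        OUTPUT_FORMAT,
    ]
    if include_escalation:
        blocks.append(ESCALATION)
    return "\n\n".join(blocks)


if __name__ == "__main__":
    values = {"assistant_name": "Ada", "product": "Acme", "plan": "free", "topic": "billing"}
    print(build_system_prompt(values, include_escalation=True))
```

Because each block is a separate constant, a change to the persona or the output format shows up as a one-line diff in version control, which pairs naturally with the Git-based prompt versioning covered in this module.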
Enterprise Data Privacy, Security & Compliance
Enterprise · Week 10
→ Your connection: The biggest blocker for AI adoption in large organizations isn't capability; it's the security boundary. This module gives you the vocabulary and framework to navigate PII handling, compliance requirements, and secure API architecture so you can unblock enterprise AI features rather than getting stuck in legal review.
Resources
  • Article
    OWASP: the definitive security vulnerability taxonomy for AI systems
    Purpose OWASP is the gold standard for web security, and their LLM-specific Top 10 is what enterprise security teams use to evaluate AI features. Understanding the risks (prompt injection, training data poisoning, sensitive information disclosure) helps PMs write security requirements and anticipate review questions.
    After this You can walk into a security review and name the OWASP LLM categories relevant to your feature, describe your mitigations, and frame gaps as tracked risks rather than blockers, turning adversarial reviews into collaborative ones.
    https://owasp.org/www-project-top-10-for-large-language-model-applications/
    ~30 min
  • Doc
    NIST: the voluntary baseline framework for enterprise AI governance in the US
    Purpose The NIST AI RMF is a voluntary framework that is becoming the de facto compliance standard for enterprise AI, referenced in government contracts, insurance requirements, and partner due diligence. Its four core functions (Govern, Map, Measure, Manage) are the language that satisfies legal and compliance stakeholders.
    After this You can map a new AI feature against the NIST AI RMF, identifying which governance controls apply, what documentation you'd need to produce, and how to present your risk posture to a compliance team.
    https://www.nist.gov/artificial-intelligence/ai-risk-management-framework
    ~25 min
  • Article
    Anthropic: how the Claude API handles enterprise data in practice
    Purpose Before routing any user data through an external LLM API, PMs need to know what the provider does with it. Anthropic's privacy policy covers retention, training opt-outs, and BAA availability for HIPAA workloads, making it the starting point for any enterprise data review.
    After this You can answer the first question every enterprise security team asks ("Where does our data go?") with a specific, accurate description of Anthropic's data handling, retention windows, and the controls available to enterprise API customers.
    https://www.anthropic.com/privacy
    ~20 min
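To make "PII handling" concrete, here is a minimal sketch of one common mitigation: redacting obvious identifiers from user text before it is sent to an external LLM API. The function name redact_pii, the two regular expressions, and the placeholder tokens are assumptions chosen for illustration. The patterns only catch simple email addresses and US-style phone numbers, so treat this as a conversation starter for a security review, not a compliance control.

```python
import re

# Illustrative patterns only: they catch simple emails and US-style
# phone numbers (e.g. 555-123-4567) and will miss many real-world formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    message = "Mail jane.doe@example.com or call 555-123-4567."
    print(redact_pii(message))
```

Knowing that a step like this exists, and where it sits relative to the API call, lets you describe a specific mitigation when a reviewer raises sensitive information disclosure.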
Sprint 1 Weeks 1–2