AI PM Accelerator — 8-Week Learning Sprint

01

Weeks 1–2

ML/AI Fundamentals Literacy

Build the vocabulary engineers use. Tokens, context windows, fine-tuning vs RAG, embeddings — you'll stop nodding and start engaging.

🔧 Apply It: Open a system prompt you've written

Open a system prompt you've written — or pull one from a public project or Claude.ai — and annotate each section: which parts are using few-shot examples, chain-of-thought instructions, role framing, or output formatting? Try labeling them with comments. This makes abstract concepts instantly concrete and will change how you write prompts going forward.

💭 Reflect: Write 2–3 sentences

How does understanding tokens and context windows change how you'd architect a multi-turn conversation in one of your apps?

~5 hrs / week

How LLMs Actually Work

Conceptual Foundation · Week 1

→ Your connection: Every prompt in every AI product runs through these mechanics. Understanding attention and context windows explains why your system prompts behave the way they do.

Resources

Video

Intro to Large Language Models

Andrej Karpathy — 1hr talk, Stanford-level clarity

Purpose Karpathy is the single best explainer of LLM internals for smart non-engineers. This talk has no equations and covers everything from tokens to emergent behavior.

After this You can explain what a token is, why LLMs "predict" rather than "think," and what a context window actually limits — in your own words.

https://www.youtube.com/watch?v=zjkBMFhNj_g

~60 min

Video

State of GPT — How to Use LLMs

Andrej Karpathy, Microsoft Build 2023 — 45 min practical deep dive

Purpose Karpathy walks through the full GPT training pipeline (pretraining → supervised fine-tuning → reward modeling → reinforcement learning), then shows why certain prompting strategies work, grounded in how the model was actually built. Bridges theory to practice.

After this You'll understand why chain-of-thought and few-shot prompting outperform zero-shot — explained through the lens of training, not just heuristics. This is the mental model engineers use when they say "the model was trained on this pattern."

https://www.youtube.com/watch?v=bZQun8Y4L2A

~45 min

Article

What Is ChatGPT Doing… and Why Does It Work?

Stephen Wolfram — deep but accessible long-form

Purpose Wolfram builds from first principles — probability, neural nets, training — without assuming technical background. It's long but the first half alone is worth it.

After this You can explain why LLM outputs are probabilistic, not deterministic — a key concept when talking to engineers about reliability and testing.

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

~45 min

Podcast

Lex Fridman #333 — Andrej Karpathy

Lex Fridman Podcast (YouTube) — listen at 1.25x, first 45 min

Purpose Hearing Karpathy speak conversationally fills in the texture that his formal talk misses — his mental model of where AI is headed, and what PMs should actually care about.

After this You'll have a clearer intuition for how an AI researcher thinks about "intelligence" vs. pattern matching — a distinction that comes up constantly in PM product conversations.

https://www.youtube.com/watch?v=cdiD-9MMpb0

~45 min

Doc

Anthropic Model Overview

Anthropic — read the model cards and capability notes

Purpose Gives you the concrete vocabulary for the models you're already using — Haiku vs. Sonnet vs. Opus tradeoffs, context limits, and capability differences.

After this You can explain to a stakeholder why you chose Sonnet over Haiku for a given feature, using cost, latency, and capability as the framework.

https://docs.anthropic.com/en/docs/about-claude/models/overview

~20 min
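
To make the context-window idea from this module concrete, here is a minimal Python sketch of trimming a multi-turn conversation so it fits a token budget. The 4-characters-per-token estimate, the function names `estimate_tokens` and `fit_to_budget`, and the budget value are all illustrative assumptions; a real app would count tokens with the model provider's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token in English)."""
    return max(1, len(text) // 4)


def fit_to_budget(system_prompt: str, turns: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus the most recent turns that fit in `budget` tokens."""
    remaining = budget - estimate_tokens(system_prompt)
    kept: list[str] = []
    # Walk backwards so the newest turns are kept first.
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    return list(reversed(kept))


history = ["a" * 400, "b" * 400, "c" * 400]  # three turns, roughly 100 tokens each
trimmed = fit_to_budget("x" * 200, history, budget=260)
print(len(trimmed), "of", len(history), "turns kept")
```

Dropping the oldest turns first is only one possible policy; summarizing older turns is a common alternative when early context still matters.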

RAG vs Fine-Tuning vs Prompt Engineering

Core Tradeoffs · Week 1

→ Your connection: If you've evaluated RAG tools or vector databases for a product, this module fills in the "why" behind those choices and gives you the framework to explain the tradeoffs to an engineering partner.

Resources

Article

Fine-Tuning Is For Form, Not Facts

Anyscale — clarifies the most common PM misconception

Purpose Most PMs (and many engineers) think fine-tuning is how you "teach" a model new information. This article corrects that directly and explains what fine-tuning actually does well.

After this You'll never again propose fine-tuning when RAG is the right answer — and you can explain why in a single sentence.

https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts

~20 min

Video

RAG from Scratch — Explained Visually

LangChain — 15 min conceptual walkthrough

Purpose Visual diagrams of a retrieval pipeline make the abstract concept click. The key mental model is the chain itself: a query becomes an embedding, the embedding drives a vector search, the search results become context, and the context shapes the answer.

After this You can sketch a RAG architecture on a whiteboard and explain each step — retrieval, augmentation, generation — to someone unfamiliar with it.

https://www.youtube.com/watch?v=T-D1OfcDW1M

~15 min

Doc

Anthropic Prompt Engineering Guide

Anthropic — covers chain-of-thought, few-shot, and system prompts

Purpose The authoritative reference for the model you're already building on. Unlike generic prompt guides, this reflects how Claude specifically responds to different techniques.

After this You can name and intentionally apply at least five prompt engineering techniques — and explain the tradeoff between each when talking to an engineer.

https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview

~30 min
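
The three RAG steps named in this module (retrieval, augmentation, generation) can be sketched in a few lines of Python. This is a toy under stated assumptions: retrieval here uses simple word overlap instead of embeddings, `generate` is a placeholder that calls no real model, and the documents and question are invented.

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Retrieval: rank documents by how many query words they share (toy scoring)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def augment(query: str, context: list[str]) -> str:
    """Augmentation: place the retrieved text in the prompt ahead of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Generation: placeholder for a model call (hypothetical; no real API is used)."""
    return f"[model response to a {len(prompt)}-character prompt]"


docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
]
question = "How long do refunds take?"
top = retrieve(question, docs)
prompt = augment(question, top)
print(prompt)
print(generate(prompt))
```

The point of the sketch is the shape: the model never "learns" the refund policy, it is simply handed the relevant text at question time.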

Embeddings, Vectors & Semantic Search

Technical Depth · Week 2

→ Your connection: Directly relevant to any AI product that involves search, recommendations, or retrieval. Understanding embeddings explains why vector search can outperform keyword search when the query and the document use different words for the same idea.

Resources

Article

Embeddings: What They Are and Why They Matter

Simon Willison — clearest PM-friendly explanation available

Purpose Willison writes for developers but explains concepts to a level a smart non-engineer can act on. His embedding explainer uses concrete examples that make the math unnecessary.

After this You can explain to a stakeholder why "search by meaning" is different from keyword search — and when that difference matters for a product feature.

https://simonwillison.net/2023/Oct/23/embeddings/

~25 min

Doc

Pinecone Quickstart Concepts

Pinecone — read the conceptual overview, skip the API docs

Purpose This fills in the why behind the tool — what indexes are, how similarity search works, why dimensions matter.

After this You can write a requirements doc for a vector search feature that an engineer could act on — specifying index size, similarity metric, and update frequency.

https://docs.pinecone.io/guides/get-started/overview

~20 min
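
As a small illustration of "search by meaning", the sketch below ranks phrases by cosine similarity, one of the similarity metrics a vector index can use. The three-dimensional vectors are hand-made for the example; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy "embeddings" (assumption: invented values, not model output).
vectors = {
    "cancel my subscription": [0.9, 0.1, 0.0],
    "how do I stop being billed": [0.8, 0.2, 0.1],
    "office opening hours": [0.0, 0.1, 0.9],
}

query_text = "cancel my subscription"
query = vectors[query_text]
ranked = sorted(
    (text for text in vectors if text != query_text),
    key=lambda text: cosine_similarity(query, vectors[text]),
    reverse=True,
)
print(ranked[0])
```

The two billing phrases share no keywords, yet their vectors point in nearly the same direction, which is the behavior keyword search cannot reproduce.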

Cost, Latency & Model Selection Tradeoffs

PM Decision-Making · Week 2

→ Your connection: You've been choosing Claude Sonnet as your engine across all apps. This module gives you the framework to defend that choice, or to know when Haiku or Opus is the right call for a specific feature.

Resources

Article

Why Your LLM Is Slow

Latent Space — tokens/sec, TTFT, and what matters for UX

Purpose Latency is one of the top user experience levers in AI products. This article explains TTFT (time to first token), throughput, and why streaming changes the perception of speed — all relevant to your Vercel deployments.

After this You can spec latency requirements for an AI feature (e.g., "TTFT under 800ms, stream enabled") and explain to engineers why those numbers matter for UX.

https://www.latent.space/p/why-your-llm-is-slow

~20 min

Podcast

Practical AI — "What Product Managers Need to Know About ML"

Practical AI Podcast — ep. 256, ~35 min

Purpose A PM-specific episode covering how to have credible technical conversations with ML engineers without being an engineer yourself. Covers model selection, build vs. buy, and scoping.

After this You'll have a concrete vocabulary for what PMs are expected to know vs. delegate when working with ML teams.

https://changelog.com/practicalai/256

~35 min
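
To show what a latency spec such as "TTFT under 800ms" actually measures, here is a minimal Python sketch that times a streamed response. The `fake_stream` generator and its delays stand in for a real streaming model call and are assumptions, as is the `measure_stream` helper name.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_stream(
    chunks: Iterable[str], clock: Callable[[], float] = time.perf_counter
) -> dict[str, float]:
    """Return time to first token (TTFT), total time, and chunks per second."""
    start = clock()
    first_at = None
    count = 0
    for _ in chunks:
        if first_at is None:
            first_at = clock()  # the moment the user first sees output
        count += 1
    total = clock() - start
    return {
        "ttft_s": (first_at - start) if first_at is not None else float("nan"),
        "total_s": total,
        "chunks_per_s": count / total if total > 0 else 0.0,
    }


def fake_stream() -> Iterator[str]:
    """Stand-in for a streaming model response (hypothetical delays)."""
    time.sleep(0.05)  # wait before the first chunk arrives
    for word in ["Hello", " there", "!"]:
        yield word
        time.sleep(0.01)


stats = measure_stream(fake_stream())
print(f"TTFT: {stats['ttft_s'] * 1000:.0f} ms, total: {stats['total_s'] * 1000:.0f} ms")
```

With streaming, users start reading at the TTFT mark rather than at the total time, which is why the two numbers are worth specifying separately.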

02

Weeks 3–4

Agentic AI & Systems Thinking

How multi-step agents are orchestrated, where they fail, and how to define what "done" looks like. The skills that separate an AI PM from a feature PM.

🔧 Apply It: Sketch an agentic workflow as an agent spec

Choose a recurring task that could benefit from automation — a weekly brief, a research summary, a data report. Write a one-page agent spec: what tools does it need (search, calendar, email, databases)? What's the orchestration order? What happens if one tool fails? What does the output look like? This is a real engineering artifact you could hand to a developer.

💭 Reflect: Write 2–3 sentences

What's the difference between an AI feature and an AI agent? Where does an app you've worked with or built sit on that spectrum, and what would it take to push it toward agentic?

~5 hrs / week

What Is an Agent? Mental Models for PMs

Conceptual · Week 3

→ Your connection: Any multi-step task that involves tools, decisions, and handoffs is a candidate for an agentic workflow. This module gives you the vocabulary to spec one properly with a developer.

Resources

Article

Building Effective Agents

Anthropic — the definitive framework for agentic system design

Purpose Anthropic's own internal thinking on what makes agents work — and more importantly, what makes them fail. Required reading for anyone building on Claude.

After this You can distinguish between workflows and agents, explain why human-in-the-loop matters for reliability, and spec appropriate checkpoints in an agentic system.

https://www.anthropic.com/research/building-effective-agents

~30 min

Podcast

Latent Space — "The Rise of the AI Engineer"

Latent Space Podcast — the defining episode on where the field is headed

Purpose This episode coined the term "AI Engineer" and defined what it means to build on top of foundation models rather than training them. It's the single best framing of where AI product work is going.

After this You'll be able to articulate the difference between ML Engineer, AI Engineer, and AI PM — and where your builder profile fits.

https://www.latent.space/p/ai-engineer

~50 min

Article

LLM Powered Autonomous Agents

Lilian Weng (OpenAI) — canonical overview, read sections 1–3

Purpose Weng's post is the most-cited reference on agent architecture. It covers planning, memory, and tool use in a way that maps directly to how engineers think about building them.

After this You can hold a whiteboard conversation with an engineer about agent components — planning, memory (short vs. long-term), and tool integration — without losing the thread.

https://lilianweng.github.io/posts/2023-06-23-agent/

~40 min

Tool Use, Function Calling & MCP

Technical Pattern · Week 3

→ Your connection: If you've connected an AI app to any external service (calendar, email, databases), you've worked with something like this. This module explains the protocol so you can scope new integrations with precision instead of intuition.

Resources

Doc

Anthropic Tool Use Guide

Anthropic — read the overview + best practices sections

Purpose Function calling is how agents take actions in the world. Understanding the spec — how tools are defined, called, and results returned — lets you write accurate specs for tool integrations.

After this You can write a tool definition in plain English that an engineer could translate directly into a JSON schema — name, description, parameters, and expected return.

https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview

~25 min

Doc

Model Context Protocol — Introduction

Anthropic MCP Docs — the spec, written accessibly

Purpose MCP is becoming the standard way AI apps connect to external services. Since you're already using it with Monday and Gmail, understanding the protocol lets you reason about what's possible vs. what requires custom work.

After this You can evaluate whether a new integration can use an existing MCP server or needs custom tool definitions — a meaningful scoping distinction.

https://modelcontextprotocol.io/introduction

~20 min
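
Here is a sketch of what a plain-English tool description turns into once an engineer writes it down. The top-level shape (`name`, `description`, `input_schema` holding a JSON Schema object) follows the structure described in Anthropic's tool use documentation; the tool itself, `get_calendar_events`, and its parameters are hypothetical, and `missing_arguments` is an illustrative helper, not part of any SDK.

```python
import json

# Hypothetical tool: the name, description, and parameters are invented for illustration.
calendar_tool = {
    "name": "get_calendar_events",
    "description": (
        "List the user's calendar events between two dates. "
        "Use it when the user asks what is on their schedule."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "start_date": {"type": "string", "description": "First day, as YYYY-MM-DD."},
            "end_date": {"type": "string", "description": "Last day, as YYYY-MM-DD."},
        },
        "required": ["start_date", "end_date"],
    },
}


def missing_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return required parameters that a proposed tool call left out."""
    return [name for name in tool["input_schema"]["required"] if name not in arguments]


print(json.dumps(calendar_tool, indent=2))
print(missing_arguments(calendar_tool, {"start_date": "2025-01-06"}))
```

Most of the PM's leverage sits in the `description` fields: they are the text the model reads when deciding whether and how to call the tool.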

Orchestration Frameworks: LangGraph & CrewAI

Framework Literacy · Week 4

→ Your connection: When an agent needs to pull from multiple sources simultaneously (email, calendar, APIs, databases), an orchestration framework handles the sequencing. Knowing the names and tradeoffs helps you scope it right.

Resources

Video

LangGraph in 20 Minutes

LangChain — visual walkthrough of stateful multi-agent graphs

Purpose LangGraph uses graph metaphors (nodes and edges) to represent agent state and transitions. Seeing it visually makes the "state machine" concept click without needing to write code.

After this You can describe a multi-step agent workflow as a graph — inputs, decision nodes, tool calls, output — which is exactly how engineers will think about implementing it.

https://www.youtube.com/watch?v=4EXOmWeqXRc

~20 min

Podcast

Practical AI — "Agents in Production: What Actually Breaks"

Practical AI Podcast — honest about real deployment failures

Purpose Most agent content shows the happy path. This episode focuses on failure modes in production — loops, tool errors, context overflow, cost explosions — which is what PMs need to plan for.

After this You can write a risk section in an agent spec that covers the top 5 failure modes and what mitigation looks like for each.

https://changelog.com/practicalai/284

~40 min
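
The "workflow as a graph" idea can be sketched in plain Python without any framework. This is not LangGraph's API; the node names, the `run` helper, and the step limit are assumptions chosen to show nodes, a decision edge, and one guard against the looping failure mode mentioned above.

```python
from typing import Callable

State = dict
Node = Callable[[State], str]  # each node updates the state and returns the next node's name


def fetch_data(state: State) -> str:
    state["data"] = [3, 5, 8]  # stand-in for a tool call
    return "check"


def check(state: State) -> str:
    # Decision node: route based on what the previous step produced.
    return "summarize" if state.get("data") else "fail"


def summarize(state: State) -> str:
    state["output"] = f"{len(state['data'])} items, total {sum(state['data'])}"
    return "END"


def fail(state: State) -> str:
    state["output"] = "No data available"
    return "END"


NODES: dict[str, Node] = {
    "fetch_data": fetch_data,
    "check": check,
    "summarize": summarize,
    "fail": fail,
}


def run(start: str, max_steps: int = 10) -> State:
    """Follow edges from `start` until END, stopping if the graph appears to loop."""
    state: State = {"path": []}
    current = start
    steps = 0
    while current != "END":
        if steps >= max_steps:
            raise RuntimeError("step limit reached; possible loop")
        state["path"].append(current)
        current = NODES[current](state)
        steps += 1
    return state


result = run("fetch_data")
print(result["path"], "->", result["output"])
```

Starting the run at `check` with no data exercises the failure branch, which is the kind of path an agent spec's risk section should name explicitly.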

Failure Modes: Hallucination, Drift & Context Loss

Risk Literacy · Week 4

→ Your connection: Every AI feature has failure modes. Knowing the taxonomy lets you write better acceptance criteria and edge case specs.

Resources

Article

7 AI Agent Failure Modes and How to Fix Them

Galileo — systematic guide to agent failure modes and mitigations

Purpose Gives you a named taxonomy — not just "it hallucinated" but intrinsic vs. extrinsic hallucination, factual vs. faithfulness errors. Names enable better bug reports and acceptance criteria.

After this When you see a bad AI output, you can categorize the failure type and write a targeted fix — rather than just saying "the AI got it wrong."

https://galileo.ai/blog/agent-failure-modes-guide

~20 min

Article

Your AI Product Has No Evals. That's a Problem.

Hamel Husain — bridges failure modes into eval design

Purpose A warm-up read before Sprint 3. Hamel argues that most AI products ship with no systematic quality checks — and shows why that's a product management failure as much as an engineering one.

After this You'll arrive at Sprint 3 already convinced that evals aren't optional, with a mental model of why they matter for products you own.

https://hamel.dev/blog/posts/evals/

~25 min

03

Weeks 5–6

Evals & Quality Measurement

The #1 gap in AI product teams. Learn to define, design, and track LLM quality — the equivalent of writing a great A/B test spec, but for AI.

🔧 Apply It: Red-team an AI product this week

Spend 30 minutes actively trying to break an AI product — one you've built or a public one you use regularly. Try edge inputs, contradictory instructions, multi-language queries, extremely vague prompts. Document every failure you find: what failed, what type of failure it was, and what a fix might look like. This is a real eval exercise.

💭 Reflect: Write 2–3 sentences

For one AI feature you've worked with or want to build — what would "good output" actually mean? How would you measure it without a human reviewing every response?

~4 hrs / week

What Are Evals and Why They're Non-Negotiable

Conceptual Foundation · Week 5

→ Your connection: If your AI product generates text, recommendations, or decisions, how do you know if they're good? Evals are how you answer that systematically and earn engineering trust when shipping AI features.

Resources

Article

Evals Are All You Need

Hamel Husain — the essential PM read on eval strategy

Purpose Hamel has run evals at scale at companies like Airbnb and GitHub. This post distills what actually works vs. what sounds good in theory — written for practitioners, not academics.

After this You can write an eval plan for a single AI feature: what you're measuring, how you're measuring it, and what threshold means "good enough to ship."

https://hamel.dev/blog/posts/evals/

~30 min

Podcast

Hamel Husain on the MLOps Community Podcast

MLOps Community — hear Hamel explain his thinking conversationally

Purpose Reading Hamel is good; hearing him think out loud is better. He goes off-script into examples that don't appear in the articles, including how PMs specifically can drive eval culture.

After this You'll have a more nuanced view of the human judgment layer in evals — when automated scoring is reliable and when it misleads you.

https://mlops.community/watch/evals-with-hamel-husain/

~45 min

Doc

Anthropic Eval Guide

Anthropic — how to evaluate Claude-based products specifically

Purpose Anthropic's own framework for evals is tuned for Claude's behavior — including how to handle ambiguity, instruction following, and refusal rates. More practical than generic LLM eval guides.

After this You can set up a basic eval suite for a Claude-powered feature using Anthropic's recommended test categories.

https://docs.anthropic.com/en/docs/test-and-evaluate/eval-overview

~20 min
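
An eval plan boils down to three things: test cases, a way to score each output, and a threshold for "good enough to ship". The Python sketch below shows that shape only. The test cases, the `fake_model` stand-in, and the 0.9 threshold are all invented for illustration and do not come from any of the guides above.

```python
from typing import Callable

# Each case pairs an input with a check on the output (all hypothetical examples).
CASES: list[tuple[str, Callable[[str], bool]]] = [
    ("Summarize: The meeting moved to Friday.", lambda out: "Friday" in out),
    ("Summarize: Budget approved at $5,000.", lambda out: "5,000" in out),
    ("Summarize: Launch delayed by two weeks.", lambda out: "two weeks" in out),
]


def fake_model(prompt: str) -> str:
    """Placeholder for the feature under test; it echoes the text after the colon."""
    return prompt.split(":", 1)[1].strip()


def run_evals(model: Callable[[str], str], cases, threshold: float) -> dict:
    """Score every case, then compare the pass rate to the ship threshold."""
    results = [check(model(prompt)) for prompt, check in cases]
    pass_rate = sum(results) / len(results)
    return {"pass_rate": pass_rate, "ship": pass_rate >= threshold}


report = run_evals(fake_model, CASES, threshold=0.9)
print(report)
```

String checks like these only cover objective criteria; subjective quality usually needs human review or a separately validated grader.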

Automated vs Human Evals — When to Use Each

Framework · Week 5

→ Your connection: For any product at scale (multiple languages, high query volume, diverse users), human eval alone doesn't work. Knowing when to automate and how is a key PM lever and a real skill gap in most product orgs.

Resources

Article

Patterns for Building LLM Systems

Eugene Yan — covers eval, guardrails, and quality patterns

Purpose Yan synthesizes patterns from building LLM systems at Amazon. His eval section is the clearest breakdown of when automated scoring is trustworthy and when it isn't.

After this You can design an eval strategy that combines automated checks for objective criteria and human review for subjective quality — with a clear decision rule for which applies.

https://eugeneyan.com/writing/llm-patterns/

~35 min

Red-Teaming Your Own AI Product

Applied Practice · Week 6

→ Your connection: This week's apply-it task lives here. Red-team any AI product you have access to. Try to break it. Document what you find. This is the practical deliverable.

Resources

Article

Red-Teaming Language Models to Reduce Harms

Anthropic — how systematic red-teaming is structured

Purpose Anthropic's own red-teaming methodology — the same process used on Claude before every major release. Gives you a structured approach rather than just "try weird things."

After this You can run a structured red-team session on one of your apps, covering the main attack categories: prompt injection, jailbreaking, edge inputs, and adversarial users.

https://www.anthropic.com/research/red-teaming-language-models-to-reduce-harms

~25 min

Article

Prompt Hacking Guide

Learn Prompting — injection, jailbreaking, and edge cases

Purpose A practical catalog of the actual techniques used to break AI products — injection attacks, goal hijacking, prompt leaking. Knowing the attack vectors helps you defend against them.

After this You can identify at least three prompt injection vectors in your own apps and write guardrail instructions to address them.

https://learnprompting.org/docs/prompt_hacking/intro

~30 min

Eval Tooling: Promptfoo, RAGAS, LangSmith

Tool Literacy · Week 6

→ Your connection: You don't need to implement these, but knowing they exist and what they do means you can have an informed conversation with an eng team about quality infrastructure, a real differentiator for an AI PM.

Resources

Doc

Promptfoo Introduction

Promptfoo — open-source eval framework, read the overview only

Purpose Promptfoo is the most accessible eval tool for teams building on LLMs. Reading the intro shows you what a real eval config looks like — test cases, scoring criteria, threshold definitions.

After this You can write a one-page eval spec for an AI feature that an engineer could implement using Promptfoo in under a day.

https://www.promptfoo.dev/docs/intro/

~15 min

Doc

RAGAS Metrics for RAG Evaluation

RAGAS — faithfulness, answer relevancy, context recall

Purpose RAGAS is a widely used eval framework for RAG pipelines. Knowing its four core metrics (faithfulness, answer relevancy, context precision, and context recall) helps you measure whether your retrieval is actually helping the output, which is essential for any product using RAG for grounding.

After this You can spec a RAG eval using RAGAS metrics — explaining to an engineer what faithfulness means and why low context recall indicates a retrieval problem, not a generation one.

https://docs.ragas.io/en/latest/concepts/metrics/index.html

~20 min

04

Weeks 6–7

Data & Metrics for AI Products

AI products need different success metrics than traditional software. Learn to instrument, track, and communicate them to stakeholders who don't speak AI.

🔧 Apply It: Define 3 success metrics for one of your apps

Pick any AI feature you've worked on or want to build. Define exactly three success metrics — one for output quality, one for user engagement, and one for business value. For each: what would you measure, how would you instrument it, and what threshold means "this feature is working"? Write it as if you're presenting to a product leadership team.

💭 Reflect: Write 2–3 sentences

Think of a product you work on or know well. What's a metric you track today that would need to change if you added an AI feature? What new metric would replace or complement it?

~4 hrs / week

AI Product Metrics That Actually Matter

Framework · Week 6

→ Your connection: For any AI output (a strategy report, a recommendation, a summary), "did this actually help?" is hard to measure. This module gives you a framework built on proxy metrics (follow-up queries, session depth, return usage) that correlate with real value delivered.

Resources

Article

Measuring the Impact of AI Features

Lenny's Newsletter — PM-first framework for AI metrics

Purpose Lenny interviews PMs from Notion, GitHub, and Linear about how they measure AI feature success. The patterns across companies reveal what metrics actually hold up vs. what looks good in a dashboard.

After this You can build a simple AI feature metrics framework: leading indicators (engagement), lagging indicators (retention/revenue), and guardrail metrics (quality floor).

https://www.lennysnewsletter.com/p/measuring-the-impact-of-ai-features

~25 min

Podcast

Lenny's Podcast — "How to Build AI Products Without Being an AI Expert"

Lenny Rachitsky — interviews AI PMs from top companies

Purpose Lenny consistently gets candid answers about AI product development from PMs at Figma, Notion, and others. This episode focuses specifically on what metrics and measurement approaches they use in practice.

After this You'll have real examples from working PMs of how they present AI feature impact to leadership — concrete language you can adapt for your own product context.

https://www.lennyspodcast.com/how-to-build-great-ai-products-without-being-an-ai-expert/

~55 min

Instrumentation: What to Log and Why

Technical · Week 7

→ Your connection: Right now your Vercel apps have minimal logging. Understanding what to capture (token counts, latency, user corrections, session signals) is how you build a real feedback loop into your apps.

Resources

Article

Observability for LLM Applications

Honeycomb — what to instrument in production AI apps

Purpose Honeycomb pioneered observability tooling, and their LLM guide is the most practical breakdown of what to log, what those logs tell you, and how to use them to debug quality issues.

After this You can write a logging requirements spec for a new AI feature — what events to capture, what fields to include, and how that data feeds into your eval and metrics dashboards.

https://www.honeycomb.io/blog/observability-for-llms

~20 min
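
A logging requirements spec is easier to discuss when it is written as a concrete record. The Python sketch below shows one possible event for a single model call, covering the signals this module names (token counts, latency, user corrections). The `LLMCallEvent` class and its field names are illustrative assumptions, not a standard schema from any observability tool.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LLMCallEvent:
    """One record per model call (field names are illustrative, not a standard)."""

    request_id: str
    feature: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    user_edited_output: bool  # a user-correction signal

    def to_log_line(self) -> str:
        """Serialize to one JSON line so log tools can filter on any field."""
        return json.dumps(asdict(self), sort_keys=True)


event = LLMCallEvent(
    request_id="req-001",
    feature="weekly-brief",
    model="example-model",
    input_tokens=1200,
    output_tokens=350,
    latency_ms=2100,
    user_edited_output=True,
)
print(event.to_log_line())
```

Keeping `feature` on every record is what later allows cost, latency, and correction rate to be broken down per feature rather than per app.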

Communicating AI Metrics to Stakeholders

PM Communication · Week 7

→ Your connection: Translating "accuracy rate" and "hallucination rate" into business language stakeholders can act on is one of the highest-leverage skills an AI PM can develop.

Resources

Article

Measuring AI Product Quality

Reforge — hallucination rate, task completion, correction frequency

Purpose Reforge is where senior PMs go to level up. This piece specifically addresses the translation problem — how to take technical AI quality signals and convert them into metrics that resonate with a business audience.

After this You can present an AI quality scorecard to a non-technical stakeholder using business outcomes (user correction rate → support ticket cost, task completion → feature ROI) rather than model metrics.

https://www.reforge.com/blog/measuring-ai-product-quality

~20 min

A/B Testing AI Features — What Changes

Experimentation · Week 7

→ Your connection: Traditional A/B tests assume deterministic outputs. AI features don't produce them: every response varies. This module covers how to run valid experiments when outputs are probabilistic.

Resources

Article

Experimentation with LLMs — Netflix Tech Blog

Netflix — real-world A/B testing challenges at scale

Purpose Netflix runs some of the most rigorous experimentation in the industry. Their LLM experimentation write-up is honest about what breaks — variance, sample size requirements, metric instability — and how to address it.

After this You can design an A/B test for an AI feature that accounts for output variance — including the right success metric, minimum sample size, and guardrail conditions.

https://netflixtechblog.com/experimentation-with-llms-an-overview-of-challenges-and-strategies-b37bf28fc4a2

~25 min
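
For the "minimum sample size" part of an experiment design, a common starting point is the textbook normal-approximation formula for comparing two rates. The Python sketch below implements that general formula, not a method taken from the Netflix article, and the 60% and 65% completion rates are hypothetical. It does not model extra variance from non-deterministic outputs, so it is probably best read as a lower bound.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(
    p_control: float, p_variant: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Users needed in each arm to detect the difference between two rates."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect**2)


# Hypothetical rates: task completion improves from 60% to 65%.
n_small_effect = sample_size_per_arm(0.60, 0.65)
n_large_effect = sample_size_per_arm(0.60, 0.70)
print(n_small_effect, n_large_effect)
```

Halving the effect you want to detect roughly quadruples the users required, which is why the choice of success metric and expected lift matters so much at the spec stage.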

Product Analytics & AI Unit Economics

Analytics · Week 7

→ Your connection: You can ship an AI feature, but can you prove it's working and paying for itself? This module covers the two lenses every AI PM needs post-launch: behavioral telemetry (what users actually do with outputs) and unit economics (whether token costs scale with revenue).

Resources

Article

AI Product Analytics: Measuring What Actually Matters

PostHog — behavioral telemetry patterns for AI features

Purpose PostHog works with hundreds of AI companies and distills which product analytics events actually signal quality and engagement for AI features — copy-paste rates, regeneration requests, edit distance, and thumbs up/down as behavioral proxies for trust.

After this You can draft an analytics telemetry spec — defining exactly which user interactions to instrument, what payload to send per event, and how each metric maps to a product health question.

https://posthog.com/blog/ai-metrics

~20 min

Article

The Marginal Cost of Intelligence

a16z — unit economics framework for AI-powered products

Purpose a16z analyzes how token costs, model efficiency, and usage patterns combine to determine whether an AI feature is margin-accretive or a loss leader. The framing — cost per intelligence unit vs. value delivered — is the mental model every AI PM needs when making model selection and tier decisions.

After this You can model the unit economics of any AI feature: estimate monthly API burn from token usage per session × volume, compare against revenue per user, and identify the model tier or caching strategy that makes the math work at scale.

https://a16z.com/the-marginal-cost-of-intelligence/

~25 min

Doc

Anthropic Model Pricing

Anthropic — authoritative input/output token pricing for the Claude family

Purpose Unit economics analysis requires real numbers. This page shows per-token pricing across Haiku, Sonnet, and Opus tiers — the raw cost structure you need to build a credible model for any Claude-powered feature at different usage volumes.

After this You can estimate the monthly API cost for a feature at 10k, 100k, and 1M sessions — and present a concrete cost/revenue case to finance or leadership when scoping AI feature investment.

https://www.anthropic.com/pricing

~15 min
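
The cost estimate described above (token usage per session × volume) is simple arithmetic, sketched here in Python. The prices and token counts are placeholders and are NOT real rates; substitute current numbers from the pricing page. The function name `monthly_cost_usd` is likewise just an illustration.

```python
def monthly_cost_usd(
    sessions: int,
    input_tokens: int,
    output_tokens: int,
    price_in_per_mtok: float,
    price_out_per_mtok: float,
) -> float:
    """Estimated monthly API spend; prices are in dollars per million tokens."""
    per_session = (
        input_tokens * price_in_per_mtok + output_tokens * price_out_per_mtok
    ) / 1_000_000
    return sessions * per_session


# Placeholder prices, not real rates: look up current numbers before using this.
PRICE_IN, PRICE_OUT = 3.00, 15.00

for sessions in (10_000, 100_000, 1_000_000):
    cost = monthly_cost_usd(
        sessions,
        input_tokens=2_000,
        output_tokens=500,
        price_in_per_mtok=PRICE_IN,
        price_out_per_mtok=PRICE_OUT,
    )
    print(f"{sessions:>9,} sessions: ${cost:,.2f}")
```

Input and output tokens are priced separately, so a feature with long prompts and short answers can have a very different cost profile from one with the reverse.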

05

Week 8

Developer Empathy + Portfolio Framing

Cap the learning sprint. Deepen your code intuition, formalize your AI PM narrative, and map what to build or write next.

🔧 Apply It: Find a public AI repo PR on GitHub

Go to github.com/anthropics/anthropic-sdk-python or a public LangChain repo. Find a recent merged PR. Read the diff. Try to understand: what changed, why it probably changed, and what the before/after behavior difference is. You don't need to understand every line — you're building the skill of reading code directionally.

💭 Reflect: Your capstone question

In one paragraph: what is your unique angle as an AI PM? What do you know or can you do that most AI PMs coming from traditional backgrounds cannot? This is the first draft of your thesis.

~5 hrs total

Reading Code Like a PM: PRs, Diffs & Logic Flows

Developer Empathy · Week 8

→ Your connection: Your vibe-coding practice already gives you intuition here. The goal is to formalize it: being able to look at a GitHub PR diff and understand what changed and why is a key credibility signal with engineering teams.

Resources

Article

How PMs Should Work with Engineers

Lenny's Newsletter — the credibility signals that matter most

Purpose Lenny interviews engineering leads about what makes a PM they love working with. The consistent theme: it's not about writing code, it's about respecting how engineers think, communicate, and estimate.

After this You can identify two or three specific behaviors you'll change in how you work with engineers — concrete, not abstract.

https://www.lennysnewsletter.com/p/how-to-work-with-engineers

~20 min

Podcast

Software Engineering Daily — "AI-Assisted Development"

SE Daily — how engineers actually use AI tools day-to-day

Purpose Listening to how engineers describe their own AI-assisted workflow helps you understand what friction they face — and where a PM who understands both sides can add unique value.

After this You'll be able to speak credibly in a room with engineers about AI coding tools — what they're good at, where they fail, and how that affects development velocity estimates.

https://softwareengineeringdaily.com/2024/01/15/ai-assisted-development/

~40 min

Course

CS50P — Python for Non-Engineers (Weeks 0–2 only)

Harvard — the goal is reading code, not writing it

Purpose CS50P is the most accessible intro to Python that exists. Weeks 0–2 cover variables, conditionals, and functions — enough to read logic in a codebase without needing to write any yourself.

After this You can read a Python function, understand what it does, and ask a precise clarifying question — rather than needing an engineer to explain every line in plain English.

https://cs50.harvard.edu/python/2022/

~3 hrs
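
To practice reading a diff before opening a real PR, the Python sketch below generates one from a tiny before/after pair using the standard library's `difflib`. The `greet` function and the file names are invented for the example. Lines starting with `-` were removed and lines starting with `+` were added, the same convention a GitHub diff uses.

```python
import difflib

before = '''def greet(name):
    return "Hello " + name
'''.splitlines(keepends=True)

after = '''def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"
'''.splitlines(keepends=True)

# Hypothetical file names; a real PR shows the repository's own paths.
diff = list(
    difflib.unified_diff(before, after, fromfile="a/greet.py", tofile="b/greet.py")
)
print("".join(diff))
```

Reading this one directionally: the function gained an optional `greeting` parameter, and the output format changed to include a comma and an exclamation mark.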

The AI PM Job Market in 2025–26

Career Intel · Week 8

→ Your connection: Understand where the roles are concentrated (foundation model labs, AI-native startups, enterprise AI embeds) and how to position your background as a differentiator, not a detour.

Resources

Article

How to Become an AI PM

Lenny's Newsletter — role types, required skills, how to break in

Purpose The most-referenced career guide for AI PM transitions. Lenny maps the different flavors of "AI PM" — platform, feature, product — and what background fits each best.

After this You can articulate which type of AI PM role fits your background best and what 1–2 specific gaps you'd need to close for each target company type.

https://www.lennysnewsletter.com/p/how-to-become-an-ai-pm

~25 min

Framing Your Portfolio as an AI PM

Career Positioning · Week 8

→ Your connection: Side projects, shipped tools, prototypes, and open-source contributions aren't hobbies; they're portfolio evidence. This module is about framing that work as deliberate AI PM experience.

Resources

Article

How to Tell a Compelling Career Story

LinkedIn Talent Blog — narrative framing for non-linear paths

Purpose Many strong PM backgrounds are non-linear — a domain pivot, self-taught technical skills, or experience in an adjacent field. This guide is specifically about turning non-linear paths into a coherent narrative that hiring managers find compelling rather than confusing.

After this You have a clear "throughline" sentence that connects your past to your present to your AI PM future — usable in interviews, your LinkedIn about section, and stakeholder conversations.

https://www.linkedin.com/business/talent/blog/talent-acquisition/how-to-tell-a-compelling-career-story

~15 min

Capstone Deliverable

Write a 300-word "AI PM thesis" — your point of view on where AI product management is headed and your unique angle. This becomes your LinkedIn summary, your interview opener, and your pitch to any team or stakeholder.

What to Build or Write Next

Next Steps · Week 8

→ Your connection: The fastest way to solidify every area is to build a full-stack agentic workflow from scratch. It touches RAG, orchestration, prompt infrastructure, metrics, and developer empathy all at once.

Resources

Article

Papers & Resources Every AI Builder Should Know

Latent Space — curated list for staying current after Week 8

Purpose A living reference for staying current after this sprint ends — covering the papers, tools, and communities that actually move the field, filtered for practitioners over academics.

After this You have a recurring reading list and community to stay current — so the 8 weeks of learning doesn't go stale in month 3.

https://www.latent.space/p/ai-engineer-2025-papers-to-know

~20 min

Rapid Prototyping: Vibe Coding for PMs

Hands-On · Week 8

→ Your connection: The fastest way to validate an AI concept is to build a working prototype yourself. This module teaches you to use tools like Cursor, Claude Code, and Vercel to stand up a functional app that tests your system prompt or RAG pipeline with real users before you write a single engineering ticket.

Resources

Doc

Cursor: Getting Started

Cursor — AI-native code editor purpose-built for vibe coding workflows

Purpose Cursor is a popular choice for PMs who want to prototype on their own: you describe what you want in plain English, and Cursor writes and edits the code. The docs walk through the basics of the chat interface, inline editing, and how to ask for changes without knowing syntax.

After this You can open a project in Cursor, describe a feature or fix in plain English, and iterate with the AI until the result is right — without needing to understand the underlying syntax.

https://docs.cursor.com/get-started/migrate-from-vscode

~20 min
Doc

Vercel AI SDK — Introduction

Vercel — the fastest path from system prompt to deployed prototype

Purpose The Vercel AI SDK gives PMs a single abstraction for calling Claude, GPT-4, or Gemini from a web app — with streaming, tool use, and RAG patterns pre-built. Understanding what the SDK provides tells you exactly what to ask Cursor or Claude Code to scaffold for you.

After this You can use Cursor to scaffold an app using the Vercel AI SDK, wire up a system prompt you've designed, deploy to Vercel, and share a live URL for user testing — in under 30 minutes.

https://sdk.vercel.ai/docs/introduction

~25 min
Article

Vibe Coding — What It Is and What It Isn't

Simon Willison — the clearest thinker on AI-assisted development

Purpose Willison precisely defines the appropriate scope of vibe coding: fast, disposable, hypothesis-testing code — not production systems. His framing helps PMs understand what they should and shouldn't build autonomously, and how to hand off validated prototypes to engineering.

After this You have a clear mental model for when to vibe code (validate a hypothesis, test a system prompt, build a demo) vs. when to write a spec for engineering (production scale, security requirements, maintainable codebase).

https://simonwillison.net/2025/Mar/19/vibe-coding/

~15 min

06

Weeks 9–10 (Bonus Sprint)

Prompt Infrastructure & AI Workflow

The skill area most AI PMs ignore. Treat prompts as code — versioned, reusable, documented. This is where your vibe-coding practice becomes a professional differentiator.

🔧 Apply It — Build a prompt library for your apps ▾

Create a single markdown file (prompts.md) that documents every system prompt across the AI apps you've built or work with. For each: record the version, what it does, what techniques it uses, and what you'd change next. This is your first prompt registry — the foundation of treating prompts as infrastructure rather than one-off text.
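
If you would rather generate the file than hand-edit it, the registry can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `PromptEntry` fields, the heading layout, and the sample entry are all hypothetical choices that mirror the four items listed above, not a standard format.

```python
from dataclasses import dataclass, field


@dataclass
class PromptEntry:
    """One entry in a hypothetical prompt registry."""
    name: str
    version: str
    purpose: str
    techniques: list[str] = field(default_factory=list)
    next_change: str = ""


def render_entry(entry: PromptEntry) -> str:
    """Render a single entry as a markdown section for prompts.md."""
    techniques = ", ".join(entry.techniques) or "none recorded"
    lines = [
        f"## {entry.name} (v{entry.version})",
        "",
        f"- What it does: {entry.purpose}",
        f"- Techniques: {techniques}",
        f"- Change next: {entry.next_change or 'nothing planned'}",
    ]
    return "\n".join(lines)


def render_registry(entries: list[PromptEntry]) -> str:
    """Join all entries under a single top-level heading."""
    sections = [render_entry(e) for e in entries]
    return "# Prompt Registry\n\n" + "\n\n".join(sections) + "\n"


entries = [
    PromptEntry(
        name="support-triage",  # hypothetical prompt name
        version="1.2.0",
        purpose="Classifies inbound tickets by urgency.",
        techniques=["role framing", "few-shot examples"],
        next_change="Add an explicit output schema.",
    ),
]
print(render_registry(entries))
```

Writing the output to `prompts.md` and committing it gives you a change history for free.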

💭 Reflect — Write 2–3 sentences ▾

What would a "skill file" look like for an AI feature you own or want to build? What context would it need to include, and what behaviors would it enforce across every session?

~4 hrs / week

System Prompt Design as Engineering Discipline

Core Skill · Week 9

→ Your connection: You write system prompts for every app you build. This module reframes that work as a professional discipline: not "writing instructions" but designing a contract between the user, the model, and the product's intent.

Resources

Article

Anthropic: System Prompts Best Practices

Anthropic — the authoritative guide for Claude specifically

Purpose Most prompt guides are model-agnostic. This one is tuned for how Claude specifically interprets system prompts — persona, context, task structure, and output formatting — with concrete examples.

After this You can audit any of your existing system prompts against Anthropic's best practices and identify 3+ specific improvements — tighter persona definition, clearer output format, better edge case handling.

https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts

~25 min
Article

Prompt Chaining & Decomposition

DAIR.AI — techniques for multi-step prompt design

Purpose Complex tasks often fail when crammed into a single prompt. Prompt chaining breaks the work into steps, where each prompt takes the output of the previous one as input. This architectural pattern underlies many high-quality AI workflows.

After this You can redesign a single-prompt feature in one of your apps as a prompt chain — specifying each step, its input, its output, and how it feeds the next stage.

https://www.promptingguide.ai/techniques/prompt-chaining

~20 min
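
The chaining pattern described in this resource can be sketched as a loop that feeds each step's output into the next. This is an illustrative sketch only: `fake_model` is a hypothetical stand-in for a real model call, and the three step templates are invented examples, not taken from the DAIR.AI guide.

```python
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; swap in a provider client."""
    return f"[model output for: {prompt}]"


def run_chain(steps: list[str], user_input: str,
              call_model: Callable[[str], str]) -> list[str]:
    """Run each step template in order, feeding the previous output forward."""
    outputs = []
    previous = user_input
    for template in steps:
        prompt = template.format(input=previous)  # inject the prior output
        previous = call_model(prompt)
        outputs.append(previous)
    return outputs


steps = [
    "Extract the key complaints from this review: {input}",
    "Group these complaints by theme: {input}",
    "Draft a one-paragraph summary for the product team: {input}",
]
results = run_chain(steps, "The app is slow and crashes on login.", fake_model)
for i, out in enumerate(results, start=1):
    print(f"Step {i}: {out}")
```

Keeping every intermediate output, rather than only the final one, makes it easier to see which step in the chain went wrong.
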
Podcast

Latent Space — "Prompt Engineering is Dead, Long Live Prompt Engineering"

Latent Space Podcast — the best framing of where prompting is actually headed

Purpose This episode cuts through the "prompt engineering is over" discourse and explains what actually matters — system design over clever wording, infrastructure over one-off prompts, and why this is now a real engineering discipline.

After this You'll have a clear mental model of what "prompt infrastructure" means vs. just "writing prompts" — and you can articulate that distinction to a hiring manager or colleague.

https://www.latent.space/p/prompt-engineering-is-dead

~50 min

Skill Files, Context Docs & Reusable Prompt Architecture

Workflow Systems · Week 10

→ Your connection: You've already been using skill and Markdown (.md) files and project-level system prompts in your vibe-coding workflow. This module formalizes that practice, turning it from intuition into a repeatable architecture you can explain and teach.

Resources

Doc

Claude Code: Memory & Context Management

Anthropic — how Claude Code uses CLAUDE.md and project context

Purpose This doc explains exactly how Claude Code's memory system works — CLAUDE.md files, project-level instructions, and how context hierarchies are resolved. This is the production version of what you've been building instinctively.

After this You can design a CLAUDE.md architecture for any of your apps, specifying what goes at user level vs. project level vs. subdirectory level, and why that separation matters.

https://docs.anthropic.com/en/docs/claude-code/memory

~20 min
Article

Brex's Internal Prompt Engineering Guide

Brex Engineering — a real company's internal prompt standards

Purpose Brex open-sourced their internal prompt engineering guide — the actual document their engineers use to maintain consistency across AI features. It's the closest thing to a real-world template for prompt infrastructure at scale.

After this You can write a one-page prompt standards doc for your own apps — covering naming conventions, versioning, testing requirements, and documentation format.

https://github.com/brexhq/prompt-engineering

~30 min
Article

Version Controlling Your Prompts

Simon Willison — treating prompts like code in a real workflow

Purpose Willison shows how to apply software engineering practices to prompt management — git versioning, change tracking, regression testing. Since you already use GitHub, this maps directly to your current workflow.

After this You have a working approach for versioning your prompts in GitHub — with commit messages that explain what changed and why, just like real code.

https://simonwillison.net/2023/Jun/8/gpt-version-control/

~20 min
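
One lightweight way to picture regression testing for prompts is a check that runs before each commit. The sketch below is an assumption-laden illustration, not the workflow from the article: the two helper functions, the sample prompts, and the list of required phrases are all hypothetical.

```python
import hashlib


def fingerprint(prompt_text: str) -> str:
    """Short, stable hash of a prompt so changes are easy to spot in a log."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]


def missing_phrases(prompt_text: str, required_phrases: list[str]) -> list[str]:
    """Return the required phrases that no longer appear in the prompt."""
    lowered = prompt_text.lower()
    return [p for p in required_phrases if p.lower() not in lowered]


# Hypothetical before/after versions of the same system prompt.
old_prompt = ("You are a support assistant. Always reply in JSON. "
              "Never share account numbers.")
new_prompt = "You are a friendly support assistant. Always reply in JSON."

# Rules the team has agreed every version must keep (assumed for this example).
required = ["reply in JSON", "never share account numbers"]

print("changed:", fingerprint(old_prompt) != fingerprint(new_prompt))
print("missing after edit:", missing_phrases(new_prompt, required))
```

A phrase check like this only catches deleted guardrails; checking how the model's behavior changed still needs example inputs and a review of the outputs.
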
Podcast

The Prompt Engineering Podcast — "System Prompt Architecture"

Prompt Engineering Podcast — deep dive on reusable prompt design

Purpose A practitioner-focused episode on how to structure system prompts for reuse across features — modular sections, variable injection, conditional logic — rather than writing one monolithic block per product.

After this You can refactor one of your app system prompts into modular components — a persona block, a context block, a task block, and an output format block — each independently editable.

https://www.youtube.com/watch?v=T9aRN5JkmL8

~45 min
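
The modular structure described in this resource (persona, context, task, and output format blocks with variable injection) can be sketched with Python's standard `string.Template`. The block wording, the variable names, and the `include` switch are hypothetical choices made for illustration, not a format prescribed by the episode.

```python
from string import Template
from typing import Optional

# Each block is independently editable; $names are filled in at build time.
BLOCKS = {
    "persona": Template("You are $role for $product."),
    "context": Template("Context: the user is on the $plan plan."),
    "task": Template("Task: $task"),
    "output_format": Template("Respond in $format."),
}
BLOCK_ORDER = ["persona", "context", "task", "output_format"]


def build_system_prompt(variables: dict[str, str],
                        include: Optional[list[str]] = None) -> str:
    """Assemble the selected blocks in a fixed order, filling in variables."""
    selected = include if include is not None else BLOCK_ORDER
    parts = [
        BLOCKS[name].substitute(variables)  # raises KeyError if a variable is missing
        for name in BLOCK_ORDER
        if name in selected
    ]
    return "\n\n".join(parts)


variables = {
    "role": "a billing assistant",
    "product": "Acme Pay",  # hypothetical product name
    "plan": "free",
    "task": "Answer refund questions.",
    "format": "plain text under 100 words",
}
print(build_system_prompt(variables))
print("---")
print(build_system_prompt(variables, include=["persona", "task"]))
```

Because `substitute` fails loudly on a missing variable, a forgotten value surfaces at build time instead of reaching the model as a literal `$plan`.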

Enterprise Data Privacy, Security & Compliance

Enterprise · Week 10

→ Your connection: The biggest blocker for AI adoption in large organizations isn't capability; it's the security boundary. This module gives you the vocabulary and framework to navigate PII handling, compliance requirements, and secure API architecture so you can unblock enterprise AI features rather than getting stuck in legal review.

Resources

Article

OWASP Top 10 for LLM Applications

OWASP — the definitive security vulnerability taxonomy for AI systems

Purpose OWASP is the gold standard for web security, and their LLM-specific Top 10 is what enterprise security teams use to evaluate AI features. Understanding the risks — prompt injection, training data poisoning, sensitive information disclosure — helps PMs write security requirements and anticipate review questions.

After this You can walk into a security review and name the OWASP LLM categories relevant to your feature, describe your mitigations, and frame gaps as tracked risks rather than blockers — turning adversarial reviews into collaborative ones.

https://owasp.org/www-project-top-10-for-large-language-model-applications/

~30 min
Doc

NIST AI Risk Management Framework

NIST — the voluntary framework widely used as a baseline for enterprise AI governance in the US

Purpose The NIST AI RMF is a voluntary framework, not a regulation, but it is becoming the de facto compliance standard for enterprise AI, referenced in government contracts, insurance requirements, and partner due diligence. Its four core functions (Govern, Map, Measure, Manage) give you a shared language with legal and compliance stakeholders.

After this You can map a new AI feature against the NIST AI RMF — identifying which governance controls apply, what documentation you'd need to produce, and how to present your risk posture to a compliance team.

https://www.nist.gov/artificial-intelligence/ai-risk-management-framework

~25 min
Article

Anthropic Privacy Policy & Data Handling

Anthropic — how the Claude API handles enterprise data in practice

Purpose Before routing any user data through an external LLM API, PMs need to know what the provider does with it. Anthropic's privacy policy covers retention, training opt-outs, and BAA availability for HIPAA workloads — the starting point for any enterprise data review.

After this You can answer the first question every enterprise security team asks: "Where does our data go?" — with a specific, accurate description of Anthropic's data handling, retention windows, and the controls available to enterprise API customers.

https://www.anthropic.com/privacy

~20 min
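
One common complement to knowing what a provider does with your data is to strip obvious PII before the text ever leaves your system. The sketch below illustrates that idea under loud assumptions: the two regular expressions are deliberately simple, cover only emails and one US-style phone format, and are not a substitute for a real PII detection tool or for anything in Anthropic's documentation.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs
# broader coverage (names, addresses, IDs) and usually a dedicated tool.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace matches with placeholders and count what was removed."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


message = "Reach me at jane.doe@example.com or 555-123-4567 about my refund."
clean, counts = redact(message)
print(clean)
print(counts)
```

Returning the counts alongside the cleaned text gives you something concrete to log and show in a security review without storing the original values.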

Your 8-Week Learning Sprint