Part 1 — How it works today

The pipeline runs on every dashboard load

Six stages transform raw files and agent output into the ordered Activity feed you see in NowDashboard.

1. Collect threads from three sources (Input)

Three independent pools are merged into one list. Order matters: MEMORY → manual → discovered.

Source A
MEMORY files

Scans every .md file line-by-line for date + status keyword pairs. Stalled project files (untouched 14+ days) auto-generate a stalled entry.

Source B
Manual tasks

Threads added via "+ new thread." Stored in JSON, classified by due date if present, else open.

Source C
Discovered threads

Agent output from "Scan now" — searches Linear, Notion, PostHog, Outlook/Teams, claude-output. Persisted in cache; never auto-deleted.

Each thread is assigned one kind: overdue, today, upcoming (≤14d), stalled, or open. Anything beyond 14d is ignored entirely.
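
The kind assignment can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: the function names `daysOff`, `classifyByDueDate`, and `isStalled` are hypothetical, and it assumes `daysOff` is the number of whole days from today to the due date (negative when overdue), which matches how `daysOff` is used in the weight formula.

```typescript
type Kind = "overdue" | "today" | "upcoming" | "stalled" | "open";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole calendar days from `today` to `due`; negative means the item is overdue.
function daysOff(due: Date, today: Date): number {
  const dueDay = Date.UTC(due.getFullYear(), due.getMonth(), due.getDate());
  const todayDay = Date.UTC(today.getFullYear(), today.getMonth(), today.getDate());
  return Math.round((dueDay - todayDay) / MS_PER_DAY);
}

// Items without a due date are "open"; dated items more than 14 days out
// return null, meaning they are ignored entirely.
function classifyByDueDate(due: Date | undefined, today: Date): Kind | null {
  if (due === undefined) return "open";
  const d = daysOff(due, today);
  if (d < 0) return "overdue";
  if (d === 0) return "today";
  if (d <= 14) return "upcoming";
  return null;
}

// A project file untouched for 14 or more days produces a stalled entry.
function isStalled(lastModified: Date, today: Date): boolean {
  return (today.getTime() - lastModified.getTime()) / MS_PER_DAY >= 14;
}
```

Comparing calendar days rather than raw timestamps keeps an item due later today from being counted as overdue.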
2. Filter out already-handled items (Suppression)

Four filters run in sequence. Any item caught by any filter is dropped before it reaches ranking.

  1. Resolved: Item key appears in resolutions.md. Permanently hidden, no expiry.
  2. Active snooze: Unexpired entry in postpones.json. Hidden until the snooze date passes.
  3. CLOSE verdict (7-day TTL): Auto-check agent returned CLOSE in the last 7 days. Suppressed as "not actionable"; resurfaces after the TTL expires.
  4. Snooze fatigue: Same item snoozed 2+ times in the last 30 days. Signal: you keep dismissing it, so stop showing it for a while.

Item identity key: file:line|first-80-chars-of-text
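
The four filters and the identity key can be sketched as one function that returns the first matching reason. This is an illustrative sketch: the `SuppressionState` shape and the function names are assumptions about how resolutions.md, postpones.json, and the verdict history might be loaded into memory, not the real data model.

```typescript
interface Item { file: string; line: number; text: string }

// Hypothetical in-memory view of the on-disk stores.
interface SuppressionState {
  resolvedKeys: Set<string>;            // keys listed in resolutions.md
  snoozedUntil: Map<string, number>;    // key -> snooze expiry (ms since epoch), from postpones.json
  closeVerdictAt: Map<string, number>;  // key -> time of the latest CLOSE verdict
  snoozeHistory: Map<string, number[]>; // key -> timestamps of past snoozes
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Identity key as described: file:line|first-80-chars-of-text
function itemKey(item: Item): string {
  return `${item.file}:${item.line}|${item.text.slice(0, 80)}`;
}

// Runs the filters in sequence; returns why the item is hidden, or null to keep it.
function suppressionReason(item: Item, s: SuppressionState, now: number): string | null {
  const key = itemKey(item);
  if (s.resolvedKeys.has(key)) return "resolved";

  const until = s.snoozedUntil.get(key);
  if (until !== undefined && until > now) return "snoozed";

  const closedAt = s.closeVerdictAt.get(key);
  if (closedAt !== undefined && now - closedAt < 7 * DAY_MS) return "close-verdict";

  const recentSnoozes = (s.snoozeHistory.get(key) || []).filter(t => now - t <= 30 * DAY_MS);
  if (recentSnoozes.length >= 2) return "snooze-fatigue";

  return null;
}
```

Returning the reason instead of a boolean makes it easier to debug why a given thread never reaches ranking.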

3. Deduplicate across sources (Merge)

The same work item can appear from multiple sources. mergeThreads runs two passes to collapse them:

  1. Exact key match: Same file:line|text-prefix → duplicate dropped.
  2. Cross-source reference match: Regex extracts Linear IDs (AWS-456), Notion page IDs, Linear URLs. Any two items sharing a reference → only the first survives. MEMORY items come first, so hand-written annotations win over agent paraphrases.
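
The two passes can be sketched as two filters over the merged list. The regular expressions below are assumptions chosen for illustration (the real patterns are not shown in this document), and the `Thread` shape is hypothetical. A Linear URL that embeds an ID such as AWS-456 would be caught by the ID pattern.

```typescript
interface Thread {
  key: string; // file:line|text-prefix identity key
  source: "memory" | "manual" | "discovered";
  text: string;
}

// Assumed patterns: Linear-style IDs (e.g. AWS-456) and 32-hex Notion page IDs.
// These are crude and will also match strings like "UTF-8".
const LINEAR_ID = /\b[A-Z][A-Z0-9]+-\d+\b/g;
const NOTION_ID = /\b[0-9a-f]{32}\b/g;

function extractRefs(text: string): string[] {
  const linear: string[] = text.match(LINEAR_ID) || [];
  const notion: string[] = text.match(NOTION_ID) || [];
  return linear.concat(notion);
}

// Input must already be ordered MEMORY -> manual -> discovered,
// because the first item holding a key or reference wins.
function mergeThreads(threads: Thread[]): Thread[] {
  // Pass 1: drop exact key duplicates.
  const seenKeys = new Set<string>();
  const unique = threads.filter(t => {
    if (seenKeys.has(t.key)) return false;
    seenKeys.add(t.key);
    return true;
  });

  // Pass 2: drop items that share a reference with an earlier survivor.
  const seenRefs = new Set<string>();
  return unique.filter(t => {
    const refs = extractRefs(t.text);
    if (refs.some(r => seenRefs.has(r))) return false;
    refs.forEach(r => seenRefs.add(r));
    return true;
  });
}
```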
4. Score and rank within each kind (Ranking)

Each item gets a weight. Items are sorted by weight descending, then capped per kind.

weight = (days_overdue × source_coeff) + attention_signals

  days_overdue = max(0, −daysOff): only positive for overdue items; 0 for everything else
  source_coeff = 1.0 for all sources (uniform; awaiting calibration data)

attention_signals (summed):
  pavel_mention: +1 if text contains "pavel" (case-insensitive)
  status_keywords: count of distinct status words (due, blocked, awaiting, decision, …)
  escalate: +1 if latest auto-check verdict is ESCALATE

Caps per kind:
  overdue: top 20
  today: top 16
  upcoming: top 16
  stalled: top 16
  open: top 30
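
A minimal sketch of the weight and the per-kind cap follows. The `RankItem` shape is an assumption, the status-word list contains only the four words named above (the full list is not given here), and whole-word matching is a guess at how the keywords are detected.

```typescript
type Kind = "overdue" | "today" | "upcoming" | "stalled" | "open";

interface RankItem {
  kind: Kind;
  text: string;
  daysOff: number; // negative when overdue
  latestVerdict?: "CLOSE" | "KEEP" | "ESCALATE";
}

// Partial list: only the words named in this document.
const STATUS_WORDS = ["due", "blocked", "awaiting", "decision"];
const STATUS_PATTERNS = STATUS_WORDS.map(w => new RegExp("\\b" + w + "\\b", "i"));

const CAPS: Record<Kind, number> = { overdue: 20, today: 16, upcoming: 16, stalled: 16, open: 30 };

function computeWeight(item: RankItem): number {
  const daysOverdue = Math.max(0, -item.daysOff);
  const sourceCoeff = 1.0; // uniform for all sources
  const pavelMention = /pavel/i.test(item.text) ? 1 : 0;
  const statusKeywords = STATUS_PATTERNS.filter(p => p.test(item.text)).length;
  const escalate = item.latestVerdict === "ESCALATE" ? 1 : 0;
  return daysOverdue * sourceCoeff + pavelMention + statusKeywords + escalate;
}

// Sort one kind by weight descending, then apply that kind's cap.
function rankWithinKind(items: RankItem[], kind: Kind): RankItem[] {
  return items
    .filter(i => i.kind === kind)
    .sort((a, b) => computeWeight(b) - computeWeight(a))
    .slice(0, CAPS[kind]);
}
```

Under these assumptions, an item that is 3 days overdue with an ESCALATE verdict and the text "Blocked: awaiting Pavel decision" scores 3 + 1 + 3 + 1 = 8.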
5. Decorate with metadata (Enrichment)

Each item is enriched with context before being sent to the frontend.

Check history — latest CLOSE / KEEP / ESCALATE verdict + all past runs
Linked chats — any "Work on this" conversations bound to this thread
threadKey — stable cross-reference identifier
6. Render the feed (Output)

Items are displayed in kind order: overdue → today → upcoming → stalled → open. Within each kind, weight-sorted.

Activity feed — example rows
AWS-553 Partner Central listing — Honey's section incomplete
ESCALATE overdue · 3d late · Linear w14
Decision needed: approve revised pricing before EOD
KEEP today · MEMORY w2
Q2 co-sell pipeline review — in 5d
upcoming · in 5d · Notion w1
project_awssome.md untouched for 18 days
stalled · MEMORY w0

Each row shows: status dot · timing marker · source · verdict · hover actions (resolve, check, postpone) · w<N> debug badge


Part 2 — What could be better

Seven gaps between today and production-grade

Grounded in research on Gmail Priority Inbox, PagerDuty AIOps, GitHub Notifications, and WSJF / Cost of Delay prioritization literature.

Weight formula has a dead zone (High impact)
days_overdue = max(0, -daysOff), so urgency is zero for everything that isn't already overdue. Items due today, in 3 days, or in 14 days all score 0 urgency. Within the today, upcoming, and open kinds, ranking degrades to "how many status buzzwords does the text contain." Meanwhile a stalled project untouched for 45 days scores 45, dominating a Linear ticket that's 2 days overdue.
Fix: Use a continuous urgency function. The systems surveyed (Cost of Delay / WSJF, Gmail, Meta feeds) favor smooth urgency gradients that increase as a deadline approaches, without a hard cliff that only activates after it passes. An exponential term such as e^(-daysOff/τ) is one option.
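
One way to sketch such a continuous urgency term is shown below. The time constant `tau` (7 days) and the clamp on overdue growth (30 days) are hypothetical defaults picked for illustration; they are not calibrated values from this system.

```typescript
// Exponential urgency: 1.0 on the due date, decaying for deadlines further out,
// growing for overdue items. daysOff is negative when overdue.
// tau and maxOverdueDays are illustrative assumptions.
function urgency(daysOff: number, tau: number = 7, maxOverdueDays: number = 30): number {
  // Clamp so a very old overdue item cannot grow without bound.
  const clamped = Math.max(daysOff, -maxOverdueDays);
  return Math.exp(-clamped / tau);
}
```

With these defaults an item due in 3 days scores higher than one due in 14 days, and an item 2 days overdue scores higher than one due today, so the today and upcoming kinds no longer tie at zero.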
Recency is missing entirely (High impact)
Every production system surveyed uses time-since-last-touch as a primary ranking signal — Gmail, Meta feed ranking, PagerDuty, Linear's default sort. Your computeWeight uses none of it. The mtime field exists on stalled items but is never fed into the weight. Discovered items have lastSeenAt but it's used only for the binary stale/not-stale flag.

A thread actively discussed yesterday and one untouched for 13 days (just under the stale threshold) look identical to the ranker — sorted by keyword count, which tells you nothing about which actually needs attention.
Fix: Add recencyScore = 1 / (1 + daysSinceTouch) (or equivalent decay) as a signal in computeWeight. Feed mtime and lastSeenAt into it.
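
The recency signal could look like the sketch below. It assumes `mtime` and `lastSeenAt` are millisecond timestamps and that the most recent of the two counts as the last touch; both are assumptions about the item shape.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// recencyScore = 1 / (1 + daysSinceTouch): 1.0 if touched now, 0.5 after one day.
function recencyScore(lastTouchMs: number, nowMs: number): number {
  const daysSinceTouch = Math.max(0, (nowMs - lastTouchMs) / DAY_MS);
  return 1 / (1 + daysSinceTouch);
}

// Picks the most recent touch timestamp available on an item, if any.
function lastTouch(item: { mtime?: number; lastSeenAt?: number }): number | undefined {
  if (item.mtime !== undefined && item.lastSeenAt !== undefined) {
    return Math.max(item.mtime, item.lastSeenAt);
  }
  return item.mtime !== undefined ? item.mtime : item.lastSeenAt;
}
```

A thread touched yesterday scores 0.5 and one untouched for 13 days scores 1/14, so the two no longer look identical to the ranker.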
Item identity is fragile (High impact)
Identity key file:line|first-80-chars-of-text causes two classes of problems:

Collisions: Two items in the same file at the same line sharing the first 80 chars silently share identity. Particularly dangerous for discovered threads — every item gets line: 1 hardcoded, so any two items from the same source will collide if their text prefix matches.

Drift: Editing MEMORY.md or the agent paraphrasing a thread differently across runs changes the key — the item loses its check history, postpones, and linked chats. Appears as a "new" thread even if it's the same work item.
Fix: PagerDuty uses an explicit dedup_key independent of content. Gmail threads by conversation ID. Introduce a content-addressed hash or explicit stable key for discovered items — decouple identity from display text.
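
A sketch of an explicit stable key for discovered items follows. The `externalId` field is hypothetical: it stands for whatever upstream identifier the discovery agent can report (for example a Linear ID or a Notion page ID). Items without one fall back to the current text-prefix key.

```typescript
interface DiscoveredItem {
  source: string;      // e.g. "linear", "notion"
  file: string;
  line: number;
  text: string;
  externalId?: string; // hypothetical: upstream ID reported by the discovery agent
}

// Identity comes from the upstream reference when one exists, so rewording
// the display text does not change the key.
function stableKey(item: DiscoveredItem): string {
  if (item.externalId) return `${item.source}:${item.externalId}`;
  // Legacy fallback: file:line|first-80-chars-of-text
  return `${item.file}:${item.line}|${item.text.slice(0, 80)}`;
}
```

Two paraphrases of the same Linear ticket then share a key, so check history, postpones, and linked chats stay attached across discovery runs.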
Dedup is "first wins," not "best wins" (Medium impact)
When mergeThreads finds a duplicate (two items sharing AWS-456), it keeps whichever appears first. MEMORY-first is the right default. But within discovered items, order is arbitrary: Object.values() on the cached JSON returns entries in key insertion order (for non-numeric keys), which reflects when each entry was cached rather than which one is best. If the same Linear ticket appears from two discovery runs with different text, whichever was cached first wins, even if the newer entry has a better description or a useful URL in the location field.
Fix: PagerDuty keeps the most recent alert in a group, not the first. When merging discovered items, prefer the entry with the latest lastSeenAt.
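
A most-recent-wins merge for discovered items might be sketched like this. The `ref` field is a hypothetical shared reference (such as AWS-456) already extracted from each entry.

```typescript
interface DiscoveredThread {
  ref: string;        // shared reference, e.g. "AWS-456" (assumed to be pre-extracted)
  text: string;
  lastSeenAt: number; // ms since epoch
}

// For each reference, keep the entry with the latest lastSeenAt
// instead of whichever was cached first.
function mergeDiscovered(items: DiscoveredThread[]): DiscoveredThread[] {
  const best = new Map<string, DiscoveredThread>();
  for (const item of items) {
    const current = best.get(item.ref);
    if (current === undefined || item.lastSeenAt > current.lastSeenAt) {
      best.set(item.ref, item);
    }
  }
  return Array.from(best.values());
}
```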
No feedback loop (Medium impact)
Gmail's importance threshold adjusts on every "Mark important" click. PagerDuty uses resolution data to flag transient alert patterns. Linear uses every priority assignment as training data.

Your system records every resolve, postpone, and "Work on this" click to disk — but none feed back into what surfaces next. The discovery agent doesn't know which sources you resolve from most often. The weight function doesn't know which kinds you act on fastest. The planned Phase 3–4 calibration can't happen yet because there's no structured outcome log: "surfaced May 5, resolved May 6, from Linear, kind=overdue."
Fix: Add a structured outcome log (item surfaced + item acted on = one row). Even 2 weeks of data makes coefficient calibration possible. This is Phase 3 of the weighting experiment — outcome capture needs to start first.
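
One possible shape for an outcome row, plus a first aggregate that calibration could use, is sketched below. All field names are assumptions; the only grounded part is the content of a row (when the item surfaced, when it was acted on, its source, and its kind).

```typescript
// One row per surfaced item; actedAt and action stay null until the user acts.
interface OutcomeRow {
  threadKey: string;
  source: string;          // e.g. "Linear"
  kind: string;            // e.g. "overdue"
  surfacedAt: string;      // ISO date
  actedAt: string | null;  // ISO date, or null if not acted on yet
  action: "resolve" | "postpone" | "work-on-this" | null;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Mean days between surfacing and action, per source. Unacted rows are skipped.
function meanDaysToActionBySource(rows: OutcomeRow[]): Record<string, number> {
  const totals: Record<string, { sum: number; n: number }> = {};
  for (const row of rows) {
    if (row.actedAt === null) continue;
    const days = (Date.parse(row.actedAt) - Date.parse(row.surfacedAt)) / DAY_MS;
    const t = totals[row.source] || (totals[row.source] = { sum: 0, n: 0 });
    t.sum += days;
    t.n += 1;
  }
  const means: Record<string, number> = {};
  Object.keys(totals).forEach(s => { means[s] = totals[s].sum / totals[s].n; });
  return means;
}
```

A per-source mean like this is one candidate input for setting source_coeff once enough rows have accumulated.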
Filters would help more than better ranking (Medium impact)
GitHub manages notifications for ~150M developers with no ranking algorithm — only filters. Saved views like "review-requested, org:awssome, is:pull-request." Within a filter, sort is reverse-chronological. That's the entire algorithm.

The insight: for a notification stream you actively check (email, work dashboard), the user already has an opinion about what matters. They want a saved view matching their current focus. Ranking is for passive scroll (Instagram, TikTok) where the user doesn't know what they want.

Your Activity dashboard is closer to the GitHub case.
Fix: Add 3–4 filter chips — "Linear only," "KEEP / ESCALATE verdicts," "Pavel mentions," "Last 7 days" — before investing more in the weight formula. Direct user control over what's surfaced likely has higher ROI than score precision.
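
The four chips can be sketched as named predicates combined with AND, followed by a reverse-chronological sort as in the GitHub model. The `FeedItem` shape, including `lastTouchMs`, is an assumption for illustration.

```typescript
interface FeedItem {
  source: string;
  text: string;
  verdict?: "CLOSE" | "KEEP" | "ESCALATE";
  lastTouchMs: number; // hypothetical: last activity, ms since epoch
}

type Chip = (item: FeedItem, nowMs: number) => boolean;

const DAY_MS = 24 * 60 * 60 * 1000;

const CHIPS: Record<string, Chip> = {
  "Linear only": item => item.source === "Linear",
  "KEEP / ESCALATE verdicts": item => item.verdict === "KEEP" || item.verdict === "ESCALATE",
  "Pavel mentions": item => /pavel/i.test(item.text),
  "Last 7 days": (item, nowMs) => nowMs - item.lastTouchMs <= 7 * DAY_MS,
};

// Keeps items matching every active chip, newest first. Unknown chip names are ignored.
function applyChips(items: FeedItem[], active: string[], nowMs: number): FeedItem[] {
  return items
    .filter(item => active.every(name => {
      const chip = CHIPS[name];
      return chip ? chip(item, nowMs) : true;
    }))
    .sort((a, b) => b.lastTouchMs - a.lastTouchMs);
}
```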
Scheduler can't keep up with item volume (Lower priority)
Background auto-check: 4 items per 4-hour tick. With ~45 active threads (30 discovered + ~15 from MEMORY), it takes 45+ hours to cycle through all of them. Verdict TTL is 6 hours — so by the time the scheduler revisits the first items, they're already stale.

Discovery also has no budget allocation across sources. The prompt says "6–15 tool calls TOTAL." If Linear returns 20 items in 3 calls, the budget is spent before Notion, PostHog, or claude-output are checked.
Fix: Raise throughput (more items per tick, or shorter interval) and add per-source budget guidance to the discovery prompt — or run sources in parallel sub-tasks.
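
The cycle-time arithmetic and one simple form of per-source budgeting can be worked through in a short sketch. The even split in `splitBudget` is only one possible allocation policy, offered as an assumption; the source list in the usage note comes from the "Scan now" description.

```typescript
// Hours needed for the scheduler to visit every item once.
function fullCycleHours(items: number, perTick: number, tickHours: number): number {
  return Math.ceil(items / perTick) * tickHours;
}

// Splits a total tool-call budget evenly across sources; any remainder
// goes to the sources listed first.
function splitBudget(totalCalls: number, sources: string[]): Record<string, number> {
  const base = Math.floor(totalCalls / sources.length);
  const remainder = totalCalls % sources.length;
  const budget: Record<string, number> = {};
  sources.forEach((s, i) => { budget[s] = base + (i < remainder ? 1 : 0); });
  return budget;
}
```

With 45 items at 4 per 4-hour tick, `fullCycleHours(45, 4, 4)` gives 12 ticks, or 48 hours. Splitting a 15-call budget across Linear, Notion, PostHog, Outlook/Teams, and claude-output gives each source 3 calls, so one busy source cannot exhaust the budget.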

Summary

Where the algorithm is vs. where it could be

Aspect | Today | Better
Macro sort | Kind hierarchy (overdue → today → upcoming → stalled → open) | ✓ Already good; matches production patterns
Suppression | 4-layer gating (resolved, postponed, CLOSE TTL, snooze fatigue) | ✓ Already strong; aligned with PagerDuty's approach
Intra-kind ranking | Weight formula dominated by days_overdue; today / upcoming score 0 | Continuous urgency function (e.g., 14 − daysOff or exponential decay)
Recency | Not used in ranking at all | Time-since-last-touch as a signal in computeWeight
Identity | Text-prefix key, fragile to edits and paraphrasing | Content-addressed hash or explicit dedup_key for discovered items
Dedup | First-wins across sources | Most-recent-wins or richest-metadata-wins
Feedback loop | Actions recorded but don't influence future surfacing | Outcome capture → coefficient calibration (planned but not started)
User control | Single flat list, no filters | Filter chips for source, verdict, recency (GitHub model)
Scheduler throughput | 4 items / 4h, so 45h to cycle through 45 items | Higher throughput or shorter interval
Discovery budget | Single prompt, single budget, source-blind | Per-source allocation or parallel sub-tasks