How to Write a Performance Review for Engineers (2026 Guide)
Most engineering performance reviews fail in the same way: the manager opens the doc the night before it's due, scrolls through Slack, leans on memory, and writes adjectives. "Strong communicator." "Owns problems." "Could grow in scope." None of it is wrong. None of it is grounded. And the engineer reading it knows the difference instantly.
This guide is the method I use — and the method PerfCopilot automates — to write reviews that hold up under calibration: every claim mapped to a specific artifact, every growth area framed with evidence, every section ready to defend. It takes a day per engineer by hand. You don't need a tool to do it, but you do need a system.
Key Takeaways
- The single biggest predictor of a defensible engineer review is whether each claim maps to a specific artifact — a PR number, ticket ID, or incident timestamp.
- Recency bias dominates reviews written without a structured evidence pass; research from Lattice and Engagedly on review accuracy keeps surfacing the same pattern — the last 4–6 weeks crowd out the previous 4–5 months.
- The "evidence table" — claim → artifact — is the structural innovation that turns a vague review into a calibrated one.
- Honest beats padded. If evidence for a growth area is thin, say so; don't manufacture it.
Why engineer reviews fail
Engineer reviews fail for three predictable reasons, and they compound. The first is recency bias — the tendency to weight the last few weeks far more than the rest of the cycle. Research on performance review accuracy from Lattice and Engagedly has consistently flagged recency as the dominant cognitive distortion in manager-written reviews; whichever exact number you trust, the direction is unambiguous and every experienced EM has felt it.
The second is invisible work. A senior engineer who spent six weeks unblocking three other teams, reviewing 40 PRs, and rewriting a flaky test suite leaves almost no trace in the places managers usually look. Their Jira board is quiet. Their commit count is modest. The work is real; the surface is thin.
The third is vague adjectives. "Strong technically." "A real team player." "Needs to grow into senior scope." None of these survive a calibration meeting where another manager asks, "Compared to what? Show me." The fix for all three is the same: pull the evidence before you write a single sentence.
Step 1 — Gather the evidence before you write
Before you open the review template, spend ninety minutes pulling artifacts. Not writing — collecting. This is the part almost every manager skips and almost every senior engineer notices.
Where the signal lives
For a software engineer, the signal lives in seven places. Pull from all of them for the entire review period, not just the last sprint:
- GitHub / GitLab — every PR they authored, every PR they reviewed, every issue they opened or closed. Note the size, the repo, and the reviewer mix.
- Jira / Linear / Shortcut — tickets owned, tickets completed, cycle time, scope creep, tickets they unblocked for others.
- Incident tooling (PagerDuty, FireHydrant, incident.io) — incidents they were paged for, incidents they led the response on, post-mortems they authored or contributed to.
- Design docs / RFCs — documents authored, reviewed, or substantially commented on. This is where staff-level work lives.
- Slack / channel threads — the long technical threads where they answered a hard question, helped a junior debug, or pushed back on a bad design. Search their handle.
- 1:1 notes — your own. The goals they set at the start of the cycle. The blockers they raised. The growth areas you flagged in week three.
- Peer / upward feedback — from your formal cycle or just direct asks to 3–5 close collaborators.
You're not reading any of this deeply yet. You're collecting links into a single doc. For a mid-level engineer over a six-month cycle, expect 30–80 PRs, 40–120 tickets, 2–10 incidents, and 1–4 design docs. If your count is wildly outside that range, ask why before you write.
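The count check above can be sketched in a few lines of Python. The ranges are the ones quoted in the previous paragraph; the function name `out_of_range` and the sample counts are hypothetical and only there for illustration.

```python
# Expected artifact counts for a mid-level engineer over a six-month cycle,
# taken from the ranges quoted above.
EXPECTED_RANGES = {
    "prs": (30, 80),
    "tickets": (40, 120),
    "incidents": (2, 10),
    "design_docs": (1, 4),
}


def out_of_range(counts: dict[str, int]) -> dict[str, str]:
    """Return a note for each artifact type whose count falls outside its range."""
    notes = {}
    for kind, (low, high) in EXPECTED_RANGES.items():
        n = counts.get(kind, 0)
        if n < low:
            notes[kind] = f"{n} is below the expected {low}-{high}"
        elif n > high:
            notes[kind] = f"{n} is above the expected {low}-{high}"
    return notes


# Hypothetical counts collected for one engineer.
sample = {"prs": 47, "tickets": 35, "incidents": 3, "design_docs": 0}
for kind, note in out_of_range(sample).items():
    print(f"{kind}: {note}; ask why before you write")
```

An out-of-range count is a prompt for a question, not a verdict: a quiet ticket board may simply mean the work happened somewhere the tracker does not see.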
The evidence-table technique
This is the structural part competitors stop short of. Open a two-column scratch table. On the left, write the claim you're considering making. On the right, paste the artifacts that prove it. The table is for you, not the review — but every claim that survives into the final review must have a non-empty right column.
| Claim (draft) | Artifacts |
|---|---|
| Took ownership of the billing service | PR #4821 (refactor), PR #4903 (idempotency keys), INC-2026-014 (led response), RFC: Billing v2 retry semantics |
| Mentored two junior engineers | PRs reviewed for @junior-a (28), @junior-b (19); #help-platform thread 2026-02-11; 1:1 notes 2026-01-22 |
| Could grow in cross-team influence | No design docs authored this cycle; PR review activity stays inside team repos; one attempted RFC (RFC-114) stalled at draft |
If a claim's right column is empty, you have two choices: cut it, or do another pass through the artifacts to find the evidence. What you don't do is leave it in.
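If you keep the scratch table in code rather than a doc, the "non-empty right column" rule is easy to check mechanically. This is a minimal sketch, not a prescribed format: the `Claim` class and the `unsupported` helper are hypothetical names, and the second row is an invented example of a claim that has not earned its place yet.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One row of the evidence table: a draft claim and the artifacts behind it."""
    text: str
    artifacts: list[str] = field(default_factory=list)


def unsupported(table: list[Claim]) -> list[str]:
    """Return the text of every claim whose artifact column is empty."""
    return [c.text for c in table if not c.artifacts]


table = [
    Claim("Took ownership of the billing service",
          ["PR #4821", "PR #4903", "INC-2026-014", "RFC: Billing v2 retry semantics"]),
    Claim("Strong communicator"),  # no artifacts yet: cut it or go find them
]

for text in unsupported(table):
    print(f"No evidence yet: {text}")
```

The check only tells you a column is empty. Whether the artifacts listed actually prove the claim is still a judgment you make by reading them.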
For a more detailed walkthrough of the artifact-to-claim mapping with screenshots, see our engineering performance review template.
Step 2 — Structure the review
Once the evidence table is full, structure follows naturally. Most engineering review forms have four sections that matter: achievements, growth areas, peer-feedback synthesis, and goals. Keep that order.
Achievements
Lead with impact, not activity. "Shipped the billing v2 migration" is impact. "Wrote 4,200 lines of code" is activity. For each achievement, the structure is what shipped → what changed → who benefited → the artifact. Three to five achievements is the right range for a six-month cycle. More than that and you're padding; fewer and you're under-selling.
Growth areas
Frame growth areas the same way: specific, evidence-backed, forward-looking. "Could communicate more proactively" is a bad growth area — it's an adjective. "Tended to surface blockers in 1:1s rather than in the team channel, which delayed unblocking by 1–3 days on three observed occasions (cite 1:1 notes 2026-02-04, 2026-03-18, 2026-04-22)" is a real one. The engineer can act on it.
Two growth areas is the practical maximum. Engineers can work on two things; they can't work on five.
Peer-feedback synthesis
Don't quote peer feedback verbatim — synthesize it. Look for themes that appear in two or more pieces of feedback, then cite the count. "Three of four peers independently flagged depth of code review as a strength, with two noting it specifically slowed merge time in a way they valued." That's defensible. A pasted quote is not.
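The counting step can be sketched as follows, assuming you have already read each response and tagged it by hand with short theme labels. The function name `recurring_themes`, the labels, and the four sample responses are all hypothetical.

```python
from collections import Counter


def recurring_themes(feedback: list[set[str]], min_mentions: int = 2) -> list[tuple[str, int]]:
    """Count how many responses mention each theme; keep those at or above the threshold."""
    counts = Counter(theme for response in feedback for theme in response)
    kept = [(theme, n) for theme, n in counts.items() if n >= min_mentions]
    # Most-mentioned first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical tags, one set per peer response (a set, so a peer counts once per theme).
responses = [
    {"review depth", "incident reliability"},
    {"review depth"},
    {"review depth", "under-claims work"},
    {"incident reliability", "documentation"},
]

total = len(responses)
for theme, n in recurring_themes(responses):
    print(f"{theme}: {n}/{total}")
```

Themes mentioned by a single peer drop out of the synthesis here. They may still be worth raising in a 1:1, but they are not yet a pattern you can defend with a count.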
STAR framing for the hard claims
For any claim that will get scrutinized in calibration — promotion-relevant work, incident leadership, cross-team impact — write it in STAR form: Situation, Task, Action, Result. Compress it to 3–4 sentences. STAR is overkill for "shipped feature X"; it's the right tool for "led the incident response that prevented a billing outage."
Step 3 — Write each claim with a citation
Now you write. The rule is simple: every factual claim ends with an artifact reference, in-line. Treat your review like a Wikipedia article — claims without citations get cut.
Vague (don't ship this):
"Priya was a strong technical contributor this cycle and took on increasing ownership of the billing service."
Evidence-backed (ship this):
"Priya owned the billing service end-to-end this cycle, authoring the v2 retry semantics RFC (RFC-2026-018), shipping the idempotency-keys refactor (PR #4903, +1,240 / -890 lines), and leading the response to INC-2026-014 on March 14, where she co-authored the post-mortem and drove three of the four follow-up actions to completion (BILL-1182, BILL-1184, BILL-1191)."
Same engineer, same six months, completely different review. The second one survives calibration. The first doesn't.
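A rough way to enforce the citation rule on a draft is to scan each claim for something that looks like an artifact ID. The sketch below assumes ID formats modeled on the examples in this guide (`PR #4903`, `INC-2026-014`, `RFC-2026-018`, `BILL-1182`); the pattern and the `uncited` helper are illustrative, and your trackers may use different formats.

```python
import re

# Assumed ID formats, modeled on the examples in this guide: "PR #4903", "#4821",
# "INC-2026-014", "RFC-2026-018", "BILL-1182". Adjust to your own trackers.
ARTIFACT_REF = re.compile(r"#\d+|\b[A-Z]{2,}-\d+(?:-\d+)?\b")


def uncited(claims: list[str]) -> list[str]:
    """Return the claims that contain no recognizable artifact reference."""
    return [c for c in claims if not ARTIFACT_REF.search(c)]


claims = [
    "Priya was a strong technical contributor this cycle.",
    "Priya shipped the idempotency-keys refactor (PR #4903).",
    "Priya led the response to INC-2026-014 on March 14.",
]

for claim in uncited(claims):
    print(f"Needs a citation or a cut: {claim}")
```

This pattern will not recognize citations that have no ID, such as dated 1:1 notes or a dashboard, so treat a flagged claim as a prompt to look again rather than as proof that the evidence is missing.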
For more before/after rewrites at different levels, see our collection of performance review examples for software engineers.
Step 4 — Bias-check before you submit
Before you hit submit, run a five-minute bias pass. Not because you're biased and your peers aren't — because everyone is, and the structured pass catches what intuition misses.
Read the review once and ask:
- Recency — Do my examples span the full cycle, or cluster in the last 6 weeks? Count the dates.
- Halo / horns — Am I letting one strong (or weak) project color unrelated assessments?
- Affinity — Am I describing this engineer in language I'd use for me?
- Attribution — Did I credit team wins to this person, or this person's wins to the team?
- Gendered language — "Aggressive," "bossy," "abrasive," "nurturing," "supportive" — flag and rewrite to describe the behavior, not the personality.
A deeper checklist with rewrite patterns lives in our guide to reducing bias in performance reviews.
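The "count the dates" step of the recency check lends itself to a small script. This is a sketch under assumptions: the function name `recent_share`, the cycle end date, and the four evidence dates are hypothetical, and the six-week window is the one named in the checklist.

```python
from datetime import date, timedelta


def recent_share(evidence_dates: list[date], cycle_end: date, window_weeks: int = 6) -> float:
    """Fraction of cited evidence dated within the last `window_weeks` of the cycle."""
    if not evidence_dates:
        return 0.0
    cutoff = cycle_end - timedelta(weeks=window_weeks)
    recent = sum(1 for d in evidence_dates if d > cutoff)
    return recent / len(evidence_dates)


# Hypothetical dates of the artifacts cited in one draft, for a cycle ending 2026-06-30.
cited = [date(2026, 2, 4), date(2026, 3, 14), date(2026, 6, 2), date(2026, 6, 18)]
share = recent_share(cited, cycle_end=date(2026, 6, 30))
print(f"{share:.0%} of cited evidence falls in the final 6 weeks")
```

Six weeks is a little under a quarter of a six-month cycle, so a share well above that suggests the examples cluster at the end and the earlier months deserve another pass.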
A complete worked example
Here's what the output looks like end-to-end. Engineer: Priya Raman, mid-level backend engineer (E4), Q1 2026 cycle, payments platform team.
Overall summary. Priya had a strong cycle, expanding her ownership from feature work into service-level responsibility on the billing service. She is operating at the top of E4 expectations in execution and approaching E5 in scope, with the gap being cross-team influence.
Achievements.
- Owned the billing v2 idempotency refactor. Authored RFC-2026-018, shipped the implementation across three PRs (#4821, #4903, #4912) over seven weeks, and coordinated the cutover with the platform team. Result: duplicate-charge incidents dropped from 4 in Q4 2025 to 0 in Q1 2026 (BILL-dashboard).
- Led INC-2026-014 response (Mar 14). Triaged and identified the root cause within 22 minutes of page (PagerDuty timeline), drove the fix, authored the post-mortem, and closed out three of four follow-up actions (BILL-1182, BILL-1184, BILL-1191) within the cycle. The fourth (BILL-1190) is on track for Q2.
- Mentorship and review depth. Reviewed 47 PRs across the cycle, 28 of them for two junior engineers on the team. Two peers (in upward feedback) independently cited her review comments as a leading reason for their growth this cycle.
Growth areas.
- Cross-team influence and design surface. Priya's RFC work this cycle stayed within the payments team. The attempted cross-team RFC on retry semantics (RFC-114) stalled in draft after the initial review round. To reach E5, the next cycle should include at least one cross-team design document that ships to implementation. We discussed this in our 1:1 on April 22.
- Proactive blocker-surfacing. On three observed occasions this cycle (1:1 notes 2026-02-04, 2026-03-18, 2026-04-22), Priya raised blockers in 1:1s that would have been resolved sooner had she raised them in the team channel a day earlier. This is a habit shift, not a skill gap.
Peer-feedback synthesis. Five peer responses received. Themes: depth of code review (4/5), reliability under incident pressure (3/5), tendency to under-claim her own work in retros (2/5).
Goals for next cycle. Author and ship one cross-team RFC; pair with @senior-c on the cross-team design surface; continue billing service ownership through the v3 migration.
Notice what's there and what isn't. Every factual claim has an artifact. The growth areas are specific and actionable. The peer feedback is synthesized, not pasted. No adjectives stand alone.
Templates and tools
A blank doc is the enemy. Use a structured template — the one we publish, or your company's. The point is the structure, not the source.
Doing this by hand takes a day per cycle per engineer. PerfCopilot assembles the evidence table automatically — pulling PRs, tickets, incidents, and channel threads into the claim-to-artifact mapping — and produces a cited draft with bias flags pre-checked. But the method above works with or without it. The system is the point.
For the underlying tool category and how review platforms compare, see our pillar on performance review software.
Frequently asked questions
How long should a performance review for a software engineer be?
For a six-month cycle, 600–900 words is the right range — long enough to cover three to five achievements with cited evidence, two growth areas, and a peer synthesis, short enough that calibration peers will actually read it. Reviews longer than 1,200 words usually signal padding, not depth.
What should I include in a software engineer performance review?
Four sections: achievements (3–5, each with an artifact), growth areas (1–2, evidence-backed), peer-feedback synthesis (themed, not quoted), and forward goals. Every factual claim should reference a specific PR, ticket, incident, or design doc. Adjectives without artifacts get cut.
How do you write a performance review for a senior engineer?
The structure is identical to mid-level, but the evidence surface shifts. Senior reviews lean less on PR counts and more on design docs authored, RFCs driven, incidents led, and engineers mentored. Look for influence on systems and people, not throughput. Cite the artifacts that prove the influence — RFC IDs, post-mortem authorship, mentee growth.
How do you avoid bias in engineering performance reviews?
Pull artifacts from the full cycle before writing, not just the last sprint, to neutralize recency. Run a five-minute checklist before submitting: recency, halo/horns, affinity, attribution, and gendered language. Have a peer manager read one review per cycle and flag patterns. The artifact-first method itself is the strongest anti-bias structural choice you can make.
By Samira Bahmanyar · HR Manager
Last updated: 2026-05-19.