"The middle path: 'evidence-guided' development. Neither pure opinion nor pure data, but balancing human judgment with evidence. The key is not to obliterate human judgment but to supercharge it with evidence."
"Two metrics: a North Star (value delivered to users) AND a Top KPI (value captured by the business)"
Evidence from the Archive
Gmail, YouTube, Microsoft
Google+ consumed 1,000+ engineers over multiple years, built a division the size of Android, and was shut down in...
Google+ consumed 1,000+ engineers over multiple years, built a division the size of Android, and was shut down in 2019 -- opinion-based development at its most expensive
Worked inside Google during both its greatest product failure (Google+, 1,000+ engineers, shut down) and one of its greatest product successes (Gmail tabbed inbox) -- giving him firsthand experience of what happens when the same company uses opposite approaches. Their core argument: The middle path: 'evidence-guided' development. Neither pure opinion nor pure data, but balancing human judgment with evidence.
The evidence is specific: Google+ consumed 1,000+ engineers over multiple years, built a division the size of Android, and was shut down in 2019 -- opinion-based development at its most expensive. Furthermore, gmail tabbed inbox: Wizard of Oz test where users saw what appeared to be a working tabbed inbox, but emails were manually sorted behind the scenes to gather evidence cheaply. Gmail team's internal testing progression: own inboxes -> other Googlers -> external testers -> usability studies -> data mining -> full launch.
In Itamar Gilad's own words: "Every successful product company out there that you look at Amazon, Airbnb, anyone you will check, at least in their best periods they found a way to balance human judgment with evidence. They didn't try to obliterate human judgment and opinion just to supercharge them with evidence." (Articulating the core thesis of evidence-guided development.)
Gmail, YouTube, Microsoft
Google+ consumed millions of person-hours based on leadership conviction, failed, and was shut down in 2019
Gmail tabbed inbox was validated with a Wizard of Oz test (manually sorted emails behind a facade of HTML) before any real engineering
Former PM at Google who worked on both Google+ (a cautionary tale of opinion-driven development) and Gmail's tabbed inbox (a success story of evidence-guided development); now a product coach and author of Evidence-Guided. Their core argument: Being evidence-guided means matching the right type of evidence to the right decision stage -- not blindly applying A/B testing or blindly trusting intuition.
The evidence is specific: Google+ consumed millions of person-hours based on leadership conviction, failed, and was shut down in 2019. Furthermore, gmail tabbed inbox was validated with a Wizard of Oz test (manually sorted emails behind a facade of HTML) before any real engineering. Google missed WhatsApp and Snapchat opportunities while pouring resources into Google+ -- evidence of the cost of opinion-driven bets.
In Itamar Gilad's own words: "You fake it, you do a fake door test, you do a smoke test, Wizard of Oz tests. We used a lot of those in the tabbed inbox by the way, one of the first early versions was actually we showed the tabbed inbox working to people. But it wasn't really Gmail, it was just a facade of HTML." (Describing how the Gmail tabbed inbox was validated with scrappy evidence before committing engineering resources.)
Gmail, YouTube, Microsoft
Airbnb: nights booked as NSM reflecting value for both hosts and guests
WhatsApp: messages sent as NSM (value delivered), with monetization metrics tracked separately
Former product lead at Gmail, YouTube, and Microsoft. Now a product coach who has trained thousands of PMs. Author of Evidence-Guided. Their core argument: Two metrics always: a North Star Metric (value delivered to users) AND a Top KPI (value captured by the business), connected through metric trees.
The evidence is specific: WhatsApp: messages sent as NSM (value delivered), with monetization metrics tracked separately. Furthermore, airbnb: nights booked as NSM reflecting value for both hosts and guests. Amplitude: weekly active learning users -- users who found an insight valuable enough to share with two other users.
In Itamar Gilad's own words: "The North Star metric measures how much value we create for the market. For example, let's take WhatsApp. WhatsApp for a very long time measured messages sent because every message sent is a little incremental of value for the sender, the receiver." (Distinguishing the true NSM (value delivered) from what most companies call their NSM.)