Twenty-five display-advertising field experiments run at Yahoo!, amounting to over $2.8M worth of impressions, give insight into the volume of data needed to form reliable conclusions about advertising effectiveness. Individual-level sales are typically volatile relative to advertising spend, and only "small" impacts from advertising are required for a positive ROI. Using data from major U.S. retailers, we present a statistical argument showing that the sample size required for a randomized experiment to generate sufficiently informative confidence intervals for a given campaign is typically millions of individual users exposed to hundreds of thousands of dollars of advertising. The argument also shows that sources of heterogeneity bias unaccounted for by observational methods need explain only a tiny fraction of the variation in sales to severely bias estimates. Measuring advertising effectiveness is thus a situation with low-powered experiments and faulty observational methods: precisely where we would expect poorly calibrated beliefs in the market.
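The scale claim above can be illustrated with a standard two-sample power calculation (a minimal sketch, not the paper's exact derivation; the numbers below — mean sales of a few dollars, a standard deviation of $75, and a $0.35 per-person lift — are hypothetical values chosen only to reflect the qualitative setting of volatile individual-level sales and a small treatment effect):

```python
from math import ceil
from statistics import NormalDist

def required_n_per_arm(sigma, delta, alpha=0.05, power=0.8):
    """Approximate sample size per arm for a two-sample z-test.

    sigma: standard deviation of the outcome (individual-level sales)
    delta: minimum detectable effect (treatment-control difference in means)
    Uses the standard formula n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)          # power requirement
    return ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# Illustrative (hypothetical) retail-like values: very noisy sales,
# tiny per-person advertising lift.
sigma = 75.0   # std. dev. of individual sales, in dollars (assumed)
delta = 0.35   # detectable lift per person, in dollars (assumed)
print(required_n_per_arm(sigma, delta))  # hundreds of thousands per arm
```

Because the required n scales with (sigma/delta)^2, a signal-to-noise ratio of a few hundred to one — plausible when sales are volatile and ad effects are small — pushes experiments into the range of hundreds of thousands of users per arm, i.e., millions overall.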