Disclaimer: I’m a monthly donor to IPA
Innovations for Poverty Action — a very cool non-profit that conducts randomized trials of poverty alleviation programs —
has posted two “NYC Taxi Cab” ads to YouTube. The scuttlebutt on Twitter is that these ads are part of IPA’s own randomized A/B test to find out which ad works best at generating donations. Here’s the first ad (and a link to the second):
I love A/B tests of ads and can’t wait until YouTube makes an A/B tool widely available. Good for IPA for taking their organization’s mission and applying it to their own actions. However, I think IPA’s test has a few flaws that provide good lessons for others who want to gain knowledge about their advertising.
1) In order to track which donations came from which ad, IPA set up two different textable (SMS) keywords on the same short code. To donate $10 to IPA, you can either text “IPA” (ad 1) or “action” (ad 2) to 80888. The difference in memorability between those words could lead to differential giving regardless of the ads’ relative effectiveness. If someone doesn’t have their phone in hand when the donation instructions appear on screen, can they remember the acronym IPA? Maybe if they like the beer…but even then I think an English word is easier to remember, and thus could lead to more donations.
2) To rigorously test one ad against the other, the ads would have to be assigned randomly to the back seats of NYC taxi cabs. Perhaps the taxi ad placement agency, Verifone, has true randomization built into its system, but I worry that Verifone tells clients like IPA “yes, we can randomly place ads” while confusing arbitrary assignment with random assignment. Under arbitrary assignment, ad 1 might end up being seen more on the Upper West Side and ad 2 more in the Bronx. NYC has large income and cultural disparities between nearby neighborhoods that could confound the experiment.
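To make that distinction concrete, here’s a minimal sketch in Python. The fleet size and cab IDs are invented, and I have no idea how Verifone’s system actually works; the point is just the contrast between the two assignment schemes:

```python
import random

# Hypothetical inventory of screen-equipped taxis (IDs are made up).
cabs = [f"cab_{i:04d}" for i in range(1000)]

# True random assignment: every cab gets an equal, independent chance
# of showing either ad, regardless of where it tends to pick up fares.
random.seed(42)  # fixed seed so the split is reproducible/auditable
assignment = {cab: random.choice(["ad_1", "ad_2"]) for cab in cabs}

# Arbitrary assignment (the failure mode): the first half of the fleet
# list gets ad 1, the second half gets ad 2. If that list happens to be
# sorted by garage or borough, the two groups differ systematically.
arbitrary = {cab: ("ad_1" if i < 500 else "ad_2")
             for i, cab in enumerate(cabs)}
```

Both schemes put each ad in roughly half the cabs, which is why an ad buyer can honestly say “we split the fleet” without having randomized anything.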
At the end of the day, my expectation is that issues (1) and (2) create only minor problems, and that if ad 1 were five or ten times more effective at spurring action than ad 2, IPA would see the difference in its donation results. But that line of reasoning leads me to my final and largest concern:
3) The ads are very similar! I donate to IPA because I want them to test large ideas against each other — such as microfinance vs direct giving. In this way, a few good experiments can really shake up the poverty intervention landscape. With regard to the taxi ads, imagine that you work for IPA and learn that ad 2 does a bit better than ad 1: what would your conclusion be? (Other than that people don’t mind the shirt.) Since both ads have a narrator giving a straightforward explanation of the organization’s mission, I’m not sure what I would take from the test. The angle of each ad is slightly different (donor POV vs. scientist POV), but I would have much preferred IPA test two completely different messages or tactics against each other. For instance, staid vs zany, American POV vs African POV, or single-camera vs crowd-sourced. I realize that some of these ideas might be challenging for the creative team, but my position (if I were sitting in the IPA conference room) would be to hold off on the test until Creative had at least one off-beat idea that would contrast with these straightforward ads.
In general, A/B testing of video is a burgeoning field, and I’m glad IPA is leading the way so we can learn both what works and what could be improved. Hopefully they’ll publish their results!
Also, to be fair, there aren’t obvious solutions to the first two issues. IPA could spend money on a second short code, but of course short codes might have variable memorability as well. (And getting deep into the weeds: IPA should have used the “IPA” keyword in ad 2, which actually mentions the acronym in its narration.) For (2), we need more organizations to think like IPA and ask buyers to include true randomization in their ad placements.