A/B Testing Hygiene
What AdGradr checks
AdGradr evaluates whether your account is running meaningful creative tests or just launching ads and hoping for the best:
- Ad count per campaign. LinkedIn recommends 4-5 active ads per campaign to give the algorithm options and generate comparative data.
- Creative variation. Whether ads within a campaign actually test different elements (headlines, images, copy angles) or are near-duplicates.
- Performance gap management. Whether underperforming ads are left running long after the data is clear.
- Campaign-to-audience ratio. Whether campaigns with small audiences are overloaded with too many ads.
AdGradr flags campaigns with only one active ad (the most significant finding), campaigns where all ads share the same headline, ads left running with large performance gaps well after the data is conclusive, and campaigns with too many ads for a small audience.
Why this matters
LinkedIn’s CPCs are the highest in paid social. At $8-15 per click, every ad that underperforms is expensive. Testing is how you systematically reduce cost per lead and increase conversion rates over time.
But testing only works if you are actually varying the right elements and acting on the data. An account with 4 ads that all share the same headline is not testing messaging. It is testing which stock photo LinkedIn’s algorithm prefers. An account that lets a 0.2% CTR ad run alongside a 1.1% CTR ad for 6 weeks is burning budget on a proven loser.
The goal is structured learning: isolate one variable, let it run long enough to reach statistical significance, then promote the winner and test the next variable.
What good looks like
Recommended ad count per campaign
| Audience Size | Active Ads | Rationale |
|---|---|---|
| Under 20K | 2-3 | Small audiences need concentrated impressions to generate meaningful data |
| 20K-100K | 3-4 | Enough volume for moderate testing |
| 100K+ | 4-5 | Full testing capacity; algorithm has room to optimize |
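If you audit accounts in a script or spreadsheet export, these tiers are easy to encode. Here is a minimal Python sketch, assuming you just want a (min, max) recommendation per campaign; the function name and return format are illustrative, not part of AdGradr, and the cutoffs mirror the table above:

```python
def recommended_ad_count(audience_size: int) -> tuple[int, int]:
    """Recommended (min, max) active ads for a campaign,
    using the audience-size tiers from the table above."""
    if audience_size < 20_000:
        return (2, 3)    # small audience: concentrate impressions
    if audience_size <= 100_000:
        return (3, 4)    # mid-size audience: moderate testing volume
    return (4, 5)        # 100K+: full testing capacity


print(recommended_ad_count(15_000))   # (2, 3)
```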
Testing hierarchy
Test these elements in order of impact:
- Offer/CTA (highest impact). “Download the guide” vs. “Book a demo” vs. “Watch the webinar.” This changes who clicks and what they expect.
- Headline/hook. The first line of ad copy and the headline are the two elements that determine whether someone stops scrolling.
- Creative format. Single Image vs. Document Ad vs. Video for the same message.
- Image/visual. Different images, illustrations vs. photos, people vs. abstract.
- Body copy. Long vs. short, stats-led vs. story-led, first person vs. third person.
Performance review cadence
- Day 7: First look. Do not make decisions yet unless one ad has zero engagement.
- Day 14: Decision point. If one ad’s CTR or cost per result is 5x+ worse than the leader’s and both have 1,000+ impressions, pause the loser (see the sketch after this list).
- Day 21-28: Promote winners, swap in new challengers, begin the next test cycle.
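If you export ad-level stats from Campaign Manager, the Day 14 rule can be applied mechanically. Below is a minimal Python sketch, assuming a CSV export with campaign_name, ad_name, impressions, and clicks columns; the column names, file path, and CTR-only comparison are assumptions, while the 1,000-impression and 5x thresholds come from the cadence above:

```python
import csv
from collections import defaultdict

MIN_IMPRESSIONS = 1_000  # Day 14 rule only applies with enough data
GAP_MULTIPLE = 5         # pause ads 5x+ worse than the campaign leader


def find_day14_losers(export_path: str) -> list[tuple[str, str, float]]:
    """Return (campaign, ad, ctr) for ads far behind their campaign leader."""
    ads_by_campaign = defaultdict(list)
    with open(export_path, newline="") as f:
        for row in csv.DictReader(f):
            impressions = int(row["impressions"])
            if impressions < MIN_IMPRESSIONS:
                continue  # not enough data to judge yet
            ctr = int(row["clicks"]) / impressions
            ads_by_campaign[row["campaign_name"]].append((row["ad_name"], ctr))

    losers = []
    for campaign, ads in ads_by_campaign.items():
        if len(ads) < 2:
            continue  # nothing to compare against
        leader_ctr = max(ctr for _, ctr in ads)
        if leader_ctr == 0:
            continue  # no ad has clicks yet; a gap cannot be measured
        for ad, ctr in ads:
            if ctr * GAP_MULTIPLE <= leader_ctr:
                losers.append((campaign, ad, ctr))
    return losers
```

Anything this returns is a pause candidate, not an automatic action; check the context (new creative, different offer, different audience segment) before pulling the trigger.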
Common mistakes
- Single ad per campaign. This is the most common issue. One ad means zero testing, zero learning, and complete dependence on whether that one creative happens to resonate. If it does not, you have no comparative data to guide your next move.
- “Testing” with identical headlines. If every ad in a campaign says “The Ultimate Guide to [Topic]” and only the image varies, you are not testing messaging. You are testing stock photography. Headline and hook variations produce the largest performance swings.
- Ignoring underperformers. An ad with a 0.3% CTR sitting next to one with a 1.0% CTR is actively dragging down campaign performance. LinkedIn’s algorithm will eventually shift spend away from the loser, but it takes time and budget to get there. Pause clear losers at the 14-day mark.
- Too many ads in a small audience campaign. Six ads targeting an audience of 15K means each ad gets roughly 2,500 impressions before the audience is saturated. That is not enough data to declare a winner. Fewer ads, more impressions per ad.
- Never rotating creative. Even winning ads fatigue. If you have been running the same top performer for 8+ weeks, CTR has likely declined. Set a maximum lifespan and force fresh creative into the mix.
How to fix it
- Audit ad counts. Open each active campaign and count active ads. Flag any campaign with only 1 ad. Add 2-3 variations immediately, varying the headline and CTA.
- Check for real variation. Scan the headlines and primary copy across ads in each campaign. If they are all saying the same thing with different images, rewrite at least 2 ads with distinct hooks and angles.
- Set up a 14-day review cycle. Every two weeks, pull ad-level performance for each campaign. Pause any ad that has 5x worse CTR or cost-per-result than the campaign leader (with at least 1,000 impressions). Replace with a new variation.
- Right-size your ad count. For campaigns targeting audiences under 20K, reduce to 2-3 ads maximum. For larger audiences, expand to 4-5.
- Document your tests. Keep a simple log: what you tested, what won, and why you think it won. This builds institutional knowledge about what resonates with your audience and prevents you from re-running tests you have already settled.
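The log itself can be as simple as a CSV that every completed test gets appended to. Here is a minimal sketch, assuming the file name and column set shown (both are illustrative, not a required format):

```python
import csv
import os
from datetime import date

LOG_PATH = "linkedin_test_log.csv"  # illustrative file name
FIELDS = ["date", "campaign", "variable_tested", "variants", "winner", "why_it_won"]


def log_test(campaign: str, variable_tested: str, variants: str,
             winner: str, why_it_won: str) -> None:
    """Append one completed test to the log, writing a header on first use."""
    is_new_file = not os.path.exists(LOG_PATH)
    with open(LOG_PATH, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new_file:
            writer.writeheader()
        writer.writerow({
            "date": date.today().isoformat(),
            "campaign": campaign,
            "variable_tested": variable_tested,
            "variants": variants,
            "winner": winner,
            "why_it_won": why_it_won,
        })


# Hypothetical example entry
log_test(
    campaign="Demand Gen - NA",
    variable_tested="headline",
    variants="'The Ultimate Guide to X' vs. 'Cut your CPL this quarter'",
    winner="'Cut your CPL this quarter'",
    why_it_won="Specific outcome beat generic guide framing",
)
```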
When to ignore this check
- ABM campaigns with Matched Audience company lists under 5K. These hyper-targeted campaigns may legitimately run a single ad tailored to a specific account list. The single-ad flag is informational for ABM.
- Campaigns in their first 7 days. New campaigns need time to accumulate data. The performance gap flag does not apply until Day 14.
- Retargeting campaigns with very small pools. A retargeting audience of 2K website visitors does not have enough volume for a 4-ad test. Two ads is appropriate.
Want someone to handle this? The Click Makers team manages LinkedIn Ads accounts for companies spending $10K+/month. Get in touch to see if we are a fit.