Marketing AI Performance Leaderboard - June 2025 Results
 

Introduction


What is the Marketing AI Performance Leaderboard?

 
We tested 18 LLMs in their native UIs (e.g., the ChatGPT interface) on a set of 18 marketing tests divided into 4 categories.

LLMs tested:

  • OpenAI: o3, 4.1, 4.1-mini, o4-mini, o4-mini-high, 4o
  • Alibaba: qwen3-235b-a22b, qwen3-30b-a3b, qwen3-32b, qwen-max
  • Perplexity: sonar-pro, sonar, r1-1776
  • Google: 2.5-Flash-preview, 2.5-Pro-preview
  • Anthropic: Sonnet-4, Opus-4
  • DeepSeek: r1-0528

Marketing Tests:

Copywriting

  1. Long-form Blog (LLM-Generated)
  2. Long-form Blog (Transcript Supplied)
  3. Google Ads
  4. LinkedIn Ads
  5. Meta Ads
  6. Social Posts (from Blog Content)

Strategic Planning

  1. Go-to-Market (GTM) Strategy for a New Feature
  2. Annual Budget Allocation & Prioritization
  3. Quarterly OKR Development
  4. Scenario & Risk Response

Online Research

  1. Industry Overview Report
  2. Competitor Analysis
  3. Buyer Persona Development
  4. Content Gap Analysis
  5. Market Opportunities & Threats

Internal Data Analysis

  1. Customer Journey Analysis
  2. Marketing ROI Attribution
  3. PPC Campaign Analysis

FAQs

What does this Leaderboard represent?
We designed tests that simulate a marketer’s interaction with native platform UIs (e.g., ChatGPT, Gemini) across four marketing domains:
 
  • Copywriting: Generating long-form blog posts, ad copy (Google, LinkedIn, Meta), and social media posts.
  • Internal Data Analysis: Interpreting sample CRM data to identify trends and insights.
  • Strategic Planning: Creating marketing plans based on given scenarios.
  • Online Research: Gathering information from the web to support marketing decisions.
 
How were the tests scored?
Each test output is evaluated by specialised AI “judges.”
 
  • Judges are themselves AI agents configured with specific evaluation criteria.
  • They parse the test answer, compare it against expected outcomes or benchmarks, and score it on multiple dimensions (e.g., factual correctness, tone, format).
  • Final scores are normalized and aggregated to produce a single value per test.
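The FAQ does not say how judge scores are normalized or combined. As a minimal sketch only, assuming each judge returns a raw score per dimension on a known range, that min-max scaling is used for normalization, and that dimensions are combined by a weighted mean, the final step might look like this. The `Dimension` type, the `normalize` and `aggregate` helpers, and the dimension names, ranges, and weights in `RUBRIC` are all hypothetical illustrations, not the leaderboard's actual configuration.

```python
from dataclasses import dataclass


# Hypothetical rubric entry: names, ranges, and weights are assumptions
# for illustration, not the leaderboard's real judge configuration.
@dataclass(frozen=True)
class Dimension:
    name: str
    min_score: float
    max_score: float
    weight: float


def normalize(raw: float, dim: Dimension) -> float:
    """Min-max scale a raw judge score into [0, 1], clamping out-of-range values."""
    span = dim.max_score - dim.min_score
    if span <= 0:
        raise ValueError(f"invalid score range for {dim.name}")
    scaled = (raw - dim.min_score) / span
    return min(max(scaled, 0.0), 1.0)


def aggregate(raw_scores: dict[str, float], rubric: list[Dimension]) -> float:
    """Weighted mean of normalized dimension scores, reported on a 0-100 scale."""
    total_weight = sum(d.weight for d in rubric)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(normalize(raw_scores[d.name], d) * d.weight for d in rubric)
    return 100.0 * weighted / total_weight


# Hypothetical rubric with dimensions on different raw scales.
RUBRIC = [
    Dimension("factual_correctness", 1, 10, 0.5),
    Dimension("tone", 1, 5, 0.25),
    Dimension("format", 0, 1, 0.25),
]

if __name__ == "__main__":
    judge_output = {"factual_correctness": 8, "tone": 4, "format": 1}
    print(round(aggregate(judge_output, RUBRIC), 1))  # prints 82.6
```

Normalizing before weighting keeps a dimension scored out of 10 from swamping one scored out of 5 or as pass/fail; a real evaluation could just as plausibly use z-scores or per-test rank normalization instead.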
Where can I see the full results?