Marketing AI Performance Leaderboard - June 2025 Results
Introduction
What is the Marketing AI Performance Leaderboard?
We tested 18 LLMs in their native UIs (such as the ChatGPT interface) with a set of 18 marketing tests divided into 4 categories:
LLMs tested:
| OpenAI | Alibaba | Perplexity | Google | Anthropic | DeepSeek |
| --- | --- | --- | --- | --- | --- |
| o3 | qwen3-235b-a22b | sonar-pro | 2.5-Flash-preview | Sonnet-4 | r1-0528 |
| 4.1 | qwen3-30b-a3b | sonar | 2.5-Pro-preview | Opus-4 | |
| 4.1-mini | qwen3-32b | r1-1776 | | | |
| o4-mini | qwen-max | | | | |
| o4-mini-high | | | | | |
| 4o | | | | | |
Marketing Tests:
FAQs
What does this Leaderboard represent?
We have designed tests that simulate a marketer’s interaction with native platform UIs (e.g., ChatGPT, Gemini) across several marketing domains:
- Copywriting: Generating ad copy, email subject lines, and social media posts.
- Internal Data Analysis: Interpreting sample CRM data to identify trends and insights.
- Strategic Planning: Creating marketing plans based on given scenarios.
- Online Research: Gathering information from the web to support marketing decisions.
How were the tests scored?
Each test output is evaluated by specialised AI "judges":
- Judges are themselves AI agents configured with specific evaluation criteria.
- They parse the Test Answer, compare it against expected outcomes or benchmarks, and score on multiple dimensions (e.g., factual correctness, tone, format).
- Final scores are normalized and aggregated to produce a single value per test.