Free SQL Reference
28 production-ready SQL queries for GA4 BigQuery — with explanations, expected outputs, and pitfall warnings. Companion to our GA4 Complete Guide.
The GA4 BigQuery export stores one row per event. Parameters, user properties, and ecommerce items are stored in nested REPEATED RECORD arrays — you must use UNNEST() to access them. This is the single most important pattern in every query below.
GA4 has three separate traffic source fields: traffic_source (first-touch, user-level), collected_traffic_source (raw collected), and session_traffic_source_last_click (session-level, added Nov 2023). Most queries in Section 3 use the session-level field.
Count total events grouped by date — the simplest possible GA4 BigQuery query.
SELECT
event_date,
COUNT(*) AS event_count
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
event_date
ORDER BY
event_date
How it works: The events_* wildcard matches all daily export tables. _TABLE_SUFFIX filters the date range without scanning every table in the dataset, keeping costs low. Each row in GA4 BigQuery represents one event, so COUNT(*) gives total events per day.
Expected output:
| event_date | event_count |
|---|---|
| 20260101 | 14,523 |
| 20260102 | 15,891 |
| 20260103 | 12,047 |
Pitfall: event_date is a STRING in YYYYMMDD format, not a DATE type. To use date functions, cast it with PARSE_DATE('%Y%m%d', event_date).

Count unique users per day using user_pseudo_id — the GA4 equivalent of a client ID.
SELECT
event_date,
COUNT(DISTINCT user_pseudo_id) AS active_users
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
event_date
ORDER BY
event_date
How it works: user_pseudo_id is a pseudonymous per-device identifier — stored in a first-party cookie on the web, or as the app-instance ID in mobile apps. COUNT(DISTINCT ...) deduplicates so each user is counted once per day, regardless of how many events they triggered.
Expected output:
| event_date | active_users |
|---|---|
| 20260101 | 2,341 |
| 20260102 | 2,587 |
| 20260103 | 1,923 |
Get page view counts with full URLs — your first encounter with UNNEST, the most important pattern in GA4 BigQuery.
SELECT
(SELECT value.string_value
FROM UNNEST(event_params) WHERE key = 'page_location'
) AS page_url,
COUNT(*) AS pageviews
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
AND event_name = 'page_view'
GROUP BY
page_url
ORDER BY
pageviews DESC
LIMIT 20
How it works: Page URLs are stored inside the nested event_params array under the key page_location. The correlated subquery with UNNEST(event_params) extracts the string value for that key. This subquery-UNNEST pattern is the foundation of nearly every GA4 BigQuery query.
Expected output:
| page_url | pageviews |
|---|---|
| https://example.com/ | 4,201 |
| https://example.com/pricing | 1,832 |
| https://example.com/blog/guide | 1,456 |
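If you'd rather report on paths than full URLs, one approach (a sketch — the single REGEXP_EXTRACT below drops both the query string and the fragment in one pass):

```sql
-- Sketch: group page views by path instead of full URL.
-- Assumes the same events_* table and date range as above.
SELECT
REGEXP_EXTRACT(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
r'^https?://[^/?#]+([^?#]*)'
) AS page_path,
COUNT(*) AS pageviews
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
AND event_name = 'page_view'
GROUP BY
page_path
ORDER BY
pageviews DESC
```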
Pitfall: page_location contains the full URL, query parameters included. To group by path only, pull the path out with REGEXP_EXTRACT() (NET.HOST() returns only the hostname, not the path) and strip query strings with SPLIT().

A reusable template for extracting any parameter from event_params — bookmark this one.
-- String parameters (page_location, page_title, page_referrer, etc.)
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'param_name')
-- Integer parameters (ga_session_id, ga_session_number, engagement_time_msec)
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'param_name')
-- Float parameters (custom numeric params)
(SELECT value.float_value FROM UNNEST(event_params) WHERE key = 'param_name')
-- Safe approach: COALESCE all value types
(SELECT COALESCE(value.string_value,
CAST(value.int_value AS STRING),
CAST(value.float_value AS STRING),
CAST(value.double_value AS STRING))
FROM UNNEST(event_params) WHERE key = 'param_name')
How it works: GA4 stores parameter values across four type columns: string_value, int_value, float_value, double_value. Only one is populated per parameter. You need to know which type your parameter uses — or use the COALESCE approach to safely handle any type.
Common parameter types: page_location → string, ga_session_id → int, engagement_time_msec → int, page_title → string, percent_scrolled → int, session_engaged → string ("1", not an integer).

Count unique sessions per day — the most-requested GA4 BigQuery query that Google's docs don't cover.
SELECT
event_date,
COUNT(DISTINCT
CONCAT(
user_pseudo_id,
(SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
)
) AS sessions
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
event_date
ORDER BY
event_date
How it works: GA4 doesn't have a unique session ID. The ga_session_id parameter is only unique within a single user. To create a globally unique session key, you must concatenate user_pseudo_id + ga_session_id. This pattern is used in every session-level query.
Expected output:
| event_date | sessions |
|---|---|
| 20260101 | 3,105 |
| 20260102 | 3,422 |
| 20260103 | 2,678 |
Pitfall: ga_session_id alone is NOT unique across users — two different users can have the same ga_session_id, so always concatenate it with user_pseudo_id. Also note that sessions spanning midnight will appear in two daily tables.

Calculate GA4-style engagement rate and its inverse (bounce rate) from raw event data.
WITH sessions AS (
SELECT
CONCAT(user_pseudo_id,
(SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS session_key,
MAX(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged')
) AS session_engaged
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
session_key
)
SELECT
COUNT(*) AS total_sessions,
COUNTIF(session_engaged = '1') AS engaged_sessions,
ROUND(COUNTIF(session_engaged = '1') / COUNT(*) * 100, 1) AS engagement_rate,
ROUND((1 - COUNTIF(session_engaged = '1') / COUNT(*)) * 100, 1) AS bounce_rate
FROM
sessions
How it works: GA4 considers a session "engaged" if it lasted >10 seconds, had 2+ page views, or triggered a conversion. The session_engaged parameter is '1' for engaged sessions. The CTE groups events by session, then the outer query calculates the percentages. Bounce rate in GA4 is simply 1 - engagement_rate.
Expected output:
| total_sessions | engaged_sessions | engagement_rate | bounce_rate |
|---|---|---|---|
| 45,230 | 28,520 | 63.1% | 36.9% |
Pitfall: session_engaged is the STRING '1', not an integer. Also, GA4's bounce rate is NOT the same metric as Universal Analytics' bounce rate — it is the inverse of engagement rate, which accounts for time, page views, and conversions.

Calculate average session duration in seconds using engagement time.
WITH sessions AS (
SELECT
CONCAT(user_pseudo_id,
(SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS session_key,
SUM(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')
) AS engagement_ms
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
session_key
)
SELECT
ROUND(AVG(engagement_ms) / 1000, 1) AS avg_duration_sec,
ROUND(AVG(engagement_ms) / 60000, 2) AS avg_duration_min
FROM
sessions
WHERE
engagement_ms > 0
How it works: GA4 tracks engagement_time_msec on each event, measuring active time on the page (tab in focus). Summing this per session gives total engagement time; divide by 1000 for seconds or 60000 for minutes. The WHERE engagement_ms > 0 filter drops sessions that logged no engagement time at all.
Expected output:
| avg_duration_sec | avg_duration_min |
|---|---|
| 127.4 | 2.12 |
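Some properties report engagement_time_msec as an accumulated running total rather than per-event increments. Under that assumption (verify against the GA4 UI for your property), a MAX-based sketch of the same metric:

```sql
-- Variant: take the final accumulated engagement value per session
-- instead of summing increments (use if SUM() overcounts for you).
WITH sessions AS (
SELECT
CONCAT(user_pseudo_id,
(SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS session_key,
MAX(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')
) AS engagement_ms
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
session_key
)
SELECT
ROUND(AVG(engagement_ms) / 1000, 1) AS avg_duration_sec
FROM
sessions
WHERE
engagement_ms > 0
```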
Pitfall: engagement_time_msec reporting is inconsistent — on most events it is incremental (active time since the previous event), but some payloads carry an accumulated total, in which case SUM() overcounts. If your totals look inflated, try MAX() per session (the final accumulated value) and validate both approaches against the GA4 UI.

Analyze landing pages by sessions, users, engagement rate, and conversions.
WITH session_starts AS (
SELECT
CONCAT(user_pseudo_id,
(SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS session_key,
user_pseudo_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page,
MAX((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged')) AS engaged
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
AND event_name = 'session_start'
GROUP BY
session_key, user_pseudo_id, landing_page
)
SELECT
landing_page,
COUNT(*) AS sessions,
COUNT(DISTINCT user_pseudo_id) AS users,
ROUND(COUNTIF(engaged = '1') / COUNT(*) * 100, 1) AS engagement_rate
FROM
session_starts
GROUP BY
landing_page
ORDER BY
sessions DESC
LIMIT 20
How it works: We filter for session_start events which fire at the beginning of every session. The page_location on this event is the landing page. The CTE captures the session key, user, landing page, and engagement flag, then the outer query aggregates by landing page.
Expected output:
| landing_page | sessions | users | engagement_rate |
|---|---|---|---|
| https://example.com/ | 5,201 | 4,870 | 58.3% |
| https://example.com/blog/guide | 1,832 | 1,654 | 72.1% |
| https://example.com/pricing | 1,104 | 985 | 81.5% |
Pitfall: landing pages here are full URLs. To group by path only, use REGEXP_EXTRACT(page_location, r'^https?://[^/]+(/.*)$') and then SPLIT(path, '?')[OFFSET(0)] to strip query strings.

Calculate the average number of page views per session.
WITH session_pages AS (
SELECT
CONCAT(user_pseudo_id,
(SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS session_key,
COUNTIF(event_name = 'page_view') AS page_views
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
session_key
)
SELECT
ROUND(AVG(page_views), 2) AS avg_pages_per_session,
MAX(page_views) AS max_pages,
APPROX_QUANTILES(page_views, 100)[OFFSET(50)] AS median_pages
FROM
session_pages
How it works: The CTE counts page_view events per session using COUNTIF. The outer query calculates average, max, and median (via APPROX_QUANTILES). The median is often more useful than the average since a few very deep sessions can skew the mean.
Expected output:
| avg_pages_per_session | max_pages | median_pages |
|---|---|---|
| 2.84 | 47 | 2 |
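If you need the exact median rather than the approximation, PERCENTILE_CONT is available as a window function — a sketch that swaps out the final SELECT of the query above (session_pages is the CTE defined there):

```sql
-- Exact median via PERCENTILE_CONT; it is a window function in BigQuery,
-- so every row gets the same value and DISTINCT collapses the duplicates.
SELECT DISTINCT
PERCENTILE_CONT(page_views, 0.5) OVER() AS exact_median_pages
FROM
session_pages
```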
Pitfall: APPROX_QUANTILES is an approximate function (faster and cheaper than an exact computation). For an exact median, use PERCENTILE_CONT(page_views, 0.5) OVER(), which is significantly more expensive on large datasets.

Break down where your visitors come from, including organic search, paid campaigns, and referral traffic.
Replicate the GA4 source/medium report with sessions, users, and key engagement metrics.
SELECT
traffic_source.source AS source,
traffic_source.medium AS medium,
COUNT(DISTINCT user_pseudo_id) AS users,
COUNT(DISTINCT CONCAT(user_pseudo_id,
(SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
)) AS sessions,
COUNTIF(event_name = 'page_view') AS page_views
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
source, medium
ORDER BY
sessions DESC
LIMIT 20
How it works: Uses the traffic_source RECORD — this captures the first-touch attribution (the source that originally acquired the user). For session-scoped attribution, use collected_traffic_source fields instead (see Attribution section). Results are sorted by session count descending.
Expected output:
| source | medium | users | sessions | page_views |
|---|---|---|---|---|
| google | organic | 12,480 | 18,230 | 34,560 |
| (direct) | (none) | 5,230 | 7,890 | 11,200 |
| newsletter | email | 1,450 | 1,680 | 3,210 |
Pitfall: GA4 exposes three distinct attribution fields — traffic_source (first-touch, user-level), collected_traffic_source (session-scoped), and the event-level source/medium in event_params. Mixing them produces inconsistent results.

Build the default channel grouping logic in SQL for custom reporting and flexible attribution.
SELECT
CASE
WHEN traffic_source.source = '(direct)'
AND (traffic_source.medium = '(none)' OR traffic_source.medium = '(not set)')
THEN 'Direct'
WHEN traffic_source.medium = 'organic'
THEN 'Organic Search'
WHEN REGEXP_CONTAINS(traffic_source.medium, r'^(cpc|ppc|paidsearch)$')
THEN 'Paid Search'
WHEN REGEXP_CONTAINS(traffic_source.medium, r'^(display|cpm|banner)$')
THEN 'Display'
WHEN REGEXP_CONTAINS(traffic_source.medium, r'^(social|social-network|social-media)$')
THEN 'Social'
WHEN traffic_source.medium = 'email'
THEN 'Email'
WHEN traffic_source.medium = 'referral'
THEN 'Referral'
ELSE 'Other'
END AS channel_grouping,
COUNT(DISTINCT user_pseudo_id) AS users,
COUNTIF(event_name = 'page_view') AS page_views
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
channel_grouping
ORDER BY
users DESC
How it works: The CASE statement maps source/medium combinations into channels following Google's default channel grouping logic. You can customize this to match your own channel definitions — for example, adding rules for affiliate traffic or branded vs. non-branded paid search.
Expected output:
| channel_grouping | users | page_views |
|---|---|---|
| Organic Search | 12,480 | 34,560 |
| Direct | 5,230 | 11,200 |
| Paid Search | 2,100 | 4,890 |
Analyze UTM campaign performance with sessions, engagement, and conversion metrics.
SELECT
traffic_source.source,
traffic_source.medium,
traffic_source.name AS campaign,
COUNT(DISTINCT user_pseudo_id) AS users,
COUNT(DISTINCT CONCAT(user_pseudo_id,
(SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
)) AS sessions,
COUNTIF(event_name = 'purchase') AS conversions,
SUM(ecommerce.purchase_revenue) AS revenue
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
AND traffic_source.name IS NOT NULL
AND traffic_source.name != '(not set)'
GROUP BY
source, medium, campaign
ORDER BY
sessions DESC
LIMIT 25
How it works: Combines traffic_source.name (campaign name from UTM tags) with source/medium to give you a full campaign breakdown. The query filters out null and "(not set)" campaigns, then aggregates sessions, conversions, and revenue. This shows first-touch attribution; for session-scoped campaign data, use collected_traffic_source.manual_campaign_name.
Expected output:
| source | medium | campaign | users | sessions | conversions | revenue |
|---|---|---|---|---|---|---|
| google | cpc | brand_q1_2026 | 3,200 | 4,580 | 142 | 28,400 |
| facebook | social | spring_promo | 1,800 | 2,100 | 56 | 8,960 |
| newsletter | email | weekly_digest | 980 | 1,120 | 34 | 5,440 |
Find which pages attract organic search traffic and extract the search terms (when available).
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page,
traffic_source.source,
collected_traffic_source.manual_term AS keyword,
COUNT(DISTINCT user_pseudo_id) AS users,
COUNTIF(event_name = 'session_start') AS sessions,
COUNTIF(event_name = 'purchase') AS conversions
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
AND traffic_source.medium = 'organic'
AND event_name IN ('session_start', 'page_view', 'purchase')
GROUP BY
landing_page, source, keyword
ORDER BY
sessions DESC
LIMIT 25
How it works: Filters to organic traffic only, then groups by landing page and search keyword. The collected_traffic_source.manual_term field stores the utm_term parameter — for organic search, this is usually empty because Google encrypts search terms. For actual keyword data, combine with Google Search Console data in BigQuery.
Expected output:
| landing_page | source | keyword | users | sessions |
|---|---|---|---|---|
| https://example.com/blog/guide | google | (not set) | 2,450 | 3,120 |
| https://example.com/features | google | (not set) | 890 | 1,080 |
| https://example.com/pricing | bing | software pricing | 230 | 285 |
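To recover real organic keywords, join against the Search Console BigQuery bulk export — a sketch that assumes you have the export enabled and writing to a `searchconsole` dataset in the same project (the searchdata_url_impression table and its data_date/url/query fields come from the GSC export schema; adjust names to your setup):

```sql
-- Sketch: GA4 organic landing pages joined with Search Console queries.
WITH ga4 AS (
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page,
COUNTIF(event_name = 'session_start') AS sessions
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
AND traffic_source.medium = 'organic'
GROUP BY
landing_page
)
SELECT
g.landing_page,
s.query,
SUM(s.impressions) AS impressions,
SUM(s.clicks) AS clicks,
ANY_VALUE(g.sessions) AS sessions
FROM
ga4 g
JOIN
`your-project.searchconsole.searchdata_url_impression` s
-- strip GA4 query strings so URLs match GSC's canonical URLs
ON SPLIT(g.landing_page, '?')[OFFSET(0)] = s.url
WHERE
s.data_date BETWEEN DATE '2026-01-01' AND DATE '2026-01-31'
GROUP BY
g.landing_page, s.query
ORDER BY
clicks DESC
LIMIT 25
```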
Segment traffic by new and returning users with engagement and conversion comparison.
SELECT
CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params)
WHERE key = 'ga_session_number') = 1
THEN 'New'
ELSE 'Returning'
END AS user_type,
COUNT(DISTINCT user_pseudo_id) AS users,
COUNT(DISTINCT CONCAT(user_pseudo_id,
(SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
)) AS sessions,
COUNTIF(event_name = 'purchase') AS conversions,
ROUND(SAFE_DIVIDE(
COUNTIF(event_name = 'purchase'),
COUNT(DISTINCT CONCAT(user_pseudo_id,
(SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
))
) * 100, 2) AS conversion_rate
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
user_type
ORDER BY
users DESC
How it works: The ga_session_number event parameter is a sequential counter — when it equals 1, it's the user's very first session. Everything else is returning. The query compares users, sessions, and conversion rate between segments. Returning users typically convert at 2-3x the rate of new users.
Expected output:
| user_type | users | sessions | conversions | conversion_rate |
|---|---|---|---|---|
| New | 18,400 | 18,400 | 92 | 0.50 |
| Returning | 4,600 | 8,900 | 234 | 2.63 |
Revenue analysis, product performance, and purchase funnel queries for sites with ecommerce tracking enabled.
Attribute purchase revenue to traffic sources to understand which channels drive the most value.
SELECT
traffic_source.source,
traffic_source.medium,
COUNT(DISTINCT user_pseudo_id) AS purchasers,
COUNTIF(event_name = 'purchase') AS transactions,
ROUND(SUM(ecommerce.purchase_revenue), 2) AS revenue,
ROUND(SAFE_DIVIDE(
SUM(ecommerce.purchase_revenue),
COUNTIF(event_name = 'purchase')
), 2) AS avg_order_value
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
AND event_name = 'purchase'
GROUP BY
source, medium
ORDER BY
revenue DESC
LIMIT 15
How it works: Filters to purchase events only, then groups by first-touch source/medium. ecommerce.purchase_revenue holds the total transaction value. SAFE_DIVIDE prevents division by zero errors. This uses user-level (first-touch) attribution — the source that originally acquired the customer gets full credit.
Expected output:
| source | medium | purchasers | transactions | revenue | avg_order_value |
|---|---|---|---|---|---|
| google | organic | 342 | 410 | 45,280.00 | 110.44 |
| google | cpc | 186 | 215 | 28,750.00 | 133.72 |
| (direct) | (none) | 128 | 156 | 18,920.00 | 121.28 |
Analyze individual product views, add-to-carts, and purchases from the items array.
SELECT
items.item_name,
items.item_category,
COUNTIF(event_name = 'view_item') AS views,
COUNTIF(event_name = 'add_to_cart') AS add_to_carts,
COUNTIF(event_name = 'purchase') AS purchases,
ROUND(SUM(IF(event_name = 'purchase', items.price * items.quantity, 0)), 2) AS revenue,
ROUND(SAFE_DIVIDE(
COUNTIF(event_name = 'purchase'),
COUNTIF(event_name = 'view_item')
) * 100, 2) AS view_to_purchase_rate
FROM
`your-project.analytics_XXXXXX.events_*`,
UNNEST(items) AS items
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
AND event_name IN ('view_item', 'add_to_cart', 'purchase')
GROUP BY
item_name, item_category
ORDER BY
revenue DESC
LIMIT 20
How it works: UNNEST(items) explodes the nested items array so each product gets its own row. The query then counts views, add-to-carts, and purchases per product using COUNTIF. Revenue is calculated only for purchase events. The view-to-purchase rate reveals which products convert best from browsing.
Expected output:
| item_name | item_category | views | add_to_carts | purchases | revenue | view_to_purchase_rate |
|---|---|---|---|---|---|---|
| Pro Plan Annual | subscription | 4,200 | 680 | 210 | 25,200.00 | 5.00 |
| Starter Kit | physical | 8,500 | 1,200 | 340 | 16,660.00 | 4.00 |
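Because COUNT(*) counts item rows after UNNEST, order counts need the distinct transaction ID — a short sketch of per-product order and unit counts:

```sql
-- Count distinct orders per product; COUNT(*) would count item rows instead.
SELECT
items.item_name,
COUNT(DISTINCT ecommerce.transaction_id) AS orders,
SUM(items.quantity) AS units_sold
FROM
`your-project.analytics_XXXXXX.events_*`,
UNNEST(items) AS items
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
AND event_name = 'purchase'
GROUP BY
item_name
ORDER BY
orders DESC
```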
Pitfall: UNNEST(items) creates one row per item per event — a purchase event with 3 items becomes 3 rows. You therefore cannot count transactions with COUNT(*) after UNNEST; use COUNT(DISTINCT ecommerce.transaction_id) instead.

Build a complete ecommerce funnel from page view to purchase with drop-off rates at each step.
WITH funnel AS (
SELECT
COUNT(DISTINCT IF(event_name = 'session_start', user_pseudo_id, NULL)) AS step_1_sessions,
COUNT(DISTINCT IF(event_name = 'view_item', user_pseudo_id, NULL)) AS step_2_view_item,
COUNT(DISTINCT IF(event_name = 'add_to_cart', user_pseudo_id, NULL)) AS step_3_add_to_cart,
COUNT(DISTINCT IF(event_name = 'begin_checkout', user_pseudo_id, NULL)) AS step_4_checkout,
COUNT(DISTINCT IF(event_name = 'purchase', user_pseudo_id, NULL)) AS step_5_purchase
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
)
SELECT
step_1_sessions,
step_2_view_item,
ROUND(SAFE_DIVIDE(step_2_view_item, step_1_sessions) * 100, 1) AS view_rate,
step_3_add_to_cart,
ROUND(SAFE_DIVIDE(step_3_add_to_cart, step_2_view_item) * 100, 1) AS atc_rate,
step_4_checkout,
ROUND(SAFE_DIVIDE(step_4_checkout, step_3_add_to_cart) * 100, 1) AS checkout_rate,
step_5_purchase,
ROUND(SAFE_DIVIDE(step_5_purchase, step_4_checkout) * 100, 1) AS purchase_rate,
ROUND(SAFE_DIVIDE(step_5_purchase, step_1_sessions) * 100, 2) AS overall_conversion
FROM
funnel
How it works: Counts distinct users at each funnel step. This is an unordered funnel — it counts whether a user triggered each event, not whether events happened in sequence. For a sequential funnel (user must complete step N before step N+1), see the Advanced section. The drop-off rates between steps reveal your biggest optimization opportunities.
Expected output:
| step_1_sessions | step_2_view_item | view_rate | step_3_add_to_cart | atc_rate | step_5_purchase | overall_conversion |
|---|---|---|---|---|---|---|
| 24,500 | 8,200 | 33.5% | 2,100 | 25.6% | 580 | 2.37% |
Compare purchase behavior across device categories — desktop, mobile, and tablet.
SELECT
device.category AS device_type,
COUNT(DISTINCT ecommerce.transaction_id) AS transactions,
COUNT(DISTINCT user_pseudo_id) AS purchasers,
ROUND(SUM(ecommerce.purchase_revenue), 2) AS total_revenue,
ROUND(AVG(ecommerce.purchase_revenue), 2) AS avg_order_value,
ROUND(SAFE_DIVIDE(
SUM(ecommerce.purchase_revenue),
COUNT(DISTINCT user_pseudo_id)
), 2) AS revenue_per_user
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
AND event_name = 'purchase'
GROUP BY
device_type
ORDER BY
total_revenue DESC
How it works: Groups purchase events by device.category to compare AOV and revenue per user across devices. Desktop users typically have higher AOV, while mobile may have higher transaction volume. Revenue per user helps identify which platform drives the most value per customer.
Expected output:
| device_type | transactions | purchasers | total_revenue | avg_order_value | revenue_per_user |
|---|---|---|---|---|---|
| desktop | 380 | 310 | 52,460.00 | 138.05 | 169.23 |
| mobile | 245 | 220 | 24,500.00 | 100.00 | 111.36 |
| tablet | 42 | 38 | 5,040.00 | 120.00 | 132.63 |
Build your own attribution models in SQL — from simple first/last click to multi-touch linear attribution.
Credit 100% of each conversion to the first touchpoint that acquired the user.
WITH conversions AS (
SELECT
user_pseudo_id,
ecommerce.transaction_id,
ecommerce.purchase_revenue AS revenue,
traffic_source.source AS first_source,
traffic_source.medium AS first_medium
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
AND event_name = 'purchase'
)
SELECT
first_source,
first_medium,
COUNT(DISTINCT transaction_id) AS conversions,
ROUND(SUM(revenue), 2) AS attributed_revenue
FROM
conversions
GROUP BY
first_source, first_medium
ORDER BY
attributed_revenue DESC
How it works: This is the simplest attribution model. traffic_source.source/medium always refers to the first touchpoint that brought the user. By joining it with purchase events, you attribute all revenue to the channel that originally acquired the user. This model favors awareness channels like organic search and social.
Expected output:
| first_source | first_medium | conversions | attributed_revenue |
|---|---|---|---|
| google | organic | 410 | 45,280.00 |
| google | cpc | 186 | 22,320.00 |
| facebook | social | 95 | 11,400.00 |
Credit the conversion to the last marketing touchpoint before purchase, ignoring direct visits.
WITH sessions AS (
SELECT
user_pseudo_id,
event_timestamp,
(SELECT value.int_value FROM UNNEST(event_params)
WHERE key = 'ga_session_id') AS session_id,
collected_traffic_source.manual_source AS source,
collected_traffic_source.manual_medium AS medium,
event_name,
ecommerce.purchase_revenue AS revenue,
ecommerce.transaction_id
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
),
last_touch AS (
SELECT
s.user_pseudo_id,
s.transaction_id,
s.revenue,
t.source AS last_nd_source,
t.medium AS last_nd_medium
FROM
sessions s
JOIN
sessions t
ON s.user_pseudo_id = t.user_pseudo_id
-- only consider touchpoints at or before the purchase event
AND t.event_timestamp <= s.event_timestamp
AND t.event_name = 'session_start'
AND t.source IS NOT NULL
AND t.source != '(direct)'
WHERE
s.event_name = 'purchase'
-- keep the most recent qualifying touchpoint per purchase
QUALIFY ROW_NUMBER() OVER(
PARTITION BY s.transaction_id
ORDER BY t.event_timestamp DESC
) = 1
)
SELECT
last_nd_source,
last_nd_medium,
COUNT(DISTINCT transaction_id) AS conversions,
ROUND(SUM(revenue), 2) AS attributed_revenue
FROM
last_touch
GROUP BY
last_nd_source, last_nd_medium
ORDER BY
attributed_revenue DESC
How it works: First, it collects all session-level touchpoints using collected_traffic_source (session-scoped, unlike the user-scoped traffic_source). Then ROW_NUMBER() ranks each user's non-direct sessions and keeps the most recent one as the credited touchpoint. This mirrors the classic last non-direct click model (GA4's default reporting model is now data-driven attribution). It favors closing channels like branded search and retargeting.
Expected output:
| last_nd_source | last_nd_medium | conversions | attributed_revenue |
|---|---|---|---|
| google | cpc | 245 | 31,850.00 |
| google | organic | 320 | 28,480.00 |
| newsletter | email | 78 | 9,360.00 |
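Auto-tagged Google Ads sessions often carry a NULL manual_source but a populated gclid. A hedged fallback pattern (a sketch — validate the resulting split against your GA4 acquisition reports):

```sql
-- Fall back to gclid for auto-tagged Google Ads traffic.
SELECT
COALESCE(
collected_traffic_source.manual_source,
IF(collected_traffic_source.gclid IS NOT NULL, 'google', NULL),
'(direct)'
) AS source,
COALESCE(
collected_traffic_source.manual_medium,
IF(collected_traffic_source.gclid IS NOT NULL, 'cpc', NULL),
'(none)'
) AS medium,
COUNT(*) AS events
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
source, medium
```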
Pitfall: collected_traffic_source was added to the export in 2023, so earlier data will be NULL. The manual_source/manual_medium fields come from UTM parameters — for auto-tagged Google Ads traffic, check the gclid field instead.

Distribute conversion credit equally across all touchpoints in the user's journey.
WITH touchpoints AS (
SELECT
user_pseudo_id,
event_timestamp,
collected_traffic_source.manual_source AS source,
collected_traffic_source.manual_medium AS medium
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
AND event_name = 'session_start'
AND collected_traffic_source.manual_source IS NOT NULL
),
conversions AS (
SELECT
user_pseudo_id,
ecommerce.transaction_id,
ecommerce.purchase_revenue AS revenue
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
AND event_name = 'purchase'
),
user_paths AS (
SELECT
c.transaction_id,
c.revenue,
t.source,
t.medium,
COUNT(*) OVER(PARTITION BY c.transaction_id) AS touchpoint_count
FROM
conversions c
JOIN
touchpoints t ON c.user_pseudo_id = t.user_pseudo_id
)
SELECT
source,
medium,
ROUND(SUM(1.0 / touchpoint_count), 1) AS attributed_conversions,
ROUND(SUM(revenue / touchpoint_count), 2) AS attributed_revenue
FROM
user_paths
GROUP BY
source, medium
ORDER BY
attributed_revenue DESC
How it works: For each conversion, the query finds all touchpoints (sessions) the user had. It then divides the revenue equally: if a user had 4 touchpoints, each gets 25% credit. The touchpoint_count window function counts sessions per transaction. This model gives a balanced view by crediting all channels in the customer journey, not just the first or last.
Expected output:
| source | medium | attributed_conversions | attributed_revenue |
|---|---|---|---|
| google | organic | 285.3 | 31,420.50 |
| google | cpc | 198.7 | 24,105.25 |
| newsletter | email | 112.5 | 13,500.00 |
Prepare conversion path data for Shapley value or Markov chain attribution modeling.
WITH sessions_with_source AS (
SELECT
user_pseudo_id,
event_timestamp,
CONCAT(
IFNULL(collected_traffic_source.manual_source, '(direct)'),
' / ',
IFNULL(collected_traffic_source.manual_medium, '(none)')
) AS channel,
event_name,
ecommerce.transaction_id,
ecommerce.purchase_revenue AS revenue
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
AND event_name IN ('session_start', 'purchase')
),
paths AS (
SELECT
user_pseudo_id,
STRING_AGG(
-- STRING_AGG skips NULL inputs automatically (IGNORE NULLS is not valid here)
IF(event_name = 'session_start', channel, NULL),
', '
ORDER BY event_timestamp
) AS conversion_path,
MAX(IF(event_name = 'purchase', 1, 0)) AS converted,
SUM(IF(event_name = 'purchase', revenue, 0)) AS revenue
FROM
sessions_with_source
GROUP BY
user_pseudo_id
)
SELECT
conversion_path,
converted,
COUNT(*) AS path_count,
ROUND(SUM(revenue), 2) AS total_revenue
FROM
paths
GROUP BY
conversion_path, converted
ORDER BY
path_count DESC
LIMIT 50
How it works: This query builds conversion paths — the sequence of channels each user interacted with, separated by commas. STRING_AGG concatenates channel touchpoints in chronological order. The output shows both converting and non-converting paths, which is the input needed for data-driven models like Markov chains (transition probabilities between channels) or Shapley values (each channel's marginal contribution). Export this data to Python or R for the actual modeling.
Expected output:
| conversion_path | converted | path_count | total_revenue |
|---|---|---|---|
| google / organic | 0 | 8,450 | 0.00 |
| google / organic, google / cpc | 1 | 142 | 18,460.00 |
| google / organic, (direct) / (none), newsletter / email | 1 | 86 | 10,320.00 |
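The path table above is exactly the input a Markov-chain model consumes. A minimal Python sketch of the first modeling step — counting channel-to-channel transitions and normalizing them into probabilities. The function names and the synthetic 'start' / 'conversion' / 'null' states are illustrative conventions of Markov attribution, not part of GA4:

```python
from collections import defaultdict


def transition_counts(paths):
    """Count channel-to-channel transitions across conversion paths.

    `paths` is a list of (conversion_path, converted) tuples, where
    conversion_path is the comma-separated string produced by the query
    above. Each path is framed with a 'start' state and an absorbing
    'conversion' or 'null' state, as Markov attribution requires.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for path, converted in paths:
        channels = [c.strip() for c in path.split(',')]
        states = ['start'] + channels + (['conversion'] if converted else ['null'])
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return counts


def transition_probs(counts):
    """Normalize raw counts into row-stochastic transition probabilities."""
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }
```

From the resulting matrix, removal effects (drop one channel, re-simulate the overall conversion probability) yield each channel's attributed share — that part is best done in a dedicated library or notebook.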
Advanced queries for sequential funnel analysis, user retention cohorts, and engagement frequency.
Build a strict sequential funnel where users must complete each step in order before moving to the next.
WITH user_events AS (
SELECT
user_pseudo_id,
event_name,
event_timestamp,
MIN(IF(event_name = 'session_start', event_timestamp, NULL))
OVER(PARTITION BY user_pseudo_id) AS first_session,
MIN(IF(event_name = 'view_item', event_timestamp, NULL))
OVER(PARTITION BY user_pseudo_id) AS first_view,
MIN(IF(event_name = 'add_to_cart', event_timestamp, NULL))
OVER(PARTITION BY user_pseudo_id) AS first_atc,
MIN(IF(event_name = 'begin_checkout', event_timestamp, NULL))
OVER(PARTITION BY user_pseudo_id) AS first_checkout,
MIN(IF(event_name = 'purchase', event_timestamp, NULL))
OVER(PARTITION BY user_pseudo_id) AS first_purchase
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
)
SELECT
COUNT(DISTINCT user_pseudo_id) AS step_1_session,
COUNT(DISTINCT IF(first_view > first_session, user_pseudo_id, NULL)) AS step_2_view,
COUNT(DISTINCT IF(first_atc > first_view, user_pseudo_id, NULL)) AS step_3_atc,
COUNT(DISTINCT IF(first_checkout > first_atc, user_pseudo_id, NULL)) AS step_4_checkout,
COUNT(DISTINCT IF(first_purchase > first_checkout, user_pseudo_id, NULL)) AS step_5_purchase
FROM
user_events
WHERE
event_name = 'session_start'
How it works: Window functions compute the first timestamp each user triggered each event. Then the outer query enforces order: a user only counts for step 3 if their first add_to_cart happened after their first view_item. This is stricter than the unordered funnel in the Ecommerce section and gives more accurate drop-off analysis.
Expected output:
| step_1_session | step_2_view | step_3_atc | step_4_checkout | step_5_purchase |
|---|---|---|---|---|
| 24,500 | 7,800 | 1,950 | 1,120 | 520 |
Build a weekly retention cohort table showing what percentage of users return in subsequent weeks.
WITH user_activity AS (
SELECT
user_pseudo_id,
DATE_TRUNC(
PARSE_DATE('%Y%m%d', event_date), WEEK
) AS activity_week
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
GROUP BY
user_pseudo_id, activity_week
),
cohorts AS (
SELECT
user_pseudo_id,
MIN(activity_week) AS cohort_week
FROM
user_activity
GROUP BY
user_pseudo_id
)
SELECT
c.cohort_week,
DATE_DIFF(a.activity_week, c.cohort_week, WEEK) AS weeks_since_first,
COUNT(DISTINCT a.user_pseudo_id) AS active_users,
ROUND(SAFE_DIVIDE(
COUNT(DISTINCT a.user_pseudo_id),
MAX(COUNT(DISTINCT a.user_pseudo_id)) OVER(PARTITION BY c.cohort_week)
) * 100, 1) AS retention_pct
FROM
cohorts c
JOIN
user_activity a ON c.user_pseudo_id = a.user_pseudo_id
GROUP BY
c.cohort_week, weeks_since_first
ORDER BY
c.cohort_week, weeks_since_first
How it works: The cohorts CTE assigns each user to their first-activity week. Then it joins back to all activity to count how many users from each cohort return in subsequent weeks. DATE_DIFF(..., WEEK) calculates the gap. The retention percentage shows what fraction of the original cohort is still active. Export to a spreadsheet for the classic cohort heatmap visualization.
Expected output:
| cohort_week | weeks_since_first | active_users | retention_pct |
|---|---|---|---|
| 2026-01-05 | 0 | 3,200 | 100.0 |
| 2026-01-05 | 1 | 960 | 30.0 |
| 2026-01-05 | 2 | 640 | 20.0 |
Calculate daily, weekly, and monthly active users and the stickiness ratios (DAU/MAU, WAU/MAU).
WITH daily AS (
SELECT
PARSE_DATE('%Y%m%d', event_date) AS dt,
COUNT(DISTINCT user_pseudo_id) AS dau
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY dt
),
monthly AS (
SELECT
COUNT(DISTINCT user_pseudo_id) AS mau
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
),
weekly AS (
SELECT
ROUND(AVG(wau), 0) AS avg_wau
FROM (
SELECT
DATE_TRUNC(PARSE_DATE('%Y%m%d', event_date), WEEK) AS wk,
COUNT(DISTINCT user_pseudo_id) AS wau
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY wk
)
)
SELECT
ROUND(AVG(d.dau), 0) AS avg_dau,
w.avg_wau,
m.mau,
ROUND(SAFE_DIVIDE(AVG(d.dau), m.mau) * 100, 1) AS dau_mau_pct,
ROUND(SAFE_DIVIDE(w.avg_wau, m.mau) * 100, 1) AS wau_mau_pct
FROM
daily d
CROSS JOIN
monthly m
CROSS JOIN
weekly w
GROUP BY
m.mau, w.avg_wau
How it works: Three CTEs compute daily, weekly, and monthly active users separately. CROSS JOIN combines them for the final calculation. The DAU/MAU ratio (stickiness) shows what fraction of monthly users visit on any given day — a ratio above 20% is considered strong for most products. WAU/MAU above 40% indicates healthy weekly engagement.
Expected output:
| avg_dau | avg_wau | mau | dau_mau_pct | wau_mau_pct |
|---|---|---|---|---|
| 2,450 | 8,200 | 24,500 | 10.0 | 33.5 |
Measure how many days it takes from a user's first visit to their first purchase.
WITH first_visit AS (
SELECT
user_pseudo_id,
MIN(PARSE_DATE('%Y%m%d', event_date)) AS first_date
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
GROUP BY
user_pseudo_id
),
first_purchase AS (
SELECT
user_pseudo_id,
MIN(PARSE_DATE('%Y%m%d', event_date)) AS purchase_date
FROM
`your-project.analytics_XXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
AND event_name = 'purchase'
GROUP BY
user_pseudo_id
)
SELECT
DATE_DIFF(p.purchase_date, v.first_date, DAY) AS days_to_convert,
COUNT(*) AS users
FROM
first_purchase p
JOIN
first_visit v ON p.user_pseudo_id = v.user_pseudo_id
GROUP BY
days_to_convert
ORDER BY
days_to_convert
LIMIT 30
How it works: Two CTEs find each user's first-ever visit date and first purchase date. DATE_DIFF calculates the gap in days. The distribution shows how quickly users convert — most conversions typically happen on day 0 (same-session), with a long tail. This helps set realistic attribution windows and retargeting schedules.
Expected output:
| days_to_convert | users |
|---|---|
| 0 | 245 |
| 1 | 68 |
| 2 | 42 |
| 7 | 31 |
| 14 | 18 |
user_pseudo_id is device-scoped and resets when cookies are cleared. Cross-device journeys therefore show up as separate users, and some apparent day-0 conversions are really the tail end of multi-day journeys. For more accurate tracking, use user_id (requires authentication) if available.
Estimate query costs and efficiently query daily and intraday tables to keep your BigQuery bill under control.
Estimate the size and cost of your GA4 tables before running expensive queries.
-- Check table sizes (run this in BigQuery console)
SELECT
table_id,
ROUND(size_bytes / POW(1024, 3), 2) AS size_gb,
row_count,
ROUND(size_bytes / POW(1024, 4) * 6.25, 4) AS full_scan_cost_usd
FROM
`your-project.analytics_XXXXXX`.__TABLES__
WHERE
table_id LIKE 'events_%'
ORDER BY
table_id DESC
LIMIT 31
How it works: The __TABLES__ metadata view exposes table sizes without scanning any data. The cost column converts bytes to terabytes and multiplies by $6.25, BigQuery's on-demand price per TB scanned, giving the worst-case cost of a full scan of that table. A typical GA4 daily table for a medium-traffic site is 0.5-2 GB. Always check sizes before querying a wide date range: a query over 12 months of daily tables at that volume scans hundreds of GB and can cost several dollars per run, which adds up quickly for scheduled reports.
Expected output:
| table_id | size_gb | row_count | full_scan_cost_usd |
|---|---|---|---|
| events_20260131 | 1.24 | 2,450,000 | 0.0076 |
| events_20260130 | 1.18 | 2,320,000 | 0.0072 |
| events_20260129 | 0.95 | 1,880,000 | 0.0058 |
Three ways to reduce scan costs: (1) use _TABLE_SUFFIX to limit date ranges, (2) select only the columns you need, (3) consider creating materialized views for frequently-run reports.
Combine yesterday's finalized data with today's streaming intraday table for near real-time reporting.
-- Combine finalized daily tables with today's streaming data
SELECT
event_date,
event_name,
COUNT(*) AS event_count,
COUNT(DISTINCT user_pseudo_id) AS users
FROM (
-- Finalized daily tables (yesterday and before)
SELECT * FROM `your-project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
AND
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
UNION ALL
-- Today's intraday (streaming, not yet finalized)
SELECT * FROM `your-project.analytics_XXXXXX.events_intraday_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
)
GROUP BY
event_date, event_name
ORDER BY
event_date DESC, event_count DESC
How it works: GA4 exports data to two table types: events_YYYYMMDD (finalized daily, usually available by next morning) and events_intraday_YYYYMMDD (streaming, updated throughout the day). UNION ALL combines both for near real-time reporting. The FORMAT_DATE + CURRENT_DATE() pattern makes the query automatically use the right dates.
Expected output:
| event_date | event_name | event_count | users |
|---|---|---|---|
| 20260302 | page_view | 4,280 | 1,850 |
| 20260302 | session_start | 1,920 | 1,850 |
| 20260301 | page_view | 12,450 | 5,230 |
Frequently encountered issues when querying GA4 data in BigQuery, and how to fix them.
ga_session_id is only unique per user. Always combine it with user_pseudo_id: CONCAT(user_pseudo_id, CAST(ga_session_id AS STRING)) for a truly unique session key.
Some events arrive without identifiers (a null user_pseudo_id in some cases). Filter these out or account for them in your analysis.
page_location parameter can be truncated at ~1,000 characters. Long URLs with many query parameters may be cut off, breaking GROUP BY aggregations and URL parsing.
Each parameter's value lives in exactly one of four typed sub-fields: string_value, int_value, float_value, double_value. Using the wrong type returns NULL. Check the schema or use COALESCE(value.string_value, CAST(value.int_value AS STRING)).
events_intraday_* tables contain only data received so far — they're not final until converted to daily tables. Never compare intraday numbers to complete daily totals.
_TABLE_SUFFIX is always a string like '20260131', not a date. Use BETWEEN '20260101' AND '20260131' (string comparison), not date comparisons. Forgetting quotes causes full table scans.
event_params, user_properties, and items are REPEATED RECORD fields. You must use UNNEST() to access their contents. Forgetting UNNEST returns errors or incorrect results.
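The session-key and UNNEST pitfalls above can be sketched together. This is a minimal example, not production code; it assumes the standard export schema and uses the same placeholder project ID as the rest of this guide:

```sql
-- Wrong: COUNT(DISTINCT ga_session_id) collides across users and undercounts.
-- Right: prefix the session ID with user_pseudo_id for a globally unique key.
SELECT
  COUNT(DISTINCT CONCAT(
    user_pseudo_id,
    CAST((SELECT value.int_value
          FROM UNNEST(event_params)   -- event_params is REPEATED, so UNNEST is required
          WHERE key = 'ga_session_id') AS STRING)
  )) AS sessions
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
```

Events missing ga_session_id produce a NULL key and fall out of the distinct count, which is usually the behavior you want for session totals.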
Google Analytics 4 stores event data in a nested, denormalized schema that's fundamentally different from Universal Analytics. When you link GA4 to BigQuery, each user interaction becomes a row in a daily export table, and that raw event stream is where the real analytical power begins.
Unlike the sampled, aggregated reports in GA4's interface, BigQuery gives you access to raw, unsampled, hit-level data. This means you can build custom attribution models, create user-level segments, and run analyses that would be impossible in the standard GA4 UI. The tradeoff is that you need SQL skills and an understanding of GA4's nested data structure.
Every GA4 BigQuery table follows the same structure: one row per event, with nested RECORD fields for parameters, user properties, and ecommerce items. The three fields you'll use in nearly every query are:
To access data inside these nested fields, you must use BigQuery's UNNEST() function. This is the single biggest syntax difference from standard SQL and trips up most beginners. Our GA4 complete guide covers the conceptual model in more depth.
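As a minimal illustration of the UNNEST() pattern (placeholder project ID; adjust the date range to your data), this pulls the page_location parameter out of the nested event_params array via a correlated scalar subquery:

```sql
SELECT
  event_date,
  -- event_params is an ARRAY of STRUCTs; a scalar subquery over UNNEST
  -- extracts one parameter's value for the current event row
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location,
  COUNT(*) AS page_views
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260107'
  AND event_name = 'page_view'
GROUP BY
  event_date, page_location
ORDER BY
  page_views DESC
```

The alternative style, `FROM table, UNNEST(event_params) AS ep`, flattens one event row into one row per parameter; the scalar-subquery style shown here keeps one row per event, which is usually what you want for counting.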
GA4 doesn't export a "sessions" table — you have to reconstruct sessions from events. The standard approach is to combine user_pseudo_id with the ga_session_id event parameter:
CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))
This creates a unique session key. From there, you can calculate session duration using engagement_time_msec, count page views per session, or classify sessions as bounced vs. engaged using the session_engaged parameter.
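Putting that session key to work, here is a sketch that classifies sessions as engaged or bounced. It assumes session_engaged is exported as the string '1' on engaged sessions, which is the typical schema; verify against your own tables:

```sql
WITH sessions AS (
  SELECT
    -- Unique session key: user_pseudo_id + ga_session_id
    CONCAT(user_pseudo_id,
           CAST((SELECT value.int_value
                 FROM UNNEST(event_params)
                 WHERE key = 'ga_session_id') AS STRING)) AS session_key,
    -- session_engaged is typically a string '1' on at least one event
    -- in an engaged session
    MAX((SELECT value.string_value
         FROM UNNEST(event_params)
         WHERE key = 'session_engaged')) = '1' AS is_engaged
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  GROUP BY
    session_key
)
SELECT
  COUNT(*) AS total_sessions,
  COUNTIF(is_engaged) AS engaged_sessions,
  ROUND(SAFE_DIVIDE(COUNTIF(is_engaged), COUNT(*)) * 100, 1) AS engagement_rate_pct
FROM
  sessions
```

COUNTIF ignores NULLs, so sessions with no session_engaged parameter simply count as not engaged.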
One of the most confusing aspects of GA4 BigQuery is that traffic source information lives in three different places:
traffic_source: first-touch, user-level attribution of how the user was originally acquired.
collected_traffic_source: the raw values collected with the event, exposed as manual_source, manual_medium, manual_campaign_name fields.
session_traffic_source_last_click: session-scoped last-click attribution, added to the export in November 2023.
Event-level values also appear in event_params as source, medium, campaign keys; these are the most granular but least commonly used.
Using the wrong field is the most common cause of "my numbers don't match GA4" complaints. For reports that should match the GA4 UI, use session_traffic_source_last_click (session-scoped). For customer acquisition analysis, use traffic_source (first-touch).
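A sketch of a session-level acquisition report using the last-click field. The field path below follows the current export schema (session_traffic_source_last_click.manual_campaign), but verify it against your table, and note the field only exists for data exported after November 2023:

```sql
SELECT
  session_traffic_source_last_click.manual_campaign.source AS source,
  session_traffic_source_last_click.manual_campaign.medium AS medium,
  -- Count sessions with the unique user + session-ID key
  COUNT(DISTINCT CONCAT(user_pseudo_id,
    CAST((SELECT value.int_value
          FROM UNNEST(event_params)
          WHERE key = 'ga_session_id') AS STRING))) AS sessions
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
  source, medium
ORDER BY
  sessions DESC
```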
BigQuery charges per bytes scanned. A single GA4 daily table for a medium-traffic site (50K sessions/day) can be 1-3 GB. Querying a full year of data at that volume would scan 500+ GB, costing ~$3 at on-demand pricing. Here's how to keep costs down:
SELECT * scans every column; selecting specific fields reduces bytes scanned dramatically.
After working with GA4 BigQuery data extensively, we've identified the patterns that cause the most frustration. The ga_session_id is not globally unique; it's only unique within a single user_pseudo_id. Counting sessions by session ID alone will massively undercount, because different users can share the same session ID value.
The event_params value types also trip people up. Each parameter stores its value in exactly one of four sub-fields: string_value, int_value, float_value, or double_value. If you query string_value for a parameter that uses int_value, you get NULL — no error, just silent NULL. Check the schema first, or use COALESCE() across all value types.
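When a parameter's value type is uncertain, a defensive pattern is to coalesce across all four typed sub-fields. A sketch (field names per the standard export schema, placeholder project ID):

```sql
SELECT
  ep.key,
  -- Exactly one of the four sub-fields is populated per parameter;
  -- COALESCE picks whichever it is, cast to STRING
  COALESCE(
    ep.value.string_value,
    CAST(ep.value.int_value AS STRING),
    CAST(ep.value.float_value AS STRING),
    CAST(ep.value.double_value AS STRING)
  ) AS value_as_string
FROM
  `your-project.analytics_XXXXXX.events_*`,
  UNNEST(event_params) AS ep
WHERE
  _TABLE_SUFFIX = '20260101'
LIMIT 100
```

This is handy for exploring an unfamiliar event taxonomy; for production queries, prefer reading the correct typed sub-field directly once you know it.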
If you're seeing unexpected (not set) values in your traffic source data, our GA4 (not set) fix guide covers the root causes and solutions.
Not every analysis needs BigQuery. GA4's Exploration reports handle standard segmentation, funnel visualization, and path analysis well, and they're free. BigQuery becomes essential when you need raw, unsampled event-level data: custom attribution models, user-level segments, and analyses the standard UI can't express.
The 28 queries in this starter pack cover the most requested analytics scenarios. Copy them, replace the project ID and date range, and adapt them to your specific event taxonomy. For a broader understanding of GA4's data model, tracking setup, and reporting capabilities, see our complete GA4 guide.
BigQuery offers a free tier that includes 1 TB of query processing and 10 GB of storage per month. For most small-to-medium websites, this is enough to run dozens of analytical queries without any cost. The GA4 BigQuery export itself is free — Google doesn't charge for the data transfer. You only pay for query processing (bytes scanned) and storage beyond the free tier. On-demand pricing is $6.25 per TB scanned.
BigQuery uses GoogleSQL (formerly called Standard SQL), which is ANSI SQL:2011 compliant with extensions for nested/repeated data, geographic functions, and ML operations. The biggest difference from MySQL or PostgreSQL is the UNNEST() function for handling ARRAY and STRUCT types, which you'll use extensively with GA4 data. All 28 queries in this library use GoogleSQL syntax.
Several factors cause discrepancies: GA4 applies data thresholding (suppresses small counts for privacy), uses data-driven attribution (BigQuery has raw data without model-based attribution), samples data in some reports, and applies consent-mode adjustments. BigQuery gives you the raw, unprocessed event stream. Differences of 5-15% are normal and expected. Both datasets are "correct" — they just use different processing rules.
Cost depends entirely on how much data your queries scan. A single day's GA4 data for a medium-traffic site (50K sessions/day) is typically 1-3 GB, so a full scan of one month is 30-90 GB, roughly $0.18-$0.55 at on-demand pricing ($6.25/TB); selecting specific columns instead of SELECT * reduces that further. The most expensive mistake is querying long date ranges without _TABLE_SUFFIX filtering, which can scan your entire dataset. Always use date filters and limit the columns you read.
Not truly real-time, but near real-time. GA4 creates events_intraday_* streaming tables that update throughout the day (roughly every 15-30 minutes). This data is replaced by the finalized events_YYYYMMDD daily table the following morning. For true real-time monitoring, use the GA4 Realtime report in the UI. BigQuery is better suited for historical analysis and batch reporting.
These SQL queries are provided for educational and informational purposes. All queries use placeholder project IDs (your-project.analytics_XXXXXX) — you must replace them with your actual BigQuery project and dataset identifiers. Sample output values are illustrative and do not represent real analytics data.
BigQuery billing depends on the volume of data processed. Always review estimated bytes scanned (shown in the BigQuery console) before running queries on large datasets. The cost estimates in this guide are based on on-demand pricing as of 2026 and may change.
While we strive for accuracy, GA4's BigQuery export schema evolves over time. Google may add, deprecate, or modify fields. Always verify query syntax against your actual table schema using INFORMATION_SCHEMA or the BigQuery console's schema viewer.
These queries are just the beginning. Our complete guide covers the full GA4 ecosystem — from initial setup and event tracking to advanced reporting and BigQuery integration.
Read the Complete GA4 Guide →