Free SQL Reference

GA4 BigQuery SQL Starter Pack

28 production-ready SQL queries for GA4 BigQuery — with explanations, expected outputs, and pitfall warnings. Companion to our GA4 Complete Guide.

28 queries across 7 categories

GA4 BigQuery Table Structure

events_YYYYMMDD
  event_date                          STRING
  event_name                          STRING
  event_timestamp                     INT64
  user_pseudo_id                      STRING
  user_id                             STRING
  event_params[]                      REPEATED RECORD
  user_properties[]                   REPEATED RECORD
  items[]                             REPEATED RECORD
  traffic_source                      RECORD
  collected_traffic_source            RECORD
  session_traffic_source_last_click   RECORD
  device                              RECORD
  geo                                 RECORD

event_params[]
  key                   STRING
  value.string_value    STRING
  value.int_value       INT64
  value.float_value     FLOAT64

items[]
  item_id     STRING
  item_name   STRING
  price / quantity / revenue

Key insight: every row is one event; nested arrays require UNNEST() to access; three traffic source field sets exist (see #3 below).

The GA4 BigQuery export stores one row per event. Parameters, user properties, and ecommerce items are stored in nested REPEATED RECORD arrays — you must use UNNEST() to access them. This is the single most important pattern in every query below.

GA4 has three separate traffic source field sets: traffic_source (first-touch, user-level), collected_traffic_source (raw collected values, session-scoped), and session_traffic_source_last_click (session-level last click, added November 2023). The queries in Section 3 use the first-touch traffic_source field for simplicity; swap in the session-scoped fields when you need session-level attribution.

1

Getting Started

4 queries

Basic Event Count by Date

Beginner

Count total events grouped by date — the simplest possible GA4 BigQuery query.

SELECT
  event_date,
  COUNT(*) AS event_count
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
  event_date
ORDER BY
  event_date

How it works: The events_* wildcard matches all daily export tables. _TABLE_SUFFIX filters the date range without scanning every table in the dataset, keeping costs low. Each row in GA4 BigQuery represents one event, so COUNT(*) gives total events per day.

Expected output:

event_date   event_count
20260101     14,523
20260102     15,891
20260103     12,047
Watch out: event_date is a STRING (YYYYMMDD), not a DATE type. To use date functions, cast it: PARSE_DATE('%Y%m%d', event_date).
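A minimal sketch of that cast in practice — grouping events by day of week (the day_of_week alias is illustrative):

```sql
-- Cast event_date (STRING 'YYYYMMDD') to DATE, then group by weekday name
SELECT
  FORMAT_DATE('%A', PARSE_DATE('%Y%m%d', event_date)) AS day_of_week,
  COUNT(*) AS event_count
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
  day_of_week
ORDER BY
  event_count DESC
```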

Daily Active Users

Beginner

Count unique users per day using user_pseudo_id — the GA4 equivalent of a client ID.

SELECT
  event_date,
  COUNT(DISTINCT user_pseudo_id) AS active_users
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
  event_date
ORDER BY
  event_date

How it works: user_pseudo_id is a first-party cookie identifier assigned to every visitor. COUNT(DISTINCT ...) deduplicates so each user is counted once per day, regardless of how many events they triggered.

Expected output:

event_date   active_users
20260101     2,341
20260102     2,587
20260103     1,923
Watch out: BigQuery user counts will usually be higher than those in the GA4 UI, because the interface uses Google Signals to merge cross-device users while BigQuery contains the raw, unmerged data. Expect a 5-15% difference.

Page Views with URLs

Beginner

Get page view counts with full URLs — your first encounter with UNNEST, the most important pattern in GA4 BigQuery.

SELECT
  (SELECT value.string_value
   FROM UNNEST(event_params) WHERE key = 'page_location'
  ) AS page_url,
  COUNT(*) AS pageviews
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  AND event_name = 'page_view'
GROUP BY
  page_url
ORDER BY
  pageviews DESC
LIMIT 20

How it works: Page URLs are stored inside the nested event_params array under the key page_location. The correlated subquery with UNNEST(event_params) extracts the string value for that key. This subquery-UNNEST pattern is the foundation of nearly every GA4 BigQuery query.

Expected output:

page_url                         pageviews
https://example.com/             4,201
https://example.com/pricing      1,832
https://example.com/blog/guide   1,456
Watch out: page_location includes full URLs with query parameters. To group by path only, extract the path with REGEXP_EXTRACT(page_location, r'^https?://[^/]+([^?#]*)'), or strip query strings with SPLIT(page_location, '?')[OFFSET(0)]. Note that NET.HOST() returns the hostname, not the path.
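A sketch of the path-only grouping against the same wildcard table — the regex keeps everything between the host and the first ? or #:

```sql
SELECT
  REGEXP_EXTRACT(
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
    r'^https?://[^/]+([^?#]*)'
  ) AS page_path,
  COUNT(*) AS pageviews
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  AND event_name = 'page_view'
GROUP BY
  page_path
ORDER BY
  pageviews DESC
LIMIT 20
```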

Extract Any Event Parameter (Template)

Beginner

A reusable template for extracting any parameter from event_params — bookmark this one.

-- String parameters (page_location, page_title, page_referrer, etc.)
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'param_name')

-- Integer parameters (ga_session_id, ga_session_number, engagement_time_msec)
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'param_name')

-- Float parameters (custom numeric params)
(SELECT value.float_value FROM UNNEST(event_params) WHERE key = 'param_name')

-- Safe approach: COALESCE all value types
(SELECT COALESCE(value.string_value,
    CAST(value.int_value AS STRING),
    CAST(value.float_value AS STRING),
    CAST(value.double_value AS STRING))
 FROM UNNEST(event_params) WHERE key = 'param_name')

How it works: GA4 stores parameter values across four type columns: string_value, int_value, float_value, double_value. Only one is populated per parameter. You need to know which type your parameter uses — or use the COALESCE approach to safely handle any type.

Watch out: Common parameters and their types: page_location → string, ga_session_id → int, engagement_time_msec → int, page_title → string, percent_scrolled → int, session_engaged → string ("1" not integer).
2

Sessions & Engagement

5 queries

Session Count by Date

Beginner

Count unique sessions per day — the most-requested GA4 BigQuery query that Google's docs don't cover.

SELECT
  event_date,
  COUNT(DISTINCT
    CONCAT(
      user_pseudo_id,
      (SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
    )
  ) AS sessions
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
  event_date
ORDER BY
  event_date

How it works: GA4 doesn't have a unique session ID. The ga_session_id parameter is only unique within a single user. To create a globally unique session key, you must concatenate user_pseudo_id + ga_session_id. This pattern is used in every session-level query.

Expected output:

event_date   sessions
20260101     3,105
20260102     3,422
20260103     2,678
Watch out: ga_session_id alone is NOT unique across users. Two different users can have the same ga_session_id. Always concatenate with user_pseudo_id. Also, sessions spanning midnight will appear in two daily tables.
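One more subtlety, shown as a hedged sketch (the '-' separator and 'no_session' fallback are illustrative choices): CONCAT returns NULL when ga_session_id is missing on an event, which silently drops those events from COUNT(DISTINCT ...). An IFNULL guard keeps them countable, and the separator prevents collisions like user "12" + session "34" matching user "123" + session "4":

```sql
-- NULL-safe, collision-safe session key
CONCAT(
  user_pseudo_id,
  '-',
  IFNULL(
    (SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id'),
    'no_session'
  )
)
```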

Bounce Rate and Engagement Rate

Intermediate

Calculate GA4-style engagement rate and its inverse (bounce rate) from raw event data.

WITH sessions AS (
  SELECT
    CONCAT(user_pseudo_id,
      (SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
    ) AS session_key,
    MAX(
      (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged')
    ) AS session_engaged
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  GROUP BY
    session_key
)
SELECT
  COUNT(*) AS total_sessions,
  COUNTIF(session_engaged = '1') AS engaged_sessions,
  ROUND(COUNTIF(session_engaged = '1') / COUNT(*) * 100, 1) AS engagement_rate,
  ROUND((1 - COUNTIF(session_engaged = '1') / COUNT(*)) * 100, 1) AS bounce_rate
FROM
  sessions

How it works: GA4 considers a session "engaged" if it lasted >10 seconds, had 2+ page views, or triggered a conversion. The session_engaged parameter is '1' for engaged sessions. The CTE groups events by session, then the outer query calculates the percentages. Bounce rate in GA4 is simply 1 - engagement_rate.

Expected output:

total_sessions   engaged_sessions   engagement_rate   bounce_rate
45,230           28,520             63.1%             36.9%
Watch out: session_engaged is a STRING value '1', not an integer. Also, GA4's bounce rate is NOT the same as Universal Analytics' bounce rate — it's the inverse of engagement rate, which considers time, page views, and conversions.

Average Session Duration

Intermediate

Calculate average session duration in seconds using engagement time.

WITH sessions AS (
  SELECT
    CONCAT(user_pseudo_id,
      (SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
    ) AS session_key,
    SUM(
      (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')
    ) AS engagement_ms
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  GROUP BY
    session_key
)
SELECT
  ROUND(AVG(engagement_ms) / 1000, 1) AS avg_duration_sec,
  ROUND(AVG(engagement_ms) / 60000, 2) AS avg_duration_min
FROM
  sessions
WHERE
  engagement_ms > 0

How it works: GA4 tracks engagement_time_msec on each event, measuring active time on the page (tab in focus). Summing this per session gives total engagement time. We divide by 1000 for seconds or 60000 for minutes. The WHERE engagement_ms > 0 excludes sessions with no recorded engagement time.

Expected output:

avg_duration_sec   avg_duration_min
127.4              2.12
Watch out: the semantics of engagement_time_msec are subtle. On most events it is incremental — the active time since the previous event — which is why the query uses SUM(). Some implementations have been observed reporting accumulated totals instead, in which case SUM() overcounts and MAX() per session is the safer aggregate. Validate your numbers against the GA4 UI before trusting either.
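If validation points to accumulated values, only the aggregate in the CTE changes — a sketch of the swapped expression:

```sql
-- Variant: take the final accumulated value per session instead of summing deltas
MAX(
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')
) AS engagement_ms
```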

Landing Page Performance

Intermediate

Analyze landing pages by sessions, users, engagement rate, and conversions.

WITH session_starts AS (
  SELECT
    CONCAT(user_pseudo_id,
      (SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
    ) AS session_key,
    user_pseudo_id,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page,
    MAX((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged')) AS engaged
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
    AND event_name = 'session_start'
  GROUP BY
    session_key, user_pseudo_id, landing_page
)
SELECT
  landing_page,
  COUNT(*) AS sessions,
  COUNT(DISTINCT user_pseudo_id) AS users,
  ROUND(COUNTIF(engaged = '1') / COUNT(*) * 100, 1) AS engagement_rate
FROM
  session_starts
GROUP BY
  landing_page
ORDER BY
  sessions DESC
LIMIT 20

How it works: We filter for session_start events which fire at the beginning of every session. The page_location on this event is the landing page. The CTE captures the session key, user, landing page, and engagement flag, then the outer query aggregates by landing page.

Expected output:

landing_page                     sessions   users   engagement_rate
https://example.com/             5,201      4,870   58.3%
https://example.com/blog/guide   1,832      1,654   72.1%
https://example.com/pricing      1,104      985     81.5%
Watch out: Landing page URLs include query strings. For cleaner grouping, extract just the path: REGEXP_EXTRACT(page_location, r'^https?://[^/]+(/.*)$') and then SPLIT(path, '?')[OFFSET(0)].

Pages Per Session

Intermediate

Calculate the average number of page views per session.

WITH session_pages AS (
  SELECT
    CONCAT(user_pseudo_id,
      (SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
    ) AS session_key,
    COUNTIF(event_name = 'page_view') AS page_views
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  GROUP BY
    session_key
)
SELECT
  ROUND(AVG(page_views), 2) AS avg_pages_per_session,
  MAX(page_views) AS max_pages,
  APPROX_QUANTILES(page_views, 100)[OFFSET(50)] AS median_pages
FROM
  session_pages

How it works: The CTE counts page_view events per session using COUNTIF. The outer query calculates average, max, and median (via APPROX_QUANTILES). The median is often more useful than the average since a few very deep sessions can skew the mean.

Expected output:

avg_pages_per_session   max_pages   median_pages
2.84                    47          2
Watch out: APPROX_QUANTILES is an approximate function (faster and cheaper than exact). For exact median, use PERCENTILE_CONT(page_views, 0.5) OVER() but it's significantly more expensive on large datasets.
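PERCENTILE_CONT is an analytic function, so it emits a value on every row; a DISTINCT wrapper collapses it to one. A sketch against the same session_pages CTE:

```sql
-- Exact median pages per session (more expensive than APPROX_QUANTILES)
SELECT DISTINCT
  PERCENTILE_CONT(page_views, 0.5) OVER() AS exact_median_pages
FROM
  session_pages
```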

3

Traffic Sources

5 queries

Break down where your visitors come from, including organic search, paid campaigns, and referral traffic.

Source / Medium Report

Intermediate

Replicate the GA4 source/medium report with sessions, users, and key engagement metrics.

SELECT
  traffic_source.source AS source,
  traffic_source.medium AS medium,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNT(DISTINCT CONCAT(user_pseudo_id,
    (SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
  )) AS sessions,
  COUNTIF(event_name = 'page_view') AS page_views
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
  source, medium
ORDER BY
  sessions DESC
LIMIT 20

How it works: Uses the traffic_source RECORD — this captures the first-touch attribution (the source that originally acquired the user). For session-scoped attribution, use collected_traffic_source fields instead (see Attribution section). Results are sorted by session count descending.

Expected output:

source       medium    users    sessions   page_views
google       organic   12,480   18,230     34,560
(direct)     (none)    5,230    7,890      11,200
newsletter   email     1,450    1,680      3,210
Watch out: GA4 has three traffic source field sets: traffic_source (first-touch, user-level), collected_traffic_source (raw collected values, session-scoped), and session_traffic_source_last_click (session-level last click). Mixing them in one report produces inconsistent results.
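A quick way to see how the three field sets disagree — a diagnostic sketch selecting them side by side (field paths follow the current export schema; verify against your dataset, since older exports may lack the session-level record):

```sql
SELECT
  traffic_source.source AS first_touch_source,
  collected_traffic_source.manual_source AS collected_source,
  session_traffic_source_last_click.manual_campaign.source AS session_source,
  COUNT(*) AS events
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260107'
GROUP BY
  1, 2, 3
ORDER BY
  events DESC
LIMIT 20
```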

Channel Grouping

Intermediate

Build the default channel grouping logic in SQL for custom reporting and flexible attribution.

SELECT
  CASE
    WHEN traffic_source.source = '(direct)'
      AND (traffic_source.medium = '(none)' OR traffic_source.medium = '(not set)')
      THEN 'Direct'
    WHEN traffic_source.medium = 'organic'
      THEN 'Organic Search'
    WHEN REGEXP_CONTAINS(traffic_source.medium, r'^(cpc|ppc|paidsearch)$')
      THEN 'Paid Search'
    WHEN REGEXP_CONTAINS(traffic_source.medium, r'^(display|cpm|banner)$')
      THEN 'Display'
    WHEN REGEXP_CONTAINS(traffic_source.medium, r'^(social|social-network|social-media)$')
      THEN 'Social'
    WHEN traffic_source.medium = 'email'
      THEN 'Email'
    WHEN traffic_source.medium = 'referral'
      THEN 'Referral'
    ELSE 'Other'
  END AS channel_grouping,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNTIF(event_name = 'page_view') AS page_views
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
  channel_grouping
ORDER BY
  users DESC

How it works: The CASE statement maps source/medium combinations into channels following Google's default channel grouping logic. You can customize this to match your own channel definitions — for example, adding rules for affiliate traffic or branded vs. non-branded paid search.

Expected output:

channel_grouping   users    page_views
Organic Search     12,480   34,560
Direct             5,230    11,200
Paid Search        2,100    4,890
Watch out: Google's actual default channel grouping in GA4 uses many more rules (40+) including source-list matching. This simplified version covers the most common cases. For a complete match, refer to Google's documentation.

Campaign Performance

Intermediate

Analyze UTM campaign performance with sessions, engagement, and conversion metrics.

SELECT
  traffic_source.source,
  traffic_source.medium,
  traffic_source.name AS campaign,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNT(DISTINCT CONCAT(user_pseudo_id,
    (SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
  )) AS sessions,
  COUNTIF(event_name = 'purchase') AS conversions,
  SUM(ecommerce.purchase_revenue) AS revenue
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  AND traffic_source.name IS NOT NULL
  AND traffic_source.name != '(not set)'
GROUP BY
  source, medium, campaign
ORDER BY
  sessions DESC
LIMIT 25

How it works: Combines traffic_source.name (campaign name from UTM tags) with source/medium to give you a full campaign breakdown. The query filters out null and "(not set)" campaigns, then aggregates sessions, conversions, and revenue. This shows first-touch attribution; for session-scoped campaign data, use collected_traffic_source.manual_campaign_name.

Expected output:

source       medium   campaign        users   sessions   conversions   revenue
google       cpc      brand_q1_2026   3,200   4,580      142           28,400
facebook     social   spring_promo    1,800   2,100      56            8,960
newsletter   email    weekly_digest   980     1,120      34            5,440

Organic Landing Pages with Keywords

Intermediate

Find which pages attract organic search traffic and extract the search terms (when available).

SELECT
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page,
  traffic_source.source,
  collected_traffic_source.manual_term AS keyword,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNTIF(event_name = 'session_start') AS sessions,
  COUNTIF(event_name = 'purchase') AS conversions
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  AND traffic_source.medium = 'organic'
  AND event_name IN ('session_start', 'page_view', 'purchase')
GROUP BY
  landing_page, source, keyword
ORDER BY
  sessions DESC
LIMIT 25

How it works: Filters to organic traffic only, then groups by landing page and search keyword. The collected_traffic_source.manual_term field stores the utm_term parameter — for organic search, this is usually empty because Google encrypts search terms. For actual keyword data, combine with Google Search Console data in BigQuery.

Expected output:

landing_page                     source   keyword            users   sessions
https://example.com/blog/guide   google   (not set)          2,450   3,120
https://example.com/features     google   (not set)          890     1,080
https://example.com/pricing      bing     software pricing   230     285
Watch out: Google encrypts organic search queries, so the keyword field is almost always "(not set)" for organic traffic. To get real keyword data, link Google Search Console to BigQuery and join on URL.
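A sketch of the Search Console side of that join, assuming the GSC bulk export's searchdata_url_impression table (the searchconsole dataset name is an assumption — adjust to your export). Pair its url column with the landing page URL from the query above:

```sql
-- Real query data from the Search Console bulk export
-- Table and column names follow the GSC export schema; verify in your project
SELECT
  gsc.url,
  gsc.query AS keyword,
  SUM(gsc.impressions) AS impressions,
  SUM(gsc.clicks) AS clicks
FROM
  `your-project.searchconsole.searchdata_url_impression` AS gsc
WHERE
  gsc.data_date BETWEEN '2026-01-01' AND '2026-01-31'
GROUP BY
  gsc.url, keyword
ORDER BY
  clicks DESC
LIMIT 25
```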

New vs Returning Users

Intermediate

Segment traffic by new and returning users with engagement and conversion comparison.

SELECT
  CASE
    WHEN (SELECT value.int_value FROM UNNEST(event_params)
          WHERE key = 'ga_session_number') = 1
      THEN 'New'
    ELSE 'Returning'
  END AS user_type,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNT(DISTINCT CONCAT(user_pseudo_id,
    (SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
  )) AS sessions,
  COUNTIF(event_name = 'purchase') AS conversions,
  ROUND(SAFE_DIVIDE(
    COUNTIF(event_name = 'purchase'),
    COUNT(DISTINCT CONCAT(user_pseudo_id,
      (SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) WHERE key = 'ga_session_id')
    ))
  ) * 100, 2) AS conversion_rate
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
GROUP BY
  user_type
ORDER BY
  users DESC

How it works: The ga_session_number event parameter is a sequential counter — when it equals 1, it's the user's very first session. Everything else is returning. The query compares users, sessions, and conversion rate between segments. Returning users typically convert at 2-3x the rate of new users.

Expected output:

user_type   users    sessions   conversions   conversion_rate
New         18,400   18,400     92            0.50
Returning   4,600    8,900      234           2.63

4

Ecommerce

4 queries

Revenue analysis, product performance, and purchase funnel queries for sites with ecommerce tracking enabled.

Revenue by Source / Medium

Intermediate

Attribute purchase revenue to traffic sources to understand which channels drive the most value.

SELECT
  traffic_source.source,
  traffic_source.medium,
  COUNT(DISTINCT user_pseudo_id) AS purchasers,
  COUNTIF(event_name = 'purchase') AS transactions,
  ROUND(SUM(ecommerce.purchase_revenue), 2) AS revenue,
  ROUND(SAFE_DIVIDE(
    SUM(ecommerce.purchase_revenue),
    COUNTIF(event_name = 'purchase')
  ), 2) AS avg_order_value
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  AND event_name = 'purchase'
GROUP BY
  source, medium
ORDER BY
  revenue DESC
LIMIT 15

How it works: Filters to purchase events only, then groups by first-touch source/medium. ecommerce.purchase_revenue holds the total transaction value. SAFE_DIVIDE prevents division by zero errors. This uses user-level (first-touch) attribution — the source that originally acquired the customer gets full credit.

Expected output:

source     medium    purchasers   transactions   revenue     avg_order_value
google     organic   342          410            45,280.00   110.44
google     cpc       186          215            28,750.00   133.72
(direct)   (none)    128          156            18,920.00   121.28

Product Performance

Intermediate

Analyze individual product views, add-to-carts, and purchases from the items array.

SELECT
  items.item_name,
  items.item_category,
  COUNTIF(event_name = 'view_item') AS views,
  COUNTIF(event_name = 'add_to_cart') AS add_to_carts,
  COUNTIF(event_name = 'purchase') AS purchases,
  ROUND(SUM(IF(event_name = 'purchase', items.price * items.quantity, 0)), 2) AS revenue,
  ROUND(SAFE_DIVIDE(
    COUNTIF(event_name = 'purchase'),
    COUNTIF(event_name = 'view_item')
  ) * 100, 2) AS view_to_purchase_rate
FROM
  `your-project.analytics_XXXXXX.events_*`,
  UNNEST(items) AS items
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  AND event_name IN ('view_item', 'add_to_cart', 'purchase')
GROUP BY
  item_name, item_category
ORDER BY
  revenue DESC
LIMIT 20

How it works: UNNEST(items) explodes the nested items array so each product gets its own row. The query then counts views, add-to-carts, and purchases per product using COUNTIF. Revenue is calculated only for purchase events. The view-to-purchase rate reveals which products convert best from browsing.

Expected output:

item_name         item_category   views   add_to_carts   purchases   revenue     view_to_purchase_rate
Pro Plan Annual   subscription    4,200   680            210         25,200.00   5.00
Starter Kit       physical        8,500   1,200          340         16,660.00   4.00
Watch out: UNNEST(items) creates one row per item per event. If a purchase event has 3 items, it creates 3 rows. This means you cannot count distinct transactions using COUNT(*) after UNNEST — use COUNT(DISTINCT ecommerce.transaction_id) instead.
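A sketch of the safe transaction count after UNNEST, showing both aggregates side by side against the same wildcard table:

```sql
SELECT
  items.item_name,
  COUNT(*) AS item_rows,                                    -- inflated: one row per item per event
  COUNT(DISTINCT ecommerce.transaction_id) AS transactions  -- correct: deduped by transaction
FROM
  `your-project.analytics_XXXXXX.events_*`,
  UNNEST(items) AS items
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  AND event_name = 'purchase'
GROUP BY
  item_name
```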

Purchase Funnel

Intermediate

Build a complete ecommerce funnel from page view to purchase with drop-off rates at each step.

WITH funnel AS (
  SELECT
    COUNT(DISTINCT IF(event_name = 'session_start', user_pseudo_id, NULL)) AS step_1_sessions,
    COUNT(DISTINCT IF(event_name = 'view_item', user_pseudo_id, NULL)) AS step_2_view_item,
    COUNT(DISTINCT IF(event_name = 'add_to_cart', user_pseudo_id, NULL)) AS step_3_add_to_cart,
    COUNT(DISTINCT IF(event_name = 'begin_checkout', user_pseudo_id, NULL)) AS step_4_checkout,
    COUNT(DISTINCT IF(event_name = 'purchase', user_pseudo_id, NULL)) AS step_5_purchase
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
)
SELECT
  step_1_sessions,
  step_2_view_item,
  ROUND(SAFE_DIVIDE(step_2_view_item, step_1_sessions) * 100, 1) AS view_rate,
  step_3_add_to_cart,
  ROUND(SAFE_DIVIDE(step_3_add_to_cart, step_2_view_item) * 100, 1) AS atc_rate,
  step_4_checkout,
  ROUND(SAFE_DIVIDE(step_4_checkout, step_3_add_to_cart) * 100, 1) AS checkout_rate,
  step_5_purchase,
  ROUND(SAFE_DIVIDE(step_5_purchase, step_4_checkout) * 100, 1) AS purchase_rate,
  ROUND(SAFE_DIVIDE(step_5_purchase, step_1_sessions) * 100, 2) AS overall_conversion
FROM
  funnel

How it works: Counts distinct users at each funnel step (note that step_1_sessions, despite its name, counts users who started a session, not sessions). This is an unordered funnel — it counts whether a user triggered each event at all, not whether the events happened in sequence. For a sequential funnel (user must complete step N before step N+1), see the Funnels & Retention section. The drop-off rates between steps reveal your biggest optimization opportunities.

Expected output:

step_1_sessions   step_2_view_item   view_rate   step_3_add_to_cart   atc_rate   step_5_purchase   overall_conversion
24,500            8,200              33.5%       2,100                25.6%      580               2.37%

Average Order Value by Device

Intermediate

Compare purchase behavior across device categories — desktop, mobile, and tablet.

SELECT
  device.category AS device_type,
  COUNT(DISTINCT ecommerce.transaction_id) AS transactions,
  COUNT(DISTINCT user_pseudo_id) AS purchasers,
  ROUND(SUM(ecommerce.purchase_revenue), 2) AS total_revenue,
  ROUND(AVG(ecommerce.purchase_revenue), 2) AS avg_order_value,
  ROUND(SAFE_DIVIDE(
    SUM(ecommerce.purchase_revenue),
    COUNT(DISTINCT user_pseudo_id)
  ), 2) AS revenue_per_user
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  AND event_name = 'purchase'
GROUP BY
  device_type
ORDER BY
  total_revenue DESC

How it works: Groups purchase events by device.category to compare AOV and revenue per user across devices. Desktop users typically have higher AOV, while mobile may have higher transaction volume. Revenue per user helps identify which platform drives the most value per customer.

Expected output:

device_type   transactions   purchasers   total_revenue   avg_order_value   revenue_per_user
desktop       380            310          52,460.00       138.05            169.23
mobile        245            220          24,500.00       100.00            111.36
tablet        42             38           5,040.00        120.00            132.63

5

Attribution

4 queries

Build your own attribution models in SQL — from simple first/last click to multi-touch linear attribution.

First-Click Attribution

Advanced

Credit 100% of each conversion to the first touchpoint that acquired the user.

WITH conversions AS (
  SELECT
    user_pseudo_id,
    ecommerce.transaction_id,
    ecommerce.purchase_revenue AS revenue,
    traffic_source.source AS first_source,
    traffic_source.medium AS first_medium
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
    AND event_name = 'purchase'
)
SELECT
  first_source,
  first_medium,
  COUNT(DISTINCT transaction_id) AS conversions,
  ROUND(SUM(revenue), 2) AS attributed_revenue
FROM
  conversions
GROUP BY
  first_source, first_medium
ORDER BY
  attributed_revenue DESC

How it works: This is the simplest attribution model. traffic_source.source/medium always refers to the first touchpoint that brought the user. By joining it with purchase events, you attribute all revenue to the channel that originally acquired the user. This model favors awareness channels like organic search and social.

Expected output:

first_source   first_medium   conversions   attributed_revenue
google         organic        410           45,280.00
google         cpc            186           22,320.00
facebook       social         95            11,400.00

Last Non-Direct Click Attribution

Advanced

Credit the conversion to the last marketing touchpoint before purchase, ignoring direct visits.

WITH sessions AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS session_id,
    collected_traffic_source.manual_source AS source,
    collected_traffic_source.manual_medium AS medium,
    event_name,
    ecommerce.purchase_revenue AS revenue,
    ecommerce.transaction_id
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
),
last_touch AS (
  SELECT
    s.user_pseudo_id,
    s.transaction_id,
    s.revenue,
    t.source AS last_nd_source,
    t.medium AS last_nd_medium
  FROM
    sessions s
  JOIN (
    SELECT
      user_pseudo_id,
      source,
      medium,
      event_timestamp,
      ROW_NUMBER() OVER(
        PARTITION BY user_pseudo_id
        ORDER BY event_timestamp DESC
      ) AS rn
    FROM
      sessions
    WHERE
      source IS NOT NULL
      AND source != '(direct)'
      AND event_name = 'session_start'
  ) t
    ON s.user_pseudo_id = t.user_pseudo_id AND t.rn = 1
  WHERE
    s.event_name = 'purchase'
)
SELECT
  last_nd_source,
  last_nd_medium,
  COUNT(DISTINCT transaction_id) AS conversions,
  ROUND(SUM(revenue), 2) AS attributed_revenue
FROM
  last_touch
GROUP BY
  last_nd_source, last_nd_medium
ORDER BY
  attributed_revenue DESC

How it works: First, it collects all session-level touchpoints using collected_traffic_source (session-scoped, unlike the user-scoped traffic_source). Then ROW_NUMBER() picks the user's most recent non-direct session in the window. Note this is a simplification: for users with multiple purchases, a strict model would also require each touchpoint's timestamp to precede the purchase it is credited for. This model is closest to what GA4 uses by default and favors closing channels like branded search and retargeting.

Expected output:

last_nd_source   last_nd_medium   conversions   attributed_revenue
google           cpc              245           31,850.00
google           organic          320           28,480.00
newsletter       email            78            9,360.00
Watch out: collected_traffic_source was added to the export in mid-2023; earlier data will have NULL values. Also, this field set carries manual_source/manual_medium (from UTM parameters) — for auto-tagged Google Ads traffic, check the gclid field instead.
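A sketch of folding auto-tagged Google Ads traffic back in (gclid sits on collected_traffic_source in the current export schema; verify against your dataset):

```sql
-- Treat sessions carrying a gclid as google / cpc even when UTMs are absent
SELECT
  IF(collected_traffic_source.gclid IS NOT NULL,
     'google',
     collected_traffic_source.manual_source) AS source,
  IF(collected_traffic_source.gclid IS NOT NULL,
     'cpc',
     collected_traffic_source.manual_medium) AS medium,
  COUNT(*) AS session_starts
FROM
  `your-project.analytics_XXXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  AND event_name = 'session_start'
GROUP BY
  source, medium
ORDER BY
  session_starts DESC
```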

Linear Multi-Touch Attribution

Advanced

Distribute conversion credit equally across all touchpoints in the user's journey.

WITH touchpoints AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    collected_traffic_source.manual_source AS source,
    collected_traffic_source.manual_medium AS medium
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
    AND event_name = 'session_start'
    AND collected_traffic_source.manual_source IS NOT NULL
),
conversions AS (
  SELECT
    user_pseudo_id,
    ecommerce.transaction_id,
    ecommerce.purchase_revenue AS revenue
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
    AND event_name = 'purchase'
),
user_paths AS (
  SELECT
    c.transaction_id,
    c.revenue,
    t.source,
    t.medium,
    COUNT(*) OVER(PARTITION BY c.transaction_id) AS touchpoint_count
  FROM
    conversions c
  JOIN
    touchpoints t ON c.user_pseudo_id = t.user_pseudo_id
)
SELECT
  source,
  medium,
  ROUND(SUM(1.0 / touchpoint_count), 1) AS attributed_conversions,
  ROUND(SUM(revenue / touchpoint_count), 2) AS attributed_revenue
FROM
  user_paths
GROUP BY
  source, medium
ORDER BY
  attributed_revenue DESC

How it works: For each conversion, the query finds all touchpoints (sessions) the user had in the window. It then divides the revenue equally: if a user had 4 touchpoints, each gets 25% credit. The touchpoint_count window function counts sessions per transaction. Note that the join credits every session in the window, including any after the purchase; add a timestamp condition for strict pre-conversion paths. This model gives a balanced view by crediting all channels in the customer journey, not just the first or last.

Expected output:

source       medium    attributed_conversions   attributed_revenue
google       organic   285.3                    31,420.50
google       cpc       198.7                    24,105.25
newsletter   email     112.5                    13,500.00
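A sketch of restricting credit to pre-conversion touchpoints: only the join condition in user_paths changes, and the conversions CTE would additionally need to select event_timestamp AS conversion_ts (a name introduced here for illustration):

```sql
-- Only credit sessions that happened at or before the purchase
user_paths AS (
  SELECT
    c.transaction_id,
    c.revenue,
    t.source,
    t.medium,
    COUNT(*) OVER(PARTITION BY c.transaction_id) AS touchpoint_count
  FROM
    conversions c
  JOIN
    touchpoints t
    ON c.user_pseudo_id = t.user_pseudo_id
    AND t.event_timestamp <= c.conversion_ts
)
```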

Data-Driven Attribution Preparation

Advanced

Prepare conversion path data for Shapley value or Markov chain attribution modeling.

WITH sessions_with_source AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    CONCAT(
      IFNULL(collected_traffic_source.manual_source, '(direct)'),
      ' / ',
      IFNULL(collected_traffic_source.manual_medium, '(none)')
    ) AS channel,
    event_name,
    ecommerce.transaction_id,
    ecommerce.purchase_revenue AS revenue
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
    AND event_name IN ('session_start', 'purchase')
),
paths AS (
  SELECT
    user_pseudo_id,
STRING_AGG(
      IF(event_name = 'session_start', channel, NULL),
      ', '
      ORDER BY event_timestamp
    ) AS conversion_path,
    MAX(IF(event_name = 'purchase', 1, 0)) AS converted,
    SUM(IF(event_name = 'purchase', revenue, 0)) AS revenue
  FROM
    sessions_with_source
  GROUP BY
    user_pseudo_id
)
SELECT
  conversion_path,
  converted,
  COUNT(*) AS path_count,
  ROUND(SUM(revenue), 2) AS total_revenue
FROM
  paths
GROUP BY
  conversion_path, converted
ORDER BY
  path_count DESC
LIMIT 50

How it works: This query builds conversion paths — the sequence of channels each user interacted with, separated by commas. STRING_AGG concatenates channel touchpoints in chronological order. The output shows both converting and non-converting paths, which is the input needed for data-driven models like Markov chains (transition probabilities between channels) or Shapley values (each channel's marginal contribution). Export this data to Python or R for the actual modeling.

Expected output:

| conversion_path | converted | path_count | total_revenue |
|---|---|---|---|
| google / organic | 0 | 8,450 | 0.00 |
| google / organic, google / cpc | 1 | 142 | 18,460.00 |
| google / organic, (direct) / (none), newsletter / email | 1 | 86 | 10,320.00 |
Watch out: Data-driven attribution in GA4's UI uses Google's proprietary algorithm that you can't replicate in SQL. This query prepares data for your own data-driven model. The accuracy depends heavily on having enough conversion volume — aim for 300+ conversions per month minimum.
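Since the modeling happens outside SQL, here is a hedged Python sketch of the first step of a Markov-chain model: turning exported paths into a transition-probability table. Data and state names (START, CONV, NULL) are illustrative:

```python
from collections import defaultdict

def transition_probs(paths):
    """Count channel-to-channel transitions across conversion paths.

    paths: list of (path, converted, count) where path is the comma-separated
    string produced by the SQL above. Synthetic START and CONV/NULL states are
    appended so every path begins and ends in an absorbing state.
    Returns {state: {next_state: probability}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for path, converted, n in paths:
        steps = ["START"] + [s.strip() for s in path.split(",")]
        steps.append("CONV" if converted else "NULL")
        for a, b in zip(steps, steps[1:]):
            counts[a][b] += n
    return {
        a: {b: c / sum(nexts.values()) for b, c in nexts.items()}
        for a, nexts in counts.items()
    }

probs = transition_probs([
    ("google / organic", 0, 80),                # 80 non-converting paths
    ("google / organic, google / cpc", 1, 20),  # 20 converting paths
])
# From google / organic: 80% drop to NULL, 20% move on to google / cpc
```

From this matrix, removal effects (dropping one channel and recomputing the conversion probability) give each channel's Markov attribution weight; that step is left to a dedicated library.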

6 Funnels & Retention

Advanced queries for sequential funnel analysis, user retention cohorts, and engagement frequency.

Sequential Funnel (Ordered Steps)

Advanced

Build a strict sequential funnel where users must complete each step in order before moving to the next.

WITH user_events AS (
  SELECT
    user_pseudo_id,
    event_name,
    event_timestamp,
    MIN(IF(event_name = 'session_start', event_timestamp, NULL))
      OVER(PARTITION BY user_pseudo_id) AS first_session,
    MIN(IF(event_name = 'view_item', event_timestamp, NULL))
      OVER(PARTITION BY user_pseudo_id) AS first_view,
    MIN(IF(event_name = 'add_to_cart', event_timestamp, NULL))
      OVER(PARTITION BY user_pseudo_id) AS first_atc,
    MIN(IF(event_name = 'begin_checkout', event_timestamp, NULL))
      OVER(PARTITION BY user_pseudo_id) AS first_checkout,
    MIN(IF(event_name = 'purchase', event_timestamp, NULL))
      OVER(PARTITION BY user_pseudo_id) AS first_purchase
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
)
SELECT
  COUNT(DISTINCT user_pseudo_id) AS step_1_session,
  COUNT(DISTINCT IF(first_view > first_session, user_pseudo_id, NULL)) AS step_2_view,
  COUNT(DISTINCT IF(first_atc > first_view, user_pseudo_id, NULL)) AS step_3_atc,
  COUNT(DISTINCT IF(first_checkout > first_atc, user_pseudo_id, NULL)) AS step_4_checkout,
  COUNT(DISTINCT IF(first_purchase > first_checkout, user_pseudo_id, NULL)) AS step_5_purchase
FROM
  user_events
WHERE
  event_name = 'session_start'

How it works: Window functions compute the first timestamp each user triggered each event. Then the outer query enforces order: a user only counts for step 3 if their first add_to_cart happened after their first view_item. This is stricter than the unordered funnel in the Ecommerce section and gives more accurate drop-off analysis.

Expected output:

| step_1_session | step_2_view | step_3_atc | step_4_checkout | step_5_purchase |
|---|---|---|---|---|
| 24,500 | 7,800 | 1,950 | 1,120 | 520 |
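The ordering rule can be mirrored outside SQL to validate results: a user advances to a step only if that step's first timestamp comes after the previous step's. A minimal Python sketch with made-up timestamps:

```python
FUNNEL = ["session_start", "view_item", "add_to_cart", "begin_checkout", "purchase"]

def funnel_counts(users):
    """users: {user_id: {event_name: first_timestamp}}.

    Counts users at each step, requiring strictly increasing first-timestamps,
    mirroring the first_view > first_session comparisons in the SQL above.
    """
    counts = [0] * len(FUNNEL)
    for firsts in users.values():
        prev_ts = None
        for i, step in enumerate(FUNNEL):
            ts = firsts.get(step)
            if ts is None or (prev_ts is not None and ts <= prev_ts):
                break                      # step missing or out of order
            counts[i] += 1
            prev_ts = ts
    return counts

counts = funnel_counts({
    "u1": {"session_start": 1, "view_item": 2, "add_to_cart": 3},
    "u2": {"session_start": 1, "add_to_cart": 2},   # skipped view_item
    "u3": {"session_start": 5, "view_item": 2},     # viewed before session start
})
# -> [3, 1, 1, 0, 0]: only u1 clears steps 2 and 3
```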

User Retention Cohort

Advanced

Build a weekly retention cohort table showing what percentage of users return in subsequent weeks.

WITH user_activity AS (
  SELECT
    user_pseudo_id,
    DATE_TRUNC(
      PARSE_DATE('%Y%m%d', event_date), WEEK
    ) AS activity_week
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
  GROUP BY
    user_pseudo_id, activity_week
),
cohorts AS (
  SELECT
    user_pseudo_id,
    MIN(activity_week) AS cohort_week
  FROM
    user_activity
  GROUP BY
    user_pseudo_id
)
SELECT
  c.cohort_week,
  DATE_DIFF(a.activity_week, c.cohort_week, WEEK) AS weeks_since_first,
  COUNT(DISTINCT a.user_pseudo_id) AS active_users,
  ROUND(SAFE_DIVIDE(
    COUNT(DISTINCT a.user_pseudo_id),
    MAX(COUNT(DISTINCT a.user_pseudo_id)) OVER(PARTITION BY c.cohort_week)
  ) * 100, 1) AS retention_pct
FROM
  cohorts c
JOIN
  user_activity a ON c.user_pseudo_id = a.user_pseudo_id
GROUP BY
  c.cohort_week, weeks_since_first
ORDER BY
  c.cohort_week, weeks_since_first

How it works: The cohorts CTE assigns each user to their first-activity week. Then it joins back to all activity to count how many users from each cohort return in subsequent weeks. DATE_DIFF(..., WEEK) calculates the gap. The retention percentage shows what fraction of the original cohort is still active. Export to a spreadsheet for the classic cohort heatmap visualization.

Expected output:

| cohort_week | weeks_since_first | active_users | retention_pct |
|---|---|---|---|
| 2026-01-05 | 0 | 3,200 | 100.0 |
| 2026-01-05 | 1 | 960 | 30.0 |
| 2026-01-05 | 2 | 640 | 20.0 |
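The cohort arithmetic is easy to verify on a toy dataset. A Python sketch of the same logic (week indexes instead of dates, illustrative data):

```python
from collections import defaultdict

def weekly_retention(activity):
    """activity: set of (user, week_index) pairs.

    Assigns each user to the cohort of their first active week, then computes
    retention as a percentage of the cohort's week-0 size.
    Returns {(cohort_week, weeks_since_first): retention_pct}.
    """
    first_week = {}
    for user, week in sorted(activity, key=lambda t: t[1]):
        first_week.setdefault(user, week)        # earliest week wins
    active = defaultdict(set)
    for user, week in activity:
        cohort = first_week[user]
        active[(cohort, week - cohort)].add(user)
    return {
        key: round(100 * len(users) / len(active[(key[0], 0)]), 1)
        for key, users in active.items()
    }

pct = weekly_retention({
    ("a", 0), ("b", 0), ("a", 1),   # cohort week 0: 2 users, 1 returns
    ("c", 1),                        # cohort week 1: 1 user
})
# Week-0 cohort retains 50% in week 1
```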

DAU / WAU / MAU Ratios

Advanced

Calculate daily, weekly, and monthly active users and the stickiness ratios (DAU/MAU, WAU/MAU).

WITH daily AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS dt,
    COUNT(DISTINCT user_pseudo_id) AS dau
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
  GROUP BY dt
),
monthly AS (
  SELECT
    COUNT(DISTINCT user_pseudo_id) AS mau
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
),
weekly AS (
  SELECT
    ROUND(AVG(wau), 0) AS avg_wau
  FROM (
    SELECT
      DATE_TRUNC(PARSE_DATE('%Y%m%d', event_date), WEEK) AS wk,
      COUNT(DISTINCT user_pseudo_id) AS wau
    FROM
      `your-project.analytics_XXXXXX.events_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
    GROUP BY wk
  )
)
SELECT
  ROUND(AVG(d.dau), 0) AS avg_dau,
  w.avg_wau,
  m.mau,
  ROUND(SAFE_DIVIDE(AVG(d.dau), m.mau) * 100, 1) AS dau_mau_pct,
  ROUND(SAFE_DIVIDE(w.avg_wau, m.mau) * 100, 1) AS wau_mau_pct
FROM
  daily d
CROSS JOIN
  monthly m
CROSS JOIN
  weekly w
GROUP BY
  m.mau, w.avg_wau

How it works: Three CTEs compute daily, weekly, and monthly active users separately. CROSS JOIN combines them for the final calculation. The DAU/MAU ratio (stickiness) shows what fraction of monthly users visit on any given day — a ratio above 20% is considered strong for most products. WAU/MAU above 40% indicates healthy weekly engagement.

Expected output:

| avg_dau | avg_wau | mau | dau_mau_pct | wau_mau_pct |
|---|---|---|---|---|
| 2,450 | 8,200 | 24,500 | 10.0 | 33.5 |
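The ratios reduce to simple set arithmetic, which a few lines of Python make explicit (toy data, three days only):

```python
def stickiness(daily_active):
    """daily_active: list of sets of user ids, one set per day in the month.

    Returns (avg_dau, mau, dau_mau_pct). MAU is the union of all daily sets,
    so one user visiting every day counts once.
    """
    avg_dau = sum(len(day) for day in daily_active) / len(daily_active)
    mau = len(set().union(*daily_active))
    return avg_dau, mau, round(100 * avg_dau / mau, 1)

avg_dau, mau, pct = stickiness([
    {"a", "b"}, {"a", "c"}, {"a"},   # user "a" visits every day
])
# avg_dau = 5/3, mau = 3, stickiness ~55.6%
```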

Time to Conversion

Advanced

Measure how many days it takes from a user's first visit to their first purchase.

WITH first_visit AS (
  SELECT
    user_pseudo_id,
    MIN(PARSE_DATE('%Y%m%d', event_date)) AS first_date
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
  GROUP BY
    user_pseudo_id
),
first_purchase AS (
  SELECT
    user_pseudo_id,
    MIN(PARSE_DATE('%Y%m%d', event_date)) AS purchase_date
  FROM
    `your-project.analytics_XXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260101' AND '20260331'
    AND event_name = 'purchase'
  GROUP BY
    user_pseudo_id
)
SELECT
  DATE_DIFF(p.purchase_date, v.first_date, DAY) AS days_to_convert,
  COUNT(*) AS users
FROM
  first_purchase p
JOIN
  first_visit v ON p.user_pseudo_id = v.user_pseudo_id
GROUP BY
  days_to_convert
ORDER BY
  days_to_convert
LIMIT 30

How it works: Two CTEs find each user's first-ever visit date and first purchase date. DATE_DIFF calculates the gap in days. The distribution shows how quickly users convert — most conversions typically happen on day 0 (same-session), with a long tail. This helps set realistic attribution windows and retargeting schedules.

Expected output:

| days_to_convert | users |
|---|---|
| 0 | 245 |
| 1 | 68 |
| 2 | 42 |
| 7 | 31 |
| 14 | 18 |
Watch out: user_pseudo_id is device-scoped and resets when cookies are cleared. Cross-device journeys will show as separate users with day-0 conversions that are actually multi-day journeys. For more accurate tracking, use user_id (requires authentication) if available.
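The day-gap calculation can be mirrored in Python as a quick check against exported data. A sketch with hypothetical dates:

```python
from collections import Counter
from datetime import date

def days_to_convert(first_visit, first_purchase):
    """Both args: {user: date}. Returns a Counter of day gaps for users
    who appear in both maps, i.e. visitors who eventually purchased."""
    return Counter(
        (first_purchase[u] - first_visit[u]).days
        for u in first_purchase
        if u in first_visit
    )

dist = days_to_convert(
    {"a": date(2026, 1, 1), "b": date(2026, 1, 2), "c": date(2026, 1, 3)},
    {"a": date(2026, 1, 1), "b": date(2026, 1, 9)},   # "c" never purchased
)
# "a" converts same day (gap 0), "b" after 7 days
```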

7 Optimization

Estimate query costs and efficiently query daily and intraday tables to keep your BigQuery bill under control.

Query Cost Estimation

Beginner

Estimate the size and cost of your GA4 tables before running expensive queries.

-- Check table sizes (run this in BigQuery console)
SELECT
  table_id,
  ROUND(size_bytes / (1024 * 1024 * 1024), 2) AS size_gb,
  row_count,
  ROUND(size_bytes / POW(1024, 4) * 6.25, 4) AS estimated_cost_usd
FROM
  `your-project.analytics_XXXXXX`.__TABLES__
WHERE
  table_id LIKE 'events_%'
ORDER BY
  table_id DESC
LIMIT 31

How it works: The __TABLES__ metadata view exposes table sizes without scanning data. The cost formula converts bytes to terabytes (dividing by 1024⁴) and multiplies by BigQuery's on-demand price of $6.25 per TB scanned. A typical GA4 daily table for a medium-traffic site is 0.5-2 GB, so one full-scan costs well under a cent. Still check sizes before querying a wide date range: a year of a high-traffic export can run to hundreds of gigabytes per query, and repeated runs add up quickly.

Expected output:

| table_id | size_gb | row_count | estimated_cost_usd |
|---|---|---|---|
| events_20260131 | 1.24 | 2,450,000 | 0.0076 |
| events_20260130 | 1.18 | 2,320,000 | 0.0072 |
| events_20260129 | 0.95 | 1,880,000 | 0.0058 |
Watch out: BigQuery's free tier includes 1 TB of querying per month. For cost savings: (1) always use _TABLE_SUFFIX to limit date ranges, (2) select only the columns you need, (3) consider creating materialized views for frequently-run reports.
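Converting bytes to terabytes before multiplying by the per-TB price is the step that is easy to get wrong. A quick Python check of the arithmetic (pricing assumption: $6.25 per TB scanned, on-demand):

```python
PRICE_PER_TB_USD = 6.25  # on-demand pricing assumption, subject to change

def full_scan_cost(size_bytes):
    """Estimated cost of scanning a table in full at on-demand pricing.

    Bytes are converted to TB (1024**4 bytes) before applying the per-TB rate;
    multiplying gigabytes directly by the rate overstates cost by ~1000x.
    """
    return size_bytes / (1024 ** 4) * PRICE_PER_TB_USD

one_gb = 1024 ** 3
cost = full_scan_cost(1.24 * one_gb)   # a typical 1.24 GB daily table
# -> fractions of a cent per full scan
```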

Query Daily + Intraday Tables

Beginner

Combine yesterday's finalized data with today's streaming intraday table for near real-time reporting.

-- Combine finalized daily tables with today's streaming data
SELECT
  event_date,
  event_name,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM (
  -- Finalized daily tables (yesterday and before)
  SELECT * FROM `your-project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

  UNION ALL

  -- Today's intraday (streaming, not yet finalized)
  SELECT * FROM `your-project.analytics_XXXXXX.events_intraday_*`
  WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
)
GROUP BY
  event_date, event_name
ORDER BY
  event_date DESC, event_count DESC

How it works: GA4 exports data to two table types: events_YYYYMMDD (finalized daily, usually available by next morning) and events_intraday_YYYYMMDD (streaming, updated throughout the day). UNION ALL combines both for near real-time reporting. The FORMAT_DATE + CURRENT_DATE() pattern makes the query automatically use the right dates.

Expected output:

event_dateevent_nameevent_countusers
20260302page_view4,2801,850
20260302session_start1,9201,850
20260301page_view12,4505,230
Watch out: Intraday tables are incomplete — they only contain data received so far today. Numbers will grow throughout the day. Don't compare today's intraday data to yesterday's finalized totals and conclude traffic dropped. Also, intraday tables are deleted once the daily table is finalized.

Common Pitfalls & Troubleshooting

Nine frequently-encountered issues when querying GA4 data in BigQuery, and how to fix them.

1. Numbers don't match GA4 UI
GA4's interface applies sampling, thresholding, and data-driven attribution. BigQuery contains raw, unsampled data. Expect 5-15% differences — they're both "correct" using different methodologies.
2. Session ID is not unique
ga_session_id is only unique per user. Always combine it with user_pseudo_id: CONCAT(user_pseudo_id, CAST(ga_session_id AS STRING)) for a truly unique session key.
3. Consent mode causes NULLs
If consent mode is enabled and a user declines analytics cookies, many fields will be NULL (including user_pseudo_id in some cases). Filter these out or account for them in your analysis.
4. Sessions split at midnight
GA4 ends a session at midnight and starts a new one. A user active from 11:55 PM to 12:05 AM creates two sessions. This inflates session counts and can distort session-based metrics.
5. URL truncation in page_location
The page_location parameter can be truncated at ~1,000 characters. Long URLs with many query parameters may be cut off, breaking GROUP BY aggregations and URL parsing.
6. event_params has 4 value types
Each parameter stores its value in one of: string_value, int_value, float_value, double_value. Using the wrong type returns NULL. Check the schema or use COALESCE(value.string_value, CAST(value.int_value AS STRING)).
7. Intraday data is incomplete
events_intraday_* tables contain only data received so far — they're not final until converted to daily tables. Never compare intraday numbers to complete daily totals.
8. _TABLE_SUFFIX is a string
_TABLE_SUFFIX is always a string like '20260131', not a date. Use BETWEEN '20260101' AND '20260131' (string comparison), not date comparisons. Comparing it against unquoted numbers is a type mismatch, and filter expressions BigQuery can't resolve to constants disable table pruning, forcing a full scan.
9. UNNEST is required for nested fields
event_params, user_properties, and items are REPEATED RECORD fields. You must use UNNEST() to access their contents. Forgetting UNNEST returns errors or incorrect results.
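Pitfall 8 bites hardest when query strings are generated programmatically. A small Python sketch that builds the quoted string suffixes for a date range (helper name is illustrative):

```python
from datetime import date, timedelta

def table_suffixes(start, end):
    """Yield _TABLE_SUFFIX values ('YYYYMMDD' strings) for an inclusive
    date range. The suffix is always a string, never a DATE."""
    d = start
    while d <= end:
        yield d.strftime("%Y%m%d")
        d += timedelta(days=1)

suffixes = list(table_suffixes(date(2026, 1, 30), date(2026, 2, 1)))
# -> ['20260130', '20260131', '20260201'], ready to quote into BETWEEN bounds
```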

How to Query GA4 Data in BigQuery: A Complete Guide

Google Analytics 4 stores event data in a nested, denormalized schema that's fundamentally different from Universal Analytics. When you link GA4 to BigQuery, each user interaction becomes a row in a daily export table — and the real analytical power begins.

Unlike the sampled, aggregated reports in GA4's interface, BigQuery gives you access to raw, unsampled, hit-level data. This means you can build custom attribution models, create user-level segments, and run analyses that would be impossible in the standard GA4 UI. The tradeoff is that you need SQL skills and an understanding of GA4's nested data structure.

Understanding the GA4 BigQuery Schema

Every GA4 BigQuery table follows the same structure: one row per event, with nested RECORD fields for parameters, user properties, and ecommerce items. The three nested fields you'll use in nearly every query are event_params, user_properties, and items.

To access data inside these nested fields, you must use BigQuery's UNNEST() function. This is the single biggest syntax difference from standard SQL and trips up most beginners. Our GA4 complete guide covers the conceptual model in more depth.

Session Reconstruction in BigQuery

GA4 doesn't export a "sessions" table — you have to reconstruct sessions from events. The standard approach is to combine user_pseudo_id with the ga_session_id event parameter:

CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))

This creates a unique session key. From there, you can calculate session duration using engagement_time_msec, count page views per session, or classify sessions as bounced vs. engaged using the session_engaged parameter.
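A sketch of the same reconstruction in Python, useful for validating exported rows; the flat dict row format here is a simplifying assumption, not the raw export schema:

```python
from collections import defaultdict

def reconstruct_sessions(events):
    """events: list of dicts with user_pseudo_id, ga_session_id, and optional
    engagement_time_msec. Groups events into sessions and sums engagement."""
    sessions = defaultdict(lambda: {"events": 0, "engagement_msec": 0})
    for e in events:
        # ga_session_id alone is NOT unique across users: prefix the user id
        key = f'{e["user_pseudo_id"]}-{e["ga_session_id"]}'
        sessions[key]["events"] += 1
        sessions[key]["engagement_msec"] += e.get("engagement_time_msec", 0)
    return dict(sessions)

s = reconstruct_sessions([
    {"user_pseudo_id": "u1", "ga_session_id": 111, "engagement_time_msec": 4000},
    {"user_pseudo_id": "u1", "ga_session_id": 111, "engagement_time_msec": 2500},
    {"user_pseudo_id": "u2", "ga_session_id": 111},   # same id, different user
])
# -> two sessions, not one: the user prefix keeps them apart
```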

Attribution: Three Layers of Traffic Source Data

One of the most confusing aspects of GA4 BigQuery is that traffic source information lives in three different places:

  1. traffic_source — User-scoped, first-touch attribution: the source that originally acquired the user. It never changes for a given user.
  2. collected_traffic_source — Event-scoped, the raw values exactly as collected (manual_source, manual_medium, manual_campaign_name fields), with no attribution modeling applied.
  3. session_traffic_source_last_click — Session-scoped, last-click attribution, added to the export in November 2023. This is the field that most closely matches the session source / medium dimensions in the GA4 UI.

Using the wrong field is the most common cause of "my numbers don't match GA4" complaints. For reports that should match the GA4 UI, use session_traffic_source_last_click (session-scoped). For customer acquisition analysis, use traffic_source (first-touch).

Cost Optimization Strategies

BigQuery charges per bytes scanned. A single GA4 daily table for a medium-traffic site (50K sessions/day) can be 1-3 GB. Querying a full year of data at that volume would scan 500+ GB, costing ~$3 at on-demand pricing. To keep costs down: always limit date ranges with _TABLE_SUFFIX filters, select only the columns you need instead of SELECT *, and materialize frequently-run reports into summary tables or materialized views.

Common Mistakes to Avoid

After working with GA4 BigQuery data extensively, we've identified the patterns that cause the most frustration. The ga_session_id is not globally unique — it's only unique within a single user_pseudo_id. Using session ID alone for counting sessions will massively undercount because different users can share the same session ID value.

The event_params value types also trip people up. Each parameter stores its value in exactly one of four sub-fields: string_value, int_value, float_value, or double_value. If you query string_value for a parameter that uses int_value, you get NULL — no error, just silent NULL. Check the schema first, or use COALESCE() across all value types.

If you're seeing unexpected (not set) values in your traffic source data, our GA4 (not set) fix guide covers the root causes and solutions.

When to Use BigQuery vs. GA4 Explorations

Not every analysis needs BigQuery. GA4's Exploration reports handle standard segmentation, funnel visualization, and path analysis well, and they're free. BigQuery becomes essential when you need raw, unsampled event-level data, custom attribution modeling, user-level segmentation, or analyses the standard GA4 UI can't express.

The 28 queries in this starter pack cover the most requested analytics scenarios. Copy them, replace the project ID and date range, and adapt them to your specific event taxonomy. For a broader understanding of GA4's data model, tracking setup, and reporting capabilities, see our complete GA4 guide.

Frequently Asked Questions

Is BigQuery free for GA4 data?

BigQuery offers a free tier that includes 1 TB of query processing and 10 GB of storage per month. For most small-to-medium websites, this is enough to run dozens of analytical queries without any cost. The GA4 BigQuery export itself is free — Google doesn't charge for the data transfer. You only pay for query processing (bytes scanned) and storage beyond the free tier. On-demand pricing is $6.25 per TB scanned.

What SQL dialect does BigQuery use?

BigQuery uses GoogleSQL (formerly called Standard SQL), which is ANSI SQL:2011 compliant with extensions for nested/repeated data, geographic functions, and ML operations. The biggest difference from MySQL or PostgreSQL is the UNNEST() function for handling ARRAY and STRUCT types, which you'll use extensively with GA4 data. All 28 queries in this library use GoogleSQL syntax.

Why don't my BigQuery numbers match the GA4 interface?

Several factors cause discrepancies: GA4 applies data thresholding (suppresses small counts for privacy), uses data-driven attribution (BigQuery has raw data without model-based attribution), samples data in some reports, and applies consent-mode adjustments. BigQuery gives you the raw, unprocessed event stream. Differences of 5-15% are normal and expected. Both datasets are "correct" — they just use different processing rules.

How much does it cost to query GA4 data in BigQuery?

Cost depends entirely on how much data your queries scan. A single day's GA4 data for a medium-traffic site (50K sessions/day) is typically 1-3 GB. Scanning one month in full at that volume (30-90 GB) costs roughly $0.20-$0.55 at on-demand pricing ($6.25/TB), and far less if you select only the columns you need. The most expensive operation is querying long date ranges without _TABLE_SUFFIX filtering, which can scan your entire dataset. Always use date filters and prune unused columns.

Can I get real-time data from GA4 BigQuery?

Not truly real-time, but near real-time. GA4 creates events_intraday_* streaming tables that update throughout the day (roughly every 15-30 minutes). This data is replaced by the finalized events_YYYYMMDD daily table the following morning. For true real-time monitoring, use the GA4 Realtime report in the UI. BigQuery is better suited for historical analysis and batch reporting.

Disclaimer

These SQL queries are provided for educational and informational purposes. All queries use placeholder project IDs (your-project.analytics_XXXXXX) — you must replace them with your actual BigQuery project and dataset identifiers. Sample output values are illustrative and do not represent real analytics data.

BigQuery billing depends on the volume of data processed. Always review estimated bytes scanned (shown in the BigQuery console) before running queries on large datasets. The cost estimates in this guide are based on on-demand pricing as of 2026 and may change.

While we strive for accuracy, GA4's BigQuery export schema evolves over time. Google may add, deprecate, or modify fields. Always verify query syntax against your actual table schema using INFORMATION_SCHEMA or the BigQuery console's schema viewer.

Ready to Master GA4 Analytics?

These queries are just the beginning. Our complete guide covers the full GA4 ecosystem — from initial setup and event tracking to advanced reporting and BigQuery integration.

Read the Complete GA4 Guide →