Why Centralize AppsFlyer Data in a Cloud Data Warehouse?
AppsFlyer is a leading mobile attribution platform, generating crucial user-level and SKAN (SKAdNetwork) data. To unlock unified, trustworthy KPIs—such as ROAS, LTV, and cohort analytics—teams need to combine AppsFlyer data with marketing spend, CRM, product, and finance information. Centralizing this data in a cloud warehouse like Google BigQuery or Snowflake enables scalable analytics, rapid data joins, and robust governance. BigQuery and Snowflake are built for high-volume, elastic compute, powering analytics for companies at petabyte scale.
Keboola orchestrates the entire data journey, providing repeatable flows, versioned automation, flexible scheduling, and granular notifications. By leveraging Keboola’s platform, you can manage data privacy, standardize SKAN data, and adapt to changing attribution requirements—all while maintaining full transparency and control.
High-Level Architecture Overview
The standard pipeline follows a clear path:
- Extract data from AppsFlyer via the Pull API using Keboola’s dedicated AppsFlyer connector.
- Store raw data in Keboola Storage tables (in.c-appsflyer.*).
- (Optional) Apply transformations for data curation (SQL, Python, or dbt).
- Write results to your warehouse of choice—BigQuery (partitioned/clustered) or Snowflake (micro-partitioned).
Each phase can be scheduled, monitored, and versioned as part of a Keboola Flow, ensuring repeatability and reliability.
Prerequisites Checklist
- [connection]AppsFlyer: Advertiser-level access, Pull API (Dev Key), App IDs, correct timezone and (optionally) currency.
- [tool]Keboola: Project with permissions for creating components, Flows, and notifications. Optionally, connect your AI client for conversational config generation using MCP Server.
- [database]BigQuery: GCP project and Service Account with BigQuery Data Editor and BigQuery Job User roles.
- [database]Snowflake: Host URL, WAREHOUSE, DATABASE, SCHEMA, and a dedicated user/role. Or use Keboola-provisioned Snowflake for BI sharing.
Essential Keboola Components
- [data-source]AppsFlyer data source connector (extractor)
- [database]BigQuery Writer or Snowflake Writer (destination)
- [sql]Transformations (SQL, Python, or dbt)
- [flow]Flow (orchestration, scheduling, notifications)
- [automation][Optional] MCP Server for AI-assisted config generation
Configuring the AppsFlyer Extractor
Set up one configuration with multiple rows, one per report type (e.g., installs_raw, inapp_events_raw, uninstalls_raw, ad_revenue_raw, skan_raw). Key settings:
- [person]Credentials: Dev Key/token
- [data-source]Scope: App IDs (iOS/Android)
- [filter]Filters: date_from/date_to, media_source, event_name, geo, currency, timezone (must match app)
- [clock]Incremental window: Rolling re-pull for late events (recommended last 7–14 days)
- [database]Output tables: in.c-appsflyer.*
Be mindful of AppsFlyer Pull API data retention: most reports allow access to the last 60–90 days (installs/uninstalls) or 31–90 days (in-app events). For high-volume apps, consider AppsFlyer Data Locker for direct delivery to BigQuery.
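Once raw tables land in Storage, a quick SQL transformation can confirm the rolling re-pull window is fully covered. Below is a minimal, BigQuery-flavored sketch; the table and column names (installs_raw, install_time, app_id) are assumptions, so adjust them to your report schema.
-- Sketch: verify daily coverage over the re-pull window (assumes install_time and app_id columns)
SELECT DATE(install_time) AS install_date, app_id, COUNT(*) AS row_count
FROM installs_raw
WHERE DATE(install_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY)
GROUP BY 1, 2
ORDER BY 1 DESC;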
Transforming and Modeling Data
Use Keboola Transformations to curate and type your data. Examples:
-- BigQuery sample transformation
CREATE OR REPLACE TABLE `${project}.${dataset}.inapp_events_curated`
AS SELECT ... FROM `${project}.${dataset}.inapp_events_raw`;

-- Snowflake sample transformation
CREATE OR REPLACE TABLE "INAPP_EVENTS_CURATED" AS SELECT ... FROM "INAPP_EVENTS_RAW";
Tip: Always deduplicate staged data by unique event identifiers for accuracy, as in the sketch below.
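One way to implement this is a window-function dedup, which works in both BigQuery and Snowflake. A minimal sketch, assuming a hypothetical event_id column uniquely identifies each event:
-- Sketch: keep the most recent record per event (event_id is an assumed unique key)
CREATE OR REPLACE TABLE inapp_events_dedup AS
SELECT *
FROM inapp_events_raw
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_time DESC) = 1;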
BigQuery Path: Writer Setup & Orchestration
- Configure BigQuery Writer with Service Account JSON and target dataset.
- Map curated tables; set load type (incremental for append-only, full load to overwrite).
- Partition tables by event_time or install_time (DAY granularity), enable require_partition_filter, and cluster by media_source, campaign_id; a DDL sketch follows this list.
- Automate with Flows: extract, transform, write, schedule, and configure notifications for errors/warnings.
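If you create the target table yourself rather than letting the writer do it, a minimal DDL sketch for the partitioning and clustering above (column names are illustrative) might look like:
-- Sketch: partitioned and clustered target table (illustrative columns)
CREATE TABLE IF NOT EXISTS `${project}.${dataset}.inapp_events`
(
  event_time TIMESTAMP,
  appsflyer_id STRING,
  media_source STRING,
  campaign_id STRING,
  event_name STRING,
  event_revenue NUMERIC
)
PARTITION BY DATE(event_time)
CLUSTER BY media_source, campaign_id
OPTIONS (require_partition_filter = TRUE);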
Example BigQuery Validation SQL
SELECT DATE(install_time) AS install_date, media_source, COUNTIF(is_organic = FALSE) AS non_organic_installs
FROM `${project}.${dataset}.installs`
WHERE install_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
Note: filter on the partition column itself (here install_time); the _PARTITIONTIME pseudo-column exists only for ingestion-time partitioned tables.
See the BigQuery Writer GitHub for config examples.
Snowflake Path: Writer Setup & Orchestration
- Configure Snowflake Writer with account URL, user, role, warehouse, database, and schema.
- Map curated tables; set destination names and column types (VARCHAR, NUMBER, etc.).
- Enable incremental loads for upserts (define a Primary Key) or use full load for table swaps; a MERGE sketch follows this list.
- Orchestrate extraction and writing with Flows, schedule as needed, and set up notifications.
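The Snowflake Writer performs the upsert itself once a Primary Key is defined; if you manage the merge in your own transformation instead, a minimal sketch (EVENT_ID is an assumed unique key) could be:
-- Sketch: manual upsert from a staged table (EVENT_ID is an assumed unique key)
MERGE INTO "INAPP_EVENTS" AS t
USING "INAPP_EVENTS_STAGE" AS s
  ON t."EVENT_ID" = s."EVENT_ID"
WHEN MATCHED THEN UPDATE SET t."EVENT_TIME" = s."EVENT_TIME", t."EVENT_REVENUE" = s."EVENT_REVENUE"
WHEN NOT MATCHED THEN INSERT ("EVENT_ID", "EVENT_TIME", "EVENT_REVENUE")
  VALUES (s."EVENT_ID", s."EVENT_TIME", s."EVENT_REVENUE");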
Example Snowflake Validation SQL
SELECT DATE_TRUNC('DAY', "EVENT_TIME") AS "D", "MEDIA_SOURCE", "CAMPAIGN_ID", SUM("EVENT_REVENUE") AS "REV" FROM "INAPP_EVENTS"
WHERE "EVENT_TIME" >= DATEADD(DAY, -14, CURRENT_TIMESTAMP()) GROUP BY 1,2,3 ORDER BY 1 DESC, "REV" DESC;
Check out the Snowflake Writer GitHub for implementation details.
Operational Guidance and Best Practices
- [clock]Timezone Alignment: Always match API timezone to app settings.
- [information]Identifier Restrictions: Prefer appsflyer_id + timestamps. Consider privacy restrictions for certain media sources.
- [database]BigQuery: Incremental loads are append-only. For upserts, use scheduled MERGE transformations (see the sketch after this list).
- [database]Snowflake: Quote identifiers to avoid case mismatches. Define Primary Keys for upsert logic. Use AUTO_SUSPEND for warehouse cost control.
- [clock]Scheduling: Use UTC for Flows and stagger pulls until after midnight in the app timezone to reduce late-data issues.
- [exclamation]Notifications: Add team emails and webhooks. Include context for error alerts.
- [tool]Development: Use separate projects or development branches for Dev/Test/Prod. Use Keboola‑as‑Code for config management.
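For the BigQuery MERGE pattern above, a minimal scheduled-transformation sketch (event_id is an assumed unique key; the partition filter limits scanned data and satisfies require_partition_filter):
-- Sketch: scheduled upsert into the curated table (event_id is an assumed unique key)
MERGE `${project}.${dataset}.inapp_events` AS t
USING `${project}.${dataset}.inapp_events_stage` AS s
ON t.event_id = s.event_id
   -- constrain target partitions to the re-pull window
   AND DATE(t.event_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY)
WHEN MATCHED THEN UPDATE SET event_time = s.event_time, event_revenue = s.event_revenue
WHEN NOT MATCHED THEN INSERT ROW;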
Real-World Use Cases
- [analytics]Unified ROAS & LTV: Merge AppsFlyer, ad spend, CRM, and finance data for granular attribution.
- [analytics]SKAN Analytics: Decode SKAN postbacks and aggregate results for privacy-safe reporting (see the sketch after this list).
- [exclamation]Fraud Detection: Blend AppsFlyer’s Protect360 signals with retention and LTV metrics.
- [data-science]Predictive LTV: Curate journeys for machine learning and operationalized prediction models.
- [advanced]Snowflake Extras: Use Snowpark UDFs for advanced analytics.
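For the SKAN use case, a minimal aggregation sketch over the raw postback table (skan_raw, media_source, and skad_conversion_value are assumed names; check your report's actual columns):
-- Sketch: postback counts by source and conversion value (illustrative columns)
SELECT media_source, skad_conversion_value, COUNT(*) AS postbacks
FROM skan_raw
GROUP BY 1, 2
ORDER BY 1, 2;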