Building a Scalable BigQuery Data Foundation with CoreSignal Enrichment
A mid-sized asset management firm was running its analytics on biannual CSV cleanups and stale spreadsheet pipelines. We rebuilt their entire data foundation on Google Cloud Platform — BigQuery, Dataform, and CoreSignal enrichment — and handed back the time their analysts were losing every cycle.
Prerequisites
- GCP project with BigQuery enabled and billing configured
- CoreSignal API access or equivalent enrichment source
- Data owner / analytics lead who can sign off on schemas
- Defined entity universe (funds, companies, portfolios)
- Compliance data retention requirements documented
- 2–3 week parallel run during pipeline cutover
What You'll Learn
- How to layer Bronze / Silver / Gold schemas in BigQuery
- How to enrich CRM data with CoreSignal firmographics automatically
- How to replace biannual spreadsheet cycles with live pipelines
- How to build audit-ready lineage inside Google Cloud
An Asset Manager That Had Outgrown Its Spreadsheets
This firm had built a workable analytics process on spreadsheets and biannual data cleanup sprints. But as AUM grew and the entity universe expanded, the manual overhead had become a structural liability — analysts spending the first week of every cycle cleaning data instead of analysing it.
By the time they engaged us, they had tried to build an internal BigQuery pipeline that stalled at the schema design stage. They needed a team that understood both the data engineering side and the asset management use case — not a generic cloud consultancy learning on their project.
Four Data Gaps That Were Costing Analyst Capacity
The surface complaint was "our data is slow." Underneath that were four distinct structural failures, each with a direct cost to analytical output and decision quality.
Biannual Cycle Start Was a Cleanup Sprint, Not Analysis
Every analytics cycle began with a week of data normalisation: deduplication, schema reconciliation, and re-linking entities broken by upstream changes. Analysts were working as junior data engineers by default for the first 20% of every cycle.
CoreSignal Enrichment Was Pulled Manually Per Analyst
Each analyst ran their own CoreSignal pulls — different entity lists, different field sets, no shared enrichment layer. The same company was enriched multiple times per cycle with different results. Reconciliation consumed hours that should have been analysis.
No Audit Trail — Compliance Reviews Were Manual Reconstructions
When compliance needed to trace a decision back to source data, analysts reconstructed the lineage manually from email threads and file timestamps. Each review took 3–5 days of analyst time and carried a material risk of gaps or errors.
Query Performance Degraded as Data Volume Grew
As the entity universe expanded, ad-hoc BigQuery queries on raw tables took 4–8 minutes to return. Analysts avoided running exploratory queries because the feedback loop was too slow. Data was technically available but practically unused.
How We Structured the Engagement
We've built data foundations on Google Cloud for FinTech, asset management, and SaaS firms across the US, UK, and Australia. Our approach is always schema-first: before we write a single pipeline, we map the full entity model and enrichment dependencies. Then we build the layered architecture that the model actually requires, not one that merely looks clean in a diagram.
Discovery & Architecture Design
Before any build, we spent two weeks mapping the full data flow — from upstream CSV sources through CoreSignal enrichment to downstream analyst consumption. We documented every entity, every field dependency, and every compliance requirement. That map became the schema blueprint.
We also audited the failed internal BigQuery build to understand exactly where the schema design had stalled — and built the new architecture to avoid those constraints from the ground up.
Bronze / Silver / Gold Schema Build
We built the three-layer schema architecture in BigQuery using Dataform — raw ingestion in Bronze, standardised and deduplicated in Silver, analytics-ready in Gold. Every transformation is version-controlled, tested, and documented. No "it worked last time" pipelines.
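To make the layering concrete, here is a minimal sketch in plain Python rather than Dataform SQLX. The record fields (`entity_id`, `name`, `entity_type`, `loaded_at`), the sample values, and the function names are hypothetical and not taken from the engagement. The point is only the shape of each step: Bronze keeps rows as received, Silver standardises them and keeps the latest row per entity, and Gold exposes an analytics-ready aggregate.

```python
from collections import Counter

# Bronze: rows exactly as they arrived (hypothetical fields and values).
bronze = [
    {"entity_id": "C-001", "name": " Acme Capital ", "entity_type": "Fund", "loaded_at": "2025-01-10"},
    {"entity_id": "c-001", "name": "Acme Capital", "entity_type": "fund", "loaded_at": "2025-02-01"},
    {"entity_id": "C-002", "name": "Birch Holdings", "entity_type": "Company", "loaded_at": "2025-01-15"},
]


def to_silver(rows):
    """Standardise keys and text, then keep the most recent row per entity."""
    latest = {}
    for row in rows:
        clean = {
            "entity_id": row["entity_id"].strip().upper(),
            "name": row["name"].strip(),
            "entity_type": row["entity_type"].strip().lower(),
            "loaded_at": row["loaded_at"],
        }
        current = latest.get(clean["entity_id"])
        # ISO-formatted dates compare correctly as plain strings.
        if current is None or clean["loaded_at"] > current["loaded_at"]:
            latest[clean["entity_id"]] = clean
    return sorted(latest.values(), key=lambda r: r["entity_id"])


def to_gold(silver_rows):
    """Analytics-ready view: number of entities per entity type."""
    return dict(Counter(r["entity_type"] for r in silver_rows))


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Keeping Bronze untouched means any Silver or Gold table can be rebuilt from source if a standardisation rule changes later.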
Partitioning by date and clustering by entity type reduced query costs by over 80% and brought average query time from 6 minutes to under 60 seconds on the production dataset.
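As an illustration of what "partitioned by date, clustered by entity type" can look like as a table definition, the sketch below assembles a BigQuery DDL statement in Python. `PARTITION BY` and `CLUSTER BY` are standard BigQuery DDL clauses. The dataset, table, and column names are hypothetical placeholders, not the firm's schema.

```python
def partitioned_table_ddl(table, columns, partition_column, cluster_columns):
    """Return a CREATE TABLE statement partitioned by day and clustered."""
    column_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n  {column_sql}\n)\n"
        f"PARTITION BY DATE({partition_column})\n"
        f"CLUSTER BY {', '.join(cluster_columns)}"
    )


# Hypothetical table and columns, used for illustration only.
ddl = partitioned_table_ddl(
    table="silver.entity_snapshots",
    columns=[
        ("entity_id", "STRING"),
        ("entity_type", "STRING"),
        ("headcount", "INT64"),
        ("snapshot_ts", "TIMESTAMP"),
    ],
    partition_column="snapshot_ts",
    cluster_columns=["entity_type", "entity_id"],
)
print(ddl)
```

Queries that filter on the partition column let BigQuery skip whole partitions, and clustering narrows the blocks read within each partition. In Dataform, the same intent is typically declared in the table's config block through the `partitionBy` and `clusterBy` options rather than in hand-written DDL.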
CoreSignal Enrichment Pipeline
We built a centralised enrichment layer that pulls CoreSignal firmographic, headcount, and funding data on a scheduled basis — replacing the per-analyst manual pulls that had been producing inconsistent results across the team.
Every entity in the Gold layer is now enriched from a single authoritative source. Enrichment jobs run nightly, with delta detection so only changed records trigger re-enrichment. No analyst touches CoreSignal manually. The full enrichment history is preserved in the Silver layer for audit purposes.
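The case study does not say how delta detection was implemented. One common approach, sketched here as an assumption, is to fingerprint the fields that matter and re-enrich only the entities whose fingerprint has changed since the previous run. The field names, sample records, and function names are hypothetical, and no CoreSignal API call is shown.

```python
import hashlib
import json


def fingerprint(record, fields):
    """Stable hash of the fields that should trigger re-enrichment when changed."""
    payload = json.dumps({f: record.get(f) for f in fields}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(records, previous_fingerprints, fields):
    """Return (entity ids needing enrichment, updated fingerprint map)."""
    changed = []
    updated = dict(previous_fingerprints)
    for record in records:
        fp = fingerprint(record, fields)
        if previous_fingerprints.get(record["entity_id"]) != fp:
            changed.append(record["entity_id"])
            updated[record["entity_id"]] = fp
    return changed, updated


TRACKED = ["name", "domain"]  # hypothetical fields that drive enrichment

night_1 = [
    {"entity_id": "C-001", "name": "Acme Capital", "domain": "acme.example"},
    {"entity_id": "C-002", "name": "Birch Holdings", "domain": "birch.example"},
]
night_2 = [
    {"entity_id": "C-001", "name": "Acme Capital", "domain": "acme.example"},
    {"entity_id": "C-002", "name": "Birch Holdings", "domain": "birchholdings.example"},
    {"entity_id": "C-003", "name": "Cedar Partners", "domain": "cedar.example"},
]

first_run, state = detect_changes(night_1, {}, TRACKED)
second_run, state = detect_changes(night_2, state, TRACKED)
print(first_run)   # every entity is new on the first run
print(second_run)  # only the changed entity and the newly added one
```

Storing the fingerprint map alongside each run also gives a simple record of when an entity's inputs changed, which fits the audit requirement described above.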
Query Optimisation, Runbooks & Training
With the schema and enrichment pipeline live, we spent the final phase on query performance tuning, cost controls, and making the system self-sufficient. Partition pruning and column clustering strategies brought BigQuery processing costs down by over 60% at production scale.
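The effect of partition pruning on cost can be shown with a toy model: a query with no date filter reads every partition, while a date-filtered query reads only the matching ones. The partition sizes and the byte cap below are invented, so the printed percentage says nothing about the engagement's actual savings. The `within_budget` helper is a hypothetical cost guard.

```python
from datetime import date

# Hypothetical partition sizes in bytes, keyed by partition date.
partitions = {
    date(2025, 1, 1): 40_000_000_000,
    date(2025, 1, 2): 42_000_000_000,
    date(2025, 1, 3): 41_000_000_000,
    date(2025, 1, 4): 39_000_000_000,
    date(2025, 1, 5): 38_000_000_000,
}


def bytes_scanned(partitions, start=None, end=None):
    """Sum the partitions a query must read; no date filter means all of them."""
    return sum(
        size
        for day, size in partitions.items()
        if (start is None or day >= start) and (end is None or day <= end)
    )


def within_budget(estimated_bytes, max_bytes):
    """Simple cost guard: accept a query only if its estimate fits the cap."""
    return estimated_bytes <= max_bytes


full_scan = bytes_scanned(partitions)
pruned = bytes_scanned(partitions, start=date(2025, 1, 5), end=date(2025, 1, 5))
saving = 1 - pruned / full_scan

print(f"full={full_scan} pruned={pruned} saving={saving:.0%}")
print(within_budget(pruned, max_bytes=50_000_000_000))
print(within_budget(full_scan, max_bytes=50_000_000_000))
```

In BigQuery itself, a byte estimate like this can come from a dry-run query, and a hard cap can be enforced per query with the maximum bytes billed setting.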
We delivered full operational runbooks for every pipeline procedure. Role-based training sessions enabled the internal analytics team to run future cycles — and troubleshoot independently — without external support. Cycle start dropped from a two-week cleanup sprint to a three-day validation run followed by immediate analysis.
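A validation run of this kind usually comes down to a small set of repeatable checks. The sketch below shows two typical ones, key uniqueness and required fields, in plain Python. The rows and field names are hypothetical, and the actual checks used in the engagement are not described in this case study.

```python
def validate(rows, key, required):
    """Return a list of human-readable problems; an empty list means a pass."""
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        value = row.get(key)
        if value in seen:
            problems.append(f"row {index}: duplicate {key}={value!r}")
        seen.add(value)
        for field in required:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing {field}")
    return problems


# Hypothetical Gold-layer rows containing one missing value and one duplicate key.
gold_rows = [
    {"entity_id": "C-001", "entity_type": "fund", "headcount": 120},
    {"entity_id": "C-002", "entity_type": "company", "headcount": None},
    {"entity_id": "C-001", "entity_type": "fund", "headcount": 125},
]

problems = validate(gold_rows, key="entity_id", required=["entity_type", "headcount"])
for problem in problems:
    print(problem)
```

Dataform supports comparable uniqueness and non-null assertions declared alongside the table definition, so checks like these can run as part of the pipeline rather than as a separate script.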
Are your analysts rebuilding the same spreadsheet every quarter?
We audit your current data infrastructure and deliver written architecture recommendations in 5 business days — at no cost.
What Changed — In Numbers and In Practice
The metric that resonated most with the team wasn't the percentage reduction in manual work — it was the shift in what analysts were doing with their time. Before this platform, a biannual cycle started with a cleanup sprint that consumed the first week. After go-live, cycle start meant running a notebook, validating a quality dashboard, and moving straight to analysis. That's the compounding return on a good data foundation.
"Before this platform, our cycle started with a cleanup sprint that consumed the first week. After go-live, it meant running a notebook and moving straight to analysis on day one."
Twopir Project Lead · Asset Management Data Foundation Engagement · 2025
Your Analysts Deserve a Data Foundation That Works as Hard as They Do
Start with a free BigQuery and data architecture audit. We review your current setup, map the gaps, and deliver written findings in 5 business days — no commitment required.