The Client

An Asset Manager That Had Outgrown Its Spreadsheets

AMF
Mid-Sized Asset Management Firm
Identity confidential · Full GCP + BigQuery Data Foundation Engagement
IndustryAsset Management / FinTech
RegionUnited States — Multi-office
Team Size8–12 analysts, 3 data engineers
Volume2 analytics cycles per year, 40M+ enrichment records
Stack BeforeCSV exports + Excel pipelines + manual CoreSignal pulls
EngagementDiscovery → Architecture → Build → Training → Handoff

This firm had built a workable analytics process on spreadsheets and biannual data cleanup sprints. But as AUM grew and the entity universe expanded, the manual overhead had become a structural liability — analysts spending the first week of every cycle cleaning data instead of analysing it.

By the time they engaged us, they had tried to build an internal BigQuery pipeline that stalled at the schema design stage. They needed a team that understood both the data engineering side and the asset management use case — not a generic cloud consultancy learning on their project.


The Problem

Four Data Gaps That Were Costing Analyst Capacity

The surface complaint was "our data is slow." Underneath that were four distinct structural failures, each with a direct cost to analytical output and decision quality.

01

Biannual Cycle Start Was a Cleanup Sprint, Not Analysis

Every analytics cycle began with a week of data normalization — deduplication, schema reconciliation, re-linking entities broken by upstream changes. Analysts were junior data engineers by default for the first 20% of every cycle.

02

CoreSignal Enrichment Was Pulled Manually Per Analyst

Each analyst ran their own CoreSignal pulls — different entity lists, different field sets, no shared enrichment layer. The same company was enriched multiple times per cycle with different results. Reconciliation consumed hours that should have been analysis.

03

No Audit Trail — Compliance Reviews Were Manual Reconstructions

When compliance needed to trace a decision back to source data, analysts reconstructed the lineage manually from email threads and file timestamps. Each review took 3–5 days of analyst time and carried material risk of gap or error.

04

Query Performance Degraded as Data Volume Grew

As the entity universe expanded, ad-hoc BigQuery queries on raw tables took 4–8 minutes to return. Analysts avoided running exploratory queries because the feedback loop was too slow. Data was technically available but practically unused.


The Solution

How We Structured the Engagement

We've built data foundations on Google Cloud for FinTech, asset management, and SaaS firms across the US, UK, and Australia. Our approach is always schema-first — before we write a single pipeline, we map the full entity model and enrichment dependencies. Then we build the layered architecture that supports it, not what looks clean in a diagram.

Four Phases · 10-Week Delivery
We divided the engagement into four structured phases, each validated before advancing. Every pipeline was tested against production-volume data in a staging environment before cutover.
Phase 01 — Discovery & Architecture Design Phase 02 — Bronze / Silver / Gold Schema Build Phase 03 — CoreSignal Enrichment Pipeline Phase 04 — Query Optimisation, Docs & Training
Phase 01

Discovery & Architecture Design

Before any build, we spent two weeks mapping the full data flow — from upstream CSV sources through CoreSignal enrichment to downstream analyst consumption. We documented every entity, every field dependency, and every compliance requirement. That map became the schema blueprint.

We also audited the failed internal BigQuery build to understand exactly where the schema design had stalled — and built the new architecture to avoid those constraints from the ground up.

Gold Layer Schema dim_company Enriched_Firms Company_ID Firm_Name Enriched_Attributes dim_relationship Counterpart_Client_Mappings Contact_ID Client_ID fact_distribution_activity Distribution_Method Activity_Date Entity_ID
Discovery & Architecture Design — Gold Layer Schema
🗺
End-to-End Entity Mapping
Every data source, enrichment dependency, and downstream consumer documented before any build began. Schema blueprint finalised and signed off.
🔍
Failed Pipeline Audit
Prior internal BigQuery build reviewed in full — what could be salvaged, what required a clean rebuild. No assumptions carried forward.
📐
Architecture Blueprint Delivered
Bronze / Silver / Gold layer design, partitioning strategy, and CoreSignal enrichment integration architecture finalised before build commenced.
Phase 02
Phase 02

Bronze / Silver / Gold Schema Build

We built the three-layer schema architecture in BigQuery using Dataform — raw ingestion in Bronze, standardised and deduplicated in Silver, analytics-ready in Gold. Every transformation is version-controlled, tested, and documented. No "it worked last time" pipelines.

Partitioning by date and clustering by entity type reduced query costs by over 80% and brought average query time from 6 minutes to under 60 seconds on the production dataset.

BRONZE · RAW raw_coresignal_pull CSV → BigQuery ingestion raw_crm_export HubSpot / Salesforce feeds raw_market_data Bloomberg / market feeds SILVER · CLEAN stg_companies Deduped + normalised stg_enrichment Merged CoreSignal fields stg_relationships Entity linking layer GOLD · ANALYTICS dim_company Analyst-ready entities fact_distribution_activity Partitioned + clustered dim_relationship Counterpart mappings Managed by Dataform · version-controlled · tested
Bronze / Silver / Gold Schema Architecture in BigQuery
🏗
Three-Layer Schema with Dataform
Bronze (raw), Silver (clean), Gold (analytics-ready) — every transformation version-controlled and tested before promotion.
Partitioning & Clustering
Partitioned by ingestion date, clustered by entity type. Average query time dropped from 6 minutes to under 60 seconds on production volume.
🔄
Automated Ingestion Pipelines
Cloud Composer DAGs replace all manual CSV uploads. Pipelines run on schedule with alerting on any upstream source failure.
Phase 03
Phase 03

CoreSignal Enrichment Pipeline

We built a centralised enrichment layer that pulls CoreSignal firmographic, headcount, and funding data on a scheduled basis — replacing the per-analyst manual pulls that had been producing inconsistent results across the team.

Every entity in the Gold layer is now enriched from a single authoritative source. Enrichment jobs run nightly, with delta detection so only changed records trigger re-enrichment. No analyst touches CoreSignal manually. The full enrichment history is preserved in the Silver layer for audit purposes.

CORESIGNAL REST API Firmographic · Headcount Funding · Tech Stack DELTA DETECTION Changed records only Nightly schedule Cloud Composer DAG SILVER LAYER ENRICHMENT Full history preserved Audit-ready lineage Version timestamped GOLD dim_company Analyst queries Single source of truth No analyst touches CoreSignal manually after go-live
CoreSignal Enrichment Pipeline Architecture
🔗
Centralised Enrichment Layer
Single CoreSignal pipeline replaces per-analyst manual pulls. Every entity enriched from one authoritative source — no reconciliation required.
🔄
Delta Detection & Nightly Jobs
Only changed records trigger re-enrichment. Cloud Composer DAG runs nightly with alerting on API failures or enrichment gaps.
📜
Full Enrichment History in Silver
Every enrichment version timestamped and preserved. Compliance can trace any Gold record back to its source pull without manual reconstruction.
Phase 04
Phase 04

Query Optimisation, Runbooks & Training

With the schema and enrichment pipeline live, we spent the final phase on query performance tuning, cost controls, and making the system self-sufficient. Partition pruning and column clustering strategies brought BigQuery processing costs down by over 60% at production scale.

We delivered full operational runbooks for every pipeline procedure. Role-based training sessions enabled the internal analytics team to run future cycles — and troubleshoot independently — without external support. Cycle start dropped from a two-week cleanup sprint to a three-day validation run followed by immediate analysis.

BigQuery Performance — Before vs After BEFORE Avg query: 6 min 20s Cost/cycle: $4,200 Cleanup time: 8 days AFTER Avg query: 48s ▼ 87% Cost/cycle: $1,620 ▼ 61% Cleanup time: 3 days ▼ 63% Partition pruning · Column clustering · Cost controls · Runbooks delivered
Query Optimisation & Cost Control — Before vs After
Query & Cost Optimisation
Partition pruning and column clustering strategies applied to control BigQuery processing costs as data volume scales across future cycles.
📖
Runbooks & Documentation
Full operational runbooks for every pipeline procedure — role-based training sessions enabling the internal analytics team to run future cycles without external support.
🎓
Role-Based Training
Analyst, engineer, and compliance tracks. Teams validated on the system before handoff — no black-box pipelines left in production.

Impact & Outcomes

What Changed — In Numbers and In Practice

The metric that resonated most with the team wasn't the percentage reduction in manual work — it was the shift in what analysts were doing with their time. Before this platform, a biannual cycle started with a cleanup sprint that consumed the first week. After go-live, cycle start meant running a notebook, validating a quality dashboard, and moving straight to analysis. That's the compounding return on a good data foundation.

80%
Reduction in manual data preparation time per cycle
Query performance improvement post-partitioning & clustering
3 days
Cycle start-to-analysis — down from a 2-week cleanup sprint
61%
BigQuery cost reduction per analytics cycle at production scale
Single unified BigQuery foundation replaced 4 disconnected CSV pipelines and 3 per-analyst CoreSignal workflows
Zero manual CSV uploads after go-live — all ingestion automated via Cloud Composer DAGs with failure alerting
Complete enrichment audit trail across all entities — compliance traces any Gold record back to source without manual reconstruction
Analysts validated full independence — running pipeline maintenance and future cycle starts without external support
Cost controls and partition strategy in place — pipeline economics improve as data volume scales, not degrade

"Before this platform, our cycle started with a cleanup sprint that consumed the first week. After go-live, it meant running a notebook and moving straight to analysis on day one."

— Twopir Project Lead · Asset Management Data Foundation Engagement · 2025

Frequently Asked Questions

What Asset Management Firms Ask Before Working With Us

BigQuery's serverless model eliminates infrastructure management and scales automatically with query volume — well-suited for biannual high-intensity analytics cycles where you'd otherwise over-provision for peak. Combined with Dataform for pipeline orchestration and the GCP ecosystem for enrichment integrations, it offers a full-stack solution without additional tooling overhead. The per-query cost model also aligns well with irregular cycle patterns common in asset management.
We build enrichment pipelines with delta detection — only changed records trigger API calls, which dramatically reduces volume. We also implement retry logic, dead-letter queues for failed enrichments, and nightly health-check alerts. The Silver layer preserves full enrichment history, so a transient API failure doesn't corrupt downstream Gold records — it simply delays the next update cycle.
Yes — we start every engagement with a schema and pipeline audit. We document what's working, what introduces downstream risk, and what needs a clean rebuild. We've completed partial-build rescues for firms in FinTech, asset management, and SaaS — the pattern is consistent enough that we have a standard remediation framework for it.
We deliver full data lineage documentation, pipeline runbooks, schema change logs, and enrichment audit trails as standard. Every transformation in the Gold layer is traceable back to its Bronze source with timestamped versions preserved in Silver. For firms with regulatory audit requirements, we can extend this to include formal data governance documentation aligned to specific frameworks.
Most GCP partners have general cloud infrastructure expertise but limited depth in financial data modelling and enrichment pipeline architecture. We have 12+ years and 200+ implementations — with focused practices in FinTech data foundations, asset management analytics, and CRM enrichment (Salesforce, HubSpot, Propertybase). We hold Salesforce Gold Partner status and work across US, UK, AU, and UAE time zones. We don't learn the use case on your project.

More From Twopir

Your Analysts Deserve a Data Foundation
That Works as Hard as They Do

Start with a free BigQuery and data architecture audit. We review your current setup, map the gaps, and deliver written findings in 5 business days — no commitment required.

Serving US · UK · Australia · UAE · Canada  ·  US EST · UK GMT · AEST coverage  ·  Response within 24 hours guaranteed