Streaming Data Ingestion Pipeline
Data Engineering
- Loads data from a Pub/Sub subscription into different BigQuery tables based on each message's event type
- Ingests into BigQuery tables that use ingestion-time partitioning
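To illustrate the second point: a BigQuery table uses ingestion-time partitioning when its `timePartitioning` setting names a time unit but no partitioning column. BigQuery then assigns each row to a partition based on when the data arrived and exposes that through the `_PARTITIONTIME` pseudo-column. The minimal sketch below assumes the table is described with the BigQuery REST API's Table resource, built here as a plain dictionary. It also shows the `table$<partition id>` decorator that addresses one partition (for example, when loading into or querying a specific partition). The project, dataset, and table names are hypothetical.

```python
import json
from datetime import datetime, timezone

# Partition ID formats BigQuery uses for each time-unit granularity.
_PARTITION_ID_FORMATS = {
    "HOUR": "%Y%m%d%H",
    "DAY": "%Y%m%d",
    "MONTH": "%Y%m",
    "YEAR": "%Y",
}


def ingestion_time_partitioned_table(project: str, dataset: str, table: str,
                                     granularity: str = "DAY") -> dict:
    """Build a BigQuery Table resource (REST API shape) with
    ingestion-time partitioning."""
    if granularity not in _PARTITION_ID_FORMATS:
        raise ValueError(f"unsupported granularity: {granularity!r}")
    return {
        "tableReference": {"projectId": project, "datasetId": dataset, "tableId": table},
        # No "field" key: rows are partitioned by ingestion time, not by a column.
        "timePartitioning": {"type": granularity},
    }


def partition_decorator(table: str, ingested_at: datetime, granularity: str = "DAY") -> str:
    """Return 'table$<partition id>' for the partition that covers ingested_at.

    Partition boundaries are in UTC, so a timezone-aware datetime is required.
    """
    if ingested_at.tzinfo is None:
        raise ValueError("ingested_at must be timezone-aware")
    fmt = _PARTITION_ID_FORMATS[granularity]
    return f"{table}${ingested_at.astimezone(timezone.utc).strftime(fmt)}"


if __name__ == "__main__":
    # Hypothetical project, dataset, and table names.
    print(json.dumps(ingestion_time_partitioned_table("my-project", "events", "orders"), indent=2))
    print(partition_decorator("orders", datetime(2024, 1, 15, 13, 5, tzinfo=timezone.utc)))
```

Omitting the `field` key is the design choice that matters here: adding one would turn the table into a column-partitioned table instead.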
Google Cloud services

- Pub/Sub
- Cloud Dataflow
- BigQuery
- Cloud Build
- Deployment Manager
- Cloud Monitoring
- Cloud Logging
- Cloud Source Repositories
Features

- Serverless: no infrastructure to provision
- Autoscaling: scales up and down with data volume
- Deployment: managed and automated
- Security: access-controlled and encryption-enabled
- Performance: can process gigabytes of data in seconds
- Supports JSON data streams
- Handles any data volume that Pub/Sub supports
- Handles large numbers of streaming records
- Automatically detects the schema and schema changes
- A single pipeline can handle multiple event types
- Effectively, the pipeline is an EVENT ROUTER
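The event-router behavior described above can be sketched in plain Python. This is a minimal sketch, not the pipeline's actual code: it assumes each Pub/Sub message carries a JSON payload with a hypothetical `event_type` field, and that a hypothetical configuration dictionary maps each event type to a BigQuery table. Messages with malformed JSON or an unknown event type go to a hypothetical dead-letter table instead of being dropped. Apache Beam's `WriteToBigQuery` transform accepts a callable as its `table` argument, which is one way a routing decision like this could be wired into a Dataflow job.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical routing configuration: event type -> BigQuery table spec
# in the "project:dataset.table" form used by Apache Beam.
ROUTES: Dict[str, str] = {
    "order_created": "my-project:events.order_created",
    "user_signup": "my-project:events.user_signup",
}
# Hypothetical table for messages that cannot be routed.
DEAD_LETTER_TABLE = "my-project:events.dead_letter"


def route(message: bytes, routes: Dict[str, str] = ROUTES,
          type_field: str = "event_type") -> Tuple[str, Dict[str, Any]]:
    """Return (destination table, row) for one Pub/Sub message payload."""
    try:
        payload = json.loads(message.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return DEAD_LETTER_TABLE, {"raw": message.decode("utf-8", errors="replace")}
    if not isinstance(payload, dict):
        # Only JSON objects can become table rows.
        return DEAD_LETTER_TABLE, {"raw": json.dumps(payload)}
    event_type = payload.get(type_field)
    table = routes.get(event_type) if isinstance(event_type, str) else None
    if table is None:
        # Missing or unknown event type: keep the message rather than drop it.
        return DEAD_LETTER_TABLE, {"raw": json.dumps(payload)}
    return table, payload


if __name__ == "__main__":
    for msg in (b'{"event_type": "order_created", "order_id": 1}',
                b'{"event_type": "refund"}',
                b"not json"):
        print(route(msg))
```

Keeping the routing table as data rather than code means a new event type only needs a new configuration entry.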

Adoption
Operationalization

Deployment:
Automated through Cloud Build. The required services and infrastructure are provisioned through deployment scripts.

Ingestion:
The JSON schema is detected automatically from incoming messages and updated automatically when the messages introduce changes.
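One possible way to implement schema auto-detection and auto-update is sketched below. It is an assumption for illustration, not the pipeline's documented approach. The sketch infers a BigQuery-style schema (name, type, mode) from each JSON message, then merges it into the known schema so that new fields are appended. The merge is additive only because BigQuery allows adding NULLABLE or REPEATED columns to an existing table, while changing a column's type is not a simple schema update; such conflicts are therefore reported rather than applied. The type mapping is deliberately simplified (for example, strings are never parsed as timestamps).

```python
import json
from typing import Any, Dict, List

# A BigQuery-style schema field: {"name", "type", "mode"} plus "fields" for RECORDs.
SchemaField = Dict[str, Any]


def _infer_type(value: Any) -> str:
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, dict):
        return "RECORD"
    return "STRING"  # simplification: strings are never parsed as dates or timestamps


def infer_schema(record: Dict[str, Any]) -> List[SchemaField]:
    """Infer a schema from one JSON object."""
    schema: List[SchemaField] = []
    for name, value in record.items():
        mode = "NULLABLE"
        if isinstance(value, list):
            mode = "REPEATED"
            # Type the array by its first non-null element.
            value = next((v for v in value if v is not None), None)
        if value is None:
            continue  # nulls and empty arrays carry no type information
        field: SchemaField = {"name": name, "type": _infer_type(value), "mode": mode}
        if isinstance(value, dict):
            field["fields"] = infer_schema(value)
        schema.append(field)
    return schema


def _is_repeated(field: SchemaField) -> bool:
    return field.get("mode", "NULLABLE") == "REPEATED"


def merge_schema(existing: List[SchemaField],
                 incoming: List[SchemaField]) -> List[SchemaField]:
    """Return the existing schema extended with any new incoming fields.

    The merge is additive only: new fields are appended, nested RECORDs are
    merged recursively, and conflicting types or modes raise ValueError.
    """
    merged = [dict(f) for f in existing]  # copies, so `existing` is not mutated
    by_name = {f["name"]: f for f in merged}
    for field in incoming:
        current = by_name.get(field["name"])
        if current is None:
            merged.append(field)
            by_name[field["name"]] = field
        elif current["type"] != field["type"] or _is_repeated(current) != _is_repeated(field):
            raise ValueError(f"incompatible change to field {field['name']!r}")
        elif current["type"] == "RECORD":
            current["fields"] = merge_schema(current.get("fields", []),
                                             field.get("fields", []))
    return merged


if __name__ == "__main__":
    known = infer_schema(json.loads('{"id": 1, "name": "a"}'))
    update = infer_schema(json.loads('{"id": 2, "tags": ["x"], "geo": {"lat": 1.5}}'))
    for f in merge_schema(known, update):
        print(f)
```

In a streaming job, the merged schema would only need to be applied to the table when `merge_schema` actually returns new fields.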
Customization
The code can be extended to support data formats other than JSON.
Any generic behavior can be switched off or enhanced to meet specific requirements.
The pipeline can be extended to accommodate CMEK (customer-managed encryption key) requirements.
Benefits
- Faster onboarding onto Google Cloud shortens time to market
- Reduces ramp-up time by 4 to 8 weeks
- Standardized solutions are easier to maintain
- A configuration-driven design lets businesses deploy changes faster
- Out-of-the-box solutions for common tasks reduce effort
- Better risk management leads to more predictable outcomes
Use cases
