Shiv Hari Baral
Data Engineer • Lakehouse & Streaming
I design GCP-first, multi-cloud data platforms for batch and streaming using Databricks, Apache Spark, and Delta Lake: governed with Unity Catalog, automated with Airflow/Composer and dbt, and optimized for BI & ML.
Technical Expertise
Lakehouse & Streaming
Orchestration & ELT
Cloud & Warehouses
Governance & Data Quality
DevOps & IaC
BI & Analytics Enablement
Featured Data Engineering Projects
GridSense Lakehouse (Streaming + Batch)
Real-time energy telemetry platform on Databricks unifying Kafka streams and batch drops into Delta Lake with a Bronze-Silver-Gold medallion model.
Technical Implementation
- Kafka → Databricks Structured Streaming (exactly-once)
- Delta Live Tables with expectations & Change Data Feed
- Gold marts served via Databricks SQL and Looker
- OPTIMIZE + ZORDER; Unity Catalog RBAC and lineage
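
A minimal sketch of the Kafka-to-Delta streaming path described above, assuming a Databricks runtime with Delta Lake; the broker, topic, checkpoint path, schema, and table names are illustrative placeholders, not the production configuration.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Illustrative schema for energy telemetry events
schema = StructType([
    StructField("meter_id", StringType()),
    StructField("reading_kwh", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read the Kafka topic as a stream (broker and topic are placeholders)
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "energy_telemetry")
    .option("startingOffsets", "latest")
    .load()
)

# Parse the JSON payload into typed columns for the Bronze layer
bronze = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("payload"))
    .select("payload.*")
)

# Write to Delta; the checkpoint plus the idempotent Delta sink gives
# effectively exactly-once delivery across retries.
query = (
    bronze.writeStream.format("delta")
    .option("checkpointLocation", "/chk/energy_bronze")  # placeholder path
    .outputMode("append")
    .trigger(availableNow=True)  # or processingTime="1 minute" for continuous runs
    .toTable("bronze.energy_telemetry")  # Unity Catalog table name is illustrative
)
```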

RevOps ELT on GCP (dbt + Composer)
End-to-end ELT for product, billing, and marketing data (HubSpot, Stripe, web events) into BigQuery for ARR/MRR, churn, and LTV dashboards.
Technical Implementation
- Fivetran ingestion + dbt models (semantic layer for KPIs)
- Airflow/Cloud Composer orchestration with backfills
- Partitioning & clustering; materialized views in BigQuery
- Great Expectations for data quality + Slack alerts
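
A minimal Cloud Composer DAG sketch for the orchestration step above, assuming dbt is invoked with the BashOperator from a project available on the Composer environment; the project path, schedule, variable name, and targets are illustrative.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# catchup=True lets Airflow backfill missed intervals; dbt receives the
# logical run date through the templated --vars flag (variable name is illustrative).
default_args = {"retries": 2, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="revops_elt",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=True,
    default_args=default_args,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=(
            "cd /home/airflow/gcs/dags/dbt_revops && "
            "dbt run --target prod --vars '{run_date: {{ ds }}}'"
        ),
    )

    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /home/airflow/gcs/dags/dbt_revops && dbt test --target prod",
    )

    dbt_run >> dbt_test
```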

FHIR/HL7 Healthcare Ingestion (Multi-Cloud)
HIPAA-compliant pipelines normalizing EHR, claims, and device data across AWS, Azure, and GCP into curated analytics layers.
Technical Implementation
- Auto Loader to Delta Lake; schema enforcement & evolution
- SCD2 patient registry via MERGE; CDC from Debezium
- Power BI, Looker, and QuickSight on Gold datasets
- Terraform + CI/CD (GitHub Actions) for jobs and DLT
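
A condensed sketch of the SCD2 patient-registry merge described above, assuming a Databricks runtime with the Delta Lake Python API; table names, the attr_hash change-detection column, and the validity columns are illustrative assumptions, not the actual registry schema.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Latest deduplicated CDC batch from Debezium; attr_hash is an illustrative
# precomputed hash of the tracked patient attributes.
updates = spark.table("silver.patient_cdc_latest")

registry = DeltaTable.forName(spark, "silver.patient_registry")

# Step 1: close out the current version of any patient whose attributes changed
(
    registry.alias("t")
    .merge(updates.alias("s"),
           "t.patient_id = s.patient_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.attr_hash <> s.attr_hash",
        set={"is_current": "false", "valid_to": "s.effective_ts"},
    )
    .execute()
)

# Step 2: append new current versions for patients that are new or were just closed out
current = spark.table("silver.patient_registry").where("is_current = true")
new_versions = (
    updates.join(current.select("patient_id"), "patient_id", "left_anti")
    .withColumn("is_current", F.lit(True))
    .withColumn("valid_from", F.col("effective_ts"))
    .withColumn("valid_to", F.lit(None).cast("timestamp"))
)
new_versions.write.format("delta").mode("append").saveAsTable("silver.patient_registry")
```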

Modern Data Engineering Capabilities
Lakehouse Architecture
- Delta Lake with ACID
- Medallion (Bronze/Silver/Gold)
- Unity Catalog governance
Streaming Ingestion
- Kafka / PubSub / Kinesis
- Structured Streaming (exactly-once)
- Watermarking & late-arrival handling
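
A minimal sketch of watermark-based late-arrival handling in Structured Streaming, reusing the illustrative telemetry columns from the GridSense example above; the 30-minute threshold, window size, and table names are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Stream the Bronze telemetry table (table name is illustrative)
events = spark.readStream.table("bronze.energy_telemetry")

# The 30-minute watermark tells Spark how long to wait for late records
# before finalizing a window and discarding its state; records arriving
# later than the watermark are dropped instead of reopening closed windows.
windowed = (
    events
    .withWatermark("event_ts", "30 minutes")
    .groupBy(F.window("event_ts", "5 minutes"), "meter_id")
    .agg(F.sum("reading_kwh").alias("kwh_5min"))
)

(
    windowed.writeStream.format("delta")
    .option("checkpointLocation", "/chk/kwh_5min")  # placeholder path
    .outputMode("append")  # append emits a window only once the watermark passes it
    .toTable("silver.kwh_5min")
)
```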
Orchestration & ELT
- Delta Live Tables (DLT)
- Airflow / Cloud Composer / Workflows
- dbt / Dataform with backfills
Data Quality & Lineage
- Great Expectations / dbt tests
- DLT expectations & quarantine
- Change Data Feed (CDF) lineage
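
A minimal Delta Live Tables sketch of expectation-based quality gating with a quarantine table, assuming the DLT Python API inside a DLT pipeline; the source table, rule names, and the id/amount columns are illustrative.

```python
import dlt
from pyspark.sql import functions as F

# Illustrative quality rules applied to an assumed bronze_readings source
RULES = {
    "valid_id": "id IS NOT NULL",
    "non_negative_amount": "amount >= 0",
}

@dlt.table(comment="Rows passing all expectations; failing rows are dropped here")
@dlt.expect_all_or_drop(RULES)
def silver_readings():
    return dlt.read_stream("bronze_readings")

@dlt.table(comment="Quarantine: rows failing at least one rule, kept for triage")
def quarantined_readings():
    failed = " OR ".join(f"NOT ({rule})" for rule in RULES.values())
    return dlt.read_stream("bronze_readings").where(F.expr(failed))
```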
Warehousing & SQL
- BigQuery / Snowflake / Redshift / Databricks SQL
- Materialized views, partitions, clustering
- Z-ORDER and data skipping
Performance & Cost
- OPTIMIZE, Auto Optimize, file compaction
- Photon, serverless SQL, caching
- Storage tiering & pruning
Security & Compliance
- IAM/KMS, Key Vault, VPC Service Controls
- Row/column masking & PII tokenization
- Audit, lineage, HIPAA/GDPR/HITRUST
DevOps & IaC
- Terraform / CDK / ARM / Deployment Manager
- GitHub Actions CI/CD, Databricks Asset Bundles
- Secrets management & policies
BI & Semantic Layer
- Looker / Power BI / Databricks SQL
- MRR, churn, LTV metric definitions
- Delta Sharing to partners
ML & Feature Pipelines
- MLflow tracking & registry
- Feature Store (batch/online)
- Batch and real-time scoring
Observability & SLAs
- DLT event log & alerts
- Cloud Monitoring / CloudWatch / Log Analytics
- Freshness/error budgets & SLOs
Interoperability
- CDC via Debezium
- Open tables (Delta/Iceberg/Hudi)
- External locations & cross-cloud
Let's Engineer a Reliable Data Platform
Looking for a data engineer to design **Lakehouse** architectures, build **batch + streaming ELT** with **Databricks/Spark/Delta**, and deliver **analytics-ready models** on **GCP/AWS/Azure** with governance and CI/CD?