Shiv Hari Baral

Data Engineer • Lakehouse & Streaming

I design GCP-first, multi-cloud data platforms (batch and streaming) using Databricks, Apache Spark, and Delta Lake. They are governed with Unity Catalog, automated with Airflow/Composer and dbt, and optimized for BI & ML.

Python | SQL | Apache Spark | Databricks | Delta Lake | Kafka | Airflow / Composer | dbt | BigQuery • Redshift • Synapse | Terraform | GCP • AWS • Azure

Technical Expertise

🌊

Lakehouse & Streaming

Apache Spark & PySpark: 95%

Databricks & Delta Lake: 93%

Structured Streaming: 90%

Kafka / PubSub / Kinesis: 90%

🧩

Orchestration & ELT

Airflow / Cloud Composer / Workflows: 92%

dbt & Dataform: 88%

Auto Loader & CDC (Debezium, CDF): 86%

Batch + Streaming Pipelines: 94%

☁️

Cloud & Warehouses

GCP (BigQuery, GCS, Dataflow): 92%

AWS (S3, Glue, Redshift): 86%

Azure (Synapse, ADF, ADLS): 84%

Snowflake: 85%

🛡️

Governance & Data Quality

Unity Catalog & RBAC: 88%

Great Expectations & dbt tests: 90%

Lineage & Data Catalog: 89%

HIPAA / GDPR / HITRUST: 87%

🚀

DevOps & IaC

Terraform & CDK: 88%

GitHub Actions / CI-CD: 90%

Docker & Kubernetes (GKE/EKS/AKS): 84%

Observability (Cloud Monitoring / CloudWatch): 85%

📊

BI & Analytics Enablement

SQL & Data Modeling (Star/Snowflake): 92%

Looker / Power BI / Databricks SQL: 88%

Materialized Views & Tuning: 90%

Semantic Layer (MRR, churn, LTV): 86%

Featured Data Engineering Projects

GridSense Lakehouse (Streaming + Batch)

Real-time energy telemetry platform on Databricks unifying Kafka streams and batch drops into Delta Lake with a Bronze-Silver-Gold medallion model.

Operational Metrics

Throughput: 1.2M events/sec

Latency: < 500 ms

Uptime: 99.9% SLA

Technical Implementation

▹Kafka → Databricks Structured Streaming (exactly-once)
▹Delta Live Tables with expectations & Change Data Feed
▹Gold marts served via Databricks SQL and Looker
▹OPTIMIZE + ZORDER; Unity Catalog RBAC and lineage
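
Exactly-once delivery in a streaming pipeline generally depends on a replayable source with checkpointed offsets plus a sink that ignores replayed batches. The sketch below illustrates only the second half in plain Python, assuming each micro-batch carries a monotonically increasing id. `IdempotentSink` and the sample meter records are hypothetical illustrations, not code from the GridSense platform.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Toy sink that applies each micro-batch at most once, keyed by batch id."""

    rows: list = field(default_factory=list)
    last_committed_batch: int = -1

    def write_batch(self, batch_id: int, records: list) -> bool:
        # A batch whose id was already committed is a replay (for example,
        # a retry after a failure), so it is skipped instead of appended.
        if batch_id <= self.last_committed_batch:
            return False
        self.rows.extend(records)
        self.last_committed_batch = batch_id
        return True


# Hypothetical telemetry records; batch 1 is delivered twice on purpose.
sink = IdempotentSink()
sink.write_batch(0, [{"meter": "m1", "kwh": 1.2}])
sink.write_batch(1, [{"meter": "m2", "kwh": 0.7}])
replayed = sink.write_batch(1, [{"meter": "m2", "kwh": 0.7}])
```

In this toy version the data append and the batch-id update happen in one in-memory step; a real sink would need to commit both atomically for the guarantee to hold.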

RevOps ELT on GCP (dbt + Composer)

End-to-end ELT for product, billing, and marketing data (HubSpot, Stripe, web events) into BigQuery for ARR/MRR, churn, and LTV dashboards.

Operational Metrics

Freshness: 5 min

Cost reduction: 45%

dbt tests: 350+

Technical Implementation

▹Fivetran ingestion + dbt models (semantic layer for KPIs)
▹Airflow/Cloud Composer orchestration with backfills
▹Partitioning & clustering; materialized views in BigQuery
▹Great Expectations for data quality + Slack alerts
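
A semantic layer for KPIs comes down to agreeing on one definition per metric. The plain-Python sketch below shows one possible pair of definitions, assuming MRR is the sum of normalized monthly prices of active subscriptions and customer churn is the share of customers active at the start of a period who are no longer active at its end. The `Subscription` type, the function names, and the sample data are hypothetical; the definitions used in the actual dbt models may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    customer_id: str
    monthly_amount: float       # price normalized to one month
    start: date
    end: Optional[date] = None  # None means the subscription is still active


def is_active(sub: Subscription, on: date) -> bool:
    # Active from the start date up to, but not including, the end date.
    return sub.start <= on and (sub.end is None or sub.end > on)


def mrr(subs: list, on: date) -> float:
    # Monthly recurring revenue: sum of monthly prices of active subscriptions.
    return sum(s.monthly_amount for s in subs if is_active(s, on))


def customer_churn_rate(subs: list, period_start: date, period_end: date) -> float:
    # Share of customers active at period start that are inactive at period end.
    at_start = {s.customer_id for s in subs if is_active(s, period_start)}
    at_end = {s.customer_id for s in subs if is_active(s, period_end)}
    if not at_start:
        return 0.0
    return len(at_start - at_end) / len(at_start)


# Hypothetical sample data.
subs = [
    Subscription("a", 100.0, date(2024, 1, 1)),
    Subscription("b", 50.0, date(2024, 1, 15), date(2024, 2, 10)),
    Subscription("c", 25.0, date(2024, 2, 5)),
]
feb_mrr = mrr(subs, date(2024, 2, 1))
mar_mrr = mrr(subs, date(2024, 3, 1))
feb_churn = customer_churn_rate(subs, date(2024, 2, 1), date(2024, 3, 1))
```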

FHIR/HL7 Healthcare Ingestion (Multi-Cloud)

HIPAA-compliant pipelines normalizing EHR, claims, and device data across AWS, Azure, and GCP into curated analytics layers.

Operational Metrics

Daily volume: 5B+ rows/day

PII leaks: 0 incidents

Query speedup: 3x

Technical Implementation

▹Auto Loader to Delta Lake; schema enforcement & evolution
▹SCD2 patient registry via MERGE; CDC from Debezium
▹Power BI, Looker, and QuickSight on Gold datasets
▹Terraform + CI/CD (GitHub Actions) for jobs and DLT
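
An SCD2 (slowly changing dimension, type 2) merge keeps history by closing the current row and appending a new version whenever a tracked attribute changes. The sketch below shows that logic in plain Python rather than as a Delta Lake `MERGE`, assuming a single tracked attribute (`address`). `PatientVersion`, `scd2_merge`, and the sample records are hypothetical and not taken from the project.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PatientVersion:
    patient_id: str
    address: str
    valid_from: date
    valid_to: Optional[date] = None
    is_current: bool = True


def scd2_merge(history: list, updates: dict, as_of: date) -> list:
    """Apply updates (patient_id -> new address) to an SCD2 history."""
    result = []
    changed = set()
    for row in history:
        new_address = updates.get(row.patient_id)
        if row.is_current and new_address is not None and new_address != row.address:
            # Close the current version instead of overwriting it.
            result.append(replace(row, valid_to=as_of, is_current=False))
            changed.add(row.patient_id)
        else:
            result.append(row)
    known = {row.patient_id for row in history}
    for patient_id, address in updates.items():
        # Append a new current version for changed or previously unseen patients.
        if patient_id in changed or patient_id not in known:
            result.append(PatientVersion(patient_id, address, as_of))
    return result


# Hypothetical sample: p1 moves, p2 is new.
history = [PatientVersion("p1", "Old St", date(2023, 1, 1))]
merged = scd2_merge(history, {"p1": "New Ave", "p2": "Elm Rd"}, date(2024, 6, 1))
```

Updates that repeat the current value leave the history untouched, so re-running the same change batch does not create duplicate versions.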

Modern Data Engineering Capabilities

🧱

Lakehouse Architecture

Delta Lake with ACID
Medallion (Bronze/Silver/Gold)
Unity Catalog governance

⚡

Streaming Ingestion

Kafka / PubSub / Kinesis
Structured Streaming (exactly-once)
Watermarking & late-arrival handling

🧩

Orchestration & ELT

Delta Live Tables (DLT)
Airflow / Cloud Composer / Workflows
dbt / Dataform with backfills

🛡️

Data Quality & Lineage

Great Expectations / dbt tests
DLT expectations & quarantine
Change Data Feed (CDF) lineage

🏗️

Warehousing & SQL

BigQuery / Snowflake / Redshift / Databricks SQL
Materialized views, partitions, clustering
Z-ORDER and data skipping

🚀

Performance & Cost

OPTIMIZE, Auto Optimize, file compaction
Photon, serverless SQL, caching
Storage tiering & pruning

🔒

Security & Compliance

IAM/KMS, Key Vault, VPC Service Controls
Row/column masking & PII tokenization
Audit, lineage, HIPAA/GDPR/HITRUST

🛠️

DevOps & IaC

Terraform / CDK / ARM / Deployment Manager
GitHub Actions CI/CD, Bundles
Secrets management & policies

📊

BI & Semantic Layer

Looker / Power BI / Databricks SQL
MRR, churn, LTV metric definitions
Delta Sharing to partners

🤖

ML & Feature Pipelines

MLflow tracking & registry
Feature Store (batch/online)
Batch and real-time scoring

🔭

Observability & SLAs

DLT event log & alerts
Cloud Monitoring / CloudWatch / Log Analytics
Freshness/error budgets & SLOs

🤝

Interoperability

CDC via Debezium
Open tables (Delta/Iceberg/Hudi)
External locations & cross-cloud

Let's Engineer a Reliable Data Platform

Looking for a data engineer to design **Lakehouse** architectures, build **batch + streaming ELT** with **Databricks/Spark/Delta**, and deliver **analytics-ready models** on **GCP/AWS/Azure** with governance and CI/CD?

Milpitas, CA

Databricks | Delta Lake | Apache Spark | Kafka | Airflow | dbt | BigQuery | Terraform | Unity Catalog