Shiv Hari Baral

Data Engineer • Lakehouse & Streaming

I design GCP-first, multi-cloud data platforms for batch and streaming workloads using Databricks, Apache Spark, and Delta Lake. They are governed with Unity Catalog, automated with Airflow/Composer and dbt, and optimized for BI & ML.

Python • SQL • Apache Spark • Databricks • Delta Lake • Kafka • Airflow / Composer • dbt • BigQuery • Redshift • Synapse • Terraform • GCP • AWS • Azure

Technical Expertise

🌊

Lakehouse & Streaming

Apache Spark & PySpark: 95%
Databricks & Delta Lake: 93%
Structured Streaming: 90%
Kafka / Pub/Sub / Kinesis: 90%
🧩

Orchestration & ELT

Airflow / Cloud Composer / Workflows: 92%
dbt & Dataform: 88%
Auto Loader & CDC (Debezium, CDF): 86%
Batch + Streaming Pipelines: 94%
☁️

Cloud & Warehouses

GCP (BigQuery, GCS, Dataflow): 92%
AWS (S3, Glue, Redshift): 86%
Azure (Synapse, ADF, ADLS): 84%
Snowflake: 85%
🛡️

Governance & Data Quality

Unity Catalog & RBAC: 88%
Great Expectations & dbt tests: 90%
Lineage & Data Catalog: 89%
HIPAA / GDPR / HITRUST: 87%
🚀

DevOps & IaC

Terraform & CDK: 88%
GitHub Actions / CI-CD: 90%
Docker & Kubernetes (GKE/EKS/AKS): 84%
Observability (Cloud Monitoring / CloudWatch): 85%
📊

BI & Analytics Enablement

SQL & Data Modeling (Star/Snowflake): 92%
Looker / Power BI / Databricks SQL: 88%
Materialized Views & Tuning: 90%
Semantic Layer (MRR, churn, LTV): 86%

Featured Data Engineering Projects

GridSense Lakehouse (Streaming + Batch)

Real-time energy telemetry platform on Databricks unifying Kafka streams and batch drops into Delta Lake with a Bronze-Silver-Gold medallion model.

Operational Metrics

  • Throughput: 1.2M events/sec
  • Latency: < 500 ms
  • Uptime: 99.9% SLA

Technical Implementation

  • Kafka → Databricks Structured Streaming (exactly-once)
  • Delta Live Tables with expectations & Change Data Feed
  • Gold marts served via Databricks SQL and Looker
  • OPTIMIZE + ZORDER; Unity Catalog RBAC and lineage

RevOps ELT on GCP (dbt + Composer)

End-to-end ELT for product, billing, and marketing data (HubSpot, Stripe, web events) into BigQuery for ARR/MRR, churn, and LTV dashboards.

Operational Metrics

  • Freshness: 5 min
  • Cost reduction: 45%
  • Data tests: 350+ dbt tests

Technical Implementation

  • Fivetran ingestion + dbt models (semantic layer for KPIs)
  • Airflow/Cloud Composer orchestration with backfills
  • Partitioning & clustering; materialized views in BigQuery
  • Great Expectations for data quality + Slack alerts

FHIR/HL7 Healthcare Ingestion (Multi-Cloud)

HIPAA-compliant pipelines normalizing EHR, claims, and device data across AWS, Azure, and GCP into curated analytics layers.

Operational Metrics

  • Volume: 5B+ rows/day
  • PII leak incidents: 0
  • Query speedup: 3x

Technical Implementation

  • Auto Loader to Delta Lake; schema enforcement & evolution
  • SCD2 patient registry via MERGE; CDC from Debezium
  • Power BI, Looker, and QuickSight on Gold datasets
  • Terraform + CI/CD (GitHub Actions) for jobs and DLT
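The SCD2 pattern behind a patient registry can be illustrated without Spark. This is a minimal, framework-free sketch, not the production MERGE: it assumes a hypothetical registry of rows keyed by `patient_id`, with one tracked attribute (`address`) plus `valid_from`, `valid_to`, and `is_current` columns. It reproduces the two actions a Delta `MERGE INTO` would perform: when a current row's tracked attribute changes, close that version and insert a new current one; when a key is unseen, insert it as current.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RegistryRow:
    patient_id: str
    address: str                     # hypothetical tracked attribute
    valid_from: date
    valid_to: Optional[date] = None  # None means the version is still open
    is_current: bool = True


def scd2_merge(registry: list[RegistryRow], updates: dict[str, str],
               as_of: date) -> list[RegistryRow]:
    """Apply SCD Type 2 semantics: expire changed versions, append new current ones."""
    result: list[RegistryRow] = []
    seen: set[str] = set()
    for row in registry:
        new_address = updates.get(row.patient_id)
        if row.is_current:
            seen.add(row.patient_id)
            if new_address is not None and new_address != row.address:
                # WHEN MATCHED AND attribute changed: close out the current version...
                result.append(replace(row, valid_to=as_of, is_current=False))
                # ...and insert the new version as current.
                result.append(RegistryRow(row.patient_id, new_address, as_of))
                continue
        # Historical rows and unchanged current rows pass through untouched.
        result.append(row)
    # WHEN NOT MATCHED: brand-new keys become current rows.
    for patient_id, address in updates.items():
        if patient_id not in seen:
            result.append(RegistryRow(patient_id, address, as_of))
    return result
```

Re-applying an update whose attribute has not changed leaves the registry as it was, which is the idempotence property that makes retried or backfilled batches safe.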

Modern Data Engineering Capabilities

🧱

Lakehouse Architecture

  • Delta Lake with ACID
  • Medallion (Bronze/Silver/Gold)
  • Unity Catalog governance

Streaming Ingestion

  • Kafka / Pub/Sub / Kinesis
  • Structured Streaming (exactly-once)
  • Watermarking & late-arrival handling
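Watermarking, as used by Spark's `withWatermark(eventTime, delayThreshold)`, can be modeled in plain Python. This sketch is a simplification and the class name is hypothetical: the watermark is the maximum event time seen minus the allowed lateness, events older than the watermark are dropped, and a tumbling window is emitted once the watermark reaches its end. Spark advances the watermark between micro-batches rather than per event, and it only guarantees that data within the threshold is kept; this model drops late data deterministically.

```python
from collections import defaultdict
from typing import Optional


class WatermarkedWindowCounter:
    """Tumbling-window event counts with a watermark (simplified, per-event model)."""

    def __init__(self, window_size: int, allowed_lateness: int):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time: Optional[int] = None
        self.open_windows: dict[int, int] = defaultdict(int)  # window start -> count
        self.dropped = 0

    @property
    def watermark(self) -> Optional[int]:
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process(self, event_time: int) -> list[tuple[int, int]]:
        """Ingest one event; return any (window_start, count) pairs that just closed."""
        wm = self.watermark
        if wm is not None and event_time < wm:
            self.dropped += 1  # too late: older than the current watermark
            return []
        window_start = event_time - event_time % self.window_size
        self.open_windows[window_start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return self._emit_closed()

    def _emit_closed(self) -> list[tuple[int, int]]:
        wm = self.watermark
        closed = sorted(s for s in self.open_windows if s + self.window_size <= wm)
        return [(s, self.open_windows.pop(s)) for s in closed]
```

With `window_size=10` and `allowed_lateness=5`, an event at time 3 arriving after an event at time 12 is dropped (watermark 7), and window `[0, 10)` is emitted once an event at time 16 pushes the watermark to 11.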
🧩

Orchestration & ELT

  • Delta Live Tables (DLT)
  • Airflow / Cloud Composer / Workflows
  • dbt / Dataform with backfills
🛡️

Data Quality & Lineage

  • Great Expectations / dbt tests
  • DLT expectations & quarantine
  • Change Data Feed (CDF) lineage
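The expectations-plus-quarantine idea can be sketched in plain Python. This mirrors the concept behind DLT expectations routed to a quarantine table, not the DLT API itself; the rule names and the `meter_id` and `kwh` columns are hypothetical. Every rule is evaluated per row, passing rows go to the valid output, and failing rows are kept with the names of the rules they broke so they can be inspected and replayed instead of silently dropped.

```python
from typing import Any, Callable

Row = dict[str, Any]
Rule = Callable[[Row], bool]


def apply_expectations(rows: list[Row], rules: dict[str, Rule]) -> tuple[list[Row], list[Row]]:
    """Route each row to `valid` or `quarantine`; quarantined rows record failed rules."""
    valid: list[Row] = []
    quarantine: list[Row] = []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            quarantine.append({**row, "_failed_expectations": failed})
        else:
            valid.append(row)
    return valid, quarantine


# Hypothetical rules for a telemetry table.
rules: dict[str, Rule] = {
    "valid_meter_id": lambda r: r.get("meter_id") is not None,
    "non_negative_kwh": lambda r: r.get("kwh") is not None and r["kwh"] >= 0,
}
```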
🏗️

Warehousing & SQL

  • BigQuery / Snowflake / Redshift / Databricks SQL
  • Materialized views, partitions, clustering
  • Z-ORDER and data skipping
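Z-ORDER helps data skipping by sorting rows along a space-filling curve, so each file's min/max statistics stay narrow on several columns at once. The toy model below is an illustration of that idea, not Delta Lake's actual implementation: it interleaves the bits of two integer columns into a Morton (Z-order) key, splits the sorted rows into hypothetical eight-row "files" with per-file min/max stats, and counts how many files a point filter on each column must read.

```python
from itertools import product


def morton_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits land on even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits land on odd positions
    return key


def file_stats(rows, rows_per_file):
    """Split sorted rows into 'files' and keep (min, max) per column for each file."""
    stats = []
    for i in range(0, len(rows), rows_per_file):
        chunk = rows[i:i + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        stats.append(((min(xs), max(xs)), (min(ys), max(ys))))
    return stats


def files_to_scan(stats, column, value):
    """Data skipping: count files whose [min, max] range for `column` could hold `value`."""
    return sum(1 for s in stats if s[column][0] <= value <= s[column][1])


rows = list(product(range(8), range(8)))  # 64 (x, y) rows on an 8x8 grid
linear = file_stats(sorted(rows), rows_per_file=8)
zorder = file_stats(sorted(rows, key=lambda r: morton_2d(*r, bits=3)), rows_per_file=8)

for name, stats in [("linear", linear), ("z-order", zorder)]:
    print(name, "x==3:", files_to_scan(stats, 0, 3), "y==3:", files_to_scan(stats, 1, 3))
# linear x==3: 1 y==3: 8
# z-order x==3: 4 y==3: 2
```

A linear sort is ideal for the leading column but useless for the second; the Z-order layout gives up a little on `x` to prune most files for `y` as well.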
🚀

Performance & Cost

  • OPTIMIZE, Auto Optimize, file compaction
  • Photon, serverless SQL, caching
  • Storage tiering & pruning
🔒

Security & Compliance

  • IAM/KMS, Key Vault, VPC Service Controls
  • Row/column masking & PII tokenization
  • Audit, lineage, HIPAA/GDPR/HITRUST
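Column masking and deterministic PII tokenization can be sketched with the standard library. This is an illustration under stated assumptions, not a specific product's scheme: tokens are HMAC-SHA256 digests under a secret key (in practice the key would come from a KMS or Key Vault, never from code), so the same input always maps to the same token and joins across tables still work. This is non-reversible pseudonymization; vault-style tokenization that supports detokenization would also need a secured lookup store. The function names and the `tok_` prefix are hypothetical.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Deterministic, non-reversible token: same value + key -> same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:32]


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Display masking: keep only the last `visible` characters (mask short values fully)."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]
```

For example, `mask("123-45-6789")` yields `*******6789`, while `tokenize` gives a stable key for analytics that reveals nothing about the original value without the secret.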
🛠️

DevOps & IaC

  • Terraform / CDK / ARM / Deployment Manager
  • GitHub Actions CI/CD, Databricks Asset Bundles
  • Secrets management & policies
📊

BI & Semantic Layer

  • Looker / Power BI / Databricks SQL
  • MRR, churn, LTV metric definitions
  • Delta Sharing to partners
🤖

ML & Feature Pipelines

  • MLflow tracking & registry
  • Feature Store (batch/online)
  • Batch and real-time scoring
🔭

Observability & SLAs

  • DLT event log & alerts
  • Cloud Monitoring / CloudWatch / Log Analytics
  • Freshness/error budgets & SLOs
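Error budgets turn an SLO into arithmetic: the budget is `1 - SLO`, so a 99.9% target over 30 days allows about 43.2 minutes of bad time (0.001 × 43,200 minutes). The sketch below is a minimal illustration with hypothetical function names; it treats each freshness probe as "good" when the table's lag is within a threshold and reports what share of the budget has been spent (1.0 means exhausted).

```python
def allowed_bad_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of SLO violation the error budget permits over the window."""
    return (1.0 - slo) * window_days * 24 * 60


def error_budget_consumed(good: int, total: int, slo: float) -> float:
    """Observed bad ratio divided by allowed bad ratio (1.0 = budget exhausted)."""
    if total == 0:
        return 0.0
    bad_ratio = (total - good) / total
    return bad_ratio / (1.0 - slo)


def freshness_budget(lags_minutes: list[float], threshold_minutes: float, slo: float) -> float:
    """Freshness SLI: a probe is good if the measured lag is within the threshold."""
    good = sum(1 for lag in lags_minutes if lag <= threshold_minutes)
    return error_budget_consumed(good, len(lags_minutes), slo)
```

With ten probes, two of them over a 5-minute freshness threshold, a 90% SLO has consumed twice its budget, which is the kind of signal that would page on-call or pause risky deploys.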
🤝

Interoperability

  • CDC via Debezium
  • Open tables (Delta/Iceberg/Hudi)
  • External locations & cross-cloud

Let's Engineer a Reliable Data Platform

Looking for a data engineer to design **Lakehouse** architectures, build **batch + streaming ELT** with **Databricks/Spark/Delta**, and deliver **analytics-ready models** on **GCP/AWS/Azure** with governance and CI/CD?

Milpitas, CA
Databricks • Delta Lake • Apache Spark • Kafka • Airflow • dbt • BigQuery • Terraform • Unity Catalog