Designed and built a consolidated enterprise data lake on AWS EMR and Apache Iceberg for a major research university, using an auto-relationalization engine to turn complex on-premises Workday and PeopleSoft data into queryable relational tables.
- Designed and built an auto-relationalization engine in Python and Spark that automatically converts complex Workday and PeopleSoft XML into queryable Iceberg tables, end to end.
- Replaced Airflow with an event-driven Step Functions / EventBridge / Lambda architecture for the Student Data Warehouse, with dynamic DAG generation and no orchestration server to maintain.
- Stood up a Bedrock-powered Lambda for autonomous CloudWatch metrics analysis; reduced API call volume by 98% through BU-filtered queries.
- Implemented complete infrastructure-as-code using Terraform: 50+ Lambda functions, 20+ Step Functions workflows, and a full data governance framework.