Vanshaj Gupta

Hi, my name is Vanshaj Gupta.

Data Engineer | Software Engineer.

Curiously captivated by the mysteries hidden in data and systems.

I approach problems like puzzles - solving them piece by piece to build scalable solutions that drive meaningful business impact. My passions span AI, automation, data engineering, and software design. Outside of tech, you'll find me solving Rubik's cubes, exploring entrepreneurship, following sports, or diving into music.

Let's connect if any of these resonate with you - I'm always up for a great conversation.

LinkedIn
Email
GitHub
Tableau

Experience

My professional journey in data engineering and analytics

6 months
Jun 2025 - Present

Business and Data Analyst II

Arizona State University, Learning Enterprise

  • Automated feedback data processing for 60+ courses by designing an ETL pipeline with Apache Airflow, Python and Google Cloud to ingest API data into BigQuery, saving the design team 30+ hours per term
  • Collaborated with 7+ clients to improve revenue reporting by developing an end-to-end Alteryx workflow, reducing invoice errors by 75% and manual reporting effort by 95%
  • Led data architecture design by creating materialized views and data models in BigQuery using SQL, enabling 4+ Looker Studio dashboards that improved business intelligence reporting and decision-making

1 year 5 months
Jan 2024 - Jun 2025

Data Engineer

Arizona State University, Learning Enterprise

  • Performed data modeling on 300K+ Salesforce records in AWS Redshift using Airflow and Python, automating weekly pipelines that reduced operational cost and produced 4+ customized files to support team OKR reporting
  • Enabled self-service analytics for 7+ cross-functional teams by building 12+ Tableau and Looker Studio dashboards with automated refresh schedules and interactive data visualizations, decreasing monthly data team requests by 65%
  • Contributed to data architecture by migrating 8+ database tables from a star schema to a relational schema in AWS Redshift
  • Improved data quality by developing Python and SQL validation scripts on AWS Redshift and PostgreSQL, reducing incorrect insights and system failures by 70%, while strengthening data governance through code reviews and schema documentation
  • Accelerated monthly and quarterly reporting cycles by automating 30+ business intelligence reports across Salesforce, Excel, and Google Sheets, providing stakeholders with timely actionable insights to guide business decisions

3 months
Feb 2023 - May 2023

Data Engineer

GCS Medical College, Hospital and Research Centre

  • Enabled accurate healthcare analytics by automating ETL workflows with Apache Airflow and Python to ingest 500K+ electronic healthcare records into PostgreSQL, improving data quality by 35% for downstream reporting
  • Supported targeted healthcare campaigns by developing demographic-based Power BI dashboards with SQL DirectQuery, increasing patient engagement by 25% and expanding reach across 10K+ patients
  • Designed a financial analytics dashboard in Power BI for 100+ clients across 20+ services, monitoring revenue trends and equipping finance teams with insights for quarterly strategy reviews

Projects

Featured work and data projects

Stark: A Personalized AI Data Analyst

An intelligent data analysis Slack Bot that uses LLMs to provide personalized insights for EdTech platforms. Built with Python, AWS, and Google Sheets integration.

LLM, Python, Slack, AWS, Google Sheets

Cloud Based Face Recognition System

End-to-end data pipeline processing real-time sales data from multiple sources. Implemented using AWS Lambda, Kinesis, and Redshift.

AWS, Python

Top 50 Highest Paid Athletes

Interactive Tableau dashboard analyzing compensation trends across different sports. Features dynamic filtering and year-over-year comparisons.

Tableau, Excel

HR Analytics Dashboard

Machine learning model predicting customer churn with 92% accuracy. Built with Python, scikit-learn, and deployed on AWS SageMaker.

Tableau, Excel

Super Store Sales Report

Machine learning model predicting customer churn with 92% accuracy. Built with Python, scikit-learn, and deployed on AWS SageMaker.

Power BI, Excel

The Godfather of Cinema

Machine learning model predicting customer churn with 92% accuracy. Built with Python, scikit-learn, and deployed on AWS SageMaker.

Tableau

Skills

Tools and technologies I work with

Programming Languages

Python
SQL
Java
Scala
Shell Scripting
Unix/Linux
JavaScript
HTML
CSS

Data Visualization

Tableau
Power BI
Looker Studio

Data Engineering

Apache Airflow
Pandas
NumPy
PySpark
Snowflake
Apache Spark
Kafka
Hadoop
ETL/ELT

Databases and Query Tools

PostgreSQL
AWS Redshift
BigQuery
SQL Server
MySQL
Oracle
MongoDB
Data Modeling

Cloud and DevOps

AWS
GCP
Docker
Azure Data Factory
Kubernetes
Git
GitHub Actions
CI/CD

Certifications

Professional credentials and achievements

Alteryx Designer Core Certification

Alteryx

March 2025

Certified in Alteryx Designer with expertise in data preparation, blending, and analytics workflow automation to solve real-world business problems

Hands-On Essentials: Data Warehouse

Snowflake

April 2025

Completed Snowflake Hands-On Essentials workshop with practical experience in data warehousing fundamentals and cloud data platform operations


Let's Connect

I'm always interested in hearing about new opportunities, collaborations, or just having a chat about data. Feel free to reach out!