Shuvo Barman

Data Engineer & Analyst

Automating economic data workflows with Python & Azure at Environment & Climate Change Canada. MSc Computer Science · Building the infrastructure that turns raw data into decisions.

Python & SQL · DataOps & CI/CD · Cloud Infrastructure · Distributed Processing · Data Governance & Quality
85% Faster Data Processing
4K+ Fintech Transactions / Day
6+ Years Experience
M.Sc. in Computer Science

Building data infrastructure that drives decisions.

I'm a data and software professional with an M.Sc. in Computer Science, passionate about data engineering and analytics. My career has taken me from building Android healthcare applications in Bangladesh, to processing financial transactions at scale in fintech, and now to designing and automating data workflows for Canada's federal government.

At Environment and Climate Change Canada, I build end-to-end data pipelines in Azure DevOps, develop Power BI dashboards for policy and analysis teams, and write Python and Julia tools that replace slow manual processes with reliable, automated systems. My official title is Junior Economist, but the work is engineering through and through.

I'm actively seeking opportunities in Data Engineering where I can design scalable pipelines, build clean data infrastructure, and help teams spend less time moving data and more time using it.

85% reduction in data processing time after architecting Azure DevOps pipelines with YAML-based build and release definitions at ECCC
99.99% accuracy on an automated Python system processing 4,000+ daily cashback transactions at Upay, replacing an entirely manual process
40% improvement in stakeholder response time after delivering real-time Power BI dashboards to decision-makers at ECCC
Currently

Junior Economist, Data & Automation · Environment and Climate Change Canada · Gatineau, QC

Data Pipelines & Orchestration
ETL / ELT Pipelines · Apache Airflow · Azure DevOps · CI/CD · Batch Processing & Ingestion
Data Processing & Transformation
Python · SQL · PySpark · Julia · Pandas · NumPy · Data Wrangling · Data Modeling
Cloud, Storage & Databases
Microsoft Azure · Amazon Web Services (AWS) · Azure Blob Storage · Data Lake · Data Warehouse · PostgreSQL · Vector Databases · NoSQL · File Systems · Parquet
Business Intelligence
Power BI · DAX · Tableau · Data Visualization · Matplotlib
Data Quality & Governance
Schema Validation · Data Quality Testing · Great Expectations · Ensuring Data Integrity · Data Management/Governance
Machine Learning
Scikit-learn · Feature Engineering · Predictive Modeling · Customer Segmentation · Statistical Analysis

Projects

Python · Scikit-Surprise · SVD · KNNWithMeans · Pandas

Amazon Beauty Product Recommendation System

Implemented and compared five recommendation algorithms on Amazon's Beauty product dataset to tackle cold-start and data sparsity challenges in real-world recommender systems.

  • Built demographic (weighted average), rank-based, item-based (KNNWithMeans), and matrix factorisation models (SVD, SVD++) evaluated with 5-fold cross-validation
  • SVD achieved best cold-start performance (RMSE: 0.72, MAE: 0.007); user-based SVD performed best on sparse data (RMSE: 1.25, MAE: 0.97)
  • Addressed the sparsity problem by reducing the dataset to users with 5+ ratings, bringing interaction density to 0.28%, a realistic production scenario
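The filtering and density figures above can be illustrated with a minimal sketch. It assumes ratings arrive as (user, item, score) tuples; the function names and toy data are hypothetical and are not taken from the project repository.

```python
from collections import Counter


def filter_active_users(ratings, min_ratings=5):
    """Keep only ratings from users who have at least `min_ratings` ratings."""
    counts = Counter(user for user, _item, _score in ratings)
    return [r for r in ratings if counts[r[0]] >= min_ratings]


def interaction_density(ratings):
    """Share of the user x item matrix that holds an observed rating."""
    users = {user for user, _item, _score in ratings}
    items = {item for _user, item, _score in ratings}
    if not users or not items:
        return 0.0
    pairs = {(user, item) for user, item, _score in ratings}
    return len(pairs) / (len(users) * len(items))


# Toy data (hypothetical): user "a" rates five items, user "b" rates one.
toy = [("a", f"item{i}", 4.0) for i in range(5)] + [("b", "item0", 2.0)]
kept = filter_active_users(toy)
print(f"before: {interaction_density(toy):.2%}, after: {interaction_density(kept):.2%}")
```

Dropping low-activity users shrinks the user axis faster than it removes ratings, which is why density rises after the filter.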
Python · EDA · Feature Engineering · KNN Regression · Linear Regression

New York Airbnb: EDA & Price Prediction

Analysed the September 2022 NYC Airbnb listings dataset to identify price-driving features and build a listing price prediction model using supervised regression.

  • Used ML-based imputation to handle outliers and missing values: KNN with Robust Scaler for minimum nights, Linear Regression with MaxAbs Scaler for review scores
  • Applied Mutual Information scoring across features to select the 14 most price-relevant attributes; compared 5 scaling techniques to optimise model input
  • Identified host listing count, review scores, and room type as the strongest predictors of listing price
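To make the Mutual Information ranking concrete, here is a minimal sketch for discrete (already binned) features. It is an assumption-laden illustration rather than the project's code: the analysis may well have used a library estimator for continuous prices, and the column names and values below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), joint in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (joint / n) * math.log(joint * n / (px[x] * py[y]))
    return mi


def top_k_features(columns, target, k):
    """Rank feature columns by mutual information with the target; return k names."""
    scores = {name: mutual_information(values, target) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical binned data: room type tracks the price band, "noise" does not.
price_band = ["low", "low", "high", "high"]
columns = {
    "room_type": ["shared", "shared", "entire", "entire"],
    "noise": ["x", "y", "x", "y"],
}
print(top_k_features(columns, price_band, k=1))
```

A feature that fully determines the target scores the target's entropy, while an independent feature scores zero, so sorting by score puts the most informative columns first.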
Python · Logistic Regression · KNN · K-Fold CV · EDA

Heart Attack Analysis & Prediction

Built a binary classifier to predict heart attack risk from clinical patient data, comparing three algorithms across five scaling techniques to find the best-performing combination.

  • Compared KNN, Logistic Regression with K-Fold CV, and Linear Classification; K-Fold Logistic Regression achieved the highest training accuracy (~90%) and was selected for test prediction
  • Applied 5 scaling methods (MinMax, Z-score, MaxAbs, Robust, Quantile) to evaluate their effect on each model's F-score before choosing the final configuration
  • Achieved 86% accuracy on test data using K-Fold cross-validated Logistic Regression with the liblinear solver
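For readers unfamiliar with the scaling methods named above, the sketch below shows four of the five (MinMax, Z-score, MaxAbs, Robust) on a single numeric column using only the standard library. Quantile scaling is omitted for brevity, the function names are hypothetical, and each function assumes a non-constant column so the denominator is never zero.

```python
import statistics


def min_max(xs):
    """Rescale to the [0, 1] range."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def z_score(xs):
    """Centre on the mean and divide by the population standard deviation."""
    mean, std = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]


def max_abs(xs):
    """Divide by the largest absolute value, keeping the sign."""
    peak = max(abs(x) for x in xs)
    return [x / peak for x in xs]


def robust(xs):
    """Centre on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]


SCALERS = {"minmax": min_max, "zscore": z_score, "maxabs": max_abs, "robust": robust}

# Hypothetical column of clinical measurements.
column = [1, 2, 3, 4, 5]
for name, scale in SCALERS.items():
    print(name, [round(v, 3) for v in scale(column)])
```

Keeping the scalers in a dictionary makes it straightforward to loop over every model and scaler pair and record a score for each combination before picking one.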

Experience

Sep 2023 – Present
Environment & Climate Change Canada
↗ canada.ca/eccc
Junior Economist · May 2024 – Present
  • Scaled the Azure DevOps CI/CD pipeline to process 15 scenarios end-to-end in under 40 minutes, down from hours of manual work using Access database files that could not be shared across the team
  • Engineered the pipeline to extract 58 target .dta files from 460+ files per scenario folder, producing 58 unique Parquet files, 63 master files, and 5 delta comparison files tracking changes across Greenhouse Gas (GHG) Emissions, Air Pollution, Electricity Generation, and Generation Capacity. The outputs cover economic sectors including Oil & Gas, Electricity, Transportation, Heavy Industry, Agriculture, Buildings, and Waste & Others
  • Automated VM cleanup and Azure Blob Storage upload as part of the same pipeline run, ensuring no manual handoff steps between processing and reporting
  • Scaled Power BI reporting to support up to 20 scenarios across 18 stakeholder-facing tabs, each tailored to a specific visual need, replacing an R/Access workflow that could not handle multi-scenario comparison at this scale
  • Built a post-pipeline email alert system in Python that monitors key economic variables and z-driver values against sector-specific thresholds, proactively notifying Economic Sector Leads of significant cross-sector changes after every pipeline run
  • Cut province-specific stakeholder report distribution from 3–4 hours to under 2 minutes using Python automation, covering 13 provinces and territories, 10–30 tabs per workbook, with consistent formatting, bilingual data validation, and dynamic schema handling across all outputs
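The threshold check behind the post-pipeline alerts can be sketched roughly as follows. This is an illustration under stated assumptions, not the production code: the relative-change rule, the default threshold, the function names, and the sample values are all hypothetical, and the email delivery step is left out.

```python
def flag_changes(previous, current, thresholds, default_threshold=0.10):
    """Return (sector, relative_change) pairs whose absolute change exceeds the sector threshold."""
    alerts = []
    for sector, new_value in current.items():
        old_value = previous.get(sector)
        if old_value in (None, 0):
            continue  # no usable baseline to compare against
        change = (new_value - old_value) / abs(old_value)
        if abs(change) > thresholds.get(sector, default_threshold):
            alerts.append((sector, change))
    return alerts


def format_alert(alerts):
    """Render one line per flagged sector for the body of a notification."""
    return "\n".join(f"{sector}: {change:+.1%}" for sector, change in alerts)


# Hypothetical values from two consecutive pipeline runs.
previous = {"Oil & Gas": 100.0, "Electricity": 50.0}
current = {"Oil & Gas": 120.0, "Electricity": 51.0}
thresholds = {"Oil & Gas": 0.15, "Electricity": 0.05}

alerts = flag_changes(previous, current, thresholds)
print(format_alert(alerts))
```

Keeping thresholds in a per-sector mapping lets each sector lead tune sensitivity independently, with a default for sectors that have no explicit entry.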
Junior Economist · Jan 2024 – Apr 2024
  • Built automated Azure ETL pipelines to replace fragmented manual workflows, reducing manual data handling by 70% with zero data integrity issues
  • Developed Python scripts to convert legacy .dta files to Parquet format, enabling direct Power BI integration and eliminating intermediate processing steps
  • Saved Economic Sector Leads 10 hours of manual reporting work per week by delivering Power BI dashboards that automated previously manual weekly scenario runs
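A conversion step like the .dta-to-Parquet one above might look like the following minimal sketch. It assumes pandas with a Parquet engine (pyarrow or fastparquet) is installed; the file and folder names are hypothetical, and the real scripts may handle schemas, naming, and batching differently.

```python
from pathlib import Path


def parquet_path_for(dta_path, out_dir):
    """Map e.g. scenario_a/emissions.dta to out_dir/emissions.parquet."""
    return Path(out_dir) / (Path(dta_path).stem + ".parquet")


def convert_dta_to_parquet(dta_path, out_dir):
    """Read one Stata file and write it as Parquet; returns the output path."""
    import pandas as pd  # imported lazily; also needs pyarrow or fastparquet

    target = parquet_path_for(dta_path, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    pd.read_stata(dta_path).to_parquet(target, index=False)
    return target


# Hypothetical file name, used only to show the path mapping.
print(parquet_path_for("scenario_a/emissions.dta", "parquet_out"))
```

Parquet output of this kind can be loaded by Power BI without an intermediate export step, which matches the integration goal described in the bullets.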
Co-op Student · Sep 2023 – Dec 2023
  • Eliminated manual scientific data conversion by building a Julia package that automated transformation of raw datasets into NeXus/HDF5 format, with full technical documentation for team knowledge transfer
Aug 2021 – Jul 2022
Upay, UCB Fintech Company Limited
Sr. Executive, Revenue Assurance & Business Intelligence
↗ upaybd.com
  • Engineered ETL pipelines using PostgreSQL and Python to process customer transaction data daily, enabling real-time analytics across customer care, marketing, sales, and tech operations teams
  • Replaced a fully manual cashback disbursement process by building an automated Python system handling 4,000+ daily transactions with 99.99% accuracy, reducing processing time from 24 hours to 30 minutes and cutting weekly customer complaints by 25%
  • Built Tableau dashboards tracking revenue performance and operational metrics, giving business teams a single source of truth for day-to-day decision making
  • Developed proof-of-concept ML models for customer churn prediction, next-transaction prediction, and transaction pattern analysis, identifying at-risk customers with 85% accuracy
  • Built customer segmentation models that improved campaign targeting accuracy by 30% and increased weekly transaction volume by 15%
  • Automated financial reporting and stakeholder email alerts using Python, replacing manual report preparation across the revenue assurance team
  • Evaluated campaign plans submitted by product and sales teams against historical data and target benchmarks before go-live, ensuring every campaign launched was grounded in evidence, not assumption
Dec 2018 – Aug 2021
USAID MaMoni Maternal & Newborn Care Project
Java Programmer, Mobile Application · ~3 Years
↗ MaMoni Project, Save the Children Resource Centre
Retained via: Save the Children in Bangladesh (Mar 2020 – Aug 2021); Enroute International Limited (Oct 2019 – Feb 2020); Dnet, a social enterprise (Dec 2018 – Sep 2019)
  • Digitised maternal and child health records for the Government of Bangladesh by converting paper-based registers into a structured mobile application that captures routine checkups, clinical visits, and full health histories for pregnant women and newborns
  • Improved application performance and maintainability by replacing third-party UI libraries with native Android design components, reducing external dependencies and ensuring a consistent user experience across devices
  • Delivered new modules end-to-end, from requirements gathered with system analysts and DGFP stakeholders through to interface design, database schema, and production code
  • Supported nationwide adoption by conducting hands-on training across 10 districts, training 300+ government health workers on using the system in the field
  • Maintained delivery schedules across a multi-organisation project structure by collaborating with project managers, system analysts, and government counterparts on implementation planning

Education

M.Sc. in Computer Science
Memorial University of Newfoundland
Sep 2022 – Apr 2024 · St. John's, NL, Canada
Focused on data analysis, data wrangling, data visualisation, and machine learning. Built hands-on experience applying statistical and computational techniques to real-world datasets across coursework and projects.
B.Sc. in Computer Science & Software Engineering
American International University-Bangladesh (AIUB)
2014 – 2018 · Dhaka, Bangladesh
Studied algorithms, data structures, software development, and computer networks. Built a strong foundation in both software engineering theory and applied development.

Certifications

DataCamp
Data Manipulation with pandas
Python · Data Wrangling
DataCamp
Introduction to Data Visualization with Matplotlib
Python · Data Visualisation
DataCamp
Introduction to NumPy
Python · Scientific Computing


Say Hello

Let's Connect

Whether you have an opportunity, a question, or just want to talk data, my inbox is always open.
