Data Science Experience
Veritas Technologies
Senior Principal Data Scientist (Full-time), Feb 2022 - June 2023
Responsibilities | Projects | Initiatives
Member of the Data, BI and Reporting Services team in corporate IT; roughly 60 percent of the role was hands-on ML work.
Responsibilities
- Lead and coordinate teams of junior data scientists, data architects, and visualization specialists in building machine learning solutions and prototypes for complex business problems in Finance, Talent, and Compliance, as well as for the customer support and product management teams.
- Own the end-to-end development of the data science components of large-scale software engineering projects and ad hoc prototypes, from ideation and scoping through implementation and deployment, using an agile approach.
- Collaborate with business and client teams across functions to define business objectives, identify value drivers and develop metrics to estimate their impact, flag project and business risks, and explore strategic opportunities that emerge from the data.
- Develop solution architectures (blueprints) that break a business problem into parts, prescribe a suitable model for each part, and then recombine the parts into a holistic solution delivered through visualization apps such as PowerBI, Tableau, or Qlik.
- Translate project objectives into work plans that coordinate data preparation and exploration, transformations, feature engineering, model performance and accuracy testing, and techniques such as cross-validation, bagging, and boosting.
- Carry out the full data science process, both individually and as a team member: data preparation and exploration, transformation and feature engineering, modeling, and interpretation.
- Participate in and sometimes lead daily and weekly routines such as scrum calls, storyboard and status updates, task (re)prioritizations, and slippage notifications.
- Present actionable outcomes to business teams, client teams, and executives across functions in non-technical terms, emphasizing business impact, risk mitigation strategies, and new use-case opportunities.
- Mentor and develop junior data scientists in the US, Ukraine, and India in machine learning and predictive analytics best practices, identify and prioritize their developmental needs, and provide input to their performance evaluations.
- Pursue continuous learning and innovative thinking on machine learning and predictive analytics best practices, and present these ideas to research teams, data science and analytics communities of practice (CoPs), and publicly at national and regional conferences.
Projects (Team or individual effort, duration 6-8 months each)
Speeding Up Subscription License Conversion (Supervised, 7 Million Records)
- Built a supervised ML/AI model that estimates each customer's probability of converting to a subscription license based on assorted customer characteristics. These estimates let the Veritas field organization and resellers prioritize customers who are more likely to convert, speeding up the overall conversion rate.
- Mentored an intern in improving the model's prediction accuracy through model parameter tuning and through data preparation for additional predictors sourced from free public data sources.
- Mentored an intern in building a forecast model that serves as a baseline against which actual subscription conversion levels are compared, giving a truer assessment of the supervised model's impact.
- Value proposition: revenue enhancement, given the advantages of subscription licenses over perpetual licenses.
- Skills Used: Apache Spark, PySpark, Spark SQL, Synapse, PowerBI, FB Prophet; Methods Used: Regression, Forecasting.
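The prioritization and impact-assessment steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: `Customer`, `prioritize`, and `incremental_conversions` are hypothetical names, the probabilities are assumed to come from an already-trained model, and the baseline is assumed to be a per-period forecast such as one produced by Prophet.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    p_convert: float  # hypothetical field: model-estimated conversion probability


def prioritize(customers, top_n):
    """Return the top_n customers, highest estimated conversion probability first."""
    return sorted(customers, key=lambda c: c.p_convert, reverse=True)[:top_n]


def incremental_conversions(actual, baseline):
    """Per-period and total difference between actual conversions and a baseline forecast."""
    if len(actual) != len(baseline):
        raise ValueError("actual and baseline must cover the same periods")
    per_period = [a - b for a, b in zip(actual, baseline)]
    return per_period, sum(per_period)


# Illustrative numbers only, not project data.
customers = [Customer("c1", 0.12), Customer("c2", 0.81), Customer("c3", 0.47)]
print([c.customer_id for c in prioritize(customers, 2)])  # ['c2', 'c3']
print(incremental_conversions([120, 135, 150], [110, 118, 125]))  # ([10, 17, 25], 52)
```

Attributing the gap between actual and baseline conversions to the model assumes nothing else that affects conversions changed during the comparison periods.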
Machine Learning / Artificial Intelligence (ML/AI) Roadmap (not hands-on)
- Conducted about 40 meetings with a cross-functional group of corporate and product leaders on their ML/AI activities and plans. The discussions were recorded and transcribed into an archive that formed the basis of the ML/AI roadmap.
- Roadmap development entailed clarifying and synthesizing business objectives and priorities, then mapping them by function and by ML/AI algorithm.
- Used the roadmap to streamline and prioritize ML/AI engineering and application development projects by identifying business objectives, initially designated as "siloed," that were shared across functions.
- Recommended adopting the "explainable AI" library SHAP so that data science staff's skills with it would carry over across algorithms and, by extension, across projects.
Using Market-basket Analysis to Identify Cross-sell Opportunities (Unsupervised, 2 Million Records)
- Directed development of a market-basket model to identify which products are sold together; these pairings form the basis of various up-sell and cross-sell practices that product managers use to increase yield.
- Augmented model predictions with customer data to build an interactive dashboard that lists the specific customers who have purchased one product of a pair but not the other, making the results directly actionable.
- Reused and adapted the code to extend the application to meet the reporting needs of additional product managers.
- Value propositions: higher company revenue and time savings for the sales force.
- Skills Used: Apache Spark, PySpark, Spark SQL, Synapse, PowerBI; Methods Used: Association Rules.
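The association-rule metrics behind a market-basket model (support, confidence, lift) and the "bought one but not the other" customer list can be illustrated for product pairs with the standard library. This is a sketch under simplifying assumptions (pairs only, in-memory baskets, hypothetical function names `pair_rules` and `cross_sell_targets`), not the production Spark job; at scale, Spark MLlib's FP-Growth (`pyspark.ml.fpm.FPGrowth`) is one option for mining itemsets and rules, though which implementation the project used is not stated here.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Support, confidence, and lift for every ordered product pair A -> B."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # count each product once per basket
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both products
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y in basket | x in basket)
            lift = confidence / (item_counts[y] / n)  # confidence relative to y's base rate
            rules[(x, y)] = (support, confidence, lift)
    return rules


def cross_sell_targets(customer_products, have, lacking):
    """Customers who own `have` but not `lacking`: candidates for a cross-sell offer."""
    return sorted(
        cust for cust, products in customer_products.items()
        if have in products and lacking not in products
    )


# Illustrative baskets only.
baskets = [["A", "B"], ["A", "B", "C"], ["A"], ["C"]]
rules = pair_rules(baskets)
print(rules[("A", "B")])  # support 0.5, confidence approx 0.667, lift approx 1.333
print(cross_sell_targets({"c1": {"A", "B"}, "c2": {"A"}, "c3": {"B"}}, "A", "B"))  # ['c2']
```

A lift above 1 means the pair co-occurs more often than the two products' individual purchase rates would predict, which is what makes it a candidate cross-sell opportunity.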
Assessing Customers' Propensity-to-Churn (Supervised, 50-100K Records)
- Validated a supervised model that estimates the likelihood that a customer will churn. Validation uncovered multicollinearity, mishandled missing values, invalid data type conversions, and unbalanced train-test data.
- Improved the model's prediction accuracy by adding a customer engagement (CE) measure as a predictor and clarifying its relationship to propensity to churn. CE also gave CSR personnel more options for acting on the predictions.
- Wrote and shared with the CSR team guidelines on alternative ways to handle missing values. The guidelines were specific enough to be of practical value yet general enough to reuse across projects.
- Value proposition: reduced revenue loss from customer churn.
- Skills Used: Apache Spark, PySpark, Spark SQL, Synapse; Methods Used: Logistic Regression.
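Two of the validation issues above, unbalanced train-test data and multicollinearity, can be screened for with short checks. The sketch below is illustrative, with hypothetical function names and plain-Python data rather than the project's Spark pipeline: a stratified split keeps the churn rate similar in train and test, and a pairwise-correlation screen flags strongly related predictors. Pairwise correlation is only a first pass; multicollinearity involving three or more predictors is better caught with variance inflation factors (VIF).

```python
import random
from math import sqrt


def stratified_split(rows, label_key, test_frac=0.2, seed=42):
    """Split rows so train and test keep roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)  # members is a fresh list built above, safe to shuffle
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def pearson(xs, ys):
    """Pearson correlation; NaN when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")


def collinear_pairs(columns, threshold=0.9):
    """Flag predictor pairs whose absolute correlation is at or above threshold."""
    names = sorted(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:  # NaN compares False, so constant columns are skipped
                flagged.append((a, b, r))
    return flagged


# Illustrative data: 20 churners out of 100 customers.
rows = [{"id": i, "churn": 1 if i < 20 else 0} for i in range(100)]
train, test = stratified_split(rows, "churn")
print(len(train), len(test), sum(r["churn"] for r in test))  # 80 20 4
print(collinear_pairs({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 1, 3, 2]}))
```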
Compliance Product Team Consulting (not hands-on)
Provided consultation to a U.S. product manager and an India-based software engineering team on both project and ad hoc bases.
- Conducted a qualitative assessment of the data to judge whether text mining (semantic extraction) could effectively generate synthetic data through simulation, overcoming the limitations of a small dataset.
- Instructed the engineering team on analytic procedures for data segmentation, interpretation of logistic regression coefficients, and procedures specific to multinomial logistic regression.
- Identified situations in which abandoning traditional programming approaches in favor of relying exclusively on machine learning is appropriate or desirable.
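In a multinomial logistic regression, coefficients are read relative to a reference category: exponentiating a coefficient gives the factor by which the ratio P(class) / P(reference) changes per one-unit increase in that predictor, holding the others fixed (often called a relative risk ratio). The sketch below illustrates this with hypothetical names (`class_probabilities`, `relative_risk_ratios`) and made-up coefficients; it is not the material used with the engineering team.

```python
import math


def class_probabilities(x, coefs, intercepts, reference):
    """Multinomial logit probabilities; the reference class has all-zero coefficients."""
    scores = {reference: 0.0}
    for cls, beta in coefs.items():
        scores[cls] = intercepts[cls] + sum(b * xi for b, xi in zip(beta, x))
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


def relative_risk_ratios(coefs):
    """exp(beta): factor by which P(class) / P(reference) changes per unit of each predictor."""
    return {cls: [math.exp(b) for b in beta] for cls, beta in coefs.items()}


# Made-up model: one predictor, classes "A" (reference), "B", "C".
coefs = {"B": [math.log(2.0)], "C": [0.0]}
intercepts = {"B": 0.0, "C": 0.0}
print(relative_risk_ratios(coefs))  # B's ratio is 2 (up to float rounding), C's is 1
p0 = class_probabilities([0.0], coefs, intercepts, "A")
p1 = class_probabilities([1.0], coefs, intercepts, "A")
print((p1["B"] / p1["A"]) / (p0["B"] / p0["A"]))  # approximately 2.0
```

The last line checks the interpretation numerically: moving the predictor from 0 to 1 doubles the B-versus-A ratio, matching exp(log 2) = 2.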
Initiatives (Individual effort, duration 1-2 months, concurrent with project work)
Data Science Training Program
- Developed training programs built on Udemy online courses and partially tailored to the Azure Spark platform. Subject areas included the machine learning process, selected modeling methods, and Python programming.
- Organized the courses around the learning objectives most commonly stated by each employee persona.