I was ready for my first job in one of Ireland’s biggest banks. I was able to work on impactful projects and I learned an immense amount. I expected to work on the forefront of computer science, statistics and machine learning, applying new methods to drive unique insights. In short, I fell victim to the hype around the profession. I hope that we can get around the hype and improve your understanding of what a data scientist does.

Lesson 1: logistic regression goes a long way

My job involved building credit risk and fraud models. They were used to automate lending on a large scale. I’m talking applications worth billions of euros a year. You may think that, with such high stakes, I would be doing advanced machine learning. I exclusively built models using logistic regression.

From banking to insurance, much of the financial world runs on regression. The performance of regression models was good enough. They are also widely understood and accepted at the bank. A new algorithm would not only have to outperform regression. The improvement would also have to justify the effort of explaining the algorithm.

With regression, I ended up with models that had 8 to 10 features. Each of these features had to be thoroughly explained. A non-technical colleague had to agree they captured a relationship that existed in reality. Black box models would have been more difficult to explain. Sure, I could have used methods like SHAP or PDPs and ICE Plots. The problem is they wouldn’t give me the same level of certainty. I would have also needed to explain the method I used to explain my model.

Leaving uni, I had learned so much about random forests, XGBoost and neural networks. In the first week, I remember one of my senior colleagues saying:

Many data scientists will never need them.

Lesson 2: machine learning has many applications

Less disappointing was the realisation of how useful machine learning is. It sank in when I saw all the applications in the banking industry alone.

Credit risk - predict default due to financial distress.
Fraud - predict if customers do not intend to repay a loan.
Pre-arrears - identify customers in financial distress.
Churn - identify customers who intend to leave the bank.
Marketing - identify the best customers to promote a product to.

These models were used to automate processes across the bank. It gave me the opportunity to create something that could impact the world more than I could have ever done alone.

Lesson 3: working with data is hard work

Building models at university was a breeze: clean datasets, pre-engineered features and automated hyper-parameter tuning. It took me a couple of hours to get 99.9% accuracy. Imagine my surprise when it took a team of 3 of us 8 months to build a credit risk model.

Most of this time went into building our dataset. This does not only include model features. I had to justify all of my modelling decisions. To do so, I included any variables needed for sampling and representation analysis, segmentation analysis, fairness analysis and model evaluation.

I had to build many of these variables from scratch. The underlying data fields were spread across multiple tables with inconsistent documentation (if there was any). If mistakes were made (they were), they would cause a lot of pain down the line (they did). To minimise this, a lot of testing was done.

The issue was that there was nothing to compare my model features to. Testing involved visualising feature trends and validating them with domain knowledge. Does a sudden drop in income make sense? Yes, Covid. It also meant calculating the feature values for a few customers manually.

I didn’t know about this side of data science. It was not the “sexiest job of 2019” I was told about.
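
To make the manual feature checks from Lesson 3 a little more concrete, here is a minimal Python sketch of what such a spot check might look like. Everything in it is an assumption for illustration: the toy income records, the `avg_income_by_customer` feature and the `spot_check` helper are hypothetical and are not the bank's actual data or tooling.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (customer_id, month, income). Invented for illustration.
transactions = [
    ("c1", "2020-01", 3000.0), ("c1", "2020-02", 3000.0), ("c1", "2020-03", 1500.0),
    ("c2", "2020-01", 2000.0), ("c2", "2020-02", 2200.0), ("c2", "2020-03", 2100.0),
]


def avg_income_by_customer(rows):
    """Pipeline-style feature: average monthly income per customer."""
    incomes = defaultdict(list)
    for customer_id, _month, income in rows:
        incomes[customer_id].append(income)
    return {cid: mean(values) for cid, values in incomes.items()}


def spot_check(feature, expected, tolerance=1e-6):
    """Return customers whose pipeline value differs from the hand-calculated one."""
    return [
        cid for cid, value in expected.items()
        if abs(feature[cid] - value) > tolerance
    ]


features = avg_income_by_customer(transactions)

# Values worked out by hand for a few customers, independently of the pipeline.
hand_calculated = {"c1": 2500.0, "c2": 2100.0}

mismatches = spot_check(features, hand_calculated)
print(mismatches)  # an empty list means the pipeline agrees with the manual values
```

The design choice here is simply to keep the hand-calculated values separate from the pipeline code, so that a bug in the feature logic cannot hide by being repeated in the check.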