BT5153 Applied Machine Learning for Business Analytics

Course website for BT5153

BT5153

Selected Reports for 2019 Spring

Group 3

The report content is generally well done. The group put good effort into constructing a dataset for their problem by combining various data sources using various tools, covering data gathering, cleaning, and transformation. This is in contrast to groups that simply obtained data from a data source and sometimes used it without analysis. There is also good work in experimenting with various models, with fine-tuning details clearly stated. The group even experimented with constant and adaptive learning rates, different activation functions, and L2 regularisation weights for their neural network models. However, the report has to be penalised because the group left all important figures and results in the Appendix, which should be reserved for optional material. There is also an odd section where movie similarity is said to be useful for predicting revenue, yet the group uses it only to identify similar movies rather than as a feature for predicting movie revenue.

Group 4

The report is well done in general. Good work was done in data analysis: discovering the over-representation of one of the classification classes, looking at average length, syllables, etc. for the input text questions, and analysing the readability of the questions. The group also discussed why they used F1 instead of accuracy and gave a good explanation of the strengths of word embeddings over tf-idf features. Good work was done in experimenting with traditional ML techniques, stacking, and a neural network, as well as in using word clouds to see which unigrams and bigrams occur most frequently in sincere and insincere questions. There seems to be some confusion in the claim that word embeddings are a model, when they only initialise some of the weights of a neural network model. There must also have been some mistakes in running some of the models, which showed a significant drop in F1 (by up to half) when 6 additional features were introduced on top of the 60000 tf-idf features.
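To illustrate why F1 is the more informative metric when one class is over-represented, here is a minimal sketch in plain Python. The label counts (95 sincere, 5 insincere) and the always-predict-majority classifier are hypothetical and are not taken from the group's report.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced data: 0 = sincere, 1 = insincere.
y_true = [0] * 95 + [1] * 5

# A trivial classifier that always predicts the majority class.
y_majority = [0] * 100

print("accuracy:", accuracy(y_true, y_majority))  # high despite finding nothing
print("F1:", f1_score(y_true, y_majority))        # zero: no insincere question found
```

The trivial classifier scores 95% accuracy without detecting a single insincere question, while its F1 on the insincere class is zero.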

Group 12

Excellent work. The group demonstrated clearly how various techniques can work together to solve a business problem. The amount of effort and thought put into this project is commendable. This is quite different from other groups' approach of building only one model. They did an end-to-end data science project: starting with data scraping, then understanding the possible types of reviews, then labeling the reviews using sentiment analysis-topic models as a starting point, then curating a list of replies, and lastly using a machine learning algorithm to generate a reply to a review. The part on reply generation is interesting, as it showcases how a problem can be solved with models of different complexities.

However, a better evaluation metric than accuracy should be used for the multi-class classification problem (section 5.2). While section 5.3 is a good attempt, its description and analysis are somewhat abrupt and could be improved.
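One possible alternative for a multi-class problem is macro-averaged F1, which weights every class equally regardless of its size. The sketch below is only an illustration: the class names and predictions are hypothetical, and the group's actual review categories are not reproduced here.

```python
def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its one-vs-rest F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical review categories with a dominant class.
y_true = ["praise"] * 8 + ["complaint"] * 2
# A model that labels every review as the dominant class.
y_pred = ["praise"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("per-class F1:", per_class_f1(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

Accuracy looks acceptable here because the dominant class is always right, whereas the per-class breakdown exposes that the minority class is never predicted.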

Group 14

Excellent work. The report is coherent and clear. The ideas are well thought out, and the results from the model are actionable (recommending a wine and giving a summary description of the recommended wine). Even though the models selected are not as technical as those of the other groups, this project is a good showcase of how to bring value to business problems using analytics, even without state-of-the-art models. The thought process of using lessons learnt from the previous step (i.e. exploratory data analysis) in the next step (feature engineering) is very nicely illustrated. Just one point to note: the feature engineering portion is a little vague and could be improved.