Assignment 3: Text Analysis of an Unknown Corpus
Communicating the outcomes of an analysis of an unknown corpus of documents. Developing and comparing outcomes from ‘bag of words’, TF-IDF, clustering, LSA, and LDA / topic models.
- Author: Joshua McCarthy
- Published: Tue, Jun 16 2020
- Last Updated: Tue, Jun 16 2020
Context
A fun scenario for an interesting assignment that uses text analysis techniques, and a return to the continually evolving DAM report template.
Buried deep in your manager’s computer is a directory helpfully named “docs.” He has no recollection of what they’re about and is disinclined to wade through them himself. Knowing of your interest in data analysis, he sends you the folder with the challenge to “provide insights into the contents and themes of the documents in the directory” (words excerpted verbatim from his email). As it happens, last weekend you learnt a bunch of techniques that could help you do just that. So you accept the challenge, pretty confident that you can deliver. As you are about to start your analysis, you realise that there could be more at stake here than just a one-off data challenge. If you do a good job, your manager may even keep you in mind when he discusses that new data science role with the CIO. The task thus presents an opportunity to highlight not just your analytical skills but also your ability to weave a narrative around your findings, supported by appropriate visualisations.
Report
The assignment was quite challenging, as some of the information was hard to represent visually without becoming overwhelming.
Example
- File Name: 2018-02-05_title-title-title
- Original File: Doc01.txt
- Link: https://www.linktodoc.notareallink
- Theme: Organisations, Decision Making
- Subject: strategic problem solving, risk management
- Category Tags: Organizations, Paper Review, Risk analysis, sensemaking, Wicked Problems
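One way to hold an entry like the example above in code is a small record type. The following is only a minimal Python sketch and not the method used in the report: the class name `DocumentRecord`, its attribute names, and the rule that builds the file name from a date plus a hyphenated title are all assumptions inferred from the example entry.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List
import re


@dataclass
class DocumentRecord:
    """Hypothetical per-document entry; fields mirror the example above."""
    published: date      # assumption: the date prefix of the file name
    title: str           # assumption: the file name slug comes from a title
    original_file: str
    link: str
    themes: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    category_tags: List[str] = field(default_factory=list)

    @property
    def file_name(self) -> str:
        # Lowercase the title and collapse non-alphanumeric runs into hyphens.
        slug = re.sub(r"[^a-z0-9]+", "-", self.title.lower()).strip("-")
        return f"{self.published.isoformat()}_{slug}"


record = DocumentRecord(
    published=date(2018, 2, 5),
    title="Title Title Title",  # placeholder title, as in the example
    original_file="Doc01.txt",
    link="https://www.linktodoc.notareallink",
    themes=["Organisations", "Decision Making"],
    subjects=["strategic problem solving", "risk management"],
    category_tags=["Organizations", "Paper Review", "Risk analysis",
                   "sensemaking", "Wicked Problems"],
)
print(record.file_name)
```

Keeping themes, subjects, and tags as separate lists would make it straightforward to group or filter the documents by any one of them later.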