Prerequisites (knowledge of topic)
As long ago as 2010, Eric Schmidt, then CEO of Google, observed that every two days we generate as much information as was created in the entire history of civilization up to 2003. Much of this information, however, is unstructured: it is not organized in a pre-defined manner. This lack of structure complicates extracting useful insights from these rapidly growing data sources. Students should have some familiarity with programming in Python or R. Please bring a laptop to class. You also need a Google account to practice using Colab.
Learning objectives and course content
In this class, we will explore different statistical approaches that have proven useful in making sense of unstructured data. The course is centered on business applications that involve the analysis of text, social networks, and images, as well as their relationships with metadata. For most of the analyses, we will use Python/R and dedicate some of the class sessions to hands-on work. Students are invited to bring their own unstructured data sets, but doing so is not required.