Analyzing Unstructured Data



Next course


If the course is offered in upcoming sessions, further details, including the schedule, can be found in the current course tables in the syllabus of the respective course. The following text describes what to expect from the course in terms of content.

As long ago as 2010, Eric Schmidt, then CEO of Google, observed that every two days we generate as much information as was created in the entire history of civilization until 2003. The problem is that much of this information is unstructured, meaning it is not organized in a pre-defined manner. This lack of structure complicates extracting useful insights from these rapidly growing data sources.

Students should have some familiarity with Python/R programming. Please bring a laptop to class. You also need a Google account to practice using Colab.

Learning objectives and course content

In this class, we will explore different statistical approaches that have proven useful in making sense of unstructured data. The course is centered around business applications that involve the analysis of text, social networks, and images, as well as their relationships with metadata. For most of the analyses, we will use Python/R, and we will dedicate some of the class sessions to hands-on time. Students are invited to bring their own unstructured data sets, but doing so is not required.