Network Analysis – Statistical Analysis of Social Network Data



Next course


If the course is offered in an upcoming session, further details, including the schedule, can be found in the current course tables in that course's syllabus. The following text describes what can be expected of the course in terms of content.

Prerequisites and content:

Prerequisite knowledge for the course includes the fundamentals of probability and statistics, especially hypothesis testing and regression analysis. This intermediate-level course assumes that students can interpret the results of Ordinary Least Squares, Probit, and Logit regressions. They should also be familiar with the most common problems in regression, such as multicollinearity, heteroscedasticity, and endogeneity. Finally, students should be comfortable working with computers and data. No prior knowledge of R or network analysis is required.

The concept of “social networks” is increasingly a part of social discussion, organizational strategy, and academic research. The rising interest in social networks has been coupled with a proliferation of widely available network data, but there has not been a concomitant increase in understanding of how to analyze such data. This course presents concepts and methods applicable to the analysis of a wide range of social networks, such as those based on family ties, business collaboration, political alliances, and social media.

Classical statistical analysis is premised on the assumption that observations are sampled independently of one another. In the case of social networks, however, observations are not independent of one another, but are dependent on the structure of the social network. The dependence of observations on one another is a feature of the data, rather than a nuisance. This course is an introduction to statistical models that attempt to understand this feature as both a cause and an effect of social processes.

Because network data are generated differently from many other kinds of social data, the course begins by considering the research designs, sampling strategies, and data formats commonly associated with network analysis. A key aspect of network analysis is describing elements of the network’s structure. To this end, the course covers the calculation of descriptive statistics on networks, such as density, centralization, centrality, connectedness, reciprocity, and transitivity. We consider various ways of visualizing networks, including multidimensional scaling and spring embedding. We learn methods of estimating regressions in which network ties are the dependent variable, including the quadratic assignment procedure (QAP) and exponential random graph models (ERGMs). We then consider extensions of ERGMs, including models for two-mode data and for networks over time.

Instruction is split between lectures and hands-on computer exercises. Students may find it advantageous to bring a social network data set relevant to their research interests, but doing so is not required. The instructor will provide the data sets needed to complete the course exercises.
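
As a rough illustration of some of the descriptive statistics named in the course content (density, reciprocity, transitivity, and centrality), the sketch below computes them for a small directed network. It is written in plain Python rather than the software used in the course, and it is not course material: the four-actor edge list and the function names are hypothetical, and the definitions shown (for example, transitivity as the share of closed two-paths, and centrality as normalised in-degree) are only one common choice among several.

```python
from itertools import permutations

# Hypothetical directed network: (sender, receiver) ties among four actors.
edges = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("a", "d")}
nodes = sorted({actor for tie in edges for actor in tie})


def density(nodes, edges):
    """Share of all possible directed ties that are present."""
    n = len(nodes)
    return len(edges) / (n * (n - 1))


def reciprocity(edges):
    """Share of ties whose reverse tie is also present."""
    mutual = sum(1 for (i, j) in edges if (j, i) in edges)
    return mutual / len(edges)


def transitivity(nodes, edges):
    """Share of two-paths i -> j -> k that are closed by a tie i -> k."""
    two_paths = 0
    closed = 0
    for i, j, k in permutations(nodes, 3):
        if (i, j) in edges and (j, k) in edges:
            two_paths += 1
            if (i, k) in edges:
                closed += 1
    return closed / two_paths if two_paths else 0.0


def in_degree_centrality(nodes, edges):
    """Number of incoming ties per actor, divided by n - 1."""
    n = len(nodes)
    return {
        actor: sum(1 for (_, receiver) in edges if receiver == actor) / (n - 1)
        for actor in nodes
    }


if __name__ == "__main__":
    print("density:", density(nodes, edges))
    print("reciprocity:", reciprocity(edges))
    print("transitivity:", transitivity(nodes, edges))
    print("in-degree centrality:", in_degree_centrality(nodes, edges))
```

Storing the network as a set of (sender, receiver) pairs keeps each statistic a direct count over ties. Dedicated network packages provide the same measures along with the model-based methods (QAP, ERGMs) that are beyond a short sketch like this.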