Uniform Sampling of Facebook
Monday, March 8, 2010 - 6:00 p.m. to Tuesday, March 9, 2010 - 6:55 p.m.
Engineering Gateway 3161
Center for Pervasive Communications and Computing Seminar Series
Featuring Minas Gjoka
Ph.D. Candidate
The Henry Samueli School of Engineering, UC Irvine
Free and open to the public
Abstract:
With more than 250 million active users, Facebook is currently one of the most important online social networks. Our goal is to obtain a representative (unbiased) sample of Facebook users by crawling its social graph. In this quest, we consider and implement several candidate techniques. Two approaches that are found to perform well are the Metropolis-Hastings random walk (MHRW) and a re-weighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the "ground truth" obtained through true uniform sampling of Facebook userIDs. In contrast, the traditional Breadth-First-Search (BFS) and Random Walk (RW) perform quite poorly, producing substantially biased results. In addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these can be used to effectively determine when a random walk sample is of adequate size and quality for subsequent use. Using these methods, we collect what is, to the best of our knowledge, the first unbiased sample of Facebook. Finally, we use one of our representative datasets, collected through MHRW, to characterize several key properties of Facebook.
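For readers unfamiliar with the two walks named in the abstract, the following is a minimal Python sketch of the general textbook techniques, not the speaker's implementation. It assumes an undirected graph stored as an adjacency-list dictionary; the toy wheel graph and all function names (wheel_graph, mhrw, simple_rw, reweighted_mean) are hypothetical and chosen only for illustration. MHRW accepts a move to a neighbor with probability min(1, deg(current)/deg(neighbor)), which makes the uniform distribution stationary. RWRW runs a plain random walk, whose samples are biased toward high-degree nodes, and then corrects the estimate by weighting each sample by 1/degree.

```python
import random


def wheel_graph(n_rim):
    """Toy graph (an assumption for illustration): hub 0 joined to a ring of n_rim nodes."""
    graph = {0: list(range(1, n_rim + 1))}
    for i in range(1, n_rim + 1):
        left = n_rim if i == 1 else i - 1
        right = 1 if i == n_rim else i + 1
        graph[i] = [0, left, right]
    return graph


def mhrw(graph, start, steps, rng):
    """Metropolis-Hastings random walk whose stationary distribution is uniform over nodes."""
    node = start
    samples = []
    for _ in range(steps):
        candidate = rng.choice(graph[node])
        # Accept with probability min(1, deg(node) / deg(candidate)); otherwise stay put.
        if rng.random() < len(graph[node]) / len(graph[candidate]):
            node = candidate
        samples.append(node)
    return samples


def simple_rw(graph, start, steps, rng):
    """Plain random walk; it visits nodes in proportion to their degree."""
    node = start
    samples = []
    for _ in range(steps):
        node = rng.choice(graph[node])
        samples.append(node)
    return samples


def reweighted_mean(graph, samples, f):
    """Estimate the node-average of f from plain random-walk samples, weighting by 1/degree."""
    weights = [1.0 / len(graph[v]) for v in samples]
    return sum(w * f(v) for w, v in zip(weights, samples)) / sum(weights)


graph = wheel_graph(8)


def degree(v):
    return len(graph[v])


rng = random.Random(0)
steps = 100_000

true_mean = sum(degree(v) for v in graph) / len(graph)

mh_samples = mhrw(graph, 1, steps, rng)
mhrw_mean = sum(degree(v) for v in mh_samples) / len(mh_samples)

rw_samples = simple_rw(graph, 1, steps, rng)
raw_rw_mean = sum(degree(v) for v in rw_samples) / len(rw_samples)
rwrw_mean = reweighted_mean(graph, rw_samples, degree)

print("true mean degree:      ", round(true_mean, 3))
print("MHRW estimate:         ", round(mhrw_mean, 3))
print("raw RW estimate:       ", round(raw_rw_mean, 3))
print("re-weighted RW (RWRW): ", round(rwrw_mean, 3))
```

On this toy graph the raw random-walk average overestimates the mean degree because the hub is visited too often, while the MHRW and re-weighted estimates should land close to the true value. Real crawls would also need burn-in, thinning, and the convergence diagnostics the abstract mentions, none of which are shown here.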
About the Speaker:
Minas Gjoka received a B.S. degree in computer science from the Athens University of Economics and Business, Greece, in 2005 and an M.S. degree in networked systems from the University of California, Irvine, in 2008. He is currently a Ph.D. student in the Department of Electrical Engineering and Computer Science at the University of California, Irvine. His research interests include online social networks, peer-to-peer systems, network measurements, network protocols, and internet modeling.