Science Inventory

How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations

Citation:

Sayles, J., R. Furey, AND M. Ten Brink. How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations. Applied Network Science. Springer Nature, New York, NY, 7:36, (2022). https://doi.org/10.1007/s41109-022-00472-0

Impact/Purpose:

Social network analysis provides information about how and why different people or organizations are connected. Social network analysis can help address many environmental problems that require people to collaborate or coordinate, such as working together to reduce pollution throughout a watershed. For example, social network analysis can reveal strong collaborations to build upon and weak ones in need of support. Data for social network analysis have traditionally been collected using interviews and surveys, which can be time-consuming and expensive to conduct. Online information from organizations’ websites is an alternative data source that can be collected faster and more cheaply. A website consists of any number of individual web-pages that contain information such as hyperlinks, which social network analysis often uses to understand relationships among organizations. Searching more web-pages within a website, i.e., ‘searching deeper,’ has the potential to provide more data, but it requires more time and computing resources and may yield a lot of redundant information. Existing studies using hyperlink data search to a variety of depths with little explanation of why, and guidance on the pros and cons of searching websites to different depths is lacking. In this paper, we analyze how searching to different depths affects social network analysis and whether the online data gathered differ between environmental organizations focused on education and advocacy and those focused more on on-the-ground management activities; these different foci might lead organizations to put different kinds of information on their websites. Our results have both scientific and applied merit. We provide guidance on how to conduct social network analysis studies using online hyperlink data so that different studies can be compared more easily and all relevant data are gathered.
We also outline how practitioners looking to use online data to guide activities such as stakeholder engagement can make the best use of their time and resources, as well as the strengths and limitations of the approach. Our results are informative to anyone who wants to learn more about online data collection for social network analysis or is interested in environmental governance, social network tools, and stakeholder capacity building.
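To make ‘searching deeper’ concrete, the following is a minimal sketch of a depth-limited, breadth-first crawl of a single website. It is not the authors' scraper. The SITE dictionary is a hypothetical stand-in for fetching pages over HTTP, the function name crawl_to_depth is illustrative, and the depth convention (homepage = depth 1, each click adds one) is an assumption that may differ from the paper's. Internal links are followed up to max_depth, and links to other domains are recorded as candidate ties to other organizations.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical stand-in for a live website: page URL -> hyperlinks found on that page.
HOME = "https://org-a.example/"
SITE = {
    HOME: ["https://org-a.example/about", "https://org-a.example/news",
           "https://partner-b.example/"],
    "https://org-a.example/about": ["https://org-a.example/partners",
                                    "https://agency-c.example/"],
    "https://org-a.example/news": ["https://org-a.example/news/2021"],
    "https://org-a.example/partners": ["https://partner-d.example/",
                                       "https://partner-b.example/"],
    "https://org-a.example/news/2021": ["https://partner-e.example/"],
}

def crawl_to_depth(home, get_links, max_depth):
    """Breadth-first crawl of one website, following internal links only.

    Assumed depth convention: the homepage is depth 1, pages one click away
    are depth 2, and so on. Returns {external domain: shallowest depth at
    which a hyperlink to it was found}.
    """
    home_domain = urlparse(home).netloc
    seen = {home}
    queue = deque([(home, 1)])
    found = {}
    while queue:
        url, depth = queue.popleft()
        for link in get_links(url):
            domain = urlparse(link).netloc
            if domain != home_domain:
                # External hyperlink: a candidate tie to another organization.
                # BFS visits pages in depth order, so the first hit is the shallowest.
                found.setdefault(domain, depth)
            elif link not in seen and depth < max_depth:
                # Internal page not yet visited and still within the depth limit.
                seen.add(link)
                queue.append((link, depth + 1))
    return found

for limit in (1, 2, 3):
    print(limit, crawl_to_depth(HOME, lambda url: SITE.get(url, []), limit))
```

With this toy site, a limit of 1 finds only partner-b.example, while a limit of 3 also finds agency-c.example, partner-d.example, and partner-e.example. A real crawler would additionally need request delays, robots.txt handling, and URL normalization.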

Description:

Social network analysis (SNA) tools and concepts are essential for addressing many environmental management and sustainability issues. One method to gather SNA data is to scrape them from environmental organizations’ websites. Web-based research can provide important opportunities to understand environmental governance and policy networks while potentially reducing costs and time compared to traditional survey and interview methods. A key parameter is ‘search depth,’ i.e., how many connected pages within a website to search for information. Existing research uses a variety of depths and no best practices exist, undermining research quality and case study comparability. We therefore analyze how search depth affects SNA data collection among environmental organizations, whether results vary when organizations have different objectives, and how search depth affects social network structure. We find that scraping to a depth of three captures the majority of relevant network data regardless of an organization’s focus. Stakeholder identification (i.e., who is in the network) may require less scraping, but this might under-represent network structure (i.e., who is connected). We also discuss how scraping web-pages for local programs of larger organizations may lead to uncertain results and how our work can be combined with mixed-methods approaches.
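The distinction between stakeholder identification (who is in the network) and network structure (who is connected) can be illustrated with a small calculation. This sketch assumes each scraped tie is recorded as (source organization, target organization, shallowest depth at which the hyperlink was found), with depths counted from 1 at the homepage; the TIES data are invented for illustration and are not the paper's results, and coverage_by_depth is a hypothetical helper. For each depth cut-off it reports the share of all organizations (nodes) and of all directed ties (edges) already captured.

```python
from fractions import Fraction

# Hypothetical scraped ties: (source org, target org, shallowest depth where the link was found).
TIES = [
    ("A", "B", 1), ("B", "C", 1), ("C", "D", 1),
    ("A", "C", 2),
    ("B", "D", 3), ("D", "A", 3),
]

def coverage_by_depth(ties):
    """Return {depth cut-off: (share of nodes found, share of directed edges found)}."""
    if not ties:
        return {}
    max_depth = max(depth for _, _, depth in ties)
    all_edges = {(src, dst) for src, dst, _ in ties}
    all_nodes = {org for edge in all_edges for org in edge}
    coverage = {}
    for cutoff in range(1, max_depth + 1):
        # Keep only ties found at this depth or shallower.
        edges = {(src, dst) for src, dst, depth in ties if depth <= cutoff}
        # Nodes are organizations appearing in at least one captured tie.
        nodes = {org for edge in edges for org in edge}
        coverage[cutoff] = (
            Fraction(len(nodes), len(all_nodes)),
            Fraction(len(edges), len(all_edges)),
        )
    return coverage

for cutoff, (node_share, edge_share) in coverage_by_depth(TIES).items():
    print(f"depth <= {cutoff}: nodes {float(node_share):.0%}, edges {float(edge_share):.0%}")
```

In this toy example every organization is already present at depth 1, but only half of the ties are, and the full set of ties appears only at depth 3. Exact Fraction values keep the shares free of floating-point rounding.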

Record Details:

Record Type: DOCUMENT (JOURNAL/PEER REVIEWED JOURNAL)
Product Published Date: 06/06/2022
Record Last Revised: 06/14/2022
OMB Category: Other
Record ID: 354975