Prior to performing this citation network analysis, we described the methodology in a study protocol and deposited it in an online repository [13]. Deviations from this protocol are listed in Additional file 1. In brief, we applied a search strategy to the Web of Science Core Collection, identified relevant literature, downloaded these records with their reference lists, extracted data for each article, built a dataset of potential citations, and used specialized software to determine which citations had occurred. These steps are explained in more detail below.
Search strategy
The search strategy combined terms related to (a) the determinant (terms like chlorinated pool, swimming, trichloramine), (b) the health outcome (asthma diagnosis and symptoms, lung measures), and (c) the population age (children and adolescents). The exact search strategy can be found in Additional file 2.
Publications were identified via the Web of Science Core Collection. Identification of articles via reference list checking was not applied, as this could result in an overrepresentation of cited articles. The search was performed by BD and updated until 20 June 2017. There were no restrictions with regard to publication language or year.
Both empirical and non-empirical articles were included. Articles were included if they presented data or contained a statement on the association between swimming in indoor chlorinated pools and childhood asthma or other asthma-related health outcome measures. Articles on swimming as a recommendation for asthmatic patients were excluded, as were those on swimming pool accidents. We made some minor deviations from the criteria described in the study protocol; the updated list can be found in Additional file 1. Selection of articles was conducted by two authors (BD and MJEU), followed by a consensus meeting. Agreement was reached in all cases.
Data extraction
A range of variables were extracted from each included publication. Data extraction was performed by two authors (BD and MJEU), followed by a consensus meeting. Agreement was reached in all cases. In addition, we developed measures for the within-network authority of the authors and for the occurrence of self-citations. These extracted and developed variables were classified in three distinct categories: article characteristics (study outcome, other content-related, not content-related), author characteristics, and citation characteristics.
Article characteristics––study outcome
We differentiated between two ways of looking at study outcome: data-based conclusion and authors’ conclusion. Selective citation based on either of these classifications of study outcome would signify citation bias.
The data-based conclusion was based on the asthma diagnosis results as reported in the result sections of empirical articles. Asthma diagnosis was assessed in different ways in the various articles. We ranked these assessments in order of decreasing validity: (1) physician’s assessment, (2) self-assessment, (3) occurrence of asthma symptoms, (4) positive lung tests, and (5) blood tests indicative of increased lung permeability. A data-based conclusion was scored as positive if a statistically significant, positive relationship of swimming in chlorinated water with asthma diagnosis was reported. In case of contradicting results, we used the asthma diagnosis with the highest validity. For instance, if blood tests showed a positive relationship with swimming but the physician’s assessment did not, the data-based conclusion was scored as negative.
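The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the procedure actually used for data extraction (which was done manually by two authors); the dictionary keys and the function name are hypothetical labels introduced here.

```python
# Validity ranks as listed in the text: a lower number means higher validity.
# The key names are hypothetical labels, not taken from the study's data sheet.
VALIDITY_RANK = {
    "physician_assessment": 1,
    "self_assessment": 2,
    "asthma_symptoms": 3,
    "lung_test": 4,
    "blood_test": 5,
}


def data_based_conclusion(results):
    """Score an empirical article as 'positive' or 'negative'.

    `results` maps an assessment type to True if the article reported a
    statistically significant positive relationship for that assessment,
    and to False otherwise. Returns None if no assessment was reported.
    """
    if not results:
        return None
    # Use the reported assessment with the highest validity (lowest rank).
    best = min(results, key=VALIDITY_RANK.__getitem__)
    return "positive" if results[best] else "negative"
```

For the example in the text, `data_based_conclusion({"blood_test": True, "physician_assessment": False})` yields `"negative"`, because the physician's assessment outranks the blood test.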
The authors’ conclusion was scored in a similar way, but based on the authors’ interpretation of the results rather than on the statistical significance of the asthma diagnosis results. The authors’ conclusion was extracted from the discussion or abstract of a publication.
Article characteristics––other content-related
The following variables were in this subcategory: article type (and study design), sample size, study quality, and specificity.
Article type was classified into empirical articles and non-empirical articles (narrative reviews and commentaries). For some analyses, empirical articles were further classified by study design: experimental studies and observational studies (such as cross-sectional, cohort, ecological, and case-control studies, case studies, and systematic reviews of observational studies).
Sample size concerned the number of underage participants in the articles (younger than 18). Narrative reviews had no sample size; the other study designs were classified into three categories of roughly equal size.
The study quality of cross-sectional designs was rated with the NIH National Heart, Lung, and Blood Institute’s assessment for cross-sectional designs [14]. According to this scale, articles could be classified as good, fair, or poor. Other designs were not rated, since the vast majority of the empirical studies were cross-sectional and very few publications used other designs.
The specificity of the articles varied. Some articles dealt only with the statement under investigation (i.e., the relationship between swimming in chlorinated water and the development of asthma in children); others were broader (e.g., the health effects of swimming in the general population). The higher the specificity of an article, the better the article fits in the network. Specificity ranged from 1 (very broad) to 5 (highly specific) and was assessed based on the title of the article.
Article characteristics––not content-related
The following variables were in this category: language (English or not), conclusiveness of the title, funding source, number of authors, number of affiliations, number of references, and journal impact factor. Title conclusiveness was coded as yes if a clear outcome was stated in the title (e.g., “swimming and asthma are related” or “(...) not related”), otherwise as no (e.g., “swimming and asthma”). Funding source was coded as non-profit (e.g., government or university), for-profit, both, or not reported. Journal impact factor, in the publication year of the potentially cited article, was retrieved from the Journal Citation Reports (JCR) database.
Author characteristics
The following variables were in this category: gender of the corresponding author (assessed by first name and/or salutation), country of the corresponding author, and affiliation of the corresponding author. Affiliation was classified as government, university, industry, or other.
Citation characteristics
Some variables depended on both the cited article and the citing article: time to citation, authority, and self-citation. (For clarification: when we write about cited articles, citing articles, and citation paths, we refer to potential citations that may or may not occur.)
Time to citation was the number of years between the publication date of the cited article and the submission date of the citing article. This variable was also used to determine the dataset of potential citation paths (see “Statistical analysis” below).
As for the publication date, we used either the electronic publication date or the paper publication date, depending on which one was earlier. The average duration from submission to publication was 7 months in this network. Submission date was not always given. If submission date was missing, it was estimated by subtracting 7 months from an article’s publication date.
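The date handling described above can be illustrated with a minimal Python sketch. It makes several assumptions that the text does not specify: the function names are hypothetical, days are clamped to the 28th when subtracting months to avoid invalid calendar dates, and time to citation is expressed in fractional years (the text does not say whether whole or fractional years were used).

```python
from datetime import date

MEAN_LAG_MONTHS = 7  # average submission-to-publication lag reported in the text


def subtract_months(d, months):
    """Move a date back by whole calendar months (day clamped to 28: a simplification)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    return date(year, month0 + 1, min(d.day, 28))


def publication_date(epub, print_pub):
    """Earlier of the electronic and paper publication dates (either may be None, not both)."""
    return min(d for d in (epub, print_pub) if d is not None)


def submission_date(submitted, published):
    """Reported submission date, or publication date minus 7 months if missing."""
    if submitted is not None:
        return submitted
    return subtract_months(published, MEAN_LAG_MONTHS)


def time_to_citation_years(cited_published, citing_submitted):
    """Years from publication of the cited article to submission of the citing article.

    Fractional years are an assumption of this sketch.
    """
    return (citing_submitted - cited_published).days / 365.25
```

A positive return value from `time_to_citation_years` means the cited article was already published when the citing article was submitted.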
Within-network authority was a measure of the authority of the authors of a cited article within the network. It was calculated for each author and each year separately, by counting the number of within-network citations to all publications in which the author had been involved. As the number of citations is likely to increase each year, so does the author’s authority. Because we were interested in the authority at the moment of citation, the authority value of a cited article also depended on the publication year of the citing article. In case of multiple authors, we used the authority value of the author with the highest authority in that year.
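As an illustration only, the authority measure could be computed along the following lines. The sketch assumes that an author's authority in a given year counts citations made by articles published strictly before that year; the text does not state whether the citing year itself is included, so this cut-off is an assumption. The data structures and function names are hypothetical.

```python
def author_authority(author, year, authors_of, pub_year, citations):
    """Within-network citations received by an author's articles before `year`.

    authors_of: dict article id -> set of author names
    pub_year:   dict article id -> publication year
    citations:  iterable of (citing id, cited id) pairs that actually occurred
    Assumption: only citations from articles published before `year` count.
    """
    return sum(
        1
        for citing, cited in citations
        if author in authors_of[cited] and pub_year[citing] < year
    )


def article_authority(cited, citing, authors_of, pub_year, citations):
    """Authority of a cited article at the publication year of the citing article.

    With multiple authors, the highest author-level value is used.
    """
    year = pub_year[citing]
    return max(
        (
            author_authority(a, year, authors_of, pub_year, citations)
            for a in authors_of[cited]
        ),
        default=0,
    )
```

Because the value depends on the citing article's publication year, the same cited article can carry different authority values on different potential citation paths.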
A self-citation was defined as a citation between two articles that have at least one author in common.
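In code, this definition reduces to a set-overlap check. The sketch below assumes that author names have already been disambiguated and that simple case and whitespace normalization is enough to compare them; real author matching is usually harder, and the text does not describe how names were matched.

```python
def normalize(name):
    """Lower-case a name and collapse whitespace (a deliberately crude normalization)."""
    return " ".join(name.lower().split())


def is_self_citation(citing_authors, cited_authors):
    """True if the two articles share at least one author."""
    citing = {normalize(a) for a in citing_authors}
    return not citing.isdisjoint(normalize(a) for a in cited_authors)
```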
Statistical analysis
The dataset consisted of all potential citation paths between citing and cited articles. A potential citation path means that the cited article was published before submission of the citing article (i.e., time to citation has a positive value). The underlying assumption is that articles can only cite up to their submission date and can only be cited from their publication date onwards. This assumption was met for the entire network with one exception: one article had cited another article that was not yet published at the moment of submission of the citing article. The same authors were involved in both articles, which explains why they could be aware of the cited article before it was published. (This citation was not considered a potential citation and was therefore excluded from our analyses.)
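The construction of the dataset can be sketched as follows. This is a minimal Python illustration of the rule stated above, not the R code used in the study; the record layout (`published`, `submitted`) and the function name are hypothetical.

```python
from datetime import date
from itertools import permutations


def potential_paths(articles):
    """List all (citing, cited) pairs that form a potential citation path.

    articles: dict article id -> {"published": date, "submitted": date}
    A pair qualifies if the cited article was published before the citing
    article was submitted, i.e. time to citation is positive.
    """
    return [
        (citing, cited)
        for citing, cited in permutations(articles, 2)
        if articles[cited]["published"] < articles[citing]["submitted"]
    ]
```

Each returned pair becomes one row of the analysis dataset, with the binary outcome indicating whether the citation actually occurred.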
Our dependent variable was citation or, in other words, whether a potential citation path was used or not. We used the built-in algorithm of CitNetExplorer to determine whether a citation had occurred [15]. This algorithm makes use of reference lists that can be downloaded from the Web of Science Core Collection. The reference lists of all articles in the network were linked by the algorithm with the actual articles in the network. If possible, this linkage was done by DOI, a unique Digital Object Identifier assigned to most present-day articles; otherwise, it was based on a combination of first author’s surname, first author’s first initial, publication year, volume number, and first page number. Manual checking of the reference lists of the included articles showed that all classified citations were correct and that no citations were missed by the algorithm. The determinants of citation were the characteristics of the cited article as described above.
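The two-stage linkage described above (DOI first, bibliographic fields as a fallback) can be illustrated with a short Python sketch. This is not CitNetExplorer's actual implementation, whose internals are not described here; the record fields and function names are hypothetical, and the normalization (lower-casing and trimming) is an assumption.

```python
FALLBACK_FIELDS = ("surname", "initial", "year", "volume", "first_page")


def doi_key(rec):
    """Normalized DOI, or None if the record has no DOI."""
    doi = rec.get("doi")
    return doi.strip().lower() if doi else None


def fallback_key(rec):
    """Tuple of first author's surname and initial, year, volume, first page.

    Returns None if any of the five fields is missing.
    """
    if any(rec.get(f) in (None, "") for f in FALLBACK_FIELDS):
        return None
    return tuple(str(rec[f]).strip().lower() for f in FALLBACK_FIELDS)


def build_index(articles):
    """Index the network's articles by DOI and by the fallback key."""
    by_doi, by_fields = {}, {}
    for art_id, rec in articles.items():
        if doi_key(rec) is not None:
            by_doi[doi_key(rec)] = art_id
        if fallback_key(rec) is not None:
            by_fields[fallback_key(rec)] = art_id
    return by_doi, by_fields


def resolve(ref, by_doi, by_fields):
    """Return the id of the network article a reference points to, or None."""
    key = doi_key(ref)
    if key is not None and key in by_doi:
        return by_doi[key]
    fkey = fallback_key(ref)
    return by_fields.get(fkey) if fkey is not None else None
```

References that resolve to `None` point outside the network and do not contribute to the citation outcome.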
Since each article could cite multiple other articles, the potential citation paths were related. Therefore, we used a multilevel approach in which the potential citations were nested under the citing article. Specifically, we performed a univariate random-effects logistic regression for each determinant of citation. We repeated these analyses while adjusting for article type.
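One common way to write such a model is as a random-intercept logistic regression, shown below as a sketch. The notation is introduced here for illustration and is an assumption: $Y_{ij}$ indicates whether citing article $j$ cited potential target $i$, $x_{ij}$ is the determinant under study, $z_{ij}$ is the article-type covariate, and $u_j$ is a random intercept for the citing article. The text does not state the exact model specification, so the random-intercept form is assumed.

```latex
% Univariate model (random intercept per citing article j; an assumed specification)
\operatorname{logit}\,\Pr(Y_{ij} = 1 \mid u_j) = \beta_0 + \beta_1 x_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)

% Model adjusted for article type
\operatorname{logit}\,\Pr(Y_{ij} = 1 \mid u_j) = \beta_0 + \beta_1 x_{ij} + \beta_2 z_{ij} + u_j
```

Under this formulation, $\exp(\beta_1)$ is the odds ratio for citation associated with a one-unit difference in the determinant, conditional on the citing article.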
Where applicable, we also determined whether the cited and the citing articles had the same characteristics (concordance). This would, for instance, be the case if positive articles preferred to cite other positive articles and negative articles preferred to cite other negative articles. If citation were based on the concordance of study outcome, this would be another sign of citation bias. To test whether concordance on several characteristics had an impact on the likelihood of citation, univariate and adjusted fixed-effects logistic regression analyses were applied.
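The concordance indicator itself is simple to derive per potential citation path. The Python sketch below also tabulates the crude proportion of paths that were actually cited, split by concordance; this is a descriptive illustration only and does not replace the regression analyses described above. Function names and the tuple layout are hypothetical.

```python
def concordant(citing_value, cited_value):
    """True/False if both values are known, None if either is missing."""
    if citing_value is None or cited_value is None:
        return None
    return citing_value == cited_value


def citation_rate_by_concordance(paths):
    """Crude citation proportion for concordant and discordant paths.

    paths: iterable of (citing_value, cited_value, was_cited) tuples.
    Paths with a missing value on either side are skipped.
    """
    counts = {True: [0, 0], False: [0, 0]}  # [number cited, number of paths]
    for citing_value, cited_value, was_cited in paths:
        c = concordant(citing_value, cited_value)
        if c is None:
            continue
        counts[c][0] += int(was_cited)
        counts[c][1] += 1
    return {
        c: (n_cited / n_total if n_total else None)
        for c, (n_cited, n_total) in counts.items()
    }
```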
Software
We used the built-in algorithm of CitNetExplorer 1.0.0 to extract the actual citations between articles. We used R 3.2.4 to create a dataset with all potential citation paths, based on the data extraction sheet and the actual citations, and also to calculate the within-network authority, self-citation score, and time to citation for each potential citation path. Finally, we used Stata 13.1 to analyze the results and VOSviewer 1.6.0 to assess the co-authorship networks. CitNetExplorer and VOSviewer were also used to visualize the network.