Discrepancies among Scopus and Web of Science, coverage of funding information in medical journal articles: a follow-up study

Objective: This follow-up study aims to determine if and how the coverage of funding information in Web of Science Core Collection (WoS) and Scopus changed from 2015 to 2021. Methods: The number of all funded articles published in 2021 was identified in WoS and Scopus bibliographic databases using bibliometric analysis on a sample of 52 prestigious medical journals. Results: The analysis of the number of funded articles with funding information showed statistically significant differences between Scopus and WoS due to substantial differences in the number of funded articles between some single journals. Conclusion: Due to significant differences in the number of funded articles indexed in WoS and Scopus, which might be attributed to the different protocols for handling funding data in WoS and Scopus, we would still advise using both databases to obtain and analyze funding information.


INTRODUCTION
Research databases such as Web of Science Core Collection, Dimensions, Scopus, and MEDLINE organize research information for knowledge discovery, research assessment, and research access. The bibliographic data these databases provide are also essential to the research community, which currently uses them to assess research visibility, evolution, and regional and global collaborations through bibliometric analysis [1][2][3]. Most bibliometric research assessments are based on citations; other important data include funding information, coauthorship, publishers' and authors' locations, and the number of authors. For instance, funding information has been used to predict research visibility, researchers' maturity, and authors' intention to collaborate, and to identify prolific funding agencies. Funding information has also been included in some national, institutional, or regional research policies for assessing researchers or research institutions for promotion, funding consideration, ranking, and awards [4][5][6][7][8][9][10][11][12]. Librarians may be asked to analyze the linkages between funding and academic publications, information sources, or research topics; to identify funding possibilities; or to perform bibliometric analyses such as identifying funding acknowledgment patterns [13][14][15]. The emergence of the COVID pandemic has increased the need for researchers to be informed about funding strategies for COVID research, and a number of bibliometric funding studies analyze international cooperation in COVID research [16], funding of COVID research projects [17], and the funding of COVID vaccine development [18].
Journal indexing systems should provide accurate information because their role in the formal evaluation of scientific productivity translates into the power to steer research [19]. Web of Science and Scopus are the two most important bibliographic databases providing funding information [20,21]. Previous research has shown discrepancies between these two databases, for example, in the journal subject classification [22], the number of published records [23], the number of citations [24], and the document type [25]. This study is a follow-up to the 2015 study on funding data differences of prestigious medical journals in WoS, Scopus, and PubMed [26]. The 2015 study revealed significant differences between the number of funded articles (FAs) in those three bibliographic databases and showed that WoS contained the largest number of FAs. The objective of the follow-up study is, first, to determine if and how the coverage of funding information in WoS and Scopus for the same families of medical journals changed from 2015 to 2021 and, second, to assess the differences and overlap in the funding information of the two databases in 2021.
See end of article for supplemental content.

METHODOLOGY
In the original study, the authors analyzed funding information for articles published in three prestigious families of medical journals, The Lancet, Journal of the American Medical Association (JAMA), and British Medical Journal (BMJ), indexed in the WoS and Scopus bibliographic databases. These three families were chosen because they are highly regarded, well known, and impactful in terms of citations received. PubMed funding data can only be exported in NBIB format (a bibliographic citation file saved in the PubMed format), which could not be analyzed in the same way as the exports from the other two databases. Additionally, since 2016 WoS has included Medline funding data in its bibliographic records, and all the analyzed journals are indexed in both PubMed and WoS; we therefore did not include PubMed in the present study. Whether an article was funded was determined from the Funding organization field in WoS and the Funding sponsor field in Scopus. Two corpora, one for FAs and one for all articles, were created for each database for articles published in 2021. The search strings used are shown in Table 1.
Table 1 Search strings used in the study. A list of all possible funding organizations and sponsors was formed using the wildcard character (*).

Bibliographic database | Search string for All articles corpora | Search string for FAs corpora
Web of Science (WoS) | SO = (JAMA* or BMJ* or Lancet*) and PY = 2021 | SO = (JAMA* or BMJ* or Lancet*) and PY = 2021 and FO = (a* or b* or c* or d* or e* or f* or g* or h* or i* or j* or k* or l* or m* or n* or o* or p* or q* or r* or s* or t* or u* or v* or z* or x* or y* or w* or 1* or 2* or 3* or 4* or 5* or 6* or 7* or 8* or 9* or 0*)
Scopus | SRCTITLE(Lancet or BMJ or JAMA) and PUBYEAR = 2021 | SRCTITLE(Lancet or BMJ or JAMA) and PUBYEAR = 2021 and FUND-SPONSOR(a* or b* or c* or d* or e* or f* or g* or h* or i* or j* or k* or l* or m* or n* or o* or p* or q* or r* or s* or t* or u* or v* or z* or x* or y* or w* or 1* or 2* or 3* or 4* or 5* or 6* or 7* or 8* or 9* or 0*)

The most frequent document types among FAs identified in WoS were articles (71%), reviews (11%), editorials (8%), and letters (7%), whereas the most frequent types among FAs in Scopus were articles (73%), reviews (13%), letters (11%), and editorials (1%). The main difference between the two databases lies in the shares of editorials and letters: editorials carry more funding acknowledgments in WoS, and letters carry more in Scopus. In the 2015 study, all FAs in Scopus were articles, while in 2021, articles represented 59% of FAs.
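The search strings in Table 1 follow a regular pattern, so they can be generated rather than typed by hand. The sketch below is only an illustration: the function names `wos_query` and `scopus_query` are hypothetical, and the wildcard terms are emitted in alphabetical order, which differs from the order printed in Table 1 but covers the same 36 characters.

```python
import string

# One wildcard term per letter and digit, mirroring the list in Table 1.
# (The order differs from Table 1; the set of terms is the same.)
WILDCARDS = [f"{ch}*" for ch in string.ascii_lowercase + string.digits]
WILDCARD_CLAUSE = " or ".join(WILDCARDS)


def wos_query(year: int, funded_only: bool = False) -> str:
    """Hypothetical helper: build the WoS string from Table 1."""
    query = f"SO = (JAMA* or BMJ* or Lancet*) and PY = {year}"
    if funded_only:
        # FO is the funding organization field tag used in Table 1.
        query += f" and FO = ({WILDCARD_CLAUSE})"
    return query


def scopus_query(year: int, funded_only: bool = False) -> str:
    """Hypothetical helper: build the Scopus string from Table 1."""
    query = f"SRCTITLE(Lancet or BMJ or JAMA) and PUBYEAR = {year}"
    if funded_only:
        # FUND-SPONSOR is the funding sponsor field code used in Table 1.
        query += f" and FUND-SPONSOR({WILDCARD_CLAUSE})"
    return query


if __name__ == "__main__":
    print(wos_query(2021))
    print(scopus_query(2021, funded_only=True))
```

Generating the strings this way makes it easy to see which leading characters are covered: a funding organization name that begins with any other character would not be matched.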
The analysis of the most prolific funding organizations in both databases is shown in Table 3. The two lists differ significantly. Among funding organizations, the United States Department of Health and Human Services, the European Commission, and UK Research and Innovation are mentioned considerably more frequently in WoS. In contrast, the National Institute for Health Research, the Wellcome Trust, and pharmaceutical organizations are more frequent in Scopus. To perform this comparison, we had to align slight differences in funding organization naming between the two databases. This alignment was done manually.
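The name alignment in this study was done manually. For readers who want to repeat such a comparison on larger exports, a lookup-based normalization is one possible starting point. The sketch below is not the procedure used here: the alias table, the function names, and the sample input are all invented for illustration, and any real alias table would still have to be curated by hand.

```python
import re
from collections import Counter

# Hypothetical alias table: normalized name variant -> canonical label.
ALIASES = {
    "nihr": "National Institute for Health Research",
    "national institute for health research": "National Institute for Health Research",
    "wellcome": "Wellcome Trust",
    "wellcome trust": "Wellcome Trust",
}


def normalize(name: str) -> str:
    """Map a funder name to its canonical label, if a known alias exists."""
    key = re.sub(r"[^a-z0-9 ]+", " ", name.lower())  # drop punctuation
    key = re.sub(r"\s+", " ", key).strip()           # collapse whitespace
    # Unknown names are returned unchanged (apart from outer whitespace).
    return ALIASES.get(key, name.strip())


def count_funders(names):
    """Count funder mentions after normalization."""
    return Counter(normalize(n) for n in names)


if __name__ == "__main__":
    # Invented sample input, not data from the study.
    sample = ["Wellcome Trust", "wellcome trust", "NIHR", "Example Foundation"]
    for funder, n in count_funders(sample).most_common():
        print(funder, n)
```

Names missing from the alias table pass through unchanged, so they remain visible for manual review instead of being silently merged.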

DISCUSSION
The difference in the percentage of FAs between Scopus and WoS in the families of prestigious medical journals has decreased from 21.5% in 2015 to 1.4% in 2021. However, the difference is still statistically significant because of sometimes large differences between single journals. Furthermore, there are only 25 journals where the difference between the percentages of FAs in both databases is less than 10%, mainly in the cases where the percentages of FAs are larger in Scopus. According to our analysis, one of the differences between WoS and Scopus FAs is the larger share of acknowledgments appearing in documents labeled as editorials in WoS. However, we believe that the main reason is the different ways in which the two databases acquire and handle funding data. According to the Web of Science Group [27], WoS started to supplement funding information with grant agencies from Researchfish and Medline in 2016 and simultaneously started to unify the funding data. Scopus, on the other hand, comprehensively covers grants from the United States, the United Kingdom, pan-European bodies, and some other selected funding organizations around the globe, based on the Funder Registry, which is facilitated by CrossRef and of which Elsevier is one of the founders [28].
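The per-journal comparison described above can be sketched as follows. This is an illustration only: the function names and the counts in the example are invented, and "less than 10%" is interpreted here as a gap of fewer than 10 percentage points, which is an assumption about the intended measure.

```python
def fa_share(funded: int, total: int) -> float:
    """Percentage of funded articles (FAs) among all articles."""
    return 100.0 * funded / total if total else 0.0


def compare_databases(journals: dict, threshold: float = 10.0):
    """Compare FA shares per journal between WoS and Scopus.

    journals maps a journal name to
    {"wos": (funded, total), "scopus": (funded, total)}.
    Returns the per-journal shares and the journals whose absolute
    difference is below `threshold` percentage points (an assumed measure).
    """
    rows = {}
    for name, counts in journals.items():
        wos = fa_share(*counts["wos"])
        scopus = fa_share(*counts["scopus"])
        rows[name] = {"wos": wos, "scopus": scopus, "diff": scopus - wos}
    close = [n for n, r in rows.items() if abs(r["diff"]) < threshold]
    return rows, close


if __name__ == "__main__":
    # Invented counts for two hypothetical journals, not data from the study.
    example = {
        "Journal A": {"wos": (50, 100), "scopus": (55, 100)},
        "Journal B": {"wos": (20, 200), "scopus": (80, 200)},
    }
    rows, close = compare_databases(example)
    for name, row in rows.items():
        print(name, row)
    print("Journals below the threshold:", close)
```

A positive `diff` means the FA share is larger in Scopus, which makes it straightforward to check in which direction the journals below the threshold lean.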
While bibliographic databases have recently become leading providers of publication metadata for research assessment [9,20] and for research grant performance and monitoring [29,30], our study implies that selecting only one of these databases might produce misleading results. For example, a funding agency might evaluate its impact by analyzing the number of FAs in publications and over- or underestimate that impact by selecting the wrong database. Similarly, a research institution's human resources manager might use these data to find or consider suitable candidates and might not obtain an accurate representation of how the candidates have used funding to support their research. On a more individual level, a researcher seeking funding for a research project might submit a proposal to an agency that does not finance their research topic. We would, therefore, advise researchers, librarians, grant administrators seeking funding information related to medical topics, funding bodies, and research organizations to use both databases to obtain more reliable information about funding data and patterns.

STUDY LIMITATIONS
The study has potential limitations. The search string might not have captured all FAs: the same string was used in the original study, and it does not capture funding organization names that start with a character other than a letter or a digit. The analysis was performed on a sample of 52 medical journals, among more than 1000 indexed in both WoS and Scopus in 2021, so the generalizability of the results might be limited.

DATA AVAILABILITY
For legal reasons, data from Clarivate Web of Science and Scopus cannot be made openly available.