Integrity of the Scholarly Record Archives - Digital Science

Shining a light on conflict of interest statements

Simon Linacre — Thu, 05 Sep 2024 14:56:41 +0000

Authors either have a conflict of interest or not, right? Wrong. Research from Digital Science has uncovered a tangled web of missing statements, errors, and subterfuge, which highlights the need for a more careful appraisal of published research.

At this year’s World Conference on Research Integrity, a team of researchers from Digital Science led by Pritha Sarkar presented a poster with findings from their deep dive on conflict of interest (COI) statements. Entitled Conflict of Interest: A data driven approach to categorisation of COI statements, the poster began with a simple goal: to look at COI statements with a view to creating a binary model that determines whether or not a conflict of interest statement is present in an article.

However, all was not as it seemed. While some articles had no COI and some had one present, those present covered a number of different areas, which led the team to think COIs might represent a spectrum rather than binary options.

Gold standard

Conflict of interest is a crucial aspect of academic integrity. Properly declaring a COI is essential for other researchers to assess any potential bias in scholarly articles. However, those same researchers often encounter COI statements that, even when present, are inadequate or misleading in some way.

The Digital Science team – all working on research integrity with Dimensions – soon realized the data could be leveraged further to better explore the richness inherent in the nuanced COI statements. After further research and analysis, it became clear that COI statements could be categorized into six distinct types:

None Declared
Membership or Employment
Funds Received
Shareholder, Stakeholder or Ownership
Personal Relationship
Donation

This analysis involved manually annotating hundreds of COI statements with Natural Language Processing (NLP) tools. The aim was to create a gold standard that could be used to categorize all other COI statements. However, despite the team’s diligence, a significant challenge persisted in the shape of ‘data skewness’: an imbalance in the distribution of data within a dataset that can affect data processing and analytics.
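To make the ideas of categorization and ‘data skewness’ concrete, here is a minimal Python sketch. It is not the team’s pipeline: the keyword rules, the helper names (`categorize`, `label_counts`, `imbalance_ratio`) and the sample statements are all hypothetical, chosen only to show how a skewed label distribution can be measured once statements have been assigned to the six categories.

```python
from collections import Counter

# Hypothetical keyword rules for five of the six categories in the article.
# The keywords are illustrative assumptions, not the team's gold standard.
RULES = [
    ("Membership or Employment", ("employee", "employed", "member of")),
    ("Funds Received", ("grant", "funding", "honoraria")),
    ("Shareholder, Stakeholder or Ownership", ("shares", "stock", "equity")),
    ("Personal Relationship", ("spouse", "relative", "family")),
    ("Donation", ("donation", "donated")),
]
NONE_DECLARED = "None Declared"  # fallback when no rule matches


def categorize(statement: str) -> list[str]:
    """Return every category whose keywords appear in the statement."""
    text = statement.lower()
    labels = [label for label, words in RULES if any(w in text for w in words)]
    return labels or [NONE_DECLARED]


def label_counts(statements: list[str]) -> Counter:
    """Count how often each category is assigned across all statements."""
    return Counter(label for s in statements for label in categorize(s))


def imbalance_ratio(counts: Counter) -> float:
    """Largest class size divided by smallest: 1.0 means perfectly balanced."""
    return max(counts.values()) / min(counts.values())


# Made-up statements, for illustration only.
samples = [
    "The authors declare no conflict of interest.",
    "The authors declare no competing interests.",
    "None.",
    "J.S. is an employee of a device manufacturer and holds stock in it.",
    "This work was supported by an industry grant.",
]

counts = label_counts(samples)
for label, n in counts.most_common():
    print(f"{label}: {n}")
print("imbalance ratio:", imbalance_ratio(counts))
```

Even in this toy sample, ‘None Declared’ outnumbers every other category, which is the kind of imbalance that makes a classifier harder to train and evaluate.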

Fatal flaw

One irresistible conclusion to draw from the data skewness was a simple one: that authors weren’t truthfully reporting their conflicts of interest. But could this really be true?

The gold standard approach came from manually and expertly annotating COI statements to develop an auto-annotation process. However, although the algorithm was able to auto-annotate 33,812 papers in just 15 minutes, the skewness that had been initially identified persisted, which gave rise to the theory that authors were reporting falsely (see Figure 1 of the COI Poster).

To firm up this hypothesis, the team analyzed the Retraction Watch database. The troubling trend, including a discrepancy between the reported COI category and the reason for retraction, became even more apparent (see Figure 2 of the COI Poster).

Moreover, as the team continued the investigation, they found 24,289 papers that appear in both Dimensions on Google BigQuery (GBQ) and Retraction Watch. Among those papers, 393 were retracted due to conflict of interest. Out of those 393 papers, 134 had a COI statement, yet 119 declared that there was no conflict.
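The figures in this paragraph can be turned into proportions with a few lines of Python. Only the four counts come from the article; the variable names and the `pct` helper are illustrative, and the last calculation assumes that the 119 ‘no conflict’ declarations are a subset of the 134 papers with a statement.

```python
# Counts quoted in the article.
overlap = 24_289           # papers in both Dimensions GBQ and Retraction Watch
retracted_for_coi = 393    # of those, retracted due to conflict of interest
with_statement = 134       # of the 393, papers that had a COI statement
declared_none = 119        # statements declaring no conflict (assumed subset of the 134)


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print("Retracted for COI, share of overlap:   ", pct(retracted_for_coi, overlap), "%")
print("Had a COI statement, share of the 393: ", pct(with_statement, retracted_for_coi), "%")
print("No statement at all, share of the 393: ", pct(retracted_for_coi - with_statement, retracted_for_coi), "%")
print("Declared no conflict, share of the 134:", pct(declared_none, with_statement), "%")
```

On these numbers, roughly two-thirds of the papers retracted for a conflict of interest carried no COI statement at all, and close to nine in ten of those that did have a statement declared no conflict.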

Conclusion

Underreporting and misreporting conflict of interest statements or types can undermine the integrity of scholarly work. Other research integrity issues around paper mills, plagiarism and predatory journals have already damaged the trust the public has in published research, so further problems with COIs can only worsen the situation. These findings make it clear that all stakeholders in the research publication process must adopt standard practices for reporting critical trust markers such as COI, to uphold transparency and honesty in scholarly endeavors.

To finish on a positive note, this research poster was awarded second place at the 2024 World Conference on Research Integrity, showing that the team’s research has already attracted considerable attention among those who seek to safeguard research integrity and trust in science.

You can find the poster on Figshare: https://doi.org/10.6084/m9.figshare.25901707.v2

Partial data and the code for this project are also available on Figshare.

For more on the topic of research integrity, see details of Digital Science’s Catalyst Grant award for 2024, which focuses on digital solutions around this topic.

About the Author

Simon Linacre, Head of Content, Brand & Press | Digital Science

Simon has 20 years’ experience in scholarly communications. He has lectured and published on the topics of bibliometrics, publication ethics and research impact, and has recently authored a book on predatory publishing. Simon is an ALPSP tutor and has also served as a COPE Trustee.

The post Shining a light on conflict of interest statements appeared first on Digital Science.

Open Principles

Aileen Irons — Tue, 16 Apr 2024 07:13:05 +0000

Digital Science Open Principles

Transforming Research

Digital Science exists at the intersection of publicly and privately funded research, by serving universities, funders and governments on the one hand, and commercial research organizations on the other. While we aim for a world in which research can make the biggest difference to all, these principles help to contextualize the work that Digital Science does to support the publicly funded, open research ecosystem.

1. Community ownership: We believe that research outcomes are owned by the global community and should be available to all.

We believe that research is the single most powerful transformative force for the positive development of humanity, and as such, knowledge and research outcomes should be shared for common good. Only by making research, and the metadata that describes research, available to all can society derive maximal benefit. Leading innovation is based on knowledge and research – being able to organize, locate, access and share research is critical as a basis for sustainable innovation. We acknowledge that not all research can be made available straight away due to ethical or practical considerations, but we believe that, consistent with reasonable expectations, the outputs of publicly funded research should be available to all.

At Digital Science, we provide advanced technologies to all that help to locate the right research and provide mechanisms for the community to make research available to the broadest possible audience so that they can discover more and better innovate.

Altmetric and Dimensions are built on a mix of open and licensed data. Dimensions’ free edition and Altmetric free researcher tools ensure that the whole community can benefit from analyses using open and licensed data. Figshare and Symplectic are both key systems for research organizations to collect, enhance and share data from and about their research. Digital Science never asserts ownership of the data added to these systems and always ensures that the community is able to extract their own data from them without friction. Overleaf is a platform for collaborative writing, allowing researchers to express and develop their ideas together, and Writefull brings down barriers between researchers by improving communication through enhancement of language, together helping the community to share and innovate.

2. Participating in open infrastructure: We commit to support the use of open standards and to build, contribute to, and extend open infrastructures.

Research only works if we can collectively contribute and build – that requires shared trust. The adoption of open standards ensures the most efficient flow of data and information to allow the possibility of maximal benefit from data. The use of well-maintained, stakeholder-led, open infrastructures ensures transparency and clarity of provenance, which provides the trust framework to allow the crystallization of benefit.

At Digital Science, we build on open formats, open standards, open data, and include open identifiers in all our products wherever possible. We enhance data to add value to stakeholders across a diverse, global research landscape, while making the open data on which our products are built available back to the community that created it.

Digital Science has been at the forefront of innovation in research infrastructure since its inception. We created GRID, the Global Research Identifier Database, and then made these data available to the community in 2015. We made the licence more permissive (changed to CC0) in 2016 to allow the Research Organization Registry (ROR) to be seeded with these data. The ROR dataset now forms a key part of OpenAlex’s infrastructure, and GRID (with a mapping to ROR) continues to power Dimensions.

The next generation of technologies that develop around the scholarly record will ensure that research is both human- and machine-readable. The infrastructures in which we invest must be open and neutral, allowing both human and machine readability.

3. Stakeholders’ primacy: We believe that stakeholder benefits should be at the forefront.

The aspiration of the global research community has always been the pursuit and sharing of knowledge, with the aim to operate beyond politics and beyond borders. In recent years, it has become clear that, even when global relationships are more strained politically, research relationships transcend artificial barriers.

At Digital Science, we believe that engaging our stakeholders and ensuring that their opinions are represented in our work is critical to creating value as we view research as an ecosystem rather than a “sector” or a “market”. We believe that we can participate positively in the research ecosystem by being innovative and helping stakeholders experiment and increase their level of innovation. Digital Science’s core values:

Brave in the pursuit of better;
Always open-minded;
Collaborative and inclusive;
From and for the community

are a key articulation of this belief. They are at the centre of everything that we do.

In addition to our ongoing product-feedback processes, regular user days, participation in industry conferences, and direct engagement with the wider research and scientometrics community, Digital Science will be launching a senior advisory board of representatives from different parts of the research ecosystem to help ensure that we continue to listen to, and align with, the goals of the community that we serve.

4. Establishing trust: We believe that a trusted stakeholder in the research ecosystem must be responsible, transparent and sustainable.

In research, as in our increasingly complex world, context is everything. Understanding the provenance of data, understanding the nature of the processing applied to it, as well as its origin, is critical. Transparency is also important in gaining a shared understanding of the resilience and impact of stakeholders in the research ecosystem.

At Digital Science, we have worked to increase the transparency of data provenance through our publication of research that we carry out. In the spirit of always stretching ourselves, we will start publishing an annual report that increases transparency for Digital Science’s stakeholders.

We also carry out research into prospective ways in which we can help the research community. Our recent work on the use of machine learning for research classification was shared as a preprint and published through an open access journal; further work on research integrity and papermill detection has been shared through the same approach.

The post Open Principles appeared first on Digital Science.

Putting Data at the Heart of your Organizational Strategy

Simon Linacre — Mon, 08 Jan 2024 07:34:22 +0000

‘Have you done your due diligence?’ These six words induce fear and dread in anyone involved in finance, with the underlying threat that huge peril may be about to engulf you if the necessary homework hasn’t been done. Due diligence in the commercial sphere is a hygiene factor – a basic, if detailed, audit of risk to ensure that all possible outcomes have been assessed so nothing comes out of the woodwork once an investment has been made.

The question, however, is just as important for academic institutions looking to check the data on their research programs: have you done your due diligence on that? If not, then a linked database such as Dimensions can help you.

Strategic Objectives

At a recent panel discussion hosted by Times Higher Education (THE) in partnership with Digital Science on optimizing research strategy, the question of due diligence was framed by looking at the academic research lifecycle and the challenges emanating from the increased amount of data now accessible to universities. More specifically, the panel looked at how universities could extract and utilize verified data from the ever-increasing number of sources at their disposal.

Speaking on the panel, Digital Science’s Technical Product Solutions Manager Ann Campbell believes there are numerous benefits to using new modes of data to overcome problems associated with data overload. “It’s important to think holistically, of not only the different systems that are involved here but also the different departments and stakeholders,” she said. “It’s better to have an overarching data model or a perspective from looking at the research life cycle instead of separate research silos or different silos of data that you find within these systems.”

The panel recognized that self-reporting for academics could lead to gaps in the data, while different impact data could also be missed due to a lack of knowledge or understanding on behalf of faculty members.

Digital Science seeks to address these problems by adding some power to its Dimensions linked database in the shape of Google BigQuery. By marrying this computing power to the size and scope of Dimensions, academics and research managers are empowered to identify specific data from all stages of the research lifecycle. This allows researchers to seamlessly combine external data with their own internal datasets, giving them the holistic view of research identified by Ann Campbell in the discussion.

Accessing Dimensions on Google BigQuery.

Data Savant

The theme of improving the capabilities of higher education institutions when it comes to data utilization was most vividly described by Ann Campbell in her presentation to the Times Higher Education Digital Universities conference in Barcelona in October. Memorably, she compared universities’ use of data to the plot of popular TV drama Game of Thrones. Professors as dragons? Rival departments as warring families? Well, not quite. What Ann did observe was that there are many competing elements within HEIs (research management, research information, academic culture, the library), and above them are senior management, who have key questions that can only be answered using data and insights across all of them:

Which faculties have a high impact? Should we invest more in them?
Which faculties have high potential but are under-resourced?
How can we promote our areas of excellence?
How can we identify departments with strong links to industry?
What real-world research impact can we feed back into our curriculum?
Are we mitigating potential reputational risk through openness and transparency?

Bringing these disparate challenges together requires a narrative, which is another reason why the Game of Thrones analogy works so well: for all the moving parts to work, a coherent story is required. That story can be how an institution’s research culture strategy is working with a rise in early career international collaborations, how an increase in new funding opportunities followed a drive to increase interdisciplinary collaborations, or how the global reputation of a university could be seen to have improved its impact rankings position due to increased SDG-related research.

Any good story needs to have the right ingredients, and where Digital Science can really help an institution is to bring together those ingredients from across an organization into viewable and manageable narratives.

Telling Stories

But the big picture is not the whole story, of course. There are other, smaller narratives swirling through HEIs at any given time that reflect the different specialisms, hot topics or focus areas of the university. Three of these focus areas most commonly found in modern universities are research integrity, industry partnerships and research impact, and these were discussed recently at another collaborative webinar between THE and Digital Science: Utilising data to deliver research integrity, industry partnerships and impact.

This panel discussion was a little more granular, and teased out some specific challenges for institutions when it came to data utilization. For research integrity, certain data can be used as ‘trust markers’, based around authorship, reproducibility and transparency. Representing Digital Science, Technical Product Solutions Manager Kathryn Weber-Boer went through the trust markers that form the basis of the Dimensions Research Integrity solution for universities.

But why are these trust markers important? The panel discussion also detailed that outside universities’ realm of interest, both funders and publishers were increasingly interested in research integrity and the provenance of research emanating from universities. As such, products like Dimensions Research Integrity were forming a key part of the data management arsenal that universities needed in the modern research funding environment.

In addition, utilization and scrutiny of such data can help move the dial in other important areas, such as changing research culture and integrity. Stakeholders want to trust in the research that’s being done, know it can be reproduced, and also see there is a level of transparency. All of these factors then influence the promotion and implementation of more open research activities.

Another important aspect of research integrity and data utilization is knowing not just where and how data is being shared, but also whether it is being shared in the way that has been recorded, and where it is actually located. As pointed out in the discussion, Dimensions is a ‘dataset of datasets’ and allows the cross-referencing of these pieces of information to understand if research integrity data points are aligned.

Dimensions Research Integrity trust markers.

Positive Outlook

Discussions around research integrity and data management can often be gloomy affairs, but there is some degree of optimism now that there are increasing numbers of products on the market to help HEIs meet their goals and objectives in these spheres of activity. Effective data utilization will undoubtedly be one of the critical success factors for universities in the future, and it won’t just be for the effective management of issues like research integrity or reputations. With the lightning-fast development and adoption of generative AI in the research space, and increasing interest in issues like research security and international collaboration, data utilization – and who universities partner with to optimize it – has never been higher up the agenda.

You can view the webinars here on utilizing new modes of data and delivering research integrity.

The post Putting Data at the Heart of your Organizational Strategy appeared first on Digital Science.

Standing on the Digits of Giants: Another Excellent Cross-Stakeholder Discussion in Scholarly Communication

Phill Jones — Thu, 17 Mar 2016 14:53:33 +0000

The destruction of the Library of Alexandria, from ‘Hutchinson’s History of the Nations’, c. 1910

Last week, I spoke at an ALPSP seminar that was jointly organized with the Digital Preservation Coalition (DPC) and ably moderated by William Kilbride, the DPC’s Executive Director. Kilbride stated that two of the challenges the DPC faces in its mission to ensure preservation of the scholarly record are difficulties in engaging with publishers and getting caught up in the Open Access (OA) debate. Specifically, he was interested in knowing how publishers think about preservation of the scholarly record. Does the industry think that the problem is solved? Does it need to be solved? Obviously, it does.

The subject of my talk was Transformations in Scholarly Communications. That’s a pretty broad title given the storied history of the industry. I eschewed the temptation to give a history lecture and chose instead to focus on how scholarly communication is currently changing. I talked about the growth of open science and open data, the reasons why there is currently a surge of interest, and how we might change current workflows and incentive structures to enable it. In particular, I argued that data sharing workflows have to be as simple and intuitive as possible, and fully integrate things like metadata creation and structure compliance from the point at which data is created. The slides and audio for my talk are available, along with all the others from the day, at the event page on alpsp.org. You can play a fun game of guess when the slide transition happened.

Like all the best events, last week’s seminar had a diverse lineup of speakers including librarians, archivists, publishers, technologists and academics. Robert Gurney, Professor of Earth Observation Science at the University of Reading, gave an excellent talk on the subject of open data in climate science. With all the concentration on the behaviours of those in the life sciences, I think that the perspectives of those in the physical sciences are occasionally overlooked. In particular, the idea that researchers don’t want to share data for fear of being scooped is a distinctly alien one to Professor Gurney, who explained how the sharing of data is inherent to the way his discipline operates.

Peter Burnhill, Director of EDINA, who I seem to only see at conferences despite the fact he happens to work five minutes’ walk from where I live, talked about maintaining the integrity of the scholarly record. I’ve heard that turn of phrase a lot over the past couple of years, but usually in reference to the threat of so-called predatory publishers. Burnhill didn’t talk about that. Instead, he pointed out that because we’re moving towards a scholarly communication system that treats data and other digital outputs as part of the legitimate scholarly record, we have to make sure that those resources are both preserved and conserved. That is to say, for data citation to be meaningful, the link to the data must point to a resource that still exists and the content of that resource cannot have changed. The popular terms for these two failures are ‘link rot’ and ‘content drift’. Burnhill pointed out that by design, the web is dynamic and content changes over time. As a remedy, he made the pleasant analogy that like fish, data must be flash frozen when captured to preserve it.
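For the technically minded, here is one way to picture the flash-freezing idea. This is a minimal Python sketch of a fixity check, not something Burnhill presented: the `Snapshot` class, the `take_snapshot` and `check` helpers, and the example URL are all hypothetical. The assumption is that a checksum recorded at capture time can later tell us whether a cited resource has disappeared (link rot) or changed (content drift).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """A 'flash-frozen' record of a cited resource at capture time."""
    url: str
    sha256: str


def take_snapshot(url: str, content: bytes) -> Snapshot:
    """Record the checksum of the content as it was when it was cited."""
    return Snapshot(url, hashlib.sha256(content).hexdigest())


def check(snapshot: Snapshot, current: Optional[bytes]) -> str:
    """Compare what the link returns today with the frozen checksum.

    `current` is None when the link no longer resolves.
    """
    if current is None:
        return "link rot"
    if hashlib.sha256(current).hexdigest() != snapshot.sha256:
        return "content drift"
    return "unchanged"


# Illustrative data only; a real check would fetch the bytes from the URL.
frozen = take_snapshot("https://example.org/dataset.csv", b"year,temp\n2015,14.8\n")

print(check(frozen, b"year,temp\n2015,14.8\n"))  # same bytes as at capture
print(check(frozen, b"year,temp\n2015,14.9\n"))  # one value silently edited
print(check(frozen, None))                       # resource no longer exists
```

A checksum only detects drift; it cannot undo it, which is why the frozen copy itself still has to be kept somewhere safe.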

Other highlights from the day included an observation from Josh Brown, Regional Director for Europe at ORCID, that it’s already possible to assign identifying metadata for researchers, institutions, grants and data to specific research projects, which we can define or specify by the DOI of the version of record.

Wendy White, Associate Director of the Hartley Library at Southampton University, identified the theme of the day, which as she put it was about “being in the researcher space…[to]…capture the most effective metadata”. In other words, the need to integrate data sharing and preservation into researchers’ workflows. At the same time, both she and Sarah Callaghan, Senior Research Scientist at STFC and Editor-in-Chief of Data Science Journal, reminded us that the purpose of open data is to enable it to be understandable and reusable in both the short and long term.

White put the challenge into context when she asked whether a future historian, a thousand years from now, might be able to read and understand a digital copy of the United Nations Universal Declaration of Human Rights. Aside from technical and language considerations, there’s also cultural context. Would a person 1,000 years from now know what the UN was, or indeed what the concept of a right is? Clearly, if we’re serious about preserving the scholarly record, we’re only just beginning to tackle the challenges that are involved.

The post Standing on the Digits of Giants: Another Excellent Cross-Stakeholder Discussion in Scholarly Communication appeared first on Digital Science.