Simon Porter Articles - TL;DR - Digital Science https://www.digital-science.com/tldr/people/simon-porter/

Presenting: Research Transformation: Change in the era of AI, open and impact https://www.digital-science.com/tldr/article/presenting-research-transformation-change-in-ai-open-and-impact/ Mon, 28 Oct 2024 09:45:00 +0000 Mark Hahnel and Simon Porter introduce Digital Science's new report as part of our ongoing investigation into Research Transformation: Change in the era of AI, open and impact.


As part of our ongoing investigation into Research Transformation, we are delighted to present a new report, Research Transformation: Change in the era of AI, open and impact.

Within the report, we sought to understand from our academic research community how research transformation is experienced across different roles and responsibilities. The report, which is a mixture of surveys and interviews across libraries, research offices, leadership and faculty, reflects transformations in the way we collaborate, assess, communicate, and conduct research.

The positions that we hold towards these areas are not the same as those we held a decade or even five years ago. Each of these perspectives represents a shift in the way that we perceive ourselves and the roles that we play in the community. Although there is concern about the impact that AI will have on our community, our ability to adapt and change is reflected strongly across all areas of research, including open access, metrics, collaboration and research security. That such a diverse community is able to continually adapt to change reflects well on our ability to respond to future challenges.

Key findings from the report:

  • Open research is transforming research, but barriers remain
  • Research metrics are evolving to emphasize holistic impact and inclusivity
  • AI’s transformative potential is huge, but bureaucracy and skill gaps threaten progress
  • Collaboration is booming, but concerns over funding and security are increasing
  • Security and risk management need a strategic and cultural overhaul

We do these kinds of surveys to understand where the research community is moving and how we can tweak and adapt our approach as a company. We were very grateful to the great minds who helped us out with a deep dive into what has affected their roles and will affect their roles going forward. Metrics, Open Research and AI are closely aligned with the tools that we provide for academics, and with our strategy to make research more inclusive, transparent and trustworthy.

Welcome to… Research Transformation! https://www.digital-science.com/tldr/article/welcome-to-research-transformation/ Mon, 21 Oct 2024 13:15:18 +0000 Transformation via and within research is a constant in our lives. But with AI, we now stand at a point where research (and many other aspects of our working life) will be transformed in a monumental way. As such, we are taking this moment to reflect on the activity of Research Transformation itself, and to celebrate the art of change. Our campaign will show how research data can be transformed into actionable insights, how the changing role of research is affecting those in both academia and industry, and explore innovative ways to make research more open, inclusive and collaborative for all – especially those beyond the walls of academia.


Open research is transforming the way research findings are discovered, shared and reproduced. As part of our commitment to the Open Principles and research transformation, we are looking into how open research is transforming roles, approaches, policies and, most importantly, mindsets for everyone across the research landscape. See our inspiring transformational stories so far.

Academia is at a pivotal juncture. It has often been criticized as slow to change, but external pressures from an increasingly complex world are forcing rapid change in the sector. To understand more about how the research world is transforming, what’s influencing change, and how roles are impacted, we reached out to the research community through a global survey and in-depth interviews.

Research Transformation stories so far…

  • Academic Survey Report Pre-registration
  • State of Open Data 2024 – Special Edition
  • Will 2025 be a turning point for Open Access?

How has innovation shaped Open Research? What does the future hold – especially with the impact of AI? Here’s Dan Valen speaking about Figshare’s key role, with innovation helping to transform the research landscape.

Digital Science has always understood its role as a community partner – working towards open research together. Here are some ways in which we have helped to transform research over the last 14 years.

In our first piece, Simon Porter and Mark Hahnel introduce the topic and detail the three areas the campaign will focus on.

  • The story of research data transformation
    • Making data more usable
    • Opening up channels & the flow of information
    • Transforming data through innovation & AI
    • Maintaining trust & integrity
  • The story of connection
    • Seeing both perspectives
    • What success looks like for knowledge transfer
    • Evolving roles and the role of people in bridging gaps
  • The story of research innovation
    • Research Transformation White Paper
    • How have roles changed:
      • In Academia?
      • In Publishing?
      • In Industry?
    • State of AI Report
    • How are we using AI in our research workflows?

Research Transformation

The way we interact with information can amplify our ability to make connections, and in doing so transforms how we understand the world. Supercharged by the AI moment that we are in, the steady march of digital transformation in society over the last three decades is primed for rapid evolution. What is true for society is doubly so for research. Alongside ground-breaking research and discoveries is the constant invitation to adapt to new knowledge and abilities. Combine the general imperative within the research sector to innovate with the rapidly evolving capabilities of generative AI, and it is safe to say that expectations are high. Taking effective advantage of new possibilities as they arise, however, requires successful coordination within society and systems.

There is an art to transformation, and understanding the mechanisms of transformation places us in the best position to take advantage of the opportunities ahead.

In this series, we specifically seek to explore Research Transformation with an eye to adapting what we already know to the present AI moment. Transformation in research is not just about digital systems; it is also about people and organisations – crossing boundaries from research to industry, supporting emerging research sectors, creating new narratives and adapting to the possibilities that change brings.

At Digital Science, we have always sought to be an integral part of research transformation, aiming to provide products that enable the research sector to evolve research practice – from collaboration and discovery through to analytics and administration. Our ability to serve clients from research institutions to funders, publishers, and industry has placed us in a unique position to facilitate change across the sector, not simply within silos, but between them. In this series, we will be drawing on our own experiences of research transformation, as well as inviting perspectives from the broader community. As we proceed we hope to show that Research Transformation isn’t just about careful planning, but requires a sense of playfulness – a willingness to explore new technology, a commitment to a broader vision for better research, as well as an ability to build new bridges between communities.

1. The story of research data transformation

In the first of three themes, we will cover Research Transformation from the perspective of the data and metadata of research. How do changes to the metadata of research transform our ability to make impact, as well as to see the research community through new lenses? How does technology enable these changes to occur? Starting almost from the beginning, we will look at how transitions in publishing practice have enabled the diversity of the research workforce to become visible. We will also trace the evolving story of the structure of a researcher's papers, from the critical use of identifiers, to the adoption of the CRediT taxonomy, through to the use of trust markers (including ethics statements, data and code availability statements, and conflict of interest statements). The evolving consensus on the structured and semi-structured nature of research articles changes not only the way we discover, read and trust individual research papers, but also transforms our ability to measure and manage research itself.

Our focus will not only be reflective, but will also look forward to the emerging challenges and opportunities that generative AI offers. We will ask deep questions about how research should make its way into large language models. We will also explore the new field of Forensic Scientometrics that has arisen in response to the dramatic increase in bad-faith science, in part enabled by generative AI, and the new research administration collaborations that this implies – both with research institutions and across publishing. We will also offer more playful, experimental investigations. For example, a series on ‘prompt engineering for librarians’ draws on the original pioneering spirit of the 1970s MEDLARS Analysts to explore the possibilities that tools such as OpenAI can offer.

2. The story of connection

Lifting up from the data, we note that a critical part of our experience of research transformation has been the ability to experience and connect with research from shifting perspectives. In this second theme exploring research transformation, we aim to celebrate the art of making connections, from the personal transformations required to make the shift from working within research institutions to industry, through to the art of building research platforms that support multiple sectors. We also cover familiar topics from new angles. For instance, how do the FAIR data principles benefit the pharmaceutical industry? How do we build effective research collaborations with emerging research sectors in Africa?

3. The story of research innovation

In our third theme, we will explore Research Transformation from the perspective of innovation, and how it has influenced the way research is conducted. Culminating in a Research Transformation White Paper, we will explore how roles have changed in academia, publishing, and industry. Within this broader context of Research Transformation, we ask: how are we using AI in our research workflows? And how do we think we will be using AI in years to come?

Of course, many of us in the Digital Science community have been engaging with different aspects of research transformation over many years. If you are keen to explore our thinking to date, one place that you might like to start is at our Research Transformation collection on Figshare. Here we have collated what we think are some of our most impactful contributions to Research Transformation so far. We are very much looking forward to reflecting on research transformation throughout the year. If you are interested in contributing, or just generally finding out more, why not get in touch?

The Barcelona Declaration… exploring our responsibilities as metadata consumers https://www.digital-science.com/tldr/article/the-barcelona-declaration-exploring-our-responsibilities-as-metadata-consumers/ Wed, 10 Jul 2024 20:22:50 +0000 The Barcelona Declaration is perhaps the first document to begin to frame community responsibility with regards to consuming open metadata. Yet, it is just that, a beginning – we believe that understanding in granular detail what should be expected from each part of our ecosystem is critical in making Barcelona actionable, to drive us forward into a more open metadata landscape. Let's begin the conversation…

Towards creating responsible metadata consumers…

The first commitment of the Barcelona Declaration articulates that ‘We will make openness the default for the research information that we use and produce’, but who ‘we’ are is critical in understanding all of our roles and responsibilities in the research ecosystem. Funders, publishers, infrastructure providers, institutions and researchers all have different ways of interacting with data in their contexts as producers, consumers and aggregators of data.

The Barcelona Declaration is perhaps the first document to begin to frame community responsibility with regards to consuming open metadata. Yet, it is just that, a beginning – we believe that understanding with granular detail what should be expected from each part of our ecosystem is critical in making Barcelona actionable, to drive us forward into a more open metadata landscape. Indeed, open metadata is only important if we commit to using it in our practice, allowing it to shape the way that we interact across the research world.

A commitment to consume, however, still requires us to pay attention to the type of open metadata that we use, the contexts in which we apply it, and the expectations that we place on others when doing so. Without explicitly articulating our roles both as creators as well as consumers of research metadata, we risk creating an open, yet untrusted research landscape.

Not all metadata are the same

There is a fundamental asymmetry between production and consumption (and also aggregation). Whilst the responsibilities associated with creating metadata are relatively easy to articulate, the responsibilities around consuming and aggregating metadata are not so well thought through, as until now this has been the less proximate issue. (Indeed, the Barcelona Declaration makes it clear that we have reached a milestone in that we now need to consider this issue.) We argue that responsibilities around consumption are contextual in nature, depending on the provenance of the metadata itself, and work needs to be put into articulating these responsibilities for each participant and use case. In the context of the recent Barcelona Declaration, then, it is useful to explore some of the different ways metadata can be created, and then to explore what responsibilities could result for consumers.

Within the Barcelona Declaration there are (at least) three different sorts of metadata records that are implicitly referred to:

Open metadata records

Open metadata records are those that have been created from inception with open research principles in mind. For example, a publication created under these principles will have an ORCiD associated with each researcher and a ROR ID associated with each affiliation. Within the body of the publication (and its metadata), funding organisations will be linked to their Open Funder Registry ID (or ROR ID), and the grant itself will link to open, persistently identified grant records (for example via the Crossref grant linking system). The publication itself (along with a rich metadata representation) will be associated with a DOI, and all references that resolve to a DOI will also be openly available. When we speak about open here, we have in mind a CC0 licence for these data. Within the paper itself we might expect to see other links, such as a link to a data repository, along with other trust markers that establish the provenance of the paper and situate it within the norms of good research practice. We might have similar expectations for grants, datasets, research software code, and other research objects.
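
To make the contrast with the record types below concrete, here is a minimal sketch of such a record expressed as a Python dictionary. All identifier values are invented for illustration; the point is that every entity – author, affiliation, funder, grant, dataset – is referenced by an open persistent identifier.

```python
# A minimal, illustrative "open metadata record" for a publication.
# All identifier values are invented; the point is that every entity
# is referenced by an open persistent identifier, shared under CC0.
open_record = {
    "doi": "10.1234/example.2024.001",           # publication DOI
    "title": "An Example of Open Metadata",
    "metadata_license": "CC0-1.0",
    "authors": [
        {"name": "A. Researcher",
         "orcid": "https://orcid.org/0000-0000-0000-0001"},
    ],
    "affiliations": [
        {"name": "Example University", "ror": "https://ror.org/00example0"},
    ],
    "funding": [
        {"funder_ror": "https://ror.org/00funder00",
         "grant_doi": "10.5555/grant.98765"},    # e.g. Crossref grant linking
    ],
    "references": ["10.1234/earlier.2020.042"],  # references resolving to DOIs
    "data_availability": "https://doi.org/10.5678/dataset.2024.17",
}
```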

Algorithmically enhanced records

Algorithmically enhanced records are metadata records that have had elements derived from algorithmic processing that was not part of the original record. The algorithm may not be open, the approach used may not be known, and the probability that the metadata is correct may also not be known. (This is something of a hidden variable in many analyses today – it is generally assumed that data in an article may have statistical variances but that metadata describing an article does not.) Many publication records that have been created over time do not meet our current requirements for metadata openness: either the technology (or identifier infrastructure) did not exist at the time that they were created, or good metadata practices had yet to take hold within the context in which the record was created. For records such as these, algorithms are used to enhance the record with identifiers. Prominent examples include algorithms that are used to identify institutional affiliations, but also to reconstruct researcher identities. Algorithms can also be used to enhance the description of a record by adding links to external research classifications that would never have existed in the original metadata.

This type of data is likely to become more and more commonplace as LLMs and other AI systems become more easily and cheaply available. Hence it is likely that for some years to come metadata will have in-built, statistically generated inaccuracies, which may be ignored by the community at large if they can be proven to be negligible in key analyses.
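
A sketch of what such a record might look like if its algorithmic provenance were made explicit. The field and algorithm names are invented for illustration; what matters is that each derived element carries the method, date and confidence that analyses today typically ignore.

```python
# An older record enhanced by algorithms. Unlike the open record above,
# each derived element carries the provenance that is usually the hidden
# variable in analyses: which algorithm produced it, when, and with what
# confidence. All values are invented for illustration.
enhanced_record = {
    "source_record": {
        "title": "A 1987 Paper",
        "authors": ["J. Smith"],
        "affiliation_string": "Dept. of Chemistry, Example Univ., UK",
    },
    "enhancements": [
        {"field": "affiliation_ror",
         "value": "https://ror.org/00example0",
         "method": "affiliation-matcher-v3",    # hypothetical algorithm
         "generated": "2024-05-01",
         "confidence": 0.92},                   # statistical, not asserted
        {"field": "researcher_id",
         "value": "researcher:74120",           # reconstructed identity;
         "method": "author-disambiguation-v7",  # may change on a future run
         "generated": "2024-05-01",
         "confidence": 0.81},
    ],
}
```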

Institutionally enhanced metadata records

Institutionally enhanced metadata records are those enhanced through university processes for the purposes of institutional and government reporting. These records, harvested from multiple sources or manually curated, may have additional metadata associated with them: an author on a paper might be associated with an institutional ID, or new research classifications might be added, with links to datasets. These institutional records might be made public through institutional profiles or syndicated to larger state or national initiatives.

What are our responsibilities when using and reusing research metadata?

The text of the Barcelona Declaration treats all three types of metadata that we have defined above as being on an equal footing: to be shared under a CC0 licence, allowing unrestricted reuse. Issues of licence aside, the way we reuse metadata should be informed by the provenance of the created information.

When considering how to implement the objectives of the Barcelona Declaration then, it is worth thinking carefully about a general approach to the responsibilities associated with reuse. As with the Barcelona Declaration, we propose these as a beginning and a discussion rather than an absolute. Refining these responsibilities will take community discussion. 

Here are three responsibilities that we think would be useful to begin the conversation:

Responsibility 1. The purpose for which a piece of metadata is intended to be used must place a limit on both the scope (types of interpretation) and range (geographical, subject or temporal extent) under which it can be responsibly used

Beyond considerations of openness, the context of the data that is being propagated needs to be considered. Metadata is generated for a purpose, and that purpose defines the accuracy and care to which the metadata is applied. It also defines the limits and responsibilities for maintaining its accuracy.

For institutions, the Barcelona Declaration explicitly identifies Current Research Information Systems (CRIS systems) as one mechanism to make research information open: all relevant research information should be exportable and made open, using standard protocols and identifiers where available. This requirement builds on a movement that initially gained traction around 2010 with the VIVO and Harvard Catalyst profiles projects funded by the NIH. The key use case for these public profiles has been expertise finding, whether at the institution, state, or national level. The key insight of this movement is that information collected for internal reporting and administrative purposes can also be used to create public profiles – a single source of information efficiently driving multiple uses. In some cases the approach of CRIS-aggregated information has been taken further to create state-based portals such as the Ohio Innovation Exchange, or national open research analytics platforms such as Research Portal Denmark. Although successful, the provenance of these records means that there are practical limitations to the way the information can be reused beyond these applications.

Implicit in the name of a CRIS is a key limitation. CRISs are used to maintain/modify/aggregate information about ‘current’ researchers. There is (for an institution) no implied duty of care for the maintenance of public information about past staff. Indeed, from the perspective of expertise finding it may be inconvenient to have these profiles remain discoverable in the same way.  

Metadata within CRIS systems are also often collected for a politically aligned purpose such as the demonstration of value to voters (which is often presented as a national purpose in the form of government reporting), and this can lead to unbalanced metadata records when used in a broader context. For instance, publications recorded for the purposes of national reporting might very accurately record the researcher affiliations within a country, but will be significantly less accurate on international affiliations, for which the reporting exercise has little bearing.

Records can become unbalanced in other ways too: research can be classified to reflect the goals of individual reporting exercises (a point that we wrote about in detail in our article on FoR classification) – both in terms of the classifications that are applied, the time and effort with which those classifications are maintained, and the scope of the research classified. If there is a purpose to reusing this classification metadata in a different context, the provenance under which it was recorded must be maintained and understood.

A potential interpretation of the Barcelona Declaration could be that all metadata must be curated with the understanding that it will be used and consumed within the broader research community in perpetuity. If this is the intended interpretation, then we should be realistic about what this requires, both in terms of effort and in terms of the structures that should be put around the codification and documentation of data curation approaches. This interpretation also instantly raises several practical questions: Does the storing and passing on of a metadata record imply a responsibility to keep it up to date forevermore? What inequalities would this interpretation place on the broader research community? Specifically, does this interpretation advantage the “metadata rich” (those with the infrastructure to invest in improving records) and disadvantage the “metadata poor” (those who have poor embedded or post hoc mechanisms for the curation of metadata)? This concern is not hypothetical: the current lack of visibility of African research has hindered efforts to comprehensively understand, evaluate and build upon the research of African nations.

There are of course already remedies to address many of the persistence challenges associated with making institutional metadata open. One mechanism is to transfer the responsibility for the metadata from the institution to the individual researcher via their ORCiD. Within this workflow, researchers remain responsible for maintaining a public record of their outputs, and institutions can maintain responsibility for asserting when a researcher worked for them. Coupled with a national push to publish research in open access journals and repositories, the Barcelona Declaration complements the approach taken by national persistent identifier strategies as they move towards PID-optimised research cycles.

Responsibility 2. Machine-generated metadata should not be propagated beyond the systems for which it was created, without human curation or verification

Machine-generated metadata, such as the association of an institutional identifier to an address expressed as a string, research classifications, or algorithmically determined research IDs are all generated within precision and recall tolerances. These tolerances are set by system providers, and are aligned with the requirements of their users. Individual statements, however, are not guaranteed against any particular record. What is more, algorithmically generated data can be regenerated as methods improve, potentially invalidating records from previous runs. This notion defines a hitherto overlooked metadata provenance. Without accompanying provenance, metadata can be considered to have ‘escaped’ from its originating system and runs the risk of being “orphaned”, with no ability to be updated or appropriately contextualised. To move an algorithmically generated metadata record out of the context of the system for which it was created must be to take ownership of the provenance and the statements that can result from its use.
Whilst this is not so much of a problem for publications (an updated version of the record can always be requested using the DOI), it is particularly problematic for algorithmically generated researcher IDs as, in the case of an identifier that refers to more than one person, improved algorithms could radically change the identity that the identifier refers to. In the case of a researcher record that is split because it really represented two researchers, the existing researcher ID could end up pointing to a different researcher.

The Barcelona Declaration is right to focus on data sharing practices using standard protocols and identifiers where available. But here too, care must be taken to assess where metadata has come from, as many algorithms associate a persistent identifier with a metadata record. For instance, if an ORCiD is used instead of an internal researcher ID to refer to a researcher, but the set of assertions produced have been algorithmically generated, then communicating these assertions outside of the system in which they were generated breaks the model of trust established by ORCiD.
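
As a thought experiment, Responsibility 2 could be encoded as a simple guard at the point of export: machine-generated assertions leave their originating system only with their provenance attached. This is a minimal sketch with invented field names, not any real system's API.

```python
from datetime import date

def export_assertion(assertion: dict, system_id: str) -> dict:
    """Attach provenance to a machine-generated assertion before it
    leaves the system that generated it; refuse to export 'orphaned'
    assertions that carry no method or confidence."""
    if "method" not in assertion or "confidence" not in assertion:
        raise ValueError("refusing to export an assertion without provenance")
    return {
        **assertion,
        "provenance": {
            "originating_system": system_id,
            "exported": date.today().isoformat(),
            "note": "algorithmically generated; may be regenerated "
                    "as methods improve",
        },
    }

# Example: an algorithmically derived affiliation leaving its home system.
safe = export_assertion(
    {"field": "affiliation_ror", "value": "https://ror.org/00example0",
     "method": "affiliation-matcher-v3", "confidence": 0.92},
    system_id="example-scientometrics-platform",
)
```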

Responsibility 3. Ranking platforms should be independent of the data aggregations from which they are drawn

A key use case enabled by algorithmically generated metadata is comparative research performance assessment, often encoded in ranking systems. At first glance, this responsibility may appear to be incompatible with Responsibility 2 – if metadata should be strongly coupled to its provenance and context, why should it be divorced from the ranking use case? We regard this issue as being similar to the separation of evaluation bodies from those being evaluated. Because of the different choices that different scientometric platforms make with regard to precision and recall, the same ranking methodology can lead to different results when implemented over different scientometric platforms. However, ranking systems are often entangled with single systems, providing perverse incentives for institutions to engage (both in terms of investment and data quality feedback) with one dataset over another.

One benefit of the Barcelona Declaration's focus on persistent identifiers is that information assessment models can (and should) be constructed without reference to individual scientometric datasets. By decoupling data aggregations from the rankings themselves, we allow new data aggregation services to emerge without locking in single sources of truth. In this way scientometric data sources should be treated like large language models (LLMs) – extraordinarily useful, but with an ability to swap out one for another. Perhaps we need to add another R (replaceable) to the FAIR data principles for scientometric datasets.
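
A minimal sketch of what such decoupling could look like in code: the ranking methodology is written against an abstract, identifier-based interface, so any compliant scientometric source can be swapped in. The interface and method names are invented for illustration.

```python
from typing import Iterable, Protocol

class ScientometricSource(Protocol):
    """Anything that can answer identifier-based queries. Dimensions,
    Web of Science, etc. become replaceable implementations."""
    def publication_count(self, ror_id: str, year: int) -> int: ...

def rank_institutions(source: ScientometricSource,
                      ror_ids: Iterable[str],
                      year: int) -> list[tuple[str, int]]:
    """A toy ranking defined purely in terms of persistent identifiers.
    Because the methodology never names a specific dataset, it can be
    recomputed over any compliant source and the results compared."""
    counts = [(ror, source.publication_count(ror, year)) for ror in ror_ids]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)
```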

The decoupling of data from ranking also has another effect: it discourages investment in the data quality of a single system, and focuses attention instead on improving data at the source (for instance Crossref) or on improving independent disambiguation algorithms (such as those offered by the Research Organization Registry).

Developing an independent rankings infrastructure will require agreement to use not only the persistent identification infrastructure that we already have, but also a commitment to develop systems that refer to external classification systems.

Can we go further? Building on a commitment to an independent rankings infrastructure, for instance, is it reasonable to expect a common query language for scientometric research and analysis across scientometric systems?

The beginning of a conversation…

Finally, from the exploration above, we hope that we have made the case that our responsibilities as metadata consumers go beyond simple considerations of licence or platform. With the current state of the art in research infrastructure, our experiences of how to facilitate open data are not embedded in metadata and do not travel with it. How we use metadata places unclear expectations on others, and affects perceptions of trust in our analysis and in the research information system more generally. As the Barcelona Declaration moves from declaration to implementation, perhaps even blending with evolving national persistent identifier strategies, we hope that these considerations form part of the continuing conversation.

Exploring Research Transformation through the lens of Persistent Identifiers https://www.digital-science.com/tldr/article/exploring-research-transformation-through-the-lens-of-persistent-identifiers/ Wed, 26 Jun 2024 12:48:47 +0000 From 1973 to today – exploring the evolution of persistent identifiers as an open invitation to collaborate.


When considering research transformation within the research ecosystem, it is hard to think of a greater change than the rise of persistent identifiers (DOIs and ORCiDs being two prominent examples). Arising out of the original digital transformation of research in the internet age, persistent identifiers emerged from the need to refer to the same digital object over time, even though every single piece of infrastructure used to support it might change.

From a narrow perspective, persistent identifiers (PIDs) might be understood as a response to a technical problem; however, the key innovation surrounding persistent identifiers has been an ever-expanding social infrastructure – a constant invitation to collaborate, with roots that stretch much further back than the internet.

It begins in 1973…

Excerpt from the National Library of Medicine Technical Bulletin 1972

Although you might think of persistent identifiers as something particularly linked to the digital age, I think a useful starting point is to consider the creation of perhaps the first citation identifier in 1973. Within a National Library of Medicine (NLM) document from 1972 is the following note:

“With the generation of the 1973 MEDLINE database, citations will carry a unique identification number which consists of: 

  1. The International Standard Serial Number (ISSN) 
  2. Volume number of the journal 
  3. Beginning page of the article 
  4. A two digit number for the year…

…This identifier will ultimately serve as a bridge between various machine readable data bases to allow for the interchange of bibliographic data between various of the abstracting and indexing services. In addition, it is also intended to serve ultimately as a link between the Library’s retrieval and document delivery services. Although these two uses may not be fully implemented for some time, we have begun to carry the identifier in the database as a first step toward this end. We believe this use to be the first major operational use of the recently implemented International Standard Serial Number.”

Within this short statement are some of the key ideas behind the persistent identifiers that we use today. Not only does the identifier point to something, it is an open bridge between different knowledge sets – an invitation to connect. Reading further, we see that this identifier is built upon the fruits of a significant international collaboration between publishers and libraries. The newly created ISSN – established to identify journals as they transform through time – provides a language to identify the same research articles across different databases and representations. By becoming the first major adopter of the recently implemented ISSN, the NLM also exhibits another common feature of the persistent identifier story – the capacity to invest and place trust in the future success of community-developed infrastructure.
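
For illustration, the 1973 identifier can be reconstructed from the components listed in the note. The delimiter and formatting conventions below are an assumption, as are the example values – the NLM note specifies only the components themselves.

```python
def medline_1973_identifier(issn: str, volume: str, first_page: str,
                            year: int) -> str:
    """Rebuild the form of the 1973 MEDLINE citation identifier from the
    four components the NLM note lists: ISSN, journal volume, beginning
    page of the article, and a two-digit year. The '.' delimiter is an
    assumption; the note specifies only the components."""
    return f"{issn}.{volume}.{first_page}.{year % 100:02d}"

# e.g. for an illustrative 1973 article (invented ISSN, volume and page):
print(medline_1973_identifier("1234-5678", "63", "17", 1973))
# -> 1234-5678.63.17.73
```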

While the implementation of the first citation identifier was short-lived (it seems to have been deleted from MEDLINE files by 1979), the service in which it was embodied was certainly not. The MEDLINE index, now most prominently accessed through PubMed, and syndicated through almost every other comprehensive research search index, is today an essential part of any medical researcher's toolkit, and the use of PubMed IDs to refer to publications is commonplace.

2002: From catalogue to community…

With the arrival of the internet in 1995 and the move towards digital rather than physical, the need for a persistent identifier had resurfaced, this time to solve the problem of being able to persistently locate the digital representation of an article, as websites, digital infrastructure and even publisher ownership changed around them. 

Unlike the citation identifier proposed in 1973, the implementation of Digital Object Identifiers in 2002 would do more than just bridge representations together. Facilitated by Crossref, and scaffolded by an emerging common understanding of how to digitally describe a research article (JATS XML), DOIs for research articles would encapsulate a representation of the object that they describe, along with a persistable link to where you can find it. More than just a technology, DOIs effectively shifted responsibility for journal article metadata from citation indexes back to publishers. This responsibility extends not just to assigning DOIs to their research articles, but also to using DOIs to reference other articles. To put it another way, DOIs are made real through shared meaning and practice within the publishing community.
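
This shift is visible in the infrastructure we use today: the publisher-deposited metadata behind a DOI can be retrieved from the public Crossref REST API. A minimal sketch – production code would add polite headers, rate limiting and error handling.

```python
import requests

def crossref_metadata(doi: str) -> dict:
    """Fetch the publisher-deposited metadata for a DOI from the public
    Crossref REST API (https://api.crossref.org/works/{doi})."""
    response = requests.get(f"https://api.crossref.org/works/{doi}",
                            timeout=30)
    response.raise_for_status()
    return response.json()["message"]

# Any Crossref-registered DOI will do here:
meta = crossref_metadata("10.1038/171737a0")
print(meta["title"][0])
print([f'{a.get("given", "")} {a["family"]}'.strip()
       for a in meta.get("author", [])])
```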

Although initiated within and between publishers, DOIs provided other invitations to collaborate within the broader research community. Current Research Information Systems (such as Symplectic Elements) could enable institutions to choose which representation of a publication they wanted to include in their system. Common identifiers provided institutions with a choice over the publication data providers that they consume. Institutions are able to create their own representations of publications – enhanced with links to university staff and local research classifications – and yet, linked by a DOI, able to connect these representations to a broader ecosystem of metrics offered by an expanding set of service providers.

Over the next two decades, beyond publications, the use of DOIs in Wikipedia, policy documents, Twitter/X, Facebook and elsewhere created new invitations and possibilities for services that track alternative metrics (such as Altmetric).

2009: From publications to datasets…

From the perspective of 2024, the idea that you should be able to cite datasets as well as publications seems natural; however, this is the result of concerted efforts to transform research practice over the last 20 years. Finding a growing network of support in 2009, the initiative was led not by publishers this time, but by the library community. Although data repositories had already existed within institutions with local identifiers, the idea of a DOI for datasets brings with it an associated set of expectations – datasets should be able to accumulate metrics, we should be able to assess their impact on research, and, as with publications, researchers should receive credit for their production.

As with DOIs for publications before them, DataCite DOIs - established by the library community - create an open invitation for the broader community to innovate. The ability and expectation to create citable DOIs for datasets creates a global need for data repository infrastructure, and the rationale for global generalist repositories like Figshare, Zenodo, the Open Science Framework (OSF), Dryad and others.  

2010: From objects to people

While the conversations to create DOIs for publications and datasets emerged from localised homes within the research community (publishers, libraries), discussions on establishing a common persistent identifier for researchers reached out to all parts of the research community at once. In many ways ORCiD sought to establish a common research information citizenship right from the beginning, bringing together research institutions, funders, publishers, researchers(!), and service providers. ORCiDs had the potential to save researchers time and effort, but only if all parts of the research community moved together. As ORCiDs were owned by researchers, the success of this initiative depended on (and continues to depend on) constant engagement with and utility to researchers themselves.

Publishers and funders played significant roles in providing early compliance reasons for ORCiD adoption. Perhaps unsurprisingly for an identifier about people, different communities of researchers adopted ORCiDs at different rates. Adoption rates differed by field of research, but also by country. Regional strategies had a significant impact on the rate of ORCiD adoption, and these efforts continue today in the form of national PID strategies.

For service providers, ORCiD provided not so much an invitation to collaborate as an imperative. Current Research Information Systems that could integrate with ORCiD could not only save researchers time by downloading relationships to publications that they had already associated with themselves elsewhere, but also help curate and add value to a researcher's record.

The idea of an ORCiD is that it would go anywhere a researcher could authenticate. For a generalist repository like Figshare, this means that ORCiDs linked to a user's account (along with those of their collaborators) form part of the metadata associated with the published object. The ability to associate an ORCiD with a service account provided other benefits – such as continued access to Overleaf accounts – as a user moved from one institution to the next.

2019: From people to institutions

In 2019 the Research Organization Registry (ROR) was created to provide identifiers to institutions involved in research, evolving from a well-established need to unambiguously describe and compare the research profiles of institutions. Without an open and, critically, universally adopted set of identifiers for institutions, questions of institutional assessment are limited to the boundaries of individual scientometric datasets (such as Dimensions or Web of Science). (For more on the creation of ROR see “Are you ready to ROR?”, Scholarly Kitchen, 2019.)

Unlike the other persistent identifiers mentioned so far, the relationship between a ROR ID and an institution is slightly more distant. The Research Organization Registry was seeded from the Global Research Identifier Database (GRID), created independently by Digital Science in order to describe all of the institutions involved in research within the Dimensions database. A ROR ID is not part of an institution itself; it is created in response to an institution participating in research.

A central use case for ROR IDs is address disambiguation. When applied retrospectively to the research corpus via algorithms, they provide a common lens through which to understand institutional contributions to research. In this sense, ROR IDs share many common attributes with externally defined research classification schemes (such as SDGs or Fields of Research). The use of algorithms to connect addresses to institutions, although powerful, introduces a new discipline to persistent identifiers, namely how to deal with assertions that are likely to be true (but not always).
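
ROR operates a public affiliation-matching service that illustrates this discipline nicely: each candidate match comes back with a score rather than a certainty. The sketch below assumes the response shape of the v1 API at the time of writing; treat the field names as an assumption.

```python
import requests

def match_affiliation(affiliation: str) -> list[dict]:
    """Ask ROR's affiliation-matching service to map a raw address string
    to candidate organisations, each scored rather than asserted."""
    response = requests.get("https://api.ror.org/organizations",
                            params={"affiliation": affiliation}, timeout=30)
    response.raise_for_status()
    return [
        {"ror": item["organization"]["id"],
         "name": item["organization"]["name"],
         "score": item["score"],      # likely true, not guaranteed
         "chosen": item["chosen"]}    # the service's own confidence call
        for item in response.json()["items"]
    ]

for candidate in match_affiliation("Dept of Physics, University College London"):
    print(candidate)
```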

Although still new, one research transformation that ROR IDs invite is the possibility of datasource independent institutional ranking systems. If we can agree on the external identifiers used to describe research, then we can swap one provider out with another and compare results.

Understanding research through algorithms: the image above provides a representation of a university based on the internal co-authorship patterns on papers affiliated to University College London. Algorithms are used to identify people, institutions, research classifications and research clusters. Source: https://figshare.com/projects/What_does_a_university_look_like_/159509. When linked to algorithms, can persistent identifiers provide a language to compare different representations derived from different scientometric datasets?

2024: What is next? From finished objects to workflows

Finally, what research transformations await from the newest persistent identifier to reach implementation? 

One of the most promising developments is the emergence of Research Activity Identifiers (RAiDs) and their ability to represent research activities as they evolve, recording both the participants of a project and the outputs with which they are associated. RAiDs offer the promise of providing structure for the new research outputs that an activity creates, and ultimately of contributing to the discovery and provenance infrastructure necessary to build trust in research.

Reaching deep into the workflows of research, RAiDs provide a forward connection from research infrastructure right through to publication, as well as the promise of a backward connection to CRIS systems and funders. Continuing a change in perspective that began with ORCiD, RAiDs challenge the entire research sector to view research metadata as an active system rather than a series of static representations.

The success of RAiDs, more than the identifiers that have come before them, will rely on the imaginations of service providers to incorporate them into their workflows, and connect through to other services. The relationship between service providers and persistent identifiers has moved from invitation, to imperative, to potential catalyst. 

To demonstrate the potential of RAiDs, I gave a PIDFest lightning talk/demonstration that showed the promise of RAiD workflows by using a Figshare project as a proxy for a RAiD activity definition. It demonstrated how RAiDs can facilitate the flow of metadata from creation through to publication – automatically creating publication authorship details from associated ORCiDs, and providing incentives for researchers to improve their ORCiD records in the process. You can see the demonstration here.

By the end of the first day of the conference, I was delighted to be given access to the actual sandbox RAiD service. I then spent the rest of the conference in a personal hackathon to create an actual RAiD workflow demonstration – using a RAiD to control the users on a Figshare project, and pushing author and author contribution statements (based on project roles) into an Overleaf document. You can see the resulting Overleaf project here.
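
The demonstration itself is not reproduced here, but the shape of the workflow can be sketched. Nothing below uses the real RAiD, Figshare or Overleaf APIs – the record structure, names and ORCiDs are all invented to illustrate the flow from project roles to an authorship block.

```python
# Hypothetical reconstruction of the PIDFest workflow. Record shapes and
# values are invented - this is NOT the real RAiD, Figshare or Overleaf API.

def raid_members(raid_record: dict) -> list[dict]:
    """Read project participants (with ORCiDs and roles) from a
    RAiD-style activity record."""
    return raid_record["contributors"]

def authorship_block(members: list[dict]) -> str:
    """Derive an author byline and role-based contribution statements
    from project membership - the step pushed into Overleaf in the demo."""
    byline = ", ".join(m["name"] for m in members)
    statements = [f'{m["name"]} ({m["orcid"]}): {", ".join(m["roles"])}'
                  for m in members]
    return "\n".join([byline, *statements])

demo_raid = {
    "contributors": [
        {"name": "A. Researcher",
         "orcid": "https://orcid.org/0000-0000-0000-0001",
         "roles": ["Conceptualization", "Software"]},
        {"name": "B. Collaborator",
         "orcid": "https://orcid.org/0000-0000-0000-0002",
         "roles": ["Data curation", "Writing - original draft"]},
    ],
}
print(authorship_block(raid_members(demo_raid)))
```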

Of course, this is only part of the story. The agenda for PIDFest was packed with discussions on the need for persistent identifiers for instruments, samples, prizes and awards, organisms, cultural heritage objects, and more. Encouragingly, for a community that has always been about expanding the boundaries of collaboration, discussions about equity and access to persistent identifier infrastructure – and about who was in the room (or online) for those discussions – also played a prominent role in the conference.

May the transformation continue. I am excited to see what new opportunities arise.

The Initial Transformation https://www.digital-science.com/tldr/article/the-initial-transformation/ Wed, 10 Apr 2024 07:04:13 +0000 Discover the unremarked yet significant transformation in academic publishing: the shift from initials to full first names in author records. This change reflects on transparency, diversity, and the interplay of technology and culture in scholarly publishing.

In this ongoing investigation into Research Transformation, we seek to celebrate the art of change. How does change happen in research? What influences our behaviour? How do all of the different systems in research influence each other?

We begin our reflection on transformation with perhaps one of the most unremarked-on, yet most pervasive changes in research – the switch between initials and full first names in author records. As we will see, the shift from the formal to the familiar has been in flux from the start of scholarly publishing; however – particularly in the last 80 years – we can trace the influence of countries, fields of research, publishers, journal submission technology, funders and scholarly knowledge graphs on author name behaviours. In more recent history, we can observe that the shift towards full names has also been gendered, particularly in medicine, with men shifting towards full names earlier than women.

Why does it matter? The increase in transparency afforded by full author first names is not simply a curiosity. First names, in the ethnicities and genders that they suggest, provide an (albeit imperfect) high-level reflection of the diversity of experiences that are brought to research. It is just as important to see ourselves reflected in the outputs of the research careers that we choose to pursue as in the voices that represent us on panels at conferences. Framed this way, the progress towards the use of first names is part of the story of inclusion in research. The ‘Initial Transformation’ is also an initial problem.

Fortunately, the use of initials as part of author names has been in steady, if gradual, decline. The full details of the “The Rise and Fall of the Initial Era” can be found in our recent paper on arXiv: https://arxiv.org/abs/2404.06500

Below are six observations from the paper:

The transformation from initials to full first names is part of the broader transformation of the journal article as technology

The form of a research article is itself a technology used to encode the global norms of science. As a key building block of shared knowledge, the evolution of the form of a research article must be at once slow enough to allow the discoveries of the past to be understood today, and flexible enough to codify new patterns of behaviour (such as ORCiD researcher identifiers, funding statements, conflicts of interest, author contribution statements and other trust markers).

Over time, not only has the structure of the content of a research article evolved, but the way that authors are represented has also changed. From 1945 through to 1980, we identify a period of name formalism (referring to authors by first initial and surname). This is the only period in the history of publishing where initials are used in preference to full first names. We call this period the ‘Initial Era’.

In the ‘Initial Era’, we suggest that accommodating a growing number of authors per paper on a constrained physical page size encouraged the formalism towards initials. From 1980, full names begin to be used more commonly than initials, marking the beginning of the ‘Modern Era’. Within the ‘Modern Era’, name formalism continues a gradual decline through to the 1990s. In the period between 1990 and 2003 – a period of significant digital transformation in which the research article was recast as a digital object – name formalism drops steeply. After 2003, the decline in name formalism is less steep, but steadily trends toward zero.
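
For readers who want to explore this themselves, the classification at the heart of such an analysis can be approximated with a very simple rule. This is a deliberately simplified stand-in for the method described in the arXiv paper, not the paper's actual code.

```python
import re

def uses_initials(given_names: str) -> bool:
    """Classify a byline's given-name string as initial-style ("J." or
    "J. R.") rather than a full first name ("Jennifer")."""
    tokens = given_names.strip().split()
    if not tokens:
        return False
    # Treat a single leading letter, with or without a trailing period,
    # as an initial.
    return bool(re.fullmatch(r"[A-Z]\.?", tokens[0]))

assert uses_initials("J. R.")         # Initial Era style
assert not uses_initials("Jennifer")  # Modern Era style
```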

The story of the Initial Transformation is one of different research cultures becoming homogenised

The US is the first country to shift towards the familiar, followed reasonably quickly by other Western countries, with France perhaps holding out the longest. Slavic countries are more formal for longer but also increasingly shift towards familiar names. At the bottom of the graph (see below), in green, are three countries in the Asia-Pacific region – Japan, South Korea and China. For these countries there is no concept of a first initial, and where names have been anglicised, full names were preferred.

The story of the Initial Transformation highlights a discipline separation in research culture

How we name ourselves on papers has nothing to do with the type of research that we conduct, yet there are very clear differences between disciplines in the rate of the shift away from name formalism. Research does not change at a single pace; local cultures can affect change regardless of their relationship to the change itself.

Technology influenced our name formalism

The choice to use first names or initials has not always been a choice that resides with researchers themselves. Below we present an analysis of three journals that all went live with online journal systems in 1995-96. From the mid-70s through to 1995, journals still mostly employed typesetting houses that set the style of the journal. Even before the onset of online submission systems, journal styles influenced the way that author first names were represented. From the mid-70s these three journals take different approaches: Tetrahedron shifts away from a majority-initials approach, whereas The BMJ and the Journal of Biological Chemistry switch to typesetting that preferences initials. With the emergence of the internet in 1995, research articles began to be recast as discoverable landing pages; here the Journal of Biological Chemistry switches all at once to a system that enforces full names, and The BMJ to a system that allows choice. In all cases where author choice is allowed, the trend away from formal names continues.

Changes in Infrastructure can affect how we understand the past as well as the present

Between 2003 and 2010, the DOI infrastructure run by Crossref was adopted by the majority of publishers. As part of the Crossref metadata schema, a separate field for given names was assumed. Critically, during this transition most journals chose to register their back catalogues, including full names where possible. We owe our ability to view full name data from the past to these infrastructure changes in the first decade of the 2000s.

How were publishers able to communicate first names to the Crossref DOI standard? At a layer below DOIs was another language to describe the digital structure of papers. The Journal Article Tag Suite (JATS XML), now a common standard used to describe the digital form of a journal article – aiding both the presentation and preservation of digital content – was first released in 2003, and reflected over a decade of prior work in the industry to re-express the journal article as a digital object. Within this standard full names were also codified, and the requirement of a publisher to preserve all digital content meant that there was an imperative to apply this standard (or at least compatible earlier versions) to their complete catalogues.

Although the communication of first names to DOI metadata seems to have occurred reasonably seamlessly, the transition of first names to the scholarly knowledge graphs of the time was slower.

MEDLINE (and, by relation, PubMed) only began adding full names to its metadata records in 2002. Journals that relied on MEDLINE records for discovery (and chose not to implement DOIs) did not benefit from retrospective updates.

The difference in the adoption of first names between Crossref and MEDLINE/PubMed also highlights a risk in adopting scholarly knowledge graphs as infrastructure. Scholarly knowledge graphs have their own constraints on infrastructure, and make decisions on what is sustainable to present. Although enormously valuable, they are a disconnection point with the sources of truth they present. We can see this split starkly if we look at publications from those journals that chose not to create DOIs for their articles, relying instead just on the services provided by MEDLINE.

The shift to full names happened at different rates for men and women, and, at least for publications associated with PubMed, technology influenced the practice

With the benefit of gender-guessing technology, we note that progress towards first names has occurred at different rates for men and women. This is particularly stark for publications in PubMed.

Why is there a jump in 2002? As mentioned above, 2002 was the year that you could start to interact with author first names, with PubMed and MEDLINE incorporating them into their search. Although we cannot draw a direct causal connection, it is tempting to argue that this subtle shift in critical technology used by almost all medical researchers had a small but important impact on making research more inclusive. When we look at articles that have both a PubMed ID and a DOI, we can see that in 2002 the average number of full first names on papers associated with women rose by 17%, and by 13% for men. This jump is not present in publications that have not been indexed by PubMed.

For medical disciplines associated with papers in PubMed, after 2002 there is also a distinct difference in the rate of first name transformation for men and women. The rate of change for men is less than half that of women, rising only 5% in 20 years, compared to 12%. For some disciplines, then, this raises a methodological challenge in gender studies as (at least based on author records) changes in the participation rates of women in science must be disentangled from changes in the visibility of women in science.

Embracing Initial Transformation

Finally, the transition from initials to first names has happened slowly and without advocacy. Whilst this has been to our advantage in identifying some of the axes along which research transformation occurs, an argument could be made that, if first names help provide us (imperfectly) with access to the diversity of experiences that are brought to research, then the pace of change has not been fast enough. For instance, could more have been made of the use of ORCiD to facilitate the shift to using first names, so that older works by the same researcher identified by an initial-based moniker could be linked to newer works that use the researcher's full first name?

The transformation away from name formalism of course does not stop at author bylines. Name formalism is also embraced in reference formats. It could be argued that even within a paper, this formalism suppresses the diversity signal in the research that we encounter. Reference styles were defined in a different era with physical space constraints. Is it time to reconsider these conventions?
Within contribution statements that use the CRediT taxonomy, initials are also commonly employed to refer to authors. This convention creates disambiguation issues when two authors share the same surname and first initials. Here too, as the digital structure of a paper continues to evolve, we should be careful not to unquestioningly embed the naming conventions of a different era into our evolving metadata standards.
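
The collision is easy to demonstrate. The sketch below builds the surname-plus-initials keys conventionally used in CRediT statements and flags any that collapse together; the names are invented.

```python
from collections import Counter

def credit_keys(authors: list[str]) -> list[str]:
    """Build the initials-based keys conventionally used in CRediT
    contribution statements, e.g. "Jane Smith" -> "J.S."."""
    return ["".join(part[0] + "." for part in name.split())
            for name in authors]

authors = ["Jane Smith", "John Smith", "Wei Zhang"]
collisions = [k for k, n in Counter(credit_keys(authors)).items() if n > 1]
print(collisions)  # ['J.S.'] - two different authors share one key
```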

The post The Initial Transformation appeared first on Digital Science.

Launching a new way to interact with scientific content on OpenAI’s ChatGPT platform https://www.digital-science.com/tldr/article/launching-a-new-way-to-interact-with-scientific-content-on-openais-chatgpt-platform/ Wed, 28 Feb 2024 14:00:46 +0000 https://www.digital-science.com/?post_type=tldr_article&p=70056 Digital Science releases its first custom GPT on OpenAI’s ChatGPT platform - Dimensions Research GPT - as a free version based on Open Access content and an enterprise version calling on all the diverse content types in Dimensions from publications, grants, patents and clinical trials. In alignment with our goals to be responsible about the AIs that we introduce, we explore some of the steps that we’ve taken in its development, explain our key principles in developing these tools, and make the context of these tools clear for the community that we intend them to serve.

Today, Digital Science releases its first custom GPT on OpenAI’s ChatGPT platform – Dimensions Research GPT – as a free version based on Open Access content and an enterprise version calling on all the diverse content types in Dimensions from publications, grants, patents and clinical trials. In alignment with our goals to be responsible about the AIs that we introduce, we explore below some of the steps that we’ve taken in its development, explain our key principles in developing these tools, and make the context of these tools clear for the community that we intend them to serve.

For any software development company, there is an implicit responsibility to the user communities it serves. Typically, this commitment extends to being conscientious about how the software is developed: ensuring, to the greatest extent possible, that the software is secure, free of bugs, and functions as described to the client would seem to be among the basic requirements.

The rise of AI should raise the value that systems can bring to users, but it also raises the bar in the relationship between developer and user, especially with large language models (LLMs). Users need to understand how the data they submit to the system are used, and they need to understand the limitations of the responses they receive. Developers need to understand and minimise biases in the tools they create, grapple with complex concepts such as hallucination, and work out how to educate users about when to trust different types of output from their software.

All these problems are magnified tenfold when it comes to supporting researchers or the broader research enterprise. The research system is so fundamental to how society functions and progresses that we cannot afford for new technologies to undermine the trust that we have in it.

At Digital Science we believe that research is the single most powerful tool that humanity possesses for the positive transformation of society and, as such, we have a responsibility to provide software that does not damage research. Although that sounds simple, it is tremendously difficult. In an era of papermills and p-hacking, providing information tools that support research requires deeper thinking before releasing a product to users.

Beyond all the requirements that we have listed above, to support researchers and the research community, we believe that we need to:

  • ensure that researchers understand what uses of the system are valid and which aren’t;
  • sensitise users to the fact that this technology is in its early stage of development and that it cannot be completely trusted;
  • provide users with the ability to contextualise the output that they get so that they don’t have to trust without verification;
  • ensure that no groups of researchers are disenfranchised or excluded from accessing this type of technology, whether artificially or through commercial approaches.

Many of these features have been built into the offering that we launch today: this blog attempts to address some of the points above; we are working to ensure equitable access by creating a free version; and we have made specific functionality choices to try to address our concerns with where this technology can lead.  Overall, it is with some pride and much excitement that we launch Dimensions Research GPT today!

The Road to Dimensions Research GPT

Our free offering Dimensions Research GPT and its more powerful counterpart Dimensions Research GPT Enterprise are the result of a long period of testing and feedback from the community. We started developing this type of functionality in late 2022, but by summer 2023 it had reached a phase where we needed more understanding from the sector. Thus, in August 2023 we launched the Dimensions AI Assistant as a beta concept. We quickly learned that “question answering” can be challenging not just from a technical perspective (for example, providing a low-to-no-hallucination experience) but also in terms of providing users with an interface that continues to be engaging and which fuels curiosity.

In addition, we found that there is a certain “fuzziness” in querying through an LLM that doesn’t sit comfortably in an environment that involves highly structured data, such as Dimensions.  That realisation led us to make certain design decisions that you’ll see informing the way that we develop both the products launched today and Dimensions in the future. 

For better or worse, since the beginning of modern search in the mid-1990s we have become used to searching the web and seeing pages of search results – some of which are more relevant to our search, and some of which appear less relevant. With most LLMs, the information experience is different to a standard internet search: We ask a question and we get an answer. What’s more, we get an answer that typically does not equivocate or sound anything less than completely confident. It does not encourage us to read around a field or notice interesting articles that might not be relevant – it focuses us on the answer rather than being curious about all the things around the answer. Launching a tool that has those characteristics in a research context is not only potentially irresponsible but also dangerous. We have used that concern as a guiding principle for how we have built Dimensions Research GPT.

What is Dimensions Research GPT?

Dimensions Research GPT and Dimensions Research GPT Enterprise both bring together the language capabilities of OpenAI's ChatGPT and the content across the different facets of Dimensions. In the case of Dimensions Research GPT, data related to research articles from the open access corpus contained in Dimensions is used to provide context for the user's question and to help them discover more. This free tool gives users the ability to interact with the world's openly accessible scholarly content via an interface that ensures that answers refer back to the research that underlies them. This provides two important features: firstly, the ability to verify any assertions made by Dimensions Research GPT; and secondly, the ability to see references to a set of articles that may be relevant to the question, so that users continue to be inquisitive and read around a field. Basing this free tool on content that is free-to-read provides the greatest chance for equity and impact.
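
The pattern described here is commonly known as retrieval-augmented generation. The sketch below is not the Dimensions Research GPT implementation; search_open_access and llm are hypothetical stand-ins for retrieval over the Dimensions open access corpus and for the underlying language model. It simply illustrates how an answer can be kept anchored to citable records:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Record:
    title: str
    doi: str
    abstract: str

def answer_with_references(question: str,
                           search_open_access: Callable[[str, int], List[Record]],
                           llm: Callable[[str], str],
                           k: int = 5) -> str:
    """Ground the model's response in retrieved records and return the records
    alongside the answer, so that every assertion can be checked at the source."""
    records = search_open_access(question, k)
    context = "\n\n".join(
        f"[{i + 1}] {r.title} (doi:{r.doi})\n{r.abstract}" for i, r in enumerate(records)
    )
    prompt = (
        "Answer the question using ONLY the numbered sources below, citing them "
        "as [n]. If the sources do not support an answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    answer = llm(prompt)
    references = "\n".join(f"[{i + 1}] https://doi.org/{r.doi}" for i, r in enumerate(records))
    return f"{answer}\n\nReferences:\n{references}"
```

Because the references travel with the answer, the user is always one click away from the underlying literature rather than being asked to trust an unsourced response.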

Dimensions Research GPT Enterprise runs the same engine and approach as Dimensions Research GPT, but extends the scope of the content it can access to the full Dimensions database of 350 million records, including research articles, grants, clinical trials, and patents. A truly fascinating dataset to explore in this new way.

Before we explore further what Dimensions Research GPT is, and the kinds of things that you can do, it is worth taking a moment to be clear about what it is not. Put simply, it is not intended for analytics. While many users are familiar with Dimensions as an analytics tool, Dimensions Research GPT is not a tool for asking evaluative or quantitative questions. Thus, asking Dimensions Research GPT to calculate your H-index or rank the people in your field by their attention will be a fruitless task. Similarly, the system is designed to help you explore knowledge, not people; hence, if you ask Dimensions Research GPT to summarise your own work, provide rankings, or tell you who the most prolific people are in your field, you will be disappointed. Many of these use cases, with the exception of those involving the H-index (Digital Science is a signatory to DORA), are already covered by Dimensions Analytics.

An example of how to use Dimensions Research GPT

We’ve covered at a high level the principles behind building a tool like Dimensions Research GPT, and we’ve also explained what it is and is not, so now we really should show you how to think about using the tool.

Below, we show a brief conversation with Dimensions Research GPT about a research area known to one of the co-authors of this blog. We encourage readers to carry out the same queries in ChatGPT or Dimensions Research GPT Enterprise and compare the answers that they receive.

Our first prompt introduces the area of interest…

Summarise three of the most important, recent applications of PT-symmetric quantum theory to real-world technologies

The references link over to Dimensions to give full contextualised details of the articles and connect over to source versions so that you can read further. Maybe we’re not from the field and we want to understand that response in simpler terms. That might look like:

Rewrite your last response at the level which a high-school student can understand and highlight the potential for application in the real world

With this query, we’ve just begun to explore the base functionality that ChatGPT provides under Dimensions Research GPT.  This is just scratching the surface of the open-ended possibilities implied here. 

Finally, we ask Dimensions Research GPT to speculate:

Please speculate on the potential applications of PT symmetry to medical device development, providing references to appropriate supporting literature

Again, the tool shows references that back up these speculations about these exciting potential advances.

We fully realise that this is not a panacea, but at the same time, we think that this approach is worthy of exploration and pursuit in a way that can help the research community benefit from new AI technologies in a responsible way.  We’re sure that we won’t get everything right on the first attempt – but we aim to learn.  On that note, we hope that you will be part of our experiment – please do tell us how you use this platform to inform and accelerate your own research.  Like us, we’re sure you’ll find that with this technology there are always possibilities. 

If you want to try Dimensions Research GPT, you can do so as a ChatGPT Plus or Enterprise user, by going to your OpenAI/ChatGPT environment and looking for Dimensions Research GPT under Explore GPTs.

The post Launching a new way to interact with scientific content on OpenAI’s ChatGPT platform appeared first on Digital Science.

What does a university look like? On tour at the ICSSI conference…. https://www.digital-science.com/tldr/article/what-does-a-university-look-like-on-tour-at-the-icssi-conference/ Fri, 07 Jul 2023 10:40:33 +0000 https://www.digital-science.com/?post_type=tldr_article&p=64169 TL;DR goes on tour with "What does a university look like?" at the ICSSI conference in Chicago. We explored the shapes of Harvard, MIT, Oxford, Peking, ETH Zurich, Chicago, and Northwestern.

We are on tour!

Recently, we had the chance to mount a mini exhibit of our 'What does a university look like?' project at the International Conference on the Science of Science and Innovation (ICSSI) in Chicago. Placed prominently in the main auditorium, the posters triggered many interesting conversations about the structure of institutions, particularly how different universities undertake interdisciplinary collaboration. I have noticed that whilst many people encounter the posters as art, people linger longest over their own university, pointing out structural and historical reasons why a cluster might look a certain way.

For the conference, we chose seven universities. Northwestern University and the University of Chicago were selected as local universities. The remaining five were chosen to illustrate different university shapes. Harvard University was chosen to represent a medically-focused university, and also showed off its enormous scale of 62,000 affiliated researchers. MIT and ETH Zurich represented the shapes of institutes of technology. The University of Oxford and Peking University were chosen as two different examples of comprehensive universities. Both show a balanced set of disciplines; however, their shapes are quite different. Oxford exhibits a common shape of disciplines wrapped around a clinical sciences core. Peking University, on the other hand, is a tale of two halves, with almost independent clinical and technological sides.

Examples of medically-focused universities

Cluster of research from Northwestern University

Examples of comprehensive universities

Cluster of research from the University of Chicago

Examples of technology-focused universities

Access all seven posters and more

You can find all seven posters, and more, as additions to the 'What does a university look like?' project on Figshare.

What does a University look like? A primer

What does a University look like? By modelling data from Dimensions in Blender, a 3D visualization tool, we present a new way of exploring university research collaboration diagrams in a consistent format.

To create the university networks, we use Dimensions to extract a co-authorship network based on university affiliation. The network is then given shape using the BatchLayout algorithm. To add color, we use the 2020 Field of Research (FoR) codes to represent research discipline, assigning a color to each code. Each point of color represents an individual researcher, coded by the 2-digit FoR they are most associated with; researchers are depicted as spheres, and the size of each sphere is based on the number of publications that researcher has produced.

To add depth, we then apply algorithms developed by CWTS at Leiden University to determine research clusters – co-authorship networks – within a specific university. These clusters are then layered on top of each other by discipline, with Clinical Science clusters at the bottom, then moving up through Health Sciences, then Science and Engineering, and Linguistics at the top.
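
For readers who want to experiment, here is a minimal sketch of the pipeline in Python over a toy co-authorship network. networkx's spring layout and greedy modularity communities stand in for the BatchLayout algorithm and the CWTS clustering used in the project itself, and the data are illustrative only:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# One node per researcher, with publication count and dominant 2-digit FoR code;
# edges are weighted by the number of co-authored papers.
researchers = {
    "r1": {"pubs": 40, "for2": "32"},  # 32 = Biomedical and Clinical Sciences
    "r2": {"pubs": 12, "for2": "32"},
    "r3": {"pubs": 25, "for2": "46"},  # 46 = Information and Computing Sciences
    "r4": {"pubs": 8,  "for2": "46"},
}
coauthorships = [("r1", "r2", 9), ("r3", "r4", 4), ("r2", "r3", 1)]

G = nx.Graph()
for rid, attrs in researchers.items():
    G.add_node(rid, **attrs)
G.add_weighted_edges_from(coauthorships)

# Stand-in for BatchLayout: a force-directed layout positions each researcher.
positions = nx.spring_layout(G, weight="weight", seed=42)

# Stand-in for the CWTS algorithms: modularity-based community detection.
clusters = list(greedy_modularity_communities(G, weight="weight"))

# Sphere size tracks publications; colour tracks the 2-digit FoR code.
for rid, (x, y) in positions.items():
    print(rid, f"({x:.2f}, {y:.2f})", "size:", G.nodes[rid]["pubs"],
          "colour:", G.nodes[rid]["for2"])
print("clusters:", [sorted(c) for c in clusters])
```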

More on this project: https://www.digital-science.com/blog/2023/05/discovering-galaxies-of-research-within-universities/

The post What does a university look like? On tour at the ICSSI conference…. appeared first on Digital Science.

Reproducibility and Research Integrity top UK research agenda https://www.digital-science.com/tldr/article/reproducibility-and-research-integrity-top-uk-research-agenda/ Thu, 11 May 2023 11:55:39 +0000 https://www.digital-science.com/?post_type=tldr_article&p=62496 Digital Science reflections on The House of Commons Science, Innovation and Technology Committee report on Reproducibility and Research Integrity.


Digital Science reflections on the House of Commons Science, Innovation and Technology Committee report on Reproducibility and Research Integrity.

The new Reproducibility and Research Integrity report released by The House of Commons Science, Innovation and Technology Committee is a timely reminder that Digital Science plays a critical role in supporting research integrity and reproducibility across the sector globally. The following is our response to the report’s findings.

TL;DR: What does the report say?

The report says that publishers, funders, research organisations and researchers all have a role to play in improving research integrity in the UK, with the focus on reproducibility and transparency. It recommends that funding organisations should take these factors into account when awarding grants and that the Research Excellence Framework (REF), a method for assessing how to share around £2 billion of funding among UK universities based on the quality of their research output, should score transparent research more highly. Publishers are advised to mandate sharing of research data and materials (e.g. code) as a condition of publication and to increase publication of registered reports (published research methodologies that can be peer reviewed before a study is run) and confirmatory studies. Research organisations such as universities must ensure that their researchers have the space, freedom and support required to design and carry out robust research, including mandatory training in open and rigorous research practices.

What’s new about the report?

For those of us at Digital Science who have been in the open science movement for many years, the report covers familiar ground. The report makes some expected statements: that reproducible research should come with a data management plan, have clearly documented reproducible methods, and link to openly available code and data. These have been recognised as good practices for some time, but this report goes further, suggesting that these practices should be basic expectations of all scientific research, right now.

The report also goes much further than we’ve seen before in stating that the extent to which best practice is not being followed today is the fault of the system, and not the individual researchers. Systemic barriers to best practice are identified as:

  • Incentivising novel research over the reproduction of research
  • Funding the ‘fast’ science over more time-consuming reproducible research 
  • Overemphasis on traditional research metrics – volume and citations centred around papers and people – at the expense of recognising all roles in research (such as statisticians, data scientists, and research programmers)
  • Journal policies and practices that do not enforce high expectations of research reproducibility on submission
  • Journal policies which do not respond quickly enough when publications should be retracted
  • Lack of expertise on reproducibility and statistical methods in peer review
  • Lack of institutional research integrity and reproducibility training across all levels of research seniority.

To remove these barriers, the report offers 28 observations and recommendations. At their core is the sentiment:

 “…the research community, including research institutions and publishers, should work alongside individuals to create an environment where research integrity and reproducibility are championed.” 

If the report has a limitation, it is that it leaves the role of some parts of the research community (including research infrastructure and service providers) unexamined. As part of that community, then, it is timely to reflect on Digital Science's role in supporting research integrity and reproducibility.

What do we believe?

Digital Science's long-held position is that research should be as open as possible and as closed as necessary for the research to be done within ethical and practical bounds, and that research should be carried out as reproducibly as possible and infused with integrity. Through investment in and work on tools such as Figshare and Dimensions Research Integrity, Digital Science supports researchers, research institutions, funders, governments and publishers to engage with these important topics.

Figshare

First to most people's minds would be the role that Figshare plays in enabling transparency and reproducibility by making it possible for any researcher anywhere to deposit and share research data with a DOI. Launched in its current form in 2012, Figshare quickly became the dominant generalist repository (based on percentage share of repository links in data availability statements). By 2020, Zenodo and OSF had joined Figshare in providing the global infrastructure for data. As institutions move to take responsibility for their own research data, Figshare has also been instrumental in helping institutions manage their own research data repositories.

Figure: Repository Share as a percentage of repository mentions in Data Availability Statements.
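
As a rough illustration of how such a share can be derived (the production analysis is considerably more sophisticated), one can scan data availability statements for repository domains and normalise the counts. The patterns and inputs below are illustrative only:

```python
import re
from collections import Counter

REPOSITORY_PATTERNS = {
    "Figshare": re.compile(r"figshare\.com", re.I),
    "Zenodo":   re.compile(r"zenodo\.org", re.I),
    "OSF":      re.compile(r"\bosf\.io", re.I),
    "Dryad":    re.compile(r"datadryad\.org", re.I),
}

def repository_share(statements):
    """Share of each repository as a percentage of all repository
    mentions found in a collection of data availability statements."""
    mentions = Counter()
    for text in statements:
        for repo, pattern in REPOSITORY_PATTERNS.items():
            if pattern.search(text):
                mentions[repo] += 1
    total = sum(mentions.values())
    return {repo: 100 * n / total for repo, n in mentions.items()} if total else {}

example = [
    "Data are available at https://figshare.com/articles/12345.",
    "All code is deposited at https://zenodo.org/record/678.",
    "Materials are available from https://osf.io/abcd/.",
]
print(repository_share(example))  # each repository receives ~33.3%
```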

Dimensions Research Integrity

The above analysis showing the growth of data repositories is possible thanks to the Dimensions Research Integrity Dataset, the first dataset of its kind in terms of both size and scope and, to our knowledge, the first available global dataset on research integrity practice. Designed with research institutions, funders, and publishers in mind, Dimensions Research Integrity measures changing practice in the communication of Trust Markers: ethics approvals, author contribution statements, data and code availability, conflicts of interest, and funding statements. Trust Markers allow a reader to ascertain whether a piece of research contains the hallmarks of ethical research practice.

So what does the data look like for reproducibility and research integrity? Taken as a global aggregate, the picture is perhaps less gloomy than the report paints. Over the last three years, there has been a significant shift in the structuring of research papers to make them more transparent. Although some of these changes might be cosmetic (for instance, having a data availability statement does not mean that your data is available), they do represent a shift in the research system towards transparency. That these trust markers exist on papers at all at significant levels is down to a combination of funder policies, changing journal guidance to authors, and research integrity training at the institutional level.

Figure: Evolving Science Trust Markers 2011-2021. Trust marker adoption, expressed as a percentage of the scientific literature, has grown rapidly over time.
*The percentage of ethics papers is calculated over publications with a MeSH classification of Humans or Animals. The ethics trust marker counts papers that include a specific ethics section (as opposed to mentioning ethics approval somewhere in the text).

As the report on reproducibility and integrity indicates, efforts to improve transparency should be addressed across all disciplines. By looking at the data for 2021, we can get a sense of how far each discipline has adopted each of the different trust markers. Where an area has reached at least 30% coverage, we assert that a tipping point has been reached and that there are few impediments to pushing towards more complete adoption, with goals set around compliance. Areas of maturing community practice (between 10% and 30%) and developing community practice (less than 10%) require a different sort of engagement, focused on education. This level of data is crucial both for adapting research integrity training by discipline and for setting benchmarks and measuring how successful different research integrity interventions have been.

Figure: Research Integrity Policy Implementation Bands. Based on the adoption percentages for 2021, Fields of Research are assigned policy implementation bands. For fields in band 1, there is already well-established practice, and it would be reasonable to work towards 100% compliance for all papers for which a university has a corresponding author or is a principal investigator on a funded project. For band 2, there is awareness of the trust marker, but more training is required to shift practice. For band 3, low awareness is assumed, and significant training is required.
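
The banding logic itself can be stated in a few lines. The sketch below encodes the 30% and 10% tipping points described above; the adoption figures are illustrative, not taken from the dataset:

```python
def policy_band(adoption_pct):
    """Assign a trust-marker adoption percentage to a policy implementation band,
    following the 30% and 10% thresholds described above."""
    if adoption_pct >= 30:
        return 1  # established practice: work towards full compliance
    if adoption_pct >= 10:
        return 2  # maturing practice: awareness exists, training needed
    return 3      # developing practice: low awareness, significant training required

# Illustrative 2021 adoption figures for a single trust marker, by field:
for field, pct in {"Clinical Sciences": 48.0, "Chemistry": 17.5, "Philosophy": 3.2}.items():
    print(field, "-> band", policy_band(pct))
```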

As examples of how Dimensions Research Integrity can be used, these brief analyses are only the beginning of a broader evidence-based discussion on research integrity and reproducibility.

We look forward to having many more conversations with you throughout the year on Research Integrity!

The post Reproducibility and Research Integrity top UK research agenda appeared first on Digital Science.

Our new avenue for interesting things https://www.digital-science.com/tldr/article/our-new-avenue-for-interesting-things/ Thu, 27 Apr 2023 18:25:36 +0000 https://www.digital-science.com/?post_type=tldr_article&p=62313 Welcome to Digital Science TL;DR, our new avenue for interesting things!

Welcome to Digital Science TL;DR, our new avenue for interesting things!

We bring you short, sharp insights into what’s going on across the Digital Science group; both through our in-house experts and in conversation with amazing people from the community. And we’ll keep it brief!

Why TL;DR? Because we’ve all experienced the “Too long; didn’t read” feeling at times, and by explicitly calling this out we’re making sure we provide a short summary at the top of every article here. 🙂

Introducing our core team

We have a core team of five (at present!) who will be the primary authors of new content on the site, often working in collaboration with our in-house experts and those in the scientific and research community.

You can think of it like our core team acting as the lightning rods ⚡ attracting cool, exciting, and sometimes provocative content from across the Digital Science group and our wider community of partners, end users, customers and friends.

And so without further ado, please say hello to: Briony, John, Leslie, Simon and Suze!

Briony Fane

Briony Fane is Director of Researcher Engagement, Data, at Digital Science. She gained a PhD from City, University of London, and has worked both as a funded researcher and a research manager in the university sector. Briony plays a major role in investigating and contextualising data for clients and stakeholders. She identifies and documents her findings, trends and insights through the curation of customised in-depth reports. Briony has extensive knowledge of the UN Sustainable Development Goals and regularly publishes blogs on the subject, exploring and contextualising data from Dimensions.

John Hammersley

John Hammersley has always been fascinated by science, space, exploration and technology. After completing a PhD in Mathematical Physics at Durham University in 2008, he went on to help launch the world’s first driverless taxi system now operating at London’s Heathrow Airport.

John and his co-founder John Lees-Miller then created Overleaf, the hugely popular online collaborative writing platform with over eleven million users worldwide. Building on this success, John is now championing researcher and community engagement at Digital Science.

He was named as one of The Bookseller’s Rising Stars of 2015, is a mentor and alumni of the Bethnal Green Ventures start-up accelerator in London, and in his spare time (when not looking after two little ones!) likes to dance West Coast Swing and build things out of wood!

Image credit Alf Eaton. Prompt: “A founder of software company Overleaf, dancing out of an office and into London while fireworks explode. high res photo, slightly emotional.”

Leslie McIntosh

Leslie McIntosh is the VP of Research Integrity at Digital Science and dedicates her work to improving research and investigating and reducing mis- and disinformation in science.

As an academic turned entrepreneur, she founded Ripeta in 2017 to improve research quality and integrity. Now part of Digital Science, the Ripeta algorithms lead in detecting trust markers of research manuscripts. She works around the globe with governments, publishers, institutions, and companies to improve research and scientific decision-making. She has given hundreds of talks including to the US-NIH, NASA, and World Congress on Research Integrity, and consulted with the US, Canadian, and European governments.

Simon Porter

Simon Porter is VP of Research Futures at Digital Science. He has forged a career transforming university practices in how data about research is used, both from administrative and eResearch perspectives. As well as making key contributions to research information visualization, he is well known for his advocacy of Research Profiling Systems and their capability to create new opportunities for researchers.

Simon came to Digital Science from the University of Melbourne, where he worked for 15 years in roles spanning the Library, Research Administration, and Information Technology.

Suze Kundu

Suze Kundu (pronouns she/her) is a nanochemist and a science communicator. Suze is Director of Researcher and Community Engagement at Digital Science and a Trustee of the Royal Institution. Prior to her move to DS in 2018, Suze was an academic for six years, teaching at Imperial College London and the University of Surrey, having completed her undergraduate degree and PhD in Chemistry at University College London.

Suze is a presenter on many shows on the Discovery Channel, National Geographic and Curiosity Stream, a science expert on TV and radio, and a science writer for Forbes. Suze is also a public speaker, having performed demo lectures and scientific stand-up comedy at events all over the world, on topics ranging from Cocktail Chemistry to the Science of Superheroes.

Suze collects degrees like Pokémon, the latest being a Masters from Imperial College London that focused on outreach initiatives and their impact on the retention of women engineering graduates within the profession.

Suze is a catmamma and in her spare time loves dance and Disney, moshing and musical theatre.

Introducing our core topics

We are focusing our content around a set of core topics which are critical not just to the research community but to the world as a whole; at Digital Science we believe research is the single most powerful transformational force for the long-term improvement of society, and our vision is a future where a trusted, frictionless, collaborative research ecosystem helps to drive progress for all.

With this vision in mind, our five core topics at launch are: Global Challenges, Research Integrity, The Future of Research, Open Research, and Community Engagement.

These topics will no doubt continue to evolve over time, but that gives us a lot to get started with! Here’s the short summary of what those topics mean to us:

Global Challenges

Most of the world’s technical and medical innovations begin with a scientific paper. It has been said that the faster science moves, the faster the world moves.

But perhaps more importantly, society increasingly looks to science for solutions to today’s most pressing social and environmental challenges. If we’re going to face up to complex health issues, an ageing population, and the digital transformation of the world, we need science and research that is faster, more trustworthy, and more transparent.

With this in mind, we explore how science and research, and its communication, is evolving to meet the needs of our rapidly changing world.

Research Integrity

Research integrity will be a dominant theme in scholarly communications over the next decade. Challenges around ChatGPT, papermills, and fake science will only get thornier and more complex. We expect all stakeholders – research institutions, publishers, journalists, funding agencies, and many others – will need to dedicate more resources to fortify trust in science.

Even faced with these challenges, taking the idea of making research better from infancy to integration is exciting. Past and present, our team has built novel and faster ways to establish trust in research. We are happy to have grown a diverse group that will continue to develop the technical pieces needed to assess trust markers.

The Future of Research

Since its inception, Digital Science has always concerned itself with the future of research tools and infrastructure, with many of our products playing a transformative role in the way research is collaborated on, organised, described and analysed. Within this topic, we explore how Digital Science capabilities can continue to contribute to research future discussions, as well as highlighting interesting developments and initiatives that capture our imagination.

Open Research

At Digital Science, we build tools that help the researchers who will change the world. Information wants to be free and since the dawn of the web, funders have been innovating their policies to ensure that all research will become open.

Digital Science believes that Open Research will help level the playing field and allow anyone, anywhere to contribute to the advancement of knowledge. It also helps with other areas that pre-web academia struggled with, including reproducibility, transparency, accessibility and inclusivity.

These posts will cover the why and the how of open research, as it becomes just “research”.

Community Engagement

One of Digital Science's founding missions was to invest in and nurture small, fledgling start-ups to transform scholarly research and communication. Those founding teams now form the heart of Digital Science, and the desire to make, build, and change things for the better is at the core of what we do.

But we’ve never done that in isolation; Digital Science is a success because it’s always worked with the community, and most of us came from the world of research in one form or another!

In these community engagement posts we highlight and showcase some of the brilliant new ideas and start-ups in the wider science, research and tech communities.

What’s up next?

That’s all for this welcome post, but stay tuned for a whole batch of launch content being written as we speak! We’ll also have regular weekly posts from the team, and would love to hear from you if you have an idea for a subject we should cover, or simply if you’d like to say hello! 

You can contact us via the button in the top bar or footer, or via the social media links for our individual authors. 

Ciao for now!  

The post Our new avenue for interesting things appeared first on Digital Science.
