Provocative Paper Titles
https://www.digital-science.com/blog/2021/08/provocative-paper-titles/ | 24 August 2021

Does a disconnect between a paper’s abstract and its title indicate a potential need to inspect the article for possible trust issues?

By Dr Leslie McIntosh, Founder and CEO of Ripeta, and Dr Hilde van Zeeland, Applied Linguist at Writefull.

At Ripeta, we develop tools to automatically scan manuscripts for key scientific quality indicators and provide feedback on ways to improve research reporting. We assess, design, and disseminate practices and measures that improve the reproducibility of, and trust in, science, with minimal burden on scientists.

In what can often feel like a sea of dry scientific writing, provocative titles in scientific research papers stand out. Occasionally, legitimate scientists conducting good research will attempt more humorous titles. Sometimes, they even land! 

To highlight the joy of a jaunty paper title, our friends at Writefull, providers of AI-based research proofreading services, have developed a fun app to generate scientific paper titles based on article abstracts. When pondering paper titles, I wondered whether a disconnect between a paper’s abstract and its title could indicate a potential need to inspect the article for possible trust issues, and what better way to investigate it than to use their app!
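
Before the examples, here is one crude way such a ‘disconnect’ could be operationalised. This is a minimal sketch, not Writefull’s or Ripeta’s method: it scores the lexical overlap between a title and its abstract with TF-IDF cosine similarity, and the 0.1 threshold is invented purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def title_abstract_similarity(title: str, abstract: str) -> float:
    """Lexical overlap between a title and its abstract (0 = disjoint, 1 = identical)."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform([title, abstract])
    return float(cosine_similarity(tfidf[0:1], tfidf[1:2])[0, 0])

title = "The Safety of COVID-19 Vaccinations -- We Should Rethink the Policy"
abstract = "paste the paper's abstract here"  # placeholder text
if title_abstract_similarity(title, abstract) < 0.1:  # threshold invented for illustration
    print("Title and abstract share little vocabulary -- worth a closer look.")
```

A low score would not prove anything by itself; it would simply be one cheap signal that the article merits a human read.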

And so, without further ado, or indeed statistical significance, I present three articles: 

Article 1: A (very not) scintillating title 

Human-Created Title

An analysis of form and function of a research article between and within publishers and journals

Writefull’s Computer-Generated Title Possibilities
  • Research Article Heading Organization and Forms for Machine Learning and Natural Language Processing: A Case Study from a Single Institution
  • A Heading Form and Function Analysis for Machine Learning
  • Research Article Heading Form and Function Analysis Using Rhetorical Functions

Picking an enticing article we recently published at Ripeta about research article headings and subheadings, I wanted to see how closely our paper’s title compared to those generated by a computer. Judging by the alternative possibilities, the similarity of the titles reassured me that we had framed our paper as dryly as possible, in keeping with scientific naming conventions. It is quite an interesting article if you are training machine learning algorithms to parse and categorize articles – but definitely not click-bait.

[Image: the paper “An analysis of form and function of a research article between and within publishers and journals” displayed on an iPad]

Article 2: A title from an author trying to be clever (apologies Dr. Luke)

Human-Created Title

Where there’s smoke there’s money: Tobacco industry campaign contributions and U.S. Congressional voting

Writefull’s Computer-Generated Title Possibilities
  • Voting Behaviors of Representatives from the Tobacco Industry Political Action Committees in the United States: A Cross-Sectional Analysis
  • The Effectiveness of Campaign Contributions for Tobacco-Related Legislators in the United States: A Cross-Sectional, Multilevel Model
  • Voting Behavior of Tobacco Industry Political Action Committees

A search in Dimensions shows over 160 articles alluding to the proverb ‘Where there’s smoke’ in the title. Not that uncommon. Maybe even overused? From personal experience, Dr. Doug Luke enjoys giving his papers and talks flavourful titles to make statistics sound as interesting as they really are. The generated titles compare favourably to the segment of the original after the academic colon.
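
For anyone who wants to rerun that count, Dimensions exposes an Analytics API with a query DSL. The sketch below is a best-effort reconstruction from Dimensions’ public documentation – treat the endpoint paths, the DSL phrase syntax, and the `_stats.total_count` response field as assumptions to verify against the current docs, and note that it requires an API key.

```python
import requests

API_KEY = "YOUR_DIMENSIONS_KEY"  # assumes you have Dimensions Analytics API access

# Exchange the key for a short-lived token (endpoint as I recall it from the docs).
auth = requests.post("https://app.dimensions.ai/api/auth.json", json={"key": API_KEY})
auth.raise_for_status()
token = auth.json()["token"]

# DSL query with an exact-phrase, title-only search; syntax is an assumption.
query = 'search publications in title_only for "\\"where there\'s smoke\\"" return publications[title] limit 1'
resp = requests.post(
    "https://app.dimensions.ai/api/dsl.json",
    data=query.encode("utf-8"),
    headers={"Authorization": f"JWT {token}"},
)
resp.raise_for_status()
print(resp.json()["_stats"]["total_count"], "titles allude to the proverb")
```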

[Image: a Dimensions search result showing the paper “Where there’s smoke there’s money: Tobacco industry campaign contributions and U.S. Congressional voting” on an iPad]

Article 3: A provocative title (from a retracted article)

Human-Created Title

The Safety of COVID-19 Vaccinations—We Should Rethink the Policy

Writefull’s Computer-Generated Title Possibilities
  • Vaccine Safety and Risk Assessment for mRNA Vaccine COVID-19
  • Vaccination of COVID-19: A Review of the Safety of Vaccines
  • Safety Evaluation of COVID-19 Vaccines: The mRNA Vaccination versus the Number Needed for Vaccination

The problem with this title is that the authors put a recommendation into the title itself, which plays on the boundaries of scientific cultural norms. In fact, the term ‘rethink the policy’ appears in only a handful of article titles. More troublesome is that the recommendation in the title does not logically follow from the paper, as also reflected by the auto-generated titles given by Writefull. Before even considering the paper’s fraught methods, we know that its title and substance don’t agree with each other.

Provocative paper titles remind us, first, that scientists are able to laugh at themselves a little and, second, that the title itself could have a bearing on the readership and thus the exposure of the science within. Could there be a relationship between paper titles and trust? We’d love to hear your thoughts. Tweet us @ripetaReview.

[Image: the paper “The Safety of COVID-19 Vaccinations—We Should Rethink the Policy” displayed on an iPad]

Want to try your hand at the title generation app? Go to the Writefull Title Generator and let us know what you found @Writefullapp and @ripetaReview.

At Ripeta we will keep exploring and automating checks to make better science easier. To learn more, head to the Ripeta website or contact us at info@ripeta.com.


Dr. Leslie McIntosh
CEO and Founder, Ripeta

Leslie is the founder and CEO of Ripeta and a researcher passionate about mentoring the next generation of data scientists. She is active in the Research Data Alliance, grew the St. Louis Machine Learning and Data Science Meetup to over 1,500 participants, and was a fellow with a San Francisco-based VC firm. She recently concluded her tenure as Director of the Center for Biomedical Informatics (CBMI) at Washington University in St. Louis, where she led a dynamic team of 25 people providing biomedical informatics services. Dr. McIntosh focuses on assessing and improving the full research cycle and making the research process reproducible.

Curious Case within Preprints: Is the Author Real?
https://www.digital-science.com/resource/curious-case-within-preprints/ | 10 May 2021


An assumption at the heart of the scientific publication process is that the author of a manuscript is a scientist. But how can we tell when that is not the case? This page supplements an article in The Scholarly Kitchen highlighting one case of a person posing as a scientist and placing papers on multiple preprint platforms.


Imposters and Impersonators in Preprints
https://www.digital-science.com/resource/imposters-and-impersonators-in-preprints/ | 17 March 2021


Imposters and Impersonators in Preprints: How do we trust authors in Open Science?


In this Scholarly Kitchen post, Leslie D. McIntosh, founder and CEO of Ripeta, tackles indicators of trust and the curious cases of imposters and impersonators in COVID preprints. Leslie walks us through an example to illustrate how open science practices have been manipulated through fake authorship.

Known challenges exist, including fake peer reviewers, paper mills, and falsified institutional affiliations. These challenges offer the community an opportunity to discuss the checks and balances we can put in place. Now is the time to build a coalition – to foster credibility and integrity in the open science ecosystem.

Reproducibility By Design
https://www.digital-science.com/challenge/reproducibility-by-design/ | 21 December 2020


Reproducibility by Design

Reproducibility should be a natural and integral part of the research process – embedded and invisible wherever possible. However, in recent years the research world has encountered a reproducibility crisis, in which reported methods, analytical software, and data have not been shared fully, openly, or accurately.

We are committed to supporting researchers on their path to more open and reproducible research through the development and implementation of technological solutions.

A Transparency Snapshot

Author awareness and compliance needed

To understand the adoption and impact of transparency guidelines, the Ripeta team analysed manuscripts from the 25 highest-impact journals published in 2019. Their results indicate that even in the most compliant journals, fewer than half of authors included data availability statements. These results signal the need for better author awareness and improved methods for checking compliance.
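
As a toy illustration of the kind of check involved – far cruder than Ripeta’s actual detection, and not its method – a handful of regular expressions can flag whether a manuscript appears to contain a data availability statement:

```python
import re

# Toy patterns; a production system needs far more care with context and negation.
DAS_PATTERNS = [
    r"data availability statement",
    r"data (?:are|is) available",
    r"available (?:at|from|upon request)",
    r"deposited (?:in|at)",
]
das_re = re.compile("|".join(DAS_PATTERNS), re.IGNORECASE)

def has_data_availability_statement(manuscript_text: str) -> bool:
    """Flag whether a manuscript appears to contain a data availability statement."""
    return bool(das_re.search(manuscript_text))

print(has_data_availability_statement("All data are available at doi.org/10.5281/..."))  # True
```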


Helping to solve the Reproducibility Crisis

Resources
  • The Anatomy of a Data Availability Statement (DAS)
  • Reproducibility, Replicability and Trust in Science
  • Trusting Science in the Time of Coronavirus
  • Reproducibility or Producibility? Metrics and their Masters

Reproducibility, Falsifiability and the Scientific Method
https://www.digital-science.com/resource/reproducibility-falsifiability-and-the-scientific-method/ | 11 September 2019


Making Science Better: Reproducibility, Falsifiability and the Scientific Method

This report focuses on the increasing importance of failure in supporting modern research.

‘Making Science Better: Reproducibility, Falsifiability and the Scientific Method’ looks at the current state of reproducibility in 2019, as well as the importance of falsifiability in the research process. The report addresses three areas: appropriate documentation and sharing of research data, clear analysis and processes, and the sharing of code.

The analysis comes from the Digital Science portfolio company, Ripeta, which aims to make better science easier by identifying and highlighting the important parts of research that should be transparently presented in a manuscript and other materials.

New Report on Falsifiability and Reproducibility in Scientific Research
https://www.digital-science.com/blog/2019/09/new-report-on-falsifiability-and-reproducibility-in-scientific-research/ | 11 September 2019


Our new report addresses three areas: appropriate documentation and sharing of research data, clear analysis and processes, and the sharing of code.

Making Science Better: Reproducibility, Falsifiability and the Scientific Method looks at the current state of reproducibility in 2019, as well as the importance of falsifiability in the research process. The analysis comes from the Digital Science portfolio company, ripeta, which aims to make better science easier by identifying and highlighting the important parts of research that should be transparently presented in a manuscript and other materials.

Key report findings include:

  • All research stakeholders have a responsibility to make their work both reproducible and falsifiable: reproducible, so that anyone can follow the stated method and reach the same conclusions; and falsifiable, so that the method used can appropriately test the hypothesis.

  • While not all research materials need to be accessible due to confidentiality and/or anonymity, achieving adequate transparency is essential to reproducibility.

  • The research paper should be a route to test and recreate the research that has been carried out.  This is the basis of the scientific method.

  • Falsifiability is an integral part of the research process. It adds credibility to research and allows further work to build on solid foundations.

  • Establishing a well-structured framework for assessing reproducibility, alongside appropriate reporting, reduces the barriers to reusing scientific work, supporting scientific outcomes, and assessing scientific quality.

  • Good data documentation – covering the research design, data collection, data cleaning, and analyses – leads to ‘good’ science: well-documented research enables further advancement through transparency.

  • Clear data analysis reporting is not merely related to the practice of good science; it is critical to it.

  • By supplying code, documenting which version of software was used, and storing code for future reference, science can be made more accurate, more reproducible, and more useful to scientists within and across domains and geographies (see the sketch after this list).

  • The scientific community needs faster and more scalable means to assess and improve reproducibility. An important part of that is fundamentally changing how we think about reproducibility. The difficulty is that while we all have a sense of what reproducibility is in our own fields, reproducibility as a concept does not easily translate between fields.

  • We need to build structure into our research processes that automate the checking of the process itself and alert us to problems when they arise. This new machinery of checks and counterbalances needs to take both falsifiability and reproducibility into account.
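
On the code and software-version point above, here is a minimal sketch of what ‘documenting which version of software was used’ can look like in practice; the package list is illustrative, and this is not a tool from the report:

```python
import sys
import platform
from importlib import metadata

def record_environment(packages=("numpy", "pandas")) -> dict:
    """Capture interpreter and package versions so an analysis can be rerun later."""
    env = {"python": sys.version.split()[0], "platform": platform.platform()}
    for pkg in packages:
        try:
            env[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            env[pkg] = "not installed"
    return env

# Store this dict alongside the analysis code and data.
print(record_environment())
```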

Leslie McIntosh, CEO of ripeta, said:

“The pursuit of knowledge is important, and should be undertaken thoroughly, accurately, and transparently. Accessible, reproducible research is an important and often challenging aspect of that pursuit. Technology has made conducting science faster and more sophisticated. We need ways to quickly and accurately capture and report all the methods without asking more from the scientists. ripeta addresses one part of the problem.”

Ripeta has been nominated as a finalist for the ALPSP Awards for Innovation in Publishing, with the winner announced tomorrow evening at the annual conference dinner.

Reproducibility: a Cinderella Problem
https://www.digital-science.com/blog/2016/11/reproducibility-cinderella-problem/ | 29 November 2016

[Image: examining a petri dish. Credit: CSIRO]

The reproducibility of research has been an increasingly important topic in the scholarly communication world for several years[1]. Despite the academic world’s commitment to peer-review as part of the communication ecosystem, reproducibility – which might be seen as a form of in-depth peer-review – has never been treated as seriously.

The reproducibility process – by which a piece or claim of research is validated by being recreated by independent researchers following the methodology described in a paper – can be tedious. But there can be few of us who haven’t been frustrated by some missing detail or partially described process[2]. For me, it’s often when I’m presented with curated data that doesn’t seem to quite match what I’d have expected to see from the raw data.

Problems relating to reproducibility are not going to be a universal experience: the different characteristics of different fields are as present here as in other aspects of research. A proof-based discipline, such as mathematics, requires a different approach from a probabilistic science, or from social sciences involving, perhaps, observation and conversation.

Much of the research world has an inherent bias against integrating this more rigorous method of testing results. Journals are optimized to publish new or unique works; research output – as measured by published papers – is the key data by which researchers are measured; and funding is – broadly speaking – directed toward new research. In short: reproducibility is a Cinderella problem, in need of some attention and investment if it’s to flourish.

Earlier this month, I had the pleasure of attending an NSF / IEEE organized workshop on reproducibility, “The Future of Research Curation and Research Reproducibility”. There were many intelligent and thought-provoking contributions; those that stick in my mind included presentations by Amy Friedlander (Deputy Division Director, National Science Foundation), Bernie Rous (ACM), Victoria Stodden (UIUC), my colleague Dan Valen of Figshare, Michael Forster (IEEE), Todd Toler (Wiley) and Jelena Kovacevic (Carnegie Mellon). I’m not going to attempt to summarize the event – we’ll post a link once the proceedings are published – but I have had a number of reflections on reproducibility as a network, or system, problem that I wanted to share. You won’t be surprised that I also have some thoughts about how we can capture this data and develop metrics in the space.

Reproducibility is complex – it means many things to many people

We lack a coherent concept of reproducibility: it’s as complex as anything else you might expect to find in the research world. I’m going to use a strawman example in this blog post: the simple availability of data. However, this is just an example – even data issues are multifaceted. Are we discussing raw data, or curated? Or the curation process? What are the ethical, privacy and licensing concerns about the various forms of data? How is the data stored, and protected? If a finding fails to be reproduced because of a referenced value, how does this affect the status of this particular paper?

A reproducibility ecosystem

1. A reproducibility statement

It should be possible for researchers to formally express the steps that they have undertaken to make an experiment or a paper reproducible. For example: “The complete data set is available at (address)”; “The curated data is available at (address), and the raw data is available on request”; “The data used in this experiment contains private information and is not available.” Note that there’s no sense of obligation in this process: it’s simply a device to support structured communication of the authors’ intentions. The statement could be embedded in the methodology section of a paper, as sketched below.
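
As a sketch of how such a statement could be made machine-readable, the fields and vocabulary below are invented for illustration rather than a proposed standard:

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json

@dataclass
class ReproducibilityStatement:
    data_availability: str        # e.g. "full", "curated_only", "on_request", "restricted"
    data_location: Optional[str]  # DOI or URL if (partly) public, else None
    code_availability: str
    code_location: Optional[str]

stmt = ReproducibilityStatement(
    data_availability="curated_only",
    data_location="https://doi.org/10.5281/zenodo.0000000",  # placeholder DOI
    code_availability="full",
    code_location="https://github.com/example/analysis",     # placeholder repository
)
# A JSON rendering that could sit alongside the methodology section:
print(json.dumps(asdict(stmt), indent=2))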

2. Identifying reproducibility

One relatively simple form of reproducibility would be to test the above statement. Although this shouldn’t be the limit of reproducibility, even the simple act of making a statement about what has been done to support reproducibility enables a straightforward task – that of confirming the author’s statement. The benefit of this explicit stage is that it could be embedded in the existing peer-review and publishing process.
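
Continuing the sketch, confirming the author’s statement could be as mechanical as checking that each declared location resolves. Reachability is a necessary check, not proof that the deposited content is correct:

```python
import requests

def check_locations(locations: dict) -> dict:
    """Confirm that each location declared in a reproducibility statement resolves."""
    results = {}
    for label, url in locations.items():
        if url is None:
            results[label] = "declared non-public"
            continue
        try:
            resp = requests.head(url, allow_redirects=True, timeout=10)
            results[label] = "reachable" if resp.ok else f"HTTP {resp.status_code}"
        except requests.RequestException as exc:
            results[label] = f"unreachable ({type(exc).__name__})"
    return results

# e.g. the locations declared in the statement sketched earlier:
print(check_locations({"data": "https://doi.org/10.5281/zenodo.0000000", "code": None}))
```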

Bernie Rous of the ACM presented a process like this at the NSF / IEEE workshop[3]. In this case, the publisher supports a structured approach to confirming reproducibility and displays the results using badges. These badges relate to elements of the TOP guidelines, which could serve as a general-purpose taxonomy for reproducibility statements, with the actual elements selected by journal editors for relevance and appropriateness.

3. Embedding reproducibility

Research output does not live in a single place: it’s common to have several versions of the full text available in different venues. Titles, abstracts – and increasingly references – are being fed to many systems. The infrastructure to support embedded metadata is mature: DOIs are ubiquitous; ORCID iDs are increasingly appearing against research output; Crossref describes millions of documents in machine-readable data, including open-access and other licenses, funding information and text-mining licenses; and DataCite maintains open lists of linked data and articles. Whilst introducing a standard for describing reproducibility – potentially based on the TOP guidelines[4] and FORCE11’s work on related principles[5] – wouldn’t be trivial, the process of developing standards and sharing data is something the community understands and supports.
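
As a small demonstration of how mature this machine-readable infrastructure already is, Crossref’s public REST API (api.crossref.org, a real open endpoint) returns title, license and funder metadata for a registered DOI; the DOI used here is reference [2] below:

```python
import requests

def fetch_crossref_metadata(doi: str) -> dict:
    """Fetch the public Crossref record for a DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    resp.raise_for_status()
    return resp.json()["message"]

record = fetch_crossref_metadata("10.7717/peerj.148")  # reference [2] below
print(record["title"][0])
print(record.get("license", "no license metadata registered"))
```

A reproducibility field embedded in records like these would travel with the article wherever its metadata is consumed.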

4. Securing reproducibility

Merely making the data available at the time of publishing is not the end of the data storage problem: the question of where and how the data is stored has to be addressed. FORCE11’s Data Citation Principles[6] describe some of the steps needed to promote data as a first-class research object, including metadata, identifiers and other elements. FORCE11 is currently engaged in implementation projects that are supported by some of the world’s biggest organizations in the research environment.

Probably the most important issue is understanding how long data will be secured for, and what arrangements are made to guarantee this security. Repositories can be certified by a stepped process[7].

5. Funding reproducibility

Even if we adapt our current processes to thoroughly support reproducibility, we haven’t addressed the issue of who is to fund it. It has to be observed that many of the agencies involved in pushing for reproducibility are funding agencies, and to this end, I would call upon them, firstly, to invest in the structural changes needed, and secondly, to develop a nuanced view of the problem.

I identified earlier that different fields have different needs, and it is obviously true to say that different topics have different senses of urgency. That sense of urgency – the need to verify research findings – could very well be a driver for reproducibility. This could be determined at the time of publishing, or alternatively decided at periods afterwards. Making predictions about citation rates for individual papers is notoriously difficult: if, over the course of a year or two, it appears that a paper is being used as a foundation stone for future research, then that might highlight the need for verification. This would be all the more true if the findings were unique to that piece of research.

In both cases, it would be possible to pre-define rules that – once reached – could unlock related funding. A funding agency could include a proportion of money that would be held back to fund reproducibility should these thresholds of importance or use be reached. By limiting the degree to which findings need to be reproduced, and by focussing the need, it should be possible to increase the efficiency of research – by increasing the certainty of reproduced claims, and by reducing incorrect dependencies on research that couldn’t be reproduced.
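
Such a threshold rule is easy to encode. The sketch below is purely illustrative: the citation thresholds and the halved bar for unique findings are invented, not proposed in this post:

```python
def should_fund_reproduction(citations_per_year: list,
                             threshold: int = 25,
                             unique_finding: bool = False) -> bool:
    """Illustrative funder rule: release held-back reproducibility funds once a
    paper shows signs of becoming a foundation stone for later work.
    All threshold values are invented for illustration."""
    total = sum(citations_per_year)
    # Unique findings warrant verification sooner, so halve the bar for them.
    return total >= (threshold // 2 if unique_finding else threshold)

# e.g. reviewing a paper two years after publication:
print(should_fund_reproduction([12, 18]))                     # True: heavily cited
print(should_fund_reproduction([3, 5], unique_finding=True))  # False: not yet
```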

6. Publishing reproducibility

At present, publishers do not often support the publishing of papers that reproduce other work. If reproducibility is to be taken seriously, these outputs must become part of the scholarly record, with researchers able to claim the work as part of their output.

I have to be mindful that publishers will not want to damage their journal metrics, however! It is unlikely that papers describing a reproduced experiment would be cited often, and widespread publication of such papers would both tie up journal staff and ‘damage’ their metrics. I have two relevant ideas to share about how reproducibility output could be incorporated into the publishing context.

Firstly, journals could publish such material as an appendix, adjunct to the journal itself. This would be particularly important if the new output acted as a correction, or meaningful addition to the original paper.

Secondly, reproduced work that doesn’t meaningfully add to the original could be presented as an annotation to the original paper, in the same manner in which a service such as Publons allows for open annotation, review, and linking to papers.

Both routes could use the same metadata standards described earlier in this document; importantly, the role of authorship should be incorporated. A reproducibility statement that is made by an author and verified through peer review needs to be distinguished from a third-party annotation on an open platform. Nevertheless, this distinction can be incorporated in the metadata.

Needless to say, there is a cost distinction between the two paths. Journals, and their editing and content processes, have a direct cost associated with them. Services such as Publons are frequently free at the point of use.

By incorporating the correct metadata and authorship relations, authorship of reproduced research can be credited to the researchers, providing all-important currency to those researchers and their institutions. This recognition both rewards the work and validates reproducibility as a primary research task: it may encourage early-stage researchers to go the extra distance and be rewarded for their work in their field.

7. Measuring reproducibility

A standardized way of collecting the elements of reproducibility and communicating those facts means that we can count and measure reproducibility. Echoing my earlier observation that not all reproducibility is relevant for all research, this would allow funders, institutions and journals to measure the degree to which reproducibility is being adopted. Reproducibility is not a simple binary: the greater the degree to which reproduction has been undertaken (successfully), the higher the likelihood that the findings can be treated as verified.
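
In that graded spirit, a simple (and entirely illustrative) way to measure the degree of verification is a weighted checklist; the elements and weights below are invented, not a published standard:

```python
def reproducibility_score(checks: dict) -> float:
    """Degree (0..1) to which a paper's reproducibility elements were verified.
    The elements and weights are invented for illustration."""
    weights = {
        "statement_present": 0.2,    # a reproducibility statement exists
        "data_resolves": 0.3,        # declared data location is reachable
        "code_resolves": 0.3,        # declared code location is reachable
        "independently_rerun": 0.2,  # an independent team reproduced the result
    }
    return sum(w for key, w in weights.items() if checks.get(key))

print(reproducibility_score({"statement_present": True, "data_resolves": True}))  # 0.5
```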

Conclusion

This suggested ecosystem describes a way to focus effort where it is needed, to discover reproducibility, and to reward that work: it suggests ways in which the various members of the scholarly environment can adapt their citizenship roles to support the future success of reproducibility.

Reproducibility can look like one large problem, but the reality is that it is a number of issues distributed throughout the environment. We need to recognize and reward the work that has already been done – by funders, service providers, agencies such as the RDA, and publishers – and to plan for a joined-up future that fully enables reproducibility throughout the scholarly ecosystem.

Thanks to Dan Valen and Simon Porter for suggestions and corrections.

[1] Vasilevsky NA, Minnier J, Haendel MA, Champieux RE. (2016) Reproducible and reusable research: Are journal data sharing policies meeting the mark? PeerJ Preprints 4:e2588v1 https://doi.org/10.7287/peerj.preprints.2588v1

[2] Vasilevsky NA, Brush MH, Paddock H, Ponting L, Tripathy SJ, LaRocca GM, Haendel MA. (2013) On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ 1:e148 https://doi.org/10.7717/peerj.148

[3] ACM (2016). Result and Artifact Review and Badging.

[4] The Transparency and Openness Promotion Guidelines https://cos.io/top/

[5] Guiding Principles for Findable, Accessible, Interoperable and Re-usable Data Publishing https://www.force11.org/fairprinciples

[6] FORCE11 has a number of active data citation projects, based around the original declaration, including implementation pilots for repositories and publishers https://www.force11.org/datacitation

[7] ICPSR’s Trusted Data Respositories certification http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/preservation/trust.html
