Platforms for scholarly communication

What is a platform? A platform is more than “software”: it is an ecosystem that includes software, data, services and people. It is in essence sociotechnical, and its function is to enable research and scholarly communication. The Web, in its implementation and its philosophy, is the basis of each of the emergent, transformative scholarly communication platforms. A useful notion in this regard is the “social object”, be it data, workflow or paper: the object of scientific discourse, and thus also the thing around which scientific social networks form.


Successes.

A review of several successful platforms for scholarly communication (where “success” is measured by the research that is enabled, rather than by software downloads or revenue) reveals several common features: they solve a definite problem; they are easier to use than other solutions (they have a low barrier to entry); they enable connections among a community that mutually benefits from participation; they evolve continuously in response to community and technology advances; they offer both user interfaces (UIs) and application programming interfaces (APIs); and there is often a reputational benefit to participating in the ecosystem revolving around the platform. The social elements are a key component of successful platforms; they are designed around users and what they share, not just around interoperable infrastructures. In this discussion we consider only Web-based platforms.


At the most basic level of use, successful platforms in scientific settings are generic rather than research-specific. These include document management systems such as Google Docs and Office Web Apps, and project and file management software such as Basecamp and Dropbox. General blogging and microblogging platforms such as WordPress and Twitter have been “reappropriated” by research communities, becoming platforms for scientific and scholarly communication. The same applies to web content management systems such as Drupal. Successful platforms often require little in the way of formal training, and researchers can rapidly derive value from them. Such platforms also generally fit easily into existing workflows and accommodate many types of hardware devices.


Among platforms that were built purposely for research, success stories include arXiv, an open-access preprint repository (which is crucial in shaping and fostering research communities), and GitHub, an open-source repository of code and software. Openness often lends itself to success, since it supports collaboration, access, re-use and agility: conditions that sustain both the social and the technical aspects of platforms in support of research.


Although many successful platforms will continue to be built around research needs, new ones are emerging that blur the line between the workplace and the social environment. There is a greater expectation among researchers that the platforms they use in their personal lives will also be available in their professional lives.


Failures.

Given that the Web is the fundamental platform, it is perhaps not surprising that the failure modes are those that run contrary to the way the Web works. Platforms fail when they are created not through sociotechnical co-evolution but on a “build it and they will come” basis, and when they are too brittle to evolve in response to rapid changes in technology or user expectations.


For example, the Twine academic platform recently failed, possibly because of its pre-structured configuration, under which use had to conform to a predetermined technology (in this case, semantic web data formats). The social networking platform Academia.edu has also failed to obtain wide use and uptake, as it addresses a use case that is already met by other solutions such as LinkedIn.


Another example of an unsuccessful research project, at a much larger scale, is the Virtual Observatory Initiative in astronomy. The VO initiative suffered from too much focus on the design and development of large-scale infrastructural solutions. As a result, it lacked an agile, dynamic component through which users could interact directly with tools and create communities of adopters. The Virtual Observatory has many rich features, such as catalogues, database queries, and data conversion and integration; but in the context of this analysis it falls into the category of ‘brittle’, having been over-engineered and therefore not lending itself to use and responsiveness.


Key unsolved problems.

The major issue with many existing smart laboratory software and scientific management tools is poor adoption. A number of different systems have been developed: some aimed at scholarly and scientific communities at large, others aimed at specific disciplinary and laboratory contexts. Yet adoption and penetration rates are low overall. By and large, most scientific research is performed via a heterogeneous assemblage of generic and domain-based tools, platforms, and practices. Wide adoption of a single platform for the management and publication of scientific workflows remains an ongoing challenge, made harder by the disparate data needs, scholarly practices, and tools across different domains.


Promising platforms.

Because of the sociotechnical nature of platforms and the need for people and systems to co-evolve, building a successful platform is a challenge; more often, platforms are grown rather than created from whole cloth. One pattern for building a platform is to take a system that has worked well in a particular domain or context and extend it to other areas. The extension of the open source citizen science platform GalaxyZoo into Zooniverse is a prime example; Stack Overflow (a question and answer system for computer programming) has also been successfully applied to other domains (e.g. Math, Computer Science, LaTeX). Large-scale collaborative platforms such as Polymath are also promising collaborative tools, as are similar efforts in the humanities (digital humanities projects).


In terms of supporting research, the appendix below highlights a number of key platform tools that researchers can use today.


Appendix: Examples of platforms for scholarly communication


Building tools and services

At the most basic level, with the arrival of cloud computing, the last decade has seen a revolution in the way that researchers can use computers, both for computation/analysis, and to set up web services. It is now possible for any scientist to set up web servers and services very quickly with no up-front cost, and without having to worry about setting up hardware or most software. Examples of cloud providers include:


On top of this existing infrastructure that is ready to go, there are several web frameworks that allow rapid creation of dynamic websites. These include:


An example of a web site for scientific communication built with Drupal is http://painresearchforum.org.  The Harvard Time Series Center at http://timemachine.iic.harvard.edu is an example of such a site built using Django.


Another interesting contender is the Google App Engine (http://code.google.com/appengine/), which provides both the infrastructure and the framework to quickly develop services.


Collaboration amongst researchers

Several tools already exist to facilitate collaboration between researchers and allow collaborative document editing, including:


Services also exist to easily synchronize files over multiple computers and allow sharing between several people, including:


Examples of hybrids of these are Evernote (http://www.evernote.com) and OneNote (http://www.microsoft.com/onenote) which, in addition to being good platforms for scientists to keep organized notes and files relating to their research, allow sharing and collaborative editing of specific notebooks.


Beyond simple sharing of files and documents, teams can make use of wikis to coordinate projects. Examples of wiki frameworks or services include:


Software development often occupies an important place in research, and several websites now facilitate publication of version-controlled code. For example, SourceForge (http://www.sourceforge.net) is a website for publishing open-source software projects. More recently, GitHub (http://www.github.com), which prides itself on being a ‘social’ coding website, has become one of the best solutions for hosting open- and closed-source code, allowing not only publication of the code but also active collaboration on the development itself and communication between developers.


Direct verbal communication between scientists is of course essential to efficient research, and modern communication platforms such as Skype (http://www.skype.com) or ‘hangouts’ on Google Plus (http://plus.google.com) now allow multi-user video conferences without traditional expensive equipment.


Finally, larger-scale communication between scientists at conferences/meetings and more generally worldwide can greatly benefit from Twitter (http://www.twitter.com), which, through the use of hashtags, allows focused communication between groups of individuals interested in a particular topic or meeting.


Reference Management

The process of publishing papers and exploring the existing literature has also seen a transformation, with the advent of advanced bibliography management tools which not only allow researchers to keep track of interesting publications, but also add the social/sharing aspect, and in many cases offer automated recommendations for other researchers and publications that would be of interest. These platforms are invaluable in navigating the vast literature in many fields. Examples include:



Repositories

Over the past decade, a number of repository systems have come into use to support the publishing of research outputs; in the main, these have focused on papers, and in particular on open-access pre-prints. These repositories support storage of and access to research outputs, and might be used by universities, research groups or disciplinary communities. Examples include:


Communicating with the public

Once the research has been carried out and published, one final important step is communication with the wider scientific community and the public. Many platforms enable researchers to reach a wide audience, including:

  • Twitter (http://www.twitter.com) - which, in addition to being a collaboration tool as mentioned above, is also an excellent platform for reaching a wide audience

  • Facebook (http://www.facebook.com) - which similarly allows researchers to reach wide audiences and engage with the public. For example, the Hubble Space Telescope has a Facebook page that has over 70,000 ‘fans’.

  • WordPress (http://www.wordpress.org) and Tumblr (http://www.tumblr.com) - two popular blogging platforms.



Authors/Contributors

Mark Abbott, Oregon State University - mark@coas.oregonstate.edu

Taliesin Beynon, Wolfram Alpha - taliesinb@wolfram.com

Rachel Bruce, Joint Information Systems Committee (JISC) - r.bruce@jisc.ac.uk

Derick Campbell, Microsoft Research - derickc@microsoft.com

Tim Clark, Harvard Medical School / Mass General Hospital - tim_clark@harvard.edu

Tom Cramer, Stanford University - tcramer@stanford.edu

Dave De Roure, Oxford e-Research Centre - david.deroure@oerc.ox.ac.uk

Cory Knobel, University of Pittsburgh - cknobel@pitt.edu

Alberto Pepe, Harvard - apepe@cfa.harvard.edu

Thomas Robitaille, Harvard-Smithsonian CfA - trobitaille@cfa.harvard.edu


Rough Transcript of my Opening Remarks

Dear Colleagues:

It is an honor to be asked to address this group, some of whom I seem to see more than my own family, and to set the stage for this 2011 Microsoft Research-sponsored eScience workshop on Transforming Scholarly Communication.

The first fundamental question to ask is: do we need a transformation in the first place? Obviously we all believe we do, otherwise we would not be here, but what about the majority of scholars? I use my colleagues up and down the corridor as a benchmark to answer that question: a group currently oblivious to much of what we will show tomorrow, but nevertheless a group increasingly not oblivious to the changes going on around them – data sharing policies, cuts in library budgets, open access, and our students. Let me illustrate this last point with a recent example of something remarkable that happened to me.

A couple of months ago I received by email a paper for PLoS Comp Biol. This happens from time to time as authors try to circumvent the standard submission procedure and contact me directly as Editor in Chief. It was a paper on pandemic modeling, which appeared to question conventional approaches to such modeling. Not being an expert here, I sent the manuscript to Simon Levin at Princeton, who is on our Editorial Board, for his opinion. Simon is a Kyoto Prize winner and an expert in large-scale biological modeling. He indicated there was something special about this well-written paper. Since the sole author was living in San Diego, I agreed to meet with her and discuss the work. Simply by asking, she had received a large amount of free computer time from the San Diego Supercomputer Center (SDSC), got free access to Mathematica and had clearly benefited from the open access literature as well as resources like Wikipedia. I encouraged her to submit the work to Science, which she has done, and it is currently under review.

The sole author’s name is Meredith. What makes this story remarkable is that she is 15 years old and a senior at La Jolla High School in San Diego. She subsequently presented her work at my lab meeting, which I must say was much better attended than usual. Sitting there with my eyes closed, I thought I was in the presence of a professor deep into their area of expertise. It was only when I opened my eyes and saw the braces that reality sank in.

Clearly this is an extreme case, but let’s not be modest: what we are trying to do here is enable anyone with an Internet connection and a will to learn to achieve what Meredith has achieved. I cannot think of a more noble cause. While we have seen this possibility for a long time, what is new is that others are now seeing it too.

We all have our own Meredith stories or at the very least some driver that moves us in the same way. For some it is the glacial pace at which knowledge exchange takes place; for others it is the sense of unease about the lack of reproducibility in our own science; for others it is the inaccessibility of knowledge; and for others still it is the totally qualitative way quantitative scientists measure the value of scholarship.

With Meredith as our motivation, let us take a minute to analyze the path we are on towards transforming scholarship through what has happened this past year and then what might happen as a result of this workshop.

2011 may well be remembered as the year that stakeholders – scientists, publishers, archivists and librarians, developers, funders, and decision makers – went from working in isolation to beginning to work together. What started with Beyond the PDF in January had become a “movement” by the time the summer meetings were over. The Dagstuhl meeting captured the spirit in a manifesto that should become a living document for us all to consider. Movements have transformed entrenched systems before, and it remains an open and very exciting question as to whether that will happen here. For a small group to bring change to the many requires that the many believe that change is needed and gradually get on board. I believe that time has come.

The driver of change is the groundswell towards open science. When I first heard that a group of prominent life scientists had got together and agreed to start a new open access journal, I was disappointed – such vision, coming up with something that we had already. But if the effort by HHMI, Wellcome Trust and Max Planck does indeed compete with Science and Nature, it will precipitate change. My sense is that publishers see the writing on the wall, or more appropriately the screen, and the smart ones are gearing up for a future with different business models. The winning publishers will move from serving science through scientific process and dissemination to doing that plus enabling knowledge discovery, more equitable reward systems and improved comprehension by a broader audience. Interestingly, it is not clear to me, based on my interactions with OASPA, that open access publishers see it that way. Many simply see themselves delivering papers as before, but with a different revenue model. Ironically, even if they see the promise of change, they do not have the resources to make it happen. We must help them, and that is why meetings like this one are so important. A serious example of what we must fix is the lack of consistent representation of their papers in XML. PubMed Central will come back to haunt us when developers begin to seriously try to use the content. This is history repeating itself – look at the biological databases. We should learn from history.

Open science is more than changing how we interface with the final product; it is interfacing with the complete scientific process – motivation, ideas, hypotheses, experiments to test the hypotheses, data generated, analysis of that data, conclusions and awareness. This is a tall order and I believe we need to proceed in steps. Clearly access to and effective use of data is a valuable next step. Funders are demanding it, scientists (to some degree) are providing it and repositories exist to accept it. Right now it is a mess, but we have an opportunity. Ontologies exist, some tools exist, and so we have the opportunity NOT to repeat the horrible loss of productivity we see in the publishing world from rehashing the same material for different publishers. Let us define and implement data standards and input mechanisms that capture the generic metadata, provide the hooks for deeper, more domain-specific content and allow more universal deposition and search. We need to do this now, before systems become entrenched. Otherwise Google, Bing and the like will be our tools for data discovery – we need deep and meaningful search of data.

Let me conclude with a couple of thoughts on what I believe should come from the workshop. 

1. We will hear about some wonderful tools and innovative software developments to support scholarly communication – we must define a way to aggregate these efforts to facilitate uptake by others around a focused and shared development effort.

2. We need to define ways to recruit to the movement – it will take more than tools to do so – are there clear wins for all concerned? If so, what are they? Platforms to disseminate scholarship, new reward systems, knowledge discovery from open access content, proven improved comprehension.

What can we do so that more 15-year-olds are active contributors to scholarship? This is our challenge. Thank you very much.

Bios of the Organizers

Bio sketches of the meeting organizers, who will float from group to group:

  1. Alyssa Goodman, Harvard University
  2. Alberto Pepe, Harvard University
  3. Mary Lee Kennedy, Harvard University
  4. Malgorzata (Gosia) Stergios, Harvard University
  5. Lee Dirks, Microsoft Research
  6. Alex Wade, Microsoft Research
  7. Joshua M. Greenberg, Alfred P. Sloan Foundation
  8. Chris Mentzel, Gordon & Betty Moore Foundation

Bios of attendees in #platforms theme

  1. Mark Abbott, Oregon State University
  2. Taliesin Beynon, Wolfram Alpha
  3. Rachel Bruce, JISC
  4. Derick Campbell, Microsoft Research 
  5. Tim Clark, Harvard University
  6. Tom Cramer, Stanford University Library
  7. Trisha Cruse, CDL / UC3
  8. David De Roure, Oxford OeRC
  9. Cory Knobel, Pitt iSchool
  10. Jill Mesirov, Broad Institute (MIT)
  11. Alberto Pepe, Harvard University (CfA)
  12. Thomas Robitaille, Harvard-Smithsonian Center for Astrophysics

Demonstrations on PLATFORMS on Meeting Day 1

PLATFORMS: Project collaboration software, “smart” laboratory software, provenance systems

Facilitator: Jill Mesirov, Associate Director, Broad Institute of MIT and Harvard

  • David De Roure, Professor of e-Research, Oxford e-Research Centre (Demo of myExperiment)
  • Tim Clark, Director of Bioinformatics, MassGeneral Institute for Neurodegenerative Disease, Harvard Medical School (Demo of SWAN)