Report from #review session

Starting with first principles: Broadening the definition of scholarly communication, review, and peer


The communications of today’s scholars encompass not only book and journal publications, but also less formal textual communications and a variety of other work products, many of them made possible by recent advances in information technology. It is not uncommon today for an academic CV or faculty activity report (the productivity report that faculty members provide each year) to list news articles, blog postings, tweets, video presentations, artworks, patents, datasets, recordings of government testimony, computer code, in-person (conference and meeting) communications, and other artifacts. Yet our entrenched systems of review and recognition are out of sync with this evolving scholarly record. New ways to track our broader scholarly record, and new methods and metrics for identifying relevant work and evaluating quality and impact of works, are needed to help legitimize a broader definition of academic contribution.

What is review? Who is a peer?

The term “review” refers to several different events or functions in the scholarly communications workflow. It can refer to evaluation of the aggregate contributions of a researcher as well as evaluation of specific publications describing specific research. It encompasses both qualitative and quantitative evaluation, at both pre-publication and post-publication stages.


Our vision of the future of peer review in scholarly communication sees a diminishing role for pre-publication evaluation, as the high costs associated with it have begun to outweigh the researcher incentives to contribute on a quid pro quo basis. New systems for post-publication evaluation are much better poised to leverage current and emerging scholarly workflows, and to capture multi-dimensional measures that combine qualitative and quantitative information in a way that serves the researcher-as-reader (by helping the reader to most effectively allocate limited attention to an ever more overwhelming scholarly record) and the researcher-as-author/contributor. Additionally, these new forms of review can provide valuable input into appointment (promotion and tenure) assessment processes — though we strongly oppose an over-reliance on simple quantitative measures related to publications as a substitute for thorough peer review of a scholar’s contributions within the assessment process.

Our vision of the future of peer review also sees a much broader definition of “peers” — those with a voice to participate in the review process — to include any consumer of the work, evaluating the work in any traceable context. This will add new complexity to the system, but also perhaps new transparency, and may have the effect of turning some review processes into more conversational activities.

What should be carried forward, and what should be left behind?

The current scholarly evaluation system works in many ways. It is increasingly clear that it could work better, but before suggesting alternatives, we need to examine what is good about the present evaluation system. Our current methods of review persist because, at their best, they have been relatively successful at filtering out erroneous, duplicative, and less significant work, and at helping us identify significant insights, advances, and discoveries. They leverage the trust we place in our peers. The hierarchy of peer-reviewed journals, supported by quantitative metrics like the Impact Factor, offers easy-to-use reputational heuristics in the academic review process.


These systems represented the state of the art of an earlier technological era (the era of paper journals and books), and have served the community well. However, new technologies present opportunities to do better. Where legacy systems increasingly fall short is that they don’t function well across disciplines; they are subject to misuse, gaming, and misinterpretation; they can substantially slow the dissemination of research findings; and they place significant burdens on researchers (this is particularly true with regard to pre-publication peer review). As a result, the process can be said to fail, in some measure, for the majority of published content.


Because the review process is deeply embedded in the cultural norms of disciplines and sub-fields, innovative tools more commonly fail because of incompatibility with existing norms and behavior than because of technical shortcomings. Recent experiments that have failed have done so because they misjudged disciplinary norms and boundaries (for instance, while a tool like arXiv has been very successful in certain fields of physics, the attempt to port it into certain fields of chemistry failed due to differing norms of sharing in the two fields), or because they attempted to source crowds where there are no interested crowds (e.g., Facebook-like social networks for scientists). Even the best tools will not succeed unless they harness the essential elements of existing review systems — particularly filtering and trust — and also support existing communities’ cultural parameters. Change of this nature is likely to come slowly, particularly if it is to have lasting impact.


Most promising technologies and practices for the future


Considering the diversity of research and its practitioners, and the quickly changing scholarly environment, we should not expect a one-size-fits-all solution to the problems of review, nor a single Holy Grail metric to emerge in the foreseeable future. Rather, the future will rely on a range of tools to filter work and assess its quality within disciplinary communities and amongst individual scholars.


Different disciplines will obviously rely on different tools, but scholars must take responsibility for the accuracy and completeness of their own part of the scholarly record. While Open Access is one part of the solution, scholars need to educate themselves about the specific tools used in their disciplines, and be aware of the tools used in others, to maximize the possibilities.


Scholars can use these tools to monitor the traces their work leaves and to ensure their scholarship is accurately presented in the venues their disciplines value most. It is the responsibility of (and indeed, incumbent upon) the individual scholar to take control of his or her bibliographic legacy and reputation. Beyond this, scholars now have vastly extended opportunities to understand how their work is being read and built upon by other scholars.


While the tools for consumers and producers of scholarship listed below provide an important overview of the current possibilities, they should not be expected to be appropriate or relevant for all disciplines.

Tools for consumers of scholarship: How to find the good stuff

     Multidimensional aggregators (e.g., ALM, citedin, ScienceCard)

     Formalized, directed commenting systems and systems that allow for rating the raters (F1000, CommentPress, digressit)

     Aggregated researcher data: Mendeley, Zotero

     Social bookmarking tools: Delicious, CiteULike, Connotea

     RSS feeds for scientists, shared through bookmarking

     Social Networks: Twitter, Google+, Facebook

     Mentions and tweets (e.g., for digital humanities, where community re-tweets of relevant links are valued for review purposes)

     Discipline-specific Twitter lists, Facebook groups, Google+ circles (making it easier to follow colleagues and colleagues’ work)

     Computing over literature: textual analysis (SEASR)

Tools for producers of scholarship: How to make sure your stuff is given its just rewards

     Keep a social record of your scholarly activities (Papers, Mendeley, ReadCube, Zotero)

     Proactively create a broader record of scholarly communication, and co-create an institutional record (e.g., VIVO, Catalyst Profiles)

     Disambiguate your own scholarly record (ORCID, Microsoft Academic Search, Google Scholar, Web of Science)

     Track when you are being cited (regular vanity searches in Google Scholar, Microsoft Academic Search, Scopus, Web of Science)

     Monitor social network activity surrounding your research (Twitter, Facebook, Google+ searches, Klout)

     Manage, track, and present your own measures of impact (e.g., TotalImpact)


Currently Unsolved Problems, Solutions, and Strategic Opportunities


There are several currently unsolved problems; they fall into three distinct categories:


Technical / Business Limitations

Problem: There is a lack of industry standards, meaning that the same metric cannot easily be compared between different platforms. Solution: Standards bodies (such as NISO) should define cross-platform industry standards.
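A sketch of what such a cross-platform standard would pin down, in Python; the platform and field names here are invented for illustration:

```python
# Sketch: normalizing usage metrics from two hypothetical platforms into a
# common schema, the kind of mapping a NISO-style standard would define.
# All platform and field names are invented for illustration.

COMMON_FIELDS = {"views", "downloads", "shares"}

# Each platform exposes the same concepts under different names.
FIELD_MAPS = {
    "platform_a": {"page_hits": "views", "pdf_gets": "downloads", "tweets": "shares"},
    "platform_b": {"abstract_views": "views", "full_text": "downloads", "bookmarks": "shares"},
}

def normalize(platform, raw):
    """Translate a platform-specific metrics dict into the common schema."""
    field_map = FIELD_MAPS[platform]
    return {field_map[k]: v for k, v in raw.items() if k in field_map}

a = normalize("platform_a", {"page_hits": 120, "pdf_gets": 30, "tweets": 7})
b = normalize("platform_b", {"abstract_views": 95, "full_text": 40, "bookmarks": 12})
```

Once both platforms map onto the common fields, their numbers can be compared field by field.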


Problem: Relatedly, many corpora are not open in a machine-readable fashion, making it problematic to apply a uniform metric across them. Solution: Where possible, authors should mirror their content in Open Access repositories.


Problem: Certain metrics benefit from content being available in a specific (non-universal) format (e.g., Open Access content will gain more usage; multimedia content will receive more interaction). Hence certain content will be naturally disadvantaged in any alt-metrics evaluation. Solution: Standard metadata sets should be attached to all outputs.
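As an illustration, a minimal machine-readable metadata record might look like the following sketch; the field set loosely follows Dublin Core, and all values (including the DOI) are invented:

```python
import json

# Sketch: a minimal, format-neutral metadata record attached to a research
# output, so that metrics tools need not parse the content itself.
# The field set loosely follows Dublin Core; the values are invented.
record = {
    "title": "Example dataset",
    "creator": ["A. Scholar"],
    "date": "2012-01-15",
    "type": "dataset",
    "format": "text/csv",
    "identifier": "doi:10.1234/example",  # hypothetical DOI
    "rights": "CC-BY",
}

# Serialized form that could travel alongside the output itself.
metadata_json = json.dumps(record, indent=2, sort_keys=True)
```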


Problem: If we rely on third parties for data, then we must accept that those sources may change over time (or disappear). This means that alt-metric evaluations may never be ‘fixed’ or ‘repeatable’. Solution: Everything decays, but permalinks and archival storage of data can limit the damage.
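The archive-and-permalink idea can be sketched as timestamped snapshots of whatever a third-party source reported; the source name and values are invented:

```python
import json
from datetime import datetime, timezone

# Sketch: timestamped snapshots of metrics pulled from a third-party source.
# If the source later changes or disappears, an evaluation can still cite
# the archived snapshot. Source name and values are invented.

archive = []

def snapshot(source, metrics, when=None):
    """Append an immutable, dated record of what a source reported."""
    when = when or datetime.now(timezone.utc)
    archive.append({
        "source": source,
        "retrieved": when.isoformat(),
        "metrics": dict(metrics),  # copy, so later mutation can't alter history
    })

snapshot("some-altmetrics-api", {"mentions": 14, "bookmarks": 3})
stored = json.dumps(archive[-1])  # what would go into archival storage
```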


Problem: Metrics by and large only include so-called formal publications and do not capture the variety of informal media and data that are increasingly important to scholarly discourse. Solution: Providers of data need to open up their data sources, allowing tools to easily mine the widest possible variety of sources.


Societal Limitations

Problem: Important decision makers (e.g., tenure committees) do not use alt-metrics in their evaluation process. Solution: The utility of alt-metrics needs to be demonstrated in order to persuade decision makers to use them.


Problem: People ‘cite’ work in a wide variety of ways, with a variety of semantics, making it difficult to automatically mine these cites. Solution: Given the variety of human expression, there may be no complete solution.
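A small sketch of why such mining is hard: explicit identifiers (DOIs, URLs) are machine-detectable, while allusive mentions are not. The patterns and the sample post are illustrative only:

```python
import re

# Sketch of why mining informal 'cites' is hard: the same work can be
# referenced by DOI, by URL, or by a bare title allusion, and only the
# first two are reliably machine-detectable. Patterns are illustrative.

DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+\b")
URL_RE = re.compile(r"https?://\S+")

def find_explicit_cites(text):
    """Return identifier-style references found in a piece of text."""
    return DOI_RE.findall(text) + URL_RE.findall(text)

post = ("Great follow-up to 10.1234/example -- see also "
        "http://example.org/paper. Smith's earlier study says otherwise.")
cites = find_explicit_cites(post)
# The allusion to "Smith's earlier study" is invisible to both patterns.
```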


Problem: Some work is never cited at all, but simply influences the work of others (e.g., a study which may inform a change in governmental policy). This can make automated mining impossible. Solution: Perhaps automation is impossible, but crowdsourcing this discovery is one possible answer.


Adoption / Understanding Issues

Problem: Generational, geographic, and disciplinary differences mean that not all academics have adopted new methods of dissemination and evaluation to the same extent, disadvantaging certain sectors. Solution: The metrics valued most should be those that have been most widely adopted. Societies and funders should encourage ‘best’ adoption of tools.


Problem: The notion of long-term ‘impact’ is not well understood even in traditional metrics, and is therefore hard to replicate or improve upon in emerging methods. Different metrics have different value for different people, so academia will need to accept that there may never be a single metric to describe all work. Solution: We need a more nuanced, multi-dimensional understanding of impact for different groups.


Strategic Agenda:


     Propagate changes in attribution across scholarly systems (ORCID). ORCID is not in a position to handle the retrospective problem, so perhaps ‘gamify’ the problem to disambiguate author names; 

     Develop centralized (disambiguated) tools to track when you are being cited in the widest variety of possible sources;

     Develop standards to define metrics and metadata;

     Open up common social media tools to develop more open environments for data interchange;

     Dedicate more attention to non-Western tools and services (including multilingual tools);

     Propagate the adoption of Open (pre-publication) Peer Review.
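The name-disambiguation item above reduces, at its core, to collapsing name variants onto a canonical key; the following is a minimal sketch (real disambiguation would also weigh affiliations and co-authors):

```python
# Sketch: collapsing name variants to a canonical key, the core task in
# retrospective author disambiguation (whether crowdsourced, 'gamified',
# or automatic). This handles only the string form; real systems also
# use affiliation and co-author evidence.

def name_key(name):
    """Map common variants of a personal name to a (surname, initial) key."""
    if "," in name:                       # 'Lagoze, Carl' -> surname first
        surname, _, given = name.partition(",")
    else:                                 # 'Carl Lagoze' / 'C. Lagoze'
        parts = name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    surname = surname.strip().lower()
    given = given.strip()
    initial = given[:1].lower() if given else ""
    return (surname, initial)

# All three variants collapse to the same key.
variants = ["Lagoze, Carl", "C. Lagoze", "Carl Lagoze"]
keys = {name_key(v) for v in variants}
```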


Interdependencies with other topics


The future of peer review in scholarly communication will include new methods and metrics for evaluating quality and impact that:

     extend beyond traditional print and digital outputs,

     are dependent on a broadening definition and semantic richness of literature — that is, of the communications that form the “reviewable” scholarly record.


Efforts to broaden and filter our scholarly record will legitimize a broader definition of academic productivity and enable new models of academic recognition and assessment that better align with the actual activities and contributions of today’s scholars.



Peter Binfield, PLoS

Amy Brand, Harvard University

Gregg Gordon, SSRN

Sarah Greene, Faculty of 1000

Carl Lagoze, Cornell University

Clifford Lynch, CNI

Tom McMail, Microsoft Research

Jason Priem, UNC-Chapel Hill

Katina Rogers, Alfred P. Sloan Foundation

Tom Scheinfeldt, Roy Rosenzweig Center for History and New Media at George Mason University


Media for eScholarship

Composite Media eScholarship Short List of Tools to consider

This list isn’t meant to be comprehensive; it is the group’s choice of suggested tools and techniques to enhance eScholarship.

·         Low level Media Tools

o    Mac

§  Screen capture - Cmd-Shift-3 (whole screen), Cmd-Shift-4 (selection)

§  Application capture - Cmd-Shift-4, then Space (single window)

§  Application demonstration capture -

o    Windows

§  Screen capture - Snipping Tool (Windows accessory) or PrtSc

§  Application demonstration capture -  Camtasia

·         Versioning for scientific documents

Sharing documents by email is inefficient, and often confusing when multiple authors are editing the same documents - filenames like ‘Proposal FINAL_EDIT FINAL.docx’ don’t help anybody. The ‘Track changes’ tool in Word is useful for following what’s changed, but more advanced tools are available. Storing a complete history of changes to a document is also useful for later recovering who did what, and why.

Tools used by developers to keep track of code changes can be powerfully repurposed for document versioning. Git is probably the most straightforward distributed revision control solution.
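What a revision control system stores can be illustrated with Python’s standard difflib: a readable delta between two versions of a document (Git keeps a chain of such deltas, plus author and date per change). The document text here is invented:

```python
import difflib

# Sketch: what 'storing who changed what' boils down to -- a readable diff
# between two versions of a document. Version control systems keep a chain
# of exactly these deltas, with author and timestamp attached to each.

v1 = "Methods\nWe sampled 10 sites.\nResults pending.\n"
v2 = "Methods\nWe sampled 12 sites in 2011.\nResults pending.\n"

diff = list(difflib.unified_diff(
    v1.splitlines(keepends=True),
    v2.splitlines(keepends=True),
    fromfile="proposal_v1.txt",
    tofile="proposal_v2.txt",
))
# Changed lines appear with '-' (removed) and '+' (added) prefixes.
```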

About Git

Git offers a distributed revision control repository. Concretely, this means that every member of a collaboration has a full copy of all documents and the directory structure available locally, rather than just on a server. Various online services for Git (such as GitHub) exist; when there is a need for a ‘master’ repository, the various contributions can easily be synced.

Using Git

Designed by Linus Torvalds, Git’s primary user interface is the command line, although a basic default graphical user interface is supplied. There are, however, various graphical user interfaces readily available for Windows, Mac, and Linux that allow for easier Git operations and enhanced functionality over the default GUI:

* Windows: Git Extensions, TortoiseGit
* Mac: GitX, GitBox
* Linux: RabbitVCS

A good resource for learning Git is the Git Community Book.

To learn about GitHub (and also Git concepts) try:

Using Git to version Word Documents: 

git-annex allows managing files with git, without checking the file contents into git. While that may seem paradoxical, it is useful when dealing with files larger than git can currently easily handle, whether due to limitations in memory, check-summing time, or disk space.     

GRAND: Git Revisions As Named Data

GRAND is an experimental extension of Git, a distributed revision control system, which enables the synchronization of Git repositories over Content-Centric Networks (CCN). GRAND brings some of the benefits of CCN to Git, such as transparent caching, load balancing, and the ability to fetch objects by name rather than location. 

Named Data

Related to versioning is named data which, among other things, leads to named-data networking (NDN) or content-centric networking (CCN). Project CCNx™ is an open source project exploring the next step in networking, based on one fundamental architectural change: replacing named hosts with named content as the primary abstraction.





·         Blogging 

Blogs are primarily a tool for communication with the outside world - an easy way to publish content on the web. They can also be useful research tools - a private, password-protected blog can be an online repository for your own notes and thoughts, a searchable online lab notebook. As blogs are inherently serial - one post following another - they’re particularly useful for tracking evolving thoughts and ideas. Wordpress is the most commonly installed blogging software; if you don’t want the hassle of running a server, hosted services will host your blog for you - it takes literally seconds.

Researchers who are good models of blogging in a radically open way:

Rosie Redfield (Astrobiology)- “Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.”

David Hogg (Astrophysics) - “I must post five days per week (I choose which five), except when traveling, no matter what I have done.”

Mary Beard (Classics) - Some outreach, some university politics, but also a look at the processes of lecturing, teaching, and thinking.

There is also a network of bloggers writing about (mostly) peer-reviewed research.

Astrobetter is a community of researchers writing about what tools help them:


An important aspect of long-term accessibility and interoperability is the choice of format for different classes of media. Widely adopted formats based on open standards are more likely to be migratable to new formats than those based on the proprietary formats of niche vendors.

Another important aspect in the choice of formats is the preservation of underlying data, i.e., lossless formats, both in terms of the quality of the data (for instance, resolution in images) and the overall information (such as the source file of a 3D model as opposed to the actual render of it). In some cases it might be necessary to use a lossy format; for instance, when the need arises to precisely control the rendering of a document at the pixel level across different platforms, one would probably use PDF as opposed to ODF or OoXML. When deliberately choosing a lossy format, it is important to understand what strategy to implement to keep the original rich data accessible as well.

The following list isn’t intended to be a set of hard and fast rules - use whatever gets the job done - but it is worth bearing in mind that the correct choice of format helps your document and media survive, and helps others use it.


Do use / Don’t use, by class of media:

Documents

·         Do use: ODF, OoXML, PDF with XMP

·         Don’t use: binary Word, plain PDF (without XMP)

Audio

·         Do use: OGG Vorbis, FLAC

·         Don’t use: AAC, Apple Lossless

Video

·         Do use: WebM

·         Don’t use: H.264, DivX, QuickTime, Flash Video

Images

·         Do use: WebP, PNG, SVG

·         Don’t use: GIF, JPEG

Interactive content

·         Do use: HTML5/CSS3/Javascript

·         Don’t use: Flash, Air, Silverlight
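The format recommendations above could be encoded as a simple lookup that a repository ingest script might use to warn about risky formats; this is a sketch, and the extension-to-format mapping is an assumption:

```python
import os

# Sketch: the do/don't format recommendations as a lookup table, so an
# ingest script could flag risky formats. Mapping formats to file
# extensions is an assumption made for illustration.

RECOMMENDED = {
    ".odt": True, ".ogg": True, ".flac": True, ".webm": True,
    ".png": True, ".svg": True, ".webp": True,
    ".doc": False, ".gif": False, ".jpg": False, ".jpeg": False,
    ".mov": False, ".flv": False, ".swf": False,
}

def check_format(filename):
    """Return 'do', "don't", or 'unknown' for a file's extension."""
    ext = os.path.splitext(filename.lower())[1]
    if ext not in RECOMMENDED:
        return "unknown"
    return "do" if RECOMMENDED[ext] else "don't"
```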


·         Screencasting

It’s often easier to show rather than tell; explaining how to use software, tools or even sharing visualizations is much simpler if you have video. 

Decent screencasting software includes the ability to record sound along with the video, to focus attention on an area or a window, and to emphasise the position of the mouse so that it’s easy to follow what’s going on. 

Camtasia - 30-day free trial:

ScreenFlow - $99, but full-featured; Mac only

Pencasting (e.g., with a Livescribe pen) is an interesting way to record sound and drawing at once. It requires the purchase of a special pen, but can produce interesting results:

RSA Animate - animated lectures :

Explaining a paper (recorded off the cuff) :

o   New types of journals: video journals (e.g., JoVE), interactive journals

·         Enhanced Media



o   Dnatube

·         Interactive 3D Models

o   Why 3d models in papers?

§  3D models can convey spatial context through interaction that no number of static figures can equal (e.g., this example in Nature).

o   What tools should I use?

§  Adobe Acrobat X enables import of 3D models and data

§  Creating 3D data visualizations: Octave, SciPyLab, R


·         Citizen Scholarship

What is Citizen Scholarship?
Citizen Scholarship is a way to harness the power of many minds to solve anything from grand-challenge science to understanding the nuances of changing culture. Everyone can be part of addressing challenges, solving problems, and finding solutions. It can be applied to a wide variety of fields, and it encourages everyone to engage in the scholarly process.

How to get started?
There are projects that count on citizen involvement. Here are four:

    Get involved!


Platforms for scholarly communication

What is a platform? A platform is more than “software”: it is in fact an ecosystem that includes software, data, services and people.  It is in its essence sociotechnical, and its function is to enable research and scholarly communication. The Web, in its implementation and its philosophy, is the basis of each of the emergent, transformative scholarly communication platforms. A useful notion in this regard is the “social object”, be it data, workflow or paper, as the object of scientific discourse and thus also the creation of scientific social networks.


Reviewing the characteristics of several successful platforms for scholarly communication (where “success” is measured by the research that is enabled, rather than software downloads or revenue), several common features emerge: they solve a definite problem; they are easier to use than other solutions (have a low barrier to entry); they enable connections among a community that mutually benefits from participation; they evolve continuously in response to community and technology advances; they offer both user interfaces (UIs) and application programming interfaces (APIs); and there is often a reputational benefit to participating in the ecosystem revolving around the platform. The social elements are a key component of successful platforms; they are designed around users and what they share, not just interoperable infrastructures. In this discussion we consider only Web-based platforms.
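The point about offering both UIs and APIs can be illustrated with a sketch of the API side: a program consuming the same social objects the UI displays. The payload shape here is hypothetical:

```python
import json

# Sketch: the 'API side' of a platform. The payload shape is hypothetical;
# the point is that the same social objects served to the UI are also
# available to programs as structured data.

sample_response = json.dumps({
    "object": {"type": "dataset", "id": "xyz-1"},
    "activity": [
        {"user": "alice", "verb": "bookmarked"},
        {"user": "bob", "verb": "commented"},
    ],
})

def activity_summary(response_text):
    """Count activity verbs in a platform API response."""
    data = json.loads(response_text)
    counts = {}
    for event in data.get("activity", []):
        counts[event["verb"]] = counts.get(event["verb"], 0) + 1
    return counts
```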

At the most basic level of use, successful platforms in scientific settings are platforms that are generic rather than research-specific. These include document management systems such as Google Docs and Office Web Apps, and project and file management software such as Basecamp and Dropbox. General blogging and microblogging platforms such as Wordpress and Twitter have been “reappropriated” by research communities, becoming platforms for scientific and scholarly communication. The same applies to web content management systems such as Drupal. Successful platforms often require little in the way of formal training, and researchers can rapidly derive value from them. Such platforms also generally fit easily into existing workflows and accommodate many types of hardware devices.

Among platforms built purposely for research, success stories include arXiv, an open-access preprint repository (which is crucial in shaping and fostering research communities), and GitHub, an open-source repository of code and software. Openness often lends itself to success, since it supports collaboration, access, re-use, and agility: conditions that support both the social and technical aspects of platforms for research.

Although many successful platforms will continue to be built around research needs, new ones are emerging that blur the line between the workplace and the social environment. There is a greater expectation among researchers that the platforms they use in their personal lives will also be available in their professional lives.


Given that the Web is the fundamental platform, it is perhaps not surprising that failure modes are those contrary to the way the web works. There are failures when platforms are not created through sociotechnical co-evolution but rather “build it and they will come”, and when they fail to evolve because they are too brittle and cannot respond to rapid changes in technology or user expectations.

For example, the Twine academic platform recently failed, possibly due to its pre-structured configuration, by which use had to conform to a predetermined technology (in this case, semantic web data formats). One social networking platform has also failed to obtain wide use and uptake because it addresses a use case already met by other solutions, for example LinkedIn.

Another example of an unsuccessful research project, at a much larger scale, is the Virtual Observatory Initiative in Astronomy. The VO initiative suffered from too much focus on the design and development of large-scale infrastructural solutions. In doing so, it lacked an agile, dynamic component by which users could interact directly with tools and create communities of adopters. The Virtual Observatory has many rich features, such as catalogues, database queries, and data conversion and integration; but in the context of this analysis it falls into the category of ‘brittle’, having been over-engineered and therefore not lending itself to use and responsiveness.

Key unsolved problems.

The major issue with a number of existing smart laboratory software and scientific management tools is poor adoption. A number of different systems have been developed: some aimed at scholarly and scientific communities at large, others aimed at specific disciplinary and laboratory contexts. Yet, adoption and penetration rates are overall low. By and large, most scientific research is performed via a heterogeneous assemblage of generic and domain-based tools, platforms, and practices. Wide adoption of a single platform for the management and publication of scientific workflows is an ongoing challenge fueled by the disparate data needs, scholarly practices, and tools across different domains.  

Promising platforms.

Because of the sociotechnical nature of platforms and the need for people and systems to co-evolve, building a successful platform is a challenge; more often, they are grown rather than created whole-cloth. One pattern for building a platform is to take a system that has worked well in a particular domain or context and extend it to other areas. The open-source citizen science platform Galaxy Zoo and its extension to Zooniverse is a prime example; Stack Overflow (a question-and-answer system for computer programming) has also been successfully applied to other domains (e.g., mathematics, computer science, LaTeX). Large-scale collaborative platforms such as Polymath are also promising collaborative tools, as are similar efforts in the humanities (digital humanities projects).

In terms of supporting research, a number of key platform tools that researchers can use today are highlighted below.

Appendix: Examples of platforms for scholarly communication

Building tools and services

At the most basic level, with the arrival of cloud computing, the last decade has seen a revolution in the way that researchers can use computers, both for computation/analysis, and to set up web services. It is now possible for any scientist to set up web servers and services very quickly with no up-front cost, and without having to worry about setting up hardware or most software. Examples of cloud providers include:

On top of this existing infrastructure that is ready to go, there are several web frameworks that allow rapid creation of dynamic websites. These include:

Web sites for scientific communication have been built with Drupal; the Harvard Time Series Center is an example of such a site built using Django.

Another interesting contender is Google App Engine, which provides both the infrastructure and the framework to quickly develop services.

Collaboration amongst researchers

Several tools already exist to facilitate collaboration between researchers and allow collaborative document editing, including:

Services also exist to easily synchronize files over multiple computers and allow sharing between several people, including:

Examples of hybrids of these are Evernote and OneNote, which, in addition to being good platforms for scientists to keep organized notes and files relating to their research, allow sharing and collaborative editing of specific notebooks.

Beyond simple sharing of files and documents, teams can make use of wikis to coordinate projects. Examples of wiki frameworks or services include:

Software development often occupies an important place in research, and several websites now facilitate publication of version-controlled code. For example, SourceForge is a website for publishing open-source software projects. More recently, GitHub, which prides itself on being a ‘social’ coding website, has become one of the best solutions for hosting open- and closed-source code, allowing not only publication of the code, but active collaboration on the development itself and communication between developers.

Direct verbal communication between scientists is of course essential to efficient research, and modern communication platforms such as Skype or ‘hangouts’ on Google+ now allow multi-user video conferences without the expensive equipment traditionally required.

Finally, larger-scale communication between scientists at conferences and meetings, and more generally worldwide, can greatly benefit from Twitter, which, with the use of hashtags, allows focused communication between groups of individuals interested in a particular topic or meeting.
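The hashtag mechanism can be sketched in a few lines; the tweets and the ‘#scholcomm12’ tag are invented:

```python
# Sketch: filtering a stream of conference tweets by hashtag -- the
# mechanism that turns a general platform into a focused back-channel.
# The tweets and the '#scholcomm12' tag are invented.

def by_hashtag(tweets, tag):
    """Return the tweets that carry the given hashtag."""
    tag = tag.lower()
    return [t for t in tweets if tag in t.lower().split()]

tweets = [
    "Great keynote on altmetrics #scholcomm12",
    "Lunch recommendations near the venue?",
    "Slides from my talk are up #scholcomm12",
]
session = by_hashtag(tweets, "#scholcomm12")
```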

Reference Management

The process of publishing papers and exploring the existing literature has also seen a transformation, with the advent of advanced bibliography management tools which not only allow researchers to keep track of interesting publications, but also add the social/sharing aspect, and in many cases offer automated recommendations for other researchers and publications that would be of interest. These platforms are invaluable in navigating the vast literature in many fields. Examples include:


Over the past decade a number of repository systems have arisen to support the publishing of research outputs; in the main these have focused on papers, in particular open-access preprints. These repositories support the storage of, and access to, research outputs, and may be used by universities, research groups, or disciplinary communities. Examples include:

Communicating with the public

Once the research has been carried out and published, one final important step is communication with the wider scientific community and the public. Many platforms enable researchers to reach a wide audience, including:

  • Twitter - in addition to being a collaboration tool mentioned previously, also an excellent platform for reaching a wide audience

  • Facebook - which similarly allows researchers to reach wide audiences and engage with the public. For example, the Hubble Space Telescope has a Facebook page with over 70,000 ‘fans’.

  • Wordpress and Tumblr - two popular blogging platforms.


Mark Abbott, Oregon State University

Taliesin Beynon, Wolfram Alpha

Rachel Bruce, Joint Information Systems Committee (JISC)

Derick Campbell, Microsoft Research

Tim Clark, Harvard Medical School / Mass General Hospital

Tom Cramer, Stanford University

Dave De Roure, Oxford e-Science Centre

Cory Knobel, University of Pittsburgh

Alberto Pepe, Harvard

Thomas Robitaille, Harvard-Smithsonian CfA

Types of resource

basic resources => data types => PNG images, Excel tables
create resources => web-based collaborative tools => Google Docs, Google Tables
share resources => sharing platforms => Dropbox, others
discover resources => good search tools =>
publish resources => curation, linking, discovery => Dryad, Dataverse

“Actions of a researcher”:

1. Plan and Discover

2. Generate ideas

3. Collect Data (observe and generate data)

4. Analyze

5. Disseminate and viz

6. Impact

Fig 1: A word cloud of the top 50 terms used in this document. Note that “google”, “data”, “tools”, “discovery” and “search” feature prominently.

Fig 2: A detailed word cloud of terms used in the tools section of this document.
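Word clouds like Figs 1 and 2 are essentially term-frequency counts rendered graphically. A minimal sketch of extracting the top terms from a text; the stop-word list here is an illustrative fragment, not the one used for the figures:

```python
import re
from collections import Counter

# Illustrative fragment of a stop-word list, not the one used for the figures.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "are"}

def top_terms(text, n=50):
    """Return the n most frequent non-stop-word terms in `text`, lowercased."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

sample = "search tools and data tools: google data search, data discovery"
print(top_terms(sample, 3))
```

Feeding the whole document through `top_terms` and passing the result to any word-cloud renderer reproduces the kind of figure shown above.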

1. Planning and Discovery

●Meet funder requirements for data management

○California Digital Library / UC Curation Center (CDL/UC3) Data Management Planning Tool (based on the DCC tool)


●Analyze state of the art research

○Literature search, coupled with notification services. General scholarly search engines:


○Discipline-specific engines:




■BiomedExperts:

■Google Alerts:


○Explore: Wolfram Alpha - search over curated data:

○Data discovery: find related datasets/studies [GAP: no good ways to search for data across disciplines, hard even within a particular domain]; some discipline-specific examples:

●Library repository

○General data repositories, eg,

○Domain specific databases, eg,

●Obtain persistent identifiers

○services, eg,

○identifiers, eg, DOIs, HTTP URIs

●After data is generated, archive it, generate citations, and expose them to appropriate abstracting and indexing services (eg, Web of Knowledge)



○Dataverse (social science data), (astronomy data)
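Persistent identifiers such as the DOIs mentioned above are conventionally exposed as resolvable HTTP URIs. A minimal sketch of normalizing common DOI spellings to the doi.org resolver form (prefix handling simplified, and the example DOI is hypothetical):

```python
def doi_to_uri(doi):
    """Normalize a DOI string to an HTTPS resolver URI (simplified)."""
    doi = doi.strip()
    # Strip any known resolver or scheme prefix before rebuilding the URI.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi

print(doi_to_uri("doi:10.1000/xyz123"))  # https://doi.org/10.1000/xyz123
```

Normalizing identifiers this way makes citations to data comparable across the repositories and indexing services listed above.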



2. Generate Ideas

●Google Docs, Word, Excel, LaTeX

●Wikis

●mind map and concept map software



○Personal Brain:

●Evernote; data sharing - “cloud storage surfaces”:

●Blogs, Twitter, Disqus


●Skype, WebEx, Adobe Connect

3. Collect Data

●Google spreadsheets:

●Microsoft Excel

●Relational and non relational databases

○MySQL, Oracle, PostgreSQL, BDB, CouchDB, NoSQL

●Future: Excel DataScope:

●Google Forms

●GIS, geo tagging

●Sensor Streaming Software

●Storing data and metadata
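Several relational databases are listed above; for small-scale data collection, even Python's built-in SQLite can serve. A minimal sketch using an in-memory database; the table and observations are purely illustrative:

```python
import sqlite3

# In-memory SQLite database; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (site TEXT, value REAL, taken_at TEXT)")
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [("A", 1.5, "2011-10-01"), ("A", 2.5, "2011-10-02"), ("B", 3.0, "2011-10-01")],
)

# Aggregate collected data per site, as a first analysis step.
for site, avg in conn.execute(
    "SELECT site, AVG(value) FROM observations GROUP BY site ORDER BY site"
):
    print(site, avg)
# A 2.0
# B 3.0
```

The same schema and queries carry over unchanged to the server-class databases named above.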


4. Analyze

●Reviews of these tools:


●Data Wrangler:

●Google Fusion Tables:

●Google Refine:

●R, Splus

●Hadoop, Map/Reduce:


●Traditional Perl, Python, Ruby, sed, awk, grep (Unix tools)










○Fusion Tables


○Many Eyes:
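Hadoop and Map/Reduce appear in the tool list above; the programming model itself can be illustrated without a cluster. Below is a toy in-memory sketch of the map, shuffle, and reduce phases for word counting (the canonical MapReduce example); the function names are our own, not Hadoop APIs:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for each word in a line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["open data", "open access open science"]))
# {'open': 3, 'data': 1, 'access': 1, 'science': 1}
```

In a real Hadoop job the mapper and reducer run in parallel over partitions of the input, but the data flow is exactly this.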

5. Disseminate and Viz.

See Generate above


○See Analysis above


●Google Docs

●Wikis, Blogs

●PubMed

●Google visualization API

●Open Layers


●BioCatalogue (web-services), Dryad (data), Dataverse (data), Google Code /

SourceForge / GitHub / Bitbucket (software)

6. Impact

●Klout, rankings, F1000

●H and G numbers (h-index and g-index)
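The H and G numbers listed above (h-index and g-index) are simple functions of a researcher's citation counts. A minimal sketch of both, independent of any particular citation service:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def g_index(citations):
    """Largest g such that the top g papers have at least g*g citations in total."""
    counts = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(counts, start=1):
        total += c
        if total >= i * i:
            g = i
    return g

print(h_index([10, 8, 5, 4, 3]))  # 4
print(g_index([10, 8, 5, 4, 3]))  # 5
```

The g-index always equals or exceeds the h-index, since it credits the total citations of the top papers rather than a per-paper floor.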

Key Unsolved Problems

●Universal scientific search

○“the email problem” - conversations over email are part of science; how do we capture and search them?


○“the file transfer problem” - institutional firewalls; freemium services

○“the file format problem” (video, documents, binary blobs)

○“the library subscription problem” (open access)

○converting audio to text, multimedia indexing and searching

●Lack of integration and seamlessness. Long list of tools that don’t interconnect.

●Not enough inter-disciplinary tools

●Making sense of thousands of papers, sites, etc.: processing vast amounts of information without having to read it all. Some text-mining tools are OK, but there is a lot to develop in this area, eg, summarizing tools, aggregation tools, zoom in/zoom out, intelligent filtering, recommendation engines.

●How are we going to teach all these tools and resources? The advocacy problem.

Possible answers to key unsolved problems  (concept, logic model) - we need a scientific version of this to trigger integration

●Searchable Registry for scientific, scholarly tools and resources (across domains)

#recognition part 1

Now that you have your work published, how do you become recognized? There are many different ways to distribute your profile and display your work, but how do you get started? Is it helpful to have profiles on LinkedIn, VIVO, Mendeley, or BioMedExperts? Which tool is the best place to post your profile and published works? At present there is no single leading place where people post and search for experts, which makes the process of gaining recognition more challenging. Since there is no one leading service, it is necessary to use multiple sources and technologies to gain recognition in your field.

Given that most people will use more than one service (probably at least three), we have identified the following areas as good places to start for a researcher seeking recognition:

1. University homepage

As part of a university community, taking advantage of the systems your institution offers can increase the visibility of your research to your local community. Some systems, such as VIVO, make this process easy by prepopulating profiles; however, these sometimes have restrictive formats.

2. Social Networking sites:

Social networking sites increase researchers’ opportunities to make connections with others in their fields. For example, by maintaining a Mendeley profile you can receive personalized statistics on your papers, connect with like-minded scholars within the network, and discover potential new colleagues.

3. Personal webpage or blog

Personal webpages and blogs allow you to personalize your profile and list your works in your own fashion; you have more freedom here than in any of the above options. It takes a little more work to build a custom webpage, but it is worth the time and effort. Of course, you can also link to your other sites and profiles from this page.

4. Reference Managers

Reference-management tools can be an initial step to help researchers aggregate their publications. Some also integrate with online services.

For each of the above areas we have compiled resources; see the Appendix below. Note that while some services integrate with each other, in most cases information will be transferred via files, eg, in BibTeX or RIS format. At present, maintaining multiple outlets is standard; however, there are also aggregation tools for some social media outlets that let researchers integrate all of their outlets.
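As noted above, profile and publication information is typically moved between services as BibTeX or RIS files. A minimal sketch of rendering a BibTeX-style record as RIS; the field mapping is a simplified, illustrative subset (real converters handle many more tags), and the sample record is hypothetical:

```python
# Map from BibTeX-style field names to RIS tags (simplified, illustrative subset).
FIELD_TO_RIS = {"author": "AU", "title": "TI", "journal": "JO", "year": "PY"}

def record_to_ris(entry_type, fields):
    """Render one record as RIS text. `fields` is a dict of BibTeX-style fields."""
    type_map = {"article": "JOUR", "book": "BOOK"}
    lines = [f"TY  - {type_map.get(entry_type, 'GEN')}"]
    for name, tag in FIELD_TO_RIS.items():
        if name in fields:
            lines.append(f"{tag}  - {fields[name]}")
    lines.append("ER  - ")  # RIS end-of-record marker
    return "\n".join(lines)

print(record_to_ris("article", {"author": "Doe, J.", "title": "On Sharing", "year": "2011"}))
```

A file of such records can then be imported by any of the reference managers listed in the Appendix.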

A profile page should list the scholarly activities of the researcher. Depending on your individual situation, try to give the following information:

  • Photo
    It helps to spend some time presenting a nice photo.

  • Contact form, email address (obfuscated)
    Having the email readable to all is not recommended, because of spam.

  • Affiliation, past affiliations

  • Short description of your work and research interests, including a list of research interests/areas of expertise
    This could also include some personal information.

  • Papers, book chapters, books, posters, dissertations, presentations, etc. If possible, indicate Open Access and/or link to the full text in an institutional repository.

  • Other scholarly activities: grants, patents, datasets, software development, peer review as reviewer or editor, as well as work in progress

  • People working for you or collaborating with you

  • Awards, h-index and similar metrics, including readership/download stats
    It depends on social factors whether or not this information is appropriate.

  • Professional and public service activities

  • Languages (including computer languages), research sites, and experience working abroad

  • Links to other services
    Twitter, Facebook, LinkedIn, Google+ - where appropriate

  • Events/conferences that you attended and will attend
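One of the profile items above recommends publishing your email address only in obfuscated form. A common lightweight approach is to encode it as HTML character entities, so that naive address scrapers miss it (this is no defense against a determined harvester). A minimal sketch; the address shown is hypothetical:

```python
def obfuscate_email(address):
    """Encode every character of an email address as a decimal HTML entity."""
    return "".join(f"&#{ord(ch)};" for ch in address)

print(obfuscate_email("a@b.org"))
# &#97;&#64;&#98;&#46;&#111;&#114;&#103;
# Renders as "a@b.org" in a browser, but is harder for simple scrapers to read.
```

Paste the encoded string into your page's HTML where the address would normally appear.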

Networking is important for building a reputation, and it can be done in a variety of ways: Twitter, Facebook, LinkedIn, and attending physical events and conferences. To be found by others researching you, you need a web presence discoverable via a Google search. Part of social networking is knowing your audience: learn how to describe your work orally in 1-, 3-, and 10-minute versions, and textually in 3, 10, and 25 sentences.

Challenge: Note that these practices can vary significantly by employment sector (government, industry, academia), institution, sub/field of research, and country. We need strategies for crossing these tacit boundaries.

Challenge: Prepopulation of content and portability of content to other services. Currently it is difficult to port content from one profile system to another.

Challenge: The format of articles in scholarly research journals has been stable for about 300 years; the goal of such articles is to represent research results in a highly formulaic way. The new technologies/techniques under discussion here represent the process of making knowledge in various ways. Perhaps it is best to see the new tools as additions to traditional practices, rather than as displacing them.

Appendix of Tools: Based on the recommended areas for researchers, we have included a list of resources within each category. 

1- University Content Management System:




generic content management system

2- Social networking sites (for scientists):









3- Blogs, personal websites:



Wordpress or another blog hosted on a personal domain

4- Reference Managers: 







Author identifier services:


Microsoft Academic Search:

Researcher ID (ScienceCard):

Scopus Author Profile:

Aggregation tools




Educational Vignette: How to gain recognition for your research from the wider community

The following is addressed to a researcher seeking to understand how they can best present themselves and their research capabilities, skills, and expertise to the wider world.

You do great work, but the contribution you can make to the wider community isn’t always clear. How do people find you? What can you do to take the work you do and represent that to the wider public, whether you are looking for a job outside of research, contributing to discussions in the media, or providing your expertise to the courts? 

The bottom line is that people will find you first via search: first through Google, and second through Facebook, LinkedIn, and other widely used social networks. The same way that you seek out information on the web is the way many, or most, people will come to you. What happens when you do a search on your name? Do the links there represent the best of what you have done? Do the web pages linked to provide summaries of your work in non-specialized language? And if not, what can you do? Having an online profile, either on your own website or on a recognized service like LinkedIn, is a great way to rise up the search results. You will have presentations that you have given. Do you share those on a service like SlideShare? Many researchers have presentations that have been viewed thousands of times, reaching a much larger audience than the people who were in the room. Done well, these presentations are a powerful way to demonstrate your communication skills.

Writing is a skill you can take anywhere, and writing online, whether on blogs, forums, or places like Wikipedia, is an effective way to improve it. If you write online about your research, you can promote your work and raise its profile, as well as hone and demonstrate more generic writing and communication skills. The content you create will help people find you: people looking for speakers, people looking for experts, and people looking for the right collaborators for their team. And if you have a common name, it is this online content that will differentiate you from all those others with similar names.

You can monitor your online presence with automated Google Alerts and similar services. Moreover, these forms of online work (slides on SlideShare, blog posts, etc.) generate metadata and usage metrics that can be aggregated by services such as Total Impact, which pull together information about how that work has been used, allowing a researcher to demonstrate their influence on other researchers in the field. These tools can help you decide what is working for you, and help you show how your work compares to that of others. What is more, you can start using this information to enrich your CV, to give evidence to mentors writing letters of recommendation, and to demonstrate to the world who you are and what you can do.

Notepad for the Recognition Group

The working notes for the recognition group can be found at:

#platforms workbook

#platforms are being discussed here in a collaborative fashion

Notes for #review group

We’re taking notes and chatting here:

Video links now posted on the Workshop website


All of the videos for today’s demos/presentations will soon be posted and available on the workshop website (on the agenda page). Feel free to share these links!


Updated version of the FORCE11 whitepaper

Dear all,

Just to hammer home the fact that this is very much a work in progress, enclosed please find the next iteration of the manifesto, with updated references.

I’ll see if I can update the hard copy pile next to the registration desk as well :-)!


- Link to the HTML version of this document:

- Links to other versions of the manifesto:

- links to related efforts (please add your own!):

Thanks for your interest,

- Anita. 

Anita de Waard

Disruptive Technologies Director, Elsevier Labs

Rough Transcript of my Opening Remarks

Dear Colleagues:

 It is an honor to be asked to address this group, some of whom I seem to see more than my own family, to set the stage for this 2011 Microsoft Research sponsored eScience workshop on Transforming Scholarly Communication.

The first fundamental question to ask is: do we need a transformation in the first place? Obviously we all believe we do, otherwise we would not be here, but what about the majority of scholars? I use my colleagues up and down the corridor as a benchmark to answer that question: a group currently oblivious to much of what we will show tomorrow, but nevertheless increasingly aware of the changes going on around them – data-sharing policies, cuts in library budgets, open access, and our students. Let me illustrate this latter point with a recent example of something remarkable that happened to me.

A couple of months ago I received by email a paper intended for PLoS Comp Biol. This happens from time to time, as authors try to circumvent the standard submission procedure and contact me as Editor-in-Chief directly. It was a paper on pandemic modeling, which appeared to question conventional approaches to such modeling. Not being an expert here, I sent the manuscript to Simon Levin at Princeton, who is on our Editorial Board, for his opinion. Simon is a Kyoto Prize winner and an expert in large-scale biological modeling. He indicated there was something special about this well-written paper. Since the sole author was living in San Diego, I agreed to meet with her and discuss the work. Simply by asking, she had received a large amount of free computer time from the San Diego Supercomputer Center (SDSC), got free access to Mathematica, and had clearly benefited from the open-access literature as well as resources like Wikipedia. I encouraged her to submit the work to Science, which she has done, and it is currently under review.


The sole author’s name is Meredith. What makes this story remarkable is that she is 15 years old and a senior at La Jolla High School in San Diego. She subsequently presented her work at my lab meeting, which I must say was much better attended than usual. Sitting there with my eyes closed, I thought I was in the presence of a professor deep into their area of expertise. It was only when I opened my eyes and saw the braces that reality sank in.

Clearly this is an extreme case, but let’s not be modest: what we are trying to do here is enable anyone with an Internet connection and a will to learn to achieve what Meredith has achieved. I cannot think of a more noble cause. While we have seen this possibility for a long time, what is new is that others are now seeing it too.

We all have our own Meredith stories or at the very least some driver that moves us in the same way. For some it is the glacial pace at which knowledge exchange takes place; for others it is the sense of unease about the lack of reproducibility in our own science; for others it is the inaccessibility of knowledge; and for others still it is the totally qualitative way quantitative scientists measure the value of scholarship.

With Meredith as our motivation, let us take a minute to analyze the path we are on towards transforming scholarship through what has happened this past year and then what might happen as a result of this workshop.

2011 may well be remembered as the year that stakeholders – scientists, publishers, archivists and librarians, developers, funders, and decision makers – went from working in isolation to beginning to work together. What started with Beyond the PDF in January had become a “movement” by the time the summer meetings were over. The Dagstuhl meeting captured the spirit in a manifesto that should become a living document for us all to consider. Movements have transformed entrenched systems before, and it remains an open and very exciting question as to whether that will happen here. For a small group to cause change among many requires that the many believe change is needed and gradually get on board. I believe that time has come.

The driver of change is the groundswell towards open science. When I first heard that a group of prominent life scientists had got together and agreed to start a new open-access journal, I was disappointed: such vision, yet coming up with something we already had. But if the effort by HHMI, the Wellcome Trust, and Max Planck does indeed compete with Science and Nature, it will precipitate change. My sense is that publishers see the writing on the wall, or more appropriately the screen, and the smart ones are gearing up for a future with different business models. The winning publishers will move from serving science through scientific process and dissemination to doing that plus enabling knowledge discovery, more equitable reward systems, and improved comprehension by a broader audience. Interestingly, it is not clear to me, based on my interactions with OASPA, that open-access publishers see it that way. Many simply see delivering papers as before, but with a different revenue model. Ironically, even if they see the promise of change, they do not have the resources to make it happen. We must help them, and that is why meetings like this one are so important. A serious example of what we must fix is the lack of consistent representation of their papers in XML. PubMed Central will come back to haunt us when developers begin seriously trying to use the content. This is history repeating itself: look at the biological databases. We should learn from history.

Open science is more than changing how we interface with the final product; it is interfacing with the complete scientific process: motivation, ideas, hypotheses, experiments to test the hypotheses, data generated, analysis of that data, conclusions, and awareness. This is a tall order, and I believe we need to proceed in steps. Clearly, access to and effective use of data is a valuable next step. Funders are demanding it, scientists (to some degree) are providing it, and repositories exist to accept it. But right now it is a mess; yet we have an opportunity. Ontologies exist, some tools exist, and so we have the opportunity NOT to repeat the horrible loss of productivity we see in the publishing world of rehashing the same material for different publishers. Let us define and implement data standards and input mechanisms that capture generic metadata, provide hooks for deeper domain-specific content, and allow more universal deposition and search. We need to do this now, before systems become entrenched. Otherwise Google, Bing, and the like will be our tools for data discovery; we need deep and meaningful search of data.

Let me conclude with a couple of thoughts on what I believe should come from the workshop. 

1. We will hear about some wonderful tools and innovative software developments to support scholarly communication; we must define a way to aggregate these efforts to facilitate uptake by others around a focused and shared development effort.

2. We need to define ways to recruit people to the movement; it will take more than tools to do so. Are there clear wins for all concerned? If so, what are they? Platforms to disseminate scholarship, new reward systems, knowledge discovery from open-access content, proven improved comprehension.

What can we do so that more 15-year-olds are active contributors to scholarship? This is our challenge. Thank you very much.







Pre-reading for Harvard/MSR eScience Workshop


A bit of recommended reading to suggest prior to the workshop – something hot off the press! 

One of our workshop attendees, Anita de Waard (the Disruptive Technologies Director at Elsevier Labs), is sharing with us a DRAFT version of a document prepared by attendees at a workshop held in Dagstuhl, Germany this past August. The group was convened to address the same challenge we are tackling in this workshop: how to speed change in scholarly communication. The group calls itself the Future of Research Communication (the FoRC Network, or Force11), and we are lucky to have ~6 of these people joining us for our meeting. We hope very much that our workshop in Cambridge can carry forward the momentum from Dagstuhl, as well as from the “Beyond the PDF” workshop held in San Diego this past January.

The Dagstuhl Manifesto document can be found here, in various formats (PDF, HTML, and LaTeX).

We’d like to stress to everyone that (a) this document is still very much in DRAFT form and (b) it has not been validated by the Force11 participants. We ask that you not disseminate it further at this point, but simply read it to inform yourselves in advance of this workshop. Note that it will be moved to a permanent web home shortly, so this is just a preliminary “sneak peek” at this white paper. Once it is more formally released, you will be able to point to it, blog about it, etc.; but we ask you to hold off for now. Our thanks to Anita and the other authors for sharing it at this stage.

Hope you are able to review this short paper before the workshop commences!


Bios of the Organizers

Bio sketches of the meeting organizers, who will float from group to group:

  1. Alyssa Goodman, Harvard University
  2. Alberto Pepe, Harvard University
  3. Mary Lee Kennedy, Harvard University
  4. Malgorzata (Gosia) Stergios, Harvard University
  5. Lee Dirks, Microsoft Research
  6. Alex Wade, Microsoft Research
  7.  Joshua M. Greenberg, Alfred P. Sloan Foundation
  8. Chris Mentzel, Gordon & Betty Moore Foundation