Media for eScholarship

Composite Media eScholarship Short List of Tools to consider

This list isn’t meant to be comprehensive; it reflects the group’s choice of suggested tools and techniques to enhance eScholarship.

·         Low level Media Tools

o    Mac

§  Screen capture- Cmd-Shift-3  Whole screen (Cmd-Shift-4 captures a selected area)

§  Application capture- Cmd-Shift-4, then Space  Application window

§  Application demonstration capture -

o    Windows

§  Screen capture-  Snipping Tool (Windows accessory) or PrtSc

§  Application demonstration capture -  Camtasia

·         Versioning for scientific documents

Sharing documents by email is inefficient, and often confusing when multiple authors are editing the same documents - filenames like ‘Proposal FINAL_EDIT FINAL.docx’ don’t help anybody. The ‘Track changes’ tool in Word is useful for following what has changed, but more advanced tools are available. Storing a complete history of changes to a document is also useful for recovering who did what, and why, later on.

Tools used by developers to keep track of code changes can be repurposed powerfully for document versioning. Git is probably the most straightforward distributed revision control solution.

About Git

Git is a distributed revision control system. Concretely, this means that every member of a collaboration has a full copy of all documents, the directory structure and the change history available locally, rather than just on a server. Various online hosting services for Git (such as GitHub) exist; when a ‘master’ repository is needed, contributions from different people can easily be synced to it.

Using Git

Git was designed by Linus Torvalds. Its primary user interface is the command line, although a basic default graphical user interface is supplied. There are, however, various graphical user interfaces readily available for Windows, Mac and Linux that make Git operations easier and add functionality beyond the default GUI:

* Windows: Git Extensions (http://code.google.com/p/gitextensions/), TortoiseGit (http://code.google.com/p/tortoisegit/)
* Mac: GitX (http://gitx.frim.nl/), GitBox (http://gitboxapp.com/)
* Linux: RabbitVCS (http://www.rabbitvcs.org/)

A good resource for learning Git is the Git Community Book:
http://book.git-scm.com/

To learn about GitHub (and also Git concepts), try:

http://learn.github.com/p/intro.html
http://help.github.com/

Using Git to version Word Documents:
http://www.cybersprocket.com/2010/project-management/sdlc/versions/versioning-word-documents-in-git/ 
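
One practical obstacle is that Git treats a .docx file as opaque binary data, so its diffs cannot show which words changed. A common workaround (an assumption here, not a summary of the article linked above) is to convert the document to plain text whenever Git needs to compare versions. The sketch below uses only the Python standard library and relies on the fact that a .docx file is a zip archive whose main text lives in word/document.xml. The helper names docx_to_text and make_minimal_docx are hypothetical, and the in-memory file is a stand-in for demonstration, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the text of a .docx file, one paragraph per line.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> elements.
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def make_minimal_docx(paragraph_texts):
    """Build a tiny in-memory stand-in for a .docx (demo only)."""
    body = "".join(
        f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p>" for text in paragraph_texts
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document)
    buffer.seek(0)
    return buffer


demo = make_minimal_docx(["Aims", "Methods and data"])
print(docx_to_text(demo))
```

Wrapped in a small script that prints docx_to_text for a path given on the command line, a converter like this can be registered through Git's textconv mechanism: a .gitattributes line such as "*.docx diff=docx" together with the configuration setting diff.docx.textconv pointing at the script. The stored file stays unchanged; only the diff display uses the extracted text.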

git-annex allows managing files with git, without checking the file contents into git. While that may seem paradoxical, it is useful when dealing with files larger than git can currently easily handle, whether due to limitations in memory, check-summing time, or disk space.

http://git-annex.branchable.com/     

GRAND: Git Revisions As Named Data
http://janakj.org/papers/grand.pdf

GRAND is an experimental extension of Git, a distributed revision control system, which enables the synchronization of Git repositories over Content-Centric Networks (CCN). GRAND brings some of the benefits of CCN to Git, such as transparent caching, load balancing, and the ability to fetch objects by name rather than location. 
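
Fetching "by name rather than location" works because Git already names every stored object by a hash of its content. The sketch below shows how the name of a file's content (a "blob") is derived in Git's classic SHA-1 object format; the function name git_blob_id and the sample drafts are illustrative, and repositories configured for the newer SHA-256 format would produce different names.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object name Git assigns to a file's content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


draft_v1 = b"Proposal draft, version 1\n"
draft_v2 = b"Proposal draft, version 2\n"

# The same bytes always get the same name, wherever the copy is stored.
assert git_blob_id(draft_v1) == git_blob_id(bytes(draft_v1))
# Any change to the content produces a different name.
assert git_blob_id(draft_v1) != git_blob_id(draft_v2)

print(git_blob_id(draft_v1))
```

Because the name depends only on the content, any cache or peer holding an object with that name can serve it, which is the property a content-centric network exploits.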

Named Data

Related to versioning is named data which, among other things, leads to named-data networking (NDN) or content-centric networking (CCN). Project CCNx™ is an open source project exploring the next step in networking, based on one fundamental architectural change: replacing named hosts with named content as the primary abstraction.

o    http://www.named-data.net/

o    https://github.com/ProjectCCNx/ccnx

o    http://www.parc.com/work/focus-area/content-centric-networking/

 

·         Blogging 

Blogs are primarily a tool for communication with the outside world - an easy way to publish content on the web. They can also be useful research tools - a private, password protected blog can be an online repository for your own notes and thoughts, an online lab notebook that is searchable. As blogs are inherently serial - one post following another - they’re particularly useful for tracking evolving thoughts and ideas. 

WordPress.com : WordPress is the most commonly installed software for blogs; if you don’t want the hassle of running a server, WordPress.com will host your blog for you - setting one up takes literally seconds.

 Researchers who are good models of blogging in a radically open way :

Rosie Redfield (Astrobiology)- “Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.” http://rrresearch.fieldofscience.com/

David Hogg (Astrophysics) - “I must post five days per week (I choose which five), except when traveling, no matter what I have done.” http://hoggresearch.blogspot.com/

Mary Beard (Classics) - Some outreach, some university politics but also a look at the processes of lecturing, teaching and thinking. http://timesonline.typepad.com/dons_life/

Researchblogging.org is a network of bloggers writing about (mostly) peer-reviewed research.

Astrobetter is a community of researchers writing about what tools help them : http://www.astrobetter.com/

Formats

An important factor in long-term accessibility and interoperability is the choice of format for different classes of media. Widely used formats based on open standards are more likely to be migratable to new formats than proprietary formats from niche vendors.

Another important aspect in the choice of format is the preservation of the underlying data, i.e. preferring lossless formats, both in terms of the quality of the data (for instance, resolution in images) and the overall information (such as the source file of a 3D model as opposed to a render of it). In some cases it might be necessary to use a lossy format: for instance, when the rendering of a document must be controlled precisely at the pixel level across different platforms, one would probably use PDF as opposed to ODF or OoXML. When deliberately choosing a lossy format, it is important to have a strategy for keeping the original rich data accessible as well.
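
The difference between lossless and lossy storage can be made concrete with a toy example. The sketch below is an illustration only: the sample data and the quantize function are invented for the demonstration and are not drawn from any of the formats listed here. It shows that a lossless round trip returns every original byte, while a lossy step discards detail that cannot be recovered afterwards.

```python
import zlib

# Stand-in for rich source data: 8-bit samples (e.g. pixel intensities).
original = bytes(range(256)) * 4

# Lossless: compress, then decompress, and every byte comes back.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
assert restored == original


def quantize(samples: bytes) -> bytes:
    """Toy lossy step: keep only the top 4 bits of each sample."""
    return bytes(value & 0xF0 for value in samples)


degraded = quantize(original)
assert degraded != original        # detail has been discarded
assert len(set(degraded)) == 16    # 256 distinct levels collapsed to 16
```

Nothing computed from the degraded copy can restore the original samples, which is why the original data needs its own preservation strategy whenever a lossy format is published.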

The following list isn’t intended to be a set of hard and fast rules - use whatever gets the job done - but it is worth bearing in mind that correct choice of format helps your document and media survive and helps others use it. 

Class

Do use

Don’t use

Documents

·         ODF

·         OoXML

·         PDF with XMP

·         binary Word

·         plain PDF (without XMP)

Audio

·         OGG Vorbis

·         FLAC

·         AAC

·         Apple Lossless

Video

·         WebM

·         H.264

·         DivX

·         QuickTime

·         Flash Video

Graphics

·         WebP

·         PNG

·         SVG

·         GIF

·         JPEG

Interactive content

·         HTML5/CSS3/Javascript

·         Flash

·         Air

·         Silverlight

 

·         Screencasting

It’s often easier to show rather than tell; explaining how to use software, tools or even sharing visualizations is much simpler if you have video. 

Decent screencasting software includes the ability to record sound along with the video, to focus attention on an area or a window, and to emphasise the position of the mouse so that it’s easy to follow what’s going on. 

Camtasia : 30 day free trial : http://www.techsmith.com/camtasia.html

ScreenFlow : $99, but full featured. Mac only http://www.telestream.net/screen-flow/

Pencasting (for example, with a Livescribe smartpen) is an interesting way to record sound and drawing at once. It requires the purchase of a special pen, but can produce interesting results :

RSA Animate - animated lectures : http://www.youtube.com/watch?v=u6XAPnuFjJc

Explaining a paper (recorded off the cuff) : http://thebeautifulstars.blogspot.com/2011/04/pencasting-galaxy-zoo-science-at.html

o   New types of journals: video journals (e.g. JoVE), interactive journals

·         Enhanced Media

o   SciVee

o   Benchfly.com

o   Dnatube

·         Interactive 3D Models

o   Why 3D models in papers?

§  3D models can convey spatial context through interaction that no number of static figures can equal. Eg this example in Nature  

o   What tools should I use?

§  Adobe Acrobat X enables import of 3D models and data

§  Creating 3D data visualizations: Octave, SciPyLab, R

  

·         Citizen Scholarship

What is Citizen Scholarship?
Citizen Scholarship is a way to involve the power of many minds to solve anything from grand-challenge science to understanding the nuances of changing culture.  Everyone can be part of addressing challenges, solving problems, finding solutions.  It can be applied to a wide variety of fields. It encourages everyone to engage in the scholarly processes.

How to get started?
There are projects that count on citizen involvement. Here are four:

    Get involved!

 


Rough Transcript of my Opening Remarks

Dear Colleagues:

 It is an honor to be asked to address this group, some of whom I seem to see more than my own family, to set the stage for this 2011 Microsoft Research sponsored eScience workshop on Transforming Scholarly Communication.

 The first fundamental question to ask is, do we need a transformation in the first place? Obviously we all believe we do, otherwise we would not be here, but what about the majority of scholars? I use my colleagues up and down the corridor as a benchmark to answer that question. A group currently oblivious to much of what we will show tomorrow. But nevertheless a group increasingly not oblivious to the changes going on around them – data sharing policies, cuts in library budgets, open access, and our students. Let me illustrate this latter point with a recent example of something remarkable that happened to me.

 A couple of months ago I received by email a paper intended for PLoS Comp Biol. This happens from time to time as authors try to circumvent the standard submission procedure and contact me as Editor in Chief directly. It was a paper on pandemic modeling, which appeared to question conventional approaches to such modeling. Not being an expert here, I sent the manuscript to Simon Levin at Princeton, who is on our Editorial Board, for his opinion. Simon is a Kyoto Prize winner and an expert in large-scale biological modeling. He indicated there was something special about this well-written paper. Since the sole author was living in San Diego, I agreed to meet with her and discuss the work. Simply by asking, she had received a large amount of free computer time from the San Diego Supercomputer Center (SDSC), got free access to Mathematica and had clearly benefited from the open access literature as well as resources like Wikipedia. I encouraged her to submit the work to Science, which she has done, and it is currently under review.

 

The sole author’s name is Meredith. What makes this story remarkable is that she is 15 years old and a senior at La Jolla High School in San Diego. She subsequently presented her work at my lab meeting, which I must say was much better attended than usual. Sitting there with my eyes closed I thought I was in the presence of a professor deep into their area of expertise. It was only when I opened my eyes and saw the braces that reality sank in.

Clearly this is an extreme case, but let’s not be modest: what we are trying to do here is enable anyone with an Internet connection and a will to learn to achieve what Meredith has achieved. I cannot think of a more noble cause. While we have seen this possibility for a long time, what is new is that others are now seeing it too.

We all have our own Meredith stories or at the very least some driver that moves us in the same way. For some it is the glacial pace at which knowledge exchange takes place; for others it is the sense of unease about the lack of reproducibility in our own science; for others it is the inaccessibility of knowledge; and for others still it is the totally qualitative way quantitative scientists measure the value of scholarship.

With Meredith as our motivation, let us take a minute to analyze the path we are on towards transforming scholarship through what has happened this past year and then what might happen as a result of this workshop.

2011 may well be remembered as the year that stakeholders – scientists, publishers, archivists and librarians, developers, funders, and decision makers – went from working in isolation to beginning to work together. What started with Beyond the PDF in January had become a “movement” by the time the summer meetings were over. The Dagstuhl meeting captured the spirit in a manifesto that should become a living document for us all to consider. Movements have transformed entrenched systems before and it remains an open and very exciting question as to whether that will happen here. For a small group to bring about change for the many requires that the many believe that change is needed and gradually get on board. I believe that time has come.

The driver of change is the groundswell towards open science. When I first heard that a group of prominent life scientists got together and agreed to start a new open access journal I was disappointed – such vision coming up with something that we had already. But if the effort by HHMI, Wellcome Trust and Max Planck does indeed compete with Science and Nature it will precipitate change. My sense is that publishers see the writing on the wall, or more appropriately the screen, and the smart ones are gearing up for a future with different business models. The winning publishers will move from serving science through scientific process and dissemination to doing that plus enabling knowledge discovery, more equitable reward systems and improving comprehension by a broader audience. Interestingly, it is not clear to me, based on my interactions with OASPA, that open access publishers see it that way. Many simply see delivering papers as before, but with a different revenue model. Ironically, even if they see the promise of change, they do not have the resources to make it happen. We must help them and that is why meetings like this one are so important. A serious example of what we must fix is the lack of consistent representation of their papers in XML. PubMed Central will come back to haunt us when developers begin to seriously try and use the content. This is history repeating itself – look at the biological databases. We should learn from history.

Open science is more than changing how we interface with the final product; it is interfacing with the complete scientific process – motivation, ideas, hypotheses, experiments to test the hypotheses, data generated, analysis of that data, conclusions and awareness. This is a tall order and I believe we need to proceed in steps. Clearly access and effective use of data is a valuable next step. Funders are demanding it, scientists (to some degree) are providing it and repositories exist to accept it. Right now it is a mess, but we have an opportunity. Ontologies exist, some tools exist and so we have the opportunity NOT to repeat the horrible loss of productivity we see in the publishing world of rehashing the same material for different publishers. Let us define and implement data standards and input mechanisms that capture the generic metadata, provide the hooks for more domain specific deeper content and allow a more universal deposition and search. We need to do this now before systems become entrenched. Otherwise Google, Bing and the like will be our tools for data discovery – we need deep and meaningful search of data.

Let me conclude with a couple of thoughts on what I believe should come from the workshop. 

1.     We will hear about some wonderful tools and innovative software developments to support scholarly communication – we must define a way to aggregate these efforts to facilitate uptake by others around a focused and shared development effort.

2.     We need to define ways to recruit to the movement – it will take more than tools to do so – are there clear wins for all concerned? If so what are they? Platforms to disseminate scholarship, new reward systems, knowledge discovery from open access content, proven improved comprehension.

What can we do so that more 15-year-olds are active contributors to scholarship? This is our challenge. Thank you very much.

Bios of the Organizers

Bio sketches of the meeting organizers, who will float from group to group:

  1. Alyssa Goodman, Harvard University
  2. Alberto Pepe, Harvard University
  3. Mary Lee Kennedy, Harvard University
  4. Malgorzata (Gosia) Stergios, Harvard University
  5. Lee Dirks, Microsoft Research
  6. Alex Wade, Microsoft Research
  7. Joshua M. Greenberg, Alfred P. Sloan Foundation
  8. Chris Mentzel, Gordon & Betty Moore Foundation

Bios of attendees in #media theme

  1. Magchiel Bijsterbosch, SURF
  2. Aaron Culich, Univ of California, Berkeley
  3. Josh Greenberg, Sloan Foundation
  4. Chris Lintott, Zooniverse
  5. Mimi McClure, NSF Office of Cyberinfrastructure 
  6. Paul Oka, Microsoft Research - New England
  7. Moshe Pritsker, Journal of Visualized Experiments 
  8. Jeffrey Schnapp, Harvard (MetaLab)
  9. Susan Schreibman, TCD - Digital Humanities Observatory, Dublin
  10. Katie Vale, Harvard Instructional Computing 
  11. Curtis Wong, Microsoft Research

Demonstrations on MEDIA on Meeting Day 1

MEDIA Production, distribution, archiving (e.g., video, 3-D modeling, databases)
Facilitator: Curtis Wong, Microsoft Research

  • Phil Bourne, Professor of Pharmacology, University of California, San Diego (Demos of SciVee and BioLit)
  • Moshe Pritsker, CEO, Editor-in-Chief, Co-founder, Journal of Visualized Experiments (Demo of JoVE)
  • Martin Wattenberg, Co-leader, “Big Picture” Data Visualization Group at Google (Demo of ManyEyes)