Types of resource

basic resources => data types => PNG images, Excel tables
create resources => web-based collaborative tools => Google Docs, Google tables
share resources => sharing platforms => Dropbox, others
discover resources => good search tools =>
publish resources => curation, linking, discovery => dryad, dataverse

“Actions of a researcher”:

1. Plan and Discover

2. Generate ideas

3. Collect Data (observe and generate data)

4. Analyze

5. Disseminate and viz

6. Impact

Fig 1: A wordle.com word cloud of the top 50 terms used in this document. Note that “google”, “data”, “tools”, “discovery” and “search” are key features.

Fig 2: A detailed wordle.com word cloud of terms used from the tools section of this document.

1. Planning and Discovery

●Meet funder requirements for data management

○California Digital Library / UC Curation Center (CDL/UC3) Data Management Planning Tool (based on DCC tool): https://dmp.cdlib.org/

○DCC: https://dmponline.dcc.ac.uk/

●Analyze state of the art research

○Literature search, coupled with notification services.  General scholarly search





○Discipline-specific engines:

■Physics: http://arxiv.org

■Astronomy: http://ads.harvard.edu

■Medicine: http://www.ncbi.nlm.nih.gov/pubmed/

■Biomed experts: http://www.biomedexperts.com/Portal.aspx

■Google Alerts: http://www.google.com/alerts

■PubCrawler: http://pubcrawler.gen.tcd.ie/

○Explore: Wolfram alpha - search over curated data: http://wolframalpha.com

○Data discovery: find related datasets/studies [GAP: no good ways to search for data across disciplines; hard even within a particular domain]; some discipline-specific examples:

●Library repository http://www.bids.ac.uk/

○General data repositories, eg, http://dataone.org/

○Domain specific databases, eg, http://www.pdb.org/

●Obtain persistent identifiers

○services, eg, http://n2t.net/ezid, http://handle.net/

○identifiers, eg, DOIs, HTTP URIs

●After data is generated, archive it, generate citations, and expose them to appropriate abstracting and indexing services (eg, Web of Knowledge http://wokinfo.com/)

○DataCite http://datacite.org/

○Dryad http://datadryad.org

○Dataverse: http://thedata.org, http://dvn.iq.harvard.edu (social science data), http://dvn.theastrodata.org (astronomy data)


○MyExperiment: http://www.myexperiment.org
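To illustrate the persistent-identifier item above: a DOI can be turned into an HTTP URI by appending it to the public doi.org resolver. The following is a minimal sketch, not part of the workshop notes; the function name `doi_to_uri` and the example DOI `10.1234/example.5678` are hypothetical and used only for illustration.

```python
from urllib.parse import quote

# Public DOI proxy; a DOI appended to this prefix resolves to the object's landing page.
DOI_RESOLVER = "https://doi.org/"


def doi_to_uri(doi: str) -> str:
    """Return an HTTP URI for a DOI (hypothetical helper, for illustration)."""
    doi = doi.strip()
    # Accept the common "doi:" prefix, e.g. "doi:10.1234/example.5678".
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Every DOI starts with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, not a real dataset.
uri = doi_to_uri("doi:10.1234/example.5678")
```

Storing the bare DOI and deriving the URI on demand keeps citations stable even if the resolver address changes.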

2. Generate Ideas

●Google Docs, Word, Excel, LaTeX

●Wikis (http://wikispaces.com)

●mind map and concept map software

○mindmeister: http://www.mindmeister.com/

○CMapTools http://cmap.ihmc.us/

○Personal Brain: http://www.thebrain.com/

●Evernote, data sharing - “cloud storage surfaces” : http://www.evernote.com/

●Blogs (http://wordpress.com), Twitter (http://twitter.com), Disqus (http://disqus.com)

●WorldMap: http://worldmap.harvard.edu/

●Skype, WebEx, Adobe Connect

3. Collect Data

●Google spreadsheets: http://docs.google.com

●Microsoft Excel

●Relational and non relational databases

○MySQL, Oracle, PostgreSQL, BDB, CouchDB, NoSQL


●Future: Excel DataScope: http://research.microsoft.com/en-us/projects/exceldatascope/

●Google Forms

●GIS, geo tagging http://en.wikipedia.org/wiki/Geotagging

●Sensor Streaming Software


●Storing data and metadata:



4. Analyze

●Reviews of these tools:



●Data Wrangler: http://vis.stanford.edu/wrangler/app/

●Google Fusion Tables: http://www.google.com/fusiontables/Home/

●Google Refine: http://code.google.com/p/google-refine/

●R, S-PLUS: http://www.math.montana.edu/Rweb/

●Hadoop, Map/Reduce: http://hadoop.apache.org/

●AWS: http://aws.amazon.com/

●Traditional Unix tools: Perl, Python, Ruby, sed, awk, grep


●Lucene: http://lucene.apache.org/java/docs/index.html

●Matlab: http://www.mathworks.com/products/matlab/index.html

●Mathematica: http://www.wolfram.com/mathematica/

●Wolfram/Alpha: http://www.wolframalpha.com/






○Tableau: http://www.tableausoftware.com/

○Fusion Tables


○Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/

5. Disseminate and Viz.

See Generate above


○See Analysis above

●Mendeley http://mendeley.com

●Google Docs

●Wikis (http://wikispaces.com), Blogs (http://wordpress.com)


twitter.com / blogger.com / tumblr.com / posterous.com / google.com / wordpress.com

●Google visualization API

●Open Layers


●BioCatalogue (web-services), Dryad (data), Dataverse (data), Google Code / SourceForge / GitHub / Bitbucket (software)

6. Impact

total-impact.org / klout / ranking / f1000 /

●h-index and g-index, eigenfactor.org and readermeter.org
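For readers unfamiliar with the h-index and g-index (the "H and G numbers" listed above), here is a minimal sketch of how each is computed from a list of per-paper citation counts. The function names and the sample counts are illustrative assumptions, not data from the workshop. The g-index here is capped at the number of papers, which is one common convention.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Made-up citation counts for five papers.
sample = [10, 8, 5, 4, 3]
h = h_index(sample)
g = g_index(sample)
```

The g-index gives more weight to highly cited papers, so it is never smaller than the h-index for the same list.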


Key Unsolved Problems

●Universal scientific search

○“the email problem” - conversations over email are part of science, how to


○“the file transfer problem” - institute firewalls, “dropbox.com” freemium service

○“the file format problem” (video, documents, binary blobs)

○“the library subscription problem” (open access)

○converting audio to text, multimedia indexing and searching

●Lack of integration and seamlessness. Long list of tools that don’t interconnect.

●Not enough inter-disciplinary tools

●Making sense of thousands of papers, sites, etc. Processing vast amounts of information (without having to read it all). Some text mining tools are OK, but there is a lot to develop in this area. Eg, summarizing tools, aggregation tools, zoom in/zoom out, intelligent filtering, recommendation engines, http://www.nactem.ac.uk

●How are we going to teach all the tools and resources? The advocacy problem.

Possible answers to key unsolved problems

ifttt.com  (concept, logic model) - we need a scientific version of this to trigger integration

●Searchable Registry for scientific, scholarly tools and resources (across domains)
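The ifttt.com idea mentioned above ("if this then that") can be pictured as a list of rules, each pairing a trigger with an action. The sketch below only illustrates that logic model. The event types, rule names and actions are all hypothetical and do not describe any existing service or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class Rule:
    """One 'if this then that' rule: a trigger predicate plus an action."""
    name: str
    trigger: Callable[[Event], bool]
    action: Callable[[Event], str]


def dispatch(event: Event, rules: List[Rule]) -> List[str]:
    """Run every rule whose trigger matches the event and collect the results."""
    return [rule.action(event) for rule in rules if rule.trigger(event)]


# Hypothetical rules linking research tools; the actions only return a description.
rules = [
    Rule(
        name="archive new datasets",
        trigger=lambda e: e.get("type") == "dataset_created",
        action=lambda e: f"deposit {e['id']} in repository",
    ),
    Rule(
        name="announce publications",
        trigger=lambda e: e.get("type") == "paper_published",
        action=lambda e: f"post {e['id']} to lab blog",
    ),
]

results = dispatch({"type": "dataset_created", "id": "survey-2011"}, rules)
```

In a real integration each action would call the target tool instead of returning a string. The point is only that independent tools can be chained by small declarative rules.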

Submitted to msrworkshop by Merce Crosas