Types of resource


basic resources => data types => PNG images, Excel tables
create resources => web-based collaborative tools => Google Docs, Google tables
share resources => sharing platforms => Dropbox, others
discover resources => good search tools =>
publish resources => curation, linking, discovery => Dryad, Dataverse

“Actions of a researcher”:

1. Plan and Discover

2. Generate ideas

3. Collect Data (observe and generate data)

4. Analyze

5. Disseminate and Visualize

6. Impact

Fig 1: A wordle.com word cloud of the top 50 terms used in this document. Note that “google”, “data”, “tools”, “discovery” and “search” feature prominently.

Fig 2: A detailed wordle.com word cloud of the terms used in the tools section of this document.

1. Planning and Discovery

●Meet funder requirements for data management

○California Digital Library / UC Curation Center (CDL/UC3) Data Management Planning Tool (based on DCC tool): https://dmp.cdlib.org/

○DCC: https://dmponline.dcc.ac.uk/

●Analyze state-of-the-art research

○Literature search, coupled with notification services. General scholarly search engines:

http://scholar.google.com

http://academic.research.microsoft.com

http://www.scopus.com/home.url

○Discipline-specific engines:

■Physics: http://arxiv.org

■Astronomy: http://ads.harvard.edu

■Medicine: http://www.ncbi.nlm.nih.gov/pubmed/

■Biomed experts: http://www.biomedexperts.com/Portal.aspx

■Google Alerts: http://www.google.com/alerts

■PubCrawler: http://pubcrawler.gen.tcd.ie/

○Explore: Wolfram|Alpha - search over curated data: http://wolframalpha.com

○Data discovery: find related datasets/studies [GAP: no good ways to search for data across disciplines, hard even within a particular domain]; some discipline-specific examples:

●Library repository http://www.bids.ac.uk/

○General data repositories, e.g., http://dataone.org/

○Domain-specific databases, e.g., http://www.pdb.org/

●Obtain persistent identifiers

○Services, e.g., http://n2t.net/ezid, http://handle.net/

○Identifiers, e.g., DOIs, HTTP URIs

●After data is generated, archive it, generate citations, and expose them to appropriate abstracting and indexing services (e.g., Web of Knowledge: http://wokinfo.com/)

○DataCite http://datacite.org/

○Dryad http://datadryad.org

○Dataverse: http://thedata.org, http://dvn.iq.harvard.edu (social science data), http://dvn.theastrodata.org (astronomy data)

●Workflows:

○MyExperiment: http://www.myexperiment.org
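The two kinds of persistent identifier listed above, DOIs and HTTP URIs, are related: a DOI becomes an HTTP URI when it is appended to a resolver address. The following is a minimal sketch, assuming the public https://doi.org/ resolver; the function name `doi_to_uri` and the example DOI are hypothetical and used only for illustration.

```python
from urllib.parse import quote

# Assumption: the public DOI resolver; other resolvers follow the same pattern.
DOI_RESOLVER = "https://doi.org/"


def doi_to_uri(doi: str) -> str:
    """Turn a bare DOI (optionally prefixed with 'doi:') into an HTTP URI."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Every DOI starts with the directory indicator "10." and has a
    # prefix/suffix pair separated by a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URI path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, for illustration only.
example_uri = doi_to_uri("doi:10.5061/dryad.example")
print(example_uri)
```

Storing the bare DOI and building the URI on demand keeps citations independent of any one resolver address.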

2. Generate Ideas

●Google Docs, Word, Excel, LaTeX

●Wikis (http://wikispaces.com)

●Mind map and concept map software

○MindMeister: http://www.mindmeister.com/

○CmapTools: http://cmap.ihmc.us/

○Personal Brain: http://www.thebrain.com/

●Evernote, data sharing - “cloud storage surfaces”: http://www.evernote.com/

●Blogs (http://wordpress.com), Twitter (http://twitter.com), Disqus (http://disqus.com)

●WorldMap: http://worldmap.harvard.edu/

●Skype, WebEx, Adobe Connect

3. Collect Data

●Google spreadsheets: http://docs.google.com

●Microsoft Excel

●Relational and non-relational databases

○MySQL, Oracle, PostgreSQL, BDB, CouchDB, NoSQL: http://en.wikipedia.org/wiki/NoSQL

●Future: Excel DataScope: http://research.microsoft.com/en-us/projects/exceldatascope/

●Google Forms

●GIS, geotagging: http://en.wikipedia.org/wiki/Geotagging

●Sensor streaming software: http://www.dataturbine.org/

●Storing data and metadata: https://www.irods.org/index.php/IRODS:Data_Grids,_Digital_Libraries,_Persistent_Archives,_and_Real-time_Data_Systems

4. Analyze

●Reviews of these tools: http://www.computerworld.com/s/article/9215504/22_free_tools_for_data_visualization_and_analysis

●Data Wrangler: http://vis.stanford.edu/wrangler/app/

●Google Fusion Tables: http://www.google.com/fusiontables/Home/

●Google Refine: http://code.google.com/p/google-refine/

●R, S-PLUS: http://www.math.montana.edu/Rweb/

●Hadoop, MapReduce: http://hadoop.apache.org/

●AWS: http://aws.amazon.com/

●Traditional Unix tools (Perl, Python, Ruby, sed, awk, grep): http://en.wikipedia.org/wiki/Man_page

●Lucene: http://lucene.apache.org/java/docs/index.html

●Matlab: http://www.mathworks.com/products/matlab/index.html

●Mathematica: http://www.wolfram.com/mathematica/

●Wolfram|Alpha: http://www.wolframalpha.com/

●Visualization

http://code.google.com/apis/chart/

○Matlab

○Scilab

○R

○Tableau: http://www.tableausoftware.com/

○Fusion Tables

○Mathematica

○Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/

5. Disseminate and Visualize

See Generate Ideas above

●Visualization

○See Analyze above

●Mendeley http://mendeley.com

●Google Docs

●Wikis (http://wikispaces.com), Blogs (http://wordpress.com)

●PubMed

twitter.com / blogger.com / tumblr.com / posterous.com / google.com / wordpress.com

●Google Visualization API

●OpenLayers

●Processing

●BioCatalogue (web services), Dryad (data), Dataverse (data), Google Code / SourceForge / GitHub / Bitbucket (software)

6. Impact

total-impact.org / Klout / ranking / F1000

●H and G numbers (h-index, g-index), eigenfactor.org and readermeter.org

http://opencitations.wordpress.com/2011/10/17/the-five-stars-of-online-journal-articles-3/
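The H and G numbers listed above can both be computed from a list of per-paper citation counts. The sketch below uses the usual definitions: the h-index is the largest h such that h papers have at least h citations each, and the g-index is the largest g such that the top g papers together have at least g² citations (here capped at the number of papers). The function names and the sample counts are illustrative assumptions, not data from any service named in this document.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break
        h = rank
    return h


def g_index(citations: Iterable[int]) -> int:
    """Largest g such that the top g papers total at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Made-up citation counts for five papers.
sample = [10, 8, 5, 4, 3]
print(h_index(sample), g_index(sample))  # prints "4 5"
```

The g-index gives more weight to a few highly cited papers than the h-index does, which is why the two numbers are often reported together.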

Key Unsolved Problems

●Universal scientific search

○“the email problem” - conversations over email are part of science; how can they be captured?

○“the file transfer problem” - institute firewalls, “dropbox.com” freemium service

○“the file format problem” (video, documents, binary blobs)

○“the library subscription problem” (open access)

○converting audio to text, multimedia indexing and searching

●Lack of integration and seamlessness. Long list of tools that don’t interconnect.

●Not enough interdisciplinary tools

●Making sense of thousands of papers, sites, etc.: processing vast amounts of information without having to read it all. Some text mining tools are OK, but there is a lot to develop in this area, e.g., summarizing tools, aggregation tools, zoom in/zoom out, intelligent filtering, recommendation engines: http://www.nactem.ac.uk

●How are we going to teach all the tools and resources? The advocacy problem.

Possible answers to key unsolved problems

ifttt.com (concept, logic model) - we need a scientific version of this to trigger integration

●Searchable Registry for scientific, scholarly tools and resources (across domains)
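The “if this then that” logic model mentioned above pairs a trigger (a condition on an event) with an action. A minimal sketch of how such rules might connect research tools follows; the `Rule` class, the event fields and the example rules are all hypothetical, and the actions only return a description instead of calling any real service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]  # assumption: events are flat string-to-string records


@dataclass
class Rule:
    """One 'if this then that' pairing: a trigger test and an action."""
    name: str
    trigger: Callable[[Event], bool]
    action: Callable[[Event], str]


def run_rules(rules: List[Rule], event: Event) -> List[str]:
    """Run the action of every rule whose trigger matches the event."""
    return [rule.action(event) for rule in rules if rule.trigger(event)]


# Hypothetical rules; a real system would call repository or alerting APIs.
rules = [
    Rule(
        name="archive new datasets",
        trigger=lambda e: e.get("type") == "dataset_created",
        action=lambda e: f"deposit {e['name']} in a data repository",
    ),
    Rule(
        name="announce new papers",
        trigger=lambda e: e.get("type") == "paper_published",
        action=lambda e: f"post a notice about {e['name']}",
    ),
]

results = run_rules(rules, {"type": "dataset_created", "name": "survey.csv"})
print(results)
```

Keeping triggers and actions as separate small functions means a new tool can be integrated by adding a rule, without changing the tools already connected.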


Merce Crosas submitted this to msrworkshop