Monday, March 3, 2014

Publishing PLoS

One of my favorite publishers is the Public Library of Science (PLoS). PLoS publishes open access, online journals and has consistently gone above and beyond in an effort to bring science to the general public. Today PLoS took that effort one step further by requiring authors to make data sets available on request after publication. Essentially, authors must provide access to a minimal data set (the data used to substantiate the study's results) or a valid reason for an exemption in order to be published in PLoS.

As an advocate of open access in scientific research, I was thrilled to hear of this policy change. However, even I was aware that some data sets should remain private, either because of ethical concerns or because they are difficult to deliver. Luckily, PLoS has already addressed the concerns that I and many others have.

Some research generates such vast amounts of data that retaining it (much less supplying it) is simply too burdensome, both financially and logistically. The Large Hadron Collider, for example, generates upwards of 350 gigabytes of data every second. Over 90 percent of this data is discarded immediately, before any significant analysis is performed. After further filtering, over 99.99 percent of all data is lost forever. Only a minuscule portion of the data is ever seen by researchers, much less the general public. Obviously, it would be ridiculous to expect the LHC to provide full data sets for every result it publishes.
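To put those percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (350 gigabytes per second, 99.99 percent discarded). The decimal units and the idea of continuous, around-the-clock running are my own simplifying assumptions, so treat the result as an upper-bound estimate rather than an official LHC number.

```python
# Figures quoted in the post; both are described as "over"/"upwards of",
# so the result is only a rough upper bound on what is kept.
RAW_RATE_GB_PER_S = 350.0
DISCARDED_FRACTION = 0.9999

SECONDS_PER_DAY = 86_400  # assumption: the detector runs continuously


def retained_rate_gb_per_s(raw_rate: float, discarded_fraction: float) -> float:
    """Return the data rate that survives filtering, in GB per second."""
    return raw_rate * (1.0 - discarded_fraction)


kept_gb_per_s = retained_rate_gb_per_s(RAW_RATE_GB_PER_S, DISCARDED_FRACTION)
kept_mb_per_s = kept_gb_per_s * 1000  # assumption: decimal units, 1 GB = 1000 MB
kept_tb_per_day = kept_gb_per_s * SECONDS_PER_DAY / 1000

print(f"Kept: about {kept_mb_per_s:.0f} MB/s, or {kept_tb_per_day:.1f} TB per day")
```

Under those assumptions, even the tiny fraction that survives works out to a few terabytes per day, which is why delivery is a real concern even for a filtered data set.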

PLoS, of course, agrees. It's important to note that researchers must only publish minimal data sets, that is, the data required to derive their presented results. For example, the LHC would only need to provide access to the data for collisions directly relevant to a particular conclusion. Yet even then, the amount of data required is massive. Luckily, PLoS is willing to work with authors when such difficulties arise. They point to a variety of services, such as Dryad and GenBank among others, as possible solutions for large data sets. If none of those is adequate for an author's needs, PLoS will work with them on a one-on-one basis to make sure everyone's interests are served.

At times the ability to supply data sets isn't the concern, but the ethics of doing so are questionable. Should ecologists studying valuable endangered species be required to reveal the locations of their habitats? Should patients in clinical trials have their personal details revealed? Should the genetic sequences of dangerous pathogens be freely available? It's difficult to argue that any of these questions should be unequivocally answered "yes". PLoS acknowledges that at times ethical and legal concerns make data publication untenable, and it will work with authors when this is the case.

Despite these reasonable exceptions, PLoS's policy changes are a monumental step in the right direction for open access. I can only hope that they continue to push the frontier of scientific publication forward.

