Public or Proprietary Data?

Nowadays astronomical data is recorded in two ways: either remotely, for example with space telescopes, or in ground-based observatories. In the latter case, however, astronomers do not usually perform the observation themselves; instead they propose which object should be observed, and how. Then, if the proposal is approved, a mission control centre or a resident astronomer at a specific observatory sets up the instrument to perform the observations. What happens next?

Usually (with few exceptions), the data is moved almost immediately to a server, and the astronomer who proposed the observation is granted access via a password-protected download link. The data is then kept private for a period of time that can vary but is almost always one year. After that year the data is unlocked and the entire astronomical community can access the (now public) data.

In my opinion this process is overly slow and inefficient today. Right before the internet era, the data were collected and recorded on various media and then physically delivered to the astronomer. The data were then transferred to a hard drive and processed with relatively slow computers. If a group of collaborators was working on the same data, a copy had to be made and delivered to each of them, in what would look today like a never-ending process. Then, when the results were finally collected, cross-checked and approved, the paper had to be written. I have been lucky enough never to have had to perform a bibliographic search without the internet (and ADS), but you can guess what that meant back then.

It is easy to see how much the internet has improved and sped up the entire process of data reduction, analysis and validation of results. So why do we still need a one-year proprietary period today? It is certainly true that it might take about one year before a peer-reviewed paper is accepted and published, even today. But the data analysis itself does not (in most circumstances) last much more than a few weeks. There are recent examples of space telescopes whose data are released immediately and only in public form, and they work very well. In my experience these data remain “hot” for a longer time, as many different groups can work on them at almost the same time. This usually leads to more publications per observation and to a richer scientific debate. If other groups can access the data at the same time, it is also possible to cross-check their results on the fly and possibly correct mistakes before they spread too far. It has also happened quite often that public data are used for a purpose different from the one the original proposers envisaged, and this has triggered unexpected and/or serendipitous discoveries. Public data also allow more efficient planning of future related observations.

I have been working with such satellites (e.g. Swift and the now decommissioned RXTE), and the downside of all this is certainly the stressful, constant fear of being scooped at any moment by someone just slightly faster than you. This can also mean rushing at times, which increases the chances of errors in the data analysis.

However, I believe that a one-year proprietary window is still not fully justified. One or two months are more than sufficient to all but guarantee that the proposers keep the lead on their work and are the first to publish a specific result obtained with the data collected. The now common practice of posting papers on the arXiv as soon as they are submitted to refereed journals increases that chance even further. The possibility of cross-checking the results of other groups in “real time” more than compensates for any increase in mistakes due to a faster data analysis process (a concern which in any case is hardly realistic, as basically no one nowadays really works on the same data for one year…).

The lack of public data access prevents other scientists from scheduling their observations more efficiently, as the inaccessible information contained in those data could be crucial for planning a different observation with other observatories.

In short, my proposal is to reduce the proprietary window to a minimal period of one month, just enough to avoid the considerable stress that immediately public data can cause. This would increase the output and quality of the science and decrease the costs of research in our field. Is there any specific, strong reason why we shouldn’t do this?

Alessandro Patruno is a researcher at Leiden University working in the field of compact objects (neutron stars, black holes and white dwarfs) and high-energy astrophysics. In his blog Astrosplash, Alessandro discusses news in his research field and posts updates on his work.