
Open-source data analysis for scientific studies: why does it matter?

Posted on: October 11, 2023 at 01:00 PM

From time to time, I conduct research in psychology. Whenever I face a statistical task such as factor analysis, choosing the right tool is often a struggle. Traditionally, many social scientists use SPSS for statistical analysis; it is proprietary software and one of the market leaders.

I do not use it. I firmly believe that science and scientific results should be open to all. Everyone should be able to repeat every step of an experiment, especially when doing so takes little effort. With proprietary tools, this becomes much harder, because anyone who wants to reproduce the analysis must first buy a license. Moreover, these tools limit customisation (which is sometimes crucial) and do not let you inspect the algorithm that is actually used.

That last point may seem odd: why would anyone want to look at the algorithms? The reality is that without access to the code, you cannot fully trust the results. For instance, when I performed an exploratory factor analysis a while ago, I got slightly different results from different software packages. Which one should I use? Without the source code, there is no way to see which settings or algorithmic choices caused the difference.
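Discrepancies of this kind are easy to provoke, because "factor analysis" covers several extraction methods. The minimal Python sketch below (assuming NumPy is available) extracts one factor from the same correlation matrix in two standard ways: principal component extraction and iterated principal axis factoring. The matrix values and the function names `pc_loadings` and `paf_loadings` are hypothetical illustrations, not the method used by any particular package; real tools add further choices (rotation, starting communalities, convergence criteria) that can shift results as well.

```python
import numpy as np

# Hypothetical correlation matrix for four test items (illustrative values, not real data)
R = np.array([
    [1.00, 0.60, 0.50, 0.40],
    [0.60, 1.00, 0.55, 0.45],
    [0.50, 0.55, 1.00, 0.35],
    [0.40, 0.45, 0.35, 1.00],
])


def pc_loadings(R, n_factors=1):
    """Principal component extraction: eigenvectors of R scaled by sqrt(eigenvalues)."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1][:n_factors]  # largest eigenvalues first
    L = vecs[:, order] * np.sqrt(vals[order])
    # Eigenvector signs are arbitrary; make the first item's loading positive
    return L * np.sign(L[0])


def paf_loadings(R, n_factors=1, max_iter=100, tol=1e-6):
    """Iterated principal axis factoring: put communality estimates on the diagonal."""
    R_reduced = R.copy()
    # Starting communalities: squared multiple correlations of each item with the rest
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        np.fill_diagonal(R_reduced, h2)
        vals, vecs = np.linalg.eigh(R_reduced)
        order = np.argsort(vals)[::-1][:n_factors]
        L = vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))
        new_h2 = np.sum(L ** 2, axis=1)  # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return L * np.sign(L[0])


if __name__ == "__main__":
    print("PC  loadings:", np.round(pc_loadings(R).ravel(), 3))
    print("PAF loadings:", np.round(paf_loadings(R).ravel(), 3))
```

For this matrix, the principal component loadings come out higher than the principal axis loadings for every item, since component extraction also absorbs each item's unique variance. With open code, such a difference can be traced to a specific line; with a closed tool, it remains a guess.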

I am convinced that scientific research should rely solely on open-source software, especially in a field as broad as statistics. Over the past few years, I have noticed that some scientists (particularly younger ones) have begun creating and maintaining Python and R packages to support their own studies and those of others. I hope this practice becomes more widespread.

P.S. For my part, I publish all my research on GitHub under the MIT license. Some of the tasks I carry out repeatedly are organised as Python or Fortran packages, e.g., ‘resonances’.