Data scientists spend over 80% of their time preparing data before they can run a single analysis in a statistical model. The same holds for academic research, although this is probably less well known. At our institute, and everywhere else in the world, scientists work day in, day out on implementing their own data processing pipelines. This process is not only highly inefficient and nontransparent but also prone to errors, particularly when conducted by less experienced programmers (see, for example, Bigdely-Shamlo et al. 2015, 2016). Those who develop generalizable, efficient, and tested toolboxes, in particular in open-source programming languages like R, therefore make a significant contribution to research practice. At our institute, this contribution and the demand for it are especially visible in the processing of sensor data, and both will continue to grow in the near future.
Goal of Thesis
The main goal of the thesis is to implement an R package that serves investigators, scientists, and practitioners as a collection of functions for analyzing neuro- and physiological data. To this end, the package (1) has to be highly usable, (2) needs to be general enough to integrate well into the existing ecosystem of R packages, and (3) has to be efficient and performant enough to process high-volume data on smaller machines. These properties can be addressed by investing a reasonable amount of time in requirements elicitation, leading to a well-defined software design. In the end, the implementation has to meet the requirements set by CRAN, the leading distribution network for published R packages.
- Motivate and identify the gap by investigating similar packages on CRAN and in the MATLAB environment
- Elicit requirements by interviewing scientists at KIT who have already conducted neuro- and physiological experiments. Using their existing code is welcome, but it should not be the only source in the requirements elicitation phase.
- Before you start implementing, we kindly ask you to provide a software design that meets the properties mentioned in the section “Goal of Thesis”.
- Once we have agreed on a software design, you can start your implementation. Clean end-user documentation that uses the capabilities of the IDE RStudio is a must for acceptance by CRAN.
- As scientists will rely on your solution, we expect you to write test code with the R package “testthat” and to run your functions on a data set collected in the KD2lab, which will be challenging in terms of data size.
- Successfully submit your R package to CRAN
- High intrinsic motivation and proper time management
- Good networking skills, as requirements need to be elicited from a variety of sources
- Experience with the statistical programming language R, preferably including package development in R
- Familiarity with software design
- Fluent English (as the thesis has to be written in English)
- A deep dive into R, which is, alongside Python, one of the most common data science languages
- Access to very large experimental data sets to test and apply your code
- Build a strong skill set for analyzing physiological and experimental data, which prepares you for a career in both industry and research; furthermore, you create a code asset that you can use in job applications to demonstrate your programming experience
- Shape and extend the capabilities of R as a data science language and become part of the R community once your package is published on CRAN
If you are interested, send Sven Michalczyk an email with a short motivation statement, your CV, and your current transcript of records. If you have questions beforehand, do not hesitate to contact me.
- Michalczyk, S., Jung, D., Nadj, M., Knierim, M. T., & Rissler, R. (2019). brownieR: The R-Package for Neuro Information Systems Research. In Information Systems and Neuroscience (pp. 101-109). Springer, Cham.
- Bigdely-Shamlo, N., Makeig, S., & Robbins, K. A. (2016). Preparing laboratory and real-world EEG data for large-scale analysis: a containerized approach. Frontiers in neuroinformatics, 10, 7.
- Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K. M., & Robbins, K. A. (2015). The PREP pipeline: standardized preprocessing for large-scale EEG analysis. Frontiers in neuroinformatics, 9, 16.
- Wickham, H. (2015). R Packages (1st ed.). O’Reilly.
- Wickham, H. (2016). Advanced R (1st ed.). O’Reilly.
- Abelson, H., & Sussman, G. J., with Sussman, J. (1985). Structure and Interpretation of Computer Programs. Foreword by A. J. Perlis. Computer Music Journal. 10.2307/3679579