Reproducible Research in Python and R
Thank you to all those who attended Jodie’s seminar earlier this week. We had some awesome discussions about our reproducible research workflows.
By popular demand, the video of Jodie’s seminar is now available. A huge thank you to Luke Zappia for filming and cutting together this video.
You can also download the slides: reproducible-research-in-python-and-r
You’ve just heard back about that article you submitted 6 months ago, and great news – they’ve asked for minor revisions! You open up the Dropbox folder you have with all of your scripts to get started on the changes, and … you’re lost. Which script did you start with? What does this random chunk of code do? Where is the original data file? You finally sort out your scripts, but then your code fails every second line because you don’t even remember which packages you used before. What should have been a couple of hours adding in some extra analyses ends up being a week of piecing together your previous work before you can get started.
What if I told you that there is a better way to keep track of your analyses, and that it is easier than you think to do so? In this talk I will show you how using a reproducible research approach to your analyses can save you hours of time when revisiting or updating old projects, and demonstrate some of the tools that Python and R have available to make this possible. This talk will cover how to manage your packages using virtualenvs and Packrat, how to thoroughly document your analysis using Jupyter Notebook and R Markdown, how to keep track of any changes using source control systems like Git, and how to collaborate effectively using GitHub. By the end you will wonder why you’ve ever done your analyses any other way, and will be happily maintaining and improving your research for many years to come!
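To make the package-management point concrete, here is a minimal sketch (not taken from the talk) of recording which package versions an analysis ran against, so they can be reinstalled later. The function names `snapshot_environment` and `format_snapshot` are hypothetical; the only library used is Python's standard `importlib.metadata`.

```python
import sys
from importlib import metadata


def snapshot_environment():
    """Return sorted 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions whose metadata is missing or broken
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def format_snapshot(pins):
    """Render the pins as requirements-style text, headed by the Python version."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return "\n".join([f"# Python {version}", *pins]) + "\n"


if __name__ == "__main__":
    # Redirect this output to a file kept alongside the analysis scripts.
    print(format_snapshot(snapshot_environment()), end="")
```

Run inside a project's virtualenv, this produces roughly the same information as `pip freeze`, and saving the output next to your scripts gives you a record of the packages the analysis depended on.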
Jodie Burchell loves data – no, seriously, she loves data. It took a while before she discovered this passion, but during her PhD in psychology she realised that all she wanted to do was apply her knowledge of behavioural sciences and statistics to interesting problems. She currently works as a data scientist in client-side analytics at SEEK Australia. Her favourite languages are Python, R and Stata. When she is not dreaming about analyses, she enjoys baking, studying Spanish (badly) and reading Reddit.
This event is brought to you by COMBINE, of The Australian Bioinformatics And Computational Biology Society.