R vs Python – Which is best?

If you are reading this article, I imagine that you, like many other data scientists, are wondering which programming language to start learning: R or Python. Whether or not you have experience with other coding tools, the individual features of these two, including their vast arrays of libraries and packages, may initially seem daunting. But don't worry, we're here to help!
To no one's surprise, both R and Python boast their respective advantages for a multitude of applications, and both are widely used by professionals in their global communities. This article will help you decide which one has the right tools to get you going.
To begin, it's a good idea to revisit what exactly you want to use the programming language for in your data science work. For example, a data scientist working predominantly on genetics research may find themselves among R users (it is widely used in genetics and popular with bioinformaticians), whereas someone building models for image analysis, say an employee at Tesla developing self-driving car technology, might find themselves working with people who prefer Python for its sophisticated image manipulation tools. Ultimately, the choice is still yours, and while blindly following the crowd is rarely a good philosophy, do take the time to discover why these professionals prefer certain languages. It is important to be able to “speak” the same language as your future peers.
If you haven't yet, I would advise you to take a look at SuperDataScience's related blog posts on R and Python in the workplace, and Learn All the Pros and Cons of Python vs R Programming for a breakdown of the key differences between the two, and their uses in the field.
Who uses R and what's its purpose?
R was originally created as an environment for statistical computing, providing the classical statistical tests, time-series analysis, clustering, and more. It has a large community of data miners, which means lots of accessible packages written by both R developers and users. For graphics, there is a multitude of packages for plotting and analysing data visually, such as ggplot2, which builds plots up in layers. Importantly, R has also moved into artificial intelligence, providing tools for neural networks, machine learning, and Bayesian inference, and it offers interfaces to deep learning frameworks such as MXNet and TensorFlow. You can read more about these in Quick list of useful R packages. R has a solid following not only among data scientists but, above all, among statisticians and related fields that require data manipulation (for instance medicine, finance and the social sciences). For us data scientists, a widely used language is important: we want to be able to speak to as many disciplines as possible in one language, making our findings easily translatable.
Who uses Python and what's its purpose?
On the other side of the court, Python is an excellent tool for programmers and developers across the board. Whether you are developing algorithms for simulating biomolecules or delivering anti-spam software, you'll feel at home with its interface and array of functions. First released in 1991, it is widely regarded as one of the most significant general-purpose, object-oriented programming languages. Python's popularity keeps growing among new programmers (data scientists among them), which of course means a rich community of users and troubleshooters.
Similarly, on the hot topic of artificial intelligence, Python is the most popular choice, with libraries for machine learning and neural networks such as TensorFlow. For more general purposes, its users benefit from libraries such as NumPy for numerical computing, pandas for data preparation, and seaborn for generating plots. Check out this article on the Top 20 Python libraries for Data Science.
R vs Python: Limitations
On to the more interesting part: how do they each measure up? Uncovering a language's limitations early is possibly the most important piece of advice I can give. Speaking from experience: when I moved from MATLAB, which has an enormous amount of online support (and usually some wonderful person who has written exactly the code you need), to LabVIEW, which had little to no online presence, I got to know the panic of being unable to solve a bug all too well, along with the frustration of not having considered such obvious limitations beforehand.
Some of the main things to consider for a data science application are:
- Processing speed (will you be using large amounts of data?)
- Online community (it really is invaluable and has saved me many times)
- Steep learning curve (how much time and patience do you have to specialise? Have you already learnt to program, leaving you better equipped to pick up a new language?)
- User-friendly interface (are you familiar with programming, or would you prefer an environment that is easy to navigate and visually appealing?)
- Widely spoken (Have you considered future connections across fields and their languages?)
Let's have a look at how each fares on these topics...
Processing speed:
R is often considered slow. By default, it keeps all of its objects in physical memory (RAM), which makes it a less natural fit for Big Data. That being said, faster processors are reducing this limitation, and various packages are focused on tackling it. Python, however, is generally considered better suited to large datasets, partly because it can load large files faster. For more information, check out Quora's Which is better for data analysis: R or Python?
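One reason Python copes well with files that are too large to hold in memory is that it makes streaming easy: you can process data one row at a time instead of loading it all at once. The minimal sketch below assumes a hypothetical CSV file with a numeric `value` column and computes its mean using only the standard-library `csv` module. The function names `streaming_mean` and `mean_of_csv` are our own, chosen for illustration, and the data is made up.

```python
import csv
import io
from typing import Iterable, TextIO


def streaming_mean(rows: Iterable[dict], column: str) -> float:
    """Compute the mean of one numeric column without storing all rows.

    Only a running total and a count are kept in memory, so memory use
    stays roughly constant however many rows the input has.
    """
    total = 0.0
    count = 0
    for row in rows:
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


def mean_of_csv(handle: TextIO, column: str = "value") -> float:
    # csv.DictReader yields one row at a time, so the file is never fully loaded.
    return streaming_mean(csv.DictReader(handle), column)


if __name__ == "__main__":
    # A small in-memory stand-in for a large file on disk (hypothetical data).
    sample = io.StringIO("id,value\n1,2.0\n2,4.0\n3,9.0\n")
    print(mean_of_csv(sample))  # 5.0
```

With a real file you would pass an open handle instead, e.g. `with open("big_file.csv", newline="") as f: mean_of_csv(f)`, where `big_file.csv` is a placeholder name. Libraries such as pandas offer their own chunked reading for the same purpose. This sketch sticks to the standard library so that it runs anywhere.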
Online community:
As I mentioned, both R and Python have large, active support communities you can reach out to, an invaluable source of help for those bugs you just can't seem to fix on your own.
Steep learning curve:
Whether or not you consider it a limitation, R's steep learning curve comes with its extensive power for statisticians. Developed by experts in the field, R is an incredible tool, but you pay for it with an initial investment of time. Python, on the other hand, is very attractive to new programmers for its ease of use and relative accessibility.
Both languages will require you to get familiar with terminology that may initially seem daunting and confusing (like the difference between a “package” and a “library”). Python's set-up has the edge on R's in user-friendliness, again a consequence of R being developed by statisticians and based heavily on its mature predecessor, S. However, Python is unrelentingly strict about syntax and will refuse to run if you have made an easily missed mistake, such as inconsistent indentation (though this strictness pays off in the long run by making us better, neater coders). For its many academic users, R has the lovely attribute of offering far more control over the design of graphics, with a variety of display and export formats.
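To illustrate Python's strictness: indentation is part of its syntax, so a block whose lines are indented inconsistently is rejected before any of it runs. The small sketch below uses the built-in `compile()` function, which parses code without executing it, to check two snippets. The helper name `is_valid_python` is our own and not part of any library.

```python
def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python, False on a syntax error.

    compile() only parses and byte-compiles the code; nothing is executed.
    IndentationError is a subclass of SyntaxError, so both are caught here.
    """
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:
        print(f"{type(err).__name__}: {err.msg} (line {err.lineno})")
        return False
    return True


well_indented = (
    "total = 0\n"
    "for x in [1, 2, 3]:\n"
    "    total += x\n"
)

# The last line is indented by two spaces, which matches neither the
# loop body (four spaces) nor the top level (zero spaces).
badly_indented = (
    "total = 0\n"
    "for x in [1, 2, 3]:\n"
    "    total += x\n"
    "  print(total)\n"
)

print(is_valid_python(well_indented))   # True
print(is_valid_python(badly_indented))  # prints an IndentationError message, then False
```

By contrast, R delimits blocks with braces, so indentation there is a matter of style rather than syntax.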
Importantly, both are interpreted languages, which, compared with compiled languages such as C++, typically makes spotting bugs much easier: you can run code piece by piece and inspect the results immediately.
User-friendly interface:
RStudio is widely considered the favourite environment for working in R, and once you begin familiarising yourself with it, you'll understand why. It is an integrated development environment (IDE) comprising a console for direct code execution along with tools for plotting, interactive graphics, debugging and workspace management; see RStudio IDE Features for a more detailed guide.
Python offers numerous IDEs to choose from. The benefit is that you can pick one that feels familiar given your background. For instance, people coming from a computer science background may find Spyder a clear favourite, whereas beginners in the field often find PyCharm accessible and intuitive. Top 5 Python IDEs for Data Science is a helpful, comprehensive article on this topic.
Widely spoken:
We've touched on this topic already, and I would stress that it depends on your chosen field. If you are leaning towards academia, finance or healthcare, R will most likely be more widely spoken, and you'll want to take advantage of that. Those of you interested in software development, automation or robotics, on the other hand, may find yourselves immersed in the Python community.
R vs Python: Advantages
R:
- An excellent choice if you want to manipulate data. It boasts over 10,000 packages on CRAN (the Comprehensive R Archive Network), many of them for data wrangling.
- You can make beautiful, publication-quality graphs very easily; R allows users to alter the aesthetics of graphics and customise them with minimal coding, a huge advantage over its competitors.
- Perhaps its most powerful feature is statistical modelling: R has been a forerunner in statistical tools for data scientists and is preferred by many experienced programmers in this field.
- Users benefit from its integration with GitHub, a large platform for discovering and sharing better software.
Python:
- It's very easy and intuitive for beginners to learn (unlike R, Python was developed by programmers, and its ease of use makes it a favourite at universities across the board).
- It appeals to a wide range of users, creating an ever-growing community across more disciplines and encouraging communication between open-source tools.
- The strict syntax will force you to become a better coder, writing more condensed, legible code.
- Python is generally considered faster at handling large datasets and can load large files with ease, making it more appropriate for those working with Big Data.
With all this in mind, choosing a language to begin with depends heavily on what you want from it. If you are the kind of data scientist who specialises in statistical analysis, or you work in research, you may find that R works best for you. However, if you see yourself branching out across multiple disciplines, you could make use of Python's generality and diverse network. You may also agree that it would benefit you to eventually learn both (at least well enough to read the other's syntax) as you get to know each for its respective strengths. This will undoubtedly open more doors for you in terms of landing jobs and, more importantly, give you the clarity to decide which career path you want to take. But don't be overwhelmed: learning a second language is easier than learning the first! You will no doubt also find yourself excited about opening up a whole new community to immerse yourself in as you grow as a data scientist.
Good luck and happy coding!
Resources: