Profile photo for Adrian Olszewski

I have employed R many times to create both web-based and Windows-based, full-featured reporting systems and data-transformation adapters (HL7, XML, databases, data from sensors and laboratory machinery), both as a freelancer and as a contracted programmer. The CRO company I currently work for uses R extensively this way. But yes, this is still, in some sense, a method of data analysis and presentation :)

When it comes to creating a small reporting system (50 concurrent users at most), I build it directly in R. I strongly prefer OpenCPU for this task. R gives me everything I need in my everyday work, including the ability to:

  • query virtually any database engine, Active Directory, or the GPIO bus on a Raspberry Pi to read data from sensors,
  • call external web services,
  • interact with the host operating system,
  • produce documents in all commonly used formats (docx, xlsx, pptx, rtf, OpenDocument, pdf, PostScript, static and dynamic HTML), both static and dynamic (knitr, R Markdown, Sweave, odfWeave),
  • generate high-quality, complex graphs (often hard to obtain in typical graphing libraries), which can be turned into dynamic, JS-based presentations,
  • create complete, Bootstrap-based web pages or embeddable partial HTML views,
  • call code written in many other languages: C++ (I can even mix R and C++ via Rcpp), C#, Java. There are adapters (bridges) between R and the majority of programming languages, including Python and Perl,
  • exchange data with R via various channels, including DDE, COM+, and TCP/IP. Thanks to this, R can be accessed from, say, Excel, Word, or other applications, allowing the user to send commands and receive data this way,
  • expose written code as a RESTful web service with OpenCPU, and create interactive (and responsive) web applications with Shiny,
  • talk to SAS, SPSS, and other statistical packages.
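
To make the OpenCPU point concrete, here is a minimal sketch (the function and payload are invented for illustration): any plain R function placed in an installed package is automatically callable over HTTP, so the same code works locally and as a web endpoint.

```r
# A plain R function, as it might live in a package's R/ directory.
# OpenCPU would expose it at POST /ocpu/library/<pkg>/R/summarise_batch/json
# (the package name and endpoint here are hypothetical).
summarise_batch <- function(values) {
  values <- as.numeric(values)
  list(n = length(values), mean = mean(values), sd = sd(values))
}

# Called locally it behaves like any R function; over OpenCPU the very same
# call arrives as an HTTP POST with a JSON body and returns JSON.
res <- summarise_batch(c(10, 12, 14))
```

No wrapper code is needed on the R side, which is part of why the "no additional tiers" argument below holds.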

Such an approach has some significant advantages:

  • no additional layers (tiers) that would reduce system performance (translating R objects into PHP/C++/C#/Java/Python structures takes resources),
  • no external programming languages; R programmers are easy to find in a CRO company, but C++ or PHP programmers, not necessarily :) It's easier to keep a consistent programming environment.
  • perfect integration with R; in fact, it's built on top of R. This matters when the majority of a process is done by R, in which case setting up additional runtime frameworks or launchers seems questionable.

In certain cases, when I need more flexibility or when I’m going to create something more advanced, I combine R and ASP.NET MVC framework. I followed this pattern multiple times with good results on both Windows and Linux (Debian) systems.

One just has to feel the fine line between using the proper tool for a given task and using a hammer as a screwdriver all the time.

Let me also share a personal feeling about one thing. For me, having been a C++ and C# programmer for over 15 years, it's a really strange experience to find R's syntax far handier when it comes to playing with data. It now happens that I launch R much more often than Visual Studio to play with algorithms or mangle data. With just five lines of code I can query two completely different data sources and write the result into a third one. Sometimes the lack of full control over a process (which "true languages" like C++ provide) is nothing bad; it's just a matter of needs. Sometimes we really don't need anything more. But, actually, I would hate to lose my beloved strongly typed languages for more serious projects :)
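
The "query two sources, write a third" workflow really is only a few lines. A hedged sketch in base R follows; the data and file names are invented, and in real use the sources could just as well be DBI connections, web services, or sensor feeds.

```r
# Two independent data sources, simulated here as CSV files on disk.
src_a <- tempfile(fileext = ".csv")
src_b <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, price = c(9.5, 12.0, 7.25)), src_a, row.names = FALSE)
write.csv(data.frame(id = 1:3, qty   = c(4L, 1L, 6L)),      src_b, row.names = FALSE)

# Query both sources, combine them, and write the result to a third place.
combined <- merge(read.csv(src_a), read.csv(src_b), by = "id")
combined$total <- combined$price * combined$qty
out <- tempfile(fileext = ".csv")
write.csv(combined, out, row.names = FALSE)
```

The same shape of script works unchanged whether the reads come from files, databases, or HTTP.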

Profile photo for Sairaam Varadarajan

Okay, R with NO ML. Let's see the related topics of the R programming language on Quora:

  1. R-Programming
  2. SAS
  3. RStudio
  4. SAS Software
  5. Data Analysis
  6. Data Science (ML!!)

Ref: R (programming language)

I believe the list in this Quora link is ordered by strength of affinity between R and its related topics.

Let's move on and look at the most downloaded R packages. The 5 most popular R packages:

  1. dplyr (data manipulation)
  2. devtools (generic!)
  3. foreign (read data)
  4. cluster (Damn ML)
  5. ggplot2 (Visualization)

I wish data.table were in the top 15.

Lastly, let's go to Stack Overflow and look at the tags related to R: Newest 'r' Questions

  1. ggplot2
  2. dataframe
  3. plot
  4. shiny
  5. data.table
  6. dplyr

My takeaway is that, more than ML, R is extensively used for visualization and data manipulation. ML is a small piece of it. Data professionals spend more time on data cleansing and manipulation than on ML.

How people spend their time in R defines which packages get built and explored.

I would think that the sample of questions you saw on Quora relating ML and R is skewed. It may be that you happen to upvote ML-related and R-related questions, and so you got that specific blend in your feed.

Never mind. Let's discuss applications of R other than ML, data manipulation, and visualization. I got the opportunity to use it for:

  • Shiny dashboards
  • SparkR (Hive queries and data manipulation with sparklyr + dplyr)
  • a short-lived blog hosted with R Markdown
  • web scraping
  • all my presentations, made with R Markdown; some of them don't contain any R script, but I still tend to compile the slides via .Rmd
  • ETL jobs (not recommended for production runs)

I have also seen R scripts that merge Google and Outlook calendars and notify about the overlaps via email.

  • Dropbox integration with R
  • IFTTT with R
Profile photo for Jody Diaz

Yes, R is used outside of traditional statistics and data analysis. It is widely utilized in fields like bioinformatics, finance, and social sciences for tasks such as data visualization, reporting, and even machine learning applications. Its versatility makes it a valuable tool for various types of data-related tasks. For more insights on R's applications, check out my Quora Profile!

Profile photo for Jeremy Miles

I’m not much of a programmer, but I’m better in R than anything else. So if I need to write a program to do something, and it’s possible to do it in R, that’s where I’m going.

A couple of examples:

  1. Need to move a whole bunch of files around into separate folders, based on file name? R.
  2. Need to scrape PDFs from the web, convert them to text, and save as csv files? R.
Profile photo for Assistant

Yes, R is used outside of traditional statistics and data analysis in several fields and applications. Here are some notable areas where R is applied:

  1. Machine Learning: R has numerous packages (like caret, randomForest, and xgboost) for building predictive models and performing machine learning tasks.
  2. Bioinformatics: R is widely used in bioinformatics for analyzing biological data, including genomics and proteomics. Packages like Bioconductor provide tools for analyzing genomic data.
  3. Finance: In finance, R is used for risk analysis, portfolio management, and financial modeling. The quantmod and TTR packages are popular for quantitative trading and analysis.
  4. Marketing Analytics: R is leveraged for customer segmentation, A/B testing, and analyzing marketing campaign effectiveness, often using visualization packages like ggplot2.
  5. Social Sciences: Researchers in sociology, psychology, and political science use R for survey analysis, experimental data analysis, and social network analysis.
  6. Geospatial Analysis: R has strong capabilities for handling spatial data with packages like sf, sp, and raster, making it useful in geography and environmental science.
  7. Web Development: Through packages like Shiny, R can be used to build interactive web applications for data visualization and analysis.
  8. Text Mining: R can be used for natural language processing and text mining, with packages like tm and text.

In summary, R's versatility and rich ecosystem of packages make it a valuable tool across various disciplines beyond just statistics and data analysis.
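
As a tiny, package-free illustration of the text-mining idea from the list above (real projects would typically reach for tm or similar): tokenize, normalize, and count term frequencies.

```r
# Two toy documents; the text is invented for the example.
docs <- c("R makes data analysis simple",
          "Text mining in R counts words in text")

# Lowercase, split on anything that isn't a letter, drop empty tokens.
tokens <- unlist(strsplit(tolower(docs), "[^a-z]+"))
tokens <- tokens[nzchar(tokens)]

# Term-frequency table, most frequent terms first.
freq <- sort(table(tokens), decreasing = TRUE)
```

Dedicated packages add stemming, stop-word lists, and document-term matrices on top of exactly this kind of counting.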

Profile photo for Rishabh Thukral

Recently, I was reading about steganography, and I thought about creating my own tool that can hide data in the form of text inside images without doing much damage to the image quality. I chose R for this entire project because of the large number of image-processing packages available in R. Moreover, the documentation for R and its packages is very neat, which substantially reduces development time.

With the platform and idea settled, I developed software that hides textual data in images by encoding it into the intensity values of the image's pixels.
Here are some screenshots of the app:

Consider this image, which we will use to hide our data in.

Then, we execute the application step by step:

  1. It asks us to select an input image.

  2. Then, the system asks for the message you want to hide inside the image.

Note: The maximum number of characters depends on the resolution of the input image.

  3. It performs certain operations on the image and produces an output image, showing a success message once the process is over. It also shows the percentage loss in image quality at the pixel level. The user specifies a save location for the output image and the system writes the image file there.

After saving the image, we can see that there is not much difference visible in the new image.

Note: This image now also stores some textual information. The quality seems pretty much the same thanks to the low information loss for a short message. Image quality will decrease for longer texts, but the difference will not be clearly visible to the eye.

Obtaining the data back from the image is the exact reverse of the above-mentioned process.

  1. Select the image you wish to decode.

  2. Based on the technique used for hiding the data in the image, we extract it back out.

We obtain the same message back.
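
The encode/decode round trip above can be sketched in a few lines of base R. This is a hedged reconstruction: the original app's exact scheme isn't shown, so this assumes the common least-significant-bit (LSB) variant, which changes any pixel intensity by at most 1 and so matches the "low information loss" described.

```r
# Hide a message in the lowest bit of each pixel intensity (0-255).
hide_text <- function(pixels, msg) {
  bits <- as.integer(rawToBits(charToRaw(msg)))   # 8 bits per character
  stopifnot(length(bits) <= length(pixels))       # capacity depends on resolution
  out <- pixels
  idx <- seq_along(bits)
  out[idx] <- bitwOr(bitwAnd(pixels[idx], 254L), bits)  # overwrite the lowest bit
  out
}

# Decoding is the exact reverse: read the lowest bits back and repack them.
reveal_text <- function(pixels, n_chars) {
  bits <- bitwAnd(pixels[seq_len(8L * n_chars)], 1L)
  rawToChar(packBits(as.logical(bits)))
}

set.seed(42)
img  <- sample(0:255, 1000, replace = TRUE)  # stand-in for pixel intensities
steg <- hide_text(img, "hello")
```

Since each intensity moves by at most 1, the stego image is visually indistinguishable from the original for short messages.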

Steganography has an advantage over cryptography in terms of security: it doesn't attract the attention of malicious attackers.

This entire application was developed in RStudio using R. If you want to look at the code, the source is available in my GitHub repository:
supercool276/StegnographyR


Profile photo for Rakesh Kumar

I’ve used R for really weird reasons. I started using R as a tool to learn data science, was fascinated by the tidyverse packages, and got a good hold of all the data-manipulation tools within R. One fine (not so fine) day there was a big production issue with wrong prices displayed to customers on our site (you might guess I work for a big retail company). The data model being highly complicated, none of us could figure out how to identify and correct all the pricing. I used all my data-manipulation skills and was able to figure out all the differences within about 15 minutes. Most people thought it was magic, and some didn't even trust it. Fortunately or unfortunately, management had no choice but to use my analysis to correct the data. Surprise: all prices were recovered in 2 hours, and we were able to mitigate a lot of potential losses. Now I have R programs running between multiple systems for data-consistency checks.

Folks still cannot understand how dplyr joins between multiple tables with millions of records execute in a matter of seconds. And I proudly say, ‘It’s magic’.
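
A hedged reconstruction of that kind of cross-system price check follows. The actual work used dplyr joins; base R's merge is shown here so the sketch needs no extra packages, and the table and column names are invented.

```r
# Prices as reported by two systems (toy data standing in for millions of rows).
system_a <- data.frame(sku = c("A1", "A2", "A3"), price = c(10.0, 20.0, 30.0))
system_b <- data.frame(sku = c("A1", "A2", "A3"), price = c(10.0, 25.0, 30.0))

# Join on the key and keep only rows where the two systems disagree.
both <- merge(system_a, system_b, by = "sku", suffixes = c("_a", "_b"))
mismatches <- both[both$price_a != both$price_b, ]
```

With dplyr the same check is an inner_join plus a filter, and its optimized joins are what make it fast on tables of millions of rows.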

Thanks to Hadley Wickham for the tidyverse suite of packages

Profile photo for Håkon Hapnes Strand

My first introduction to R was through a university course called Statistical Modeling and Simulation. It was taught with the textbook Statistical Computing with R by Maria Rizzo. The book does not touch machine learning even once in its approximately 400 pages, nor did the course.

I have later used techniques from that course in my job in at least two projects. One where I used Markov chains to model a queue, and another where I used kernel density estimation and simulations to model future workflows in a machine shop.

In fact, I rarely use R for machine learning, but I use it all the time for things that could be considered part of data science, like visualization, statistical analysis and simulations.
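
As a hedged sketch of the Markov-chain queue idea (the real model isn't shown): let the states be queue lengths 0..3, and at each step the queue grows, shrinks, or stays put according to a transition matrix P (the probabilities here are invented).

```r
# Transition matrix: rows are current queue length 0..3, columns the next.
P <- matrix(c(0.5, 0.5, 0.0, 0.0,
              0.3, 0.4, 0.3, 0.0,
              0.0, 0.3, 0.4, 0.3,
              0.0, 0.0, 0.5, 0.5),
            nrow = 4, byrow = TRUE)

# Simulate one path of the chain.
simulate_queue <- function(P, steps, start = 1L) {
  state <- integer(steps)
  state[1] <- start
  for (t in 2:steps) {
    state[t] <- sample.int(nrow(P), 1L, prob = P[state[t - 1], ])
  }
  state - 1L  # report queue length 0..3 rather than the state index
}

set.seed(1)
path <- simulate_queue(P, 5000)
```

From such a simulated path one can estimate waiting-time distributions, average queue length, and similar quantities without any closed-form analysis.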

Profile photo for Alket Cecaj

Yes, it can be used for other things, for example web development. There is a package called Shiny that you can use to develop web applications.

Profile photo for Quora User

I use R regularly for finance. In fact, our entire portfolio optimization procedure is in R, as is our dashboard of recession indicators.

I regularly test pet theories, and whip up econometric regressions. R is VERY powerful for finance, and I have long since abandoned Excel.
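
A hedged stand-in for the kind of econometric regression described (the author's actual data and model aren't shown): simulate a return series driven by one factor, then recover the coefficient with lm().

```r
# Simulated single-factor model: asset returns loading on a market factor.
set.seed(7)
factor_ret <- rnorm(250)                                  # e.g. a market factor
asset_ret  <- 0.5 + 1.8 * factor_ret + rnorm(250, sd = 0.5)

# One line to fit; summary(fit) gives the full regression table.
fit  <- lm(asset_ret ~ factor_ret)
beta <- unname(coef(fit)[2])
```

This is the kind of five-minute workflow that makes R an easy replacement for Excel in finance: simulate or load, fit, inspect.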

I have tried (and so far failed) to develop an ML algorithm for bubble detection. However, my first choice for such an attempt is R.

Here are some sample outputs:

Profile photo for Quora User

The most recent R program I ran was a simulation of stochastic larval dispersal (283 dual-socket Broadwell compute nodes; it took about five hours to finish).

The biggest program prior to that was an exploration of whether 4200 Broadwell processors could be distinguished based solely on their performance. The answer is “probably yes”, but there’s still quite a bit of work to do for that result to be useful.

I also do all of my data visualization in R.

Machine learning? Never tried it.

Profile photo for Sandeep Kale

R is a programming language that was originally developed by Ross Ihaka and Robert Gentleman in the 1990s. It is used mostly for statistical analysis and data manipulation. R is one of the most popular programming languages in use today, with over 2 million users worldwide.

Why is R fading away as a machine-learning language even though it is still used routinely for statistical work?

In my opinion, R is fading away as a machine learning language because there are better options out there for machine learning: Python, Julia, and MATLAB. These languages are more suited for machine learning because they were designed specifically for the purpose of creating algorithms and models.

Yes, R is fading away as a machine learning language. Although it is being used in statistical functions normally, there are other languages that have taken its place in the field of machine learning.

The reason R is fading away as a machine-learning language is that many other languages have more functionality and can do the same things as R. Also, R has been around for quite some time now, and newer languages are more capable because they were built with machine learning in mind.

Profile photo for David Johnston

Yes. It was never that popular to begin with among data scientists who actually deliver applications to production. It's fine for exploratory work and has good visualizations, but it's not great for large datasets and isn't well integrated into the rest of the modern software stack. While it has loads of useful packages, it is really only good when there is very little programming to do. R is a serviceable scripting language for gluing together some library API calls, but it's not a good, modern general-purpose language like Python.

A data scientist needs to be more than someone who can read in a csv file, plug data into libraries, and make charts. They need to be software developers to some degree and well versed in modern software stacks. Python is the alternative used almost exclusively for that. R users who want to cross that divide and become full-stack data scientists need to make the leap. Leaving R behind is part of that process.

The R community is quite insular. R users tend to be the types that don't want to learn other languages and don't want to really ever cross the divide to become real software developers. The ideal users of R actually aren't data scientists. It's academic researchers whose real interest is their science not scientific computing or analytics. And R is great for them most of the time just as Matlab is. But when you become more serious about delivering data science solutions to business production environments you gotta graduate from that and embrace Python and general purpose computing.

Profile photo for Thomas Subia

Nelson in a previous post wrote: “No. R wasn't intended to be used to do anything but statistics. Everything else is a hack.”

I’m not sure what motivated that response but R can be used for many things not related to statistics. Here is a good example.

I have to create SPC charts from Excel files. Since there are literally thousands of files to comb through, cutting and pasting that data into an Excel file might take literally weeks.

Fortunately, R can do this easily. Here is how it's done.

Let’s say our data exists in cell B9 of each spreadsheet. We want R to go through all the Excel files and copy this data into a single file.

# You will need these libraries
library(plyr)    # loaded in the original workflow; readxl does the actual reading
library(readxl)

# Read in the names of all Excel files in the working directory
files <- list.files(pattern = "\\.xls$", full.names = FALSE)

View(files)  # check that all the files were picked up correctly

# Extract the Work Order from cell B9 of Sheet1 in every file.
# col_names = FALSE keeps the cell's value as data rather than a column name.
WO <- lapply(files, read_excel, sheet = "Sheet1", range = "B9", col_names = FALSE)

WO_list <- as.data.frame(WO)  # one column per file

trans_WO <- t(WO_list)        # transpose so each file becomes one row

write.table(trans_WO, "WO.txt")

# Reading through more than 300 files took less than 10 seconds to run.

While Nelson claims that R was intended solely as a statistical analysis tool, R can also be an efficient, time-saving solution for data collection and storage. Efficient, time-saving solutions are hardly a hack.

Profile photo for Jon Wayland

Not in the slightest.

I think any perceived popularity loss is due in part to those with CS-before-stats backgrounds investing their time in Python, a language that had wider adoption in that community to begin with.

On the other hand, those with stats-before-CS backgrounds are typically introduced to R first, and thus form their most productive skills with this language.

It really boils down to which language someone dedicated most of their time to creating valuable data solutions in. For those who studied CS in school, this tends to be Python. For those who studied stats in school, this tends to be R.

In the working world, many tech-focused industries prefer Python. Conversely, many companies whose product isn’t primarily digital in its delivery, such as healthcare or theme parks, prefer R.

One thought I’d like to end with: I have found that more and more entry-level candidates are coming out with stats degrees — no doubt because of their interest in data science — and are entering the workforce with R as their primary tool. If stats degrees continue to be popular for aspiring data scientists, I would bet that R’s popularity will only increase.

Profile photo for Christopher Stern

For numerical work MATLAB is a much better choice than R. R has some support for matrices and such, but outside of stats it's slow and incomplete compared to more appropriate tools. I'm not sure what area of pure math you are looking into; perhaps you're not sure yet either. But being at least somewhat acquainted with the most commonly used tools in your field is just part of 'speaking the language'. You don't have to be an expert in everything, few people are, but it's strange not to have expertise in one or two. If everyone around you is sharing MATLAB recipes, you don't want to be trying to reinvent everything in R.

Profile photo for Craig Slinkman

I know this answer will conflict with a prior answer to this question. His advice is to stick with Python and to develop your Python skills at a deeper level.

You should be aware that when you ask this question in an online forum you will get biased answers. Since I use R, I will answer R. If a Python user answers this she or he will say Python.

My real answer is that you should know both. If you are gathering data by screen scraping, for example, I would recommend Python. If you are building a regression model to predict a response variable, I would recommend R.

I started my computing career in 1967. The preferred computer languages over that time have been FORTRAN IV, PL/I, Pascal, LISP, C, C++, and Python. They come and go. If you are new to the profession, I guarantee that you will need to learn other languages besides Python and R. The important thing is to have enough ambition to be willing to learn and study the language. This means either getting your organization to send you to professional courses or spending nights writing code and checking your results. P.S. You need to know the correct answers! Thus, you should have a book by a really competent author.

Before my fellow R users call for my head I should point out that R is an environment. This is especially true when you use tools like RStudio.

Be aware that there is no such thing as a perfect all-purpose language. You should be intimately involved with both languages, as they are the heart of data science practice at the current time.

Incidentally, it is important to know SQL and database management.

Don’t bet your career on a single technology. Be adaptable and be able to adapt as the technical environment changes. If for no other reason, this is the reason to learn a second language.

Profile photo for Jeremy Deats
  1. R was created for the purpose of Statistical Computing. Prior to R, FORTRAN or C would have been the most popular choice, but R’s syntax was designed around this purpose and it has rich graphing/visualization features baked in.
  2. R is Free and available on multiple OS environments
    (Windows, Mac OS, Linux)
  3. R has been adopted by Universities to teach Data Science. What’s used in the classroom tends to bleed over to the work environment.
  4. R has a massive repository of community-created and community-supported libraries, with a group that manages the repo and ensures the quality of the packages hosted in it. This last point is very important because having access to the right library for the job, and a rich community to support those libraries, can greatly increase productivity, which in the business world translates into saving money.
Profile photo for Albert de Koninck

I will answer in terms of pure, not applied mathematics. (For applied mathematics, I would recommend either R or MATLAB.)

It depends on what you want a language for. To do number theory and some general calculations, I use Pari Droid (e.g., it can do 1001! instantaneously with complete precision). If you want to perform set-theoretical calculations and programs, then Setlx will do the job. Both applications are available for Android as well as Linux and Windows. Finally, if you want to use a completely different programming paradigm similar to the old APL, then J is your language. J is an array-based language available for iOS, Android, Linux, and Windows, and it requires completely rethinking what you know about programming.
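As an aside, exact big-integer arithmetic of this kind is not unique to computer algebra systems; Python's built-in integers are arbitrary precision too, so the same factorial is exact (a minimal sketch, not a claim about Pari Droid itself):

```python
import math

# Python integers are arbitrary-precision, so this result is exact,
# not a floating-point approximation.
n = math.factorial(1001)

# Sanity check via the recurrence n! = n * (n-1)!
assert n == 1001 * math.factorial(1000)

print(len(str(n)))  # number of decimal digits in 1001!
```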

Profile photo for Debdatta Chatterjee

I just love this question. It is a plausible and trending debate topic amongst data science enthusiasts. It is hard to pick one out of these two amazingly flexible data analytics languages. Both are free and open source, and were developed in the early 1990s — R for statistical analysis and Python as a general-purpose programming language. For anyone interested in machine learning, working with large datasets, or creating complex data visualizations, they are absolutely essential. The answer can be reasoned out from the following points.

Process of Data Science

Now, it is time to look at these two languages a little bit deeper regarding their usage in a data pipeline, including:

  1. Data Collection
  2. Data Exploration
  3. Data Modeling
  4. Data Visualization

Data Collection

Python

Python supports all kinds of different data formats. You can play with comma-separated value documents (known as CSVs) or you can play with JSON sourced from the web. You can import SQL tables directly into your code.

You can also create datasets. The Python requests library is a beautiful piece of work that simplifies HTTP requests into a line of code, letting you take data from different websites. You'll be able to take data from Wikipedia tables, and once you've organized the data you get with BeautifulSoup, you'll be able to analyze it in depth.

You can get any kind of data with Python. If you’re ever stuck, google Python and the dataset you’re looking for to get a solution.
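A minimal sketch of the requests/BeautifulSoup workflow described above; the HTML is inlined here so the example is self-contained, but in practice it would come from a `requests.get(...)` call against a real URL:

```python
from bs4 import BeautifulSoup

# In practice the HTML would come from the web, e.g.:
#   import requests
#   html = requests.get(some_wikipedia_url).text
# Here we parse a small inline snippet instead.
html = """
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>France</td><td>68</td></tr>
  <tr><td>Japan</td><td>124</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = [
    [cell.get_text() for cell in tr.find_all(["th", "td"])]
    for tr in soup.find_all("tr")
]
print(rows)  # first row is the header, the rest are data rows
```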

R

You can import data from Excel, CSV, and text files into R. Files built in Minitab or in SPSS format can be turned into R data frames as well. While R might not be as versatile at grabbing information from the web as Python is, it can handle data from your most common sources.

Many modern packages for R data collection have been built recently to address this problem. rvest will allow you to perform basic web scraping, while magrittr's pipe operator helps you chain the cleaning and parsing steps. Together they play a role analogous to the requests and BeautifulSoup libraries in Python.

Data Exploration

Python

To unearth insights from the data, you’ll have to use Pandas, the data analysis library for Python. It can hold large amounts of data without any of the lag that comes from Excel. You’ll be able to filter, sort and display data in a matter of seconds.

Pandas is organized into data frames, which can be defined and redefined several times throughout a project. You can clean data by filling in non-valid values such as NaN (not a number) with a value that makes sense for numerical analysis such as 0. You’ll be able to easily scan through the data you have with Pandas and clean up data that makes no empirical sense.
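The cleaning and filtering workflow described above might look like this; a minimal sketch with made-up values:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "price": [10.0, np.nan, 30.0, 25.0],
    "units": [5, 3, np.nan, 8],
})

# Replace non-valid NaN entries with a value that makes sense
# for numerical analysis, here 0.
clean = df.fillna(0)

# Filter and sort in one chain.
expensive = clean[clean["price"] > 20].sort_values("price")
print(expensive)
```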

R

R was built to do statistical and numerical analysis of large data sets, so it’s no surprise that you’ll have many options while exploring data with R. You’ll be able to build probability distributions, apply a variety of statistical tests to your data, and use standard machine learning and data mining techniques.

Basic R functionality encompasses the basics of analytics: optimization, statistical processing, random number generation, signal processing, and machine learning. For some of the heavier work, you'll have to rely on third-party libraries.

Data Modeling

Python

You can do numerical modeling analysis with Numpy. You can do scientific computing and calculation with SciPy. You can access a lot of powerful machine learning algorithms with the scikit-learn code library. scikit-learn offers an intuitive interface that allows you to tap all of the power of machine learning without its many complexities.
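A minimal sketch of scikit-learn's uniform fit/predict interface, using its bundled iris dataset; the particular model chosen here is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The fit/predict interface is the same across scikit-learn models,
# which is what hides machine learning's many complexities.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
print(acc)  # accuracy on held-out data
```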

R

In order to do specific modeling analyses, you’ll sometimes have to rely on packages outside of R’s core functionality. There are plenty of packages out there for specific analyses such as the Poisson distribution and mixtures of probability laws.

Data Visualization

Python

The IPython Notebook that comes with Anaconda has a lot of powerful options to visualize data. You can use the Matplotlib library to generate basic graphs and charts from the data embedded in your Python. If you want more advanced graphs or better design, you could try Plot.ly. This handy data visualization solution takes your data through its intuitive Python API and spits out beautiful graphs and dashboards that can help you express your point with force and beauty.
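A basic Matplotlib chart takes only a few lines; the Agg backend and output file name here are illustrative choices so the script runs headlessly:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

xs = list(range(10))
ys = [x ** 2 for x in xs]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
fig.savefig("basic_plot.png")  # write the chart to an image file
```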

You can also use nbconvert to turn your Python notebooks into HTML documents. This can help you embed snippets of nicely formatted code into interactive websites or your online portfolio. Many people have used this tool to create online tutorials and interactive books on how to learn Python.

R

R was built to do statistical analysis and demonstrate the results. It's a powerful environment suited to scientific visualization, with many packages that specialize in graphical display of results. The base graphics module allows you to make all of the basic charts and plots you'd like from data matrices. You can then save these files into image formats such as JPEG, or you can save them as separate PDFs. You can use ggplot2 for more advanced plots, such as complex scatter plots with regression lines.

Profile photo for Adrian Dușa

Well yes of course…

Statistical analysis is closely associated with the term “quantitative analysis” that involves numbers, correlations, regressions etc.

In the social sciences, there is something else called “qualitative analysis” that involves in-depth case studies, text analysis, the kind of stuff nobody would think R is suitable for. And yet, there are packages such as QCA for instance, which stands for “Qualitative Comparative Analysis” and it has absolutely nothing to do with the traditional quantitative analysis.

Among the 16000+ packages on CRAN, more than one would expect are not necessarily related to statistics but to general programming. R can analyse text, process images, make animated GIFs, and harvest data from the web; these are the sort of things that don't necessarily spring to mind when thinking about statistical analysis.

Profile photo for Quora User

I use R for doing statistical analysis. Often I first create and manipulate data sets using Python since it’s so much easier.

R and Python and whatever else is out there are mere tools. It's like a carpenter's toolbox with hammers, saws, drills, clamps, glue, and so on. When I need to pound a nail I use a hammer; when I need to drill a hole I use a drill.

Python and R are no different. R has great statistical capabilities and I use that when I need to. Python is great for building data files, scanning the web and all that stuff.

Don’t limit yourself, learn it all.

Profile photo for Varsha Nayak

Data analysis encompasses a wide range of statistical techniques that help researchers and analysts make sense of data. The choice of technique depends on the nature of the data, the research objectives, and the questions being asked. Here are some common statistical techniques used for data analysis:

  • Descriptive Statistics:
    • Mean: Calculates the average value of a dataset.
    • Median: Identifies the middle value in a dataset, separating it into two equal halves.
    • Mode: Identifies the most frequently occurring value in a dataset.
    • Variance: Measures the spread or dispersion of data points.
    • Standard Deviation: Indicates how much individual data points deviate from the mean.
    • Range: Measures the difference between the maximum and minimum values in a dataset.
  • Inferential Statistics:
    • Hypothesis Testing: Determines if there is a significant difference between groups or conditions.
    • Confidence Intervals: Estimates a range of values within which a population parameter is likely to fall.
    • Regression Analysis: Examines the relationship between one or more independent variables and a dependent variable.
    • Analysis of Variance (ANOVA): Compares means between three or more groups to assess if they are statistically different.
    • Chi-Square Test: Analyzes categorical data to determine if there is an association between variables.
    • T-Tests: Compares means between two groups to assess if they are statistically different.
  • Exploratory Data Analysis (EDA):
    • Histograms: Visualize the distribution of data.
    • Box Plots: Display data distribution, including outliers.
    • Scatter Plots: Show the relationship between two continuous variables.
    • Heatmaps: Visualize relationships and correlations in large datasets.
  • Time Series Analysis:
    • Time Series Plot: Visualizes data points over time.
    • Moving Averages: Smooth out fluctuations in time series data.
    • Seasonal Decomposition: Separates time series data into trend, seasonal, and residual components.
    • Autocorrelation: Measures the correlation between a time series and its lagged values.
  • Multivariate Analysis:
    • Principal Component Analysis (PCA): Reduces the dimensionality of data while retaining important information.
    • Cluster Analysis: Groups similar data points together.
    • Factor Analysis: Identifies underlying factors that explain patterns in data.
    • Discriminant Analysis: Distinguishes between two or more groups based on predictor variables.
  • Non-parametric Tests:
    • Mann-Whitney U Test: A non-parametric alternative to the t-test for comparing two groups.
    • Kruskal-Wallis Test: A non-parametric alternative to ANOVA for comparing multiple groups.
  • Survival Analysis:
    • Kaplan-Meier Survival Curve: Estimates survival probabilities over time.
    • Cox Proportional-Hazards Model: Examines factors affecting survival time.
  • Bayesian Analysis:
    • Bayesian Inference: Uses prior knowledge and probability distributions to update beliefs about parameters.
  • Machine Learning: Various machine learning algorithms, such as decision trees, random forests, support vector machines, and neural networks, are used for predictive modeling and classification tasks.

The choice of statistical technique depends on the research question, data type, sample size, and assumptions made about the data. Often, a combination of these techniques is used to gain a comprehensive understanding of the data and draw meaningful conclusions.
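As an illustration, the two non-parametric tests listed above are one-liners with scipy.stats; the three groups below are made up purely for illustration:

```python
from scipy import stats

a = [12.1, 11.8, 12.5, 12.0, 12.3]
b = [12.8, 13.1, 12.9, 13.4, 12.7]
c = [14.0, 13.8, 14.2, 13.9, 14.1]

# Mann-Whitney U: non-parametric alternative to the two-sample t-test.
u_stat, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")

# Kruskal-Wallis: non-parametric alternative to one-way ANOVA
# for comparing more than two groups.
h_stat, p_kw = stats.kruskal(a, b, c)

print(f"Mann-Whitney p = {p_u:.4f}, Kruskal-Wallis p = {p_kw:.4f}")
```

With groups this clearly separated, both p-values come out well below the usual 0.05 threshold.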

Profile photo for John Frain

I use R because, in my opinion, it is the best tool for the statistical analysis I want to do. I should point out that I also use gretl, Octave (a free Matlab-like program), and Maxima when it is easier to work with these. I have looked at Python and used it some time ago. As I already knew R, and time is limited, I did not invest much time in learning Python. Some people think that Python is the best, and they are probably correct.

To advise you on what statistical software is appropriate for you I would need to have some idea of

  1. Your knowledge of statistics
  2. The discipline that you are working in
  3. Some idea of the level and amount of statistical analyses that you are doing
  4. Your experience in computing
  5. Your experience in programming
  6. What statistical packages are supported in your school or organization.

If you are learning statistics, start with a simpler package. I recommend gretl if you are studying econometrics or time series. You can always move to R or Python or Stata or SPSS or SAS or Matlab or Mathematica or … later, if that is what is best for you.

Profile photo for Hariharan Sampathkumar

Statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical methods are used.

The methods mentioned below are used to analyze the individual variables in the data (univariate analysis) and the combinations between them (bivariate analysis). The analysis used also differs depending on the variable type: continuous or categorical. The following are some of the methods which I commonly use in any analysis.

Univariate Analysis: Analyzing one variable at a time.

  • Continuous Variable: The most commonly used statistical method to analyze continuous variables is descriptive statistics. This gives a statistical summary of data such as mean, median, quartiles, etc.
  • Categorical Variable: Tabular method, also known as the frequency table, is used to analyze the distribution of the categorical variables.
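The two univariate summaries above can be sketched in a few lines; pandas is one common tool for this, and the data below is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [23, 35, 31, 42, 35, 28],            # continuous variable
    "segment": ["A", "B", "A", "A", "C", "B"],  # categorical variable
})

# Continuous variable: descriptive statistics (mean, quartiles, ...)
print(df["age"].describe())

# Categorical variable: frequency table of the category counts
print(df["segment"].value_counts())
```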

Bivariate Analysis: Analyzing two variables at a time.

  • Continuous - Continuous Variable: The method used to find the relationship between two continuous variables is correlation. The correlation coefficient takes a value between -1 and 1, where -1 signifies a strong negative correlation and 1 a strong positive correlation.
  • Continuous - Categorical Variable: Student’s t-test is used for the analysis in these situations. It tests the null hypothesis that there is no significant difference between the means of the two groups.
  • Categorical - Categorical Variable: A two-way table and the chi-square test are used to understand the relationship here. The two-way table gives the frequency or relative frequency distribution of the two categorical variables under consideration. The chi-square test is used to test the null hypothesis that no relationship exists between the categorical variables, i.e., that they are independent.
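Each of the three bivariate cases above is a one-liner in scipy.stats; the data below is made up purely for illustration:

```python
import numpy as np
from scipy import stats

# Continuous - continuous: correlation coefficient, always in [-1, 1].
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])
r, _ = stats.pearsonr(x, y)

# Continuous - categorical: two-sample t-test on the group means.
t_stat, p_t = stats.ttest_ind([5.1, 4.9, 5.3], [6.2, 6.0, 6.4])

# Categorical - categorical: chi-square test on a two-way table.
table = np.array([[30, 10],
                  [20, 40]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

print(round(r, 3), round(p_t, 4), round(p_chi, 4))
```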

This is just the tip of the iceberg and there are still many methods out there that could be used depending upon the data and the business context.

Profile photo for Shrey

Yes, R is a statistical programming language, widely used for machine learning, statistical analysis, and many other data science applications. Unlike programming languages such as Python or C++, R is relatively easy: you do not need a heavy programming background to learn and implement it. Just a basic knowledge of programming, loops, and functions will be fine.

Profile photo for Business Science Solutions

R is a great one-stop shop for many fundamentals related to statistics, machine learning, data mining, data analysis, etc. R is derived from the S language and has been geared more towards statisticians.

We can break down a couple of kinds of data science and analytics projects you can do in R with either base R or add-on packages (which are incredibly easy to install).

1. Data visualizations - Using base R, lattice, ggplot2, and many others, professional graphics can be created from your data. Some packages are expanding these capabilities to allow interactive visuals as well. Visualizations are key for any data analyst.
2. Statistical Hypothesis Testing - Statistical tools such as the Student t-test and ANOVA are easily done with the base R stats package.
3. Data cleaning tools - Hadley Wickham has led a transformation in R by helping build an ecosystem of tidy data tools. Packages such as dplyr and tidyr are examples of ways that data can be shaped, rearranged, joined, etc. This makes for ease of working when importing a data set that may not be in the cleanest of shape.
4. Machine Learning and Data Mining - This is less for an analyst, but is a huge asset in R. Packages such as caret allow machine learning algorithms to be run rather quickly out of the box. There are very few algorithms that do not exist in R.
5. Operations Research - R allows you to do linear programming, Markov chains, and non-linear modeling with various add-on packages.
6. Text mining - While this is not the core competency of R, it is possible to do text mining in R using Markov chains, corpus creation, and n-grams.
7. Time series analysis - R has packages for forecasting and other time-series-related analyses.
8. Reporting and presenting - R allows the creation of presentations and reports directly from your analyses.

Profile photo for Quastech

Statistics data analysis and data science are closely related fields that share some common principles but also have distinct differences in their focus and scope.

Statistics Data Analysis: Statistics data analysis is primarily concerned with the collection, organization, interpretation, and presentation of numerical data to draw meaningful conclusions and make informed decisions. It involves applying statistical methods to analyze data and uncover patterns, relationships, and trends. Statistical techniques are used to summarize and describe data, make inferences about populations based on sample data, and assess the reliability of conclusions.

Key aspects of statistics data analysis include:

  1. Descriptive Statistics: Summarizing and presenting data using measures such as mean, median, mode, standard deviation, etc.
  2. Inferential Statistics: Making predictions and drawing conclusions about a population based on a sample, often involving hypothesis testing and confidence intervals.
  3. Probability Theory: Studying uncertainty and randomness, which underlies many statistical methods.
  4. Statistical Software: Utilizing specialized software (e.g., R, SAS, SPSS) to perform data analysis and calculations.
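As a tiny illustration of the descriptive statistics mentioned above, even Python's standard library can compute the basic summaries; the sample data is made up:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))     # average of the values
print(statistics.median(data))   # middle value of the sorted data
print(statistics.mode(data))     # most frequent value
print(statistics.pstdev(data))   # population standard deviation
```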

Data Science: Data science is a broader interdisciplinary field that encompasses various aspects, including statistics, machine learning, data engineering, and domain expertise. It involves extracting knowledge and insights from large and complex datasets to solve real-world problems. While statistics plays a crucial role within data science, data science goes beyond traditional statistics to incorporate a wider array of techniques and skills.

Key aspects of data science include:

  1. Data Collection and Cleaning: Gathering and preprocessing data from various sources, dealing with missing values and outliers.
  2. Machine Learning: Utilizing algorithms to build predictive models, classify data, and automate decision-making processes.
  3. Big Data Processing: Handling and analyzing large volumes of data using technologies like Hadoop and Spark.
  4. Data Visualization: Creating visual representations of data to communicate findings effectively.
  5. Domain Knowledge: Understanding the context and domain-specific insights to provide actionable recommendations.
  6. Feature Engineering: Selecting, transforming, and creating relevant features from raw data for machine learning models.
  7. Business Impact: Focusing on creating value for organizations by addressing specific business challenges.
Profile photo for Quora User

The answer to this question is extremely vast. However, let's look into some business problems which can be addressed by R.

Solution example: customer churn

One of the most canonical uses for prediction science is customer churn. Customer churn is defined as the number of lost customers divided by the number of new customers gained. As long as you're gaining new customers faster than you're losing them, that's a good thing, right? It's not, for multiple reasons. The primary reason customer churn is a bad thing is that it costs far more to gain a customer or regain a lost one than it does to keep an existing customer. Over time, too much customer churn can slowly drain the profits from a company. Identifying customer churn and the factors that cause it are essential tasks for a company to stay profitable.

Interestingly, customer churn extrapolates out to other users as well. For instance, in a hospital, you want customers to churn, that is, to not come back: you want them to stay healthy after their hospital visit.

In this example, we'll show you how to calculate and locate customer churn by using R and SQL Server data.
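Using the definition above (lost customers divided by new customers gained), the headline churn figure itself is a one-line calculation; this sketch uses hypothetical counts rather than real SQL Server data:

```python
def churn_ratio(lost_customers: int, new_customers: int) -> float:
    """Churn as defined above: customers lost divided by customers gained."""
    if new_customers == 0:
        raise ValueError("no new customers gained in this period")
    return lost_customers / new_customers

# A ratio below 1.0 means you are gaining customers faster than
# you are losing them (hypothetical period counts).
print(churn_ratio(50, 200))  # 0.25
```

The interesting analytical work, of course, is not this ratio but locating which customers churn and why.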

Solution example: predictive maintenance and the Internet of Things

It is critical for businesses operating or utilizing equipment to keep those components running as effectively as possible, because equipment downtime or failure can have a negative impact beyond just the cost of repair. Predictive maintenance is defined as a technique to forecast when an in-service machine will fail so that maintenance can be planned. It includes more general techniques that involve understanding faults, failures, and timing of maintenance. It is widely used across a variety of industries, such as aerospace, energy, manufacturing, and transportation and logistics.

New predictive maintenance techniques include time-varying features and are not as bound to model-driven processes. Emerging Internet of Things (IoT) technologies have opened up the door to a world of opportunities in this area, with more sensors being installed on devices and more data being collected about those devices. As a result, data-driven techniques now promise to unleash the potential of using data to understand when to perform maintenance.

In this example, we'll show you different ways of formulating a predictive maintenance problem and then show you how to solve them by using R and SQL Server.

Solution example: forecasting

Forecasting is defined as the process of making future predictions by using historical data, including trends, seasonal patterns, exogenous factors, and any available future data. It is widely used in many applications, and critical business decisions depend on having an accurate forecast. Meteorologists use it to generate weather predictions; CFOs use it to generate revenue forecasts; Wall Street analysts use it to predict stock prices; and inventory managers use it to forecast demand and supply of materials.

Many businesses today use qualitative, judgement-based forecasting methods and typically manage their forecasts in Microsoft Excel or locally on an R workstation. Organizations face significant challenges with this approach because the amount and availability of relevant data have grown exponentially. Using SQL Server R Services, it is possible to create statistically reliable forecasts in an automated fashion, giving organizations greater confidence and business responsiveness.
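As a toy illustration of the forecasting idea (not of SQL Server R Services specifically), one of the simplest techniques, a moving-average forecast, can be written in a few lines; the revenue figures are invented:

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    recent = history[-window:]
    return sum(recent) / window

monthly_revenue = [100, 104, 103, 108, 110, 113]
print(moving_average_forecast(monthly_revenue))  # (108 + 110 + 113) / 3
```

Real forecasting systems go well beyond this, adding trend, seasonality, and exogenous factors, but the structure is the same: fit on history, predict forward.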

SOURCE: Data Science with Microsoft SQL Server 2016

Profile photo for Aleksandras Urbonas

It has been a while since I HAD to use R now that we have Python.

Back in 2015, R was my entry point to modelling. I had stumbled upon the Titanic dataset analysis. This dataset allows many different things to be tested: data processing, feature engineering, charts, and statistics. R had good packages for many machine learning algorithms and for biostatistics (remember the iris dataset?). From there I went to credit risk, clustering, and deep learning with MNIST. I also liked having a development environment called RStudio, which was perfect for managing projects. There are cons, of course, but that is a separate question.

Profile photo for Adam Hood

There are really two branches of tools for analytics, data science, and statistical data analysis. One branch is for those who are more programming or developer minded; the other is for the business user who has a deep understanding of stats. That said, the tools, as they should, align with their primary users, with a few exceptions.
First, let's look at the developer track. Tools in this area include things like R, Python, Perl, Java, in-database coding (think stored procedures, UDFs), etc. This track is really taking off because of the flexibility it offers in implementation and integration. However, the learning curve can be steep, and to non-developers the results appear to come from a black box.
Second are the tools for business users who have a background in statistics. These tools are focused on user interfaces and the scientific process workflow. Tools in this space include large vendors like SAS, SPSS, and then some up and coming vendors like RapidMiner, Alteryx (which utilizes R), and others.
All that said, I think it is important to note another trend we're seeing: the integration of R into user-friendly tools. Assuming that your question is based on a decision to investigate or learn tools, R makes a great choice because of its growing integration with more user-friendly tools.

Profile photo for Richard Orama

I will just give a few factors:

  1. Your current environment: what is used at your current work setting? Is it Python or R? Choose one that is currently being used.
  2. Your professional or technical inclination: are you a programmer (more like a software engineer) or not? If Yes, you might prefer Python, else R.
  3. Anticipated level of software integration: do you plan to integrate with other software systems? If Yes, Python may be better as it integrates more easily, e.g. with web applications.
  4. Etc …
Profile photo for Quora User

Well, that depends.

Here's a post from Revolution Analytics (now part of Microsoft): Companies Using R

R is a lot of things. It's the open source version. It's the Microsoft version. It's Renjin.org | The JVM-based interpreter for the R language for statistical computing

And there are others.

Now that there's the R Consortium, and Microsoft is behind R... well... the open source version will hopefully improve as a result. R, regular R, is a horrible memory hog, for now. Maybe that will improve. Revolution certainly improved it, and we'll see how it does inside of Microsoft. Microsoft is giving its version of R to developers for free (for now), but only RedHat and SUSE Linux builds are available, as well as the Windows one. I could probably make it work on another distro, but to be honest, regular R is sufficient for what I do. And they do stress that the Windows version is the most powerful, the most complete.

The problem with that is for me: I use Windows for entertainment. All development and data projects are in Linux. (As far as RedHat and SUSE go, well, Linux for me means free.)

R is great. Don't get me wrong. I love R. I've loved it since I met it. But there are so many syntax differences across its huge number of packages. Hadley Wickham is doing tremendous work, as are others, but Hadley is, well, Hadley. Take ggplot2: it is completely different from base or lattice graphics. Which is, in some senses, a good thing. However, there are no real design principles common to every package.
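
To make that syntax gap concrete, here is the same scatter plot in base graphics and in ggplot2 (a generic sketch on the built-in `mtcars` data; ggplot2 must be installed):

```r
library(ggplot2)  # assumed installed; not part of base R

# Base graphics: imperative style, one call draws the whole plot
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight", ylab = "Miles per gallon", main = "Base graphics")

# ggplot2: declarative grammar of graphics, layers composed with `+`
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight", y = "Miles per gallon", title = "ggplot2")
print(p)
```

Both produce a weight-vs-mpg scatter plot, but the mental models (a sequence of drawing commands versus a composed plot object) have almost nothing in common, which is exactly the inconsistency complained about above.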

There's another problem with R, one that I haven't seen discussed much, although I haven't looked lately. It's not a language. It's an environment equipped with a language. The R environment with the R language. The latter does not exist without the former.

Profile photo for Shivanshu Mishra

I prefer coding in R because I find it:

  1. Intuitive
  2. Suitable for statistical analysis (sampling distributions, the Central Limit Theorem, hypothesis testing, types of errors, ANOVA, chi-square, t-tests)
  3. Easy for data visualization (the ggplot2 package)
  4. Easy for data manipulation (the dplyr package)
  5. Convenient for converting data to tidy format with the tidyr package.
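
As a quick illustration of point 2, the classical tests are one-liners in base R (a generic example on the built-in `mtcars` data, not from any particular course):

```r
# Two-sample t-test: does mpg differ between automatic and manual cars?
# t.test() runs Welch's t-test by default (unequal variances assumed).
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# One-way ANOVA: does mpg vary across cylinder counts?
aov_fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(aov_fit)

# Chi-square test of independence on a 2x2 contingency table
chisq.test(table(mtcars$am, mtcars$vs))
```

Each of these returns a rich result object (statistic, p-value, confidence interval) that can be inspected or fed into a report.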

It has an amazing set of well-written libraries created by prominent statistical experts, thanks in no small part to Hadley Wickham (creator of the dplyr package).
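
A minimal dplyr/tidyr sketch of points 4 and 5 above (both packages must be installed; `mtcars` and the mpg-to-km/l conversion are used purely for illustration):

```r
library(dplyr)
library(tidyr)

# dplyr: filter rows, derive a column, summarise by group
by_cyl <- mtcars %>%
  filter(mpg > 15) %>%
  mutate(kpl = mpg * 0.425) %>%   # miles per gallon -> km per litre
  group_by(cyl) %>%
  summarise(mean_kpl = mean(kpl), n = n())

# tidyr: reshape the wide summary into long "tidy" key-value form
long <- by_cyl %>%
  pivot_longer(cols = c(mean_kpl, n),
               names_to = "metric", values_to = "value")
print(long)
```

The pipe operator (`%>%`) is what makes these chains read top-to-bottom like a recipe, which is a large part of why the tidyverse style feels intuitive.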

You can learn R programming from Learn R, Python & Data Science Online | DataCamp, and for further study, consider reading this book.
