Robert Kabacoff

author
since Mar 28, 2011

Recent posts by Robert Kabacoff

Thanks. That is a great resource. R also has Matlab emulation packages, so that you can port your Matlab code to R more easily.

Rob
6 years ago
Hi Kondwani,

It depends on your definition of BIG data. If you have data in the gigabyte range (or have a larger dataset in a DBMS but only need to analyze a subset of it), then the methods in R in Action will work just fine. There is an appendix on big data (terabyte range), but it is not the focus of the book. If you need to analyze REALLY large datasets (and not subsets of them), you can use specialized functions and packages in R or turn to the commercial versions of R from Revolution Analytics or Oracle. Other language alternatives are Python and Julia.

Hope this helps,

Rob
6 years ago
R is actually one of the most popular languages for data mining. I don't know of any other platform that has more extensive data mining features.

R's ability to handle really BIG data (terabytes) is still under rapid development. Base R can handle moderately sized datasets (gigabytes), but for larger datasets, specialized packages and/or methods are used (for example, connecting R to a DBMS). There are commercial versions of R (e.g., from Revolution Analytics and Oracle) that can natively handle really big data. I talk about some of the ways of dealing with this issue in R in Action.
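
One common pattern behind "connecting R to a DBMS" is to let the database do the filtering and aggregation, so that only a small result set is pulled into R's memory. A minimal sketch, assuming the DBI and RSQLite packages are installed; an in-memory SQLite database and the built-in mtcars data stand in for a real DBMS and a large table:

```r
library(DBI)

# Open a connection (an in-memory SQLite database stands in for a real DBMS)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cars", mtcars)

# The WHERE and GROUP BY run inside the database; R only receives the summary rows
res <- dbGetQuery(con,
  "SELECT cyl, COUNT(*) AS n, AVG(mpg) AS mean_mpg
     FROM cars
    WHERE hp > 100
    GROUP BY cyl")
print(res)

dbDisconnect(con)
```

With a production database you would swap the `SQLite()` driver for the one matching your DBMS; the query-then-analyze pattern stays the same.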

Rob
6 years ago
Hi Randy,

I cover categorical data in terms of
(1) creating basic and complex tables and cross tabulations
(2) testing for interactions between categorical variables
(3) visualizing categorical data (both univariate and multivariate)
(4) predicting a categorical outcome (e.g., logistic regression, decision trees, random forests)
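
A minimal base-R sketch touching each of the four items above, using the built-in mtcars data (the dataset and variables are illustrative choices, not examples taken from the book). Fisher's exact test stands in for a chi-square test of association here because several cell counts are small:

```r
# (1) Cross-tabulate cylinders by transmission (am: 0 = automatic, 1 = manual)
tab <- with(mtcars, table(cyl, am))
print(tab)
addmargins(tab)  # add row and column totals

# (2) Test for association between the two categorical variables
fisher.test(tab)

# (3) Visualize the two-way table
mosaicplot(tab, main = "Cylinders by transmission")

# (4) Predict a binary outcome (manual transmission) from car weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)
summary(fit)
exp(coef(fit))  # coefficients expressed as odds ratios
```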

I don't cover more advanced categorical models such as log-linear models or event history analysis.

There are some very good books on categorical data analysis (e.g., http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470463635.html) but I don't know of other R books dedicated to the topic. I think R in Action would give you a good start.

Rob
6 years ago
Hi Bijesh,

I am getting questions like this more and more frequently. If you are new to analytics, I would suggest taking the free Data Science sequence at Coursera. The courses are quite good and will give you a sense of what you need. In the past I would have said that a formal education in math and statistics is necessary, but I no longer believe that that is true. I wrote R in Action for beginners. It will take you pretty far, and has lots of material for advanced users, but there shouldn't be any barriers for people new to the field. I don't assume any particular prerequisites.

Rob
6 years ago
Hi Adam,

The target audience is really anyone dealing with data. There is a very strong emphasis on visualizing data. It is the main reason I got into R development in the first place, and having used SAS, SPSS, and Systat for many years (as well as Excel, Tableau, and other software), I really believe that R is the most versatile platform for graphics. The examples come from all walks of life (automotive data, medical data, psychological data, marketing data, etc.). When you finish the book, you should be able to import data from almost any source, clean it up, summarize it, and create elegant visualizations. Depending on your interests, you will also learn to create predictive models (machine learning, forecasting), write your own custom functions and packages, and generate customized automated reports.

Hope this helps,

Rob
6 years ago
Hi Folks,

Great to be here. I will try to answer any questions you may have.

Rob
6 years ago
Hi Karthik,

I tried to write R in Action so that readers without a statistical background could read most of the chapters. There are a few advanced chapters in which a stats background will certainly help, but I still tried to make it readable.

I wrote the second edition for a couple of reasons. First, the R language really is evolving rapidly, and there have been many changes. Second, there were topics I wanted to add that I didn't get a chance to cover in the first edition.
In fact, I added many new chapters, including:

(1) working with time series data
(2) predictive analytics and machine learning
(3) a deeper dive into the R language for programmers
(4) writing your own R packages
(5) creating attractive dynamic reports using Markdown and LaTeX
(6) an extended discussion on visualizing data with ggplot2

I also added more information on working with big data in the Appendices.

Hope this helps.

Rob
6 years ago
Congratulations, folks!

And thanks for having me. It was a pleasure.

Rob
10 years ago
Hi Igor,

Modern Applied Statistics with S is an excellent book. So is Data Analysis and Graphics Using R, by Maindonald and Braun. If you are just starting out in the area of statistics, these books are likely to prove challenging.

If you are looking for some good online courses, you might want to look at the offerings from Statistics.com.
I have taken a number of courses from them, and most have been good.
10 years ago
Hi Igor,

I am seeing a lot of job announcements for data modelers in finance, healthcare, and marketing. Predictive analytics and data mining are also very hot right now.

LinkedIn is a good place to see ads for jobs that combine statistics and software engineering. I regularly see job postings that combine these areas in the following groups:
  • The R Project for Statistical Computing
  • Advanced Business Analytics, Data Mining and Predictive Modeling
  • Data Mining, Statistics, and Data Visualization

The New York Times had an article in the last few months saying that statistics is the new sexy job.
I feel like I should make a joke here, but I can't think of one.
10 years ago
I would agree with Igor. S-Plus and R both have their roots in S. S-Plus is proprietary; it was acquired a while ago by TIBCO and renamed TIBCO Spotfire S+. I have to believe that R is receiving more active development, given its enormous academic user base worldwide.

Given their common roots, the two languages are very similar, but be careful: they can differ in their scoping rules. In particular, they differ in how free variables (variables used inside a function that are neither passed in as arguments nor assigned values locally) are ultimately assigned values.
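
A minimal illustration in the spirit of the example in the R manual "An Introduction to R". Inside `sq()`, `n` is a free variable. Under R's lexical scoping it is looked up in the environment where `sq()` was defined (the body of `cube()`), so the global `n` defined below is ignored; R's documentation contrasts this with S-PLUS, where `sq()` would look for `n` among global variables instead:

```r
# Lexical scoping: a free variable is resolved in the environment
# where the function was defined, not where it is called
cube <- function(n) {
  sq <- function() n * n  # n is free inside sq(); R finds cube()'s argument
  n * sq()
}

n <- 10   # a global n, which R does not use inside sq()
cube(2)   # returns 8 in R (2 * 2 * 2)
```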

To learn more, search on the phrase "lexical scoping in R".

10 years ago
Hi Igor,

I often get the best results when I put R at the end of the search phrase. For example, in Google you might try phrases like:

  • linear models in R
  • creating graphics with R
  • importing data into R
  • embedding R in other applications
  • etc.

The R Site Search allows you to search through the R documentation and the extensive R-help mail archives online.
10 years ago
Hi Fred,

There are quite a number of R web interfaces. They are described in the R FAQ. You'll find solutions using CGI, Java, JavaScript, and PHP, and others exist.

There are also numerous solutions for accessing R from within another language. For example, you can access R functionality from within Java using rJava or RCaller. Python users can use rpy.

I don't cover this topic in the book. The book primarily focuses on using base R and user-contributed R packages to analyze data and create graphs.
10 years ago
Hi Carol,

These are good questions. In the book, I approach R as a data scientist. I thought about what it takes to successfully process, analyze, and understand data, including:
  • Accessing the data (getting the data into the application from multiple sources)
  • Cleaning the data (coding missing data, fixing or deleting miscoded data, transforming variables into more useful formats)
  • Annotating the data (in order to remember what each piece represents)
  • Summarizing the data (getting descriptive statistics to help characterize the data)
  • Visualizing the data (because a picture really is worth a thousand words)
  • Modeling the data (uncovering relationships and testing hypotheses)
  • Preparing the results (creating publication quality tables and graphs)

Then I tried to explain how to use R to accomplish each of these tasks.
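
As a rough sketch of how those steps can map onto base R (not code from the book; the built-in mtcars data and the particular functions chosen are illustrative assumptions):

```r
# Access: mtcars ships with R; read.csv() or a database query would play this role for real data
dat <- mtcars

# Clean: turn a numeric code into a labelled factor
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Annotate: record what a cryptic column holds ("label" is a convention, not enforced by base R)
attr(dat$wt, "label") <- "Weight (1000 lbs)"

# Summarize
summary(dat[, c("mpg", "wt", "am")])
aggregate(mpg ~ am, data = dat, FUN = mean)

# Visualize
plot(mpg ~ wt, data = dat, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Model
fit <- lm(mpg ~ wt + am, data = dat)
summary(fit)

# Prepare results: a rounded coefficient table ready to drop into a report
round(coef(summary(fit)), 3)
```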

I don't mention SAS, SPSS, or Stata explicitly, but since these are the same tasks you would perform in each of those programs, the organization and topics should make immediate sense to users of those packages.

With regard to necessary background, here is the description from the preamble (the part that answers your question is the second paragraph):

"R in Action" provides you with a guided introduction to R, giving you a 2,000-foot view of the platform and its capabilities. It will introduce you to the most important functions in the base installation and more than 90 of the most useful contributed packages. Throughout the book, our goal is practical application: how can we make sense of our data and communicate that understanding to others? When you finish, you should have a good grasp of how R works, what it can do, and where you can go to learn more. You will be able to apply a wide variety of techniques for visualizing data, and you will have the skills to tackle both basic and advanced data analytic problems.

Users without a statistical background who want to use R to manipulate, summarize, and graph data should find chapters 1-6, 11, and 16 easily accessible. Chapters 7 and 10 assume a one-semester course in statistics, while chapters 8, 9, and 12-15 would benefit from two semesters of statistics. However, I have tried to write each chapter in such a way that both beginning and expert data analysts will find something interesting and useful.

I hope this helps.
10 years ago