Data typing in R

I find data types in R to be a PAIN!

TLDR: Arithmetic expressions sometimes seem to need wrapping in as.double() before they work. Why? When asked with typeof(), R already claims the result is a double. What is going on with data types in R for arithmetic? Also, [ , ] on a data frame returns a data frame containing the requested element instead of the element itself. Why?

Most recently I am struggling with the return types of arithmetic functions ( + , - , / , * , ^ )
In the following example my dataset is -20, -3, 2, 4, 4, 10, 50 (from https://www.codecademy.com/courses/learn-r/lessons/quartiles-r/exercises/q-1-and-q-3 ). With a BS in math and a minor in CS I don’t need the quartiles lesson itself; I am using it to learn about R functions. I’ll post my code below.
Trying to run:
print(mydataset[(length(mydataset)+3)/2:length(mydataset)])
#returns
##[1] 4 2 -3 -3 -20 -20
#instead of the expected 4,10,50

In the end I found that running
print(mydataset[as.double((length(mydataset)+3)/2):length(mydataset)])
#returns the expected
##[1] 4 10 50

How do I know that this is a problem with the arithmetic functions? Because when I wrapped only the length() calls in as.double(), it still returned the wonky result.
So my question is: what data type do the arithmetic functions return? Well, trying
print(typeof((length(mydataset)+3)/2))
#returns
## [1] "double"

#but if that is a double, then why does forcing it to be a double with as.double() work?
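
For reference, here is a small sketch of what typeof() reports for a few arithmetic results. The names x and n are only illustrative. The point is that / always gives a double, and that an index like 5 selects the same element whether it is stored as a double or as an integer, so the storage type on its own would not be expected to change which elements come back.

```r
x <- c(-20, -3, 2, 4, 4, 10, 50)
n <- length(x)

typeof(n)        # "integer": length() returns an integer
typeof(n + 3)    # "double": 3 is a double literal, so the sum is promoted
typeof(n + 3L)   # "integer": 3L is an integer literal
typeof(n / 2L)   # "double": `/` always returns a double
typeof(2:n)      # "integer": `:` yields integers for whole-number endpoints

# Subsetting gives the same element for a double or an integer index
identical(x[5], x[5L])   # TRUE
```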

My code for this exercise (I feel like I could have more concisely dealt with the various cases, but I would have needed pen and paper to help me think so I went a bit messy):


```{r}
library(magrittr)  # provides %>%; library(dplyr) also works

# Quartile by the median-of-halves method: q = 1, 2 or 3
myq <- function(mydataset, q = 2) {
  mydataset <- mydataset %>% sort()
  n <- length(mydataset)
  if (n %% 2 == 0) {
    mymedian <- (mydataset[n / 2] + mydataset[n / 2 + 1]) / 2
  } else {
    mymedian <- mydataset[(n + 1) / 2]
  }
  if (q == 2) {
    return(mymedian)
  }
  # `:` binds tighter than `/`, `+` and `-`, so each endpoint is parenthesized
  if (q == 1 && n %% 2 == 0) {
    return(myq(mydataset[1:(n / 2)], 2))
  } else if (q == 1) {
    return(myq(mydataset[1:((n - 1) / 2)], 2))
  } else if (n %% 2 == 0) {
    return(myq(mydataset[(n / 2 + 1):n], 2))
  } else {
    return(myq(mydataset[((n + 3) / 2):n], 2))
  }
}
dataset_one <- c(50, 10, 4, -3, 4, -20, 2)
# sorted dataset_one: [-20, -3, 2, 4, 4, 10, 50]

dataset_two <- c(24, 20, 1, 45, -15, 40)

dataset_one_q2 <- 4
dataset_two_q2 <- 22

# define the first and third quartile of both datasets here:
dataset_one_q1<-dataset_one%>%myq(1)
dataset_one_q3<-dataset_one%>%myq(3)
dataset_two_q1<-dataset_two%>%myq(1)
dataset_two_q3<-dataset_two%>%myq(3)
```


Also I just realized that there are other parts of my code where the arithmetic functions do seem to work? I am certain that R must be consistent because it is a computer, but it doesn’t seem consistent to me. Can someone explain?
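
One possible reason slices written without parentheses can still look fine in a function like this: the mis-parsed indices sometimes land on values that happen to give the same median. The sketch below uses the sorted second dataset, with x and n as illustrative names. It relies on fractional indices being truncated, index 0 being silently dropped, out-of-range indices giving NA, and sort() dropping NA by default.

```r
x <- c(-15, 1, 20, 24, 40, 45)   # dataset_two, already sorted
n <- length(x)

# Intended lower half: x[1:(n/2)]. Without parentheses this is (1:n)/2
1:n/2                 # 0.5 1.0 1.5 2.0 2.5 3.0
lower <- x[1:n/2]     # indices truncate to 0 1 1 2 2 3; the 0 is dropped
lower                 # -15 -15 1 1 20
median(lower)         # 1, the same as median(x[1:(n/2)])

# Intended upper half: x[(n/2+1):n]. Without parentheses this is n/2 + (1:n)
n/2+1:n               # 4 5 6 7 8 9
upper <- x[n/2+1:n]   # 24 40 45 NA NA NA
sort(upper)           # 24 40 45, because sort() removes NA by default
```

So the result matches here by coincidence, not because the expression was parsed as intended.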

from a previous encounter
I was driven to trying the as.double() function by an earlier data typing issue with the [ , ] function for data frames! In this previous spat, I found that running
somedataframe[3, 4]
#returned a dataframe with a single element of the value at (3, 4) instead of the value at (3, 4)

Anybody want to explain what is going on? Why did R get built this way?
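
One possible explanation, assuming somedataframe was a tibble (the kind of data frame that readr and dplyr produce) rather than a base data.frame: tibbles keep the result of [ , ] as a data frame, while a base data.frame drops a single cell down to a plain value. The sketch below uses only base R and a made-up df, with drop = FALSE standing in for the tibble-style result.

```r
# Hypothetical data frame, only for illustration
df <- data.frame(
  a = 1:3,
  b = c("x", "y", "z"),
  c = c(TRUE, FALSE, TRUE),
  d = c(1.5, 2.5, 3.5)
)

cell_value <- df[3, 4]                # base data.frame: plain numeric 3.5
cell_frame <- df[3, 4, drop = FALSE]  # 1 x 1 data.frame, like the result described above

# Two ways to ask for the bare value explicitly
df[[3, 4]]   # 3.5
df$d[3]      # 3.5
```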

A friend helped me solve this one. Apparently the original problem wasn’t a data type problem at all, but a question of the order in which R resolves the operators, which parentheses fix. So:
print(mydataset[((length(mydataset)+3)/2):length(mydataset)])
works


Still trying to work out how R was interpreting it without the needed parentheses, though.

Figured it out: R’s order of operations evaluates the : operator before the / operator. So 10/2:7 produces the indices 10/2, 10/3, 10/4, 10/5, 10/6, 10/7, which are truncated to 5, 3, 2, 2, 1, 1 when used for subsetting.
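
A minimal sketch of that precedence rule, using the sorted dataset from the question (x and n are just illustrative names):

```r
x <- c(-20, -3, 2, 4, 4, 10, 50)
n <- length(x)

# `:` binds tighter than `/`, so this parses as (n + 3) / (2:n)
idx_wrong <- (n + 3) / 2:n
idx_wrong        # 5.00 3.33 2.50 2.00 1.67 1.43 (rounded for display)
x[idx_wrong]     # 4 2 -3 -3 -20 -20: fractional indices are truncated

# Parenthesizing the left endpoint gives the intended range 5:7
idx_right <- ((n + 3) / 2):n
x[idx_right]     # 4 10 50
```

This is also why as.double() appeared to help: the function call's own parentheses grouped the left endpoint before : was applied. ?Syntax in R lists the full precedence table.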
