/r/RStudio


A place for users of R and RStudio to exchange tips and knowledge about the various applications of R and RStudio in any discipline.

Please use this as a forum to discuss R, and learn more about it. If you have any questions about how to do specific things in R, this is the place to ask. If you are looking for more advanced help using R, please visit /r/Rstats.

You can download R itself here.

You can download RStudio here. It is an incredibly powerful IDE for R, and what the mods recommend you use.

NOTE: Due to a couple of recent posts offering "compensation" for help with an assignment, let's make this official: you are not allowed to offer payment for help with an assignment. If you want help with an assignment, please post the work you've completed so far and highlight the issue you are having. Members will then help where they can. If you want to pay someone for tutoring in R, this is not the place to look for it.


30,942 Subscribers

1

*Sigh* White Screen

*Update at end of post.

tldr: When I open a particular project in RStudio, the screen turns white after a few seconds, the program becomes unresponsive, and I've tried everything on the Posit forums about the "white screen of death".

Yesterday I used st_union() to unite two polygon layers. I then used mapview() to visualize the union and check that I did it correctly; however, the code never finished running. I stopped and restarted it several times with the same outcome, and eventually I got the white screen.

I had gotten a message that a newer version of R was available, so I uninstalled R and RStudio, installed the latest version of R (4.4.0) by running the installer, and then downloaded and installed RStudio.

I continue to get the white screen on this particular project. I have uninstalled and reinstalled both R and RStudio several times. I have shut down and restarted my computer. I have tried changing RStudio's rendering engine to "Software". I deleted the only .Rhistory file I had. I'm not using a VPN. Other projects seem to run fine. I tried waiting out the white screen, but a couple of YouTube videos later it was still white and unresponsive...

Do I just delete the project and start over? Obviously I'd rather not but that might be faster at this point.

*UPDATE: I decided to delete the project. Luckily I already knitted it to html so I can follow along with what I did and make some changes to avoid the dataset becoming too large (I assume this was the issue, although I really didn't think it would be so devastatingly large). RStudio is working just fine and no white screens have appeared!
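Deleting `.Rhistory` does not clear the workspace: if a very large object (such as the unioned polygons) was saved into the project's `.RData` file, RStudio may try to reload it every time the project opens, which is one plausible cause of a white screen that affects only one project. A minimal sketch, assuming it is run from a plain R console while the project is closed; the helper name `backup_project_rdata()` and the project path are hypothetical. It renames the file instead of deleting it, so the objects can still be recovered with `load()`. Whether RStudio restores `.RData` at all is controlled by the "Restore .RData into workspace at startup" option under Tools > Global Options > General.

```r
# Hypothetical helper: move a project's .RData aside so the project
# opens with an empty workspace (nothing is deleted).
backup_project_rdata <- function(project_dir) {
  rdata <- file.path(project_dir, ".RData")
  if (!file.exists(rdata)) return(invisible(NA_character_))
  backup <- file.path(project_dir, ".RData.bak")
  if (!file.rename(rdata, backup)) stop("Could not rename ", rdata)
  invisible(backup)
}

# "~/path/to/my_project" is a placeholder path (uncomment to run):
# backup_project_rdata("~/path/to/my_project")

# Later, inspect the saved objects without restoring them at startup:
# e <- new.env(); load("~/path/to/my_project/.RData.bak", envir = e); ls(e)
```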

2 Comments
2024/05/02
20:28 UTC

2

GGPlot knit cancels in R Markdown

Hi all,

Struggling with this R Markdown document. I have my libraries set to load at the top of the document (code below), but I always get the error in the screenshot. What am I missing?

---
title: "5_GGPlot"
output:
  pdf_document: default
  html_document: default
date: "2024-01-16"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

library(ggplot2)
library(dplyr)
library(ggbeeswarm)

# Returns a text label with the number of observations, placed at y = 0.95 * log10(50)
n_fun <- function(x) {
  return(data.frame(y = 0.95 * log10(50),
                    label = length(x)))
}
```

https://preview.redd.it/ifgauwtym0yc1.png?width=671&format=png&auto=webp&s=0915155d4ef88661a0e98f3089bfb8803a694c9b
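One frequent reason a document fails to knit even though the code runs in the console is that knitting starts a fresh R session, so every package and object the document uses must be loaded inside the .Rmd itself. As a hedged first check (not necessarily the cause of the error in the screenshot), the setup chunk can report which of its packages are not installed; the package list is the one from the setup chunk in the post. Rendering to `pdf_document` also needs a LaTeX installation (for example through the tinytex package), so knitting to HTML first can help isolate the problem.

```r
# Packages the setup chunk loads
pkgs <- c("ggplot2", "dplyr", "ggbeeswarm")

# requireNamespace() returns FALSE (instead of raising an error) when a package is missing
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these before knitting: ", paste(missing_pkgs, collapse = ", "))
}
```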

3 Comments
2024/05/02
13:31 UTC

0

Boxplot text

Hello. I'm trying to insert text into a boxplot. I always end up with this message (the last part is in French and means "the lengths of 'at' and 'labels' differ, 5 != 4"):

Error in axis(side = base::quote(1), at = base::quote(1:5), labels = base::quote(c("Reading",  : 
  les longueurs de 'at' et de 'labels' diffèrent, 5 != 4

The code I used (as a test) is this one:

boxplot(Modelisation$Reading, Modelisation$Naming, Modelisation$Switching1, Modelisation$Switching2,
        text(3, 2, "Test"),
        main = "Répartition des scores selon les conditions",
        names = c("Reading", "Naming", "Switching1", "Switching2"),
        col = c("palegreen2", "steelblue1", "orchid", "mediumpurple2"))

Do you know what types of mistake can cause this message?
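The message says boxplot() is labelling five boxes but received only four names. Because `text(3, 2, "Test")` is written inside the boxplot() call, it appears to be picked up as an extra, fifth data argument. A sketch of the usual pattern: draw the boxplot first, then add the text with a separate text() call. The data frame below is hypothetical stand-in data that only reuses the post's column names, and the text position is arbitrary.

```r
# Hypothetical stand-in for the Modelisation data frame
set.seed(42)
Modelisation <- data.frame(
  Reading    = rnorm(30, 10), Naming     = rnorm(30, 12),
  Switching1 = rnorm(30, 14), Switching2 = rnorm(30, 15)
)

pdf(NULL)  # draw to a null device; remove this line (and dev.off()) to see the plot
b <- boxplot(Modelisation$Reading, Modelisation$Naming,
             Modelisation$Switching1, Modelisation$Switching2,
             main  = "Répartition des scores selon les conditions",  # "Distribution of scores by condition"
             names = c("Reading", "Naming", "Switching1", "Switching2"),
             col   = c("palegreen2", "steelblue1", "orchid", "mediumpurple2"))

# Annotate only after the plot exists, as a separate call
text(x = 3, y = median(Modelisation$Switching1), labels = "Test")
dev.off()
```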

1 Comment
2024/05/02
10:36 UTC

0

Radarchart

Hope someone can help!
So, I'm really struggling to make this radar chart with the fmsb package. I have tried many times but simply can't get it to work. I've also updated RStudio, reloaded, and tried running small chunks so that everything isn't executed at once, with no success. My final hope is making my first Reddit post and seeing if anyone knows what I'm doing wrong. Here is my script:

library(fmsb)

data <- data.frame(
  Identity_Sense_of_Place = c(49, 42, 41, 40, 24),
  Subsistence_Crop = c(48, 41, 11, 34, 2),
  Cash_Crop = c(45, 50, 8, 45, 8),
  Soil_Nutrient_Cycling_Maintenance = c(24, 24, 22, 18, 8),
  Water_Recharge_Cycling = c(20, 28, 26, 22, 26),
  Fuel = c(14, 7, 24, 13, 18),
  Recreation = c(10, 9, 7, 12, 10),
  Natural_Medicines = c(7, 13, 11, 9, 15),
  Biodiversity_Maintenance = c(5, 8, 8, 4, 15),
  Wild_Plants = c(5, 8, 31, 14, 17),
  Timber = c(2, 1, 22, 16, 37),
  Fibre = c(2, 2, 22, 18, 38),
  Ecotourism = c(1, 2, 1, 2, 5),
  Wild_Meat = c(0, 3, 5, 4, 18)
)

rownames(data) <- c("Upland Rice", "Mixed Crop", "Fallow Land", "Agroforestry", "Forest")

# Generate radar chart
radarchart(data,
           axistype = 1,
           seg = 5,
           pcol = c("#FF0000", "#00FF00", "#0000FF", "#FF00FF", "#00FFFF"),
           plwd = 2,
           plty = 1,
           cglcol = "grey",
           cglty = 1,
           axislabcol = "black",
           caxislabels = seq(0, 50, 10),
           cglwd = 0.8,
           vlcex = 0.8)
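By default (`maxmin = TRUE`), fmsb's radarchart() reads row 1 of the data frame as each axis's maximum and row 2 as its minimum, so in the script above "Upland Rice" and "Mixed Crop" would be used as axis limits instead of being plotted. A sketch of the fix, using only the first three columns of the post's data for brevity and assuming an axis range of 0 to 50 (to match `caxislabels = seq(0, 50, 10)`): prepend a max row and a min row before plotting. The helper name `add_max_min()` is hypothetical.

```r
# Hypothetical helper: prepend the max and min rows that radarchart() expects
add_max_min <- function(df, max_val, min_val) {
  limits <- as.data.frame(matrix(c(rep(max_val, ncol(df)), rep(min_val, ncol(df))),
                                 nrow = 2, byrow = TRUE,
                                 dimnames = list(c("max", "min"), names(df))))
  rbind(limits, df)
}

data <- data.frame(
  Identity_Sense_of_Place = c(49, 42, 41, 40, 24),
  Subsistence_Crop        = c(48, 41, 11, 34, 2),
  Cash_Crop               = c(45, 50, 8, 45, 8)
)
rownames(data) <- c("Upland Rice", "Mixed Crop", "Fallow Land", "Agroforestry", "Forest")

plot_data <- add_max_min(data, max_val = 50, min_val = 0)
cols <- c("#FF0000", "#00FF00", "#0000FF", "#FF00FF", "#00FFFF")

if (requireNamespace("fmsb", quietly = TRUE)) {
  fmsb::radarchart(plot_data, axistype = 1, seg = 5, pcol = cols,
                   plwd = 2, plty = 1, cglcol = "grey", cglty = 1,
                   axislabcol = "black", caxislabels = seq(0, 50, 10),
                   cglwd = 0.8, vlcex = 0.8)
  legend("topright", legend = rownames(data), col = cols, lty = 1, lwd = 2, bty = "n")
}
```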

3 Comments
2024/05/02
07:39 UTC

3

Large dataset grouping, adding new column

https://preview.redd.it/ko5hizw1twxc1.png?width=603&format=png&auto=webp&s=2568deed6efb90fd92e7f95404c59d536c29c36c

https://preview.redd.it/0p4kyzw1twxc1.png?width=526&format=png&auto=webp&s=acb0731d659ad098cfbe5b83b3fcf7456a921c98

I have this dataset with stop_id ranging from 1 to 3407. I would like to create a new column showing how many times each stop_id appears on Weekday, Saturday, and Sunday for 24MAR. The numbers in the veh_sched column do not matter; I am only trying to count how much service we get on different days.
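A base-R sketch, assuming hypothetical column names `stop_id`, `day_type` (Weekday/Saturday/Sunday) and `period` (e.g. "24MAR"), since the real names are only visible in the screenshots: `ave()` adds a per-row count of how often that stop appears on that day type, and `table()` gives a compact one-row-per-stop summary.

```r
# Hypothetical stand-in for the schedule data
sched <- data.frame(
  stop_id   = c(1, 1, 1, 2, 2, 3, 1),
  day_type  = c("Weekday", "Weekday", "Saturday", "Weekday", "Sunday", "Sunday", "Weekday"),
  period    = c("24MAR", "24MAR", "24MAR", "24MAR", "24MAR", "24MAR", "23SEP"),
  veh_sched = c(101, 102, 103, 104, 105, 106, 107)
)

# Keep only the 24MAR service period
mar <- sched[sched$period == "24MAR", ]

# New column: number of trips at this stop on this day type
mar$trips <- ave(mar$stop_id, mar$stop_id, mar$day_type, FUN = length)

# Summary: one row per stop, one column per day type
trip_table <- table(mar$stop_id, mar$day_type)
```

With dplyr, `add_count(stop_id, day_type)` on the filtered data adds the same kind of count column.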

4 Comments
2024/05/02
00:46 UTC

1

Need help with copying columns from 2 data sets

I have 2 data sets, March and September. Each has many columns with similar names, and I am trying to copy the columns named Stops and Vehicles from both March and September into a new data set called New.

First I run:

New<- select(March, Stops, Vehicles)

It worked fine, but if I run:

New<- select(September, Stops, Vehicles)

it replaces my previous columns with the new ones. Since the column names are the same, I tried changing them, but it did not work.

In short, I need New to have 4 columns, 2 each from March and September.
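`select()` returns a new data frame each time, so the second assignment replaces `New` rather than adding to it. A base-R sketch, assuming March and September have the same number of rows in the same order (the data below is hypothetical): give the columns distinct names as you pick them, then bind the two pieces side by side.

```r
# Hypothetical stand-ins for the two data sets
March     <- data.frame(Stops = c(10, 12, 9), Vehicles = c(3, 4, 2), Other = 1:3)
September <- data.frame(Stops = c(11, 13, 8), Vehicles = c(3, 5, 2), Other = 4:6)

# Pick the two columns from each set and give them distinct names
mar_part <- setNames(March[, c("Stops", "Vehicles")],
                     c("Stops_March", "Vehicles_March"))
sep_part <- setNames(September[, c("Stops", "Vehicles")],
                     c("Stops_September", "Vehicles_September"))

# Place them side by side: 4 columns in total
New <- cbind(mar_part, sep_part)
```

With dplyr, `bind_cols(select(March, Stops_March = Stops, Vehicles_March = Vehicles), select(September, Stops_September = Stops, Vehicles_September = Vehicles))` does the same. If rows should be matched on a shared key (such as a stop ID) rather than by position, a join such as `merge()` is safer.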

8 Comments
2024/05/02
00:11 UTC

2

Am I over differencing the data?

If, having done an Augmented Dickey-Fuller test, my data is stationary at the 5% level but non-stationary at the 1% level, should I difference it again (I have already differenced it once) to make it appropriate for ARMA modelling?
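One common diagnostic (a rule of thumb, not a formal test): differencing a series that is already stationary tends to introduce a strong negative lag-1 autocorrelation, close to -0.5 when the original series is white noise. A sketch on simulated data, not the poster's series:

```r
set.seed(123)
x  <- rnorm(5000)   # already stationary (white noise)
dx <- diff(x)       # an unnecessary extra difference

# Sample autocorrelation at lag 1, before and after differencing
acf_x  <- acf(x,  lag.max = 1, plot = FALSE)$acf[2]
acf_dx <- acf(dx, lag.max = 1, plot = FALSE)$acf[2]

round(c(original = acf_x, differenced = acf_dx), 2)
```

If the once-differenced series already looks like the `dx` case (lag-1 autocorrelation near -0.5), a second difference is probably not needed, and differencing again only because the test fails at 1% but passes at 5% risks over-differencing.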

1 Comment
2024/05/01
20:41 UTC

4 Comments
2024/05/01
19:49 UTC

1

RStudio has started giving me an error when trying to install packages

So the title basically says it all. I started learning R a few months ago, and I used to be able to install and use packages (at least the simple ones), but I've noticed in the last few weeks that I'm now getting an error message.

For example, I tried to install leaps and got this message:

install.packages("leaps")

Warning in install.packages : unable to access index for repository https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6: cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/PACKAGES'
Package which is only available in source form, and may need compilation of C/C++/Fortran: ‘leaps’
Do you want to attempt to install these from sources? (Yes/no/cancel)

Do I just need to update my applications or is there some way to fix this?

Thanks!
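The repository URL in the warning contains `contrib/3.6`, which suggests this installation is R 3.6.x; CRAN's macOS binaries are built for current R releases, so updating R is likely the real fix. A sketch of two quick checks; the 4.0.0 threshold is an arbitrary illustration, and cloud.r-project.org is one CRAN mirror among several.

```r
# Which R is this? The URL in the warning points at R 3.6 binaries.
print(R.version.string)
needs_update <- getRversion() < "4.0.0"
if (needs_update) message("This R is older than 4.0.0; consider installing a current release.")

# Point install.packages() at a CRAN mirror explicitly
options(repos = c(CRAN = "https://cloud.r-project.org"))

# install.packages("leaps")  # retry after updating R
```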

1 Comment
2024/05/01
16:44 UTC

3

Creating Map

Hey guys. I have a problem and I need your suggestions. I'm analyzing Telecom Customer Churn data in R, and I want to show on a map the customers who stayed with the company, those who joined, and those who churned, but I don't know how to show this most effectively. I wrote the code below, but it shows the entire map. I only want to show the state where the data is located (California). Everyone's feedback is important to me.

library(leaflet)
library(dplyr)
library(RColorBrewer)

bubble_data <- churned_dataset %>%
  group_by(City, Latitude, Longitude, Churn.Category) %>%
  summarise(Count = n(), .groups = 'drop') 

colors <- colorFactor(palette = "Set1", domain = bubble_data$Churn.Category)

bubble_map <- leaflet(bubble_data) %>%
  addTiles() %>%
  addCircles(
    lng = ~Longitude, lat = ~Latitude,
    weight = 3, color = ~colors(Churn.Category),
    radius = ~Count * 100,  
    popup = ~paste("City:", City, "<br>Churn Category:", Churn.Category, "<br>Count:", Count),
    fillOpacity = 0.5
  )%>%
  addLegend(
    position = "bottomleft",
    pal = colors,
    values = ~Churn.Category,
    title = "Churn Reasons",
    opacity = 1
  )
bubble_map

https://preview.redd.it/qs5fk00zcuxc1.png?width=618&format=png&auto=webp&s=2707bf216dd48b9da326940fd6b23f20b14368b2
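`addTiles()` shows the whole world until the view is restricted. A sketch, assuming `bubble_data` has the columns built in the post: compute the bounding box of the points and pass it to leaflet's `fitBounds()`, so the map opens zoomed to the area the data covers (California here). The small data frame below is a hypothetical stand-in with approximate city coordinates and made-up churn categories and counts; the 0.5-degree margin is arbitrary.

```r
# Hypothetical stand-in for bubble_data
bubble_data <- data.frame(
  City = c("Los Angeles", "San Francisco", "San Diego"),
  Latitude = c(34.05, 37.77, 32.72),
  Longitude = c(-118.24, -122.42, -117.16),
  Churn.Category = c("Competitor", "Price", "Attitude"),
  Count = c(40, 25, 10)
)

# Bounding box of the data, with a small margin in degrees
pad <- 0.5
bounds <- c(lng1 = min(bubble_data$Longitude) - pad, lat1 = min(bubble_data$Latitude) - pad,
            lng2 = max(bubble_data$Longitude) + pad, lat2 = max(bubble_data$Latitude) + pad)

if (requireNamespace("leaflet", quietly = TRUE)) {
  library(leaflet)
  bubble_map <- leaflet(bubble_data) %>%
    addTiles() %>%
    addCircles(lng = ~Longitude, lat = ~Latitude, radius = ~Count * 100) %>%
    fitBounds(bounds[["lng1"]], bounds[["lat1"]], bounds[["lng2"]], bounds[["lat2"]])
}
```

`setView()` is the alternative when a fixed center and zoom level are preferred; `fitBounds()` adapts automatically if the data changes.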

7 Comments
2024/05/01
16:27 UTC

1

how to analyze results of simple linear regression

I have the following summary for a simple linear regression model between the columns cnt and month_name, and I am asked to report the predicted cnt for the reference month (which I assumed is April, but please correct me if I am wrong) and the predicted cnt for January and June. I have no idea how to find these values: is there code for this, or is this a calculation I should be doing by hand? Thanks in advance!

https://preview.redd.it/l0w8fsttqtxc1.png?width=1174&format=png&auto=webp&s=9e1864333e1c132f47ea1eca3794b17826c8dc78
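With a single categorical predictor, `lm()` uses the first factor level as the reference (alphabetical order for a character column, which makes April the reference among month names), and each fitted value is simply that month's mean. `predict()` with a `newdata` data frame returns the predicted `cnt` for any month, so nothing has to be computed by hand. A sketch on hypothetical data:

```r
set.seed(7)
# Hypothetical stand-in: daily counts for three months
daily <- data.frame(
  month_name = rep(c("April", "January", "June"), each = 30),
  cnt = c(rnorm(30, 4500, 300), rnorm(30, 2200, 300), rnorm(30, 5800, 300))
)

fit <- lm(cnt ~ month_name, data = daily)

# Predicted cnt for the reference month (April), January and June
new_months <- data.frame(month_name = c("April", "January", "June"))
preds <- predict(fit, newdata = new_months)

# By hand: intercept = reference month; any other month = intercept + its coefficient
april_by_hand   <- coef(fit)[["(Intercept)"]]
january_by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["month_nameJanuary"]]
```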

2 Comments
2024/05/01
14:28 UTC

1

RStudio Console opening in another language

My RStudio console is showing a different language than English, and I am new to R so I am unsure of how to fix this.

Is there a way to set the default language to English? I have tried LANGUAGE=en but it doesn't seem to be working. I could be doing this wrong.
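Typed at the R prompt, `LANGUAGE=en` is not a valid way to set an environment variable; the R equivalent is `Sys.setenv()`, which usually switches messages and warnings to English for the current session. A sketch; to make it permanent, the line `LANGUAGE=en` can be added to the `.Renviron` file in your home directory (for example via `usethis::edit_r_environ()`, if usethis is installed), followed by restarting R.

```r
# Switch R's messages and warnings to English for this session
Sys.setenv(LANGUAGE = "en")
Sys.getenv("LANGUAGE")

# Make it permanent by appending the setting to ~/.Renviron, then restart R.
# Uncomment to run; this writes to your home directory.
# cat("LANGUAGE=en\n", file = "~/.Renviron", append = TRUE)
```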

2 Comments
2024/05/01
07:57 UTC

1

Rmpi Package Error

Hi all,

Sort of still new to R. I'm trying to use the modelTest() function from the rcarbon package to run a simulation. When I get to this function, I receive the following error:

 Loading required namespace: Rmpi 
Failed with error:  ‘.onLoad failed in loadNamespace() for 'Rmpi', details: 
 call: inDL(x, as.logical(local), as.logical(now), ...) 
  error: unable to load shared object 'C:/Users/emikr/AppData/Local/R/win-library/4.2/Rmpi/libs/x64/Rmpi.dll':   
LoadLibrary failure:  The specified module could not be found.
 ’ 
Error in makeMPIcluster(spec, ...) :  
  the `Rmpi' package is needed for MPI clusters. 

I didn't load the Rmpi package in my session, and I've never come across this problem before when running my code with this package. I've tried reinstalling Rmpi and restarting my R session, but I'm not sure what the problem could be.

If anyone has any advice, please let me know!

4 Comments
2024/05/01
04:12 UTC

2

How to add weightings to variables in R

Hello,

I'm wondering how I can add weightings to variables in R. I have individual bird IDs, the number of birds each bird has interacted with, and its associated proportion of wins. I want to weight each bird's wins by the number of birds it interacted with. For example, bird LB/M/LG-S has won about 87% of its interactions, but it has only interacted with 4 other birds, so it should have a lower weighting. Bird DB/Y/LG-S, on the other hand, has interacted with 10 other birds and won only 13% of its interactions, so it should have a higher weighting. What would be the best code for adding these weightings? Thank you. Here is the data structure:

```

structure(list(BirdA_ID = c("DB/M/DB-S", "DB/M/Y-S", "DB/M-W/S", "DB/Y/LG-S", "Gold/LG-LG", "Gold/W-LB", "LB/DB/LB-LB", "LB/M/LG-S", "LB/O-B", "LB/R/LB-R"), Number_birds_interacted_with = c(18, 9, 4, 10, 2, 8, 10, 4, 3, 1), Average_prop_wins = c(0.327342047930283, 0.666666666666667, 0, 0.133333333333333, 0, 0.125, 0.761111111111111, 0.875, 0.333333333333333, 0)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

```

Also here is the table version:

https://preview.redd.it/mj3uv6a5vpxc1.png?width=777&format=png&auto=webp&s=1ad729b87787cef5c1bc1b273c00582e6e47cfe2

Thanks very much!
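One option is to use the number of birds each bird interacted with as its weight. A sketch using the values from the post: `weighted.mean()` gives an overall win proportion in which well-sampled birds count more, and a simple shrinkage score (an assumed approach, not the only choice) pulls birds with few interactions toward that overall mean. The prior strength `k = 5` is a hypothetical tuning value.

```r
birds <- data.frame(
  BirdA_ID = c("DB/M/DB-S", "DB/M/Y-S", "DB/M-W/S", "DB/Y/LG-S", "Gold/LG-LG",
               "Gold/W-LB", "LB/DB/LB-LB", "LB/M/LG-S", "LB/O-B", "LB/R/LB-R"),
  Number_birds_interacted_with = c(18, 9, 4, 10, 2, 8, 10, 4, 3, 1),
  Average_prop_wins = c(0.327342047930283, 0.666666666666667, 0, 0.133333333333333,
                        0, 0.125, 0.761111111111111, 0.875, 0.333333333333333, 0)
)

n <- birds$Number_birds_interacted_with
p <- birds$Average_prop_wins

# Relative weight of each bird (sums to 1)
birds$weight <- n / sum(n)

# Overall win proportion, weighted by number of birds interacted with
overall <- weighted.mean(p, w = n)

# Shrinkage: birds with few interactions move further toward the overall mean
k <- 5  # hypothetical prior strength, in units of "birds interacted with"
birds$weighted_wins <- (n * p + k * overall) / (n + k)
```

Many modelling functions also accept such weights directly, e.g. `lm(..., weights = Number_birds_interacted_with)`.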

6 Comments
2024/05/01
01:31 UTC

4

A data mining work on a chess database!

Hi to everyone

As a project to finish my degree in statistics, I'm applying data mining techniques to a chess database. I have more than 500,000 chess games with variables for the number of turns, Elo ratings, and how many times each piece has been moved (for example, B_K_moves is how many times Black has moved the King).

The problem is that I'm supposed to build the decision tree with all the steps, but the tree only has a depth of 3 nodes. This is the tree, and I'm supposed to do steps like the pruning, but it's very simple and I don't know why the algorithm doesn't use variables like W_Q_moves (how many times White has moved the Queen) or B_R_moves (how many times Black has moved a rook).

This is the code I've used with the caret package in R (the three Result classes, Derrota, Empate and Victoria, are Spanish for loss, draw and win):

control <- trainControl(method = "cv", number = 10)
modelo <- train(Result ~ ., data = dataset, method = "rpart", trControl = control)
print(modelo)
## CART
##
## 212282 samples
## 15 predictor
## 3 classes: ’Derrota’, ’Empate’, ’Victoria’
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 191054, 191054, 191054, 191054, 191053, 191054, ...
## Resampling results across tuning parameters:
##
## cp Accuracy Kappa
## 0.01444892 0.6166044 0.2417333
## 0.02930692 0.5885474 0.1931878
## 0.13442808 0.5668073 0.1448201
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.01444892

And the code to plot the tree:

library(rpart.plot)
## Loading required package: rpart
rpart.plot(modelo$finalModel, yesno = 2, type = 0, extra = 1)

As I said, I don't know why the depth is so small, and I don't know what to change in the code to make the tree deeper.
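For `method = "rpart"`, caret's `train()` tunes only the complexity parameter `cp` and by default tries just three values (`tuneLength = 3`), which matches the three rows in the output. All three candidates are fairly large, so even the best one prunes the tree heavily. A smaller `cp` (and, if needed, a smaller `minsplit`) lets rpart grow deeper. A sketch on the built-in iris data as a stand-in for the chess games:

```r
library(rpart)  # recommended package, ships with standard R installations

# Stand-in data: iris instead of the chess games
shallow <- rpart(Species ~ ., data = iris,
                 control = rpart.control(cp = 0.1))
deep    <- rpart(Species ~ ., data = iris,
                 control = rpart.control(cp = 0.001, minsplit = 5))

# Number of nodes in each tree
c(shallow = nrow(shallow$frame), deep = nrow(deep$frame))

# caret equivalent (not run here): supply your own cp grid instead of the default 3 values
# modelo <- caret::train(Result ~ ., data = dataset, method = "rpart",
#                        trControl = control,
#                        tuneGrid = data.frame(cp = c(0.0005, 0.001, 0.005, 0.01)))
```

Because cross-validation picks the cp with the best accuracy, a deeper tree is only chosen if it actually predicts better; pruning can then be shown with `printcp()` and `prune()` on the grown tree.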

3 Comments
2024/04/30
23:51 UTC

3

Data wrangling

I am trying to collect data on 50 countries for a 20 year period to put in panel form to visualize and run a few regressions. Each variable I collected has a sheet with the years as columns and the countries as the rows. How do I combine 10+ variables, each having their own .csv, in a way that I can create visuals and perform GMM estimations? I have done some basic stuff in R, but mainly use stata. I normally would read a book or take a free online course, but this is due in a week, so I’m short on time.
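A base-R sketch of one approach, assuming each CSV has a `country` column followed by one column per year (the column names and data here are hypothetical): reshape each sheet to long format (country, year, value), then merge them all on country and year to get a panel. When reading the real files, `read.csv(..., check.names = FALSE)` keeps year headers such as `2001` instead of renaming them `X2001`.

```r
# Reshape one wide sheet (countries x years) into long panel format
to_long <- function(wide, value_name) {
  years <- setdiff(names(wide), "country")
  long <- data.frame(
    country = rep(wide$country, times = length(years)),
    year    = rep(as.integer(years), each = nrow(wide)),
    value   = unlist(wide[years], use.names = FALSE)
  )
  names(long)[3] <- value_name
  long
}

# Hypothetical stand-ins for two of the CSV files
gdp <- data.frame(country = c("A", "B"), `2001` = c(1.0, 2.0), `2002` = c(1.1, 2.2),
                  check.names = FALSE)
inflation <- data.frame(country = c("A", "B"), `2001` = c(3, 5), `2002` = c(4, 6),
                        check.names = FALSE)

# With real files, the list could come from something like:
# files  <- list.files("data", pattern = "\\.csv$", full.names = TRUE)
# sheets <- setNames(lapply(files, read.csv, check.names = FALSE),
#                    tools::file_path_sans_ext(basename(files)))
sheets <- list(gdp = gdp, inflation = inflation)

long_list <- Map(to_long, sheets, names(sheets))
panel <- Reduce(function(a, b) merge(a, b, by = c("country", "year")), long_list)
panel <- panel[order(panel$country, panel$year), ]
```

The resulting `panel` has one row per country-year, which is the layout panel estimators such as the plm package's `pgmm()` expect.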

2 Comments
2024/04/30
22:35 UTC

2

Help: R Encountered a Fatal Error and won’t start.

Help! I’m new to using R and otherwise not a huge tech person. I was running a read.csv() line when RStudio crashed, and it hasn’t worked since. The R terminal still works.

I have a Mac.

I have since then:

  • updated R and RStudio to the latest versions
  • deleted the file that was running when it first crashed
  • restarted my computer
  • deleted and redownloaded RStudio
  • looked at everything I can find online
  • tried to delete .RData but can’t find it anywhere.

Help!
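Files whose names start with a dot, such as `.RData` and `.Rhistory`, are hidden in Finder by default (Cmd+Shift+. toggles them). Since the R terminal still works, a sketch run from there can locate them; searching the home directory and the current working directory is an assumption about where they are likely to be, and `find_r_hidden()` is a hypothetical helper.

```r
# Hypothetical helper: look for hidden R workspace/history files in likely locations
find_r_hidden <- function(dirs = c(path.expand("~"), getwd())) {
  hits <- unlist(lapply(unique(dirs), function(d)
    list.files(d, pattern = "^\\.(RData|Rhistory)$", all.files = TRUE,
               full.names = TRUE)))
  unique(hits)
}

found <- find_r_hidden()
print(found)

# To move a workspace aside rather than delete it (uncomment to run):
# rdata <- grep("\\.RData$", found, value = TRUE)
# file.rename(rdata, paste0(rdata, ".bak"))
```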

2 Comments
2024/04/30
20:35 UTC

1

Too many significant ACFs

Hi guys, I've got a fairly simple issue here. I am attempting to produce an ACF and PACF in order to find a suitable ARMA model. The data had a trend, so I differenced it. 35 of my ACF lags are significant, while 3 of my PACF lags are significant, giving me an ARMA(35,3). The ARIMA function gives me (2,0,1). Clearly there is a major contradiction, and something seems wrong to me. Can you tell me whether this is possible or likely, and if not, suggest the main areas where I may have gone wrong? Feel free to ask questions to get more info. Thanks for reading.
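The number of significant ACF or PACF spikes is not the model order: a single autoregressive term with a large coefficient already produces a long run of significant autocorrelations, because its ACF decays geometrically rather than cutting off. A sketch on a simulated AR(1) series (phi = 0.9 is an arbitrary choice):

```r
set.seed(1)
n <- 1000
y <- arima.sim(model = list(ar = 0.9), n = n)

bound  <- qnorm(0.975) / sqrt(n)                        # approximate 95% significance band
r_acf  <- acf(y,  lag.max = 30, plot = FALSE)$acf[-1]   # drop lag 0
r_pacf <- pacf(y, lag.max = 30, plot = FALSE)$acf       # pacf() starts at lag 1

c(significant_acf  = sum(abs(r_acf)  > bound),
  significant_pacf = sum(abs(r_pacf) > bound))

# A one-parameter model fits this series despite the many significant ACF lags
fit <- arima(y, order = c(1, 0, 0))
```

Comparing candidate `arima()` fits with `AIC()` is generally a more reliable guide than counting spikes, and an ACF that decays very slowly can also be a sign that the series is still not stationary.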

5 Comments
2024/04/30
16:53 UTC

1

I can't get the expected barplot

2 Comments
2024/04/30
10:32 UTC

2

Openxlsx2 help

Hi all,

TLDR: The Excel table isn’t expanding when new data is appended from RStudio. How can I fix this?

I recently started building a simple Excel report for my parents after painfully watching how they manage the data for their business. I'm now trying to set up automations so they no longer have to manually download what they need bit by bit. This led me to write a script that automatically takes the new raw data, cleans it, and appends it to the table in the report I made.

After hours of failed attempts (the original openxlsx package kept corrupting the file because the table had slicers attached to it), I finally got the Excel file to update with the slicers in place using the newer openxlsx2. However, the table now does not automatically expand to the rows below, so my newly appended rows are not part of the table. I know I could easily go in and fix that, or even make the table huge beforehand, but I want this to be as hands-free as possible. My parents can be technologically challenged, so I don’t want them to have to do anything other than click the slicers to see the summary statistics they filter on.

Question: how do I append new data from R to Excel while also expanding the Excel table to include the new rows?

Thanks in advance for any help!

Edit: screenshot posted.

https://preview.redd.it/96gxdrchzlxc1.png?width=1217&format=png&auto=webp&s=f8b62b611271fcdbbe802e72958bba91f7854876

13 Comments
2024/04/30
01:56 UTC

17

Need feedback on ChatGPT + R Studio

Feel free to remove if not the right place.

My name is Rahul and I am building a new tool that's kind of like ChatGPT meets R Studio, to help researchers and data scientists save time. Kind of like Code Interpreter but for R.
This is not an ad so I won't share the name of the tool. Mainly I'm looking for feedback on how I should test it.

I am a Python user and not experienced in R, so I would love to know:

  1. What are certain things I should make sure work in this? Any libraries, types of analysis or R functionality that I should make sure work?

  2. In Python, I think the most popular libraries for statistical analysis and visualization are NumPy, pandas, Matplotlib, scikit-learn, Seaborn and statsmodels. What are the most popular equivalent R libraries that I should make sure are supported?

Thanks in advance 🙏

Here's a screenshot of how it currently works:

https://preview.redd.it/7wp3n3flnhxc1.png?width=1608&format=png&auto=webp&s=417da41775b4400b2188ed075302e32f61038b72

18 Comments
2024/04/29
21:51 UTC

2

Working with date times

For anyone that is interested in a coding problem I could use some help. I don't have a lot of experience with date time data types. Here is my question on stackoverflow.

https://stackoverflow.com/questions/78405258/working-with-date-times-to-average-values-across-records-in-r

5 Comments
2024/04/29
21:47 UTC

2

ggplot axis dates not formatted as obvious dates

Using tidyverse package. My x-axis is showing dates a little funky. Not sure if it's possible to a) get them to show as dates and b) get them in order. (Random figure visual tips also appreciated.) Visuals provided towards the bottom. I apologize in advance if this doesn't seem explained very well.

Current state of my figure

How my dataframe is structured. There are 35 rows of each date, one for each school. Dates go from a specific day in May 2023 to a specific day in April 2024.

Original way I formatted dates in pre-pivoted dataframe before entering data:

#create dataframe of dates
med.school.cycle <- data.frame(Date = seq.Date(from = as.Date("2023/05/30"), to = as.Date("2024/04/03"), by = "day"))  # named Date to match the pivot and plot code below

The block where I actually try making the figure:

#create horizontal bar chart

#pivot original dataframe
med.school.cycle.2 <- med.school.cycle
med.school.cycle.2 <- med.school.cycle.2 %>% pivot_longer(cols = c(-"Date"),
                    names_to = "School",
                    values_to = "Status")

#change order of stacked bar chart fills
med.school.cycle.2$Status <- fct_relevel(med.school.cycle.2$Status, "AMCAS Submitted", "AMCAS Processed", "Secondary Received", "Secondary Submitted", "Interview Invite", "Interview Attended", "Waitlist", "Accepted", "Rejected")

#oops make it backwards
med.school.cycle.2$Status <- fct_rev(med.school.cycle.2$Status)

View(med.school.cycle.2)

ggplot(med.school.cycle.2, aes(x = School, y = Date, fill = Status)) + 
  geom_bar(stat = "identity") +
  theme_bw() +
  theme(#axis.text.y = element_blank(),
        axis.text.y = element_text(hjust = 0,
                                   margin = margin(r = -445),
                                   color = "black"),
        axis.title.y = element_text(margin = margin(r = 50)),
        axis.ticks.y = element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank()) +
    coord_flip() +
  scale_x_discrete(limits=rev) +
  scale_fill_manual(values=c("#c41616", "#09b520", "#eded0c", "#f5c07a", "#f08a05", "#74abf2","#073d85", "#d4d2d2", "#8a8686"), guide = guide_legend(reverse = TRUE))
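Stacking `Date` values with `geom_bar(stat = "identity")` adds the dates together as numbers within each school, which is likely why the axis looks odd. If the goal is a timeline of each school's status, one alternative (a sketch, not the only option) is to keep `Date` as a Date column and draw one tile per school per day; a date scale then formats and orders the axis automatically, and `coord_flip()` is no longer needed. The small data frame below is hypothetical.

```r
# Hypothetical long-format data: one row per school per day
dates <- seq.Date(as.Date("2023-05-30"), as.Date("2023-06-05"), by = "day")
cycle_long <- data.frame(
  Date   = rep(dates, times = 2),
  School = rep(c("School A", "School B"), each = length(dates)),
  Status = c(rep("AMCAS Submitted", 3), rep("Secondary Received", 4),
             rep("AMCAS Submitted", 5), rep("Rejected", 2))
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cycle_long, aes(x = Date, y = School, fill = Status)) +
    geom_tile() +                                         # one tile per school per day
    scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
    theme_bw()
  print(p)
}
```

The `fct_relevel()`/`fct_rev()` ordering of `Status` and the `scale_fill_manual()` colors from the post can be added to this plot unchanged.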
5 Comments
2024/04/29
20:46 UTC

48

R vulnerability discovered involving "deserialization of untrusted data". Upgrading to the recently released R 4.4.0 is recommended.
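A quick check of whether the running R is at least the 4.4.0 release the post recommends:

```r
# TRUE if this R session is version 4.4.0 or newer
is_patched <- getRversion() >= "4.4.0"
if (!is_patched) {
  message("R ", getRversion(), " is older than 4.4.0; consider upgrading, ",
          "and avoid loading .rds/.RData files from untrusted sources in the meantime.")
}
```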

3 Comments
2024/04/29
19:23 UTC

0

Very lost with R time series regression

OK, so I’m trying to use R to perform a time series regression analysis (to see whether there are specific years with a spike in abortion travel from specific states), but I am super lost and confused. At this point I don’t even know what I’m doing wrong, and I’m probably overlooking really obvious reasons why it’s not working. I’m wondering if my data is organized incorrectly (not in an optimal format). Any suggestions or advice on how to perform this test with the data I have?

This is the link to the data. It has data from 2012 to 2021 with abortion counts by state of residence and state of service; the year column is BI (further in). I’m trying to see whether specific states of residence have more people traveling for care during certain years, but I am very new to R and am lost :(

Data Link
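A first step that usually helps is reshaping the data so there is one row per state of residence per year, with the share of people who traveled to another state for care. A base-R sketch, assuming hypothetical column names `residence_state`, `service_state`, `year` and `count`, and placeholder states "A", "B", "C" with made-up counts (the real sheet's layout isn't shown):

```r
# Hypothetical stand-in: counts by state of residence, state of service and year
ab <- data.frame(
  residence_state = c("A", "A", "A", "A", "B", "B"),
  service_state   = c("A", "C", "A", "C", "B", "C"),
  year            = c(2020, 2020, 2021, 2021, 2021, 2021),
  count           = c(900, 100, 700, 300, 500, 50)
)

# Flag out-of-state care, then total by residence state and year
ab$traveled <- ab$residence_state != ab$service_state
totals <- aggregate(count ~ residence_state + year, data = ab, FUN = sum)
out    <- aggregate(count ~ residence_state + year, data = ab[ab$traveled, ], FUN = sum)
names(out)[3] <- "traveled"

# Combine; states with no out-of-state travel in a year get 0
by_year <- merge(totals, out, by = c("residence_state", "year"), all.x = TRUE)
by_year$traveled[is.na(by_year$traveled)] <- 0
by_year$share_traveled <- by_year$traveled / by_year$count
```

Plotting `share_traveled` against `year` for each state is a natural first look before choosing a formal time-series model.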

1 Comment
2024/04/29
16:29 UTC

1

Data simulation

Hello,

I am performing a data simulation of around 1000 iterations. However, I cannot run the simulation in one go because I run out of memory (RAM). Instead, my supervisor advised me to run the simulation in parts, for example 250 iterations each.

In his words:

"you can simply store the results in a workspace, then after 1000 you can combine the results by opening all the workspaces and then finalise computations. Remember to give the objects different names, otherwise, you cannot combine them."

Can anyone provide me with an outline of how to do this? Like I do not need the specific code or anything, but just some direction on how to do it. If you need any additional information just ask me.

Thank you so much.
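A sketch of the workflow the supervisor describes, swapping whole workspaces for `saveRDS()`/`readRDS()`: each chunk is saved as a single object in its own file, so object names cannot clash when the results are combined. `one_iteration()` is a hypothetical placeholder for one replicate of your simulation, and the output folder is a temporary one for illustration.

```r
# Hypothetical placeholder for one replicate of the simulation
one_iteration <- function(i) {
  x <- rnorm(50)
  data.frame(iter = i, mean = mean(x), sd = sd(x))
}

out_dir <- file.path(tempdir(), "sim_chunks")  # use a real folder in practice
dir.create(out_dir, showWarnings = FALSE)

run_chunk <- function(chunk, size = 250) {
  set.seed(1000 + chunk)                        # reproducible, different per chunk
  iters <- ((chunk - 1) * size + 1):(chunk * size)
  res <- do.call(rbind, lapply(iters, one_iteration))
  saveRDS(res, file.path(out_dir, sprintf("chunk_%02d.rds", chunk)))
  invisible(NULL)
}

# All four chunks run in a loop here for demonstration. With limited memory,
# run run_chunk(1), restart R, run run_chunk(2), and so on.
for (chunk in 1:4) run_chunk(chunk)

# Afterwards: read all chunks back and combine them
files <- list.files(out_dir, pattern = "^chunk_\\d+\\.rds$", full.names = TRUE)
all_results <- do.call(rbind, lapply(files, readRDS))
```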

11 Comments
2024/04/29
14:36 UTC

2

What are the required parameters for Bair's supervised PCA method?

Hello, I'm currently working on an exam project. I have a dataset of medical readings, and I'm trying to use SPCA to determine which factors might be related to a higher level of glycated hemoglobin. I'm having trouble understanding the function's documentation and was wondering if anyone could explain some of the arguments to me. I'm following the documented example, but to no avail.

Documentation for the function superpc.train (package superpc)

Current attempt (after performing KNN imputation on "meddataforpca")

Error message

Documented example
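In the superpc package, `superpc.train()` takes `data` as a list with elements `x`, `y` and `featurenames`, and, as in the package's own example, `x` has features in rows and samples in columns: the transpose of a typical data frame, which is an easy thing to get wrong. A sketch with hypothetical data standing in for `meddataforpca`; `type = "regression"` is the option for a continuous outcome such as glycated hemoglobin (`"survival"` additionally needs censoring information).

```r
set.seed(2024)
# Hypothetical stand-in for "meddataforpca": 100 patients (rows) x 8 readings (columns)
med <- matrix(rnorm(100 * 8), nrow = 100,
              dimnames = list(NULL, paste0("reading_", 1:8)))
hba1c <- 5.5 + 0.8 * med[, "reading_1"] + rnorm(100, sd = 0.3)  # hypothetical outcome

# superpc expects features in ROWS and samples in COLUMNS, so transpose
spc_data <- list(x = t(med), y = hba1c, featurenames = colnames(med))

if (requireNamespace("superpc", quietly = TRUE)) {
  fit <- superpc::superpc.train(spc_data, type = "regression")
  # Next step in the package workflow: superpc::superpc.cv(fit, spc_data) to choose a threshold
}
```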

2 Comments
2024/04/29
14:13 UTC

3

Calculating Hedges' g

I am working on a meta-analysis and need to calculate effect sizes. I have all my data (mean, SD, and n) extracted into a CSV file. Now that I have this, how can I have R calculate the effect size for each independent variable I'm looking at? I've been watching YouTube videos and googling all week with no luck.
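Hedges' g is the standardized mean difference (Cohen's d using the pooled SD) multiplied by the small-sample correction J = 1 - 3 / (4(n1 + n2) - 9). A vectorized base-R sketch, assuming the CSV has one row per comparison with hypothetical columns `m1`, `sd1`, `n1`, `m2`, `sd2`, `n2` (group 1 vs group 2):

```r
# Hedges' g for each row: standardized mean difference with small-sample correction
hedges_g <- function(m1, sd1, n1, m2, sd2, n2) {
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))  # pooled SD
  d  <- (m1 - m2) / sp                                                # Cohen's d
  J  <- 1 - 3 / (4 * (n1 + n2) - 9)                                   # correction factor
  d * J
}

# Hypothetical stand-in for the CSV (in practice: studies <- read.csv("your_file.csv"))
studies <- data.frame(
  study = c("S1", "S2"),
  m1 = c(10, 5.2), sd1 = c(2, 1.1), n1 = c(20, 35),
  m2 = c(8, 4.9),  sd2 = c(2, 1.3), n2 = c(20, 30)
)

studies$g <- with(studies, hedges_g(m1, sd1, n1, m2, sd2, n2))
```

If the metafor package is available, `escalc(measure = "SMD", m1i = m1, sd1i = sd1, n1i = n1, m2i = m2, sd2i = sd2, n2i = n2, data = studies)` returns nearly identical bias-corrected effect sizes (as `yi`) together with their sampling variances (`vi`), which the meta-analysis itself will need.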

6 Comments
2024/04/28
22:35 UTC
