/r/RStudio

A place for users of R and RStudio to exchange tips and knowledge about the various applications of R and RStudio in any discipline.

Please use this as a forum to discuss R, and learn more about it. If you have any questions about how to do specific things in R, this is the place to ask. If you are looking for more advanced help using R, please visit /r/Rstats.

You can download R itself here.

You can download RStudio here. It is an incredibly powerful IDE for R, and what the mods recommend you use.

NOTE: Due to a couple of recent posts offering "compensation" for help with an assignment, let's make this official: you are not allowed to offer payment for help with an assignment. If you want help with an assignment, please post the work you've completed so far and highlight the issue you are having. Members will then help where they can. If you want to pay someone for tutoring in R, this is not the place to look for it.

35,751 Subscribers

1

Having trouble downloading packages

2 Comments
2024/12/04
02:24 UTC

2

Consolidating two different files that contain similar people and data

I am trying to organize two sets of data into one file.

https://preview.redd.it/rzdwjuo7jo4e1.png?width=1138&format=png&auto=webp&s=06b78e123a9071723a661f215d966a1a65790413

Here I have a group of people with some basic info about them.

https://preview.redd.it/togqsedcjo4e1.png?width=1126&format=png&auto=webp&s=e64d83d261d5dd618974d4d0225d1b2aa063cc8c

Then I have a second file with overlapping information, but not for everyone in the original list.

https://preview.redd.it/2z34u6vgjo4e1.png?width=1242&format=png&auto=webp&s=15c1032397fe34f53677decd307cea0740bfb891

And I want it to look something like this. Here I just did it manually for my example, but for my actual data, I have over a hundred names and would like to use R to organize it into one table.

I tried googling "sort dataframe by column r", "r sort row based on column value", and "r two columns same value align row", but I was not able to find an example that matched what I was asking.

Any help is appreciated!
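
One possible approach, sketched with made-up data since the screenshots are not reproduced here: a left join with base R's `merge()`. The column names (`name`, `age`, `score`) are assumptions for illustration only. `all.x = TRUE` keeps everyone from the first table and fills in `NA` where the second table has no match.

```r
# Hypothetical stand-ins for the two files; column names are assumptions.
people <- data.frame(name = c("Ann", "Bob", "Cara", "Dev"),
                     age  = c(31, 45, 28, 39))
extra  <- data.frame(name  = c("Cara", "Ann"),
                     score = c(88, 92))

# Left join on the shared key: every row of `people` is kept,
# and rows with no match in `extra` get NA in the new column.
combined <- merge(people, extra, by = "name", all.x = TRUE)
print(combined)
```

With real files, each table would first be read in (for example with `read.csv()`), and `by` would name whichever column identifies a person in both.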

4 Comments
2024/12/03
18:56 UTC

3

Trouble with lme4 and Matrix (and Rtools?)

Newbie on R here. I have to make some geostatistical plots in R, and for that I need the lme4 and Matrix packages. When I run my code, I get the error message

function 'cholmod_factor_ldetA' not provided by package 'Matrix'

From some googling the issue seems to be that I need to install a binary version of Matrix. However, when I try, I get the warning

WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

Except, I already have Rtools installed (4.3, my version of R is 4.3.2 and RStudio 2023.12.0). From other answers online it seems to be a path issue but I don't know how to solve it. Also I'm working on a company laptop and I don't have the privileges to install and uninstall software.

Any help is appreciated!

2 Comments
2024/12/03
12:01 UTC

2

Reorder column with geom_col

Hello, I plan to make a histogram with 2 columns per condition. However, the order of the columns varies depending on which value is lower. I would like the 5 min column to always be on the left.

I tried a function called "fct_reorder" but it didn't work. My data is ordered with all the 5-minute rows first, then all the 90-minute rows.

Here is a screenshot of my code as well as my graph. I really hope someone can help me.

https://preview.redd.it/gxcqpe9i9i4e1.png?width=2400&format=png&auto=webp&s=1fcca3fe63a9966686d8ee926157089eccd4947b

https://preview.redd.it/x1dciv4j9i4e1.png?width=1642&format=png&auto=webp&s=55a5f2e1fa3c13fe9c23c0846f9151baee21b3bd
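
A minimal sketch of one way to pin the bar order, assuming the time variable is mapped to `fill` and the bars are dodged. The data and the column names `condition`, `time`, and `value` are made up, since the code in the screenshot is not reproduced here. `fct_reorder()` sorts levels by another variable's values; setting the levels by hand with `factor()` fixes the order regardless of bar height.

```r
# Made-up data: column names and values are assumptions, not the poster's.
df <- data.frame(
  condition = rep(c("A", "B", "C"), times = 2),
  time      = rep(c("5 min", "90 min"), each = 3),
  value     = c(12, 30, 8, 20, 15, 25)
)

# Set the level order explicitly so "5 min" always comes first,
# no matter which of the two bars is taller.
df$time <- factor(df$time, levels = c("5 min", "90 min"))

# Dodged bars are drawn in factor-level order within each condition.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x = condition, y = value, fill = time)) +
    ggplot2::geom_col(position = "dodge")
  print(p)
}
```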

3 Comments
2024/12/02
21:46 UTC

1

Help needed: Interpreting fixed effects model with counterintuitive results in panel data analysis

Hello everyone, I am currently having a minor crisis over my methods class, so please bear with me if all of these questions are really stupid.

I'm working on a panel data analysis for my research project, and I'm running into some issues interpreting my results. My study examines how institutional quality (QoG) affects voter turnout, with a particular interest in whether ethnic fractionalization moderates this relationship.

Model and Data: I'm using the standard time-series dataset from QoG

Dependent variable: Voter turnout (percentage).

Independent variable: QoG (institutional quality).

Moderator: Ethnic fractionalization.

Interacted term: QoG × Ethnic fractionalization.

Panel structure: Unbalanced panel of 125 countries from 2000–2019 (n=585).

Problems I'm facing:

Unexpected direction of QoG's effect:

In my two-way fixed effects model (model = "within"), the direct effect of QoG on voter turnout is negative and not consistently significant. This contradicts theory and the positive relationship I observed in my earlier OLS models. I understand that fixed effects models only capture within-country variation over time, and this might explain some of the difference, but it’s still puzzling. Could it be that QoG doesn't vary enough within countries over time, or is there something else I might be missing?

Low explanatory power:

The R-squared values in my fixed effects models are incredibly and hilariously low (around 1%), which makes me question whether I'm even modeling this relationship correctly. I fully understand that a single variable like QoG (and even its interaction with ethnic fractionalization) isn't going to explain all of the variation in voter turnout, but I'm wondering if I'm expected to include control variables in a fixed effects framework? I’ve read that fixed effects already account for unobserved heterogeneity, so including controls might be redundant, but at the same time, I feel like my model is missing something crucial.

Interpreting the interaction term:

The interaction term (QoG × Ethnic Fractionalization) is positive and significant, but its interpretation is confusing in the context of the negative direct effect of QoG. If the main effect of QoG is negative, does it make sense that the interaction term suggests the effect of QoG becomes more positive as ethnic fractionalization increases? I might be overthinking it, but I’m struggling to make theoretical sense of this.

Multicollinearity concerns:

I’m also worried about multicollinearity between QoG, Ethnic Fractionalization, and the interaction term. Should I center my variables before creating the interaction to reduce multicollinearity? Or is the observed multicollinearity just something inherent to interaction models and something I need to accept?
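
On the centering question, a small simulation can show what mean-centering does to the correlation between a predictor and its interaction term. This is only a sketch with simulated values (not the QoG data); the names `qog` and `ethnic` are stand-ins. Centering changes how the main-effect coefficients are read, but it does not change the interaction coefficient or the overall fit.

```r
set.seed(1)

# Simulated stand-ins for the two predictors (values are made up).
qog    <- runif(200, min = 0, max = 1)
ethnic <- runif(200, min = 0, max = 1)

# Interaction built from the raw variables.
raw_inter <- qog * ethnic

# Interaction built from mean-centered variables.
qog_c     <- qog - mean(qog)
ethnic_c  <- ethnic - mean(ethnic)
cen_inter <- qog_c * ethnic_c

# Correlation between a main effect and the interaction term.
cat("raw:     ", cor(qog, raw_inter), "\n")
cat("centered:", cor(qog_c, cen_inter), "\n")
```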

I know something is seriously wrong with my approach, and I’m open to any and all suggestions to fix or reframe this. Thank you so much for your patience and time—I genuinely appreciate any insights you can provide.

3 Comments
2024/12/02
21:02 UTC

1 Comment
2024/12/02
21:01 UTC

2

What is the minimum system requirement for a TEFL student?

I really don't know what my professor is going to teach us with R. But it has something to do with coding for research or whatever a TEFL student needs to get his M.A degree.

I appreciate any suggestions you might have concerning the laptop I need to buy.

2 Comments
2024/12/02
17:36 UTC

0

I have my practice exam for RStudio on Wednesday - any tips?

What are some of your best tips on using R? We can use our notes so I’ll be writing your advice down :)

4 Comments
2024/12/02
15:38 UTC

2

debugging with knit

My team runs knitted code. When there's a problem and I need to debug, I can't find the environment variables, so I have to re-run everything chunk by chunk. Is there a way to access the specific variables from my team's knit execution?

3 Comments
2024/12/02
11:24 UTC

3

Allowing for duplicate files when copying images

I want to copy 4 files from different file paths into one folder in RStudio. However, some of these files share the same name. My destination folder is

"E:/Masterdirectory/2023/VALIDERING/duplicate"

My file paths are a list of where the images are stored:

[1] "E:/Masterdirectory/2023/RENAMED/CAD12_RENAMED/101RECNX/101RECNX__2023-05-12__17-05-00(1).JPG"
[2] "E:/Masterdirectory/2023/RENAMED/CAD15_RENAMED/100RECNX/100RECNX__2023-04-23__10-10-00(1).JPG"
[3] "E:/Masterdirectory/2023/RENAMED/CAD16_RENAMED/100RECNX/100RECNX__2023-04-23__10-10-00(1).JPG"
[4] "E:/Masterdirectory/2023/RENAMED/CAD17_RENAMED/101RECNX/101RECNX__2023-05-12__17-05-00(1).JPG"

I have tried the loop below, but the duplicates get dropped and I'm stuck with 2 images instead of 4. The files have the same names because they were previously renamed with their timestamps.

for (file in file_paths) {
  file.copy(file, destination, overwrite = FALSE)
}

How do I allow for duplicates in RStudio when copying images?
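
A folder cannot hold two files with the same name, and with `overwrite = FALSE` the second copy of each name is skipped. One possible workaround, as a sketch: give each copy a unique name before copying. Prefixing the camera folder (two levels up from the file) is an assumption about what would make a sensible name; any unique prefix would do.

```r
# Paths and destination as given in the post.
file_paths <- c(
  "E:/Masterdirectory/2023/RENAMED/CAD12_RENAMED/101RECNX/101RECNX__2023-05-12__17-05-00(1).JPG",
  "E:/Masterdirectory/2023/RENAMED/CAD15_RENAMED/100RECNX/100RECNX__2023-04-23__10-10-00(1).JPG",
  "E:/Masterdirectory/2023/RENAMED/CAD16_RENAMED/100RECNX/100RECNX__2023-04-23__10-10-00(1).JPG",
  "E:/Masterdirectory/2023/RENAMED/CAD17_RENAMED/101RECNX/101RECNX__2023-05-12__17-05-00(1).JPG"
)
destination <- "E:/Masterdirectory/2023/VALIDERING/duplicate"

# Prefix each file name with its camera folder (two levels up),
# e.g. "CAD15_RENAMED_100RECNX__2023-04-23__10-10-00(1).JPG".
camera    <- basename(dirname(dirname(file_paths)))
new_names <- paste(camera, basename(file_paths), sep = "_")
targets   <- file.path(destination, new_names)

stopifnot(!anyDuplicated(targets))  # every target name is now unique

# file.copy() is vectorised over `from` and `to`; it returns FALSE
# for any file it could not copy (for example, a missing source).
copied <- file.copy(file_paths, targets, overwrite = FALSE)
```

Checking `copied` afterwards shows which files actually made it across.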

8 Comments
2024/12/02
11:09 UTC

1

Raster Issues Again

https://preview.redd.it/11lxnchavd4e1.png?width=1306&format=png&auto=webp&s=9e7362607f4d381ca805a3638448e67b0f453180

I'm using the USDA cropland data. I’m trying to show the change in land use for summer vegetables between 2008 and 2023. I tried to plot them together in one chunk, but R kept crashing. I’m now just trying to plot one, and the chunk just stays loading; the plot never appears. I’m still learning. Is there anything glaringly wrong? Is it just my computer? Is there a much better way to do this?

2 Comments
2024/12/02
07:00 UTC

0

Research Paper with RStudio

Hey all. I am a poli sci major and I have a research paper due in a week, worth 20% of my grade, using RStudio. I have to upload my data from GSS Explorer into the software, analyze it, and then write my paper on the data. I am completely lost even after watching tutorial videos all day. Is there a way to get ChatGPT to do all the analyzing, etc.? As interesting as RStudio looks, I just need to get this done as soon as possible.

2 Comments
2024/12/02
05:44 UTC

2

Using WorldValuesSurvey to code religiosity in correlation with vote choice

I am trying to code a religiosity index in RStudio using WorldValuesSurvey. I am new to R, so I am unsure if I should pick a new project if this seems too difficult (please tell me if you think I should). Anyway, I've selected five or six questions that represent the importance of religion and if they play an active role in it. For a question like "How important is religion to you?" I want to code "rarely important" and "not at all" as 0 (not religious) and all the rest as 1 (religious). I try to do this and my whole Excel sheet turns into a jumble of random numbers. How would I best code this and use this data in correlation with their identified vote choice (like what would be the best way to build a regression graph)?

Again, this is sounding a little out of my league now that I write it out, so I might choose to drop this project.
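
The recoding step itself can be sketched in a few lines of base R. The response labels below are assumptions modelled on the post; the actual WVS coding may differ, and the real variable would come from the imported data rather than being typed in.

```r
# Made-up responses; the real WVS labels and codes may differ.
importance <- c("Very important", "Rather important",
                "Rarely important", "Not at all", NA)

# Answers that should count as "not religious" (0); everything else is 1.
not_religious <- c("Rarely important", "Not at all")
religious <- ifelse(importance %in% not_religious, 0L, 1L)

# Keep missing answers missing instead of silently coding them as 1.
religious[is.na(importance)] <- NA

print(data.frame(importance, religious))
```

Doing the recoding in R on a copy of the column, rather than editing the Excel sheet, leaves the original data untouched.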

1 Comment
2024/12/01
19:34 UTC

3

New computer recommendations

I need a new computer but there are so many I don't know which one to get. For my thesis I'll need to use large data, and I'll want to make a Shiny app. I was looking into a Microsoft Surface laptop, but I saw that there are issues with the newer processors. Is that still a thing? If the end product will be used on a Microsoft computer, can I use a Mac for making the project?

13 Comments
2024/11/30
18:13 UTC

3

How do I create this graph?

https://preview.redd.it/voo2alkw214e1.png?width=277&format=png&auto=webp&s=56cacae5741b9edb58691382ef90952ef10bd9c2

Is it a violin plot + bar chart? How do I make this graph? Sorry, I'm new to R.

7 Comments
2024/11/30
11:56 UTC

2

RGB color codes on r

4 Comments
2024/11/30
05:07 UTC

4

How to scrape an excel sheet off of a website?

I'm wondering how to scrape or access a dynamic link from a website that automatically downloads an excel file into my computer. I need RStudio to grab this excel file without manually loading it into the environment and converting it into a data frame. Any help?

7 Comments
2024/11/30
02:30 UTC

0

Calculate the critical value for an interval using QNORM?

How do I calculate the critical value of an interval with qnorm? I thought I should use qt, but the question says to use qnorm.
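
As a sketch, assuming a two-sided 95% interval (the confidence level is not given in the post): the critical value is the quantile that leaves `alpha / 2` in the upper tail. `qnorm()` gives the normal (z) version and `qt()` the t version; the `df = 29` below is a made-up example value.

```r
conf_level <- 0.95            # assumed confidence level
alpha      <- 1 - conf_level

# z critical value: upper alpha/2 quantile of the standard normal.
z_crit <- qnorm(1 - alpha / 2)
z_crit                        # about 1.96

# For comparison, the t critical value with an example 29 degrees of freedom.
t_crit <- qt(1 - alpha / 2, df = 29)
t_crit                        # slightly larger than z_crit
```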

1 Comment
2024/11/30
01:25 UTC

0

Relational issue: 2 is less than 2?

Working on a program for class that uses a simple loop. I need to increment a variable by a user-set amount (h) and break the loop when it is 2 or greater. Code I'm using for this below.

Instead of breaking on 2 like it should, when x reaches 2, it is considered to be less than 2. I've tried using the same code with 1, 3, and 4 instead, and it works as intended, but not with 2. I need it to be 2 because the interval I'm required to work with is over 0-2 and I need to stay within bounds.

Anyone have any idea why this is happening and how to avoid it? I'm thinking an error with floating point rounding, but I don't know how to work around it.

while (x < 2) {
  cat("x before increment:", x)
  x <- x + h
  cat("x after increment:", x)
}
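
If floating point rounding is the cause, two common workarounds are sketched below. The step `h <- 0.1` and the tolerance are example values, not taken from the post. The first compares against the bound with a small tolerance; the second avoids accumulating rounding error by computing each `x` from an integer step count.

```r
h   <- 0.1     # example step; in the post h is set by the user
tol <- 1e-9    # tolerance, chosen to be much smaller than h

# Option 1: stop once x is within `tol` of the bound.
x     <- 0
steps <- 0
while (x < 2 - tol) {
  x     <- x + h
  steps <- steps + 1
}
cat("stopped after", steps, "steps at x =", x, "\n")

# Option 2: build the grid from an integer counter instead of
# adding h repeatedly, so rounding errors do not pile up.
n_steps <- round(2 / h)
x_grid  <- (0:n_steps) * h
```

For comparisons outside a loop, `isTRUE(all.equal(x, 2))` tests equality up to a small numerical tolerance.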

6 Comments
2024/11/29
22:30 UTC

1

How to resolve 'Variable lengths differ' error when using ancboot (WRS2 package)?

I want to run an ancova in r using the robust variant ancboot(). The code I found looks fairly simple:

print(df)
library(WRS2)
result <- ancboot(stat_1 ~ group + stat_A + stat_B, data = df)

However, while print(df) yields an intact dataframe (as far as I can tell), the function yields the error 'Variable lengths differ for group'. There are no NaN values in the dataframe df, and every column has exactly 60 values. There are 30 participants in each group.

Here is the full dataframe:

      group   stat_A  stat_B  stat_C  stat_1
0     2       4       3       4       1
1     2       1       2       1       4
2     2       3       2       2       3
3     1       2       2       1       5
4     2       5       4       5       5
5     1       1       4       4       4
6     2       3       4       2       4
7     1       4       4       4       4
8     2       3       3       3       4
9     2       1       2       2       2
10    1       2       4       4       3
11    1       1       4       2       4
12    2       2       4       4       4
13    1       2       2       2       4
14    2       1       3       1       3
15    1       1       2       4       3
16    1       1       3       1       3
17    2       2       3       2       1
18    1       1       4       4       4
19    1       1       1       1       3
20    1       2       2       1       4
21    1       2       3       2       4
22    2       3       4       4       5
23    1       1       1       1       2
24    1       5       5       4       5
25    2       2       5       3       4
26    1       4       4       4       5
27    2       3       5       5       5
28    1       2       1       1       4
29    1       1       1       1       2
30    2       3       3       4       5
31    2       1       2       2       5
32    1       1       2       2       4
33    2       1       3       2       2
34    2       3       3       2       2
35    1       2       2       2       4
36    2       3       4       3       5
37    1       3       4       3       4
38    1       2       2       2       4
39    2       2       3       4       5
40    2       2       2       2       4
41    2       5       5       5       4
42    1       3       3       4       4
43    2       4       2       2       4
44    2       2       3       3       4
45    1       1       3       3       4
46    2       2       3       2       4
47    1       3       3       3       3
48    1       2       4       2       5
49    2       3       2       4       5
50    1       3       4       3       4
51    2       1       1       1       5
52    2       1       3       1       5
53    1       1       5       5       2
54    1       4       3       3       4
55    2       3       4       4       4
56    1       1       5       3       5
57    2       1       2       2       4
58    1       4       4       4       5
59    2       1       3       2       4

Does anyone know what is causing this error? I am struggling to resolve it.

Troubleshooting, I tried:

print(any(is.na(df)))      # returned FALSE
print(sapply(df, length))  # returned 60 per column
df$group <- as.factor(df$group) # tried with and without this line

1 Comment
2024/11/29
17:29 UTC

0

What test should I be doing?

I’m currently doing an animal study where I train insects to run a maze, and see if their time improves over learning trials.

Half of them (16) did learning trials for 5 days and the other half (15) did trials for 10 days.

I want to compare the before and after times of the insects that did 5 trials to those who did 10, to see if the times are better after more training trials, but I have no idea what analysis I would use. Any help would be massively appreciated

9 Comments
2024/11/29
12:08 UTC

1

Creating pathway error bar plots... Error in $<-.data.frame(*tmp*, "group", value = c(2L, 1L, 2L, 2L, : replacement has 179 rows, data has 1 In addition: Warning message: In cbind(sample = colnames(sub_relative_abundance_mat), group = Group, : number of rows of result is not a multiple of vector len

Heyy, I am trying to run a script called ggpicrust2 and I am facing this error. Does anyone know what it could be?

6 Comments
2024/11/29
11:33 UTC

0

Database

I am working on an assignment, and one of its parts is to create a database with 600 values ranging from 0.1 to 20. Numbers close to 5 should be more heavily represented (this part is met in the code, as far as I can tell). But when I run my code, the values repeat, and I don't want that to happen. How can I fix it? Also, if possible, I would like the values in my code to follow some arithmetic progression. This is my code:

# LIBRARIES
library(ggplot2)
library(dplyr)

# DATABASE
set.seed(123) # Seed for reproducibility

# Generate 600 random numbers from a normal distribution
Diametro <- rnorm(600, mean = 5, sd = 7) # Mean 5 and standard deviation 2

# Make sure the values are in the range 1 to 25
Diametro <- pmax(0.1, pmin(25, Diametro))

Tabla <- data.frame(Diametro = Diametro) # Create a data frame
# print(head(Tabla))
Tabla_ordenada <- Tabla %>% arrange(Diametro)
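
One likely source of repeated values is the clamping step: `pmax()`/`pmin()` map every out-of-range draw onto the same boundary value. A possible alternative, sketched below, is to redraw until enough values fall inside the range. The bounds 0.1 and 20 come from the text of the post and `sd = 2` from its code comment; both are assumptions about what was intended, since the code itself uses other numbers.

```r
set.seed(123)                 # seed for reproducibility

n     <- 600
lower <- 0.1                  # range taken from the post's description
upper <- 20
draws <- numeric(0)

# Keep drawing until there are at least n values inside [lower, upper];
# out-of-range draws are discarded rather than clamped to the bounds.
while (length(draws) < n) {
  candidate <- rnorm(n, mean = 5, sd = 2)   # sd = 2 is an assumption
  candidate <- candidate[candidate >= lower & candidate <= upper]
  draws     <- c(draws, candidate)
}

Diametro <- sort(draws[seq_len(n)])
Tabla    <- data.frame(Diametro = Diametro)
```

Note that an arithmetic progression such as `seq(0.1, 20, length.out = 600)` is evenly spaced by construction, so it cannot also be concentrated around 5; the two requirements pull in different directions.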

3 Comments
2024/11/29
04:02 UTC

2

Plotting dive depth and time

Hi there! I have the following variables and would love to know how to construct a line graph in RStudio for them:

  • Descent rate (meters per second)
  • Time spent at bottom (minutes)
  • Ascent rate (meters per second)

I would love any help please! I've scoured the internet with little success, and I want to showcase that data for 5 different animals on the same graph to show variation in max depth and max bottom time.

Thank you so much

2 Comments
2024/11/28
22:17 UTC

1

Problem installing RStudio after uninstalling it

Like the title says, I had an issue reinstalling RStudio after uninstalling it. I had a problem with it beforehand, and my professor suggested that I uninstall and reinstall. After downloading the file for RStudio from the Posit website, I got stuck at a window that opens and says "RStudio is currently running. Please close it before installing a new version." Does anyone know the fix for this?

2 Comments
2024/11/28
20:26 UTC

2

SVM Predict Error

Hi all,

I am going out of my mind trying to figure out what my problem is, and Stack Overflow and other sources have not helped. I have split my data set into a train/test split and tried to run an SVM model. I am getting the following error:

Error in names(x) <- temp :
'names' attribute [11048] must be the same length as the vector [3644]

I would note that I have checked my variables including the ones I only care about, made sure there are no N/A values, and my categorical variables are factors.

Sample Data

|engine_hp|engine_cylinders|transmission_type|drivetrain|number_of_doors|highway_mpg|city_mpg|
|:-|:-|:-|:-|:-|:-|:-|
|260|6|Automatic|Front Wheel Drive|2|27|17|
|150|4|Automatic|All Wheel Drive|4|35|24|
|201|4|Automated_manual|Front Wheel Drive|4|36|25|
|201|4|Automated_manual|Front Wheel Drive|4|36|25|
|201|4|Automated_manual|Front Wheel Drive|4|36|25|
|201|4|Automated_manual|Front Wheel Drive|4|35|25|

Model

library(e1071)

svm_model <- svm(drivetrain ~ ., 
               data = train,
               type = 'C-classification')

summary(svm_model)

Call:
svm(formula = drivetrain ~ ., data = train[complete.cases(train), ], type = "C-classification")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 

Number of Support Vectors:  5586

 ( 1410 888 1742 1546 )


Number of Classes:  4 

Levels: 
 All Wheel Drive Four Wheel Drive Front Wheel Drive Rear Wheel Drive

Predict

predictions <- predict(svm_model, newdata = test, type='class')

str() outputs.

> str(train)
tibble [8,270 × 7] (S3: tbl_df/tbl/data.frame)
 $ engine_hp        : num [1:8270] 210 285 174 225 260 132 99 172 329 210 ...
 $ engine_cylinders : num [1:8270] 4 6 4 4 8 4 4 6 6 6 ...
 $ transmission_type: Factor w/ 5 levels "Automated_manual",..: 4 2 2 4 2 4 2 4 2 2 ...
 $ drivetrain       : Factor w/ 4 levels "All Wheel Drive",..: 3 2 3 3 4 3 3 3 4 4 ...
 $ number_of_doors  : num [1:8270] 2 2 4 4 4 4 4 4 2 4 ...
 $ highway_mpg      : num [1:8270] 31 22 42 26 24 31 46 24 29 20 ...
 $ city_mpg         : num [1:8270] 23 17 31 18 15 24 53 17 20 14 ...
 - attr(*, "na.action")= 'exclude' Named int [1:99] 1754 1755 2154 2159 2160 2162 2168 2169 3683 3691 ...
  ..- attr(*, "names")= chr [1:99] "1754" "1755" "2154" "2159" ...

> str(test)
tibble [3,545 × 7] (S3: tbl_df/tbl/data.frame)
 $ engine_hp        : num [1:3545] 260 150 201 201 201 201 140 140 140 140 ...
 $ engine_cylinders : num [1:3545] 6 4 4 4 4 4 4 4 4 4 ...
 $ transmission_type: Factor w/ 5 levels "Automated_manual",..: 2 2 1 1 1 1 4 4 4 4 ...
 $ drivetrain       : Factor w/ 4 levels "All Wheel Drive",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ number_of_doors  : num [1:3545] 2 4 4 4 4 4 4 2 2 2 ...
 $ highway_mpg      : num [1:3545] 27 35 36 36 36 35 29 29 29 28 ...
 $ city_mpg         : num [1:3545] 17 24 25 25 25 25 22 22 22 22 ...
 - attr(*, "na.action")= 'exclude' Named int [1:99] 1754 1755 2154 2159 2160 2162 2168 2169 3683 3691 ...
  ..- attr(*, "names")= chr [1:99] "1754" "1755" "2154" "2159" ...

3 Comments
2024/11/27
22:05 UTC

1

Any way to easily export a dataframe to csv output in the terminal so it's easy to copy and paste?

I'm working in emulated R on DataCamp and want to follow along locally on my machine, but it's difficult to get dataframes (impossible to download, don't want to have issues with formatting several hundred rows). I just want to copy and paste into a .txt file then convert to csv and import locally.
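
One low-tech option, as a sketch: `write.csv()` prints CSV text straight to the console when no file is given, so the output can be selected and pasted into a local file. The built-in `mtcars` data stands in for the DataCamp dataframe here.

```r
# Example data; replace `df` with the dataframe from the course.
df <- head(mtcars[, c("mpg", "cyl", "hp")], 3)

# With no `file` argument, write.csv() writes to the console.
# row.names = FALSE drops the row labels from the output.
write.csv(df, row.names = FALSE)
```

`dput(df)` is another route: it prints R code that recreates the dataframe exactly, column types included, when pasted into a local session.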

3 Comments
2024/11/27
21:40 UTC

1

Binomial data

When analysing binomial data, do I need to test for variance and normality or can these be assumed?

1 Comment
2024/11/27
17:41 UTC
