/r/RStudio
A place for users of R and RStudio to exchange tips and knowledge about the various applications of R and RStudio in any discipline.
Please use this as a forum to discuss R, and learn more about it. If you have any questions about how to do specific things in R, this is the place to ask. If you are looking for more advanced help using R, please visit /r/Rstats.
You can download R itself here.
You can download RStudio here. It is an incredibly powerful IDE for R, and what the mods recommend you use.
NOTE: Due to a couple of recent posts offering "compensation" for help with an assignment let's make this official: You are not allowed to offer payment for help with an assignment. If you want help with an assignment please post the work you've done/completed so far and highlight the issue you are having. Members will then help where they can. If you desire to pay someone for tutoring in R this is not the place to look for it.
Hello!
I need to store a username and password to access my data on a website, but it seems I have a problem with the curl package. Downloading works just fine; however, when I try > library(curl) I get an error stating that it cannot be loaded:
Error: package or namespace load failed for ‘curl’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/curl/libs/curl.so':
dlopen(/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/curl/libs/curl.so, 0x0006): symbol not found in flat namespace '_curl_url_strerror'
I'm fairly new to R so I apologize if this has a super easy fix, but I cannot figure out how to solve this problem on my own.
Thanks in advance!!
When I import from a CSV the column is numeric, but when I transform the data frame into an xts object it becomes character. I then can't convert it back to numeric using the as.numeric() function. I've checked for missing values, dollar signs, and anything else that could be a problem, but came up empty-handed.
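One possible cause, offered as a guess since the data isn't shown: an xts object keeps its values in a single matrix, and a matrix can hold only one type. If a character column (a date or label column, say) is passed in alongside the numeric one, everything gets coerced to character. The base R sketch below uses a made-up data frame `df` to show the same coercion with as.matrix().

```r
# Hypothetical data standing in for the imported CSV
df <- data.frame(
  date  = c("2024-01-01", "2024-01-02"),
  price = c(10.5, 11.2)
)

# A matrix holds one type, so mixing character and numeric coerces to character
m_all <- as.matrix(df)
typeof(m_all)

# Keeping only the numeric column preserves the numeric type
m_num <- as.matrix(df["price"])
typeof(m_num)
```

If that is what is happening, passing only the numeric columns as the data and supplying the dates separately through the order.by argument of xts() may keep the column numeric.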
hi everyone, i just started using RStudio so i'm not very familiar with the language. i read a piece of code and am not sure if i understand the function min_rank, or the code itself, correctly.
the code is:
"longest_delay <- mutate(flights_sml, delay_rank = min_rank(arr_delay))
arrange(longest_delay, delay_rank)"
am i right to say that longest_delay is a new object, and that this code uses mutate on the data set flights_sml to create a new variable delay_rank, which ranks the rows by arr_delay starting from the smallest value? e.g. if the smallest number in arr_delay is 301 and there are 2 such values, they will both be 1 in delay_rank.
and the second portion of the code is to arrange the new object longest_delay according to the new variable delay_rank?
thank you all in advance and sorry for the confusing explanation
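A small sketch of how the ties behave. The dplyr documentation describes min_rank() as equivalent to base R's rank(ties.method = "min"), so the example uses base R and made-up delay values rather than the real flights_sml data.

```r
# Made-up delays; two values tie for the smallest
arr_delay <- c(305, 301, 410, 301)

# Base R equivalent of dplyr::min_rank(): tied values share the smallest rank
delay_rank <- rank(arr_delay, ties.method = "min")
delay_rank

# Sorting by the rank, which is what arrange(longest_delay, delay_rank) does
arr_delay[order(delay_rank)]
```

Both 301s get rank 1 and the next value gets rank 3, not 2, because two rows already sit ahead of it.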
I apologise that this is probably a silly question, but I'm just learning RStudio this week for my research course, and I'm trying to work out from the dataset how many women went to university. Each row of the data represents one participant's answers, but I'm unsure how to pull only the women who went to university and not the men who also went. I hope this made sense, thank you so much!
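A minimal sketch of one way to do this in base R. The data frame `survey` and its columns `gender` and `university` are hypothetical stand-ins, since the real column names and codings aren't given.

```r
# Hypothetical survey data: one row per participant
survey <- data.frame(
  gender     = c("Female", "Male", "Female", "Female", "Male"),
  university = c("Yes", "Yes", "No", "Yes", "No")
)

# Keep only rows where both conditions hold
women_uni <- subset(survey, gender == "Female" & university == "Yes")
nrow(women_uni)

# A cross-tabulation shows every gender/university combination at once
table(survey$gender, survey$university)
```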
Hello! I'm very new to RStudio (just started learning it in a class) and I'm struggling to figure out how to make my code work. This is what I'm trying to do:
...
cleaned_lyrics_data <- lyrics_data %>%
  mutate(Gender = as.factor(Gender),
         Gender = recode(Gender, "1" = "Male", "2" = "Female"),
         Year = as.factor(Year),
         Year = recode(Year, "1" = "Freshman", "2" = "Sophomore", "3" = "Junior", "4" = "Senior"),
         Condition = as.factor(Condition),
         Condition = recode(Condition, "1" = "Complete", "2" = "Instrumental", "3" = "Audio", "4" = "Nothing"),
         LyricsOnly = as.factor(LyricsOnly),
         LyricsOnly = recode(LyricsOnly, "1" = "HeardLyrics", "2" = "HeardNoLyrics"),
         Pieces = as.numeric(Pieces))
...
This is the error I keep getting:
...
Error in `mutate()`:
ℹ In argument: `Gender = as.factor(Gender)`.
Caused by error:
! object 'Gender' not found
...
For context of what I'm trying to do, this is the instruction in the assignment: "Clean so that Condition, Gender, Year, LyricsOnly, are factors. Recode them with labels. Clean so that Pieces is numeric."
I have already set my working directory and brought my csv file in.
Any help would be very appreciated, thank you!!
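That error usually means mutate() can't find a column with exactly that name in lyrics_data, so a first check is what the columns are actually called (capitalisation and stray spaces in the CSV header both matter). The sketch below uses a made-up data frame whose column is spelled `gender`, purely as an assumption to reproduce the mismatch, and then shows a base R way to label a factor in one step.

```r
# Hypothetical stand-in for the imported CSV; note the lowercase "gender"
lyrics_data <- data.frame(gender = c(1, 2, 2), Pieces = c("10", "12", "9"))

# Inspect the real column names before referring to them
names(lyrics_data)
"Gender" %in% names(lyrics_data)

# factor() can attach the labels directly, without a separate recode step
lyrics_data$gender <- factor(lyrics_data$gender,
                             levels = c(1, 2),
                             labels = c("Male", "Female"))

# Convert the text column to numeric
lyrics_data$Pieces <- as.numeric(lyrics_data$Pieces)
str(lyrics_data)
```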
Hello,
I’m currently learning how to code in RStudio and was wondering if anyone could help me with my plot visualization. Here’s a screenshot of it.
Can anyone tell me how to make the trend line less pixelated?
Here is my code:
# Fitting a linear regression model
modele_regression <- lm(moyenne_sacres ~ age, data = data_moyenne)
# Generating predictions and 95% confidence intervals
predictions <- predict(modele_regression, newdata = data_moyenne, interval = "confidence", level = 0.95)
# Creating the plot without the points
plot(NA, xlim = range(data_moyenne$age), ylim = range(predictions[, 2:3]),
xlab = "Age", ylab = "X Freq.",
type = "n") # "n" means no points will be displayed
# Adding the confidence interval (translucent blue band around the regression line)
polygon(c(data_moyenne$age, rev(data_moyenne$age)),
        c(predictions[, 2], rev(predictions[, 3])),
        col = rgb(0.3, 0.5, 1, 0.3), border = NA) # Semi-transparent blue fill
# Adding the regression line
lines(data_moyenne$age, predictions[, 1], col = "black", lwd = 2)
# Improving the appearance of the plot
grid() # Adding a grid for better readability
diff(predictions[, 3] - predictions[, 2]) # Change in the confidence interval width between consecutive points
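Without the screenshot this is only a guess, but a jagged line can appear when the rows of data_moyenne are not sorted by age: lines() and polygon() connect points in row order, so the path doubles back on itself. A sketch under that assumption, using simulated data and a hypothetical evenly spaced grid `age_grid` for the predictions:

```r
# Simulated data standing in for data_moyenne (rows deliberately unsorted)
set.seed(1)
data_moyenne <- data.frame(age = sample(18:80, 40, replace = TRUE))
data_moyenne$moyenne_sacres <- 5 - 0.03 * data_moyenne$age + rnorm(40, sd = 0.5)

modele_regression <- lm(moyenne_sacres ~ age, data = data_moyenne)

# Predict on a sorted, evenly spaced grid instead of the raw rows
age_grid <- data.frame(age = seq(min(data_moyenne$age), max(data_moyenne$age),
                                 length.out = 200))
predictions <- predict(modele_regression, newdata = age_grid,
                       interval = "confidence", level = 0.95)

# Same plot as before, drawn from the grid
plot(NA, xlim = range(age_grid$age), ylim = range(predictions[, 2:3]),
     xlab = "Age", ylab = "X Freq.")
polygon(c(age_grid$age, rev(age_grid$age)),
        c(predictions[, "lwr"], rev(predictions[, "upr"])),
        col = rgb(0.3, 0.5, 1, 0.3), border = NA)
lines(age_grid$age, predictions[, "fit"], col = "black", lwd = 2)
```

If the jaggedness only shows up in the exported image rather than in the plot pane, saving through png() with a higher res value is another thing to try.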
I am hoping to make something like the graphic below using ggplot or plotly in R. Any ideas other than cobbling together geoms and labels?
Hi,
I use R for plotting my experiment data.
Recently I found patchwork and ggarrange, which, I think, are great tools for simple figure arrangement.
But usually my figures also include png, jpg or svg image files generated by other software.
How can I integrate those files for easy figure management?
Does anyone have tips for this situation?
Thanks.
Trying to knit to word:
Quitting from lines 78-79 [unnamed-chunk] Execution halted
Any ideas?
I have a dataset with fish length and width in pixels that I am working with in R. To create a simple proxy for size, can I multiply length * width? The fish are mostly of the same species, but obviously not rectangles in reality. Or is it better to just discuss length and width and disregard "size"? I am looking at the prey success of a bird species.
I don't have the time or skill (or dataset as of now) to create a more accurate estimation of size.
Hi, my classmate and I are working on a senior research project at our college and we are attempting to use R to graph and do stats on our data. WE NEED HELP, we are struggling!!!!!! Does anyone feel like helping, maybe through Zoom or email or something? We are desperate.
Has anyone else noticed this irritating issue with the rename function?
I'll use rename to change column names, like so:
rename(mydata,c("new.column.name" = "old.column.name"))
This works most of the time, but some days it seems that R decides to flip the syntax so that rename will only work as:
rename(mydata,c("old.column.name" = "new.column.name"))
So, I just leave both versions in my code and use the one that R wants on a given day, but it's still irritating. Does anyone know of a fix?
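A likely explanation, though it is an assumption about this setup: both plyr and dplyr export a function called rename, and whichever package was loaded last masks the other. plyr::rename() expects c("old" = "new"), while dplyr::rename() expects new = old, which would produce exactly this flip depending on load order. Writing the package prefix (dplyr::rename or plyr::rename) pins the behaviour. The base R sketch below sidesteps both packages, using a made-up `mydata`.

```r
# Hypothetical data frame with the column to be renamed
mydata <- data.frame(old.column.name = 1:3, other = letters[1:3])

# Base R rename: independent of which packages are attached
names(mydata)[names(mydata) == "old.column.name"] <- "new.column.name"
names(mydata)
```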
Using plm in R, I haven't been able to run the IPS test for unit roots in panel models.
I keep getting errors like this:
Error in if (stat < min(cv)) { : missing value where TRUE/FALSE needed
But I have no NAs. It's in the right format. I have tried with different subsets and a balanced panel. Nothing works.
Can anyone help me with this?
I'm trying to run a goodness of fit test ready to do a POPAN model analysis but I keep getting this error:
```
release.gof(capt.pr)
RELEASE NORMAL TERMINATION
Error in (x3 + 4):length(out) : argument of length 0
```
I don't know where to go from here, and I can't find much about the call release.gof(capt.pr).
Hi, I am getting the following error message when I am trying to render a file in RMD.
I have tried running renv::init() as well as restarting the server.
Interestingly this works fine
quarto::quarto_render("reports/performance/_outcomes.qmd")
The data for Swedetown is not showing up correctly on this graph. I have no idea why and every time I change something it messes it up more. The time goes from 0:00 - 16:36 for McLain and 13:52 - 17:02 for Swedetown but is plotting at 9:00 - 12:00.
```{r}
ggplot() +
geom_line(data = mclain1013, aes(x = Date.Time, y = Wind.Speed, color = "McLain"), group = 1) +
geom_line(data = swedetown1013, aes(x = Date.Time, y = Wind.Speed, color = "Swedetown"), group = 1) +
labs(
title = "Wind Speed Over Time on 10/13 at McLain and Swedetown",
x = "Time (GMT)",
y = "Wind Speed (m/s)",
color = "Location"
) +
scale_x_datetime(date_labels = "%H:%M", date_breaks = "2 hours") +
scale_color_manual(values = c("McLain" = "blue", "Swedetown" = "red")) +
theme_cowplot() +
theme(
panel.grid.major = element_line(color = "darkgray", size = 0.5),
panel.grid.minor = element_line(color = "darkgray", size = 0.5)
)
```
Hi all,
My dependent variable is an ordered factor, gender is a factor coded 0/1, and the first-listed independent variable is my primary concern. The proportional odds assumption holds only for that variable when using the Brant test.
I've had no joy fitting the model with vglm while specifying that proportional odds should be assumed for that variable but not for the others.
> logit_model <- vglm(dep_var ~ primary_indep_var +
+ gender +
+ var_3 + var_4 + var_5,
+
+ family = cumulative(parallel = c(TRUE ~ 1 + primary_indep_var),
+ link = "cloglog"),
+ data = temp)
Error in x$terms %||% attr(x, "terms") %||% stop("no terms component nor attribute") :
no terms component nor attribute
Any help would be appreciated!
With thanks
Hi all, I am running a glm in R and, judging from the residual plots, the model doesn't meet the assumptions perfectly. My question is: how well do these assumptions need to be met, or is some deviation OK? I've tried transformations, adding interaction terms, removing outliers, etc., but nothing seems to improve it.
I am modelling yield in response to species proportions and also including dummy variables to account for special mixtures/treatments (controls).
glm(Annual_DM_Yield ~ 0 + Grass + Legume + I(Legume**2) + I(Legume**3) + Herb +
AV +
PRG_300N + PRG_150N + PRG_0N + PRGWC_0N + PRGWC_150N + N_Treatment_150N,
data=yield )
Any help greatly appreciated!
Say I have a dataset about a school, with class, age, gender and grades for each student. I want to calculate the percentage of girls in each class but I keep getting different errors, the last one from my apply() call.
Here is my code (in short)
Data <- read_excel ("directory") ##this part works
Girls <- table(Data$girl)
Tot_students <- sum(Girls)
Perc_girls <- (Girls/Tot_students)*100
Data%>%
group_by(class) %>%
apply(data$girl, MARGIN = 1, Perc_girls)
The latest error I've been getting is "Error in match.fun(FUN): 'data$girl' it's not a function, a character or a symbol"
Gender in the girl column is coded as 1 (if is a girl) and 0 (if not).
Any help?
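Since girl is coded 1/0, the mean of that column within a class is already the share of girls, so no apply() call is needed. A minimal base R sketch with made-up data (the real Data comes from read_excel):

```r
# Hypothetical stand-in for the imported spreadsheet
Data <- data.frame(
  class = c("A", "A", "A", "B", "B"),
  girl  = c(1, 0, 1, 0, 0)
)

# Mean of a 1/0 column per class, times 100, gives the percentage of girls
perc_girls <- tapply(Data$girl, Data$class, mean) * 100
perc_girls
```

The dplyr version of the same idea would be Data %>% group_by(class) %>% summarise(perc_girls = mean(girl) * 100).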
Besides the usual prompt to press 1, 2, or 3 to update (or not update) R packages after installing something, R should really ask for confirmation. After updating some packages by mistake (I pressed 2 instead of 3...) I completely broke my library and many packages don't load anymore. I mean... it is already a mess trying to make all the different packages and versions work together without conflicts, so for the love of god please ask for confirmation when updating, to avoid hours of work trying to restore things to how they were before...
I am currently doing a project in R, and have this dataframe:
lrdf
nseg meanlen loglr
1 27 16.64982 2.163818549
2 18 15.49226 0.524823313
3 22 23.85373 0.570587756
(it goes up to 10000 rows)
I want to create a heatmap (or 2D density plot) in RStudio. I want nseg on the x-axis, meanlen on the y-axis, and loglr to be the z value which fills the heatmap.
I read that the dataframe first has to be converted from wide to long format, so I did this:
lrdf_long <- lrdf %>%
pivot_longer(cols = c(loglr),
names_to = "variable",
values_to = "loglr")
Which gave me this:
lrdf_long
# A tibble: 10,000 × 4
nseg meanlen variable loglr
<int> <dbl> <chr> <dbl>
1 27 16.6 loglr 2.16
2 18 15.5 loglr 0.52
3 22 23.9 loglr 0.57
Now, using ggplot to create the heatmap, I did this:
ggplot(lrdf_long, aes(x = nseg, y = meanlen, fill = loglr)) +
geom_tile() +
scale_fill_viridis_c() +
labs(title = "Heatmap of loglr", x = "nseg", y = "meanlen") +
theme_minimal()
This code, though, gave me an empty plot (attached figure: https://i.sstatic.net/KnentbdG.png).
Is there anyone who could help solve this problem?
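A possible explanation, offered as a guess: meanlen is continuous, so almost every row has its own unique y value and geom_tile() ends up drawing tiles too thin to see. The data is also already in a usable shape (one row per observation), so the pivot_longer step may not be needed. One approach is to bin meanlen and average loglr within each cell first. The sketch below does the binning in base R on simulated data; the bin width of 2 is an arbitrary choice.

```r
# Simulated stand-in for lrdf
set.seed(42)
lrdf <- data.frame(
  nseg    = sample(10:30, 500, replace = TRUE),
  meanlen = runif(500, 10, 30),
  loglr   = rnorm(500)
)

# Bin the continuous meanlen into 2-unit intervals
lrdf$meanlen_bin <- cut(lrdf$meanlen, breaks = seq(10, 30, by = 2),
                        include.lowest = TRUE)

# Average loglr within each (nseg, meanlen_bin) cell
cells <- aggregate(loglr ~ nseg + meanlen_bin, data = lrdf, FUN = mean)
head(cells)

# With ggplot2 loaded, the binned table can then be drawn as tiles:
# ggplot(cells, aes(x = nseg, y = meanlen_bin, fill = loglr)) + geom_tile()
```

ggplot2's stat_summary_2d() does a similar bin-and-summarise step internally and may be worth a look as well.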
Still new to R. When I update my Excel spreadsheet, is there a line of code that pulls in the changes instead of re-importing it? Formatting-wise, re-importing is time consuming.
Hi all,
I am working on a project right now which requires the use of local projections with linear IRF. However, I need to do a shock of -1 unit using the lp_lin command. I’m not very familiar with this package since it’s my first time using it but any help would be appreciated. I can only find information on positive shocks but nothing on negative.
TIA!
Hello everyone,
I have a homework assignment that asks us to present the legend in a particular way. In the map below (found on chartbins.com/) the intervals are <0.95, 0.95-0.97, 0.97-0.99, etcetera. But I need intervals that go from 0.95-0.97 and then 0.98-0.99, so that the intervals don't overlap, and I need the program to split the intervals by itself without my putting values in breaks.
I searched Google and ChatGPT, but I still had the overlapping issue. The most "successful" code was this one:
```
Ratio_map <- tm_shape(Ratio_data)+
tm_polygons(col="Ratio2",title="Ratio1",palette="Blues",
style = "fixed", breaks = c(0.459160365469439, 0.499999999999, 0.500, 0.532467532467532)) + #Values I want
tm_symbols(col="Ratio2",n=2,size=0.8,alpha=0.8)+
tm_scale_bar()+
tm_compass() +
tmap_options(max.categories = 580)
```
Should I split the values in two before mapping, if there's no way to do it with the tm code?
Thank you for your help
Apologies if this is a simple question, but I've been having issues with reading a tsv file into RStudio. Because the entries in the second column tend to be two-word entries, the space breaks the row into a new line, resulting in an incorrectly parsed file. Please see the code and result below.
neuro_subclusters <- read.delim(url("https://shendure-web.gs.washington.edu/content/members/DEAP_website/public/RNA/update/neuronal_fine_scale_annotations/neuronal_subcluster_annotations.txt"), col_names = TRUE)
Any help would be appreciated, as this has been driving me insane! thank you
I have tried reformatting the file to "detect" likely line breaks by for looping through each row, but I haven't been able to do it successfully. Basically I am open to any ideas.
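One thing worth checking: col_names is an argument of readr's read_tsv(), whereas base read.delim() uses header. With sep = "\t" set explicitly, spaces inside a field should not split it. The sketch below parses a small made-up tab-separated string (the column names and values are invented, not taken from the real file) to show two-word entries surviving intact.

```r
# Made-up tab-separated text with two-word entries in the second column
tsv_text <- "cluster_id\tannotation\n1\tPurkinje cells\n2\tGranule neurons\n"

# header (not col_names) is the read.delim argument; quote = "" disables quote handling
neuro <- read.delim(text = tsv_text, header = TRUE, sep = "\t",
                    quote = "", stringsAsFactors = FALSE)
neuro
```

If the real file still breaks on spaces with these settings, it may not actually be tab-delimited, and looking at a few raw lines with readLines() could show what the separator really is.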
When trying to build a model from a CSV file to compare data, the table only produces 1 variable where there should be at least two, I think. Would this issue be due to my code or to the formatting of the base file?