/r/Rlanguage


We are interested in implementing the R programming language for statistics and data science.

This subreddit seeks new methods for life and organization.

R and Statistics subs:

/r/rstats

/r/statistics

/r/Rstudio

/r/rprogramming

R resources:

R on Stack Overflow

Comprehensive R Archive Network

Swirl: Learning R with interactive lessons within the R console

/r/Rlanguage

42,864 Subscribers

7

not NA, just missing

HOLD UP I'VE DONE IT!
Thanks so much for your help folks, I was scared to ask here but you were all super nice!

Howdy y'all, I'm in desperate need of help and nothing that I've looked at seems to be talking about my specific problem?
I'm not great at R, I'm trying to learn, so I might just be an idiot?
I'm trying to replace a missing value in my data, but it's not NA, so is.na() and na.omit() aren't working. The cells are just blank. I don't know how to fix it.
Can anyone give me a hand?
Sorry if this isn't the right place to post this, I'm really not trying to be rude or step on any toes.

this is the kind of thing I'm looking at, if that helps?

https://preview.redd.it/osg4v83kt3yd1.png?width=264&format=png&auto=webp&s=98edc947012dd2708bbc530bec105a8ffbc5a579
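
For later readers hitting the same thing, the usual fix, sketched on toy data (the column name here is made up): recode the blank strings to NA first, then the NA tools work.

```r
# Toy data standing in for the real table: blank cells are "" not NA,
# so is.na() and na.omit() pass right over them.
df <- data.frame(name = c("alpha", "", "gamma"), stringsAsFactors = FALSE)

df[df == ""] <- NA        # recode every blank string as NA

sum(is.na(df$name))       # is.na() now finds the missing value
```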

13 Comments
2024/10/31
14:44 UTC

1

Differences between SQL Server and DBO/ODBC syntax?

Edit: Typo in title, should be DBI

We have large SQL scripts that use many temp tables, pivots and database functions for querying a database on SQL Server (they're the result of extensive testing for extraction speed). While these scripts work in SSMS and Azure Data Studio, they often fail when using DBI and ODBC in R. And by fail, I mean an empty data frame is returned, with no error codes or warnings.

So far I've identified some differences:

  • DBI/ODBC doesn't like "USE <db_name>".
  • DBI/ODBC likes "SET NOCOUNT ON".
  • DBI/ODBC doesn't like large columns such as "VARCHAR(MAX)" unless they are at the end (right) of the output table.

Any other ideas or differences?
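
One mitigation pattern, sketched with made-up connection details (the DSN, database, and file names are hypothetical): prepend SET NOCOUNT ON so intermediate "rows affected" counts from the temp tables don't swallow the final result set, and set the database on the connection instead of a USE statement inside the script.

```r
library(DBI)

# database= on the connection replaces "USE <db_name>" in the script
con <- dbConnect(odbc::odbc(), dsn = "MyDSN", database = "MyDatabase")

script <- paste(readLines("extract.sql"), collapse = "\n")  # hypothetical file
result <- dbGetQuery(con, paste("SET NOCOUNT ON;", script))
```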

1 Comment
2024/10/31
00:02 UTC

20

On posting problems

I get that not everyone who posts here writes code for a living or regularly troubleshoots with users. That’s okay, we get it. And I’m not talking about all you “do my homework plix” guys either; there is a specific hell for you, and it is called a job down the line.

What I am speaking about, and I am genuinely, on a scientific level, curious about, is the thought process behind some of the posts here and in r/rstats. Do you really believe that with a scrap of information, say a blurry photo of a graph, some random code, or some vague mention of a really niche biostats package that you omitted from the text, we’ll be able to troubleshoot, guide, and do your work? I mean, thanks for the belief in humanity, but prepare to be disappointed, I guess.

/rant

13 Comments
2024/10/30
22:39 UTC

3

How can I filter a dataset to retain only the smallest starting location for overlapping segments based on specific criteria?

I have a dataset with columns for chrom, loc.start, loc.end, and seg.mean. I need help selecting rows where the locations are contained within one another. Specifically, for each unique combination of chrom and seg.mean, I want to keep only the row with the smallest loc.start value when there is overlap in location ranges.

For example, given this data:

chrom  loc.start  loc.end  seg.mean
1      1          3000     addition
1      1000       3000     addition
1      1          2000     addition
1      500        1000     addition
The output should retain only the last row, as it has the smallest segment length within the overlapping ranges for chrom 1 and seg.mean "addition."

Currently, my method only works for exact matches on loc.start or loc.end, not for ranges contained within each other. How can I adjust my approach?

filtered_unique_locations <- unique_locations %>%
  group_by(chrom, loc.start, seg.mean) %>%
  slice_min(order_by = loc.end, n = 1) %>%  # keep only the row with the smallest loc.end within each group
  ungroup() %>%
  group_by(chrom, loc.end, seg.mean) %>%
  slice_max(order_by = loc.start, n = 1) %>%  # keep only the row with the largest loc.start within each group
  ungroup()
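
One possible adjustment, sketched on the example data: instead of grouping on loc.start or loc.end (which only catches exact matches), test containment pairwise within each (chrom, seg.mean) group and keep the shortest contained segment. This reproduces the expected output above, but the rule is an assumption and may need tweaking for edge cases (e.g. groups with a single row).

```r
library(dplyr)

segments <- data.frame(
  chrom     = c(1, 1, 1, 1),
  loc.start = c(1, 1000, 1, 500),
  loc.end   = c(3000, 3000, 2000, 1000),
  seg.mean  = "addition"
)

result <- segments %>%
  group_by(chrom, seg.mean) %>%
  # keep rows whose range lies inside some other row's range...
  filter(sapply(seq_len(n()), function(i)
    any(loc.start[-i] <= loc.start[i] & loc.end[-i] >= loc.end[i]))) %>%
  # ...then keep the shortest of those per group
  slice_min(loc.end - loc.start, n = 1) %>%
  ungroup()

result  # one row: chrom 1, loc.start 500, loc.end 1000, seg.mean "addition"
```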

6 Comments
2024/10/29
20:23 UTC

2

Leaflet legend customization

Hi all

I am currently making an interactive map using the leaflet package, and am trying to customize the legends without using HTML widgets.

I have two questions-

  1. can I change the size of the legends?

  2. can I make it so that the legends for base layers are invisible unless the layer is activated?

Again- I am hoping to do this in base leaflet without using HTML widgets.

Thanks 🖤
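
For question 2, one thing worth trying (a sketch, untested against base layers specifically): addLegend() accepts a group argument, which ties the legend to a group toggled in the layers control. Legend size isn't exposed as a base leaflet option, so question 1 likely does require CSS.

```r
library(leaflet)

pal <- colorNumeric("viridis", domain = quakes$mag)

leaflet(quakes) %>%
  addTiles() %>%
  addCircleMarkers(~long, ~lat, color = ~pal(mag), group = "Quakes") %>%
  # tying the legend to a group makes it appear only while that group
  # is switched on in the layers control
  addLegend("bottomright", pal = pal, values = ~mag,
            title = "Magnitude", group = "Quakes") %>%
  addLayersControl(overlayGroups = "Quakes")
```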

1 Comment
2024/10/29
18:48 UTC

0

Getting glmer to add "specials".

I have been having an issue using emmeans with a glmer model. It may be because glmer doesn't save "specials" as an attribute in the model. Is there a way anyone knows to force glmer to do this?

5 Comments
2024/10/28
14:22 UTC

1

Error with emmeans and a glmer

I have a glmer with the call

Threshold.mod <- glmer(
  formula = Threshold ~ Genotype + poly(Frequency, degree = 2) + Sex +
    Treatment + Week + Genotype:poly(Frequency, degree = 2) +
    poly(Frequency, degree = 2):Sex + poly(Frequency, degree = 2):Treatment +
    Sex:Week + Treatment:Week + (1 | Id),
  data = thresh.dat,
  family = inverse.gaussian(link = "log"),
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
)

When I attempt to use emmeans at all, I get the error message

Error in (function (..., degree = 1, coefs = NULL, raw = FALSE)  : 
  wrong number of columns in new data: c(0.929265485292125, 0.139620983362299)

What am I doing wrong?
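
Not a confirmed diagnosis, but errors like this often come from emmeans having to rebuild the orthogonal poly() basis from the stored formula. A common workaround is to precompute the basis columns before fitting, so the formula only references plain variables (Freq1 and Freq2 are hypothetical names):

```r
# Precompute the degree-2 orthogonal polynomial basis once
pf <- poly(thresh.dat$Frequency, degree = 2)
thresh.dat$Freq1 <- pf[, 1]
thresh.dat$Freq2 <- pf[, 2]

# Then replace every poly(Frequency, degree = 2) term with (Freq1 + Freq2)
# in the glmer() call, refit, and run emmeans on the new model.
```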

9 Comments
2024/10/27
21:00 UTC

1

Help with function to loop Mann-Whitney and output results into tibble

I'm trying to create a function that runs a Mann-Whitney U test of var ~ epoch (a 2-level factor) and, when used with map_df on the numeric variables in a tibble (survey_likert), outputs a new tibble with the median, Q25, and Q75 for each level of epoch, plus the W and p value (so that I don't have to do this manually for each variable and collate the data). All of the numeric variables in the tibble are Likert responses coded 1-5 on a strongly disagree to strongly agree scale.

GPT-4o mini created this, which keeps getting stuck on the same error no matter how many times I troubleshoot it.

# Define a function to perform the Mann-Whitney U test and extract the required statistics
run_mann_whitney <- function(data, var_name) {
  test_result <- wilcox.test(data[[var_name]] ~ data$epoch)
  
  # Extract median, 0.25 and 0.75 quantiles for each group
  stats <- data %>%
    group_by(epoch) %>%
    summarize(
      median = median(.data[[var_name]], na.rm = TRUE),
      q25 = quantile(.data[[var_name]], 0.25, na.rm = TRUE),
      q75 = quantile(.data[[var_name]], 0.75, na.rm = TRUE)
    ) %>%
    ungroup()
  
  # Create a summary row with test results and group stats
  tibble(
    variable = var_name,
    median_group1 = stats$median[1],
    q25_group1 = stats$q25[1],
    q75_group1 = stats$q75[1],
    median_group2 = stats$median[2],
    q25_group2 = stats$q25[2],
    q75_group2 = stats$q75[2],
    W = test_result$statistic,
    p_value = test_result$p.value
  )
}

# Apply the function to each numeric variable in the tibble 
result_table <- survey_likert %>% 
  select(where(is.numeric)) %>% 
  names() %>% 
  map_df(~ run_mann_whitney(survey_likert, .x))

# View the results
print(result_table)

The individual components of the function run successfully on a single variable, but with map_df it keeps giving the same error, which seems to be a problem with the epoch variable being passed through group_by:

Error in summarize(., median = median(.data[[var_name]], na.rm = TRUE), : argument "by" is missing, with no default

ChatGPT can't come up with a solution no matter how I phrase the prompt, and it's given me about 8 different versions. Does anyone have an answer, or another approach that achieves the same outcome? I'm at the limit of my R understanding.

Much appreciated
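
A guess for later readers: that exact `argument "by" is missing` error usually means another attached package (plyr and Hmisc are frequent culprits) is masking dplyr's summarize(). Namespacing the verbs inside the function tends to resolve it without other changes:

```r
# Same stats block as in the post, with dplyr:: prefixes so a masked
# summarize() can't be picked up by mistake
stats <- data %>%
  dplyr::group_by(epoch) %>%
  dplyr::summarize(
    median = median(.data[[var_name]], na.rm = TRUE),
    q25    = quantile(.data[[var_name]], 0.25, na.rm = TRUE),
    q75    = quantile(.data[[var_name]], 0.75, na.rm = TRUE)
  ) %>%
  dplyr::ungroup()
```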

6 Comments
2024/10/27
13:35 UTC

1

linking 2 datasets in RStudio

I'm still a noob in R and learning the language. I have 2 datasets, questionnaire_menu and fixations_selections_menu, both CSVs. 30 people were tested and both files contain data about the same 30 people. To analyze, I now need to link them together. In the first dataset the column identifying the test persons is called "person"; in the second it's called "Su". The "person" column is a num variable with cells containing the numbers 1 to 30. The second is a chr variable with cells containing the text p1.asc, p2.asc, and so on up to p30.asc. How can I make the "Su" column a num with numbers 1 to 30, and how can I then link both sets together using this info?

Thx for helping me...
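
One way to do this, sketched with toy versions of the two tables (the non-ID column names are made up): strip the "p" prefix and ".asc" suffix with sub(), then join on the shared key.

```r
library(dplyr)

questionnaire_menu        <- data.frame(person = c(1, 2), score = c(10, 20))
fixations_selections_menu <- data.frame(Su = c("p1.asc", "p2.asc"),
                                        fixations = c(55, 71))

linked <- fixations_selections_menu %>%
  # keep only the digits between "p" and ".asc", then convert to numeric
  mutate(person = as.numeric(sub("^p(\\d+)\\.asc$", "\\1", Su))) %>%
  inner_join(questionnaire_menu, by = "person")
```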

5 Comments
2024/10/26
15:36 UTC

3

List of lists of lists parsing

Hello, I parsed a JSON file in R. Now I have an issue I cannot resolve: extracting from lists of lists of lists, where each list has a different number of rows and, depending on the object, some columns may be absent. What is the best way to access these data frames buried in lists, and the columns hidden in data frames nested inside lists? I am really struggling. Thanks.
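
One pattern that handles arbitrary nesting, sketched with a toy stand-in for the parsed JSON: recursively collect every data frame buried in the list, then bind them row-wise; bind_rows() fills columns a given frame lacks with NA.

```r
library(dplyr)

# Walk the nested list and return every data.frame found, at any depth
collect_dfs <- function(x) {
  if (is.data.frame(x)) return(list(x))
  if (is.list(x)) return(unlist(lapply(x, collect_dfs), recursive = FALSE))
  list()
}

# Toy nested structure with uneven depth and differing columns
parsed <- list(a = list(data.frame(x = 1:2)),
               b = list(deep = list(data.frame(x = 3, y = "z"))))

combined <- bind_rows(collect_dfs(parsed))  # missing columns become NA
```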

29 Comments
2024/10/26
06:04 UTC

5

Literate programming in Obsidian with R (later Python)

0 Comments
2024/10/24
13:10 UTC

9

Any way to find task-based work in R?

I have no hope of ever finding a career. I would be happy to do random task work if I could find anything. Is there a recommended way to do this?

7 Comments
2024/10/24
00:50 UTC

3

Nested data: two manipulations

I imported a CSV (it comes from PsychoPy, a common psychology experiment platform). I have a couple of columns that contain sublists that I need to manipulate, and I want to know what the best way to proceed is. What variable type should I be holding these as?

  1. I have a column 'mouse.clicked_name' that looks like this. I just need to mutate to create a new variable taking each value from, say, ['pez', 'pez'] to just pez as a character value. I could do that with string manipulation, I suppose, but would it be easier to just convert it to the right variable type and extract the first item? What variable type would that be and how would I go about doing this?

https://preview.redd.it/nkm91x4otkwd1.png?width=141&format=png&auto=webp&s=9725e9274f149a702bbdbffc0edd78f9d8208b4a

  2. I have three variables (x_coord, y_coord, and time) that are also vectors, and they will be worked with as such in some later calculations. Should I convert these to another variable type to work with them? If so, which one, and how? Thanks!

https://preview.redd.it/t8mz15g6ukwd1.png?width=806&format=png&auto=webp&s=ce47fbe5f205dc4257a60e99b5b2fa469b08ff11
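
For the first question, assuming the column really holds strings like "['pez', 'pez']" (PsychoPy writes list columns as text), a base-R sketch for extracting the first item:

```r
# Pull out the single-quoted items, then take the first one
x <- "['pez', 'pez']"
items <- gsub("'", "", regmatches(x, gregexpr("'[^']+'", x))[[1]])
items[1]  # "pez"
```

Inside mutate(), the same idea can be vectorized with sapply() over the column.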

5 Comments
2024/10/23
21:50 UTC

0

Soapbox: R needs to change to a more permissive license

Switching the R license from its current restrictive GPL-2 license to a more permissive one, such as the PSF license, would help R stop its decline as a leading language for data analysis. Python's popularity continues to rise, partly due to its more flexible licensing, eroding R's market share. A permissive license would allow easier integration into projects. For example, Microsoft is integrating Python into Excel when R seems like a more user-friendly option. Similar licenses, like the PSF, Apache License, or MIT License, have proven successful in fostering widespread usage while still encouraging contributions.

Other projects have successfully made similar transitions:

  • Mozilla Firefox re-licensed from Mozilla Public License (MPL) 1.1 to MPL 2.0, which is more permissive and compatible with other open source licenses.

  • Mono, originally under the LGPL, switched to the MIT License to make it more appealing to commercial users.

  • React.js transitioned from the BSD + Patent license to the MIT License, which eliminated concerns around patent clauses.

To switch the license, the first step involves obtaining permission from all contributors, as they hold copyrights to their respective parts of the codebase. If some contributors are unreachable, parts of the code may need to be rewritten. Once all permissions are obtained or code has been modified, the new license can be adopted, allowing R to better compete in the modern open-source ecosystem.

This change would be challenging, but it has the potential to secure a bright future for R, positioning it as a more competitive and appealing choice for data analysts and developers.

16 Comments
2024/10/23
16:06 UTC

1

Want to make a wrapper package around XlsxWriter python package

Do you know this powerful Python package https://xlsxwriter.readthedocs.io/index.html? It is superior to any R package I know for creating polished xlsx files (e.g. openxlsx, writexl). But maybe there are important packages I have missed? I'd really like to know, because I'm willing to make a complete R wrapper around it. Any ideas or comments?
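
A sketch of how such a wrapper could start via reticulate (assumes the XlsxWriter Python package is installed in the active Python; the file name is made up):

```r
library(reticulate)

xw <- import("xlsxwriter")          # Python XlsxWriter module
wb <- xw$Workbook("demo.xlsx")
ws <- wb$add_worksheet()
ws$write(0L, 0L, "hello from R")    # XlsxWriter uses 0-based row/col indices
wb$close()
```

A full wrapper would mostly be R-friendly functions over these `$` calls, plus translation between R types and the Python objects.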

2 Comments
2024/10/23
10:35 UTC

1

Density ridge plot in ggplot

I’m in an intro class for R and I can’t figure out how to make a density ridge plot. Can anyone help me?

I have the packages “tidyverse”, “openintro”, “janitor”, and “ggridges” loaded. Those are what I was instructed to put in.

My code so far is:

ggplot(data = [data], mapping = aes(x = [categorical data in data set], y = [numerical data in the data set])) +
  geom_density_ridges() +
  [all my labels]

The error code I have says Error in geom_density_ridges() : error occurred in 1st layer. geom_density_ridges() requires the following missing aesthetics: y

I have tried also with my code reading geom_density_ridges( scale = 1 , alpha = 0.5)

Nothing has worked, any advice?
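
A hedged guess at the cause: ggridges expects the numeric variable on x and the grouping (categorical) variable on y, which is the reverse of the mapping above. A minimal working example on a built-in dataset:

```r
library(ggplot2)
library(ggridges)

# numeric on x, categorical on y
p <- ggplot(diamonds, aes(x = price, y = cut)) +
  geom_density_ridges()
p
```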

2 Comments
2024/10/23
04:13 UTC

2

Dealing with Underdispersion when using the DHARMa package

So I've just come across the DHARMa package today while working on a project with a lot of GLMs and count data. I'm finding that much of my data is underdispersed, though not significantly so, I think. However, I'm unsure how to deal with it: the vignette is a little confusing, since it seems that if the model is overdispersed it's not a Poisson model, but if it's underdispersed it also might not be a Poisson model. If anyone with experience with this package can help me, I'd be super appreciative.
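
A sketch of the explicit one-sided dispersion test, assuming a fitted model object called fit (hypothetical name, e.g. a Poisson glm):

```r
library(DHARMa)

sim <- simulateResiduals(fittedModel = fit)

# alternative = "less" tests specifically for underdispersion
testDispersion(sim, alternative = "less")
```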

2 Comments
2024/10/23
01:36 UTC

1

ggplot2 boxplot only showing skinny lines

I am trying to make simple boxplots and they are only showing up as vertical skinny lines. The Y axis is not correct and I have no idea what it is doing. The Y axis (NA) should range from 0 to around 55, so it's not showing the points anywhere near where they should be. Here is a picture of my code.

https://preview.redd.it/mzf3uhyj6dwd1.jpg?width=1919&format=pjpg&auto=webp&s=7926a0e14808278f39967a655207f4a8efa4711e

  Label Experiment `Tree#` Height `Leaf#`  `NA`
  <chr>      <dbl>   <dbl>  <dbl> <chr>   <dbl>
1 C1             1       1     36 a        3.8
2 C1             1       1     36 b        3.69
3 C1             1       1     36 c        0.88
4 C1             1       2     28 a       13.5
5 C1             1       2     28 b       11.2
6 C1             1       2     28 c        8.61

Below is an example of the data.

https://preview.redd.it/1qa5572b6dwd1.jpg?width=1919&format=pjpg&auto=webp&s=091aaee4df4c37c4e2d4848f9c11bf0d6938e999
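
A hedged guess for later readers: skinny vertical lines usually mean ggplot made one "box" per distinct x value because x is continuous. Wrapping x in factor() (or mapping the group aesthetic) restores full-width boxes. The mapping below is an assumption based on the printout above:

```r
library(ggplot2)

# hypothetical mapping: one box per tree, values from the `NA` column
ggplot(df, aes(x = factor(`Tree#`), y = `NA`)) +
  geom_boxplot()
```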

4 Comments
2024/10/22
20:00 UTC

2

Solve me a Riddle <3

Hi guys!

I am currently learning to program with R. There is one exercise that I cannot solve - it would be amazing if someone could help!

I am supposed to generate n conventional dice rolls (1:6) and answer the following question: Does a sequence of k consecutive odd dice rolls exist? The code is supposed to work with different sequences for arbitrary n and k. Also, I am not allowed to use loops or stuff like sapply, because we are really just starting to learn.

This is what i have for now:

n <- 10
k <- 3
roll <- sample(1:6, size = n, replace = TRUE)
binary <- ifelse((roll %% 2) != 0, yes = 1, no = 0)   # 1 = odd roll
cum_binary <- c(0, cumsum(binary))
window_sums <- diff(cum_binary, lag = k)  # sum of each window of k consecutive rolls
any(window_sums == k)                     # TRUE if some window is all odd

I have thought about everything for a very long time but I cannot seem to find an answer on how to check whether a SEQUENCE exists. If anyone has an idea that is suited for a beginner like me, I'd love to hear it!

Thank you guys <3

9 Comments
2024/10/22
13:12 UTC

4

R Live Help/Office Hours?

Is there any service where a real person can help me live via video call with R? Preferably free, though I’m willing to pay.

I just need an expert to tell me wtf is happening with R/RStudio debugging tools, my issues aren’t even syntax related. I’m experiencing incredibly inconsistent behavior where my code seems to be ignoring browser() and debug() statements, warnings are appearing for lines that haven’t run yet, etc. and I am at my wits’ end. None of the online resources I’ve seen cover this sort of thing.

8 Comments
2024/10/22
01:00 UTC

52

plotscaper: An R package for interactive data exploration (CRAN release)

11 Comments
2024/10/21
14:06 UTC

1

[Question] Incorrect/weird shapes in a leaflet plot

Edit: Fixed with help from a person in the comments, it got converted to points when it should have stayed as polygons.

Hello everyone,

I am working on this dashboard for a zip code project, I have 87 unique zip codes across 9000 rows.

I have been trying to make this leaflet map work to map each zip code, but I get this eldritch nightmare of a shape.

https://preview.redd.it/fuyr15tnf0wd1.png?width=706&format=png&auto=webp&s=e7995065625131235709dd405ec88fee576d951e

I am confused on what I did wrong because:
I converted the spatial data from NAD83 to EPSG:4326 following guidelines:

bexar_spatial <- st_transform(bexar_spatial, crs = "+proj=longlat +datum=WGS84")

I checked the geometry, and it all looks good:

> table(st_is_valid(bexar_county_medical_licenses_sf)) # geometries are valid

TRUE 
9579 

I did see intersects in the geometry and tried some buffer/simplifications but kept getting odd shapes.

> st_intersects(bexar_county_medical_licenses_sf)
Sparse geometry binary predicate list of length 9579, where the predicate was `intersects'
first 10 elements:
 1: 1, 2, 87, 109, 250, 261, 275, 310, 319, 413, ...
 2: 1, 2, 87, 109, 250, 261, 275, 310, 319, 413, ...
 3: 3, 4, 6, 18, 29, 41, 48, 50, 55, 58, ...
 4: 3, 4, 6, 18, 29, 41, 48, 50, 55, 58, ...
 5: 5, 83, 108, 110, 121, 471, 643, 762, 765, 767, ...
 6: 3, 4, 6, 18, 29, 41, 48, 50, 55, 58, ...
 7: 7, 192, 638, 1033, 1113, 1504, 1952, 1985, 2068, 2125, ...
 8: 8, 9, 10, 13, 16, 17, 33, 39, 47, 171, ...
 9: 8, 9, 10, 13, 16, 17, 33, 39, 47, 171, ...
 10: 8, 9, 10, 13, 16, 17, 33, 39, 47, 171, ...

The overall code for the map is here:

bexar_med <- reactive({
  bexar_county_medical_licenses_sf[bexar_county_medical_licenses_sf$practice_county == input$county, ]
})

output$map <- renderLeaflet({
  bm <- bexar_med()
  
  map <- leaflet(bm) %>%
    addProviderTiles("CartoDB.Positron") %>%
    clearShapes() %>%
    addPolygons(
      lng = ~Lng, lat = ~Lat,
      color = "red",
      stroke = FALSE, smoothFactor = 0,
      fillOpacity = 0.7,
      layerId = ~zipcode
    ) 

  map
})
leafletOutput("map")

Plotting the geometry gives something like the 'centroids', which is kind of a relief:

plot(st_geometry(bexar_county_medical_licenses_sf), col = 'lightblue', border = 'black')

I am wildly confused. Is the issue more with the polygon data itself (prepping, cleaning, or just having the right data)? Or am I missing something in the code?

PD: There is not a class for this in my school, I have scheduled a meeting with an IT collaborator who may be able to help me but that is next Friday (I am an undergraduate in a small liberal arts school)

PD2: It is a POINT geometry now; it was MULTIPOLYGON before, so I think I should convert it back, right? https://we.tl/t-ziebBYmRjM WeTransfer link with the files needed if replication is necessary. [All the instructions are in the .Rmd file]

> head(bexar_county_medical_licenses_sf)
Simple feature collection with 6 features and 34 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -98.57243 ymin: 29.34271 xmax: -98.43975 ymax: 29.506
Geodetic CRS:  +proj=longlat +datum=WGS84
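
Following up on the edit (the layer should have stayed polygons): once the boundaries are kept as MULTIPOLYGON, leaflet can draw the sf object directly, with no lng/lat columns in addPolygons(). A sketch, with zip_polygons_sf as a hypothetical polygon sf object:

```r
library(leaflet)

# leaflet reads the sf geometry column itself; lng =/lat = are for
# point coordinates and were likely what produced the stray shapes
leaflet(zip_polygons_sf) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(color = "red", stroke = FALSE, smoothFactor = 0,
              fillOpacity = 0.7, layerId = ~zipcode)
```
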
8 Comments
2024/10/21
02:12 UTC

30

Any good videos/streams of in-depth data analysis using R?

I'm a python guy typically but I'm trying to branch out a little bit. But in trying to find resources to learn R for data analysis I've seen lots of 10 min tutorials with a default data set. I'm hoping to find something more like someone sitting down with some dirtier, more realistic data and working through it for a few hours, something more like what I'm actually doing in the real world.

25 Comments
2024/10/21
00:12 UTC

33

Anyone using VSCode instead of Rstudio?

Hi, title says it all.

I would like to switch IDE and just use VScode for all, from bash to R. I don’t use notebooks/markdown.

What’s your experience with it?

46 Comments
2024/10/20
08:59 UTC

3

R Package looking at the wrong place

I have a local CRAN https server with this path: https://localservername.com/R/src/contrib

install.packages("tidyverse", repos = "https://localservername.com/R/", dependencies = TRUE)

R is not looking at https://localservername.com/R/src/contrib/PACKAGES to do my installation. Instead, it's looking at https://localservername.com/R/bin/windows/contrib/3.5/PACKAGES, which doesn't exist on the server. This error is bombing out my package installation.

I tried editing Rprofile and looking at other config files to see how I can override this and force it to look in the correct path for the repository index. Does anyone know where it is?

Thanks
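
A guess, not a confirmed diagnosis: on Windows, install.packages() defaults to the binary tree bin/windows/contrib/<R version>, which a source-only repository lacks. Forcing a source install should point it at src/contrib/PACKAGES instead:

```r
# type = "source" makes R read src/contrib/PACKAGES rather than the
# platform-specific binary tree
install.packages(
  "tidyverse",
  repos = "https://localservername.com/R/",
  type = "source",
  dependencies = TRUE
)
```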

1 Comment
2024/10/19
04:12 UTC

3

Help please: Char to int function

Hey guys this is my first time using R, and I'm just doing some basic data analysis.

Here is my issue: The dataset that I'm using has a few columns that should be integers, but they are in character format.

The problem with most of the values in this column is that they have values like '4.2k' for 4200.

Here's my thought process:

My attempt

This is how my brain wants to do this, but it just won't work. Can someone tell me where I'm going wrong?
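
The linked attempt didn't survive as text, so here is one possible approach rather than a fix of that code: strip the suffix, then multiply by 1000 when it was present.

```r
# Convert strings like "4.2k" to numbers; plain numerics pass through
to_number <- function(x) {
  mult <- ifelse(grepl("k$", x, ignore.case = TRUE), 1000, 1)
  as.numeric(sub("[kK]$", "", x)) * mult
}

to_number(c("4.2k", "300"))  # 4200  300
```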

4 Comments
2024/10/18
22:19 UTC

4

Command for creating a new variable based on existing variable

I would like to search an open text variable for a string and set a new variable to 1 if it is present, 0 if not. What commands would you recommend? New to R, thanks in advance.
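
One common approach, sketched with made-up column names: grepl() returns TRUE/FALSE per row, and as.integer() turns that into 1/0.

```r
# hypothetical free-text column "comments", searching for "term"
df <- data.frame(comments = c("contains the term", "nothing relevant"))
df$flag <- as.integer(grepl("term", df$comments, ignore.case = TRUE))
df$flag  # 1 0
```

In a tidyverse workflow the same line goes inside mutate(); fixed = TRUE in grepl() avoids surprises if the search string contains regex metacharacters.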

5 Comments
2024/10/18
21:43 UTC

1

help with dose response curve

I am using the drm function from the drc package to fit a model to data from an experiment. When I plot the model it looks like everything works fine but when I want to calculate the EC50 value it makes no sense. From the plot it looks like 50% of the response is around dose 0.8 but I get 34 as an output. I will attach an image of the graph and the code block.

Does anyone know what is happening???

CODE:

model <- drm(
  data = final_long,
  formula = resp ~ rel_conc,
  fct = LL.4(names = c("Hill slope", "Min", "Max", "EC50")),
  logDose = NULL
)

ec50 <- ED(model, 50, interval = "delta")

print(ec50)

summary(model) gives this output:

Model fitted: Log-logistic (ED50 as parameter) (4 parms)

Parameter estimates:

                         Estimate Std. Error t-value   p-value    
Hill slope:(Intercept)  -0.407440   0.067213 -6.0619 6.324e-06 ***
Min:(Intercept)          3.803964   5.762166  0.6602   0.51668    
Max:(Intercept)        313.939933 132.288552  2.3731   0.02777 *  
EC50:(Intercept)        34.465830  57.379136  0.6007   0.55481    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error:

 9.696248 (20 degrees of freedom)

https://preview.redd.it/llp51v1rcavd1.png?width=748&format=png&auto=webp&s=d7bd8c959c0130496acc0476d6d1db0454222653
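
A hedged guess at the mismatch: ED() defaults to type = "relative", the dose at 50% of the distance between the fitted Min and Max. With Max poorly constrained (313 ± 132 here, well above the data), that point sits far right of the visual midpoint. Asking for an absolute response level, or fixing the asymptotes in LL.4(), may match the plot better:

```r
# dose at which the fitted curve reaches an absolute response of 50,
# instead of 50% of the (badly estimated) Min-to-Max range
ED(model, 50, interval = "delta", type = "absolute")
```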

7 Comments
2024/10/17
09:29 UTC

Back To Top