/r/Rlanguage


We are interested in implementing the R programming language for statistics and data science.

This subreddit seeks new methods, for life and organization.

R and Statistics subs:

/r/rstats

/r/statistics

/r/Rstudio

/r/rprogramming

R resources:

R on Stack Overflow

Comprehensive R Archive Network

Swirl: Learning R with interactive lessons within the R console


43,455 Subscribers

3

How to use "raw" SQL in dbplyr?

I'm trying to emit a SELECT from_unixtime(ts) like so:

tbl(db, "table") |> select(ts=dbplyr::sql_expr(from_unixtime(ts)))

but I get this error: "`dbplyr::sql_expr(from_unixtime(ts), db)` must be numeric or character, not a <sql/character> object."

The dbplyr docs don't really explain how to use sql_expr() in connection with other dplyr functions.
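One approach that may help, sketched under assumptions: dbplyr passes functions it does not recognise through to the database unchanged, and `sql()` marks a string as literal SQL. Either form belongs in `mutate()` or `transmute()`, since `select()` only picks and renames columns. The sketch uses a simulated MySQL backend (`lazy_frame()` with `simulate_mysql()`) in place of the real `db` connection, which is an assumption made so the query can be inspected without a database.

```r
library(dplyr)
library(dbplyr)

# Stand-in for tbl(db, "table"): a lazy frame on a simulated MySQL backend
lf <- lazy_frame(ts = 1, con = simulate_mysql())

# Option 1: call the SQL function directly; unknown functions are passed through
q1 <- lf |> transmute(ts = from_unixtime(ts))

# Option 2: supply a literal SQL fragment with sql()
q2 <- lf |> transmute(ts = sql("from_unixtime(ts)"))

show_query(q1)
show_query(q2)
```

With a real connection, the same `transmute()` calls would be applied to `tbl(db, "table")`.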

4 Comments
2024/11/29
14:44 UTC

2

Yet another plyr vs purrr question

Hi all,

Real-life example. A bunch of "tools" is executing a bunch of "jobs". Each job is either a production or a maintenance job. I need to flag each production job that was followed in time by a maintenance job. This sample does what I want:

library(tidyverse)

jobs <- as_tibble(read.table(textConnection("
tool time is_maintenance
   1    1   0
   1    2   0
   1    3   1
   1    4   0
   2    1   0
   2    2   0
   2    3   0
   2    4   0
   "), header=T))

jobs.1 <- plyr::ddply(jobs, "tool", function(x) {
    # sort by time so we can know what the "next" job on a particular
    # tool is
    x <- x[order(x$time),]
    # "next_maintenance" is "is_maintenance" shifted one up
    x$next_maintenance <- c(x$is_maintenance[2:nrow(x)], NA)
    x
})

print(jobs.1)

jobs.1 is a data frame with an additional column next_maintenance that flags if the next job is a maintenance job. (Of course, due to the stupidity of R's "inclusive subscripting" of 1-indexed sequences, this will break if some tool ran fewer than 2 jobs, but I'll let that slide for the moment.)

This works well enough but doesn't seem to be the preferred method in 2024. I've found nothing in the tidyverse documentation that resembles this workflow:

  1. Chop the data frame into groups

  2. Do some arbitrary stuff with each group, yielding new data (tibbles) with possibly additional or fewer rows and/or columns than the original

  3. join the group results row-wise

It's the "arbitrary" part of 2) that I'm having trouble finding information on because tidyverse seems to be focused on summarizing groups rather than creating new, row-wise data.
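For what it's worth, the three-step workflow above can be sketched in dplyr in two ways. For this particular task a grouped `mutate()` with `lead()` is enough; for truly arbitrary per-group work, `group_modify()` takes a function that receives each group's rows and returns a data frame, and the results are bound row-wise. The data is re-created here so the sketch is self-contained; `x[-1]` replaces `2:nrow(x)` so that a one-job tool does not break it.

```r
library(dplyr)

jobs <- tibble(
  tool = c(1, 1, 1, 1, 2, 2, 2, 2),
  time = c(1, 2, 3, 4, 1, 2, 3, 4),
  is_maintenance = c(0, 0, 1, 0, 0, 0, 0, 0)
)

# Variant 1: grouped mutate; lead() shifts the column up by one within each tool
jobs_lead <- jobs |>
  group_by(tool) |>
  arrange(time, .by_group = TRUE) |>
  mutate(next_maintenance = lead(is_maintenance)) |>
  ungroup()

# Variant 2: arbitrary function per group; it may add or drop rows and columns
jobs_mod <- jobs |>
  group_by(tool) |>
  group_modify(function(x, key) {
    x <- x[order(x$time), ]
    x$next_maintenance <- c(x$is_maintenance[-1], NA)
    x
  }) |>
  ungroup()

print(jobs_mod)
```

`group_modify()` is the closer analogue of `ddply()`; the `lead()` version is shorter when the per-group step is a simple column transformation.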

7 Comments
2024/11/28
09:46 UTC

28

A new platform to develop and share Shiny apps!

Hey r/Rlanguage ,

I want to share a project I've been working on: a platform to develop and share Shiny apps. I'd greatly appreciate it if you gave it a try and shared your feedback!

https://preview.redd.it/k7x3sld7pi3e1.png?width=1112&format=png&auto=webp&s=d37fb7ad2ab6fbcb722feb2f47def0d1657f496e

Features

  • There is no need to install R or Shiny locally; everything runs on your browser.
  • Edit the code and see the preview immediately.
  • Generate an initial app from a plain text description; you can also edit existing code with AI.
  • In-app chat to get quick answers on Shiny and R.
  • Entire revision history to go back to old versions of your app
  • Easily share your apps (for free!); here's an example. You can also embed apps in your blog or website (similar to YouTube's embed feature).
  • There is no need to register (some features do require creating an account, like saving an app)

Limitations

  • The applications run via WebAssembly (via Shinylive); hence, not all R packages are available.
  • Code generated with AI might not work in the browser if it uses packages unavailable in WebAssembly, but you can download the code and run it locally.
  • Apps have a startup time that depends on the number of packages used: since it uses WebAssembly, the browser must install everything whenever the user opens the URL
  • It requires a relatively modern browser since WebAssembly is a new technology, and old browsers don't support it.

Feedback

Let me know if you have any suggestions, feature requests, or any issues; I'll be happy to help!

4 Comments
2024/11/27
22:09 UTC

1

Highlighting adjacent zip codes to dataset

I have a dataset of zip codes and want to highlight all zips that are adjacent to those listed in the dataset. I then want to repeat the step once more, so that there is a collar 2 zip codes thick around all listed zips. How would I do this? I am having trouble getting started.
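One possible starting point, assuming the zip codes are available as polygons in an sf object (for example ZCTA boundaries from the Census Bureau, which is an assumption about the data source): `sf::st_touches()` reports which polygons share a boundary, and applying it twice gives the two-zip collar. The sketch below uses a toy 5 x 5 grid instead of real zip code shapes, and `adjacent_to()` is a hypothetical helper name.

```r
library(sf)

# Toy stand-in for zip code polygons: a 5 x 5 grid of unit squares
bbox <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 5, ymax = 5)))
zips <- st_sf(id = 1:25, geometry = st_make_grid(bbox, n = c(5, 5)))

# Pretend only the centre cell is in the dataset
listed <- zips$id == 13

# Polygons that touch any seed polygon but are not seeds themselves
adjacent_to <- function(polys, seed) {
  touches <- lengths(st_touches(polys, polys[seed, ])) > 0
  touches & !seed
}

ring1 <- adjacent_to(zips, listed)          # first collar
ring2 <- adjacent_to(zips, listed | ring1)  # second collar

zips$status <- ifelse(listed, "listed",
               ifelse(ring1, "adjacent",
               ifelse(ring2, "second ring", "other")))
plot(zips["status"])
```

Note that `st_touches()` also counts polygons that meet only at a corner; whether that should count as "adjacent" is a judgment call for the real data.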

3 Comments
2024/11/27
18:08 UTC

1

demography package in R

Hello everyone,

I've started working with the demography package in R and I have some questions. I want to apply certain models provided by this library, but I'm not sure what type of data these models require. As I understand, I need to have the data in a demogdata object. The only thing I found was how to create a demogdata object by importing data from text files. However, I had to make several data transformations, so my fully prepared data is now in a data frame (I have several columns: age group, years, population, and fertility rates).

My question is: how can I convert my data frame to a demogdata object to use with these forecasting methods?

Thank you in advance.

1 Comment
2024/11/27
15:59 UTC

1

Can't install ggplot2

I'm on Windows 10.

If I try to install ggplot2 with install.packages("ggplot2") I get several errors about dependencies. If I write library(ggplot2) I get "Error in library(ggplot2) : there is no package called ‘ggplot2’". My R version is 3.6.1 and I'm using RStudio through Anaconda.

Error on installation:

Warning in install.packages :
  unable to access index for repository https://cran.rstudio.com/bin/windows/contrib/3.6:
  cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/PACKAGES'

If I go to C:\Users\myusername\AppData\Local\Temp\RtmpyofeSw\downloaded_packages I can see ggplot2_3.5.1.tar.gz.

It's a fresh installation, so I don't know what is happening.

Edit: Yeah, it was the R and/or RStudio version. I was using whatever Anaconda had installed, but I've uninstalled that one and installed R myself, and now it works. Thanks to everyone!

8 Comments
2024/11/27
15:59 UTC

1

Social sciences student needing help/tutorials with R

Hi there, so my tasks with R concern primarily importing data and forming graphs (I have a MacBook). It's mainly statistics for public administration. I'm very amateur and so is everyone in my class. We have calculated assignments but I think I'm kind of losing it somewhere and falling behind. A midterm is approaching so I would really appreciate someone knowledgeable and willing to help/guide me through this. Thank you in advance :)

10 Comments
2024/11/24
21:10 UTC

2

Kinda dumb question about coding

So I finished my bachelors in sociology this year and now looking for jobs in data analysis. I’ve been using R throughout college for various research projects and have always relied on using chatgpt or googling how to do stuff because I’ve always had trouble memorising the exact syntax for what I’m trying to do. I am quite familiar with the statistical concepts behind what I’m doing and can analyse and interpret the results but whenever it comes to actual coding I still heavily rely on looking up the syntax or telling chatgpt what I need to do. I tried memorising the syntax but I always forget a special character here or a comma there and my output results in errors.

So my question is do other people have this issue or do people really memorise all the syntax including all special characters?

I’m sorry if this is kind of a dumb question but I have an interview coming up and I’ve been practicing using R but I keep running into the same problem.

Any advice or opinions are appreciated.

19 Comments
2024/11/24
14:42 UTC

0

A soles

1 Comment
2024/11/24
00:52 UTC

5

Lambda R Function

Hey y'all! First time poster on Rlang. I'm working with a friend on a mapping project. Neither of us are professionals in the industries, but I have some experience in JavaScript(TS) and Rust and my partner in this is pretty proficient in R and GIS (he went to school for it).

I'm hoping we can put our R scripts into a serverless function to avoid heavy, custom environments in the static server. I came across this git repo (https://github.com/UI-Research/lambda-r-demo) that embeds R in a Python function using a version of rpy2 and handlr to create a Python-based handler and deploy it to Lambda. I'm finding that, even though the repo was posted last year, a few of the dependencies are unavailable (EPEL v7). I dug pretty deep in the Dockerfile trying to get this particular implementation to work, but 8 hours later I'm at the bar posting on reddit about it. I'm not attached to this particular implementation, but it seemed to make sense. Our project is a JavaScript-based mountain bike trail mapping app for our area and I'd rather just send all our geospatial data to a serverless function than deploy a whole environment with Rust, Python, R, and JavaScript. Does anyone have any insight into this?

3 Comments
2024/11/23
02:45 UTC

10

Expand your Bluesky network with R + atrrr

https://blog.stephenturner.us/p/expand-your-bluesky-network-with-r

I wrote this post demonstrating how to find people followed by the people you follow, but who you don't follow, using R and the atrrr package.

https://preview.redd.it/pfhcdp9wxg2e1.png?width=576&format=png&auto=webp&s=2073199f5dc6365d4a397030620f3262a3422fb3

7 Comments
2024/11/22
15:08 UTC

2

Replace NA values by numeric distribution of existing values

Hey there people,

Got a bit of a pickle with RStudio

https://preview.redd.it/jyilp8vjuf2e1.png?width=1920&format=png&auto=webp&s=5c7ad6ec3e76df74843483ee657bf9ccb1b90213

TL;DR: I want to replace the NA values of each column with values that follow the same numeric distribution as the non-NA values (see green example). How do I do that in RStudio?

See upper dataframe, I have phenotypic numeric values for different species of Squamata. Lots of NA which messes up stats analyses. I want to replace those NA by numeric values.

What I've done currently : I calculated the mean value of non-NA values and replaced NA by mean values for each column.

optional question : how do I do that in RStudio ? Resources online didn't work and doing it "by hand" in Excel was painful

What I want : replace NA values of each column by mimicking the distribution of other numeric values in the same column. Basically what I did manually in green as an example : Min value is 15, max is 38, and most variables are around 22. Thus NAs are replaced to mimic that.

Actual question : is there any commonly used script in scientific research which does something similar to what I want to do ? No need for anything too complex, it's for a school project.

If not, I'd like to calculate the extent for one column, divide that by the number of NA values, and increment by the result while replacing NAs. Example : for green column, min is 15, max is 38. Extent is 38-15 = 23. Let's say there are 23 NA values. 23/23=1. Replace 1st NA value by min value : 15. Replace 2nd by 15+1 =16. Replace 3rd by 16+1 = 17, etc...

I can do that manually in Excel, but is it possible to do so in RStudio ?
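A minimal base R sketch of one way to do this, under assumptions: each NA is replaced by a value drawn at random from the observed values of the same column, so the filled-in values follow the column's empirical distribution. The column names `svl` and `mass` and their values are invented for illustration, and `impute_by_sampling()` is a hypothetical helper, not a function from a package.

```r
set.seed(42)  # make the random draws reproducible

# Replace each NA with a random draw from the non-NA values of the same vector
impute_by_sampling <- function(x) {
  missing <- is.na(x)
  observed <- x[!missing]
  if (length(observed) == 0 || !any(missing)) return(x)
  draws <- sample.int(length(observed), sum(missing), replace = TRUE)
  x[missing] <- observed[draws]
  x
}

# Invented example data with gaps
traits <- data.frame(
  svl  = c(15, 22, NA, 38, NA, 21),
  mass = c(NA, 3.1, 2.8, NA, 3.5, 3.0)
)

# Apply the helper to every column, keeping the data frame structure
traits[] <- lapply(traits, impute_by_sampling)
print(traits)
```

The evenly spaced variant described above would use `seq(min(observed), max(observed), length.out = sum(missing))` in place of the random draws. For published research, multiple imputation (for example with the mice package) is the more commonly reported approach.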

Many thanks for any help!


6 Comments
2024/11/22
11:34 UTC

2

Please help a suffering Stats student

Hi, so I have an assignment where my prof. wants two different quantile-quantile plots for the following data. I have tried to figure it out myself with the help of websites. But as someone who has very, very little knowledge of this software I don't understand what any of it means. I pretty much need to code two separate quantile-quantile plots, one for the "Yes" category of lactating and another for the "No". I have tried to copy and paste this data into two separate spreadsheets but R gives me an error so this is my last hope 😭. Please help a suffering uni student in her time of need 🙏
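A sketch of how this could look in base R, with assumptions stated up front: the data in the screenshot is not reproduced here, so the data frame `milk` with columns `lactating` and `value` is invented, and the plots compare each group against a normal distribution via `qqnorm()`.

```r
set.seed(1)

# Invented stand-in for the assignment data: one grouping column, one numeric column
milk <- data.frame(
  lactating = rep(c("Yes", "No"), each = 10),
  value = c(rnorm(10, mean = 50, sd = 5), rnorm(10, mean = 40, sd = 5))
)

# Two panels side by side, one normal Q-Q plot per group
old_par <- par(mfrow = c(1, 2))
for (grp in c("Yes", "No")) {
  vals <- milk$value[milk$lactating == grp]
  qqnorm(vals, main = paste("Lactating:", grp))
  qqline(vals)  # reference line through the quartiles
}
par(old_par)
```

With the real data, typing the values into `c(...)` or reading them with `read.csv()` would replace the invented `milk` data frame.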

https://preview.redd.it/1uwvoa8cgb2e1.png?width=271&format=png&auto=webp&s=c1c5d4a0fbb704296699c41a13c2d2ecabc8b1eb

3 Comments
2024/11/21
20:44 UTC

0

When I run the last 4 lines of the code this error pops up: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'y' Anybody know how to fix this? I already checked if there are NA values.

Code:

options(
  digits = 2,
  scipen = 999,
  warn = -1
)

rm(
  list = ls()
)

library(magrittr)
library(readr)

Predicting_Demand_2 <- read_csv("~/Predicting Demand 2.csv")
col_types = "cnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnccn"
name_repair = janitor::make_clean_names

library(dplyr)

Predicting_Demand_2$train_test <- "Train"

train_data<- Predicting_Demand_2 %>%
  dplyr::filter(train_test == "Train") %>%
  dplyr::mutate(
    train_test = ifelse(runif(n()) > 0.5, "Validation", train_test)
  )

Predicting_Demand_2$box_cox_quantity <- "id + lat + long + pop + shop + quantity + price"

#box_cox_quantity <- "city + lat + long + pop + shop + brand + container + capacity + price"
#dplyr::select(-shop)

Predicting_Demand_2 <- Predicting_Demand_2 %>%
  filter(train_test == "Train") %>%

lm_house <- lm(
  formula = box_cox_quantity ~ id + lat + long + pop + shop + quantity + price,
  data = Predicting_Demand_2 %>%
    dplyr::filter(
      train_test == "Train"
    )
)
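A small sketch of how this error can be reproduced and diagnosed, on invented data. Checking for NA alone is not enough, because `lm()` drops NA rows by default but not infinite values; `is.finite()` catches NA, NaN and Inf together. It is also worth checking `class(Predicting_Demand_2$box_cox_quantity)`: the assignment above stores a text string in that column, so the response is character rather than numeric.

```r
# Invented data: a zero in the raw response turns into -Inf after log()
d <- data.frame(x = 1:5, quantity = c(4, 0, 9, 16, 25))
d$log_quantity <- log(d$quantity)

# is.na() finds nothing, is.finite() flags the problem row
sum(is.na(d$log_quantity))
which(!is.finite(d$log_quantity))

# Fitting on the full data fails with the error from the post
msg <- tryCatch(
  {
    lm(log_quantity ~ x, data = d)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Fitting only on rows with a finite, numeric response works
stopifnot(is.numeric(d$log_quantity))
fit <- lm(log_quantity ~ x, data = d[is.finite(d$log_quantity), ])
coef(fit)
```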

1 Comment
2024/11/21
20:08 UTC

3

Grouping in dplyr 1.1.0?

I am doing data analysis for my PhD on plastic pollution. I used to group data based on a unique ID column with dplyr and a combination of group_by(id) and summarize(...) to sum up data and so on. Now this stopped working and tells me to use reframe() instead of summarize(). However, grouping does not work anymore and neither do the summarizing functions (e.g. tot_litter_grams = sum(litter_grams)). The dplyr documentation does not help me as of now, and neither did ChatGPT. Does anyone know how to get this working again?

Edit: Solved. I changed one function inside the summarize() function, which threw an error about summarize() not being supported anymore. Changing back the function inside summarize() solved the issue.
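For readers who hit the same message, a short sketch on invented data: since dplyr 1.1.0, `summarise()` expects each expression to return one value per group, and an expression that returns several values triggers the hint to use `reframe()`. The column names follow the post; the numbers are made up.

```r
library(dplyr)

# Invented litter records, several per ID
litter <- tibble(
  id = c("a", "a", "b", "b", "b"),
  litter_grams = c(1.5, 2, 0.5, 1, 3)
)

# One row per group: summarise() is still the right tool
totals <- litter |>
  group_by(id) |>
  summarise(
    tot_litter_grams = sum(litter_grams),
    n_items = n(),
    .groups = "drop"
  )

# Several rows per group (range() returns two values): use reframe()
ranges <- litter |>
  group_by(id) |>
  reframe(range_grams = range(litter_grams))

print(totals)
print(ranges)
```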

14 Comments
2024/11/21
12:40 UTC

2

Writing DataFrames to Tables in Databricks

The code below is what I'm using. If I do 10 rows, fine, it works. The problem is my data frame is 7.3m rows. I'm testing it with a 1m subset, and it's been running for 3 hours, so that's obviously not going to be very feasible. Any suggestions?

library(sparklyr)

# Connect to databricks

sc<-spark_connect(method="databricks")

# subset it to smaller number of rows for testing speed

icMX<-icM[1:1000000,]

# Convert it to a Spark Dataframe

spark_df<-sdf_copy_to(sc,icMX,overwrite=TRUE)

# Save it

spark_write_table(spark_df, "edlprod.lead_ranking.intent_wide", mode="overwrite")

1 Comment
2024/11/20
19:46 UTC

2

Removing certain characters when knitting using Rmarkdown

Not sure if this is the right channel or if there is another one better, but since I didn't find one for RMarkdown, here we go.

I'm doing some writing using RMarkdown and a VS Code plugin called FOAM (Logseq-like). I'm writing the documents in a .md file and build the stuff using a single .rmd file. The thing is, FOAM uses the characters [[ and ]] to create links between the files, pretty useful to create a wiki-like structure for writing. The main problem is, the characters appear on the output pdf. I want to get rid of those characters when I build, but I'm not experienced enough with R to do so and I cannot find any proper solution by myself. The closest solution I found is the following post (not the main answer, but the other one), but I don't know how to adapt it for my purposes.

The .rmd file looks like this:

---
title             : Some Title
subtitle          : Some Subtitle
author: | 
  | My Name

wordcount         : "X"
documentclass     : article
floatsintext      : no
figurelist        : no
tablelist         : no
footnotelist      : no
linenumbers       : no
mask              : no
draft             : no
tables            : no
output: 
  bookdown::pdf_book:
    toc: false
  
header-includes:
   - \usepackage[spanish]{babel}
   - \usepackage{booktabs}
   - \usepackage{placeins}
   - \usepackage{titling}
---
```{r, include = FALSE}
library(knitr)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = '.')
```

```{r, child=c('MyMarkdownDocument.md')}
```

Any advice to get rid of those characters? I want to avoid the manual option of removing the symbols by hand every time I build, if I can.
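One way this might be approached, as a sketch rather than a tested recipe for this exact setup: strip the brackets from a copy of the Markdown file before it is knitted, and point the child chunk at the cleaned copy. `strip_wikilinks()` and `clean_copy()` are hypothetical helper names, and the demo file is written to a temporary location only so the example is self-contained.

```r
# Remove the [[ and ]] markers but keep the link text between them
strip_wikilinks <- function(lines) {
  gsub("\\[\\[|\\]\\]", "", lines)
}

# Write a cleaned copy of a Markdown file and return the path of the copy
clean_copy <- function(path) {
  out <- tempfile(fileext = ".md")
  writeLines(strip_wikilinks(readLines(path, warn = FALSE)), out)
  out
}

# Demo on a throwaway file standing in for MyMarkdownDocument.md
demo <- tempfile(fileext = ".md")
writeLines(c("See [[Some Note]] for details.", "A plain line."), demo)
cleaned <- clean_copy(demo)
readLines(cleaned)
```

If the helpers are defined in the setup chunk, the child chunk option could then be written as `child = clean_copy('MyMarkdownDocument.md')`, since knitr evaluates chunk options as R expressions. The original .md file stays untouched, so the FOAM links keep working in the editor.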

0 Comments
2024/11/19
01:05 UTC

5

Question about LCA in R

I recently needed to use a latent class analysis (LCA) function. However, when I tried to install the lcca package in R 4.4.0, it said the package was built for an earlier version of R (before 4.x.x). Does anyone know how to install this package and use it smoothly in the most recent R? Thank you!

7 Comments
2024/11/18
20:52 UTC

2

Is it silly to run multiple time consuming scripts at once on windows?

I am running two R scripts at once, both on different desktops (windows option to have another screen?).

Will R run slower if there are multiple scripts going at once? Would it be wiser to run them one at a time?

7 Comments
2024/11/18
17:25 UTC

6

Package initialization function ... is there such a thing?

I made an R package that needs some initialization code run upon loading of the package using library(). Is there a possibility to do this?
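R does provide load hooks for this: a package can define `.onLoad()`, which runs when its namespace is loaded, and `.onAttach()`, which runs when it is attached with `library()`. A minimal sketch follows; by convention the hooks live in a file such as `R/zzz.R`, and the environment name `.pkg_state` and what is stored in it are placeholders for illustration.

```r
# Package-private environment for state set up at load time
.pkg_state <- new.env(parent = emptyenv())

# Runs when the namespace is loaded (library(), requireNamespace(), pkg::fun)
.onLoad <- function(libname, pkgname) {
  .pkg_state$loaded_at <- Sys.time()
  invisible(NULL)
}

# Runs only when the package is attached with library(); use it for messages
.onAttach <- function(libname, pkgname) {
  packageStartupMessage("Initialised ", pkgname)
}
```

Initialization that other code depends on usually goes in `.onLoad()`, because `.onAttach()` is skipped when functions are called via `pkg::fun` without attaching the package.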

3 Comments
2024/11/18
13:22 UTC

5

devtools: Package works only in dev environment but not after installation

I'm trying to write a convenience package that facilitates access to a database I use all the time. Here's a minimal example of the single R file involved:

.pdb = DBI::dbConnect(odbc::odbc(), driver="SQL Server",
                      <more connection args>)

#' @export
Anlage <- dplyr::tbl(.pdb, 'Anlage')

Yes, there's a DB connection hard-coded into a package. Never mind. This is only for my local use, not distribution.

Enter a Windows shell in the package source directory and load the package in the development environment:

PS > R.exe

R version 4.4.1 (2024-06-14 ucrt) -- "Race for Your Life"

> library(devtools)
Loading required package: usethis
> load_all()
ℹ Loading ProdDB
> class(Anlage)
[1] "tbl_Microsoft SQL Server" "tbl_dbi"
[3] "tbl_sql"                  "tbl_lazy"
[5] "tbl"
> Anlage
# Source:   table<"Anlage"> [?? x 43]
# Database: Microsoft SQL Server 13.00.6300[ProdDB]
   anlagentyp anlagennummer cre_dat             end_dat
   <chr>      <chr>         <dttm>              <dttm>
 1 " EXT"     "1    "       1992-12-23 09:40:22 5512-05-04 21:13:51
 2 "01LI"     "409  "       2012-03-20 13:57:54 5512-05-04 21:13:51

So that works fine. Let's build and install it (no errors, output from commands omitted):

> build()
> install()
* DONE (ProdDB)

Exit and re-enter R:

> q()
Save workspace image? [y/n/c]: n

PS > R.exe

R version 4.4.1 (2024-06-14 ucrt) -- "Race for Your Life"

Load and test installed package:

> library(ProdDB)
> class(Anlage)
[1] "tbl_Microsoft SQL Server" "tbl_dbi"
[3] "tbl_sql"                  "tbl_lazy"
[5] "tbl"

This looks like before. Let's get some data:

> Anlage
$src
$con
Loading required package: odbc
Error: external pointer is not valid

Now that's where I am. The top of traceback() looks like this:

> traceback()
10: stop(structure(list(message = "external pointer is not valid",
        call = NULL, cppstack = NULL), class = c("Rcpp::exception",
    "C++Error", "error", "condition")))
9: connection_info(dbObj@ptr)
8: dbGetInfo(object)
7: dbGetInfo(object)
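A likely explanation, with a sketch of one workaround: top-level code in a package is evaluated when the package is built and installed, and the resulting objects are saved. The connection stored in `.pdb` is an external pointer that does not survive being saved, so the installed `Anlage` still has the right class but points at a dead connection. Opening the connection lazily at run time avoids this. The names `.state` and `get_con()` are hypothetical, and the default connector repeats the `dbConnect()` call from the post with its elided arguments left out.

```r
# Package-private environment that holds the live connection for the session
.state <- new.env(parent = emptyenv())

# Open the connection on first use and reuse it afterwards
get_con <- function(connect = function() {
                      DBI::dbConnect(odbc::odbc(), driver = "SQL Server")
                    }) {
  if (is.null(.state$con)) {
    .state$con <- connect()
  }
  .state$con
}

# Export a function instead of a pre-built table object
Anlage <- function() {
  dplyr::tbl(get_con(), "Anlage")
}
```

Callers would then write `Anlage()` instead of `Anlage`. The same connection code could alternatively run from an `.onLoad()` hook; a check with `DBI::dbIsValid()` before reuse would guard against connections that have timed out.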

7 Comments
2024/11/18
12:07 UTC

34

lovecraftr: A data R package with Lovecraft's work for text and sentiment analysis.

Hi, I recently came across a paper that performed sentiment analysis on H.P. Lovecraft's texts, and I found it fascinating.

However, I was unable to find additional studies or examples of computational text analysis applied to his work. I suspect this might be due to the challenges involved in finding, downloading, and processing texts from the archive.

To support future research on Lovecraft and provide accessible examples for text analysis, I developed an R package (https://github.com/SergejRuff/lovecraftr). This package includes Lovecraft's work internally, but it also allows users to easily download his texts directly into R for straightforward analysis.

https://preview.redd.it/s7r4shuunh1e1.png?width=881&format=png&auto=webp&s=1cb99d596fbe45b7a3c12811e738cb40a72ff687

6 Comments
2024/11/17
16:29 UTC

12

What is something you wish were available as an R package?

Hi everyone,

I’m looking to take on a side project of building an R package and releasing it to the public. However, I’m struggling with deciding what the package should include. The R community is incredibly active and has already built so many tools to make developing in R easier, which makes it tricky to identify gaps.

My question to you: What’s something useful and fairly basic that you find yourself scripting on your own because it’s not included in any existing R packages?

I’d love to hear your thoughts or ideas. My goal is to compile these small but helpful functionalities into a package that could benefit others in the community.

Thanks in advance for sharing your suggestions!

7 Comments
2024/11/17
09:37 UTC

2

Web host with R and Quarto

I want to create a FastAPI-based web site, and much of its functionality will be provided by R and Quarto. (I am part of a community that wrangles data and creates reports using both R and Quarto. Also, I have known and used Python since the 90s, so I know it provides these abilities as well. However, this community doesn't use it.) I have been looking for a web hosting service that would allow me to call R (via rpy2) and Quarto on the server; however, I have been unsuccessful.

Any help would be appreciated.

4 Comments
2024/11/16
11:34 UTC

9

[dbplyr] What's so hard about giving columns their full names?

This is really frustrating. I'm trying to make a complex join of half a dozen tables, and some of them have a column called flags. To differentiate them, R names them flags.x, flags.y, ... in the order they appear in the join. Yes, I know I can specify a suffix argument to the inner_join() function, but that only gets appended if that column is actually used in the query.

  1. Why make it a suffix instead of a prefix? In SQL the table name is prefixed (I know the native R merge() uses suffixes)
  2. Why not give the option to prepend (not append) the SQL table name to each field name? Why the arbitrary limitation to two characters?
  3. Why is the suffix appended conditionally only in case a column name appears more than once in the query, breaking the code each time one refactors the query?

I know better than to complain about FOSS. I just can't understand why these, in my eyes, counterproductive decisions were made. I'm a strong proponent of "explicit is better than implicit", which is why I wouldn't mind if any multi-table query would by default prepend the table name to all variables so there is never any ambiguity.

7 Comments
2024/11/14
09:46 UTC

0

Need Help Deciding what Function to Use

I have two data frames where one contains all the values and the second is missing a column of values, but I need to maintain the order of the second data frame. I'm having the hardest time doing this after two years of not using R. I'm not even sure of the best function to use. Any help would be appreciated.
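One base R option, assuming the two data frames share a key column (called `id` here, which is an assumption since the post does not name one): `match()` looks up each key of the second data frame in the first, so the missing column can be filled in without changing the row order. The data is invented.

```r
# Complete data frame with the values
full <- data.frame(id = c("a", "b", "c", "d"), value = c(10, 20, 30, 40))

# Second data frame, in its own order and without the value column
partial <- data.frame(id = c("c", "a", "d"))

# Position of each partial$id within full$id, then pull the values in that order
partial$value <- full$value[match(partial$id, full$id)]
print(partial)
```

`dplyr::left_join(partial, full, by = "id")` gives the same result and also keeps the row order of the first argument; `merge()` sorts by the key by default, which is a common reason the order gets lost.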

5 Comments
2024/11/13
18:38 UTC
