/r/Rlanguage


We are interested in implementing R programming language for statistics and data science.

This subreddit seeks new methods for life and organization.

R and Statistics subs:

/r/rstats

/r/statistics

/r/Rstudio

/r/rprogramming

R resources:

R on Stack Overflow

Comprehensive R Archive Network

Swirl: Learning R with interactive lessons within the R console


39,688 Subscribers

1

Building regression models with Y as a factor?

I’m very new to using R. I have a data set with which I want to predict hotel rating scores from 1 to 5, with 1 being the worst and 5 the best.

So far in my class, I’ve learned about using factors for the predictor variables, but I’m unsure whether we’re allowed to use a factor as the response variable as well. Would that make sense? How would that work?

If not, what would I do in this situation?
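One common option for an ordered 1 to 5 rating is an ordinal (proportional-odds) logistic regression such as MASS::polr(), which expects the response to be an ordered factor. A minimal sketch on simulated data; the data frame `hotels` and the predictor `cleanliness` are hypothetical stand-ins:

```r
library(MASS)  # polr(): proportional-odds logistic regression

set.seed(42)
n <- 300
cleanliness <- rnorm(n)                      # hypothetical predictor
latent <- 1.5 * cleanliness + rlogis(n)      # unobserved continuous "satisfaction"
rating <- cut(latent, breaks = c(-Inf, -2, -0.5, 0.5, 2, Inf),
              labels = 1:5, ordered_result = TRUE)
hotels <- data.frame(rating, cleanliness)

# The response is an ordered factor; Hess = TRUE lets summary() report standard errors
fit <- polr(rating ~ cleanliness, data = hotels, Hess = TRUE)
summary(fit)

# Predicted probability of each rating for a new hotel
probs <- predict(fit, newdata = data.frame(cleanliness = 0.5), type = "probs")
probs
```

If the categories had no natural order, nnet::multinom() would be the unordered counterpart; treating the rating as a plain number in lm() is also common but assumes the steps between ratings are equal.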

2 Comments
2024/04/29
02:16 UTC

2

Create categorical vectors where the first n values are one value and the next range of values is another?

I'm trying to create 4 columns of categorical data based on the table below. The 4 columns would be:

  • Gender (column of Female/Male)
  • Location (column of Urban/Rural)
  • Seat belt use (Y/N)
  • Injury (column of yes/no)

Is there an easy function to create, say, the "Gender" column so that the first (7,287 + 11,587 + 3,256 + 6,134) = 28,264 values/rows are Female, and the next (10,381 + 10,969 + 6,123 + 6,693) = 34,166 values/rows are Male?

https://preview.redd.it/k40dsiwoi4xc1.png?width=356&format=png&auto=webp&s=9028242900ccb01998b01599c6e0e27de2fd06e6
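In base R, rep() with a vector `times` argument does this: each label is repeated the given number of times, in order. A sketch using the counts listed in the post, assuming Female rows come first; the small `counts` table at the end is hypothetical and only shows how every cell of a count table can be expanded into individual rows (tidyr::uncount() does the same in the tidyverse):

```r
# Counts from the post: four Female cells and four Male cells of the table
female_counts <- c(7287, 11587, 3256, 6134)
male_counts   <- c(10381, 10969, 6123, 6693)

# Each label is repeated sum(...) times, Female block first
gender <- factor(rep(c("Female", "Male"),
                     times = c(sum(female_counts), sum(male_counts))))
table(gender)

# Same idea for a whole table: repeat each row index by its count
counts <- data.frame(Gender   = c("Female", "Male"),   # hypothetical cells
                     Location = c("Urban", "Rural"),
                     n        = c(2, 3))
cases <- counts[rep(seq_len(nrow(counts)), times = counts$n),
                c("Gender", "Location")]
nrow(cases)
```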

5 Comments
2024/04/28
01:37 UTC

1

How can I create a boxplot of these variables?

https://preview.redd.it/mp6k9kr074xc1.png?width=498&format=png&auto=webp&s=6493abef71b546c52326889747da95e401ff14f0

I am doing an ANCOVA analyzing whether omega-6 intake from either plants (Plant) or seafood (Seafood) has an impact on countries' rates of Alzheimer's disease. I also included the countries' proportions of people older than 65 (Plus) as a variable to account for age (since countries with older populations will probably naturally have higher rates regardless of diet). I just need to make a box plot to assess the assumption of correlation. I remember my professor said something along the lines of “the predictor variables can't be correlated” and that the boxplots should overlap. So I think I need to make a boxplot with two boxes, one for Plant and one for Seafood, where Plus is the y axis, right? But when I try to do this, the box plots look... well, like the thing above. My code is boxplot(Rate~Seafood*Plant,data=alzheimersdietdata). What am I doing wrong?
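Assuming Plant and Seafood are numeric intake columns, a formula like Rate ~ Seafood*Plant makes boxplot() treat every distinct combination of intake values as its own group, which would explain the crowded plot. A sketch on a hypothetical stand-in for alzheimersdietdata (all values simulated) that draws one box per intake variable and checks the correlation between the predictors directly with cor(); whether this is exactly the check the professor had in mind is an assumption:

```r
set.seed(1)
# Hypothetical stand-in for alzheimersdietdata: one row per country
alzheimersdietdata <- data.frame(
  Plant   = runif(40, 5, 20),   # omega-6 intake from plants
  Seafood = runif(40, 0, 3),    # omega-6 intake from seafood
  Plus    = runif(40, 5, 25)    # % of population older than 65
)
alzheimersdietdata$Rate <- 2 + 0.1 * alzheimersdietdata$Plus + rnorm(40)

# One box per predictor (different units, so compare spread and shape, not level)
boxplot(alzheimersdietdata[, c("Plant", "Seafood")], ylab = "Omega-6 intake")

# Pairwise correlations between the two predictors and the covariate
pred_cor <- cor(alzheimersdietdata[, c("Plant", "Seafood", "Plus")])
round(pred_cor, 2)
```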

0 Comments
2024/04/28
00:31 UTC

0

Help needed for multilevel analysis, willing to pay for teaching me!

Hi! I am trying to learn statistical analysis using R but for some reason when it comes to multilevel analyses I just don’t get it. I’m looking for somebody who can explain it to me and I’m willing to pay for this! I wrote a general script already that seems to get me halfway there, but the final interactions I just don’t understand.

3 Comments
2024/04/27
13:06 UTC

1

Function to pull min and max from each dataframe in a list of dataframes, to apply the min and max to the title of each dataframe in said list.

This is... a lot. So I want to provide some context.

First, the source document: I have an Excel workbook that contains a tab for each experiment I'm working on, and each experiment has a range of unique plots listed (think Ag research).

I'm writing some code to pull all of the tabs from an xlsx workbook into R, rename each 'tab' (now a dataframe), add some empty columns, and export each dataframe into its own xlsx or csv file for use in my work.

So I've successfully pulled the data into R, with each tab from the Excel file being a data frame in a list of data frames, 'df_list'. What I need next is a function that pulls the min and max values from the 'Plot' column of each df_list$entry and puts them into a string to use as the new name for that entry, so that

df_list$entry <- "min(plot)_entry_max(plot)"

Right now, I have this:

> df_list
$Test1
   Code        Line Rep Plot 
1 19005  T1_19005-1   1  13  
2 19006  T1_19006-3   1  14 
3 19007  T1_19007-12  1  15  
4 19008  T1_19008-2   1  16  

$Test2
   Code   Line Rep Plot 
1 20001  T2-01   1  17  
2 20016  T2-16   1  18  
3 20003  T2-03   1  19  
4 20008  T2-08   1  20  


sheets <- c("Test1", "Test2")     # a vector of the NAMES of each dataframe in df_list

filenames <- function(sheet) {
  filename_list <- c()
  min_plot <- min(df_list$sheet$Plot)
  max_plot <- max(df_list$sheet$Plot)
  tabname <- paste0(min_plot, "_", sheet, "_", max_plot)

  filename_list <- append(filename_list, tabname)
}

filename_list <- lapply(sheets, function(x) filenames(x))
print(filename_list)

This almost works: it pulls the correct names from the sheets vector and pastes them together with the min and max for each, but it can't seem to get the actual min or max values. Instead it gives me a filler value of "Inf" for the min and "-Inf" for the max.
So my filename_list reads "Inf_Test1_-Inf", etc.

When I go through the steps I've put in the function individually with a specific entry from "sheets", I get the correct outputs. But it won't work as a function.

Hopefully this all makes sense, and someone can help me! I just wanted to be able to iterate through the entries in a list of dfs and pull the min and max out for each one. I am relatively new to R so I've been searching all over, but this listception makes it difficult to find a good result anywhere.

*Edit: Changed the column title "Row" to "Plot" to reduce confusion.
Added a pair of example data frames from df_list.
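Inside the function, df_list$sheet looks for an element literally named "sheet", which does not exist, so it returns NULL; min(NULL) and max(NULL) then return Inf and -Inf with a warning. Indexing with df_list[[sheet]] uses the value stored in the variable instead. A sketch that rebuilds the two example data frames from the post and renames the list; make_tabname is a hypothetical helper name:

```r
# The two example data frames from the post
df_list <- list(
  Test1 = data.frame(Code = 19005:19008,
                     Line = c("T1_19005-1", "T1_19006-3", "T1_19007-12", "T1_19008-2"),
                     Rep = 1, Plot = 13:16),
  Test2 = data.frame(Code = c(20001, 20016, 20003, 20008),
                     Line = c("T2-01", "T2-16", "T2-03", "T2-08"),
                     Rep = 1, Plot = 17:20)
)

make_tabname <- function(sheet, dfs) {
  plots <- dfs[[sheet]]$Plot   # [[sheet]] uses the string held in `sheet`
  paste0(min(plots), "_", sheet, "_", max(plots))
}

# One new name per element, computed before the list is renamed
new_names <- vapply(names(df_list), make_tabname, character(1), dfs = df_list)
names(df_list) <- unname(new_names)
names(df_list)
```

Passing the list in as an argument (dfs) rather than reading the global df_list keeps the function usable on other lists too.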

5 Comments
2024/04/26
22:36 UTC

4

How to make a column name in string/made with paste0() be read as argument in a function?

Hello,

I'm trying to put the generation of graphs into a loop. The problem is that I'm also looping the name of the variables I use in the graph, as in the "nested" loop (line 5). The variables that change according to the loop are in lines 9 and 11. I know that I cannot use strings for this, so I tried to modify it. For example, in line 5, I tried the following:

y = sym(paste0("coef_", j))

y = !!paste0("coef_", j)

y = !!sym(paste0("coef_", j))

And none worked. Any help here is appreciated :)

1. count <- 0
2. graphs <- list()
3. 
4. for (i in 1:11) {
5.   for (j in c("max", "low")) {
6.     count <- count + 1    
7.     graph <- ggplot(
8.       df[df$group == paste0(var_groups[i]),],
9.       aes(x = var_label, y = paste0("coef_", j)) +
10.         geom_point(color = graph_colors[i]) +
11.         geom_errorbar(aes(ymin = paste0("ci_lower_",j), ymax = paste0("ci_upper_", j)), width = 0.2,
12.                     color = graph_colors[i], size = 1) +
13.         coord_flip() +
14.         geom_hline(yintercept = 0, color = "blue", linetype = "dashed", size = 1) +
15.         scale_y_continuous(limits = c(-x_axis, x_axis)) +
16.         theme_bw()
17.     graphs[[count]] <- graph
18.   }
19. }
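Inside aes(), a string such as paste0("coef_", j) is treated as a constant value, not as a column name. One way is to turn the string into a symbol with rlang::sym() and inject it with !!; the injection also fixes the current value of j when each plot is created, which matters because aes() otherwise evaluates lazily and every plot from the loop can end up using the last j. Separately, as posted, the ggplot( opened on line 7 is never closed: a ) is missing before the + at the end of line 9. A sketch with hypothetical toy data (df, var_groups, graph_colors and x_axis here are made up):

```r
library(ggplot2)

# Hypothetical toy data following the column naming pattern from the post
df <- data.frame(
  group     = rep(c("A", "B"), each = 3),
  var_label = rep(c("v1", "v2", "v3"), times = 2),
  coef_max  = c(0.2, -0.1, 0.4, 0.3, 0.0, -0.2),
  coef_low  = c(0.1, -0.3, 0.2, 0.1, -0.1, -0.4)
)
for (j in c("max", "low")) {
  df[[paste0("ci_lower_", j)]] <- df[[paste0("coef_", j)]] - 0.1
  df[[paste0("ci_upper_", j)]] <- df[[paste0("coef_", j)]] + 0.1
}
var_groups   <- c("A", "B")                # hypothetical
graph_colors <- c("darkred", "darkgreen")  # hypothetical
x_axis       <- 1                          # hypothetical axis limit

graphs <- list()
for (i in seq_along(var_groups)) {
  for (j in c("max", "low")) {
    graph <- ggplot(
      df[df$group == var_groups[i], ],
      # sym() turns the string into a column name; !! inserts it immediately
      aes(x = var_label, y = !!rlang::sym(paste0("coef_", j)))
    ) +
      geom_point(color = graph_colors[i]) +
      geom_errorbar(aes(ymin = !!rlang::sym(paste0("ci_lower_", j)),
                        ymax = !!rlang::sym(paste0("ci_upper_", j))),
                    width = 0.2, color = graph_colors[i]) +
      coord_flip() +
      geom_hline(yintercept = 0, color = "blue", linetype = "dashed") +
      scale_y_continuous(limits = c(-x_axis, x_axis)) +
      theme_bw()
    graphs[[length(graphs) + 1]] <- graph
  }
}
```

The .data pronoun (y = .data[[col]]) is the other usual way to pick a column by name; inside a loop it would still need the value injected (.data[[!!col]]) to avoid the lazy-evaluation problem.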
8 Comments
2024/04/26
21:06 UTC

1

How to make this sheet work for a fixed effects model?

I have an Excel sheet of data that I need to modify so I can run a fixed effects model in R. As I understand it, each observation needs to be a row with a variable denoting the year the observation was made. In my spreadsheet, the data is set up so that each row is a college with its data for every year. I would like to set it up so that each row is a college, its observation for a specific year, and the year the observation was made. In other words, I have this:

College | Tuition 2021 | Tuition 2022
x       | y            | z

And need this:

College | Tuition | Year
x       | y       | 2021
x       | z       | 2022

And so on. Does anyone know how to do this with the tidyverse or something else in R?
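In the tidyverse this reshape is tidyr::pivot_longer(). A sketch assuming the columns are literally named "Tuition 2021", "Tuition 2022", and so on; the college names and tuition values are hypothetical:

```r
library(tidyr)  # pivot_longer(), starts_with()

# Hypothetical wide data shaped like the sheet in the post
wide <- data.frame(College = c("College A", "College B"),
                   `Tuition 2021` = c(10000, 12000),
                   `Tuition 2022` = c(10500, 12600),
                   check.names = FALSE)   # keep the spaces in the column names

long <- pivot_longer(wide,
                     cols = starts_with("Tuition"),
                     names_to = "Year",
                     names_prefix = "Tuition ",               # strip text before the year
                     names_transform = list(Year = as.integer),
                     values_to = "Tuition")
long
```

The long table has one row per college and year, which is the layout a fixed effects model needs (for example with factor(College) and factor(Year) terms).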

2 Comments
2024/04/26
17:42 UTC

3

Cloud Computing with R Studio

I have a dataset that I would like to run a RAM-intensive algorithm on. Ideally I would like to have at least 400 GB of RAM and be able to program using an RStudio interface. Does anyone have experience with a cloud service that does this?

6 Comments
2024/04/26
14:05 UTC

2

weights argument in lm()

I want to estimate this normal likelihood using weighted least squares with lm() in R.

https://preview.redd.it/su8jb3jkyqwc1.png?width=464&format=png&auto=webp&s=f3693d3597506fc7ebf91ed5fee1dc04cf7daab1

What should I use in the weights= argument? Is it c(1/n_1, ..., 1/n_k) or c(n_1, ..., n_k) or something else?
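Since the likelihood itself is only in the image, this is hedged: lm() with weights w minimises sum(w * residual^2), so the weights should be proportional to the inverse variance of each observation. If, as is common, each y_i is the mean of n_i observations with a shared variance sigma^2, then Var(y_i) = sigma^2 / n_i and the weights are n_i, not 1/n_i. A sketch on simulated group means that checks lm() against the closed-form weighted least squares estimate:

```r
set.seed(1)
k <- 20
n <- sample(5:50, k, replace = TRUE)              # group sizes n_1, ..., n_k
x <- runif(k)
ybar <- 2 + 3 * x + rnorm(k, sd = 1 / sqrt(n))    # group means: Var(ybar_i) = 1 / n_i

fit <- lm(ybar ~ x, weights = n)                   # weights proportional to 1 / Var(ybar_i)

# Closed-form WLS: beta = (X' W X)^{-1} X' W y with W = diag(n)
X <- cbind(1, x)
W <- diag(n)
beta_wls <- solve(t(X) %*% W %*% X, t(X) %*% W %*% ybar)
cbind(lm = coef(fit), closed_form = as.vector(beta_wls))
```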

4 Comments
2024/04/26
03:54 UTC

2

Help with Moderated Mediation

Hi everyone! I am currently trying to learn data analysis with R and need to do a moderated mediation analysis. Here is an example of the model type I am trying to analyze.

Here is the code I have been trying to run, but I realized I am unfortunately too new to R to fully understand the outputs and thus what I am missing. Could someone please help me figure out how to get the strengths of the effects in my model, and their significance? Thank you so much in advance!

# 2. Load your packages (i.e. mini-programs)
# first time: install.packages(""), later library("")
library("lavaan")
library("foreign")      # read.spss()
library("multilevel")   # ICC1(), ICC2()
library("nlme")         # lme()
library("mediation")    # mediate()
library("sjPlot")
library("lme4")         # lmer()

# 3. Load your data
# Long format data
data <- read.spss("Long NEW.sav", use.value.labels = FALSE,
                  to.data.frame = TRUE, use.missings = TRUE)
hrdata <- as.data.frame(data)
names(hrdata)

# 4. Is multilevel necessary?
hrs.mod1 <- aov(ENG_Sum ~ as.factor(Res_ID), data = hrdata)
ICC1(hrs.mod1)
ICC2(hrs.mod1)

# 5. Multilevel like a pro
# LMX (mediator) as outcome
model1 <- lme(LMX_Sum ~ EMP_Sum,
              random = ~1 | Res_ID,
              method = "ML", na.action = na.omit, data = hrdata)
summary(model1)

# Engagement (dependent) as outcome
model2 <- lme(ENG_Sum ~ LMX_Sum + EMP_Sum,
              random = ~1 | Res_ID,
              method = "ML", na.action = na.omit, data = hrdata)
summary(model2)

# Make interaction term
hrdata$I_EMP_Sum_Gender_Dissimilarity <- hrdata$EMP_Sum * hrdata$Gender_Dissimilarity

# Moderation
model3 <- lme(LMX_Sum ~ EMP_Sum + Gender_Dissimilarity + I_EMP_Sum_Gender_Dissimilarity,
              random = ~1 | Res_ID,
              method = "ML", na.action = na.omit, data = hrdata)
summary(model3)

# 6. Test for mediation and moderated mediation (can take a while!)
# Preparation (other package is used for compatibility,
# please enter centered variables [c_] for the interaction [*])
model4 <- lmer(LMX_Sum ~ EMP_Sum + Gender_Dissimilarity + I_EMP_Sum_Gender_Dissimilarity +
                 (1 | Res_ID), data = hrdata,
               REML = FALSE, start = NULL,
               verbose = 0L, na.action = na.omit,
               contrasts = NULL, devFunOnly = FALSE)
model5 <- lmer(ENG_Sum ~ LMX_Sum + EMP_Sum + Gender_Dissimilarity +
                 (1 | Res_ID), data = hrdata,
               REML = FALSE, start = NULL,
               verbose = 0L, na.action = na.omit,
               contrasts = NULL, devFunOnly = FALSE)
mod.med <- mediate(model4, model5, treat = "EMP_Sum",
                   mediator = "LMX_Sum", sims = 1000,
                   group.out = "Res_ID", dropobs = FALSE,
                   boot = FALSE, data = hrdata)
summary(mod.med)
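summary() of a mediate() result reports the ACME (indirect effect), ADE (direct effect), Total Effect and Prop. Mediated, each with a confidence interval and p-value, which are the strengths and significance of the paths. For the moderation part, the mediation package can compute the effects at chosen moderator values through the covariates argument. As far as I understand the package, this needs the interaction written in the formula (EMP_Sum * Gender_Dissimilarity) rather than as a precomputed product column, because mediate() changes the treatment value when simulating and a precomputed product would not change with it. A sketch with ordinary lm() fits on simulated data (variable names borrowed from the post, values made up; untested with lmer fits):

```r
library(mediation)  # mediate()

set.seed(42)
n <- 400
sim <- data.frame(EMP_Sum = rnorm(n),
                  Gender_Dissimilarity = rbinom(n, 1, 0.5))   # hypothetical 0/1 moderator
sim$LMX_Sum <- 0.5 * sim$EMP_Sum +
  0.4 * sim$EMP_Sum * sim$Gender_Dissimilarity + rnorm(n)
sim$ENG_Sum <- 0.6 * sim$LMX_Sum + 0.2 * sim$EMP_Sum + rnorm(n)

# Interaction written in the formula, not as a precomputed product column
med_fit <- lm(LMX_Sum ~ EMP_Sum * Gender_Dissimilarity, data = sim)
out_fit <- lm(ENG_Sum ~ LMX_Sum + EMP_Sum + Gender_Dissimilarity, data = sim)

# ACME at each moderator value (treatment contrast is 0 vs 1 by default)
med_by_group <- lapply(c(0, 1), function(g) {
  mediate(med_fit, out_fit, treat = "EMP_Sum", mediator = "LMX_Sum",
          covariates = list(Gender_Dissimilarity = g), sims = 500)
})
summary(med_by_group[[1]])
summary(med_by_group[[2]])
```

The package also documents test.modmed() for testing whether the indirect effect differs between two covariate settings (see ?test.modmed).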

0 Comments
2024/04/25
13:00 UTC

1

Proximity analysis

Hi, I was wondering if someone can help me with my R code for my thesis?

I have 6 datasets with xy pixel coordinates of animals in 6 different zoos. I was wondering if someone knows how I can analyse and compare their proximities. It was done at intervals so you have specific xy coordinates every 3 minutes. The xy is in pixels and not to scale yet. And if possible I would like to compare average distance and how often they are in close proximity.

Can someone please help me? :)
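A minimal sketch, assuming the six data sets can be stacked into one data frame with hypothetical columns zoo, time (minutes), id (animal), x and y (pixels): for each zoo and time point it takes all pairwise distances with dist(), then averages, per zoo, the mean distance and the share of pairs closer than a chosen threshold. The threshold is in pixels too, so if the zoo images have different scales, the coordinates would first need converting with a measured pixels-per-metre factor for each zoo:

```r
# Hypothetical stacked data: 2 zoos, 3 animals each, positions every 3 minutes
pos <- data.frame(
  zoo  = rep(c("A", "B"), each = 12),
  time = rep(rep(c(0, 3, 6, 9), each = 3), times = 2),
  id   = rep(c("a1", "a2", "a3"), times = 8),
  x    = c(rep(c(0, 3, 0), 4), rep(c(0, 6, 12), 4)),
  y    = c(rep(c(0, 4, 8), 4), rep(c(0, 8, 16), 4))
)

proximity_summary <- function(d, threshold) {
  # One group per zoo x time point
  groups <- split(d, interaction(d$zoo, d$time, drop = TRUE))
  per_time <- do.call(rbind, lapply(groups, function(s) {
    if (nrow(s) < 2) return(NULL)               # need at least one pair
    dd <- as.vector(dist(s[, c("x", "y")]))     # all pairwise Euclidean distances
    data.frame(zoo = s$zoo[1], time = s$time[1],
               mean_dist = mean(dd), prop_close = mean(dd < threshold))
  }))
  # Average over time points within each zoo
  aggregate(cbind(mean_dist, prop_close) ~ zoo, data = per_time, FUN = mean)
}

res <- proximity_summary(pos, threshold = 6)   # threshold in pixels
res
```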

1 Comment
2024/04/24
15:34 UTC

0

Julia code: "size(X)[n]" in R

How do I get the nth dimension of a matrix/vector, etc., in R?

In Julia the code is size(X)[n].

The problem is that Julia's size is more generic:
v = [1,2,3]
A = [1 2; 3 4]
size(v) == (3,)
size(A) == (2,2)

In R, dim() returns NULL for a plain vector:
A = matrix(c(1,2,3,4), nrow = 2, ncol=2)
v = c(1,2,3)
dim(A)
dim(v) # NULL
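A small helper can fall back to length() when dim() is NULL; size_n is a hypothetical name, not a base R function. Base R's NROW() and NCOL() also treat a vector as a one-column matrix:

```r
# Hypothetical helper mimicking Julia's size(X)[n]:
# dim() is NULL for plain vectors, so fall back to length()
size_n <- function(x, n) {
  d <- dim(x)
  if (is.null(d)) d <- length(x)
  d[n]   # NA if x has fewer than n dimensions
}

A <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
v <- c(1, 2, 3)
size_n(A, 2)   # 2
size_n(v, 1)   # 3

NROW(v)        # 3
NCOL(v)        # 1
```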

5 Comments
2024/04/23
14:14 UTC

1

compute biodiversity index

Can someone help me with computing basic biodiversity indices? My data is in long form (see picture), but it appears that the vegan package works best with matrices. Should I really be wrangling it into a matrix, or is there another option?
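Converting to a site x species matrix is usually a one-liner, so it may be easier than avoiding it. A sketch assuming hypothetical long-format columns site, species and count: xtabs() cross-tabulates the counts (absent species become 0) and vegan's diversity() and specnumber() work on the result:

```r
library(vegan)  # diversity(), specnumber()

# Hypothetical long-format data: one row per site x species with a count
long <- data.frame(site    = c("S1", "S1", "S2", "S2", "S2", "S2"),
                   species = c("a", "b", "a", "b", "c", "d"),
                   count   = c(10, 10, 5, 5, 5, 5))

# Site x species matrix; species not recorded at a site get 0
comm <- unclass(xtabs(count ~ site + species, data = long))

H    <- diversity(comm, index = "shannon")   # Shannon H per site
simp <- diversity(comm, index = "simpson")   # 1 - sum(p^2) per site
rich <- specnumber(comm)                     # species richness per site
H; simp; rich
```

tidyr::pivot_wider(..., values_fill = 0) is the tidyverse way to build the same matrix.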

2 Comments
2024/04/23
05:12 UTC

0

welp

Hey, I hope everybody's doing alright. I'm new to R and I'm currently working on the basics. What should I focus on after that? I'm kind of lost, so if anybody could provide a list of some sort, like a pathway, it would mean a lot to me.
Thank you

13 Comments
2024/04/22
10:50 UTC

2

Matrix standard multiplication in R

I wrote this code in R:

X <- matrix(c(4, 5, 2, 4, 3, 3), nrow=3, byrow=TRUE)

b <- c(3, -2)

print(X*b)

and the output is:

     [,1] [,2]
[1,]   12  -10
[2,]   -4   12
[3,]    9   -6

But why isn't it like this:

     [,1] [,2]
[1,]   12  -10
[2,]    6   -8
[3,]    9   -6

If X*b is standard multiplication, it must be the second one.
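`*` is element-wise. R stores X column by column (4, 2, 3, 5, 4, 3), and the shorter vector b is recycled along that storage order as 3, -2, 3, -2, 3, -2, which produces the first matrix. To multiply every row by b, sweep() applies b across the columns; %*% is the actual matrix product and returns a 3 x 1 result:

```r
X <- matrix(c(4, 5, 2, 4, 3, 3), nrow = 3, byrow = TRUE)
b <- c(3, -2)

X * b                  # element-wise, b recycled down the columns: the output in the post
sweep(X, 2, b, `*`)    # column j multiplied by b[j]: the expected matrix
X %*% b                # matrix product: 4*3 + 5*(-2) = 2, then -2, then 3
```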
11 Comments
2024/04/22
10:48 UTC

0

Why is the histogram so inaccurate?

Hi, I don't have a clue how to fix this issue; it literally makes zero sense.

library(tidyquant)   # tq_get()
library(tidyverse)   # map(), reduce(), full_join(), write_csv(), mutate(), lead(), lag()

symbol <- sort(c("INTC", "AAPL", "MSFT", "AMZN", "GOOG",
                 "META", "TSLA", "NVDA", "PYPL",
                 "NFLX", "ADBE", "QCOM", "AMD", "MRNA", "FDX", "EBAY", "EA", "HOOD", "BABA",
                 "CAN", "PENN", "TLRY", "MARA", "SPCE", "LMND", "NNDM", "BNGO", "NIO", "COIN",
                 "BIDU", "DOCU",
                 "PLTR", "WKHS", "CRM", "QQQ", "UBER", "TWLO", "TDOC", "SPOT",
                 "SNOW", "SE", "RIVN",
                 "PDD", "NTLA", "NET", "CRSP", "ETSY", "CRWD", "CRSR",
                 "SBUX", "ZM", "WDAY", "WBD", "VRTX", "VRSK", "TTD",
                 "TXN", "TMUS", "SBUX",
                 "SIRI", "ROST", "MNST", "MU", "MELI", "LULU", "JD",
                 "ADP", "ABNB",
                 "ALGN", "AEP", "AMGN", "ADI", "ANSS", "AMAT", "ASML",
                 "AZN", "TEAM", "ADSK",
                 "BKR", "BIIB", "BKNG", "AVGO", "CDNS", "CHTR", "CTAS",
                 "CSCO", "CTSH",
                 "CMCSA", "CEG", "CPRT", "CSGP", "COST", "CSX", "DDOG",
                 "DXCM", "FANG",
                 "DLTR", "ENPH", "EXC", "FAST", "FTNT", "GEHC", "GILD",
                 "HON", "IDXX",
                 "ILMN", "INTU", "ISRG", "KDP", "KLAC", "KHC", "LRCX",
                 "LCID", "MRVL",
                 "MDLZ", "ORLY", "ODFL", "PAYX", "PEP", "REGN", "CVNA"))

equities <- symbol

#equities <- tq_index("SP500") %>%
#  arrange(symbol) %>%
#  pull(symbol)

# Get the data for each equity
data_list <- map(equities, ~ tq_get(., get = "stock.prices", from = "2018-01-01"))

# Combine the data for each equity into a single data frame
data_df <- reduce(data_list, full_join)
write_csv(data_df, "df_file.csv")

# change a format, compute a new variable
data_df$symbol <- factor(data_df$symbol)
data_df$cap <- data_df$open * 1000

# Add the open, close, minimum, maximum, volume, and capitalization values for the day after
data_df <- data_df %>%
  group_by(symbol) %>%
  mutate(open_next_day = lead(open),
         close_next_day = lead(close),
         close_before_day = lag(close),
         low_next_day = lead(low),
         high_next_day = lead(high),
         volume_next_day = lead(volume),
         cap_next_day = lead(cap))

################################## Here ends the data import

data_df$gain_next_day <- (data_df$high_next_day - data_df$close) / data_df$close * 100
data_df$gain_day <- (data_df$high - data_df$close_before_day) / data_df$close_before_day * 100
summary(data_df$gain_day)
hist(gain_day)
hist(clean_data$gain_day, freq = FALSE, breaks = c(-80, -50, -25, -10, 0, 10, 25, 50, 100))

The summary of gain_day shows a large range of data, but when I plot the histogram the visualization is inexplicably wrong: it says that the range is too wide, which is totally not true, as you can see from the summary.

Can someone please help me?
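With hand-picked breaks, hist() stops with "some 'x' not counted; maybe 'breaks' do not span range of 'x'" whenever any finite value lies below the first break or above the last one, so the Min and Max in summary(data_df$gain_day) are worth comparing with -80 and 100. (As posted, hist(gain_day) also refers to an object gain_day that is never created, and clean_data is not defined.) A sketch on hypothetical values that reproduces the error and then widens the outer breaks to cover the data:

```r
# Hypothetical gains (%), including values outside the hand-picked breaks
gain_day <- c(-85, -12, -3, 0.5, 4, 8, 30, 140, NA)
breaks <- c(-80, -50, -25, -10, 0, 10, 25, 50, 100)

# Reproduces the error: -85 and 140 fall outside the breaks
msg <- tryCatch(hist(gain_day, breaks = breaks, plot = FALSE),
                error = function(e) conditionMessage(e))
msg

# Widen the first and last break so every finite value falls inside a bin
rng <- range(gain_day, finite = TRUE)
breaks2 <- breaks
breaks2[1] <- min(breaks2[1], rng[1])
breaks2[length(breaks2)] <- max(breaks2[length(breaks2)], rng[2])
h <- hist(gain_day, breaks = breaks2, freq = FALSE, main = "gain_day")
```

Capping extreme values before plotting (for example with pmin()/pmax()) is an alternative if the outer bins become too wide to read.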

2 Comments
2024/04/22
08:20 UTC

1

renderDataTable not working on first load but working on second

I have two instances of renderDataTable in my code. One works fine and loads on the first load, but the other does not work on the first load, only on the second.

Here is the working code, which loads on the first load:

Theme based keywords {data-navmenu="Competition"}
=======================================================================

Column {.sidebar}
-----------------------------------------------------------------------

```{r}

Data43 <- j1

themelevel11 <- factor(Data43$theme, levels =sort(unique(Data43$theme)))

selectInput('themelevel11', 'Theme:', c("Select",levels(factor(themelevel11))), selected = "Select")

```

Column
-----------------------------------------------------------------------

### Theme based keywords

```{r}


dataset43 <- reactive({
  subset(Data43,
         (input$themelevel11 == "Select" | themelevel11 %in% input$themelevel11) 
  )
})


DT::renderDataTable(
  dataset43(), 
  filter = 'top',
  extensions = 'FixedColumns',
  options = list(
    pageLength = 10, 
    autoWidth = TRUE,
    scrollY = '400px',  
    scrollX = TRUE,
    fixedColumns = list(leftColumns = 3),    
    order = list(list(2, 'desc')), 
    columnDefs = list(
      list(className = 'dt-center', targets = '_all'),
      list(className = 'dt-left', targets = 0),
      list(visible = FALSE, targets = c(3))  
    ) 
  )  
)

The code below doesn't work on the first scroll: it just shows a count of 420 rows without any data. However, the data does load on the second scroll.

Long titles {data-navmenu="Technical SEO"}
=======================================================================

Column {.sidebar}
------------------------------------------------------------------
```{r}

Data36 <- c1

themelevel10 <- factor(Data36$theme, levels = sort(unique(Data36$theme)))

selectInput('themelevel10', 'Theme:', c("Select",levels(factor(themelevel10))), selected = "Select")



```

Column 1 
-----------------------------------------------------------------------

```{r}

dataset36 <- reactive({
  subset(Data36,
         (input$themelevel10 == "Select" | themelevel10 %in% input$themelevel10) 
  )
})


DT::renderDataTable(dataset36(),
            filter = 'top',
            options = list(
              pageLength = 10, 
              autoWidth = TRUE,
              scrollY = '450px',  
              scrollX = TRUE,
              order = list(list(5, 'desc')),
              columnDefs = list(
                list(className = 'dt-center', targets = '_all'),
                list(className = 'dt-left', targets = 0),
                list(visible=FALSE, targets=c(6))
              )
            )
  )

What could be the reason? I want the second table to load on the first scroll when the "Select" filter is chosen.

0 Comments
2024/04/22
06:51 UTC

4

Learning R and trying to understand multilevel with lme

Hi! I’m trying to pick up R right now and following a tutorial but I don’t seem to quite understand it. This is specifically about making/analyzing a multilevel model.

I am using this script and it works, but I just don’t fully understand it. Engagement is the outcome variable, but the final model I was given is a moderated/moderated mediation model. Is empathetic the independent variable, and emotion the mediation variable? Where does the moderation variable go then?

model1 <-lme(s_engage ~ s_empathetic + s_emotion , random=~1|Res_ID ,method="ML",na.action=na.omit, data=hrdata)

Thanks so much in advance and sorry if I’m unclear with what I mean, I’m still trying to pick this up.
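In the model as written, s_empathetic and s_emotion are simply two predictors of s_engage; lme() has no notion of which one is the mediator. In a mediation setup the mediator usually gets its own model (mediator regressed on the predictor, the "a" path), and the outcome model contains both (the "b" and "c'" paths). A moderator enters as an interaction term in whichever model the moderation is hypothesised for. A minimal sketch on simulated data with a hypothetical moderator s_moderator acting on the a path (first-stage moderation):

```r
library(nlme)  # lme(), fixef()

set.seed(7)
n_res <- 30; per_res <- 8                     # hypothetical: 30 respondents x 8 rows
hrdata <- data.frame(Res_ID = factor(rep(seq_len(n_res), each = per_res)))
N <- nrow(hrdata)
hrdata$s_empathetic <- rnorm(N)
hrdata$s_moderator  <- rnorm(N)               # hypothetical moderator
hrdata$s_emotion <- 0.5 * hrdata$s_empathetic +
  0.3 * hrdata$s_empathetic * hrdata$s_moderator + rnorm(N)
u <- rnorm(n_res, sd = 0.5)[as.integer(hrdata$Res_ID)]   # respondent random intercepts
hrdata$s_engage <- 0.4 * hrdata$s_emotion + 0.2 * hrdata$s_empathetic + u + rnorm(N)

# a path, moderated: does the effect of empathetic on emotion depend on the moderator?
m_a <- lme(s_emotion ~ s_empathetic * s_moderator, random = ~1 | Res_ID,
           method = "ML", na.action = na.omit, data = hrdata)
# b and c' paths: mediator (emotion) and predictor (empathetic) on engagement
m_b <- lme(s_engage ~ s_empathetic + s_emotion + s_moderator, random = ~1 | Res_ID,
           method = "ML", na.action = na.omit, data = hrdata)

summary(m_a)$tTable   # the s_empathetic:s_moderator row is the moderation term
fixef(m_b)
```

If the moderation is expected on the b or c' path instead, the interaction goes into the outcome model (for example s_emotion * s_moderator in m_b).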

2 Comments
2024/04/21
23:53 UTC

1

Help formatting some data in R

So I have train_set_tsibble, where:

> str(train_set_tsibble)
tbl_ts [31 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
 $ date  : Date[1:31], format: "2015-01-01" "2015-01-02" "2015-01-03" ...
 $ Delays: num [1:31] 25 52 57 76 41 46 37 42 32 16 ...
 - attr(*, "key")= tibble [26 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ Delays: num [1:26] 12 13 16 18 21 23 24 25 26 28 ...
  ..$ .rows : list<int> [1:26]
  .. ..$ : int 24
  .. ..$ : int 19
  .. ..$ : int [1:3] 10 14 25
  .. ..$ : int 23
  .. ..$ : int 20
  .. ..$ : int 27
  .. ..$ : int 15
  .. ..$ : int 1
  .. ..$ : int 13
  .. ..$ : int 22
  .. ..$ : int 17
  .. ..$ : int 28
  .. ..$ : int 9
  .. ..$ : int 18
  .. ..$ : int 7
  .. ..$ : int 12
  .. ..$ : int [1:3] 5 21 26
  .. ..$ : int 8
  .. ..$ : int 16
  .. ..$ : int 29
  .. ..$ : int 6
  .. ..$ : int 2
  .. ..$ : int [1:2] 3 31
  .. ..$ : int 11
  .. ..$ : int 4
  .. ..$ : int 30
  .. ..@ ptype: int(0)
  ..- attr(*, ".drop")= logi TRUE
 - attr(*, "index")= chr "date"
  ..- attr(*, "ordered")= logi TRUE
 - attr(*, "index2")= chr "date"
 - attr(*, "interval")= interval [1:1] 1D
  ..@ .regular: logi TRUE
 - attr(*, ".regular")= logi TRUE

This causes errors when I try to use it in model(), I assume because the key tibble is formatted incorrectly. I think it would work if it were formatted like this:

> str(train_set_tsibble)
tbl_ts [31 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
 $ date  : Date[1:31], format: "2015-01-01" "2015-01-02" "2015-01-03" ...
 $ Delays: num [1:31] 25 52 57 76 41 46 37 42 32 16 ...
 - attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  ..$ .rows : list<int> [1:31]
  .. ..$ : int 1 2 3 4 5 6 7 8 9 20 ...
  .. ..@ ptype: int(0)
  ..- attr(*, ".drop")= logi TRUE
 - attr(*, "index")= chr "date"
  ..- attr(*, "ordered")= logi TRUE
 - attr(*, "index2")= chr "date"
 - attr(*, "interval")= interval [1:1] 1D
  ..@ .regular: logi TRUE
 - attr(*, ".regular")= logi TRUE
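The key attribute lists Delays, which suggests (an assumption, since the code that built the tsibble isn't shown) that it was created with key = Delays. A key identifies separate series, so each distinct delay value became its own series; a single series only needs index = date. A sketch using the first ten dates and delay values from the str() output:

```r
library(tsibble)  # as_tsibble(), n_keys(), key_vars()

df <- data.frame(date   = seq(as.Date("2015-01-01"), by = "day", length.out = 10),
                 Delays = c(25, 52, 57, 76, 41, 46, 37, 42, 32, 16))

bad  <- as_tsibble(df, key = Delays, index = date)   # every distinct delay is a series
good <- as_tsibble(df, index = date)                 # one series indexed by date

n_keys(bad)    # 10
n_keys(good)   # 1

# Rebuilding an existing tsibble without the key
fixed <- as_tsibble(tibble::as_tibble(bad), index = date)
```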
0 Comments
2024/04/21
15:56 UTC

3

Scraping a website

I'm trying to scrape the table from the following webpage: https://www.nasdaq.com/market-activity/stocks/aaa/dividend-history

I'm doing so with RSelenium because I can't seem to download the HTML using rvest. However, I'm finding that all the actual values of the table come up empty. Here's the code I'm using:

library(RSelenium)
library(rvest)   # read_html(), html_nodes(), html_table()
rD <- rsDriver(browser = 'firefox', port = 4833L, chromever = NULL)
remDr <- rD[["client"]]
remDr$navigate(paste0("https://www.nasdaq.com/market-activity/stocks/aaa/dividend-history"))
Sys.sleep(11)
html <- read_html(remDr$getPageSource()[[1]])
df <- html_table(html_nodes(html, "table"))

If I try another url on the same website it works:

library(RSelenium)
library(rvest)   # read_html(), html_nodes(), html_table()
rD <- rsDriver(browser = 'firefox', port = 4833L, chromever = NULL)
remDr <- rD[["client"]]
remDr$navigate(paste0("https://www.nasdaq.com/market-activity/stocks/a/dividend-history"))
Sys.sleep(11)
html <- read_html(remDr$getPageSource()[[1]])
df <- html_table(html_nodes(html, "table"))

I'm not sure why it works for one url but not the other. Hoping someone can explain what's going on and how I get the info in the table.

6 Comments
2024/04/21
09:27 UTC

1

First R Assignment and summary(data frame) not working

This is the code I am running for an assignment, but when it runs summary(E1_1) I only get back Length, Class and Mode. summary() has worked for me in the past with an imported Excel dataset, but I'm not sure how to get it to work properly now.

# Creating vectors for the cities' recorded high and low temperatures in Celsius

# vCity = City names

# vHigh = the highest temperature recorded

# vLow= the lowest temperature recorded

vCity <- c("Barcelona", "Berlin", "Lisbon", "London", "Paris", "Rome")

vHigh <- c(14, 2, 14, 5, 2, 14)

vLow <- c(6, -1, 3, 0, -3, 3)

#Creating a data frame titled E1_1 combining the three vectors

E1_1 <- as.data.frame(cbind(vCity, vHigh, vLow))

Sys.time()

Sys.getenv("username")

E1_1

#running summary to find mean,median, min, max, 1st and 3rd quartiles of dataframe

summary(E1_1)

Sys.time()

Sys.getenv("username")
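cbind() on a mix of character and numeric vectors returns a character matrix, so every column of E1_1 ends up as text, and summary() of a character column only reports Length, Class and Mode. Building the data frame directly with data.frame() keeps each vector's type:

```r
vCity <- c("Barcelona", "Berlin", "Lisbon", "London", "Paris", "Rome")
vHigh <- c(14, 2, 14, 5, 2, 14)
vLow  <- c(6, -1, 3, 0, -3, 3)

class(cbind(vCity, vHigh, vLow)[, "vHigh"])   # "character": cbind() turned everything into text

E1_1 <- data.frame(vCity, vHigh, vLow)         # each column keeps its own type
summary(E1_1)                                  # Min, quartiles, median, mean, max for vHigh and vLow
```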

3 Comments
2024/04/20
20:50 UTC

2

Get data from ATP website

Hi all!

I'm trying to read https://www.atptour.com/en/players/-/S0AG/rankings-history?year=2020 this page so as to extract the Rank column in Singles table.

I've used

page=readLines("https://www.atptour.com/en/players/-/S0AG/rankings-history?year=2020")

and I get the HTML code. I expected to see the number 37 at line 692, but I get

<div>{{playerItem.SglRollRank}}</div>

so it seems to return template variables instead of values.

Do you know a way I can get those '37' values?

Thank you <3

5 Comments
2024/04/20
18:21 UTC

1

Which plots look best?

Hi,

I'm looking at an MLR for continuous income. Here are my plots for the MLR without logging the outcome, with logging, and with log(income + 1). The unlogged model has a lower R^2, while the log and log + 1 models have the same R^2. However, I think the residuals vs fitted plot doesn't look too good for the logged versions. Any advice/help would be appreciated. Thanks!

https://preview.redd.it/aw581ms2anvc1.png?width=1420&format=png&auto=webp&s=a134a194517fea010ebaaaa83a882dfb1ade417c

https://preview.redd.it/c1hv4ns2anvc1.png?width=1420&format=png&auto=webp&s=224bcd96acf5bd24321a1c0d353b470a0060d281

https://preview.redd.it/q2xfvh7aanvc1.png?width=1420&format=png&auto=webp&s=08411ae2736e5e585d6e0978c8269e0f2c7056fc
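One caveat when comparing the three fits: R^2 is computed on the scale of each model's response, so the R^2 for income and for log(income) measure different things and are not directly comparable. Also, when incomes are large, log(x + 1) is almost identical to log(x), which would explain why both logged models give the same R^2. A sketch on simulated, hypothetical data (income, educ) that fits the three versions and draws the residuals vs fitted plot for each:

```r
set.seed(3)
n <- 500
educ <- rnorm(n)                                      # hypothetical predictor
income <- exp(10 + 0.3 * educ + rnorm(n, sd = 0.5))   # right-skewed, strictly positive

fits <- list(raw   = lm(income ~ educ),
             log   = lm(log(income) ~ educ),
             log1p = lm(log1p(income) ~ educ))        # log1p(x) = log(x + 1)

r2 <- sapply(fits, function(f) summary(f)$r.squared)
round(r2, 3)

op <- par(mfrow = c(1, 3))
for (nm in names(fits)) plot(fits[[nm]], which = 1, main = nm)   # residuals vs fitted
par(op)
```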

2 Comments
2024/04/20
14:30 UTC

2

Adjusting shape and x-axis of multi group graph using R

I am trying to make all of the lines have different shapes, but right now I get the message "The shape palette can deal with a maximum of 6 discrete values because more than 6 becomes difficult to discriminate ℹ you have requested 7 values. Consider specifying shapes manually if you need that many have them." I tried adding scale_shape_manual(breaks = 6:12, values = 6:12, name = "group"), but it only made the points disappear. Also, the objects in the legend are currently in a weird order. How can I make them appear in the proper order, from Obj_6 to Obj_12?

I also want the x-axis to show each power, but it only shows values like 0.5 and 1.0 instead of the values I need, which are c(0,0.25,0.33,0.35,0.375,0.4,0.425,0.45,0.475,0.5,0.525,0.55,0.575,0.6,0.7,0.8,0.9,1,1.1,1.2,1.3), and I have no idea how to do that. Is it possible to show them vertically, since putting them next to each other horizontally would be crowded?

As an absolute noob, any help is appreciated!!! Thanks in advance

Code:

graph_data_long %>%
  ggplot(aes(x=as.numeric(power),y=value,group=obj_name,color=obj_name)) +
  geom_line() +
  geom_point(aes(shape=obj_name)) + 
  scale_y_continuous(trans = "log10") +
  theme_linedraw() +
  scale_shape_manual(breaks = 6:12, values = 6:12, name = "group")

Data frame graph_data_long:

structure(list(obj_name = c("Obj_6", "Obj_6", "Obj_6", "Obj_6", 
"Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6", 
"Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6", 
"Obj_6", "Obj_6", "Obj_6", "Obj_7", "Obj_7", "Obj_7", "Obj_7", 
"Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7", 
"Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7", 
"Obj_7", "Obj_7", "Obj_7", "Obj_8", "Obj_8", "Obj_8", "Obj_8", 
"Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8", 
"Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8", 
"Obj_8", "Obj_8", "Obj_8", "Obj_9", "Obj_9", "Obj_9", "Obj_9", 
"Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9", 
"Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9", 
"Obj_9", "Obj_9", "Obj_9", "Obj_10", "Obj_10", "Obj_10", "Obj_10", 
"Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10", 
"Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10", 
"Obj_10", "Obj_10", "Obj_10", "Obj_11", "Obj_11", "Obj_11", "Obj_11", 
"Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11", 
"Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11", 
"Obj_11", "Obj_11", "Obj_11", "Obj_12", "Obj_12", "Obj_12", "Obj_12", 
"Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12", 
"Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12", 
"Obj_12", "Obj_12", "Obj_12"), power = c("0", "0.25", "0.33", 
"0.35", "0.375", "0.4", "0.425", "0.45", "0.475", "0.5", "0.525", 
"0.55", "0.575", "0.6", "0.7", "0.8", "0.9", "1", "1.1", "1.2", 
"1.3", "0", "0.25", "0.33", "0.35", "0.375", "0.4", "0.425", 
"0.45", "0.475", "0.5", "0.525", "0.55", "0.575", "0.6", "0.7", 
"0.8", "0.9", "1", "1.1", "1.2", "1.3", "0", "0.25", "0.33", 
"0.35", "0.375", "0.4", "0.425", "0.45", "0.475", "0.5", "0.525", 
"0.55", "0.575", "0.6", "0.7", "0.8", "0.9", "1", "1.1", "1.2", 
"1.3", "0", "0.25", "0.33", "0.35", "0.375", "0.4", "0.425", 
"0.45", "0.475", "0.5", "0.525", "0.55", "0.575", "0.6", "0.7", 
"0.8", "0.9", "1", "1.1", "1.2", "1.3", "0", "0.25", "0.33", 
"0.35", "0.375", "0.4", "0.425", "0.45", "0.475", "0.5", "0.525", 
"0.55", "0.575", "0.6", "0.7", "0.8", "0.9", "1", "1.1", "1.2", 
"1.3", "0", "0.25", "0.33", "0.35", "0.375", "0.4", "0.425", 
"0.45", "0.475", "0.5", "0.525", "0.55", "0.575", "0.6", "0.7", 
"0.8", "0.9", "1", "1.1", "1.2", "1.3", "0", "0.25", "0.33", 
"0.35", "0.375", "0.4", "0.425", "0.45", "0.475", "0.5", "0.525", 
"0.55", "0.575", "0.6", "0.7", "0.8", "0.9", "1", "1.1", "1.2", 
"1.3"), value = c(3.72410744566162, 5.51168003923104, 7.4060839154296, 
8.20393019709473, 9.24723797685855, 10.7562379662041, 12.8868025478135, 
15.5100727558972, 18.5534222223628, 23.1409478278656, 22.6504985586419, 
20.9065115341602, 19.4181576551116, 17.8554687886568, 10.8993991443694, 
7.81271135699668, 6.13966797266852, 5.6977077662796, 5.37408529874299, 
5.02962030371785, 4.86258697386329, 3.19737465216476, 5.59774993516746, 
7.50121965706743, 8.25167100160226, 9.03798643979528, 9.46022583168103, 
9.93268505520465, 10.4640129694759, 11.0615850694332, 13.19482999695, 
14.746170594877, 15.990014190891, 17.4551966969221, 17.9361361197718, 
7.99349308078362, 6.49960908806224, 5.84308526845936, 5.52762501102557, 
5.38280815735708, 4.91888518373503, 4.71444863014914, 3.07771847969777, 
5.47433228431448, 6.44213757389488, 6.66742430744288, 6.97535025689521, 
7.31701680907091, 7.70094772378251, 8.91690959596007, 10.6984857467849, 
11.925022990115, 11.142586680758, 10.1820818794844, 9.37740693193392, 
8.68294638919202, 7.48047503801594, 6.17035953638092, 5.8675475823089, 
5.42126227945987, 5.39176138127794, 4.86428358796829, 4.5258515829203, 
3.04646629915112, 4.54796886657372, 5.38135683525506, 5.64564288700107, 
6.08552141033587, 6.82143106487618, 7.77461891298966, 8.26190488706468, 
8.76033401364452, 9.32880000729427, 9.25312478347536, 9.12503690998079, 
9.00089994659873, 8.76811470531463, 6.62734982132179, 5.81577592377926, 
5.85559194097889, 5.36002906494716, 5.08166826736183, 4.73000595543503, 
4.44023340784953, 3.09333501144575, 4.35881755510305, 5.18741750910549, 
5.58348875895986, 6.11140873800865, 6.54912995115038, 7.06295188841488, 
7.67372446427145, 8.40813885910582, 9.30806145486793, 9.1532682896371, 
8.66516577218275, 8.2283334585154, 7.78803322738736, 6.63338647326677, 
5.84027839953848, 5.34384133619004, 5.42904421949546, 4.66831095851281, 
4.74506379824897, 4.49021089461522, 3.07796931479018, 4.1217081751311, 
5.14012654850439, 5.36151701687943, 5.66980178677372, 6.01978990929582, 
6.42277739775086, 6.89108438347427, 7.43891180869636, 8.0884637798103, 
8.12528629344052, 8.0037684219851, 7.88626899591346, 7.72705134947085, 
6.49204630463644, 5.53832636547457, 5.33078894588476, 4.98055757657446, 
4.63421867612929, 4.31216950703026, 4.0601002676063, 2.77184102938775, 
3.78612559323399, 4.42463386554498, 4.62383257394448, 4.90246206056175, 
5.22055525535581, 5.58906805889066, 6.02037960424028, 6.52930328523149, 
7.13892440760706, 7.30138340021254, 7.34092525148487, 7.38060453871976, 
7.37971301100791, 6.26914351168065, 5.2321378547673, 4.4381055134539, 
4.52424178992708, 4.68529499485843, 4.2375259073016, 3.80679536473264
)), row.names = c(NA, -147L), class = c("tbl_df", "tbl", "data.frame"
))
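Three separate issues, as far as I can tell. (1) breaks = 6:12 in scale_shape_manual() are numbers, but the data values are strings such as "Obj_6", so no value matches a break and the points get no shape; supplying only values (one shape per group) avoids that. (2) obj_name is a character column, so the legend is sorted alphabetically (Obj_10 comes before Obj_6); a factor with explicit levels fixes the order. (3) scale_x_continuous(breaks = ...) sets the tick positions, and rotating axis.text.x by 90 degrees prints them vertically. A sketch on a hypothetical stand-in with the same shape as graph_data_long (the value column is made up):

```r
library(ggplot2)

powers <- c(0, 0.25, 0.33, 0.35, 0.375, 0.4, 0.425, 0.45, 0.475, 0.5, 0.525,
            0.55, 0.575, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3)

# Hypothetical stand-in for graph_data_long: 7 objects x 21 powers
graph_data_long <- expand.grid(power = as.character(powers),
                               obj_name = paste0("Obj_", 6:12),
                               stringsAsFactors = FALSE)
graph_data_long$value <- 3 + seq_len(nrow(graph_data_long)) %% 21

# Explicit factor levels give the legend order Obj_6 ... Obj_12
graph_data_long$obj_name <- factor(graph_data_long$obj_name,
                                   levels = paste0("Obj_", 6:12))

p <- ggplot(graph_data_long,
            aes(x = as.numeric(power), y = value, group = obj_name, color = obj_name)) +
  geom_line() +
  geom_point(aes(shape = obj_name)) +
  scale_y_log10() +
  scale_x_continuous(breaks = powers) +                     # one tick per power
  scale_shape_manual(values = c(16, 17, 15, 3, 7, 8, 4)) +  # seven distinct shapes
  labs(x = "power", color = "group", shape = "group") +     # same title keeps one legend
  theme_linedraw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```

scale_y_log10() is used here as the shorthand for scale_y_continuous(trans = "log10").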
2 Comments
2024/04/20
08:41 UTC

0

Issues installing Rtools package

I have Rstudio 2023.12.1 build 402.

I initially tried installing the Rtools package and got the following error:

install.packages("Rtools") WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

I went to the site below, downloaded the Rtools43 installer, and am still getting the same error when I try running install.packages("Rtools") or install.packages("Rtools43").

https://cran.r-project.org/bin/windows/Rtools/rtools43/rtools.html

Does anyone know what I'm doing wrong?
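Rtools is a standalone Windows toolchain that is installed by running the downloaded installer, not a CRAN package, so install.packages("Rtools") cannot install it; the warning appears to come from RStudio checking for the toolchain. Its version also needs to match the installed R version (Rtools43 is for R 4.3.x). Something like the following can confirm whether R finds the tools after running the installer (pkgbuild is an optional extra package):

```r
# On Windows, Sys.which() returns the full path of make.exe if the Rtools
# toolchain is visible to R, and "" otherwise
make_path <- Sys.which("make")
rtools_found <- nzchar(make_path)
rtools_found

# Optional, more detailed check (requires the pkgbuild package):
# pkgbuild::has_build_tools(debug = TRUE)
```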

17 Comments
2024/04/20
04:29 UTC

2

NMDS

Can anyone help me? I'm trying to display an NMDS. I want to display species based on the type of forest (Type), but instead it shows me 9 differently colored circles instead of species. Where's the problem?

library(readxl)

groupsnmds <- read_excel("~/Mes tests R studio/groupsnmds.xlsx")

library(scatterplot3d)

library(vegan)

library(readxl)

# Load the data matrix

groupsnmds <- read_excel("~/Mes tests R studio/groupsnmds.xlsx")

# Or specify the file path directly

# Matrix <- read_excel("C:/Users/kitab/Documents/Mes tests R studio/Matrix.xlsx")

# Separate the site names from the data

clean <- groupsnmds[,3:63]

env <- groupsnmds[,64:69]

m_com = as.matrix(clean)

ord <- metaMDS(m_com, k = 2)

ord

# Show the stress plot and the NMDS plots

stressplot(ord)

plot(ord)

en = envfit(ord, env, permutations = 999, na.rm = TRUE)

plot(en)

data.scores=as.data.frame(scores(ord, display = "sites"))

# Assign column names to the ordination scores

colnames(data.scores) <- c("MDS1", "MDS2")

# Show the first rows of the data frame

head(data.scores)

data.scores$season = groupsnmds$Type

#extract NMDS scores (x and y coordinates) for sites from newer versions of vegan package

data.scores = as.data.frame(scores(ord)$sites)

#add 'season' column as before

data.scores$Type = groupsnmds$Type

en_coord_cont = as.data.frame(scores(en, "vectors")) * ordiArrowMul(en)

en_coord_cat = as.data.frame(scores(en, "factors")) * ordiArrowMul(en)

species_scores <- as.data.frame(scores(ord, "species"))

# Add a column equivalent to the row name to create species labels

species_scores$species <- rownames(species_scores)

species_scores

library(viridis)
library(ggplot2)   # ggplot(), geom_point(), theme(), labs()

gg = ggplot(data = species_scores, aes(x = NMDS1, y = NMDS2)) +

geom_point(data = data.scores, aes(colour = Type), size = 3, alpha = 0.5) +

scale_colour_manual(values = c("#4169E1", "#3cb471", "#ffeb3b", "#900C3F", "#66a1d0", "goldenrod1", "darkorchid", "slateblue","black")) +

theme(axis.title = element_text(size = 10, face = "bold", colour = "grey30"),

panel.background = element_blank(), panel.border = element_rect(fill = NA, colour = "grey30"),

axis.ticks = element_blank(), axis.text = element_blank(), legend.key = element_blank(),

legend.title = element_text(size = 10, face = "bold", colour = "grey30"),

legend.text = element_text(size = 9, colour = "grey30")) +

labs(colour = "Type")

gg
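In the ggplot call above, the only layer is geom_point(data = data.scores, ...), so only the site scores are drawn (one point per site, colored by Type); species_scores is handed to ggplot() but no layer uses it. Adding a layer such as geom_text() for the species is one option. A sketch using vegan's built-in dune data, with dune.env$Management standing in for the Type column:

```r
library(vegan)    # metaMDS(), scores(), dune data
library(ggplot2)

data(dune, dune.env)
set.seed(1)
ord <- metaMDS(dune, k = 2, trace = FALSE)

site_scores <- as.data.frame(scores(ord, display = "sites"))
site_scores$Type <- dune.env$Management        # stands in for groupsnmds$Type

species_scores <- as.data.frame(scores(ord, display = "species"))
species_scores$species <- rownames(species_scores)

gg <- ggplot() +
  # sites, colored by group
  geom_point(data = site_scores, aes(x = NMDS1, y = NMDS2, colour = Type),
             size = 3, alpha = 0.5) +
  # species, drawn as text labels at their ordination scores
  geom_text(data = species_scores, aes(x = NMDS1, y = NMDS2, label = species),
            size = 3) +
  theme_bw()
gg
```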

0 Comments
2024/04/19
16:41 UTC

0

Hey peeps, I’m at university and I need help understanding data, transforming data, building models, using a model for prediction, and making audio slide presentations. I’m new to coding and need help, so please DM me. I will pay hourly.

0 Comments
2024/04/19
12:42 UTC

7

How to give CSV columns names recognised in R for plotting?

Hi, I am trying to plot a scatter graph of OTUs (DNA sequences) against abundance in 2 samples. My source is a CSV file, which I have imported, so it looks like this:

https://preview.redd.it/p2wyf8ctwavc1.png?width=372&format=png&auto=webp&s=d9ca5684669b97ef5844b8d3cf7ad28c83213e5a

My problem is that when I go to plot this, R does not recognise the column names (OTU, Count - control and Count - 50). I converted the CSV to a data frame using as.data.frame but am unsure how to rename these columns, or what I need to do to turn them into names R understands so I can use ggplot(aes(x= and so on. Thank you!
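By default read.csv() turns headers that are not valid R names into syntactic ones ("Count - control" becomes "Count...control"), so the original names no longer exist. Either keep them with check.names = FALSE and refer to them in backticks (aes(y = `Count - control`)), or rename them to something simpler. A sketch using a hypothetical CSV with the same headers (the counts are made up), reshaped so both samples share one plot:

```r
library(ggplot2)
library(tidyr)   # pivot_longer()

# Hypothetical CSV with the same headers as in the post
csv_text <- "OTU,Count - control,Count - 50
OTU1,120,80
OTU2,45,60
OTU3,10,15"

names(read.csv(text = csv_text))   # "OTU" "Count...control" "Count...50"

otus <- read.csv(text = csv_text, check.names = FALSE)
names(otus) <- c("OTU", "count_control", "count_50")   # simple names for aes()

# One row per OTU and sample
otus_long <- pivot_longer(otus, cols = c(count_control, count_50),
                          names_to = "sample", values_to = "count")

p <- ggplot(otus_long, aes(x = OTU, y = count, colour = sample)) +
  geom_point(size = 3)
p
```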

8 Comments
2024/04/18
20:55 UTC
