/r/Rlanguage
We are interested in implementing the R programming language for statistics and data science.
This subreddit seeks new methods for life and organization.
R resources:
Comprehensive R Archive Network
Swirl: Learning R with interactive lessons within the R console
I’m very new to using R. I have a data set that seeks to predict hotel rating scores from 1 to 5. 1 being the worst, 5 the best.
So far in my class, I’ve learned about using factors for the predictor variables but I’m unsure if we’re allowed to use factors in the response variable as well? Would that make sense? How would that work?
If not, what would I do in this situation?
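A factor can serve as the response. For ordered ratings like these, one common option is ordinal (proportional-odds) logistic regression. The following is only a minimal sketch on simulated data: it assumes the MASS package (shipped with standard R installations), and the column names `rating` and `price_100` are hypothetical stand-ins for the real hotel data.

```r
library(MASS)  # polr(): proportional-odds logistic regression

set.seed(42)
# Hypothetical stand-in for the hotel data
n <- 300
price_100 <- runif(n, 0.5, 4)            # price in hundreds (made up)
latent <- price_100 + rlogis(n)          # unobserved "quality" score
hotels <- data.frame(
  price_100 = price_100,
  rating = cut(latent, breaks = c(-Inf, 1, 2, 3, 4, Inf),
               labels = 1:5, ordered_result = TRUE)
)

# The response is an ordered factor with levels 1 < 2 < 3 < 4 < 5
fit <- polr(rating ~ price_100, data = hotels, Hess = TRUE)
summary(fit)

# Predicted probability of each rating for a new hotel
p <- predict(fit, newdata = data.frame(price_100 = 2), type = "probs")
p
```

If the ratings are treated as unordered categories instead, a multinomial model would be the analogous choice; which one fits the class assignment is a judgment call.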
I'm trying to create 4 columns with categorical data based on the table below: The 4 columns would be:
Is there an easy function to create, say, the "Gender" column so that the first (7,287 + 11,587 + 3,256 + 6,134) = 28,264 values/rows are Female, and the next (10,381 + 10,969 + 6,123 + 6,693) = 34,166 values/rows are Male?
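Base R's `rep()` with a `times =` vector may be all that is needed here. A small sketch using the cell counts quoted in the question (the table itself is not shown, so the grouping into four cells per gender is taken as given):

```r
# Cell counts as quoted in the question
female_counts <- c(7287, 11587, 3256, 6134)
male_counts   <- c(10381, 10969, 6123, 6693)

# Repeat each label as many times as there are rows for it
gender <- rep(c("Female", "Male"),
              times = c(sum(female_counts), sum(male_counts)))

df <- data.frame(Gender = factor(gender))
table(df$Gender)
```

The other three columns could presumably be built the same way, passing the individual cell counts to `times =` so that each label is repeated once per cell.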
I am doing an ANCOVA analyzing whether omega-6 intake from either plants (Plant) or seafood (Seafood) has an impact on countries' rates of Alzheimer's disease. I also included the countries' proportions of people older than 65 (Plus) as a variable to account for that (since countries with older populations will probably naturally have higher rates regardless of diet). I just need to make a box plot to assess the assumption of correlation. I remember my professor said something along the lines of "the predictor variables can't be correlated" and that the boxplots should overlap. So I think I need to make a boxplot with two boxes, one for Plant and one for Seafood, where Plus is the y axis, right? But when I try to do this, the box plots look... well, like the thing above. The call I used is boxplot(Rate~Seafood*Plant,data=alzheimersdietdata). What am I doing wrong?
Hi! I am trying to learn statistical analysis using R but for some reason when it comes to multilevel analyses I just don’t get it. I’m looking for somebody who can explain it to me and I’m willing to pay for this! I wrote a general script already that seems to get me halfway there, but the final interactions I just don’t understand.
This is... a lot. So I want to provide some context.
First, the source document: I have an Excel workbook that contains a tab for each experiment I'm working on, and each experiment has a range of unique plots listed (think Ag research).
I'm writing some code to pull all of the tabs from an xlsx workbook into R, rename each 'tab' (now a dataframe), add some empty columns, and export each dataframe into its own xlsx or csv file for use in my work.
So I've successfully pulled the data into R, with each tab from the Excel file being a dataframe in a list of dataframes 'df_list'. What I need next is to create a function which will pull the min and max values from the 'plots' column of each df_list$entry and put them into a string as a new name for that entry, so that
df_list$entry <- "min(plot)_entry_max(plot)"
right now, I have this:
> df_list
$Test1
   Code        Line Rep Plot
1 19005  T1_19005-1   1   13
2 19006  T1_19006-3   1   14
3 19007 T1_19007-12   1   15
4 19008  T1_19008-2   1   16

$Test2
   Code  Line Rep Plot
1 20001 T2-01   1   17
2 20016 T2-16   1   18
3 20003 T2-03   1   19
4 20008 T2-08   1   20
sheets <- c("Test1", "Test2") # a vector of the NAMES of each dataframe in df_list
filenames <- function(sheet) {
  filename_list <- c()
  min_plot <- min(df_list$sheet$Plot)
  max_plot <- max(df_list$sheet$Plot)
  tabname <- paste0(min_plot, "_", sheet, "_", max_plot)
  filename_list <- append(filename_list, tabname)
}
filename_list <- lapply(sheets, function(x) filenames(x))
print(filename_list)
This almost kind of works; it pulls the correct names from the sheets list and appends the output of the min and max correctly for each, but it can't seem to pull the min or max values. It instead gives me a filler value of "Inf" for the min, and "-Inf" for the max.
So my filename_list reads "Inf_Test1_-Inf", etc.
When I go through the steps I've put in the function individually with a specific entry from "sheets", I get the correct outputs. But it won't work as a function.
Hopefully this all makes sense, and someone can help me! I just wanted to be able to iterate through the entries in a list of dfs and pull the min and max out for each one. I am relatively new to R so I've been searching all over, but this listception makes it difficult to find a good result anywhere.
*Edit: Changed the column title "Row" to "Plots" to reduce confusion.
Added a pair of example dataframes from df_list.
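A likely cause: `df_list$sheet` looks for an element literally named "sheet" instead of using the value stored in the variable `sheet`, so it returns NULL, and `min(NULL)` / `max(NULL)` give Inf / -Inf. Indexing with `[[ ]]` uses the variable's value. A sketch based on the example data above (trimmed to two columns; `make_filename` is a name made up for this illustration):

```r
# Trimmed copy of the example list from the post
df_list <- list(
  Test1 = data.frame(Code = 19005:19008, Plot = 13:16),
  Test2 = data.frame(Code = c(20001, 20016, 20003, 20008), Plot = 17:20)
)

make_filename <- function(sheet) {
  plots <- df_list[[sheet]]$Plot   # [[ ]] evaluates `sheet`; $ does not
  paste0(min(plots), "_", sheet, "_", max(plots))
}

filename_list <- vapply(names(df_list), make_filename, character(1))
print(filename_list)

# Optionally rename the list entries themselves
names(df_list) <- filename_list
```

Iterating over `names(df_list)` also removes the need to maintain a separate `sheets` vector by hand.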
Hello,
I'm trying to put the generation of graphs into a loop. The problem is that I'm also looping the name of the variables I use in the graph, as in the "nested" loop (line 5). The variables that change according to the loop are in lines 9 and 11. I know that I cannot use strings for this, so I tried to modify it. For example, in line 5, I tried the following:
y = sym(paste0("coef_", j))
y = !!paste0("coef_", j)
y = !!sym(paste0("coef_", j))
And none worked. Any help here is appreciated :)
1. count <- 0
2. graphs <- list()
3.
4. for (i in 1:11) {
5. for (j in c("max", "low")) {
6. count <- count + 1
7. graph <- ggplot(
8. df[df$group == paste0(var_groups[i]),],
9. aes(x = var_label, y = paste0("coef_", j)) +
10. geom_point(color = graph_colors[i]) +
11. geom_errorbar(aes(ymin = paste0("ci_lower_",j), ymax = paste0("ci_upper_", j)), width = 0.2,
12. color = graph_colors[i], size = 1) +
13. coord_flip() +
14. geom_hline(yintercept = 0, color = "blue", linetype = "dashed", size = 1) +
15. scale_y_continuous(limits = c(-x_axis, x_axis)) +
16. theme_bw()
17. graphs[[count]] <- graph
18. }
19. }
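One approach that usually works for this is the `.data` pronoun, which lets `aes()` look up a column by a string. The sketch below uses a made-up data frame whose column names mirror the post (the values, and the helper name `make_plot`, are hypothetical), and it leaves out the grouping and colour logic for brevity.

```r
library(ggplot2)

# Hypothetical data following the column naming scheme from the post
df <- data.frame(
  var_label    = c("a", "b", "c"),
  coef_max     = c(0.2, 0.5, -0.1),
  ci_lower_max = c(0.0, 0.3, -0.4),
  ci_upper_max = c(0.4, 0.7, 0.2),
  coef_low     = c(0.1, 0.3, -0.2),
  ci_lower_low = c(-0.1, 0.1, -0.5),
  ci_upper_low = c(0.3, 0.5, 0.1)
)

make_plot <- function(j) {
  # Build the column names as strings, then look them up via .data[[ ]]
  y_col    <- paste0("coef_", j)
  ymin_col <- paste0("ci_lower_", j)
  ymax_col <- paste0("ci_upper_", j)
  ggplot(df, aes(x = var_label, y = .data[[y_col]])) +
    geom_point() +
    geom_errorbar(aes(ymin = .data[[ymin_col]], ymax = .data[[ymax_col]]),
                  width = 0.2) +
    coord_flip() +
    theme_bw()
}

graphs <- lapply(c(max = "max", low = "low"), make_plot)
```

Wrapping the plot in a function is deliberate: `aes()` expressions are evaluated when the plot is drawn, so inside a plain `for` loop every stored plot could end up using the last value of `j`. A function call gives each plot its own copy of the column names.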
I have an Excel sheet of data that I need to modify so I can run a fixed effects model in R. As I understand it, each observation needs to be a row with a variable denoting the year the observation was made. In my spreadsheet, each row is a college with its data for every year. I would like each row to be a college, its observation for a specific year, and the year the observation was made. In other words, I have this
College | Tuition 2021 | Tuition 2022 |
---|---|---|
x | y | z |
And need this
College | Tuition | Year |
---|---|---|
x | y | 2021 |
x | z | 2022 |
And so on. Does anyone know how to do this with the tidyverse or something else in R?
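With the tidyverse, `tidyr::pivot_longer()` is the usual tool for this wide-to-long reshape. A minimal sketch, assuming the columns are literally named "Tuition 2021" and "Tuition 2022" (the college names and numbers below are made up):

```r
library(tidyr)

# Hypothetical wide data in the layout described above
wide <- data.frame(
  College = c("A", "B"),
  `Tuition 2021` = c(10000, 12000),
  `Tuition 2022` = c(10500, 12600),
  check.names = FALSE   # keep the space in the column names
)

long <- pivot_longer(
  wide,
  cols = starts_with("Tuition"),
  names_to = "Year",
  names_prefix = "Tuition ",                  # strip the prefix, leaving the year
  names_transform = list(Year = as.integer),  # store the year as a number
  values_to = "Tuition"
)
long
```

If the spreadsheet holds several measures per year (not just tuition), the same function can split the column names into a measure and a year, but that depends on how the real headers are written.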
I have a dataset that I would like to run a RAM-intensive algorithm on. Ideally I would like to have at least 400 GB of RAM and be able to program using an RStudio interface. Does anyone have experience with a cloud service that offers this?
I want to estimate this normal likelihood using weighted least squares with lm() in R. What should I use in the weights= argument? Is it c(1/n_1, ..., 1/n_k) or c(n_1, ..., n_k) or something else?
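The likelihood itself is not reproduced in the post, so the following rests on an assumption: each response is the mean of `n_i` observations, so its variance is sigma^2 / n_i. `lm()` expects weights inversely proportional to the variance, which under that assumption means `weights = c(n_1, ..., n_k)`. A simulated check (all names and numbers are made up) that weighted least squares on the group means reproduces ordinary least squares on the individual observations:

```r
set.seed(1)
k <- 6
n <- c(5, 10, 20, 40, 80, 160)   # hypothetical group sizes
x <- 1:k                         # one covariate value per group

# Individual-level data: x is constant within each group
raw <- data.frame(
  g = rep(seq_len(k), times = n),
  x = rep(x, times = n)
)
raw$y <- 2 + 0.5 * raw$x + rnorm(nrow(raw))

# Collapse to one row per group: the group mean of y
means <- aggregate(y ~ g + x, data = raw, FUN = mean)
means$n <- n[means$g]

fit_raw <- lm(y ~ x, data = raw)                  # OLS on individual data
fit_wls <- lm(y ~ x, data = means, weights = n)   # WLS on group means

all.equal(coef(fit_raw), coef(fit_wls))
```

If the variances in the actual likelihood are proportional to `n_i` rather than `1 / n_i`, the weights would be the reciprocals instead.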
Hi everyone! I am currently trying to learn data analysis with R and need to do a moderated mediation analysis. Here is an example of the model type I am trying to analyze.
Here is the code I have been trying to run, but I realized I unfortunately am too new with R to be able to fully understand the outputs and thus, what I am missing. Could someone please help me figure out how to get the strengths of the effects in my model, and the significance? Thank you so much in advance!
library("lavaan")
library("foreign")
library("multilevel")
library("mediation")
library("sjPlot")
library("lme4")
data<-read.spss("Long NEW.sav", use.value.labels=FALSE,
to.data.frame=TRUE, use.missings=TRUE)
hrdata<- as.data.frame(data)
names(hrdata)
hrs.mod1<-aov(ENG_Sum~as.factor(Res_ID),data=hrdata)
ICC1(hrs.mod1)
ICC2(hrs.mod1)
model1 <-lme(LMX_Sum ~ EMP_Sum
, random=~1|Res_ID
,method="ML",na.action=na.omit, data=hrdata)
summary(model1)
model2 <-lme(ENG_Sum ~ LMX_Sum + EMP_Sum
, random=~1|Res_ID
,method="ML",na.action=na.omit, data=hrdata)
summary(model2)
hrdata$I_EMP_Sum_Gender_Dissimilarity <- hrdata$EMP_Sum * hrdata$Gender_Dissimilarity
model3 <-lme(LMX_Sum ~ EMP_Sum + Gender_Dissimilarity
, random=~1|Res_ID
,method="ML",na.action=na.omit, data=hrdata)
summary(model3)
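# NOTE: the random intercept and data argument below are assumed, mirroring the
# lme() models above; without them the lmer() calls do not parse
model4 <-lmer(LMX_Sum ~ EMP_Sum + Gender_Dissimilarity + I_EMP_Sum_Gender_Dissimilarity
+ (1 | Res_ID), data = hrdata,
REML = F, start = NULL,
verbose = 0L, na.action=na.omit,
contrasts = NULL, devFunOnly = F)
model5 <-lmer(ENG_Sum ~ LMX_Sum + EMP_Sum + Gender_Dissimilarity
+ (1 | Res_ID), data = hrdata,
REML = F, start = NULL,
verbose = 0L, na.action=na.omit,
contrasts = NULL, devFunOnly = F)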
mod.med<- mediate(model4, model5, treat = "EMP_Sum",
mediator = "LMX_Sum", sims=1000,
group.out = "Res_ID", dropobs=F,
boot=F, data=hrdata)
summary(mod.med)
Hi, I was wondering if someone can help me with my R code for my thesis?
I have 6 datasets with xy pixel coordinates of animals in 6 different zoos. I was wondering if someone knows how I can analyse and compare their proximities. It was done at intervals so you have specific xy coordinates every 3 minutes. The xy is in pixels and not to scale yet. And if possible I would like to compare average distance and how often they are in close proximity.
Can someone please help me? :)
How do I get the nth dimension of a matrix/vector, etc., in R?
In Julia the code is size(X)[n].
The problem is that Julia's size is more generic:
v = [1,2,3]
A = [1 2; 3 4]
size(v) == (3,)
size(A) == (2,2)
R leads to NULL values:
A = matrix(c(1,2,3,4), nrow = 2, ncol=2)
v = c(1,2,3)
dim(A)
dim(v) # NULL
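Plain vectors in R have no `dim` attribute, which is why `dim(v)` is NULL. One way around it is a small wrapper that falls back to `length()`. The function name `size` below is made up to mirror Julia; base R also offers `NROW()` and `NCOL()`, which treat a vector as a one-column matrix.

```r
# Hypothetical helper mimicking Julia's size(): falls back to length()
# for objects without a dim attribute
size <- function(x, n = NULL) {
  d <- if (is.null(dim(x))) length(x) else dim(x)
  if (is.null(n)) d else d[n]
}

A <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
v <- c(1, 2, 3)

size(A)      # both dimensions of the matrix
size(v)      # length of the vector
size(A, 2)   # number of columns
size(v, 1)   # same as length(v)

NROW(v)      # 3: a vector is treated as a one-column matrix
NCOL(v)      # 1
```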
Can someone help me with computing basic biodiversity indices? My data is in long form (see picture), but it appears that the vegan package works best with matrices. Should I really be wrangling it into a matrix, or is there another option?
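Reshaping to a site-by-species table is usually a one-liner, so it may be the path of least resistance. A base R sketch with `xtabs()`; the column names `site`, `species` and `count` are hypothetical, since the real ones are only visible in the picture:

```r
# Hypothetical long-form data; the real column names are not shown in the post
long <- data.frame(
  site    = c("s1", "s1", "s2", "s2", "s2"),
  species = c("a", "b", "a", "b", "c"),
  count   = c(10, 5, 3, 8, 2)
)

# Sites in rows, species in columns; combinations that never occur become 0
comm <- as.data.frame.matrix(xtabs(count ~ site + species, data = long))
comm
```

A table in this shape is the kind of input that functions such as `vegan::diversity()` expect, assuming vegan is installed.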
Hey, I hope everybody's doing alright. I'm new to R and I'm currently working on my basics. What should I focus on after that? I'm kinda lost, so if anybody could provide a list of some sort, like a pathway or something, it will mean a lot to me.
Thank you
I wrote this code in R:
X <- matrix(c(4, 5, 2, 4, 3, 3), nrow=3, byrow=TRUE)
b <- c(3, -2)
print(X*b)
and the output is:
     [,1] [,2]
[1,]   12  -10
[2,]   -4   12
[3,]    9   -6
But why is it not like this?
     [,1] [,2]
[1,]   12  -10
[2,]    6   -8
[3,]    9   -6
If X*b is standard multiplication, it must be the second one.
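The first result comes from recycling: `*` is element-wise, and R walks through the matrix column by column while repeating `b` as 3, -2, 3, -2, and so on. To scale each column by the matching element of `b`, `sweep()` is one option; a short sketch:

```r
X <- matrix(c(4, 5, 2, 4, 3, 3), nrow = 3, byrow = TRUE)
b <- c(3, -2)

# `*` recycles b down the columns of X (column-major order),
# which gives the first result in the post
X * b

# Multiply column j of X by b[j]: the second result in the post
sweep(X, 2, b, `*`)
t(t(X) * b)          # equivalent

# True matrix-vector product (a 3 x 1 matrix)
X %*% b
```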
Hi, I don't have a clue how to fix this issue it literally makes zero sense.
symbol<-sort(c("INTC", "AAPL", "MSFT","AMZN","GOOG",
"META","TSLA","NVDA","PYPL",
"NFLX","ADBE","QCOM","AMD","MRNA","FDX","EBAY","EA","HOOD","BABA",
"CAN","PENN","TLRY","MARA","SPCE","LMND","NNDM","BNGO","NIO","COIN",
"BIDU","DOCU",
"PLTR","WKHS","CRM","QQQ","UBER","TWLO","TDOC","SPOT",
"SNOW","SE","RIVN",
"PDD","NTLA","NET","CRSP","ETSY","CRWD","CRSR",
"SBUX", "ZM", "WDAY", "WBD", "VRTX", "VRSK", "TTD",
"TXN", "TMUS", "SBUX",
"SIRI", "ROST", "MNST", "MU", "MELI", "LULU", "JD",
"ADP", "ABNB",
"ALGN", "AEP", "AMGN", "ADI", "ANSS", "AMAT", "ASML",
"AZN", "TEAM", "ADSK",
"BKR", "BIIB", "BKNG", "AVGO", "CDNS", "CHTR", "CTAS",
"CSCO", "CTSH",
"CMCSA", "CEG", "CPRT", "CSGP", "COST", "CSX", "DDOG",
"DXCM", "FANG",
"DLTR", "ENPH", "EXC", "FAST", "FTNT", "GEHC", "GILD",
"HON", "IDXX",
"ILMN", "INTU", "ISRG", "KDP", "KLAC", "KHC", "LRCX",
"LCID", "MRVL",
"MDLZ", "ORLY", "ODFL", "PAYX", "PEP", "REGN", "CVNA"))
equities<-symbol
#equities <- tq_index("SP500") %>%
data_list <- map(equities, ~ tq_get(., get = "stock.prices", from = "2018-01-01"))
data_df <- reduce(data_list, full_join)
write_csv(data_df, "df_file.csv")
data_df$symbol=factor(data_df$symbol)
data_df$cap=data_df$open*1000
data_df <- data_df %>%
group_by(symbol) %>%
mutate(open_next_day = lead(open),
close_next_day = lead(close),
close_before_day = lag(close),
low_next_day = lead(low),
high_next_day = lead(high),
volume_next_day = lead(volume),
cap_next_day = lead(cap))
################################## Here ends the data import
data_df$gain_next_day<-(data_df$high_next_day-data_df$close)/data_df$close*100
data_df$gain_day<-(data_df$high-data_df$close_before_day)/data_df$close_before_day*100
summary(data_df$gain_day)
hist(gain_day)
hist(clean_data$gain_day,freq=FALSE,breaks = c(-80,-50,-25,-10,0,10,25,50,100))
The summary of gain_day shows a wide range of values, but when I plot the histogram the visualization is inexplicably wrong: it says that the range is too wide, but that's not true, as you can see from the summary.
Can someone please help me?
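Judging only from the code shown, two things may be going on. First, `hist(gain_day)` refers to a bare `gain_day` rather than `data_df$gain_day`, and `clean_data` is never defined in the snippet. Second, `hist()` stops with an error when the supplied `breaks` do not span the whole range of the data. A sketch with simulated values standing in for the real column:

```r
set.seed(1)
# Hypothetical stand-in for data_df$gain_day, with two extreme values and an NA
gain_day <- c(rnorm(500, mean = 0, sd = 5), 250, -95, NA)

x <- gain_day[is.finite(gain_day)]   # drop NA / Inf before plotting
inner <- c(-80, -50, -25, -10, 0, 10, 25, 50, 100)

# Extend the hand-picked breaks so they cover the full observed range
breaks <- sort(unique(c(min(x),
                        inner[inner > min(x) & inner < max(x)],
                        max(x))))

h <- hist(x, breaks = breaks, freq = FALSE, main = "gain_day")
```

A handful of extreme values will still squash the bulk of the distribution into a few bars, so it can be worth checking `summary()` and plotting a trimmed range as well.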
I have two instances of renderDataTable in my code. One works fine and loads on the first load, but the other does not work on the first load and only works on the second.
Here is the working code that loads on the first load.
Theme based keywords {data-navmenu="Competition"}
=======================================================================
Column {.sidebar}
-----------------------------------------------------------------------
```{r}
Data43 <- j1
themelevel11 <- factor(Data43$theme, levels =sort(unique(Data43$theme)))
selectInput('themelevel11', 'Theme:', c("Select",levels(factor(themelevel11))), selected = "Select")
```
Column
-----------------------------------------------------------------------
### Theme based keywords
```{r}
dataset43 <- reactive({
subset(Data43,
(input$themelevel11 == "Select" | themelevel11 %in% input$themelevel11)
)
})
DT::renderDataTable(
dataset43(),
filter = 'top',
extensions = 'FixedColumns',
options = list(
pageLength = 10,
autoWidth = TRUE,
scrollY = '400px',
scrollX = TRUE,
fixedColumns = list(leftColumns = 3),
order = list(list(2, 'desc')),
columnDefs = list(
list(className = 'dt-center', targets = '_all'),
list(className = 'dt-left', targets = 0),
list(visible = FALSE, targets = c(3))
)
)
)
```
The code below doesn't work on the first scroll: it just shows a count of 420 rows without any data. The data does get loaded on the second scroll.
Long titles {data-navmenu="Technical SEO"}
=======================================================================
Column {.sidebar}
------------------------------------------------------------------
```{r}
Data36 <- c1
themelevel10 <- factor(Data36$theme, levels = sort(unique(Data36$theme)))
selectInput('themelevel10', 'Theme:', c("Select",levels(factor(themelevel10))), selected = "Select")
```
Column 1
-----------------------------------------------------------------------
```{r}
dataset36 <- reactive({
subset(Data36,
(input$themelevel10 == "Select" | themelevel10 %in% input$themelevel10)
)
})
DT::renderDataTable(dataset36(),
filter = 'top',
options = list(
pageLength = 10,
autoWidth = TRUE,
scrollY = '450px',
scrollX = TRUE,
order = list(list(5, 'desc')),
columnDefs = list(
list(className = 'dt-center', targets = '_all'),
list(className = 'dt-left', targets = 0),
list(visible=FALSE, targets=c(6))
)
)
)
```
What could be the reason? I want the second table to load on the first scroll when the "Select" filter is chosen.
Hi! I’m trying to pick up R right now and following a tutorial but I don’t seem to quite understand it. This is specifically about making/analyzing a multilevel model.
I am using this script and it works, but I just don’t fully understand it. Engagement is the outcome variable, but the final model I was given is a moderated/moderated mediation model. Is empathetic the independent variable, and emotion the mediation variable? Where does the moderation variable go then?
model1 <-lme(s_engage ~ s_empathetic + s_emotion , random=~1|Res_ID ,method="ML",na.action=na.omit, data=hrdata)
Thanks so much in advance and sorry if I’m unclear with what I mean, I’m still trying to pick this up.
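In models like this, a moderator typically enters as an interaction term with the variable whose effect it moderates. Which variable plays which role depends on the hypothesised model, which is not shown here, so the roles below are assumptions for illustration only: `s_empathetic` as the predictor, `s_emotion` as the mediator, and a hypothetical `s_moderator` column. The data are simulated, and nlme is assumed to be available (it ships with standard R installations).

```r
library(nlme)  # lme()

set.seed(7)
# Simulated stand-in; s_moderator is a hypothetical name for the moderator
n_groups <- 30
hrdata <- data.frame(Res_ID = rep(seq_len(n_groups), each = 10))
u_med <- rnorm(n_groups, sd = 0.5)   # random intercepts, mediator model
u_out <- rnorm(n_groups, sd = 0.5)   # random intercepts, outcome model
hrdata$s_empathetic <- rnorm(nrow(hrdata))
hrdata$s_moderator  <- rnorm(nrow(hrdata))
hrdata$s_emotion <- 0.5 * hrdata$s_empathetic +
  0.3 * hrdata$s_empathetic * hrdata$s_moderator +
  u_med[hrdata$Res_ID] + rnorm(nrow(hrdata))
hrdata$s_engage <- 0.4 * hrdata$s_emotion + 0.2 * hrdata$s_empathetic +
  u_out[hrdata$Res_ID] + rnorm(nrow(hrdata))

# Mediator model: the moderator enters as an interaction with the predictor
med_model <- lme(s_emotion ~ s_empathetic * s_moderator,
                 random = ~ 1 | Res_ID, method = "ML", data = hrdata)

# Outcome model: engagement regressed on the mediator and the predictor
out_model <- lme(s_engage ~ s_emotion + s_empathetic,
                 random = ~ 1 | Res_ID, method = "ML", data = hrdata)

# The row named s_empathetic:s_moderator is the moderation effect
summary(med_model)$tTable
```

In the single model from the post, both `s_empathetic` and `s_emotion` are simply predictors of engagement; on its own it tests neither mediation nor moderation.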
So i have train_set_tsibble, where
> str(train_set_tsibble)
tbl_ts [31 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
$ date : Date[1:31], format: "2015-01-01" "2015-01-02" "2015-01-03" ...
$ Delays: num [1:31] 25 52 57 76 41 46 37 42 32 16 ...
- attr(*, "key")= tibble [26 × 2] (S3: tbl_df/tbl/data.frame)
..$ Delays: num [1:26] 12 13 16 18 21 23 24 25 26 28 ...
..$ .rows : list<int> [1:26]
.. ..$ : int 24
.. ..$ : int 19
.. ..$ : int [1:3] 10 14 25
.. ..$ : int 23
.. ..$ : int 20
.. ..$ : int 27
.. ..$ : int 15
.. ..$ : int 1
.. ..$ : int 13
.. ..$ : int 22
.. ..$ : int 17
.. ..$ : int 28
.. ..$ : int 9
.. ..$ : int 18
.. ..$ : int 7
.. ..$ : int 12
.. ..$ : int [1:3] 5 21 26
.. ..$ : int 8
.. ..$ : int 16
.. ..$ : int 29
.. ..$ : int 6
.. ..$ : int 2
.. ..$ : int [1:2] 3 31
.. ..$ : int 11
.. ..$ : int 4
.. ..$ : int 30
.. ..@ ptype: int(0)
..- attr(*, ".drop")= logi TRUE
- attr(*, "index")= chr "date"
..- attr(*, "ordered")= logi TRUE
- attr(*, "index2")= chr "date"
- attr(*, "interval")= interval [1:1] 1D
..@ .regular: logi TRUE
- attr(*, ".regular")= logi TRUE
This causes errors when I try to apply it in model(), I assume because the key tibble is formatted really incorrectly. I think if I had it formatted like this it would be good:
> str(train_set_tsibble)
tbl_ts [31 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
$ date : Date[1:31], format: "2015-01-01" "2015-01-02" "2015-01-03" ...
$ Delays: num [1:31] 25 52 57 76 41 46 37 42 32 16 ...
- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
..$ .rows : list<int> [1:31]
.. ..$ : int 1 2 3 4 5 6 7 8 9 20 ...
.. ..@ ptype: int(0)
..- attr(*, ".drop")= logi TRUE
- attr(*, "index")= chr "date"
..- attr(*, "ordered")= logi TRUE
- attr(*, "index2")= chr "date"
- attr(*, "interval")= interval [1:1] 1D
..@ .regular: logi TRUE
- attr(*, ".regular")= logi TRUE
I'm trying to scrape the table from the following webpage: https://www.nasdaq.com/market-activity/stocks/aaa/dividend-history
I'm doing so with RSelenium because I can't seem to download the HTML using rvest. However, I'm finding that all the actual values of the table are coming up empty. Here's the code I'm using:
library(RSelenium)
library(rvest)  # provides read_html(), html_nodes(), html_table()
rD <- rsDriver(browser = 'firefox', port = 4833L, chromever = NULL)
remDr <- rD[["client"]]
remDr$navigate(paste0("https://www.nasdaq.com/market-activity/stocks/aaa/dividend-history"))
Sys.sleep(11)
html <- read_html(remDr$getPageSource()[[1]])
df <- html_table(html_nodes(html, "table"))
If I try another url on the same website it works:
library(RSelenium)
library(rvest)  # provides read_html(), html_nodes(), html_table()
rD <- rsDriver(browser = 'firefox', port = 4833L, chromever = NULL)
remDr <- rD[["client"]]
remDr$navigate(paste0("https://www.nasdaq.com/market-activity/stocks/a/dividend-history"))
Sys.sleep(11)
html <- read_html(remDr$getPageSource()[[1]])
df <- html_table(html_nodes(html, "table"))
I'm not sure why it works for one url but not the other. Hoping someone can explain what's going on and how I get the info in the table.
This is the code I am running for an assignment, but when it runs summary(E1_1) I only get back Length, Class and Mode. summary() has worked for me in the past with an uploaded Excel dataset, but I'm not sure how to get it to work properly now.
# Creating Vectors for Cities recorded High and Low Temperatures in Celsius
# vCity = City names
# vHigh = the highest temperature recorded
# vLow= the lowest temperature recorded
vCity <- c("Barcelona", "Berlin", "Lisbon", "London", "Paris", "Rome")
vHigh <- c(14, 2, 14, 5, 2, 14)
vLow <- c(6, -1, 3, 0, -3, 3)
#Creating a data frame titled E1_1 combining the three vectors
E1_1 <- as.data.frame(cbind(vCity, vHigh, vLow))
Sys.time()
Sys.getenv("username")
E1_1
#running summary to find mean,median, min, max, 1st and 3rd quartiles of dataframe
summary(E1_1)
Sys.time()
Sys.getenv("username")
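The likely culprit is `cbind()`: it builds a matrix, a matrix can hold only one type, and so mixing city names with numbers converts everything to character. `summary()` then has nothing numeric to summarise. A sketch of the difference, building the data frame directly (the column names `City`, `High` and `Low` are chosen here for illustration):

```r
vCity <- c("Barcelona", "Berlin", "Lisbon", "London", "Paris", "Rome")
vHigh <- c(14, 2, 14, 5, 2, 14)
vLow  <- c(6, -1, 3, 0, -3, 3)

# cbind() coerces all three vectors to a single type, so the
# temperature columns are no longer numeric
via_cbind <- as.data.frame(cbind(vCity, vHigh, vLow))
str(via_cbind)

# data.frame() keeps each column's own type
E1_1 <- data.frame(City = vCity, High = vHigh, Low = vLow)
str(E1_1)
summary(E1_1)
```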
Hi all!
I'm trying to read the page https://www.atptour.com/en/players/-/S0AG/rankings-history?year=2020 so as to extract the Rank column in the Singles table.
I've used
page=readLines("https://www.atptour.com/en/players/-/S0AG/rankings-history?year=2020")
and I get the HTML code. I expected to see the number 37 at line 692, but I get
<div>{{playerItem.SglRollRank}}</div>
so it seems to get variables instead of values.
Do you know a way I can get those '37' values?
Thank you <3
Hi,
I'm looking at an MLR for continuous income. Here are my plots for the MLR without logging the outcome, with logging, and with logging + 1. The model without logging has a lower R^2, while logging and logging + 1 have the same R^2. But I think the residual vs fitted graph isn't looking too good for the logged versions. Any advice/help would be appreciated. Thanks!
I am trying to make it so that all of the lines have different shapes, but right now it shows this warning: "The shape palette can deal with a maximum of 6 discrete values because more than 6 becomes difficult to discriminate ℹ you have requested 7 values. Consider specifying shapes manually if you need that many have them."
I tried to add scale_shape_manual(breaks = 6:12, values = 6:12, name = "group"), but it did nothing but make the points disappear. Also, right now the objects in the legend are in a weird order. How should I make it so that they are in the proper order from Obj_6 to Obj_12?
I also want to make the x-axis show each power, but it only shows values like 0.5 and 1.0 instead of the values I need, which are c(0,0.25,0.33,0.35,0.375,0.4,0.425,0.45,0.475,0.5,0.525,0.55,0.575,0.6,0.7,0.8,0.9,1,1.1,1.2,1.3), and I have no idea how to do that. Is it possible to show them vertically, since putting them horizontally together would be crowded?
As an absolute noob, any help is appreciated!!! Thanks in advance
Code:
graph_data_long %>%
ggplot(aes(x=as.numeric(power),y=value,group=obj_name,color=obj_name)) +
geom_line() +
geom_point(aes(shape=obj_name)) +
scale_y_continuous(trans = "log10") +
theme_linedraw() +
scale_shape_manual(breaks = 6:12, values = 6:12, name = "group")
Data frame graph_data_long:
structure(list(obj_name = c("Obj_6", "Obj_6", "Obj_6", "Obj_6",
"Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6",
"Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6", "Obj_6",
"Obj_6", "Obj_6", "Obj_6", "Obj_7", "Obj_7", "Obj_7", "Obj_7",
"Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7",
"Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7", "Obj_7",
"Obj_7", "Obj_7", "Obj_7", "Obj_8", "Obj_8", "Obj_8", "Obj_8",
"Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8",
"Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8", "Obj_8",
"Obj_8", "Obj_8", "Obj_8", "Obj_9", "Obj_9", "Obj_9", "Obj_9",
"Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9",
"Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9", "Obj_9",
"Obj_9", "Obj_9", "Obj_9", "Obj_10", "Obj_10", "Obj_10", "Obj_10",
"Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10",
"Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10", "Obj_10",
"Obj_10", "Obj_10", "Obj_10", "Obj_11", "Obj_11", "Obj_11", "Obj_11",
"Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11",
"Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11", "Obj_11",
"Obj_11", "Obj_11", "Obj_11", "Obj_12", "Obj_12", "Obj_12", "Obj_12",
"Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12",
"Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12", "Obj_12",
"Obj_12", "Obj_12", "Obj_12"), power = c("0", "0.25", "0.33",
"0.35", "0.375", "0.4", "0.425", "0.45", "0.475", "0.5", "0.525",
"0.55", "0.575", "0.6", "0.7", "0.8", "0.9", "1", "1.1", "1.2",
"1.3", "0", "0.25", "0.33", "0.35", "0.375", "0.4", "0.425",
"0.45", "0.475", "0.5", "0.525", "0.55", "0.575", "0.6", "0.7",
"0.8", "0.9", "1", "1.1", "1.2", "1.3", "0", "0.25", "0.33",
"0.35", "0.375", "0.4", "0.425", "0.45", "0.475", "0.5", "0.525",
"0.55", "0.575", "0.6", "0.7", "0.8", "0.9", "1", "1.1", "1.2",
"1.3", "0", "0.25", "0.33", "0.35", "0.375", "0.4", "0.425",
"0.45", "0.475", "0.5", "0.525", "0.55", "0.575", "0.6", "0.7",
"0.8", "0.9", "1", "1.1", "1.2", "1.3", "0", "0.25", "0.33",
"0.35", "0.375", "0.4", "0.425", "0.45", "0.475", "0.5", "0.525",
"0.55", "0.575", "0.6", "0.7", "0.8", "0.9", "1", "1.1", "1.2",
"1.3", "0", "0.25", "0.33", "0.35", "0.375", "0.4", "0.425",
"0.45", "0.475", "0.5", "0.525", "0.55", "0.575", "0.6", "0.7",
"0.8", "0.9", "1", "1.1", "1.2", "1.3", "0", "0.25", "0.33",
"0.35", "0.375", "0.4", "0.425", "0.45", "0.475", "0.5", "0.525",
"0.55", "0.575", "0.6", "0.7", "0.8", "0.9", "1", "1.1", "1.2",
"1.3"), value = c(3.72410744566162, 5.51168003923104, 7.4060839154296,
8.20393019709473, 9.24723797685855, 10.7562379662041, 12.8868025478135,
15.5100727558972, 18.5534222223628, 23.1409478278656, 22.6504985586419,
20.9065115341602, 19.4181576551116, 17.8554687886568, 10.8993991443694,
7.81271135699668, 6.13966797266852, 5.6977077662796, 5.37408529874299,
5.02962030371785, 4.86258697386329, 3.19737465216476, 5.59774993516746,
7.50121965706743, 8.25167100160226, 9.03798643979528, 9.46022583168103,
9.93268505520465, 10.4640129694759, 11.0615850694332, 13.19482999695,
14.746170594877, 15.990014190891, 17.4551966969221, 17.9361361197718,
7.99349308078362, 6.49960908806224, 5.84308526845936, 5.52762501102557,
5.38280815735708, 4.91888518373503, 4.71444863014914, 3.07771847969777,
5.47433228431448, 6.44213757389488, 6.66742430744288, 6.97535025689521,
7.31701680907091, 7.70094772378251, 8.91690959596007, 10.6984857467849,
11.925022990115, 11.142586680758, 10.1820818794844, 9.37740693193392,
8.68294638919202, 7.48047503801594, 6.17035953638092, 5.8675475823089,
5.42126227945987, 5.39176138127794, 4.86428358796829, 4.5258515829203,
3.04646629915112, 4.54796886657372, 5.38135683525506, 5.64564288700107,
6.08552141033587, 6.82143106487618, 7.77461891298966, 8.26190488706468,
8.76033401364452, 9.32880000729427, 9.25312478347536, 9.12503690998079,
9.00089994659873, 8.76811470531463, 6.62734982132179, 5.81577592377926,
5.85559194097889, 5.36002906494716, 5.08166826736183, 4.73000595543503,
4.44023340784953, 3.09333501144575, 4.35881755510305, 5.18741750910549,
5.58348875895986, 6.11140873800865, 6.54912995115038, 7.06295188841488,
7.67372446427145, 8.40813885910582, 9.30806145486793, 9.1532682896371,
8.66516577218275, 8.2283334585154, 7.78803322738736, 6.63338647326677,
5.84027839953848, 5.34384133619004, 5.42904421949546, 4.66831095851281,
4.74506379824897, 4.49021089461522, 3.07796931479018, 4.1217081751311,
5.14012654850439, 5.36151701687943, 5.66980178677372, 6.01978990929582,
6.42277739775086, 6.89108438347427, 7.43891180869636, 8.0884637798103,
8.12528629344052, 8.0037684219851, 7.88626899591346, 7.72705134947085,
6.49204630463644, 5.53832636547457, 5.33078894588476, 4.98055757657446,
4.63421867612929, 4.31216950703026, 4.0601002676063, 2.77184102938775,
3.78612559323399, 4.42463386554498, 4.62383257394448, 4.90246206056175,
5.22055525535581, 5.58906805889066, 6.02037960424028, 6.52930328523149,
7.13892440760706, 7.30138340021254, 7.34092525148487, 7.38060453871976,
7.37971301100791, 6.26914351168065, 5.2321378547673, 4.4381055134539,
4.52424178992708, 4.68529499485843, 4.2375259073016, 3.80679536473264
)), row.names = c(NA, -147L), class = c("tbl_df", "tbl", "data.frame"
))
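A possible explanation for the disappearing points: `breaks = 6:12` refers to values that do not occur in `obj_name` (the values are "Obj_6" to "Obj_12"), so no shape gets matched to the data. The sketch below uses a small made-up data frame with the same column names rather than the full data above. It orders the legend through factor levels, assigns one shape per level by name, and puts a rotated tick at every power.

```r
library(ggplot2)

# Small hypothetical stand-in with the same columns as graph_data_long
powers <- c(0, 0.25, 0.5, 1, 1.3)
objs   <- paste0("Obj_", 6:12)
graph_data_long <- data.frame(
  obj_name = rep(objs, each = length(powers)),
  power    = as.character(rep(powers, times = length(objs))),
  value    = seq(3, 20, length.out = length(objs) * length(powers))
)

# Legend order follows the factor levels, so set them explicitly
graph_data_long$obj_name <- factor(graph_data_long$obj_name, levels = objs)

p <- ggplot(graph_data_long,
            aes(x = as.numeric(power), y = value,
                group = obj_name, color = obj_name)) +
  geom_line() +
  geom_point(aes(shape = obj_name)) +
  scale_y_continuous(trans = "log10") +
  # One shape per level, matched by name (shape codes 0 to 6)
  scale_shape_manual(values = setNames(0:6, objs)) +
  # One tick per power value
  scale_x_continuous(breaks = powers) +
  labs(x = "power", color = "group", shape = "group") +
  theme_linedraw() +
  # Rotate the tick labels so closely spaced values stay readable
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```

With the real data, `breaks` would be `sort(unique(as.numeric(graph_data_long$power)))`. Giving the colour and shape legends the same title lets ggplot2 merge them into one.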
I have RStudio 2023.12.1 build 402.
I initially tried installing the Rtools package and got the following error:
install.packages("Rtools")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:
I went to the site below and downloaded the Rtools43 installer, and I am still getting the same error when I try running install.packages("Rtools") or install.packages("Rtools43").
https://cran.r-project.org/bin/windows/Rtools/rtools43/rtools.html
Does anyone know what I'm doing wrong?
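Rtools is not an R package but a Windows toolchain set up by the downloaded installer, so there is nothing for `install.packages()` to fetch. The Rtools version also has to match the installed R version (Rtools43 goes with R 4.3.x). A quick, hedged way to check from the R console whether R can see the build tools:

```r
# Look for the `make` executable that Rtools provides; an empty string
# means it is not on the PATH that R sees
make_path <- Sys.which("make")

if (nzchar(make_path)) {
  message("Build tools found at: ", make_path)
} else {
  message("make not found on PATH; R may not be seeing Rtools")
}

# The R version determines which Rtools release is needed
R.version.string
```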
Can anyone help me? I'm trying to display an NMDS. I want to display species based on the type of forest (Type) I have, but instead it shows me 9 differently colored circles instead of species. Where's the problem?
library(readxl)
groupsnmds <- read_excel("~/Mes tests R studio/groupsnmds.xlsx")
library(scatterplot3d)
library(vegan)
library(readxl)
# Load the data matrix
groupsnmds <- read_excel("~/Mes tests R studio/groupsnmds.xlsx")
# Or specify the file path directly
# Matrix <- read_excel("C:/Users/kitab/Documents/Mes tests R studio/Matrix.xlsx")
# Separate the site names from the data
clean <- groupsnmds[,3:63]
env <- groupsnmds[,64:69]
m_com = as.matrix(clean)
ord <- metaMDS(m_com, k = 2)
ord
# Show the stress plot and the NMDS plots
stressplot(ord)
plot(ord)
en = envfit(ord, env, permutations = 999, na.rm = TRUE)
plot(en)
data.scores=as.data.frame(scores(ord, display = "sites"))
# Assign column names to the ordination scores
colnames(data.scores) <- c("MDS1", "MDS2")
# Show the first rows of the data frame
head(data.scores)
data.scores$season = groupsnmds$Type
#extract NMDS scores (x and y coordinates) for sites from newer versions of vegan package
data.scores = as.data.frame(scores(ord)$sites)
#add 'season' column as before
data.scores$Type = groupsnmds$Type
en_coord_cont = as.data.frame(scores(en, "vectors")) * ordiArrowMul(en)
en_coord_cat = as.data.frame(scores(en, "factors")) * ordiArrowMul(en)
species_scores <- as.data.frame(scores(ord, "species"))
# Add a column equivalent to the row name to create species labels
species_scores$species <- rownames(species_scores)
species_scores
library(viridis)
gg = ggplot(data = species_scores, aes(x = NMDS1, y = NMDS2)) +
geom_point(data = data.scores, aes(colour = Type), size = 3, alpha = 0.5) +
scale_colour_manual(values = c("#4169E1", "#3cb471", "#ffeb3b", "#900C3F", "#66a1d0", "goldenrod1", "darkorchid", "slateblue","black")) +
theme(axis.title = element_text(size = 10, face = "bold", colour = "grey30"),
panel.background = element_blank(), panel.border = element_rect(fill = NA, colour = "grey30"),
axis.ticks = element_blank(), axis.text = element_blank(), legend.key = element_blank(),
legend.title = element_text(size = 10, face = "bold", colour = "grey30"),
legend.text = element_text(size = 9, colour = "grey30")) +
labs(colour = "Type")
gg
Hi, I am trying to plot a scattergraph of OTUs (DNA sequences) against abundance in 2 samples. My source is a CSV file which I have imported so it looks like this
My problem is that when I go to plot this, R does not recognise the column names (OTU, Count - control and Count - 50). I converted the CSV to a data frame using as.data.frame, but I am unsure how to rename these columns or what I need to do to turn them into names R understands so that I can then use ggplot(aes(x= and so on. Thank you!
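Column names containing spaces or dashes are not syntactic R names. By default `read.csv()` rewrites them (for example to `Count...control`), which may be why the original names are not recognised. One option is to rename the columns right after import. A base R sketch, using a tiny made-up CSV in place of the real file:

```r
# Hypothetical CSV content standing in for the imported file
csv_text <- "OTU,Count - control,Count - 50\notu1,12,30\notu2,7,0\n"

# check.names = FALSE keeps the headers exactly as written in the file
otu <- read.csv(text = csv_text, check.names = FALSE)
original_names <- names(otu)
original_names

# Rename to syntactic names that are easy to type inside aes()
names(otu) <- c("OTU", "count_control", "count_50")
otu

# Quick scatter plot of the two samples
plot(otu$count_control, otu$count_50,
     xlab = "Count (control)", ylab = "Count (50)")
```

Alternatively, the original names can be kept and wrapped in backticks wherever they are used, e.g. `` aes(x = `Count - control`) `` in ggplot2.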