# Sum of Selected Columns of an R Data Frame

Jesse John, January 30, 2023

```r
# Create five variables.
Student = c("Student A", "Student B", "Student C")
Hobby = c("Music", "Sports", "Cycling")
Maths = c(40, 35, 30)
Statistics = c(30, 35, 20)
Programming = c(25, 20, 35)

# Create a data frame from the variables.
df_students = data.frame(Student, Hobby, Maths, Statistics, Programming)

# View the data frame.
df_students
```

## Use the `rowSums()` Function of Base R to Calculate the Sum of Selected Columns of a Data Frame

```r
# This adds the new column to the data frame.
df_students$myRowSums = rowSums(df_students[, c("Maths", "Statistics", "Programming")])

# View the data frame with the added column.
df_students

# We can also give a vector of column positions.
# df_students$myRowSums = rowSums(df_students[, c(3:5)])
```

```
> # View the data frame with the added column.
> df_students
    Student   Hobby Maths Statistics Programming myRowSums
1 Student A   Music    40         30          25        95
2 Student B  Sports    35         35          20        90
3 Student C Cycling    30         20          35        85
```

```r
# Save the list of columns as a vector of strings.
col_list = c("Maths", "Statistics", "Programming")

# Pass the vector of strings to the subsetting square brackets.
df_students$myRowSums = rowSums(df_students[, col_list])

# View the data frame with the added column.
df_students
```
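
One thing to watch when the column list is held in a variable: if it names a single column, `df[, cols]` returns a plain vector, and `rowSums()` stops with an error because it needs an input with at least two dimensions. The sketch below uses a small hypothetical data frame, `scores` (not the article's `df_students`), to show that names and positions give the same totals and that `drop = FALSE` keeps a one-column selection usable.

```r
# Hypothetical data, mirroring the structure of the article's example.
scores = data.frame(
  Student = c("Student A", "Student B", "Student C"),
  Maths = c(40, 35, 30),
  Statistics = c(30, 35, 20)
)

# Selecting by name and by position gives the same row totals.
by_name = rowSums(scores[, c("Maths", "Statistics")])
by_position = rowSums(scores[, 2:3])

# With a single column, scores[, one_col] would be a plain numeric vector,
# which rowSums() rejects. drop = FALSE keeps a one-column data frame.
one_col = "Maths"
single = rowSums(scores[, one_col, drop = FALSE])
```

Indexing with a single bracket and no comma, as in `scores[one_col]`, also keeps the data frame structure.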

## Use the `apply()` Function of Base R to Calculate the Sum of Selected Columns of a Data Frame

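We pass three arguments to `apply()`:

1. The required columns of the data frame.
2. The dimension of the data frame to retain. `1` stands for rows.
3. The function that we want to compute, `sum`.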

```r
# We will recreate the data frame from the variables.
df_students = data.frame(Student, Hobby, Maths, Statistics, Programming)

# In base R, we can delete a column by setting its name to NULL.
# df_students$myRowSums = NULL

# A new column gets created.
df_students$myApplySums = apply(df_students[, col_list], 1, sum)

# View the data frame with the added column.
df_students
```

```r
# Names of columns as a vector of strings.
df_students$myApplySums = apply(df_students[, c("Maths", "Statistics", "Programming")], 1, sum)

# Vector of column positions.
df_students$myApplySums = apply(df_students[, c(3, 4, 5)], 1, sum)
```
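
To see what the second argument does, it may help to compare `1` and `2` on the same data. The sketch below uses a hypothetical two-column data frame, `scores`, rather than the article's `df_students`. Note that `apply()` converts a data frame to a matrix before it calls the function, which is why only the numeric columns are passed to it.

```r
# Hypothetical numeric-only data frame.
scores = data.frame(
  Maths = c(40, 35, 30),
  Statistics = c(30, 35, 20)
)

# Second argument 1: call sum() once per row (one total per student).
row_totals = apply(scores, 1, sum)

# Second argument 2: call sum() once per column (one total per subject).
col_totals = apply(scores, 2, sum)
```

Here `row_totals` holds the three per-student totals (70, 70, 50), while `col_totals` holds one total per subject (105 for `Maths`, 85 for `Statistics`).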

## Use Tidyverse Functions to Calculate the Sum of Selected Columns of a Data Frame in R

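The Tidyverse approach relies on the `dplyr` package and uses the following:

1. The pipe operator, `%>%`, to avoid nesting function calls.
2. `rowwise()` makes the other functions work on rows.
3. `mutate()` adds the column.
4. `sum()` performs the addition.
5. `c_across()` is designed to be used with `rowwise()`.
6. `all_of()` selects the columns named in a character vector.

`rowwise()` is a kind of grouping. After using it, we may need to call `ungroup()` on the result and save the ungrouped version as an object.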

```r
# We will recreate the data frame from the variables.
df_students = data.frame(Student, Hobby, Maths, Statistics, Programming)

library(dplyr)

# Create a tibble from the data frame.
# This could be done as part of the next step, but that would obscure the main point.
tb_students = as_tibble(df_students)

# We have to assign the RHS to an object to save the column to the object.
# It can be the same as the original tibble.
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(all_of(col_list))))

# View the rowwise tibble with the added column.
tb_students
```

```
> # View the rowwise tibble with the added column.
> tb_students
# A tibble: 3 x 6
# Rowwise:
  Student   Hobby   Maths Statistics Programming myTidySum
  <chr>     <chr>   <dbl>      <dbl>       <dbl>     <dbl>
1 Student A Music      40         30          25        95
2 Student B Sports     35         35          20        90
3 Student C Cycling    30         20          35        85
```
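
The `# Rowwise:` line in the output shows that the tibble still carries its row-by-row grouping. As a rough illustration of why that matters, the sketch below uses a smaller hypothetical tibble, `scores`, and a hypothetical `cols` vector, and compares the same `mutate()` call before and after `ungroup()`. It assumes `dplyr` is installed.

```r
library(dplyr)

# Hypothetical data, smaller than the article's example.
scores = as_tibble(data.frame(
  Student = c("Student A", "Student B", "Student C"),
  Maths = c(40, 35, 30),
  Statistics = c(30, 35, 20)
))
cols = c("Maths", "Statistics")

# Add the row totals; the result is still a rowwise tibble.
rowwise_tb = scores %>%
  rowwise() %>%
  mutate(Total = sum(c_across(all_of(cols))))

# Still rowwise: mean() sees one row at a time,
# so Average simply repeats each row's own Total.
per_row = rowwise_tb %>% mutate(Average = mean(Total))

# After ungroup(): mean() sees the whole Total column,
# so every row gets the same overall average.
overall = rowwise_tb %>% ungroup() %>% mutate(Average = mean(Total))
```

Saving the ungrouped result, as `overall` does here, avoids surprises of this kind in later steps.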

```r
tb_students = as_tibble(df_students)
# Take the union of the column names.
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(Maths | Statistics | Programming)))

tb_students = as_tibble(df_students)
# Give a range of columns as a range of names.
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(Maths:Programming)))

tb_students = as_tibble(df_students)
# Give a range of columns as a range of column positions.
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(3:5)))

tb_students = as_tibble(df_students)
# Select all columns having 'at' or 'am'
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(contains('at') | contains('am'))))

tb_students = as_tibble(df_students)
# Select all columns except Student and Hobby.
# Make sure the tibble only has the required columns before running the next line.
tb_students = tb_students %>% rowwise() %>% mutate(myTidySum = sum(c_across(!c(Student, Hobby))))
```

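## Conclusion

The `rowSums()` and `apply()` functions are simple to use. The columns to add can be specified directly in the function call, by name or by position, or supplied as a character vector.

The Tidyverse approach is a little more involved, but it offers many alternative ways to specify the columns to add.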
