How to Drop Column by Name in R

Sheeraz Gul Feb 22, 2024
  1. How to Drop a Column by Name From a Data Frame in R Using the dplyr Package
  2. How to Drop a Column by Name From a Data Frame in R Using the names() Function
  3. How to Drop a Column by Name From a Data Frame in R Using the subset() Function
  4. How to Drop a Column by Name From a Data Frame in R Using the data.table Package
  5. Conclusion

Data frames are fundamental structures in R for storing and manipulating data, commonly used in data analysis, statistics, and machine learning tasks. In many real-world scenarios, it becomes necessary to remove specific columns from a data frame, either due to redundancy or irrelevance, or to streamline data processing pipelines.

One common operation is dropping a column by its name. In this article, we will explore various methods to achieve this task in R.

We’ll cover techniques from the popular dplyr and data.table packages, as well as the base R functions names() and subset().

How to Drop a Column by Name From a Data Frame in R Using the dplyr Package

dplyr is an R package designed to make data manipulation easier and more intuitive. It provides a consistent set of functions for common operations such as filtering, selecting, arranging, and summarizing data.

One of the key functions in the dplyr package is select(), which allows us to subset columns from a data frame based on their names.

The syntax for dropping a column by name using the select() function from the dplyr package is as follows:

select(dataframe, -column_name)

Here, dataframe refers to the data frame from which we want to drop the column, and column_name is the name of the column we wish to remove. By using the - sign before the column name, we specify that we want to drop that particular column from the data frame.

The function returns the data frame without the specified column.

Before we proceed to the example, ensure you have the dplyr package installed. If not, you can install it from CRAN using:

install.packages("dplyr")

Let’s demonstrate the usage of the select() function to drop a column by name from a data frame. Consider a data frame containing employee information with columns for Name, LastName, Id, and Designation.

library(dplyr)

Delftstack <- data.frame(
    Name = c("Jack", "John", "Mike", "Michelle", "Jhonny"),
    LastName = c("Danials", "Cena", "Chandler", "McCool", "Nitro"),
    Id = c(101, 102, 103, 104, 105),
    Designation = c("CEO", "Project Manager", "Senior Dev", "Junior Dev", "Intern")
)

print("The dataframe before dropping the column:-")
print(Delftstack)

print("The dataframe after dropping the column:-")
print(select(Delftstack, -Name))

Initially, we load the dplyr library to access its powerful data frame manipulation functions. Then, we create a data frame named Delftstack containing employee information with columns for Name, LastName, Id, and Designation.

Before making any modifications, we print the data frame to understand its structure.

Moving on to the actual operation, we use the select() function from the dplyr package to drop the Name column from the Delftstack data frame. This function takes the data frame as its first argument, and the column name prefixed with a minus sign (-) as its second argument.

The minus sign signifies that we want to exclude the specified column from the resulting data frame. Upon execution, the select() function returns the modified data frame without the Name column.

To validate the operation, we print the updated data frame after dropping the column. This allows us to verify that the modification was successful.

Output:

[Image: Drop a Column by Name From a Data Frame in R Using dplyr]

In the printed output, we observe that the Name column is absent from the data frame, while the other columns (LastName, Id, and Designation) remain intact. This confirms that the select() function effectively removed the specified column as intended.
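
When several columns have to go, or the names are stored in a character vector, the same select() call can be extended. The sketch below is illustrative rather than part of the original example: the employees data frame and the cols_to_drop vector are hypothetical names, and it assumes dplyr 1.0.0 or later, where the all_of() and any_of() selection helpers are available.

```r
library(dplyr)

# Hypothetical data frame, modeled on the example above
employees <- data.frame(
    Name = c("Jack", "John", "Mike"),
    LastName = c("Danials", "Cena", "Chandler"),
    Id = c(101, 102, 103),
    Designation = c("CEO", "Project Manager", "Senior Dev")
)

# Drop several columns at once by wrapping the bare names in c()
dropped_two <- select(employees, -c(Name, Id))
# columns left: LastName, Designation

# Drop columns whose names are held in a character vector.
# all_of() raises an error if any listed name is missing.
dropped_all <- select(employees, -all_of(c("Name", "Id")))

# any_of() ignores names that are not present ("Salary" does not exist here)
cols_to_drop <- c("Name", "Salary")
dropped_any <- select(employees, -any_of(cols_to_drop))
# columns left: LastName, Id, Designation
```

Choosing between the two helpers is a design decision: all_of() is the stricter option and surfaces typos early, while any_of() suits scripts that should run even when a column has already been removed.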

How to Drop a Column by Name From a Data Frame in R Using the names() Function

Another approach to dropping a column by name from a data frame involves using the names() function. This method provides a straightforward way to specify columns to be removed.

The syntax for dropping a column by name using the names() function is as follows:

dataframe <- dataframe[, !(names(dataframe) %in%
    columns_to_drop)]

Here, dataframe refers to the data frame from which we want to drop the column, and columns_to_drop is a vector containing the names of the columns to be removed.

The names() function retrieves the column names of a data frame as a character vector. The expression names(dataframe) %in% columns_to_drop is TRUE for every column listed in columns_to_drop, and negating it with the ! operator yields a logical vector that is TRUE only for the columns to retain.

The resulting logical vector is used for subsetting the data frame, effectively removing the specified columns.

Let’s demonstrate the usage of the names() function to drop a column by name from a data frame. We’ll utilize the same example data frame containing employee information.

Delftstack <- data.frame(
    Name = c("Jack", "John", "Mike", "Michelle", "Jhonny"),
    LastName = c("Danials", "Cena", "Chandler", "McCool", "Nitro"),
    Id = c(101, 102, 103, 104, 105),
    Designation = c("CEO", "Project Manager", "Senior Dev", "Junior Dev", "Intern")
)

print("The dataframe before dropping the column:-")
print(Delftstack)

columns_to_drop <- c("Name", "Id")

Delftstack <- Delftstack[, !(names(Delftstack) %in%
    columns_to_drop)]

print("The dataframe after dropping the column:-")
print(Delftstack)

In this code snippet, we begin by creating a data frame called Delftstack containing employee information with columns for Name, LastName, Id, and Designation. Before proceeding, we print the data frame to observe its structure.

Moving on to the actual operation, we define a vector columns_to_drop containing the names of the columns we wish to remove, which in this case are "Name" and "Id".

We then use the names() function to retrieve the column names of the data frame and the %in% operator to build a logical vector that is TRUE for the columns to drop. Negating this vector with ! turns it into a vector that is TRUE for the columns to retain.

Finally, we subset the data frame using the negated logical vector, resulting in a data frame without the specified columns. We print the updated data frame to confirm the success of the operation.

Output:

[Image: Drop a Column by Name From a Data Frame in R Using names()]

The output confirms that the Name and Id columns have been successfully dropped from the data frame. The updated data frame now contains only the LastName and Designation columns, reflecting the desired operation.
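
One caveat with bracket subsetting is worth illustrating: when only a single column survives, the [ operator simplifies the result to a plain vector unless drop = FALSE is supplied. The following base R sketch shows the difference, along with setdiff() as an alternative way to compute the columns to keep. The employees data frame and the variable names are hypothetical and used only for illustration.

```r
# Hypothetical data frame, modeled on the example above
employees <- data.frame(
    Name = c("Jack", "John", "Mike"),
    LastName = c("Danials", "Cena", "Chandler"),
    Id = c(101, 102, 103),
    Designation = c("CEO", "Project Manager", "Senior Dev")
)

# Dropping three of the four columns leaves a single column
columns_to_drop <- c("Name", "LastName", "Id")
keep <- !(names(employees) %in% columns_to_drop)

# Without drop = FALSE the result is simplified to a vector
as_vector <- employees[, keep]

# With drop = FALSE the result stays a one-column data frame
as_data_frame <- employees[, keep, drop = FALSE]

# setdiff() returns the names to keep directly, without a logical mask
kept_names <- setdiff(names(employees), c("Name", "Id"))
via_setdiff <- employees[, kept_names, drop = FALSE]
# columns left: LastName, Designation
```

Passing drop = FALSE is a reasonable default whenever the number of remaining columns is not known in advance, because later code usually expects a data frame.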

How to Drop a Column by Name From a Data Frame in R Using the subset() Function

In R, the subset() function provides another method to drop a column by name from a data frame. This approach allows for concise and intuitive data frame manipulation.

The syntax for dropping a column by name using the subset() function is as follows:

subset(dataframe, select = -c(column_name))

Here, dataframe refers to the data frame from which we want to drop the column, and column_name is the name of the column we wish to remove.

The subset() function is primarily used to create subsets of data frames based on specified conditions. However, it can also be utilized to drop columns by name.

By specifying the select argument as -c(column_name), we indicate that we want to exclude the specified column from the resulting data frame. This effectively removes the column from the data frame.

Let’s demonstrate the usage of the subset() function to drop a column by name from a data frame. We’ll use the same example data frame.

Delftstack <- data.frame(
    Name = c("Jack", "John", "Mike", "Michelle", "Jhonny"),
    LastName = c("Danials", "Cena", "Chandler", "McCool", "Nitro"),
    Id = c(101, 102, 103, 104, 105),
    Designation = c("CEO", "Project Manager", "Senior Dev", "Junior Dev", "Intern")
)

print("The dataframe before dropping the column:-")
print(Delftstack)

Delftstack <- subset(Delftstack, select = -c(Name, Id))

print("The dataframe after dropping the column:-")
print(Delftstack)

In this code snippet, we start by creating a data frame called Delftstack containing employee information with columns for Name, LastName, Id, and Designation. Before proceeding, we print the data frame to observe its structure.

Next, we use the subset() function to drop the Name and Id columns from the Delftstack data frame. We specify the columns to be excluded by setting the select argument as -c(Name, Id). This instructs the function to create a data frame excluding the specified columns.

Finally, we print the updated data frame to confirm the success of the operation.

Output:

[Image: Drop a Column by Name From a Data Frame in R Using subset()]

The output confirms that the Name and Id columns have been successfully dropped from the data frame. The updated data frame now contains only the LastName and Designation columns, reflecting the desired operation.
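
Because subset() also accepts a row condition, filtering rows and dropping columns can be combined in a single call. Inside the select argument, column names stand for their positions, so a contiguous range of columns can be excluded with the : operator as well. The sketch below illustrates both ideas. The employees data frame and the result names are hypothetical.

```r
# Hypothetical data frame, modeled on the example above
employees <- data.frame(
    Name = c("Jack", "John", "Mike", "Michelle", "Jhonny"),
    LastName = c("Danials", "Cena", "Chandler", "McCool", "Nitro"),
    Id = c(101, 102, 103, 104, 105),
    Designation = c("CEO", "Project Manager", "Senior Dev", "Junior Dev", "Intern")
)

# Keep only rows with Id above 102 and drop the Name column in one call
filtered <- subset(employees, Id > 102, select = -Name)
# 3 rows; columns left: LastName, Id, Designation

# Drop a contiguous range of columns, from Name through LastName
without_names <- subset(employees, select = -(Name:LastName))
# columns left: Id, Designation
```

Note that select relies on non-standard evaluation, so the column names are written bare rather than as quoted strings.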

How to Drop a Column by Name From a Data Frame in R Using the data.table Package

In R, the data.table package offers efficient methods for data manipulation, including dropping a column by name from a data frame. This package provides a concise and powerful approach to working with data frames.

The syntax for dropping a column by name using the data.table package is as follows:

setDT(dataframe)
dataframe <- dataframe[, !"column_name", with = FALSE]

Here, dataframe refers to the data frame from which we want to drop the column, and column_name is the name of the column we wish to remove.

The data.table package enhances data manipulation capabilities in R by providing a variety of functions optimized for performance. To drop a column by name, we first convert the data frame to a data.table using the setDT() function, which performs the conversion in place, so no assignment is needed.

Then, inside the [ operator, we pass the column name as a string prefixed with ! to indicate negation. The with = FALSE argument tells data.table to treat that argument as a vector of column names rather than as an expression evaluated within the data.table.

Let’s demonstrate the usage of the data.table package to drop a column by name from a data frame. We’ll utilize the same example data frame containing employee information.

library(data.table)

Delftstack <- data.frame(
    Name = c("Jack", "John", "Mike", "Michelle", "Jhonny"),
    LastName = c("Danials", "Cena", "Chandler", "McCool", "Nitro"),
    Id = c(101, 102, 103, 104, 105),
    Designation = c("CEO", "Project Manager", "Senior Dev", "Junior Dev", "Intern")
)

setDT(Delftstack)
print("The dataframe before dropping the column:-")
print(Delftstack)

Delftstack <- Delftstack[, !"Name", with = FALSE]

print("The dataframe after dropping the column:-")
print(Delftstack)

In this code snippet, we first load the data.table package to access its functions for efficient data frame manipulation. Then, we create a data frame called Delftstack containing employee information with columns for Name, LastName, Id, and Designation.

Next, we convert the data frame to a data.table in place using the setDT() function. This allows us to utilize data.table-specific operations for efficient manipulation.

Before dropping anything, we print the data.table to observe its structure.

We then drop the Name column from the Delftstack data.table by specifying the column name within square brackets, prefixed with a ! for negation. The with = FALSE argument ensures that column names are referred to directly.

Finally, we print the updated data.table to confirm the success of the operation.

Output:

[Image: Drop a Column by Name From a Data Frame in R Using data.table]

The output confirms that the Name column has been successfully dropped from the data frame. The updated data.table now contains only the LastName, Id, and Designation columns, reflecting the desired operation.
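
data.table can also remove columns by reference with the := operator, which avoids copying the table. The sketch below contrasts that with the negation form used above, this time with the column names held in a character vector. The employees table, the cols_to_drop vector, and the other variable names are hypothetical and chosen only for illustration.

```r
library(data.table)

# Hypothetical data.table, modeled on the example above
employees <- data.table(
    Name = c("Jack", "John", "Mike"),
    LastName = c("Danials", "Cena", "Chandler"),
    Id = c(101, 102, 103),
    Designation = c("CEO", "Project Manager", "Senior Dev")
)

cols_to_drop <- c("Name", "Id")

# Negation form: returns a new data.table and leaves employees untouched
trimmed <- employees[, !cols_to_drop, with = FALSE]
# columns left: LastName, Designation

# By-reference form: modifies the table it is called on,
# so work on an explicit copy to keep the original intact
employees_copy <- copy(employees)
employees_copy[, (cols_to_drop) := NULL]
# employees_copy now has columns: LastName, Designation
```

The parentheses around cols_to_drop matter in the by-reference form: they tell data.table to use the contents of the variable as column names instead of treating cols_to_drop itself as a column name.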

Conclusion

In conclusion, dropping a column by name from a data frame in R is a common task in data manipulation. Throughout this article, we explored several methods to accomplish this task using both base R functions and add-on packages.

We discussed how to drop a column using the dplyr package’s select() function, the names() function, the subset() function, and the data.table package’s efficient operations. Each method offers its own syntax and approach, providing flexibility for users to choose the one that best fits their workflow and preferences.

Whether it’s through the concise syntax of dplyr, the simplicity of names(), the flexibility of subset(), or the performance optimization of data.table, R provides versatile tools for data frame manipulation. By understanding and using these methods, analysts and data scientists can efficiently manage and process data, ensuring it meets the requirements of their analysis or modeling tasks.

Author: Sheeraz Gul

Sheeraz is a Doctorate fellow in Computer Science at Northwestern Polytechnical University, Xian, China. He has 7 years of Software Development experience in AI, Web, Database, and Desktop technologies. He writes tutorials in Java, PHP, Python, GoLang, R, etc., to help beginners learn the field of Computer Science.