How to Read File Word by Word in C++

Jinku Hu Feb 12, 2024
  1. Read a File Word by Word in C++ Using the Extraction Operator (>>)
  2. Read a File Word by Word in C++ Using (>>) With std::ispunct and std::string::erase
  3. Read a File Word by Word in C++ Using the get() Function
  4. Conclusion

Reading a file word by word is a fundamental operation in many programming tasks. In C++ programming, this process involves a combination of file handling and string manipulation.

This article will guide you through the steps of reading a file word by word in C++ using different methods, providing simple and concise examples.

Read a File Word by Word in C++ Using the Extraction Operator (>>)

One straightforward approach to reading the contents of a file word by word is to use the extraction operator (>>) with the std::ifstream class.

The >> extraction operator in C++ is commonly used for input operations. When applied to a std::ifstream object, it reads data from the file associated with that stream.

When reading a file word by word, each use of >> first skips any leading whitespace and then extracts characters until the next whitespace character (such as a space, tab, or newline). A "word" is therefore any run of non-whitespace characters, which makes >> a convenient choice for parsing words from a file.

Code Example:

#include <cstdlib>  // EXIT_SUCCESS, EXIT_FAILURE
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main() {
  // Specify the filename
  std::string filename = "example_file.txt";

  // Open the file for reading
  std::ifstream file(filename);

  // Check if the file is open successfully
  if (!file.is_open()) {
    std::cerr << "Could not open the file - '" << filename << "'" << std::endl;
    return EXIT_FAILURE;
  }

  // Vector to store the words
  std::vector<std::string> words;

  // Read the file word by word using the extraction operator
  std::string word;
  while (file >> word) {
    // Store each word in the vector
    words.push_back(word);
  }

  // Close the file
  file.close();

  // Display the words
  for (const auto &w : words) {
    std::cout << w << std::endl;
  }

  return EXIT_SUCCESS;
}

Let’s break down the key components of the code above:

  • Here, we first specify the filename as example_file.txt and create a std::ifstream object named file to handle input operations on the file. This object is used to open the file for reading.
    std::string filename = "example_file.txt";
    std::ifstream file(filename);
    
  • Following this, we check if the file was opened successfully. If not, an error message is displayed on the standard error stream (std::cerr), indicating the failure to open the specified file.

    The program then exits with a failure status (EXIT_FAILURE).

    if (!file.is_open()) {
      std::cerr << "Could not open the file - '" << filename << "'" << std::endl;
      return EXIT_FAILURE;
    }
    
  • Then, we declare a std::vector named words to store the words read from the file. This dynamic array allows for the flexible storage of strings.
    std::vector<std::string> words;
    
  • Using a while loop, we read the file word by word using the extraction operator (>>). Each word is stored in the vector words using the push_back function, expanding the vector dynamically as needed. The loop continues until an extraction fails, which normally happens when the end of the file is reached.
    std::string word;
    while (file >> word) {
      words.push_back(word);
    }
    
  • Once all the words are read and stored, it’s good practice to close the file using the close method of the std::ifstream object. This releases the system resources associated with the file.
    file.close();
    
  • Finally, we display the stored words on the standard output (std::cout) using a range-based for loop. Each word is printed on a new line.
    for (const auto &w : words) {
      std::cout << w << std::endl;
    }
    

Code Output:

Assuming the contents of example_file.txt are:

Hello world. C++ is awesome!

The program output will be:

Hello
world.
C++
is
awesome!

Read a File Word by Word in C++ Using (>>) With std::ispunct and std::string::erase

While the previous example efficiently reads a file word by word, any punctuation attached to a word stays with it, as in world. or awesome! from the sample file.

To address this, we can employ the std::ispunct function to check for punctuation and the std::string::erase method to remove these symbols. This approach ensures cleaner word extraction from the file.

The std::ispunct function is part of the <cctype> header and is used to check whether a character is a punctuation symbol. It takes a single character as an int parameter and returns a non-zero integer if the character is punctuation; otherwise, it returns zero.

Because passing a negative value other than EOF to std::ispunct is undefined behavior, a plain char should first be converted with static_cast<unsigned char>.

The std::string::erase method, on the other hand, removes characters from a string. In this case, it is applied to eliminate punctuation symbols from the beginning or end of each word.

Code Example:

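#include <cctype>   // std::ispunct
#include <cstdlib>  // EXIT_SUCCESS, EXIT_FAILURE
#include <fstream>
#include <iostream>
#include <string>
#include <vector>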

int main() {
  // Specify the filename
  std::string filename = "example.txt";

  // Open the file for reading
  std::ifstream file(filename);

  // Check if the file is open successfully
  if (!file.is_open()) {
    std::cerr << "Could not open the file - '" << filename << "'" << std::endl;
    return EXIT_FAILURE;
  }

  // Vector to store the words
  std::vector<std::string> words;

  // Read the file word by word using the extraction operator
  std::string word;
  while (file >> word) {
    // Check and remove punctuation from the front of the word
    if (std::ispunct(static_cast<unsigned char>(word.front())))
      word.erase(word.begin());

    // Check and remove punctuation from the end of the word; the word may
    // now be empty (e.g. a lone "-"), so guard the call to back()
    if (!word.empty() && std::ispunct(static_cast<unsigned char>(word.back())))
      word.erase(word.end() - 1);

    // Store each modified word in the vector, skipping words left empty
    if (!word.empty())
      words.push_back(word);
  }

  // Close the file
  file.close();

  // Display the modified words
  for (const auto &w : words) {
    std::cout << w << std::endl;
  }

  return EXIT_SUCCESS;
}

This code builds upon the previous example by adding enhanced punctuation handling. Inside the while loop, where words are read from the file, there are two conditional blocks.

The first block checks if the first character of the word is punctuation using std::ispunct. If true, it uses std::string::erase to remove the punctuation from the front of the word.

Similarly, the second block checks if the last character is punctuation and removes it if true. The modified word is then stored in the words vector.

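This strips a single punctuation symbol from each end of a word. At most one character is removed per end, punctuation inside a word is left untouched, and symbols such as + also count as punctuation.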

Code Output:

Assuming the contents of example.txt are:

Hello, world! C++ is awesome.

The program output will be:

Hello
world
C+
is
awesome

Note that C++ is printed as C+ because std::ispunct also classifies + as punctuation, so the trailing + is removed.

Read a File Word by Word in C++ Using the get() Function

Another approach to reading a file word by word uses the get() function. Unlike the extraction operator (>>), the single-character forms of get() do not skip whitespace: they read one character at a time, so the program decides where each word ends. This allows for more granular control over the reading process.

The get() function in C++ is a member function of the std::istream class (or its derived classes, such as std::ifstream). The syntax of the get() function is as follows:

int_type get();

Here, int_type is the return type of the function, which is int for std::istream and std::ifstream. The get() function reads and returns the next character from the input stream.

If successful, it returns the character's value converted to an integer. If the end-of-file (EOF) is reached or an error occurs, it returns EOF (which is often defined as -1).

Additionally, there is an overloaded version of the get() function that takes a single parameter:

istream& get(char_type& ch);

Here, char_type is the character type used by the stream. The function reads the next character from the input stream and stores it in the variable referenced by ch.

The return type is a reference to the stream itself (istream&), allowing for chaining of input operations.

When reading a file word by word, either version can be used. The file-reading example in this section uses get(ch) as the loop condition and checks each character for a space or newline to determine word boundaries. Because the program chooses the delimiters itself, the rule for what ends a word can be adjusted as needed.

Code Example:

#include <cstdlib>  // EXIT_SUCCESS, EXIT_FAILURE
#include <fstream>
#include <iostream>
#include <string>

int main() {
  // Specify the filename
  std::string filename = "example.txt";

  // Open the file for reading
  std::ifstream file(filename);

  // Check if the file is open successfully
  if (!file.is_open()) {
    std::cerr << "Could not open the file - '" << filename << "'" << std::endl;
    return EXIT_FAILURE;
  }

  // Read the file word by word using the get() function
  char ch;
  std::string word;

  while (file.get(ch)) {
    // Check if the character is a space or newline (word delimiter)
    if (ch == ' ' || ch == '\n') {
      // Process the word
      std::cout << word << std::endl;
      // Reset the word for the next iteration
      word.clear();
    } else {
      // Append the character to the current word
      word += ch;
    }
  }

  if (!word.empty()) {
    std::cout << word << std::endl;
  }

  // Close the file
  file.close();

  return EXIT_SUCCESS;
}

Similar to the previous examples, we start by specifying the filename and opening the file using std::ifstream for reading. If the file opening is unsuccessful, an error message is displayed on the standard error stream (std::cerr), indicating the filename that could not be opened.

Assuming the file is opened successfully, we enter a while loop that utilizes the get() function to read characters from the file one by one. Inside this loop, each character (ch) is examined to determine if it is a space or newline character, which serves as a word delimiter.

while (file.get(ch)) {
  if (ch == ' ' || ch == '\n') {
    std::cout << word << std::endl;
    word.clear();
  } else {
    word += ch;
  }
}

If a delimiter is encountered, the program prints the current word using std::cout and then clears the word string to prepare for the next word. If the character is not a delimiter, it is appended to the current word. Note that tabs are not treated as delimiters here, and two delimiters in a row produce an empty line of output.

This process continues until the end of the file is reached. If the file does not end with a delimiter, the last word is still held in word, so it is printed after the loop. Finally, the file is closed using the close() method, and the program returns a success status (EXIT_SUCCESS).

Code Output:

Assuming the contents of example.txt are:

Hello world C++ programming is fun

The program output will be:

Hello
world
C++
programming
is
fun

Conclusion

In conclusion, we’ve explored various methods in C++ for reading a file word by word, each with its advantages and use cases. The extraction operator (>>) provides a simple and concise way to achieve this task, making it suitable for straightforward scenarios.

The get() function offers greater flexibility, allowing for custom delimiters and fine-grained control over the reading process. Additionally, we enhanced the word-by-word reading process by incorporating std::ispunct and std::string::erase, addressing punctuation-related challenges.

The choice of method depends on the specific requirements of the task at hand. The extraction operator is convenient for general cases, while get() and additional string manipulation functions provide more advanced options.

We can tailor our approach based on the file structure, desired word boundaries, and the need for additional processing steps.

Author: Jinku Hu

Founder of DelftStack.com. Jinku has worked in the robotics and automotive industries for over 8 years. He sharpened his coding skills when he needed to do the automatic testing, data collection from remote servers and report creation from the endurance test. He is from an electrical/electronics engineering background but has expanded his interest to embedded electronics, embedded programming and front-/back-end programming.
