Read File Word by Word in C++
- Use std::ifstream to Read File Word by Word in C++
- Use std::ispunct and std::string::erase Functions to Parse Punctuation Symbols in C++
This article demonstrates multiple methods for reading a file word by word in C++.
Use std::ifstream to Read File Word by Word in C++
The std::ifstream class performs input operations on file-based streams. Namely, the std::ifstream type interfaces with a file buffer and reads from it using the extraction operator (>>). Note that the I/O library also provides the std::fstream type, which supports both the extraction (>>) and insertion (<<) operators.
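To illustrate the remark about std::fstream, here is a minimal sketch that writes text with the insertion operator and reads it back word by word with the extraction operator through the same stream object. The helper name write_then_read and its parameters are hypothetical and chosen only for this illustration; the open mode used here truncates any existing file at the given path.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes `text` to `path` with <<, then reads it
// back word by word with >> through the same std::fstream object.
std::vector<std::string> write_then_read(const std::string &path,
                                         const std::string &text)
{
    std::vector<std::string> words;

    // in | out | trunc creates the file (or empties an existing one)
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::trunc);
    if (!file.is_open()) {
        return words;  // empty result: the file could not be opened
    }

    file << text;                 // insertion operator: write to the file
    file.seekg(0, std::ios::beg); // go back to the start before reading

    std::string word;
    while (file >> word) {        // extraction operator: one word per call
        words.push_back(word);
    }
    return words;
}
```

Calling write_then_read("demo.txt", "alpha beta gamma") should return the three words in order. The seekg call matters: a file stream has to be repositioned when switching from writing to reading.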
First, we create an object of type ifstream by calling one of its constructors; in this case, only the filename string is passed to the constructor. Once the ifstream object is created, its is_open method should be called to verify that the file was opened successfully before reading its contents.
To read the file word by word, we call the extraction operator on the ifstream object with a string variable as the destination. The operator skips any leading whitespace and then reads characters up to the next whitespace character, so each call yields one word. Since we need to read every word until the end of the file, we use the extraction statement as the condition of a while loop; the loop ends when the extraction fails at the end of the file. Additionally, we declare a vector of strings to store each word on every iteration and print the words later with a separate loop.
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using std::cout; using std::cerr;
using std::endl; using std::string;
using std::ifstream; using std::vector;

int main()
{
    string filename("input.txt");
    vector<string> words;
    string word;

    ifstream input_file(filename);
    if (!input_file.is_open()) {
        cerr << "Could not open the file - '"
             << filename << "'" << endl;
        return EXIT_FAILURE;
    }

    // operator>> skips whitespace and reads one word per call
    while (input_file >> word) {
        words.push_back(word);
    }

    for (const auto &i : words) {
        cout << i << endl;
    }

    input_file.close();
    return EXIT_SUCCESS;
}
Use std::ispunct and std::string::erase Functions to Parse Punctuation Symbols in C++
The only downside of the previous method is that punctuation characters adjacent to words are stored along with them in the destination vector. It would be better to strip the punctuation from each word before storing it in the vector container. For this we use the ispunct function, which takes a single character as an int parameter and returns a non-zero integer value if the character is punctuation; otherwise, it returns zero.
Note that the behavior of the ispunct function is undefined if the given argument is neither representable as unsigned char nor equal to EOF; thus, it is recommended to cast the character to unsigned char first. In the following example, we use two simple if conditions to check the last and first characters of each word. If punctuation is found, we call the built-in string member function erase to remove that character.
#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using std::cout; using std::cerr;
using std::endl; using std::string;
using std::ifstream; using std::vector;

int main()
{
    string filename("input.txt");
    vector<string> words;
    string word;

    ifstream input_file(filename);
    if (!input_file.is_open()) {
        cerr << "Could not open the file - '"
             << filename << "'" << endl;
        return EXIT_FAILURE;
    }

    while (input_file >> word) {
        // Remove one trailing and one leading punctuation character, if present
        if (std::ispunct(static_cast<unsigned char>(word.back())))
            word.erase(word.end() - 1);
        if (!word.empty() && std::ispunct(static_cast<unsigned char>(word.front())))
            word.erase(word.begin());
        // Skip tokens that consisted only of punctuation
        if (!word.empty())
            words.push_back(word);
    }

    for (const auto &i : words) {
        cout << i << endl;
    }

    input_file.close();
    return EXIT_SUCCESS;
}
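The example above removes at most one punctuation character from each end of a word, so a token such as "(word)," would keep part of its punctuation. A possible extension, sketched here under the assumption that all leading and trailing punctuation should be dropped, repeats the same ispunct and erase steps in loops. The helper name strip_punct is hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns `word` without its leading and trailing
// punctuation; punctuation inside the word (e.g. "don't") is left alone.
std::string strip_punct(std::string word)
{
    // Cast to unsigned char before calling std::ispunct to avoid
    // undefined behavior for negative char values.
    while (!word.empty() &&
           std::ispunct(static_cast<unsigned char>(word.back()))) {
        word.erase(word.end() - 1);
    }
    while (!word.empty() &&
           std::ispunct(static_cast<unsigned char>(word.front()))) {
        word.erase(word.begin());
    }
    return word;
}
```

Inside the reading loop, this could be used as word = strip_punct(word); before pushing the word into the vector. A token made up only of punctuation becomes an empty string, so the caller may want to skip empty results.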