How to Convert String to Lower Case in C++

Jinku Hu Feb 02, 2024

This article shows how to convert a string to lowercase in C++.

The first thing to ask yourself before doing a string conversion in C++ is: what encoding does my input string use? std::tolower works on one byte at a time, so if you apply it to text in a multi-byte encoding such as UTF-8, non-ASCII characters will not be converted correctly.

The following function looks like a neat implementation of std::string lowercase conversion, but it does not convert every character to lowercase when the input is encoded in UTF-8.

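#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

// Lowercases the string one byte at a time; only reliable for ASCII input.
std::string toLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return std::tolower(c); });
  return s;
}

int main() {
  // Assumes this source file is saved as UTF-8. A u8"" literal is avoided
  // because in C++20 it has type char8_t and cannot initialize std::string.
  std::string string1 = "ÅSH to LoWer WÅN";
  std::cout << "input string:  " << string1 << std::endl
            << "output string: " << toLower(string1) << std::endl;
  return 0;
}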

The code above works for ASCII strings, and it can work for some single-byte encodings when a matching locale is set. But once the input contains a non-ASCII letter such as Å, which UTF-8 encodes as more than one byte, the output is no longer correct.

Output:

input string:  ÅSH to LoWer WÅN
output string: Åsh to lower wÅn

This is incorrect: the Å symbol should have been lowered to å. So, how can we solve this issue and get the correct output?

One of the best portable ways of doing this is to use the ICU (International Components for Unicode) library, which is mature, stable, widely available, and keeps your code cross-platform.

We only need to include the following headers in our source file. There is a good chance that the ICU library is already installed on your platform, so the code samples should work as they are. If you get IDE or compile-time errors, see the download and installation instructions on the ICU documentation website.

#include <unicode/locid.h>
#include <unicode/unistr.h>
#include <unicode/ustream.h>

With the headers included, we can write the std::string to lowercase conversion as follows:

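#include <unicode/locid.h>
#include <unicode/unistr.h>
#include <unicode/ustream.h>

#include <iostream>
#include <string>

int main() {
  // Assumes this source file is saved as UTF-8.
  std::string string1 = "ÅSH to LoWer WÅN";
  // fromUTF8 decodes the bytes as UTF-8 explicitly, instead of relying on
  // the platform's default codepage as the const char* constructor does.
  icu::UnicodeString unicodeString = icu::UnicodeString::fromUTF8(string1);
  std::cout << "input string:  " << string1 << std::endl
            << "output string: " << unicodeString.toLower() << std::endl;
  return 0;
}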

Note that we should compile this code with the following linker flags so that it links against the ICU libraries:

g++ sample_code.cpp -licuio -licuuc -o sample_code

Run the code, and we get the correct output as expected:

input string:  ÅSH to LoWer WÅN
output string: åsh to lower wån

The very same function can also process text in languages that we might not expect as user input, and we can explicitly pass a locale as a parameter to the toLower function:

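#include <unicode/locid.h>
#include <unicode/unistr.h>
#include <unicode/ustream.h>

#include <iostream>
#include <string>

int main() {
  // Assumes this source file is saved as UTF-8.
  std::string string2 = "Κάδμῳ ἀπιϰόμενοι.. Ελληνας ϰαὶ δὴ ϰαὶ γράμματα, οὐϰ ἐόντα πρὶν Ελλησι";
  icu::UnicodeString unicodeString2 = icu::UnicodeString::fromUTF8(string2);
  // Lowercase using the conventions of the Greek (Greece) locale.
  std::cout << unicodeString2.toLower(icu::Locale("el", "GR")) << std::endl;
  return 0;
}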
Author: Jinku Hu

Founder of DelftStack.com. Jinku has worked in the robotics and automotive industries for over 8 years. He sharpened his coding skills when he needed to do the automatic testing, data collection from remote servers and report creation from the endurance test. He is from an electrical/electronics engineering background but has expanded his interest to embedded electronics, embedded programming and front-/back-end programming.