
4.11 — Chars

To this point, the fundamental data types we’ve looked at have been used to hold numbers (integers and floating point) or true/false values (booleans). But what if we want to store letters?

The char data type was designed to hold a character. A character can be a single letter, number, symbol, or whitespace.

The char data type is an integral type, meaning the underlying value is stored as an integer. Similar to how a Boolean value 0 is interpreted as false and non-zero is interpreted as true, the integer stored by a char variable is interpreted as an ASCII character.

ASCII stands for American Standard Code for Information Interchange, and it defines a particular way to represent English characters (plus a few other symbols) as numbers between 0 and 127 (called an ASCII code or code point). For example, ASCII code 97 is interpreted as the character ‘a’.

Character literals are always placed between single quotes (e.g. ‘g’, ‘1’, ‘ ‘).

Here’s a full table of ASCII characters:

Code Symbol Code Symbol Code Symbol Code Symbol
0 NUL (null) 32 (space) 64 @ 96 `
1 SOH (start of header) 33 ! 65 A 97 a
2 STX (start of text) 34 " 66 B 98 b
3 ETX (end of text) 35 # 67 C 99 c
4 EOT (end of transmission) 36 $ 68 D 100 d
5 ENQ (enquiry) 37 % 69 E 101 e
6 ACK (acknowledge) 38 & 70 F 102 f
7 BEL (bell) 39 ' 71 G 103 g
8 BS (backspace) 40 ( 72 H 104 h
9 HT (horizontal tab) 41 ) 73 I 105 i
10 LF (line feed/new line) 42 * 74 J 106 j
11 VT (vertical tab) 43 + 75 K 107 k
12 FF (form feed / new page) 44 , 76 L 108 l
13 CR (carriage return) 45 - 77 M 109 m
14 SO (shift out) 46 . 78 N 110 n
15 SI (shift in) 47 / 79 O 111 o
16 DLE (data link escape) 48 0 80 P 112 p
17 DC1 (device control 1) 49 1 81 Q 113 q
18 DC2 (device control 2) 50 2 82 R 114 r
19 DC3 (device control 3) 51 3 83 S 115 s
20 DC4 (device control 4) 52 4 84 T 116 t
21 NAK (negative acknowledge) 53 5 85 U 117 u
22 SYN (synchronous idle) 54 6 86 V 118 v
23 ETB (end of transmission block) 55 7 87 W 119 w
24 CAN (cancel) 56 8 88 X 120 x
25 EM (end of medium) 57 9 89 Y 121 y
26 SUB (substitute) 58 : 90 Z 122 z
27 ESC (escape) 59 ; 91 [ 123 {
28 FS (file separator) 60 < 92 \ 124 |
29 GS (group separator) 61 = 93 ] 125 }
30 RS (record separator) 62 > 94 ^ 126 ~
31 US (unit separator) 63 ? 95 _ 127 DEL (delete)

Codes 0-31 and 127 are called the unprintable chars (or control characters), and they’re mostly used to do formatting and control printers. Most of these are obsolete now.

Codes 32-126 are called the printable characters, and they represent the letters, number characters, and punctuation that most computers use to display basic English text.

Initializing chars

You can initialize char variables using character literals:

You can initialize chars with integers as well, but this should be avoided if possible.

Warning

Be careful not to mix up character numbers with integer numbers. The following two initializations are not the same:

Character numbers are intended to be used when we want to represent numbers as text, rather than as numbers to apply mathematical operations to.

Printing chars

When using std::cout to print a char, std::cout outputs the char variable as an ASCII character:

This produces the result:

ab

We can also output char literals directly:

This produces the result:

c

A reminder

The fixed width integer int8_t is usually treated the same as a signed char in C++, so it will generally print as a char instead of an integer.

Printing chars as integers via type casting

If we want to output a char as a number instead of a character, we have to tell std::cout to print the char as if it were an integer. One (poor) way to do this is by assigning the char to an integer, and printing the integer:

However, this is clunky. A better way is to use a type cast. A type cast creates a value of one type from a value of another type. To convert between fundamental data types (for example, from a char to an int, or vice versa), we use a type cast called a static cast.

The syntax for the static cast looks a little funny:

static_cast<new_type>(expression)

static_cast takes the value from an expression as input, and converts it into whatever fundamental type new_type represents (e.g. int, bool, char, double).

Key insight

Whenever you see C++ syntax (excluding the preprocessor) that makes use of angled brackets, the thing between the angled brackets will most likely be a type. This is typically how C++ deals with concepts that need a parameterizable type.

Here’s how to use a static cast to create an integer value from our char value:

This results in:

a
97
a

It’s important to note that the parameter to static_cast evaluates as an expression. When we pass in a variable, that variable is evaluated to produce its value, which is then converted to the new type. The variable is not affected by casting its value to a new type. In the above case, variable ch is still a char, and still holds the same value.

Also note that static casting doesn’t do any range checking, so if you cast an integer that’s too large to fit into a char, you won’t get an error; the value will silently be changed (typically by wrapping around) to fit.

We’ll talk more about static casts and the different types of casts in a future lesson (6.15 -- Explicit type conversion (casting)).

Inputting chars

The following program asks the user to input a character, then prints out both the character and its ASCII code:

Here’s the output from one run:

Input a keyboard character: q
q has ASCII code 113

Note that std::cin will let you enter multiple characters. However, variable ch can only hold 1 character. Consequently, only the first input character is extracted into variable ch. The rest of the user input is left in the input buffer that std::cin uses, and can be extracted with subsequent calls to std::cin.

You can see this behavior in the following example:

Input a keyboard character: abcd
a has ASCII code 97
b has ASCII code 98

Char size, range, and default sign

Char is defined by C++ to always be 1 byte in size. By default, a char may be signed or unsigned (though it’s usually signed). If you’re using chars to hold ASCII characters, you don’t need to specify a sign (since both signed and unsigned chars can hold values between 0 and 127).

If you’re using a char to hold small integers (something you should not do unless you’re explicitly optimizing for space), you should always specify whether it is signed or unsigned. A signed char can hold a number between -128 and 127. An unsigned char can hold a number between 0 and 255.

Escape sequences

There are some character sequences in C++ that have special meaning. These sequences are called escape sequences. An escape sequence starts with a ‘\’ (backslash) character, followed by a letter or number.

You’ve already seen the most common escape sequence: ‘\n’, which can be used to embed a newline in a string of text:

This outputs:

First line
Second line

Another commonly used escape sequence is ‘\t’, which embeds a horizontal tab:

Which outputs:

First part        Second part

Three other notable escape sequences are:
\' prints a single quote
\" prints a double quote
\\ prints a backslash

Here’s a table of all of the escape sequences:

Name Symbol Meaning
Alert \a Makes an alert, such as a beep
Backspace \b Moves the cursor back one space
Formfeed \f Moves the cursor to the next logical page
Newline \n Moves the cursor to the next line
Carriage return \r Moves the cursor to the beginning of the line
Horizontal tab \t Prints a horizontal tab
Vertical tab \v Prints a vertical tab
Single quote \' Prints a single quote
Double quote \" Prints a double quote
Backslash \\ Prints a backslash
Question mark \? Prints a question mark (no longer relevant: you can use question marks unescaped)
Octal number \(number) Translates into the char represented by the octal number
Hex number \x(number) Translates into the char represented by the hex number

Here are some examples:

Prints:

"This is quoted text"
This string contains a single backslash \
6F in hex is char 'o'

Newline (\n) vs. std::endl

We cover this topic in lesson 1.5 -- Introduction to iostream: cout, cin, and endl.

What’s the difference between putting symbols in single and double quotes?

Stand-alone chars are always put in single quotes (e.g. ‘a’, ‘+’, ‘5’). A char can only represent one symbol (e.g. the letter a, the plus symbol, the number 5). Something like this is illegal:

Text put between double quotes (e.g. “Hello, world!”) is called a string. A string is a collection of sequential characters (and thus, a string can hold multiple symbols).

For now, you’re welcome to use string literals in your code:

We’ll discuss strings in the next lesson (4.12 -- An introduction to std::string).

Rule

Always put stand-alone chars in single quotes (e.g. ‘t’ or ‘\n’, not “t” or “\n”). This helps the compiler optimize more effectively.

What about the other char types, wchar_t, char16_t, and char32_t?

wchar_t should be avoided in almost all cases (except when interfacing with the Windows API). Its size is implementation-defined (commonly 16 bits on Windows and 32 bits on Linux), so it is not portable. In practice, it has largely been deprecated.

As an aside...

The term “deprecated” means “still supported, but no longer recommended for use, because it has been replaced by something better or is no longer considered safe”.

Much like ASCII maps the integers 0-127 to American English characters, other character encoding standards exist to map integers (of varying sizes) to characters in other languages. The most well-known mapping outside of ASCII is the Unicode standard, which maps over 110,000 integers to characters in many different languages. Because Unicode contains so many code points, a single code point can need up to 21 bits, more than a 16-bit unit can hold. Storing each code point in its own 32-bit unit is called UTF-32. However, Unicode characters can also be encoded using one or more 16-bit or 8-bit units (called UTF-16 and UTF-8 respectively).

char16_t and char32_t were added in C++11 to provide explicit support for 16-bit and 32-bit Unicode characters. char8_t was added in C++20.

You won’t need to use char8_t, char16_t, or char32_t unless you’re planning on making your program Unicode compatible. Unicode and localization are generally outside the scope of these tutorials, so we won’t cover them further.

In the meantime, you should only use ASCII characters when working with characters (and strings). Using characters from other character sets may cause your characters to display incorrectly.



87 comments to 4.11 — Chars

  • PReinie

    Pathik asked "Why do you need ‘/?’…. we can use question mark with cout…." and Alex responded with "Honestly, I don’t know. :)". Pathik's example used "?".

    From my years of C coding, single quotes enclose one character and double-quotes enclose a string. Strings were (are?) far different than a char - mostly in storage (memory) and using them. Strings are one or more characters and terminated (usually with a zero - not the character 0, but all zero bits). Strings always take up more than 8 bits (one byte).
    cout may not care, but your variable declaration (compiler) may be particular about it and/or your code may not work.

    The slash question mark may be used in case a compiler interprets question marks as something else. This depends on your compiler, i.e., it's just to make sure your code really does (compiles to) what you want it to do. Some pattern matching (MS Word?) uses a question mark to substitute/match a single character, and to match a question mark you have to escape it. The same thing may be true for some compilers.

  • PReinie

    Alex, thanks for the info!

    From my background I've used binary, octal, hex and decimal when referring to characters, so please make sure above when you say an 'a' is 97 you specify that it's (it is) a decimal 97. Granted the compiler assumes decimal, but readers may not.

    Also, "The following program asks the user to input a character, then prints out both the character and it’s ASCII code:" Please check in all the tutorials each use of "it's", which is short for "it is" and is never possessive. "its" is possessive just as his, hers and theirs are possessive, and you do not use an apostrophe for them.

  • Kaiser

    Alright, so I picked up a few ideas from replies on here.
    I just want some verification on my idea. I'm pretty sure it will work and going to test it out after this reply..

    anyways to get to it.

    the

    SYSTEM("");

    cmd is some sort of source for calling dos commands? if this is true... I could use code

    SYSTEM("MKDIR C:?.*");

    to make a structure directory program ??? based upon the user's

    cin>>""

    < input? right??
    And would arrays be a better idea on structuring?
    thanks for the look over.

    I'm in college at the moment and I end up making a million nested directories, for which I use the DOS prompt (it goes faster in my opinion).

  • maxwellsdemon

    I want to compare strings with each other, but I run into some serious problems.
    Strings which I have declared in exactly the same way give false (0) instead of true (1)
    when compared with each other. Here is an example:

    #include <iostream>
    #include "math.h"
    
    using namespace std;
    
    int main()
    {
        char sss[]={'s', 'i', 'n'};
        char ddd[]={'s', 'i', 'n'};
        bool cmp;
        cmp=(sss==ddd);
        cout<<cmp;
    
    
        return 0;
    }
    

    This returns 0.

    so does this:

    #include <iostream>
    #include "math.h"
    
    using namespace std;
    
    int main()
    {
        char sss[]="sin";
        char ddd[]="sin";
        bool cmp;
        cmp=(sss==ddd);
        cout<<cmp;
    
    
        return 0;
    }
    

    What is the problem when comparing strings. Is it that null operator at the end or what?
    I have done conversion to an integer and even that does not come out the same. In the case above it is the last two digits that come up different.

    Can someone explain this to me.

    • Kaiser

      I'm guessing it has something to do with the [].
      correct me if I'm wrong pros, taking a guess. Started learning c++ (first lang ever) last week, so I might be wrong on my theory)

      the [] will make both of them different even if the values are the same... so try this...

      char sss='sin';
          char ddd='sin';

      Just testing myself in problem solving. please correct me if I'm wrong.

      • Kaiser

        EDIT**

        The reason they are different is because you're pulling the address into the equation by using []. I believe it's because you're dealing with arrays.

      • PReinie

        Using [] declares the variable as an array of characters. An array is different than a string and is different than one character... but each element of the array is one character. Unlike a string, the array is not terminated by a zero (null), unless you put one there.

        If you compare each element of the array one for one (zero to zero, one to one and two to two) each element will give '1' for the match. In my tests, I made each array larger:
        ={'s', 'i', 'n', 's'}; and if you compare sss[0] to ddd[0] or sss[0] to ddd[3] they match.

        To compare strings you will have to use a string-compare function. (I haven't gotten that far yet. Just recalling Unix shell and C code from my past.)

  • Nivm

    I have this odd feeling Alex just isn't looking here anymore.

  • Nivm

    I have a bit of code that aims to print all the available codes and characters in a list,

    #include <iostream>
    
    char eachChar=0;
    
    int main()
    {using namespace std;
     if (eachChar=127)
      return 0;
     else
      cout<<"Code "<<(int)eachChar<<" is "<<eachChar<<endl;
      eachChar=eachChar+1;
      main();
     }

    Why isn't this working as intended? There are no debug errors, but it seems to simply go straight to the "return 0;", and it returns a ridiculously large negative number:

    Process terminated with status -1073741510 (0 minutes, 3 seconds)

    in red, to be specific. (still working with Code Blocks)

    • Nivm

      So I cannot simply increment a char, I must use a string array.

    • PReinie

      Nivm - your line: "if (eachChar=127)" is actually assigning 127 to the variable eachChar. Depending on the result of that operation, you may be causing the "if" to fail, or it's just looking at the result of that operation and then executing the next line which is the return 0;.

      You can increment a char. You do not need a string. Incrementing a string doesn't make any sense (at least in C).

  • prabhakar

    Respected ALEX, I have been diligently and sincerely trying to follow your "lessons".
    i tried to compile one code re. CHAR.
    I AM NOT GETTING THE RESULTS AS PROJECTED BY U.

    [One word of caution: be careful not to mix up character (keyboard) numbers with actual numbers. The following two assignments are not the same

    char chValue = '5'; // assigns 53 (ASCII code for '5')
    char chValue2 = 5; // assigns 5 ]

    What 'cout' gives me is

    a)5 for chValue
    b)a sign like "club in playing cards"

    please advise me---thanks prabhakar

    • Alex

      If you print char 53, you'll get the character '5'.
      If you print char 5, you'll get the "club in playing cards" on Windows, because the font that Windows uses in the console has this club symbol mapped to code point 5. Why? I guess Microsoft figured that putting a printable symbol at that code point was better than leaving something unprintable there.

      Note that this club symbol is not a standard part of ASCII and will not work on other operating systems, like Unix.

  • Shawn

    Hey alex how come you have used #include "iostream"; (in the 4th coding)
    i thought u said the ";" is not required

    p.s. so far i love this tutorial very informative and most of all very easy to understand =)
    Thanks a lot for this wonderful site and wish u guys all the best
