13.5 — Stream states and input validation

Stream states

The ios_base class contains several state flags that are used to signal various conditions that may occur when using streams:

Flag      Meaning
goodbit   Everything is okay (no error flags are set)
badbit    Some kind of fatal error occurred (eg. an unrecoverable read or write error on the underlying device)
eofbit    The stream has reached the end of a file
failbit   A non-fatal error occurred (eg. the user entered letters when the program was expecting an integer)

Although these flags live in ios_base, because ios is derived from ios_base and ios takes less typing than ios_base, they are generally accessed through ios (eg. as std::ios::failbit).

ios also provides a number of member functions to conveniently access these states:

Member function   Meaning
good()            Returns true if no error flags are set (the stream is ok)
bad()             Returns true if the badbit is set (a fatal error occurred)
eof()             Returns true if the eofbit is set (the stream is at the end of a file)
fail()            Returns true if the failbit or the badbit is set (an operation failed)
clear()           Clears all flags and restores the stream to the goodbit state
clear(state)      Clears all flags and sets the state flag passed in
rdstate()         Returns the currently set flags
setstate(state)   Sets the state flag passed in, leaving any flags that are already set unchanged
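
As a rough illustration of these member functions, the sketch below reports which flags are set after an extraction attempt. The helper names describeState and stateAfterIntExtraction are made up for this example, and a std::istringstream stands in for user input.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Builds a short description of which state flags are set on a stream.
std::string describeState(const std::ios& stream)
{
    if (stream.good())
        return "good";

    std::string result;
    if (stream.bad())
        result += "bad ";
    if (stream.eof())
        result += "eof ";
    // fail() is also true when badbit is set, so test the failbit itself.
    if ((stream.rdstate() & std::ios::failbit) != 0)
        result += "fail ";
    return result;
}

// Hypothetical helper: tries to extract an int from the text and
// describes the stream state afterwards.
std::string stateAfterIntExtraction(const std::string& text)
{
    std::istringstream input(text);
    int value = 0;
    input >> value;
    return describeState(input);
}
```

With these definitions, stateAfterIntExtraction("Alex") reports only the failbit, while stateAfterIntExtraction("42") reports only the eofbit: the number was extracted, but the stream ran out of characters while reading it.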

The flag you will deal with most often is the failbit, which is set when the user enters invalid input. For example, consider the following program:

Note that this program is expecting the user to enter an integer. However, if the user enters non-numeric data, such as “Alex”, cin will be unable to extract anything to nAge, and the failbit will be set.

If an error occurs and a stream is set to anything other than goodbit, further stream operations on that stream will be ignored. This condition can be cleared by calling the clear() function.
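
The effect of clear() can be sketched as follows. The helper recoverToken is hypothetical: it forces a failed integer extraction, optionally calls clear(), and then tries to read the next word.

```cpp
#include <sstream>
#include <string>

// Attempts to read an int, then a word. If callClear is false and the int
// extraction failed, the second extraction is ignored and the word stays empty.
std::string recoverToken(const std::string& text, bool callClear)
{
    std::istringstream input(text);

    int number = 0;
    input >> number;      // sets the failbit if text does not start with a number

    if (callClear)
        input.clear();    // back to the goodbit state

    std::string token;
    input >> token;       // does nothing while the failbit is still set
    return token;
}
```

For the input "hello", the call without clear() returns an empty string, and the call with clear() returns "hello", because the characters that caused the failure are still waiting in the stream.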

Input validation

Input validation is the process of checking whether the user input meets some set of criteria. Input validation can generally be broken down into two types: string and numeric.

With string validation, we accept all user input as a string, and then accept or reject that string depending on whether it is formatted appropriately. For example, if we ask the user to enter a telephone number, we may want to ensure the data they enter has ten digits. In most languages (especially scripting languages like Perl and PHP), this is done via regular expressions. C++ has offered regular expressions through the <regex> header since C++11, but string validation is also commonly done by examining each character of the string to make sure it meets some criteria, which is the approach used here.

With numerical validation, we are typically concerned with making sure the number the user enters is within a particular range (eg. between 0 and 20). However, unlike with string validation, it’s possible for the user to enter things that aren’t numbers at all -- and we need to handle these cases too.

To help us out, C++ provides a number of useful functions that we can use to determine whether specific characters are numbers or letters. The following functions live in the cctype header:

Function        Meaning
isalnum(int)    Returns non-zero if the parameter is a letter or a digit
isalpha(int)    Returns non-zero if the parameter is a letter
iscntrl(int)    Returns non-zero if the parameter is a control character
isdigit(int)    Returns non-zero if the parameter is a digit
isgraph(int)    Returns non-zero if the parameter is a printable character other than a space
isprint(int)    Returns non-zero if the parameter is a printable character (including a space)
ispunct(int)    Returns non-zero if the parameter is a printable character that is neither alphanumeric nor a space
isspace(int)    Returns non-zero if the parameter is whitespace
isxdigit(int)   Returns non-zero if the parameter is a hexadecimal digit (0-9, a-f, A-F)
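
A short sketch of how a few of these functions might be combined. The helper classify is invented for this example. Note the conversion to unsigned char: the cctype functions expect a value in that range (or EOF), so passing a negative char directly is not safe.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: names the broad category of a single character.
std::string classify(char ch)
{
    // Convert first so that characters with negative values are handled safely.
    const unsigned char uch = static_cast<unsigned char>(ch);

    if (std::isdigit(uch)) return "digit";
    if (std::isalpha(uch)) return "letter";
    if (std::isspace(uch)) return "whitespace";
    if (std::ispunct(uch)) return "punctuation";
    if (std::iscntrl(uch)) return "control";
    return "other";
}
```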

String validation

Let’s do a simple case of string validation by asking the user to enter their name. Our validation criteria will be that the user enters only alphabetic characters or spaces. If anything else is encountered, the input will be rejected.

When it comes to variable length inputs, the best way to validate strings (besides using a regular expression library) is to step through each character of the string and ensure it meets the validation criteria. That’s exactly what we’ll do here.
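
A minimal sketch of that character-by-character check, assuming the name has already been read into a std::string (for example with std::getline). The function name isValidName is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if every character is alphabetic or a space.
bool isValidName(const std::string& strName)
{
    for (std::size_t nIndex = 0; nIndex < strName.length(); ++nIndex)
    {
        const unsigned char ch = static_cast<unsigned char>(strName[nIndex]);

        if (std::isalpha(ch))
            continue;   // letters are fine
        if (ch == ' ')
            continue;   // so are spaces

        return false;   // anything else rejects the whole input
    }
    return true;
}
```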

Note that this code isn’t perfect: the user could say their name was “asf w jweo s di we ao” or some other bit of gibberish, or even worse, just a bunch of spaces. We could address this somewhat by refining our validation criteria to only accept strings that contain at least one character and at most one space.
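
One possible reading of that refinement, as a sketch: require at least one letter and allow at most one space. The function name isValidNameStrict and the exact interpretation of "at least one character" are assumptions.

```cpp
#include <cctype>
#include <string>

// Accepts only letters and spaces, with at least one letter and at most one space.
bool isValidNameStrict(const std::string& strName)
{
    int nLetters = 0;
    int nSpaces = 0;

    for (char c : strName)
    {
        const unsigned char ch = static_cast<unsigned char>(c);

        if (std::isalpha(ch))
            ++nLetters;
        else if (ch == ' ')
            ++nSpaces;
        else
            return false;   // neither a letter nor a space
    }
    return nLetters >= 1 && nSpaces <= 1;
}
```

This still accepts gibberish such as "asf w", so it narrows the problem rather than solving it.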

Now let’s take a look at another example where we are going to ask the user to enter their phone number. Unlike a user’s name, which is variable-length and where the validation criteria are the same for every character, a phone number is a fixed length but the validation criteria differ depending on the position of the character. Consequently, we are going to take a different approach to validating our phone number input. In this case, we’re going to write a function that will check the user’s input against a predetermined template to see whether it matches. The template will work as follows:

A # will match any digit in the user input.
A @ will match any alphabetic character in the user input.
A _ will match any whitespace.
A ? will match anything.
Otherwise, the characters in the user input and the template must match exactly.

So, if we ask the function to match the template “(###) ###-####”, that means we expect the user to enter a ‘(’ character, three digits, a ‘)’ character, a space, three digits, a dash, and four more digits. If any of these doesn’t match, the input will be rejected.

Here is the code:
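
A sketch of such a function, following the template rules listed above. The identifiers strUserInput, strTemplate, and nIndex are the ones readers quote in the comments; the function name inputMatches and the remaining details are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if strUserInput matches strTemplate character by character.
bool inputMatches(const std::string& strUserInput, const std::string& strTemplate)
{
    // A fixed-format template can only match input of the same length.
    if (strTemplate.length() != strUserInput.length())
        return false;

    for (std::size_t nIndex = 0; nIndex < strTemplate.length(); ++nIndex)
    {
        const unsigned char ch = static_cast<unsigned char>(strUserInput[nIndex]);

        // The template character decides which test the input character must pass.
        switch (strTemplate[nIndex])
        {
            case '#': // match a digit
                if (!std::isdigit(ch))
                    return false;
                break;
            case '_': // match whitespace
                if (!std::isspace(ch))
                    return false;
                break;
            case '@': // match an alphabetic character
                if (!std::isalpha(ch))
                    return false;
                break;
            case '?': // match anything
                break;
            default:  // any other template character must match exactly
                if (strUserInput[nIndex] != strTemplate[nIndex])
                    return false;
                break;
        }
    }
    return true;
}
```

Note that the switch examines the template, not the user input. For “(###) ###-####” only the '#' case and the default case are ever reached; the '@', '_', and '?' cases exist for other templates.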

Using this function, we can force the user to match our specific format exactly. However, this function is still subject to several constraints: if #, @, _, or ? ever need to be matched literally in the user input, the template cannot express that, because those symbols have been given special meanings. Also, unlike with regular expressions, there is no template symbol that means “a variable number of characters can be entered”. Thus, such a template could not be used to ensure the user enters two words separated by whitespace, because it cannot handle the fact that the words are of variable lengths. For such problems, the non-template approach is generally more appropriate.

Numeric validation

When dealing with numeric input, the obvious way to proceed is to use the extraction operator to extract input to a numeric type. By checking the failbit, we can then tell whether the user entered a number or not.

Let’s try this approach:

If the user enters a number, cin.fail() will be false, and we will hit the break statement, exiting the loop. If the user enters input starting with a letter, cin.fail() will be true, and we will go into the conditional.

However, there’s one more case we haven’t tested for, and that’s when the user enters a string that starts with numbers but then contains letters (eg. “34abcd56”). In this case, the starting numbers (34) will be extracted into nAge, the remainder of the string (“abcd56”) will be left in the input stream, and the failbit will NOT be set. This causes two potential problems:

1) If you want this to be valid input, you now have garbage in your stream.
2) If you don’t want this to be valid input, it is not rejected (and you have garbage in your stream).

Let’s fix the first problem. This is easy:

If you don’t want such input to be valid, we’ll have to do a little extra work. Fortunately, the previous solution (discarding the rest of the line with ignore()) gets us halfway there. We can use the gcount() function to determine how many characters were ignored. If our input was valid, gcount() should return 1 (the newline character that was discarded). If it returns more than 1, the user entered something that wasn’t extracted properly, and we should ask them for new input. Here’s an example of this:

Numeric validation as a string

The above example was quite a bit of work simply to get a simple value! Another way to process numeric input is to read it in as a string, process it as a string, and if it passes the validation, convert it to a numeric type. The following program makes use of that methodology:
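
A sketch of that methodology: check that every character is a digit, then convert with a std::istringstream. The function name stringToAge is an assumption, and the line is assumed to have been read beforehand with std::getline.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Validates strAge as a non-empty run of digits and converts it to an int.
// Returns false (leaving nAge untouched) if validation or conversion fails.
bool stringToAge(const std::string& strAge, int& nAge)
{
    if (strAge.empty())
        return false;

    for (char c : strAge)
    {
        if (!std::isdigit(static_cast<unsigned char>(c)))
            return false;   // rejects letters, signs, spaces, and so on
    }

    std::istringstream converter(strAge);
    int nValue = 0;
    converter >> nValue;
    if (converter.fail())
        return false;       // e.g. the number does not fit in an int

    nAge = nValue;
    return true;
}
```

A caller would typically loop: read a line with std::getline(std::cin, strAge), call stringToAge, and prompt again while it returns false.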

Whether this approach is more or less work than straight numeric extraction depends on your validation parameters and restrictions.

As you can see, doing input validation in C++ is a lot of work. Fortunately, many such tasks (eg. doing numeric validation as a string) can be easily turned into functions that can be reused in a wide variety of situations.


10 comments to 13.5 — Stream states and input validation

  • real wonderful tutorial !!!!!!!!!!!!!

  • undo

    anyone? please people i'm stuck..i feel like an android

  • undo

    well..i'm going crazy..i look at it again and again but i don't get anything..And guess im the only one who does not get anything about this template validation.?

    we don't have any (? _ @) symbols in our (###) ###-#### template so how come we expect for our template's indexes will match with those symbols? we logically want strTemplate[nIndex] = # or strTemplate[nIndex] = ? etc.. to execute our statement under case label.. isn't that right? i dont know dude..what is this? what is this meaning:

    we pass this string ---------> (###) ###-####

    case '@' : ---------------->(how could i expect my strTemplate[nIndex] will match a '@'symbol? we have no '@' symbol in our(###) ###-#### ???? and i need to match those two to execute statement under case label?? )

    i know i miss something here and i can't see it.. i understood to whole c++ thing beggining to end except this template thing :D
    please any help will be so appreciated.

  • Ravi Gautam

    Is it because any sting is an array with a terminator at the end?

  • Ravi Gautam

    Is (strUserInput[nIndex] an 'Array' ?

    If yes how can we use an array without first declaring it?

  • bla

    After the function that uses the template “(###) ###-####” the text says:
    "if #, @, _, and ? are valid characters in the user input, this function won’t work, because those symbols have been given special meanings."
    I don't get why it wouldn't work... Can somebody explain this? Has the "special meaning" been given by the code or does it originate in C++?

    • D.M. Ryan

      The special characters are missed by isdigit, isspace and isalpha. Since those three commands are being used to screen out bad input, any of those special characters would be treated as bad input. If they're valid, then the function won't work because it won't let those special characters through.

      The only way I know how to test for them would be to make the conditionals that "return false" more stringent through the use of and's. For example, taking out

      case '#': // match a digit  
          if (!isdigit(strUserInput[nIndex]))  
              return false;  

      and replacing it with an added and, like so:

      case '#': // match a digit
          if (!isdigit(strUserInput[nIndex]) && strUserInput[nIndex] != '?') // note added and
              return false;

      means the test conditional will not return "false" if a question mark is entered where a digit is supposed to be. That allows the user to substiture a question mark for a digit, but only for a digit.

  • Martin

    Thanks a lot.
    Really good article. Except I used
    instead of
    cin.ignore(1000, '\n');

    I had a problem with two loops because I used clear and ignore at wrong places. So if 123abc was entered, somehow my app just kept silent until I pressed enter. Using your sample code for
    "2) If you don’t want this to be valid input, it is not rejected (and you have garbage in your stream)."

    fixed my problems and now my integer validation works reliably no matter when and how often I use it.


  • Violator

    And how do you filter out F1 to F12 keys?

  • sagar

    good program's.
    i want some programs which give from user only digits.something like this.
    if u have please send me on my e-mail.
