
Unix Incompatibility Notes:
Character Type Functions

Jan Wolter

This page describes portability issues related to the ctype.h functions in Unix. There aren't really that many.

Versions of ctype.h exist on every Unix system I've ever encountered, but there are small differences in the set of functions defined and their behavior.

Originally these functions were implemented for the English alphabet only. One of the bigger changes in the ANSI standard version was the extension to international alphabets, so their behavior now depends on the locale that you set.
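
As a rough illustration of that locale dependence, the sketch below classifies byte value 0xE9 (the letter e-acute in ISO 8859-1) after switching the LC_CTYPE locale. The helper name e_acute_is_alpha() is made up for this example, and locale names other than "C" vary from system to system.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: report whether byte 0xE9 is alphabetic under the
   named locale. Returns 1 or 0, or -1 if the locale cannot be selected. */
static int e_acute_is_alpha(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;                 /* locale not installed or unknown */
    return isalpha(0xE9) != 0;     /* 0xE9 fits in unsigned char, so this is a valid argument */
}
```

In the "C" locale this should report 0. Under a Latin-1 locale (something like "fr_FR.ISO8859-1", if one is installed under that name) it would normally report 1.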

Character Classification

In older versions of Unix, the other character classification macros were defined only if isascii() was true. So you'd always need to write
   (isascii(ch) && isalpha(ch))
This limitation still exists in some fairly recent implementations (e.g., Solaris), so you should still do this. I think in most implementations, even old ones, these functions will work sensibly if passed an EOF value.
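
Here is a minimal sketch of the guard applied to a whole string. Since isascii() is not part of ISO C, the example uses a stand-in macro IS_ASCII() that I made up for illustration; count_alpha() is likewise a hypothetical name. Each byte is converted to unsigned char first, because plain char may be signed and the ctype.h functions are only defined for EOF and values representable as unsigned char.

```c
#include <ctype.h>

/* Hypothetical stand-in for isascii(): true for values 0 through 127. */
#define IS_ASCII(c) (((c) & ~0x7f) == 0)

/* Count the alphabetic bytes in a string, guarding every call. */
static int count_alpha(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++) {
        int ch = (unsigned char)*s;   /* never pass a negative char value */
        if (IS_ASCII(ch) && isalpha(ch))
            n++;
    }
    return n;
}
```

With this guard, count_alpha("ab1\xe9 Z") counts only 'a', 'b' and 'Z'. The 0xE9 byte is never handed to isalpha() at all.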
isalnum()
Check if alphanumeric. Equivalent to (isalpha(c) || isdigit(c)).

isalpha()
Check if alphabetic. In the default 'C' locale, this is equivalent to (isupper(c) || islower(c)), but this is not true in all locales, since in some of them there are alphabetic characters that are neither upper nor lower case.

isascii()
Check if ASCII. Seven-bit character values between 0 and 127 are ASCII. In older versions of Unix, this was the only one of the character classification macros defined on non-ASCII characters.

isblank()
Check if a space or tab character. This started as a GNU extension and is not in the original ANSI standard. It was added in C99, so it is still not available everywhere.
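
Where isblank() is missing, a replacement is easy to write. The sketch below uses a made-up name, my_isblank(), and covers only the two characters named above, so it matches the "C" locale behavior and nothing more.

```c
/* Hypothetical portable replacement for isblank(), "C" locale only. */
static int my_isblank(int c)
{
    return c == ' ' || c == '\t';
}
```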

iscntrl()
Check if a control character. Character values between 0 and 31 are control characters, as is character value 127 (the DEL character).

isdigit()
Check if a digit.

isgraph()
Check if a printable character other than a space. This didn't exist in early implementations of ctype.h.

islower()
Check if lower case. Which characters are lower case depends on locale.

isprint()
Check if printable. Equivalent to (isgraph(c) || c == ' ') or, for ASCII characters, to !iscntrl(c).

ispunct()
Check if punctuation. Equivalent to (isgraph(c) && !isalnum(c)).

isspace()
Check if white space. In the "C" and "POSIX" locales, these are space, form-feed ('\f'), newline ('\n'), carriage-return ('\r'), horizontal-tab ('\t') and vertical-tab ('\v').

isupper()
Check if upper case. Which characters are upper case depends on locale.

isxdigit()
Check for hexadecimal digit. That is '0' through '9', 'A' through 'F', or 'a' through 'f'. This can differ in different locales. This didn't exist in early implementations of ctype.h.
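
The equivalences quoted in the entries above can be checked mechanically. The following sketch walks the ASCII range and counts any disagreement; it assumes the program is still in the default "C" locale, and the names SAME() and count_mismatches() are made up for this example.

```c
#include <ctype.h>

/* Compare two results as truth values, since the ctype.h functions
   may return any nonzero value for "true". */
#define SAME(a, b) (((a) != 0) == ((b) != 0))

/* Count the ASCII codes (0..127) for which a stated equivalence fails.
   Assumes the default "C" locale is in effect. */
static int count_mismatches(void)
{
    int c, bad = 0;
    for (c = 0; c < 128; c++) {
        if (!SAME(isalnum(c), isalpha(c) || isdigit(c))) bad++;
        if (!SAME(isalpha(c), isupper(c) || islower(c))) bad++;
        if (!SAME(isprint(c), isgraph(c) || c == ' '))   bad++;
        if (!SAME(isprint(c), !iscntrl(c)))              bad++;
        if (!SAME(ispunct(c), isgraph(c) && !isalnum(c))) bad++;
    }
    return bad;
}
```

On a conforming implementation I would expect this to return 0. Running it on an unfamiliar system is a cheap way to spot a ctype.h that deviates.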

Character Conversion

toascii()
Converts a character to ASCII by clearing the high bit. Not safe outside the standard locales, since it turns accented letters into unrelated ASCII characters.

This does not exist in some of the very old Unix versions, but those are probably rare enough now not to be worth worrying about.
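
To make the problem concrete, the sketch below imitates the operation with a made-up helper, clear_high_bit(), rather than calling toascii() itself. It assumes an ASCII-based character set and ISO 8859-1 input, where byte 0xE9 is e-acute.

```c
/* Hypothetical stand-in for toascii(): keep only the low seven bits. */
static int clear_high_bit(int c)
{
    return c & 0x7f;
}
```

Here clear_high_bit(0xE9) yields 0x69, which is the letter 'i' and not the 'e' a reader might hope for.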

tolower()
If given an uppercase letter, as defined by isupper(), return the corresponding lowercase letter.

ANSI versions return the input character if the input character is not upper case. However, older versions would return random junk if passed a character that was not upper case. For compatibility with such implementations, you'd always need to do:

    ((isascii(ch) && isupper(ch)) ? tolower(ch) : ch)
Ain't backwards compatibility lovely?

toupper()
If given a lowercase letter, as defined by islower(), return the corresponding uppercase letter.

ANSI versions return the input character if the input character is not lower case or if there is no corresponding upper case letter (in German, the sharp s has no upper case version). However, older versions would return random junk if passed a character that was not lower case. For compatibility with such implementations, you'd always need to do:

    ((isascii(ch) && islower(ch)) ? toupper(ch) : ch)
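
The two guarded expressions can be wrapped in small functions so the check is written only once. This is a sketch: safe_tolower(), safe_toupper() and lowercase_in_place() are made-up names, and IS_ASCII() is a stand-in for isascii(), which is not part of ISO C.

```c
#include <ctype.h>

/* Hypothetical stand-in for isascii(): true for values 0 through 127. */
#define IS_ASCII(c) (((c) & ~0x7f) == 0)

/* Convert only when the input is known to be an ASCII upper case letter. */
static int safe_tolower(int ch)
{
    return (IS_ASCII(ch) && isupper(ch)) ? tolower(ch) : ch;
}

/* Convert only when the input is known to be an ASCII lower case letter. */
static int safe_toupper(int ch)
{
    return (IS_ASCII(ch) && islower(ch)) ? toupper(ch) : ch;
}

/* Usage example: lower-case a string in place. */
static void lowercase_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)safe_tolower((unsigned char)*s);
}
```

Anything that is not an ASCII letter of the right case, including EOF and bytes above 127, passes through unchanged. The price is that non-ASCII letters are never converted, even in a locale that could handle them.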

_tolower()
This macro version of tolower() is available on most newer versions of Unix. It behaves like the old-fashioned tolower() function in that its result is undefined if it is passed a character that is not upper case.

_toupper()
This macro version of toupper() is available on most newer versions of Unix. It behaves like the old-fashioned toupper() function in that its result is undefined if it is passed a character that is not lower case.
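
To see why the result can be junk, suppose the macros are nothing more than a fixed arithmetic offset. That is an assumption made for illustration (real implementations vary), and OFFSET_TOLOWER() and OFFSET_TOUPPER() are made-up names. The values in the comments assume ASCII.

```c
/* Hypothetical unchecked conversions: a bare offset with no case test. */
#define OFFSET_TOLOWER(c) ((c) - 'A' + 'a')
#define OFFSET_TOUPPER(c) ((c) - 'a' + 'A')

/* Correct for the intended inputs:  OFFSET_TOLOWER('A') is 'a'.
   Wrong for anything else:          OFFSET_TOLOWER('1') is 'Q',
                                     OFFSET_TOUPPER('1') is 17, a control character. */
```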


Jan Wolter
Thu Mar 6 09:38:43 EST 2003 - Original Release.