Unix Incompatibility Notes:
Byte Order

Jan Wolter

Any program that writes binary files that may have to be read by another computer needs to be concerned about byte order issues. Different processors write integers differently.

There is a minority view that says that if you code properly, you never need to know the endianness of your machine. You should certainly consider carefully whether you can do so in your application.
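One way to read that view: if integers are always written and read one byte at a time with shifts, the code never depends on how the host stores them. The following is only a minimal sketch of the idea. The helper names put_be32() and get_be32() are hypothetical, and the choice of a big-endian file layout is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: store v into buf[0..3], most significant byte
   first. Only shifts are used, so host byte order never matters. */
void put_be32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)(v);
}

/* Hypothetical helper: rebuild the value from the same four bytes. */
uint32_t get_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

A file written with put_be32() can be read back with get_be32() on any machine, at the cost of handling each integer field explicitly instead of writing whole structures.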

Terminology

Let's suppose we are writing out a four-byte long integer 67305985. In hexadecimal, this is 0x04030201, so the most significant byte contains the hexadecimal value 04 and the least significant byte contains the hexadecimal value 01. Suppose this is written out to memory address x. The value will actually be written to four consecutive addresses, x through x+3. Which byte of data goes in which memory location? It depends on the processor. The alternatives are named after Lilliputian political parties:

   Big-endian: the most significant byte (04) is stored at the lowest address, x, and the least significant byte (01) at x+3.
   Little-endian: the least significant byte (01) is stored at the lowest address, x, and the most significant byte (04) at x+3.

Some processors (PowerPC, MIPS, DEC Alpha) can be either big-endian or little-endian depending on software settings.
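The layout question can be checked directly by copying the value's bytes out of memory. This is a small sketch, not part of the original notes. It uses uint32_t instead of long so that the value is exactly four bytes wide, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit value into out[0..3],
   lowest address first. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Print the bytes stored at addresses x through x+3. */
void show_layout(void)
{
    unsigned char b[4];
    bytes_in_memory(UINT32_C(0x04030201), b);
    printf("x..x+3: %02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);
}
```

Calling show_layout() prints 01 02 03 04 on a little-endian machine and 04 03 02 01 on a big-endian one.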

Network byte order is the standard used in packets sent over the internet. It is big-endian (except that technically it refers to the order in which bytes are transmitted, not the order in which they are stored). If you are going to choose an arbitrary order to standardize on, network byte order is a sensible choice.

The Unix functions htonl(), htons(), ntohl(), and ntohs() convert longs and shorts back and forth between the host byte order and network byte order. However, though they are widely available, they are not universally available.
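Where these functions exist, a round trip through network byte order might look like the sketch below. It assumes the declarations live in arpa/inet.h, which is where POSIX puts them; some systems use netinet/in.h instead. The two helper names are hypothetical.

```c
#include <arpa/inet.h>   /* htonl(), ntohl(); an assumption, see above */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write v into buf in network byte order. */
void store_network_order(unsigned char buf[4], uint32_t v)
{
    uint32_t net = htonl(v);          /* host order -> network order */
    memcpy(buf, &net, sizeof net);    /* buf[0] is now the most significant byte */
}

/* Hypothetical helper: read a value that was stored in network byte order. */
uint32_t load_network_order(const unsigned char buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);                /* network order -> host order */
}
```

On a big-endian host both conversions do nothing. On a little-endian host they reverse the four bytes.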

Compile-time Tests

We'd usually prefer to determine endianness at compile time. Most modern Unix systems define the byte order in the sys/param.h include file. Some code I've seen references the endian.h or machine/endian.h files instead, but I think that if those exist, then sys/param.h always pulls the appropriate ones in. Note however that some older systems (including SunOS 4.1) have sys/param.h but it does not define any byte order information.

The sys/param.h header normally defines the symbols __BYTE_ORDER, __BIG_ENDIAN, __LITTLE_ENDIAN, and __PDP_ENDIAN. You can test endianness by doing something like:

   #include <sys/param.h>

   #ifdef __BYTE_ORDER
   # if __BYTE_ORDER == __LITTLE_ENDIAN
   #  define I_AM_LITTLE_ENDIAN
   # else
   #  if __BYTE_ORDER == __BIG_ENDIAN
   #   define I_AM_BIG_ENDIAN
   #  else
   #   error "unknown byte order"
   #  endif
   # endif
   #endif /* __BYTE_ORDER */

If __BYTE_ORDER is not defined, you may want to test for the existence of BYTE_ORDER, BIG_ENDIAN and LITTLE_ENDIAN. Linux defines these as synonyms of the versions with underscores, apparently in an attempt to be compatible with BSD Unix.
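A test that tries both spellings could be sketched as follows. Every macro is guarded with defined() because which names a given sys/param.h provides varies by system. The function compile_time_byte_order() is a hypothetical wrapper added only so the result can be inspected.

```c
#include <sys/param.h>

/* Try the double-underscore names first, then the BSD-style names. */
#if defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && defined(__BIG_ENDIAN)
# if __BYTE_ORDER == __LITTLE_ENDIAN
#  define I_AM_LITTLE_ENDIAN
# elif __BYTE_ORDER == __BIG_ENDIAN
#  define I_AM_BIG_ENDIAN
# endif
#elif defined(BYTE_ORDER) && defined(LITTLE_ENDIAN) && defined(BIG_ENDIAN)
# if BYTE_ORDER == LITTLE_ENDIAN
#  define I_AM_LITTLE_ENDIAN
# elif BYTE_ORDER == BIG_ENDIAN
#  define I_AM_BIG_ENDIAN
# endif
#endif

/* Hypothetical wrapper: returns 1 for little-endian, 0 for big-endian,
   and -1 if the headers gave no usable answer. */
int compile_time_byte_order(void)
{
#if defined(I_AM_LITTLE_ENDIAN)
    return 1;
#elif defined(I_AM_BIG_ENDIAN)
    return 0;
#else
    return -1;
#endif
}
```

A return value of -1 means neither set of symbols was usable, so some other test is still needed.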

If those are not defined either, you might try things like:

   #if defined (i386) || defined (__i386__) || defined (_M_IX86) || \
        defined (vax) || defined (__alpha)
   # define I_AM_LITTLE_ENDIAN
   #endif

However, trying to cover all bases with this sort of thing seems futile, and may be complicated by architectures that can work either way. Ultimately, it is better to fall back to a run-time test.

Run-time Tests

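It's easy enough to write code to check whether you are big- or little-endian. The following function returns true if we are big-endian.

  int am_big_endian(void)
  {
     long one = 1;
     /* On a big-endian machine the byte at the lowest address is the
        most significant one, so it is 0. */
     return !(*((char *)(&one)));
  }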

Or an alternate version using unions (based on Harbison & Steele):

  int am_big_endian(void)
  {
      union { long l; char c[sizeof (long)]; } u;
      u.l = 1;
      /* Big-endian machines store the least significant byte last. */
      return (u.c[sizeof (long) - 1] == 1);
  }

I suspect that these run-time tests are the better solution.
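A run-time test also makes it possible to write a replacement for htonl() on systems that lack it. The sketch below is an illustration, not a drop-in replacement. The names host_is_big_endian() and my_htonl() are hypothetical, and the code assumes the host is either big-endian or little-endian, so PDP-endian machines are not handled.

```c
#include <stdint.h>

/* Run-time check on a fixed-width type: nonzero on big-endian hosts. */
static int host_is_big_endian(void)
{
    uint32_t one = 1;
    return *(unsigned char *)&one == 0;
}

/* Hypothetical stand-in for htonl(): convert a 32-bit value from host
   byte order to network (big-endian) byte order. */
uint32_t my_htonl(uint32_t v)
{
    if (host_is_big_endian())
        return v;                       /* already in network order */

    /* Little-endian host: reverse the four bytes. */
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

Because reversing the bytes twice restores the original value, the same function can serve as the reverse conversion.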

Myths

Discussions of endian-ness on the web seem to contain quite a lot of bogus information. This includes:

Jan Wolter
Sat Dec 24 23:24:14 EST 2005 - Myths and various updates.
Tue Feb 5 22:51:05 EST 2002 - Original release.