Multi-byte

Multi-byte refers to string encodings in which a character may be coded over several bytes, rather than just one.

In ASCII, each character of the Roman alphabet is coded on 1 byte, making it single-byte. But other writing systems, such as Chinese, need more than one byte to represent all their ideograms.

Unicode, for example, is a character set whose encodings are multi-byte. UTF-8 uses 1 to 4 bytes per character, UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4 bytes.
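
As an illustration, the byte size of a character can be measured by converting it to each encoding and counting the bytes with strlen(). This is a minimal sketch: byteLengths() is a hypothetical helper written for this example, and it assumes that the source file is saved as UTF-8 and that the mbstring extension is loaded.

<?php

    // Hypothetical helper: byte length of one character in three Unicode encodings.
    // Assumes $char is a UTF-8 string and mbstring is available.
    function byteLengths(string $char): array {
        return [
            'UTF-8'  => strlen($char),
            'UTF-16' => strlen(mb_convert_encoding($char, 'UTF-16BE', 'UTF-8')),
            'UTF-32' => strlen(mb_convert_encoding($char, 'UTF-32BE', 'UTF-8')),
        ];
    }

    print_r(byteLengths('a'));  // UTF-8: 1, UTF-16: 2, UTF-32: 4

    print_r(byteLengths('我')); // UTF-8: 3, UTF-16: 2, UTF-32: 4

    print_r(byteLengths('😀')); // UTF-8: 4, UTF-16: 4, UTF-32: 4

?>

The big-endian variants (UTF-16BE, UTF-32BE) are used here so that no byte order mark is added to the count.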

In PHP, the native string functions are single-byte: they operate on bytes, not on characters. When needed, extensions such as iconv, intl and mbstring are able to manipulate multi-byte characters without breaking them.

<?php

    print strlen('me'); // 2: two 1-byte characters

    print strlen('我'); // 3: strlen() counts bytes, and '我' takes 3 bytes in UTF-8

    print mb_strlen('我'); // 1: mb_strlen() counts characters

?>
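
The same difference shows up when cutting a string. The sketch below, which assumes a UTF-8 source file and the mbstring extension, shows substr() splitting a character in the middle, while mb_substr() keeps characters whole. The variable names are illustrative only.

<?php

    $text = '我爱PHP'; // 5 characters, 9 bytes in UTF-8 (3 + 3 + 1 + 1 + 1)

    // substr() counts bytes: 4 bytes is '我' plus the first byte of '爱'
    $broken = substr($text, 0, 4);

    // mb_substr() counts characters: 2 characters is '我爱'
    $clean = mb_substr($text, 0, 2, 'UTF-8');

    var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false): truncated character

    var_dump(mb_check_encoding($clean, 'UTF-8'));  // bool(true)

    print $clean; // 我爱

?>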

See also Character Encoding.

Related: Unicode, UTF-8, American Standard Code for Information Interchange (ASCII), Byte, Text