Using UTF-8 on FreeBSD

Content copied from the original post by Benjamin Lee, to whom all credit goes: https://www.b1c1l1.com/blog/2011/05/09/using-utf-8-unicode-on-freebsd/

May 9, 2011 · Benjamin Lee · freebsd · unicode

Unicode is a character encoding standard whose character repertoire is kept synchronized with the Universal Coded Character Set (UCS) defined by ISO/IEC 10646. It was designed to replace earlier character encodings such as the American Standard Code for Information Interchange (US-ASCII) and ISO/IEC 8859.

UTF-8, which is also described in RFC 3629, is a variable-length Unicode character encoding that is backwards compatible with US-ASCII. That is, all US-ASCII characters have the same encoding under both US-ASCII and UTF-8. Due to the widespread use of US-ASCII in computing environments, this backwards compatibility makes UTF-8 convenient to deploy and therefore a popular choice for multilingual computing environments.
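
The backwards compatibility and variable length described above can be observed by dumping the bytes of a few characters. This is a minimal sketch that is not part of the original article: `utf8_bytes` is a hypothetical helper name, and the non-ASCII characters are written as octal escapes so that the example does not depend on the encoding of the script file itself.

```shell
# utf8_bytes (hypothetical helper): print the bytes of its argument as
# space-separated lowercase hex values.
utf8_bytes() {
    # od(1) pads its output differently across systems; re-splitting the
    # output into positional parameters normalizes the whitespace.
    set -- $(printf '%s' "$1" | od -An -tx1)
    echo "$*"
}

utf8_bytes 'A'                           # US-ASCII letter, one byte: 41
utf8_bytes "$(printf '\303\251')"        # U+00E9, two bytes: c3 a9
utf8_bytes "$(printf '\342\202\254')"    # U+20AC, three bytes: e2 82 ac
```

The letter `A` is the single byte 0x41, exactly as in US-ASCII, while characters outside the US-ASCII range take two or more bytes.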

FreeBSD, like many UNIX-based operating systems, is unfortunately not configured to use UTF-8 by default. This sometimes causes confusion about whether Unicode is supported on FreeBSD. Fortunately, it is easy to enable UTF-8 on FreeBSD.

  1. Determine the appropriate UTF-8 locale for your language and country. locale(1) can be used to print the names of all available locales.

       locale -a | grep '\.UTF-8$'
  2. Update the charset, lang, and setenv attributes in login.conf(5). It is recommended that LC_COLLATE be set to C because some programs still require ASCII ordering in order to function correctly.
    • To enable UTF-8 on a system-wide basis, update the default login class in /etc/login.conf.

          blee@eclipse ~ $ diff -u /usr/src/etc/login.conf /etc/login.conf
          --- /usr/src/etc/login.conf 2011-03-10 13:48:59.000000000 -0800
          +++ /etc/login.conf 2011-05-08 16:44:01.000000000 -0700
          @@ -26,7 +26,7 @@
               :passwd_format=md5:\
               :copyright=/etc/COPYRIGHT:\
               :welcome=/etc/motd:\
          -    :setenv=MAIL=/var/mail/$,BLOCKSIZE=K,FTP_PASSIVE_MODE=YES:\
          +    :setenv=MAIL=/var/mail/$,BLOCKSIZE=K,FTP_PASSIVE_MODE=YES,LC_COLLATE=C:\
               :path=/sbin /bin /usr/sbin /usr/bin /usr/games /usr/local/sbin /usr/local/bin ~/bin:\
               :nologin=/var/run/nologin:\
               :cputime=unlimited:\
          @@ -44,7 +44,9 @@
               :pseudoterminals=unlimited:\
               :priority=0:\
               :ignoretime@:\
          -    :umask=022:
          +    :umask=022:\
          +    :charset=UTF-8:\
          +    :lang=en_US.UTF-8:
           #
    • To enable UTF-8 on a per-user basis, update ~/.login_conf. This is useful on servers that you do not administer and where you therefore cannot make system-wide changes.

          blee@eclipse ~ $ cat ~/.login_conf
          me:\
              :charset=UTF-8:\
              :lang=en_US.UTF-8:\
              :setenv=LC_COLLATE=C:
  3. If /etc/login.conf was modified, run cap_mkdb(1) to rebuild the login class capabilities database.

       sudo cap_mkdb /etc/login.conf
  4. Exit all existing sessions that have the old locale settings.
  5. Verify that the new settings took effect by running locale(1).

       blee@eclipse ~ $ locale
       LANG=en_US.UTF-8
       LC_CTYPE="en_US.UTF-8"
       LC_COLLATE=C
       LC_TIME="en_US.UTF-8"
       LC_NUMERIC="en_US.UTF-8"
       LC_MONETARY="en_US.UTF-8"
       LC_MESSAGES="en_US.UTF-8"
       LC_ALL=
  6. If applicable, make application-specific configuration changes to enable UTF-8. Note that this has become increasingly unnecessary as applications have begun respecting locale settings.
  7. Restart all applications that were started with the old locale settings.
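
The check in step 5 can also be scripted, which may be convenient when several machines or accounts need to be verified. The following is a minimal sketch and not part of the original article: `check_utf8_locale` is a hypothetical helper that reads locale(1) output on standard input and succeeds only if LANG names a UTF-8 locale and LC_COLLATE is C, matching the settings recommended in step 2.

```shell
# check_utf8_locale (hypothetical helper): read `locale` output on stdin and
# exit 0 only if LANG ends in .UTF-8 and LC_COLLATE is C.
check_utf8_locale() {
    awk -F= '
        { gsub(/"/, "", $2) }               # locale(1) quotes implied values
        $1 == "LANG"       { lang = $2 }
        $1 == "LC_COLLATE" { collate = $2 }
        END { exit !(lang ~ /\.UTF-8$/ && collate == "C") }
    '
}

# Usage: check the current login session.
if locale | check_utf8_locale; then
    echo "UTF-8 locale is active"
else
    echo "UTF-8 locale is not active; log out and back in, then retry" >&2
fi
```

The helper only inspects LANG and LC_COLLATE. A stricter check might also confirm that LC_ALL is empty, since a non-empty LC_ALL overrides the other locale variables.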
Updated 2023-08-26