Symptoms

In the output of a command line program (e.g. KPScript), some special characters are displayed correctly and some are not. When redirecting the output to a TXT file using the '>' operator, special characters may be displayed incorrectly when the file is opened in a text editor.

These problems usually appear only under Windows, not Linux (because Linux usually uses UTF-8 encoding).

Cause

The Windows command line window by default uses OEM code pages. For general information about code pages, see Wikipedia: Code page. In the US, code page 437 is used; in Western Europe, code page 850 is used; etc. A detailed list can be found here: Code Page Identifiers.

These OEM code pages do not support all characters. They include a small subset of foreign characters (e.g. the US code page 437 includes some Greek characters), thus some special characters display properly in a command line window. Rarely used characters cannot be encoded, though.

When redirecting the output to a TXT file using the '>' operator, the TXT file is encoded using the console's code page.

Finding your code page.
You can find out which code page your command line window is using by typing 'chcp' and pressing Enter.

Weak solution: Good text editor

The simplest solution is to tell the text editor which code page the TXT file is using. All characters supported by the code page are then loaded and displayed properly. The disadvantage of this solution is that characters outside the console code page are lost.

Every advanced text editor supports selecting the code page.
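
The effect of a limited code page, and of opening the file with the right or the wrong code page, can be reproduced outside the console. The following Python sketch is only an illustration: the sample string, the helper name `redirect_to_file` and the use of '?' as the replacement character are assumptions made for the example, not documented behavior of KPScript or of Windows.

```python
def redirect_to_file(text: str, code_page: str) -> bytes:
    """Model '>' redirection: encode the text with the console code page.

    Assumption: characters the code page cannot represent become '?'.
    """
    return text.encode(code_page, errors="replace")


sample = "Größe α Ж €"

# Code page 437 contains ö, ß and α, but neither Ж nor €.
raw = redirect_to_file(sample, "cp437")

# An editor that is told the correct code page recovers every supported character.
as_cp437 = raw.decode("cp437")

# An editor that assumes a different code page (here Windows-1252) shows wrong characters.
as_cp1252 = raw.decode("cp1252")

print(as_cp437)
print(as_cp1252)
```

Selecting the correct code page repairs the supported characters, but the two characters that code page 437 cannot encode stay lost, which is why this is only the weak solution.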
Recommended solution: Change console code page

The console character encoding can be changed to UTF-8, which is identified by code page 65001 (on Windows systems). UTF-8 allows encoding all Unicode characters, i.e. special characters of all languages are supported.

In order to change the code page to UTF-8, run the following command:

chcp 65001

This works fine under Windows 7 and higher. Older operating systems might not support it.

The command must be executed in the command line window before running the command that redirects the output to the TXT file. Windows does not save the chosen code page, so the code page change command must be executed in every command line window separately.

The output TXT file will be encoded using UTF-8. This encoding is supported by almost every text editor. UTF-8 is usually detected automatically, i.e. you do not have to select the encoding / code page manually; you can "just open" the file.

After changing the encoding to UTF-8, special characters might be displayed improperly in the command line window (but are written correctly to the TXT file), because the default raster font does not support the characters. In this case, select a different font (by clicking on the command line window's icon → 'Properties'), such as 'Consolas' or 'Lucida Console'.

PowerShell

When using the Windows PowerShell instead of the standard command line window, the TXT file will always be encoded using UTF-16 LE, independent of which console code page is selected.

PowerShell automatically converts the output of a command line program from the currently active console code page to the UTF-16 LE representation. So, PowerShell does not magically preserve special characters. The command line program can only output characters that can be encoded using the currently active console code page.

Thus, it is recommended to use the
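
The PowerShell conversion described above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: the function name `run_in_powershell` is hypothetical, unencodable characters are assumed to become '?', and UTF-8 stands in for code page 65001.

```python
def run_in_powershell(program_output: str, console_code_page: str) -> bytes:
    """Model PowerShell redirection of a command line program's output."""
    # Step 1: the program can only emit bytes of the active console code page.
    # Assumption: characters outside that code page are replaced by '?'.
    console_bytes = program_output.encode(console_code_page, errors="replace")
    # Step 2: PowerShell decodes those bytes and stores the text as UTF-16 LE.
    return console_bytes.decode(console_code_page).encode("utf-16-le")


sample = "Größe α Ж €"

# With an OEM code page, the characters are already lost before the conversion.
lossy = run_in_powershell(sample, "cp437").decode("utf-16-le")

# With UTF-8 (code page 65001 on Windows), every character survives.
intact = run_in_powershell(sample, "utf-8").decode("utf-16-le")

print(lossy)
print(intact)
```

In this model the UTF-16 LE file is only as complete as the console code page allows: the conversion happens after the program has already written its output.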