Delphi – HIGHCHARUNICODE directive (Delphi) – RAD Studio
Posted by jpluimers on 2010/01/18
I forgot about it, but this thread (which got wiped by Embarcadero) reminded me about the differences between these two character values.
Quoting from the first post:
c1 := #128; c2 := chr(128); Assert(c1 = c2);
The assertion fails, meaning that c1 <> c2.
In fact, c1 = #$20AC and c2 = #$80.
Since Chr is a pseudo-function that converts an integer to a Unicode character, c2 ends up as Unicode codepoint U+0080, whereas c1 gets converted at compile time from the AnsiChar value 0x80 (the [WayBack] Euro Sign in many Ansi codepages) into Unicode codepoint U+20AC.
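The conversion described above can be mimicked outside Delphi. The sketch below uses Python purely as an illustration (it is not what the Delphi compiler runs): it decodes the single byte 0x80 through a few ANSI/OEM code pages, the way an AnsiChar literal like #128 gets converted, and compares the result with chr(128), which, like Chr in Delphi 2009 and later, maps the integer straight to codepoint U+0080. The helper names are made up for this example.

```python
# Illustration only: Python stands in for the Delphi compiler here.
# An AnsiChar value is a byte whose meaning depends on the ANSI code page;
# chr() (like Delphi's Chr) maps an integer directly to a Unicode code point.


def ansi_byte_to_unicode(value: int, codepage: str) -> str:
    """Decode one byte through the named code page (e.g. 'cp1252')."""
    return bytes([value]).decode(codepage)


def describe(ch: str) -> str:
    """Format a character as U+XXXX."""
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    direct = chr(128)  # like Chr(128) in Delphi 2009+: U+0080
    print("chr(128)        ->", describe(direct))
    for cp in ("cp1252", "cp1250", "cp1251", "cp437"):
        converted = ansi_byte_to_unicode(0x80, cp)
        # Same byte, different code page, different code point.
        print(f"0x80 in {cp:<6} ->", describe(converted), converted == direct)
```

In code pages 1252 and 1250 the byte becomes the Euro sign U+20AC, in 1251 it becomes U+0402, and in the OEM code page 437 it becomes U+00C7; none of them equals U+0080.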
[WayBack] Allen Bauer correctly mentioned that to define a character constant as a true Unicode codepoint, you have to use 4 hexadecimal digits:
c1 := #$0080; c2 := chr(128); Assert(c1 = c2);
This 4-hexadecimal-digit syntax is backwards compatible: with the above code, pre-Delphi 2009 compilers will get Ansi codepoint 128.
If you cannot rely on the encoding of your Delphi source files (for instance because your version control system mangles them, or for other reasons), that is the only way to go, hence my Stack Overflow answer on [WayBack] Wrong Unicode conversion, how to store accent characters in Delphi 2010 source code and handle character sets?
Don’t rely on the encoding of your Delphi source code files.
They might get mangled when any non-Unicode tool (or even a buggy Unicode-aware tool) works on your text files.
The best way is to specify your characters as a 4-digit Unicode code point.
const MyEuroSign = #$20AC;
A few more notes:
Here you can find a few of the Unicode codepoints (thanks [WayBack] Thomas Schild!):
[WayBack] Rudy Velthuis explains that you can automagically force the Delphi compiler to always use Unicode codepoints using the {$HIGHCHARUNICODE ON} directive (I didn’t know that <g>). That is not always what you want, though, so it is better to expand your character constants into 4 hexadecimal digits.
See: [Archive.is] HIGHCHARUNICODE directive (Delphi) – RAD Studio (which was first fully documented in XE3, as the Delphi 2009 documentation left out the #xxx case).
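The rules on that documentation page can be summarized as a small decision procedure. The Python model below is a hypothetical sketch of the documented behavior, not compiler code. With {$HIGHCHARUNICODE OFF}, decimal #nnn and 2-digit #$xx literals are AnsiChar values (so #$80..#$FF go through the ANSI code page), while 4-digit #$xxxx literals are always Unicode. With {$HIGHCHARUNICODE ON}, every literal is taken as a Unicode codepoint. The default "cp1252" is an assumption standing in for whatever ANSI code page the compiler uses, and the handling of decimal values above 255 is my own guess, marked as such in the code.

```python
import re

# Hypothetical model of the documented HIGHCHARUNICODE rules; not compiler code.
HEX4 = re.compile(r"#\$([0-9A-Fa-f]{4})")  # #$xxxx: always a Unicode code point
HEX2 = re.compile(r"#\$([0-9A-Fa-f]{2})")  # #$xx: AnsiChar when the directive is OFF
DECIMAL = re.compile(r"#([0-9]+)")         # #nnn: AnsiChar when the directive is OFF


def literal_to_char(literal: str, highcharunicode: bool, codepage: str = "cp1252") -> str:
    """Return the Unicode character one Char literal would end up as."""
    match = HEX4.fullmatch(literal)
    if match:
        return chr(int(match.group(1), 16))
    match = HEX2.fullmatch(literal)
    if match:
        value = int(match.group(1), 16)
    else:
        match = DECIMAL.fullmatch(literal)
        if not match:
            raise ValueError(f"unsupported literal form: {literal!r}")
        value = int(match.group(1))
    # Below $80 ANSI and Unicode agree; above $FF the value cannot be an
    # AnsiChar, so this model treats it as a code point (an assumption).
    if highcharunicode or value < 0x80 or value > 0xFF:
        return chr(value)
    # {$HIGHCHARUNICODE OFF}: #$80..#$FF go through the ANSI code page.
    return bytes([value]).decode(codepage)


if __name__ == "__main__":
    for lit in ("#128", "#$80", "#$0080", "#$20AC"):
        for flag in (False, True):
            ch = literal_to_char(lit, highcharunicode=flag)
            state = "ON " if flag else "OFF"
            print(f"{lit:<7} HIGHCHARUNICODE {state} -> U+{ord(ch):04X}")
```

The #$0080 rows come out as U+0080 whatever the directive says, which is why expanding constants to 4 hexadecimal digits is the more robust choice.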
Some more people who got bitten by this:
- [WayBack] Roman Yankovsky – Google+ – I think I found a compiler issue, could you please vote for….
- [WayBack] Unicode Leftover Bug From Hell – DelphiTools.
- [WayBack] var c: Char; begin case c of Char(#$C0)..Char(#$D6): begin end; end; Why I get error: [dcc32 Error] E2011 Low bound exceeds high bound on Delphi Tokyo? – Jacek Laskowski – Google+
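One plausible way such a case range can invert (an assumption: the linked post does not say which code page was active) is that, with HIGHCHARUNICODE OFF, both bounds are converted through the ANSI code page, and the resulting codepoints need not keep their byte order. In code page 1250 (Central European), byte $C0 is Ŕ (U+0154) while $D6 is Ö (U+00D6), so the low bound ends up above the high bound. The Python sketch below only checks that ordering for a few code pages; it does not run the Delphi compiler, and converted_bounds is a name made up for this example.

```python
# Illustration: do the case bounds Char(#$C0)..Char(#$D6) stay ordered
# after being converted through an ANSI code page?


def converted_bounds(low: int, high: int, codepage: str) -> tuple:
    """Return the Unicode code points the two AnsiChar bytes map to."""
    low_cp = ord(bytes([low]).decode(codepage))
    high_cp = ord(bytes([high]).decode(codepage))
    return low_cp, high_cp


if __name__ == "__main__":
    for cp in ("cp1252", "cp1250", "cp1251"):
        low_cp, high_cp = converted_bounds(0xC0, 0xD6, cp)
        status = "ok" if low_cp <= high_cp else "low bound exceeds high bound"
        print(f"{cp}: U+{low_cp:04X}..U+{high_cp:04X} -> {status}")
```

Writing the bounds as 4-digit literals makes them codepoints regardless of the compiler's code page, in line with the advice earlier in this post.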
–jeroen
Olaf Monien said
It is very important to note here that you are actually working with VARIABLES and not with CONSTANTS. For constants, you have to realize how types are inferred:
const
c1 = #$80;
var
c2 : char = #$80;
assert(c1=c2);
The above assertion will fail, as the inferred type for c1 will be ANSICHAR! (c2 will be #$20AC, as you pointed out.) For me, both character representations appear to be equal (the Euro sign €), but in other parts of the world c1 may be something different, i.e. the assert has to fail, as we don’t check the locale here.
jpluimers said
Indeed. Thanks for adding this.
–jeroen