A rectification to the clarification: what I say below about UTF-16 being
always 16-bit and limited is also nonsense. UTF-16 is variable-length,
and it can cover the entire Unicode character set. It just uses a variable
number of 16-bit words per character, as compared to UTF-8, which uses a
variable number of 8-bit bytes.
I should have checked my sources. Shame on me.
About Java's internal char type being 16 bits wide, though: I have heard
that too, and I'm also curious.
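
For what it's worth, Java's char is indeed 16 bits wide (it holds one
UTF-16 word, not one codepoint), so a codepoint beyond \xFFFF takes two
chars, a "surrogate pair". A minimal sketch that shows it, using
U+1D11E (the musical G clef) as an example:

    public class CharWidth {
        public static void main(String[] args) {
            // U+1D11E does not fit in 16 bits, so Java stores it as a
            // surrogate pair of two 16-bit chars.
            String clef = "\uD834\uDD1E";
            System.out.println(clef.length());                             // 2 (chars)
            System.out.println(clef.codePointCount(0, clef.length()));     // 1 (codepoint)
            System.out.println(Integer.toHexString(clef.codePointAt(0)));  // 1d11e
        }
    }
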
André Warnier wrote:
> Caldarale, Charles R wrote:
>>> From: Christopher Schultz [mailto:***@christopherschultz.net]
>>> Subject: Re: Migrating to tomcat 6 gives formatted currency
>>> amounts problem
>>>
>>> (My understanding is that Unicode (16-bit) is actually not
>>> big enough for everything, but hey, they tried).
>>
>> Point of clarification: Unicode is NOT limited to 16 bits (not even in
>> Java, these days). There are defined code points that use 32 bits,
>> and I don't think there's a limit, if you use the defined extension
>> mechanisms. Again, browsing the Unicode web site is extremely
>> enlightening.
>>
> Further clarification:
> Unicode is not limited to anything. Unicode is (or aims to be) a list
> which assigns to every distinct character known to man a number, from
> 0 to infinity. The particular position number given to a particular
> character in this Unicode list is known as its "Unicode codepoint".
> The Unicode Consortium also tries to do this with some order,
> such as trying to keep together (with consecutive codepoints) various
> groups of characters that are logically related in some way.
> For example (but probably because they had to start somewhere), the
> first 128 codepoints match the original 7-bit US-ASCII alphabet;
> so for instance the "capital letter A", which has code \x41 in US-ASCII,
> happens to have Unicode codepoint \x0041 (both 65 in decimal terms).
> For example also, the same first 128 codepoints, plus the next 128
> codepoints, match the iso-8859-1 alphabet (also known as iso-latin-1);
> thus the character known as "capital letter A with umlaut" (an A with a
> double-dot on top) has the codepoint \x00C4 in Unicode, and the code
> \xC4 in iso-8859-1 (both 196 in decimal).
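>
> A quick way to see this correspondence, sketched in Java (whose char
> values in this range are exactly the Unicode codepoints; the class
> name is just for illustration):
>
>     public class Codepoints {
>         public static void main(String[] args) {
>             System.out.println((int) 'A');      // 65, i.e. \x41, as in US-ASCII
>             System.out.println((int) '\u00C4'); // 196, i.e. \xC4, as in iso-8859-1
>         }
>     }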
>
> New Unicode characters (and codepoints) are being added all the time
> (Klingon was famously proposed, though in the end rejected), but there
> are also holes in the list (presumably left for whenever some forgotten
> related character shows up).
>
> A quite different issue is encoding.
>
> Because it would be quite impractical to specify a series of characters
> just by writing their codepoints one after the other (using whatever
> number of bits each codepoint needs), a series of clever schemes have
> been devised in order to pass Unicode strings around, while being able
> to separate them into characters, and keep each one with its proper
> codepoint.
> Such schemes are known as "Unicode encodings", with names such as
> UTF-7, UTF-8, UTF-16, UTF-32, etc.
> Each one of them specifies an algorithm whereby one can take any Unicode
> character (or rather, its codepoint), and "encode" it into a series of
> bits, in such a way that at the receiving end, an opposite algorithm can
> be used to "decode" that series of bits and retrieve once again the same
> series of Unicode codepoints (or characters).
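>
> In Java, for example, such an encode/decode round trip looks like this
> (a minimal sketch; UTF-16 would do equally well in place of UTF-8):
>
>     import java.io.UnsupportedEncodingException;
>
>     public class RoundTrip {
>         public static void main(String[] args) throws UnsupportedEncodingException {
>             String original = "A\u00C4";                  // two Unicode codepoints
>             byte[] bits = original.getBytes("UTF-8");     // encode: codepoints -> bytes
>             String decoded = new String(bits, "UTF-8");   // decode: bytes -> codepoints
>             System.out.println(original.equals(decoded)); // true
>         }
>     }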
>
> UTF-16, for example, is an encoding of Unicode which always uses 16
> bits for each Unicode codepoint; but it is to my knowledge incomplete,
> because since it uses a fixed number of 16 bits per character, it can
> thus only ever represent no more than the first 65,536 Unicode
> characters. (But we're not there yet, and there is still some leeway).
>
> UTF-8 on the other hand is a variable-length scheme, using 1, 2, 3, or
> more 8-bit bytes to represent each Unicode codepoint. And it is in
> principle not limited: the original design already foresees sequences
> of up to 6 bytes, for whenever the need arises (imagine that some
> aliens suddenly show up, and that they happen to write in 167 different
> languages and alphabets).
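>
> The variable lengths are easy to see in Java (a minimal sketch; the
> last example, U+1D11E, is the musical G clef):
>
>     public class Utf8Lengths {
>         public static void main(String[] args) throws Exception {
>             System.out.println("A".getBytes("UTF-8").length);            // 1 byte
>             System.out.println("\u00C4".getBytes("UTF-8").length);       // 2 bytes
>             System.out.println("\u20AC".getBytes("UTF-8").length);       // 3 bytes (euro sign)
>             System.out.println("\uD834\uDD1E".getBytes("UTF-8").length); // 4 bytes
>         }
>     }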
>
> One frequent misconception is that in UTF-8, the first 256 "character
> encoding bit sequences" match the iso-8859-1 codepoints.
> Only the first 128 characters of iso-8859-1 (which happen to match the
> 128 characters of US-ASCII and the first 128 Unicode codepoints), have a
> single-byte representation in UTF-8 which happens to match their Unicode
> codepoint. The next 128 iso-8859-1 characters (which contain the
> capital A with umlaut) require 2 bytes each in the UTF-8 encoding.
> Thus for instance, the "capital letter A with umlaut" has the Unicode
> codepoint \x00C4 (196 decimal), because it is the 197th character in the
> Unicode list (the first one being \x0000). It also happens to have the
> code \xC4 (196 decimal) in the iso-8859-1 table.
> But in UTF-8, it is encoded as the two bytes \xC3\x84, which is not the
> decimal number 196 in any way.
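>
> Easy to verify, e.g. in Java (a minimal sketch):
>
>     public class LatinVsUtf8 {
>         public static void main(String[] args) throws Exception {
>             String a = "\u00C4"; // capital letter A with umlaut
>             for (byte b : a.getBytes("ISO-8859-1")) System.out.printf("%02X ", b & 0xFF); // C4
>             System.out.println();
>             for (byte b : a.getBytes("UTF-8")) System.out.printf("%02X ", b & 0xFF);      // C3 84
>             System.out.println();
>             // Decoding the UTF-8 bytes with the wrong charset yields garbage:
>             // \xC3 becomes "A with tilde" and \x84 an unprintable control char.
>             System.out.println(new String(a.getBytes("UTF-8"), "ISO-8859-1"));
>         }
>     }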
>
>
> All of that to say that when some people on this list say things like
> "you should always decode your URLs as if they were Unicode (or UTF-8),
> because it is the same as ASCII or iso-latin-1 anyway", they are talking
> nonsense. The only time you can do that is when the server and all the
> clients have agreed in advance that this is how they were going to
> encode and decode URLs.
> (That we developers wish it were so, and that ultimately we may get
> there, is another matter.)
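>
> For instance, in Java, the same percent-encoded bytes come out as
> different characters depending on which encoding the decoder assumes
> (a minimal sketch):
>
>     import java.net.URLDecoder;
>
>     public class UrlCharsets {
>         public static void main(String[] args) throws Exception {
>             // %C3%84 is "capital A with umlaut" if both sides agreed on UTF-8 ...
>             System.out.println(URLDecoder.decode("%C3%84", "UTF-8"));
>             // ... but two different characters if the server assumes iso-8859-1.
>             System.out.println(URLDecoder.decode("%C3%84", "ISO-8859-1"));
>         }
>     }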
>
> It is also nonsense to say that you should by default consider
> html pages as UTF-8 encoded. The default character set (and encoding,
> because in that case both are the same) for html is iso-8859-1, and
> anything else (including UTF-8 or UTF-16) is non-default.
> (see http://www.ietf.org/rfc/rfc2854.txt, section 6).
> (So if you do output something else, you *must* say so).
> (And hope that IE doesn't second-guess you).
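>
> In a servlet, for example, saying so looks like this (a minimal
> sketch; setContentType must be called before getWriter()):
>
>     import java.io.IOException;
>     import java.io.PrintWriter;
>     import javax.servlet.http.HttpServlet;
>     import javax.servlet.http.HttpServletRequest;
>     import javax.servlet.http.HttpServletResponse;
>
>     public class Utf8PageServlet extends HttpServlet {
>         protected void doGet(HttpServletRequest req, HttpServletResponse resp)
>                 throws IOException {
>             // Declare the charset explicitly, instead of relying on the
>             // iso-8859-1 default.
>             resp.setContentType("text/html; charset=UTF-8");
>             PrintWriter out = resp.getWriter();
>             out.println("<html><body>\u00C4</body></html>");
>         }
>     }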
>
> We probably owe that to Tim Berners-Lee; with all due respect and
> admiration for the guy, it may be an unfortunate historical accident
> that he was born in England and worked in Switzerland (both countries
> quite happy with iso-8859-1), rather than being, say, a Chinese
> national working in Greece, who might have preferred Unicode and
> UTF-8. But hey, he invented it, so he got to choose.
>
> Anyway for the time being we all have to live with it.
> Even the Tomcat guys.
>
---------------------------------------------------------------------
To start a new topic, e-mail: ***@tomcat.apache.org
To unsubscribe, e-mail: users-***@tomcat.apache.org
For additional commands, e-mail: users-***@tomcat.apache.org