PHP :: Bug #29955 :: Support for Turkish/iso-8859-9

Bug #29955 Support for Turkish/iso-8859-9
Submitted: 2004-09-02 17:15 UTC Modified: 2007-09-04 14:14 UTC
Votes:2
Avg. Score:4.5 ± 0.5
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:0 (0.0%)
From: jan at horde dot org Assigned: hirokawa (profile)
Status: Closed Package: mbstring related
PHP Version: 5CVS, 4CVS (2004-09-02) OS: Linux
Private report: No CVE-ID: None

 [2004-09-02 17:15 UTC] jan at horde dot org

Description:
------------
In ISO-8859-9 (Turkish) the uppercase letter of "i" is a dotted uppercase "I", the lowercase letter of "I" is a dotless "i". But mb_strtolower() und mb_strtoupper() simply return the ASCII uppercase or lowercase counterparts.

You get the correct result with:
setlocale(LC_ALL, 'tr_TR');
echo strtoupper('i');
echo strtolower('I');

But the wrong results with:
echo mb_strtoupper('i', 'iso-8859-9');
echo mb_strtolower('I', 'iso-8859-9');



Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports

 [2005-02-22 11:10 UTC] moriyoshi@php.net

It turned out this is because mbstring doesn't take the 
locale into consideration.


 [2005-05-13 02:26 UTC] mustafa at deu dot edu dot tr

I get the same results like jan.

I need to get UTF-8 output for consuming a web service and I configured my php 5.0.4 with --enable-mbstring=all parameter (on linux that has been set with Turkish locale)

I see that mbstring extension has limited language support in source code. (German, English, Japanese, Korean, Russian, Chinese)

Is there a way to add our (Turkish) language to source code? Any references about this extension's source?

 [2005-05-13 08:00 UTC] moriyoshi@php.net

Turkish locale would need complete overhaul on the 
entire extension because the locale's character 
properties and required case folding behaviour are very 
special.

PHP-ICU extension could support anything, but that's 
just an ongoing work by l0t3k.

 [2005-12-23 14:10 UTC] hirokawa@php.net

I don't know which is the standard way (0x49 or 0xdd).
In ISO-8859-9 (Turkish),
upper case of 'i' (0x69) always should be translated to 'I' 
with dot (0xdd) ?
If yes, please let me know some URLs which describe 
the mapping.


 [2007-01-05 14:31 UTC] jan at horde dot org

Any chance this is going to be backported to PHP 5.2? I guess mbstring is going to be obsolete with the Unicode and ICU support in PHP 6.

 [2007-01-05 14:33 UTC] jan at horde dot org

Oh, and by the way, this conversion should always happen for iso-8859-9, not only if mbstring.language is set to Turkish, because this is completely useless in real world applications.

 [2007-08-17 22:19 UTC] hirokawa@php.net

This change is already back ported to PHP 5.2.
In my understanding, it shouldn't always applied to ISO-8859-9,
because the conversion result is depends on the locale.
(correct ?)




 [2007-08-23 16:33 UTC] jan at horde dot org

No. The conversion has to be done this way for iso-8859-9 always, not only if the current locale is Turkish. Turkish is the only language that uses this charset.

 [2007-09-04 14:14 UTC] hirokawa@php.net

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.