UTF-8 support

Postby aba » Sun Apr 29, 2007 3:40 pm

Starting with build 440, kPlaylist has experimental UTF-8 support. The only way to get this release is via the customizer, by selecting 'UTF-8' in the new type selection at the top.

Download here: http://www.kplaylist.net/kcustomizer/

Prerequisites:

getid3, version 1.7.

The default_charset value in php.ini must be empty (""):

Code:
default_charset = ""
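
If you are not sure which php.ini your web server actually loads, the setting can also be checked at runtime. This is only a minimal sketch: kp_default_charset_is_empty() is a made-up helper name, not something that ships with kPlaylist.

```php
<?php
// Hypothetical helper (not part of kPlaylist): reports whether PHP's
// default_charset setting is empty, as the prerequisite above asks.
function kp_default_charset_is_empty()
{
    $value = ini_get('default_charset');
    // ini_get() returns false for unknown settings, otherwise a string.
    return $value === false || $value === '';
}
```

Calling it from a small test page on the same server shows what PHP really uses, regardless of which php.ini file was edited.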


Notes

The UTF-8 edition will not convert any of the old latin1 data. So the first thing you should do after installing is run the update with a full id3 rebuild. As stated above, this is experimental and bugs will exist, so please post comments on any strange behaviour.

aba
aba
Site Admin
 
Posts: 2325
Joined: Wed May 08, 2002 9:19 am

Postby tcghost » Sun Apr 29, 2007 5:06 pm

thanks a lot, was waiting for this for a while!

one question though, any way that you can add an option to the customizer to have it split kplaylist into files (or rather, not join them) and just put them in a zip or tar.gz file?

EDIT: ok, seems I got it working. I see only 2 problems with it right now:

1. Can't get custom hotselect to work. It seems it leaves it in the default charset? I tried encoding it to UTF-8, but no go.

2. Last streams are left in the default encoding as well.

I'll let you know if I find fixes for the 2 problems or if I find more.
tcghost
 
Posts: 229
Joined: Fri Nov 04, 2005 6:48 pm
Location: Florida

Postby aba » Mon Apr 30, 2007 12:19 pm

1. I'll get back with a fix for this.

2. Are you using 'livestreams' with ajax? Make sure you have the latest prototype if you do. Remember, the live streams use the same metadata as the rest of the system, so make sure you've done a complete id3 rebuild after getting the UTF-8 edition.

Postby tcghost » Tue May 01, 2007 12:43 am

I did a complete rescan with rebuild with getid3 1.7.8b1 (that may be the problem) and it ran successfully. I was not using the ajax last streams.

I think the problem is that most of my tags are now encoded in Windows-1251 yet kplaylist tries to read them as unicode now. I will try writing them all in unicode and see if that helps.

Also, I had to set the filesystem charset as CP1251 (running under xp) to get it to display filenames and folders correctly
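
For anyone in the same situation, a rough sketch of a fallback conversion for tag values that are not valid UTF-8. The helper name kp_tag_to_utf8() and the CP1251 default are assumptions for illustration only; this is not how kPlaylist or getid3 handle it internally, and it needs the mbstring and iconv extensions.

```php
<?php
// Hypothetical helper (not part of kPlaylist or getid3): normalise a
// tag value to UTF-8. $fallback is an assumed legacy encoding.
function kp_tag_to_utf8($raw, $fallback = 'CP1251')
{
    // Already valid UTF-8 (plain ASCII included): leave it untouched.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    $converted = iconv($fallback, 'UTF-8', $raw);
    // iconv() returns false on failure; keep the raw bytes in that case.
    return $converted === false ? $raw : $converted;
}
```

Note that this is a heuristic: a few legacy byte sequences can happen to be valid UTF-8 and would be left unconverted, so rewriting the tags in unicode is still the more reliable route.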

Postby aba » Tue May 01, 2007 1:28 am

I just released 441 which should take care of the hotselect problem. I was expecting getid3 to return the data in UTF-8 by converting the original charset.

Postby tcghost » Tue May 01, 2007 1:38 am

just tested it and yes, if the id3 tag is encoded as unicode everything works fine.

EDIT: just tested build 441 and hotselect still seems to be messed up.

EDIT2: I got the hotselect option semi-fixed by using:
Code:
$cfg['hotselectchars'] = iconv('CP1251', 'UTF-8', '*0abcdefghijklmnopqrstuvwxyz_абвгдежзиклмнопрстуфхцчшщэюя');

but it displayed no hits for a lot of the letters that were working before.

EDIT3: Found another bug: the last streams list now cuts off the title before it reaches the imposed character limit (probably because it uses the regular strlen). This happens with ajax both on and off.
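
To illustrate the guess above: strlen() and substr() count bytes, so a Cyrillic title in UTF-8 (two bytes per letter) reaches a byte limit after half as many characters. A minimal sketch of a character-aware version, assuming the mbstring extension is available; kp_shorten_title() is a made-up name, not the actual kPlaylist function.

```php
<?php
// Hypothetical helper (not the actual kPlaylist code): shorten a UTF-8
// title to at most $limit characters rather than $limit bytes.
function kp_shorten_title($title, $limit)
{
    // mb_strlen()/mb_substr() count characters when told the encoding;
    // strlen()/substr() would count bytes and could cut a letter in half.
    if (mb_strlen($title, 'UTF-8') <= $limit) {
        return $title;
    }
    return mb_substr($title, 0, $limit, 'UTF-8');
}
```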

Also, on a side note, regarding the unicode bug in getid3:
http://www.kplaylist.net/forum/viewtopic.php?t=1146
http://www.getid3.org/phpBB2/viewtopic.php?p=1428
Before, I was able to set $tagformat = 'ISO-8859-1'; and it would work fine with "write id3 with stream". Now that all my tags are in unicode, I can't use that workaround, as all tags look garbled. I'll dig into the getid3 code a bit deeper and maybe identify the real reason it screws up the tags when using unicode.

Postby tcghost » Tue May 01, 2007 9:11 pm

1. Found another bug (warning, actually). Not sure if it is related to the unicode changes, but I think it only started happening with the 2 new builds. I get a:
Warning: mysql_fetch_row(): supplied argument is not a valid MySQL result resource in image.php on line 196

I traced the problem to $drive being an empty array with no elements and thus generating an incorrect query. Haven't been able to trace it further. It occurs for albums without an image for the detailed view.

2. DB Update seems not to be fully unicode-aware. Whenever I do an update, it displays the filenames it is currently updating as weird characters for non-English names.

Postby aba » Sun May 27, 2007 1:16 am

I've just released build 442 which should take care of some of the issues you mentioned. In regards to the hotselect, make sure your kpconfig.php file is actually in unicode. (Try iconv -t "utf-8" <file>)

Postby tcghost » Wed May 30, 2007 6:12 pm

I'm back :)

found a couple of new issues (#1 was not there before I think)

1. The bulletin now converts all non-ansi characters to question marks.

2. if I save the kpconfig file in utf-8, php outputs something while processing the file, so headers cannot be modified later on. I will stick with my solution for now of just encoding the non-ansi characters to unicode.

3. hotselect still says that there are no entries for non-ansi characters

thank you again for all your help

Postby aba » Wed May 30, 2007 7:50 pm

You are downloading the development kP from the customizer and selecting 'UTF-8' in the type selectbox, right?

Postby tcghost » Wed May 30, 2007 7:50 pm

and one more:

4. the rss feed displays non-ansi characters as question marks as well now. Before I would just change the encoding to the correct one (it is hard coded) and it would work but now it displays non-ansi characters as "?".

EDIT:
No, I was using the development version that was split into files. Let me try the official one.

EDIT2: just tried it and got the same results for all 3 issues (didn't try #2, as it was unrelated to the actual kplaylist script).

Postby aba » Thu May 31, 2007 10:29 am

1. If you create a new bulletin, does it not show correctly? When "installing" the UTF version of kP, it does not currently convert any of the old data. So, if old entries look 'strange', this is expected.

2. It sounds strange. What version of PHP are you using?

3. This is most probably connected with 2. You might have a unicode signature in your file.

Make sure that the detected encoding is correct. If you're using Firefox, click 'View' -> Character encoding.
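
A "unicode signature" here means the UTF-8 byte order mark, the three bytes EF BB BF that some editors write at the start of a file. Because they sit before the opening PHP tag, PHP sends them as output, which would explain the headers problem. A minimal sketch for detecting and stripping it; the helper names are made up for illustration and are not part of kPlaylist.

```php
<?php
// Hypothetical helpers (not part of kPlaylist).
// The UTF-8 byte order mark ("unicode signature") is the bytes EF BB BF.
define('KP_UTF8_BOM', "\xEF\xBB\xBF");

// True when the given file contents start with a UTF-8 BOM.
function kp_has_bom($contents)
{
    return strncmp($contents, KP_UTF8_BOM, 3) === 0;
}

// Returns the contents without a leading BOM; other input is unchanged.
function kp_strip_bom($contents)
{
    return kp_has_bom($contents) ? substr($contents, 3) : $contents;
}
```

Something like file_put_contents($path, kp_strip_bom(file_get_contents($path))) would then rewrite the config file without the signature; keep a backup first. Many editors can also save as "UTF-8 without signature/BOM" directly.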

Postby tcghost » Thu May 31, 2007 6:59 pm

#1 I tried with both creating new entries and editing old ones.
#2 and #3 I'm using the latest version of php, 5.2.2. The page encodings seem to be correct, UTF-8. I'll try some more stuff and get back to you.

EDIT: I found the problem... all my unicode tags and everything are being read as question marks with this new build. probably something inside php. I'll try reinstalling kplaylist from scratch.

EDIT2: Tried with a fresh install and it's the same thing. Looks like something was changed in the later builds that makes all the unicode tags not be read correctly and breaks the non-ansi user input

Postby aba » Thu May 31, 2007 10:48 pm

Ahh. I've found a bug in the UTF downloader. The languages that are taken from the database were not in UTF-8. I've modified it, and now they should be in unicode. So, if you can, re-download the same version from the customizer with UTF-8 as the type, and hopefully you will have better results.

Postby tcghost » Fri Jun 01, 2007 2:44 am

thanks a lot. will give it a try and get back to you.

EDIT: nope, seems I get the same results for both the bulletin and all id3 tags. :( Could it be that the id3 tags in the db are stored as latin_general_ci and not unicode? (just a wild guess)
