Topic: UTF-8 encoding issue

I'm working on a support issue where users enter text into a form, and it should update an object's attributes in the database by calling update_attributes.
The issue I'm having seems to be some kind of encoding problem. When a certain set of Japanese characters is entered:
グーグル
はい

with the carriage return, it returns this error:

PG::Error: ERROR:  invalid byte sequence for encoding "UTF8": 0xe3283f
: UPDATE "monitor_lists" SET "updated_at" = '2012-11-30 23:20:40.698964', "regexp" = '\bã(?:‚°ãƒ¼ã‚°ãƒ«|¯ã„)\b', "list" = 'グーグル
はい' WHERE "monitor_lists"."id" = 1

But I've checked, and the front-end site and back-end DB both use UTF8, so I can't find any reason why there should be an issue there. I did wonder if it could be something to do with the regex being generated for the Japanese phrases: the error doesn't occur if you try either phrase on its own or without the new line. However, the hex code in the error line converts back to ?ã(? and that phrase isn't in the regex. I'm at a loss, so any suggestions would be appreciated.
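
One way to check where the bytes 0xe3283f come from is to look at the raw bytes of the two phrases. The sketch below is only an illustration under an assumption: the code that builds the regex isn't shown in this post, so factor_first_byte is a hypothetical stand-in for a generator that factors out a shared prefix byte by byte rather than character by character. Both phrases start with the UTF-8 lead byte 0xE3, so pulling that byte out leaves 0xE3 directly in front of "(?", which is not a valid UTF-8 sequence.

```ruby
# encoding: utf-8

# The two phrases from the form.
phrases = ["グーグル", "はい"]

# Dump the UTF-8 bytes of each phrase; both begin with the lead byte e3.
phrases.each do |phrase|
  hex = phrase.bytes.map { |b| format("%02x", b) }.join(" ")
  puts "#{phrase}: #{hex}"
end

# Hypothetical stand-in for a regex generator that factors out a shared
# first BYTE (not a shared first character). This is an assumption about
# the generator, not its actual code.
def factor_first_byte(words)
  raw = words.map(&:b)                             # binary copies of each word
  first = raw.map { |w| w.byteslice(0, 1) }.uniq   # distinct first bytes
  return nil unless first.size == 1                # nothing shared to factor out
  rests = raw.map { |w| w.byteslice(1..-1) }       # each word minus its first byte
  pattern = "\\b".b + first[0] + "(?:".b + rests.join("|".b) + ")\\b".b
  pattern.force_encoding("UTF-8")                  # relabel the bytes as UTF-8
end

pattern = factor_first_byte(phrases)
puts pattern.valid_encoding?                       # false
puts pattern.byteslice(2, 3).unpack("H*").first    # e3283f

# Character-wise alternation keeps every multibyte character whole.
safe = Regexp.union(phrases)
puts safe.source.valid_encoding?                   # true
```

If the real generator behaves like this, building the alternation per character (Regexp.union is one option) would keep each multibyte character intact, and the string sent to PostgreSQL would stay valid UTF-8.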

Thanks in advance.

Last edited by silversurfer (2012-11-30 19:24:31)

Re: UTF-8 encoding issue

I am confused after reading your issue, so I cannot provide you with a solution.

Re: UTF-8 encoding issue

UTF-8 has become the dominant character encoding for the World-Wide Web, accounting for more than half of all Web pages. The Internet Engineering Task Force (IETF) requires all Internet protocols to identify the encoding used for character data, and the supported character encodings must include UTF-8. The Internet Mail Consortium (IMC) recommends that all e-mail programs be able to display and create mail using UTF-8. UTF-8 is also increasingly being used as the default character encoding in operating systems, programming languages, APIs, and software applications.
