UTF-8 problem (U+202F -> ?)

I’m using IIS 7.5, the PHP connector.

In my database I have a string such as “U+0035 U+2013 U+0061 U+202F U+0032”, when the page is initially displayed, the client grid properly displays: “5–a 2”.

When the row containing these characters is updated, I see this being sent back: 5%E2%80%93a%E2%80%AF2 (which I believe is correct)

After the update, I see this in the DB : 5–a?2, viz. U+0035 U+2013 U+0061 U+003F U+0032.

What is interesting is that the U+2013 (en dash) was properly saved but not the U+202F (narrow non-break space)…

I tried to play with all kinds of UTF-8 setting parameter (dataprocessor, php) to no avail.

The problem can be related to the way how grid stores data on client side. Data is stored in the HTML and browser data “auto-fixing” during rendering can cause some problems.

You can try to use text based cells for the problematic data ( rotxt, edtxt, txttxt column types in the grid )

The problem occurs in an edtxt column of the grid.

The U+202F (narrow non-break space) is displayed correctly (a space) when the data is loaded initially. So the DB looks okay (it stores the en dash and the narrow non-break space properly) and the browser also (it gets an <?xml version='1.0' encoding='utf-8' ?> that it displays properly). I checked the xml sent it does contain the U+202F.

My impression is that the DB gets corrupted (viz. U+202F mapped to ?) after it gets updated through the dataprocessor/connector. I was wondering if it could be possible that URL decoder converts the %xx sequence "post"ed in the form to a “?” before it saves the data in the DB ?

Would it be possible for you people (Dhx) to check that U+202F roundtrips properly through the dhtmlxConnector (.php in my case) and Dataprocessor to the client and back to the Database (MS SQL on latest version of IIS in my case) when a row containing U+202F is updated ?

My .php on the server side gets the U+202F correctly.

It seems that it’s the code updating my MS SQL DB that forgets an N prefix around the string being updated.

I open my DB connection thus:

$connectionInfo = array( "Database"=>"$database", "UID"=>"$username", "PWD"=>"$password", "CharacterSet"=>"UTF-8");
$conn = sqlsrv_connect( $hostname, $connectionInfo);

When I look at the connector log I see this (shortened for legibility):

UPDATE tab SET aNvarcharCol=‘5–a 2’ WHERE id=‘1’;

The - is U+2013, and the space between the “a” and the “2” is a U+202F.

This statement is generated by the connector I believe.

This causes the field aNvarcharCol to be updated in this way in the DB: “5-a?2”. The en dash is preserved, the narrow non-break space is replaced by a question mark. This is most probably the result of an intermediary mapping to windows 1252 or such, which would explain the preservation of the endash.

If I add (manually) a N prefix to the update statement :

UPDATE tab SET aNvarcharCol=N’5–a 2’ WHERE id=‘1’;

Everything works well, the U+202F is saved properly in the DB.

If this is the problem, do you have a suggestion to get the connector to add that N prefix?

Surely, I’m not the only one to have had this problem…

The AutoTranslate option on the connection string (as in ODBC) might do the trick… But that option does not exist for sqlsrv_connect…

msdn.microsoft.com/en-us/library/ff628167.aspx

Actually this is first report of such kind, so we didn’t have a ready solution.
You can define your own SQL command, that will be used for data update operations
docs.dhtmlx.com/connector__php__ … lexqueries

You can use this approach to define SQL code that is necessary in your case for correct data saving.

I corrected the code in db_common.php (I have a few of those grids…) by add a N before the ‘’, apparently using parameterized queries (with “?” for each parameter ) might also correct this problem.