odbc_fetch_into returns junk data at end of multi-byte char fields

Bug #60616 odbc_fetch_into returns junk data at end of multi-byte char fields
Submitted: 2011-12-28 11:57 UTC Modified: 2014-07-31 17:31 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: j dot faithw at yahoo dot com Assigned: keyur (profile)
Status: Closed Package: ODBC related
PHP Version: 5.3.8 OS: Linux
Private report: No CVE-ID: None

 [2011-12-28 11:57 UTC] j dot faithw at yahoo dot com

Description:
------------
This relates to bug #25792, which has been marked Analyzed but does not seem to be fixed.

When retrieving data from a char() field containing multi-byte characters using e.g. odbc_fetch_into(), if the number of bytes used exceeds the number of characters then junk data is returned at the end of the string.

I have experienced this with Postgres char columns when the database is created with e.g. the EUC_CN character encoding (createdb -E EUC_CN).

This encoding uses between 1 and 3 bytes per character. So a char(10) could need up to 30 bytes. 
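
To make the gap between character count and byte count concrete, here is a minimal sketch in C. The function name euc_char_count is hypothetical, and the decoding rule is a deliberate simplification (an assumption of this sketch, not PostgreSQL's actual logic): any byte with the high bit set is treated as the start of a two-byte character, and the three-byte form is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Counts characters in a NUL-terminated byte string under a simplified
 * EUC-style rule (an assumption of this sketch): a byte with the high
 * bit set starts a two-byte character; any other byte is one character. */
size_t euc_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t chars = 0;

    assert(s != NULL);
    while (*p != '\0') {
        if ((*p & 0x80) && p[1] != '\0') {
            p += 2;   /* lead byte plus trail byte */
        } else {
            p += 1;   /* ASCII, or a dangling lead byte at the end */
        }
        chars++;
    }
    return chars;
}
```

For example, the two characters 中文 padded with eight spaces to fill a char(10) column occupy the bytes D6 D0 CE C4 followed by eight 0x20 bytes: 10 characters, but 12 bytes, which is already more than a 10-byte display size.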

The problem is in the odbc_bindcols function in ext/odbc/php_odbc.c. SQLColAttributes is called with SQL_COLUMN_DISPLAY_SIZE, but this indicates the maximum number of characters required, not the number of bytes.
This means the buffer allocated for the value may not be big enough:
  result->values[i].value=(char *)emalloc(displaysize+1);

Later on, in e.g. odbc_fetch_into:
  Z_STRLEN_P(tmp) = result->values[i].vallen;
  Z_STRVAL_P(tmp) = estrndup(result->values[i].value,Z_STRLEN_P(tmp));

This can result in a vallen bigger than displaysize, but the ODBC driver will only fill in at most displaysize+1 bytes (including the null terminator). Character data is therefore lost, and junk bytes are returned in its place.
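
The sequence described above can be simulated without an ODBC driver. The sketch below is illustrative only: mock_driver_fetch and simulate_fetch are hypothetical names, and the mock assumes the behaviour described in this report (the driver truncates to the bound buffer size but reports the full byte length). A fixed arena filled with '#' stands in for whatever memory follows the real allocation, so the over-read stays within defined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a driver fetch: copies at most cap-1 bytes
 * of src into dst, null-terminates, and returns the FULL byte length of
 * src, mirroring how vallen can exceed the bound buffer size. */
size_t mock_driver_fetch(char *dst, size_t cap, const char *src)
{
    size_t full = strlen(src);
    size_t n;

    assert(cap > 0);
    n = (full < cap - 1) ? full : cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return full;
}

/* Binds a buffer of displaysize+1 bytes, fetches into it, then copies
 * vallen bytes out, as estrndup(value, vallen) would. Returns vallen. */
size_t simulate_fetch(char *out, size_t out_cap,
                      const char *column_bytes, size_t displaysize)
{
    char arena[64];
    size_t vallen;

    assert(displaysize + 1 <= sizeof arena);
    memset(arena, '#', sizeof arena);   /* visible "junk" past the buffer */

    vallen = mock_driver_fetch(arena, displaysize + 1, column_bytes);
    assert(vallen < sizeof arena && vallen < out_cap);

    memcpy(out, arena, vallen);         /* copies past the terminator */
    out[vallen] = '\0';
    return vallen;
}
```

With a 12-byte value and a display size of 10, the copied result holds the first 10 bytes of real data, then the driver's null terminator, then one junk byte, which matches the symptom reported here.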

The same problem may exist in ext/pdo_odbc/odbc_stmt.c, where
  rc = SQLColAttribute(S->stmt, colno+1, SQL_DESC_DISPLAY_SIZE,
            NULL, 0, NULL, &displaysize);
is called, but I have not tested this.

The following fixes odbc_bindcols for the char(x) datatype. I believe 4 bytes is the maximum required for any character encoding.
php_odbc.c, line 988:
      if (result->values[i].coltype == SQL_CHAR) {
        //If using a multibyte character encoding
        //number of bytes could be 4*SQL_COLUMN_DISPLAY_SIZE.
        //Without this workaround various functions
        //e.g. odbc_fetch_into will return data with a null after
        //displaysize bytes and extra junk data at the end as
        //vallen can be bigger than displaysize. Tested using
        //PostgreSQL with EUC_CN encoding.
        displaysize*=4;
      }

The fix may be needed for other data types as well as SQL_CHAR.
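
As a rough sketch of how the workaround's sizing could be factored out with an overflow guard, consider the helper below. It is not the code that was committed to PHP; the enum col_kind and the function bind_buffer_size are hypothetical, and real code would test the SQL_* type constants from the ODBC headers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags for this sketch; real code would use the
 * SQL_* constants from the ODBC headers. */
enum col_kind { COL_CHAR, COL_OTHER };

/* Returns the number of bytes to allocate for a bound column, including
 * the null terminator, or 0 if the computation would overflow size_t.
 * Character columns are scaled by max_bytes_per_char; the report
 * assumes 4 is enough for any encoding. */
size_t bind_buffer_size(enum col_kind kind, size_t displaysize,
                        size_t max_bytes_per_char)
{
    size_t factor;

    assert(max_bytes_per_char >= 1);
    factor = (kind == COL_CHAR) ? max_bytes_per_char : 1;

    if (displaysize > (SIZE_MAX - 1) / factor) {
        return 0;   /* displaysize * factor + 1 would overflow */
    }
    return displaysize * factor + 1;
}
```

For a char(10) column with a factor of 4, this yields 41 bytes, enough for the 30-byte worst case mentioned above plus the terminator. Returning 0 on overflow leaves it to the caller to decide how to fail.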


 [2014-07-28 23:29 UTC] keyur@php.net

-Status: Open +Status: Closed -Assigned To: +Assigned To: keyur

 [2014-07-28 23:30 UTC] keyur@php.net

The fix for this bug has been committed.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

 For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.


 [2014-07-29 09:45 UTC] j dot faithw at yahoo dot com

-Status: Closed +Status: Assigned

 [2014-07-29 09:45 UTC] j dot faithw at yahoo dot com

The patch adds support for WVARCHAR, which should fix issues with that datatype, but this bug was for the CHAR datatype, which is still broken.

 [2014-07-30 02:57 UTC] keyur@php.net

-Status: Assigned +Status: Closed

 [2014-07-31 17:31 UTC] j dot faithw at yahoo dot com

I had a look over the new patch and it looks good to me.
I also downloaded and built a snapshot (php5.5-201407311430) and did a quick test; both char and varchar work fine now.
Thank you for the fix.