Bug #35427 str_word_count() handles '-' incorrectly
Submitted: 2005-11-27 19:12 UTC
Modified: 2005-11-29 17:14 UTC
Votes: 1
Avg. Score: 3.0 ± 0.0
Reproduced: 1 of 1 (100.0%)
Same Version: 0 (0.0%)
Same OS: 0 (0.0%)
From: tomas_matousek at hotmail dot com
Assigned: iliaa
Status: Closed
Package: Strings related
PHP Version: 5.1.0
OS: *
Private report: No
CVE-ID: None

 [2005-11-27 19:12 UTC] tomas_matousek at hotmail dot com

Description:
------------
Characters passed as the third argument of str_word_count() should be treated the same as letters, right?
This works for the apostrophe but not for the hyphen.

Reproduce code:
---------------
var_dump(str_word_count("foo'0 bar-0var", 2, "0"));


Expected result:
----------------
array(2) {
  [0]=>
  string(5) "foo'0"
  [6]=>
  string(8) "bar-0var"
}


Actual result:
--------------
array(3) {
  [0]=>
  string(5) "foo'0"
  [6]=>
  string(3) "bar"
  [10]=>
  string(4) "0var"
}


History

 [2005-11-27 19:28 UTC] tony2001@php.net

"bar-0var" doesn't look like a valid *WORD* to me.
Or is it?

 [2005-11-27 20:00 UTC] tomas_matousek at hotmail dot com

By passing "0" as the third parameter, one declares the '0' character a legal word character, which should then be equivalent to any other letter, e.g. 'x'. "bar-xbar" is considered a word, so "bar-0bar" should be a word as well.

 [2005-11-28 21:27 UTC] tomas_matousek at hotmail dot com

No, I needn't. str_word_count("bar-var") returns 1, so '-' is treated as part of a word when it is followed by a word character.

See the source code; the bug is clear there.

 [2005-11-29 09:41 UTC] tomas_matousek at hotmail dot com

File string.c, line 4744:

while (isalpha(*p) || *p == '\'' || (*p == '-' && isalpha(*(p+1))) || (char_list && ch[(unsigned char)*p])) 

should be:

while (isalpha(*p) || *p == '\'' || (*p == '-' && (isalpha(*(p+1) || (char_list && ch[(unsigned char)*p])))) || (char_list && ch[(unsigned char)*p]))

 [2005-11-29 09:45 UTC] tomas_matousek at hotmail dot com

One more correction:

while (isalpha(*p) || *p == '\'' || (*p == '-' && (isalpha(*(p+1)) || char_list && ch[(unsigned char)*(p+1)]))
|| (char_list && ch[(unsigned char)*p]))

 [2005-11-29 17:14 UTC] iliaa@php.net

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.