tokenizer misses last single-line comment (PHP 5.3+, with re2c lexer)
| Bug #46817 | tokenizer misses last single-line comment (PHP 5.3+, with re2c lexer) |
|---|---|
| Submitted: | 2008-12-09 22:35 UTC |
| Modified: | 2010-11-22 13:39 UTC |
| From: | master dot jexus at gmail dot com |
| Assigned: | shire |
| Status: | Closed |
| Package: | Scripting Engine problem |
| PHP Version: | 5.3.0alpha3 |
| OS: | * |
| Private report: | No |
| CVE-ID: | None |
[2008-12-09 22:35 UTC] master dot jexus at gmail dot com
Description:
------------
When token_get_all() lexes a script whose last token is a single-line
comment, that token is missing from the output.
This only seems to happen when no newline follows the comment.
Compare the last entries of the two arrays below.
Reproduce code:
---------------
<?php
print_r(token_get_all(file_get_contents(__FILE__)));

// test
$var = 5;
// test
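The same check can be run without depending on how the script file itself ends. The following is only an illustrative sketch and not part of the original report: it tokenizes an in-memory string that has no newline after the final comment and tests whether the last token is that comment. The helper name `last_token_is_comment` is hypothetical.

```php
<?php
// Hypothetical helper (not from the report): returns true if the last token
// of $source is a T_COMMENT whose text, ignoring trailing line breaks,
// equals $comment.
function last_token_is_comment($source, $comment)
{
    $tokens = token_get_all($source);
    if (empty($tokens)) {
        return false;
    }
    $last = $tokens[count($tokens) - 1];
    // Multi-character tokens are arrays: [0] = token id, [1] = text.
    return is_array($last)
        && $last[0] === T_COMMENT
        && rtrim($last[1], "\r\n") === $comment;
}

// No newline after the final comment, as described in the report.
$source = "<?php\n\$var = 5;\n// test";
var_dump(last_token_is_comment($source, '// test'));
```

On a build without this bug the call should report true; on an affected build the comment token would be absent, so the last token would be the preceding whitespace and the call would report false.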
Expected result:
----------------
Array
(
[0] => Array
(
[0] => 367
[1] => <?php
[2] => 1
)
[1] => Array
(
[0] => 307
[1] => print_r
[2] => 2
)
[2] => (
[3] => Array
(
[0] => 307
[1] => token_get_all
[2] => 2
)
[4] => (
[5] => Array
(
[0] => 307
[1] => file_get_contents
[2] => 2
)
[6] => (
[7] => Array
(
[0] => 364
[1] => __FILE__
[2] => 2
)
[8] => )
[9] => )
[10] => )
[11] => ;
[12] => Array
(
[0] => 370
[1] =>
[2] => 2
)
[13] => Array
(
[0] => 365
[1] => // test
[2] => 4
)
[14] => Array
(
[0] => 309
[1] => $var
[2] => 5
)
[15] => Array
(
[0] => 370
[1] =>
[2] => 5
)
[16] => =
[17] => Array
(
[0] => 370
[1] =>
[2] => 5
)
[18] => Array
(
[0] => 305
[1] => 5
[2] => 5
)
[19] => ;
[20] => Array
(
[0] => 370
[1] =>
[2] => 5
)
[21] => Array
(
[0] => 365
[1] => // test
[2] => 6
)
)
Actual result:
--------------
Array
(
[0] => Array
(
[0] => 368
[1] => <?php
[2] => 1
)
[1] => Array
(
[0] => 307
[1] => print_r
[2] => 2
)
[2] => (
[3] => Array
(
[0] => 307
[1] => token_get_all
[2] => 2
)
[4] => (
[5] => Array
(
[0] => 307
[1] => file_get_contents
[2] => 2
)
[6] => (
[7] => Array
(
[0] => 365
[1] => __FILE__
[2] => 2
)
[8] => )
[9] => )
[10] => )
[11] => ;
[12] => Array
(
[0] => 371
[1] =>
[2] => 2
)
[13] => Array
(
[0] => 366
[1] => // test
[2] => 4
)
[14] => Array
(
[0] => 309
[1] => $var
[2] => 5
)
[15] => Array
(
[0] => 371
[1] =>
[2] => 5
)
[16] => =
[17] => Array
(
[0] => 371
[1] =>
[2] => 5
)
[18] => Array
(
[0] => 305
[1] => 5
[2] => 5
)
[19] => ;
[20] => Array
(
[0] => 371
[1] =>
[2] => 5
)
)
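Note that the numeric token IDs in the two dumps differ (367/365/370 versus 368/366/371) because the values of the token constants are not stable across PHP versions. As a hedged sketch, not something the reporter used, the dumps can be made directly comparable by printing token names through `token_name()`. The helper name `describe_tokens` is hypothetical.

```php
<?php
// Hypothetical helper (not from the report): renders each token as
// NAME@line "text", so that dumps taken on different PHP versions can be
// compared without caring about the numeric token IDs.
function describe_tokens($source)
{
    $out = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // [0] = token id, [1] = text, [2] = line number.
            $text = addcslashes($token[1], "\r\n\t"); // make line breaks visible
            $out[] = token_name($token[0]) . '@' . $token[2] . ' "' . $text . '"';
        } else {
            $out[] = $token; // single-character tokens are plain strings
        }
    }
    return $out;
}

print_r(describe_tokens("<?php\n\$var = 5;\n// test"));
```

With names instead of numbers, a missing trailing T_COMMENT entry stands out immediately when the two lists are compared.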
[2008-12-10 10:25 UTC] nlopess@php.net
[2009-03-06 07:41 UTC] lucas@php.net
I'm seeing what could be a related, if not the same, problem when trying to detect a trailing Windows CR+LF in T_WHITESPACE:
Reproduce code:
---------------
<?php
// this comment and trailing blank contain windows CR+LF^M
^M
Expected result:
----------------
array(3) {
[0]=> array(3) { [0]=> int(367) [1]=> string(6) "<?php " [2]=> int(1) }
[1]=> array(3) { [0]=> int(365) [1]=> string(57) "// this comment and trailing blank contain windows CR+LF " [2]=> int(2) }
[2]=> array(3) { [0]=> int(370) [1]=> string(3) " " [2]=> int(2) }
}
Actual result:
--------------
array(2) {
[0]=> array(3) { [0]=> int(368) [1]=> string(6) "<?php " [2]=> int(1) }
[1]=> array(3) { [0]=> int(366) [1]=> string(57) "// this comment and trailing blank contain windows CR+LF " [2]=> int(2) }
}
[2009-03-11 22:18 UTC] shire@php.net
[2010-11-22 13:39 UTC] felipe@php.net
-Block user comment: N
+Block user comment: Y