Bug #47643 array_diff() takes over 3000 times longer than php 5.2.4
Submitted: 2009-03-13 11:49 UTC
Modified: 2010-11-01 18:18 UTC
Votes: 35
Avg. Score: 4.7 ± 0.6
Reproduced: 28 of 29 (96.6%)
Same Version: 22 (78.6%)
Same OS: 17 (60.7%)
From: viper7 at viper-7 dot com
Assigned: felipe
Status: Closed
Package: Performance problem
PHP Version: 5.*, 6CVS (2009-04-13)
OS: *
Private report: No
CVE-ID: None

 [2009-03-13 11:49 UTC] viper7 at viper-7 dot com

Description:
------------
This bug was reported in ##php on freenode, and after some thorough testing on multiple machines we determined it must be an engine bug.

array_diff() on two large arrays of md5 hashes (600,000 elements each) takes approximately 4 seconds on a fast server in PHP 5.2.4 and below (confirmed with PHP 5.2.0), but over 4 hours (!) on PHP 5.2.6 and greater (confirmed with PHP 5.2.9 and PHP 5.3.0 beta2).


Reproduce code:
---------------
<?php
// Build two arrays of 600,000 md5 hashes that share 100,000 entries.
$data1 = array();
$data2 = array();
for ($i = 1, $j = 500001; $i <= 600000; $i++, $j++) {
	$data1[] = md5((string) $i);
	$data2[] = md5((string) $j);
}

// Time only the array_diff() call itself.
$time = microtime(true);

echo "Starting array_diff\n";
$data_diff1 = array_diff($data1, $data2);

$time = microtime(true) - $time;

echo 'array_diff() took ' . number_format($time, 3) . ' seconds and returned ' . count($data_diff1) . " entries\n";
?>

Expected result:
----------------
Starting array_diff
array_diff() took 3.778 seconds and returned 500000 entries

Actual result:
--------------
Starting array_diff
array_diff() took 14826.278 seconds and returned 500000 entries

History

 [2009-06-30 15:22 UTC] derick@php.net

Dmitry, could you have a look? I have no idea why this occurs.

 [2009-07-01 15:32 UTC] dmitry@php.net

The problem occurs because of the "bad" patch for bug #42838.

The diff algorithm sorts the arrays using qsort and then assumes that they are sorted correctly. But with a user comparison function this can't be guaranteed. Thus, in ext/standard/tests/array/bug42838.phpt, key_compare_func() can't sort the array correctly because the expressions (0 < 'a') and (0 > 'a') are both false ('a' is interpreted as the number 0).

It should be fixed in some way.
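
The inconsistency described above can be sketched in a few lines. This is only an illustration, not the code from bug42838.phpt: `loose_cmp()` is a hypothetical helper that emulates the PHP 5 rule mentioned in the comment (a non-numeric string compared with an integer is treated as 0) by using explicit casts, so it behaves the same on current PHP versions (7.0 or later is assumed for the `<=>` operator).

```php
<?php
// Hypothetical comparator emulating PHP 5 loose comparison:
// when an int meets a string, the string is cast to int ('a' becomes 0).
function loose_cmp($a, $b)
{
    if (is_string($a) && is_string($b)) {
        // Two strings are compared as strings; normalise to -1, 0 or 1.
        return strcmp($a, $b) <=> 0;
    }
    $x = is_string($a) ? (int) $a : $a;
    $y = is_string($b) ? (int) $b : $b;
    return $x <=> $y;
}

var_dump(loose_cmp(0, 'a'));   // int(0): 0 "equals" 'a'
var_dump(loose_cmp(0, 'b'));   // int(0): 0 "equals" 'b'
var_dump(loose_cmp('a', 'b')); // int(-1): yet 'a' sorts before 'b'
```

Because "equal" is not transitive under such a comparator, no single sorted order satisfies it, so an algorithm that sorts first and then walks both arrays in step cannot rely on the result being ordered.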

 [2009-07-09 20:38 UTC] jani@php.net

As Dmitry noted, this is a side effect that your fix caused.

 [2010-01-17 12:09 UTC] emiel dot bruijntjes at copernica dot com

This bug has now been open for 10 months. Are you still working on this?

 [2010-02-17 20:53 UTC] maarten at talkin dot nl

Why don't you only reset ptr if (behavior & DIFF_ASSOC)?

 [2010-04-16 22:20 UTC] sylvain at jamendo dot com

I would also appreciate a patch; this issue made our servers crash after a PHP 5.3 upgrade :-/

Thanks!

 [2010-08-04 05:21 UTC] lonnyk at gmail dot com

I feel as though the actual bug here is the fix that caused this issue. If you revert the fix and typecast the variables passed into the custom compare function as (string), then this works fine. This is in line with the other, non-user-defined comparison functions: they compare as === and not ==.
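
A minimal sketch of the cast suggested above, assuming the callback is used with array_diff_ukey(). The name `strict_key_compare()` and the sample arrays are made up for illustration and do not come from the bug42838 test.

```php
<?php
// Hypothetical key comparator: cast both keys to string before comparing,
// so the integer key 0 and the string key 'a' are never treated as equal.
function strict_key_compare($a, $b)
{
    return strcmp((string) $a, (string) $b);
}

$first  = array(0 => 'x', 'a' => 'y', 'b' => 'z');
$second = array('a' => 'other');

// Only the key 'a' exists in both arrays, so keys 0 and 'b' remain.
$result = array_diff_ukey($first, $second, 'strict_key_compare');
print_r($result);
```

With the cast in place the callback defines a consistent string ordering, which is what a sort-based diff needs from it.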

 [2010-11-01 18:18 UTC] felipe@php.net

-Status: Assigned +Status: Closed

 [2010-11-01 18:18 UTC] felipe@php.net

This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 [2011-02-23 14:56 UTC] jaromir dot dolecek at skype dot net

Looking at the fix, it seems the same problem can still occur with the DIFF_ASSOC option.

 [2014-11-28 14:39 UTC] samantha at adrichem dot nu

Could this fix be the reason why array_diff() now (PHP 5.6) does string comparisons and no longer supports multidimensional arrays, whereas in PHP 5.3.23 it does (though the documentation says it doesn't)? I can't find anything else in the changelog.

 [2014-11-28 14:47 UTC] samantha at adrichem dot nu

Never mind: it just didn't generate an "Array to string conversion" notice before, and now it does.