Make delta directly process the input stream if it has enough data. by dbaarda · Pull Request #234 · librsync/librsync
added 14 commits
September 11, 2021 13:59
Make rs_infilebuf_fill() shuffle and top up input buffers that are more than half empty. Also tidy and tighten the assert() statements in rs_infilebuf_fill() and rs_outfilebuf_drain() to require that the input/output buffers be fully contained in the rs_filebuf_t buffer.
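The following minimal C sketch illustrates the "shuffle and top up when more than half empty" fill policy. It is not librsync's rs_infilebuf_fill(). The structs demo_buffers_t and demo_filebuf_t and the function demo_infilebuf_fill() are hypothetical simplifications. Unread input is moved to the front of the backing buffer and refilled from the file only when less than half the buffer holds unread data. An assert() checks that the unread input lies entirely inside the backing buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for librsync's rs_buffers_t (input side only). */
typedef struct {
    const char *next_in; /* next unread input byte */
    size_t avail_in;     /* number of unread input bytes */
    int eof_in;          /* set once the file is exhausted */
} demo_buffers_t;

/* Hypothetical stand-in for rs_filebuf_t. */
typedef struct {
    FILE *f;        /* file being read */
    char *buf;      /* backing buffer that holds the input */
    size_t buf_len; /* size of buf */
} demo_filebuf_t;

/* Top up the input only when buf is more than half empty.
   Returns 0 on success, -1 on a read error. */
static int demo_infilebuf_fill(demo_filebuf_t *fb, demo_buffers_t *b)
{
    /* Any unread input must lie entirely inside fb->buf. */
    assert(b->avail_in == 0 ||
           (b->next_in >= fb->buf &&
            b->next_in + b->avail_in <= fb->buf + fb->buf_len));
    if (b->eof_in || b->avail_in >= fb->buf_len / 2)
        return 0; /* at EOF, or still at least half full: do nothing */
    /* Shuffle the unread tail to the front to make room. */
    if (b->avail_in > 0 && b->next_in != fb->buf)
        memmove(fb->buf, b->next_in, b->avail_in);
    b->next_in = fb->buf;
    size_t want = fb->buf_len - b->avail_in;
    size_t got = fread(fb->buf + b->avail_in, 1, want, fb->f);
    b->avail_in += got;
    if (got < want) {
        if (ferror(fb->f))
            return -1;
        b->eof_in = 1; /* a short read without an error means EOF */
    }
    return 0;
}
```

Refilling only below the half-full mark keeps reads large. It also bounds each memmove to less than half the buffer.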
Rename rs_scoop_total_avail() to rs_scoop_avail() and make it a static inline in stream.h. Remove rs_job_input_is_ending() from job.[hc] and replace it with an rs_scoop_eof() static inline in stream.h. In scoop.c, make rs_scoop_input() shuffle data to the start of the buffer only when that is needed to free up space. Also make an assert() check stricter about the data being within the buffer. Slightly tidy up rs_scoop_read_rest(). In delta.c, make rs_delta_s_slack() neater by using the new rs_scoop_*() functions.
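Here is a rough C sketch of what helpers like rs_scoop_avail() and rs_scoop_eof() compute. The struct layouts and the demo_* names are hypothetical simplifications, not librsync's definitions. The available input is whatever sits in the scoop plus whatever is still in the caller's input stream. EOF is reached only when both are empty and the stream has signalled end of input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for rs_buffers_t and rs_job_t. */
typedef struct {
    const char *next_in;
    size_t avail_in;
    int eof_in;
} demo_buffers_t;

typedef struct {
    demo_buffers_t *stream; /* caller-supplied input stream */
    const char *scoop_next; /* next unread byte in the scoop */
    size_t scoop_avail;     /* unread bytes in the scoop */
} demo_job_t;

/* Total input available: buffered in the scoop plus still in the stream. */
static inline size_t demo_scoop_avail(const demo_job_t *job)
{
    return job->scoop_avail + job->stream->avail_in;
}

/* True only when no input is left anywhere and the stream has ended. */
static inline int demo_scoop_eof(const demo_job_t *job)
{
    return demo_scoop_avail(job) == 0 && job->stream->eof_in;
}
```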
…oop. Add static inline functions to stream.h for getting and iterating through contiguous data buffers from the scoop. In tube.c, remove rs_tube_copy_from_scoop() and rs_tube_copy_from_stream() and make rs_tube_catchup_copy() simply iterate through contiguous buffers from the scoop. In rs_tube_catchup(), use rs_scoop_eof() to check for EOF instead of checking the scoop and stream directly. Remove the undefined/unused rs_buffers_copy() from stream.h.
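The idea of iterating through contiguous buffers can be sketched as follows. Pending input is the scoop first (if non-empty), then the input stream. A catch-up copy loops over those regions instead of special-casing each one. All names (demo_scoop_buf(), demo_scoop_advance(), demo_catchup_copy()) and struct layouts are hypothetical. This is a sketch of the technique, not the PR's actual stream.h helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for rs_buffers_t and rs_job_t. */
typedef struct {
    const char *next_in;
    size_t avail_in;
    int eof_in;
} demo_buffers_t;

typedef struct {
    demo_buffers_t *stream;
    const char *scoop_next;
    size_t scoop_avail;
} demo_job_t;

/* Return the first contiguous region of pending input: the scoop if it
   holds data, otherwise the input stream. *len is 0 when nothing is left. */
static inline const char *demo_scoop_buf(const demo_job_t *job, size_t *len)
{
    if (job->scoop_avail) {
        *len = job->scoop_avail;
        return job->scoop_next;
    }
    *len = job->stream->avail_in;
    return job->stream->next_in;
}

/* Consume n bytes from the region returned by demo_scoop_buf(). */
static inline void demo_scoop_advance(demo_job_t *job, size_t n)
{
    if (job->scoop_avail) {
        assert(n <= job->scoop_avail);
        job->scoop_next += n;
        job->scoop_avail -= n;
    } else {
        assert(n <= job->stream->avail_in);
        job->stream->next_in += n;
        job->stream->avail_in -= n;
    }
}

/* Copy up to want bytes of pending input to out, one contiguous region at
   a time. Returns the number of bytes copied. */
static size_t demo_catchup_copy(demo_job_t *job, char *out, size_t want)
{
    size_t done = 0;
    while (done < want) {
        size_t len;
        const char *buf = demo_scoop_buf(job, &len);
        if (len == 0)
            break; /* no more input for now */
        size_t n = len < want - done ? len : want - done;
        memcpy(out + done, buf, n);
        demo_scoop_advance(job, n);
        done += n;
    }
    return done;
}
```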
This means data is only accumulated into the scoop buffer when the input stream holds too little; otherwise it is processed directly from the input stream. In job.h, rename scoop_pos to scan_pos and add scan_buf and scan_len to point at the current scan data, which can be either in the input stream or in the scoop buffer. Also give the scoop and scan fields proper doxygen comments. In delta.c, use scan_pos, scan_buf, and scan_len instead of scoop_pos, scoop_next, and scoop_avail respectively. Change rs_getinput() to return an rs_result_t and take block_len as an argument, and have it set scan_buf and scan_len using rs_scoop_readhead() to get at least enough data to scan and emit a full miss literal command. Change rs_delta_s_scan() and rs_delta_s_flush() to call rs_tube_catchup() before rs_getinput(), so any literal data is consumed off the scoop buffer before it is refilled. Change MAX_MISS_LEN from 32K to 64K minus 3 command bytes. In whole.c, change rs_delta_file() to use buffers large enough for four 64K literal commands, which is large enough to scan without copying into the scoop buffer.
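The decision described above can be shown as a small readhead function. When the scoop is empty and the input stream already holds enough bytes, scan_buf points straight into the stream (zero copy) and scan_len covers all of it. Otherwise, just enough input is copied into the scoop to reach the requested length. The struct, the fixed 64-byte scoop, and demo_scoop_readhead() are hypothetical simplifications, not the PR's rs_scoop_readhead().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for rs_buffers_t and rs_job_t. */
typedef struct {
    const char *next_in;
    size_t avail_in;
    int eof_in;
} demo_buffers_t;

typedef struct {
    demo_buffers_t *stream; /* caller-supplied input stream */
    char scoop_buf[64];     /* fixed-size scoop, for illustration only */
    size_t scoop_avail;     /* bytes accumulated at the start of scoop_buf */
    const char *scan_buf;   /* current scan data: stream or scoop */
    size_t scan_len;        /* bytes available at scan_buf */
} demo_job_t;

/* Make at least len bytes available at job->scan_buf without consuming
   them. Returns 1 when ready (or at EOF with whatever is left), 0 when
   blocked waiting for more input. */
static int demo_scoop_readhead(demo_job_t *job, size_t len)
{
    demo_buffers_t *s = job->stream;
    if (job->scoop_avail == 0 && s->avail_in >= len) {
        /* Fast path: scan straight out of the input stream, no copy. */
        job->scan_buf = s->next_in;
        job->scan_len = s->avail_in;
        return 1;
    }
    /* Slow path: move just enough input into the scoop to reach len. */
    assert(len <= sizeof job->scoop_buf);
    size_t need = len > job->scoop_avail ? len - job->scoop_avail : 0;
    size_t n = need < s->avail_in ? need : s->avail_in;
    if (n > 0) {
        memcpy(job->scoop_buf + job->scoop_avail, s->next_in, n);
        job->scoop_avail += n;
        s->next_in += n;
        s->avail_in -= n;
    }
    job->scan_buf = job->scoop_buf;
    job->scan_len = job->scoop_avail;
    return job->scoop_avail >= len || (s->eof_in && s->avail_in == 0);
}
```

In the fast path scan_len can exceed the requested length. That is why a large input buffer can be scanned in place, possibly leaving an unscanned tail in the stream.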
This gives us a single point for defining the size used for delta commands and streaming buffers. In job.h, define MAX_DELTA_CMD as the maximum size of a single delta command, 64K. In delta.c, use MAX_DELTA_CMD to define MAX_MISS_LEN and to get the minimum readahead size in rs_getinput(). In whole.c, use MAX_DELTA_CMD to define the buffer sizes used for delta and patch operations.
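The sizing arithmetic can be written out as compile-time constants. This assumes "64K" means 1 << 16 and that the 3 command bytes are one opcode byte plus a 2-byte literal length; both are assumptions, not values taken from the PR. DEMO_FILE_BUFSIZE is a hypothetical name for the whole.c buffer size of four maximal commands.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: "64K" from the commit message, taken as 1 << 16. */
#define MAX_DELTA_CMD (1 << 16)
/* Assumption: a literal command header is 1 opcode byte + 2 length bytes. */
#define LITERAL_CMD_HDR 3
/* Largest miss that still fits in one maximal delta command. */
#define MAX_MISS_LEN (MAX_DELTA_CMD - LITERAL_CMD_HDR)
/* Hypothetical name: whole-file buffers sized for 4 maximal commands. */
#define DEMO_FILE_BUFSIZE (4 * MAX_DELTA_CMD)

/* Compile-time checks of the sizing arithmetic (C11). */
_Static_assert(MAX_MISS_LEN <= UINT16_MAX,
               "a miss length must fit in a 2-byte length field");
_Static_assert(LITERAL_CMD_HDR + MAX_MISS_LEN == MAX_DELTA_CMD,
               "a maximal literal command is exactly MAX_DELTA_CMD bytes");
```

Deriving both MAX_MISS_LEN and the buffer sizes from one constant means changing MAX_DELTA_CMD resizes everything consistently.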
It turns out ssize_t is a POSIX type that doesn't exist on Windows. I originally used it in stream.h so that negative values could indicate errors when iterating through buffers. We don't use or need that, so we can just use size_t instead.
In stream.h, remove the rs_scoop_input() declaration, and in scoop.c make the function static inline. It no longer needs to be called directly anywhere outside scoop.c.
…fers. This shows that large buffers can be processed directly and can leave a tail of data behind in the input buffer. Using large buffers avoids data copies and can be much faster.
…our. Update the docstring so it correctly describes how data is processed directly from the input stream if there is sufficient data there.