How to optimize file compression, best practices
File archiving consolidates multiple input files into a single output archive, often applying integrated data compression to remove data redundancies, so the output is both smaller (saving disk space and upload/download bandwidth) and easier to handle than the separate input files - learn more: what is a compressed / archive file
Optimize the compression method for the end user's goals
A common concern when compressing data - whether for backup, file upload, or distribution - is balancing a worthwhile compression ratio against reasonably fast operation, so that, for example, end users can unpack the data in a timely fashion, or a backup process completes within a fixed maximum amount of time.
As goals and constraints vary between scenarios, the factors affecting compression efficiency must be weighed carefully with the intended use of the data in mind. The following chapters provide suggestions for choosing strategies and parameters for optimal compression results.
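The ratio/speed trade-off described above can be measured directly. The sketch below uses Python's standard zlib module (the Deflate algorithm used by ZIP) on placeholder sample data; the levels and payload are illustrative, not a benchmark.

```python
# Sketch: measure the ratio/speed trade-off of different compression levels.
# The sample payload is a placeholder; redundant text compresses well.
import time
import zlib

data = b"example payload with some repeated repeated repeated text " * 10000

for level in (1, 6, 9):  # fast, default, maximum
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(packed) / len(data)
    print(f"level {level}: ratio {ratio:.3f}, {elapsed * 1000:.1f} ms")
```

Higher levels spend more time searching for redundancy; on repetitive data the maximum level yields the smallest output, at the cost of slower operation.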
Data compression best practices
Quite obviously, the best data compression practices mean nothing if the file cannot be opened by the intended end user. If the archive needs to be shared, the first concern is which archive file types the end user is able to read - which archive formats are supported, or can be supported, on the end user's computing platform (Microsoft Windows, Google Android/ChromeOS, iOS, Apple macOS, Linux, BSD...) - and whether the user is willing and authorized to install the needed software.
So in most cases the better choice is to stay with the most common format (ZIP); RAR is quite popular on MS Windows platforms, TAR is ubiquitously supported on Unix-derived systems, and 7Z is becoming increasingly popular on all systems.
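When compatibility is the priority, producing a plain ZIP archive programmatically is straightforward. This minimal sketch uses Python's standard zipfile module; the archive and member names are placeholders.

```python
# Sketch: create a broadly compatible ZIP archive with Deflate compression.
# "docs.zip" and "readme.txt" are placeholder names.
import zipfile

with zipfile.ZipFile("docs.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("readme.txt", "hello " * 1000)

with zipfile.ZipFile("docs.zip") as zf:
    names = zf.namelist()
    ok = zf.testzip() is None  # None means every member passed the CRC check
print(names, ok)
```

Every major platform can open such an archive without installing extra software, which is exactly why ZIP is the safe default for sharing.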
Some file sharing platforms, cloud services, and e-mail providers may block certain file types on the grounds that they are commonly abused (spam, viruses, illicit content), preventing the archive from reaching the intended end user(s), so it is important to read the terms of service to avoid this issue.
Changing the file extension is usually not a solution: each archive file has a well-defined internal structure (which is required for the file to function properly, so it can hardly be cloaked), and file format recognition is seldom based on simply parsing the file extension.
In other cases, all encrypted files, or all files of unknown/unsupported formats that the service provider cannot inspect or scan for viruses, are blocked.
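The reason renaming does not cloak an archive is that formats are identified by their leading "magic" bytes. A minimal detector can be sketched as follows; the signature table covers only a few common formats.

```python
# Sketch: archive formats are recognized by their leading "magic" bytes,
# not by the file extension, so renaming an archive does not cloak it.
MAGIC_SIGNATURES = {
    b"PK\x03\x04": "zip",
    b"Rar!\x1a\x07": "rar",
    b"7z\xbc\xaf\x27\x1c": "7z",
    b"\x1f\x8b": "gzip",
}

def sniff_format(data: bytes) -> str:
    """Return the detected archive format, or 'unknown'."""
    for magic, name in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

# A renamed .zip is still detected as zip from its first bytes
print(sniff_format(b"PK\x03\x04rest-of-file"))
```

Scanning services apply the same idea: they check the content, not the name, which is why extension tricks rarely get a blocked file through.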
Keep archive size under a mandatory max size
The following sections discuss the factors that most influence compression efficiency, which ones deserve more weight when choosing the best compression strategy, and options / tips & tricks to obtain the best results.
More suggestions can be found in the compression algorithms comparison, entropy, and maximum e-mail attachment size articles on Wikipedia.
Best options for maximum compression efficiency
Evaluate the need for high compression formats and settings
The highest compression ratio is usually attained with slower and more computing-intensive algorithms: RAR compression is slower and more powerful than ZIP compression, 7Z compression is a slower and more powerful compressor than RAR, and PAQ / ZPAQ outperforms the other algorithms in terms of maximum compression ratio, but requires the most computing power.
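The gap between algorithms shows up clearly when redundancy spans a long range. This sketch compares Deflate (used by ZIP) with LZMA (used by 7Z) from Python's standard library, on synthetic data built so that only a large-dictionary algorithm can exploit the repetition.

```python
# Sketch: stronger algorithms trade speed for ratio. A 100 KB random block
# repeated ten times has redundancy far beyond Deflate's 32 KB window, so
# only LZMA's much larger dictionary can exploit it.
import lzma
import os
import zlib

block = os.urandom(100_000)
data = block * 10

deflate_size = len(zlib.compress(data, 9))
lzma_size = len(lzma.compress(data, preset=9))
print(f"input {len(data)}, deflate {deflate_size}, lzma {lzma_size}")
```

On real-world files the difference is usually smaller, but the direction is the same: the slower, larger-context compressor produces the smaller archive.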
Identify poorly compressible files
Evaluate whether to spend time compressing poorly compressible data or, rather, simply store it "as is". Some data structures contain high levels of entropy, or entropy is introduced by previous processes such as encryption or compression, making further compression efforts difficult or even useless; computing power would be more productively spent reducing the size of other types of files, leading to both better results and faster operation.
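A quick Shannon-entropy estimate can flag such data before any time is spent on it. This is a rough heuristic, sketched with the standard library; random bytes stand in for encrypted or already-compressed content.

```python
# Sketch: estimate byte-level Shannon entropy; values near 8 bits/byte
# mean the data is unlikely to compress further.
import math
import os
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0..8)."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

text = b"the quick brown fox jumps over the lazy dog " * 500
noise = os.urandom(20_000)  # stand-in for encrypted / already-compressed data

print(f"text:  {entropy_bits_per_byte(text):.2f} bits/byte")
print(f"noise: {entropy_bits_per_byte(noise):.2f} bits/byte")
```

Files scoring near 8 bits/byte (JPEG, MP3, existing archives, encrypted data) are good candidates for "store" mode rather than a high compression level.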
Evaluate solid compression advantages
Solid compression, available as an option for some archive formats like 7Z and RAR, can improve the final compression ratio. It works by providing a wider context for the compression algorithm, allowing it to reduce data redundancy across file boundaries and represent the data in a more convenient way to spare output file size.
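The effect can be imitated with standard-library tools: compressing many similar files as one stream (solid) versus one stream per file (non-solid). The "files" below are placeholder byte strings.

```python
# Sketch of why solid compression helps: one stream over many similar files
# lets the algorithm reuse redundancy across file boundaries.
import lzma

# Ten similar "files" (placeholder content sharing common boilerplate)
files = [b"config entry %d: " % i + b"shared boilerplate text " * 200
         for i in range(10)]

separate = sum(len(lzma.compress(f)) for f in files)   # non-solid: per file
solid = len(lzma.compress(b"".join(files)))            # solid: one stream
print(f"non-solid {separate} bytes, solid {solid} bytes")
```

The trade-off, not shown here, is that extracting a single file from a solid archive may require decompressing the whole block that contains it.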
You usually don't need to archive duplicate files
An obvious suggestion is to remove duplicate identical files (deduplication) wherever advisable, in order to avoid archiving redundant data. Identifying and removing duplicate files before archiving decreases the input size, improving both operation time and the final size, and at the same time makes it easier for the end user to navigate and search a tidier archive. Don't remove duplicate files if they are strictly required in their original path, e.g. by a software package or an automated procedure.
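Duplicates can be found by hashing file contents before archiving. A minimal sketch, assuming the input folder path is supplied by the caller:

```python
# Sketch: detect duplicate files by content hash before archiving.
import hashlib
from pathlib import Path

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Map content hash -> list of paths; lists longer than 1 are duplicates."""
    seen: dict[str, list[Path]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            seen.setdefault(digest, []).append(path)
    return {h: ps for h, ps in seen.items() if len(ps) > 1}
</```

For very large trees, comparing file sizes first and hashing only size-collisions avoids reading every file in full.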
Zeroing free space on virtual machines and disk images to remove non-meaningful information
The Zero delete function (File tools submenu) is intended for overwriting file data or free partition space with an all-0 stream, in order to fill the corresponding physical disk area with homogeneous, highly compressible data. This saves space when compressing disk images - whether low-level physical disk snapshots made for backup purposes, or the guest virtual disks of Virtual Machines - as the 1:1 exact copy of the disk content is not burdened by leftover data in the free space area. Some disk imaging utilities and Virtual Machine players/managers have built-in compression routines; zeroing free space beforehand is strongly recommended to improve the compression ratio.
Zero deletion also offers a basic degree of security improvement over PeaZip's "Quick delete" function, which simply removes the file from the filesystem, making it unrecoverable through the system's recycle bin but still susceptible to recovery with undelete utilities. Zero deletion, however, is not meant for advanced security; PeaZip's Secure delete should be used instead when a file needs to be securely and permanently erased, or free space on a volume needs to be sanitized for privacy reasons. Learn more about optimizing virtual machines and disk images compression.
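The benefit of zeroing is easy to demonstrate: leftover "deleted" bytes look like noise to a compressor, while an all-zero run compresses almost to nothing. A sketch with synthetic data standing in for a free-space region:

```python
# Sketch: why zeroing free space helps disk image compression.
import os
import zlib

leftover = os.urandom(1_000_000)   # stale data lingering in "free" space
zeroed = bytes(1_000_000)          # the same area after zero-filling

leftover_size = len(zlib.compress(leftover))
zeroed_size = len(zlib.compress(zeroed))
print(f"leftover: {leftover_size} bytes compressed")
print(f"zeroed:   {zeroed_size} bytes compressed")
```

A megabyte of zeros collapses to roughly a kilobyte, while the same megabyte of stale data stays near full size - multiplied over the free space of a whole disk image, the difference is substantial.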
Impact of using self extracting archives
Self extracting archives are useful to provide the end user with the appropriate extraction routines without the need to install any software, but since the extraction module is embedded in the archive, it represents an overhead of some tens or hundreds of KB. This is a noticeable disadvantage only for very small archives (e.g. approximately less than 1 MB) - which is, however, well within the size range of a typical archive of a few text documents. Moreover, since a self extracting archive is an executable file, some file sharing platforms, cloud providers, and e-mail servers may block it, preventing it from reaching the intended receiver(s).
Synopsis: How to optimize file compression ratio and speed. Best settings and options to improve file archiving and compression efficiency. Suggestions and best practices for maximum data compression performance.
Topics: how to optimize
compression of files, what are the best compression options




