What is PEA file format, features, specs
What are PEA files, format features and specifications
PEA file format specifications version 1.6
PEA file extension
Pea (.pea file extension), an acronym for Pack, Encrypt, Authenticate, designates a file format focused on data security. It aims to provide archiving, compression, and multi volume file splitting (spanning) in a single pass, along with flexible schemes for optional checksum / hash integrity checks and authenticated file encryption (AES in EAX or HMAC mode, or alternatively Twofish and Serpent in EAX mode). PEA file format specifications are released into the public domain.
PEA specifications document
PEA file format specifications and implementation notes (pdf)
PEA file format compression specs
Pea compression is optional. At the current level of implementation only the following levels are defined: PCOMPRESS0 (store only, no compression) and PCOMPRESS1..3, based on deflate (reference zlib's compress/uncompress algorithm code), at compression levels 3, 6 and 9 respectively.
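As a minimal sketch, the four modes listed above could map onto zlib as follows, using Python's standard `zlib` module. The `PCOMPRESS_LEVELS` table and the helper names are illustrative assumptions, not part of the PEA specifications, and the sketch does not reproduce how Pea frames compressed blocks inside an archive.

```python
import zlib

# Hypothetical mapping of PEA compression modes to zlib levels,
# following the levels listed in the text (None = store only).
PCOMPRESS_LEVELS = {
    "PCOMPRESS0": None,
    "PCOMPRESS1": 3,
    "PCOMPRESS2": 6,
    "PCOMPRESS3": 9,
}


def pea_compress(data: bytes, mode: str) -> bytes:
    """Compress data with deflate at the level tied to the given mode."""
    level = PCOMPRESS_LEVELS[mode]
    if level is None:
        return data  # PCOMPRESS0: data is stored unchanged
    return zlib.compress(data, level)


def pea_uncompress(data: bytes, mode: str) -> bytes:
    """Reverse pea_compress for the same mode."""
    if PCOMPRESS_LEVELS[mode] is None:
        return data
    return zlib.decompress(data)
```

Higher levels trade speed for ratio; the decompression call is the same for levels 3, 6 and 9.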
PEA file format encryption and hashing specs
The PEA format security model acts at 3 levels: objects (input files and folders sent to the .pea archive), volumes (output archive files that can be spanned to a user defined size) and streams (the actual output data stream, which is formed by multiple input files and can be written to multiple output volumes); the check at each of those levels can be omitted as needed by the user.
- Object level integrity checking is performed to detect errors with object level granularity on raw input data and all associated data (name, size, attributes, date-time);
- Volume level integrity checking is communication oriented and allows single corrupted volumes to be discarded in order to minimize, in case of error, the retransmission overhead;
- Current implementation allows the same checksum and hash algorithms featured by the object level check;
- Stream level checking offers a wide choice of algorithms up to authenticated encryption, protecting privacy and authenticity of a group of objects sharing the same security needs, including tags generated by object level checks;
- Current implementation allows the same checksum and hash algorithms featured by object and volume levels, plus authenticated encryption schemes: HMAC mode AES128; EAX mode AES128, AES256, Serpent128, Serpent256, Twofish128, Twofish256; and triple cascade encryption combining AES, Twofish, and Serpent, each 256 bit, in EAX mode.
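To illustrate the object level check described above, here is a minimal sketch that computes a control tag over an object's name, size and raw data with a few of the checksum and hash algorithms available in Python's standard library. The function name, the algorithm labels, the byte order and the way name and size are serialized are all assumptions of this sketch; the actual layout is defined in the PEA specifications document.

```python
import hashlib
import zlib


def object_control_tag(algorithm: str, name: str, data: bytes) -> bytes:
    """Return a checksum or hash covering an object's name, size and data."""
    # Hypothetical serialization: UTF-8 name, 8 byte size, then raw data.
    payload = name.encode("utf-8") + len(data).to_bytes(8, "little") + data
    if algorithm == "ADLER32":
        return zlib.adler32(payload).to_bytes(4, "big")
    if algorithm == "CRC32":
        return zlib.crc32(payload).to_bytes(4, "big")
    if algorithm == "SHA256":
        return hashlib.sha256(payload).digest()
    if algorithm == "SHA3_256":
        return hashlib.sha3_256(payload).digest()
    raise ValueError(f"unsupported algorithm: {algorithm}")


def object_is_intact(algorithm: str, name: str, data: bytes, tag: bytes) -> bool:
    """Recompute the tag at extraction time and compare it to the stored one."""
    return object_control_tag(algorithm, name, data) == tag
```

Because the tag covers the metadata as well as the data, a renamed or truncated object fails the check even when the remaining bytes are unchanged.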
PEA file format volume spanning specs
Arbitrarily sized volume spanning allows the archive to be split into volumes of arbitrary size, with the only constraint that volumes be at least 10 bytes bigger than the volume control tag, so that the minimum needed information can be passed (through the archive's header) to the extraction application.
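The spanning constraint can be sketched as follows. This is a simplified illustration, not the Pea implementation: it uses CRC32 as a stand-in volume control tag, assumes the tag is appended at the end of each volume, and the names `span` and `join` are hypothetical.

```python
import zlib

TAG_SIZE = 4      # CRC32 used here as a stand-in volume control tag
MIN_PAYLOAD = 10  # minimum bytes a volume must carry besides its tag


def span(stream: bytes, volume_size: int) -> list[bytes]:
    """Split a stream into volumes, each ending with its own control tag."""
    if volume_size < TAG_SIZE + MIN_PAYLOAD:
        raise ValueError("volume must be at least 10 bytes bigger than its tag")
    payload_size = volume_size - TAG_SIZE
    volumes = []
    for start in range(0, len(stream), payload_size):
        payload = stream[start:start + payload_size]
        volumes.append(payload + zlib.crc32(payload).to_bytes(TAG_SIZE, "big"))
    return volumes


def join(volumes: list[bytes]) -> bytes:
    """Rebuild the stream, reporting which volume (1-based) is corrupted."""
    stream = bytearray()
    for number, volume in enumerate(volumes, start=1):
        payload, tag = volume[:-TAG_SIZE], volume[-TAG_SIZE:]
        if zlib.crc32(payload).to_bytes(TAG_SIZE, "big") != tag:
            raise ValueError(f"volume {number} is corrupted")
        stream += payload
    return bytes(stream)
```

Since every volume carries an independent tag, a failed check points at the single volume that needs to be transferred again.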
PEA specs revisions
The PEA file format standard, as defined in the version 1 revision 5 specification, can store a single stream containing unlimited objects, each up to 2^64 bytes in size. The current Pea executable supports the 1.5 file format specifications (in practice, archives are limited by memory and filesystem rather than by the format) and is backward compatible with previous revisions of the format.
PEA 2.0 file format specifications extend the concepts behind the PEA 1.x file format and can store an unlimited number of streams, but the format is not currently supported by the Pea archiving utility.
PEA format specifications table: max file size, compression, security...
The following table summarizes the features and limitations that apply to the file format and to the current implementation:
| Feature | PEA file format | Current utility implementation |
|---|---|---|
| Archive | | |
| Maximum PEA archive size | Unlimited: no upper limit is set by the format design for maximum archive size, only filesystem size limitations apply | Limited to 16 YB (yottabyte), up to 999999 volumes of 2^64-1 bytes each |
| Stream number | 1.3: single stream; 2.0: unlimited number of streams | Single stream (1.3 file format) |
| Output | | |
| Security | Optional authenticated encryption, at stream level only. HMAC mode: AES128; EAX mode: AES 128 or 256 bit, Serpent 128 / 256, Twofish 128 / 256; triple cascade encryption: AES+Twofish+Serpent, Twofish+Serpent+AES, Serpent+AES+Twofish, each 256 bit in EAX mode | |
| Integrity check | AE tag (see security section) or hash or checksum at stream level, plus hash or checksum for input objects and for output volumes. Currently supported: Adler32, CRC32, CRC64 checksum algorithms; MD5, SHA1, RIPEMD-160, SHA-2 and SHA-3 families, and Whirlpool hash algorithms | |
| Error correction | No scheme featured at current level of development | |
| Communication recovery | Independent volume control checks allow corrupted volumes to be identified (the first volume may be needed to know the volume check algorithm) | No specific tool developed; volume check is done during extraction, allowing the download to be repeated only for corrupted volumes |
| Data recovery | Stream control tags allow correct streams to be recognized; if better granularity is needed, object control tags allow correct objects to be recognized; input object names and POD trigger allow objects and streams to be identified within the archive data | No specific tool developed to try error resistant data extraction; however, object check errors are reported to identify corrupted and non corrupted data if the extraction is successful |
| Support for multi volume output | Native, requires a single pass. Raw file spanning compatible with Unix split command, and applications like HJSplit and 7-Zip | |
| Volume number | 1..unlimited | 1..999999 (6 digit counter string in output file name, after .pea file extension) |
| Volume size | Volume tag size +1..unlimited; first volume must contain at least 10 bytes of data to allow parsing of the archive header, so that the unpacking application can calculate the volume tag size | Volume tag size +1..2^64-1 (qword variable); first volume must contain at least 10 bytes of data |
| Compression | Native, requires a single pass; schemes: PCOMPRESS0: no compression; PCOMPRESS1..3 based on deflate using zlib's compress/uncompress, level 3, 6 and 9 respectively | |
| Solid archive | Compression modes featuring the possibility of creating solid archives are not implemented | |
| Input | | |
| Input types | 1.3: files and dirs; 2.0: files, dirs, metadata stored as messages triggers | Files and dirs (1.3) |
| Maximum number of files / objects in a PEA archive | 1..unlimited; theoretically a PEA archive can accept an unlimited number of input files | Host system memory limited (input object list is stored in a dynamic array of strings) |
| Maximum size of input file for PEA archive | 0..2^64-1, 16 EB maximum size for each input file | 0..2^64-1, 16 EB maximum size, likely limited by underlying filesystem technology |
| Input object qualified name size (size 0 means that the archive object is a trigger, with no input object mapped to the archive object) | 1..2^16-1, 64 KB of characters under any encoding | 1..32K (exceeding needs, longer values are considered errors) |
| Metadata | Object attributes and last modification time, optionally comments and any kind of meta content using messages | Saves object attributes and object last modification time. Restores only object attributes (on Microsoft Windows), nothing on *x |
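The size limits quoted in the table follow from simple arithmetic on the counters involved. The short sketch below only reproduces that arithmetic; the constant names are illustrative and binary prefixes (EiB, YiB) are an assumption about how the rounded figures were obtained.

```python
MAX_VOLUMES = 999_999         # 6 digit volume counter in the file name
MAX_VOLUME_BYTES = 2**64 - 1  # qword size variable
MAX_NAME_BYTES = 2**16 - 1    # format limit for an object's qualified name

# Largest archive the current utility can address across all volumes.
max_archive_bytes = MAX_VOLUMES * MAX_VOLUME_BYTES
max_archive_yib = max_archive_bytes / 2**80

# Largest single input object, expressed in EiB.
max_object_eib = MAX_VOLUME_BYTES / 2**60

print(f"max archive size: {max_archive_bytes} bytes (~{max_archive_yib:.2f} YiB)")
print(f"max object size:  ~{max_object_eib:.2f} EiB")
print(f"max name size:    {MAX_NAME_BYTES} bytes")
```

With binary prefixes the archive limit comes to a little over 15 YiB, in line with the rounded 16 YB figure, and the per-object limit to 16 EiB.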
Triple cascaded encryption: AES, Twofish, Serpent each 256 bit in EAX mode
PEA supports multiple chained encryption, cascading AES, Twofish, and Serpent, 256 bit in EAX mode
- Each cipher is separately keyed through PBKDF2, scrypt (default), or both
- KDF options:
- with PBKDF2, the key schedule of each cipher is based on a different hash primitive, run for a different number of iterations: Whirlpool x 25000 for AES, SHA512 x 50000 for Twofish, SHA3-512 x 75000 for Serpent (Whirlpool is significantly slower than SHA512, which is slower than SHA3-512). PEA format revision 1.4 introduced a variable, user defined number of KDF rounds for the triple cascaded encryption, up to 25 million rounds for each of the 3 algorithms. Note that rounds are based on 512 bit hash primitives, which are more resource intensive than their 256 bit counterparts.
- with the scrypt KDF, the key schedule workload impacts not only the CPU but also memory, in order to increase resilience to dictionary attacks. Requiring from 64 MB up to 1 GB of RAM (depending on the KDF workload option) for each instance severely raises the requirements for building a hardware setup to brute force the password, making it difficult to implement such a machine with ASICs or FPGAs.
- Hybrid KDF (introduced in revision 1.6) uses scrypt for AES (as specified in the scrypt section) and for Twofish (with half the N parameter and double the r parameter, same p parameter), and uses PBKDF2 for Serpent (75,000 iterations, plus up to 25M additional iterations to increase the workload, as specified in the PBKDF2 section).
KDF workload can be increased as specified in the tenth byte of the header:
- 1: uses 128 MB RAM for scrypt KDFs, and +200K iterations for the PBKDF2 KDF
- 2: uses 256 MB RAM for scrypt KDFs, and +500K iterations for the PBKDF2 KDF
- 3: uses 512 MB RAM for scrypt KDFs, and +1M iterations for the PBKDF2 KDF
- 4: uses 1 GB RAM for scrypt KDFs, and +2M iterations for the PBKDF2 KDF
- 5: uses 1 GB RAM, with p = 2 for scrypt KDFs, and +5M iterations for the PBKDF2 KDF
- 6: uses 1 GB RAM, with p = 4 for scrypt KDFs, and +10M iterations for the PBKDF2 KDF
- 7: uses 1 GB RAM, with p = 8 for scrypt KDFs, and +25M iterations for the PBKDF2 KDF
- The key schedule of each cipher is provided with a separate 96 byte pseudorandom salt
- The password is modified when provided as input to the key schedule of each cipher; the modifications are trivial xor operations with non secret values and counters, with the sole purpose of initializing the key derivation with different values and being a further factor (alongside different salts and different hash / iteration numbers) to guarantee that the keys are statistically independent
- The password verification tag is the xor of the 3 password verification tags of each encryption function, and is written / verified only after all 3 key initialization functions are completed
- Each block between the password verification tag and the stream authentication tag is encrypted with all 3 ciphers
- A 1..128 byte block of random data is added after the password verification tag in order to mask the exact archive size (this is the first block to be encrypted / decrypted)
- Each cipher generates its own 128 bit stream authentication tag; the tags are concatenated and hashed with SHA3-384, and the SHA3-384 value is checked for verification. This requires all 3 tags to match their expected values and does not allow ciphers to be authenticated separately
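The way the three per-cipher tags are combined can be sketched with Python's standard `hashlib`. This is an illustration of the xor and SHA3-384 steps described above, not the Pea implementation: the function names are hypothetical, and the AES, Twofish, Serpent concatenation order is an assumption of the sketch.

```python
import hashlib
import hmac

TAG_BYTES = 16  # each cipher produces a 128 bit stream authentication tag


def xor_tags(tag_a: bytes, tag_b: bytes, tag_c: bytes) -> bytes:
    """Xor three equally sized password verification tags into one."""
    if not (len(tag_a) == len(tag_b) == len(tag_c)):
        raise ValueError("tags must have the same length")
    return bytes(a ^ b ^ c for a, b, c in zip(tag_a, tag_b, tag_c))


def combined_auth_tag(aes_tag: bytes, twofish_tag: bytes, serpent_tag: bytes) -> bytes:
    """Concatenate the three 128 bit tags and hash them with SHA3-384."""
    for tag in (aes_tag, twofish_tag, serpent_tag):
        if len(tag) != TAG_BYTES:
            raise ValueError("each stream authentication tag must be 128 bit")
    # Concatenation order is an assumption of this sketch.
    return hashlib.sha3_384(aes_tag + twofish_tag + serpent_tag).digest()


def verify_stream(expected: bytes, aes_tag: bytes, twofish_tag: bytes,
                  serpent_tag: bytes) -> bool:
    """Succeed only if all three tags together reproduce the stored value."""
    actual = combined_auth_tag(aes_tag, twofish_tag, serpent_tag)
    return hmac.compare_digest(expected, actual)
```

Because only the 48 byte SHA3-384 value is stored, a mismatch in any single cipher's tag makes the whole verification fail, which is why ciphers cannot be authenticated separately.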
Multiple encryption, if correctly implemented, is meant, under current understanding, to:
- Provide a larger keyspace than each single cipher, though smaller than the sum of the key lengths would suggest, due to the possibility of meet-in-the-middle type attacks. However, such a large keyspace may be overkill even in the event of significant quantum computing advancements: Grover's quantum algorithm, the best known quantum attack against unstructured search problems such as brute-force key search, provides a quadratic speed-up over classical computing. Under those assumptions, as a rule of thumb, a quantum computer will not be able to brute force a 256 bit keyspace faster than a classical machine can brute force a 128 bit keyspace, which is currently considered safe by a wide margin.
- Provide a security margin even in case all but one of the algorithms used as ciphers (or key schedule hashes) are compromised by a breakthrough in cryptanalysis, which seems unlikely given the amount of theoretical work and real life testing behind the mainstream primitives available today.
- On the other hand, the inherent added complexity makes multiple encryption more prone to implementation errors
- Performing multiple algorithms also requires more computing power and consequently reduces performance.
- Test machine: notebook with Intel Core i7-8565U CPU, 4 physical cores with hyper-threading (8 logical cores), 8 GB RAM, 512 GB PCIe NVMe SSD, NTFS filesystem
- Benchmark creation of PEA archive from 100MB input:
- 7 seconds archive creation, 3 seconds archive extraction with AES256 EAX, deflate compression, CRC32 and SHA3-256 integrity checks
- 8 seconds archive creation, 4 seconds archive extraction with Serpent 256 EAX, deflate compression, CRC32 and SHA3-256 integrity checks (slower than AES and Twofish)
- 10 seconds archive creation, 6 seconds archive extraction with AES+Twofish+Serpent 256 EAX, deflate compression, CRC32 and SHA3-256 integrity checks; the purposely slower key schedule, employed at startup for multiple encryption modes, also accounts for the extra time
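The benchmark timings above can be converted into approximate throughput figures for easier comparison. The sketch below only restates the numbers listed in the text; the dictionary layout and function name are illustrative, and the results include key schedule startup time, so they understate sustained speed on larger inputs.

```python
INPUT_MB = 100  # size of the benchmark input

# (creation seconds, extraction seconds) as reported in the list above.
BENCHMARKS = {
    "AES256 EAX": (7, 3),
    "Serpent 256 EAX": (8, 4),
    "AES+Twofish+Serpent 256 EAX": (10, 6),
}


def throughput_mb_s(seconds: float, size_mb: float = INPUT_MB) -> float:
    """Average throughput in MB/s for processing size_mb in the given time."""
    return size_mb / seconds


for name, (create_s, extract_s) in BENCHMARKS.items():
    print(f"{name}: {throughput_mb_s(create_s):.1f} MB/s creation, "
          f"{throughput_mb_s(extract_s):.1f} MB/s extraction")
```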
For a more complete explanation and discussion of the pea format specifications please see the documentation about Pea archive format design (.pdf).
Use cases for PEA archive format
SPEED
Pea format features average speed, due to its lightweight, quick Deflate-based compression algorithm and efficient encryption and hashing algorithms.
COMPRESSION RATIO
Pea format features moderate compression, due to fast Deflate-based compression, comparable with the compression ratios of GZ and classic ZIP formats, making it suitable to archive or back up large quantities of data in reasonable time.
ADVANCED OPTIONS
Pea format lacks some features of competing formats, but offers advanced security focused characteristics, such as AES-based authenticated encryption (which can optionally be replaced by Serpent or Twofish EAX mode authenticated encryption) and triple cascade encryption.
Synopsis: Pea file format specifications. What does the .pea file extension stand for? What are the Pea file format features in terms of compression ratio, compression speed, and advanced authenticated encryption options?
Topics: pea file extension specs, pea authenticated encryption


