Source code

Revision control

Copy as Markdown

Other Tools

LZMA compression↩
----------------↩
Version: 9.35↩
This file describes LZMA encoding and decoding functions written in C language.↩
LZMA is an improved version of famous LZ77 compression algorithm. ↩
It was improved in way of maximum increasing of compression ratio,↩
keeping high decompression speed and low memory requirements for ↩
decompressing.↩
Note: you can read also LZMA Specification (lzma-specification.txt from LZMA SDK)↩
Also you can look source code for LZMA encoding and decoding:↩
C/Util/Lzma/LzmaUtil.c↩
LZMA compressed file format↩
---------------------------↩
Offset Size Description↩
0 1 Special LZMA properties (lc,lp, pb in encoded form)↩
1 4 Dictionary size (little endian)↩
5 8 Uncompressed size (little endian). -1 means unknown size↩
13 Compressed data↩
ANSI-C LZMA Decoder↩
~~~~~~~~~~~~~~~~~~~↩
Please note that interfaces for ANSI-C code were changed in LZMA SDK 4.58.↩
If you want to use old interfaces you can download previous version of LZMA SDK↩
from sourceforge.net site.↩
To use ANSI-C LZMA Decoder you need the following files:↩
1) LzmaDec.h + LzmaDec.c + 7zTypes.h + Precomp.h + Compiler.h↩
Look example code:↩
C/Util/Lzma/LzmaUtil.c↩
Memory requirements for LZMA decoding↩
-------------------------------------↩
Stack usage of LZMA decoding function for local variables is not ↩
larger than 200-400 bytes.↩
LZMA Decoder uses dictionary buffer and internal state structure.↩
Internal state structure consumes↩
state_size = (4 + (1.5 << (lc + lp))) KB↩
by default (lc=3, lp=0), state_size = 16 KB.↩
How To decompress data↩
----------------------↩
LZMA Decoder (ANSI-C version) now supports 2 interfaces:↩
1) Single-call Decompressing↩
2) Multi-call State Decompressing (zlib-like interface)↩
You must use external allocator:↩
Example:↩
void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }↩
void SzFree(void *p, void *address) { p = p; free(address); }↩
ISzAlloc alloc = { SzAlloc, SzFree };↩
You can use p = p; operator to disable compiler warnings.↩
Single-call Decompressing↩
-------------------------↩
When to use: RAM->RAM decompressing↩
Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h↩
Compile defines: no defines↩
Memory Requirements:↩
- Input buffer: compressed size↩
- Output buffer: uncompressed size↩
- LZMA Internal Structures: state_size (16 KB for default settings) ↩
Interface:↩
int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,↩
const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode, ↩
ELzmaStatus *status, ISzAlloc *alloc);↩
In: ↩
dest - output data↩
destLen - output data size↩
src - input data↩
srcLen - input data size↩
propData - LZMA properties (5 bytes)↩
propSize - size of propData buffer (5 bytes)↩
finishMode - It has meaning only if the decoding reaches output limit (*destLen).↩
LZMA_FINISH_ANY - Decode just destLen bytes.↩
LZMA_FINISH_END - Stream must be finished after (*destLen).↩
You can use LZMA_FINISH_END, when you know that ↩
current output buffer covers last bytes of stream. ↩
alloc - Memory allocator.↩
Out: ↩
destLen - processed output size ↩
srcLen - processed input size ↩
Output:↩
SZ_OK↩
status:↩
LZMA_STATUS_FINISHED_WITH_MARK↩
LZMA_STATUS_NOT_FINISHED ↩
LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK↩
SZ_ERROR_DATA - Data error↩
SZ_ERROR_MEM - Memory allocation error↩
SZ_ERROR_UNSUPPORTED - Unsupported properties↩
SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).↩
If LZMA decoder sees end_marker before reaching output limit, it returns OK result,↩
and output value of destLen will be less than output buffer size limit.↩
You can use multiple checks to test data integrity after full decompression:↩
1) Check Result and "status" variable.↩
2) Check that output(destLen) = uncompressedSize, if you know real uncompressedSize.↩
3) Check that output(srcLen) = compressedSize, if you know real compressedSize. ↩
You must use correct finish mode in that case. */ ↩
Multi-call State Decompressing (zlib-like interface)↩
----------------------------------------------------↩
When to use: file->file decompressing ↩
Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h↩
Memory Requirements:↩
- Buffer for input stream: any size (for example, 16 KB)↩
- Buffer for output stream: any size (for example, 16 KB)↩
- LZMA Internal Structures: state_size (16 KB for default settings) ↩
- LZMA dictionary (dictionary size is encoded in LZMA properties header)↩
1) read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header:↩
unsigned char header[LZMA_PROPS_SIZE + 8];↩
ReadFile(inFile, header, sizeof(header)↩
2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties↩
CLzmaDec state;↩
LzmaDec_Constr(&state);↩
res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);↩
if (res != SZ_OK)↩
return res;↩
3) Init LzmaDec structure before any new LZMA stream. And call LzmaDec_DecodeToBuf in loop↩
LzmaDec_Init(&state);↩
for (;;)↩
{↩
... ↩
int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen, ↩
const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode);↩
...↩
}↩
4) Free all allocated structures↩
LzmaDec_Free(&state, &g_Alloc);↩
Look example code:↩
C/Util/Lzma/LzmaUtil.c↩
How To compress data↩
--------------------↩
Compile files: ↩
7zTypes.h↩
Threads.h ↩
LzmaEnc.h↩
LzmaEnc.c↩
LzFind.h↩
LzFind.c↩
LzFindMt.h↩
LzFindMt.c↩
LzHash.h↩
Memory Requirements:↩
- (dictSize * 11.5 + 6 MB) + state_size↩
Lzma Encoder can use two memory allocators:↩
1) alloc - for small arrays.↩
2) allocBig - for big arrays.↩
For example, you can use Large RAM Pages (2 MB) in allocBig allocator for ↩
better compression speed. Note that Windows has bad implementation for ↩
Large RAM Pages. ↩
It's OK to use same allocator for alloc and allocBig.↩
Single-call Compression with callbacks↩
--------------------------------------↩
Look example code:↩
C/Util/Lzma/LzmaUtil.c↩
When to use: file->file compressing ↩
1) you must implement callback structures for interfaces:↩
ISeqInStream↩
ISeqOutStream↩
ICompressProgress↩
ISzAlloc↩
static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }↩
static void SzFree(void *p, void *address) { p = p; MyFree(address); }↩
static ISzAlloc g_Alloc = { SzAlloc, SzFree };↩
CFileSeqInStream inStream;↩
CFileSeqOutStream outStream;↩
inStream.funcTable.Read = MyRead;↩
inStream.file = inFile;↩
outStream.funcTable.Write = MyWrite;↩
outStream.file = outFile;↩
2) Create CLzmaEncHandle object;↩
CLzmaEncHandle enc;↩
enc = LzmaEnc_Create(&g_Alloc);↩
if (enc == 0)↩
return SZ_ERROR_MEM;↩
3) initialize CLzmaEncProps properties;↩
LzmaEncProps_Init(&props);↩
Then you can change some properties in that structure.↩
4) Send LZMA properties to LZMA Encoder↩
res = LzmaEnc_SetProps(enc, &props);↩
5) Write encoded properties to header↩
Byte header[LZMA_PROPS_SIZE + 8];↩
size_t headerSize = LZMA_PROPS_SIZE;↩
UInt64 fileSize;↩
int i;↩
res = LzmaEnc_WriteProperties(enc, header, &headerSize);↩
fileSize = MyGetFileLength(inFile);↩
for (i = 0; i < 8; i++)↩
header[headerSize++] = (Byte)(fileSize >> (8 * i));↩
MyWriteFileAndCheck(outFile, header, headerSize)↩
6) Call encoding function:↩
res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable, ↩
NULL, &g_Alloc, &g_Alloc);↩
7) Destroy LZMA Encoder Object↩
LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);↩
If callback function return some error code, LzmaEnc_Encode also returns that code↩
or it can return the code like SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.↩
Single-call RAM->RAM Compression↩
--------------------------------↩
Single-call RAM->RAM Compression is similar to Compression with callbacks,↩
but you provide pointers to buffers instead of pointers to stream callbacks:↩
SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,↩
const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark, ↩
ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);↩
Return code:↩
SZ_OK - OK↩
SZ_ERROR_MEM - Memory allocation error ↩
SZ_ERROR_PARAM - Incorrect paramater↩
SZ_ERROR_OUTPUT_EOF - output buffer overflow↩
SZ_ERROR_THREAD - errors in multithreading functions (only for Mt version)↩
Defines↩
-------↩
_LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.↩
_LZMA_PROB32 - It can increase the speed on some 32-bit CPUs, but memory usage for ↩
some structures will be doubled in that case.↩
_LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler and long is 32-bit.↩
_LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use size_t type.↩
_7ZIP_PPMD_SUPPPORT - Define it if you don't want to support PPMD method in AMSI-C .7z decoder.↩
C++ LZMA Encoder/Decoder ↩
~~~~~~~~~~~~~~~~~~~~~~~~↩
C++ LZMA code use COM-like interfaces. So if you want to use it, ↩
you can study basics of COM/OLE.↩
C++ LZMA code is just wrapper over ANSI-C code.↩
C++ Notes↩
~~~~~~~~~~~~~~~~~~~~~~~~↩
If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),↩
you must check that you correctly work with "new" operator.↩
7-Zip can be compiled with MSVC 6.0 that doesn't throw "exception" from "new" operator.↩
So 7-Zip uses "CPP\Common\NewHandler.cpp" that redefines "new" operator:↩
operator new(size_t size)↩
{↩
void *p = ::malloc(size);↩
if (p == 0)↩
throw CNewException();↩
return p;↩
}↩
If you use MSCV that throws exception for "new" operator, you can compile without ↩
"NewHandler.cpp". So standard exception will be used. Actually some code of ↩
7-Zip catches any exception in internal code and converts it to HRESULT code.↩
So you don't need to catch CNewException, if you call COM interfaces of 7-Zip.↩
---↩