Initial class construction

João Narciso
2019-05-06 16:34:28 +02:00
parent 67f2d57e03
commit 431ff5f7d4
5813 changed files with 1622108 additions and 0 deletions

@ -0,0 +1,27 @@
Authors of XZ Utils
===================
XZ Utils is developed and maintained by Lasse Collin
<lasse.collin@tukaani.org>.
Major parts of liblzma are based on code written by Igor Pavlov,
specifically the LZMA SDK <http://7-zip.org/sdk.html>. Without
this code, XZ Utils wouldn't exist.
The SHA-256 implementation in liblzma is based on the code found in
7-Zip <http://7-zip.org/>, which has a modified version of the SHA-256
code found in Crypto++ <http://www.cryptopp.com/>. The SHA-256 code
in Crypto++ was written by Kevin Springle and Wei Dai.
Some scripts have been adapted from gzip. The original versions
were written by Jean-loup Gailly, Charles Levert, and Paul Eggert.
Andrew Dudman helped adapt the scripts and their man pages for
XZ Utils.
The GNU Autotools-based build system contains files from many authors,
which I'm not trying to list here.
Several people have contributed fixes or reported bugs. Most of them
are mentioned in the file THANKS.

@ -0,0 +1,65 @@
XZ Utils Licensing
==================
Different licenses apply to different files in this package. Here
is a rough summary of which licenses apply to which parts of this
package (but check the individual files to be sure!):
- liblzma is in the public domain.
- xz, xzdec, and lzmadec command line tools are in the public
domain unless GNU getopt_long had to be compiled and linked
in from the lib directory. The getopt_long code is under
GNU LGPLv2.1+.
- The scripts to grep, diff, and view compressed files have been
adapted from gzip. These scripts and their documentation are
under GNU GPLv2+.
- All the documentation in the doc directory and most of the
XZ Utils specific documentation files in other directories
are in the public domain.
- Translated messages are in the public domain.
- The build system contains public domain files, and files that
are under GNU GPLv2+ or GNU GPLv3+. None of these files end up
in the binaries being built.
- Test files and test code in the tests directory, and debugging
utilities in the debug directory are in the public domain.
- The extra directory may contain public domain files, and files
that are under various free software licenses.
You can do whatever you want with the files that have been put into
the public domain. If you find public domain legally problematic,
take the previous sentence as a license grant. If you still find
the lack of copyright legally problematic, you have too many
lawyers.
As usual, this software is provided "as is", without any warranty.
If you copy significant amounts of public domain code from XZ Utils
into your project, acknowledging this somewhere in your software is
polite (especially if it is proprietary, non-free software), but
naturally it is not legally required. Here is an example of a good
notice to put into an "about box" or into documentation:
This software includes code from XZ Utils <https://tukaani.org/xz/>.
The following license texts are included in the following files:
- COPYING.LGPLv2.1: GNU Lesser General Public License version 2.1
- COPYING.GPLv2: GNU General Public License version 2
- COPYING.GPLv3: GNU General Public License version 3
Note that the toolchain (compiler, linker etc.) may add some code
pieces that are copyrighted. Thus, it is possible that e.g. a liblzma
binary wouldn't actually be in the public domain in its entirety
even though it contains no copyrighted code from the XZ Utils source
package.
If you have questions, don't hesitate to ask the author(s) for more
information.

@ -0,0 +1,339 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.

@ -0,0 +1,571 @@
XZ Utils Release Notes
======================
5.2.4 (2018-04-29)
* liblzma:
- Allow 0 as memory usage limit instead of returning
LZMA_PROG_ERROR. Now 0 is treated as if 1 byte was specified,
which effectively is the same as 0.
- Use "noexcept" keyword instead of "throw()" in the public
headers when a C++11 (or newer standard) compiler is used.
- Added a portability fix for recent Intel C Compilers.
- Microsoft Visual Studio build files have been moved under
windows/vs2013 and windows/vs2017.
* xz:
- Fix "xz --list --robot missing_or_bad_file.xz" which would
try to print an uninitialized string and thus produce garbage
output. Since the exit status is non-zero, most uses of such
a command won't try to interpret the garbage output.
- "xz --list foo.xz" could print "Internal error (bug)" in a
corner case where a specific memory usage limit had been set.
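The "noexcept" versus "throw()" change above follows a common pattern for C headers that are also consumed by C++ compilers: select the exception specification at preprocessing time from `__cplusplus`. A minimal sketch, assuming a hypothetical macro `EXAMPLE_NOTHROW` and a hypothetical function `example_answer()`; it is not the text of liblzma's headers. Compiled as C, the macro expands to nothing.

```c
/* Hypothetical macro: choose the nothrow spelling for the current compiler. */
#if defined(__cplusplus)
#  if __cplusplus >= 201103L
#    define EXAMPLE_NOTHROW noexcept   /* C++11 and newer */
#  else
#    define EXAMPLE_NOTHROW throw()    /* pre-C++11 compilers */
#  endif
#else
#  define EXAMPLE_NOTHROW              /* C has no exception specifications */
#endif

#ifdef __cplusplus
extern "C" {
#endif

/* Public-header style declaration using the macro (hypothetical function). */
int example_answer(void) EXAMPLE_NOTHROW;

#ifdef __cplusplus
}
#endif

/* Definition; in C++ its specification must match the declaration. */
int example_answer(void) EXAMPLE_NOTHROW
{
    return 42;
}
```

C++17 removed dynamic exception specifications other than `throw()`, and C++20 removed `throw()` as well, so newer C++ compilers need the `noexcept` branch.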
5.2.3 (2016-12-30)
* xz:
- Always close a file before trying to delete it to avoid
problems on some operating system and file system combinations.
- Fixed copying of file timestamps on Windows.
- Added experimental (disabled by default) sandbox support using
Capsicum (FreeBSD >= 10). See --enable-sandbox in INSTALL.
* C99/C11 conformance fixes to liblzma. The issues affected at least
some builds using link-time optimizations.
* Fixed bugs in the rarely-used function lzma_index_dup().
* Use of external SHA-256 code is now disabled by default.
It can still be enabled by passing --enable-external-sha256
to configure. The reasons to disable it by default (see INSTALL
for more details):
- Some OS-specific SHA-256 implementations conflict with
OpenSSL and cause problems in programs that link against both
liblzma and libcrypto. At least FreeBSD 10 and MINIX 3.3.0
are affected.
- The internal SHA-256 is faster than the SHA-256 code in
some operating systems.
* Changed CPU core count detection to use sched_getaffinity() on
GNU/Linux and GNU/kFreeBSD.
* Fixes to the build-system and xz to make xz buildable even when
encoders, decoders, or threading have been disabled from liblzma
using configure options. These fixes added two new #defines to
config.h: HAVE_ENCODERS and HAVE_DECODERS.
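The 5.2.3 change to CPU core count detection matters because the number of CPUs a process is allowed to use (its affinity mask, e.g. as restricted with `taskset`) can be smaller than the number of online CPUs. A minimal Linux/glibc sketch, assuming the hypothetical helper name `usable_cpu_count()`; it is not xz's code. `sched_getaffinity()` and `CPU_COUNT()` are GNU extensions, hence `_GNU_SOURCE`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* sched_getaffinity() and CPU_COUNT() are GNU extensions */
#endif
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: number of CPUs the calling thread may run on.
   Returns 0 if the count cannot be determined. */
static long usable_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);

    /* pid 0 = calling thread; the mask reflects restrictions such as taskset. */
    if (sched_getaffinity(0, sizeof(set), &set) == 0)
        return CPU_COUNT(&set);

    /* Fallback: all online CPUs, ignoring the affinity mask. */
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    return online > 0 ? online : 0;
}
```

A `cpu_set_t` has room for `CPU_SETSIZE` CPUs; on systems with more CPUs than that the call can fail with `EINVAL` (the dynamically sized `CPU_ALLOC()` interface exists for that case), and this sketch simply falls back to `sysconf()`.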
5.2.2 (2015-09-29)
* Fixed bugs in QNX-specific code.
* Omitted the use of pipe2() even if it is available to avoid
portability issues with some old Linux and glibc combinations.
* Updated German translation.
* Added project files to build static and shared liblzma (not the
whole XZ Utils) with Visual Studio 2013 update 2 or later.
* Documented that threaded decompression hasn't been implemented
yet. A 5.2.0 NEWS entry describing multi-threading support had
incorrectly said "decompression" when it should have said
"compression".
5.2.1 (2015-02-26)
* Fixed a compression-ratio regression in fast mode of LZMA1 and
LZMA2. The bug is present in 5.1.4beta and 5.2.0 releases.
* Fixed a portability problem in xz that affected at least OpenBSD.
* Fixed xzdiff to be compatible with FreeBSD's mktemp which differs
from most other mktemp implementations.
* Changed CPU core count detection to use cpuset_getaffinity() on
FreeBSD.
5.2.0 (2014-12-21)
Since 5.1.4beta:
* All fixes from 5.0.8
* liblzma: Fixed lzma_stream_encoder_mt_memusage() when a preset
was used.
* xzdiff: If mktemp isn't installed, mkdir will be used as
a fallback to create a temporary directory. Installing mktemp
is still recommended.
* Updated French, German, Italian, Polish, and Vietnamese
translations.
Summary of fixes and new features added in the 5.1.x development
releases:
* liblzma:
- Added support for multi-threaded compression. See the
lzma_mt structure, lzma_stream_encoder_mt(), and
lzma_stream_encoder_mt_memusage() in <lzma/container.h>,
lzma_get_progress() in <lzma/base.h>, and lzma_cputhreads()
in <lzma/hardware.h> for details.
- Made the uses of lzma_allocator const correct.
- Added lzma_block_uncomp_encode() to create uncompressed
.xz Blocks using LZMA2 uncompressed chunks.
- Added support for LZMA_IGNORE_CHECK.
- A few speed optimizations were made.
- Added support for symbol versioning. It is enabled by default
on GNU/Linux, other GNU-based systems, and FreeBSD.
- liblzma (not the whole XZ Utils) should now be buildable
with MSVC 2013 update 2 or later using windows/config.h.
* xz:
- Fixed a race condition in the signal handling. It was
possible that e.g. the first SIGINT didn't make xz exit
if reading or writing blocked and one had bad luck. The fix
is non-trivial, so as of writing it is unknown if it will be
backported to the v5.0 branch.
- Multi-threaded compression can be enabled with the
--threads (-T) option.
[Fixed: This originally said "decompression".]
- New command line options in xz: --single-stream,
--block-size=SIZE, --block-list=SIZES,
--flush-timeout=TIMEOUT, and --ignore-check.
- xz -lvv now shows the minimum xz version that is required to
decompress the file. Currently it is 5.0.0 for all supported
.xz files, except that files with empty LZMA2 streams require 5.0.2.
* xzdiff and xzgrep now support .lzo files if lzop is installed.
The .tzo suffix is also recognized as a shorthand for .tar.lzo.
5.1.4beta (2014-09-14)
* All fixes from 5.0.6
* liblzma: Fixed the use of presets in threaded encoder
initialization.
* xz --block-list and --block-size can now be used together
in single-threaded mode. Previously the combination only
worked in multi-threaded mode.
* Added support for LZMA_IGNORE_CHECK to liblzma and made it
available in xz as --ignore-check.
* liblzma speed optimizations:
- Initialization of a new LZMA1 or LZMA2 encoder has been
optimized. (The speed of reinitializing an already-allocated
encoder isn't affected.) This helps when compressing many
small buffers with lzma_stream_buffer_encode() and other
similar situations where an already-allocated encoder state
isn't reused. This speed-up is visible in xz too if one
compresses many small files one at a time instead of running xz
once and giving all files as command-line arguments.
- Buffer comparisons are now much faster when unaligned access
is allowed (configured with --enable-unaligned-access). This
speeds up encoding significantly. There is arch-specific code
for 32-bit and 64-bit x86 (32-bit needs SSE2 for the best
results and there's no run-time CPU detection for now).
For other archs there is only generic code which probably
isn't as optimal as arch-specific solutions could be.
- A few speed optimizations were made to the SHA-256 code.
(Note that the builtin SHA-256 code isn't used on all
operating systems.)
* liblzma can now be built with MSVC 2013 update 2 or later
using windows/config.h.
* Vietnamese translation was added.
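The 5.1.4beta note on faster buffer comparisons concerns finding how many leading bytes two buffers share, which an LZ-style encoder does constantly when measuring match lengths. A portable sketch of the word-at-a-time idea, assuming a hypothetical function `common_prefix_len()`; it is not liblzma's arch-specific code. It compares 8 bytes per step and finishes a mismatching word byte by byte, which keeps the result independent of byte order; `memcpy()` keeps unaligned loads well-defined, and compilers typically turn it into a single load on targets that allow unaligned access.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: length of the common prefix of a and b,
   examining at most 'limit' bytes of each buffer. */
static size_t common_prefix_len(const uint8_t *a, const uint8_t *b,
                                size_t limit)
{
    size_t len = 0;

    /* Fast path: compare one 64-bit word at a time. */
    while (limit - len >= sizeof(uint64_t)) {
        uint64_t x, y;
        memcpy(&x, a + len, sizeof(x)); /* safe unaligned load */
        memcpy(&y, b + len, sizeof(y));
        if (x != y)
            break; /* the mismatch is inside this word */
        len += sizeof(uint64_t);
    }

    /* Slow path: locate the exact mismatch, or handle the tail. */
    while (len < limit && a[len] == b[len])
        ++len;

    return len;
}
```

Arch-specific versions usually locate the first differing byte within the mismatching word directly, for example by counting trailing zero bits of `x ^ y` on little-endian targets, instead of falling back to a byte loop.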
5.1.3alpha (2013-10-26)
* All fixes from 5.0.5
* liblzma:
- Fixed a deadlock in the threaded encoder.
- Made the uses of lzma_allocator const correct.
- Added lzma_block_uncomp_encode() to create uncompressed
.xz Blocks using LZMA2 uncompressed chunks.
- Added support for native threads on Windows and the ability
to detect the number of CPU cores.
* xz:
- Fixed a race condition in the signal handling. It was
possible that e.g. the first SIGINT didn't make xz exit
if reading or writing blocked and one had bad luck. The fix
is non-trivial, so as of writing it is unknown if it will be
backported to the v5.0 branch.
- Made the progress indicator work correctly in threaded mode.
- Threaded encoder now works together with --block-list=SIZES.
- Added preliminary support for --flush-timeout=TIMEOUT.
It can be useful for (somewhat) real-time streaming. For
now the decompression side has to be done with something
other than the xz tool due to how xz does buffering, but this
should be fixed.
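The signal-handling race fixed in 5.1.3alpha is a classic one: if a signal arrives after the program has checked its "abort" flag but before it enters a blocking read() or write(), the call can block anyway. One standard remedy is the self-pipe trick, sketched below for a POSIX system; all names are hypothetical, and this is not a description of how xz implements its fix. The handler writes a byte into a pipe, and the I/O code waits with poll() on both the real file descriptor and the pipe, so a signal that lands just before poll() still wakes it immediately.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names throughout; a sketch of the self-pipe trick. */
static int abort_pipe[2] = { -1, -1 };
static volatile sig_atomic_t user_abort = 0;

static void on_abort_signal(int sig)
{
    (void)sig;
    int saved_errno = errno; /* write() may clobber errno */
    const char byte = 0;
    user_abort = 1;
    /* write() is async-signal-safe; a full pipe (EAGAIN) is harmless. */
    ssize_t ret = write(abort_pipe[1], &byte, 1);
    (void)ret;
    errno = saved_errno;
}

/* Create the non-blocking pipe and install the SIGINT handler. */
static bool install_abort_handler(void)
{
    if (pipe(abort_pipe) != 0)
        return false;

    for (int i = 0; i < 2; ++i) {
        int flags = fcntl(abort_pipe[i], F_GETFL);
        if (flags == -1
                || fcntl(abort_pipe[i], F_SETFL, flags | O_NONBLOCK) == -1)
            return false;
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_abort_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL) == 0;
}

/* Wait until fd is ready for 'events' or the user aborts.
   Returns 1 if fd is ready (or has an error/hangup to report),
   0 on abort, and -1 if poll() fails. */
static int wait_fd_or_abort(int fd, short events)
{
    struct pollfd pfd[2] = {
        { .fd = fd, .events = events, .revents = 0 },
        { .fd = abort_pipe[0], .events = POLLIN, .revents = 0 },
    };

    for (;;) {
        if (user_abort)
            return 0;

        if (poll(pfd, 2, -1) == -1) {
            if (errno == EINTR)
                continue; /* interrupted; re-check user_abort */
            return -1;
        }

        if (pfd[1].revents & POLLIN)
            return 0;
        if (pfd[0].revents != 0)
            return 1;
    }
}
```

Making both pipe ends non-blocking means the handler never blocks even if many signals arrive in a row; once the pipe holds at least one byte, poll() is already guaranteed to wake up.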
5.1.2alpha (2012-07-04)
* All fixes from 5.0.3 and 5.0.4
* liblzma:
- Fixed a deadlock and an invalid free() in the threaded encoder.
- Added support for symbol versioning. It is enabled by default
on GNU/Linux, other GNU-based systems, and FreeBSD.
- Use SHA-256 implementation from the operating system if one is
available in libc, libmd, or libutil. liblzma won't use e.g.
OpenSSL or libgcrypt to avoid introducing new dependencies.
- Fixed liblzma.pc for static linking.
- Fixed a few portability bugs.
* xz --decompress --single-stream now fixes the input position after
successful decompression. Now the following works:
echo foo | xz > foo.xz
echo bar | xz >> foo.xz
( xz -dc --single-stream ; xz -dc --single-stream ) < foo.xz
Note that it doesn't work if the input is not seekable
or if there is Stream Padding between the concatenated
.xz Streams.
* xz -lvv now shows the minimum xz version that is required to
decompress the file. Currently it is 5.0.0 for all supported .xz
files, except that files with empty LZMA2 streams require 5.0.2.
* Added an *incomplete* implementation of --block-list=SIZES to xz.
It only works correctly in single-threaded mode and when
--block-size isn't used at the same time. --block-list allows
specifying the sizes of Blocks which can be useful e.g. when
creating files for random-access reading.
5.1.1alpha (2011-04-12)
* All fixes from 5.0.2
* liblzma fixes that will also be included in 5.0.3:
- A memory leak was fixed.
- lzma_stream_buffer_encode() no longer creates an empty .xz
Block if encoding an empty buffer. Such an empty Block with
LZMA2 data would trigger a bug in 5.0.1 and older (see the
first bullet point in 5.0.2 notes). When releasing 5.0.2,
I thought that no encoder creates this kind of file, but
I was wrong.
- Validate function arguments better in a few functions. Most
importantly, specifying an unsupported integrity check to
lzma_stream_buffer_encode() no longer creates a corrupt .xz
file. Probably no application tries to do that, so this
shouldn't be a big problem in practice.
- Document that lzma_block_buffer_encode(),
lzma_easy_buffer_encode(), lzma_stream_encoder(), and
lzma_stream_buffer_encode() may return LZMA_UNSUPPORTED_CHECK.
- The return values of the _memusage() functions are now
documented better.
* Support for multithreaded compression was added using the simplest
method, which splits the input data into blocks and compresses
them independently. Other methods will be added in the future.
The current method has room for improvement, e.g. it is possible
to reduce the memory usage.
* Added the options --single-stream and --block-size=SIZE to xz.
* xzdiff and xzgrep now support .lzo files if lzop is installed.
The .tzo suffix is also recognized as a shorthand for .tar.lzo.
* Support for short 8.3 filenames under DOS was added to xz. It is
experimental and may change before it gets into a stable release.
5.0.8 (2014-12-21)
* Fixed an old bug in xzgrep that affected OpenBSD and probably
a few other operating systems too.
* Updated French and German translations.
* Added support for detecting the amount of RAM on AmigaOS/AROS.
* Minor build system updates.
5.0.7 (2014-09-20)
* Fix regressions introduced in 5.0.6:
- Fix building with non-GNU make.
- Fix invalid Libs.private value in liblzma.pc which broke
static linking against liblzma if the linker flags were
taken from pkg-config.
5.0.6 (2014-09-14)
* xzgrep now exits with status 0 if at least one file matched.
* A few minor portability and build system fixes
5.0.5 (2013-06-30)
* lzmadec and liblzma's lzma_alone_decoder(): Support decompressing
.lzma files that have less common settings in the headers
(dictionary size other than 2^n or 2^n + 2^(n-1), or uncompressed
size greater than 256 GiB). The limitations existed to avoid false
positives when detecting .lzma files. The lc + lp <= 4 limitation
still remains since liblzma's LZMA decoder has that limitation.
NOTE: xz's .lzma support and liblzma's lzma_auto_decoder() are NOT
affected by this change. They still consider uncommon .lzma headers
as not being in the .lzma format. Changing this would give way too
many false positives.
* xz:
- Interaction of preset and custom filter chain options was
made less illogical. This affects only certain less typical
use cases, so few people are expected to notice this change.
Now when a custom filter chain option (e.g. --lzma2) is
specified, all preset options (-0 ... -9, -e) earlier on
the command line are completely forgotten. Similarly, when
a preset option is specified, all custom filter chain options
earlier on the command line are completely forgotten.
Example 1: "xz -9 --lzma2=preset=5 -e" is equivalent to "xz -e"
which is equivalent to "xz -6e". Earlier -e didn't put xz back
into preset mode and thus the example command was equivalent
to "xz --lzma2=preset=5".
Example 2: "xz -9e --lzma2=preset=5 -7" is equivalent to
"xz -7". Earlier a custom filter chain option didn't make
xz forget the -e option so the example was equivalent to
"xz -7e".
- Fixes and improvements to error handling.
- Various fixes to the man page.
* xzless: Fixed to work with "less" versions 448 and later.
* xzgrep: Made -h an alias for --no-filename.
* Include the previously missing debug/translation.bash which can
be useful for translators.
* Include a build script for Mac OS X. This has been in the Git
repository since 2010 but due to a mistake in Makefile.am the
script hasn't been included in a release tarball before.
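The .lzma header heuristic from the 5.0.5 notes can be sketched in C. The sketch assumes the 13-byte .lzma header layout:

- one properties byte encoding lc/lp/pb as (pb * 5 + lp) * 9 + lc,
- a 4-byte little-endian dictionary size,
- an 8-byte little-endian uncompressed size, where all ones means unknown.

lzma_header_is_common() and its helpers are hypothetical, and liblzma's real check may differ in details.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read an n-byte little-endian integer. */
static uint64_t read_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* True if d is 2^n or 2^n + 2^(n-1). */
static bool is_common_dict_size(uint32_t d)
{
    if (d == 0)
        return false;
    uint32_t low = d & (~d + 1);  /* lowest set bit */
    uint32_t rest = d - low;
    /* Nothing else set (2^n), or exactly the next higher bit set. */
    return rest == 0 || rest == low << 1;
}

/* Hypothetical check: does a .lzma header use "common" settings? */
static bool lzma_header_is_common(const uint8_t hdr[13])
{
    uint8_t props = hdr[0];
    if (props >= 9 * 5 * 5)
        return false;

    unsigned lc = props % 9;
    unsigned lp = (props / 9) % 5;
    if (lc + lp > 4)
        return false;  /* liblzma's LZMA decoder has this limitation */

    if (!is_common_dict_size((uint32_t)read_le(hdr + 1, 4)))
        return false;

    uint64_t usize = read_le(hdr + 5, 8);
    /* Unknown size, or at most 256 GiB (2^38 bytes). */
    return usize == UINT64_MAX || usize <= (UINT64_C(1) << 38);
}
```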
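The two preset examples in the 5.0.5 notes can be checked against a small model of the new rule. This is a hypothetical sketch, not xz's option parser:

- It models only -0 ... -9, -e, and --lzma2, and ignores the value of --lzma2.
- It assumes level 6 is used when no preset level is given, as Example 1 implies.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the option state; not xz's real code. */
struct preset_state {
    unsigned level;  /* -0 ... -9; 6 when none is given */
    bool extreme;    /* -e */
    bool custom;     /* a custom filter chain (--lzma2) is active */
};

/* Apply the 5.0.5 rules to argv[0..argc) and return the final state. */
static struct preset_state parse_presets(int argc, const char *argv[])
{
    struct preset_state s = { 6, false, false };

    for (int i = 0; i < argc; ++i) {
        const char *arg = argv[i];

        if (strncmp(arg, "--lzma2", 7) == 0) {
            /* Custom filter chain: forget all earlier preset options. */
            s.level = 6;
            s.extreme = false;
            s.custom = true;
        } else if (arg[0] == '-' && arg[1] != '-' && arg[1] != '\0') {
            /* Preset option(s) like -9, -e, or -9e: forget the chain. */
            s.custom = false;
            for (const char *p = arg + 1; *p != '\0'; ++p) {
                if (*p >= '0' && *p <= '9')
                    s.level = (unsigned)(*p - '0');
                else if (*p == 'e')
                    s.extreme = true;
            }
        }
    }
    return s;
}
```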
5.0.4 (2012-06-22)
* liblzma:
- Fix lzma_index_init(). It could crash if memory allocation
failed.
- Fix the possibility of an incorrect LZMA_BUF_ERROR when a BCJ
filter is used and the application only provides exactly as
much output space as is the uncompressed size of the file.
- Fix a bug in doc/examples_old/xz_pipe_decompress.c. It didn't
check if the last call to lzma_code() really returned
LZMA_STREAM_END, which made the program think that truncated
files are valid.
- New example programs in doc/examples (old programs are now in
doc/examples_old). These have more comments and more detailed
error handling.
* Fix "xz -lvv foo.xz". It could crash on some corrupted files.
* Fix output of "xz --robot -lv" and "xz --robot -lvv" which
incorrectly printed the filename also in the "foo (x/x)" format.
* Fix exit status of "xzdiff foo.xz bar.xz".
* Fix exit status of "xzgrep foo binary_file".
* Fix portability to EBCDIC systems.
* Fix a configure issue on AIX with the XL C compiler. See INSTALL
for details.
* Update French, German, Italian, and Polish translations.
5.0.3 (2011-05-21)
* liblzma fixes:
- A memory leak was fixed.
- lzma_stream_buffer_encode() no longer creates an empty .xz
Block if encoding an empty buffer. Such an empty Block with
LZMA2 data would trigger a bug in 5.0.1 and older (see the
first bullet point in 5.0.2 notes). When releasing 5.0.2,
I thought that no encoder creates this kind of file, but
I was wrong.
- Validate function arguments better in a few functions. Most
importantly, specifying an unsupported integrity check to
lzma_stream_buffer_encode() no longer creates a corrupt .xz
file. Probably no application tries to do that, so this
shouldn't be a big problem in practice.
- Document that lzma_block_buffer_encode(),
lzma_easy_buffer_encode(), lzma_stream_encoder(), and
lzma_stream_buffer_encode() may return LZMA_UNSUPPORTED_CHECK.
- The return values of the _memusage() functions are now
documented better.
* Fix command name detection in xzgrep. xzegrep and xzfgrep now
correctly use egrep and fgrep instead of grep.
* French translation was added.
5.0.2 (2011-04-01)
* LZMA2 decompressor now correctly accepts LZMA2 streams with no
uncompressed data. Previously it considered them corrupt. The
bug can affect applications that use raw LZMA2 streams. It is
very unlikely to affect .xz files because no compressor creates
.xz files with empty LZMA2 streams. (Empty .xz files are a
different thing than empty LZMA2 streams.)
* "xz --suffix=.foo filename.foo" now refuses to compress the
file due to it already having the suffix .foo. It was already
documented on the man page, but the code lacked the test.
* "xzgrep -l foo bar.xz" works now.
* Polish translation was added.
5.0.1 (2011-01-29)
* xz --force now (de)compresses files that have setuid, setgid,
or sticky bit set and files that have multiple hard links.
The man page had it documented this way already, but the code
had a bug.
* gzip and bzip2 support in xzdiff was fixed.
* Portability fixes
* Minor fix to Czech translation
5.0.0 (2010-10-23)
Only the most important changes compared to 4.999.9beta are listed
here. One change is especially important:
* The memory usage limit is now disabled by default. Some scripts
written before this change may have used --memory=max on xz command
line or in XZ_OPT. THESE USES OF --memory=max SHOULD BE REMOVED
NOW, because they interfere with the user's ability to set the memory
usage limit himself. If a user-specified limit causes problems for
your script, blame the user.
Other significant changes:
* Added support for XZ_DEFAULTS environment variable. This variable
allows users to set default options for xz, e.g. default memory
usage limit or default compression level. Scripts that use xz
must never set or unset XZ_DEFAULTS. Scripts should use XZ_OPT
instead if they need a way to pass options to xz via an
environment variable.
* The compression settings associated with the preset levels
-0 ... -9 have been changed. --extreme was changed a little too.
It is now less likely to make compression worse, but with some
files the new --extreme may compress slightly worse than the old
--extreme.
* If a preset level (-0 ... -9) is specified after custom filter
chain options have been used (e.g. --lzma2), the custom filter
chain will be forgotten. Earlier the preset options were
completely ignored after custom filter chain options had been
seen.
* xz will create sparse files when decompressing if the uncompressed
data contains long sequences of binary zeros. This is done even
when writing to standard output that is connected to a regular
file and certain additional conditions are met to make it safe.
* Support for "xz --list" was added. Combine with --verbose or
--verbose --verbose (-vv) for detailed output.
* I had hoped that liblzma API would have been stable after
4.999.9beta, but there have been a couple of changes in the
advanced features, which don't affect most applications:
- Index handling code was revised. If you were using the old
API, you will get a compiler error (so it's easy to notice).
- A subtle but important change was made to the Block handling
API. lzma_block.version has to be initialized even for
lzma_block_header_decode(). Code that doesn't do it will work
for now, but might break in the future, which makes this API
change easy to miss.
* The major soname has been bumped to 5.0.0. liblzma API and ABI
are now stable, so the need to recompile programs linking against
liblzma shouldn't arise soon.

@@ -0,0 +1,308 @@
XZ Utils
========
0. Overview
1. Documentation
1.1. Overall documentation
1.2. Documentation for command-line tools
1.3. Documentation for liblzma
2. Version numbering
3. Reporting bugs
4. Translating the xz tool
5. Other implementations of the .xz format
6. Contact information
0. Overview
-----------
XZ Utils provide a general-purpose data-compression library plus
command-line tools. The native file format is the .xz format, but
the legacy .lzma format is also supported. The .xz format supports
multiple compression algorithms, which are called "filters" in the
context of XZ Utils. The primary filter is currently LZMA2. With
typical files, XZ Utils create about 30 % smaller files than gzip.
To ease adapting support for the .xz format into existing applications
and scripts, the API of liblzma is somewhat similar to the API of the
popular zlib library. For the same reason, the command-line tool xz
has a command-line syntax similar to that of gzip.
When aiming for the highest compression ratio, the LZMA2 encoder uses
a lot of CPU time and may use, depending on the settings, even
hundreds of megabytes of RAM. However, in fast modes, the LZMA2 encoder
competes with bzip2 in compression speed, RAM usage, and compression
ratio.
LZMA2 is reasonably fast to decompress. It is a little slower than
gzip, but a lot faster than bzip2. Being fast to decompress means
that the .xz format is especially nice when the same file will be
decompressed very many times (usually on different computers), which
is the case e.g. when distributing software packages. In such
situations, it's not too bad if the compression takes some time,
since that needs to be done only once to benefit many people.
With some file types, combining (or "chaining") LZMA2 with an
additional filter can improve the compression ratio. A filter chain may
contain up to four filters, although usually only one or two are used.
For example, putting a BCJ (Branch/Call/Jump) filter before LZMA2
in the filter chain can improve compression ratio of executable files.
Since the .xz format allows adding new filter IDs, it is possible that
some day there will be a filter that is, for example, much faster to
compress than LZMA2 (but probably with worse compression ratio).
Similarly, it is possible that some day there will be a filter that
compresses better than LZMA2.
XZ Utils supports multithreaded compression. It doesn't support
multithreaded decompression yet, but that has been planned and was
taken into account when designing the .xz file format.
1. Documentation
----------------
1.1. Overall documentation
README This file
INSTALL.generic Generic install instructions for those not familiar
with packages using GNU Autotools
INSTALL Installation instructions specific to XZ Utils
PACKAGERS Information to packagers of XZ Utils
COPYING XZ Utils copyright and license information
COPYING.GPLv2 GNU General Public License version 2
COPYING.GPLv3 GNU General Public License version 3
COPYING.LGPLv2.1 GNU Lesser General Public License version 2.1
AUTHORS The main authors of XZ Utils
THANKS Incomplete list of people who have helped making
this software
NEWS User-visible changes between XZ Utils releases
ChangeLog Detailed list of changes (commit log)
TODO Known bugs and some sort of to-do list
Note that only some of the above files are included in binary
packages.
1.2. Documentation for command-line tools
The command-line tools are documented as man pages. In source code
releases (and possibly also in some binary packages), the man pages
are also provided in plain text (ASCII only) and PDF formats in the
directory "doc/man" to make the man pages more accessible to those
whose operating system doesn't provide an easy way to view man pages.
1.3. Documentation for liblzma
The liblzma API headers include short docs about each function
and data type as Doxygen tags. These docs should be quite OK as
a quick reference.
There are a few example programs in the directory "doc/examples".
Thanks to their comments, they should work as a tutorial to various
features of liblzma.
If you have never used liblzma, libbzip2, or zlib, I recommend
learning the *basics* of the zlib API. Once you know that, it should
be easier to learn liblzma.
http://zlib.net/manual.html
http://zlib.net/zlib_how.html
2. Version numbering
--------------------
The version number format of XZ Utils is X.Y.ZS:
- X is the major version. When this is incremented, the library
API and ABI break.
- Y is the minor version. It is incremented when new features
are added without breaking the existing API or ABI. An even Y
indicates a stable release and an odd Y indicates unstable
(alpha or beta version).
- Z is the revision. This has a different meaning for stable and
unstable releases:
* Stable: Z is incremented when bugs get fixed without adding
any new features. This is intended to be convenient for
downstream distributors that want bug fixes but don't want
any new features to minimize the risk of introducing new bugs.
* Unstable: Z is just a counter. API or ABI of features added
in earlier unstable releases having the same X.Y may break.
- S indicates stability of the release. It is missing from the
stable releases, where Y is an even number. When Y is odd, S
is either "alpha" or "beta" to make it very clear that such
versions are not stable releases. The same X.Y.Z combination is
not used for more than one stability level, i.e. after X.Y.Zalpha,
the next version can be X.Y.(Z+1)beta but not X.Y.Zbeta.
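The rules above can be expressed as a small validator. parse_xz_version() is a hypothetical helper, not part of XZ Utils. It accepts a version string only if S is empty for even Y, and "alpha" or "beta" for odd Y.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parsed form of an X.Y.ZS version string such as "5.1.1alpha". */
struct xz_version {
    unsigned x, y, z;
    char stability[8];  /* "", "alpha", or "beta" */
};

/* Hypothetical helper: parse and validate an X.Y.ZS version string. */
static bool parse_xz_version(const char *str, struct xz_version *v)
{
    int consumed = 0;
    if (sscanf(str, "%u.%u.%u%n", &v->x, &v->y, &v->z, &consumed) != 3)
        return false;

    /* Whatever follows Z is the stability suffix S. */
    const char *s = str + consumed;
    if (strlen(s) >= sizeof(v->stability))
        return false;
    strcpy(v->stability, s);

    if (v->y % 2 == 0)
        return *s == '\0';  /* stable: no suffix allowed */
    return strcmp(s, "alpha") == 0 || strcmp(s, "beta") == 0;
}
```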
3. Reporting bugs
-----------------
Naturally it is easiest for me if you already know what causes the
unexpected behavior. Even better if you have a patch to propose.
However, quite often the reason for unexpected behavior is unknown,
so here are a few things to do before sending a bug report:
1. Try to create a small example of how to reproduce the issue.
2. Compile XZ Utils with debugging code using configure switches
--enable-debug and, if possible, --disable-shared. If you are
using GCC, use CFLAGS='-O0 -ggdb3'. Don't strip the resulting
binaries.
3. Turn on core dumps. The exact command depends on your shell;
for example in GNU bash it is done with "ulimit -c unlimited",
and in tcsh with "limit coredumpsize unlimited".
4. Try to reproduce the suspected bug. If you get an "assertion failed"
message, be sure to include the complete message in your bug
report. If the application leaves a coredump, get a backtrace
using gdb:
$ gdb /path/to/app-binary # Load the app to the debugger.
(gdb) core core # Open the coredump.
(gdb) bt # Print the backtrace. Copy & paste to bug report.
(gdb) quit # Quit gdb.
Report your bug via email or IRC (see Contact information below).
Don't send core dump files or any executables. If you have small
example files (total size less than 256 KiB), please attach them
to the report. If you have bigger test files, put them online
somewhere and include URLs to the files in the bug report.
Always include the exact version number of XZ Utils in the bug report.
If you are using a snapshot from the git repository, use "git describe"
to get the exact snapshot version. If you are using XZ Utils shipped
in an operating system distribution, mention the distribution name,
distribution version, and exact xz package version; if you cannot
repeat the bug with the code compiled from unpatched source code,
you probably need to report a bug to your distribution's bug tracking
system.
4. Translating the xz tool
--------------------------
The messages from the xz tool have been translated into a few
languages. Before starting to translate into a new language, ask
the author whether someone else has already started working on it.
Test your translation. Testing includes comparing the translated
output to the original English version by running the same commands
in both your target locale and with LC_ALL=C. Ask someone to
proof-read and test the translation.
Testing can be done e.g. by installing xz into a temporary directory:
./configure --disable-shared --prefix=/tmp/xz-test
# <Edit the .po file in the po directory.>
make -C po update-po
make install
bash debug/translation.bash | less
bash debug/translation.bash | less -S # For --list outputs
Repeat the above as needed (no need to re-run configure though).
Note especially the following:
- The output of --help and --long-help must look nice on
an 80-column terminal. It's OK to add extra lines if needed.
- In contrast, don't add extra lines to error messages and such.
They are often preceded with e.g. a filename on the same line,
so you have no way to predict where to put a \n. Let the terminal
do the wrapping even if it looks ugly. Adding new lines will be
even uglier in the generic case even if it looks nice in a few
limited examples.
- Be careful with column alignment in tables and table-like output
(--list, --list --verbose --verbose, --info-memory, --help, and
--long-help):
* All descriptions of options in --help should start in the
same column (but it doesn't need to be the same column as
in the English messages; just be consistent if you change it).
Check that both --help and --long-help look OK, since they
share several strings.
* --list --verbose and --info-memory print lines that have
the format "Description: %s". If you need a longer
description, you can put extra space between the colon
and %s. Then you may need to add extra space to other
strings too so that the result as a whole looks good (all
values start at the same column).
* The columns of the actual tables in --list --verbose --verbose
should be aligned properly. Abbreviate if necessary. It might
be good to keep at least 2 or 3 spaces between column headings
and avoid spaces in the headings so that the columns stand out
better, but this is a matter of opinion. Do what you think
looks best.
- Be careful to put a period at the end of a sentence when the
original version has it, and don't put it when the original
doesn't have it. Similarly, be careful with \n characters
at the beginning and end of the strings.
- Read the TRANSLATORS comments that have been extracted from the
source code and included in xz.pot. If they suggest testing the
translation with some type of command, do it. If testing needs
input files, use e.g. tests/files/good-*.xz.
- When updating the translation, read the fuzzy (modified) strings
carefully, and don't mark them as updated before you actually
have updated them. Reading through the unchanged messages can be
good too; sometimes you may find a better wording for them.
- If you find language problems in the original English strings,
feel free to suggest improvements. Ask if something is unclear.
- The translated messages should be understandable (sometimes this
may be a problem with the original English messages too). Don't
make a direct word-by-word translation from English especially if
the result doesn't sound good in your language.
In short, take your time and pay attention to the details. Making
a good translation is not a quick and trivial thing to do. The
translated xz should look as polished as the English version.
5. Other implementations of the .xz format
------------------------------------------
7-Zip and the p7zip port of 7-Zip support the .xz format starting
from version 9.00alpha.
http://7-zip.org/
http://p7zip.sourceforge.net/
XZ Embedded is a limited implementation written for use in the Linux
kernel, but it is also suitable for other embedded use.
https://tukaani.org/xz/embedded.html
6. Contact information
----------------------
If you have questions, bug reports, patches etc. related to XZ Utils,
contact Lasse Collin <lasse.collin@tukaani.org> (in Finnish or English).
I'm sometimes slow at replying. If you haven't received a reply within
two weeks, assume that your email has been lost and resend it or use IRC.
You can also find me on #tukaani on Freenode; my nick is Larhzu.
The channel tends to be pretty quiet, so just ask your question and
someone may wake up.

@@ -0,0 +1,124 @@
Thanks
======
Some people have helped more, some less, but nevertheless everyone's help
has been important. :-) In alphabetical order:
- Mark Adler
- H. Peter Anvin
- Jeff Bastian
- Nelson H. F. Beebe
- Karl Berry
- Anders F. Björklund
- Emmanuel Blot
- Melanie Blower
- Martin Blumenstingl
- Ben Boeckel
- Jakub Bogusz
- Maarten Bosmans
- Trent W. Buck
- James Buren
- David Burklund
- Daniel Mealha Cabrita
- Milo Casagrande
- Marek Černocký
- Tomer Chachamu
- Gabi Davar
- Chris Donawa
- Andrew Dudman
- Markus Duft
- İsmail Dönmez
- Robert Elz
- Gilles Espinasse
- Denis Excoffier
- Michael Felt
- Michael Fox
- Mike Frysinger
- Daniel Richard G.
- Bill Glessner
- Jason Gorski
- Juan Manuel Guerrero
- Diederik de Haas
- Joachim Henke
- Christian Hesse
- Vincenzo Innocente
- Peter Ivanov
- Jouk Jansen
- Jun I Jin
- Per Øyvind Karlsen
- Thomas Klausner
- Richard Koch
- Ville Koskinen
- Jan Kratochvil
- Christian Kujau
- Stephan Kulow
- Peter Lawler
- James M Leddy
- Hin-Tak Leung
- Andraž 'ruskie' Levstik
- Cary Lewis
- Wim Lewis
- Eric Lindblad
- Lorenzo De Liso
- Bela Lubkin
- Gregory Margo
- Jim Meyering
- Arkadiusz Miskiewicz
- Conley Moorhous
- Rafał Mużyło
- Adrien Nader
- Evan Nemerson
- Hongbo Ni
- Jonathan Nieder
- Andre Noll
- Peter O'Gorman
- Peter Pallinger
- Rui Paulo
- Igor Pavlov
- Diego Elio Pettenò
- Elbert Pol
- Mikko Pouru
- Rich Prohaska
- Trần Ngọc Quân
- Pavel Raiskup
- Ole André Vadla Ravnås
- Robert Readman
- Bernhard Reutner-Fischer
- Eric S. Raymond
- Cristian Rodríguez
- Christian von Roques
- Torsten Rupp
- Jukka Salmi
- Alexandre Sauvé
- Benno Schulenberg
- Andreas Schwab
- Dan Shechter
- Stuart Shelton
- Sebastian Andrzej Siewior
- Brad Smith
- Pippijn van Steenhoven
- Jonathan Stott
- Dan Stromberg
- Vincent Torri
- Paul Townsend
- Mohammed Adnène Trojette
- Alexey Tourbin
- Patrick J. Volkerding
- Martin Väth
- Adam Walling
- Christian Weisgerber
- Bert Wesarg
- Fredrik Wikstrom
- Jim Wilcoxson
- Ralf Wildenhues
- Charles Wilson
- Lars Wirzenius
- Pilorz Wojciech
- Ryan Young
- Andreas Zieringer
Also thanks to all the people who have participated in the Tukaani project.
I have probably forgotten to add some names to the above list. Sorry about
that and thanks for your help.

@@ -0,0 +1,111 @@
XZ Utils To-Do List
===================
Known bugs
----------
The test suite is too incomplete.
If the memory usage limit is less than about 13 MiB, xz is unable to
automatically scale down the compression settings enough even though
it would be possible by switching from BT2/BT3/BT4 match finder to
HC3/HC4.
XZ Utils compress some files significantly worse than LZMA Utils.
This is due to faster compression presets used by XZ Utils, and
can often be worked around by using "xz --extreme". With some files
--extreme isn't enough though: it's most likely with files that
compress extremely well, so going from a compression ratio of 0.003
to 0.004 means a big (33 %) relative increase in the compressed file size.
xz doesn't quote unprintable characters when it displays file names
given on the command line.
tuklib_exit() doesn't block signals => EINTR is possible.
SIGTSTP is not handled. If xz is stopped, the estimated remaining
time and calculated (de)compression speed won't make sense in the
progress indicator (xz --verbose).
If liblzma has created threads and fork() gets called, liblzma
code will break in the child process unless it calls exec() and
doesn't touch liblzma.
Missing features
----------------
Add support for storing metadata in .xz files. A preliminary
idea is to create a new Stream type for metadata. When both
metadata and data are wanted in the same .xz file, two or more
Streams would be concatenated.
The state stored in lzma_stream should be cloneable, which would
be mostly useful when using a preset dictionary in LZMA2, but
it may have other uses too. Compare to deflateCopy() in zlib.
Support LZMA_FINISH in raw decoder to indicate end of LZMA1 and
other streams that don't have an end of payload marker.
Adjust dictionary size when the input file size is known.
Maybe do this only if an option is given.
xz doesn't support copying extended attributes, access control
lists etc. from source to target file.
Multithreaded compression:
- Reduce memory usage of the current method.
- Implement threaded match finders.
- Implement pigz-style threading in LZMA2.
Multithreaded decompression
Buffer-to-buffer coding could use less RAM (especially when
decompressing LZMA1 or LZMA2).
I/O library is not implemented (similar to gzopen() in zlib).
It will be a separate library that supports uncompressed, .gz,
.bz2, .lzma, and .xz files.
Support changing lzma_options_lzma.mode with lzma_filters_update().
Support LZMA_FULL_FLUSH for lzma_stream_decoder() to stop at
Block and Stream boundaries.
lzma_strerror() to convert lzma_ret to human readable form?
This is tricky, because the same error codes are used with
slightly different meanings, and this cannot be fixed anymore.
Make it possible to adjust LZMA2 options in the middle of a Block
so that the encoding speed vs. compression ratio can be optimized
when the compressed data is streamed over network.
Improved BCJ filters. The current filters are small but they aren't
so great when compressing binary packages that contain various file
types. Specifically, they make things worse if there are static
libraries or Linux kernel modules. The filtering could also be
more effective (without getting overly complex), for example,
streamable variant BCJ2 from 7-Zip could be implemented.
Filter that autodetects specific data types in the input stream
and applies appropriate filters for the correct parts of the input.
Perhaps combine this with the BCJ filter improvement point above.
Long-range LZ77 method as a separate filter or as a new LZMA2
match finder.
Documentation
-------------
More tutorial programs are needed for liblzma.
Document the LZMA1 and LZMA2 algorithms.
Miscellaneous
-------------
Try to get the media type for .xz registered at IANA.

@@ -0,0 +1,31 @@
liblzma example programs
========================
Introduction
The examples are written so that the same comments aren't
repeated (much) in later files.
On POSIX systems, the examples should build by just typing "make".
The examples that use stdin or stdout don't set stdin and stdout
to binary mode. On systems where it matters (e.g. Windows) it is
possible that the examples won't work without modification.
List of examples
01_compress_easy.c Multi-call compression using
a compression preset
02_decompress.c Multi-call decompression
03_compress_custom.c Like 01_compress_easy.c but using
a custom filter chain
(x86 BCJ + LZMA2)
04_compress_easy_mt.c Multi-threaded multi-call
compression using a compression
preset

@@ -0,0 +1,24 @@
#
# Author: Lasse Collin
#
# This file has been put into the public domain.
# You can do whatever you want with this file.
#
CC = c99
CFLAGS = -g
LDFLAGS = -llzma
PROGS = \
01_compress_easy \
02_decompress \
03_compress_custom \
04_compress_easy_mt
all: $(PROGS)
.c:
	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
clean:
	-rm -f $(PROGS)

@@ -0,0 +1,224 @@
XZ Utils FAQ
============
Q: What do the letters XZ mean?
A: Nothing. They are just two letters, which come from the file format
suffix .xz. The .xz suffix was selected, because it seemed to be
pretty much unused. It has no deeper meaning.
Q: What are LZMA and LZMA2?
A: LZMA stands for Lempel-Ziv-Markov chain-Algorithm. It is the name
of the compression algorithm designed by Igor Pavlov for 7-Zip.
LZMA is based on LZ77 and range encoding.
LZMA2 is an updated version of the original LZMA to fix a couple of
practical issues. In the context of XZ Utils, LZMA is called LZMA1 to
emphasize that LZMA is not the same thing as LZMA2. LZMA2 is the
primary compression algorithm in the .xz file format.
Q: There are many LZMA related projects. How does XZ Utils relate to them?
A: 7-Zip and LZMA SDK are the original projects. LZMA SDK is roughly
a subset of the 7-Zip source tree.
p7zip is 7-Zip's command-line tools ported to POSIX-like systems.
LZMA Utils provide a gzip-like lzma tool for POSIX-like systems.
LZMA Utils are based on LZMA SDK. XZ Utils are the successor to
LZMA Utils.
There are several other projects using LZMA. Most are more or less
based on LZMA SDK. See <http://7-zip.org/links.html>.
Q: Why is liblzma named liblzma if its primary file format is .xz?
Shouldn't it be e.g. libxz?
A: When the designing of the .xz format began, the idea was to replace
the .lzma format and use the same .lzma suffix. It would have been
quite OK to reuse the suffix when there were very few .lzma files
around. However, the old .lzma format became popular before the
new format was finished. The new format was renamed to .xz but the
name of liblzma wasn't changed.
Q: Do XZ Utils support the .7z format?
A: No. Use 7-Zip (Windows) or p7zip (POSIX-like systems) to handle .7z
files.
Q: I have many .tar.7z files. Can I convert them to .tar.xz without
spending hours recompressing the data?
A: In the "extra" directory, there is a script named 7z2lzma.bash which
is able to convert some .7z files to the .lzma format (not .xz). It
needs the 7za (or 7z) command from p7zip. The script may silently
produce corrupt output if certain assumptions are not met, so
decompress the resulting .lzma file and compare it against the
original before deleting the original file!
Q: I have many .lzma files. Can I quickly convert them to the .xz format?
A: For now, no. Since XZ Utils supports the .lzma format, it's usually
not too bad to keep the old files in the old format. If you want to
do the conversion anyway, you need to decompress the .lzma files and
then recompress to the .xz format.
Technically, there is a way to make the conversion relatively fast
(roughly twice the time that normal decompression takes). Writing
such a tool would take quite a bit of time though, and would probably
be useful to only a few people. If you really want such a conversion
tool, contact Lasse Collin and offer some money.
Q: I have installed xz, but my tar doesn't recognize .tar.xz files.
How can I extract .tar.xz files?
A: xz -dc foo.tar.xz | tar xf -
Q: Can I recover parts of a broken .xz file (e.g. a corrupted CD-R)?
A: It may be possible if the file consists of multiple blocks, which
typically is not the case if the file was created in single-threaded
mode. There is no recovery program yet.
Q: Is (some part of) XZ Utils patented?
A: Lasse Collin is not aware of any patents that could affect XZ Utils.
However, due to the nature of software patents, it's not possible to
guarantee that XZ Utils isn't affected by any third party patent(s).
Q: Where can I find documentation about the file format and algorithms?
A: The .xz format is documented in xz-file-format.txt. It is a container
format only, and doesn't include descriptions of any non-trivial
filters.
Documenting LZMA and LZMA2 is planned, but for now, there is no other
documentation than the source code. Before you begin, you should know
the basics of LZ77 and range-coding algorithms. LZMA is based on LZ77,
but LZMA is a lot more complex. Range coding is used to compress
the final bitstream like Huffman coding is used in Deflate.
Q: I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?
A: BCJ filter is called "x86" in liblzma. BCJ2 is not included,
because it requires using more than one encoded output stream.
A streamable version of BCJ2-style filtering is planned.
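To illustrate what BCJ-style filtering does, here is a deliberately simplified sketch. It is NOT liblzma's x86 filter, which uses additional heuristics to avoid converting bytes that merely look like CALL instructions. toy_bcj_x86() is a hypothetical name. It converts the relative target of each E8 (CALL rel32) opcode to an absolute address when encoding, and reverses the conversion when decoding. Repeated calls to the same function then become identical byte sequences that LZMA2 can match.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy x86 BCJ transform (not liblzma's filter). start_pos is the
 * position of buf[0] in the whole stream; encoding != 0 converts
 * relative -> absolute, encoding == 0 converts back.
 */
static void toy_bcj_x86(uint8_t *buf, size_t size, uint32_t start_pos,
                        int encoding)
{
    for (size_t i = 0; i + 5 <= size; ++i) {
        if (buf[i] != 0xE8)
            continue;

        /* 32-bit little-endian operand following the opcode. */
        uint32_t v = (uint32_t)buf[i + 1]
                   | (uint32_t)buf[i + 2] << 8
                   | (uint32_t)buf[i + 3] << 16
                   | (uint32_t)buf[i + 4] << 24;

        /* Address of the next instruction; arithmetic wraps mod 2^32. */
        uint32_t next = start_pos + (uint32_t)i + 5;
        v = encoding ? v + next : v - next;

        buf[i + 1] = (uint8_t)v;
        buf[i + 2] = (uint8_t)(v >> 8);
        buf[i + 3] = (uint8_t)(v >> 16);
        buf[i + 4] = (uint8_t)(v >> 24);

        /* Skip the operand so encoder and decoder visit the same opcodes. */
        i += 4;
    }
}
```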
Q: I need to use a script that runs "xz -9". On a system with 256 MiB
of RAM, xz says that it cannot allocate memory. Can I make the
script work without modifying it?
A: Set a default memory usage limit for compression. You can do it e.g.
in a shell initialization script such as ~/.bashrc or /etc/profile:
XZ_DEFAULTS=--memlimit-compress=150MiB
export XZ_DEFAULTS
xz will then scale the compression settings down so that the given
memory usage limit is not reached. This way xz shouldn't run out
of memory.
Check also that memory-related resource limits are high enough.
On most systems, "ulimit -a" will show the current resource limits.
Q: How do I create files that can be decompressed with XZ Embedded?
A: See the documentation in XZ Embedded. In short, something like
this is a good start:
xz --check=crc32 --lzma2=preset=6e,dict=64KiB
Or if a BCJ filter is needed too, e.g. if compressing
a kernel image for PowerPC:
xz --check=crc32 --powerpc --lzma2=preset=6e,dict=64KiB
Adjust the dictionary size to get a good compromise between
compression ratio and decompressor memory usage. Note that
in single-call decompression mode of XZ Embedded, a big
dictionary doesn't increase memory usage.
Q: Will xz support threaded compression?
A: It is planned and has been taken into account when designing
the .xz file format. Eventually there will probably be three types
of threading, each method having its own advantages and disadvantages.
The simplest method is splitting the uncompressed data into blocks
and compressing them in parallel, independently of each other.
Since the blocks are compressed independently, they can also be
decompressed independently. Together with the index feature in .xz,
this allows using threads to create .xz files for random-access
reading. This also makes threaded decompression possible, although
it is not clear if threaded decompression will ever be implemented.
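Random access over independently compressed blocks reduces to a lookup. Given an uncompressed offset, find the block that contains it, seek to that block's compressed offset, and decompress only that block. The sketch below assumes a hypothetical in-memory array of block records sorted by uncompressed offset. liblzma's own lookup for this purpose is lzma_index_iter_locate(), whose data structures differ from this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index record: one per block, in file order. */
typedef struct {
	uint64_t uncompressed_offset;  /* where the block starts in the output */
	uint64_t compressed_offset;    /* where the block starts in the file */
	uint64_t uncompressed_size;
} block_record;

/* Binary search for the block containing `target`.
   Returns `count` if target is past the end of the data. */
static size_t find_block(const block_record *blocks, size_t count, uint64_t target)
{
	size_t lo = 0, hi = count;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const block_record *b = &blocks[mid];
		if (target < b->uncompressed_offset)
			hi = mid;
		else if (target >= b->uncompressed_offset + b->uncompressed_size)
			lo = mid + 1;
		else
			return mid;
	}
	return count;
}
```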
The independent blocks method has a couple of disadvantages too. It
will compress worse than a single-block method. Often the difference
is not too big (maybe 1-2 %) but sometimes it can be too big. Also,
the memory usage of the compressor increases linearly when adding
threads.
Match finder parallelization is another threading method. It has
been in 7-Zip for ages. It doesn't affect compression ratio or
memory usage significantly. Among the three threading methods, only
this is useful when compressing small files (files that are not
significantly bigger than the dictionary). Unfortunately this method
scales only to about two CPU cores.
The third method is pigz-style threading (I use that name, because
pigz <http://www.zlib.net/pigz/> uses that method). It doesn't
affect compression ratio significantly and scales to many cores.
The memory usage scales linearly when threads are added. This isn't
significant with pigz, because Deflate uses only a 32 KiB dictionary,
but with LZMA2 the memory usage will increase dramatically just like
with the independent-blocks method. There is also a constant
computational overhead, which may make the pigz method less attractive on
dual-core systems than the parallel match finder method, but with more
cores the overhead is not a big deal anymore.
Combining the threading methods will be possible and also useful.
E.g. combining match finder parallelization with pigz-style threading
can cut the memory usage by 50 %.
It is possible that the single-threaded method will be modified to
create files identical to the pigz-style method. We'll see once
pigz-style threading has been implemented in liblzma.
Q: How do I build a program that needs liblzmadec (lzmadec.h)?
A: liblzmadec is part of LZMA Utils. XZ Utils has liblzma, but no
liblzmadec. The code using liblzmadec should be ported to use
liblzma instead. If you cannot or don't want to do that, download
LZMA Utils from <https://tukaani.org/lzma/>.
Q: The default build of liblzma is too big. How can I make it smaller?
A: Give --enable-small to the configure script. Also use appropriate
--enable or --disable options to include only those filter encoders
and decoders and integrity checks that you actually need. Use
CFLAGS=-Os (with GCC) or equivalent to tell your compiler to optimize
for size. See INSTALL for information about configure options.
If the result is still too big, take a look at XZ Embedded. It is
a separate project, which provides a limited but significantly
smaller XZ decoder implementation than XZ Utils. You can find it
at <https://tukaani.org/xz/embedded.html>.

History of LZMA Utils and XZ Utils
==================================
Tukaani distribution
In 2005, there was a small group working on the Tukaani distribution,
which was a Slackware fork. One of the project's goals was to fit the
distro on a single 700 MiB ISO-9660 image. Using LZMA instead of gzip
helped a lot. Roughly speaking, one could fit data that took 1000 MiB
in gzipped form into 700 MiB with LZMA. Naturally, the compression
ratio varied across packages, but this was what we got on average.
Slackware packages have traditionally had .tgz as the filename suffix,
which is an abbreviation of .tar.gz. A logical name for LZMA-compressed
packages was .tlz, an abbreviation of .tar.lzma.
At the end of the year 2007, there was no distribution under the
Tukaani project anymore, but development of LZMA Utils was kept going.
Still, there were .tlz packages around, because at least Vector Linux
(a Slackware based distribution) used LZMA for its packages.
The first versions of the modified pkgtools used the LZMA_Alone tool from
Igor Pavlov's LZMA SDK as is. It was fine, because users wouldn't need
to interact with LZMA_Alone directly. But people soon wanted to use
LZMA for other files too, and the interface of LZMA_Alone wasn't
comfortable for those used to gzip and bzip2.
First steps of LZMA Utils
The first version of LZMA Utils (4.22.0) included a shell script called
lzmash. It was a wrapper that had a gzip-like command-line interface. It
used the LZMA_Alone tool from LZMA SDK to do all the real work. zgrep,
zdiff, and related scripts from gzip were adapted to work with LZMA and
were part of the first LZMA Utils release too.
LZMA Utils 4.22.0 also included lzmadec, which was a small (less than
10 KiB) decoder-only command-line tool. It was written on top of the
decoder-only C code found in the LZMA SDK. lzmadec was convenient in
situations where LZMA_Alone (a few hundred KiB) would be too big.
lzmash and lzmadec were written by Lasse Collin.
Second generation
The lzmash script was an ugly and not very secure hack. The last
version of LZMA Utils to use lzmash was 4.27.1.
LZMA Utils 4.32.0beta1 introduced a new lzma command-line tool written
by Ville Koskinen. It was written in C++, and used the encoder and
decoder from C++ LZMA SDK with some small modifications. This tool
replaced both the lzmash script and the LZMA_Alone command-line tool
in LZMA Utils.
Introducing this new tool caused some temporary incompatibilities,
because the LZMA_Alone executable was simply named lzma like the new
command-line tool, but they had a completely different command-line
interface. The file format was still the same.
Lasse wrote liblzmadec, which was a small decoder-only library based
on the C code found in LZMA SDK. liblzmadec had an API similar to
zlib, although there were some significant differences, which made it
non-trivial to use it in some applications designed for zlib and
libbzip2.
The lzmadec command-line tool was converted to use liblzmadec.
Alexandre Sauvé helped convert the build system to use GNU
Autotools. This made it easier to test for certain less portable
features needed by the new command-line tool.
Since the new command-line tool never got completely finished (for
example, it didn't support the LZMA_OPT environment variable), the
intent was to not call 4.32.x stable. Similarly, liblzmadec wasn't
polished, but appeared to work well enough, so some people started
using it too.
Because the development of the third generation of LZMA Utils was
delayed considerably (3-4 years), the 4.32.x branch had to be kept
maintained. It got some bug fixes now and then, and finally it was
decided to call it stable, although most of the missing features were
never added.
File format problems
The file format used by LZMA_Alone was primitive. It was designed with
embedded systems in mind, and thus provided only a minimal set of
features. The two biggest problems for non-embedded use were the lack
of magic bytes and an integrity check.
Igor and Lasse started developing a new file format with some help
from Ville Koskinen. Also Mark Adler, Mikko Pouru, H. Peter Anvin,
and Lars Wirzenius helped with some minor things at some point of the
development. Designing the new format took quite a long time (actually,
too long a time would be a more appropriate expression). It was mostly
because Lasse was quite slow at getting things done due to personal
reasons.
Originally the new format was supposed to use the same .lzma suffix
that was already used by the old file format. Switching to the new
format wouldn't have caused much trouble when the old format wasn't
used by many people. But since the development of the new format took
such a long time, the old format got quite popular, and it was decided
that the new file format must use a different suffix.
It was decided to use .xz as the suffix of the new file format. The
first stable .xz file format specification was finally released in
December 2008. In addition to fixing the most obvious problems of
the old .lzma format, the .xz format added some new features like
support for multiple filters (compression algorithms), filter chaining
(like piping on the command line), and limited random-access reading.
Currently the primary compression algorithm used in .xz is LZMA2.
It is an extension on top of the original LZMA to fix some practical
problems: LZMA2 adds support for flushing the encoder and for
uncompressed chunks, eases stateful decoder implementations, and
improves support for multithreading. Since LZMA2 is better than the
original LZMA is not supported in .xz.
Transition to XZ Utils
The early versions of XZ Utils were called LZMA Utils. The first
releases were 4.42.0alphas. They dropped the rest of the C++ LZMA SDK.
The code was still directly based on LZMA SDK but ported to C and
converted from a callback API to a stateful API. Later, Igor Pavlov
made a C version of the LZMA encoder too; these ports from C++ to C
were independent in LZMA SDK and LZMA Utils.
The core of the new LZMA Utils was liblzma, a compression library with
a zlib-like API. liblzma supported both the old and new file format.
The gzip-like lzma command-line tool was rewritten to use liblzma.
The new LZMA Utils code base was renamed to XZ Utils when the name
of the new file format had been decided. The liblzma compression
library retained its name though, because changing it would have
caused unnecessary breakage in applications already using the early
liblzma snapshots.
The xz command-line tool can emulate the gzip-like lzma tool by
creating appropriate symlinks (e.g. lzma -> xz). Thus, practically
all scripts using the lzma tool from LZMA Utils will work as is with
XZ Utils (and will keep using the old .lzma format). Still, the .lzma
format is more or less deprecated. XZ Utils will keep supporting it,
but new applications should use the .xz format, and migrating old
applications to .xz is often a good idea too.

EXPORTS
lzma_alone_decoder
lzma_alone_encoder
lzma_auto_decoder
lzma_block_buffer_bound
lzma_block_buffer_decode
lzma_block_buffer_encode
lzma_block_compressed_size
lzma_block_decoder
lzma_block_encoder
lzma_block_header_decode
lzma_block_header_encode
lzma_block_header_size
lzma_block_total_size
lzma_block_uncomp_encode
lzma_block_unpadded_size
lzma_check_is_supported
lzma_check_size
lzma_code
lzma_cputhreads
lzma_crc32
lzma_crc64
lzma_easy_buffer_encode
lzma_easy_decoder_memusage
lzma_easy_encoder
lzma_easy_encoder_memusage
lzma_end
lzma_filter_decoder_is_supported
lzma_filter_encoder_is_supported
lzma_filter_flags_decode
lzma_filter_flags_encode
lzma_filter_flags_size
lzma_filters_copy
lzma_filters_update
lzma_get_check
lzma_get_progress
lzma_index_append
lzma_index_block_count
lzma_index_buffer_decode
lzma_index_buffer_encode
lzma_index_cat
lzma_index_checks
lzma_index_decoder
lzma_index_dup
lzma_index_encoder
lzma_index_end
lzma_index_file_size
lzma_index_hash_append
lzma_index_hash_decode
lzma_index_hash_end
lzma_index_hash_init
lzma_index_hash_size
lzma_index_init
lzma_index_iter_init
lzma_index_iter_locate
lzma_index_iter_next
lzma_index_iter_rewind
lzma_index_memusage
lzma_index_memused
lzma_index_size
lzma_index_stream_count
lzma_index_stream_flags
lzma_index_stream_padding
lzma_index_stream_size
lzma_index_total_size
lzma_index_uncompressed_size
lzma_lzma_preset
lzma_memlimit_get
lzma_memlimit_set
lzma_memusage
lzma_mf_is_supported
lzma_mode_is_supported
lzma_physmem
lzma_properties_decode
lzma_properties_encode
lzma_properties_size
lzma_raw_buffer_decode
lzma_raw_buffer_encode
lzma_raw_decoder
lzma_raw_decoder_memusage
lzma_raw_encoder
lzma_raw_encoder_memusage
lzma_stream_buffer_bound
lzma_stream_buffer_decode
lzma_stream_buffer_encode
lzma_stream_decoder
lzma_stream_encoder
lzma_stream_encoder_mt
lzma_stream_encoder_mt_memusage
lzma_stream_flags_compare
lzma_stream_footer_decode
lzma_stream_footer_encode
lzma_stream_header_decode
lzma_stream_header_encode
lzma_version_number
lzma_version_string
lzma_vli_decode
lzma_vli_encode
lzma_vli_size

The .lzma File Format
=====================
0. Preface
0.1. Notices and Acknowledgements
0.2. Changes
1. File Format
1.1. Header
1.1.1. Properties
1.1.2. Dictionary Size
1.1.3. Uncompressed Size
1.2. LZMA Compressed Data
2. References
0. Preface
This document describes the .lzma file format, which is
sometimes also called LZMA_Alone format. It is a legacy file
format, which is being or has been replaced by the .xz format.
The MIME type of the .lzma format is `application/x-lzma'.
The most commonly used software to handle .lzma files includes
LZMA SDK, LZMA Utils, 7-Zip, and XZ Utils. This document
describes some of the differences between these implementations
and gives hints about which subset of the .lzma format is the most
portable.
0.1. Notices and Acknowledgements
This file format was designed by Igor Pavlov for use in
LZMA SDK. This document was written by Lasse Collin
<lasse.collin@tukaani.org> using the documentation found
in the LZMA SDK.
This document has been put into the public domain.
0.2. Changes
Last modified: 2011-04-12 11:55+0300
1. File Format
+-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
|         Header          |   LZMA Compressed Data   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
A .lzma file consists of a 13-byte Header followed by
the LZMA Compressed Data.
Unlike the .gz, .bz2, and .xz formats, it is not possible to
concatenate multiple .lzma files as is and expect the
decompression tool to decode the resulting file as if it were
a single .lzma file.
For example, the command line tools from LZMA Utils and
LZMA SDK silently ignore all the data after the first .lzma
stream. In contrast, the command line tool from XZ Utils
considers the .lzma file to be corrupt if there is data after
the first .lzma stream.
1.1. Header
+------------+----+----+----+----+--+--+--+--+--+--+--+--+
| Properties |  Dictionary Size  |   Uncompressed Size   |
+------------+----+----+----+----+--+--+--+--+--+--+--+--+
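Because all multi-byte fields in the Header are little endian, the Header can be parsed byte by byte without depending on the host's byte order. The following is a minimal sketch; the type and function names are hypothetical.

```c
#include <stdint.h>

/* The three fields of the 13-byte .lzma Header. */
typedef struct {
	uint8_t properties;          /* byte 0: encoded lc, lp, pb */
	uint32_t dict_size;          /* bytes 1-4, little endian */
	uint64_t uncompressed_size;  /* bytes 5-12, little endian */
} dot_lzma_header;

/* Assemble the little endian integers from the most significant
   byte down, so the result is independent of host byte order. */
static void parse_dot_lzma_header(const uint8_t in[13], dot_lzma_header *h)
{
	h->properties = in[0];
	h->dict_size = 0;
	for (int i = 3; i >= 0; --i)
		h->dict_size = (h->dict_size << 8) | in[1 + i];
	h->uncompressed_size = 0;
	for (int i = 7; i >= 0; --i)
		h->uncompressed_size = (h->uncompressed_size << 8) | in[5 + i];
}
```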
1.1.1. Properties
The Properties field contains three properties. An abbreviation
is given in parentheses, followed by the value range of the
property. The field consists of
1) the number of literal context bits (lc, [0, 8]);
2) the number of literal position bits (lp, [0, 4]); and
3) the number of position bits (pb, [0, 4]).
The properties are encoded using the following formula:
Properties = (pb * 5 + lp) * 9 + lc
The following C code illustrates a straightforward way to
decode the Properties field:
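#include <stdbool.h>
#include <stdint.h>

/* Split the Properties byte into lc, lp, and pb. Returns false
   if the byte is larger than the maximum valid value, 224. */
static bool
decode_properties(uint8_t prop, uint8_t *lc, uint8_t *lp, uint8_t *pb)
{
	if (prop > (4 * 5 + 4) * 9 + 8)
		return false;
	*pb = prop / (9 * 5);
	prop -= *pb * 9 * 5;
	*lp = prop / 9;
	*lc = prop - *lp * 9;
	return true;
}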
XZ Utils has an additional requirement: lc + lp <= 4. Files
which don't follow this requirement cannot be decompressed
with XZ Utils. Usually this isn't a problem since the most
common lc/lp/pb values are 3/0/2. It is the only lc/lp/pb
combination that the files created by LZMA Utils can have,
but LZMA Utils can decompress files with any lc/lp/pb.
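The encoding direction applies the formula above directly. The sketch below encodes lc/lp/pb and, as a separate check, tests the extra lc + lp <= 4 restriction that XZ Utils imposes. With this formula, the common 3/0/2 combination encodes to 93 (0x5D). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Properties = (pb * 5 + lp) * 9 + lc, after checking each value range. */
static bool encode_properties(unsigned lc, unsigned lp, unsigned pb, uint8_t *prop)
{
	if (lc > 8 || lp > 4 || pb > 4)
		return false;
	*prop = (uint8_t)((pb * 5 + lp) * 9 + lc);
	return true;
}

/* The additional requirement of XZ Utils described above. */
static bool xz_accepts_lc_lp(unsigned lc, unsigned lp)
{
	return lc + lp <= 4;
}
```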
1.1.2. Dictionary Size
Dictionary Size is stored as an unsigned 32-bit little endian
integer. Any 32-bit value is possible, but for maximum
portability, only sizes of 2^n and 2^n + 2^(n-1) should be
used.
LZMA Utils creates only files with dictionary size 2^n,
16 <= n <= 25. LZMA Utils can decompress files with any
dictionary size.
XZ Utils creates and decompresses .lzma files only with
dictionary sizes 2^n and 2^n + 2^(n-1). If some other
dictionary size is specified when compressing, the value
stored in the Dictionary Size field is rounded up, but the
specified value is still used in the actual compression code.
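The rounding described above can be sketched as a search over the candidate sizes 1, 2, 3, 4, 6, 8, 12, 16, and so on, that is, 2^n and 2^n + 2^(n-1) in increasing order. This is a straightforward illustration, not the code XZ Utils uses, and the function names are hypothetical. The largest candidate that fits in 32 bits is 0xC0000000 (3 GiB). For larger requests, this sketch saturates at UINT32_MAX, which is an arbitrary choice of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round size up to the nearest 2^n or 2^n + 2^(n-1). */
static uint32_t round_up_dict_size(uint32_t size)
{
	for (unsigned n = 0; n < 32; ++n) {
		uint32_t pow2 = UINT32_C(1) << n;
		if (pow2 >= size)
			return pow2;
		uint32_t pow2_and_half = pow2 + (pow2 >> 1);  /* 2^n + 2^(n-1) */
		if (pow2_and_half >= size)
			return pow2_and_half;
	}
	return UINT32_MAX;  /* above 3 GiB: no candidate fits in 32 bits */
}

/* True if size already has one of the portable forms. */
static bool is_portable_dict_size(uint32_t size)
{
	return size <= UINT32_C(0xC0000000) && round_up_dict_size(size) == size;
}
```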
1.1.3. Uncompressed Size
Uncompressed Size is stored as an unsigned 64-bit little endian
integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
that Uncompressed Size is unknown. End of Payload Marker (*)
is used if and only if Uncompressed Size is unknown.
XZ Utils rejects files whose Uncompressed Size field specifies
a known size that is 256 GiB or more. This is to reject false
positives when trying to guess if the input file is in the
.lzma format. When Uncompressed Size is unknown, there is no
limit for the uncompressed size of the file.
(*) Some tools use the term End of Stream (EOS) marker
instead of End of Payload Marker.
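The two rules above translate into simple checks on the 64-bit field. The following is a minimal sketch with hypothetical names. The 256 GiB limit is the XZ Utils heuristic described above, not a property of the format itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* All ones in the Uncompressed Size field means "unknown". */
#define DOT_LZMA_SIZE_UNKNOWN UINT64_MAX

/* Per the rule above, the End of Payload Marker is present
   exactly when the size is unknown. */
static bool has_end_marker(uint64_t uncompressed_size)
{
	return uncompressed_size == DOT_LZMA_SIZE_UNKNOWN;
}

/* XZ Utils' guard against false positives: a known size of
   256 GiB or more causes the file to be rejected as .lzma. */
static bool xz_accepts_uncompressed_size(uint64_t uncompressed_size)
{
	return uncompressed_size == DOT_LZMA_SIZE_UNKNOWN
			|| uncompressed_size < (UINT64_C(256) << 30);
}
```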
1.2. LZMA Compressed Data
A detailed description of the format of this field is out of
the scope of this document.
2. References
LZMA SDK - The original LZMA implementation
http://7-zip.org/sdk.html
7-Zip
http://7-zip.org/
LZMA Utils - LZMA adapted to POSIX-like systems
http://tukaani.org/lzma/
XZ Utils - The next generation of LZMA Utils
http://tukaani.org/xz/
The .xz file format - The successor of the .lzma format
http://tukaani.org/xz/xz-file-format.txt
