
Who is the Sysadmin? Lossless data compression algorithms.

  • Translation

Part one - historical.

Introduction

Existing data compression algorithms can be divided into two large classes: lossy and lossless. Lossy algorithms are commonly used to compress images and audio. They achieve high compression ratios by selectively sacrificing quality; however, by definition, the original data cannot be recovered exactly from the compressed result.
Lossless compression algorithms reduce the size of data in such a way that it can be restored exactly as it was before compression. They are used in communications, in archivers, and in some audio and image compression algorithms. Below, we consider only lossless compression algorithms.
The basic principle of compression algorithms is that any file containing non-random data partially repeats itself. Statistical models can estimate the probability that a given combination of symbols will recur. Codes can then be created for the selected phrases, with the shortest codes assigned to the most frequently repeated ones. Various techniques are used for this, for example entropy coding, repetition (run-length) coding, and dictionary compression. With their help, an 8-bit character, or even an entire string, can be replaced by just a few bits, eliminating redundant information.
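Of the techniques just named, repetition (run-length) coding is the easiest to show in a few lines. The sketch below is illustrative only: it assumes a toy format of (count, byte) pairs with counts capped at 255, and the function names are hypothetical, not taken from any particular archiver.

```python
def rle_encode(data: bytes) -> bytes:
    """Run-length encode data as (count, byte) pairs; counts are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the byte repeats and the count still fits in one byte.
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode by expanding every (count, byte) pair."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)


sample = b"AAAAAAAABBBC"
packed = rle_encode(sample)
print(packed)  # b'\x08A\x03B\x01C'
assert rle_decode(packed) == sample
```

Data without runs doubles in size under this toy format (b"abc" becomes six bytes), which is one reason practical compressors combine several techniques rather than relying on repetition coding alone.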

History

[Figure: hierarchy of compression algorithms]

Although data compression became widespread with the Internet and after Lempel and Ziv invented their algorithms (the LZ algorithms), several earlier examples of compression can be cited. Morse, when inventing his code in 1838, sensibly gave the shortest sequences to the most frequently used letters in English, "E" and "T" (a dot and a dash, respectively). Soon after mainframes appeared in 1949, the Shannon-Fano algorithm was invented; it assigned codes to the characters in a data block according to how likely they were to occur in that block. The more likely a character, the shorter its code, which made the data representation more compact.
David Huffman was a student in Robert Fano's class and, as a term project, chose to look for a better method of binary coding. As a result, he improved on the Shannon-Fano algorithm.
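Huffman's improvement builds the code bottom-up by repeatedly merging the two least frequent subtrees. A compact sketch of that idea in Python follows; it only builds the code table (the function name is hypothetical) and leaves out bit packing and storing the table, both of which a real compressor must handle.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing 0 and 1 to their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "abracadabra"
codes = huffman_codes(text)
bits = "".join(codes[ch] for ch in text)
print(codes)
print(len(bits), "bits instead of", 8 * len(text))  # 23 bits instead of 88
```

The tie-breaker in each heap entry keeps Python from ever comparing two dictionaries when weights are equal.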
Early versions of the Shannon-Fano and Huffman algorithms used predefined codes. Later, codes generated dynamically from the data being compressed came into use. In 1977, Lempel and Ziv published their LZ77 algorithm, based on a dynamically built dictionary (also called a "sliding window"). In 1978, they published LZ78, which parses the data into phrases and builds an explicit dictionary of them instead of using a sliding window over recent data.
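To make the "sliding window" concrete, here is a deliberately naive Python sketch of LZ77 that emits (offset, length, next byte) triples, as in the classic description of the algorithm. The window size and maximum match length are arbitrary assumptions, and the function names are hypothetical; real implementations such as DEFLATE find matches with hash chains and entropy-code the output.

```python
def lz77_encode(data: bytes, window: int = 4096, max_len: int = 15) -> list[tuple[int, int, int]]:
    """Greedy LZ77: emit (offset, length, next_byte) triples.

    offset counts back from the current position into the sliding window;
    (0, 0, b) means "no match, just the literal byte b".
    """
    triples = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Leave at least one byte for next_byte; a match may overlap position i.
            while (length < max_len and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        triples.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triples


def lz77_decode(triples: list[tuple[int, int, int]]) -> bytes:
    """Rebuild the data; copying byte by byte handles overlapping matches."""
    out = bytearray()
    for off, length, nxt in triples:
        start = len(out) - off
        for k in range(length):
            out.append(out[start + k])
        out.append(nxt)
    return bytes(out)


print(lz77_encode(b"abababababX"))  # [(0, 0, 97), (0, 0, 98), (2, 8, 88)]
```

The third triple shows the key trick: a match of length 8 at distance 2 overlaps the text it is producing, so a short repeated pattern is encoded in a single reference.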

Patent problems

The LZ77 and LZ78 algorithms became very popular and spawned a wave of improved variants, of which DEFLATE, LZMA and LZX have survived to this day. Most popular algorithms are based on LZ77, because LZW, derived from LZ78, was patented by Unisys in 1984, after which the company went after everyone, even users of GIF images. At that time a variant of LZW called LZC was used on UNIX, and because of the licensing problems it had to be phased out. Preference went to the DEFLATE algorithm (gzip) and the Burrows-Wheeler transform, BWT (bzip2). This was for the best, since these algorithms almost always compress better than LZW.
By 2003 the patent had expired, but it was too late: LZW survives perhaps only in GIF files. Algorithms based on LZ77 dominate.
In 1993 there was another patent battle, when Stac Electronics discovered that Microsoft was using its LZS algorithm in the disk compression program shipped with MS-DOS 6.0. Stac Electronics sued and won, receiving more than $100 million.

The rise in popularity of Deflate

Large corporations used compression algorithms to store ever-growing amounts of data, but the algorithms truly spread with the birth of the Internet in the late 1980s, when channel bandwidth was extremely limited. The ZIP, GIF and PNG formats were invented to compress data transmitted over the network.
Tom Henderson created ARC, the first commercially successful archiver, and released it in 1985 through System Enhancement Associates. ARC was popular with BBS users because it was one of the first programs able to compress several files into a single archive, and its source code was open. ARC used a modified LZW algorithm.
Phil Katz, inspired by ARC's popularity, released a shareware program called PKARC, in which he sped up the compression routines by rewriting them in assembly language. However, Henderson sued him, and Katz lost. PKARC copied ARC so openly that it sometimes even repeated the typos in ARC's source code comments.
Phil Katz was not discouraged, though: in 1989 he substantially reworked the archiver and released PKZIP. After coming under attack over the LZW patent, he also replaced the core algorithm with a new one called IMPLODE. The format changed again in 1993 with the release of PKZIP 2.0, this time to DEFLATE. New features included splitting an archive into volumes. This version is still widely used despite its venerable age.
The GIF (Graphics Interchange Format) image format was created by CompuServe in 1987. The format supports lossless image compression and is limited to a palette of 256 colors. Despite all its efforts, Unisys could not stop the format from spreading. GIF is still popular today, especially for its animation support.
Somewhat worried about the patent issues, CompuServe released the Portable Network Graphics (PNG) format in 1994. Like ZIP, it used the then-new DEFLATE algorithm. Although Katz had patented DEFLATE, he never made any claims.
DEFLATE is now the most popular compression algorithm. Besides PNG and ZIP, it is used in gzip, HTTP, SSL and other data transfer technologies.
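One reason DEFLATE turns up in so many places is that the same compressed stream can be wrapped in different containers. Python's standard zlib module can produce all three through the `wbits` argument of `compressobj`: a zlib wrapper (as used inside PNG), a gzip wrapper, or a raw DEFLATE stream (the form stored in ZIP entries). A minimal sketch; the helper name and sample text are made up for illustration:

```python
import gzip
import zlib

data = b"Besides PNG and ZIP, DEFLATE is used in gzip and HTTP. " * 40


def deflate(payload: bytes, wbits: int) -> bytes:
    """Compress with DEFLATE at level 9; wbits selects the container format."""
    co = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return co.compress(payload) + co.flush()


zlib_stream = deflate(data, 15)   # zlib wrapper: small header plus Adler-32 checksum
gzip_stream = deflate(data, 31)   # 16 + 15: gzip header plus CRC-32 and length trailer
raw_stream = deflate(data, -15)   # negative wbits: bare DEFLATE data, no wrapper

assert zlib.decompress(zlib_stream) == data
assert gzip.decompress(gzip_stream) == data
assert zlib.decompress(raw_stream, -15) == data
print(len(data), len(raw_stream), len(zlib_stream), len(gzip_stream))
```

The compressed payload is the same DEFLATE data in each case; only the headers and checksums around it differ.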

Sadly, Phil Katz did not live to see DEFLATE's triumph: he died of alcoholism in 2000 at the age of 37. Citizens, excessive alcohol consumption is dangerous to your health! You may not live to see your own triumph!

Modern archivers

ZIP reigned supreme until the mid-1990s, but in 1993 a simple Russian genius, Evgeny Roshal, came up with his own format and algorithm, RAR. Its latest versions are based on the PPM and LZSS algorithms. Today ZIP is perhaps the most common format; RAR was until recently the standard for distributing various semi-legal content over the Internet (as bandwidth grows, files are more and more often distributed without archiving); and 7-Zip's format is used where the best compression at an acceptable running time is wanted. In the UNIX world, tar + gzip is used (gzip does the compressing, while tar combines several files into one, since gzip cannot do this).
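The tar + gzip division of labour can be seen with Python's standard tarfile module, whose "w:gz" mode bundles the members into a tar stream and gzips it in one go. A minimal in-memory sketch; the helper names and file names are hypothetical:

```python
import io
import tarfile


def make_tar_gz(files: dict[str, bytes]) -> bytes:
    """tar bundles several files into one stream; gzip then compresses that stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_tar_gz(blob: bytes) -> list[str]:
    """Return the member names stored in a .tar.gz blob."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()


archive = make_tar_gz({"a.txt": b"hello\n" * 100, "b.txt": b"world\n" * 100})
print(list_tar_gz(archive))  # ['a.txt', 'b.txt']
```

The resulting bytes start with the gzip signature, which is why a ".tgz" file is really just a gzip file whose decompressed content is a tar archive.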

Translator's note. Personally, besides the archivers listed, I also came across ARJ (Archived by Robert Jung), which was popular in the 1990s, in the BBS era. It supported multivolume archives and, like RAR after it, was used to distribute games and other warez. There was also the HA archiver by Harri Hirvola, which used HSC compression (I could not find a clear explanation, only "bounded context model and arithmetic coding"); it did well at compressing long text files.

In 1996, bzip2, an open-source implementation of the BWT algorithm, appeared and quickly gained popularity. In 1999, the 7-Zip program appeared with its 7z format. In compression it rivals RAR; its advantages are openness and the choice between the bzip2, LZMA, LZMA2 and PPMd algorithms.
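Several of the algorithm families mentioned here ship with Python's standard library, so they can be tried side by side on the same input. Note the assumptions: the lzma module writes the .xz container, not 7z, and PPMd is not in the standard library, so this compares algorithms rather than archive formats. The sample data is made up, and the resulting sizes will vary with the input.

```python
import bz2
import lzma
import zlib

sample = b"".join(b"line %d: the quick brown fox jumps over the lazy dog\n" % i
                  for i in range(2000))

codecs = {
    "DEFLATE (zlib)": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "BWT (bz2)": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "LZMA (xz)": (lzma.compress, lzma.decompress),
}

results = {}
for name, (pack, unpack) in codecs.items():
    packed = pack(sample)
    assert unpack(packed) == sample  # every method here is lossless
    results[name] = len(packed)
    print(f"{name:15} {len(packed):7d} bytes (from {len(sample)})")
```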
In 2002, another archiver appeared: PAQ. Its author, Matt Mahoney, used an improved version of the PPM algorithm with a technique called context mixing, which combines the predictions of more than one statistical model to predict the next symbol more accurately.

The future of compression algorithms

Only God knows, of course, but PAQ seems to be gaining popularity thanks to its very good compression ratio (although it is very slow). As computers get faster, however, speed becomes less critical.
On the other hand, the Lempel-Ziv-Markov chain algorithm (LZMA) strikes a compromise between speed and compression ratio and may give rise to many interesting offshoots.
Another interesting technology is "substring enumeration", or CSE, which is still rarely used in programs.

In the next part we will look at the technical side of these algorithms and how they work.


GENERAL INFORMATION ABOUT FILE ARCHIVING

Understanding the file archiving process

One of the most widespread kinds of utility programs are archivers, which pack files by compressing the information stored in them.

Compression of information is the process of converting the information stored in a file into a form with less redundancy, which therefore needs less memory to store.

Information in files is compressed by eliminating redundancy in various ways: for example, by simplifying codes, removing constant bits from them, or representing a repeated symbol or sequence of symbols as a repetition count followed by the symbols themselves. Various algorithms are used for such compression.

One or several files can be compressed and placed, in compressed form, into a so-called archive file, or archive.

An archive file is a specially organized file containing one or more files in compressed or uncompressed form and service information about the file names, date and time of their creation or modification, sizes, etc.
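Python's standard zipfile module exposes exactly this kind of service information for each member of a ZIP archive. A small in-memory sketch; the helper names, member names and contents are made up for illustration:

```python
import io
import zipfile


def build_zip(members: dict[str, str]) -> bytes:
    """Pack text members into an in-memory ZIP archive using DEFLATE."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in members.items():
            zf.writestr(name, text)
    return buf.getvalue()


def describe(blob: bytes) -> list[tuple[str, tuple, int, int]]:
    """Return (name, modification time, original size, compressed size) per member."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return [(i.filename, i.date_time, i.file_size, i.compress_size)
                for i in zf.infolist()]


archive = build_zip({"notes.txt": "archive " * 500, "readme.txt": "hello\n"})
for entry in describe(archive):
    print(entry)
```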

The purpose of packing files is usually to provide a more compact placement of information on a disk, to reduce the time and, accordingly, the cost of transmitting information over communication channels in computer networks. In addition, packing a group of files into one archive file greatly simplifies their transfer from one computer to another, reduces the time for copying files to disks, protects information from unauthorized access, and helps protect against infection by computer viruses.

The degree of file compression is characterized by the compression ratio K_c, defined as the ratio of the size of the compressed file V_c to the size of the source file V_0, expressed as a percentage:

$$K_c = \frac{V_c}{V_0} \cdot 100\%$$

The amount of compression depends on the program used, the compression method and the type of source file. Graphic images, text files and data files compress best, with compression ratios as low as 5-40%; executable programs and load modules compress less well, to 60-90%. Files that are already archived hardly compress at all. Archivers differ in the compression methods they use, which affects the compression ratio accordingly.
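The formula for K_c is easy to check numerically. The sketch below uses zlib as a stand-in archiver and random bytes as a stand-in for an already packed file; both choices, and the function name, are assumptions made for illustration.

```python
import random
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """K_c = V_c / V_0 * 100 %; lower values mean stronger compression."""
    return len(compressed) / len(original) * 100


text = b"Text files and data files are compressed the best. " * 100
rng = random.Random(1)
noise = bytes(rng.getrandbits(8) for _ in range(4096))  # behaves like already-packed data

kc_text = compression_ratio(text, zlib.compress(text, 9))
kc_noise = compression_ratio(noise, zlib.compress(noise, 9))
print(f"text: K_c = {kc_text:.1f}%   packed-like data: K_c = {kc_noise:.1f}%")
```

Repetitive text gets a small K_c, while the random input lands at or slightly above 100%, matching the remark that archive files hardly compress.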

Archiving (packing) - placing (loading) source files into an archive file in compressed or uncompressed form.

Unpacking (extraction) - the process of restoring files from an archive exactly as they were before they were added to it. During unpacking, files are extracted from the archive and placed on disk or in RAM.

The programs that pack and unpack files are called archiving programs.

Large archive files can be spread over several disks (volumes). Such archives are called multivolume archives, and a volume is one part of a multivolume archive. By creating an archive in several parts, you can write it to several floppy disks.
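At its core, a multivolume archive is a byte stream cut into consecutive parts no larger than the medium. The sketch below shows only that splitting idea. Multivolume formats typically also add their own headers to each part, which is not modeled here. The function names are hypothetical, and the volume size assumes the raw capacity of a 1.44 MB floppy (the space usable after formatting is somewhat smaller).

```python
FLOPPY_BYTES = 2880 * 512  # raw capacity of a 1.44 MB floppy: 2880 sectors of 512 bytes


def split_into_volumes(archive: bytes, volume_size: int) -> list[bytes]:
    """Cut an archive into consecutive parts no larger than volume_size."""
    if volume_size <= 0:
        raise ValueError("volume_size must be positive")
    return [archive[i:i + volume_size] for i in range(0, len(archive), volume_size)]


def join_volumes(volumes: list[bytes]) -> bytes:
    """Reassemble the original archive by concatenating the volumes in order."""
    return b"".join(volumes)


blob = bytes(range(256)) * 20_000  # 5,120,000 bytes standing in for archive data
volumes = split_into_volumes(blob, FLOPPY_BYTES)
print(len(volumes), [len(v) for v in volumes])
assert join_volumes(volumes) == blob
```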

The main types of archiving programs

Several dozen archiving programs are currently in use. They differ in their functions and operating parameters, but the best of them have roughly the same characteristics. The most popular include ARJ, RARK, LHA, ICE, HYPER, ZIP, RAC, ZOO and EXPAND, developed abroad, as well as AIN and RAR, developed in Russia. Usually the same program both packs and unpacks files, but sometimes different programs are used: for example, PKZIP packs files and PKUNZIP unpacks them.

Archivers can also create archives that need no separate program to extract their contents, because the archive itself contains the unpacking code. Such archives are called self-extracting.

A self-extracting archive file is a loadable executable module that can extract the files it contains without an archiver program.

A self-extracting archive is called an SFX archive (SelF-eXtracting). In MS-DOS, such archives are usually created as .EXE files.

Most archivers unpack files by writing them to disk, but some are designed to create a packed executable module (program). Packing produces a program file with the same name and extension which, when loaded into RAM, unpacks itself and immediately runs. The reverse conversion of the program file to unpacked form is also possible. Such programs include PKLITE, LZEXE and UNP.

The EXPAND program, which is part of the utilities of the MS DOS operating system and the Windows shell, is used to unpack files for software products supplied by Microsoft.

Besides the normal compression mode, the RAR and AIN archivers have a solid mode, which creates archives with a high compression ratio and a special internal structure. In such archives all files are compressed as a single data stream, i.e. the search area for repeated character sequences is the entire set of files in the archive; therefore, unpacking any file other than the first requires processing the others. Solid archives are best suited for archiving large numbers of similar files.
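The gain from solid mode can be imitated with zlib: compress a set of similar files one at a time, then as one concatenated stream. This is only an analogy. It assumes DEFLATE's 32 KB window stands in for RAR's dictionary, and the word list and helper are hypothetical test data. It also reflects the drawback described above: the last file cannot be recovered from the solid stream without decompressing everything before it.

```python
import random
import zlib

WORDS = ["ism", "idealism", "materialism", "being", "consciousness", "matter",
         "spirit", "reason", "truth", "nature", "history", "society", "form",
         "content", "essence", "phenomenon", "law", "thought", "practice", "theory"]


def make_similar_files(n: int, seed: int = 0) -> list[bytes]:
    """Hypothetical data: n texts sharing a common body, each with a unique tail."""
    rng = random.Random(seed)
    body = " ".join(rng.choice(WORDS) for _ in range(300))
    return [f"{body}\nfile number {k}\n".encode("ascii") for k in range(n)]


files = make_similar_files(10)
separate = sum(len(zlib.compress(f, 9)) for f in files)  # one stream per file
solid = len(zlib.compress(b"".join(files), 9))           # one stream for all files
print(f"separate: {separate} bytes, solid: {solid} bytes")
```

Because every file after the first repeats text already in the window, the solid stream encodes them mostly as back-references.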

Methods of managing the archiver program

The archiver program is controlled in one of two ways:

From the MS-DOS command line, by entering a command that contains the name of the archiver program, the command and its option switches, and the names of the archive and source files; this kind of control is typical of ARJ, AIN, ZIP, RAC, LHA, etc.;

Through a built-in shell with dialog panels that appear after the program starts and allow control via menus and function keys, which gives the user a more comfortable working environment. The RAR archiver is controlled this way.

While carrying out the requested actions, an archiver usually displays a log of its work on the screen. All modern archivers have help screens, shown when you enter just the program name, or the name followed by the /? switch, on the command line. Help may be brief (one screen) or extended (several screens). Many archivers include examples of commands for various operations in their help. Help is usually displayed in English or another international language.

Since most archivers are controlled in similar ways, we will look at the main features of ARJ, which is known as one of the best in terms of functionality, compression ratio and speed. ARJ is especially effective with database files and text files.

1. MS DOS archivers

1.1 ARJ Archiver

Works from the command line. Performs all maintenance functions for .arj archives, including support for multivolume archives.

Get help on the ARJ switches with the commands:

arj (regular help)

arj /? (detailed help)

ARJ has a very large number of switches, so many actions can be automated: creating a disk backup, archiving files from a certain date onward, adding the current date to the archive name (arh970821.arj), archiving a file from a specific location, choosing among several compression levels, and so on. Version 2.55 can work with long file names.

Advantages: a very large number of switches, which makes it possible to automate many functions; protection of archives against damage.

Disadvantages: no interactive mode; some inconvenience when the same switch is given both in the ARJ_SW environment variable and on the command line, since the two cancel each other out.

1.2 PKZIP

Works from the command line. Various functions for maintaining .zip archives are performed by different programs:

pkzip - archive files

pkunzip - extract files from archive

zip2exe - create a self-extracting archive

pkzipfix - recover a damaged archive.

Explore the help for working with the pkzip archiver using the commands:

1.3 RAR

Archiver RAR v2.50 for DOS - Integrated archive management program

RAR is a very powerful tool for creating and managing archives. RAR features:

Full screen interactive interface (switchable);

Mouse and menu support;

Support for non-RAR archives;

Standard command line interface;

Original highly efficient data compression algorithm;

Special algorithm for compressing multimedia files;

Better packing ratio than similar products, thanks to the "solid" (continuous) compression mode;

Self-extracting (SFX) regular and multivolume archives;

Recovery of physically damaged archives;

Scripting language for installer SFX archives;

Locking, encryption, file order list, volume labels, etc.

1.4 QUARK

Quark is a classic archiver that compresses the source data with the LZ77 algorithm by encoding repeated byte sequences (the RSE algorithm), and then compresses the resulting stream a second time with Huffman codes. The three leaders in data packing, the ARJ, LHA and PkZIP archivers, all use similar methods.

Nevertheless, Quark achieves better compression at a speed higher than LHA's, no lower than ARJ's, and not much different from PkZIP's when the latter uses its so-called maximum compression. There are several reasons for this:

1) Quark works with a floating window size from 32Kb to 64Kb (versus the fixed 16Kb for LHA, and 32Kb for PkZIP and ARJ).

2) Quark performs Type I optimization (optimal choice of LZ77 reference addresses) and Type II optimization (optimal coverage of the stream by references).

3) Quark uses text reduction for text files.

4) Quark stores a minimum of service information in its archives, without trying to support other hardware platforms and operating systems.

1.5 GZIP

Gzip reduces the size of the given files using Lempel-Ziv coding (LZ77). Whenever possible, each file is replaced by one with the extension ".gz", keeping the same ownership, modes, and access and modification times. (The default extension is "-gz" for VMS and "z" for MSDOS, OS/2 FAT and Atari.) If no files are specified, or if a file name is "-", standard input is compressed to standard output. Gzip compresses only regular files; in particular, it ignores symbolic links.

Gzip uses the Lempel-Ziv algorithm, as do zip and PKZIP. The amount of compression depends on the size of the input and the distribution of common substrings in it. Typically, text such as source code or English prose is reduced by 60-70%. Compression with this algorithm is generally better than with LZW (used by compress), Huffman coding (used by pack), or adaptive Huffman coding (used by compact).

Compression is always performed, even if the compressed file turns out slightly larger than the original. The worst-case expansion is a few bytes for the gzip file header plus 5 bytes for every 32K block, an expansion ratio of 0.015% for large files. Note that the actual number of disk blocks used almost never increases. Gzip preserves the access modes, ownership and modification times of files when compressing and decompressing.
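The worst case can be observed with Python's gzip module by compressing random, incompressible bytes: the output is slightly larger than the input. This is a sketch, not the gzip command-line tool itself. Python's module uses the zlib library, whose block sizes may differ from the CLI, so the measured percentage need not match the 0.015% figure exactly.

```python
import gzip
import random

rng = random.Random(42)
noise = bytes(rng.getrandbits(8) for _ in range(100_000))  # incompressible input

packed = gzip.compress(noise, compresslevel=9, mtime=0)  # mtime=0 keeps output reproducible
overhead = len(packed) - len(noise)
print(f"{len(noise)} -> {len(packed)} bytes (+{overhead}, {overhead / len(noise):.4%})")
assert gzip.decompress(packed) == noise
```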

1.6 ARJZ

ARJZ (pronounced "arj-zet", as its author wishes) is an archiver based on the well-known ARJ program by Robert Jung. Unlike modern archivers such as RAR and UC2, ARJZ uses a file format, command line and options compatible with ARJ, one of the most popular data compression programs, and this has its advantages. In particular:

1) Almost all software designed to call ARJ will work with ARJZ without any modification. For example, you will not need to rewrite ARCVIEW, NC 4.0, DN, or the BAT files you may have created while using ARJ.

2) To use ARJZ's features with your old archives, you do not need to re-archive them at all.

3) You are also largely spared the need to learn a new archiver: if you know how to run ARJ, you know how to run ARJZ.


Vadim Tukaev (Saratov)

You may have come across archives with the ".arj" extension. If you open such a file, WinRAR (or whatever archiver you have installed as the default) will most likely start, calmly read its contents and unpack it wherever you like. However, if you have an inquiring mind (or just a little curiosity), you will ask yourself: why did the creator of that archive use this particular archiver? Nowadays it is rare to find anything other than ".zip" and ".rar". Unix users also often come across ".tgz" (strictly speaking, this is not a separate file format but an abbreviation of ".tar.gz": the files were first combined with the tar program and then compressed with gzip, but that is a completely different story). Once upon a time there were many archiving algorithms (ARC, HA, LHA, PAK, UC2, ZOO), and each had its followers.

One of the most common was ARJ, which competed with ZIP on equal terms. ZIP became the de facto standard thanks to its very fast algorithm and good compression ratio. Archivers that compressed better (RAR, for example) did so at a disproportionate cost in system resources: roughly speaking, compressing 10% better took 10 times longer. In addition, PKZIP was distributed as shareware and was effectively free for most people. Modern research shows that 60% of all existing file archives are in ZIP format. Phil Katz, the creator of the ZIP algorithm and the PKZIP program and the founder of PKWARE (PK stands for Phil Katz), which distributes it, became a wealthy and famous man, but it did not bring him happiness: he drank himself to death at the age of 37. That, again, is a completely different story, albeit a very instructive one. By the way, ARJ stands for Archived by Robert Jung. I found no information about the author of ARJ; perhaps this has something to do with his deep religiousness. ARJ Software, for example, seriously names the Lord Himself as its senior partner.

ARJ advantages:

1. It is very fast, which is not surprising: the first version of the program appeared in 1990 (back then the 16-bit Intel 80286 was considered outrageously powerful, and the 32-bit 80386 was a pipe dream!), and the algorithm has not changed a bit since. For the same reason, ARJ needs little RAM (I once could not unpack a RAR archive on my old computer simply because it had too little memory).

2. Total compatibility, backward, forward and any way you please. Any version of ARJ will open any ARJ archive. Compare this with RAR: it keeps developing and improving, but as a result an old version of RAR may be unable to unpack an archive made by a newer one. It simply will not understand what you are feeding it.

3. A huge number of options for tailoring it to your specific needs, far more than any other archiver offers. Some ARJ functions are missing even from JAR, a very similar but more modern archiver by the same author.

4. Availability for almost any OS (DOS, Windows, Linux, FreeBSD, OS/2) and, most importantly, support for the specific features of each, such as OS/2 EA (Extended Attributes). This also includes the ability to unpack files with long names under DOS, which does not understand such names. Note that ARJ Software itself produced only console versions for DOS and Windows; everything else is either open source or (as with ARJ/2 and WinArj) developed by third parties.

5. Last but not least: multivolume archives. In most cases this was what finally decided a user's choice between ZIP and ARJ. Imagine you need to move a file from one computer to another, and even packed it is larger than any removable medium you have. Read: "it won't fit on a floppy disk", because floppies used to be the only common, universally available way to exchange files. Not everyone had magneto-optical disks, streamer tapes, Bernoulli disks and the like. A CD-R recorder was as much an out-of-reach novelty for the average user as a Blu-ray recorder is now. What to do? Use ARJ, which could create multivolume archives, i.e. archives consisting of several files. ARJ was sometimes even used for its side effect (cutting a file into pieces) rather than its main purpose (making files smaller). For example, files were first archived with PKZIP, and the resulting huge zip file was then put into a multivolume ARJ archive. This made sense when every byte counted and that particular set of files came out smaller as a zip than as an arj.

Disadvantages of ARJ (which, as often happens, are the flip side of its merits):

1. The archiver is not being developed, because there is not much left to develop. Any major innovation would contradict the ARJ philosophy: everything must be unpackable by the very first version from 1990.

2. In particular, files larger than two gigabytes are not supported, and they are unlikely ever to be: that would require substantial rework of the source code, and the author seems to have lost interest in his creation. He is now developing the JAR archiver, which follows the same philosophy but is not backward compatible with ARJ.

3. There are no solid archives, and there never will be. For those who do not know what they are, I will explain using RAR as an example (as far as I know, this brilliant idea was first implemented there). Suppose you have two files with very similar content, say two texts on philosophical topics. Both will surely contain many of the same letter combinations, such as the characteristic ending "ism" (Marxism, Leninism, idealism). While archiving the first file, RAR notes this and stores information about these "isms" in a special "dictionary". When it compresses the second file, it no longer has to record "ism" as a new frequent combination but simply refers to the corresponding dictionary entry. As a result, the second file is compressed much more efficiently. Incidentally, JAR does support solid archiving.

4. ARJ's compression ratio is not bad, roughly at the level of ZIP (you cannot say definitively which of the two is better, since different files give different results), but modern archivers still compress much more efficiently.

But sometimes it is worth waiting not only ten times longer to shrink the data by a tenth, but even a hundred times longer to shrink it by just a hundredth. Besides, modern computers are very powerful, and "a hundred times longer" may mean "a second instead of a hundredth of a second". Moreover, in my experience, once an archive has been created it rarely needs to be updated or repacked.

5. You have to use the command-line interface and remember ARJ's specific commands and switches. A whole generation of users has now grown up who fear the "black screen with letters" like the devil fears incense. Still, I advise you to overcome this phobia: a more flexible way of interacting with programs has not yet been invented. Someday it will come in handy, if only to spare you culture shock when you run into UNIX systems.

Conclusions:

ARJ was designed less for "tamping down" static data (such as program distributions) than for conveniently archiving current documents (such as your own program's source tree), automating regular backups, and creating archives that are frequently used, modified and updated. It is in these cases that ARJ's signature tricks come to the fore: searching inside an archive, several types of SFX (SelF-eXtracting) archives, writing the current date into the archive name, extracting files depending on whether they contain a certain line of text, powerful recovery of partially damaged archives, renaming a file directly inside the archive, making decisions in emergencies without user intervention... There is no point in going on. Just look at the huge list of commands, switches and modifiers that the ARJ.EXE /? command prints; merely listing them would take longer than this article. Here is just one example of using ARJ:

arj a -e -jt -jm -jh65535 -vav -g? -wC:\TEMP -xMY_DIARY.TXT my_texts_ *.txt -h# -hcCLS

command a: add files to the archive (if no archive with this name exists, it is created).

switch e: do not store information about the directory structure.

switch jt: check whether the files were damaged during packing.

switch jm: use the maximum compression level.

switch jh: set the buffer size for the Huffman algorithm (65535 is the maximum and 2048 the minimum; there is no direct relationship between its size and the compression ratio, and sometimes a smaller value gives better compression).

switch v: create a multivolume archive; modifier "a" uses all available space on the medium for each volume (handy if you have no blank floppies, only half-full and/or partially damaged ones), and modifier "v" beeps the speaker after each volume is written, so the user wakes up and inserts a new floppy disk.

switch g: encrypt the archive; modifier "?" asks for the password right before archiving.

switch w: specify a directory for temporary files.

switch x: never archive this file, under any circumstances!

my_texts_: the name of the archive (or rather its first part; see the h# switch).

*.txt: process all text files in the current directory.

switch h#: add today's date to the archive name in YYMMDD format, so an archive created on February 13, 2010 will be named "my_texts_100213.ARJ".

switch hc: run a DOS command before starting work, in this case CLS (CLear Screen).
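A script that post-processes such dated backups has to predict the archive name that the h# switch produces. A tiny Python sketch of that rule follows. The function name is hypothetical, and it reproduces only the YYMMDD form described above, not any other behavior of ARJ.

```python
from datetime import date


def dated_archive_name(prefix: str, day: date, ext: str = "ARJ") -> str:
    """Imitate the h# switch: append the date as YYMMDD to the archive name."""
    return f"{prefix}{day:%y%m%d}.{ext}"


print(dated_archive_name("my_texts_", date(2010, 2, 13)))  # my_texts_100213.ARJ
```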

Topic 2.1. Working with files

1. Archivers and archiving.

2. Viewing an archive file in ZIP format.

The problem of data compression has existed for a long time, ever since the advent of computers. The purpose of packing files is usually to place information on disk more compactly and to reduce the time, and therefore the cost, of transmitting it over communication channels in computer networks. In addition, packing a group of files into one archive file greatly simplifies moving them from one computer to another, reduces the time needed to copy files to disks, and helps protect information from unauthorized access. These and other tasks are handled by powerful, full-featured archivers whose developers offer users various methods of processing data. Archivers can be free or commercial, and the choice depends on what the user requires of a program for working with the given files. Among the most popular programs are WinRAR (commercial) and 7-Zip (free).

An archive file is a specially organized file containing one or more files in compressed or uncompressed form, together with service information about them: file names, dates and times of creation or modification, sizes, etc.
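In the ZIP format this service information is kept alongside each stored file. As a small illustration (assuming Python 3 and its standard `zipfile` module, not any of the archivers discussed here), the sketch below builds an archive in memory and reads back each entry's name, modification date and time, original size and compressed size; the file names and the helper `list_archive` are made up.

```python
import io
import zipfile

def list_archive(zip_bytes):
    """Return (name, modification date_time, original size, compressed size) per entry."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [(info.filename, info.date_time, info.file_size, info.compress_size)
                for info in zf.infolist()]

# Build a small archive in memory to inspect.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("notes.txt", "archive " * 200)
    zf.writestr("readme.txt", "hello")

for name, stamp, size, packed in list_archive(buf.getvalue()):
    print(f"{name:12} {stamp} {size:6} -> {packed:6} bytes")
```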


Archivers are programs that implement the archiving process, allowing you to create and unpack archives.
Archiving is the compression, compaction, or packing of information.
Unpacking (extraction) is the process of restoring files from an archive in exactly the form they had before they were placed in it. When unpacking, the files are extracted from the archive and saved to disk.
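The degree of compression of a file is characterized by the compression ratio $K_c$, defined as the ratio of the size of the compressed file $V_c$ to the size of the original file $V_o$, expressed as a percentage:

$$K_c = \frac{V_c}{V_o} \times 100\%$$

The smaller $K_c$ is, the better the file has been compressed.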
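As a worked example, take one result reported later in this article: WinRAR 2.90 packed 462,326,078 bytes of EXE and DLL files into a RAR archive of 185,829,854 bytes. Substituting into the formula:

$$K_c = \frac{185\,829\,854}{462\,326\,078} \times 100\% \approx 40.19\%$$

so the archive occupies about 40% of the original volume.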

Let's learn how to create archives using WinRAR.


After launching the program, we see its standard window with a straightforward interface.

Add - allows you to both archive selected files and add them to an existing archive.

View - shows the contents of the file.

Delete - deletes the selected file / group of files.

Fix - recovers a corrupted archive.

Rate - gives an approximate estimate of how well the selected file / group of files will compress.

Extract to - allows you to specify the unpacking path.

Test - tests the selected archive for errors.

To archive a file or group of files, select them and click the Add button.


When creating an archive, you must specify its name (if the archive is created in the current folder) or the full path where it should be saved.


When creating an archive, you can choose the archive format: RAR or ZIP.

You can also choose a compression method.

The maximum method gives the highest compression ratio but is the slowest; the fastest method, conversely, compresses poorly but quickly. The store (no compression) method puts files into the archive without packing them. For transmission over computer networks or for long-term storage, it makes sense to choose the maximum method to get the best compression; for daily backups, the normal method is most often used.
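The speed versus size trade-off between methods can be observed with any general-purpose compressor. This sketch uses Python's `zlib` (the Deflate method used in ZIP), whose levels 0, 1 and 9 roughly play the roles of the store, fastest and maximum methods; it is an analogy, not WinRAR's own algorithm, and the timings depend on the machine and the data.

```python
import time
import zlib

def compare_levels(data, levels=(0, 1, 6, 9)):
    """Return {level: (compressed size in bytes, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)  # level 0 stores the data without compressing
        results[level] = (len(packed), time.perf_counter() - start)
    return results

sample = b"".join(b"line %d: the quick brown fox\n" % i for i in range(20000))
for level, (size, secs) in compare_levels(sample).items():
    print(f"level {level}: {size:8} bytes, {secs * 1000:.1f} ms")
```

Level 0 output is slightly larger than the input, because the data is merely wrapped in stored blocks with a small header.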

Multivolume archives.

If the original file intended for transmission over the Internet is very large, transferring it over the network in one piece may be impossible or impractical. To solve this, the file is "sliced" during compression into fragments, each of which is called an archive volume. The result might be, say, 10 volumes, which are downloaded one by one. Such an archive is called multivolume. When you unpack the first volume, the rest are processed automatically, and the user receives the original file in its original form.
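The slicing idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: real RAR or ZIP volumes carry their own headers and are normally produced from compressed data, whereas here the hypothetical helpers `split_into_volumes` and `join_volumes` just cut raw bytes into fixed-size pieces and glue them back in order.

```python
def split_into_volumes(data, volume_size):
    """Cut data into consecutive volumes of at most volume_size bytes."""
    if volume_size <= 0:
        raise ValueError("volume_size must be positive")
    return [data[i:i + volume_size] for i in range(0, len(data), volume_size)]

def join_volumes(volumes):
    """Restore the original data by concatenating the volumes in order."""
    return b"".join(volumes)

payload = bytes(range(256)) * 40                 # 10,240 bytes of sample data
volumes = split_into_volumes(payload, 1024)
print(len(volumes))                              # 10
print(join_volumes(volumes) == payload)          # True
```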


Self-extracting archive.

To unpack such an archive you do not need a separate archiver program: it is enough to run the archive file itself, since it is an executable file.
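One common way to build such a file, at least for ZIP, is to place an executable unpacking module (a "stub") in front of an ordinary archive. The Python sketch below imitates this with a fake stub (the byte string is a placeholder, not real machine code) and shows that the standard `zipfile` module can still read the archive, because ZIP readers locate the archive's directory from the end of the file. Self-extracting layouts of other formats are not covered here.

```python
import io
import zipfile

# Build an ordinary ZIP archive in memory (the file name is hypothetical).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("setup.txt", "installed!")

# Placeholder for the executable unpacking module; a real stub is machine code.
stub = b"HYPOTHETICAL-SFX-STUB" + bytes(64)
sfx_bytes = stub + buf.getvalue()

# ZIP readers find the central directory by scanning from the end of the file
# and correct the stored offsets for the bytes prepended in front of it.
with zipfile.ZipFile(io.BytesIO(sfx_bytes)) as zf:
    print(zf.read("setup.txt").decode())  # installed!
```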


It is possible to set a password.

To extract files from the archive, use the "Extract to" button and specify the path and extraction options.


Practical work:

1. Open the My Archive folder.

2. Zip each file.

3. Determine the compression ratio.

4. Investigate the change in size of the source files and resulting archives.

5. Record the results in the table.

No. | File name | File type | Original size | Archive file size | Compression ratio

Conclusion about the compression ratio of files of different types: ______________________________________________________________
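Steps 2-5 of the practical work can also be automated. The following Python sketch (using the standard `zipfile` and `pathlib` modules; the function name `archive_each_file` is hypothetical) zips every file in a folder separately and returns one row per file with the same columns as the table above, computing the compression ratio as the archive size divided by the original size times 100%.

```python
import pathlib
import zipfile

def archive_each_file(folder):
    """Zip every file in `folder` on its own; return one results-table row per file."""
    folder = pathlib.Path(folder)
    # Collect the source files first so the newly created .zip files are not picked up.
    sources = sorted(p for p in folder.iterdir()
                     if p.is_file() and p.suffix.lower() != ".zip")
    rows = []
    for number, src in enumerate(sources, start=1):
        archive = src.with_name(src.name + ".zip")
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(src, arcname=src.name)
        original = src.stat().st_size
        packed = archive.stat().st_size
        # K_c = V_c / V_o * 100%; undefined for empty files.
        ratio = round(packed / original * 100, 2) if original else None
        file_type = src.suffix.lstrip(".").upper() or "-"
        rows.append((number, src.name, file_type, original, packed, ratio))
    return rows

folder = pathlib.Path("My Archive")  # folder name taken from step 1 of the exercise
if folder.is_dir():
    for row in archive_each_file(folder):
        print(" | ".join(str(cell) for cell in row))
```

Note that the archive size includes the ZIP headers, so very small files and data that is already compressed (for example JPEG images or other archives) can end up with a ratio close to or even above 100%.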

How it all began

When CDs were not yet widespread, and floppy disks were the only media for moving data from one computer to another and for making backup copies, there was a need for programs that would compress information so that it took up less space and store it in one or more files for transfer on floppies. This is how archivers came into being.
As already noted, archivers were mainly used for backing up and transferring information. Copies of files stored in compressed form take up less space, and it is more convenient to handle one or a few files than a large number of files and directories. Archivers have not lost their relevance, but users' requirements for this category of software have changed significantly. Previously, perhaps the most important requirement was maximum compression, not least because storage media were expensive at the time, and the archivers that met it spread first. Now the situation is quite different: simplicity and convenience of use come first.
Another important requirement for archivers used to transfer information was how widespread they were: ideally, the recipient already had the archiver, so it did not have to be sent along with the data.
In the years since the first program of this type appeared, hundreds of archivers supporting various archive formats have been written. While archivers were emerging and developing, the most common format was ARJ, with ZIP a close second, followed at some distance by ARC, ACE and LZH. The situation has since changed significantly: ZIP has taken first place from ARJ, which has receded into the background, RAR is in second place, and ACE, ARJ and other less popular formats follow by a significant margin.
Thus, in our review, we are primarily interested in archivers of the most common formats:


ZIP - developed by PKWARE.

RAR - developed by Eugene Roshal, the author of the archiver of the same name, which became popular thanks to its user-friendly interface combined with good compression.

Descriptions of archivers

WinZip

Latest final version: WinZip 8.1

WinZip is probably the most popular archiver. It has built-in support for extracting .CAB files and files in popular "Internet formats" such as TAR, GZip, UUencode, BinHex and MIME; ARJ, LZH and ARC files are extracted through the corresponding external archivers. WinZip is simple and easy to use, with an intuitive interface that even novice users can work with without preparation. An external antivirus can be connected to the program to scan archived files for viruses. The archiver can work in two styles: Wizard and Classic. The Wizard style is intended for those who have not yet mastered the archiver or who prefer to work step by step, answering the program's questions.


Drag-and-drop of files to and from archives, together with Explorer integration, makes WinZip very easy to use. Right-clicking in Explorer calls up a context menu that can be configured in the options. Its items let you add files to an archive, create a new archive, extract files from an archive, create a self-extracting archive, and zip files and send them by e-mail using the default mail client.


If you right-click a ZIP archive, the Explorer menu looks like this:


By default, WinZip associates itself with the following file extensions:


WinZip supports the creation of multivolume archives.

Add-ons for the program are available for download:

WinZip Command Line Support Add-On - for working with archives from the command line.

WinZip Internet Browser Support Add-On - simplifies downloading archives from the Internet, unpacking them and installing programs. The archive is automatically downloaded to the directory specified in the settings and opened in WinZip when the download completes.

WinZip Self-Extractor - for creating self-extracting archives. Although it can work as an add-on to WinZip, WinZip Self-Extractor is a standalone software product. Since version 8.0, WinZip has included WinZip Self-Extractor Personal Edition, whose capabilities are somewhat limited compared to WinZip Self-Extractor.

WinRAR

Probably the second most popular archiver after WinZip; in Russia it is probably even ahead of it. The latest final version is WinRAR 2.90.


WinRAR runs on Windows 9x/ME/NT/2000/XP. There is also a console version, RAR, as well as versions for Linux, BeOS, DOS, OS/2 and various Unix platforms. WinRAR is available in many languages, including Russian. The author of the program is Eugene Roshal from Chelyabinsk.
The program implements an original compression algorithm that compresses files well, especially executables, libraries and large text files, as well as a special algorithm for compressing multimedia files.
The ZIP format is fully supported, as are basic operations (viewing contents, unpacking, displaying comments and archive information) for CAB, ARJ, LZH, TAR, GZ, ACE, UUE, BZ2 and JAR files.


WinRAR can create solid (continuous) archives, which improve compression by 10-50%, especially for a large number of files, as well as multivolume and self-extracting archives. WinRAR is integrated into Explorer: when you right-click a supported archive type, a context menu appears:


When you right-click other files or directories, the menu offers an item for adding files to an archive (which starts WinRAR so you can set the parameters) and an item for creating a RAR archive directly from the selected files and directories.
WinRAR also lets you protect archives against damage by storing redundant information (a recovery record), lock archives against changes, protect archives with a password, and add comments to archives (with support for ANSI ESC sequences) as well as an authenticity verification (AV) record identifying the creator (registered users only).
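The article does not describe how WinRAR lays out its recovery data, so the following is only a generic illustration of protecting data with redundancy: a single XOR parity block, written in Python. It is not RAR's format, and the helper names are hypothetical. This simple scheme can rebuild one damaged block, provided its position is known.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def make_parity(data, block_size):
    """Split data into fixed-size blocks (last one zero-padded) plus one parity block."""
    blocks = [data[i:i + block_size].ljust(block_size, b"\0")
              for i in range(0, len(data), block_size)]
    return blocks, xor_blocks(blocks)

def recover_block(blocks, parity, damaged_index):
    """Rebuild the block at a known damaged position from the parity and intact blocks."""
    intact = [b for i, b in enumerate(blocks) if i != damaged_index]
    return xor_blocks(intact + [parity])

data = b"Important archive contents that must survive a bad sector."
blocks, parity = make_parity(data, 8)
original = blocks[3]
blocks[3] = b"\xff" * 8                          # simulate a damaged block
blocks[3] = recover_block(blocks, parity, 3)
print(blocks[3] == original)                     # True
```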

PowerArchiver 2001

Latest version - PowerArchiver 2001 7.02.08


Built-in full support for ZIP, CAB, LHA (LZH), TAR, TAR.GZ, TAR.BZ2 and BH (BlakHole) files, as well as XXE and UUE files. Built-in extraction of RAR, ARJ, ARC, ACE, ZOO, GZ and BZIP2 files. A built-in viewer for TXT, RTF, BMP, ICO, WMF, EMF, GIF and JPG files. Skin support. The list of files in an archive can be printed or exported to a TXT or HTML file. Drag-and-drop of files to and from archives. Useful options include backups driven by your own scripts, extracting several archives at once, repairing a damaged archive (ZIP only), converting a whole ZIP archive into a multivolume one and vice versa, and a built-in function for detecting the archive type. An external antivirus can be connected. Two archive viewing modes: classic and Explorer-style, the latter with two horizontally separated panes and a tree view. In the number of built-in features the program is ahead of its competitors, which is why it quickly gained popularity among users.

Explorer integration with a customizable menu:


Right-clicking an archive file brings up this pop-up menu:


And this is how it looks when you right-click files of other formats:


Add-ons for the program:

Command line support - PowerArchiver Command Line Support Add-On or PowerArchiver Command Line.

Creating self-extracting archives - PowerArchiver SFX Maker Add-On by David Cornish.

Creating your own skins - PowerArchiver Toolbar-ImageList Creator.

WinAce

At the moment, the latest version is WinAce 2.11


Archiving in the following formats: ACE, ZIP, LHA, MS-CAB, JAVA JAR.


Extraction of ACE, ZIP, LHA, MS-CAB, RAR, ARC, ARJ, GZip, TAR, ZOO and JAR formats. Multivolume archive support for ACE, ZIP and CAB files. Creation of self-extracting archives and restoration of damaged archives for ACE and ZIP files. Command line access. A built-in viewer for Word documents, HTML, text files and the major graphic formats: TIFF (*.tif; *.tiff), Photoshop (*.psd, *.pdd), Paintshop Pro (*.psp), Portable Network Graphics (*.png), GIF, BMP, standard Windows bitmap (*.bmp, *.rle, *.dib), *.ico, SGI (*.bw, *.rgb, *.rgba, *.sgi), Autodesk (*.cel; *.pic), Truevision (*.tga; *.vst; *.icb; *.vda; *.win), ZSoft Paintbrush (*.pcx, *.pcc), Word 5.x screenshots (*.scr), Kodak Photo-CD (*.pcd), Portable pixel/gray map (*.ppm, *.pgm, *.pbm), Dr. Halo (*.cut, *.pal), SGI Wavefront (*.rla, *.rpf) and GFI fax (*.fax). A function for optimizing existing archives. Explorer integration: right-click context menus and an additional tab in the file properties dialog (ACE and ZIP archives only).



The ACE format is often used for file exchange on a number of IRC channels.

7-Zip

The latest version currently available is 7-Zip 2.30 Beta 12.


It is a relatively little-known archiver that achieves a fairly high compression ratio for the ZIP format and also has its own 7z format with a high compression ratio. In addition, 7-Zip is free. It was included in this review to show that the most popular archivers listed above are not always the leaders in maximum compression.
The archiver fully supports the ZIP, GZIP, BZIP2, TAR and 7z formats and can extract RAR and CAB files. It can be used from the command line. It integrates into Explorer, adding a simple menu of three items:

Comparison of functionality

Format support and other features

Format, function | WinZIP 8.1 | WinRAR 2.90 | PowerArchiver 2001 7.02.08 | WinAce Archiver 2.04 | 7-Zip 2.30 Beta 12
ZIP | Full | Full | Full | Full | Full
RAR | No | Full | Unpacking | Unpacking | Unpacking
ACE | No | Unpacking | Unpacking | Full | No
Gzip | Unpacking | Unpacking | Unpacking | Unpacking | Full
CAB | Unpacking | Unpacking | Full | Full | Unpacking
TAR | Unpacking | Unpacking | Full | Unpacking | Full
LZH | External | Unpacking | Full | Full | No
ARJ | External | Unpacking | Unpacking | No | No
BZ2 | No | Unpacking | Unpacking | No | Full
JAR | No | Unpacking | No | Unpacking | No
BH | No | No | Full | No | No
ARC | No | No | No | Unpacking | No
ZOO | No | No | No | Unpacking | No
UUE | Unpacking | Unpacking | Full | No | No
Other | XXE, BinHex, MIME | - | XXE | - | 7z
Support for multivolume archives | ZIP | RAR | ZIP | ACE, ZIP, CAB | No
Support for creating solid archives | No | RAR | No | ACE | 7z
AV record support | No | RAR | No | ACE | No
Built-in | Windows | Windows and DOS | Windows | Windows and DOS | Windows
External antivirus support | Yes | No | Yes | Yes | No
Drag & Drop support | Yes | Yes | Yes | Yes | No
Command line support | Via WinZip Command Line Support Add-On | Full | Basic operations; full via PowerArchiver Command Line Support Add-On | Full | Full
Support for comments in archives | ASCII for ZIP | ASCII and ANSI for RAR and ZIP | ASCII for ZIP | ASCII, ANSI and HTML | No

Testing

The purpose of this test was not to measure absolute compression times but to compare the archivers' relative speed and compression ratio. For the compression ratio, the size of the source file (or set of files) was taken as 100%; the tables show the size of the resulting archive as a percentage of the original.
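The measurement procedure described above (time the compression, record the archive size, express it as a percentage of the original) can be reproduced in miniature with the compressors in Python's standard library. This sketch is not the harness used for the tests below: it compares Deflate (`zlib`, the usual ZIP method), `bz2` and LZMA (`lzma`, the main method of the 7z format) on synthetic data, and the `CODECS` table and `benchmark` function are hypothetical names. Absolute times will differ from machine to machine.

```python
import bz2
import lzma
import time
import zlib

CODECS = {
    "deflate": lambda d: zlib.compress(d, 9),
    "bzip2": lambda d: bz2.compress(d, 9),
    "lzma": lambda d: lzma.compress(d, preset=9),
}

def benchmark(data):
    """Return {codec: (seconds, size in bytes, size as % of original)}."""
    rows = {}
    for name, compress in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        rows[name] = (elapsed, len(packed), 100 * len(packed) / len(data))
    return rows

sample = b"".join(b"record %06d status=OK\n" % i for i in range(20000))
for name, (secs, size, pct) in benchmark(sample).items():
    print(f"{name:8} {secs:6.2f} s {size:10,} bytes {pct:6.2f} %")
```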

Testing was carried out on a system with the following configuration:

Intel Celeron 450 MHz processor
Fujitsu 20 GB hard disk
256 MB RAM
Windows 98 SE

2017 EXE and DLL files, total size 462,326,078 bytes

Archiving program | Options | Compression method | Archive format | Archiving time, min:sec | Archive size, bytes | Compression ratio
WinRAR 2.90 | Dictionary size 1024KB | Best | RAR | 16:57 | 185,829,854 | 40.19 %
WinRAR 2.90 | | Best | RAR | 32:40 | 174,505,219 | 37.75 %
WinRAR 2.90 | | Best | ZIP | 12:29 | 201,984,371 | 43.69 %
WinZIP 8.1 | | Maximum | ZIP | 16:10 | 202,072,691 | 43.71 %
7-Zip 2.30 Beta 12 | | Maximum | ZIP | 29:37 | 196,345,086 | 42.47 %
7-Zip 2.30 Beta 12 | | Maximum | 7Z | 29:10 | 169,185,782 | 36.59 %
WinAce Archiver 2.04 | | Maximum | ZIP | 15:21 | 196,345,096 | 42.47 %
WinAce Archiver 2.04 | Solid | Maximum | ACE 2.0 | 20:34 | 160,158,266 | 34.65 %
WinAce Archiver 2.04 | Dictionary size 4096KB, optimized exe compression | Maximum | ACE 2.0 | 18:32 | 176,050,278 | 38.08 %
WinAce Archiver 2.04 | Dictionary size 4096KB | Maximum | ACE | 18:21 | 183,747,786 | 39.74 %
PowerArchiver 2001 7.02.08 | | Maximum | ZIP | 14:13 | 201,838,065 | 43.66 %

521 Word files, total size 32,175,596 bytes

Archiving program | Options | Compression method | Archive format | Archiving time, min:sec | Archive size, bytes | Compression ratio
WinRAR 2.90 | Dictionary size 1024KB | Best | RAR | 1:14 | 8,068,122 | 25.08 %
WinRAR 2.90 | Solid, Dictionary size 1024KB | Best | RAR | 1:30 | 5,538,095 | 17.21 %
WinRAR 2.90 | | Best | ZIP | 1:03 | 9,462,371 | 29.43 %
WinZIP 8.1 | | Maximum | ZIP | 1:29 | 9,470,530 | 29.43 %
7-Zip 2.30 Beta 12 | | Maximum | ZIP | 2:22 | 9,087,254 | 28.24 %
7-Zip 2.30 Beta 12 | | Maximum | 7Z | 2:05 | 7,302,364 | 22.70 %
7-Zip 2.30 Beta 12 | Solid | Maximum | 7Z | 2:04 | 4,717,281 | 14.66 %
WinAce Archiver 2.04 | | Maximum | ZIP | 1:11 | 9,470,116 | 29.43 %
WinAce Archiver 2.04 | | Maximum | ACE 2.0 | 1:28 | 5,245,381 | 16.30 %
WinAce Archiver 2.04 | Dictionary size 4096KB, optimized exe compression | Maximum | ACE 2.0 | 1:21 | 7,963,681 | 24.75 %
WinAce Archiver 2.04 | Dictionary size 4096KB | Maximum | ACE | 1:17 | 8,060,489 | 25.05 %
WinAce Archiver 2.04 | | Maximum | ACE | 1:24 | 5,309,725 | 16.50 %
PowerArchiver 2001 7.02.08 | | Maximum | ZIP | 1:01 | 9,458,970 | 29.40 %

… | ZIP | 1:53 | 48,639,712 | 97.28 %
7-Zip 2.30 Beta 12 | | Maximum | 7Z | 3:57 | 48,555,679 | 97.11 %
WinAce Archiver 2.04 | | Maximum | ZIP | 1:11 | 48,452,915 | 96.90 %
WinAce Archiver 2.04 | Dictionary size 4096KB, optimized exe compression | Maximum | ACE 2.0 | 3:08 | 48,571,875 | 97.14 %
WinAce Archiver 2.04 | Dictionary size 4096KB | Maximum | ACE | 3:08 | 48,571,875 | 97.14 %
PowerArchiver 2001 7.02.08 | | Maximum | ZIP | 0:51 | 48,452,892 |

WinZIP 8.1 | | Maximum | ZIP | 5:42 | 7,056,986 | 21.93 %
7-Zip 2.30 Beta 12 | | Maximum | ZIP | 4:36 | 7,041,872 | 21.89 %
7-Zip 2.30 Beta 12 | | Maximum | 7Z | 9:59 | 5,824,793 | 18.10 %
7-Zip 2.30 Beta 12 | Solid | Maximum | 7Z | 4:17 | 4,227,902 | 13.14 %
WinAce Archiver 2.04 | | Maximum | ZIP | 5:32 | 7,098,841 | 22.06 %
WinAce Archiver 2.04 | Solid, Dictionary size 4096KB, delta compression | Maximum | ACE 2.0 | 7:14 | 5,152,231 | 16.01 %
WinAce Archiver 2.04 | Dictionary size 4096KB, optimized exe compression | Maximum | ACE 2.0 | 16:55 | 6,353,898 | 19.75 %
WinAce Archiver 2.04 | Dictionary size 4096KB | Maximum | ACE | 16:53 | 6,388,514 | 19.86 %
WinAce Archiver 2.04 | Solid, Dictionary size 4096KB, optimized exe compression | Maximum | ACE | 7:15 | 5,164,797 | 16.05 %
PowerArchiver 2001 7.02.08 | | Maximum | ZIP | 5:26 | 7,089,947 | 22.04 %

Conclusions

Based on the results of the review:



The most common archive formats today are ZIP, RAR, GZip and TAR, so an archiver that aims to become popular should support these formats where possible.

The most widely used archivers today are those that give the user maximum convenience and ease of use and perform the functions the user needs. For today's users, the compression ratio has faded into the background.

Based on the test results:



Among the popular formats: when archiving to ZIP, WinRAR and PowerArchiver give the shortest archiving time and 7-Zip the smallest archive; WinAce compresses executable files best; Word documents are compressed best by WinAce and WinRAR; and with a large number of small files, WinRAR does the best job.

The Solid option for creating continuous archives, available in some archivers, helps when compressing a large number of files: it reduces archive size and significantly reduces compression time (although with a small number of files, this option increases archiving time).
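Why solid mode helps with many small files can be shown with a short Python sketch using `zlib`: compressing each file separately versus compressing all of them as one continuous stream. This is a simplification (a real solid archive also stores per-file headers, and Deflate only looks back 32 KB), and the sample files and function names are hypothetical. The flip side of a solid archive is that extracting a single file requires decompressing the data stored before it.

```python
import zlib

def separate_size(files):
    """Non-solid: every file is compressed on its own, so no context is shared."""
    return sum(len(zlib.compress(f, 9)) for f in files)

def solid_size(files):
    """Solid: all files are concatenated and compressed as one continuous stream."""
    return len(zlib.compress(b"".join(files), 9))

# Hypothetical set of many small, similar files (e.g. per-user settings).
files = [b"[settings]\nuser=user%d\ntheme=dark\nlanguage=en\nautosave=yes\n" % i
         for i in range(300)]
print("separate:", separate_size(files), "bytes")
print("solid:   ", solid_size(files), "bytes")
```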
Besides the well-known archivers and archive formats, there are many lesser-known archivers with their own formats that in some cases outperform their well-known counterparts, particularly in compression ratio. An example is the 7-Zip archiver with its 7z format, which won almost all the tests in terms of compression ratio.