
What is ANSI? Character encoding, the ANSI format, and a brief history of encodings

Basically, "ANSI" refers to the legacy code pages in Windows. The first 128 characters are identical to ASCII in most code pages, but the upper characters differ.

However, "ANSI" does not automatically mean CP1252 or Latin-1.

Despite all the confusion, the simplest way to avoid such problems is just to use Unicode.

What is the ANSI encoding format? Is it the system's default format? How does it differ from ASCII?

At one time Microsoft, like everyone else, used 7-bit character sets, and they invented their own where it suited them, although they kept ASCII as the core subset. Then they realized that the world had moved on to 8-bit encodings and that there were international standards such as the ISO 8859 family. In those days, if you wanted an international standard and you lived in the United States, you bought it from the American National Standards Institute, ANSI, which reissued international standards under its own branding and numbering (because the US government wants compliance with American standards, not international ones). So Microsoft's copy of ISO 8859 said "ANSI" on the cover. And because Microsoft wasn't very used to standards in those days, they didn't realize that ANSI had published many other standards as well. So they referred to the ISO 8859 family (and the variants they invented because they didn't understand standards back then) by the title on the cover, "ANSI", and the name found its way into Microsoft's user documentation and from there into the user community. That was about 30 years ago, but you still hear the name today.

Or you can query your registry:

C:\> reg query "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage" /f ACP

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
    ACP    REG_SZ    1252

End of search: 1 match(es) found.

C:\>

When using single-byte characters, ASCII defines the first 128 characters (codes 0-127). The extended characters, 128-255, are defined differently by each ANSI code page to provide limited support for other languages. To understand an "ANSI"-encoded file, you need to know which code page it uses.
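A quick Python sketch makes the point concrete: the very same byte above 127 decodes to different characters under different code pages.

```python
raw = b"\xc0"  # one byte from the extended 128-255 range

# Under the Western European code page it is a Latin letter...
print(raw.decode("cp1252"))  # 'À'
# ...but under the Cyrillic code page it is a Russian letter.
print(raw.decode("cp1251"))  # 'А' (Cyrillic capital A)
```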

Technically, ANSI should be the same as US-ASCII: the name refers to the ANSI X3.4 standard, which is simply the ANSI organization's version of ASCII. The use of upper-bit characters is not defined in ASCII/ANSI X3.4, since it is a 7-bit character set.

However, years of misuse of the term by the DOS and, later, Windows communities have left it with the practical meaning of "whatever the system code page of a given machine happens to be." The system code page is also sometimes known as "mbcs", since on East Asian systems it can be an encoding with multiple bytes per character. Some code pages even use upper-bit bytes as lead bytes of a multibyte sequence, so they are not even strictly supersets of plain ASCII... but even then the encoding is still called ANSI.

On US and Western European default settings, "ANSI" maps to Windows code page 1252. This is not the same as ISO 8859-1 (although it is quite similar). On other machines it could be anything else, which makes "ANSI" completely useless as an external encoding identifier.
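One practical consequence: you can only find out a given machine's "ANSI" encoding by asking that machine. A rough cross-platform probe in Python looks like this (on Windows it usually reports the ANSI code page, e.g. 'cp1252', matching the ACP registry value above; on modern Linux or macOS it is typically 'UTF-8'):

```python
import locale

# The "ANSI" encoding is whatever this particular machine is configured with;
# the reported value is machine- and locale-dependent.
print(locale.getpreferredencoding(False))
```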

I remember when "ANSI text" referred to the pseudo-VT100 escape codes used under DOS, via the ANSI.SYS driver, to control colors and cursor position in the console text stream. That is probably not what you are asking about, but the term does come up in that sense too.

ANSI is an institution for the standardization of industrial methods and technologies and a member of the International Organization for Standardization (ISO). In Germany, the analogous organization is the German Institute for Standardization (DIN); in Austria, the Austrian Standards Institute (ASI); in Switzerland, the Swiss Association for Standardization (SNV).

Although ANSI standards exist in many industrial areas, in computer technology the abbreviation "ANSI" on its own denotes a specific group of character sets based on ASCII. A genuine ANSI standard for this character set does not exist; the ANSI drafts were eventually absorbed into the ISO 8859 standard.

ANSI Objectives

The main task of the American National Standards Institute (ANSI) is the worldwide dissemination and implementation of US national standards, at enterprises in all countries.

In addition, the institute's work addresses problems of a global scale:

  • environmental protection,
  • industrial safety,
  • household safety.

It is known that in the United States, as in Russia, standards are primarily regulated by the state (although ANSI positions itself as a non-profit, non-governmental organization), so the desire to fill this niche and bring all norms to an American common denominator is an entirely logical and consistent idea. Indeed, through standards it is possible to disseminate not only technical innovations but also to carry out a state foreign policy of globalization and world integration.

To support the ANSI program, the state allocates a large budget, spent mainly on optimizing, updating, and reorganizing production methods. In the steel industry, ANSI standards have long been established as some of the best in the world.

Our company also follows these standards in its production of flange products, which are sold in large quantities to industrial enterprises in Russia and the CIS countries.

Sometimes even a fairly experienced specialist cannot immediately say what a particular pressure or length value in one system corresponds to in the other.

To make this task easier for you, we offer tables relating pressure and length values in the European and American systems, with brief explanations. But first, a few words about the standards themselves.


DIN is the German standard (the abbreviation stands for Deutsches Institut für Normung, i.e. it is developed by the German Institute for Standardization), developed strictly within the framework of the provisions of the International Organization for Standardization, ISO.


ANSI is the standard adopted in the United States of America. The abbreviation stands for American National Standards Institute; that is, it is the standard of the American National Standards Institute.

Accordingly, ANSI standards are defined by this institution, and there is far from always an exact correspondence between DIN and ANSI standards in the various fields.

Pressure Units Conversion from ANSI to DIN

Everything is simple here: if the ANSI standard lists the number 150 for pressure, it means the nominal pressure (for which the valve is designed) is 20 bar; Class 300 corresponds to 50 bar, and so on. The maximum ANSI Class, 2500, equals 420 bar in the European DIN standard.


Using this table, it is not difficult to convert pressure values in the opposite direction as well, from DIN to ANSI, although our engineers need such a conversion far less often.
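The class-to-bar correspondence described above can also be captured as a small lookup table in code. Below is a minimal Python sketch using only the three pairs quoted in the text; the dictionary is illustrative, not a complete ANSI/DIN table.

```python
# ANSI pressure class -> nominal pressure in bar (DIN PN).
# Only the three pairs mentioned in the text above; extend as needed.
ANSI_CLASS_TO_BAR = {150: 20, 300: 50, 2500: 420}

def ansi_class_to_bar(ansi_class):
    """Return the nominal DIN pressure (bar) for a given ANSI class."""
    return ANSI_CLASS_TO_BAR[ansi_class]

print(ansi_class_to_bar(150))   # 20
print(ansi_class_to_bar(2500))  # 420
```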

Conversion of units of length from the American system to the European (Russian)

As is well known, Americans measure everything in inches and feet, while we and the Europeans use millimeters, centimeters, and meters; that is, like the vast majority of countries in the world, we live in the metric system of units.


How do you convert inches to millimeters? This is also not difficult: just remember that 1 inch equals 25.4 mm. Quite often, however, the fractional part is neglected, and for simplicity 1 inch is taken as 25 mm.

Thus, if, for example, the cross-section of the inlet is 2 inches in the American system of measures, converting this value by the rule above gives 50 mm or, more precisely, 51 mm (50.8 rounded according to the usual rules).

It remains to add that in technical specifications the diameter is marked with the Latin letters DN and is often given precisely in inches, while pressure is marked with the letters PN and is most often given in bars; in any case, we use exactly this marking as the most convenient.

And the next table will help you calculate not only the precise number of millimeters in one inch (to a thousandth of a millimeter), but also how many millimeters are contained in, for example, 2.5 inches.

To do this, find the column 2" (2 inches) and look for 1/2 on the left. In total, 2.5 inches = 63.501 mm, which can reasonably be rounded to 64 mm; likewise, 6.25 inches (i.e. 6 and 1/4) = 158.753 mm, or 159 mm.
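The same conversion is easy to script. The sketch below uses the exact modern factor of 25.4 mm per inch (the table apparently used a slightly different historical factor, hence its 63.501 mm for 2.5 inches); the function name and the `precise` flag are illustrative.

```python
MM_PER_INCH = 25.4  # exact by the modern definition of the inch

def inches_to_mm(inches, precise=True):
    """Convert inches to millimeters; round to whole mm if precise is False."""
    mm = inches * MM_PER_INCH
    return mm if precise else round(mm)

print(inches_to_mm(2))                 # 50.8
print(inches_to_mm(2, precise=False))  # 51
print(inches_to_mm(2.5))               # approximately 63.5
```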


[Table: inches to millimeters, values given to a thousandth of a millimeter]
What is an ANSI lumen (lm)?

The ANSI lumen is a unit for measuring the luminous flux of multimedia projectors, produced by the lamp as it shines through the lens. "Lumen" is Latin for "light"; ANSI stands for "American National Standards Institute". It is a luminous flux measurement standard used to compare projectors.

This parameter was introduced in 1992 by the American National Standards Institute as a unit representing the average luminous flux on a 40" test screen at the minimum focal length of the projector's zoom lens.

The measurement is carried out on an all-white image: the illuminance of the screen is measured with a lux meter, in lux, at 9 control points. The luminous flux is then calculated as the average of these 9 measurements multiplied by the screen area.

The resulting light energy per square meter of screen is expressed in lux and follows the formula lux = lumen / m². But a raw lumen/lux measurement varies with the environment, the device setup, and the projected image, which is why the ANSI lumen procedure is now widely accepted as the standard.

This measurement allows you to evaluate the uniformity of the distribution of the luminous flux over the surface of the screen. Reducing the brightness of an image around its edges is called a "Hot Spot" or light spot. The uniformity of the luminous flux distribution is calculated as the ratio of the smallest to the largest of the obtained illuminance measurements. In good projectors, this value does not fall below 70%.

The technique precisely prescribes the measurement procedure: under strictly defined environmental conditions and device settings, the image projected on the screen is divided into nine equal parts, and the light energy is measured in each of them. The average of all nine measurements, multiplied by the screen area in m², gives the ANSI lumen value.
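The procedure just described is easy to express in code. The sketch below assumes you already have the nine lux readings and the screen area; the readings and variable names are invented for illustration.

```python
def ansi_lumens(lux_readings, screen_area_m2):
    """Average of the nine lux measurements times the screen area in m^2."""
    if len(lux_readings) != 9:
        raise ValueError("the ANSI procedure requires exactly 9 measurement points")
    return sum(lux_readings) / 9 * screen_area_m2

def uniformity(lux_readings):
    """Ratio of the smallest to the largest illuminance reading."""
    return min(lux_readings) / max(lux_readings)

readings = [820, 800, 790, 810, 850, 805, 780, 795, 800]  # lux, illustrative
print(round(ansi_lumens(readings, 1.2)))  # luminous flux in ANSI lumens
print(f"{uniformity(readings):.0%}")      # should be >= 70% in a good projector
```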

Interestingly, the luminous flux (what ANSI lumens measure), unlike the illuminance, does not depend on the projected area. In addition, manufacturer-quoted ANSI lumens often refer to maximum reference settings that are rarely used in practice.

Also, ANSI lumens are often just an average, which makes it difficult to infer how well or poorly a projector distributes light across the screen surface.

ANSI lumens for digital projectors can range from 900 ANSI lumens for older models to 4,700 ANSI lumens for today's high-end products. A good digital home theater projector should have around 2000 ANSI lumens.


Encodings: useful information and a brief retrospective

I decided to write this article as a small overview on the issue of encodings.

We will figure out what encoding is in general and touch on the history of how they appeared in principle.

We will talk about some of their features and also look at the points that let us work with encodings more consciously and avoid the appearance on a site of so-called krakozyabry, i.e. unreadable characters.

So let's go ...

What is encoding?

To put it simply, an encoding is a table that maps the characters we see on the screen to certain numeric codes.

That is, each character we type on the keyboard or see on the monitor is encoded by a certain sequence of bits (zeros and ones). 8 bits, as you probably know, equal 1 byte of information, but more on that later.

The appearance of the symbols themselves is determined by the font files that are installed on your computer. Therefore, the process of displaying text on the screen can be described as a constant mapping of sequences of zeros and ones to some specific characters that make up the font.

The progenitor of all modern encodings can be considered ASCII.

The abbreviation stands for American Standard Code for Information Interchange, a standard American table of printable characters and some control codes.

It is a single-byte encoding that initially contained only 128 characters: letters of the Latin alphabet, Arabic numerals, and so on.


Later it was extended (initially it did not use all 8 bits), so it became possible to encode not 128 but 256 (2 to the 8th power) different characters in one byte of information.

This improvement made it possible to add symbols of national languages to ASCII, alongside the existing Latin alphabet.

There are many variants of extended ASCII encoding, simply because there are many languages in the world. Many of you have probably heard of KOI8-R, which is also an extended ASCII encoding, designed to work with characters of the Russian language.
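You can observe in Python how differently these extended-ASCII variants treat the upper half of the table: the same Russian letter maps to a different byte in each encoding. A small sketch:

```python
# The Cyrillic letter 'я' gets a different upper-half byte in each
# extended-ASCII encoding, even though both encode it in a single byte.
print("я".encode("koi8-r"))  # b'\xd1'
print("я".encode("cp1251"))  # b'\xff'
```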

The next step in the development of encodings can be considered the emergence of the so-called ANSI encodings.

In essence, they were the same extended ASCII variants; however, various pseudo-graphic elements were removed from them and typographic symbols were added, for which there had previously not been enough "free space".

An example of such an ANSI encoding is the well-known Windows-1251. In addition to typographic characters, it also includes letters of the alphabets of languages close to Russian (Ukrainian, Belarusian, Serbian, Macedonian, and Bulgarian).


"ANSI encoding" is a collective name. The actual encoding used under the ANSI label is determined by what is specified in the registry of your Windows operating system. For Russian it will be Windows-1251; for other languages it will be a different flavor of ANSI.

As you can imagine, this pile of encodings and the lack of a single standard led to no good, and became the reason for frequent encounters with so-called krakozyabry, an unreadable, meaningless jumble of characters.

The reason for their appearance is simple: it is the attempt to display characters encoded with one encoding table by using a different encoding table.
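This is easy to reproduce in a couple of lines of Python: decoding cp1251 bytes with the cp1252 table yields classic krakozyabry, while the correct table restores the text.

```python
# Russian text encoded as cp1251 but mistakenly decoded as cp1252
# turns into krakozyabry; no bytes are lost, only misinterpreted.
data = "Привет".encode("cp1251")
print(data.decode("cp1252"))  # 'Ïðèâåò' - unreadable
print(data.decode("cp1251"))  # 'Привет' - the right table brings it back
```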

In the context of web development, we may run into krakozyabry when, for example, Russian text is mistakenly saved in an encoding different from the one used on the server.

Of course, this is not the only way to end up with unreadable text; there are plenty of variants, especially considering that there is also a database, where information is likewise stored in a certain encoding, that the database connection has its own encoding, and so on.

The emergence of all these problems was the incentive to create something new. It had to be an encoding that could represent any language in the world (a single-byte encoding cannot, however hard you try, describe all the characters of, say, Chinese, of which there are clearly more than 256), plus any additional special characters and typography.

In short, it was necessary to create a universal encoding that would solve the problem of krakozyabrov once and for all.

Unicode - Universal Text Encoding (UTF-32, UTF-16, and UTF-8)

The standard itself was proposed in 1991 by the non-profit Unicode Consortium (Unicode Inc.), and the first result of its work was the creation of the UTF-32 encoding.

By the way, the abbreviation UTF stands for Unicode Transformation Format.

In this encoding, a full 32 bits, i.e. 4 bytes of information, were to be used to encode one character. Compared with single-byte encodings, this leads to a simple conclusion: encoding 1 character in this universal encoding takes 4 times as many bits, which makes files 4 times heavier.

It is also obvious that the number of characters that could potentially be described by this encoding exceeds all reasonable limits; it is technically capped at 2 to the 32nd power. This was clearly overkill and a waste in terms of file size, so the encoding never became widespread.

It was replaced by a new development - UTF-16.

As the name implies, in this encoding one character is encoded with not 32 bits but only 16 (i.e. 2 bytes). Obviously, this makes any character half as "heavy" as in UTF-32, but twice as heavy as a character in any single-byte encoding.

The number of characters directly encodable in UTF-16 is 2 to the 16th power, i.e. 65,536 characters. On top of that, the final size of the UTF-16 code space was expanded, via surrogate pairs, to more than a million characters.

However, this encoding did not fully satisfy developers' needs either. For example, for text written exclusively in Latin characters, switching from an extended version of ASCII to UTF-16 doubled the size of every file.
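This doubling is easy to check in Python by comparing the byte lengths of the same Latin-only text in a single-byte encoding and in UTF-16:

```python
text = "hello"  # pure Latin text

print(len(text.encode("ascii")))      # 5 bytes in a single-byte encoding
print(len(text.encode("utf-16-le")))  # 10 bytes: every character takes 2 in UTF-16
```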

As a result, another attempt was made to create something universal, and that something became the well-known UTF-8 encoding.

UTF-8 is a multibyte encoding with a variable character length. Judging by the name, one might think, by analogy with UTF-32 and UTF-16, that 8 bits are used to encode one character, but that is not the case. More precisely, not quite the case.

That is because UTF-8 provides the best compatibility with older systems that used 8-bit characters. To encode one character, UTF-8 actually uses 1 to 4 bytes (the original design hypothetically allowed up to 6).

In UTF-8, all Latin characters are encoded with 8 bits, just as in ASCII. In other words, the basic part of the ASCII table (128 characters) carried over into UTF-8, which lets you "spend" only 1 byte on their representation while preserving the universality for which the whole thing was started.

So, if the first 128 characters are encoded with 1 byte, then all other characters are encoded with 2 or more bytes. In particular, each Cyrillic character is encoded with exactly 2 bytes.

Thus, we got a universal encoding that allows us to cover all possible characters that need to be displayed, without unnecessarily "weighting" the files.
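The variable length is easy to see in Python by encoding characters from different parts of the Unicode range:

```python
# Byte length grows with the character's position in the Unicode table:
# Latin takes 1 byte, Cyrillic 2, CJK 3, emoji 4.
for ch in ("A", "я", "中", "😀"):
    print(ch, "->", len(ch.encode("utf-8")), "byte(s)")
```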

With or without BOM?

If you have worked with text editors (code editors) such as Notepad++, phpDesigner, Rapid PHP, and so on, you have probably noticed that when you specify the encoding in which a page will be created, you can usually choose from 3 options:

  • ANSI
  • UTF-8
  • UTF-8 without BOM


I must say right away: it is always the last option you should choose, UTF-8 without BOM.

So what is BOM and why don't we need it?

BOM stands for Byte Order Mark. It is a special Unicode character used to indicate the byte order of a text file. According to the specification, its use is optional, but if a BOM is used, it must appear at the very beginning of the file.

We will not go into the details of how the BOM works. For us, the main conclusion is this: using this service character together with UTF-8 prevents some programs from reading the encoding correctly, which leads to errors in scripts.
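What the BOM actually looks like can be seen with Python's 'utf-8-sig' codec, which prepends it on encoding and strips it transparently on decoding:

```python
with_bom = "hi".encode("utf-8-sig")  # BOM + text
without = "hi".encode("utf-8")

print(with_bom)  # b'\xef\xbb\xbfhi' - the three BOM bytes come first
print(without)   # b'hi'
# 'utf-8-sig' also removes the BOM when decoding:
print(with_bom.decode("utf-8-sig"))  # 'hi'
```

A plain `decode("utf-8")` would keep the BOM as an invisible first character, which is exactly the kind of thing that trips up scripts.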