
What is an Application Binary Interface (ABI)?


I never clearly understood what an ABI is. Please don't point me to a Wikipedia article. If I could understand it, I wouldn't be here writing such a long post.

This is my take on various interfaces:

A TV remote control is an interface between the user and the television. It is an existing entity, but useless on its own (has no functionality). All functions for each of these buttons on the remote control are implemented in the TV.

Interface: It is an "existing entity" layer between the functionality and the consumer of that functionality. An interface by itself doesn't do anything; it just invokes the functionality lying behind it.

Depending on who the user is, there are now different types of interfaces.

CLI commands (command-line interface) are the existing entities, the consumer is the user, and the functionality lies behind them.

Functionality: my software functionality, which solves some purpose and for which we are describing this interface.

Existing entities: commands.

Consumer: the user.

Windows, buttons, etc. of the graphical user interface (GUI) are the existing entities, and again the consumer is the user and the functionality lies behind them.

Functionality: my software functionality, which solves some problem and for which we are describing this interface.

Existing entities: windows, buttons, etc.

Consumer: the user.

API functions (application programming interface), or more precisely interfaces (in interface-based programming), are the existing entities; here the consumer is another program, not a user, and again the functionality lies behind this layer.

Functionality: my software functionality, which solves some problem and for which we are describing this interface.

Existing entities: functions, interfaces (arrays of functions).

Consumer: another program / application.

Application Binary Interface (ABI): this is where my problem starts.

Functionality: ???

Existing entities: ???

Consumer: ???

  • I've written software in different languages and provided different kinds of interfaces (CLI, GUI, and API), but I'm not sure whether I have ever provided an ABI.

Wikipedia says:

ABIs cover details such as

  • data types, sizes, and alignment;
  • the calling convention, which controls how functions' arguments are passed and return values are retrieved;
  • the system call numbers and how an application should make system calls to the operating system;

Other ABIs standardize details such as

  • C++ name mangling,
  • exception propagation, and
  • the calling convention between compilers on the same platform, without requiring cross-platform compatibility.
  • Who needs these details? Please don't say the operating system. I know assembly programming. I know how linking and loading work. I know exactly what happens inside.

  • Why did C++ name mangling come in? I thought we were talking at the binary level. Why do languages come in?

Anyway, I downloaded the [PDF] System V Application Binary Interface Edition 4.1 (1997-03-18) to see what exactly it contains. Well, most of it didn't make sense.

  • Why are there two chapters (4th and 5th) describing the ELF file format? In fact, these are the only two significant chapters in that specification. The rest of the chapters are "processor specific". Anyway, I think it's a completely different topic. Please don't say that the ELF file format specification is the ABI. It doesn't qualify as an interface according to the definition.

  • I know that, since we are talking at such a low level, it has to be very specific. But I'm not sure in what way it is "instruction set architecture (ISA)" specific.

  • Where can I find the Microsoft Windows ABI?

So these are the main questions that annoy me.






Reply:


An easy way to understand "ABI" is to compare it to "API".

You already know the concept of an API. For example, if you want to use the functions of a library or your operating system, you program against an API. The API consists of data types / structures, constants, functions, etc. that you can use in your code to access the functions of this external component.

An ABI is very similar. Think of it as the compiled version of an API (or as an API at the machine-language level). When you write source code, you access the library through an API. Once the code is compiled, your application accesses the binary data in the library through the ABI. The ABI defines the structures and methods that your compiled application will use to access the external library (just like the API did), only at a lower level. Your API defines the order in which you pass arguments to a function. Your ABI defines the mechanics of how these arguments are passed (registers, stack, etc.). Your API defines which functions are part of your library. Your ABI defines how your code is stored inside the library file, so that any program using your library can locate the desired function and execute it.

ABIs are important when it comes to applications that use external libraries. Libraries are full of code and other resources, but your program has to know how to locate what it needs inside the library file. Your ABI defines how the contents of a library are stored inside the file, and your program uses the ABI to search through the file and find what it needs. If everything in your system conforms to the same ABI, then any program is able to work with any library file, no matter who created them. Linux and Windows use different ABIs, so a Windows program won't know how to access a library compiled for Linux.

Sometimes ABI changes are unavoidable. When this happens, programs that use the library will not work unless they are recompiled against the new version of the library. If the ABI changes but the API does not, then the old and new library versions are sometimes called "source compatible". This means that while a program compiled for one library version will not work with the other, source code written for one will work for the other if recompiled.

For this reason, developers tend to try to keep their ABI stable (to minimize disruption). Keeping an ABI stable means not changing function interfaces (return type and number, types, and order of arguments), definitions of data types or data structures, defined constants, etc. New functions and data types can be added, but existing ones must stay the same. If, for instance, your library uses 32-bit integers to indicate the offset of a function and you switch to 64-bit integers, then already-compiled code that uses that library will not access that field (or any field following it) correctly. Accesses to data structure members are converted into memory addresses and offsets during compilation. If the data structure changes, those offsets no longer point to what the compiled code expects.
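
As a rough illustration of the 32-bit to 64-bit case above, here is a small C sketch. The struct and field names (`entry_v1`, `entry_v2`, `offset`, `flags`) are made up for this example and do not come from any real library. It imitates an already-compiled caller that still uses the old offset for `flags` after the library has widened the preceding field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout shipped with version 1 of a library. */
struct entry_v1 {
    uint32_t offset; /* 4 bytes */
    uint32_t flags;  /* old callers were compiled to read this at byte 4 */
};

/* Hypothetical layout after version 2 widened `offset` to 64 bits. */
struct entry_v2 {
    uint64_t offset; /* now 8 bytes */
    uint32_t flags;  /* has moved to byte 8 */
};

/* Imitates a caller compiled against v1 that is handed a v2 object:
 * it still reads `flags` at the offset baked in at compile time. */
uint32_t read_flags_as_v1(const struct entry_v2 *e) {
    uint32_t stale;
    assert(e != NULL);
    memcpy(&stale,
           (const unsigned char *)e + offsetof(struct entry_v1, flags),
           sizeof stale);
    return stale; /* bytes belonging to `offset`, not the real flags */
}
```

If a v2 object holds `flags = 7`, `read_flags_as_v1` returns part of the widened `offset` field instead, which is the kind of silent misread a changed layout produces.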

An ABI isn't necessarily something you will explicitly provide unless you are doing very low-level systems design work. It isn't language-specific either, since (for example) a C application and a Pascal application can use the same ABI after they are compiled.

Edit: Regarding your question about the chapters on the ELF file format in the SysV ABI docs: this information is included because the ELF format defines the interface between operating system and application. When you tell the OS to run a program, it expects the program to be formatted in a certain way and, for example, expects the first section of the binary to be an ELF header containing certain information at specific memory offsets. This is how the application communicates important information about itself to the operating system. If you build a program in a non-ELF binary format (such as a.out or PE), then an OS that expects ELF-formatted applications will not be able to interpret the binary file or run the application.
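
To give a feel for what "formatted in a certain way" means, the sketch below inspects the first bytes of a file image the way a loader might begin to. The function names are hypothetical; the byte values (the 0x7F 'E' 'L' 'F' magic and the class byte at index 4) are the ones defined for the ELF identification bytes. A real loader checks far more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the buffer starts with the ELF magic bytes 0x7F 'E' 'L' 'F'. */
int looks_like_elf(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    return len >= 4 && buf[0] == 0x7F && buf[1] == 'E' &&
           buf[2] == 'L' && buf[3] == 'F';
}

/* Returns 32 or 64 according to the class byte (index 4) of the ELF
 * identification, or 0 if the buffer is not a recognizable ELF header. */
int elf_word_size(const unsigned char *buf, size_t len) {
    if (len < 5 || !looks_like_elf(buf, len)) return 0;
    if (buf[4] == 1) return 32; /* ELFCLASS32 */
    if (buf[4] == 2) return 64; /* ELFCLASS64 */
    return 0;
}
```

A buffer that begins with "MZ", as PE files do, fails the first check, which mirrors the point that an OS expecting ELF cannot interpret a PE binary.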

IIRC, Windows currently uses the Portable Executable (or PE) format. There are links in the External Links section of this Wikipedia page with more information on the PE format.

Regarding your note about C++ name mangling: when locating a function in a library file, the function is typically looked up by name. C++ allows you to overload function names, so the name alone is not sufficient to identify a function. C++ compilers have their own ways of dealing with this internally, called name mangling. An ABI can define a standard way of encoding the name of a function so that programs built with a different language or compiler can locate what they need. When you use extern "C" in a C++ program, you are instructing the compiler to use a standardized way of recording names that other software can understand.
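
As a minimal sketch of how a library usually opts out of C++ name mangling, here is the common header pattern built around `extern "C"`. The function `demo_div` is hypothetical. The file is plain C; the `__cplusplus` guard only takes effect when a C++ compiler includes the same declarations.

```c
#include <assert.h>

/* Hypothetical header content for a library function `demo_div`.
 * When a C++ compiler includes it, __cplusplus is defined and the
 * declaration is wrapped in extern "C", asking for an unmangled C symbol.
 * A C compiler skips the wrapper and sees a plain declaration. */
#ifdef __cplusplus
extern "C" {
#endif

int demo_div(int numerator, int denominator);

#ifdef __cplusplus
}
#endif

/* Definition compiled as C: there is no overloading in C, so the
 * exported symbol is derived from the function name alone. */
int demo_div(int numerator, int denominator) {
    assert(denominator != 0);
    return numerator / denominator;
}
```

A caller in C, or in C++ via the guarded declaration, can then write `demo_div(10, 2)` and the linker resolves it by the same symbol name in both cases.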







If you know assembly and how things work at the OS level, you are conforming to a certain ABI. The ABI governs things like how parameters are passed and where return values are placed. For many platforms there is only one ABI to choose from, and in those cases the ABI is just "how things work".

However, the ABI also regulates things like the layout of classes / objects in C ++. This is necessary if you want to pass object references across module boundaries or if you want to mix code compiled with different compilers.

If you have a 64-bit operating system that can run 32-bit binaries, you have different ABIs for 32- and 64-bit code.
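
One visible difference between such ABIs is the size they assign to basic types. The sketch below, with the hypothetical function name `data_model`, reports the commonly used name for the combination the current build uses. The three names checked are the conventional data-model labels; anything else is reported as "other".

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Names the data model implied by the sizes the current ABI assigns
 * to int, long and pointers. The labels assume 8-bit bytes. */
const char *data_model(void) {
    assert(CHAR_BIT == 8);
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32"; /* typical for 32-bit ABIs */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";  /* typical for 64-bit Unix-like ABIs */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return "LLP64"; /* used by 64-bit Windows */
    return "other";
}
```

Building the same source once as 32-bit and once as 64-bit code would typically make this function return different strings, even though the source did not change.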

In general, any code that you link to the same executable must conform to the same ABI. If you want to communicate between code using different ABIs, you must use some form of RPC or serialization protocols.
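
A minimal sketch of what "serialization" buys you here: instead of handing a struct across the boundary, both sides agree on a byte sequence that does not depend on either side's ABI. The helper names `put_u32_le` and `get_u32_le` are hypothetical, and fixing the wire format as little-endian is just one possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes `value` as 4 bytes, least significant first, regardless of the
 * host's native byte order or struct layout rules. */
void put_u32_le(unsigned char out[4], uint32_t value) {
    assert(out != NULL);
    out[0] = (unsigned char)(value & 0xFFu);
    out[1] = (unsigned char)((value >> 8) & 0xFFu);
    out[2] = (unsigned char)((value >> 16) & 0xFFu);
    out[3] = (unsigned char)((value >> 24) & 0xFFu);
}

/* Reads back a value written by put_u32_le. */
uint32_t get_u32_le(const unsigned char in[4]) {
    assert(in != NULL);
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```

Because the byte order is fixed by the code rather than by the platform, a reader built under a different ABI can decode the same four bytes to the same value.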

I think you are trying too hard to squeeze different kinds of interfaces into one fixed set of characteristics. For example, an interface doesn't necessarily have to be split into consumers and producers. An interface is just a convention by which two entities interact.

ABIs can be (partially) ISA-agnostic. Some aspects (such as calling conventions) depend on the ISA, while others (such as C++ class layout) do not.

A well-defined ABI is very important to people who write compilers. Without a well-defined ABI, it would be impossible to generate interoperable code.

EDIT: Some notes for clarification:

  • "Binary" in ABI does not exclude the use of strings or text. If you want to link a DLL that exports a C++ class, the methods and type signatures must be encoded somewhere in it. This is where C++ name mangling comes in.
  • The reason you've never provided an ABI is because the vast majority of programmers never will. ABIs are provided by the same people who design the platform (i.e., the operating system), and very few programmers will ever have the privilege of designing a widely used ABI.






You actually don't need an ABI at all if:

  • your program doesn't have functions, and
  • your program is a single executable that runs alone (i.e. an embedded system), where it is literally the only thing running and it doesn't need to talk to anything else.

A simplified summary:

API: "Here are all the functions you may call."

ABI: "This is how to call a function."

The ABI is a set of rules that compilers and linkers follow in order to compile your program so that it works properly. ABIs cover several topics:

  • Arguably the largest and most important part of an ABI is the procedure call standard, sometimes known as the "calling convention". Calling conventions standardize how "functions" are translated to assembly code.
  • ABIs also dictate how the names of exposed functions in libraries should be represented, so that other code can call those libraries and know which arguments to pass. This is called "name mangling".
  • ABIs also dictate which data types can be used, how they must be aligned, and other low-level details.

A closer look at the calling convention, which I consider to be the core of an ABI:

The machine itself has no concept of "functions". When you write a function in a high-level language like C, the compiler generates a line of assembly code that is a label, which the assembler eventually resolves into an address. This label marks the "start" of your "function" in the assembly code. When you "call" that function in high-level code, what you are really doing is making the CPU jump to the address of that label and continue executing there.

In preparation for the jump, the compiler must do a number of important things. The calling convention is like a checklist that the compiler follows to do all of these things:

  • First, the compiler inserts a little bit of assembly code to save the current address, so that when your "function" is done, the CPU can jump back to the right place and continue executing.
  • Next, the compiler generates assembly code to pass the arguments.
    • Some calling conventions dictate that arguments should be put on the stack (in a particular order, of course).
    • Other conventions dictate that the arguments should be put in particular registers (depending on their data types, of course).
    • Still other conventions dictate that a specific combination of stack and registers should be used.
  • Of course, if there was anything important in those registers before, those values are now overwritten and lost forever, so some calling conventions may dictate that the compiler should save some of those registers before putting the arguments in them.
  • Now the compiler inserts a jump instruction telling the CPU to go to the label it made previously. At this point, you can consider the CPU to be "in" your "function".
  • At the end of the function, the compiler puts some assembly code that makes the CPU write the return value in the correct place. The calling convention dictates whether the return value should be put into a particular register (depending on its type) or on the stack.
  • Now it's time for clean-up. The calling convention dictates where the compiler places the clean-up assembly code.
    • Some conventions say that the caller must clean up the stack. This means that after the "function" is done and the CPU jumps back to where it was before, the very next code to be executed should be some very specific clean-up code.
    • Other conventions say that some particular parts of the clean-up code should be at the end of the "function", before the jump back.
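
To connect the checklist to source code, here is a small C sketch. The names `add_longs`, `binary_op`, and `call_through_pointer` are hypothetical. Nothing in the source says where the arguments travel; the comments mention two well-known x86-64 conventions only as examples of what a compiler might be required to do.

```c
#include <assert.h>
#include <stddef.h>

/* An ordinary C function. Which registers or stack slots carry `a`, `b`
 * and the result is not visible here; the ABI decides that.
 * For example, the System V AMD64 convention passes the first two integer
 * arguments in RDI and RSI and returns the result in RAX, while the
 * Microsoft x64 convention uses RCX and RDX for those arguments. */
long add_longs(long a, long b) {
    return a + b;
}

/* A function pointer holds the address the CPU jumps to, which is all
 * a "function" is at the machine level. */
typedef long (*binary_op)(long, long);

long call_through_pointer(binary_op op, long a, long b) {
    assert(op != NULL);
    return op(a, b); /* the compiler emits the call sequence the ABI demands */
}
```

Calling `call_through_pointer(add_longs, 40, 2)` only works because the code that makes the call and the code inside `add_longs` were both generated under the same convention.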

There are many different ABIs / calling conventions. Some of the most important are:

  • For the x86 or x86-64 CPU (32-bit environment):
    • CDECL
    • STDCALL
    • FASTCALL
    • VECTORCALL
    • THISCALL
  • For the x86-64 CPU (64-bit environment):
    • SYSTEMV
    • MSNATIVE
    • VECTORCALL
  • For the ARM CPU (32-bit)
  • For the ARM CPU (64-bit)

Here's a great page that actually shows the assembly differences generated when compiling for different ABIs.

Another thing to mention is that an ABI isn't only relevant inside your program's executable module. It is also used by the linker to make sure your program calls library functions correctly. You have multiple shared libraries running on your computer, and as long as your compiler knows which ABI each of them uses, it can call functions from them properly without blowing up the stack.

It is extremely important that your compiler understands how to call library functions. On a hosted platform (that is, a platform where an operating system loads programs), your program can't even blink without making a kernel call.


An Application Binary Interface (ABI) is similar to an API, but the function is not accessible to the caller at the source code level. Only a binary representation is accessible / available.

ABIs can be defined at the processor architecture level or at the operating system level. The ABIs are standards that the code generator phase of the compiler must follow. The standard is set by either the operating system or the processor.

Functionality: Define the mechanism / standard to make function calls independent of the implementation language or a specific compiler / linker / toolchain. Provide the mechanism that enables JNI or a Python C interface, etc.

Existing entities: functions in machine code form.

Consumer: Another function (including one in a different language compiled by another compiler or linked by another linker).



Functionality: A set of contracts that affect the compiler, assembly writer, linker, and operating system. The contracts specify how functions are arranged, where parameters are passed, how parameters are passed, how function returns work. These are generally specific to a tuple (processor architecture, operating system).

Existing entities: parameter layout, function semantics, register allocation. For instance, the ARM architecture has numerous ABIs (APCS, EABI, GNU-EABI, not to mention a number of historical cases). Using a mixed ABI will result in your code simply not working when calling across boundaries.

Consumer: The compiler, assembly writer, operating system, CPU-specific architecture.

Who needs these details? The compiler, assembly writers, linkers that do code generation (or have alignment requirements), and the operating system (interrupt handling, syscall interface). If you have done assembly programming, you were conforming to an ABI!

C++ name mangling is a special case. It is an issue centered on the linker and dynamic linker: if name mangling is not standardized, dynamic linking will not work. Henceforth, the C++ ABI is called just that, the C++ ABI. It is not a linker-level issue but a code generation issue. Once you have a C++ binary, there is no way to make it compatible with another C++ ABI (name mangling, exception handling) without recompiling it from source.

ELF is a file format for use by a loader and a dynamic linker. ELF is a container format for binary code and data, and as such specifies the ABI of a piece of code. I would not consider ELF to be an ABI in the strict sense, just as PE executables are not an ABI.

All ABIs are instruction set specific. An ARM ABI does not make sense on an MSP430 or x86_64 processor.

Windows has several ABIs. For example, fastcall and stdcall are two commonly used ones. The syscall ABI is different again.


Let me at least answer part of your question, with an example of how the Linux ABI affects system calls and why that is useful.

A system call is a way for a userspace program to ask the kernel for something. It works by putting the numeric code for the call and the argument in specific registers and triggering an interrupt. Then a switch to kernel space occurs, and the kernel looks up the numeric code and the argument, handles the request, puts the result back into a register, and triggers a switch back to userspace. This is needed, for example, when the application wants to allocate memory or open a file (syscalls "brk" and "open").

Now, the system calls have short names ("brk", etc.) and corresponding numeric codes that are defined in a system-specific header file. As long as these codes stay the same, you can run the same compiled userland programs on different updated kernels without recompiling them. So you have an interface that is used by precompiled binaries, hence ABI.
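
Here is a minimal sketch of using such a numeric code directly. It assumes Linux with glibc, where `syscall()` and the `SYS_getpid` constant are provided by the system headers; the wrapper name `raw_getpid` is hypothetical, and `getpid` is used instead of "brk" only because its result is easy to check.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes syscall() in glibc's <unistd.h> */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Asks the kernel for the process ID by system call number instead of
 * going through the getpid() library wrapper. SYS_getpid is the numeric
 * code taken from the system-specific header. */
long raw_getpid(void) {
    long pid = syscall(SYS_getpid);
    assert(pid > 0);
    return pid;
}
```

The number behind `SYS_getpid` gets compiled into the binary, so the binary keeps working on newer kernels only as long as the kernel keeps that number and its register conventions unchanged.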


To call code in shared libraries, or to call code between compilation units, the object file needs to contain labels for the calls. C++ mangles the names of method labels in order to enforce data hiding and to allow overloaded methods. That is why you cannot mix files from different C++ compilers unless they explicitly support the same ABI.


The best way to differentiate between ABI and API is to know why and what it is used for:

For x86-64 there is generally an ABI (and for x86 32-bit there is a different set):

http://www.x86-64.org/documentation/abi.pdf

https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/LowLevelABI/140-x86-64_Function_Calling_Conventions/x86_64.html

http://people.freebsd.org/~obrien/amd64-elf-abi.pdf

Linux, FreeBSD, and Mac OS X follow it with a few minor differences, and Windows x64 has its own ABI:

http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/

If you know the ABI and assume that other compilers follow it as well, then binaries theoretically know how to call each other (library APIs in particular) and how to pass parameters over the stack or in registers, which registers are changed when functions are called, and so on. Essentially, this knowledge helps software integrate with one another. Knowing the order of the registers and the stack layout, I can easily piece together different software written in assembly without much problem.

But APIs are different:

They are high-level function names with defined arguments, such that if different pieces of software are built using this API, they MAY be able to call into one another. However, the additional requirement of the SAME ABI must also be met.

For example, Windows used to be POSIX API compatible:

https://en.wikipedia.org/wiki/Windows_Services_for_UNIX

https://en.wikipedia.org/wiki/POSIX

And Linux is POSIX compatible as well. However, the binaries cannot simply be moved over and run immediately. But because they used the same NAMES in the POSIX-compatible API, you can take the same software in C, recompile it on the different operating systems, and get it running immediately.

APIs are meant to ease the integration of software in the pre-compilation stage. So after compilation, the software can look totally different if the ABIs are different.

ABIs are intended to define the exact integration of software on the binary / assembly level.







Minimal runnable example of a Linux shared library ABI

In the context of shared libraries, the most important impact of a "stable ABI" is that you do not have to recompile your programs after changing the library.

So for example:

  • When you sell a shared library, you save your users the hassle of recompiling everything that depends on your library with each new version

  • If you sell a closed source program that depends on a shared library in the user's distribution, you can reduce the number of prebuilts to publish and test if you are certain that ABI is stable for certain versions of the target operating system.

    This is especially important in the case of the C standard library that has many, many programs associated with it in your system.

Now I would like to provide a minimal concrete working example of this.

main.c

mylib.c

mylib.h

Compiles and runs fine with:

Now suppose that for version 2 of the library we want to add a new field to the library's struct.

If we added the new field before the existing one, like this:

and rebuilt the library but not the main program, then the assertion fails!

This is because the line:

had generated assembly that tries to access the very first field of the struct, which is now the new field instead of the expected old one.

Hence, this change broke the ABI.

But if we add the new field after the existing one:

then the old generated assembly still accesses the first field of the struct, and the program continues to work because we kept the ABI stable.
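
The difference between adding a field before and after the existing one can be sketched in a single file. All type, field, and function names here are hypothetical stand-ins and are not the names used in the original example files. `read_existing_as_v1` plays the role of the old, not recompiled main program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical version 1 of the library's struct. */
typedef struct { int existing; } rec_v1;

/* Two hypothetical ways to add a field in version 2. */
typedef struct { int added; int existing; } rec_v2_prepended; /* moves `existing` */
typedef struct { int existing; int added; } rec_v2_appended;  /* keeps its offset */

/* Returns 1 if `existing` sits at the same offset as in version 1. */
int prepended_keeps_offset(void) {
    return offsetof(rec_v2_prepended, existing) == offsetof(rec_v1, existing);
}

int appended_keeps_offset(void) {
    return offsetof(rec_v2_appended, existing) == offsetof(rec_v1, existing);
}

/* Imitates old compiled code: it reads `existing` at the version 1
 * offset, whatever object layout it is actually given. */
int read_existing_as_v1(const void *object) {
    int value;
    assert(object != NULL);
    memcpy(&value, (const char *)object + offsetof(rec_v1, existing),
           sizeof value);
    return value;
}
```

Given an appended-layout object, the old reader still sees the right value; given a prepended-layout object, it reads the new field instead. Appending still changes the size of the struct, so this sketch only covers field offsets, not callers that allocate or copy the struct themselves.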

Here is a fully automated version of this example on GitHub.

Another option for keeping this ABI stable would have been to treat the struct as an opaque structure and only access its fields through helper functions. This makes it easier to keep the ABI stable, but it incurs a performance overhead because we would make more function calls.
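
A minimal sketch of the opaque-structure approach, with a hypothetical `widget` type. In a real library the first part would live in the public header and the second part in the library's own source file; they are shown together here only to keep the example in one piece.

```c
#include <assert.h>
#include <stdlib.h>

/* Public part (would live in a header): callers only see an incomplete
 * type, so no field offsets get compiled into their code. */
typedef struct widget widget;
widget *widget_new(int value);
int widget_value(const widget *w);
void widget_free(widget *w);

/* Private part (would live in the library's .c file): fields can be
 * added or reordered here without affecting compiled callers. */
struct widget {
    int reserved; /* stands for a field added later, placed first on purpose */
    int value;
};

widget *widget_new(int value) {
    widget *w = malloc(sizeof *w);
    if (w != NULL) {
        w->reserved = 0;
        w->value = value;
    }
    return w;
}

int widget_value(const widget *w) {
    assert(w != NULL);
    return w->value;
}

void widget_free(widget *w) {
    free(w);
}
```

Callers write `widget_value(w)` instead of reading a field directly, so the layout stays the library's private business at the cost of one function call per access.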

API vs ABI

In the previous example, it is interesting to note that adding the new field before the existing one only broke the ABI, but not the API.

This means that if we had recompiled our main program against the library, it would have worked regardless.

However, we would also have broken the API if we had changed the function signature, for example:

since in that case the main program would stop compiling altogether.

Semantic API vs Programming API

We can also categorize API changes into a third type: semantic changes.

The semantic API is usually a natural language description of what the API is supposed to do, usually included in the API documentation.

It is therefore possible to break the semantic API without breaking the program build itself.

For example, if we had changed

to:

then this would have broken neither the programming API nor the ABI, but the semantic API would break.
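
As a hedged illustration of a purely semantic break, the sketch below uses two hypothetical functions with identical signatures but different behaviour, plus a tiny corner-case check of an assumed documented contract ("returns its input unchanged"). None of these names come from the example library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical version 1 behaviour: returns the argument unchanged. */
int make_value_v1(int input) { return input; }

/* Hypothetical version 2 behaviour: same signature and return type, so
 * callers compile and link unchanged, but the meaning is different. */
int make_value_v2(int input) { return input + 1; }

/* Corner-case check of the assumed contract "returns its input".
 * Returns 1 if `impl` honours it for a few sample inputs, else 0. */
int honours_contract(int (*impl)(int)) {
    static const int samples[] = { -1, 0, 1, 1000 };
    size_t i;
    assert(impl != NULL);
    for (i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        if (impl(samples[i]) != samples[i]) return 0;
    }
    return 1;
}
```

Neither the compiler nor the linker can tell the two versions apart; only a check against the documented behaviour, like `honours_contract`, notices the change.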

There are two ways to programmatically check the contract API:

  • Test a bunch of corner cases. Easy to do, but you might always miss one.
  • Formal verification. Harder to do, but it produces a mathematical proof of correctness, essentially unifying documentation and tests into a "human" / machine verifiable form! As long as there isn't a bug in your formal description, of course ;-)

    This concept is closely related to the formalization of mathematics itself: /math/53969/what-does-formal-mean/3297537#3297537

List of everything that breaks C / C++ shared library ABIs

TODO: find / create the ultimate list:

Java minimal runnable example

What is binary compatibility in Java?

Tested in Ubuntu 18.10, GCC 8.2.0.


The ABI needs to be consistent between caller and callee to be certain that the call succeeds: stack usage, register usage, and the stack pop at the end of the routine. These are the most important parts of the ABI.


There are different interpretations and strong opinions about the exact layer that defines an ABI (Application Binary Interface).

In my opinion, an ABI is a subjective convention about what is considered a given / the platform for a particular API. The ABI is the "rest" of the conventions that "will not change" for a particular API, or that are handled by the runtime environment: executors, tools, linkers, compilers, JVM, and operating system.

If you want to use a library like joda-time, you have to declare a dependency on it. The library follows best practices and uses semantic versioning. This defines API compatibility at three levels:

  1. Patch - you don't have to change your code at all. The library only fixes some bugs.
  2. Minor - you don't have to change your code, since the changes are only additions.
  3. Major - the interface (API) has changed and you might need to change your code.

For you to use a new major release of the same library, a lot of other conventions still have to be respected:

  • The binary language used for the libraries (in Java's case, the JVM target version that defines the Java bytecode)
  • Calling conventions
  • JVM conventions
  • Linking conventions
  • Runtime conventions

All of these are defined and managed by the tools we use.

Java case study

Java, for example, standardized all of these conventions not in a tool but in a formal JVM specification. The specification allowed other vendors to provide a different set of tools that can output compatible libraries.

Java has two other interesting case studies for ABI: Scala versions and Dalvik Virtual Machine.

Dalvik's virtual machine broke the ABI

The Dalvik VM needs a different type of bytecode than the Java bytecode. The Dalvik libraries are obtained by converting the Java bytecode (with the same API) for Dalvik. This way you can get two builds of the same API, both defined by the original library. We could give the two builds distinct names. They use different ABIs: one is for the standard stack-oriented Java VMs (Oracle's, IBM's, OpenJDK, or any other), and the second ABI is the one around Dalvik.

Successive Scala versions are not compatible

Scala doesn't have binary compatibility between minor Scala versions: 2.X. For this reason the same API "io.reactivex" %% "rxscala" % "0.26.5" has three versions (more in the future): for Scala 2.10, 2.11, and 2.12. What has changed? I don't know for now, but the binaries are not compatible. Probably the latest versions add things that make the libraries unusable on the old virtual machines, probably things related to linking / naming / parameter conventions.

Successive versions of Java are not compatible

Java has problems with the major releases of the JVM too: 4, 5, 6, 7, 8, 9. They offer only backward compatibility. JVM 9 knows how to run code compiled / targeted (javac's -target option) for all the other versions, while JVM 4 doesn't know how to run code targeted for JVM 5. All this while you have one joda library. This incompatibility flies under the radar thanks to different solutions:

  1. Semantic versioning: when libraries target a higher level JVM, they usually change the major version.
  2. Use JVM 4 as your ABI and you are safe.
  3. Java 9 adds a specification on how to include bytecode for specific target JVM in the same library.

API and ABI are just conventions for defining compatibility. The lower layers are generic with respect to a plethora of high-level semantics, which is why it's easy to make some conventions. The first kind of convention concerns memory alignment, byte encoding, calling conventions, big- and little-endian encodings, etc. On top of them you get the executable conventions like the others described, linking conventions, and intermediate byte code like the one used by Java or the LLVM IR used by Clang. Third, you get conventions on how to find and load libraries (see Java class loaders). As you go higher and higher in concepts, you have new conventions that you consider as a given. That's why they didn't make it into semantic versioning. We could amend semantic versioning with a platform / ABI component. This is actually what is happening already: the platform is already part of the packaging format (JVM bytecode, JVM + web server, a specific Scala version, and so on). When you say APK, you are already talking about a specific ABI part of your API.

The top level of an abstraction (the sources written against the highest-level API) can be recompiled / ported to any other lower-level abstraction.

Let's say I have some sources for rxscala. If the Scala tools change, I can recompile them. If the JVM changes, I could have an automatic conversion from the old machine to the new one without bothering with the high-level concepts. Porting might be difficult, but it helps every other client. If a new operating system is created with totally different assembly code, a translator can be created.

There are APIs that are ported to multiple languages, such as reactive streams. In general, they define mappings to specific languages / platforms. I would argue that the API is the master specification, formally defined in human language or even in a specific programming language. All the other "mappings" are ABI in a sense, or at least more API than the usual ABI. The same is happening with REST interfaces.


In short, and philosophically speaking, only things of the same kind can get along well, and the ABI could be seen as the kind within which pieces of software work together.


I also tried to understand ABI and JesperE's answer was very helpful.

From a very simple perspective, we can try to understand ABI by considering binary compatibility.

The KDE Wiki defines a library as binary compatible "if a program that is dynamically linked to an earlier version of the library continues to run on newer versions of the library without the need to recompile." For more information on dynamic linking, see Static Linking and Dynamic Linking

Now let's try to look at only the most basic aspects necessary for a library to be binary compatible (assuming there are no changes to the library's source code):

  1. Same / backward compatible instruction set architecture (processor instructions, register file structure, stack organization, memory access types as well as size, layout and alignment of the basic data types that the processor can access directly)
  2. Same calling conventions
  3. Same name mangling convention (this may be needed if, say, a Fortran program needs to call a C++ library function).

Sure, there are many other details, but this is mostly what the ABI covers as well.

To answer your question in more detail, we can deduce the following from the above:

ABI functionality: binary compatibility

existing entities: existing program / libraries / operating system

Consumers: libraries, operating system

Hope that helps!


Application Binary Interface (ABI)

Functionality:

  • Translation from the programmer's model to the underlying system's domain: data types, sizes, and alignment; the calling convention, which controls how functions' arguments are passed and return values are retrieved; the system call numbers and how an application should make system calls to the operating system; the high-level language compilers' name mangling scheme, exception propagation, and calling convention between compilers on the same platform, without requiring cross-platform compatibility...

Existing units:

  • Logical blocks directly involved in program execution: ALU, general purpose registers, registers for memory / I / O mapping of I / O, etc.

Consumer:

  • Language processors: linkers, assemblers...

These are needed by whoever has to ensure that build toolchains work as a whole. If you write one module in assembly language and another in Python, and want to use an operating system instead of your own boot loader, then your "application" modules are working across "binary" boundaries and require agreement on such an "interface".

C++ name mangling comes in because object files from different high-level languages might need to be linked in your application. Consider using the GCC standard library to make system calls to Windows built with Visual C++.

ELF is one possible expectation the linker has of an object file for interpretation, though the JVM might have some other idea.

For a Windows RT Store app, try searching for ARM ABI if you really want some build tool chains to work together.


The term ABI refers to two different but related concepts.

When discussing compilers, it refers to the rules used to translate from source-level constructs to binary constructs. How big are the data types? How does the stack work? How do I pass parameters to functions? Which registers should be saved by the caller versus the callee?

When it comes to libraries, it refers to the binary interface represented by a compiled library. This interface is the result of a number of factors including the library's source code, the rules used by the compiler, and in some cases definitions that have been adopted from other libraries.

Changes to a library can break the ABI without breaking the API. For example, imagine a library with an interface like:

and the application programmer writes code like