For a user who only uses one C++ standard library, such as libc++, there are typically three compatibility goals, each with increasing compatibility requirements:
If we replace "different libc++ versions" with a mixture of libc++ and libstdc++, we encounter additional goals:
Considering static linking raises another interesting question:
If libc++ is statically linked into b.so, can it be used with a.out that links against a different version of libc++? Let's focus on the first three questions, which specifically pertain to libc++.
libc++ is assigned a version number that corresponds to the major and minor releases of the llvm-project. Additionally, libc++ offers a target ABI version (LIBCXX_ABI_VERSION) as a build-time option, which currently defaults to 1. LIBCXX_ABI_VERSION is used to choose between the stable ABI and the unstable ABI, as explained in the official documentation, libc++ ABI stability.
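To check which standard library, version, and (for libc++) target ABI version a translation unit is actually compiled against, one can test the configuration macros. A minimal sketch, assuming a libc++ that defines `_LIBCPP_ABI_VERSION` in its configuration headers; `describe_stdlib` is a hypothetical helper, and `__GLIBCXX__` is libstdc++'s release-date macro.

```cpp
#include <string>  // any standard header pulls in the library's configuration macros

// Hypothetical helper: reports which C++ standard library this translation unit
// was compiled against and, for libc++, the ABI version chosen at build time.
std::string describe_stdlib() {
#if defined(_LIBCPP_VERSION)
  // _LIBCPP_ABI_VERSION reflects LIBCXX_ABI_VERSION (1 selects the stable ABI).
  return "libc++ " + std::to_string(_LIBCPP_VERSION) + ", ABI v" +
         std::to_string(_LIBCPP_ABI_VERSION);
#elif defined(__GLIBCXX__)
  return "libstdc++ " + std::to_string(__GLIBCXX__);
#else
  return "unknown";
#endif
}
```

With libc++ 17 and the default build configuration this should report something like `libc++ 170000, ABI v1`.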
If we build a program using a specific libc++ version with the stable ABI and link it against the libc++ DSO, upgrading the libc++ DSO should not break the program (assuming the program itself doesn't have any bugs). However, in rare cases libc++ removes symbols, which technically can cause an ABI break. These cases usually involve symbols such as debug mode symbols or symbols that only affect certain C++03 programs, and their impact is limited.
In general, the answer to the first two questions (repeated below) is yes:
However, certain unusual configurations, like _LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS, need to be excluded.
Now, let's consider the problem: when does an ABI break occur? An ABI break can happen when an executable or DSO uses a symbol that undergoes an ABI change due to a libc++ upgrade. This symbol can be defined in the libc++ DSO itself or in another DSO that is rebuilt with a new version of libc++.
For each symbol affected by libc++, there is an intention regarding whether it should be exported or not. Typically, the goal is to minimize the number of exported symbols. Let's discuss these symbols separately.
The symbols that are intended to be exported include numerous typeinfo/vtable symbols from _LIBCPP_TEMPLATE_VIS classes, many _LIBCPP_EXPORTED_FROM_ABI symbols, a few _LIBCPP_METHOD_TEMPLATE_IMPLICIT_INSTANTIATION_VIS and enum symbols, as well as a few miscellaneous symbols.

libc++ needs to provide ABI compatibility for these symbols within the stable ABI.
Most classes are marked with _LIBCPP_TEMPLATE_VIS, which allows the instantiated typeinfo symbols to have default visibility even when using -fvisibility=hidden or -fvisibility-inlines-hidden.

```
% cat typeid.cc
#include <string>
#include <typeinfo>
const char *foo() { return typeid(std::string).name(); }
% clang -stdlib=libc++ -c -fvisibility=hidden typeid.cc
% readelf -WsC typeid.o | grep basic_string
     5: 0000000000000000    16 OBJECT  WEAK   DEFAULT    9 typeinfo for std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >
     7: 0000000000000000    63 OBJECT  WEAK   DEFAULT    7 typeinfo name for std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >
```
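The same effect can be reproduced for one's own class templates. The sketch below uses a hypothetical `MY_TEMPLATE_VIS` macro that, like libc++ on Clang, applies `__type_visibility__("default")`. The fallback to plain `__visibility__("default")` on compilers without that attribute (e.g. GCC) is an assumption of this sketch; unlike `type_visibility`, it also gives the member functions default visibility.

```cpp
#include <typeinfo>

// Hypothetical analogue of _LIBCPP_TEMPLATE_VIS: keep the typeinfo/vtable of
// instantiations at default visibility even under -fvisibility=hidden.
#if defined(__has_attribute)
#  if __has_attribute(__type_visibility__)  // Clang: affects only the type's own symbols
#    define MY_TEMPLATE_VIS __attribute__((__type_visibility__("default")))
#  endif
#endif
#ifndef MY_TEMPLATE_VIS
// Assumed fallback: broader than type_visibility (members become default too).
#  define MY_TEMPLATE_VIS __attribute__((__visibility__("default")))
#endif

template <class T>
class MY_TEMPLATE_VIS Box {
 public:
  explicit Box(T v) : value_(v) {}
  T get() const { return value_; }

 private:
  T value_;
};

// Mirrors typeid.cc: referencing typeid emits a weak "typeinfo for Box<int>".
const char *box_name() { return typeid(Box<int>).name(); }
```

Compiling this with `-c -fvisibility=hidden` and inspecting the object with `readelf -WsC` should show `typeinfo for Box<int>` as WEAK DEFAULT, matching the std::string output above.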
Exported function symbols (e.g. std::__1::thread::~thread() when LIBCXX_ABI_VERSION=1) are generally defined in libcxx/src/**/*.cpp files.
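A minimal sketch of this header/source split, using a hypothetical `MY_EXPORTED_FROM_ABI` macro in place of `_LIBCPP_EXPORTED_FROM_ABI` and a made-up `Worker` class in place of `std::thread`: the header only declares the destructor, and a single out-of-line definition, which would live in the library's source file, is what the DSO exports.

```cpp
// Hypothetical stand-in for _LIBCPP_EXPORTED_FROM_ABI: default visibility, so
// the out-of-line members are exported from the DSO even under -fvisibility=hidden.
#define MY_EXPORTED_FROM_ABI __attribute__((__visibility__("default")))

// "Header" part: ~Worker() is only declared, so every user of Worker emits an
// undefined reference to it that the library DSO must resolve at load time.
class MY_EXPORTED_FROM_ABI Worker {
 public:
  explicit Worker(int id) : id_(id) {}
  ~Worker();
  int id() const { return id_; }

 private:
  int id_;
};

// "src/*.cpp" part: the one definition shipped in the DSO. Removing it, or
// changing its behavior incompatibly, breaks programs that were linked earlier.
Worker::~Worker() = default;
```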
When our executable or DSO has an undefined symbol that is defined by the libc++ DSO, problems may arise if a new libc++ DSO defines the symbol with an ABI break or no longer defines the symbol.
In the case of DSOs, symbol interposition introduces another consideration: defined non-local symbols with default visibility. By default, such symbols are exported to the dynamic symbol table. When the DSO is used with another DSO, the runtime linker (rtld) binds references to such a symbol to the first definition found in symbol lookup order (the executable, then DSOs in load order), so a definition in one DSO can preempt a same-named definition in another.
To enable the safe mixing of linked images built with different versions of libc++, libc++ utilizes the concept of "hiding symbols from the ABI." Let's explore different categories of symbols and how to hide them from the ABI.
In short, libc++ provides the macro _LIBCPP_HIDE_FROM_ABI for such symbols. The macro consists of at least __attribute__((__visibility__("hidden"))) __attribute__((__exclude_from_explicit_instantiation__)).
The majority of symbols impacted by libc++ are those generated during the compilation of libc++ headers. Dealing with these symbols is relatively straightforward. We simply need to make them hidden using __attribute__((__visibility__("hidden"))).
In the case of explicit instantiation declarations, the compiler assumes that the definition exists in another translation unit and may generate undefined symbols. This behavior corresponds to the available_externally linkage in LLVM IR. The presence of undefined symbols poses a problem because if the definition originates from a translation unit built with a different version of libc++, there may be a mismatch in the ABI.
To address this, starting from Clang 8, the attribute __attribute__((exclude_from_explicit_instantiation)) is available. Applying this attribute to a symbol treats it as not being part of an explicit instantiation declaration. Consequently, the symbol exhibits the regular COMDAT behavior (the linkonce_odr or weak_odr linkage in LLVM IR), where it is either defined or optimized out, ensuring that it never results in an undefined symbol.
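Putting these pieces together, here is a minimal sketch under stated assumptions: `MY_HIDE_FROM_ABI` is a hypothetical approximation of `_LIBCPP_HIDE_FROM_ABI` without the ABI tag, and `Buffer` is a made-up class template playing the role of a libc++ container that has an extern template. `at()` is excluded from the explicit instantiation, so each caller gets its own hidden copy (or an inlined body); `size()` is not excluded, so a caller may reference the library's instantiation instead.

```cpp
// Hypothetical approximation of _LIBCPP_HIDE_FROM_ABI (without the ABI tag):
// hidden visibility plus exclude_from_explicit_instantiation where supported
// (Clang 8+), falling back to always_inline elsewhere (e.g. GCC).
#if defined(__has_attribute)
#  if __has_attribute(__exclude_from_explicit_instantiation__)
#    define MY_EXCLUDE_FROM_EXPLICIT_INSTANTIATION \
       __attribute__((__exclude_from_explicit_instantiation__))
#  endif
#endif
#ifndef MY_EXCLUDE_FROM_EXPLICIT_INSTANTIATION
#  define MY_EXCLUDE_FROM_EXPLICIT_INSTANTIATION __attribute__((__always_inline__))
#endif
#define MY_HIDE_FROM_ABI \
  __attribute__((__visibility__("hidden"))) MY_EXCLUDE_FROM_EXPLICIT_INSTANTIATION

// "Header" part, loosely modeled on a container such as std::vector.
template <class T>
class Buffer {
 public:
  Buffer(T *data, unsigned n) : data_(data), n_(n) {}
  // Not part of the explicit instantiation: never an undefined reference.
  MY_HIDE_FROM_ABI T at(unsigned i) const { return data_[i]; }
  // Ordinary member: callers may rely on the library's instantiation.
  unsigned size() const;

 private:
  T *data_;
  unsigned n_;
};
template <class T>
unsigned Buffer<T>::size() const { return n_; }

// Explicit instantiation declaration, as libc++ headers do for common types.
extern template class Buffer<int>;

// "src/*.cpp" part: the explicit instantiation definition the library ships.
template class Buffer<int>;
```

In a real library, the final `template class Buffer<int>;` line would appear only in the library's source file; it is kept here so the example links on its own.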
In cases where __attribute__((__exclude_from_explicit_instantiation__)) is unavailable, __attribute__((__always_inline__)) serves as a fallback for older versions of Clang. However, always_inline has certain issues. First, always_inline does not guarantee that the function will be inlined. Second, and more importantly, Clang is compelled to expand a potentially large function at every call site, even when a translation unit contains multiple calls to the same member function of a specific instantiation. With exclude_from_explicit_instantiation, on the other hand, the compiler can decide to emit a single definition that serves multiple call sites, thereby reducing code size and compilation time.
Prior to the introduction of exclude_from_explicit_instantiation, __attribute__((internal_linkage)) was used as a better alternative to always_inline. The compiler can emit a single definition that serves multiple call sites within a translation unit, but because each copy has internal linkage, the definitions in different translation units are not deduplicated by the linker.
In addition, exclude_from_explicit_instantiation or always_inline serves another purpose in conjunction with hidden visibility. If a member function in an extern template has hidden visibility (due to an attribute, -fvisibility-inlines-hidden, or -fvisibility=hidden), a call to it can compile to a hidden undefined symbol. A hidden reference can only be satisfied within the same executable or DSO, so it cannot be resolved to a definition in the libc++ DSO.
libc++ refers to per-translation-unit ABI insulation as the ability to safely link two relocatable object files built with different versions of libc++ into the same executable or DSO. Additionally, this requires that non-exported symbols from relocatable object files built with different libc++ versions do not conflict.
To achieve per-translation-unit ABI insulation, an intended non-exported symbol can be annotated with __attribute__((__abi_tag__(...))). The ABI tag string is derived from _LIBCPP_VERSION. In the default configuration, _LIBCPP_HIDE_FROM_ABI consists of __attribute__((__visibility__("hidden"))) __attribute__((__exclude_from_explicit_instantiation__)) __attribute__((__abi_tag__(...))).
Let's consider an example.

```
% cat a.cc
#include <vector>
int foo(const std::vector<int> &a) { return a[0]; }
% clang -c -stdlib=libc++ a.cc
% readelf -WsC a.o | grep 'vector<'
     4: 0000000000000000    33 FUNC    GLOBAL DEFAULT    2 foo(std::__1::vector<int, std::__1::allocator<int> > const&)
     5: 0000000000000000    32 FUNC    WEAK   HIDDEN     5 std::__1::vector<int, std::__1::allocator<int> >::operator[][abi:v170000](unsigned long) const
% clang -c -stdlib=libc++ -D_LIBCPP_NO_ABI_TAG a.cc
% readelf -WsC a.o | grep 'vector<'
     4: 0000000000000000    33 FUNC    GLOBAL DEFAULT    2 foo(std::__1::vector<int, std::__1::allocator<int> > const&)
     5: 0000000000000000    32 FUNC    WEAK   HIDDEN     5 std::__1::vector<int, std::__1::allocator<int> >::operator[](unsigned long) const
```
When compiling a.cc with a libc++ that defines _LIBCPP_VERSION to 170000, std::vector::operator[] gets a symbol whose mangled name contains the ABI tag B7v170000 (shown as [abi:v170000] when demangled). However, including this string in every _LIBCPP_HIDE_FROM_ABI symbol can significantly increase the size of relocatable object files, so _LIBCPP_NO_ABI_TAG is provided as an escape hatch for users who do not require this strong ABI compatibility.
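The derivation of the tag from the version number and the `_LIBCPP_NO_ABI_TAG` escape hatch can be sketched as follows. `MY_LIB_VERSION`, `MY_ABI_TAG`, and `MY_NO_ABI_TAG` are hypothetical names standing in for `_LIBCPP_VERSION`, the tag part of `_LIBCPP_HIDE_FROM_ABI`, and `_LIBCPP_NO_ABI_TAG`; this is not libc++'s exact macro plumbing. To make the effect observable at run time, the tag is applied to a class, whose typeinfo name then carries the same `B7v170000` component that the operator[] symbol above carries (assuming an Itanium C++ ABI target).

```cpp
#include <string>
#include <typeinfo>

// Hypothetical stand-in for _LIBCPP_VERSION.
#define MY_LIB_VERSION 170000

// Two-level helpers so macro arguments are expanded before pasting/stringifying.
#define MY_CONCAT_IMPL(a, b) a##b
#define MY_CONCAT(a, b) MY_CONCAT_IMPL(a, b)
#define MY_STRINGIFY_IMPL(x) #x
#define MY_STRINGIFY(x) MY_STRINGIFY_IMPL(x)

// Tag string "v170000" derived from the version. Defining MY_NO_ABI_TAG (the
// analogue of _LIBCPP_NO_ABI_TAG) drops the tag and keeps symbol names short.
#ifdef MY_NO_ABI_TAG
#  define MY_ABI_TAG
#else
#  define MY_ABI_TAG \
     __attribute__((__abi_tag__(MY_STRINGIFY(MY_CONCAT(v, MY_LIB_VERSION)))))
#endif

// A tagged class: its mangled name gains the abi-tag component "B7v170000".
struct MY_ABI_TAG Slot {
  int value;
};

// Returns the mangled type name reported by typeid (Itanium C++ ABI assumed).
std::string slot_mangled_name() { return typeid(Slot).name(); }
```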
Previously, libc++ used __attribute__((internal_linkage)) to provide per-translation-unit ABI insulation. As mentioned above, the main downside is the presence of duplicate copies of functions across translation units.
If a process has two different libc++ versions:
Initialization of the standard stream objects (libcxx/src/iostream_init.h: __start_std_streams) may happen twice.

If a program was built with an old libstdc++ version, it should work with an upgraded libstdc++ DSO (barring the program's own bugs).
In libstdc++, there is no instance of the libc++-style always_inline/internal_linkage/abi_tag, suggesting that it doesn't provide per-TU ABI insulation in any configuration. It appears that the documentation at https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html should cover the implications for archives, but it currently does not.
In situations where a process (an executable and its DSOs) requires two different libstdc++ copies, it is recommended to rename the conflicting symbols.
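One way to picture such renaming at the source level, assuming the private copy can be rebuilt from source, is the trick libc++ itself uses with std::__1: place the copy in a distinct inline namespace. Source code using the names is unaffected, but every mangled symbol name changes, so the two copies no longer collide. The names `mylib`, `mylib_v2`, and `Parser` below are hypothetical; for prebuilt binaries, renaming is instead done on the object files (e.g. with `objcopy --redefine-sym`).

```cpp
#include <string>
#include <typeinfo>

// Hypothetical private copy of a library component. The inline namespace is
// transparent to users (mylib::Parser still works) but appears in every
// mangled name, so these symbols differ from those of a copy built without it.
namespace mylib {
inline namespace mylib_v2 {
struct Parser {
  int parse(const std::string &s) const { return static_cast<int>(s.size()); }
};
}  // namespace mylib_v2
}  // namespace mylib

// The mangled type name shows the extra namespace component.
std::string parser_mangled_name() { return typeid(mylib::Parser).name(); }
```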
libstdc++ uses the abi_tag attribute as well, but for limited purposes, e.g. the GCC 5 change that switched std::basic_string to use the small string optimization. See GCC5 and the C++11 ABI for details.
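Which std::string a translation unit uses can be checked from code. A minimal sketch: `_GLIBCXX_USE_CXX11_ABI` is the libstdc++ macro that selects the new ABI, under which the string lives in the inline namespace std::__cxx11 (tagged with abi_tag "cxx11"); the two helper functions are hypothetical, and other standard libraries simply report false.

```cpp
#include <string>
#include <typeinfo>

// True if this TU uses libstdc++'s GCC 5 "C++11 ABI" std::string.
bool uses_cxx11_string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return _GLIBCXX_USE_CXX11_ABI != 0;
#else
  return false;  // libc++, or a libstdc++ older than GCC 5
#endif
}

// The mangled typeinfo name makes the choice visible: it mentions the
// namespace __cxx11 only for the new libstdc++ string.
bool string_name_mentions_cxx11() {
  return std::string(typeid(std::string).name()).find("__cxx11") != std::string::npos;
}
```

Building the same file with `-D_GLIBCXX_USE_CXX11_ABI=0` under libstdc++ should flip both results to false, since the old string is mangled as plain `Ss`.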
Statically linking libstdc++ is unpopular as it's difficult to ensure that there is no conflicting libstdc++.so.6.
libstdc++ does utilize symbol versioning, but I think the primary use case is to make glibc rtld report an error (version `GLIBCXX_XXX' not found (required by YYY)) when a DSO built with a new libstdc++ is loaded with an old libstdc++.so.6.
Non-default version symbols (name@VERSION rather than name@@VERSION) are limited to the implementation files and not the headers. If I run llvm-nm -gjDU /usr/lib/gcc/x86_64-linux-gnu/12/libstdc++.so | grep '[^@]@[^@]', I can only find very few symbols.
The above discussion has bypassed another important component, the C++ ABI library. Common implementations are libc++abi, libcxxrt, and libsupc++.
The C++ standard library implementation in llvm-project, libc++, can leverage libc++abi, libcxxrt, or libsupc++, but libc++abi is recommended. These libraries do not use inline namespaces and define conflicting symbol names (e.g. __cxa_throw). We should prevent ODR violations caused by the C++ ABI library.
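To see why the ABI libraries clash, consider one of their entry points. `abi::__cxa_demangle` from `<cxxabi.h>` is declared `extern "C"`, so its symbol is the plain name `__cxa_demangle` regardless of which ABI library (libc++abi, libcxxrt, or libsupc++) provides it; two such libraries in one process therefore define the same symbol. The `demangle` wrapper below is a hypothetical helper.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium-mangled name using the C++ ABI library's entry point.
// Falls back to the input if demangling fails.
std::string demangle(const char *mangled) {
  int status = 0;
  char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && out != nullptr) ? out : mangled;
  std::free(out);  // the result buffer is allocated with malloc
  return result;
}
```

For example, `demangle("_Z3fooi")` yields `foo(int)`.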
To link two libc++ copies, we can do:
b.so
b.so
, libc++ Y, libc++abi => a.out
With some effort, you can mix libstdc++ and libc++. This requires compiling the non-libsupc++ part of libstdc++ to get a homebrew libstdc++.so.6. To mix libstdc++ and libc++, we can do:
b.so
b.so
, libc++, libc++abi => a.out