Error reporting in the API | Most functions in the hwloc API return an integer value. Unless documented otherwise, they return 0 on success and -1 on error. Functions that return a pointer type return NULL on error |
API version | |
Object Sets (hwloc_cpuset_t and hwloc_nodeset_t) | hwloc uses bitmaps to represent two distinct kinds of object sets: CPU sets (hwloc_cpuset_t) and NUMA node sets (hwloc_nodeset_t). Both types are typedefs of a common backend type (hwloc_bitmap_t), so all the hwloc bitmap functions apply to both hwloc_cpuset_t and hwloc_nodeset_t (see The bitmap API) |
Object Types | |
Object Structure and Attributes | |
Topology Creation and Destruction | |
Object levels, depths and types | Be sure to see the figure in Terms and Definitions that shows a complete topology tree, including depths, child/sibling/cousin relationships, and an example of an asymmetric topology where one package has fewer caches than its peers |
Converting between Object Types and Attributes, and Strings | |
Consulting and Adding Info Attributes | |
CPU binding | Some operating systems only support binding threads or processes to a single PU. Others allow binding to larger sets such as entire Cores or Packages, or even arbitrary sets of individual PUs. On such operating systems, the scheduler is free to run the task on one of these PUs, then migrate it to another PU, etc. It is often useful to call hwloc_bitmap_singlify() on the target CPU set before passing it to the binding function to avoid these expensive migrations. See the documentation of hwloc_bitmap_singlify() for details |
Memory binding | Memory binding can be done three ways: |
Changing the Source of Topology Discovery | These functions must be called between hwloc_topology_init() and hwloc_topology_load(). Otherwise, they will return -1 with errno set to EBUSY |
Topology Detection Configuration and Query | Several functions can optionally be called between hwloc_topology_init() and hwloc_topology_load() to configure how the detection should be performed, e.g. to ignore some object types, define a synthetic topology, etc. |
Modifying a loaded Topology | |
Finding Objects inside a CPU set | |
Finding Objects covering at least a CPU set | |
Looking at Ancestor and Child Objects | Be sure to see the figure in Terms and Definitions that shows a complete topology tree, including depths, child/sibling/cousin relationships, and an example of an asymmetric topology where one package has fewer caches than its peers |
Kinds of object Type | Each object type is either Normal (i.e. hwloc_obj_type_is_normal() returns 1), or Memory (i.e. hwloc_obj_type_is_memory() returns 1) or I/O (i.e. hwloc_obj_type_is_io() returns 1) or Misc (i.e. equal to HWLOC_OBJ_MISC). It cannot be of more than one of these kinds |
Looking at Cache Objects | |
Finding objects, miscellaneous helpers | Be sure to see the figure in Terms and Definitions that shows a complete topology tree, including depths, child/sibling/cousin relationships, and an example of an asymmetric topology where one package has fewer caches than its peers |
Distributing items over a topology | |
CPU and node sets of entire topologies | |
Converting between CPU sets and node sets | |
Finding I/O objects | |
The bitmap API | The hwloc_bitmap_t type represents a set of non-negative integers. A bitmap may be of infinite size (all bits are set after some point). A bitmap may even be full if all bits are set |
Exporting Topologies to XML | |
Exporting Topologies to Synthetic | |
Retrieve distances between objects | |
Helpers for consulting distance matrices | |
Add distances between objects | The usual way to add distances is: |
Remove distances between objects | |
Comparing memory node attributes for finding where to allocate on | Platforms with heterogeneous memory require ways to decide whether a buffer should be allocated on "fast" memory (such as HBM), "normal" memory (DDR) or even "slow" but large-capacity memory (non-volatile memory). These memory nodes are called "Targets" while the CPU accessing them is called the "Initiator". Access performance depends on their locality (NUMA platforms) as well as the intrinsic performance of the targets (heterogeneous platforms) |
Managing memory attributes | |
Kinds of CPU cores | Platforms with heterogeneous CPUs may have some cores with different features or frequencies. This API exposes identical PUs in sets called CPU kinds. Each PU of the topology may only be in a single kind |
Linux-specific helpers | This includes helpers for manipulating Linux kernel cpumap files, and hwloc equivalents of the Linux sched_setaffinity and sched_getaffinity system calls |
Interoperability with Linux libnuma unsigned long masks | This interface helps convert between Linux libnuma unsigned long masks and hwloc cpusets and nodesets |
Interoperability with Linux libnuma bitmask | This interface helps convert between Linux libnuma bitmasks and hwloc cpusets and nodesets |
Windows-specific helpers | These functions query Windows processor groups. These groups partition the operating system into virtual sets of up to 64 neighboring PUs. Threads and processes may only be bound inside a single group. Although Windows processor groups may be exposed in the hwloc hierarchy as hwloc Groups, they are also often merged into existing hwloc objects such as NUMA nodes or Packages. This API provides explicit information about Windows processor groups so that applications know whether binding to a large set of PUs may fail because it spans multiple Windows processor groups |
Interoperability with glibc sched affinity | This interface offers ways to convert between hwloc cpusets and glibc cpusets such as those manipulated by sched_getaffinity() or pthread_attr_setaffinity_np() |
Interoperability with OpenCL | This interface offers ways to retrieve topology information about OpenCL devices |
Interoperability with the CUDA Driver API | This interface offers ways to retrieve topology information about CUDA devices when using the CUDA Driver API |
Interoperability with the CUDA Runtime API | This interface offers ways to retrieve topology information about CUDA devices when using the CUDA Runtime API |
Interoperability with the NVIDIA Management Library | This interface offers ways to retrieve topology information about devices managed by the NVIDIA Management Library (NVML) |
Interoperability with the ROCm SMI Management Library | This interface offers ways to retrieve topology information about devices managed by the ROCm SMI Management Library |
Interoperability with the oneAPI Level Zero interface | This interface offers ways to retrieve topology information about devices managed by the Level Zero API |
Interoperability with OpenGL displays | This interface offers ways to retrieve topology information about OpenGL displays |
Interoperability with OpenFabrics | This interface offers ways to retrieve topology information about OpenFabrics devices (InfiniBand, Omni-Path, usNIC, etc.) |
Topology differences | Applications that manipulate many similar topologies, for instance one for each node of a homogeneous cluster, may want to compress topologies to reduce the memory footprint |
Sharing topologies between processes | These functions are used to share a topology between processes by duplicating it into a file-backed shared-memory buffer |
Components and Plugins: Discovery components | |
Components and Plugins: Discovery backends | |
Components and Plugins: Generic components | |
Components and Plugins: Core functions to be used by components | |
Components and Plugins: Filtering objects | |
Components and Plugins: helpers for PCI discovery | |
Components and Plugins: finding PCI objects during other discoveries | |
Netloc API | |