| Hardware Locality (hwloc)
    master-20250612.1317.gitd03ae8e67
    | 
| Data Structures | |
| struct | hwloc_location | 
| Typedefs | |
| typedef unsigned | hwloc_memattr_id_t | 
| Functions | |
| int | hwloc_memattr_get_by_name (hwloc_topology_t topology, const char *name, hwloc_memattr_id_t *id) | 
| int | hwloc_get_local_numanode_objs (hwloc_topology_t topology, struct hwloc_location *location, unsigned *nr, hwloc_obj_t *nodes, unsigned long flags) | 
| int | hwloc_topology_get_default_nodeset (hwloc_topology_t topology, hwloc_nodeset_t nodeset, unsigned long flags) | 
| int | hwloc_memattr_get_value (hwloc_topology_t topology, hwloc_memattr_id_t attribute, hwloc_obj_t target_node, struct hwloc_location *initiator, unsigned long flags, hwloc_uint64_t *value) | 
| int | hwloc_memattr_get_best_target (hwloc_topology_t topology, hwloc_memattr_id_t attribute, struct hwloc_location *initiator, unsigned long flags, hwloc_obj_t *best_target, hwloc_uint64_t *value) | 
| int | hwloc_memattr_get_best_initiator (hwloc_topology_t topology, hwloc_memattr_id_t attribute, hwloc_obj_t target_node, unsigned long flags, struct hwloc_location *best_initiator, hwloc_uint64_t *value) | 
| int | hwloc_memattr_get_targets (hwloc_topology_t topology, hwloc_memattr_id_t attribute, struct hwloc_location *initiator, unsigned long flags, unsigned *nr, hwloc_obj_t *targets, hwloc_uint64_t *values) | 
| int | hwloc_memattr_get_initiators (hwloc_topology_t topology, hwloc_memattr_id_t attribute, hwloc_obj_t target_node, unsigned long flags, unsigned *nr, struct hwloc_location *initiators, hwloc_uint64_t *values) | 
Platforms with heterogeneous memory require ways to decide whether a buffer should be allocated on "fast" memory (such as HBM), "normal" memory (DDR) or even "slow" but large-capacity memory (non-volatile memory). These memory nodes are called "Targets" while the CPU accessing them is called the "Initiator". Access performance depends on their locality (NUMA platforms) as well as the intrinsic performance of the targets (heterogeneous platforms).
The following attributes describe the performance of memory accesses from an Initiator to a memory Target, for instance their latency or bandwidth. Initiators performing these memory accesses are usually some PUs or Cores (described as a CPU set). Hence a Core may choose where to allocate a memory buffer by comparing the attributes of different target memory nodes nearby.
There are also some attributes that are system-wide. Their value does not depend on a specific initiator performing an access. The memory node Capacity is an example of such an attribute without an initiator.
One way to use this API is to start with a cpuset describing the Cores where a program is bound. The best target NUMA node for allocating memory in this program on these Cores may be obtained by passing this cpuset as an initiator to hwloc_memattr_get_best_target() with the relevant memory attribute. For instance, if the code is latency limited, use the Latency attribute.
A more flexible approach consists of getting the list of local NUMA nodes by passing this cpuset to hwloc_get_local_numanode_objs(). Attribute values for these nodes, if any, may then be obtained with hwloc_memattr_get_value() and manually compared with the desired criteria.
Memory attributes are also used internally to build Memory Tiers which provide an easy way to distinguish NUMA nodes of different kinds, as explained in Heterogeneous Memory.
Besides tiers, hwloc defines a set of "default" nodes from which normal memory allocations should be made (see hwloc_topology_get_default_nodeset()). This is also useful for dividing the machine into a set of non-overlapping NUMA domains, for instance for binding tasks per domain.
| typedef unsigned hwloc_memattr_id_t | 
A memory attribute identifier.
hwloc predefines some commonly-used attributes in hwloc_memattr_id_e. One may then dynamically register custom ones with hwloc_memattr_register(); they will be assigned IDs immediately after the predefined ones. See Managing memory attributes for more information about existing attribute IDs.
Flags for selecting target NUMA nodes.
| enum hwloc_memattr_id_e | 
Predefined memory attribute IDs. See hwloc_memattr_id_t for the generic definition of IDs for predefined or custom attributes.
| Enumerator | |
|---|---|
| HWLOC_MEMATTR_ID_CAPACITY | The "Capacity" is returned in bytes (local_memory attribute in objects). Best capacity nodes are nodes with higher capacity. No initiator is involved when looking at this attribute. The corresponding attribute flags are HWLOC_MEMATTR_FLAG_HIGHER_FIRST. Capacity values may not be modified using hwloc_memattr_set_value(). | 
| HWLOC_MEMATTR_ID_LOCALITY | The "Locality" is returned as the number of PUs in that locality (e.g. the weight of its cpuset). Best locality nodes are nodes with smaller locality (nodes that are local to very few PUs). Poor locality nodes are nodes with larger locality (nodes that are local to the entire machine). No initiator is involved when looking at this attribute. The corresponding attribute flags are HWLOC_MEMATTR_FLAG_LOWER_FIRST. Locality values may not be modified using hwloc_memattr_set_value(). | 
| HWLOC_MEMATTR_ID_BANDWIDTH | The "Bandwidth" is returned in MiB/s, as seen from the given initiator location. Best bandwidth nodes are nodes with higher bandwidth. The corresponding attribute flags are HWLOC_MEMATTR_FLAG_HIGHER_FIRST and HWLOC_MEMATTR_FLAG_NEED_INITIATOR. This is the average bandwidth for read and write accesses. If the platform provides individual read and write bandwidths but no explicit average value, hwloc computes and returns the average. | 
| HWLOC_MEMATTR_ID_READ_BANDWIDTH | The "ReadBandwidth" is returned in MiB/s, as seen from the given initiator location. Best bandwidth nodes are nodes with higher bandwidth. The corresponding attribute flags are HWLOC_MEMATTR_FLAG_HIGHER_FIRST and HWLOC_MEMATTR_FLAG_NEED_INITIATOR. | 
| HWLOC_MEMATTR_ID_WRITE_BANDWIDTH | The "WriteBandwidth" is returned in MiB/s, as seen from the given initiator location. Best bandwidth nodes are nodes with higher bandwidth. The corresponding attribute flags are HWLOC_MEMATTR_FLAG_HIGHER_FIRST and HWLOC_MEMATTR_FLAG_NEED_INITIATOR. | 
| HWLOC_MEMATTR_ID_LATENCY | The "Latency" is returned as nanoseconds, as seen from the given initiator location. Best latency nodes are nodes with smaller latency. The corresponding attribute flags are HWLOC_MEMATTR_FLAG_LOWER_FIRST and HWLOC_MEMATTR_FLAG_NEED_INITIATOR. This is the average latency for read and write accesses. If the platform provides individual read and write latencies but no explicit average value, hwloc computes and returns the average. | 
| HWLOC_MEMATTR_ID_READ_LATENCY | The "ReadLatency" is returned as nanoseconds, as seen from the given initiator location. Best latency nodes are nodes with smaller latency. The corresponding attribute flags are HWLOC_MEMATTR_FLAG_LOWER_FIRST and HWLOC_MEMATTR_FLAG_NEED_INITIATOR. | 
| HWLOC_MEMATTR_ID_WRITE_LATENCY | The "WriteLatency" is returned as nanoseconds, as seen from the given initiator location. Best latency nodes are nodes with smaller latency. The corresponding attribute flags are HWLOC_MEMATTR_FLAG_LOWER_FIRST and HWLOC_MEMATTR_FLAG_NEED_INITIATOR. | 
| int hwloc_get_local_numanode_objs | ( | hwloc_topology_t | topology, | 
| struct hwloc_location * | location, | ||
| unsigned * | nr, | ||
| hwloc_obj_t * | nodes, | ||
| unsigned long | flags | ||
| ) | 
Return an array of local NUMA nodes.
By default only select the NUMA nodes whose locality is exactly the given location. More nodes may be selected if additional flags are given as an OR'ed set of hwloc_local_numanode_flag_e.
If location is given as an explicit object, its CPU set is used to find NUMA nodes with the corresponding locality. If the object does not have a CPU set (e.g. I/O object), the CPU parent (where the I/O object is attached) is used.
On input, nr points to the number of nodes that may be stored in the nodes array. On output, nr will be changed to the number of stored nodes, or the number of nodes that would have been stored if there were enough room.
nodes array.
| int hwloc_memattr_get_best_initiator | ( | hwloc_topology_t | topology, | 
| hwloc_memattr_id_t | attribute, | ||
| hwloc_obj_t | target_node, | ||
| unsigned long | flags, | ||
| struct hwloc_location * | best_initiator, | ||
| hwloc_uint64_t * | value | ||
| ) | 
Return the best initiator for the given attribute and target NUMA node.
If value is non-NULL, the corresponding value is returned there.
If multiple initiators have the same attribute values, only one is returned (and there is no way to clarify how that one is chosen). Applications that want to detect initiators with identical/similar values, or that want to look at values for multiple attributes, should rather get all values using hwloc_memattr_get_value() and manually select the initiator they consider the best.
The returned initiator should not be modified or freed, it belongs to the topology.
target_node cannot be NULL.
flags must be 0 for now.
Returns -1 with errno set to ENOENT if there are no matching initiators, or to EINVAL if the attribute does not relate to a specific initiator (it does not have the flag HWLOC_MEMATTR_FLAG_NEED_INITIATOR).
| int hwloc_memattr_get_best_target | ( | hwloc_topology_t | topology, | 
| hwloc_memattr_id_t | attribute, | ||
| struct hwloc_location * | initiator, | ||
| unsigned long | flags, | ||
| hwloc_obj_t * | best_target, | ||
| hwloc_uint64_t * | value | ||
| ) | 
Return the best target NUMA node for the given attribute and initiator.
If the attribute does not relate to a specific initiator (it does not have the flag HWLOC_MEMATTR_FLAG_NEED_INITIATOR), location initiator is ignored and may be NULL.
If value is non-NULL, the corresponding value is returned there.
If multiple targets have the same attribute values, only one is returned (and there is no way to clarify how that one is chosen). Applications that want to detect targets with identical/similar values, or that want to look at values for multiple attributes, should rather get all values using hwloc_memattr_get_value() and manually select the target they consider the best.
flags must be 0 for now.
Returns -1 with errno set to ENOENT if there are no matching targets, or to EINVAL if flags are invalid or no such attribute exists.
Note: initiator should be of type HWLOC_LOCATION_TYPE_CPUSET when referring to accesses performed by CPU cores. HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs.
| int hwloc_memattr_get_by_name | ( | hwloc_topology_t | topology, | 
| const char * | name, | ||
| hwloc_memattr_id_t * | id | ||
| ) | 
Return the identifier of the memory attribute with the given name.
Returns -1 with errno set to EINVAL if no such attribute exists.
| int hwloc_memattr_get_initiators | ( | hwloc_topology_t | topology, | 
| hwloc_memattr_id_t | attribute, | ||
| hwloc_obj_t | target_node, | ||
| unsigned long | flags, | ||
| unsigned * | nr, | ||
| struct hwloc_location * | initiators, | ||
| hwloc_uint64_t * | values | ||
| ) | 
Return the initiators that have values for a given attribute for a specific target NUMA node.
Return initiators for the given attribute and target node in the initiators array. If values is not NULL, the corresponding attribute values are stored in the array it points to.
On input, nr points to the number of initiators that may be stored in the array initiators (and values). On output, nr points to the number of initiators (and values) that were actually found, even if some of them couldn't be stored in the array. Initiators that couldn't be stored are ignored, but the function still returns success (0). The caller may find out by comparing the value pointed to by nr before and after the function call.
The returned initiators should not be modified or freed, they belong to the topology.
target_node cannot be NULL.
flags must be 0 for now.
If the attribute does not relate to a specific initiator (it does not have the flag HWLOC_MEMATTR_FLAG_NEED_INITIATOR), no initiator is returned.
| int hwloc_memattr_get_targets | ( | hwloc_topology_t | topology, | 
| hwloc_memattr_id_t | attribute, | ||
| struct hwloc_location * | initiator, | ||
| unsigned long | flags, | ||
| unsigned * | nr, | ||
| hwloc_obj_t * | targets, | ||
| hwloc_uint64_t * | values | ||
| ) | 
Return the target NUMA nodes that have some values for a given attribute.
Return targets for the given attribute in the targets array (for the given initiator if any). If values is not NULL, the corresponding attribute values are stored in the array it points to.
On input, nr points to the number of targets that may be stored in the array targets (and values). On output, nr points to the number of targets (and values) that were actually found, even if some of them couldn't be stored in the array. Targets that couldn't be stored are ignored, but the function still returns success (0). The caller may find out by comparing the value pointed to by nr before and after the function call.
The returned targets should not be modified or freed, they belong to the topology.
Argument initiator is ignored if the attribute does not relate to a specific initiator (it does not have the flag HWLOC_MEMATTR_FLAG_NEED_INITIATOR). Otherwise initiator may be non-NULL to report only targets that have a value for that initiator.
flags must be 0 for now.
Note: initiator should be of type HWLOC_LOCATION_TYPE_CPUSET when referring to accesses performed by CPU cores. HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs.
| int hwloc_memattr_get_value | ( | hwloc_topology_t | topology, | 
| hwloc_memattr_id_t | attribute, | ||
| hwloc_obj_t | target_node, | ||
| struct hwloc_location * | initiator, | ||
| unsigned long | flags, | ||
| hwloc_uint64_t * | value | ||
| ) | 
Return an attribute value for a specific target NUMA node.
If the attribute does not relate to a specific initiator (it does not have the flag HWLOC_MEMATTR_FLAG_NEED_INITIATOR), location initiator is ignored and may be NULL.
target_node cannot be NULL. If attribute is HWLOC_MEMATTR_ID_CAPACITY, target_node must be a NUMA node. If it is HWLOC_MEMATTR_ID_LOCALITY, target_node must have a CPU set.
flags must be 0 for now.
Returns -1 with errno set to EINVAL if flags are invalid or no such attribute exists.
Note: initiator should be of type HWLOC_LOCATION_TYPE_CPUSET when referring to accesses performed by CPU cores. HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs.
| int hwloc_topology_get_default_nodeset | ( | hwloc_topology_t | topology, | 
| hwloc_nodeset_t | nodeset, | ||
| unsigned long | flags | ||
| ) | 
Return the set of default NUMA nodes.
In machines with heterogeneous memory, some NUMA nodes are considered the default ones, i.e. where basic allocations should be made from. These are usually DRAM nodes.
Other nodes may be reserved for specific use (I/O device memory, e.g. GPU memory), small but high performance (HBM), large but slow memory (NVM), etc. Buffers should usually not be allocated from there unless explicitly required.
This function fills nodeset with the bits of NUMA nodes considered default.
It is guaranteed that these nodes have non-intersecting CPU sets, i.e. cores may not have multiple local NUMA nodes anymore. Hence this may be used to iterate over the platform divided into separate NUMA localities, for instance for binding one task per NUMA domain.
Any core that had some local NUMA node(s) in the initial topology should still have one in the default nodeset. Corner cases where this would be wrong are asymmetric platforms with missing DRAM nodes, or topologies that were already restricted to fewer NUMA nodes.
The returned nodeset may be passed to hwloc_topology_restrict() with HWLOC_RESTRICT_FLAG_BYNODESET to remove all non-default nodes from the topology. The resulting topology will be easier to use when iterating over (now homogeneous) NUMA nodes.
The heuristics for finding default nodes rely on memory tiers and subtypes (see Heterogeneous Memory) as well as the assumption that hardware vendors list default nodes first in hardware tables.
flags must be 0 for now.