// SPDX-License-Identifier: GPL-2.0
/*
 * SLUB: A slab allocator that limits cache line use instead of queuing
 * objects in per cpu and per node lists.
 *
 * The allocator synchronizes using per slab locks or atomic operations
 * and only uses a centralized lock to manage a pool of partial slabs.
 *
 * (C) 2007 SGI, Christoph Lameter
 * (C) 2011 Linux Foundation, Christoph Lameter
 */
/*
 * Lock order:
 *   1. slab_mutex (Global Mutex)
 *   2. node->list_lock (Spinlock)
 *   3. kmem_cache->cpu_slab->lock (Local lock)
 *   4. slab_lock(slab) (Only on some arches)
 *   5. object_map_lock (Only for debugging)
 *
 *   slab_mutex
 *
 *   The role of the slab_mutex is to protect the list of all the slabs
 *   and to synchronize major metadata changes to slab cache structures.
 *   Also synchronizes memory hotplug callbacks.
 *
 *   slab_lock
 *
 *   The slab_lock is a wrapper around the page lock, thus it is a bit
 *   spinlock.
 *
 *   The slab_lock is only used on arches that do not have the ability
 *   to do a cmpxchg_double. It only protects:
 *
 *	A. slab->freelist	-> List of free objects in a slab
 *	B. slab->inuse		-> Number of objects in use
 *	C. slab->objects	-> Number of objects in slab
 *	D. slab->frozen		-> frozen state
 *
 *   Frozen slabs
 *
 *   If a slab is frozen then it is exempt from list management. It is
 *   the cpu slab which is actively allocated from by the processor that
 *   froze it and it is not on any list. The processor that froze the
 *   slab is the one who can perform list operations on the slab. Other
 *   processors may put objects onto the freelist but the processor that
 *   froze the slab is the only one that can retrieve the objects from the
 *   slab's freelist.
 *
 *   CPU partial slabs
 *
 *   The partially empty slabs cached on the CPU partial list are used
 *   for performance reasons, which speeds up the allocation process.
 *   These slabs are not frozen, but are also exempt from list management,
 *   by clearing the SL_partial flag when moving out of the node
 *   partial list. Please see __slab_free() for more details.
 *
 *   To sum up, the current scheme is:
 *   - node partial slab: SL_partial && !frozen
 *   - cpu partial slab: !SL_partial && !frozen
 *   - cpu slab: !SL_partial && frozen
 *   - full slab: !SL_partial && !frozen
 *
 *   list_lock
 *
 *   The list_lock protects the partial and full list on each node and
 *   the partial slab counter. If taken then no new slabs may be added or
 *   removed from the lists nor may the number of partial slabs be
 *   modified. (Note that the total number of slabs is an atomic value
 *   that may be modified without taking the list lock).
 *
 *   The list_lock is a centralized lock and thus we avoid taking it as
 *   much as possible. As long as SLUB does not have to handle partial
 *   slabs, operations can continue without any centralized lock. F.e.
 *   allocating a long series of objects that fill up slabs does not require
 *   the list lock.
 *
 *   For debug caches, all allocations are forced to go through a list_lock
 *   protected region to serialize against concurrent validation.
 *
 *   cpu_slab->lock local lock
 *
 *   This lock protects slowpath manipulation of all kmem_cache_cpu fields
 *   except the stat counters. This is a percpu structure manipulated only by
 *   the local cpu, so the lock protects against being preempted or interrupted
 *   by an irq. Fast path operations rely on lockless operations instead.
 *
 *   On PREEMPT_RT, the local lock neither disables interrupts nor preemption,
 *   which means the lockless fastpath cannot be used as it might interfere with
 *   an in-progress slow path operation. In this case the local lock is always
 *   taken but it still utilizes the freelist for the common operations.
 *
 *   lockless fastpaths
 *
 *   The fast path allocation (slab_alloc_node()) and freeing (do_slab_free())
 *   are fully lockless when satisfied from the percpu slab (and when
 *   cmpxchg_double is possible to use, otherwise slab_lock is taken).
 *   They also don't disable preemption or migration or irqs. They rely on
 *   the transaction id (tid) field to detect being preempted or moved to
 *   another cpu.
 *
 *   irq, preemption, migration considerations
 *
 *   Interrupts are disabled as part of list_lock or local_lock operations, or
 *   around the slab_lock operation, in order to make the slab allocator safe
 *   to use in the context of an irq.
 *
 *   In addition, preemption (or migration on PREEMPT_RT) is disabled in the
 *   allocation slowpath, bulk allocation, and put_cpu_partial(), so that the
 *   local cpu doesn't change in the process and e.g. the kmem_cache_cpu pointer
 *   doesn't have to be revalidated in each section protected by the local lock.
 *
 * SLUB assigns one slab for allocation to each processor.
 * Allocations only occur from these slabs called cpu slabs.
 *
 * Slabs with free elements are kept on a partial list and during regular
 * operations no list for full slabs is used. If an object in a full slab is
 * freed then the slab will show up again on the partial lists.
 * We track full slabs for debugging purposes though because otherwise we
 * cannot scan all objects.
 *
 * Slabs are freed when they become empty. Teardown and setup is
 * minimal so we rely on the page allocator's per cpu caches for
 * fast frees and allocs.
 *
 * slab->frozen		The slab is frozen and exempt from list processing.
 * 			This means that the slab is dedicated to a purpose
 * 			such as satisfying allocations for a specific
 * 			processor. Objects may be freed in the slab while
 * 			it is frozen but slab_free will then skip the usual
 * 			list operations. It is up to the processor holding
 * 			the slab to integrate the slab into the slab lists
 * 			when the slab is no longer needed.
 *
 * 			One use of this flag is to mark slabs that are
 * 			used for allocations. Then such a slab becomes a cpu
 * 			slab. The cpu slab may be equipped with an additional
 * 			freelist that allows lockless access to
 * 			free objects in addition to the regular freelist
 * 			that requires the slab lock.
 *
 * SLAB_DEBUG_FLAGS	Slab requires special handling due to debug
 * 			options set. This moves slab handling out of
 * 			the fast path and disables lockless freelists.
 */
/**
 * enum slab_flags - How the slab flags bits are used.
 * @SL_locked: Is locked with slab_lock()
 * @SL_partial: On the per-node partial list
 * @SL_pfmemalloc: Was allocated from PF_MEMALLOC reserves
 *
 * The slab flags share space with the page flags but some bits have
 * different interpretations. The high bits are used for information
 * like zone/node/section.
 */
enum slab_flags {
SL_locked = PG_locked,
SL_partial = PG_workingset, /* Historical reasons for this bit */
SL_pfmemalloc = PG_active, /* Historical reasons for this bit */
};
/* * We could simply use migrate_disable()/enable() but as long as it's a * function call even on !PREEMPT_RT, use inline preempt_disable() there.
*/ #ifndef CONFIG_PREEMPT_RT #define slub_get_cpu_ptr(var) get_cpu_ptr(var) #define slub_put_cpu_ptr(var) put_cpu_ptr(var) #define USE_LOCKLESS_FAST_PATH() (true) #else #define slub_get_cpu_ptr(var) \
({ \
migrate_disable(); \
this_cpu_ptr(var); \
}) #define slub_put_cpu_ptr(var) \ do { \
(void)(var); \
migrate_enable(); \
} while (0) #define USE_LOCKLESS_FAST_PATH() (false) #endif
/*
 * Issues still to be resolved:
 *
 * - Support PAGE_ALLOC_DEBUG. Should be easy to do.
 *
 * - Variable sizing of the per node arrays
 */

/* Enable to log cmpxchg failures */
#undef SLUB_DEBUG_CMPXCHG
#ifndef CONFIG_SLUB_TINY
/*
 * Minimum number of partial slabs. These will be left on the partial
 * lists even if they are empty. kmem_cache_shrink may reclaim them.
 */
#define MIN_PARTIAL 5

/*
 * Maximum number of desirable partial slabs.
 * The existence of more partial slabs makes kmem_cache_shrink
 * sort the partial list by the number of objects in use.
 */
#define MAX_PARTIAL 10
#else
#define MIN_PARTIAL 0
#define MAX_PARTIAL 0
#endif
/*
 * These debug flags cannot use CMPXCHG because there might be consistency
 * issues when checking or reading debug information
 */
#define SLAB_NO_CMPXCHG (SLAB_CONSISTENCY_CHECKS | SLAB_STORE_USER | \
			 SLAB_TRACE)
/*
 * Debugging flags that require metadata to be stored in the slab. These get
 * disabled when slab_debug=O is used and a cache's min order increases with
 * metadata.
 */
#define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
#define OO_SHIFT	16
#define OO_MASK		((1 << OO_SHIFT) - 1)
#define MAX_OBJS_PER_PAGE	32767 /* since slab.objects is u15 */
/*
 * Tracking user of a slab.
 */
#define TRACK_ADDRS_COUNT 16
struct track {
	unsigned long addr;	/* Called from address */
#ifdef CONFIG_STACKDEPOT
	depot_stack_handle_t handle;
#endif
	int cpu;		/* Was running on cpu */
	int pid;		/* Pid context */
	unsigned long when;	/* When did the operation occur */
};
enum stat_item {
ALLOC_FASTPATH, /* Allocation from cpu slab */
ALLOC_SLOWPATH, /* Allocation by getting a new cpu slab */
FREE_FASTPATH, /* Free to cpu slab */
FREE_SLOWPATH, /* Freeing not to cpu slab */
FREE_FROZEN, /* Freeing to frozen slab */
FREE_ADD_PARTIAL, /* Freeing moves slab to partial list */
FREE_REMOVE_PARTIAL, /* Freeing removes last object */
ALLOC_FROM_PARTIAL, /* Cpu slab acquired from node partial list */
ALLOC_SLAB, /* Cpu slab acquired from page allocator */
ALLOC_REFILL, /* Refill cpu slab from slab freelist */
ALLOC_NODE_MISMATCH, /* Switching cpu slab */
FREE_SLAB, /* Slab freed to the page allocator */
CPUSLAB_FLUSH, /* Abandoning of the cpu slab */
DEACTIVATE_FULL, /* Cpu slab was full when deactivated */
DEACTIVATE_EMPTY, /* Cpu slab was empty when deactivated */
DEACTIVATE_TO_HEAD, /* Cpu slab was moved to the head of partials */
DEACTIVATE_TO_TAIL, /* Cpu slab was moved to the tail of partials */
DEACTIVATE_REMOTE_FREES,/* Slab contained remotely freed objects */
DEACTIVATE_BYPASS, /* Implicit deactivation */
ORDER_FALLBACK, /* Number of times fallback was necessary */
CMPXCHG_DOUBLE_CPU_FAIL,/* Failures of this_cpu_cmpxchg_double */
CMPXCHG_DOUBLE_FAIL, /* Failures of slab freelist update */
CPU_PARTIAL_ALLOC, /* Used cpu partial on alloc */
CPU_PARTIAL_FREE, /* Refill cpu partial on free */
CPU_PARTIAL_NODE, /* Refill cpu partial from node partial */
CPU_PARTIAL_DRAIN, /* Drain cpu partial to node partial */
NR_SLUB_STAT_ITEMS
};
#ifndef CONFIG_SLUB_TINY
/*
 * When changing the layout, make sure freelist and tid are still compatible
 * with this_cpu_cmpxchg_double() alignment requirements.
 */
struct kmem_cache_cpu {
	union {
		struct {
			void **freelist;	/* Pointer to next available object */
			unsigned long tid;	/* Globally unique transaction id */
		};
		freelist_aba_t freelist_tid;
	};
	struct slab *slab;	/* The slab from which we are allocating */
#ifdef CONFIG_SLUB_CPU_PARTIAL
	struct slab *partial;	/* Partially allocated slabs */
#endif
	local_lock_t lock;	/* Protects the fields above */
#ifdef CONFIG_SLUB_STATS
	unsigned int stat[NR_SLUB_STAT_ITEMS];
#endif
};
#endif /* CONFIG_SLUB_TINY */
static inline void stat(const struct kmem_cache *s, enum stat_item si)
{
#ifdef CONFIG_SLUB_STATS
	/*
	 * The rmw is racy on a preemptible kernel but this is acceptable, so
	 * avoid this_cpu_add()'s irq-disable overhead.
	 */
	raw_cpu_inc(s->cpu_slab->stat[si]);
#endif
}
/*
 * Iterator over all nodes. The body will be executed for each node that has
 * a kmem_cache_node structure allocated (which is true for all online nodes)
 */
#define for_each_kmem_cache_node(__s, __node, __n) \
	for (__node = 0; __node < nr_node_ids; __node++) \
		 if ((__n = get_node(__s, __node)))
/*
 * Tracks for which NUMA nodes we have kmem_cache_nodes allocated.
 * Corresponds to node_state[N_MEMORY], but can temporarily
 * differ during memory hotplug/hotremove operations.
 * Protected by slab_mutex.
 */
static nodemask_t slab_nodes;
#ifndef CONFIG_SLUB_TINY
/*
 * Workqueue used for flush_cpu_slab().
 */
static struct workqueue_struct *flushwq;
#endif
/*
 * Returns freelist pointer (ptr). With hardening, this is obfuscated
 * with an XOR of the address where the pointer is held and a per-cache
 * random number.
 */
static inline freeptr_t freelist_ptr_encode(const struct kmem_cache *s,
					    void *ptr, unsigned long ptr_addr)
{
	unsigned long encoded;

/*
 * When running under KMSAN, get_freepointer_safe() may return an uninitialized
 * pointer value in the case the current thread loses the race for the next
 * memory chunk in the freelist. In that case this_cpu_cmpxchg_double() in
 * slab_alloc_node() will fail, so the uninitialized value won't be used, but
 * KMSAN will still check all arguments of cmpxchg because of imperfect
 * handling of inline assembly.
 * To work around this problem, we apply __no_kmsan_checks to ensure that
 * get_freepointer_safe() returns initialized memory.
 */
__no_kmsan_checks
static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
{
	unsigned long freepointer_addr;
	freeptr_t p;

	if (!debug_pagealloc_enabled_static())
		return get_freepointer(s, object);
/*
 * See comment in calculate_sizes().
 */
static inline bool freeptr_outside_object(struct kmem_cache *s)
{
	return s->offset >= s->inuse;
}

/*
 * Return offset of the end of info block which is inuse + free pointer if
 * not overlapping with object.
 */
static inline unsigned int get_info_end(struct kmem_cache *s)
{
	if (freeptr_outside_object(s))
		return s->inuse + sizeof(void *);
	else
		return s->inuse;
}
/* Loop over all objects in a slab */
#define for_each_object(__p, __s, __addr, __objects) \
	for (__p = fixup_red_left(__s, __addr); \
		__p < (__addr) + (__objects) * (__s)->size; \
		__p += (__s)->size)
/*
 * We take the number of objects but actually limit the number of
 * slabs on the per cpu partial list, in order to limit excessive
 * growth of the list. For simplicity we assume that the slabs will
 * be half-full.
 */
	nr_slabs = DIV_ROUND_UP(nr_objects * 2, oo_objects(s->oo));
	s->cpu_partial_slabs = nr_slabs;
}
/*
 * If network-based swap is enabled, slub must keep track of whether memory
 * was allocated from pfmemalloc reserves.
 */
static inline bool slab_test_pfmemalloc(const struct slab *slab)
{
	return test_bit(SL_pfmemalloc, &slab->flags);
}
/*
 * Interrupts must be disabled (for the fallback code to work right), typically
 * by an _irqsave() lock variant. On PREEMPT_RT the preempt_disable(), which is
 * part of bit_spin_lock(), is sufficient because the policy is not to allow any
 * allocation/free operation in hardirq context. Therefore nothing can
 * interrupt the operation.
 */
static inline bool __slab_update_freelist(struct kmem_cache *s, struct slab *slab,
					  void *freelist_old, unsigned long counters_old,
					  void *freelist_new, unsigned long counters_new,
					  const char *n)
{
	bool ret;

	if (USE_LOCKLESS_FAST_PATH())
		lockdep_assert_irqs_disabled();

	if (s->flags & __CMPXCHG_DOUBLE) {
		ret = __update_freelist_fast(slab, freelist_old, counters_old,
					     freelist_new, counters_new);
	} else {
		ret = __update_freelist_slow(slab, freelist_old, counters_old,
					     freelist_new, counters_new);
	}
	if (likely(ret))
		return true;
/*
 * kmalloc caches have fixed sizes (mostly powers of 2), and the kmalloc() API
 * family will round up the real request size to these fixed ones, so
 * there could be an extra area beyond what is requested. Save the original
 * request size in the metadata area, for better debug and sanity checks.
 */
static inline void set_orig_size(struct kmem_cache *s,
				 void *object, unsigned int orig_size)
{
	void *p = kasan_reset_tag(object);

	if (!slub_debug_orig_size(s))
		return;

	p += get_info_end(s);
	p += sizeof(struct track) * 2;
/*
 * slub is about to manipulate internal object metadata. This memory lies
 * outside the range of the allocated object, so accessing it would normally
 * be reported by kasan as a bounds error. metadata_access_enable() is used
 * to tell kasan that these accesses are OK.
 */
static inline void metadata_access_enable(void)
{
	kasan_disable_current();
	kmsan_disable_current();
}
/* Verify that a pointer has an address that is valid within a slab page */
static inline int check_valid_pointer(struct kmem_cache *s,
				      struct slab *slab, void *object)
{
	void *base;

	if (!object)
		return 1;

	base = slab_address(slab);
	object = kasan_reset_tag(object);
	object = restore_red_left(s, object);
	if (object < base || object >= base + slab->objects * s->size ||
	    (object - base) % s->size) {
		return 0;
	}
	pr_err("Object 0x%p @offset=%tu fp=0x%p\n\n",
	       p, p - addr, get_freepointer(s, p));

	if (s->flags & SLAB_RED_ZONE)
		print_section(KERN_ERR, "Redzone  ", p - s->red_left_pad,
			      s->red_left_pad);
	else if (p > addr + 16)
		print_section(KERN_ERR, "Bytes b4 ", p - 16, 16);

	print_section(KERN_ERR,         "Object   ", p,
		      min_t(unsigned int, s->object_size, PAGE_SIZE));
	if (s->flags & SLAB_RED_ZONE)
		print_section(KERN_ERR, "Redzone  ", p + s->object_size,
			      s->inuse - s->object_size);

	off = get_info_end(s);

	if (s->flags & SLAB_STORE_USER)
		off += 2 * sizeof(struct track);

	if (slub_debug_orig_size(s))
		off += sizeof(unsigned int);

	off += kasan_metadata_size(s, false);

	if (off != size_from_object(s))
		/* Beginning of the filler is the free pointer */
		print_section(KERN_ERR, "Padding  ", p + off,
			      size_from_object(s) - off);
}
	if (s->flags & SLAB_RED_ZONE) {
		/*
		 * Here and below, avoid overwriting the KMSAN shadow. Keeping
		 * the shadow makes it possible to distinguish uninit-value
		 * from use-after-free.
		 */
		memset_no_sanitize_memory(p - s->red_left_pad, val,
					  s->red_left_pad);

		if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) {
			/*
			 * Redzone the extra allocated space by kmalloc than
			 * requested, and the poison size will be limited to
			 * the original request size accordingly.
			 */
			poison_size = get_orig_size(s, object);
		}
	}
/*
 * Object layout:
 *
 * object address
 * 	Bytes of the object to be managed.
 * 	If the freepointer may overlay the object then the free
 *	pointer is at the middle of the object.
 *
 * 	Poisoning uses 0x6b (POISON_FREE) and the last byte is
 * 	0xa5 (POISON_END)
 *
 * object + s->object_size
 * 	Padding to reach word boundary. This is also used for Redzoning.
 * 	Padding is extended by another word if Redzoning is enabled and
 * 	object_size == inuse.
 *
 * 	We fill with 0xbb (SLUB_RED_INACTIVE) for inactive objects and with
 * 	0xcc (SLUB_RED_ACTIVE) for objects in use.
 *
 * object + s->inuse
 * 	Meta data starts here.
 *
 * 	A. Free pointer (if we cannot overwrite object on free)
 * 	B. Tracking data for SLAB_STORE_USER
 *	C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
 *	D. Padding to reach required alignment boundary or at minimum
 * 		one word if debugging is on to be able to detect writes
 * 		before the word boundary.
 *
 *	Padding is done using 0x5a (POISON_INUSE)
 *
 * object + s->size
 * 	Nothing is used beyond s->size.
 *
 * If slabcaches are merged then the object_size and inuse boundaries are mostly
 * ignored. And therefore no slab options that rely on these boundaries
 * may be used with merged slabcaches.
 */
static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
{
	unsigned long off = get_info_end(s);	/* The end of info */

	if (s->flags & SLAB_STORE_USER) {
		/* We also have user information there */
		off += 2 * sizeof(struct track);

		if (s->flags & SLAB_KMALLOC)
			off += sizeof(unsigned int);
	}
/* Check the pad bytes at the end of a slab page */
static pad_check_attributes void
slab_pad_check(struct kmem_cache *s, struct slab *slab)
{
	u8 *start;
	u8 *fault;
	u8 *end;
	u8 *pad;
	int length;
	int remainder;

	if (!(s->flags & SLAB_POISON))
		return;

	start = slab_address(slab);
	length = slab_size(slab);
	end = start + length;
	remainder = length % s->size;
	if (!remainder)
		return;

	pad = end - remainder;
	metadata_access_enable();
	fault = memchr_inv(kasan_reset_tag(pad), POISON_INUSE, remainder);
	metadata_access_disable();
	if (!fault)
		return;
	while (end > fault && end[-1] == POISON_INUSE)
		end--;
	if (s->flags & SLAB_RED_ZONE) {
		if (!check_bytes_and_report(s, slab, object, "Left Redzone",
			object - s->red_left_pad, val, s->red_left_pad, ret))
			ret = 0;

		if (!check_bytes_and_report(s, slab, object, "Right Redzone",
			endobject, val, s->inuse - s->object_size, ret))
			ret = 0;

		if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) {
			orig_size = get_orig_size(s, object);

			if (s->object_size > orig_size &&
			    !check_bytes_and_report(s, slab, object,
					"kmalloc Redzone", p + orig_size,
					val, s->object_size - orig_size, ret)) {
				ret = 0;
			}
		}
	} else {
		if ((s->flags & SLAB_POISON) && s->object_size < s->inuse) {
			if (!check_bytes_and_report(s, slab, p, "Alignment padding",
				endobject, POISON_INUSE,
				s->inuse - s->object_size, ret))
				ret = 0;
		}
	}

	if (s->flags & SLAB_POISON) {
		if (val != SLUB_RED_ACTIVE && (s->flags & __OBJECT_POISON)) {
			/*
			 * KASAN can save its free meta data inside of the
			 * object at offset 0. Thus, skip checking the part of
			 * the redzone that overlaps with the meta data.
			 */
			kasan_meta_size = kasan_metadata_size(s, true);
			if (kasan_meta_size < s->object_size - 1 &&
			    !check_bytes_and_report(s, slab, p, "Poison",
					p + kasan_meta_size, POISON_FREE,
					s->object_size - kasan_meta_size - 1, ret))
				ret = 0;
			if (kasan_meta_size < s->object_size &&
			    !check_bytes_and_report(s, slab, p, "End Poison",
					p + s->object_size - 1, POISON_END, 1, ret))
				ret = 0;
		}
		/*
		 * check_pad_bytes cleans up on its own.
		 */
		if (!check_pad_bytes(s, slab, p))
			ret = 0;
	}

	/*
	 * Cannot check freepointer while object is allocated if
	 * object and freepointer overlap.
	 */
	if ((freeptr_outside_object(s) || val != SLUB_RED_ACTIVE) &&
	    !check_valid_pointer(s, slab, get_freepointer(s, p))) {
		object_err(s, slab, p, "Freepointer corrupt");
		/*
		 * No choice but to zap it and thus lose the remainder
		 * of the free objects in this slab. May cause
		 * another error because the object count is now wrong.
		 */
		set_freepointer(s, p, NULL);
		ret = 0;
	}

	return ret;
}
static int check_slab(struct kmem_cache *s, struct slab *slab)
{
	int maxobj;

	if (!folio_test_slab(slab_folio(slab))) {
		slab_err(s, slab, "Not a valid slab page");
		return 0;
	}

	maxobj = order_objects(slab_order(slab), s->size);
	if (slab->objects > maxobj) {
		slab_err(s, slab, "objects %u > max %u",
			 slab->objects, maxobj);
		return 0;
	}
	if (slab->inuse > slab->objects) {
		slab_err(s, slab, "inuse %u > max %u",
			 slab->inuse, slab->objects);
		return 0;
	}
	if (slab->frozen) {
		slab_err(s, slab, "Slab disabled since SLUB metadata consistency check failed");
		return 0;
	}

	/* Slab_pad_check fixes things up after itself */
	slab_pad_check(s, slab);
	return 1;
}
/*
 * Determine if a certain object in a slab is on the freelist. Must hold the
 * slab lock to guarantee that the chains are in a consistent state.
 */
static bool on_freelist(struct kmem_cache *s, struct slab *slab, void *search)
{
	int nr = 0;
	void *fp;
	void *object = NULL;
	int max_objects;

	if (!check_object(s, slab, object, SLUB_RED_INACTIVE))
		return 0;

	return 1;
}
static noinline bool alloc_debug_processing(struct kmem_cache *s,
			struct slab *slab, void *object, int orig_size)
{
	if (s->flags & SLAB_CONSISTENCY_CHECKS) {
		if (!alloc_consistency_checks(s, slab, object))
			goto bad;
	}

	/* Success. Perform special debug activities for allocs */
	trace(s, slab, object, 1);
	set_orig_size(s, object, orig_size);
	init_object(s, object, SLUB_RED_ACTIVE);
	return true;

bad:
	if (folio_test_slab(slab_folio(slab))) {
		/*
		 * If this is a slab page then lets do the best we can
		 * to avoid issues in the future. Marking all objects
		 * as used avoids touching the remaining objects.
		 */
		slab_fix(s, "Marking all objects used");
		slab->inuse = slab->objects;
		slab->freelist = NULL;
		slab->frozen = 1; /* mark consistency-failed slab as frozen */
	}
	return false;
}
	if (!check_object(s, slab, object, SLUB_RED_ACTIVE))
		return 0;

	if (unlikely(s != slab->slab_cache)) {
		if (!folio_test_slab(slab_folio(slab))) {
			slab_err(s, slab, "Attempt to free object(0x%p) outside of slab",
				 object);
		} else if (!slab->slab_cache) {
			slab_err(NULL, slab, "No slab cache for object 0x%p",
				 object);
		} else {
			object_err(s, slab, object,
				   "page slab pointer corrupt.");
		}
		return 0;
	}
	return 1;
}
/*
 * Parse a block of slab_debug options. Blocks are delimited by ';'
 *
 * @str:    start of block
 * @flags:  returns parsed flags, or DEBUG_DEFAULT_FLAGS if none specified
 * @slabs:  return start of list of slabs, or NULL when there's no list
 * @init:   assume this is initial parsing and not per-kmem-create parsing
 *
 * returns the start of next block if there's any, or NULL
 */
static char *
parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init)
{
	bool higher_order_disable = false;

	/* Skip any completely empty blocks */
	while (*str && *str == ';')
		str++;

	if (*str == ',') {
		/*
		 * No options but restriction on slabs. This means full
		 * debugging for slabs matching a pattern.
		 */
		*flags = DEBUG_DEFAULT_FLAGS;
		goto check_slabs;
	}
	*flags = 0;

	/* Determine which debug features should be switched on */
	for (; *str && *str != ',' && *str != ';'; str++) {
		switch (tolower(*str)) {
		case '-':
			*flags = 0;
			break;
		case 'f':
			*flags |= SLAB_CONSISTENCY_CHECKS;
			break;
		case 'z':
			*flags |= SLAB_RED_ZONE;
			break;
		case 'p':
			*flags |= SLAB_POISON;
			break;
		case 'u':
			*flags |= SLAB_STORE_USER;
			break;
		case 't':
			*flags |= SLAB_TRACE;
			break;
		case 'a':
			*flags |= SLAB_FAILSLAB;
			break;
		case 'o':
			/*
			 * Avoid enabling debugging on caches if its minimum
			 * order would increase as a result.
			 */
			higher_order_disable = true;
			break;
		default:
			if (init)
				pr_err("slab_debug option '%c' unknown. skipped\n", *str);
		}
	}
check_slabs:
	if (*str == ',')
		*slabs = ++str;
	else
		*slabs = NULL;

	/* Skip over the slab list */
	while (*str && *str != ';')
		str++;

	/* Skip any completely empty blocks */
	while (*str && *str == ';')
		str++;

	if (init && higher_order_disable)
		disable_higher_order_debug = 1;
	/*
	 * For backwards compatibility, a single list of flags with list of
	 * slabs means debugging is only changed for those slabs, so the global
	 * slab_debug should be unchanged (0 or DEBUG_DEFAULT_FLAGS, depending
	 * on CONFIG_SLUB_DEBUG_ON). We can extend that to multiple lists as
	 * long as there is no option specifying flags without a slab list.
	 */
	if (slab_list_specified) {
		if (!global_slub_debug_changed)
			global_flags = slub_debug;
		slub_debug_string = saved_str;
	}
out:
	slub_debug = global_flags;
	if (slub_debug & SLAB_STORE_USER)
		stack_depot_request_early_init();
	if (slub_debug != 0 || slub_debug_string)
		static_branch_enable(&slub_debug_enabled);
	else
		static_branch_disable(&slub_debug_enabled);
	if ((static_branch_unlikely(&init_on_alloc) ||
	     static_branch_unlikely(&init_on_free)) &&
	    (slub_debug & SLAB_POISON))
		pr_info("mem auto-init: SLAB_POISON will take precedence over init_on_alloc/init_on_free\n");
	return 1;
}
/*
 * kmem_cache_flags - apply debugging options to the cache
 * @flags:	flags to set
 * @name:	name of the cache
 *
 * Debug option(s) are applied to @flags. In addition to the debug
 * option(s), if a slab name (or multiple) is specified i.e.
 * slab_debug=<Debug-Options>,<slab name1>,<slab name2> ...
 * then only the select slabs will receive the debug option(s).
 */
slab_flags_t kmem_cache_flags(slab_flags_t flags, const char *name)
{
	char *iter;
	size_t len;
	char *next_block;
	slab_flags_t block_flags;
	slab_flags_t slub_debug_local = slub_debug;

	if (flags & SLAB_NO_USER_FLAGS)
		return flags;

	/*
	 * If the slab cache is for debugging (e.g. kmemleak) then
	 * don't store user (stack trace) information by default,
	 * but let the user enable it via the command line below.
	 */
	if (flags & SLAB_NOLEAKTRACE)
		slub_debug_local &= ~SLAB_STORE_USER;

	len = strlen(name);
	next_block = slub_debug_string;
	/* Go through all blocks of debug options, see if any matches our slab's name */
	while (next_block) {
		next_block = parse_slub_debug_flags(next_block, &block_flags, &iter, false);
		if (!iter)
			continue;
		/* Found a block that has a slab list, search it */
		while (*iter) {
			char *end, *glob;
			size_t cmplen;

			end = strchrnul(iter, ',');
			if (next_block && next_block < end)
				end = next_block - 1;
static inline void handle_failed_objexts_alloc(unsigned long obj_exts,
				struct slabobj_ext *vec, unsigned int objects)
{
	/*
	 * If vector previously failed to allocate then we have live
	 * objects with no tag reference. Mark all references in this
	 * vector as empty to avoid warnings later on.
	 */
	if (obj_exts & OBJEXTS_ALLOC_FAIL) {
		unsigned int i;

		for (i = 0; i < objects; i++)
			set_codetag_empty(&vec[i].ref);
	}
}
/*
 * The allocated objcg pointers array is not accounted directly.
 * Moreover, it should not come from DMA buffer and is not readily
 * reclaimable. So those GFP bits should be masked off.
 */
#define OBJCGS_CLEAR_MASK	(__GFP_DMA | __GFP_RECLAIMABLE | \
				__GFP_ACCOUNT | __GFP_NOFAIL)

	gfp &= ~OBJCGS_CLEAR_MASK;
	/* Prevent recursive extension vector allocation */
	gfp |= __GFP_NO_OBJ_EXT;
	vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
			   slab_nid(slab));
	if (!vec) {
		/*
		 * Try to mark vectors which failed to allocate.
		 * If this operation fails, there may be a racing process
		 * that has already completed the allocation.
		 */
		if (!mark_failed_objexts_alloc(slab) &&
		    slab_obj_exts(slab))
			return 0;

		return -ENOMEM;
	}

	new_exts = (unsigned long)vec;
#ifdef CONFIG_MEMCG
	new_exts |= MEMCG_DATA_OBJEXTS;
#endif
retry:
	old_exts = READ_ONCE(slab->obj_exts);
	handle_failed_objexts_alloc(old_exts, vec, objects);
	if (new_slab) {
		/*
		 * If the slab is brand new and nobody can yet access its
		 * obj_exts, no synchronization is required and obj_exts can
		 * be simply assigned.
		 */
		slab->obj_exts = new_exts;
	} else if (old_exts & ~OBJEXTS_FLAGS_MASK) {
		/*
		 * If the slab is already in use, somebody can allocate and
		 * assign slabobj_exts in parallel. In this case the existing
		 * objcg vector should be reused.
		 */
		mark_objexts_empty(vec);
		kfree(vec);
		return 0;
	} else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) {
		/* Retry if a racing thread changed slab->obj_exts from under us. */
		goto retry;
	}

	obj_exts = slab_obj_exts(slab);
	if (!obj_exts) {
		/*
		 * If obj_exts allocation failed, slab->obj_exts is set to
		 * OBJEXTS_ALLOC_FAIL. In this case, we end up here and should
		 * clear the flag.
		 */
		slab->obj_exts = 0;
		return;
	}

	/*
	 * obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
	 * corresponding extension will be NULL. alloc_tag_sub() will throw a
	 * warning if slab has extensions but the extension of an object is
	 * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
	 * the extension for obj_exts is expected to be NULL.
	 */
	mark_objexts_empty(obj_exts);
	kfree(obj_exts);
	slab->obj_exts = 0;
}
/* Should be called only if mem_alloc_profiling_enabled() */
static noinline void
__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
{
	struct slabobj_ext *obj_exts;

	obj_exts = prepare_slab_obj_exts_hook(s, flags, object);
	/*
	 * Currently obj_exts is used only for allocation profiling.
	 * If other users appear then mem_alloc_profiling_enabled()
	 * check should be added before alloc_tag_add().
	 */
	if (likely(obj_exts))
		alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
}

/* Should be called only if mem_alloc_profiling_enabled() */
static noinline void
__alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
			       int objects)
{
	struct slabobj_ext *obj_exts;
	int i;

	/* slab->obj_exts might not be NULL if it was created for MEMCG accounting. */
	if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
		return;

	obj_exts = slab_obj_exts(slab);
	if (!obj_exts)
		return;

	for (i = 0; i < objects; i++) {
		unsigned int off = obj_to_index(s, slab, p[i]);

		alloc_tag_sub(&obj_exts[off].ref, s->size);
	}
}

static inline void
alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
			     int objects)
{
	if (mem_alloc_profiling_enabled())
		__alloc_tagging_slab_free_hook(s, slab, p, objects);
}
	folio = virt_to_folio(p);
	if (!folio_test_slab(folio)) {
		int size;

		if (folio_memcg_kmem(folio))
			return true;

		if (__memcg_kmem_charge_page(folio_page(folio, 0), flags,
					     folio_order(folio)))
			return false;

		/*
		 * This folio has already been accounted in the global stats but
		 * not in the memcg stats. So, subtract from the global and use
		 * the interface which adds to both global and memcg stats.
		 */
		size = folio_size(folio);
		node_stat_mod_folio(folio, NR_SLAB_UNRECLAIMABLE_B, -size);
		lruvec_stat_mod_folio(folio, NR_SLAB_UNRECLAIMABLE_B, size);
		return true;
	}

	slab = folio_slab(folio);
	s = slab->slab_cache;

	/*
	 * Ignore KMALLOC_NORMAL cache to avoid possible circular dependency
	 * of slab_obj_exts being allocated from the same slab and thus the slab
	 * becoming effectively unfreeable.
	 */
	if (is_kmalloc_normal(s))
		return true;

	/* Ignore already charged objects. */
	slab_exts = slab_obj_exts(slab);
	if (slab_exts) {
		off = obj_to_index(s, slab, p);
		if (unlikely(slab_exts[off].objcg))
			return true;
	}
/*
 * Hooks for other subsystems that check memory allocations. In a typical
 * production configuration these hooks all should produce no code at all.
 *
 * Returns true if freeing of the object can proceed, false if its reuse
 * was delayed by CONFIG_SLUB_RCU_DEBUG or KASAN quarantine, or it was returned
 * to KFENCE.
 */
static __always_inline
bool slab_free_hook(struct kmem_cache *s, void *x, bool init,
		    bool after_rcu_delay)
{
	/* Are the object contents still accessible? */
	bool still_accessible = (s->flags & SLAB_TYPESAFE_BY_RCU) && !after_rcu_delay;

	if (!(s->flags & SLAB_DEBUG_OBJECTS))
		debug_check_no_obj_freed(x, s->object_size);

	/* Use KCSAN to help debug racy use-after-free. */
	if (!still_accessible)
		__kcsan_check_access(x, s->object_size,
				     KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT);

	if (kfence_free(x))
		return false;

	/*
	 * Give KASAN a chance to notice an invalid free operation before we
	 * modify the object.
	 */
	if (kasan_slab_pre_free(s, x))
		return false;

#ifdef CONFIG_SLUB_RCU_DEBUG
	if (still_accessible) {
		struct rcu_delayed_free *delayed_free;

		delayed_free = kmalloc(sizeof(*delayed_free), GFP_NOWAIT);
		if (delayed_free) {
			/*
			 * Let KASAN track our call stack as a "related work
			 * creation", just like if the object had been freed
			 * normally via kfree_rcu().
			 * We have to do this manually because the rcu_head is
			 * not located inside the object.
			 */
			kasan_record_aux_stack(x);

	/*
	 * As memory initialization might be integrated into KASAN,
	 * kasan_slab_free and initialization memset's must be
	 * kept together to avoid discrepancies in behavior.
	 *
	 * The initialization memset's clear the object and the metadata,
	 * but don't touch the SLAB redzone.
	 *
	 * The object's freepointer is also avoided if stored outside the
	 * object.
	 */
	if (unlikely(init)) {
		int rsize;