From: Harry Yoo
To: "Vlastimil Babka (SUSE)", Andrew Morton, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Hao Ge, Kees Cook, Pedro Falcato, Shakeel Butt, Danielle Constantino
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC hotfixes 2/2] mm/slab: prevent unbounded recursion in free path with new kmalloc type
Date: Thu, 2 Jul 2026 22:20:34 +0900
References: <20260702-kmalloc-no-objext-v1-0-167175008538@kernel.org> <20260702-kmalloc-no-objext-v1-2-167175008538@kernel.org>

On 7/2/26 9:57 PM, Vlastimil Babka (SUSE) wrote:
> On 7/2/26 06:09, Harry Yoo (Oracle) wrote:
>> Commit 280ea9c3154b ("mm/slab: avoid allocating slabobj_ext array from
>> its own slab") avoided recursive allocation of obj_exts from kmalloc
>> caches of the same size, by bumping the obj_exts array's allocation
>> size whenever the array size equals the size of the object being
>> allocated.
>>
>> However, as reported by Danielle Costantino and Shakeel Butt,
>> even slabs from kmalloc caches of different sizes can form a cycle
>> by allocating obj_exts arrays from each other [1]:
>>
>>   What happened: a KMALLOC_NORMAL slab's obj_exts array (used by
>>   allocation profiling / memcg accounting) is itself kmalloc()'d from a
>>   KMALLOC_NORMAL cache, so the "slab holds another slab's obj_exts array"
>>   relation can form cycles. With sizeof(struct slabobj_ext) == 16 and
>>   the host's geometry:
>>
>>     - kmalloc-512 has 64 objects/slab -> array is 64*16 == 1024 bytes,
>>       served from kmalloc-1k;
>>     - kmalloc-1k has 32 objects/slab -> array is 32*16 == 512 bytes,
>>       served from kmalloc-512.
>>
>>   A kmalloc-512 slab and a kmalloc-1k slab therefore hold each other's
>>   obj_exts array. Discarding one frees the other's array, which empties
>>   and discards that slab, which frees the first's array, and so on:
>>   __free_slab() -> free_slab_obj_exts() -> kfree() -> discard_slab() ->
>>   __free_slab() recurses along the cycle until the stack is exhausted.
>>
>> With memory allocation profiling, this allows unbounded recursion
>> in the free path and led to a stack overflow on a production host in
>> the Meta fleet [1]:
>>
>>   BUG: TASK stack guard page was hit
>>   Oops: stack guard page
>>   RIP: 0010:kfree+0x8/0x5d0
>>   Call Trace:
>>    __free_slab+0x66/0xc0
>>    kfree+0x3f0/0x5d0
>>    ... ( ~125x __free_slab <-> kfree ) ...
>>
>>    do_syscall_64
>>
>> It is proposed [1] to resolve this issue by always serving the obj_exts
>> array allocation from kmalloc caches (or large kmalloc) of sizes larger
>> than the object size.
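
(Inline note: the geometry in the report is easy to check with a few lines
of userspace C. This is only a sketch. kmalloc_size_class() is a
hypothetical helper that rounds up to the next power of two, which ignores
the real kmalloc-96 and kmalloc-192 caches, and the objects-per-slab
figures are simply the ones quoted above for that host.)

```c
#include <stddef.h>

/* sizeof(struct slabobj_ext) as quoted in the report. */
#define SLABOBJ_EXT_SIZE 16u

/* Hypothetical helper: round up to the next power-of-two size class. */
static size_t kmalloc_size_class(size_t bytes)
{
	size_t class_size = 8;

	while (class_size < bytes)
		class_size <<= 1;
	return class_size;
}

/* Size class that would serve the obj_exts array of a single slab. */
static size_t obj_exts_size_class(size_t objects_per_slab)
{
	return kmalloc_size_class(objects_per_slab * SLABOBJ_EXT_SIZE);
}
```

With 64 objects per slab (kmalloc-512 on that host) the array lands in the
1024-byte class, and with 32 objects per slab (kmalloc-1k) it lands in the
512-byte class, which is the two-node cycle described in the report.
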
>> However, as pointed out by Vlastimil Babka [2],
>> this can waste an excessive amount of memory as slabs from large
>> kmalloc sizes (e.g. kmalloc-8k) generally need obj_exts arrays much
>> smaller than the object size.
>>
>> Therefore, rather than bumping the size, let us take a different
>> approach; disallow formation of cycles between kmalloc types when
>> allocating obj_exts arrays. Currently, all obj_exts arrays are served
>> from normal kmalloc caches. Cycles cannot be created if obj_exts arrays
>> of normal kmalloc caches are served from a special kmalloc type that can
>> never have obj_exts arrays.
>>
>> To achieve this, create a new kmalloc type called KMALLOC_NO_OBJ_EXT.
>> KMALLOC_NO_OBJ_EXT caches are created when CONFIG_SLAB_OBJ_EXT is
>> enabled, and they have SLAB_NO_OBJ_EXT flag to prevent allocation
>> of obj_exts arrays. They remain unused until allocation of obj_exts
>> arrays for normal kmalloc caches happens.
>
> I wonder if we should just use them always (not just for kmalloc_normal) if
> we already have them. Would there be any downside?

Good point! That's more intuitive, and separating them sounds good
because obj_exts arrays will likely have a longer lifetime than slab
objects.

I'm not sure about the impact on memory usage; I need to check.
I'd say it's fine as long as it doesn't clearly increase memory usage.

But I guess that should not be part of the bugfix, as it's a functional
change that is not required to fix the bug.

>> @@ -426,6 +434,11 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
>>  {
>>  	if (!is_kmalloc_cache(s))
>>  		return false;
>> +
>> +	/* KMALLOC_NO_OBJ_EXT is not normal kmalloc */
>> +	if (s->flags & SLAB_NO_OBJ_EXT)
>> +		return false;
>
> Could it just go to the test below?

Yes!
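
For reference, folding the check into the existing flag test could look
roughly like the sketch below. It is a userspace mock-up, not the actual
patch: struct kmem_cache here is a stand-in with only a flags field, and
the numeric flag values are made up for illustration (the real
definitions live in the kernel headers).

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only so the sketch compiles. */
#define SLAB_CACHE_DMA        0x01u
#define SLAB_ACCOUNT          0x02u
#define SLAB_RECLAIM_ACCOUNT  0x04u
#define SLAB_NO_OBJ_EXT       0x08u
#define SLAB_KMALLOC          0x10u

/* Stand-in for the kernel's struct kmem_cache; only flags matter here. */
struct kmem_cache {
	unsigned int flags;
};

static inline bool is_kmalloc_cache(const struct kmem_cache *s)
{
	return s->flags & SLAB_KMALLOC;
}

/* SLAB_NO_OBJ_EXT joins the mask instead of getting its own early return. */
static inline bool is_kmalloc_normal(const struct kmem_cache *s)
{
	if (!is_kmalloc_cache(s))
		return false;

	return !(s->flags & (SLAB_CACHE_DMA | SLAB_ACCOUNT |
			     SLAB_RECLAIM_ACCOUNT | SLAB_NO_OBJ_EXT));
}
```

The behaviour should be the same as with the separate early return: a
kmalloc cache carrying SLAB_NO_OBJ_EXT is not treated as normal kmalloc.
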
>> +
>>  	return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));

>> @@ -7957,10 +7940,10 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
>>  		s->allocflags |= __GFP_RECLAIMABLE;
>>
>>  	/*
>> -	 * For KMALLOC_NORMAL caches we enable sheaves later by
>> -	 * bootstrap_kmalloc_sheaves() to avoid recursion
>> +	 * For kmalloc caches we enable sheaves later by
>> +	 * bootstrap_kmalloc_sheaves() to avoid recursion.
>>  	 */
>> -	if (!is_kmalloc_normal(s))
>> +	if (!(s->flags & SLAB_KMALLOC))
>
> is_kmalloc_cache()?

Will do, thanks!

-- 
Cheers,
Harry / Hyeonggon