From: Vlastimil Babka
To: Andrew Morton, Christoph Lameter, David Rientjes, Pekka Enberg,
	Joonsoo Kim
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mike Galbraith,
	Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman,
	Jesper Dangaard Brouer, Jann Horn, Vlastimil Babka
Subject: [PATCH v4 33/35] mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
Date: Thu, 5 Aug 2021 17:19:58 +0200
Message-Id: <20210805152000.12817-34-vbabka@suse.cz>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210805152000.12817-1-vbabka@suse.cz>
References: <20210805152000.12817-1-vbabka@suse.cz>
MIME-Version: 1.0

Jann Horn reported [1] the following theoretically possible race:

  task A: put_cpu_partial() calls preempt_disable()
  task A: oldpage = this_cpu_read(s->cpu_slab->partial)
  interrupt: kfree() reaches unfreeze_partials() and discards the page
  task B (on another CPU): reallocates page as page cache
  task A: reads page->pages and page->pobjects, which are actually
    halves of the pointer page->lru.prev
  task B (on another CPU): frees page
  interrupt: allocates page as SLUB page and places it on the percpu
    partial list
  task A: this_cpu_cmpxchg() succeeds

  which would cause page->pages and page->pobjects to end up containing
  halves of pointers that would then influence when put_cpu_partial()
  happens and show up in root-only sysfs files. Maybe that's acceptable,
  I don't know. But there should probably at least be a comment for now
  to point out that we're reading union fields of a page that might be
  in a completely different state.

Additionally, the this_cpu_cmpxchg() approach in put_cpu_partial() is only
safe against s->cpu_slab->partial manipulation in ___slab_alloc() if the
latter disables irqs, otherwise a __slab_free() in an irq handler could
call put_cpu_partial() in the middle of ___slab_alloc() manipulating
->partial and corrupt it. This becomes an issue on RT after a local_lock
is introduced in a later patch. The fix means taking the local_lock also
in put_cpu_partial() on RT.

After debugging this issue, Mike Galbraith suggested [2] that to avoid
different locking schemes on RT and !RT, we can just protect
put_cpu_partial() with disabled irqs (to be converted to
local_lock_irqsave() later) everywhere. This should be acceptable as it's
not a fast path, and moving the actual partial unfreezing outside of the
irq-disabled section makes it short, and with the retry loop gone the
code can also be simplified. In addition, the race reported by Jann
should no longer be possible.
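To make the resulting control flow easier to follow before reading the
diff below, here is a condensed sketch of put_cpu_partial() after this
change (not the literal patch; the pages/pobjects accounting is left out):
the percpu partial list is only manipulated with irqs disabled, and the
potentially expensive unfreeze is deferred until after irqs are re-enabled.

  /*
   * Condensed sketch of put_cpu_partial() after this patch; see the
   * diff below for the real code. pages/pobjects accounting omitted.
   */
  static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
  {
  	struct page *oldpage, *page_to_unfreeze = NULL;
  	unsigned long flags;

  	local_irq_save(flags);	/* to become local_lock_irqsave() on RT later */

  	oldpage = this_cpu_read(s->cpu_slab->partial);
  	if (oldpage && drain && oldpage->pobjects > slub_cpu_partial(s)) {
  		/* partial list is full: detach it and unfreeze it outside
  		 * the critical section instead of doing it here */
  		page_to_unfreeze = oldpage;
  		oldpage = NULL;
  	}

  	page->next = oldpage;
  	/* plain per-cpu write replaces the this_cpu_cmpxchg() retry loop */
  	this_cpu_write(s->cpu_slab->partial, page);

  	local_irq_restore(flags);

  	if (page_to_unfreeze) {
  		__unfreeze_partials(s, page_to_unfreeze);
  		stat(s, CPU_PARTIAL_DRAIN);
  	}
  }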
[1] https://lore.kernel.org/lkml/CAG48ez1mvUuXwg0YPH5ANzhQLpbphqk-ZS+jbRz+H66fvm4FcA@mail.gmail.com/
[2] https://lore.kernel.org/linux-rt-users/e3470ab357b48bccfbd1f5133b982178a7d2befb.camel@gmx.de/

Reported-by: Jann Horn
Suggested-by: Mike Galbraith
Signed-off-by: Vlastimil Babka
---
 mm/slub.c | 81 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 44 insertions(+), 37 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 252421ff1d5f..c35ad273e3e9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2003,7 +2003,12 @@ static inline void *acquire_slab(struct kmem_cache *s,
 	return freelist;
 }
 
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+#else
+static inline void put_cpu_partial(struct kmem_cache *s, struct page *page,
+				   int drain) { }
+#endif
 static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
 
 /*
@@ -2437,14 +2442,6 @@ static void unfreeze_partials_cpu(struct kmem_cache *s,
 		__unfreeze_partials(s, partial_page);
 }
 
-#else /* CONFIG_SLUB_CPU_PARTIAL */
-
-static inline void unfreeze_partials(struct kmem_cache *s) { }
-static inline void unfreeze_partials_cpu(struct kmem_cache *s,
-					 struct kmem_cache_cpu *c) { }
-
-#endif /* CONFIG_SLUB_CPU_PARTIAL */
-
 /*
  * Put a page that was just frozen (in __slab_free|get_partial_node) into a
  * partial page slot if available.
@@ -2454,46 +2451,56 @@ static inline void unfreeze_partials_cpu(struct kmem_cache *s,
  */
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 {
-#ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *oldpage;
-	int pages;
-	int pobjects;
+	struct page *page_to_unfreeze = NULL;
+	unsigned long flags;
+	int pages = 0;
+	int pobjects = 0;
 
-	preempt_disable();
-	do {
-		pages = 0;
-		pobjects = 0;
-		oldpage = this_cpu_read(s->cpu_slab->partial);
+	local_irq_save(flags);
+
+	oldpage = this_cpu_read(s->cpu_slab->partial);
 
-		if (oldpage) {
+	if (oldpage) {
+		if (drain && oldpage->pobjects > slub_cpu_partial(s)) {
+			/*
+			 * Partial array is full. Move the existing set to the
+			 * per node partial list. Postpone the actual unfreezing
+			 * outside of the critical section.
+			 */
+			page_to_unfreeze = oldpage;
+			oldpage = NULL;
+		} else {
 			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
-			if (drain && pobjects > slub_cpu_partial(s)) {
-				/*
-				 * partial array is full. Move the existing
-				 * set to the per node partial list.
-				 */
-				unfreeze_partials(s);
-				oldpage = NULL;
-				pobjects = 0;
-				pages = 0;
-				stat(s, CPU_PARTIAL_DRAIN);
-			}
 		}
+	}
 
-		pages++;
-		pobjects += page->objects - page->inuse;
+	pages++;
+	pobjects += page->objects - page->inuse;
 
-		page->pages = pages;
-		page->pobjects = pobjects;
-		page->next = oldpage;
+	page->pages = pages;
+	page->pobjects = pobjects;
+	page->next = oldpage;
 
-	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
-								!= oldpage);
-	preempt_enable();
-#endif /* CONFIG_SLUB_CPU_PARTIAL */
+	this_cpu_write(s->cpu_slab->partial, page);
+
+	local_irq_restore(flags);
+
+	if (page_to_unfreeze) {
+		__unfreeze_partials(s, page_to_unfreeze);
+		stat(s, CPU_PARTIAL_DRAIN);
+	}
 }
 
+#else /* CONFIG_SLUB_CPU_PARTIAL */
+
+static inline void unfreeze_partials(struct kmem_cache *s) { }
+static inline void unfreeze_partials_cpu(struct kmem_cache *s,
+					 struct kmem_cache_cpu *c) { }
+
+#endif /* CONFIG_SLUB_CPU_PARTIAL */
+
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c,
 			      bool lock)
 {
-- 
2.32.0