From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE7EDC433EF for ; Thu, 31 Mar 2022 02:05:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1352454AbiCaCHe (ORCPT ); Wed, 30 Mar 2022 22:07:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343905AbiCaCHe (ORCPT ); Wed, 30 Mar 2022 22:07:34 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A4D8F60AA3 for ; Wed, 30 Mar 2022 19:05:47 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 30E9E61932 for ; Thu, 31 Mar 2022 02:05:47 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8023FC340EC; Thu, 31 Mar 2022 02:05:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1648692346; bh=5A7mFMmbLTr6ozt52HIKnhq7t0hrujDuNXrO3msPZ0s=; h=Date:To:From:Subject:From; b=BgxCJOTcSfKOfR/hMuFVyKt9SWc395j3NMCD6X9QL9bNgu+Gw/kH6T8ewn5ER+CPw vWtrssGQEzv+/vJxLu3YUjtljH/Owf2M9gP7zev+gdrACR3QzN7qjtYYfhSIo4WdrL mOxKty9ihOgCsEQEMUszEQ3HtTUBhZ8rGwGDAsis= Date: Wed, 30 Mar 2022 19:05:45 -0700 To: mm-commits@vger.kernel.org, songmuchun@bytedance.com, shakeelb@google.com, roman.gushchin@linux.dev, mhocko@suse.com, hannes@cmpxchg.org, longman@redhat.com, akpm@linux-foundation.org From: Andrew Morton Subject: + mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node.patch added to -mm tree Message-Id: <20220331020546.8023FC340EC@smtp.kernel.org> Precedence: bulk Reply-To: linux-kernel@vger.kernel.org List-ID: X-Mailing-List: mm-commits@vger.kernel.org The patch titled Subject: mm/list_lru: fix possible race in memcg_reparent_list_lru_node() has been added to the -mm tree. Its filename is mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node.patch This patch should soon appear at https://ozlabs.org/~akpm/mmots/broken-out/mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node.patch and later at https://ozlabs.org/~akpm/mmotm/broken-out/mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Waiman Long Subject: mm/list_lru: fix possible race in memcg_reparent_list_lru_node() Muchun Song found out there could be a race between list_lru_add() and memcg_reparent_list_lru_node() causing the later function to miss reparenting of a lru entry as shown below: CPU0: CPU1: list_lru_add() spin_lock(&nlru->lock) l = list_lru_from_kmem(memcg) memcg_reparent_objcgs(memcg) memcg_reparent_list_lrus(memcg) memcg_reparent_list_lru() memcg_reparent_list_lru_node() if (!READ_ONCE(nlru->nr_items)) // Miss reparenting return // Assume 0->1 l->nr_items++ // Assume 0->1 nlru->nr_items++ Though it is not likely that a list_lru_node that has 0 item suddenly has a newly added lru entry at the end of its life. The race is still theoretically possible. With the lock/unlock pair used within the percpu_ref_kill() which is the last function call of memcg_reparent_objcgs(), any read issued in memcg_reparent_list_lru_node() will not be reordered before the reparenting of objcgs. Adding a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check to ensure that either the reading of nr_items is valid or the racing list_lru_add() will see the reparented objcg. Link: https://lkml.kernel.org/r/20220330172646.2687555-1-longman@redhat.com Fixes: 405cc51fc104 ("mm/list_lru: optimize memcg_reparent_list_lru_node()") Reported-by: Muchun Song Signed-off-by: Waiman Long Acked-by: Roman Gushchin Cc: Michal Hocko Cc: Johannes Weiner Cc: Shakeel Butt Signed-off-by: Andrew Morton --- mm/list_lru.c | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) --- a/mm/list_lru.c~mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node +++ a/mm/list_lru.c @@ -395,10 +395,33 @@ static void memcg_reparent_list_lru_node struct list_lru_one *src, *dst; /* - * If there is no lru entry in this nlru, we can skip it immediately. + * With the lock/unlock pair used within the percpu_ref_kill() + * which is the last function call of memcg_reparent_objcgs(), any + * read issued here will not be reordered before the reparenting + * of objcgs. + * + * Assuming a racing list_lru_add(): + * list_lru_add() + * <- memcg_reparent_list_lru_node() + * spin_lock(&nlru->lock) + * l = list_lru_from_kmem(memcg) + * nlru->nr_items++ + * spin_unlock(&nlru->lock) + * <- memcg_reparent_list_lru_node() + * + * The !spin_is_locked(&nlru->lock) check is true means it is + * either before the spin_lock() or after the spin_unlock(). In the + * former case, list_lru_add() will see the reparented objcg and so + * won't touch the lru to be reparented. In the later case, it will + * see the updated nr_items. So we can use the optimization that if + * there is no lru entry in this nlru, skip it immediately. */ - if (!READ_ONCE(nlru->nr_items)) - return; + if (!spin_is_locked(&nlru->lock)) { + /* nr_items read must be ordered after nlru->lock */ + smp_rmb(); + if (!READ_ONCE(nlru->nr_items)) + return; + } /* * Since list_lru_{add,del} may be called under an IRQ-safe lock, @@ -407,7 +430,7 @@ static void memcg_reparent_list_lru_node spin_lock_irq(&nlru->lock); src = list_lru_from_memcg_idx(lru, nid, src_idx); - if (!src) + if (!src || !src->nr_items) goto out; dst = list_lru_from_memcg_idx(lru, nid, dst_idx); _ Patches currently in -mm which might be from longman@redhat.com are mm-sparsemem-fix-mem_section-will-never-be-null-gcc-12-warning.patch mm-list_lru-fix-possible-race-in-memcg_reparent_list_lru_node.patch ipc-mqueue-use-get_tree_nodev-in-mqueue_get_tree.patch