Date: Sun, 26 Apr 2026 10:55:32 -0700
From: Andrew Morton
To: Qi Zheng
Cc: shakeel.butt@linux.dev, syzbot, Liam.Howlett@oracle.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, ljs@kernel.org,
 surenb@google.com, syzkaller-bugs@googlegroups.com, vbabka@kernel.org,
 Muchun Song
Subject: Re: [syzbot] [mm?] WARNING: bad unlock balance in do_wp_page
Message-Id: <20260426105532.43768b24a42744f1b52fdff2@linux-foundation.org>
References: <69edca15.170a0220.38e3f1.0000.GAE@google.com>
 <20260426034938.db29d74982a8eb8463f8cf3a@linux-foundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 26 Apr 2026 23:57:42 +0800 Qi Zheng wrote:

> Hi Andrew,
>
> On 4/26/26 6:49 PM, Andrew Morton wrote:
> > On Sun, 26 Apr 2026 01:17:25 -0700 syzbot wrote:
> >
> >> Hello,
> >>
> >> syzbot found the following issue on:
> >>
> >> HEAD commit: 6596a02b2078 Merge tag 'drm-next-2026-04-22' of https://gi..
> >> git tree: upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=12483702580000
> >> kernel config: https://syzkaller.appspot.com/x/.config?x=24c8da4692f901cb
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=7d60b33a8a546263da7c
> >> compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> >> userspace arch: i386
> >>
> >> Unfortunately, I don't have any reproducer for this issue yet.
> >
> > argh, that dreaded sentence.
> >
> > Thanks.
> >
> > Something's definitely amiss. This is at least the fifth report of
> > rcu_read_lock() imbalance post-7.0.
> > Others:
> >
> > https://lore.kernel.org/69eab803.a00a0220.17a17.004a.GAE@google.com
> > https://lore.kernel.org/69eab803.a00a0220.17a17.004b.GAE@google.com
> > https://lore.kernel.org/69eafb0e.a00a0220.9259.0031.GAE@google.com
> > https://lore.kernel.org/69ebcbe2.a00a0220.7773.0005.GAE@google.com
>
> All the kernel configs mentioned above include 'CONFIG_MEMCG_V1=y'.
>
> Theoretically, a rebind_subsystems() can lead a rcu unbalance, see my
> previous discussion with Shakeel for details:
>
> https://lore.kernel.org/all/358c60e1-fa91-40a1-9e00-84c93340c04e@linux.dev/

Right, that looks similar.

The rcu locking under lruvec_stat_mod_folio() is very simple, and that
return in get_non_dying_memcg_end() does look super suspicious. Why
does it omit the unlock?

otoh, in
https://lore.kernel.org/all/69eafb0e.a00a0220.9259.0031.GAE@google.com/
we're trying to release an rcu_read_lock() which isn't presently held.
But if cgroup_subsys_on_dfl() were to become false between the
get_non_dying_memcg_start/end pair, that's what would happen.

So yup, I agree, concurrent rebind_subsystems() activity could cause
all of this.

The reports are pretty common - is there some debugging patch we can
temporarily add to confirm this theory? And/or is it possible to cook
up a selftest which will trigger this?

> However, in a production environment, this is practically impossible.

Can you expand on this? syzbot isn't a production environment ;)

> So Shakeel and I chose to wait for a reproducer at the time. :(
>
> >
> > In some cases we released it too often, in other cases we failed to
> > release it.
> >
> > The first one is slightly more useful in that it tells us that the
> > not-released rcu_read_lock() was taken in folio_lruvec_lock_irqsave().
>
> I double-checked some callers of folio_lruvec_lock_irqsave() (such as
> folios_put_refs()), but didn't find anything suspicious. :(

Right - it's rare and smells of a race condition.
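
For illustration only, the theory discussed above can be modelled in
userspace. This is a sketch, not kernel code: on_dfl, rcu_depth,
model_start(), model_end() and run_pair() are hypothetical stand-ins,
and it assumes that both helpers of the start/end pair test the same
flag independently, which is what the discussion suggests rather than
something verified against the source here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of this is the kernel's implementation. */
static bool on_dfl = true;	/* models the cgroup_subsys_on_dfl() result */
static int rcu_depth;		/* models the RCU read-side nesting depth */

/* Models the "start" helper: takes the lock only on the default hierarchy. */
static void model_start(void)
{
	if (!on_dfl)
		return;
	rcu_depth++;		/* stands in for rcu_read_lock() */
}

/* Models the "end" helper: re-tests the flag instead of remembering it. */
static void model_end(void)
{
	if (!on_dfl)
		return;		/* early return skips the unlock */
	rcu_depth--;		/* stands in for rcu_read_unlock() */
}

/*
 * Run one start/end pair. The flag may change in between, modelling a
 * concurrent rebind. Returns the leftover depth: 0 means balanced,
 * 1 means a lock was never released, -1 means an unlock without a lock.
 */
static int run_pair(bool flag_at_start, bool flag_at_end)
{
	rcu_depth = 0;
	on_dfl = flag_at_start;
	model_start();
	on_dfl = flag_at_end;
	model_end();
	return rcu_depth;
}
```

In this model run_pair(true, false) leaves the depth at 1 and
run_pair(false, true) leaves it at -1, while a stable flag gives 0
either way. Those are the two failure shapes seen in the reports: one
unlock too few, or one too many.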