From: Lance Yang
To: dev.jain@arm.com
Cc: Liam.Howlett@oracle.com, akpm@linux-foundation.org, anshuman.khandual@arm.com, baohua@kernel.org, david@kernel.org, harry.yoo@oracle.com, jannh@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, lorenzo.stoakes@oracle.com, riel@surriel.com, ryan.roberts@arm.com, stable@kernel.org, vbabka@kernel.org, Lance Yang
Subject: Re: [PATCH v3] mm/rmap: fix incorrect pte restoration for lazyfree folios
Date: Tue, 10 Mar 2026 20:13:04 +0800
Message-ID: <20260310121304.17173-1-lance.yang@linux.dev>
In-Reply-To: <20260303061528.2429162-1-dev.jain@arm.com>
References: <20260303061528.2429162-1-dev.jain@arm.com>

On Tue, 3 Mar 2026 11:45:28 +0530, Dev Jain wrote:
>We batch unmap anonymous lazyfree folios by folio_unmap_pte_batch.
>If the batch has a mix of writable and non-writable bits, we may end up
>setting the entire batch writable. Fix this by respecting the writable bit
>during batching.
>Although the soft-dirty bit is lost on a successful unmap of a lazyfree
>folio, preserve it on pte restoration by respecting the bit during batching,
>to make the fix consistent w.r.t. both the writable bit and the soft-dirty bit.
>
>I was able to write the below reproducer and crash the kernel.
>Explanation of reproducer (set 64K mTHP to always):
>
>Fault in a 64K large folio. Split the VMA at the mid-point with MADV_DONTFORK.
>fork(): the parent now maps the folio with 8 writable ptes and 8 non-writable
>ptes. Merge the VMAs with MADV_DOFORK so that folio_unmap_pte_batch() can
>treat all 16 ptes as one batch. Do MADV_FREE on the range to mark
>the folio as lazyfree. Write to the memory to dirty the pte; eventually
>rmap will dirty the folio. Then trigger reclaim: we will hit the pte
>restoration path, and the kernel will crash with the following trace:
>
>[ 21.134473] kernel BUG at mm/page_table_check.c:118!
>[ 21.134497] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>[ 21.135917] Modules linked in:
>[ 21.136085] CPU: 1 UID: 0 PID: 1735 Comm: dup-lazyfree Not tainted 7.0.0-rc1-00116-g018018a17770 #1028 PREEMPT
>[ 21.136858] Hardware name: linux,dummy-virt (DT)
>[ 21.137019] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>[ 21.137308] pc : page_table_check_set+0x28c/0x2a8
>[ 21.137607] lr : page_table_check_set+0x134/0x2a8
>[ 21.137885] sp : ffff80008a3b3340
>[ 21.138124] x29: ffff80008a3b3340 x28: fffffdffc3d14400 x27: ffffd1a55e03d000
>[ 21.138623] x26: 0040000000000040 x25: ffffd1a55f7dd000 x24: 0000000000000001
>[ 21.139045] x23: 0000000000000001 x22: 0000000000000001 x21: ffffd1a55f217f30
>[ 21.139629] x20: 0000000000134521 x19: 0000000000134519 x18: 005c43e000040000
>[ 21.140027] x17: 0001400000000000 x16: 0001700000000000 x15: 000000000000ffff
>[ 21.140578] x14: 000000000000000c x13: 005c006000000000 x12: 0000000000000020
>[ 21.140828] x11: 0000000000000000 x10: 005c000000000000 x9 : ffffd1a55c079ee0
>[ 21.141077] x8 : 0000000000000001 x7 : 005c03e000040000 x6 : 000000004000ffff
>[ 21.141490] x5 : ffff00017fffce00 x4 : 0000000000000001 x3 : 0000000000000002
>[ 21.141741] x2 : 0000000000134510 x1 : 0000000000000000 x0 : ffff0000c08228c0
>[ 21.141991] Call trace:
>[ 21.142093] page_table_check_set+0x28c/0x2a8 (P)
>[ 21.142265] __page_table_check_ptes_set+0x144/0x1e8
>[ 21.142441] __set_ptes_anysz.constprop.0+0x160/0x1a8
>[ 21.142766] contpte_set_ptes+0xe8/0x140
>[ 21.142907] try_to_unmap_one+0x10c4/0x10d0
>[ 21.143177] rmap_walk_anon+0x100/0x250
>[ 21.143315] try_to_unmap+0xa0/0xc8
>[ 21.143441] shrink_folio_list+0x59c/0x18a8
>[ 21.143759] shrink_lruvec+0x664/0xbf0
>[ 21.144043] shrink_node+0x218/0x878
>[ 21.144285] __node_reclaim.constprop.0+0x98/0x338
>[ 21.144763] user_proactive_reclaim+0x2a4/0x340
>[ 21.145056] reclaim_store+0x3c/0x60
>[ 21.145216] dev_attr_store+0x20/0x40
>[ 21.145585] sysfs_kf_write+0x84/0xa8
>[ 21.145835] kernfs_fop_write_iter+0x130/0x1c8
>[ 21.145994] vfs_write+0x2b8/0x368
>[ 21.146119] ksys_write+0x70/0x110
>[ 21.146240] __arm64_sys_write+0x24/0x38
>[ 21.146380] invoke_syscall+0x50/0x120
>[ 21.146513] el0_svc_common.constprop.0+0x48/0xf8
>[ 21.146679] do_el0_svc+0x28/0x40
>[ 21.146798] el0_svc+0x34/0x110
>[ 21.146926] el0t_64_sync_handler+0xa0/0xe8
>[ 21.147074] el0t_64_sync+0x198/0x1a0
>[ 21.147225] Code: f9400441 b4fff241 17ffff94 d4210000 (d4210000)
>[ 21.147440] ---[ end trace 0000000000000000 ]---
>
>
>#define _GNU_SOURCE
>#include
>#include
>#include
>#include
>#include
>#include
>#include
>#include
>
>void write_to_reclaim() {
>	const char *path = "/sys/devices/system/node/node0/reclaim";

I wasn't able to get this reproducer working with node reclaim, but
triggering reclaim through memcg v1 memory.force_empty worked fine for me.

[...]

>
>Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
>Cc: stable
>Signed-off-by: Dev Jain
>---

Thanks!
Tested-by: Lance Yang