From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 21 Jan 2025 05:00:10 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Jane Chu
Cc: akpm@linux-foundation.org, linmiaohe@huawei.com,
	kirill.shutemov@linux.intel.com, hughd@google.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: make page_mapped_in_vma() hugetlb walk aware
References: <20250121041849.3393237-1-jane.chu@oracle.com>
In-Reply-To: <20250121041849.3393237-1-jane.chu@oracle.com>

On Mon, Jan 20, 2025 at 09:18:49PM -0700, Jane Chu wrote:
> When a process consumes a UE in a page, the memory failure handler
> attempts to collect information for a potential SIGBUS.
> If the page is an anonymous page, page_mapped_in_vma(page, vma) is
> invoked in order to
> 1. retrieve the vaddr from the process' address space,
> 2. verify that the vaddr is indeed mapped to the poisoned page,
> where 'page' is the precise small page with UE.
>
> It's been observed that when injecting poison into a non-head subpage
> of an anonymous hugetlb page, no SIGBUS shows up, while injecting into
> the head page produces a SIGBUS. The cause is that, though hugetlb_walk()
> returns a valid pmd entry (on x86), check_pte() detects a mismatch
> between the head page per the pmd and the input subpage. Thus the vaddr
> is considered not mapped to the subpage and the process is not collected
> for SIGBUS purposes. This is the calling stack:
>   collect_procs_anon
>     page_mapped_in_vma
>       page_vma_mapped_walk
>         hugetlb_walk
>         huge_pte_lock
>         check_pte
>
> It seems that the most obvious place to fix the issue is by making
> page_mapped_in_vma() hugetlb walk aware. The precise subpage in the
> input is useful in providing PAGE_SIZE granularity vaddr.

I don't like this solution because it adds yet another special case for
hugetlb. If we don't split a PMD-mapped THP, we'd have the same problem,
right?

check_pte() would succeed if we set pvmw->pfn to folio_pfn() and
pvmw->nr_pages to folio_nr_pages(), right? I just don't know what else
might be affected by that.

I like one of these two options:

@@ -206,6 +206,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		pvmw->pte = hugetlb_walk(vma, pvmw->address, size);
 		if (!pvmw->pte)
 			return false;
+		pvmw->pte += pvmw->address & (size - PAGE_SIZE);
 
 		pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
 		if (!check_pte(pvmw))

(that needs a bit of tidying up; you can't just do that, but I think you
get the basic idea -- correct the pte to point to the precise page
instead of the hugetlb pfn)

The option I really prefer is much more work but matches our preferred
direction of getting rid of hugetlb specific code.
Something like this:

@@ -192,27 +192,6 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	if (pvmw->pmd && !pvmw->pte)
 		return not_found(pvmw);
 
-	if (unlikely(is_vm_hugetlb_page(vma))) {
-		struct hstate *hstate = hstate_vma(vma);
-		unsigned long size = huge_page_size(hstate);
-		/* The only possible mapping was handled on last iteration */
[...]
-		pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
-		if (!check_pte(pvmw))
-			return not_found(pvmw);
-		return true;
-	}
-
 	end = vma_address_end(pvmw);
 	if (pvmw->pte)
 		goto next_pte;
@@ -229,7 +208,19 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			continue;
 		}
 		pud = pud_offset(p4d, pvmw->address);
-		if (!pud_present(*pud)) {
+		pude = *pud;
+		if (pud_trans_huge(pude) ||
+		    (pud_present(pude) && pud_devmap(pude))) {
+			pvmw->ptl = pud_lock(mm, pvmw->pud);
+			...
+			if (likely(pud_trans_huge(pude) || pud_devmap(pude))) {
+				if (pvmw->flags & PVMW_MIGRATION)
+					return not_found(pvmw);
+				if (!check_pud(pud_pfn(pude), pvmw))
+					return not_found(pvmw);
+				return true;
+			}
+		} else if (!pud_present(pude)) {
 			step_forward(pvmw, PUD_SIZE);
 			continue;
 		}

ie get rid of all the hugetlb-specific code, and add support for the PUD
level to the common code. You'd also need to write check_pud(). I'll
understand if you don't want to do all the extra work.

And thanks for tracking down this bug.