From: Hugh Dickins <hughd@google.com>
To: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mike Kravetz <mike.kravetz@oracle.com>,
Mike Rapoport <rppt@kernel.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matthew Wilcox <willy@infradead.org>,
David Hildenbrand <david@redhat.com>,
Suren Baghdasaryan <surenb@google.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Yang Shi <shy828301@gmail.com>,
Mel Gorman <mgorman@techsingularity.net>,
Peter Xu <peterx@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
Alistair Popple <apopple@nvidia.com>,
Ralph Campbell <rcampbell@nvidia.com>,
Ira Weiny <ira.weiny@intel.com>,
Steven Price <steven.price@arm.com>,
SeongJae Park <sj@kernel.org>,
Lorenzo Stoakes <lstoakes@gmail.com>,
Huang Ying <ying.huang@intel.com>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Zack Rusin <zackr@vmware.com>, Jason Gunthorpe <jgg@ziepe.ca>,
Axel Rasmussen <axelrasmussen@google.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Minchan Kim <minchan@kernel.org>,
Christoph Hellwig <hch@infradead.org>, Song Liu <song@kernel.org>,
Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
Russell King <linux@armlinux.org.uk>,
"David S. Miller" <davem@davemloft.net>,
Michael Ellerman <mpe@ellerman.id.au>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
Heiko Carstens <hca@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Vishal Moola <vishal.moola@gmail.com>,
Vlastimil Babka <vbabka@suse.cz>, Zi Yan <ziy@nvidia.com>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
sparclinux@vger.kernel.org,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
linux-s390 <linux-s390@vger.kernel.org>,
kernel list <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [BUG] Re: [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()
Date: Mon, 14 Aug 2023 23:34:05 -0700 (PDT) [thread overview]
Message-ID: <cacd4a19-386d-8bea-400-e99778dbc3b@google.com> (raw)
In-Reply-To: <CAG48ez0FxiRC4d3VTu_a9h=rg5FW-kYD5Rg5xo_RDBM0LTTqZQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1941 bytes --]
On Mon, 14 Aug 2023, Jann Horn wrote:
> On Wed, Jul 12, 2023 at 6:42 AM Hugh Dickins <hughd@google.com> wrote:
> > Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp().
> > It does need mmap_read_lock(), but it does not need mmap_write_lock(),
> > nor vma_start_write() nor i_mmap lock nor anon_vma lock. All racing
> > paths are relying on pte_offset_map_lock() and pmd_lock(), so use those.
>
> We can still have a racing userfaultfd operation at the "/* step 4:
> remove page table */" point that installs a new PTE before the page
> table is removed.
>
> To reproduce, patch a delay into the kernel like this:
>
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 9a6e0d507759..27cc8dfbf3a7 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -20,6 +20,7 @@
> #include <linux/swapops.h>
> #include <linux/shmem_fs.h>
> #include <linux/ksm.h>
> +#include <linux/delay.h>
>
> #include <asm/tlb.h>
> #include <asm/pgalloc.h>
> @@ -1617,6 +1618,11 @@ int collapse_pte_mapped_thp(struct mm_struct
> *mm, unsigned long addr,
> }
>
> /* step 4: remove page table */
> + if (strcmp(current->comm, "DELAYME") == 0) {
> + pr_warn("%s: BEGIN DELAY INJECTION\n", __func__);
> + mdelay(5000);
> + pr_warn("%s: END DELAY INJECTION\n", __func__);
> + }
>
> /* Huge page lock is still held, so page table must remain empty */
> pml = pmd_lock(mm, pmd);
>
>
> And then run the attached reproducer against mm/mm-everything. You
> should get this in dmesg:
>
> [ 206.578096] BUG: Bad rss-counter state mm:000000000942ebea
> type:MM_ANONPAGES val:1
Thanks a lot, Jann. I haven't thought about it at all yet; and just
tried to reproduce, but haven't yet got the "BUG: Bad rss-counter":
just see "Invalid argument" on the UFFDIO_COPY ioctl.
Will investigate tomorrow.
Hugh
WARNING: multiple messages have this Message-ID (diff)
From: Hugh Dickins <hughd@google.com>
To: Jann Horn <jannh@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>,
David Hildenbrand <david@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Yang Shi <shy828301@gmail.com>, Peter Xu <peterx@redhat.com>,
kernel list <linux-kernel@vger.kernel.org>,
Song Liu <song@kernel.org>,
sparclinux@vger.kernel.org,
Alexander Gordeev <agordeev@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
Will Deacon <will@kernel.org>,
linux-s390 <linux-s390@vger.kernel.org>,
Yu Zhao <yuzhao@google.com>, Ira Weiny <ira.weiny@intel.com>,
Alistair Popple <apopple@nvidia.com>,
Hugh Dickins <hughd@google.com>,
Russell King <linux@armlinux.org.uk>,
Matthew Wilcox <willy@infradead.org>,
Steven Price <steven.price@arm.com>,
Christoph Hellwig <hch@infradead.org>,
Jason Gunthorpe <jgg@ziepe.ca>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
Zi Yan <ziy@nvidia.com>, Huang Ying <ying.huang@intel.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ib m.com>,
Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
Ralph Campbell <rcampbell@nvidia.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Heiko Carstens <hca@linux.ibm.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
SeongJae Park <sj@kernel.org>,
Lorenzo Stoakes <lstoakes@gmail.com>,
Linux-MM <linux-mm@kvack.org>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
Zack Rusin <zackr@vmware.com>,
Vishal Moola <vishal.moola@gmail.com>,
Minchan Kim <minchan@kernel.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
"David S. Miller" <davem@davemloft.net>,
Mike Rapoport <rppt@kernel.org>,
Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [BUG] Re: [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()
Date: Mon, 14 Aug 2023 23:34:05 -0700 (PDT) [thread overview]
Message-ID: <cacd4a19-386d-8bea-400-e99778dbc3b@google.com> (raw)
In-Reply-To: <CAG48ez0FxiRC4d3VTu_a9h=rg5FW-kYD5Rg5xo_RDBM0LTTqZQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1941 bytes --]
On Mon, 14 Aug 2023, Jann Horn wrote:
> On Wed, Jul 12, 2023 at 6:42 AM Hugh Dickins <hughd@google.com> wrote:
> > Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp().
> > It does need mmap_read_lock(), but it does not need mmap_write_lock(),
> > nor vma_start_write() nor i_mmap lock nor anon_vma lock. All racing
> > paths are relying on pte_offset_map_lock() and pmd_lock(), so use those.
>
> We can still have a racing userfaultfd operation at the "/* step 4:
> remove page table */" point that installs a new PTE before the page
> table is removed.
>
> To reproduce, patch a delay into the kernel like this:
>
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 9a6e0d507759..27cc8dfbf3a7 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -20,6 +20,7 @@
> #include <linux/swapops.h>
> #include <linux/shmem_fs.h>
> #include <linux/ksm.h>
> +#include <linux/delay.h>
>
> #include <asm/tlb.h>
> #include <asm/pgalloc.h>
> @@ -1617,6 +1618,11 @@ int collapse_pte_mapped_thp(struct mm_struct
> *mm, unsigned long addr,
> }
>
> /* step 4: remove page table */
> + if (strcmp(current->comm, "DELAYME") == 0) {
> + pr_warn("%s: BEGIN DELAY INJECTION\n", __func__);
> + mdelay(5000);
> + pr_warn("%s: END DELAY INJECTION\n", __func__);
> + }
>
> /* Huge page lock is still held, so page table must remain empty */
> pml = pmd_lock(mm, pmd);
>
>
> And then run the attached reproducer against mm/mm-everything. You
> should get this in dmesg:
>
> [ 206.578096] BUG: Bad rss-counter state mm:000000000942ebea
> type:MM_ANONPAGES val:1
Thanks a lot, Jann. I haven't thought about it at all yet; and just
tried to reproduce, but haven't yet got the "BUG: Bad rss-counter":
just see "Invalid argument" on the UFFDIO_COPY ioctl.
Will investigate tomorrow.
Hugh
next prev parent reply other threads:[~2023-08-15 6:35 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-12 4:27 [PATCH v3 00/13] mm: free retracted page table by RCU Hugh Dickins
2023-07-12 4:27 ` Hugh Dickins
2023-07-12 4:30 ` [PATCH v3 01/13] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s Hugh Dickins
2023-07-12 4:30 ` Hugh Dickins
2023-07-12 4:32 ` [PATCH v3 02/13] mm/pgtable: add PAE safety to __pte_offset_map() Hugh Dickins
2023-07-12 4:32 ` Hugh Dickins
2023-07-12 4:33 ` [PATCH v3 03/13] arm: adjust_pte() use pte_offset_map_nolock() Hugh Dickins
2023-07-12 4:33 ` Hugh Dickins
2023-07-12 4:34 ` [PATCH v3 04/13] powerpc: assert_pte_locked() " Hugh Dickins
2023-07-12 4:34 ` Hugh Dickins
2023-07-18 10:41 ` Aneesh Kumar K.V
2023-07-18 10:41 ` Aneesh Kumar K.V
2023-07-19 5:04 ` Hugh Dickins
2023-07-19 5:04 ` Hugh Dickins
2023-07-19 5:24 ` Aneesh Kumar K V
2023-07-19 5:24 ` Aneesh Kumar K V
2023-07-21 13:13 ` Jay Patel
2023-07-21 13:13 ` Jay Patel
2023-07-23 22:26 ` [PATCH v3 04/13 fix] powerpc: assert_pte_locked() use pte_offset_map_nolock(): fix Hugh Dickins
2023-07-23 22:26 ` Hugh Dickins
2023-07-12 4:35 ` [PATCH v3 05/13] powerpc: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-07-12 4:35 ` Hugh Dickins
2023-07-12 4:37 ` [PATCH v3 06/13] sparc: add pte_free_defer() for pte_t *pgtable_t Hugh Dickins
2023-07-12 4:37 ` Hugh Dickins
2023-07-12 4:38 ` [PATCH v3 07/13] s390: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-07-12 4:38 ` Hugh Dickins
2023-07-13 4:47 ` Alexander Gordeev
2023-07-13 4:47 ` Alexander Gordeev
2023-07-19 14:25 ` Claudio Imbrenda
2023-07-19 14:25 ` Claudio Imbrenda
2023-07-23 22:29 ` [PATCH v3 07/13 fix] s390: add pte_free_defer() for pgtables sharing page: fix Hugh Dickins
2023-07-23 22:29 ` Hugh Dickins
2023-07-12 4:39 ` [PATCH v3 08/13] mm/pgtable: add pte_free_defer() for pgtable as page Hugh Dickins
2023-07-12 4:39 ` Hugh Dickins
2023-07-12 4:41 ` [PATCH v3 09/13] mm/khugepaged: retract_page_tables() without mmap or vma lock Hugh Dickins
2023-07-12 4:41 ` Hugh Dickins
2023-07-12 4:42 ` [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() Hugh Dickins
2023-07-12 4:42 ` Hugh Dickins
2023-07-23 22:32 ` [PATCH v3 10/13 fix] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock(): fix Hugh Dickins
2023-07-23 22:32 ` Hugh Dickins
2023-08-03 9:17 ` [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() Qi Zheng
2023-08-03 9:17 ` Qi Zheng
2023-08-06 3:55 ` Hugh Dickins
2023-08-06 3:55 ` Hugh Dickins
2023-08-07 2:21 ` Qi Zheng
2023-08-07 2:21 ` Qi Zheng
2023-08-06 3:59 ` [PATCH v3 10/13 fix2] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock(): fix2 Hugh Dickins
2023-08-06 3:59 ` Hugh Dickins
2023-08-14 20:36 ` [BUG] Re: [PATCH v3 10/13] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() Jann Horn
2023-08-14 20:36 ` Jann Horn
2023-08-15 6:34 ` Hugh Dickins [this message]
2023-08-15 6:34 ` Hugh Dickins
2023-08-15 7:11 ` David Hildenbrand
2023-08-15 7:11 ` David Hildenbrand
2023-08-15 15:41 ` Hugh Dickins
2023-08-15 15:41 ` Hugh Dickins
2023-08-21 19:48 ` Hugh Dickins
2023-08-21 19:48 ` Hugh Dickins
2023-07-12 4:43 ` [PATCH v3 11/13] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps() Hugh Dickins
2023-07-12 4:43 ` Hugh Dickins
2023-07-23 22:35 ` [PATCH v3 11/13 fix] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps(): fix Hugh Dickins
2023-07-23 22:35 ` Hugh Dickins
2023-07-12 4:44 ` [PATCH v3 12/13] mm: delete mmap_write_trylock() and vma_try_start_write() Hugh Dickins
2023-07-12 4:44 ` Hugh Dickins
2023-07-12 4:48 ` [PATCH mm " Hugh Dickins
2023-07-12 4:48 ` Hugh Dickins
2023-07-12 4:46 ` [PATCH v3 13/13] mm/pgtable: notes on pte_offset_map[_lock]() Hugh Dickins
2023-07-12 4:46 ` Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cacd4a19-386d-8bea-400-e99778dbc3b@google.com \
--to=hughd@google.com \
--cc=agordeev@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.ibm.com \
--cc=anshuman.khandual@arm.com \
--cc=apopple@nvidia.com \
--cc=axelrasmussen@google.com \
--cc=borntraeger@linux.ibm.com \
--cc=christophe.leroy@csgroup.eu \
--cc=davem@davemloft.net \
--cc=david@redhat.com \
--cc=gerald.schaefer@linux.ibm.com \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=hch@infradead.org \
--cc=imbrenda@linux.ibm.com \
--cc=ira.weiny@intel.com \
--cc=jannh@google.com \
--cc=jgg@ziepe.ca \
--cc=kirill.shutemov@linux.intel.com \
--cc=linmiaohe@huawei.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-s390@vger.kernel.org \
--cc=linux@armlinux.org.uk \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lstoakes@gmail.com \
--cc=mgorman@techsingularity.net \
--cc=mike.kravetz@oracle.com \
--cc=minchan@kernel.org \
--cc=mpe@ellerman.id.au \
--cc=naoya.horiguchi@nec.com \
--cc=pasha.tatashin@soleen.com \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=rcampbell@nvidia.com \
--cc=rppt@kernel.org \
--cc=shy828301@gmail.com \
--cc=sj@kernel.org \
--cc=song@kernel.org \
--cc=sparclinux@vger.kernel.org \
--cc=steven.price@arm.com \
--cc=surenb@google.com \
--cc=thomas.hellstrom@linux.intel.com \
--cc=vbabka@suse.cz \
--cc=vishal.moola@gmail.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=ying.huang@intel.com \
--cc=yuzhao@google.com \
--cc=zackr@vmware.com \
--cc=zhengqi.arch@bytedance.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.