From: Mike Kravetz <mike.kravetz@oracle.com>
To: Rik van Riel <riel@surriel.com>
Cc: Chris Mason <clm@meta.com>, David Hildenbrand <david@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kernel-team@meta.com, Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [BUG] hugetlbfs_no_page vs MADV_DONTNEED race leading to SIGBUS
Date: Tue, 25 Oct 2022 21:13:13 -0700 [thread overview]
Message-ID: <Y1iz2dgCEvbpSPR1@monkey> (raw)
In-Reply-To: <Y1ikfoLS0SdA6dna@monkey>
On 10/25/22 20:07, Mike Kravetz wrote:
> On 10/25/22 16:37, Rik van Riel wrote:
> > Hi Mike,
> >
> > After getting promising results initially, we discovered there
> > is yet another bug left with hugetlbfs MADV_DONTNEED.
> >
> > This one involves a page fault on a hugetlbfs address, while
> > another thread in the same process is in the middle of MADV_DONTNEED
> > on that same memory address.
> >
> > The code in __unmap_hugepage_range() will clear the page table
> > entry, and then at some point later the lazy TLB code will
> > actually free the huge page back into the hugetlbfs free page
> > pool.
> >
> > Meanwhile, hugetlb_no_page will call alloc_huge_page, and that
> > will fail because the code calling __unmap_hugepage_range() has
> > not actually returned the page to the free list yet.
> >
> > The result is that the process gets killed with SIGBUS.
>
> Thanks for the details Rik. That makes perfect sense.
>
> > I have thought of a few different solutions to this problem, but
> > none of them look good:
> > - Make MADV_DONTNEED take a write lock on mmap_sem, to exclude
> > page faults. This could make MADV_DONTNEED on VMAs with 4kB
> > pages unacceptably slow.
> > - Some sort of atomic counter kept by __unmap_hugepage_range()
> > that huge pages may be getting placed in the tlb gather, and
> > freed later by tlb_finish_mmu(). This would involve changes
> > to the MMU gather code, outside of hugetlbfs.
> > - Some sort of generation counter that tracks tlb_gather_mmu
> > cycles in progress, with the alloc_huge_page failure path
> > waiting until all mmu gather operations that started before
> > it to finish, before retrying the allocation. This requires
> > changes to the generic code, outside of hugetlbfs.
> >
> > What are the reasonable alternatives here?
> >
> > Should we see if anybody can come up with a simple solution
> > to the problem, or would it be better to just disable
> > MADV_DONTNEED on hugetlbfs for now?
>
> I am thinking that we could use the vma_lock to prevent faults on the
> vma until the MADV_DONTNEED processing is totally complete. IIUC, it is
> the tlb_finish_mmu call that actually drops the ref count on the pages
> and returns them to the free list. Currently, we hold the vma_lock in
> write mode during unmap, and acquire it in read mode during faults.
> However, we drop it in the MADV_DONTNEED path (just a bit) before calling
> tlb_finish_mmu. So, holding a bit longer may address this issue rather
> simply.
Sorry, that is wrong! The vma_lock only applies to sharable mappings
and this is certainly about non-sharable mappings. Please disregard.
I will continue to think about this and try to come up with a solution.
If not, we may need to disable hugetlb MADV_DONTNEED.
I really don't want to modify too much (if any) non-hugtlb code to
accommodate this.
--
Mike Kravetz
next prev parent reply other threads:[~2022-10-26 4:13 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-25 20:37 [BUG] hugetlbfs_no_page vs MADV_DONTNEED race leading to SIGBUS Rik van Riel
2022-10-26 3:07 ` Mike Kravetz
2022-10-26 4:13 ` Mike Kravetz [this message]
2022-10-26 19:38 ` Mike Kravetz
2022-10-27 15:07 ` Johannes Weiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y1iz2dgCEvbpSPR1@monkey \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=clm@meta.com \
--cc=david@redhat.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=riel@surriel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.