From: Mike Kravetz <mike.kravetz@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yang Shi <shy828301@gmail.com>,
Matthew Wilcox <willy@infradead.org>,
syzbot <syzbot+152d76c44ba142f8992b@syzkaller.appspotmail.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, llvm@lists.linux.dev, nathan@kernel.org,
ndesaulniers@google.com, songmuchun@bytedance.com,
syzkaller-bugs@googlegroups.com, trix@redhat.com
Subject: Re: [syzbot] general protection fault in PageHeadHuge
Date: Thu, 29 Sep 2022 16:33:53 -0700 [thread overview]
Message-ID: <YzYrYVeA0b9d5dos@monkey> (raw)
In-Reply-To: <YzMomT+OusJnLOPC@x1n>
On 09/27/22 12:45, Peter Xu wrote:
> On Tue, Sep 27, 2022 at 09:24:37AM -0700, Mike Kravetz wrote:
> > This should guarantee a read fault independent of what pthread_mutex_lock
> > does. However, it still results in the occasional "ERROR: unexpected write
> > fault". So, something else if happening. I will continue to experiment
> > and think about this.
>
> Thanks for verifying this, Mike. I didn't yet reply but I did have some
> update on my side too. I plan to look deeper and wanted to reply until
> that, because I do think there's something special on hugetlb and I still
> don't know. I just keep getting distracted by other things.. but maybe I
> should still share out what I have already.
>
> I think I already know what had guaranteed the read faults - the NPTL
> pthread lib implemented mutex in different types, and the 1st instruction
> of lock() is to fetch the mutex type (at offset 0x10) then we know how we
> should move on:
>
> (gdb) disas pthread_mutex_lock
> Dump of assembler code for function ___pthread_mutex_lock:
> 0x00007ffff7e3b0d0 <+0>: endbr64
> 0x00007ffff7e3b0d4 <+4>: mov 0x10(%rdi),%eax <---- read 0x10 of &mutex
> 0x00007ffff7e3b0d7 <+7>: mov %eax,%edx
> 0x00007ffff7e3b0d9 <+9>: and $0x17f,%edx
> (gdb) ptype pthread_mutex_t
> type = union {
> struct __pthread_mutex_s __data;
> char __size[40];
> long __align;
> }
> (gdb) ptype struct __pthread_mutex_s
> type = struct __pthread_mutex_s {
> int __lock;
> unsigned int __count;
> int __owner;
> unsigned int __nusers;
> int __kind; <--- 0x10 offset here
> short __spins;
> short __elision;
> __pthread_list_t __list;
> }
>
> I looked directly at asm dumped from the binary I tested to make sure it's
> accurate. So it means with current uffd selftest logically there should
> never be a write missing fault (I still don't think it's good to have the
> write check though.. but that's separate question to ask).
>
> It also means hugetlb does something special here. It smells really like
> for some reason the hugetlb pgtable got evicted after UFFDIO_COPY during
> locking_thread running, then any further lock() (e.g. cmpxchg) or modifying
> the counter could trigger a write fault.
>
> OTOH this also explained why futex code was never tested on userfaultfd
> selftest, simply because futex() will always to be after that "read the
> type of mutex" thing.. which is something I want to rework a bit, so as to
> have uffd selftest to cover gup as you used to rightfully suggest. But
> that'll be nothing urgent, and be something after we know what's special
> with hugetlb new code.
>
I was able to do a little more debugging:
As you know the hugetlb calling path to handle_userfault is something
like this,
hugetlb_fault
mutex_lock(&hugetlb_fault_mutex_table[hash]);
ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
if (huge_pte_none_mostly())
hugetlb_no_page()
page = find_lock_page(mapping, idx);
if (!page) {
if (userfaultfd_missing(vma))
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
return handle_userfault()
}
For anon mappings, find_lock_page() will never find a page, so as long
as huge_pte_none_mostly() is true we will call into handle_userfault().
Since your analysis shows the testcase should never call handle_userfault() for
a write fault, I simply added a 'if (flags & FAULT_FLAG_WRITE) printk' before
the call to handle_userfault(). Sure enough, I saw plenty of printk messages.
Then, before calling handle_userfault() I added code to take the page table
lock and test huge_pte_none_mostly() again. In every FAULT_FLAG_WRITE case,
this second test of huge_pte_none_mostly() was false. So, the condition
changed from the check in hugetlb_fault until the check (with page table
lock) in hugetlb_no_page.
IIUC, the only code that should be modifying the pte in this test is
hugetlb_mcopy_atomic_pte(). It also holds the hugetlb_fault_mutex while
updating the pte.
It 'appears' that hugetlb_fault is not seeing the updated pte and I can
only guess that it is due to some caching issues.
After writing the pte in hugetlb_mcopy_atomic_pte() there is this comment.
/* No need to invalidate - it was non-present before */
update_mmu_cache(dst_vma, dst_addr, dst_pte);
I suspect that is true. However, it seems like this test depends on all
CPUs seeing the updated pte immediately?
I added some TLB flushing to hugetlb_mcopy_atomic_pte, but it did not make
any difference. Suggestions would be appreciated as cache/tlb/??? flushing
issues take me a while to figure out.
--
Mike Kravetz
next prev parent reply other threads:[~2022-09-29 23:34 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-23 16:05 [syzbot] general protection fault in PageHeadHuge syzbot
2022-09-23 21:11 ` Mike Kravetz
2022-09-23 21:37 ` Yang Shi
2022-09-24 0:14 ` Hugh Dickins
2022-09-24 0:18 ` Nathan Chancellor
2022-09-24 7:28 ` Hugh Dickins
2022-09-24 0:58 ` Peter Xu
2022-09-24 15:06 ` Peter Xu
2022-09-24 19:01 ` Mike Kravetz
2022-09-26 0:11 ` Peter Xu
2022-09-27 16:24 ` Mike Kravetz
2022-09-27 16:45 ` Peter Xu
2022-09-29 23:33 ` Mike Kravetz [this message]
2022-09-30 1:29 ` Peter Xu
2022-09-30 1:35 ` Peter Xu
2022-09-30 16:05 ` Peter Xu
2022-09-30 21:38 ` Mike Kravetz
2022-10-01 2:47 ` Peter Xu
2022-10-01 3:01 ` Peter Xu
2022-10-03 1:16 ` Mike Kravetz
2022-10-03 15:24 ` Peter Xu
2022-09-25 18:55 ` Andrew Morton
2022-09-24 21:56 ` Matthew Wilcox
2022-09-27 4:32 ` Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YzYrYVeA0b9d5dos@monkey \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=llvm@lists.linux.dev \
--cc=nathan@kernel.org \
--cc=ndesaulniers@google.com \
--cc=peterx@redhat.com \
--cc=shy828301@gmail.com \
--cc=songmuchun@bytedance.com \
--cc=syzbot+152d76c44ba142f8992b@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
--cc=trix@redhat.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.