From: Florian Weimer <fweimer@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "André Almeida" <andrealmeid@igalia.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Darren Hart" <dvhart@infradead.org>,
"Davidlohr Bueso" <dave@stgolabs.net>,
"Arnd Bergmann" <arnd@arndb.de>,
sonicadvance1@gmail.com, linux-kernel@vger.kernel.org,
kernel-dev@igalia.com, linux-api@vger.kernel.org,
"Nathan Chancellor" <nathan@kernel.org>
Subject: Re: [PATCH v2 0/3] futex: Create set_robust_list2
Date: Mon, 04 Nov 2024 13:36:43 +0100 [thread overview]
Message-ID: <87jzdjxjj8.fsf@oldenburg3.str.redhat.com> (raw)
In-Reply-To: <20241104113240.GB24862@noisy.programming.kicks-ass.net> (Peter Zijlstra's message of "Mon, 4 Nov 2024 12:32:40 +0100")
* Peter Zijlstra:
> On Sat, Nov 02, 2024 at 10:58:42PM +0100, Florian Weimer wrote:
>
>> QEMU hints towards further problems (in linux-user/syscall.c):
>>
>> case TARGET_NR_set_robust_list:
>> case TARGET_NR_get_robust_list:
>> /* The ABI for supporting robust futexes has userspace pass
>> * the kernel a pointer to a linked list which is updated by
>> * userspace after the syscall; the list is walked by the kernel
>> * when the thread exits. Since the linked list in QEMU guest
>> * memory isn't a valid linked list for the host and we have
>> * no way to reliably intercept the thread-death event, we can't
>> * support these. Silently return ENOSYS so that guest userspace
>> * falls back to a non-robust futex implementation (which should
>> * be OK except in the corner case of the guest crashing while
>> * holding a mutex that is shared with another process via
>> * shared memory).
>> */
>> return -TARGET_ENOSYS;
>
> I don't think we can sanely fix that. Can't QEMU track the robust thing
> itself and use waitpid() to discover the thread is gone and fudge things
> from there?
There are race conditions with munmap, I think, and they probably get a
lot of worse if QEMU does that.
See Rich Felker's bug report:
| The corruption is performed by the kernel when it walks the robust
| list. The basic situation is the same as in PR #13690, except that
| here there's actually a potential write to the memory rather than just
| a read.
|
| The sequence of events leading to corruption goes like this:
|
| 1. Thread A unlocks the process-shared, robust mutex and is preempted
| after the mutex is removed from the robust list and atomically
| unlocked, but before it's removed from the list_op_pending field of
| the robust list header.
|
| 2. Thread B locks the mutex, and, knowing by program logic that it's
| the last user of the mutex, unlocks and unmaps it, allocates/maps
| something else that gets assigned the same address as the shared mutex
| mapping, and then exits.
|
| 3. The kernel destroys the process, which involves walking each
| thread's robust list and processing each thread's list_op_pending
| field of the robust list header. Since thread A has a list_op_pending
| pointing at the address previously occupied by the mutex, the kernel
| obliviously "unlocks the mutex" by writing a 0 to the address and
| futex-waking it. However, the kernel has instead overwritten part of
| whatever mapping thread A created. If this is private memory it
| (probably) doesn't matter since the process is ending anyway (but are
| there race conditions where this can be seen?). If this is shared
| memory or a shared file mapping, however, the kernel corrupts it.
|
| I suspect the race is difficult to hit since thread A has to get
| preempted at exactly the wrong time AND thread B has to do a fair
| amount of work without thread A getting scheduled again. So I'm not
| sure how much luck we'd have getting a test case.
<https://sourceware.org/bugzilla/show_bug.cgi?id=14485#c3>
We also have a silent unlocking failure because userspace does not know
about ROBUST_LIST_LIMIT:
Bug 19089 - Robust mutexes do not take ROBUST_LIST_LIMIT into account
<https://sourceware.org/bugzilla/show_bug.cgi?id=19089>
(I think we may have discussed this one before, and you may have
suggested to just hard-code 2048 in userspace because the constant is
not expected to change.)
So the in-mutex linked list has quite a few problems even outside of
emulation. 8-(
Thanks,
Florian
next prev parent reply other threads:[~2024-11-04 12:37 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-01 16:21 [PATCH v2 0/3] futex: Create set_robust_list2 André Almeida
2024-11-01 16:21 ` [PATCH v2 1/3] futex: Use explicit sizes for compat_exit_robust_list André Almeida
2024-11-02 5:44 ` kernel test robot
2024-11-02 14:57 ` kernel test robot
2024-11-01 16:21 ` [PATCH v2 2/3] futex: Create set_robust_list2 André Almeida
2024-11-02 16:40 ` kernel test robot
2024-11-04 11:22 ` Peter Zijlstra
2024-11-04 21:55 ` André Almeida
2024-11-05 12:10 ` Peter Zijlstra
2024-11-01 16:21 ` [PATCH v2 3/3] futex: Wire up set_robust_list2 syscall André Almeida
2024-11-02 5:13 ` kernel test robot
2024-11-02 6:05 ` kernel test robot
2024-11-02 21:58 ` [PATCH v2 0/3] futex: Create set_robust_list2 Florian Weimer
2024-11-04 11:32 ` Peter Zijlstra
2024-11-04 11:56 ` Peter Zijlstra
2024-11-04 12:36 ` Florian Weimer [this message]
2024-11-05 12:18 ` Peter Zijlstra
2024-11-04 21:49 ` André Almeida
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87jzdjxjj8.fsf@oldenburg3.str.redhat.com \
--to=fweimer@redhat.com \
--cc=andrealmeid@igalia.com \
--cc=arnd@arndb.de \
--cc=dave@stgolabs.net \
--cc=dvhart@infradead.org \
--cc=kernel-dev@igalia.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=nathan@kernel.org \
--cc=peterz@infradead.org \
--cc=sonicadvance1@gmail.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox