linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Peter Xu <peterx@redhat.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH] mm/userfaultfd: prevent busy looping for tasks with signals pending
Date: Thu, 24 Apr 2025 08:54:40 -0600	[thread overview]
Message-ID: <c68882dd-067b-4d16-8fb8-28bfdd51e627@kernel.dk> (raw)
In-Reply-To: <20250424140344.GA840@cmpxchg.org>

On 4/24/25 8:03 AM, Johannes Weiner wrote:
> On Wed, Apr 23, 2025 at 05:37:06PM -0600, Jens Axboe wrote:
>> userfaultfd may use interruptible sleeps to wait on userspace filling
>> a page fault, which works fine if the task can be reliably put to
>> sleeping waiting for that. However, if the task has a normal (ie
>> non-fatal) signal pending, then TASK_INTERRUPTIBLE sleep will simply
>> cause schedule() to be a no-op.
>>
>> For a task that registers a page with userfaultfd and then proceeds
>> to do a write from it, if that task also has a signal pending then
>> it'll essentially busy loop from do_page_fault() -> handle_userfault()
>> until that fault has been filled. Normally it'd be expected that the
>> task would sleep until that happens. Here's a trace from an application
>> doing just that:
>>
>> handle_userfault+0x4b8/0xa00 (P)
>> hugetlb_fault+0xe24/0x1060
>> handle_mm_fault+0x2bc/0x318
>> do_page_fault+0x1e8/0x6f0
> 
> Makes sense. There is a fault_signal_pending() check before retrying:
> 
> static inline bool fault_signal_pending(vm_fault_t fault_flags,
>                                         struct pt_regs *regs)
> {
>         return unlikely((fault_flags & VM_FAULT_RETRY) &&
>                         (fatal_signal_pending(current) ||
>                          (user_mode(regs) && signal_pending(current))));
> }
> 
> Since it's an in-kernel fault, and the signal is non-fatal, it won't
> stop looping until the fault is handled.
> 
> This in itself seems a bit sketchy. You have to hope there is no
> dependency between handling the signal -> handling the fault inside
> the userspace components.

Indeed... But that's generic userfaultfd sketchiness, not really related
to this patch.

> 
>> do_translation_fault+0x9c/0xd0
>> do_mem_abort+0x44/0xa0
>> el1_abort+0x3c/0x68
>> el1h_64_sync_handler+0xd4/0x100
>> el1h_64_sync+0x6c/0x70
>> fault_in_readable+0x74/0x108 (P)
>> iomap_file_buffered_write+0x14c/0x438
>> blkdev_write_iter+0x1a8/0x340
>> vfs_write+0x20c/0x348
>> ksys_write+0x64/0x108
>> __arm64_sys_write+0x1c/0x38
>>
>> where the task is looping with 100% CPU time in the above mentioned
>> fault path.
>>
>> Since it's impossible to handle signals, or other conditions like
>> TIF_NOTIFY_SIGNAL that also prevents interruptible sleeping, from the
>> fault path, use TASK_UNINTERRUPTIBLE with a short timeout even for vmf
>> modes that would normally ask for INTERRUPTIBLE or KILLABLE sleep. Fatal
>> signals will still be handled by the caller, and the timeout is short
>> enough to hopefully not cause any issues. If this is the first invocation
>> of this fault, eg FAULT_FLAG_TRIED isn't set, then the normal sleep mode
>> is used.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
> 
> When this patch was first introduced, VM_FAULT_RETRY would work only
> once. The second try would have FAULT_FLAG_ALLOW_RETRY cleared,
> causing handle_userfault() to return VM_SIGBUS, which would bubble
> through the fixup table (kernel fault), -EFAULT from
> iomap_file_buffered_write() and unwind the kernel stack this way.
> 
> So I'm thinking this is the more likely commit for Fixes: and stable:
> 
> commit 4064b982706375025628094e51d11cf1a958a5d3
> Author: Peter Xu <peterx@redhat.com>
> Date:   Wed Apr 1 21:08:45 2020 -0700
> 
>     mm: allow VM_FAULT_RETRY for multiple times

Thanks for checking that - yep that sounds fine to me, we can adjust the
fixes tag appropriately.

>> Reported-by: Zhiwei Jiang <qq282012236@gmail.com>
>> Link: https://lore.kernel.org/io-uring/20250422162913.1242057-1-qq282012236@gmail.com/
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Thanks!

-- 
Jens Axboe

  reply	other threads:[~2025-04-24 14:54 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-23 23:37 [PATCH] mm/userfaultfd: prevent busy looping for tasks with signals pending Jens Axboe
2025-04-24 14:03 ` Johannes Weiner
2025-04-24 14:54   ` Jens Axboe [this message]
2025-04-24 15:11     ` Johannes Weiner
2025-04-24 15:22       ` Jens Axboe
2025-04-24 18:26   ` Peter Xu
2025-04-24 18:40     ` Jens Axboe
2025-04-24 19:13       ` Peter Xu
2025-04-24 19:20         ` Jens Axboe
2025-04-24 19:57           ` Peter Xu
2025-05-01 16:18             ` Peter Xu
2025-05-01 16:28               ` Peter Xu
2025-04-24 19:42     ` Matthew Wilcox
2025-04-24 21:45       ` Peter Xu
2025-04-25  4:52         ` Matthew Wilcox
2025-04-25 15:44           ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c68882dd-067b-4d16-8fb8-28bfdd51e627@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).