Re: [PATCH] mm/userfaultfd: prevent busy looping for tasks with signals pending

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Johannes Weiner <hannes@cmpxchg.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Peter Xu <peterx@redhat.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH] mm/userfaultfd: prevent busy looping for tasks with signals pending
Date: Thu, 24 Apr 2025 10:03:44 -0400	[thread overview]
Message-ID: <20250424140344.GA840@cmpxchg.org> (raw)
In-Reply-To: <27c3a7f5-aad8-4f2a-a66e-ff5ae98f31eb@kernel.dk>

On Wed, Apr 23, 2025 at 05:37:06PM -0600, Jens Axboe wrote:
> userfaultfd may use interruptible sleeps to wait on userspace filling
> a page fault, which works fine if the task can be reliably put to
> sleeping waiting for that. However, if the task has a normal (ie
> non-fatal) signal pending, then TASK_INTERRUPTIBLE sleep will simply
> cause schedule() to be a no-op.
> 
> For a task that registers a page with userfaultfd and then proceeds
> to do a write from it, if that task also has a signal pending then
> it'll essentially busy loop from do_page_fault() -> handle_userfault()
> until that fault has been filled. Normally it'd be expected that the
> task would sleep until that happens. Here's a trace from an application
> doing just that:
> 
> handle_userfault+0x4b8/0xa00 (P)
> hugetlb_fault+0xe24/0x1060
> handle_mm_fault+0x2bc/0x318
> do_page_fault+0x1e8/0x6f0

Makes sense. There is a fault_signal_pending() check before retrying:

static inline bool fault_signal_pending(vm_fault_t fault_flags,
                                        struct pt_regs *regs)
{
        return unlikely((fault_flags & VM_FAULT_RETRY) &&
                        (fatal_signal_pending(current) ||
                         (user_mode(regs) && signal_pending(current))));
}

Since it's an in-kernel fault, and the signal is non-fatal, it won't
stop looping until the fault is handled.

This in itself seems a bit sketchy. You have to hope there is no
dependency between handling the signal -> handling the fault inside
the userspace components.

> do_translation_fault+0x9c/0xd0
> do_mem_abort+0x44/0xa0
> el1_abort+0x3c/0x68
> el1h_64_sync_handler+0xd4/0x100
> el1h_64_sync+0x6c/0x70
> fault_in_readable+0x74/0x108 (P)
> iomap_file_buffered_write+0x14c/0x438
> blkdev_write_iter+0x1a8/0x340
> vfs_write+0x20c/0x348
> ksys_write+0x64/0x108
> __arm64_sys_write+0x1c/0x38
>
> where the task is looping with 100% CPU time in the above mentioned
> fault path.
> 
> Since it's impossible to handle signals, or other conditions like
> TIF_NOTIFY_SIGNAL that also prevents interruptible sleeping, from the
> fault path, use TASK_UNINTERRUPTIBLE with a short timeout even for vmf
> modes that would normally ask for INTERRUPTIBLE or KILLABLE sleep. Fatal
> signals will still be handled by the caller, and the timeout is short
> enough to hopefully not cause any issues. If this is the first invocation
> of this fault, eg FAULT_FLAG_TRIED isn't set, then the normal sleep mode
> is used.
>
> Cc: stable@vger.kernel.org
> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")

When this patch was first introduced, VM_FAULT_RETRY would work only
once. The second try would have FAULT_FLAG_ALLOW_RETRY cleared,
causing handle_userfault() to return VM_SIGBUS, which would bubble
through the fixup table (kernel fault), -EFAULT from
iomap_file_buffered_write() and unwind the kernel stack this way.

So I'm thinking this is the more likely commit for Fixes: and stable:

commit 4064b982706375025628094e51d11cf1a958a5d3
Author: Peter Xu <peterx@redhat.com>
Date:   Wed Apr 1 21:08:45 2020 -0700

    mm: allow VM_FAULT_RETRY for multiple times

> Reported-by: Zhiwei Jiang <qq282012236@gmail.com>
> Link: https://lore.kernel.org/io-uring/20250422162913.1242057-1-qq282012236@gmail.com/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

next prev parent reply	other threads:[~2025-04-24 14:03 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-23 23:37 [PATCH] mm/userfaultfd: prevent busy looping for tasks with signals pending Jens Axboe
2025-04-24 14:03 ` Johannes Weiner [this message]
2025-04-24 14:54   ` Jens Axboe
2025-04-24 15:11     ` Johannes Weiner
2025-04-24 15:22       ` Jens Axboe
2025-04-24 18:26   ` Peter Xu
2025-04-24 18:40     ` Jens Axboe
2025-04-24 19:13       ` Peter Xu
2025-04-24 19:20         ` Jens Axboe
2025-04-24 19:57           ` Peter Xu
2025-05-01 16:18             ` Peter Xu
2025-05-01 16:28               ` Peter Xu
2025-04-24 19:42     ` Matthew Wilcox
2025-04-24 21:45       ` Peter Xu
2025-04-25  4:52         ` Matthew Wilcox
2025-04-25 15:44           ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250424140344.GA840@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.