All of lore.kernel.org
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Will Deacon <will@kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
Date: Mon, 29 Nov 2021 20:56:08 +0000	[thread overview]
Message-ID: <YaU+aDG5pCAba57r@arm.com> (raw)
In-Reply-To: <CAHk-=wiZgAgcynfLsop+D1xBUAZ-Z+NUBxe9mb-AedecFRNm+w@mail.gmail.com>

On Mon, Nov 29, 2021 at 10:40:38AM -0800, Linus Torvalds wrote:
> On Mon, Nov 29, 2021 at 7:36 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > That's what this series does when it probes the whole range in
> > fault_in_writeable(). The main reason was that it's more efficient to do
> > a read than a write on a large range (the latter dirtying the cache
> > lines).
> 
> The more this thread goes on, the more I'm starting to think that we
> should just make "fault_in_writable()" (and readable, of course) only
> really work on the beginning of the area.
> 
> Not just for the finer-granularity pointer color probing, but for the
> page probing too.

I have patches for the finer-granularity checking of the beginning of
the buffer. They need a bit of testing, so probably posting them
tomorrow.

> I'm looking at our current fault_in_writeable(), and I'm going
> 
>  (a) it uses __put_user() without range checks, which is really not great

For arm64 at least __put_user() does the access_ok() check. I thought
only unsafe_put_user() should skip the checks. If __put_user() can write
arbitrary memory, we may have a bigger problem.

>  (b) it looks like a disaster from another standpoint: essentially
> user-controlled loop size with no limit checking, no preemption, and
> no check for fatal signals.

Indeed, the fault_in_*() loop can get pretty long, bounded by how much
memory can be faulted in the user process. My patches for now only
address the outer loop doing the copy_to_user() as that can be
unbounded.

> Now, (a) should be fixed with a access_ok() or similar.
> 
> And (b) can easily be fixed multiple ways, with one option simply just
> being adding a can_resched() call and checking for fatal signals.
> 
> But faulting in the whole region is actually fundamentally wrong in
> low-memory situations - the beginning of the region might be swapped
> out by the time we get to the end. That's unlikely to be a problem in
> real life, but it's an example of how it's simply not conceptually
> sensible.
> 
> So I do wonder why we don't just say "fault_in_writable will fault in
> _at_most_ X bytes", and simply limit the actual fault-in size to
> something reasonable.
> 
> That solves _all_ the problems. It solves the lack of preemption and
> fatal signals (by virtue of just limiting the amount of work we do).
> It solves the low memory situation. And it solves the "excessive dirty
> cachelines" case too.

I think that would be useful, though it doesn't solve the potential
livelock with sub-page faults. We still need the outer loop to
handle the copy_to_user() for the whole user buffer and the sub-page
probing of the beginning of such buffer (or whenever copy_to_user()
failed). IOW, you still fault in the whole buffer eventually.

Anyway, I think the sub-page probing and limiting the fault-in are
complementary improvements.

-- 
Catalin

WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Will Deacon <will@kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
Date: Mon, 29 Nov 2021 20:56:08 +0000	[thread overview]
Message-ID: <YaU+aDG5pCAba57r@arm.com> (raw)
In-Reply-To: <CAHk-=wiZgAgcynfLsop+D1xBUAZ-Z+NUBxe9mb-AedecFRNm+w@mail.gmail.com>

On Mon, Nov 29, 2021 at 10:40:38AM -0800, Linus Torvalds wrote:
> On Mon, Nov 29, 2021 at 7:36 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > That's what this series does when it probes the whole range in
> > fault_in_writeable(). The main reason was that it's more efficient to do
> > a read than a write on a large range (the latter dirtying the cache
> > lines).
> 
> The more this thread goes on, the more I'm starting to think that we
> should just make "fault_in_writable()" (and readable, of course) only
> really work on the beginning of the area.
> 
> Not just for the finer-granularity pointer color probing, but for the
> page probing too.

I have patches for the finer-granularity checking of the beginning of
the buffer. They need a bit of testing, so probably posting them
tomorrow.

> I'm looking at our current fault_in_writeable(), and I'm going
> 
>  (a) it uses __put_user() without range checks, which is really not great

For arm64 at least __put_user() does the access_ok() check. I thought
only unsafe_put_user() should skip the checks. If __put_user() can write
arbitrary memory, we may have a bigger problem.

>  (b) it looks like a disaster from another standpoint: essentially
> user-controlled loop size with no limit checking, no preemption, and
> no check for fatal signals.

Indeed, the fault_in_*() loop can get pretty long, bounded by how much
memory can be faulted in the user process. My patches for now only
address the outer loop doing the copy_to_user() as that can be
unbounded.

> Now, (a) should be fixed with a access_ok() or similar.
> 
> And (b) can easily be fixed multiple ways, with one option simply just
> being adding a can_resched() call and checking for fatal signals.
> 
> But faulting in the whole region is actually fundamentally wrong in
> low-memory situations - the beginning of the region might be swapped
> out by the time we get to the end. That's unlikely to be a problem in
> real life, but it's an example of how it's simply not conceptually
> sensible.
> 
> So I do wonder why we don't just say "fault_in_writable will fault in
> _at_most_ X bytes", and simply limit the actual fault-in size to
> something reasonable.
> 
> That solves _all_ the problems. It solves the lack of preemption and
> fatal signals (by virtue of just limiting the amount of work we do).
> It solves the low memory situation. And it solves the "excessive dirty
> cachelines" case too.

I think that would be useful, though it doesn't solve the potential
livelock with sub-page faults. We still need the outer loop to
handle the copy_to_user() for the whole user buffer and the sub-page
probing of the beginning of such buffer (or whenever copy_to_user()
failed). IOW, you still fault in the whole buffer eventually.

Anyway, I think the sub-page probing and limiting the fault-in are
complementary improvements.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2021-11-29 23:00 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-24 19:20 [PATCH 0/3] Avoid live-lock in fault-in+uaccess loops with sub-page faults Catalin Marinas
2021-11-24 19:20 ` Catalin Marinas
2021-11-24 19:20 ` [PATCH 1/3] mm: Introduce fault_in_exact_writeable() to probe for " Catalin Marinas
2021-11-24 19:20   ` Catalin Marinas
2021-11-24 19:20 ` [PATCH 2/3] arm64: Add support for sub-page faults user probing Catalin Marinas
2021-11-24 19:20   ` Catalin Marinas
2021-11-24 19:20 ` [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults Catalin Marinas
2021-11-24 19:20   ` Catalin Marinas
2021-11-24 20:03   ` Matthew Wilcox
2021-11-24 20:03     ` Matthew Wilcox
2021-11-24 20:37     ` Catalin Marinas
2021-11-24 20:37       ` Catalin Marinas
2021-11-25 22:25       ` Andreas Gruenbacher
2021-11-25 22:25         ` Andreas Gruenbacher
2021-11-25 22:42         ` Catalin Marinas
2021-11-25 22:42           ` Catalin Marinas
2021-11-26 22:29           ` Andreas Gruenbacher
2021-11-26 22:29             ` Andreas Gruenbacher
2021-11-26 22:57             ` Catalin Marinas
2021-11-26 22:57               ` Catalin Marinas
2021-11-27  3:52               ` Andreas Gruenbacher
2021-11-27  3:52                 ` Andreas Gruenbacher
2021-11-27 12:39                 ` Andreas Gruenbacher
2021-11-27 12:39                   ` Andreas Gruenbacher
2021-11-27 15:21                   ` Catalin Marinas
2021-11-27 15:21                     ` Catalin Marinas
2021-11-27 18:05                     ` Andreas Gruenbacher
2021-11-27 18:05                       ` Andreas Gruenbacher
2021-11-29 12:16                       ` Catalin Marinas
2021-11-29 12:16                         ` Catalin Marinas
2021-11-29 13:33                         ` Andreas Gruenbacher
2021-11-29 13:33                           ` Andreas Gruenbacher
2021-11-29 15:36                           ` Catalin Marinas
2021-11-29 15:36                             ` Catalin Marinas
2021-11-29 18:40                             ` Linus Torvalds
2021-11-29 18:40                               ` Linus Torvalds
2021-11-29 19:31                               ` Andreas Gruenbacher
2021-11-29 19:31                                 ` Andreas Gruenbacher
2021-11-29 20:56                               ` Catalin Marinas [this message]
2021-11-29 20:56                                 ` Catalin Marinas
2021-11-29 21:53                                 ` Linus Torvalds
2021-11-29 21:53                                   ` Linus Torvalds
2021-11-29 23:12                                   ` Catalin Marinas
2021-11-29 23:12                                     ` Catalin Marinas
2021-11-29 13:52                       ` Catalin Marinas
2021-11-29 13:52                         ` Catalin Marinas
2021-11-27 14:33                 ` Catalin Marinas
2021-11-27 14:33                   ` Catalin Marinas
2021-11-24 23:00     ` Linus Torvalds
2021-11-24 23:00       ` Linus Torvalds
2021-11-25 11:10       ` Catalin Marinas
2021-11-25 11:10         ` Catalin Marinas
2021-11-25 18:13         ` Linus Torvalds
2021-11-25 18:13           ` Linus Torvalds
2021-11-25 20:43           ` Catalin Marinas
2021-11-25 20:43             ` Catalin Marinas
2021-11-25 21:02             ` Matthew Wilcox
2021-11-25 21:02               ` Matthew Wilcox
2021-11-25 21:29               ` Catalin Marinas
2021-11-25 21:29                 ` Catalin Marinas
2021-11-25 21:40               ` Andreas Gruenbacher
2021-11-25 21:40                 ` Andreas Gruenbacher
2021-11-26 16:42   ` David Sterba
2021-11-26 16:42     ` David Sterba
2021-11-24 21:36 ` [PATCH 0/3] Avoid live-lock in fault-in+uaccess loops " Andrew Morton
2021-11-24 21:36   ` Andrew Morton
2021-11-24 22:31   ` Catalin Marinas
2021-11-24 22:31     ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YaU+aDG5pCAba57r@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=agruenba@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dsterba@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.