linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Will Deacon <will@kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
Date: Mon, 29 Nov 2021 15:36:41 +0000	[thread overview]
Message-ID: <YaTziROgnFwB6Ddj@arm.com> (raw)
In-Reply-To: <CAHc6FU6zVi9A2D3V3T5zE71YAdkBiJTs0ao1Q6ysSuEp=bz8fQ@mail.gmail.com>

On Mon, Nov 29, 2021 at 02:33:42PM +0100, Andreas Gruenbacher wrote:
> On Mon, Nov 29, 2021 at 1:22 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Sat, Nov 27, 2021 at 07:05:39PM +0100, Andreas Gruenbacher wrote:
> > > We also still have fault_in_safe_writeable which is more difficult to
> > > fix, and fault_in_readable which we don't want to leave behind broken,
> > > either.
> >
> > fault_in_safe_writeable() can be done by using get_user() instead of
> > put_user() for arm64 MTE and probably SPARC ADI (an alternative is to
> > read the in-memory tags and compare them with the pointer).
> 
> So we'd keep the existing fault_in_safe_writeable() logic for the
> actual fault-in and use get_user() to check for sub-page faults? If
> so, then that should probably also be hidden in arch code.

That's what this series does when it probes the whole range in
fault_in_writeable(). The main reason was that it's more efficient to do
a read than a write on a large range (the latter dirtying the cache
lines).

> > For CHERI, that's different again since the fault_in_safe_writeable capability
> > encodes the read/write permissions independently.
> >
> > However, do we actually want to change the fault_in_safe_writeable() and
> > fault_in_readable() functions at this stage? I could not get any of them
> > to live-lock, though I only tried btrfs, ext4 and gfs2. As per the
> > earlier discussion, normal files accesses are guaranteed to make
> > progress. The only problematic one was O_DIRECT which seems to be
> > alright for the above filesystems (the fs either bails out after several
> > attempts or uses GUP to read which skips the uaccess altogether).
> 
> Only gfs2 uses fault_in_safe_writeable(). For buffered reads, progress
> is guaranteed because failures are at a byte granularity.
> 
> O_DIRECT reads and writes happen in device block size granularity, but
> the pages are grabbed with get_user_pages() before the copying
> happens. So by the time the copying happens, the pages are guaranteed
> to be resident, and we don't need to loop around fault_in_*().

For file reads, I couldn't triggered any mismatched tag faults with gfs2
and O_DIRECT, so it matches your description above. For file writes it
does trigger such faults, so I suspect it doesn't always use
get_user_pages() for writes. No live-lock though with the vanilla
kernel. My test uses a page with some mismatched tags in the middle.

ext4: no tag faults with O_DIRECT read/write irrespective of whether the
user buffer is page aligned or not.

btrfs: O_DIRECT file writes - no faults on page-aligned buffers, faults
on unaligned; file reads - tag faults on both aligned/unaligned buffers.
No live-lock.

So, some tag faults still happen even with O_DIRECT|O_SYNC but the
filesystems too care of continuous faulting.

> You've mentioned before that copying to/from struct page bypasses
> sub-page fault checking. If that is the case, then the checking
> probably needs to happen in iomap_dio_bio_iter and dio_refill_pages
> instead.

It's too expensive and not really worth it. With a buffered access, the
uaccess takes care of checking at the time of load/store (the hardware
does this for us). With a GUP, the access is done via the kernel mapping
with a match-all tag to avoid faults (kernel panic). We set the ABI
expectation some time ago that kernel accesses to user memory may not
always be tag-checked if the access is done via a GUP'ed page.

> > Happy to address them if there is a real concern, I just couldn't trigger it.
> 
> Hopefully it should now be clear why you couldn't. One way of
> reproducing with fault_in_safe_writeable() would be to use that in
> btrfs instead of fault_in_writeable(), of course.

Yes, that would trigger it again. I guess if we want to make this API
safer in general, we can add the checks to the other functions. Only
probing a few bytes at the start shouldn't cause a performance issue.

Thanks.

-- 
Catalin

  reply	other threads:[~2021-11-29 15:38 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-24 19:20 [PATCH 0/3] Avoid live-lock in fault-in+uaccess loops with sub-page faults Catalin Marinas
2021-11-24 19:20 ` [PATCH 1/3] mm: Introduce fault_in_exact_writeable() to probe for " Catalin Marinas
2021-11-24 19:20 ` [PATCH 2/3] arm64: Add support for sub-page faults user probing Catalin Marinas
2021-11-24 19:20 ` [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults Catalin Marinas
2021-11-24 20:03   ` Matthew Wilcox
2021-11-24 20:37     ` Catalin Marinas
2021-11-25 22:25       ` Andreas Gruenbacher
2021-11-25 22:42         ` Catalin Marinas
2021-11-26 22:29         ` Andreas Gruenbacher
2021-11-26 22:57           ` Catalin Marinas
2021-11-27  3:52             ` Andreas Gruenbacher
2021-11-27 14:33               ` Catalin Marinas
2021-11-27 12:39         ` Andreas Gruenbacher
2021-11-27 15:21           ` Catalin Marinas
2021-11-27 18:05             ` Andreas Gruenbacher
2021-11-29 12:16               ` Catalin Marinas
2021-11-29 13:33                 ` Andreas Gruenbacher
2021-11-29 15:36                   ` Catalin Marinas [this message]
2021-11-29 18:40                     ` Linus Torvalds
2021-11-29 19:31                       ` Andreas Gruenbacher
2021-11-29 20:56                       ` Catalin Marinas
2021-11-29 21:53                         ` Linus Torvalds
2021-11-29 23:12                           ` Catalin Marinas
2021-11-29 13:52               ` Catalin Marinas
2021-11-24 23:00     ` Linus Torvalds
2021-11-25 11:10       ` Catalin Marinas
2021-11-25 18:13         ` Linus Torvalds
2021-11-25 20:43           ` Catalin Marinas
2021-11-25 21:02             ` Matthew Wilcox
2021-11-25 21:29               ` Catalin Marinas
2021-11-25 21:40               ` Andreas Gruenbacher
2021-11-26 16:42   ` David Sterba
2021-11-24 21:36 ` [PATCH 0/3] Avoid live-lock in fault-in+uaccess loops " Andrew Morton
2021-11-24 22:31   ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YaTziROgnFwB6Ddj@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=agruenba@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dsterba@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).