From: Catalin Marinas <catalin.marinas@arm.com>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
Will Deacon <will@kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
Date: Mon, 29 Nov 2021 15:36:41 +0000 [thread overview]
Message-ID: <YaTziROgnFwB6Ddj@arm.com> (raw)
In-Reply-To: <CAHc6FU6zVi9A2D3V3T5zE71YAdkBiJTs0ao1Q6ysSuEp=bz8fQ@mail.gmail.com>
On Mon, Nov 29, 2021 at 02:33:42PM +0100, Andreas Gruenbacher wrote:
> On Mon, Nov 29, 2021 at 1:22 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Sat, Nov 27, 2021 at 07:05:39PM +0100, Andreas Gruenbacher wrote:
> > > We also still have fault_in_safe_writeable which is more difficult to
> > > fix, and fault_in_readable which we don't want to leave behind broken,
> > > either.
> >
> > fault_in_safe_writeable() can be done by using get_user() instead of
> > put_user() for arm64 MTE and probably SPARC ADI (an alternative is to
> > read the in-memory tags and compare them with the pointer).
>
> So we'd keep the existing fault_in_safe_writeable() logic for the
> actual fault-in and use get_user() to check for sub-page faults? If
> so, then that should probably also be hidden in arch code.
That's what this series does when it probes the whole range in
fault_in_writeable(). The main reason was that it's more efficient to do
a read than a write on a large range (the latter dirtying the cache
lines).
> > For CHERI, that's different again since the fault_in_safe_writeable capability
> > encodes the read/write permissions independently.
> >
> > However, do we actually want to change the fault_in_safe_writeable() and
> > fault_in_readable() functions at this stage? I could not get any of them
> > to live-lock, though I only tried btrfs, ext4 and gfs2. As per the
> > earlier discussion, normal files accesses are guaranteed to make
> > progress. The only problematic one was O_DIRECT which seems to be
> > alright for the above filesystems (the fs either bails out after several
> > attempts or uses GUP to read which skips the uaccess altogether).
>
> Only gfs2 uses fault_in_safe_writeable(). For buffered reads, progress
> is guaranteed because failures are at a byte granularity.
>
> O_DIRECT reads and writes happen in device block size granularity, but
> the pages are grabbed with get_user_pages() before the copying
> happens. So by the time the copying happens, the pages are guaranteed
> to be resident, and we don't need to loop around fault_in_*().
For file reads, I couldn't triggered any mismatched tag faults with gfs2
and O_DIRECT, so it matches your description above. For file writes it
does trigger such faults, so I suspect it doesn't always use
get_user_pages() for writes. No live-lock though with the vanilla
kernel. My test uses a page with some mismatched tags in the middle.
ext4: no tag faults with O_DIRECT read/write irrespective of whether the
user buffer is page aligned or not.
btrfs: O_DIRECT file writes - no faults on page-aligned buffers, faults
on unaligned; file reads - tag faults on both aligned/unaligned buffers.
No live-lock.
So, some tag faults still happen even with O_DIRECT|O_SYNC but the
filesystems too care of continuous faulting.
> You've mentioned before that copying to/from struct page bypasses
> sub-page fault checking. If that is the case, then the checking
> probably needs to happen in iomap_dio_bio_iter and dio_refill_pages
> instead.
It's too expensive and not really worth it. With a buffered access, the
uaccess takes care of checking at the time of load/store (the hardware
does this for us). With a GUP, the access is done via the kernel mapping
with a match-all tag to avoid faults (kernel panic). We set the ABI
expectation some time ago that kernel accesses to user memory may not
always be tag-checked if the access is done via a GUP'ed page.
> > Happy to address them if there is a real concern, I just couldn't trigger it.
>
> Hopefully it should now be clear why you couldn't. One way of
> reproducing with fault_in_safe_writeable() would be to use that in
> btrfs instead of fault_in_writeable(), of course.
Yes, that would trigger it again. I guess if we want to make this API
safer in general, we can add the checks to the other functions. Only
probing a few bytes at the start shouldn't cause a performance issue.
Thanks.
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
Will Deacon <will@kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
Date: Mon, 29 Nov 2021 15:36:41 +0000 [thread overview]
Message-ID: <YaTziROgnFwB6Ddj@arm.com> (raw)
In-Reply-To: <CAHc6FU6zVi9A2D3V3T5zE71YAdkBiJTs0ao1Q6ysSuEp=bz8fQ@mail.gmail.com>
On Mon, Nov 29, 2021 at 02:33:42PM +0100, Andreas Gruenbacher wrote:
> On Mon, Nov 29, 2021 at 1:22 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Sat, Nov 27, 2021 at 07:05:39PM +0100, Andreas Gruenbacher wrote:
> > > We also still have fault_in_safe_writeable which is more difficult to
> > > fix, and fault_in_readable which we don't want to leave behind broken,
> > > either.
> >
> > fault_in_safe_writeable() can be done by using get_user() instead of
> > put_user() for arm64 MTE and probably SPARC ADI (an alternative is to
> > read the in-memory tags and compare them with the pointer).
>
> So we'd keep the existing fault_in_safe_writeable() logic for the
> actual fault-in and use get_user() to check for sub-page faults? If
> so, then that should probably also be hidden in arch code.
That's what this series does when it probes the whole range in
fault_in_writeable(). The main reason was that it's more efficient to do
a read than a write on a large range (the latter dirtying the cache
lines).
> > For CHERI, that's different again since the fault_in_safe_writeable capability
> > encodes the read/write permissions independently.
> >
> > However, do we actually want to change the fault_in_safe_writeable() and
> > fault_in_readable() functions at this stage? I could not get any of them
> > to live-lock, though I only tried btrfs, ext4 and gfs2. As per the
> > earlier discussion, normal files accesses are guaranteed to make
> > progress. The only problematic one was O_DIRECT which seems to be
> > alright for the above filesystems (the fs either bails out after several
> > attempts or uses GUP to read which skips the uaccess altogether).
>
> Only gfs2 uses fault_in_safe_writeable(). For buffered reads, progress
> is guaranteed because failures are at a byte granularity.
>
> O_DIRECT reads and writes happen in device block size granularity, but
> the pages are grabbed with get_user_pages() before the copying
> happens. So by the time the copying happens, the pages are guaranteed
> to be resident, and we don't need to loop around fault_in_*().
For file reads, I couldn't triggered any mismatched tag faults with gfs2
and O_DIRECT, so it matches your description above. For file writes it
does trigger such faults, so I suspect it doesn't always use
get_user_pages() for writes. No live-lock though with the vanilla
kernel. My test uses a page with some mismatched tags in the middle.
ext4: no tag faults with O_DIRECT read/write irrespective of whether the
user buffer is page aligned or not.
btrfs: O_DIRECT file writes - no faults on page-aligned buffers, faults
on unaligned; file reads - tag faults on both aligned/unaligned buffers.
No live-lock.
So, some tag faults still happen even with O_DIRECT|O_SYNC but the
filesystems too care of continuous faulting.
> You've mentioned before that copying to/from struct page bypasses
> sub-page fault checking. If that is the case, then the checking
> probably needs to happen in iomap_dio_bio_iter and dio_refill_pages
> instead.
It's too expensive and not really worth it. With a buffered access, the
uaccess takes care of checking at the time of load/store (the hardware
does this for us). With a GUP, the access is done via the kernel mapping
with a match-all tag to avoid faults (kernel panic). We set the ABI
expectation some time ago that kernel accesses to user memory may not
always be tag-checked if the access is done via a GUP'ed page.
> > Happy to address them if there is a real concern, I just couldn't trigger it.
>
> Hopefully it should now be clear why you couldn't. One way of
> reproducing with fault_in_safe_writeable() would be to use that in
> btrfs instead of fault_in_writeable(), of course.
Yes, that would trigger it again. I guess if we want to make this API
safer in general, we can add the checks to the other functions. Only
probing a few bytes at the start shouldn't cause a performance issue.
Thanks.
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2021-11-29 15:38 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-24 19:20 [PATCH 0/3] Avoid live-lock in fault-in+uaccess loops with sub-page faults Catalin Marinas
2021-11-24 19:20 ` Catalin Marinas
2021-11-24 19:20 ` [PATCH 1/3] mm: Introduce fault_in_exact_writeable() to probe for " Catalin Marinas
2021-11-24 19:20 ` Catalin Marinas
2021-11-24 19:20 ` [PATCH 2/3] arm64: Add support for sub-page faults user probing Catalin Marinas
2021-11-24 19:20 ` Catalin Marinas
2021-11-24 19:20 ` [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults Catalin Marinas
2021-11-24 19:20 ` Catalin Marinas
2021-11-24 20:03 ` Matthew Wilcox
2021-11-24 20:03 ` Matthew Wilcox
2021-11-24 20:37 ` Catalin Marinas
2021-11-24 20:37 ` Catalin Marinas
2021-11-25 22:25 ` Andreas Gruenbacher
2021-11-25 22:25 ` Andreas Gruenbacher
2021-11-25 22:42 ` Catalin Marinas
2021-11-25 22:42 ` Catalin Marinas
2021-11-26 22:29 ` Andreas Gruenbacher
2021-11-26 22:29 ` Andreas Gruenbacher
2021-11-26 22:57 ` Catalin Marinas
2021-11-26 22:57 ` Catalin Marinas
2021-11-27 3:52 ` Andreas Gruenbacher
2021-11-27 3:52 ` Andreas Gruenbacher
2021-11-27 12:39 ` Andreas Gruenbacher
2021-11-27 12:39 ` Andreas Gruenbacher
2021-11-27 15:21 ` Catalin Marinas
2021-11-27 15:21 ` Catalin Marinas
2021-11-27 18:05 ` Andreas Gruenbacher
2021-11-27 18:05 ` Andreas Gruenbacher
2021-11-29 12:16 ` Catalin Marinas
2021-11-29 12:16 ` Catalin Marinas
2021-11-29 13:33 ` Andreas Gruenbacher
2021-11-29 13:33 ` Andreas Gruenbacher
2021-11-29 15:36 ` Catalin Marinas [this message]
2021-11-29 15:36 ` Catalin Marinas
2021-11-29 18:40 ` Linus Torvalds
2021-11-29 18:40 ` Linus Torvalds
2021-11-29 19:31 ` Andreas Gruenbacher
2021-11-29 19:31 ` Andreas Gruenbacher
2021-11-29 20:56 ` Catalin Marinas
2021-11-29 20:56 ` Catalin Marinas
2021-11-29 21:53 ` Linus Torvalds
2021-11-29 21:53 ` Linus Torvalds
2021-11-29 23:12 ` Catalin Marinas
2021-11-29 23:12 ` Catalin Marinas
2021-11-29 13:52 ` Catalin Marinas
2021-11-29 13:52 ` Catalin Marinas
2021-11-27 14:33 ` Catalin Marinas
2021-11-27 14:33 ` Catalin Marinas
2021-11-24 23:00 ` Linus Torvalds
2021-11-24 23:00 ` Linus Torvalds
2021-11-25 11:10 ` Catalin Marinas
2021-11-25 11:10 ` Catalin Marinas
2021-11-25 18:13 ` Linus Torvalds
2021-11-25 18:13 ` Linus Torvalds
2021-11-25 20:43 ` Catalin Marinas
2021-11-25 20:43 ` Catalin Marinas
2021-11-25 21:02 ` Matthew Wilcox
2021-11-25 21:02 ` Matthew Wilcox
2021-11-25 21:29 ` Catalin Marinas
2021-11-25 21:29 ` Catalin Marinas
2021-11-25 21:40 ` Andreas Gruenbacher
2021-11-25 21:40 ` Andreas Gruenbacher
2021-11-26 16:42 ` David Sterba
2021-11-26 16:42 ` David Sterba
2021-11-24 21:36 ` [PATCH 0/3] Avoid live-lock in fault-in+uaccess loops " Andrew Morton
2021-11-24 21:36 ` Andrew Morton
2021-11-24 22:31 ` Catalin Marinas
2021-11-24 22:31 ` Catalin Marinas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YaTziROgnFwB6Ddj@arm.com \
--to=catalin.marinas@arm.com \
--cc=agruenba@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=dsterba@suse.com \
--cc=josef@toxicpanda.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=will@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.