* [PATCH] bcachefs: fix iov_iter count underflow on sub-block dio read
@ 2024-02-05 15:48 Brian Foster
2024-02-05 19:15 ` Kent Overstreet
0 siblings, 1 reply; 6+ messages in thread
From: Brian Foster @ 2024-02-05 15:48 UTC (permalink / raw)
To: linux-bcachefs
bch2_direct_IO_read() checks the request offset and size for sector
alignment and then falls through to a couple calculations to shrink
the size of the request based on the inode size. The problem is that
these checks round up to the fs block size, which runs the risk of
underflowing iter->count if the block size happens to be large
enough. This is triggered by fstest generic/361 with a 4k block
size, which subsequently leads to a crash.
After some discussion, the original purpose of the shorten logic in this
path is not totally clear. It appears to be intended as an optimization
of limited value, so simplify things and avoid the underflow problem by
removing it.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
Note that I left the ret variable name alone because it seemed to
bother me less after we realized it is actually the dio return
value.
Brian
fs/bcachefs/fs-io-direct.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/fs/bcachefs/fs-io-direct.c b/fs/bcachefs/fs-io-direct.c
index e3b219e19e10..53f6e8a939d5 100644
--- a/fs/bcachefs/fs-io-direct.c
+++ b/fs/bcachefs/fs-io-direct.c
@@ -72,7 +72,6 @@ static int bch2_direct_IO_read(struct kiocb *req, struct iov_iter *iter)
struct bio *bio;
loff_t offset = req->ki_pos;
bool sync = is_sync_kiocb(req);
- size_t shorten;
ssize_t ret;
bch2_inode_opts_get(&opts, c, &inode->ei_inode);
@@ -87,9 +86,6 @@ static int bch2_direct_IO_read(struct kiocb *req, struct iov_iter *iter)
if (!ret)
return ret;
- shorten = iov_iter_count(iter) - round_up(ret, block_bytes(c));
- iter->count -= shorten;
-
bio = bio_alloc_bioset(NULL,
bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS),
REQ_OP_READ,
@@ -158,8 +154,6 @@ static int bch2_direct_IO_read(struct kiocb *req, struct iov_iter *iter)
bch2_read(c, rbio_init(bio, opts), inode_inum(inode));
}
- iter->count += shorten;
-
if (sync) {
closure_sync(&dio->cl);
closure_debug_destroy(&dio->cl);
--
2.42.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] bcachefs: fix iov_iter count underflow on sub-block dio read
2024-02-05 15:48 [PATCH] bcachefs: fix iov_iter count underflow on sub-block dio read Brian Foster
@ 2024-02-05 19:15 ` Kent Overstreet
2024-02-05 19:38 ` Brian Foster
0 siblings, 1 reply; 6+ messages in thread
From: Kent Overstreet @ 2024-02-05 19:15 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-bcachefs
On Mon, Feb 05, 2024 at 10:48:14AM -0500, Brian Foster wrote:
> bch2_direct_IO_read() checks the request offset and size for sector
> alignment and then falls through to a couple calculations to shrink
> the size of the request based on the inode size. The problem is that
> these checks round up to the fs block size, which runs the risk of
> underflowing iter->count if the block size happens to be large
> enough. This is triggered by fstest generic/361 with a 4k block
> size, which subsequently leads to a crash.
>
> After some discussion, the original purpose of the shorten logic in this
> path is not totally clear. It appears to be intended as an optimization
> of limited value, so simplify things and avoid the underflow problem by
> removing it.
Thinking about this a bit more...
All that we're avoiding with the shorten is zeroing out the rest of the
read, which will happen in bch2_read_extent(); we won't be issuing any
IO, since that would require real data extents above i_size, which is of
course not allowed.
But zeroing out the rest of the read is actual work that's being
avoided, since if we return a short read (which is what we're doing
here) it's undefined what happens to the rest of the buffer.
So... maybe we should keep it, for the crazy but inevitable application
that uses a 1MB read buffer for everything, including O_DIRECT reads to
4k files?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bcachefs: fix iov_iter count underflow on sub-block dio read
2024-02-05 19:15 ` Kent Overstreet
@ 2024-02-05 19:38 ` Brian Foster
2024-02-09 2:39 ` Su Yue
0 siblings, 1 reply; 6+ messages in thread
From: Brian Foster @ 2024-02-05 19:38 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcachefs
On Mon, Feb 05, 2024 at 02:15:36PM -0500, Kent Overstreet wrote:
> On Mon, Feb 05, 2024 at 10:48:14AM -0500, Brian Foster wrote:
> > bch2_direct_IO_read() checks the request offset and size for sector
> > alignment and then falls through to a couple calculations to shrink
> > the size of the request based on the inode size. The problem is that
> > these checks round up to the fs block size, which runs the risk of
> > underflowing iter->count if the block size happens to be large
> > enough. This is triggered by fstest generic/361 with a 4k block
> > size, which subsequently leads to a crash.
> >
> > After some discussion, the original purpose of the shorten logic in this
> > path is not totally clear. It appears to be intended as an optimization
> > of limited value, so simplify things and avoid the underflow problem by
> > removing it.
>
> Thinking about this a bit more...
>
> All that we're avoiding with the shorten is zeroing out the rest of the
> read, which will happen in bch2_read_extent(); we won't be issuing any
> IO, since that would require real data extents above i_size, which is of
> course not allowed.
>
> But zeroing out the rest of the read is actual work that's being
> avoided, since if we return a short read (which is what we're doing
> here) it's undefined what happens to the rest of the buffer.
>
> So... maybe we should keep it, for the crazy but inevitable application
> that uses a 1MB read buffer for everything, including O_DIRECT reads to
> 4k files?
>
I had another change around that basically just protected against the
underflow. I.e., something along the lines of the following, to also
cover the additional use of shorten later in the function:
if (shorten >= iter->count)
shorten = 0;
... and IIRC that seemed to work as well. But I'll probably have to take
a closer look at that large buffer case to grok what's happening and the
original purpose of this logic.
Brian
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bcachefs: fix iov_iter count underflow on sub-block dio read
2024-02-05 19:38 ` Brian Foster
@ 2024-02-09 2:39 ` Su Yue
2024-02-14 18:13 ` Brian Foster
0 siblings, 1 reply; 6+ messages in thread
From: Su Yue @ 2024-02-09 2:39 UTC (permalink / raw)
To: Brian Foster; +Cc: Kent Overstreet, linux-bcachefs
On Mon 05 Feb 2024 at 14:38, Brian Foster <bfoster@redhat.com>
wrote:
> On Mon, Feb 05, 2024 at 02:15:36PM -0500, Kent Overstreet wrote:
>> On Mon, Feb 05, 2024 at 10:48:14AM -0500, Brian Foster wrote:
>> > bch2_direct_IO_read() checks the request offset and size for
>> > sector
>> > alignment and then falls through to a couple calculations to
>> > shrink
>> > the size of the request based on the inode size. The problem
>> > is that
>> > these checks round up to the fs block size, which runs the
>> > risk of
>> > underflowing iter->count if the block size happens to be
>> > large
>> > enough. This is triggered by fstest generic/361 with a 4k
>> > block
>> > size, which subsequently leads to a crash.
>> >
>> > After some discussion, the original purpose of the shorten
>> > logic in this
>> > path is not totally clear. It appears to be intended as an
>> > optimization
>> > of limited value, so simplify things and avoid the underflow
>> > problem by
>> > removing it.
>>
>> Thinking about this a bit more...
>>
>> All that we're avoiding with the shorten is zeroing out the
>> rest of the
>> read, which will happen in bch2_read_extent(); we won't be
>> issuing any
>> IO, since that would require real data extents above i_size,
>> which is of
>> course not allowed.
>>
>> But zeroing out the rest of the read is actual work that's
>> being
>> avoided, since if we return a short read (which is what we're
>> doing
>> here) it's undefined what happens to the rest of the buffer.
>>
>> So... maybe we should keep it, for the crazy but inevitable
>> application
>> that uses a 1MB read buffer for everything, including O_DIRECT
>> reads to
>> 4k files?
>>
>
> I had another change around that basically just protected
> against the
> underflow. I.e., something along the lines of the following, to
> also
> cover the additional use of shorten later in the function:
>
> if (shorten >= iter->count)
> shorten = 0;
>
The change is a clean fix without breaking logic in
bch2_direct_IO_read.
If without shroten, the next loop counting on iter->count must be
altered.
I'd like the simpler way.
--
Su
> ... and IIRC that seemed to work as well. But I'll probably have
> to take
> a closer look at that large buffer case to grok what's happening
> and the
> original purpose of this logic.
>
> Brian
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bcachefs: fix iov_iter count underflow on sub-block dio read
2024-02-09 2:39 ` Su Yue
@ 2024-02-14 18:13 ` Brian Foster
2024-02-15 6:59 ` Su Yue
0 siblings, 1 reply; 6+ messages in thread
From: Brian Foster @ 2024-02-14 18:13 UTC (permalink / raw)
To: Su Yue; +Cc: Kent Overstreet, linux-bcachefs
On Fri, Feb 09, 2024 at 10:39:43AM +0800, Su Yue wrote:
>
> On Mon 05 Feb 2024 at 14:38, Brian Foster <bfoster@redhat.com> wrote:
>
> > On Mon, Feb 05, 2024 at 02:15:36PM -0500, Kent Overstreet wrote:
> > > On Mon, Feb 05, 2024 at 10:48:14AM -0500, Brian Foster wrote:
> > > > bch2_direct_IO_read() checks the request offset and size for >
> > > sector
> > > > alignment and then falls through to a couple calculations to >
> > > shrink
> > > > the size of the request based on the inode size. The problem > is
> > > that
> > > > these checks round up to the fs block size, which runs the > risk
> > > of
> > > > underflowing iter->count if the block size happens to be > large
> > > > enough. This is triggered by fstest generic/361 with a 4k > block
> > > > size, which subsequently leads to a crash.
> > > >
> > > > After some discussion, the original purpose of the shorten > logic
> > > in this
> > > > path is not totally clear. It appears to be intended as an >
> > > optimization
> > > > of limited value, so simplify things and avoid the underflow >
> > > problem by
> > > > removing it.
> > >
> > > Thinking about this a bit more...
> > >
> > > All that we're avoiding with the shorten is zeroing out the rest of
> > > the
> > > read, which will happen in bch2_read_extent(); we won't be issuing
> > > any
> > > IO, since that would require real data extents above i_size, which
> > > is of
> > > course not allowed.
> > >
> > > But zeroing out the rest of the read is actual work that's being
> > > avoided, since if we return a short read (which is what we're doing
> > > here) it's undefined what happens to the rest of the buffer.
> > >
> > > So... maybe we should keep it, for the crazy but inevitable
> > > application
> > > that uses a 1MB read buffer for everything, including O_DIRECT reads
> > > to
> > > 4k files?
> > >
> >
> > I had another change around that basically just protected against the
> > underflow. I.e., something along the lines of the following, to also
> > cover the additional use of shorten later in the function:
> >
> > if (shorten >= iter->count)
> > shorten = 0;
> >
>
> The change is a clean fix without breaking logic in bch2_direct_IO_read.
> If without shroten, the next loop counting on iter->count must be altered.
> I'd like the simpler way.
>
Hi Su,
I'm not quite following your comment on the "next loop counting on
iter->count." Can you elaborate? Also, which option are you calling more
simple? :) Thanks.
Brian
> --
> Su
>
> > ... and IIRC that seemed to work as well. But I'll probably have to take
> > a closer look at that large buffer case to grok what's happening and the
> > original purpose of this logic.
> >
> > Brian
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bcachefs: fix iov_iter count underflow on sub-block dio read
2024-02-14 18:13 ` Brian Foster
@ 2024-02-15 6:59 ` Su Yue
0 siblings, 0 replies; 6+ messages in thread
From: Su Yue @ 2024-02-15 6:59 UTC (permalink / raw)
To: Brian Foster; +Cc: Su Yue, Kent Overstreet, linux-bcachefs
On Wed 14 Feb 2024 at 13:13, Brian Foster <bfoster@redhat.com>
wrote:
> On Fri, Feb 09, 2024 at 10:39:43AM +0800, Su Yue wrote:
>>
>> On Mon 05 Feb 2024 at 14:38, Brian Foster <bfoster@redhat.com>
>> wrote:
>>
>> > On Mon, Feb 05, 2024 at 02:15:36PM -0500, Kent Overstreet
>> > wrote:
>> > > On Mon, Feb 05, 2024 at 10:48:14AM -0500, Brian Foster
>> > > wrote:
>> > > > bch2_direct_IO_read() checks the request offset and size
>> > > > for >
>> > > sector
>> > > > alignment and then falls through to a couple calculations
>> > > > to >
>> > > shrink
>> > > > the size of the request based on the inode size. The
>> > > > problem > is
>> > > that
>> > > > these checks round up to the fs block size, which runs
>> > > > the > risk
>> > > of
>> > > > underflowing iter->count if the block size happens to be
>> > > > > large
>> > > > enough. This is triggered by fstest generic/361 with a 4k
>> > > > > block
>> > > > size, which subsequently leads to a crash.
>> > > >
>> > > > After some discussion, the original purpose of the
>> > > > shorten > logic
>> > > in this
>> > > > path is not totally clear. It appears to be intended as
>> > > > an >
>> > > optimization
>> > > > of limited value, so simplify things and avoid the
>> > > > underflow >
>> > > problem by
>> > > > removing it.
>> > >
>> > > Thinking about this a bit more...
>> > >
>> > > All that we're avoiding with the shorten is zeroing out the
>> > > rest of
>> > > the
>> > > read, which will happen in bch2_read_extent(); we won't be
>> > > issuing
>> > > any
>> > > IO, since that would require real data extents above
>> > > i_size, which
>> > > is of
>> > > course not allowed.
>> > >
>> > > But zeroing out the rest of the read is actual work that's
>> > > being
>> > > avoided, since if we return a short read (which is what
>> > > we're doing
>> > > here) it's undefined what happens to the rest of the
>> > > buffer.
>> > >
>> > > So... maybe we should keep it, for the crazy but inevitable
>> > > application
>> > > that uses a 1MB read buffer for everything, including
>> > > O_DIRECT reads
>> > > to
>> > > 4k files?
>> > >
>> >
>> > I had another change around that basically just protected
>> > against the
>> > underflow. I.e., something along the lines of the following,
>> > to also
>> > cover the additional use of shorten later in the function:
>> >
>> > if (shorten >= iter->count)
>> > shorten = 0;
>> >
Sorry for the misleading reply.
I mean the check of shorten is efficient and simpler which looks
good to me.
>>
>> The change is a clean fix without breaking logic in
>> bch2_direct_IO_read.
>> If without shroten, the next loop counting on iter->count must
>> be altered.
>> I'd like the simpler way.
>>
>
> Hi Su,
>
> I'm not quite following your comment on the "next loop counting
> on
> iter->count." Can you elaborate? Also, which option are you
> calling more
> simple? :) Thanks.
>
I think I just explained Kent's opinion in another way. The loop
is
fs/bcachefs/fs-io-direct.c:
static int bch2_direct_IO_read(struct kiocb *req, struct iov_iter
*iter) {
91 iter->count -= shorten;
...
130 while (iter->count) {
...
bch2_read()
}
iter->count += shorten;
}
bch2_direct_IO_read() does all boundary check for dio. If shorten
is
removed and no other condition in the loop, it brings more bios,
more btree searches in bch2_read[_extent].
--
Su
> Brian
>
>> --
>> Su
>>
>> > ... and IIRC that seemed to work as well. But I'll probably
>> > have to take
>> > a closer look at that large buffer case to grok what's
>> > happening and the
>> > original purpose of this logic.
>> >
>> > Brian
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-02-15 7:31 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-05 15:48 [PATCH] bcachefs: fix iov_iter count underflow on sub-block dio read Brian Foster
2024-02-05 19:15 ` Kent Overstreet
2024-02-05 19:38 ` Brian Foster
2024-02-09 2:39 ` Su Yue
2024-02-14 18:13 ` Brian Foster
2024-02-15 6:59 ` Su Yue
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).