From: Jeff Moyer <jmoyer@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: manish honap <manish_honap_vit@yahoo.co.in>,
"tytso\@mit.edu" <tytso@mit.edu>,
"adilger.kernel\@dilger.ca" <adilger.kernel@dilger.ca>,
"linux-fsdevel\@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code.
Date: Tue, 22 May 2012 16:41:02 -0400 [thread overview]
Message-ID: <x49d35wt0dd.fsf@segfault.boston.devel.redhat.com> (raw)
In-Reply-To: <CA+55aFxA0Do37AxoM2MPH5nnMa2VAPHp1pKJ7CiafGGQL-b0wQ@mail.gmail.com> (Linus Torvalds's message of "Sun, 20 May 2012 21:50:24 -0700")
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Sun, May 20, 2012 at 8:28 PM, manish honap
> <manish_honap_vit@yahoo.co.in> wrote:
>> Hello Linus,
>>
>> The overflow issue was seen during async dio path
>
> Christ. fs/aio.c doesn't do the proper rw_verify_area().
>
> As a result, it doesn't check file locks, and it doesn't seem to check
> offset overflows either.
>
> The vector versions kind of get the size limit by mistake (because
> they at least use rw_copy_check_uvector(), which does limit things to
> MAX_RW_COUNT), but they don't do the offset overflow check either.
>
> Does this patch work for you? What it *should* do is the same that the
> other read/write paths do (and the vector path for aio already do),
> namely truncate reads or writes to MAX_RW_COUNT (which is INT_MAX
> aligned down to a page).
OK, this took me a while to wrap my head around, mostly due to confusion
with ki_nbytes and ki_left. When doing p{read|write}v, aio_nbytes is
used to store the number of vectors. Thus, we need to set both ki_left
and ki_nbytes to the result of the rw_verify_area call, which means that
when ki_left goes to 0, we'll exit out of this loop:
do {
ret = rw_op(iocb, &iocb->ki_iovec[iocb->ki_cur_seg],
iocb->ki_nr_segs - iocb->ki_cur_seg,
iocb->ki_pos);
if (ret > 0)
aio_advance_iovec(iocb, ret);
/* retry all partial writes. retry partial reads as long as its a
* regular file. */
} while (ret > 0 && iocb->ki_left > 0 &&
(opcode == IOCB_CMD_PWRITEV ||
(!S_ISFIFO(inode->i_mode) && !S_ISSOCK(inode->i_mode))));
and here:
/* This means we must have transferred all that we could */
/* No need to retry anymore */
if ((ret == 0) || (iocb->ki_left == 0))
ret = iocb->ki_nbytes - iocb->ki_left;
ki_nbytes will remain at whatever rw_verify_area returned, and ki_left
should be zero. Meaning, if we passed in 2G or more, we're capped at
the value Linus mentioned earlier.
In the case of a single vector, we leave ki_left alone:
> - kiocb->ki_iovec->iov_len = kiocb->ki_left;
> + kiocb->ki_iovec->iov_len = bytes;
Thus, we will exit the while loop due to ret == 0, but this time,
ki_nbytes will be the initial value, and ki_left will be non-zero. The
end result is that the amount of data transferred is returned, so all is
well.
Of course, none of that matters for O_DIRECT I/O, since the return value
is -EIOCBQUEUED. And, since buffered AIO isn't even asynchronous,
nobody /should/ be doing that anyway, so even if the above was broken,
nobody /should/ notice. Nevertheless, I did some targeted testing of
both the O_DIRECT and the buffered cases, and it seems to be working
fine for the single and multiple vector cases.
I do have one question, though: why do we limit the total bytes in a
vectored operation to 2GB? So long as each segment is below 2G, we
shouldn't trip up any code paths in the kernel, right? So that just
leaves the "it would take a long time and burn kernel resources"
argument. So I guess that's the reason?
Anyway, consider this a long-winded:
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
prev parent reply other threads:[~2012-05-22 20:41 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <F9014F50-D1F6-4B64-8535-7452CC64B18A@qualexsystems.com>
2012-05-20 8:01 ` [PATCH 1/1] ext4, dio: Remove overflow for size >2G in aio-dio code manish honap
2012-05-20 18:33 ` Linus Torvalds
2012-05-21 3:28 ` manish honap
2012-05-21 4:50 ` Linus Torvalds
2012-05-21 22:22 ` Linus Torvalds
2012-05-21 23:31 ` Ted Ts'o
2012-05-22 16:11 ` Eric Sandeen
2012-05-22 19:02 ` Eric Sandeen
2012-05-22 16:13 ` manish honap
2012-05-22 19:26 ` [PATCH 1/1] xfstests 286: test for 2G overflows in AIO Eric Sandeen
2012-05-22 20:41 ` Jeff Moyer [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=x49d35wt0dd.fsf@segfault.boston.devel.redhat.com \
--to=jmoyer@redhat.com \
--cc=adilger.kernel@dilger.ca \
--cc=linux-fsdevel@vger.kernel.org \
--cc=manish_honap_vit@yahoo.co.in \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).