From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Alan J. Wylie" <alan@wylie.me.uk>,
Thorsten Leemhuis <regressions@leemhuis.info>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: 4.9.0 regression in pipe-backed iov_iter with systemd-nspawn
Date: Sat, 14 Jan 2017 01:24:28 +0000 [thread overview]
Message-ID: <20170114012428.GX1555@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFx-Lc2KooGGt3ohHZUM6cUpcHWxs2a_YUdL=4bdRyxNfA@mail.gmail.com>
On Fri, Jan 13, 2017 at 04:59:37PM -0800, Linus Torvalds wrote:
> EXCEPT.
>
> I don't think "i->iov_offset" is actually correct. If you truncated
> the whole thing, you should have cleared iov_offset too, and that
> never happened. So truncating everything will not empty the buffers
> for us, we'll still get to that "if (off)" case and have nrbufs=1.
>
> So I'd expect something like
>
> if (!size)
> i->iov_offset = 0;
>
> in pipe_advance(), in order to really free all buffers for that case. No?
Why would advance by 0 change ->iov_offset here?
> Or is there some guarantee that iov_offset was already zero there? I
> don't see it, but I'm batting zero on this code...
It was zero initially (see iov_iter_pipe()). It was not affected by
iov_iter_get_pages() and friends. If there was copy_to_iter/zero_iter/
copy_page_to_iter for any non-zero amount of data, then it's _not_ zero and
should bloody well stay such.
Note that the normal case is something like O_DIRECT read grabbing pages,
doing a bunch of IO into such and, once it's done, iov_iter_advance()
for the actual amount read being done by
retval = mapping->a_ops->direct_IO(iocb, &data);
if (retval >= 0) {
iocb->ki_pos += retval;
iov_iter_advance(iter, retval);
}
which advances iter past the actually read data. If we proceed to
fall back to the do_generic_file_read(), it will do a bunch of
copy_page_to_iter() to the points past that.
default_file_splice_read() pretty much imitates O_DIRECT read in that
respect, with kernel_readv() being a counterpart of direct_IO.
generic_file_splice_read() is another PIPE_ITER user; that one really
calls ->read_iter(). On any non-error (including short reads) it will
not bother with iov_iter_advance() at all - ->read_iter() should've
called that already, if at all needed.
On error it does use iov_iter_advance(), pretty much as a way to
trigger pipe_truncate(). There we directly reset ->iov_offset to 0
and ->idx to its original value. Strictly speaking, we could live
without calling it there at all - normally ->read_iter() instances follow
the usual "if you'd managed to read something, report a short read,
not an error" approach. None of the exceptions are good candidates for
use of generic_file_splice_read() anyway.
However, theoretically it is possible that ->read_iter() instance does
successful copy_to_iter() and then decides to return an error. This
} else if (ret < 0) {
to.idx = idx;
to.iov_offset = 0;
iov_iter_advance(&to, 0); /* to free what was emitted */
in generic_file_splice_read() catches any such cases. Here the call
of iov_iter_advance(&to, 0) is guaranteed to be equivalent to
pipe_truncate(&to); we might make the latter non-static and call it
directly, but I'm not sure it's worth doing.
> Can you please explain when you are once more awake..
next prev parent reply other threads:[~2017-01-14 1:24 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-12 20:26 4.9.0 regression in pipe-backed iov_iter with systemd-nspawn Alan J. Wylie
2017-01-12 20:31 ` Al Viro
2017-01-12 20:38 ` Alan J. Wylie
2017-01-12 22:26 ` Linus Torvalds
2017-01-12 22:37 ` Al Viro
2017-01-12 22:46 ` Al Viro
2017-01-12 23:02 ` Linus Torvalds
2017-01-12 23:14 ` Al Viro
2017-01-12 23:14 ` Linus Torvalds
2017-01-12 23:27 ` Al Viro
2017-01-12 22:46 ` Alan J. Wylie
2017-01-12 22:58 ` Al Viro
2017-01-12 23:28 ` Linus Torvalds
2017-01-13 4:00 ` Al Viro
2017-01-13 7:38 ` Alan J. Wylie
2017-01-13 7:23 ` Alan J. Wylie
2017-01-13 9:33 ` Al Viro
2017-01-13 9:54 ` Alan J. Wylie
2017-01-13 10:20 ` Al Viro
2017-01-13 10:32 ` Alan J. Wylie
2017-01-13 11:25 ` Al Viro
2017-01-13 11:18 ` Al Viro
2017-01-13 19:33 ` Linus Torvalds
2017-01-13 20:08 ` Al Viro
2017-01-13 20:11 ` Al Viro
2017-01-13 20:32 ` Linus Torvalds
2017-01-13 20:47 ` Al Viro
2017-01-13 21:55 ` Al Viro
2017-01-13 21:59 ` Al Viro
2017-01-13 22:13 ` Al Viro
2017-01-13 22:50 ` Al Viro
2017-01-14 0:59 ` Linus Torvalds
2017-01-14 1:24 ` Al Viro [this message]
2017-01-14 1:43 ` Al Viro
2017-01-14 1:46 ` Linus Torvalds
2017-01-14 1:57 ` Al Viro
2017-01-15 0:53 ` Al Viro
2017-01-14 13:16 ` Alan J. Wylie
2017-01-14 16:29 ` Alan J. Wylie
2017-01-14 17:57 ` Linus Torvalds
2017-01-13 20:08 ` Linus Torvalds
2017-01-13 20:16 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170114012428.GX1555@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=alan@wylie.me.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=regressions@leemhuis.info \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox