public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Alan J. Wylie" <alan@wylie.me.uk>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: 4.9.0 regression in pipe-backed iov_iter with systemd-nspawn
Date: Sat, 14 Jan 2017 01:24:28 +0000	[thread overview]
Message-ID: <20170114012428.GX1555@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFx-Lc2KooGGt3ohHZUM6cUpcHWxs2a_YUdL=4bdRyxNfA@mail.gmail.com>

On Fri, Jan 13, 2017 at 04:59:37PM -0800, Linus Torvalds wrote:

> EXCEPT.
> 
> I don't think "i->iov_offset" is actually correct. If you truncated
> the whole thing, you should have cleared iov_offset too, and that
> never happened. So truncating everything will not empty the buffers
> for us, we'll still get to that "if (off)" case and have nrbufs=1.
> 
> So I'd expect something like
> 
>     if (!size)
>         i->iov_offset = 0;
> 
> in pipe_advance(), in order to really free all buffers for that case. No?

Why would advance by 0 change ->iov_offset here?

> Or is there some guarantee that iov_offset was already zero there? I
> don't see it, but I'm batting zero on this code...

It was zero initially (see iov_iter_pipe()).  It was not affected by
iov_iter_get_pages() and friends.  If there was copy_to_iter/zero_iter/
copy_page_to_iter for any non-zero amount of data, then it's _not_ zero and
should bloody well stay such.

Note that the normal case is something like O_DIRECT read grabbing pages,
doing a bunch of IO into such and, once it's done, iov_iter_advance()
for the actual amount read being done by
                retval = mapping->a_ops->direct_IO(iocb, &data);
                if (retval >= 0) {
                        iocb->ki_pos += retval;
                        iov_iter_advance(iter, retval);
                }
which advances iter past the actually read data.  If we proceed to
fall back to the do_generic_file_read(), it will do a bunch of
copy_page_to_iter() to the points past that.

default_file_splice_read() pretty much imitates O_DIRECT read in that
respect, with kernel_readv() being a counterpart of direct_IO.

generic_file_splice_read() is another PIPE_ITER user; that one really
calls ->read_iter().  On any non-error (including short reads) it will
not bother with iov_iter_advance() at all - ->read_iter() should've
called that already, if at all needed.

On error it does use iov_iter_advance(), pretty much as a way to
trigger pipe_truncate().  There we directly reset ->iov_offset to 0
and ->idx to its original value.  Strictly speaking, we could live
without calling it there at all - normally ->read_iter() instances follow
the usual "if you'd managed to read something, report a short read,
not an error" approach.  None of the exceptions are good candidates for
use of generic_file_splice_read() anyway.

However, theoretically it is possible that ->read_iter() instance does
successful copy_to_iter() and then decides to return an error.  This
        } else if (ret < 0) {
                to.idx = idx;
                to.iov_offset = 0;
                iov_iter_advance(&to, 0); /* to free what was emitted */
in generic_file_splice_read() catches any such cases.  Here the call
of iov_iter_advance(&to, 0) is guaranteed to be equivalent to
pipe_truncate(&to); we might make the latter non-static and call it
directly, but I'm not sure it's worth doing.
 
> Can you please explain when you are once more awake..

  reply	other threads:[~2017-01-14  1:24 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-12 20:26 4.9.0 regression in pipe-backed iov_iter with systemd-nspawn Alan J. Wylie
2017-01-12 20:31 ` Al Viro
2017-01-12 20:38   ` Alan J. Wylie
2017-01-12 22:26 ` Linus Torvalds
2017-01-12 22:37   ` Al Viro
2017-01-12 22:46     ` Al Viro
2017-01-12 23:02       ` Linus Torvalds
2017-01-12 23:14         ` Al Viro
2017-01-12 23:14         ` Linus Torvalds
2017-01-12 23:27           ` Al Viro
2017-01-12 22:46   ` Alan J. Wylie
2017-01-12 22:58     ` Al Viro
2017-01-12 23:28     ` Linus Torvalds
2017-01-13  4:00       ` Al Viro
2017-01-13  7:38         ` Alan J. Wylie
2017-01-13  7:23       ` Alan J. Wylie
2017-01-13  9:33         ` Al Viro
2017-01-13  9:54           ` Alan J. Wylie
2017-01-13 10:20             ` Al Viro
2017-01-13 10:32               ` Alan J. Wylie
2017-01-13 11:25                 ` Al Viro
2017-01-13 11:18               ` Al Viro
2017-01-13 19:33                 ` Linus Torvalds
2017-01-13 20:08                   ` Al Viro
2017-01-13 20:11                     ` Al Viro
2017-01-13 20:32                       ` Linus Torvalds
2017-01-13 20:47                         ` Al Viro
2017-01-13 21:55                           ` Al Viro
2017-01-13 21:59                             ` Al Viro
2017-01-13 22:13                               ` Al Viro
2017-01-13 22:50                                 ` Al Viro
2017-01-14  0:59                                   ` Linus Torvalds
2017-01-14  1:24                                     ` Al Viro [this message]
2017-01-14  1:43                                       ` Al Viro
2017-01-14  1:46                                       ` Linus Torvalds
2017-01-14  1:57                                         ` Al Viro
2017-01-15  0:53                                           ` Al Viro
2017-01-14 13:16                                   ` Alan J. Wylie
2017-01-14 16:29                                     ` Alan J. Wylie
2017-01-14 17:57                                       ` Linus Torvalds
2017-01-13 20:08                   ` Linus Torvalds
2017-01-13 20:16                     ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170114012428.GX1555@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=alan@wylie.me.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=regressions@leemhuis.info \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox