All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Alan J. Wylie" <alan@wylie.me.uk>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: 4.9.0 regression in pipe-backed iov_iter with systemd-nspawn
Date: Sat, 14 Jan 2017 01:24:28 +0000	[thread overview]
Message-ID: <20170114012428.GX1555@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFx-Lc2KooGGt3ohHZUM6cUpcHWxs2a_YUdL=4bdRyxNfA@mail.gmail.com>

On Fri, Jan 13, 2017 at 04:59:37PM -0800, Linus Torvalds wrote:

> EXCEPT.
> 
> I don't think "i->iov_offset" is actually correct. If you truncated
> the whole thing, you should have cleared iov_offset too, and that
> never happened. So truncating everything will not empty the buffers
> for us, we'll still get to that "if (off)" case and have nrbufs=1.
> 
> So I'd expect something like
> 
>     if (!size)
>         i->iov_offset = 0;
> 
> in pipe_advance(), in order to really free all buffers for that case. No?

Why would advance by 0 change ->iov_offset here?

> Or is there some guarantee that iov_offset was already zero there? I
> don't see it, but I'm batting zero on this code...

It was zero initially (see iov_iter_pipe()).  It was not affected by
iov_iter_get_pages() and friends.  If there was copy_to_iter/zero_iter/
copy_page_to_iter for any non-zero amount of data, then it's _not_ zero and
should bloody well stay such.

Note that the normal case is something like O_DIRECT read grabbing pages,
doing a bunch of IO into such and, once it's done, iov_iter_advance()
for the actual amount read being done by
                retval = mapping->a_ops->direct_IO(iocb, &data);
                if (retval >= 0) {
                        iocb->ki_pos += retval;
                        iov_iter_advance(iter, retval);
                }
which advances iter past the actually read data.  If we proceed to
fall back to the do_generic_file_read(), it will do a bunch of
copy_page_to_iter() to the points past that.

default_file_splice_read() pretty much imitates O_DIRECT read in that
respect, with kernel_readv() being a counterpart of direct_IO.

generic_file_splice_read() is another PIPE_ITER user; that one really
calls ->read_iter().  On any non-error (including short reads) it will
not bother with iov_iter_advance() at all - ->read_iter() should've
called that already, if at all needed.

On error it does use iov_iter_advance(), pretty much as a way to
trigger pipe_truncate().  There we directly reset ->iov_offset to 0
and ->idx to its original value.  Strictly speaking, we could live
without calling it there at all - normally ->read_iter() instances follow
the usual "if you'd managed to read something, report a short read,
not an error" approach.  None of the exceptions are good candidates for
use of generic_file_splice_read() anyway.

However, theoretically it is possible that ->read_iter() instance does
successful copy_to_iter() and then decides to return an error.  This
        } else if (ret < 0) {
                to.idx = idx;
                to.iov_offset = 0;
                iov_iter_advance(&to, 0); /* to free what was emitted */
in generic_file_splice_read() catches any such cases.  Here the call
of iov_iter_advance(&to, 0) is guaranteed to be equivalent to
pipe_truncate(&to); we might make the latter non-static and call it
directly, but I'm not sure it's worth doing.
 
> Can you please explain when you are once more awake..

  reply	other threads:[~2017-01-14  1:24 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-12 20:26 4.9.0 regression in pipe-backed iov_iter with systemd-nspawn Alan J. Wylie
2017-01-12 20:31 ` Al Viro
2017-01-12 20:38   ` Alan J. Wylie
2017-01-12 22:26 ` Linus Torvalds
2017-01-12 22:37   ` Al Viro
2017-01-12 22:46     ` Al Viro
2017-01-12 23:02       ` Linus Torvalds
2017-01-12 23:14         ` Al Viro
2017-01-12 23:14         ` Linus Torvalds
2017-01-12 23:27           ` Al Viro
2017-01-12 22:46   ` Alan J. Wylie
2017-01-12 22:58     ` Al Viro
2017-01-12 23:28     ` Linus Torvalds
2017-01-13  4:00       ` Al Viro
2017-01-13  7:38         ` Alan J. Wylie
2017-01-13  7:23       ` Alan J. Wylie
2017-01-13  9:33         ` Al Viro
2017-01-13  9:54           ` Alan J. Wylie
2017-01-13 10:20             ` Al Viro
2017-01-13 10:32               ` Alan J. Wylie
2017-01-13 11:25                 ` Al Viro
2017-01-13 11:18               ` Al Viro
2017-01-13 19:33                 ` Linus Torvalds
2017-01-13 20:08                   ` Al Viro
2017-01-13 20:11                     ` Al Viro
2017-01-13 20:32                       ` Linus Torvalds
2017-01-13 20:47                         ` Al Viro
2017-01-13 21:55                           ` Al Viro
2017-01-13 21:59                             ` Al Viro
2017-01-13 22:13                               ` Al Viro
2017-01-13 22:50                                 ` Al Viro
2017-01-14  0:59                                   ` Linus Torvalds
2017-01-14  1:24                                     ` Al Viro [this message]
2017-01-14  1:43                                       ` Al Viro
2017-01-14  1:46                                       ` Linus Torvalds
2017-01-14  1:57                                         ` Al Viro
2017-01-15  0:53                                           ` Al Viro
2017-01-14 13:16                                   ` Alan J. Wylie
2017-01-14 16:29                                     ` Alan J. Wylie
2017-01-14 17:57                                       ` Linus Torvalds
2017-01-13 20:08                   ` Linus Torvalds
2017-01-13 20:16                     ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170114012428.GX1555@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=alan@wylie.me.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=regressions@leemhuis.info \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.