Re: [RFC] writev() semantics with invalid iovec in the middle

From: Al Viro <viro@ZenIV.linux.org.uk>
To: Mike Marshall <hubcap@omnibond.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC] writev() semantics with invalid iovec in the middle
Date: Thu, 15 Sep 2016 23:29:35 +0100
Message-ID: <20160915222935.GI2356@ZenIV.linux.org.uk>
In-Reply-To: <CAOg9mSQ46jFvtUk9v211uH5cNyoNwktCj79s4Rfy9zc5nWq2Pg@mail.gmail.com>

On Thu, Sep 15, 2016 at 06:23:24AM -0400, Mike Marshall wrote:
> If you squeeze out every byte won't you still have a short
> write? And the written data wouldn't be cut at the bad
> place, but it would have a weird hole or discontinuity there.

???

What I mean is that if we have an invalid address in the middle of a buffer
(unmapped, for example), we do not attempt to write every byte prior to that
invalid address.  Of course what we write is going to be contiguous.

Suppose we have a buffer spanning 10 pages (amd64, so these are 4K ones) -
7 valid, 3 invalid:
	VVVVIIIVVV
and it starts 100 bytes into the first page.  And write goes into a regular
file on e.g. tmpfs, starting at offset 31.  We _can't_ write more than
4*4096-100 bytes, no matter what.  It will be a short write.  As a matter
of fact, it will be even shorter than that - it will be 3*4096-31 bytes,
up to the last pagecache boundary we can cover completely.  That obviously
depends upon the filesystem - not everything uses pagecache, for starters.
However, the caller is *not* guaranteed that write() with an invalid page
in the middle of a buffer would write everything up to the very beginning
of the invalid page.  A short write will happen, but the amount written
might be up to page size less than the actual length of valid part in the
beginning of the buffer.
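
A minimal sketch of the arithmetic in that example, assuming 4K pages and a
pagecache-backed file.  The function names are made up for illustration and
this is not how any filesystem actually computes the length; it only
reproduces the two numbers quoted above.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL	/* assumption: 4K pages, as in the amd64 example */

/* Upper bound: bytes in the buffer before the first invalid page. */
size_t valid_prefix(size_t valid_pages, size_t buf_off)
{
	return valid_pages * PAGE_SIZE - buf_off;
}

/*
 * Bytes written if the write stops at the last pagecache boundary that the
 * valid prefix covers completely.  The example above only covers the case
 * where at least one such boundary is reached; returning 0 otherwise is a
 * placeholder of this sketch, not a statement about kernel behaviour.
 */
size_t pagecache_short_write(size_t file_pos, size_t prefix)
{
	size_t end = file_pos + prefix;			/* where a full copy would end */
	size_t last_full = end - end % PAGE_SIZE;	/* round down to a page boundary */

	if (last_full <= file_pos)
		return 0;
	return last_full - file_pos;
}
```

With 4 valid pages, a buffer offset of 100 and a file offset of 31,
valid_prefix() gives 4*4096-100 = 16284 and pagecache_short_write() gives
3*4096-31 = 12257.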

Now, for writev() we could have invalid pages in any iovec; again, we
obviously can't write anything past the first invalid page - we'll get
either a short write or -EFAULT (if nothing got written).  That's fine;
the question is what the caller can count upon wrt shortening.

Again, we are *not* guaranteed writing up to exact boundary.  However, the
current implementation will end up shortening no more than to the iovec
boundary.  I.e. if the first iovec contains only valid pages and there's
an invalid one in the second iovec, the current implementation will write
at least everything in the first iovec.  That's _not_ promised by POSIX
or our manpages; moreover, I'm not sure if it's even true for each filesystem.
And keeping that property is actually inconvenient - if we could discard it,
we could make partial-copy ->write_end() calls a lot more infrequent.

Unfortunately, some of the LTP writev tests end up checking that writev() does
behave that way - they feed it a three-element iovec with shorter-than-page
segments, the second of which is all invalid.  And they check that the
entire first segment has been written.
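
A rough userspace sketch of that kind of scenario, for illustration only.
It is not the LTP code: SEG_LEN, the temporary file under /tmp and the
function name are all assumptions, and the invalid segment is simulated
with a PROT_NONE mapping.  It just reports what writev() returned, without
asserting which of the possible outcomes is the right one.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SEG_LEN 64	/* hypothetical; anything shorter than a page */

/*
 * Write three short segments, the second one pointing into an inaccessible
 * page, and return what writev() reported (-1 with errno set on failure).
 * Returns -2 if the setup itself failed.
 */
ssize_t writev_bad_middle(void)
{
	char first[SEG_LEN], third[SEG_LEN];
	char path[] = "/tmp/writev-sketch-XXXXXX";	/* assumed writable */
	struct iovec iov[3];
	ssize_t ret;
	int fd, saved;
	void *bad;

	memset(first, 'a', sizeof(first));
	memset(third, 'c', sizeof(third));

	/* A page with no access permissions: reading from it faults. */
	bad = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bad == MAP_FAILED)
		return -2;

	fd = mkstemp(path);
	if (fd < 0) {
		munmap(bad, 4096);
		return -2;
	}
	unlink(path);

	iov[0].iov_base = first;	iov[0].iov_len = SEG_LEN;
	iov[1].iov_base = bad;		iov[1].iov_len = SEG_LEN;
	iov[2].iov_base = third;	iov[2].iov_len = SEG_LEN;

	ret = writev(fd, iov, 3);
	saved = errno;		/* keep writev()'s errno across the cleanup */

	close(fd);
	munmap(bad, 4096);
	errno = saved;
	return ret;
}
```

A caller can only rely on the result being either -1 with EFAULT or a short
count no larger than SEG_LEN; whether it is exactly SEG_LEN is the property
under discussion.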

I would really like to drop that property and replace it with "if some
addresses in the buffer(s) we are asked to write are invalid, the write will
be shortened by up to a PAGE_SIZE from the first such invalid address", which
would make the writev() rules exactly the same as the write() ones.  Does
anybody have objections to it?
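
To make the proposed rule concrete, here is a small sketch of a check a
test could apply to a write()/writev() return value.  The name
shortening_ok() and its parameters are hypothetical, and "up to a
PAGE_SIZE" is read here as an inclusive bound, which is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * ret:       return value of write()/writev(); -1 stands for -EFAULT
 * first_bad: offset of the first invalid byte, counted across all segments
 *            in order (the test is assumed to know where it put it)
 * page_size: PAGE_SIZE on the system in question
 *
 * Accepts any result between first_bad - page_size and first_bad.
 */
bool shortening_ok(ssize_t ret, size_t first_bad, size_t page_size)
{
	size_t min_ok = first_bad > page_size ? first_bad - page_size : 0;

	if (ret < 0)	/* nothing written: fine only if zero bytes is in range */
		return min_ok == 0;
	return (size_t)ret >= min_ok && (size_t)ret <= first_bad;
}
```

Under this reading both outcomes for a short invalid second segment (the
whole first segment written, or -EFAULT) would pass, while a count past the
first invalid address would not.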