From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [RFC] writev() semantics with invalid iovec in the middle
Date: Wed, 14 Sep 2016 22:34:58 +0100
Message-ID: <20160914213457.GG2356@ZenIV.linux.org.uk>
Right now writev() with a 3-iovec array that has an unmapped address in
the second element and total length less than PAGE_SIZE will write the
first segment and stop at that. Among other things, it guarantees the
short copy, and I would rather have it yield a 0-byte write (and -EFAULT as
the return value).
All POSIX has to say about that is this (in 2.3 Error Numbers):
[EFAULT]
Bad address. The system detected an invalid address in attempting to use
an argument of a call. The reliable detection of this error cannot be
guaranteed, and when not detected may result in the generation of a signal,
indicating an address violation, which is sent to the process.
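For anyone who wants to poke at the 3-iovec case from the first paragraph,
something along these lines should do. It is only a sketch: the helper name,
the 100-byte segment size and the file path passed in are made up for
illustration, and what writev() returns depends on the kernel and filesystem
in use, so no particular result is claimed here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: writev() three 100-byte segments to `path`, with the
 * middle segment pointing into an unmapped page.  Total length is 300 bytes,
 * i.e. less than PAGE_SIZE.  Returns whatever writev() returned; on failure
 * the errno value is stored in *err, otherwise *err is set to 0.
 */
static ssize_t writev_bad_middle(const char *path, int *err)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, 3 * psz, PROT_READ | PROT_WRITE,
		       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED) {
		*err = errno;
		return -1;
	}
	memset(p, 'x', 3 * psz);
	munmap(p + psz, psz);		/* middle page is now unmapped */

	struct iovec iov[3] = {
		{ p, 100 },		/* valid */
		{ p + psz, 100 },	/* invalid: unmapped address */
		{ p + 2 * psz, 100 },	/* valid */
	};

	int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
	if (fd < 0) {
		*err = errno;
		munmap(p, psz);
		munmap(p + 2 * psz, psz);
		return -1;
	}

	ssize_t n = writev(fd, iov, 3);
	*err = n < 0 ? errno : 0;	/* save errno before cleanup */

	close(fd);
	munmap(p, psz);
	munmap(p + 2 * psz, psz);
	return n;
}
```

Calling writev_bad_middle("/tmp/foo", &err) and printing the return value
and err shows which of the two outcomes discussed here (a short write of the
first segment, or -EFAULT with nothing written) a given kernel produces.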
Note that an unmapped page in the middle of a covered range can already lead
to the same kind of short write, i.e. if we have
	p = mmap(0, 3*4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	munmap(p + 4096, 4096);
	fd = open("/tmp/foo", O_CREAT|O_TRUNC|O_RDWR, 0777);
	write(fd, p + 2048, 8192);
write() will yield -EFAULT rather than storing 2Kb. The same will happen with
	writev(fd, &(struct iovec){p + 2048, 8192}, 1);
BTW, adding lseek(fd, 2049, SEEK_SET); before that write (or writev) will
result in that call writing 2047 bytes.
IOW, we do not try to squeeze every byte that can be squeezed out of the
buffer; generally, an unmapped address anywhere in PAGE_SIZE worth of data
that would go into the same page-aligned chunk of destination can result in
short write cut at the beginning of that chunk. iovec boundaries act
as barriers to short writes, mostly by accident.
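To make the rule concrete, here is a toy model of it. This is not kernel
code, just my reading of the paragraph above turned into arithmetic: data
goes out one page-aligned chunk of the destination at a time, and a chunk is
written only if every source byte for it is readable. The function name, the
CHUNK constant and the "valid" parameter are invented for the sketch, and
4096 is assumed for PAGE_SIZE as in the example.

```c
#include <stddef.h>

#define CHUNK 4096UL	/* assumed PAGE_SIZE, matching the example */

/*
 * Simplified model, not kernel code.
 *   pos   - file offset the write starts at
 *   len   - requested length
 *   valid - number of readable bytes at the start of the user buffer
 * Returns the number of bytes the model would write; 0 with len > 0
 * corresponds to the -EFAULT case.
 */
static size_t model_write(size_t pos, size_t len, size_t valid)
{
	size_t done = 0;

	while (done < len) {
		/* bytes left until the end of the current destination chunk */
		size_t room = CHUNK - (pos + done) % CHUNK;
		size_t n = len - done < room ? len - done : room;

		if (done + n > valid)	/* unreadable byte in this chunk */
			break;		/* cut at the beginning of the chunk */
		done += n;
	}
	return done;
}
```

With the numbers from the example, model_write(0, 8192, 2048) gives 0 (the
-EFAULT case) and model_write(2049, 8192, 2048) gives 2047, since the first
destination chunk then ends 2047 bytes in, before the unmapped page is
reached.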
Do we need to preserve that special treatment of iovec boundaries? I would
really like to get rid of that - the current behaviour is an easy and reliable
way to trigger a short copy case in ->write_end() and those are fairly
brittle. Sure, we still need to cope with them, and I think I've got all
instances in the current mainline fixed, but they are often suboptimal.
Objections?