public inbox for linux-kernel@vger.kernel.org
* [RFC] writev() semantics with invalid iovec in the middle
@ 2016-09-14 21:34 Al Viro
  2016-09-15 10:23 ` Mike Marshall
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2016-09-14 21:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

	Right now writev() with 3-iovec array that has unmapped address in
the second element and total length less than PAGE_SIZE will write the
first segment and stop at that.  Among other things, it guarantees the
short copy, and I would rather have it yield a 0-byte write (with -EFAULT as
the return value).
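
For illustration, a minimal sketch of the case described above: a 3-iovec array whose second element points at an unmapped page, with a total length far below PAGE_SIZE. The helper name writev_bad_middle(), the temporary file name, and the segment contents are made up for this example. The result depends on the kernel version and filesystem, so the code only reports it.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original mail): writev() three small
 * segments to a temporary file, where the middle segment points at an
 * unmapped page.  Returns the byte count on success or -errno on failure.
 */
static long writev_bad_middle(void)
{
	static char first[] = "first";
	static char third[] = "third";
	char path[] = "/tmp/writev-efault-XXXXXX";
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	long ret;

	/* Map one page and unmap it again, so the address is invalid. */
	void *bad = mmap(NULL, page, PROT_READ,
			 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (bad == MAP_FAILED)
		return -errno;
	munmap(bad, page);

	int fd = mkstemp(path);
	if (fd < 0)
		return -errno;
	unlink(path);

	struct iovec iov[3] = {
		{ .iov_base = first, .iov_len = sizeof first - 1 },
		{ .iov_base = bad,   .iov_len = 16 },
		{ .iov_base = third, .iov_len = sizeof third - 1 },
	};

	ssize_t n = writev(fd, iov, 3);
	ret = n < 0 ? -errno : (long)n;	/* save errno before close() */
	close(fd);
	return ret;
}
```

With the behaviour described above this would be expected to return 5 (the first segment only); with the proposed semantics it would return -EFAULT.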

	All POSIX has to say about that is this (in 2.3 Error Numbers):

[EFAULT]
    Bad address. The system detected an invalid address in attempting to use
an argument of a call. The reliable detection of this error cannot be
guaranteed, and when not detected may result in the generation of a signal,
indicating an address violation, which is sent to the process.

Note that an unmapped page in the middle of a range covered already can lead to
the same kind of short write, i.e. if we have
	p = mmap(0, 3*4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	munmap(p + 4096, 4096);
	fd = open("/tmp/foo", O_CREAT|O_TRUNC|O_RDWR, 0777);
	write(fd, p + 2048, 8192);

write() will yield -EFAULT, not 2KB stored.  The same will happen with
	writev(fd, &(struct iovec){p + 2048, 8192}, 1);
BTW, adding lseek(fd, 2049, SEEK_SET); before that write (or writev) will
result in 2047 bytes being written by the latter.
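
A self-contained sketch of the example above, for anyone who wants to try it. The helper name write_across_hole() and the temporary file name are assumptions made for this illustration; the page size is queried at run time instead of hard-coding 4096, so the buffer starts half a page into the first page and spans two pages. What comes back depends on the kernel version and filesystem.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map three pages, unmap the middle one, then write
 * two pages' worth starting half a page into the first page, with the
 * file position set to 'off'.  Returns bytes written or -errno.
 */
static long write_across_hole(off_t off)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/short-write-XXXXXX";
	long ret;

	char *p = mmap(NULL, 3 * page, PROT_READ,
		       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return -errno;
	munmap(p + page, page);		/* punch the hole */

	int fd = mkstemp(path);
	if (fd < 0) {
		ret = -errno;
		goto out;
	}
	unlink(path);

	if (lseek(fd, off, SEEK_SET) == (off_t)-1) {
		ret = -errno;
	} else {
		ssize_t n = write(fd, p + page / 2, 2 * page);
		ret = n < 0 ? -errno : (long)n;
	}
	close(fd);
out:
	munmap(p, 3 * page);		/* unmapping across the hole is fine */
	return ret;
}
```

On a 4K-page system, write_across_hole(0) corresponds to the plain write() above and write_across_hole(2049) to the variant with the lseek() added.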

IOW, we do not try to squeeze every byte that can be squeezed out of the
buffer; generally, an unmapped address anywhere in PAGE_SIZE worth of data
that would go into the same page-aligned chunk of destination can result in
short write cut at the beginning of that chunk.  iovec boundaries act
as barriers to short writes, mostly by accident.
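
The arithmetic behind that rule can be sketched as a small model. This is not kernel code: model_short_write() is a hypothetical function that assumes 4K pages and a single contiguous buffer whose first 'readable' bytes are mapped, and it simply drops any page-aligned destination chunk that would need an unmapped source byte.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4K pages, as in the example */

/*
 * Hypothetical model of the rule above.  'pos' is the file position,
 * 'len' the requested length, 'readable' the number of source bytes
 * before the first unmapped one.  Returns the number of bytes the model
 * writes; 0 stands for the -EFAULT case.
 */
static size_t model_short_write(size_t pos, size_t len, size_t readable)
{
	size_t done = 0;

	while (done < len) {
		/* The chunk ends at the next page boundary in the file. */
		size_t offset = (pos + done) % MODEL_PAGE_SIZE;
		size_t chunk = MODEL_PAGE_SIZE - offset;

		if (chunk > len - done)
			chunk = len - done;
		/* An unmapped byte inside the chunk: cut the write here. */
		if (done + chunk > readable)
			break;
		done += chunk;
	}
	return done;
}
```

For the example above, model_short_write(0, 8192, 2048) gives 0 and model_short_write(2049, 8192, 2048) gives 2047, matching the -EFAULT and 2047-byte results. The model does not cover the iovec-boundary case.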

Do we need to preserve that special treatment of iovec boundaries?  I would
really like to get rid of that - the current behaviour is an easy and reliable
way to trigger a short copy case in ->write_end() and those are fairly
brittle.  Sure, we still need to cope with them, and I think I've got all
instances in the current mainline fixed, but they are often suboptimal.

Objections?

Thread overview: 7+ messages
2016-09-14 21:34 [RFC] writev() semantics with invalid iovec in the middle Al Viro
2016-09-15 10:23 ` Mike Marshall
2016-09-15 22:29   ` Al Viro
2016-09-15 22:32     ` Linus Torvalds
2016-09-15 22:32     ` Cedric Blancher
2016-09-16 13:25     ` One Thousand Gnomes
2016-09-16 18:36       ` Linus Torvalds
