From: Al Viro <viro@zeniv.linux.org.uk>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: linux-kernel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH] fs: prefer read_iter over read and write_iter over write
Date: Fri, 20 May 2022 15:04:49 +0000
Message-ID: <YoeuEXzGFdnCZIVs@zeniv-ca.linux.org.uk>
In-Reply-To: <20220520135103.166972-1-Jason@zx2c4.com>
On Fri, May 20, 2022 at 03:51:03PM +0200, Jason A. Donenfeld wrote:
> Most kernel code prefers read_iter over read and write_iter over write,
> yet the read function pointer is tested first. Reverse these so that the
> iter function is always used first.
NAK. There are some weird devices (at the very least, one in sound)
where data gets interpreted differently for write() and writev().
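
To make the objection concrete, here is a minimal user-space sketch of the dispatch order being discussed. Every name in it (toy_fops, toy_vfs_write, toy_pref, the dev_* handlers) is hypothetical and only mimics the shape of the check; it is not the kernel's code. It assumes a driver that supplies both methods with different meanings, so the order of the test decides which interpretation a plain write() gets.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical bookkeeping so the caller can see which method ran. */
enum toy_handler { HANDLER_NONE, HANDLER_WRITE, HANDLER_WRITE_ITER };
enum toy_pref { PREFER_WRITE, PREFER_WRITE_ITER };

static enum toy_handler last_handler = HANDLER_NONE;

/* Toy stand-in for a file_operations-like table; NOT the kernel struct. */
struct toy_fops {
	long (*write)(const void *buf, size_t len);
	long (*write_iter)(const struct iovec *iov, int iovcnt);
};

/* Hypothetical device: pretend this parses the buffer one way... */
static long dev_write(const void *buf, size_t len)
{
	(void)buf;
	last_handler = HANDLER_WRITE;
	return (long)len;
}

/* ...and this parses the segments in an unrelated way. */
static long dev_write_iter(const struct iovec *iov, int iovcnt)
{
	long total = 0;

	for (int i = 0; i < iovcnt; i++)
		total += (long)iov[i].iov_len;
	last_handler = HANDLER_WRITE_ITER;
	return total;
}

/* Dispatch a plain write(): pref selects which pointer is tested first. */
static long toy_vfs_write(const struct toy_fops *f, enum toy_pref pref,
			  const void *buf, size_t len)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	if (pref == PREFER_WRITE_ITER && f->write_iter)
		return f->write_iter(&iov, 1);
	if (f->write)
		return f->write(buf, len);
	if (f->write_iter)
		return f->write_iter(&iov, 1);
	return -1;
}
```

With both pointers set, PREFER_WRITE routes the call to dev_write() and PREFER_WRITE_ITER routes the very same call to dev_write_iter(); for a device where the two disagree, that is a user-visible change.
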
There are several degrees of messiness:
1) packet-like semantics, where boundaries of iovecs are
significant; writev() is equivalent to a loop of write() calls, but
*NOT* to write() on a single concatenated copy. _Any_ short write
on any segment (whether because the ->write() instance ignores the rest
of the data or because of an unmapped page halfway through) terminates
writev().
2) similar, but more extreme - write() reports consuming all
the data it's been given (assuming the damn thing parses) and
ignores the excess. writev() is equivalent to iterated write() on
all segments, as long as each is valid. Not uncommon, sadly...
3) completely unrelated interpretations of input for write()
and for writev(). writev() is *NOT* equivalent to a loop of write()
there. Yes, such beasts exist. And it's a user-visible ABI.
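
A rough user-space model of case 1 may help. The toy_dev type, the 4-byte packet size and both functions below are made up for illustration and correspond to no real driver. Each write() stores one packet and consumes at most TOY_PKT_SIZE bytes; writev() is modelled as a loop of write() calls that stops at the first short write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#define TOY_MAX_PKTS 8
#define TOY_PKT_SIZE 4	/* hypothetical packet size */

/* Hypothetical packet device: remembers every packet it was handed. */
struct toy_dev {
	char pkts[TOY_MAX_PKTS][TOY_PKT_SIZE];
	int npkts;
};

/* One write() == one packet; bytes past TOY_PKT_SIZE are not consumed. */
static long toy_write(struct toy_dev *d, const void *buf, size_t len)
{
	size_t n = len < TOY_PKT_SIZE ? len : TOY_PKT_SIZE;

	if (d->npkts == TOY_MAX_PKTS)
		return -1;
	memset(d->pkts[d->npkts], 0, TOY_PKT_SIZE);
	memcpy(d->pkts[d->npkts], buf, n);
	d->npkts++;
	return (long)n;
}

/* writev() as a loop of write(); a short write on a segment ends it. */
static long toy_writev(struct toy_dev *d, const struct iovec *iov, int iovcnt)
{
	long total = 0;

	for (int i = 0; i < iovcnt; i++) {
		long n = toy_write(d, iov[i].iov_base, iov[i].iov_len);

		if (n < 0)
			return total ? total : n;
		total += n;
		if ((size_t)n < iov[i].iov_len)
			break;	/* short write: remaining segments untouched */
	}
	return total;
}
```

In this model, two 2-byte segments passed to toy_writev() produce two packets, while toy_write() on the concatenated 4 bytes produces one; both report 4 bytes consumed, so the caller cannot tell the difference from the return value alone.
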
Example: snd_pcm_write() vs. snd_pcm_writev(). Not a chance to
retire that one any time soon, and the difference in semantics is
that writev() is "feed several channels at once; the chunks for
individual channels are covered by elements of iovec array".
Worse one: qib_write() and qib_write_iter(). There we flat-out
have different command sets for write() and for writev(). That,
at least, might be possible to retire someday.
IIRC, for pcm the readv() vs. read() differences are the same as for
writev() vs. write() - parallel reads from different channels,
each to its own iovec.
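
The pcm-style split above can be modelled roughly as follows. This is a toy, not the ALSA code: toy_pcm, the two-channel limit and both functions are hypothetical, and treating the write() buffer as interleaved frames is an assumption made for the sake of the example. The only part taken from the description above is that writev() gets one iovec element per channel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define TOY_CHANNELS 2
#define TOY_MAX_FRAMES 16

/* Hypothetical sink: per-channel sample storage. */
struct toy_pcm {
	int16_t chan[TOY_CHANNELS][TOY_MAX_FRAMES];
	size_t frames;
};

/* write(): ASSUMED layout is interleaved frames, c0 c1 c0 c1 ... */
static long toy_pcm_write(struct toy_pcm *p, const int16_t *buf,
			  size_t nsamples)
{
	size_t frames = nsamples / TOY_CHANNELS;

	if (frames > TOY_MAX_FRAMES)
		frames = TOY_MAX_FRAMES;
	for (size_t f = 0; f < frames; f++)
		for (int c = 0; c < TOY_CHANNELS; c++)
			p->chan[c][f] = buf[f * TOY_CHANNELS + c];
	p->frames = frames;
	return (long)frames;
}

/* writev(): one iovec element per channel, all of equal length. */
static long toy_pcm_writev(struct toy_pcm *p, const struct iovec *iov,
			   int iovcnt)
{
	size_t frames;

	if (iovcnt != TOY_CHANNELS)
		return -1;
	frames = iov[0].iov_len / sizeof(int16_t);
	if (frames > TOY_MAX_FRAMES)
		return -1;
	for (int c = 0; c < iovcnt; c++) {
		const int16_t *src = iov[c].iov_base;

		if (iov[c].iov_len / sizeof(int16_t) != frames)
			return -1;
		for (size_t f = 0; f < frames; f++)
			p->chan[c][f] = src[f];
	}
	p->frames = frames;
	return (long)frames;
}
```

Feeding the segments {1, 2} and {10, 20} through toy_pcm_writev() puts 1, 2 on channel 0 and 10, 20 on channel 1. Feeding the concatenation {1, 2, 10, 20} through toy_pcm_write() puts 1, 10 on channel 0 and 2, 20 on channel 1, which is why the two entry points are not interchangeable.
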
It's a bad userland ABI design, but we are stuck with it - it's a couple
of decades too late to change.