* Read/write counts
@ 2007-06-04 10:20 David H. Lynch Jr.
2007-06-04 16:33 ` Andreas Dilger
0 siblings, 1 reply; 8+ messages in thread
From: David H. Lynch Jr. @ 2007-06-04 10:20 UTC (permalink / raw)
To: linux-fsdevel
I have a file system that has really odd blocking.
All files have a variable-length header (basically a directory
entry) at their start.
Most, but not all, sectors have a small fixed-length signature as
well as some link data at their start.
The net result is that the implementation would be simpler if I could
just read/write the amount of data
that can be done with the least amount of work, even if that is less
than was requested.
If I receive a request to read 512 bytes, and I return that I have
read 486, is the OS, libc, or something else
going to treat that as an error, or will they come back for the
rest in a subsequent call?
I thought I recalled that read()/write() returning a count less than
requested is not an error.
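The behaviour being asked about can be sketched in user space. This is only an illustration under stated assumptions, not the poster's file system: it assumes every 512-byte sector carries a 26-byte signature/link prefix (inferred from the 512 versus 486 figures above, and ignoring both the variable-length file header and the sectors that have no prefix). The names odd_read, SECTOR_SIZE, SECTOR_HDR and SECTOR_PAYLOAD are hypothetical. The function stops at the end of the current sector's payload, so it may return fewer bytes than requested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical layout: each 512-byte sector starts with 26 bytes of
 * signature and link data, leaving 486 bytes of file payload. */
enum {
    SECTOR_SIZE    = 512,
    SECTOR_HDR     = 26,
    SECTOR_PAYLOAD = SECTOR_SIZE - SECTOR_HDR
};

/*
 * Copy file data starting at logical offset `pos` out of the raw image
 * `disk`, which holds `nsectors` sectors. The copy never crosses a sector
 * boundary, so the result may be smaller than `count`. Returns 0 when
 * `pos` is at or past the end of the data, or when `count` is 0.
 */
static ssize_t odd_read(const unsigned char *disk, size_t nsectors,
                        size_t pos, void *buf, size_t count)
{
    assert(disk != NULL && buf != NULL);

    size_t sector = pos / SECTOR_PAYLOAD;   /* which sector holds `pos` */
    size_t within = pos % SECTOR_PAYLOAD;   /* offset inside its payload */

    if (sector >= nsectors || count == 0)
        return 0;

    size_t avail = SECTOR_PAYLOAD - within; /* payload left in this sector */
    size_t n = count < avail ? count : avail;

    memcpy(buf, disk + sector * SECTOR_SIZE + SECTOR_HDR + within, n);
    return (ssize_t)n;
}
```

With this layout, a 512-byte request at offset 0 returns 486, and the caller is expected to ask again at offset 486 for the remainder.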
* Re: Read/write counts
2007-06-04 10:20 Read/write counts David H. Lynch Jr.
@ 2007-06-04 16:33 ` Andreas Dilger
2007-06-04 16:56 ` Bryan Henderson
0 siblings, 1 reply; 8+ messages in thread
From: Andreas Dilger @ 2007-06-04 16:33 UTC (permalink / raw)
To: David H. Lynch Jr.; +Cc: linux-fsdevel
On Jun 04, 2007 06:20 -0400, David H. Lynch Jr. wrote:
> The net result is that the implementation would be simpler if I could
> just read/write the amount of data that can be done with the least
> amount of work, even if that is less than was requested.
>
> If I receive a request to read 512 bytes, and I return that I have read
> 486, is the OS, libc, or something else going to treat that as an
> error, or will they come back for the rest in a subsequent call?
>
> I thought I recalled that read()/write() returning a count less than
> requested is not an error.
It is not strictly an error to read/write less than the requested amount,
but you will find that a lot of applications don't handle this correctly.
They assume that if the amount read/written != the amount requested,
it is an error. Of course the opposite is also true - some
applications assume that the amount requested == the amount read/written
and don't even check whether that is actually the case.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
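For reference, an application that copes with short counts usually wraps read(2) and write(2) in loops along these lines. This is a minimal sketch, not code from any application named in the thread; read_full and write_full are hypothetical helper names. It treats a short count as "call again", retries on EINTR, and reports only real errors.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read until `count` bytes have arrived, EOF is hit, or an error occurs.
 * Returns the number of bytes read (less than `count` only at EOF),
 * or -1 with errno set. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    assert(buf != NULL);
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n == 0)
            break;                  /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before any data: retry */
            return -1;
        }
        done += (size_t)n;          /* short count: loop for the rest */
    }
    return (ssize_t)done;
}

/* Write all `count` bytes unless an error occurs.
 * Returns `count` on success, or -1 with errno set. */
static ssize_t write_full(int fd, const void *buf, size_t count)
{
    assert(buf != NULL);
    const char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = write(fd, p + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

An application written this way works the same whether the file system returns 486 or 512 bytes per call. The two failure modes described above correspond to comparing a single read() result against `count`, and to not looking at the result at all.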
* Re: Read/write counts
2007-06-04 16:33 ` Andreas Dilger
@ 2007-06-04 16:56 ` Bryan Henderson
2007-06-04 17:02 ` Matthew Wilcox
0 siblings, 1 reply; 8+ messages in thread
From: Bryan Henderson @ 2007-06-04 16:56 UTC (permalink / raw)
To: Andreas Dilger; +Cc: David H. Lynch Jr., linux-fsdevel
>It is not strictly an error to read/write less than the requested amount,
>but you will find that a lot of applications don't handle this correctly.
I'd give it a slightly different nuance. It's not an error, and it's a
reasonable thing to do, but there is value in not doing it. POSIX and its
predecessors back to the beginning of Unix say read()/write() don't have
to transfer the full count (they must transfer at least one byte). The
main reason for this choice is that it may require more resources (e.g. a
memory buffer) than the system can allocate to do the whole request at
once.
Programs that assume a full transfer are fairly common, but are
universally regarded as either broken or just lazy, and when one does cause
a problem, it is far more common to fix the application than the kernel.
Most application programs access files via libc's fread/fwrite, which
don't have partial transfers. GNU libc does handle partial (kernel) reads
and writes correctly. I'd be surprised if someone can name a major
application that doesn't.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
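The point about fread() can be observed with a pipe whose writer delivers the data in two bursts. The sketch below is only a demonstration under assumptions: fread_across_bursts is a hypothetical helper, and the 100 ms pause is an arbitrary way of making it likely that the underlying read(2) sees the first half alone. A single fread() call is still expected to report the full count, because stdio keeps reading until the request is satisfied or EOF or an error occurs.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork a writer that sends `msg` (`len` bytes) down a pipe in two bursts,
 * then collect it in the parent with one fread() call.
 * Returns the byte count reported by fread(), or 0 on setup failure. */
static size_t fread_across_bursts(const char *msg, size_t len, char *out)
{
    int fds[2];

    assert(msg != NULL && out != NULL);
    if (pipe(fds) != 0)
        return 0;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }
    if (pid == 0) {
        /* Child: first half now, second half about 100 ms later. */
        struct timespec pause = { 0, 100 * 1000 * 1000 };
        size_t half = len / 2;

        close(fds[0]);
        if (write(fds[1], msg, half) < 0)
            _exit(1);
        nanosleep(&pause, NULL);
        if (write(fds[1], msg + half, len - half) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: wrap the read end in a stdio stream. */
    close(fds[1]);
    size_t got = 0;
    FILE *fp = fdopen(fds[0], "r");
    if (fp != NULL) {
        got = fread(out, 1, len, fp);   /* one stdio call for all bytes */
        fclose(fp);
    } else {
        close(fds[0]);
    }
    waitpid(pid, NULL, 0);
    return got;
}
```

Calling fread_across_bursts("0123456789", 10, out) should return 10 even though the kernel most likely handed the data over in two pieces.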
* Re: Read/write counts
2007-06-04 16:56 ` Bryan Henderson
@ 2007-06-04 17:02 ` Matthew Wilcox
2007-06-04 18:33 ` Theodore Tso
0 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2007-06-04 17:02 UTC (permalink / raw)
To: Bryan Henderson; +Cc: Andreas Dilger, David H. Lynch Jr., linux-fsdevel
On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote:
> Programs that assume a full transfer are fairly common, but are
> universally regarded as either broken or just lazy, and when it does cause
> a problem, it is far more common to fix the application than the kernel.
Linus has explicitly forbidden short reads from being returned. The
original poster may get away with it for a specialised case, but for
example, signals may not cause a return to userspace with a short read
for exactly this reason.
* Re: Read/write counts
2007-06-04 17:02 ` Matthew Wilcox
@ 2007-06-04 18:33 ` Theodore Tso
2007-06-04 18:57 ` Roman Zippel
0 siblings, 1 reply; 8+ messages in thread
From: Theodore Tso @ 2007-06-04 18:33 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Bryan Henderson, Andreas Dilger, David H. Lynch Jr.,
linux-fsdevel
On Mon, Jun 04, 2007 at 11:02:23AM -0600, Matthew Wilcox wrote:
> On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote:
> > Programs that assume a full transfer are fairly common, but are
> > universally regarded as either broken or just lazy, and when it does cause
> > a problem, it is far more common to fix the application than the kernel.
>
> Linus has explicitly forbidden short reads from being returned. The
> original poster may get away with it for a specialised case, but for
> example, signals may not cause a return to userspace with a short read
> for exactly this reason.
Hmm, I'm not sure I would go that far. Per the POSIX specification,
we support the optional BSD-style restartable system calls for signals
which will avoid short reads; but this is only true if SA_RESTART is
passed to sigaction(). Without SA_RESTART, we will indeed return
short reads, as required by POSIX.
I don't think Linus has said that short reads are always evil; I
certainly can't remember him ever making that statement. Do you have
a pointer to a LKML message where he's said that?
- Ted
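The effect of SA_RESTART described above can be seen from user space with a read() that blocks on an empty pipe. This is a sketch under assumptions, not kernel code: interrupted_read, on_alarm and wake_fd are hypothetical names, and the 100 ms timer is arbitrary. Because no data has been transferred when the signal arrives, the non-restarting case shows up as -1 with EINTR rather than as a short count. The handler writes one byte into the pipe so that a restarted read() has something to return.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;    /* write end of the pipe, used by the handler */

/* SIGALRM handler: make one byte available for a restarted read(). */
static void on_alarm(int signo)
{
    (void)signo;
    int saved = errno;
    ssize_t r = write(wake_fd, "x", 1);   /* write() is async-signal-safe */
    (void)r;
    errno = saved;
}

/*
 * Block in read() on an empty pipe and let SIGALRM arrive ~100 ms later.
 * Returns what read() returned; *err receives errno if read() failed.
 */
static ssize_t interrupted_read(int use_restart, int *err)
{
    int fds[2];
    char c;
    struct sigaction sa, old;
    struct itimerval tv;

    assert(err != NULL);
    *err = 0;
    if (pipe(fds) != 0) {
        *err = errno;
        return -1;
    }
    wake_fd = fds[1];

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_restart ? SA_RESTART : 0;
    sigaction(SIGALRM, &sa, &old);

    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_usec = 100000;         /* one-shot timer, 100 ms */
    setitimer(ITIMER_REAL, &tv, NULL);

    ssize_t n = read(fds[0], &c, 1);      /* blocks until the signal */
    if (n < 0)
        *err = errno;

    sigaction(SIGALRM, &old, NULL);       /* restore previous disposition */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

On Linux, interrupted_read(0, &err) is expected to return -1 with err set to EINTR, while interrupted_read(1, &err) is expected to return 1 because the call is restarted after the handler runs.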
* Re: Read/write counts
2007-06-04 18:33 ` Theodore Tso
@ 2007-06-04 18:57 ` Roman Zippel
2007-06-04 19:24 ` Joel Becker
2007-06-04 20:00 ` Theodore Tso
0 siblings, 2 replies; 8+ messages in thread
From: Roman Zippel @ 2007-06-04 18:57 UTC (permalink / raw)
To: Theodore Tso
Cc: Matthew Wilcox, Bryan Henderson, Andreas Dilger,
David H. Lynch Jr., linux-fsdevel
Hi,
On Mon, 4 Jun 2007, Theodore Tso wrote:
> Hmm, I'm not sure I would go that far. Per the POSIX specification,
> we support the optional BSD-style restartable system calls for signals
> which will avoid short reads; but this is only true if SA_RESTART is
> passed to sigaction(). Without SA_RESTART, we will indeed return
> short reads, as required by POSIX.
>
> I don't think Linus has said that short reads are always evil; I
> certainly can't remember him ever making that statement. Do you have
> a pointer to a LKML message where he's said that?
That's the last discussion about signals and I/O I can remember:
http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html
bye, Roman
* Re: Read/write counts
2007-06-04 18:57 ` Roman Zippel
@ 2007-06-04 19:24 ` Joel Becker
2007-06-04 20:00 ` Theodore Tso
1 sibling, 0 replies; 8+ messages in thread
From: Joel Becker @ 2007-06-04 19:24 UTC (permalink / raw)
To: Roman Zippel
Cc: Theodore Tso, Matthew Wilcox, Bryan Henderson, Andreas Dilger,
David H. Lynch Jr., linux-fsdevel
On Mon, Jun 04, 2007 at 08:57:16PM +0200, Roman Zippel wrote:
> On Mon, 4 Jun 2007, Theodore Tso wrote:
>
> > Hmm, I'm not sure I would go that far. Per the POSIX specification,
> > we support the optional BSD-style restartable system calls for signals
> > which will avoid short reads; but this is only true if SA_RESTART is
> > passed to sigaction(). Without SA_RESTART, we will indeed return
> > short reads, as required by POSIX.
> >
> > I don't think Linus has said that short reads are always evil; I
> > certainly can't remember him ever making that statement. Do you have
> > a pointer to a LKML message where he's said that?
>
> That's the last discussion about signals and I/O I can remember:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html
He said 'disk read', not 'read(2)'. I'd expect he means certain
things like stat(2) and readdir(2) when they have to go to disk.
read(2) explicitly lists EINTR as a valid result, and often folks use
signals to interrupt read(2). The world certainly writes programs
to expect short read(2).
Joel
--
"Gone to plant a weeping willow
On the bank's green edge it will roll, roll, roll.
Sing a lullaby beside the waters.
Lovers come and go, the river roll, roll, rolls."
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
* Re: Read/write counts
2007-06-04 18:57 ` Roman Zippel
2007-06-04 19:24 ` Joel Becker
@ 2007-06-04 20:00 ` Theodore Tso
1 sibling, 0 replies; 8+ messages in thread
From: Theodore Tso @ 2007-06-04 20:00 UTC (permalink / raw)
To: Roman Zippel
Cc: Matthew Wilcox, Bryan Henderson, Andreas Dilger,
David H. Lynch Jr., linux-fsdevel
On Mon, Jun 04, 2007 at 08:57:16PM +0200, Roman Zippel wrote:
> That's the last discussion about signals and I/O I can remember:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html
Well, I think Linus was saying that we have to do both (where the
signal interrupts and where it doesn't), and I agree with that:
    There are enough reasons to discourage people from using uninterruptible
    sleep ("this f*cking application won't die when the network goes down")
    that I don't think this is an issue. We need to handle both cases, and
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    while we can expand on the two cases we have now, we can't remove them.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fortunately, although the -ERESTARTSYS framework is a little awkward
(and people can shoot arrows at me for creating it 15 years ago :-), we
do have a way of supporting both styles without _too_ much pain.
- Ted