linux-fsdevel.vger.kernel.org archive mirror
* Read/write counts
@ 2007-06-04 10:20 David H. Lynch Jr.
  2007-06-04 16:33 ` Andreas Dilger
  0 siblings, 1 reply; 8+ messages in thread
From: David H. Lynch Jr. @ 2007-06-04 10:20 UTC (permalink / raw)
  To: linux-fsdevel


    I have a file system that has really odd blocking.

    All files have a variable-length header (basically a directory
entry) at their start.
    Most, but not all, sectors have a small fixed-length signature as
well as some link data at their start.

    The net result is that the implementation would be simpler if I could
just read/write the amount of data
    that can be done with the least amount of work, even if that is less
than was requested.

    If I receive a request to read 512 bytes, and I return that I have
read 486, is either the OS, libc, or something else
    going to treat that as an error, or are they coming back for the
rest in a subsequent call?

    I thought I recalled that read()/write() returning a count less than
requested is not an error.
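
To make the question concrete, here is a minimal user-space sketch of the
"least work" policy: each call serves only the bytes left in the current
sector's payload. The layout is an assumption invented for illustration
(512-byte sectors, each starting with a 26-byte signature/link header, which
leaves the 486 payload bytes of the example above). The names SECTOR_SIZE,
SECTOR_HDR, short_read_len() and calls_needed() are hypothetical, and the
variable-length file header is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, for illustration only: every sector starts with a
 * fixed-size signature/link header followed by file data. */
#define SECTOR_SIZE 512u
#define SECTOR_HDR   26u
#define SECTOR_DATA (SECTOR_SIZE - SECTOR_HDR)   /* 486 payload bytes */

/* Bytes a "least work" read would return for a request of `count` bytes
 * at logical file offset `pos`: never more than what is left in the
 * current sector's payload. */
static size_t short_read_len(size_t pos, size_t count)
{
    size_t in_sector = pos % SECTOR_DATA;        /* offset inside the payload */
    size_t left = SECTOR_DATA - in_sector;       /* payload bytes remaining */
    return count < left ? count : left;
}

/* Number of read() calls a caller that loops until done would need. */
static unsigned calls_needed(size_t pos, size_t count)
{
    unsigned calls = 0;

    while (count > 0) {
        size_t n = short_read_len(pos, count);
        assert(n > 0);                           /* every call makes progress */
        pos += n;
        count -= n;
        calls++;
    }
    return calls;
}
```

Under these assumptions a 512-byte request at offset 0 is satisfied by two
calls (486 bytes, then 26), provided the caller loops.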
   
   

* Re: Read/write counts
  2007-06-04 10:20 Read/write counts David H. Lynch Jr.
@ 2007-06-04 16:33 ` Andreas Dilger
  2007-06-04 16:56   ` Bryan Henderson
  0 siblings, 1 reply; 8+ messages in thread
From: Andreas Dilger @ 2007-06-04 16:33 UTC (permalink / raw)
  To: David H. Lynch Jr.; +Cc: linux-fsdevel

On Jun 04, 2007  06:20 -0400, David H. Lynch Jr. wrote:
> The net result is that the implementation would be simpler if I could
> just read/write the amount of data that can be done with the least
> amount of work, even if that is less than was requested.
> 
> If I receive a request to read 512 bytes, and I return that I have read
> 486, is either the OS, libc, or something else going to treat that as an
> error, or are they coming back for the rest in a subsequent call?
> 
> I thought I recalled that read()/write() returning a count less than
> requested is not an error.

It is not strictly an error to read/write less than the requested amount,
but you will find that a lot of applications don't handle this correctly.
They will assume that if the amount read/written is != amount requested
that this is an error.  Of course the opposite is also true - some
applications assume that the amount requested == amount read/written and
don't even check whether that is actually the case or not.
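
For comparison, a minimal sketch of what an application that handles this
correctly might do: loop until the requested amount has arrived, end of file
is reached, or a real error occurs. The helper name read_full() is
hypothetical; only read(2), errno and EINTR are standard.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read up to `count` bytes, looping over short reads.
 * Returns the number of bytes read (less than `count` only at end of
 * file or after a late error), or -1 with errno set if an error
 * occurred before any data was read. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL || count == 0);
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);

        if (n > 0) {                 /* possibly short: keep going */
            done += (size_t)n;
            continue;
        }
        if (n == 0)                  /* end of file */
            break;
        if (errno == EINTR)          /* interrupted before any data moved */
            continue;
        return done > 0 ? (ssize_t)done : -1;   /* real error */
    }
    return (ssize_t)done;
}
```

A caller then compares the return value with the requested count only to
detect end of file, not to detect an error.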

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


* Re: Read/write counts
  2007-06-04 16:33 ` Andreas Dilger
@ 2007-06-04 16:56   ` Bryan Henderson
  2007-06-04 17:02     ` Matthew Wilcox
  0 siblings, 1 reply; 8+ messages in thread
From: Bryan Henderson @ 2007-06-04 16:56 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: David H. Lynch Jr., linux-fsdevel

>It is not strictly an error to read/write less than the requested amount,
>but you will find that a lot of applications don't handle this correctly.

I'd give it a slightly different nuance.  It's not an error, and it's a
reasonable thing to do, but there is value in not doing it.  POSIX and its
predecessors back to the beginning of Unix say read()/write() don't have
to transfer the full count (they must transfer at least one byte).  The
main reason for this choice is that it may require more resources (e.g. a
memory buffer) than the system can allocate to do the whole request at
once.

Programs that assume a full transfer are fairly common, but are 
universally regarded as either broken or just lazy, and when it does cause 
a problem, it is far more common to fix the application than the kernel.
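
On the write side, the application-level fix is usually a small loop that
resumes after a partial transfer. The following is a sketch only; the helper
name write_all() is hypothetical, and it assumes write(2) never returns 0
for a non-zero count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all `count` bytes, resuming after short writes.
 * Returns 0 on success, -1 with errno set on error (some bytes may
 * already have been written in that case). */
static int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    assert(buf != NULL || count == 0);
    while (count > 0) {
        ssize_t n = write(fd, p, count);

        if (n < 0) {
            if (errno == EINTR)      /* interrupted before any data moved */
                continue;
            return -1;               /* real error */
        }
        p += n;                      /* skip what was accepted */
        count -= (size_t)n;
    }
    return 0;
}
```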

Most application programs access files via libc's fread/fwrite, which 
don't have partial transfers.  GNU libc does handle partial (kernel) reads 
and writes correctly.  I'd be surprised if someone can name a major 
application that doesn't.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems


* Re: Read/write counts
  2007-06-04 16:56   ` Bryan Henderson
@ 2007-06-04 17:02     ` Matthew Wilcox
  2007-06-04 18:33       ` Theodore Tso
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2007-06-04 17:02 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: Andreas Dilger, David H. Lynch Jr., linux-fsdevel

On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote:
> Programs that assume a full transfer are fairly common, but are 
> universally regarded as either broken or just lazy, and when it does cause 
> a problem, it is far more common to fix the application than the kernel.

Linus has explicitly forbidden short reads from being returned.  The
original poster may get away with it for a specialised case, but for
example, signals may not cause a return to userspace with a short read
for exactly this reason.

* Re: Read/write counts
  2007-06-04 17:02     ` Matthew Wilcox
@ 2007-06-04 18:33       ` Theodore Tso
  2007-06-04 18:57         ` Roman Zippel
  0 siblings, 1 reply; 8+ messages in thread
From: Theodore Tso @ 2007-06-04 18:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Bryan Henderson, Andreas Dilger, David H. Lynch Jr.,
	linux-fsdevel

On Mon, Jun 04, 2007 at 11:02:23AM -0600, Matthew Wilcox wrote:
> On Mon, Jun 04, 2007 at 09:56:07AM -0700, Bryan Henderson wrote:
> > Programs that assume a full transfer are fairly common, but are 
> > universally regarded as either broken or just lazy, and when it does cause 
> > a problem, it is far more common to fix the application than the kernel.
> 
> Linus has explicitly forbidden short reads from being returned.  The
> original poster may get away with it for a specialised case, but for
> example, signals may not cause a return to userspace with a short read
> for exactly this reason.

Hmm, I'm not sure I would go that far.  Per the POSIX specification,
we support the optional BSD-style restartable system calls for signals
which will avoid short reads; but this is only true if SA_RESTART is
passed to sigaction().  Without SA_RESTART, we will indeed return
short reads, as required by POSIX.

I don't think Linus has said that short reads are always evil; I
certainly can't remember him ever making that statement.  Do you have
a pointer to a LKML message where he's said that?

							- Ted

* Re: Read/write counts
  2007-06-04 18:33       ` Theodore Tso
@ 2007-06-04 18:57         ` Roman Zippel
  2007-06-04 19:24           ` Joel Becker
  2007-06-04 20:00           ` Theodore Tso
  0 siblings, 2 replies; 8+ messages in thread
From: Roman Zippel @ 2007-06-04 18:57 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Matthew Wilcox, Bryan Henderson, Andreas Dilger,
	David H. Lynch Jr., linux-fsdevel

Hi,

On Mon, 4 Jun 2007, Theodore Tso wrote:

> Hmm, I'm not sure I would go that far.  Per the POSIX specification,
> we support the optional BSD-style restartable system calls for signals
> which will avoid short reads; but this is only true if SA_RESTART is
> passed to sigaction().  Without SA_RESTART, we will indeed return
> short reads, as required by POSIX.
> 
> I don't think Linus has said that short reads are always evil; I
> certainly can't remember him ever making that statement.  Do you have
> a pointer to a LKML message where he's said that?

That's the last discussion about signals and I/O I can remember:
http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

bye, Roman

* Re: Read/write counts
  2007-06-04 18:57         ` Roman Zippel
@ 2007-06-04 19:24           ` Joel Becker
  2007-06-04 20:00           ` Theodore Tso
  1 sibling, 0 replies; 8+ messages in thread
From: Joel Becker @ 2007-06-04 19:24 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Theodore Tso, Matthew Wilcox, Bryan Henderson, Andreas Dilger,
	David H. Lynch Jr., linux-fsdevel

On Mon, Jun 04, 2007 at 08:57:16PM +0200, Roman Zippel wrote:
> On Mon, 4 Jun 2007, Theodore Tso wrote:
> 
> > Hmm, I'm not sure I would go that far.  Per the POSIX specification,
> > we support the optional BSD-style restartable system calls for signals
> > which will avoid short reads; but this is only true if SA_RESTART is
> > passed to sigaction().  Without SA_RESTART, we will indeed return
> > short reads, as required by POSIX.
> > 
> > I don't think Linus has said that short reads are always evil; I
> > certainly can't remember him ever making that statement.  Do you have
> > a pointer to a LKML message where he's said that?
> 
> That's the last discussion about signals and I/O I can remember:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

	He said 'disk read', not 'read(2)'.  I'd expect he means certain
things like stat(2) and readdir(2) when they have to go to disk.
read(2) explicitly lists EINTR as a valid result, and often folks use
signals to interrupt read(2).  The world certainly writes programs
to expect short read(2).

Joel

-- 

"Gone to plant a weeping willow
 On the bank's green edge it will roll, roll, roll.
 Sing a lullaby beside the waters.
 Lovers come and go, the river roll, roll, rolls."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

* Re: Read/write counts
  2007-06-04 18:57         ` Roman Zippel
  2007-06-04 19:24           ` Joel Becker
@ 2007-06-04 20:00           ` Theodore Tso
  1 sibling, 0 replies; 8+ messages in thread
From: Theodore Tso @ 2007-06-04 20:00 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Matthew Wilcox, Bryan Henderson, Andreas Dilger,
	David H. Lynch Jr., linux-fsdevel

On Mon, Jun 04, 2007 at 08:57:16PM +0200, Roman Zippel wrote:
> That's the last discussion about signals and I/O I can remember:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0208.0/0188.html

Well, I think Linus was saying that we have to do both (where the
signal interrupts and where it doesn't), and I agree with that:

  There are enough reasons to discourage people from using uninterruptible
  sleep ("this f*cking application won't die when the network goes down")
  that I don't think this is an issue. We need to handle both cases, and
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  while we can expand on the two cases we have now, we can't remove them. 
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Fortunately, although the -ERESTARTSYS framework is a little awkward
(and people can shoot arrows at me for creating it 15 years ago :-), we
do have a way of supporting both styles without _too_ much pain.

							- Ted

