public inbox for linux-kernel@vger.kernel.org
* O_NONBLOCK is NOOP on block devices
@ 2010-03-03  8:26 Mike Hayward
  2010-03-03 11:50 ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Hayward @ 2010-03-03  8:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: kasey.erickson

I'm not sure who is working on block io these days, but hopefully an
active developer can steer this feedback toward folks who are as
interested in io performance as I am :-)

I've spent the last several years or so developing a user space
distributed storage system and I've recently gotten down to some io
performance tuning.  Surprisingly, my results indicate that the
O_NONBLOCK flag produces no noticeable effect on read or writev to a
Linux block device.  I always perform aligned ios which are a multiple
of the sector size which also allows the use of O_DIRECT if desired.
For testing, I've been using 2.6.22 and 2.6.24 kernels (fedora core
and ubuntu distros) on both x86_64 and 32 bit arm architectures and
get similar results on every variation of hardware and kernel tested,
so I figure the behavior may still exist in the most recent kernels.

To collect the data below, I used the following set of system calls
in a loop driven by poll, timing each read and write call immediately
before and after it.

fd = open( filename, O_RDWR | O_NONBLOCK | O_NOATIME );
gettimeofday( &time, 0 );
read( fd, pos, len );
writev( fd, iov, count );
poll( pfd, npfd, timeoutms );

Byte counts are displayed in hex.  On my core 2 duo laptop, for
example, io to or from the buffer cache typically takes 100 to 125
microseconds to transfer 64k.
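
The kind of timing wrapper described above could look roughly like the
following.  This is a minimal sketch, not the original test program:
the helper name timed_read is hypothetical, and "remain" is assumed to
mean the bytes requested but not transferred by the call.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: time a single read() call and log it in the
 * same style as the traces in this message (byte counts in hex).
 * Returns read()'s result; the elapsed seconds go to *elapsed. */
static ssize_t timed_read(int fd, void *buf, size_t len, double *elapsed)
{
    struct timeval t0, t1;

    assert(buf != NULL && elapsed != NULL);

    gettimeofday(&t0, NULL);
    ssize_t n = read(fd, buf, len);
    gettimeofday(&t1, NULL);

    *elapsed = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_usec - t0.tv_usec) / 1e6;

    if (n >= 0)
        printf("read  fd:%d %.6fs bytes:%zx remain:%zx\n",
               fd, *elapsed, (size_t)n, len - (size_t)n);
    return n;
}
```

The descriptor would be opened with O_NONBLOCK as in the open() call
above; a matching wrapper around writev() follows the same pattern.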

----------------------------------------------------------------------
BUFFER CACHE NOT FULL, NONBLOCKING 64K WRITES AS EXPECTED

write fd:3 0.000117s bytes:10000 remain:0
write fd:3 0.000115s bytes:10000 remain:0
write fd:3 0.000116s bytes:10000 remain:0
write fd:3 0.000118s bytes:10000 remain:0
write fd:3 0.000125s bytes:10000 remain:0
write fd:3 0.000126s bytes:10000 remain:0
write fd:3 0.000101s bytes:10000 remain:0

----------------------------------------------------------------------
READING AND WRITING, BUFFER CACHE FULL

read  fd:3 0.006351s bytes:10000 remain:0
write fd:3 0.001235s bytes:200   remain:0
write fd:3 0.002477s bytes:200   remain:0
read  fd:3 0.005010s bytes:10000 remain:0
write fd:3 0.001243s bytes:200   remain:0
read  fd:3 0.005028s bytes:10000 remain:0
write fd:3 0.000506s bytes:200   remain:0
write fd:3 0.000106s bytes:10000 remain:0
write fd:3 0.000812s bytes:200   remain:0
write fd:3 0.000108s bytes:10000 remain:0
write fd:3 0.000807s bytes:200   remain:0
write fd:3 0.002652s bytes:200   remain:0
write fd:3 0.000107s bytes:10000 remain:0
write fd:3 0.000141s bytes:10000 remain:0
write fd:3 0.002232s bytes:200   remain:0

These are not worst-case, but rather best-case results.  As an
example closer to the worst case, using a usb flash device,
frequently (about once a second or so) under heavier load I see reads
or writes blocked for 500ms or more when vmstat and top report more
than 90% idle / wait.  500ms to perform a 512 byte "non blocking" io
with a nearly idle cpu is an eternity in computer time; more than
10,000 times longer than it should take to memcpy all or even a
portion of the data or return EAGAIN.

I discovered this because, even though they succeed, all of these
"non" blocking system calls block so much that they easily choke my
process's non blocking socket io.  As a work around to this
failed attempt at nonblocking disk io, I now intend to implement a
somewhat more complex solution using aio or scsi generic to prevent
block device io from choking network io.

I think this O_NONBLOCK behavior has aspects that could probably be
classified as both a documentation and a kernel defect depending upon
whether the existing open(2) man page documents the intended behavior
of read and write or not.

If O_NONBLOCK is meaningful whatsoever (see man page docs for
semantics) against block devices, one would expect a nonblocking io
involving an unbuffered page to return either a partial result if a
prefix of the io can be completed immediately, or EAGAIN, schedule an
io against the device, then trigger a blocking select or poll type
call after the relevant page at the offending file descriptor cursor
becomes available in the buffer cache.  The timing and results of each
read or write call speak for themselves.  Specifying O_NONBLOCK does
not convert unbuffered ios to async buffer cache ios as expected;
typically blocking ios (i.e. unbuffered reads or sustained writes to a
full, dirty buffer cache) definitely block in my app, whether or not
O_NONBLOCK is specified.
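
For contrast, the semantics described above are what pipes and sockets
do provide.  The sketch below shows one non blocking read attempt that
reports whether the call would have blocked; the helper name
read_would_block is hypothetical.  On an empty pipe it reports EAGAIN
at once, whereas on a block device the same call simply returns data
after however long the io takes, which is the behavior the timings
above show.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: set O_NONBLOCK on fd and try one read.
 * Returns 1 if the read would have blocked (EAGAIN/EWOULDBLOCK),
 * 0 if it completed (data or EOF), and -1 on any other error. */
static int read_would_block(int fd, void *buf, size_t len)
{
    assert(buf != NULL);

    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    ssize_t n = read(fd, buf, len);
    if (n >= 0)
        return 0;
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? 1 : -1;
}
```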

I've spent a tremendous amount of time building and benchmarking a
program based upon the Linux documentation for the previously
mentioned system calls only to find out the kernel doesn't behave as
specified.  To save someone else from my fate, if O_NONBLOCK doesn't
prevent reads and writes to block devices from blocking, it should be
documented in the man page, and preferably also return an error when
supplied as a flag to open or fcntl for a block device.  That's the
easy solution.  The harder solution would be to make the system calls
actually be non blocking when O_NONBLOCK is specified.

Furthermore, I've noticed these kernels also allow O_NONBLOCK and
O_DIRECT to be simultaneously specified against a block device even
though this is not logically even possible since, by definition, the
buffer cache is not involved and the process will have to wait for the
io to synchronously complete.  This flag incompatibility should
probably be documented for clarity and it would be straight forward
for it to return an error if these contradictory behaviors are
simultaneously specified, unintentionally of course.

Thoughts anyone?

- Mike


* Re: O_NONBLOCK is NOOP on block devices
  2010-03-03  8:26 Mike Hayward
@ 2010-03-03 11:50 ` Alan Cox
  2010-03-03 19:49   ` Mike Hayward
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2010-03-03 11:50 UTC (permalink / raw)
  To: Mike Hayward; +Cc: linux-kernel, kasey.erickson

> If O_NONBLOCK is meaningful whatsoever (see man page docs for
> semantics) against block devices, one would expect a nonblocking io

It isn't...

The manual page says "When possible, the file is opened in non-blocking
mode" . Your write is probably not blocking - but the memory allocation
for it is forcing other data to disk to make room. ie it didn't block it
was just "slow".

O_NONBLOCK on a regular file does influence how it responds to leases and
mandatory locks.

> probably be documented for clarity and it would be straight forward
> for it to return an error if these contradictory behaviors are
> simultaneously specified, unintentionally of course.

and risk breaking existing apps.

> Thoughts anyone?


Alan


* Re: O_NONBLOCK is NOOP on block devices
  2010-03-03 11:50 ` Alan Cox
@ 2010-03-03 19:49   ` Mike Hayward
  2010-03-03 21:25     ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Hayward @ 2010-03-03 19:49 UTC (permalink / raw)
  To: alan; +Cc: linux-kernel, kasey.erickson

Hi Alan,

 > > If O_NONBLOCK is meaningful whatsoever (see man page docs for
 > > semantics) against block devices, one would expect a nonblocking io
 > 
 > It isn't...

Thanks for the reply.  It's good to get confirmation that I am not all
alone in an alternate non blocking universe.  The linux man pages
actually had me convinced O_NONBLOCK would actually keep a process
from blocking on device io :-)

 > The manual page says "When possible, the file is opened in non-blocking
 > mode" . Your write is probably not blocking - but the memory allocation
 > for it is forcing other data to disk to make room. ie it didn't block it
 > was just "slow".

Even though I know quite well what blocking is, I am not sure how we
define "slowness".  Perhaps when we do define it, we can also define
"immediately" to mean anything less than five seconds ;-)

You are correct that io to the disk is precisely what must happen to
complete, and last time I checked, that was the very definition of
blocking.  Not only are writes blocking, even reads are blocking.  The
docs for read(2) also say it will return EAGAIN if "Non-blocking I/O
has been selected using O_NONBLOCK and no data was immediately
available for reading."

There is no doubt the kernel is blocking the process whether or not
O_NONBLOCK is specified.  Look again at the timings I sent; the flag
doesn't affect io at all.  I think we can probably agree that reading
from an empty buffer cache should by definition return EAGAIN within a
few microseconds if it isn't going to block the process.  But it
doesn't.  I can easily make a process "run slowly" for an entire half
of a second or longer just trying to perform a 512 byte "non blocking"
read on a system with a virtually idle cpu.

Writing is no different from reading when the buffer cache cannot
immediately service either kind of request (i.e. all pages are dirty,
writing a page not in the cache, and there is no more free ram).  If a
process can't run while the kernel performs io to a device to service
a writev call, it is by definition blocking said process.  I certainly
concur that blocking is also both slow and not very immediate :-)

Why is blocking io an issue?  As an example, time non blocking reads
to a drive and it takes say 5ms to return from a 64k read.  Run
several processes simultaneously doing the same thing and it takes say
10ms to service each "non blocking" read request.  Do a couple hundred
ios per second in each process and you'll soon find out your processes
(or threads) have nearly zero time at the cpu despite the fact that
the system is virtually idle and you are performing 100% "linux non
blocking" device io.

I've been doing unix io for a very long time and can assure you that
this is precisely why most high performance io applications use
asynchronous io libraries or multiple threads.  It isn't that they are
necessarily compute intensive, but if read and write are going to
block your process, how else can you simultaneously execute ios to
different devices or perform computation while waiting on device io?

----------------------------------------------------------------------
There is currently and quite literally no point in specifying
O_NONBLOCK in Linux when opening a block device to affect anything
other than locking semantics, since it doesn't do anything.
----------------------------------------------------------------------

I'm not arguing that linux either should or should not provide non
blocking read and write calls, but pointing out that the documentation
claims it does when clearly O_NONBLOCK doesn't do anything related to
io, at least not with a block device.  Probably it doesn't do anything
related to read or write against file systems either.

 > > probably be documented for clarity and it would be straight forward
 > > for it to return an error if these contradictory behaviors are
 > > simultaneously specified, unintentionally of course.
 > 
 > and risk breaking existing apps.

Changing anything risks breaking an app somewhere :-) You are right, I
completely agree it isn't appropriate to remove it since its meaning
has been overloaded and it affects locking semantics with O_DIRECT.

Perhaps the man pages are partly derived from POSIX specs and non
blocking read and write calls are where linux eventually wants to be?
Updating the docs to describe its actual behavior as it stands (or
rather, lack thereof) should be fairly low impact on existing apps.

How much effort do you think it would take to build consensus to
update the man pages?  Accurate man pages don't really break code and
should really cut down on a lot of confusion, emails, and wasted
effort going forward.  Do you think we should post a documentation
defect as opposed to a kernel defect?

- Mike


* Re: O_NONBLOCK is NOOP on block devices
  2010-03-03 19:49   ` Mike Hayward
@ 2010-03-03 21:25     ` Alan Cox
  0 siblings, 0 replies; 11+ messages in thread
From: Alan Cox @ 2010-03-03 21:25 UTC (permalink / raw)
  To: Mike Hayward; +Cc: linux-kernel, kasey.erickson

> blocking.  Not only are writes blocking, even reads are blocking.  The
> docs for read(2) also say it will return EAGAIN if "Non-blocking I/O
> has been selected using O_NONBLOCK and no data was immediately
> available for reading."

The read case is more clearly blocking. We don't implement non blocking
disk I/O in that sense, although AIO sort of does and threads are very
cheap for I/O tasks.
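
A minimal sketch of the thread approach, assuming a poll() loop that
also watches the read end of a pipe: a worker thread performs the
blocking pread() and writes one byte to the pipe when it is done.  The
names io_req, io_worker and start_read are hypothetical, and a real
program would likely use a pool of workers rather than one thread per
request.

```c
#include <assert.h>
#include <fcntl.h>      /* open(2) flags, for callers */
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request record shared by the poll loop and a worker. */
struct io_req {
    int     fd;         /* file or block device to read from */
    void   *buf;        /* destination buffer */
    size_t  len;        /* bytes to read */
    off_t   off;        /* offset to read at */
    ssize_t result;     /* pread() result, filled in by the worker */
    int     notify_fd;  /* write end of a pipe watched by poll() */
};

/* Worker: do the possibly slow pread(), then wake the poll loop. */
static void *io_worker(void *arg)
{
    struct io_req *r = arg;
    char done = 1;

    r->result = pread(r->fd, r->buf, r->len, r->off);
    ssize_t w = write(r->notify_fd, &done, 1);
    (void)w;            /* best effort wakeup */
    return NULL;
}

/* Start the read in its own thread; returns 0 on success. */
static int start_read(struct io_req *r, pthread_t *tid)
{
    assert(r != NULL && tid != NULL);
    return pthread_create(tid, NULL, io_worker, r);
}
```

The caller adds the pipe's read end to its pollfd set next to its
sockets, and reads r->result only after the completion byte arrives
and the thread has been joined.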

> There is no doubt the kernel is blocking the process whether or not
> O_NONBLOCK is specified.  Look again at the timings I sent; the flag
> doesn't affect io at all.  I think we can probably agree that reading
> from an empty buffer cache should by definition return EAGAIN within a
> few microseconds if it isn't going to block the process. 

That might make sense in its own way but there would then be no reason
for the I/O ever to complete. Non blocking tends to mean "don't wait for
some external non kernel event" (eg serial data arriving, hitting a
button)

> I've been doing unix io for a very long time and can assure you that
> this is precisely why most high performance io applications use
> asynchronous io libraries or multiple threads.  It isn't that they are
> necessarily compute intensive, but if read and write are going to
> block your process, how else can you simultaneously execute ios to
> different devices or perform computation while waiting on device io?

The big challenge is that you may need to do disk I/O in many situations
you don't expect. Eg to find out which disk block in the cache you want
to see is available might require disk I/O itself.

You would end up with an implementation model in the kernel that was
essentially

	if (O_NDELAY) {
		try_op
		if blocking create thread
	}

which would badly underperform threading it in the first place.

Unix perhaps never got it entirely right, but we inherited that model.
VMS SYS$QIO v SYS$QIOW is a good deal more elegantly structured.

> claims it does when clearly O_NONBLOCK doesn't do anything related to
> io, at least not with a block device.  Probably it doesn't do anything
> related to read or write against file systems either.

Correct - except for things like mandatory locks where it has a real
meaning.

> Perhaps the man pages are partly derived from POSIX specs and non
> blocking read and write calls are where linux eventually wants to be?
> Updating the docs to describe its actual behavior as it stands (or
> rather, lack thereof) should be fairly low impact on existing apps.

I've not read the SuS entries on this for a while. There was some
discussion a while ago on what was needed to create a behaviour where
as soon as something blocked it created a thread that continued to
perform the I/O side and returned an error. It's not an easy problem to
solve and it's not clear that solving it is actually worth it versus
using threads and making sure our thread implementation is fast and has
fast synchronization primitives.

> How much effort do you think it would take to build consensus to
> update the man pages?  Accurate man pages don't really break code and
> should really cut down on a lot of confusion, emails, and wasted
> effort going forward.  Do you think we should post a documentation
> defect as opposed to a kernel defect?

I would go one further... post a documentation patch to:
linux-man@vger.kernel.org for discussion and merging.

Alan


* Re: O_NONBLOCK is NOOP on block devices
       [not found] <4B904D98.50602@gmail.com>
@ 2010-03-05  1:39 ` M vd S
  2010-03-05 16:03   ` Jeff Moyer
  2010-03-10  0:50   ` M vd S
  0 siblings, 2 replies; 11+ messages in thread
From: M vd S @ 2010-03-05  1:39 UTC (permalink / raw)
  To: linux-kernel

> > > If O_NONBLOCK is meaningful whatsoever (see man page docs for
> > > semantics) against block devices, one would expect a nonblocking io
> >
> > It isn't...
>
> Thanks for the reply. It's good to get confirmation that I am not all
> alone in an alternate non blocking universe. The linux man pages
> actually had me convinced O_NONBLOCK would actually keep a process
> from blocking on device io :-)
>

You're even less alone, I'm running into the same issue just now. But I 
think I've found a way around it, see below.

> > The manual page says "When possible, the file is opened in non-blocking
> > mode" . Your write is probably not blocking - but the memory allocation
> > for it is forcing other data to disk to make room. ie it didn't block it
> > was just "slow".
>
> Even though I know quite well what blocking is, I am not sure how we
> define "slowness". Perhaps when we do define it, we can also define
> "immediately" to mean anything less than five seconds ;-)
>
> You are correct that io to the disk is precisely what must happen to
> complete, and last time I checked, that was the very definition of
> blocking. Not only are writes blocking, even reads are blocking. The
> docs for read(2) also say it will return EAGAIN if "Non-blocking I/O
> has been selected using O_NONBLOCK and no data was immediately
> available for reading."
>

The read(2) manpage reads, under NOTES:

"Many file systems and disks were considered to be fast enough that the 
implementation of O_NONBLOCK was deemed unnecessary.  So, O_NONBLOCK may 
not be available on files and/or disks."

The statement ("fast enough") may only reflect the state of affairs
at that time - 10 ms seek time takes an eternity at 3 GHz, and times
100k it takes an eternity IRL as well. I would define "immediately" as
meaning the data is available from kernel (or disk) buffers.

I need to do vast amounts (100k+) of scattered and unordered small reads 
from harddisk and want to keep my seeks short through sorting them. I 
have done some measurements and it seems perfectly possible to derive 
the physical disk layout from statistics on some 10-100k random seeks, 
so I can solve everything in userland. But before writing my own I/O 
scheduler I thought I'd give the kernel and/or SATA's NCQ tricks a shot.

Now the problem is how to tell the kernel/disk which data I want without 
blocking. readv(2) apparently reads the requests in array order.
Multithreading doesn't sound too good for just this purpose.

posix_fadvise(2) sounds like something: "POSIX_FADV_WILLNEED initiates a 
non-blocking read of the specified region into the page cache."
But there's apparently no signalling to the process that an actual
read() will indeed not block.

readahead(2) blocks until the specified data has been read.

aio_read(2) apparently doesn't issue a real non blocking read request,
so you will get the unneeded overhead of one thread per outstanding request.


mmap(2) / madvise(2) / mincore(2) may be a way around things (although 
non-atomic), but I haven't tested it yet. It might also solve the 
problem that started this thread, at least for the reading part of it. 
Writing a small read() like function that operates through mmap() 
doesn't seem too complicated. As for writing, you could use msync() with 
MS_ASYNC to initiate a write. I'm not sure how to find out if a write 
has indeed taken place, but at least initiating a non-blocking write is 
possible. munmap() might then still block.

Maybe some guru here can tell beforehand if such an approach would work?

Cheers,
M.



* Re: O_NONBLOCK is NOOP on block devices
  2010-03-05  1:39 ` O_NONBLOCK is NOOP on block devices M vd S
@ 2010-03-05 16:03   ` Jeff Moyer
  2010-03-05 19:43     ` Mike Hayward
  2010-03-10  0:50   ` M vd S
  1 sibling, 1 reply; 11+ messages in thread
From: Jeff Moyer @ 2010-03-05 16:03 UTC (permalink / raw)
  To: M vd S; +Cc: linux-kernel

M vd S <mvds.00@gmail.com> writes:

>> > > If O_NONBLOCK is meaningful whatsoever (see man page docs for
>> > > semantics) against block devices, one would expect a nonblocking io
>> >
>> > It isn't...
>>
>> Thanks for the reply. It's good to get confirmation that I am not all
>> alone in an alternate non blocking universe. The linux man pages
>> actually had me convinced O_NONBLOCK would actually keep a process
>> from blocking on device io :-)
>>
>
> You're even less alone, I'm running into the same issue just now. But
> I think I've found a way around it, see below.

I guess I should note that I've suggested nonblocking I/O for files
before:

  http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-10/0290.html

I'll also note that enabling such a patch broke apps that accessed cd
burners, for example, since O_NONBLOCK had some preexisting semantics
there that I fail to recall.

Cheers,
Jeff


* Re: O_NONBLOCK is NOOP on block devices
  2010-03-05 16:03   ` Jeff Moyer
@ 2010-03-05 19:43     ` Mike Hayward
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Hayward @ 2010-03-05 19:43 UTC (permalink / raw)
  To: jmoyer; +Cc: mvds.00, linux-kernel

Hi Jeff,

 > I guess I should note that I've suggested nonblocking I/O for files
 > before:
 > 
 >   http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-10/0290.html
 > 
 > I'll also note that enabling such a patch broke apps that accessed cd
 > burners, for example, since O_NONBLOCK had some preexisting semantics
 > there that I fail to recall.

Sounds like nonblocking read/write calls are strongly tied to threads
instead of state related to a file descriptor.  I haven't poked around
in there, but perhaps the current linux io architecture is just too
set in stone to design an efficient non blocking mechanism.  It would
be a shame not to fix it simply because some broken apps depend upon
blocking behavior when they have been explicitly specifying O_NONBLOCK
via open or fcntl.

At least for now we can describe its actual behavior in the man
pages; I will be submitting a man page patch for consideration later
today.

- Mike


* Re: O_NONBLOCK is NOOP on block devices
  2010-03-05  1:39 ` O_NONBLOCK is NOOP on block devices M vd S
  2010-03-05 16:03   ` Jeff Moyer
@ 2010-03-10  0:50   ` M vd S
  2010-03-10 13:21     ` Jeff Moyer
  1 sibling, 1 reply; 11+ messages in thread
From: M vd S @ 2010-03-10  0:50 UTC (permalink / raw)
  To: linux-kernel


> mmap(2) / madvise(2) / mincore(2) may be a way around things (although 
> non-atomic), but I haven't tested it yet. It might also solve the 
> problem that started this thread, at least for the reading part of it. 
> Writing a small read() like function that operates through mmap() 
> doesn't seem too complicated. As for writing, you could use msync() 
> with MS_ASYNC to initiate a write. I'm not sure how to find out if a 
> write has indeed taken place, but at least initiating a non-blocking 
> write is possible. munmap() might then still block.
>

For the record I would like to share my very positive experience with 
the approach described. Thanks to 64 bit addressing you can mmap() an 
entire block device, and madvise() and mincore() work like you would 
expect them to. I haven't tried writing.
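
A rough sketch of such a read()-like function over a mapping, not the
code actually used: the name mmap_read_nowait is hypothetical, the
mapping is assumed to come from mmap(NULL, map_len, PROT_READ,
MAP_SHARED, fd, 0), and EAGAIN is borrowed to mean "not resident yet".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open(2) flags, for callers */
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy len bytes at offset off out of the mapping only if mincore()
 * reports every page in that range as resident.  Otherwise request
 * readahead with madvise() and fail with EAGAIN.  Non-atomic: a page
 * may be evicted between the check and the memcpy(), in which case
 * the copy blocks on a page fault. */
static ssize_t mmap_read_nowait(unsigned char *base, size_t map_len,
                                size_t off, void *dst, size_t len)
{
    assert(base != NULL && dst != NULL);
    if (len == 0 || off > map_len || len > map_len - off) {
        errno = EINVAL;
        return -1;
    }

    size_t page  = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = off - (off % page);      /* round down to a page */
    size_t span  = (off + len) - start;
    size_t pages = (span + page - 1) / page;
    unsigned char vec[64];                  /* residency, 64 pages at a time */

    for (size_t done = 0; done < pages; done += 64) {
        size_t n = pages - done < 64 ? pages - done : 64;

        if (mincore(base + start + done * page, n * page, vec) != 0)
            return -1;
        for (size_t i = 0; i < n; i++) {
            if (!(vec[i] & 1)) {            /* low bit set = resident */
                madvise(base + start, span, MADV_WILLNEED);
                errno = EAGAIN;
                return -1;
            }
        }
    }

    memcpy(dst, base + off, len);
    return (ssize_t)len;
}
```

A caller that gets EAGAIN can go back to its poll() loop and retry
later, since there is no notification when the pages arrive.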

I also briefly tried aio_* and the libaio interface. The former is not 
really asynchronous - all requests are put in one separate thread where 
they will be executed in order, i.e. blocking, so you don't get any 
advantage from NCQ or data that was cached by the disk or the kernel. 
The latter apparently ends in an io_submit() which will block until all 
queued reads are finished, but I might have missed something there.

Imagine the orderly world in which O_NONBLOCK would make syscalls 
actually non-blocking...

Cheers,
M.



* Re: O_NONBLOCK is NOOP on block devices
  2010-03-10  0:50   ` M vd S
@ 2010-03-10 13:21     ` Jeff Moyer
  2010-03-10 17:09       ` M vd S
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Moyer @ 2010-03-10 13:21 UTC (permalink / raw)
  To: M vd S; +Cc: linux-kernel

M vd S <mvds.00@gmail.com> writes:
> I also briefly tried aio_* and the libaio interface. The former is not
> really asynchronous - all requests are put in one separate thread
> where they will be executed in order, i.e. blocking, so you don't get
> any advantage from NCQ or data that was cached by the disk or the
> kernel. The latter apparently ends in an io_submit() which will block
> until all queued reads are finished, but I might have missed something
> there.

What you missed is that the native aio system calls require O_DIRECT.
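
A minimal sketch of one read through the native aio system calls,
using raw syscall() wrappers instead of libaio so that it has no
library dependency.  The helper name aio_read_once is hypothetical.
For the submission to be asynchronous the descriptor should be opened
with O_DIRECT, and buf, len and off then have to meet its alignment
requirements (an aligned buffer from posix_memalign() and sizes that
are a multiple of the device's block size are the usual choice).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>          /* O_DIRECT, for callers */
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Submit a single read and wait for its completion.  Returns the
 * number of bytes read, or -1 with errno set. */
static ssize_t aio_read_once(int fd, void *buf, size_t len, off_t off)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    ssize_t res = -1;

    assert(buf != NULL);
    if (syscall(SYS_io_setup, 1L, &ctx) < 0)
        return -1;

    memset(&cb, 0, sizeof cb);
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_fildes     = (uint32_t)fd;
    cb.aio_buf        = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes     = len;
    cb.aio_offset     = (long long)off;

    /* With O_DIRECT the submit call is expected to return once the
     * request is queued, so other work could be done before the
     * completion is collected with io_getevents. */
    if (syscall(SYS_io_submit, ctx, 1L, list) == 1 &&
        syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL) == 1) {
        if (ev.res >= 0)
            res = (ssize_t)ev.res;
        else
            errno = (int)-ev.res;   /* completion carries -errno */
    }

    syscall(SYS_io_destroy, ctx);
    return res;
}
```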

Cheers,
Jeff


* Re: O_NONBLOCK is NOOP on block devices
  2010-03-10 13:21     ` Jeff Moyer
@ 2010-03-10 17:09       ` M vd S
       [not found]         ` <201003102350.o2ANousd007794@alien.loup.net>
  0 siblings, 1 reply; 11+ messages in thread
From: M vd S @ 2010-03-10 17:09 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On 3/10/10 2:21 PM, Jeff Moyer wrote:
> M vd S <mvds.00@gmail.com> writes:
>   
>> I also briefly tried aio_* and the libaio interface. The former is not
>> really asynchronous - all requests are put in one separate thread
>> where they will be executed in order, i.e. blocking, so you don't get
>> any advantage from NCQ or data that was cached by the disk or the
>> kernel. The latter apparently ends in an io_submit() which will block
>> until all queued reads are finished, but I might have missed something
>> there.
>>     
>
> What you missed is that the native aio system calls require O_DIRECT.
>   

Thanks, that made it work. It seems without O_DIRECT it's just like 
aio_* but without the separate thread. But I now get the "benefits" of 
O_DIRECT for free...

Cheers,
M.



* Re: O_NONBLOCK is NOOP on block devices
       [not found]           ` <4B983E88.5080901@gmail.com>
@ 2010-03-11  7:41             ` Mike Hayward
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Hayward @ 2010-03-11  7:41 UTC (permalink / raw)
  To: mvds.00; +Cc: alan, linux-kernel, linux-man

Hi M,

 > >  > > What you missed is that the native aio system calls require O_DIRECT.
 > >  > >   
 > >  > 
 > >  > Thanks, that made it work. It seems without O_DIRECT it's just like 
 > >  > aio_* but without the separate thread. But I now get the "benefits" of 
 > >  > O_DIRECT for free...
 > >
 > > That is awesome news; I was worried.  I saw that about O_DIRECT in the
 > > doc but assumed you were doing it.
 > >
 > >   
 > Where did you see that? I reverted to the kernel source where indeed I 
 > saw __generic_file_aio_read() in mm/filemap.c check for O_DIRECT.
 >
 > io_submit(3), io_setup(3) etc don't mention O_DIRECT. Even the example 
 > in io(3) doesn't do O_DIRECT, so it must be broken. The example has no 
 > means to see if it is in fact a non blocking system call. But io(3) 
 > states "The  libaio  library defines a new set of I/O operations which 
 > can significantly reduce the time an application spends waiting at I/O.  
 > The new functions allow a program to initiate one or more I/O operations 
 > and then immediately resume normal work while the I/O operations are 
 > executed in parallel."

Not in the linux man pages, but a few folks around have web pages I
was able to google about actual behavior:

http://lse.sourceforge.net/io/aio.html

Like you, I trusted that the man pages actually described the behavior
in my software design until the "nonblocking" read and writev calls
choked off the nonblocking sockets :-)  Now I am writing extra test code
for all system calls I consider using; not sure if there is actual public
test code or not.

You are absolutely right, the blocking behavior of libaio without
specifying O_DIRECT should also definitely be in the man pages.  Why
isn't it an error to omit O_DIRECT when that's the only way libaio
against block devs actually performs async io?  It seems a bit odd to
go to the trouble of using libaio if synchronous behavior is expected?!

You might want to wait and see if my man patch even gets applied
before going to the trouble to make another one.  Alan Cox suggested I
post a patch to spell out the actual behavior of the blocking
"O_NONBLOCK" read and write class of calls.  I did that and a number
of us vetted the patch before I posted it to linux-man like a week
ago, but no feedback there from Michael Kerrisk or anyone else yet.
Maybe he's on holiday, or maybe someone else can also carry the man
page pumpkin, I don't know...

Either way I imagine lkml sees this over and over again and fixing the
man pages would go a long way toward cutting down on confusion.

- Mike

