* Re: [patch rfc] towards supporting O_NONBLOCK on regular files @ 2004-10-05 13:07 Dan Kegel 0 siblings, 0 replies; 24+ messages in thread From: Dan Kegel @ 2004-10-05 13:07 UTC (permalink / raw) To: Linux Kernel Mailing List Marcelo wrote: > Curiosity: Is this defined in any UNIX standard? No. See http://www.opengroup.org/onlinepubs/009695399/functions/open.html which leaves it undefined. http://www.pasc.org/interps/unofficial/db/p1003.1/pasc-1003.1-71.html says implementations have to allow setting O_NONBLOCK even if they ignore it. http://www.ussg.iu.edu/hypermail/linux/kernel/9911.3/0530.html claims other Unixes and NT implement it. There's a thread that discusses this in a bit of detail, and suggests that older Solaris might implement it: http://lists.freebsd.org/pipermail/freebsd-arch/2003-April/000132.html http://lists.freebsd.org/pipermail/freebsd-arch/2003-April/000134.html Googling for O_NONBLOCK disk seems to be good. I'd google more but my baby is calling :-) - Dan -- My technical stuff: http://kegel.com My politics: see http://www.misleader.org for examples of why I'm for regime change ^ permalink raw reply [flat|nested] 24+ messages in thread
* [patch rfc] towards supporting O_NONBLOCK on regular files
@ 2004-10-01 20:57 Jeff Moyer
2004-10-03 19:48 ` Pavel Machek
` (2 more replies)
0 siblings, 3 replies; 24+ messages in thread
From: Jeff Moyer @ 2004-10-01 20:57 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, sct
This patch makes an attempt at supporting the O_NONBLOCK flag for regular
files. It's pretty straight-forward. One limitation is that we still call
into the readahead code, which I believe can block. However, if we don't
do this, then an application which only uses non-blocking reads may never
get it's data.
Comments welcome.
-Jeff
--- linux-2.6.8/mm/filemap.c.orig 2004-09-30 16:33:46.881129560 -0400
+++ linux-2.6.8/mm/filemap.c 2004-09-30 16:34:12.109294296 -0400
@@ -720,7 +720,7 @@ void do_generic_mapping_read(struct addr
unsigned long index, end_index, offset;
loff_t isize;
struct page *cached_page;
- int error;
+ int error, nonblock = filp->f_flags & O_NONBLOCK;
struct file_ra_state ra = *_ra;
cached_page = NULL;
@@ -755,10 +755,20 @@ find_page:
page = find_get_page(mapping, index);
if (unlikely(page == NULL)) {
handle_ra_miss(mapping, &ra, index);
+ if (nonblock) {
+ desc->error = -EWOULDBLOCK;
+ break;
+ }
goto no_cached_page;
}
- if (!PageUptodate(page))
+ if (!PageUptodate(page)) {
+ if (nonblock) {
+ page_cache_release(page);
+ desc->error = -EWOULDBLOCK;
+ break;
+ }
goto page_not_up_to_date;
+ }
page_ok:
/* If users can be writing to this page using arbitrary
@@ -1004,7 +1014,7 @@ __generic_file_aio_read(struct kiocb *io
desc.error = 0;
do_generic_file_read(filp,ppos,&desc,file_read_actor);
retval += desc.written;
- if (!retval) {
+ if (!retval || desc.error) {
retval = desc.error;
break;
}
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-01 20:57 Jeff Moyer @ 2004-10-03 19:48 ` Pavel Machek 2004-10-13 14:28 ` Jeff Moyer 2004-10-05 11:27 ` Marcelo Tosatti 2004-10-05 15:35 ` Rik van Riel 2 siblings, 1 reply; 24+ messages in thread From: Pavel Machek @ 2004-10-03 19:48 UTC (permalink / raw) To: Jeff Moyer; +Cc: linux-kernel, mingo, sct Hi! > This patch makes an attempt at supporting the O_NONBLOCK flag for regular > files. It's pretty straight-forward. One limitation is that we still call > into the readahead code, which I believe can block. However, if we don't > do this, then an application which only uses non-blocking reads may never > get it's data. This looks very nice. Does it mean that aio and friends are instantly obsolete? Does it have comparable performance to aio? Pavel -- 64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-03 19:48 ` Pavel Machek @ 2004-10-13 14:28 ` Jeff Moyer 2004-10-14 17:39 ` Pavel Machek 0 siblings, 1 reply; 24+ messages in thread From: Jeff Moyer @ 2004-10-13 14:28 UTC (permalink / raw) To: Pavel Machek; +Cc: linux-kernel, mingo, sct ==> Regarding Re: [patch rfc] towards supporting O_NONBLOCK on regular files; Pavel Machek <pavel@ucw.cz> adds: pavel> Hi! >> This patch makes an attempt at supporting the O_NONBLOCK flag for >> regular files. It's pretty straight-forward. One limitation is that we >> still call into the readahead code, which I believe can block. However, >> if we don't do this, then an application which only uses non-blocking >> reads may never get it's data. pavel> This looks very nice. Does it mean that aio and friends are pavel> instantly obsolete? I dont' think so. This only addresses the read() path, for one. Plus, in it's current form, it will not perform any I/O if the data is not present. So, you will need another thread/process to do kick off the I/O. pavel> Does it have comparable performance to aio? I haven't run any tests. One advantage this has to current aio is that it can operate without O_DIRECT. -Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-13 14:28 ` Jeff Moyer @ 2004-10-14 17:39 ` Pavel Machek 0 siblings, 0 replies; 24+ messages in thread From: Pavel Machek @ 2004-10-14 17:39 UTC (permalink / raw) To: Jeff Moyer; +Cc: linux-kernel, mingo, sct Hi! > pavel> Does it have comparable performance to aio? > > I haven't run any tests. One advantage this has to current aio is that it > can operate without O_DIRECT. Well, second advantage is that it is much nicer interface... Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-01 20:57 Jeff Moyer 2004-10-03 19:48 ` Pavel Machek @ 2004-10-05 11:27 ` Marcelo Tosatti 2004-10-06 13:13 ` Jeff Moyer 2004-10-05 15:35 ` Rik van Riel 2 siblings, 1 reply; 24+ messages in thread From: Marcelo Tosatti @ 2004-10-05 11:27 UTC (permalink / raw) To: Jeff Moyer; +Cc: linux-kernel, mingo, sct On Fri, Oct 01, 2004 at 04:57:50PM -0400, Jeff Moyer wrote: > This patch makes an attempt at supporting the O_NONBLOCK flag for regular > files. It's pretty straight-forward. One limitation is that we still call > into the readahead code, which I believe can block. However, if we don't > do this, then an application which only uses non-blocking reads may never > get it's data. > > Comments welcome. Hi Jeff, Curiosity: Is this defined in any UNIX standard? Adv. Programming in the UNIX environment says: 12.2 Nonblocking I/O "Nonblocking I/O lets us issue an I/O operation, such as open, read, or write and not have it block forever. If the operation cannot be completed, return is made immediately with an error noting that the operation would have blocked." He is talking about read's on descriptors (pipe's, devices, etc) which block in case of no data present, not about filesystem IO. But here we create our own semantics of O_NONBLOCK on read() of fs IO. As you say page_cache_readahead can block for one trying to allocate pages, possibly while submitting IO too. The patch is cool - might be nice to check if SuS or someone else specificies behaviour and try to match if so? > > -Jeff > > --- linux-2.6.8/mm/filemap.c.orig 2004-09-30 16:33:46.881129560 -0400 > +++ linux-2.6.8/mm/filemap.c 2004-09-30 16:34:12.109294296 -0400 > @@ -720,7 +720,7 @@ void do_generic_mapping_read(struct addr > unsigned long index, end_index, offset; > loff_t isize; > struct page *cached_page; > - int error; > + int error, nonblock = filp->f_flags & O_NONBLOCK; > struct file_ra_state ra = *_ra; > > cached_page = NULL; > @@ -755,10 +755,20 @@ find_page: > page = find_get_page(mapping, index); > if (unlikely(page == NULL)) { > handle_ra_miss(mapping, &ra, index); > + if (nonblock) { > + desc->error = -EWOULDBLOCK; > + break; > + } > goto no_cached_page; > } > - if (!PageUptodate(page)) > + if (!PageUptodate(page)) { > + if (nonblock) { > + page_cache_release(page); > + desc->error = -EWOULDBLOCK; > + break; > + } > goto page_not_up_to_date; > + } > page_ok: > > /* If users can be writing to this page using arbitrary > @@ -1004,7 +1014,7 @@ __generic_file_aio_read(struct kiocb *io > desc.error = 0; > do_generic_file_read(filp,ppos,&desc,file_read_actor); > retval += desc.written; > - if (!retval) { > + if (!retval || desc.error) { > retval = desc.error; > break; > } ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-05 11:27 ` Marcelo Tosatti @ 2004-10-06 13:13 ` Jeff Moyer 2004-10-06 12:01 ` Marcelo Tosatti 0 siblings, 1 reply; 24+ messages in thread From: Jeff Moyer @ 2004-10-06 13:13 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: linux-kernel, mingo, sct ==> Regarding Re: [patch rfc] towards supporting O_NONBLOCK on regular files; Marcelo Tosatti <marcelo.tosatti@cyclades.com> adds: marcelo.tosatti> On Fri, Oct 01, 2004 at 04:57:50PM -0400, Jeff Moyer marcelo.tosatti> wrote: >> This patch makes an attempt at supporting the O_NONBLOCK flag for >> regular files. It's pretty straight-forward. One limitation is that we >> still call into the readahead code, which I believe can block. However, >> if we don't do this, then an application which only uses non-blocking >> reads may never get it's data. >> >> Comments welcome. marcelo.tosatti> Hi Jeff, marcelo.tosatti> Curiosity: Is this defined in any UNIX standard? marcelo.tosatti> Adv. Programming in the UNIX environment says: marcelo.tosatti> 12.2 Nonblocking I/O marcelo.tosatti> "Nonblocking I/O lets us issue an I/O operation, such as marcelo.tosatti> open, read, or write and not have it block forever. If the marcelo.tosatti> operation cannot be completed, return is made immediately marcelo.tosatti> with an error noting that the operation would have marcelo.tosatti> blocked." marcelo.tosatti> He is talking about read's on descriptors (pipe's, marcelo.tosatti> devices, etc) which block in case of no data present, not marcelo.tosatti> about filesystem IO. marcelo.tosatti> But here we create our own semantics of O_NONBLOCK on marcelo.tosatti> read() of fs IO. As you say page_cache_readahead can block marcelo.tosatti> for one trying to allocate pages, possibly while marcelo.tosatti> submitting IO too. marcelo.tosatti> The patch is cool - might be nice to check if SuS or marcelo.tosatti> someone else specificies behaviour and try to match if so? Hi, Marcelo, The text below was taken from the following standard: IEEE Std 1003.1, 2004 Edition The Open Group Technical Standard Base Sepcifications, Issue 6 Includes IEEE Std 1003.1-2001, IEEE Std 1003.1-2001/Cor 1-2002 open() O_NONBLOCK When opening a FIFO with O_RDONLY or O_WRONLY set: o If O_NONBLOCK is set, an open( ) for reading-only shall return without delay. An open( ) for writing-only shall return an error if no process currently has the file open for reading. o If O_NONBLOCK is clear, an open( ) for reading-only shall block the calling thread until a thread opens the file for writing. An open( ) for writing-only shall block the calling thread until a thread opens the file for reading. When opening a block special or character special file that supports non- blocking opens: o If O_NONBLOCK is set, the open( ) function shall return without blocking for the device to be ready or available. Subsequent behavior of the device is device-specific. o If O_NONBLOCK is clear, the open( ) function shall block the calling thread until the device is ready or available before returning. Otherwise, the behavior of O_NONBLOCK is unspecified. read() When attempting to read a file (other than a pipe or FIFO) that supports non-blocking reads and has no data currently available: o If O_NONBLOCK is set, read( ) shall return -1 and set errno to [EAGAIN]. o If O_NONBLOCK is clear, read( ) shall block the calling thread until some data becomes available. o The use of the O_NONBLOCK flag has no effect if there is some data available. write() When attempting to write to a file descriptor (other than a pipe or FIFO) that supports non- blocking writes and cannot accept the data immediately: o If the O_NONBLOCK flag is clear, write( ) shall block the calling thread until the data can be accepted. o If the O_NONBLOCK flag is set, write( ) shall not block the thread. If some data can be written without blocking the thread, write( ) shall write what it can and return the number of bytes written. Otherwise, it shall return -1 and set errno to [EAGAIN]. Thanks! Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-06 13:13 ` Jeff Moyer @ 2004-10-06 12:01 ` Marcelo Tosatti 2004-10-07 3:31 ` Stephen C. Tweedie 0 siblings, 1 reply; 24+ messages in thread From: Marcelo Tosatti @ 2004-10-06 12:01 UTC (permalink / raw) To: Jeff Moyer; +Cc: linux-kernel, mingo, sct On Wed, Oct 06, 2004 at 09:13:38AM -0400, Jeff Moyer wrote: > ==> Regarding Re: [patch rfc] towards supporting O_NONBLOCK on regular files; Marcelo Tosatti <marcelo.tosatti@cyclades.com> adds: Hi Jeff! > Hi, Marcelo, > > The text below was taken from the following standard: > > IEEE Std 1003.1, 2004 Edition > The Open Group Technical Standard > Base Sepcifications, Issue 6 > Includes IEEE Std 1003.1-2001, IEEE Std 1003.1-2001/Cor 1-2002 > > > open() > > O_NONBLOCK When opening a FIFO with O_RDONLY or O_WRONLY set: > o If O_NONBLOCK is set, an open( ) for reading-only shall return without > delay. An open( ) for writing-only shall return an error if no process > currently has the file open for reading. > o If O_NONBLOCK is clear, an open( ) for reading-only shall block the > calling thread until a thread opens the file for writing. An open( ) for > writing-only shall block the calling thread until a thread opens the file > for reading. > > When opening a block special or character special file that supports non- > blocking opens: > o If O_NONBLOCK is set, the open( ) function shall return without blocking > for the device to be ready or available. Subsequent behavior of the device > is device-specific. > o If O_NONBLOCK is clear, the open( ) function shall block the calling > thread until the device is ready or available before returning. > > Otherwise, the behavior of O_NONBLOCK is unspecified. > > read() > > When attempting to read a file (other than a pipe or FIFO) that supports > non-blocking reads and has no data currently available: > o If O_NONBLOCK is set, read( ) shall return -1 and set errno to [EAGAIN]. This implies read(O_NONBLOCK) should never block. Maybe your code should pass down __GFP_FAIL in the gfp_mask to the page_cache_alloc() to avoid blocking reclaiming pages, and possibly pass info down to the block layer "if this is going to block, fail". We have a bdi_congested() check before doing the readahead (dont RA if the queue is full). Check if that really works - I'm afraid it can block because the check is not done at each page submitted, but rather at each readahead operation (which can be several pages large). > o If O_NONBLOCK is clear, read( ) shall block the calling thread until > some data becomes available. > o The use of the O_NONBLOCK flag has no effect if there is some data > available. > > write() > > When attempting to write to a file descriptor (other than a pipe or FIFO) > that supports non- blocking writes and cannot accept the data immediately: > o If the O_NONBLOCK flag is clear, write( ) shall block the calling > thread until the data can be accepted. > o If the O_NONBLOCK flag is set, write( ) shall not block the thread. If > some data can be written without blocking the thread, write( ) shall > write what it can and return the number of bytes written. Otherwise, it > shall return -1 and set errno to [EAGAIN]. > > > Thanks! > > Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-06 12:01 ` Marcelo Tosatti @ 2004-10-07 3:31 ` Stephen C. Tweedie 2004-10-07 10:12 ` Marcelo Tosatti 0 siblings, 1 reply; 24+ messages in thread From: Stephen C. Tweedie @ 2004-10-07 3:31 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Jeff Moyer, linux-kernel, Ingo Molnar, Stephen Tweedie Hi, On Wed, 2004-10-06 at 09:01 -0300, Marcelo Tosatti wrote: > > When attempting to read a file (other than a pipe or FIFO) that supports > > non-blocking reads and has no data currently available: > > o If O_NONBLOCK is set, read( ) shall return -1 and set errno to [EAGAIN]. > > This implies read(O_NONBLOCK) should never block. The spec is usually pretty careful never to come straight out and require that in all cases, even for true AIO. > Maybe your code should pass down __GFP_FAIL in the gfp_mask > to the page_cache_alloc() to avoid blocking reclaiming pages, > and possibly pass info down to the block layer > "if this is going to block, fail". It's not just the page allocation that can block, though. Readahead requires us to map the buffers being read before we submit the async read, so we can still block reading indirect blocks. If we want to avoid submitting that extra synchronous IO, then either O_NONBLOCK needs to avoid readahead entirely for non-present pages, or the readahead itself needs to know that it's a O_NONBLOCK IO and fail cleanly if the metadata is not in cache. Cheers, Stephen ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-07 3:31 ` Stephen C. Tweedie @ 2004-10-07 10:12 ` Marcelo Tosatti 2004-10-07 12:30 ` Arjan van de Ven 2004-10-11 18:32 ` Stephen C. Tweedie 0 siblings, 2 replies; 24+ messages in thread From: Marcelo Tosatti @ 2004-10-07 10:12 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Jeff Moyer, linux-kernel, Ingo Molnar On Thu, Oct 07, 2004 at 04:31:35AM +0100, Stephen C. Tweedie wrote: > Hi, > > On Wed, 2004-10-06 at 09:01 -0300, Marcelo Tosatti wrote: > > > o If O_NONBLOCK is set, read( ) shall return -1 and set errno to [EAGAIN]. > > > > This implies read(O_NONBLOCK) should never block. > > The spec is usually pretty careful never to come straight out and > require that in all cases, even for true AIO. > > > Maybe your code should pass down __GFP_FAIL in the gfp_mask > > to the page_cache_alloc() to avoid blocking reclaiming pages, > > and possibly pass info down to the block layer > > "if this is going to block, fail". > > It's not just the page allocation that can block, though. Readahead > requires us to map the buffers being read before we submit the async > read, so we can still block reading indirect blocks. If we want to > avoid submitting that extra synchronous IO, then either O_NONBLOCK needs > to avoid readahead entirely for non-present pages, or the readahead > itself needs to know that it's a O_NONBLOCK IO and fail cleanly if the > metadata is not in cache. Hi Stephen! Oh yes, theres also the indirect blocks which we might need to read from disk. Now the question is, how strict should the O_NONBLOCK implementation be in reference to "not blocking" ? Maybe Jeff's currently implementation is just fine avoiding the potential block at !PageUptodate. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-07 10:12 ` Marcelo Tosatti @ 2004-10-07 12:30 ` Arjan van de Ven 2004-10-11 18:32 ` Stephen C. Tweedie 1 sibling, 0 replies; 24+ messages in thread From: Arjan van de Ven @ 2004-10-07 12:30 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Stephen C. Tweedie, Jeff Moyer, linux-kernel, Ingo Molnar [-- Attachment #1: Type: text/plain, Size: 397 bytes --] > Now the question is, how strict should the O_NONBLOCK implementation be > in reference to "not blocking" ? almost any allocation all over the kernel can in theory block ;) I'd say be pragmatic in this and avoid the obvious pagecache blocking, but ignore the rest, it'll be rare and if it's there, of short duration. Userland can get rescheduled anyway at any time for brief periods [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-07 10:12 ` Marcelo Tosatti 2004-10-07 12:30 ` Arjan van de Ven @ 2004-10-11 18:32 ` Stephen C. Tweedie 2004-10-11 18:58 ` Jeff Moyer 1 sibling, 1 reply; 24+ messages in thread From: Stephen C. Tweedie @ 2004-10-11 18:32 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Jeff Moyer, linux-kernel, Ingo Molnar, Stephen Tweedie Hi, On Thu, 2004-10-07 at 11:12, Marcelo Tosatti wrote: > Oh yes, theres also the indirect blocks which we might need to read from > disk. Right. > Now the question is, how strict should the O_NONBLOCK implementation be > in reference to "not blocking" ? Well, I suspect that depends on the application. But if you've got an app that really wants to make sure its hot path is as fast as possible (eg. a high-throughput server multiplexing disk IO and networking through a single event loop), then ideally the app would want to punt any blocking disk IO to another thread. So it's a matter of significant extra programing for a small extra reduction in app blocking potential. I think it's worth getting this right in the long term, though. Getting readahead of indirect blocks right has other benefits too --- eg. we may be able to fix the situation where we end up trying to read indirect blocks before we've even submitted the IO for the previous data blocks, breaking the IO pipeline ordering. --Stephen ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-11 18:32 ` Stephen C. Tweedie @ 2004-10-11 18:58 ` Jeff Moyer 2004-10-11 21:49 ` Stephen C. Tweedie 0 siblings, 1 reply; 24+ messages in thread From: Jeff Moyer @ 2004-10-11 18:58 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar ==> Regarding Re: [patch rfc] towards supporting O_NONBLOCK on regular files; "Stephen C. Tweedie" <sct@redhat.com> adds: sct> Hi, On Thu, 2004-10-07 at 11:12, Marcelo Tosatti wrote: >> Oh yes, theres also the indirect blocks which we might need to read from >> disk. sct> Right. >> Now the question is, how strict should the O_NONBLOCK implementation be >> in reference to "not blocking" ? sct> Well, I suspect that depends on the application. But if you've got an sct> app that really wants to make sure its hot path is as fast as possible sct> (eg. a high-throughput server multiplexing disk IO and networking sct> through a single event loop), then ideally the app would want to punt sct> any blocking disk IO to another thread. sct> So it's a matter of significant extra programing for a small extra sct> reduction in app blocking potential. sct> I think it's worth getting this right in the long term, though. sct> Getting readahead of indirect blocks right has other benefits too --- sct> eg. we may be able to fix the situation where we end up trying to read sct> indirect blocks before we've even submitted the IO for the previous sct> data blocks, breaking the IO pipeline ordering. So for the short term, are you an advocate of the patch posted? -Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-11 18:58 ` Jeff Moyer @ 2004-10-11 21:49 ` Stephen C. Tweedie 2004-10-13 14:26 ` Jeff Moyer 0 siblings, 1 reply; 24+ messages in thread From: Stephen C. Tweedie @ 2004-10-11 21:49 UTC (permalink / raw) To: Jeff Moyer; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar, Stephen Tweedie Hi, On Mon, 2004-10-11 at 19:58, Jeff Moyer wrote: > sct> I think it's worth getting this right in the long term, though. > sct> Getting readahead of indirect blocks right has other benefits too --- > sct> eg. we may be able to fix the situation where we end up trying to read > sct> indirect blocks before we've even submitted the IO for the previous > sct> data blocks, breaking the IO pipeline ordering. > > So for the short term, are you an advocate of the patch posted? In the short term, can't we just disable readahead for O_NONBLOCK? That has true non-blocking semantics --- if the data is already available we return it, but if not, it's up to somebody else to retrieve it. That's exactly what you want if you're genuinely trying to avoid blocking at all costs on a really hot event loop, and the semantics seem to make sense to me. It's not that different from the networking case where no amount of read() on a non-blocking fd will get you more data unless there's another process somewhere filling the stream. --Stephen ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-11 21:49 ` Stephen C. Tweedie @ 2004-10-13 14:26 ` Jeff Moyer 2004-10-15 15:44 ` Jeff Moyer 0 siblings, 1 reply; 24+ messages in thread From: Jeff Moyer @ 2004-10-13 14:26 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar, akpm ==> Regarding Re: [patch rfc] towards supporting O_NONBLOCK on regular files; "Stephen C. Tweedie" <sct@redhat.com> adds: sct> Hi, On Mon, 2004-10-11 at 19:58, Jeff Moyer wrote: sct> I think it's worth getting this right in the long term, though. sct> Getting readahead of indirect blocks right has other benefits too --- sct> eg. we may be able to fix the situation where we end up trying to read sct> indirect blocks before we've even submitted the IO for the previous sct> data blocks, breaking the IO pipeline ordering. >> So for the short term, are you an advocate of the patch posted? sct> In the short term, can't we just disable readahead for O_NONBLOCK? sct> That has true non-blocking semantics --- if the data is already sct> available we return it, but if not, it's up to somebody else to sct> retrieve it. sct> That's exactly what you want if you're genuinely trying to avoid sct> blocking at all costs on a really hot event loop, and the semantics sct> seem to make sense to me. It's not that different from the networking sct> case where no amount of read() on a non-blocking fd will get you more sct> data unless there's another process somewhere filling the stream. Yes, that sounds like a fine idea. Here is a patch which does this. Andrew, I know you only want bug fixes, but I'd like to get this into your queue for post 2.6.9, if possible. Thanks, Jeff --- linux-2.6.9-rc4-mm1/mm/filemap.c~ 2004-10-12 12:40:01.000000000 -0400 +++ linux-2.6.9-rc4-mm1/mm/filemap.c 2004-10-12 12:53:13.202396656 -0400 @@ -692,7 +692,7 @@ void do_generic_mapping_read(struct addr unsigned long index, end_index, offset; loff_t isize; struct page *cached_page; - int error; + int error, nonblock = filp->f_flags & O_NONBLOCK; struct file_ra_state ra = *_ra; cached_page = NULL; @@ -721,16 +721,27 @@ void do_generic_mapping_read(struct addr nr = nr - offset; cond_resched(); - page_cache_readahead(mapping, &ra, filp, index); + if (!nonblock) + page_cache_readahead(mapping, &ra, filp, index); find_page: page = find_get_page(mapping, index); if (unlikely(page == NULL)) { + if (nonblock) { + desc->error = -EWOULDBLOCK; + break; + } handle_ra_miss(mapping, &ra, index); goto no_cached_page; } - if (!PageUptodate(page)) + if (!PageUptodate(page)) { + if (nonblock) { + page_cache_release(page); + desc->error = -EWOULDBLOCK; + break; + } goto page_not_up_to_date; + } page_ok: /* If users can be writing to this page using arbitrary @@ -976,7 +987,7 @@ __generic_file_aio_read(struct kiocb *io desc.error = 0; do_generic_file_read(filp,ppos,&desc,file_read_actor); retval += desc.written; - if (!retval) { + if (!retval || desc.error) { retval = desc.error; break; } ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-13 14:26 ` Jeff Moyer @ 2004-10-15 15:44 ` Jeff Moyer 2004-10-15 16:19 ` Stephen C. Tweedie 2004-10-17 7:59 ` Alexandre Oliva 0 siblings, 2 replies; 24+ messages in thread From: Jeff Moyer @ 2004-10-15 15:44 UTC (permalink / raw) To: Stephen C. Tweedie, Marcelo Tosatti, linux-kernel, Ingo Molnar, akpm [-- Attachment #1: message body text --] [-- Type: text/plain, Size: 1758 bytes --] ==> Regarding Re: [patch rfc] towards supporting O_NONBLOCK on regular files; Jeff Moyer <jmoyer@redhat.com> adds: ==> Regarding Re: [patch rfc] towards supporting O_NONBLOCK on regular files; "Stephen C. Tweedie" <sct@redhat.com> adds: sct> Hi, On Mon, 2004-10-11 at 19:58, Jeff Moyer wrote: sct> I think it's worth getting this right in the long term, though. sct> Getting readahead of indirect blocks right has other benefits too --- sct> eg. we may be able to fix the situation where we end up trying to read sct> indirect blocks before we've even submitted the IO for the previous sct> data blocks, breaking the IO pipeline ordering. >>> So for the short term, are you an advocate of the patch posted? sct> In the short term, can't we just disable readahead for O_NONBLOCK? sct> That has true non-blocking semantics --- if the data is already sct> available we return it, but if not, it's up to somebody else to sct> retrieve it. sct> That's exactly what you want if you're genuinely trying to avoid sct> blocking at all costs on a really hot event loop, and the semantics sct> seem to make sense to me. It's not that different from the networking sct> case where no amount of read() on a non-blocking fd will get you more sct> data unless there's another process somewhere filling the stream. jmoyer> Yes, that sounds like a fine idea. Here is a patch which does jmoyer> this. Andrew, I know you only want bug fixes, but I'd like to get jmoyer> this into your queue for post 2.6.9, if possible. I got the partial read case wrong in the last patch. In fact, it looks like this code path would perform infinite retries before. This should address that by returning upon the first partial read. Attached is a new version of the patch. -Jeff [-- Attachment #2: linux-2.6.9-o_nonblock-2.patch --] [-- Type: text/plain, Size: 1452 bytes --] --- linux-2.6.9-rc4-mm1/mm/filemap.c.orig 2004-10-15 10:33:24.986209880 -0400 +++ linux-2.6.9-rc4-mm1/mm/filemap.c 2004-10-15 11:38:50.869384920 -0400 @@ -692,7 +692,7 @@ void do_generic_mapping_read(struct addr unsigned long index, end_index, offset; loff_t isize; struct page *cached_page; - int error; + int error, nonblock = filp->f_flags & O_NONBLOCK; struct file_ra_state ra = *_ra; cached_page = NULL; @@ -721,16 +721,27 @@ void do_generic_mapping_read(struct addr nr = nr - offset; cond_resched(); - page_cache_readahead(mapping, &ra, filp, index); + if (!nonblock) + page_cache_readahead(mapping, &ra, filp, index); find_page: page = find_get_page(mapping, index); if (unlikely(page == NULL)) { + if (nonblock) { + desc->error = -EWOULDBLOCK; + break; + } handle_ra_miss(mapping, &ra, index); goto no_cached_page; } - if (!PageUptodate(page)) + if (!PageUptodate(page)) { + if (nonblock) { + page_cache_release(page); + desc->error = -EWOULDBLOCK; + break; + } goto page_not_up_to_date; + } page_ok: /* If users can be writing to this page using arbitrary @@ -976,10 +987,10 @@ __generic_file_aio_read(struct kiocb *io desc.error = 0; do_generic_file_read(filp,ppos,&desc,file_read_actor); retval += desc.written; - if (!retval) { + if (!retval) retval = desc.error; + if (desc.written != iov[seg].iov_len) break; - } } } out: ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-15 15:44 ` Jeff Moyer @ 2004-10-15 16:19 ` Stephen C. Tweedie 2004-10-17 7:59 ` Alexandre Oliva 1 sibling, 0 replies; 24+ messages in thread From: Stephen C. Tweedie @ 2004-10-15 16:19 UTC (permalink / raw) To: Jeff Moyer Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar, Andrew Morton, Stephen Tweedie Hi, On Fri, 2004-10-15 at 16:44, Jeff Moyer wrote: > I got the partial read case wrong in the last patch. In fact, it looks > like this code path would perform infinite retries before. This should > address that by returning upon the first partial read. Attached is a new > version of the patch. ... > --- linux-2.6.9-rc4-mm1/mm/filemap.c.orig 2004-10-15 10:33:24.986209880 -0400 > +++ linux-2.6.9-rc4-mm1/mm/filemap.c 2004-10-15 11:38:50.869384920 -0400 > @@ -976,10 +987,10 @@ __generic_file_aio_read(struct kiocb *io > desc.error = 0; > do_generic_file_read(filp,ppos,&desc,file_read_actor); > retval += desc.written; > - if (!retval) { > + if (!retval) > retval = desc.error; > + if (desc.written != iov[seg].iov_len) > break; > - } > } Yep; this chunk used to break out only on !retval, so at worst it's a data corrupter for multi-segment iovecs. If one do_generic_file_read() returned a short read we'd update retval but then move on to the next segment of the iovec; if the next one succeeded we'd be reading into the next segment but ppos wouldn't be correctly advanced. The fix looks correct --- we still only return an error if no data at all has been returned, but we break out of the IO completely at the first short read. --Stephen ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-15 15:44 ` Jeff Moyer 2004-10-15 16:19 ` Stephen C. Tweedie @ 2004-10-17 7:59 ` Alexandre Oliva 2004-10-17 11:20 ` Ingo Molnar 1 sibling, 1 reply; 24+ messages in thread From: Alexandre Oliva @ 2004-10-17 7:59 UTC (permalink / raw) To: jmoyer; +Cc: Stephen C. Tweedie, Marcelo Tosatti, linux-kernel, Ingo Molnar, akpm On Oct 15, 2004, Jeff Moyer <jmoyer@redhat.com> wrote: jmoyer> Yes, that sounds like a fine idea. Here is a patch which does jmoyer> this. Andrew, I know you only want bug fixes, but I'd like to get jmoyer> this into your queue for post 2.6.9, if possible. > I got the partial read case wrong in the last patch. In fact, it looks > like this code path would perform infinite retries before. This should > address that by returning upon the first partial read. Attached is a new > version of the patch. Don't you have to adjust poll/select as well, to test whether read has any data to return immediately? Squid is broken with the latest FCdevel kernel, and this patch is in it. The reason Squid breaks is that poll (or is it select? I forget) says there's data to be read from cache files (as well as from error message files read during start up), but then read fails with -EAGAIN. If I bring the error files into memory with cat /etc/squid/errors/ERR*, then squid will successfully start up, and then, in order for it to not eat all the available CPU polling data files and attempting to read from them, I need to start a command line this: pid=`pidof '(squid)' while :; do lsof -p $pid | sed -n 's,.*\(/var/spool/squid/.*/.*/.*\),\1,p' | xargs cat /dev/null > /dev/null; sleep 5 done -- Alexandre Oliva http://www.ic.unicamp.br/~oliva/ Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org} Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org} ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-17 7:59 ` Alexandre Oliva @ 2004-10-17 11:20 ` Ingo Molnar 2004-10-17 19:38 ` Alexandre Oliva 0 siblings, 1 reply; 24+ messages in thread From: Ingo Molnar @ 2004-10-17 11:20 UTC (permalink / raw) To: Alexandre Oliva Cc: jmoyer, Stephen C. Tweedie, Marcelo Tosatti, linux-kernel, akpm On Sun, 17 Oct 2004, Alexandre Oliva wrote: > The reason Squid breaks is that poll (or is it select? I forget) says > there's data to be read from cache files (as well as from error message > files read during start up), but then read fails with -EAGAIN. If I > bring the error files into memory with cat /etc/squid/errors/ERR*, then > squid will successfully start up, and then, in order for it to not eat > all the available CPU polling data files and attempting to read from > them, I need to start a command line this: the problem is that upon read() we dont 'kick' any IO if there's no data available. I.e. the readahead-kicking is necessary after all, because squid apparently assumes that re-trying a read will eventually succeed. (Tux on the other hand does not assume this and uses helper threads to kick IO.) Ingo ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-17 11:20 ` Ingo Molnar @ 2004-10-17 19:38 ` Alexandre Oliva 2004-10-18 16:51 ` Jeff Moyer 0 siblings, 1 reply; 24+ messages in thread From: Alexandre Oliva @ 2004-10-17 19:38 UTC (permalink / raw) To: Ingo Molnar Cc: jmoyer, Stephen C. Tweedie, Marcelo Tosatti, linux-kernel, akpm On Oct 17, 2004, Ingo Molnar <mingo@redhat.com> wrote: > I.e. the readahead-kicking is necessary after all, because > squid apparently assumes that re-trying a read will eventually succeed. I'm not sure it assumes that. It definitely expects read to succeed if poll says there is data available from the file, though, and having poll return that there is data, and then having read fail because there isn't anything there, so that you go back to poll, is a recipe for wasting CPU. I do think read should kick in readahead, yes, but so should poll, if the process says it wants to read from the file, and poll should not return (or not say data is available) unless an immediate, atomic call to read would actually return some data. Of course, if the data happens to be elicited from memory between the poll and read calls, it's legitimate for read to fail with -EAGAIN, but this shouldn't happen very often. -- Alexandre Oliva http://www.ic.unicamp.br/~oliva/ Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org} Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org} ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-17 19:38 ` Alexandre Oliva @ 2004-10-18 16:51 ` Jeff Moyer 2004-10-19 6:04 ` Alexandre Oliva 2004-10-21 20:14 ` James Antill 0 siblings, 2 replies; 24+ messages in thread From: Jeff Moyer @ 2004-10-18 16:51 UTC (permalink / raw) To: Alexandre Oliva Cc: Ingo Molnar, Stephen C. Tweedie, Marcelo Tosatti, linux-kernel, akpm ==> Regarding Re: [patch rfc] towards supporting O_NONBLOCK on regular files; Alexandre Oliva <aoliva@redhat.com> adds: aoliva> On Oct 17, 2004, Ingo Molnar <mingo@redhat.com> wrote: >> I.e. the readahead-kicking is necessary after all, because squid >> apparently assumes that re-trying a read will eventually succeed. aoliva> I'm not sure it assumes that. It definitely expects read to aoliva> succeed if poll says there is data available from the file, though, aoliva> and having poll return that there is data, and then having read aoliva> fail because there isn't anything there, so that you go back to aoliva> poll, is a recipe for wasting CPU. I do think read should kick in aoliva> readahead, yes, but so should poll, if the process says it wants to aoliva> read from the file, and poll should not return (or not say data is aoliva> available) unless an immediate, atomic call to read would actually aoliva> return some data. Of course, if the data happens to be elicited aoliva> from memory between the poll and read calls, it's legitimate for aoliva> read to fail with -EAGAIN, but this shouldn't happen very often. Select, pselect, and poll will always return data ready on a regular file. As such, I would argue that squid's behaviour is broken. Additionally, I don't think it's a good idea to modify any polling mechanism to kick off I/O, if simply because I'm not sure how much data to request! Since it wasn't supported before, I see no need to extend select/poll to change behaviour for regular files. After all, the POSIX spec explicitly says that select and friends will always return data ready for regular files. To make O_NONBLOCK on regular files a workable implementation, Uli suggestsed extending epoll. I will look into this. I am in favor of kicking off I/O for reads that would block. -Jeff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-18 16:51 ` Jeff Moyer @ 2004-10-19 6:04 ` Alexandre Oliva 2004-10-21 20:14 ` James Antill 1 sibling, 0 replies; 24+ messages in thread From: Alexandre Oliva @ 2004-10-19 6:04 UTC (permalink / raw) To: jmoyer; +Cc: Ingo Molnar, Stephen C. Tweedie, Marcelo Tosatti, linux-kernel, akpm On Oct 18, 2004, Jeff Moyer <jmoyer@redhat.com> wrote: > Select, pselect, and poll will always return data ready on a regular file. > As such, I would argue that squid's behaviour is broken. Additionally, I > don't think it's a good idea to modify any polling mechanism to kick off > I/O, if simply because I'm not sure how much data to request! You could request a single block (whatever that means), and then the subsequent non-blocking read would stand a chance of eventually making progress. . Or you could request nothing, and have the actual read start a readahead, such that next time it hopefully will have something to get to immediately. If the read comes in too quickly, before the poll-initiated readahead completes, you'll probably get them merged anyway, so it's not like it could hurt, methinks. And then, you might arrange for select/poll to not return immediately for these file descriptors, but rather return as soon as one of them has some data available to read, the time-out expired, or a very short time-out set for the case of non-blocking file descriptors select()ed for read expired. The latter time-out should be short enough to be hardly distinguishable from an immediate return, and the return value should be exactly what a POSIX-compliant application expects. This doesn't quite fix the problem with the existing standard interfaces, that don't quite enable anyone to do non-blocking reads without explicit readahead advice and busy-waiting for data. The short time-out above should at least reduce the syscall explosion that we get with the current behavior, but if read kicks in a readahead to guarantee we'd eventually get out of the select/read loop without external help to bring the data into memory, I guess we could live without the select/poll changes. -- Alexandre Oliva http://www.ic.unicamp.br/~oliva/ Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org} Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org} ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-18 16:51 ` Jeff Moyer 2004-10-19 6:04 ` Alexandre Oliva @ 2004-10-21 20:14 ` James Antill 1 sibling, 0 replies; 24+ messages in thread From: James Antill @ 2004-10-21 20:14 UTC (permalink / raw) To: jmoyer Cc: Alexandre Oliva, Ingo Molnar, Stephen C. Tweedie, Marcelo Tosatti, linux-kernel, akpm Jeff Moyer <jmoyer@redhat.com> writes: > Select, pselect, and poll will always return data ready on a regular file. > As such, I would argue that squid's behaviour is broken. I would argue that this isn't true. Yes, poll() would always return ready, but read would always "block" when you called it on the files. And using the same method you use for network IO saves having to have a separate path through the event loop just for reading files from disk. For another view of the argument, see: http://www.and.org/vstr/examples/ex_cat.c.html ...this does non-blocking IO from stdin to stdout. You seem to be arguing that it should have to know when stdin is from a pipe and when it is a redirected file. > I am in favor of kicking off I/O for reads that would block. This would solve the problem, but I'd still argue that poll() shouldn't lie ... having the event loop do the right thing when the disk is busy or the file is on a down NFS server would be a huge free win. -- James Antill e: <james.antill@redhat.com> Red Hat Support Engineering Group ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch rfc] towards supporting O_NONBLOCK on regular files 2004-10-01 20:57 Jeff Moyer 2004-10-03 19:48 ` Pavel Machek 2004-10-05 11:27 ` Marcelo Tosatti @ 2004-10-05 15:35 ` Rik van Riel 2 siblings, 0 replies; 24+ messages in thread From: Rik van Riel @ 2004-10-05 15:35 UTC (permalink / raw) To: Jeff Moyer; +Cc: linux-kernel, mingo, sct On Fri, 1 Oct 2004, Jeff Moyer wrote: > This patch makes an attempt at supporting the O_NONBLOCK flag for > regular files. It's pretty straight-forward. One limitation is that we > still call into the readahead code, which I believe can block. > However, if we don't do this, then an application which only uses > non-blocking reads may never get it's data. I like it, though for programs which are real serious about O_NONBLOCK we might want to hand off readahead to a helper thread instead of starting readahead in the same context... Not sure if we would always want this, though ;) -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2004-10-21 20:26 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-10-05 13:07 [patch rfc] towards supporting O_NONBLOCK on regular files Dan Kegel -- strict thread matches above, loose matches on Subject: below -- 2004-10-01 20:57 Jeff Moyer 2004-10-03 19:48 ` Pavel Machek 2004-10-13 14:28 ` Jeff Moyer 2004-10-14 17:39 ` Pavel Machek 2004-10-05 11:27 ` Marcelo Tosatti 2004-10-06 13:13 ` Jeff Moyer 2004-10-06 12:01 ` Marcelo Tosatti 2004-10-07 3:31 ` Stephen C. Tweedie 2004-10-07 10:12 ` Marcelo Tosatti 2004-10-07 12:30 ` Arjan van de Ven 2004-10-11 18:32 ` Stephen C. Tweedie 2004-10-11 18:58 ` Jeff Moyer 2004-10-11 21:49 ` Stephen C. Tweedie 2004-10-13 14:26 ` Jeff Moyer 2004-10-15 15:44 ` Jeff Moyer 2004-10-15 16:19 ` Stephen C. Tweedie 2004-10-17 7:59 ` Alexandre Oliva 2004-10-17 11:20 ` Ingo Molnar 2004-10-17 19:38 ` Alexandre Oliva 2004-10-18 16:51 ` Jeff Moyer 2004-10-19 6:04 ` Alexandre Oliva 2004-10-21 20:14 ` James Antill 2004-10-05 15:35 ` Rik van Riel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox