linux-fsdevel.vger.kernel.org archive mirror
* [LST/MM TOPIC] really non-blocking in aio stack
@ 2012-02-13 15:35 Zheng Liu
  2012-02-13 16:34 ` Zach Brown
  0 siblings, 1 reply; 3+ messages in thread
From: Zheng Liu @ 2012-02-13 15:35 UTC
  To: lsf-pc; +Cc: linux-fsdevel

Hi all,

Native aio is currently used in many critical applications, such as
InnoDB in MySQL and the Nginx web server. But it is not really as
asynchronous as users expect. __getblk() can block on metadata
allocation, and get_request() can sleep because of queue congestion.
Users are annoyed by the resulting delays, so we want to improve it so
that it is at least truly non-blocking.

Although we have defined EIOCBRETRY, it seems to me that it is not used
as well as it could be. We could return EIOCBRETRY when the underlying
work would block, and the generic aio code could be tuned to either do
the work asynchronously or notify the user about the status and let the
user decide what to do. Nothing has been finalized yet, so maybe we can
take this chance to discuss it, since it really hurts some very
important applications in the world.

I am sorry that I forgot to send this topic to the mailing list before
the deadline. I hope it isn't too late.

Regards,
Zheng


* Re: [LST/MM TOPIC] really non-blocking in aio stack
  2012-02-13 15:35 [LST/MM TOPIC] really non-blocking in aio stack Zheng Liu
@ 2012-02-13 16:34 ` Zach Brown
  2012-02-15  6:13   ` Zheng Liu
  0 siblings, 1 reply; 3+ messages in thread
From: Zach Brown @ 2012-02-13 16:34 UTC
  To: linux-fsdevel, gnehzuil.liu; +Cc: Jeff Moyer

(dropping lsf-pc from the follow-on discussion for fsdevel)

> Although we have define EIOCBRETRY, it seems to me that it is not used
> as perfect as it can.

EIOCBRETRY is a disaster because the operations are retried in the
context of the kaio threads.  To use it safely you have to ensure that
nothing the operation will do after returning -EIOCBRETRY will reference
current-> .

Realize that this can include convoluted paths through shared code that
might have *no idea* that they're used by some other path after
EIOCBRETRY and so have to be supernaturally careful with current->
references.  It's a maintenance nightmare.

The fs/aio.c retry code has the aio thread magically assume the mm
context of the submitting thread when it calls the retry handlers
(aio_kick_handler()).  So, great, that's one current field that happens
to be sharable.  How about others?  current->journal_info?
current->io_context?  People sometimes ask about EIOCBRETRY and vfs ops
and never mention current->link_count.
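
To make the hazard concrete, here is a minimal userspace sketch, not
kernel code. It models "current" as a plain global pointer and uses a
hypothetical operation, demo_op(), that stashes a handle in the running
task's journal_info on its first pass and expects to find it again on
the retry. Every demo_* name is invented for illustration, and
DEMO_EIOCBRETRY is a stand-in value local to this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's retry code; the value is local to this sketch. */
#define DEMO_EIOCBRETRY 530

/* Toy task with one per-task field, modelled on current->journal_info. */
struct demo_task {
    const char *comm;     /* task name, for readability only */
    void *journal_info;   /* per-task state an operation may rely on */
};

/* Model of "current": whichever task is running right now. */
static struct demo_task *current_task;

struct demo_op_state {
    int pass;             /* 0 = first call, 1 = retry */
};

/*
 * Hypothetical operation. On the first pass it stores a handle in the
 * running task and asks to be retried. On the retry it expects the
 * handle to still be there.
 * Returns 0 on success, -DEMO_EIOCBRETRY to request a retry, and -1 if
 * the per-task state it depended on is missing.
 */
static int demo_op(struct demo_op_state *st, void *handle)
{
    if (st->pass == 0) {
        current_task->journal_info = handle;
        st->pass = 1;
        return -DEMO_EIOCBRETRY;
    }
    if (current_task->journal_info != handle)
        return -1;
    current_task->journal_info = NULL;
    return 0;
}

/* Start the operation on `submitter`, then retry it on `retrier`. */
static int demo_submit_and_retry(struct demo_task *submitter,
                                 struct demo_task *retrier)
{
    struct demo_op_state st = { 0 };
    int handle_storage = 0;
    int ret;

    current_task = submitter;
    ret = demo_op(&st, &handle_storage);
    assert(ret == -DEMO_EIOCBRETRY);

    current_task = retrier;   /* the retry runs in another task's context */
    return demo_op(&st, &handle_storage);
}
```

Retrying on the submitting task returns 0. Retrying on a different task
returns -1, and the submitter is left holding a stale journal_info.
Nothing in demo_op() itself looks wrong, which is the point: the bug
only exists because of who calls it the second time.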

As one of the people who has sunk serious time into fs/aio.c (cc:ing my
erstwhile partner in crime), I strongly discourage investing more
resources into the fs/aio.c design.  If it were me I'd be putting
resources into async infrastructure which makes use of the current
existing sync system call handling paths.

Async calls should have no idea that they're async: no duplication of
the syscall abi in submission argument structs, no magical fget before
calling operation handlers, no iocbs being sprinkled down through kernel
call stacks, no magical return codes.

Yeah, this ends up implying heavy use of kernel threads and playing
scary games with the task_struct of the submitter and async processing
thread.  At least the scary code would be in one place.
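
As a rough userspace analogy only, the shape of that idea looks like
the sketch below: the async layer runs an ordinary synchronous function
in a helper thread, and the function itself has no idea it is being
called asynchronously. This assumes POSIX threads. Every demo_* name is
hypothetical, and none of the kernel-side task_struct games are
modelled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* An ordinary synchronous call: takes an argument, may block, returns. */
typedef long (*demo_sync_fn)(void *arg);

struct demo_async_call {
    demo_sync_fn fn;      /* the unmodified synchronous function */
    void *arg;
    long result;          /* valid once demo_async_wait() returns 0 */
    pthread_t thread;
};

/* Helper thread body: just call the synchronous function. */
static void *demo_async_trampoline(void *p)
{
    struct demo_async_call *call = p;

    call->result = call->fn(call->arg);
    return NULL;
}

/* Start fn(arg) in a helper thread. Returns 0 or a pthread error number. */
static int demo_async_submit(struct demo_async_call *call,
                             demo_sync_fn fn, void *arg)
{
    assert(call != NULL && fn != NULL);
    call->fn = fn;
    call->arg = arg;
    call->result = 0;
    return pthread_create(&call->thread, NULL, demo_async_trampoline, call);
}

/* Wait for completion. Returns 0 or a pthread error number. */
static int demo_async_wait(struct demo_async_call *call)
{
    return pthread_join(call->thread, NULL);
}

/* Example synchronous function. It contains no async-specific code. */
struct demo_sum_args {
    const int *values;
    size_t count;
};

static long demo_sync_sum(void *arg)
{
    const struct demo_sum_args *a = arg;
    long total = 0;

    for (size_t i = 0; i < a->count; i++)
        total += a->values[i];
    return total;
}
```

A caller would fill in a struct demo_sum_args, pass demo_sync_sum to
demo_async_submit(), do other work, and then call demo_async_wait()
before reading the result. All of the async machinery lives in the
three demo_async_* functions, and demo_sync_sum() needs no special
argument struct and no special return code.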

The current alternative of requiring fragile async implementations of
system calls has a compelling history of failure. fs/aio.c has been
around for a decade and has not seen significant use outside of its
initial supported operation.

I should really get the ogg of my LCA presentation (more of a jet-lagged
rant :)) on this posted somewhere.

- z


* Re: [LST/MM TOPIC] really non-blocking in aio stack
  2012-02-13 16:34 ` Zach Brown
@ 2012-02-15  6:13   ` Zheng Liu
  0 siblings, 0 replies; 3+ messages in thread
From: Zheng Liu @ 2012-02-15  6:13 UTC
  To: Zach Brown; +Cc: linux-fsdevel, Jeff Moyer

On Mon, Feb 13, 2012 at 11:34:33AM -0500, Zach Brown wrote:
> (dropping lsf-pc from the follow-on discussion for fsdevel)
> 
> >Although we have define EIOCBRETRY, it seems to me that it is not used
> >as perfect as it can.
> 
> EIOCBRETRY is a disaster because the operations are retried in the
> context of the kaio threads.  To use it safely you have to ensure that
> nothing the operation will do after returning -EIOCBRETRY will reference
> current-> .
> 
> Realize that this can include convoluted paths through shared code that
> might have *no idea* that they're used by some other path after
> EIOCBRETRY and so have to be supernaturally careful with current->
> references.  It's a maintenance nightmare.
> 
> The fs/aio.c retry code has the aio thread magically assume the mm
> context of the submitting thread when it calls the retry handlers.
> (aio_kick_handler()).  So, great, that's one current field that happens
> to be sharable.  How about others?  current->journal_info?
> current->io_context?  People sometimes ask about EIOCBRETRY and vfs ops
> and never mention current->link_count.
> 
> As one of the people who has sunk serious time into fs/aio.c (cc:ing my
> erstwhile partner in crime), I strongly discourage investing more
> resources into the fs/aio.c design.  If it were me I'd be putting
> resources into async infrastructure which makes use of the current
> existing sync system call handling paths.
> 
> Async calls should have no idea that they're async: no duplication of
> the syscall abi in submission argument structs, no magical fget before
> calling operation handlers, no iocbs being sprinkled down through kernel
> call stacks, no magical return codes.
> 
> Yeah, this ends up implying heavy use of kernel threads and playing
> scary games with the task_struct of the submitter and async processing
> thread.  At least the scary code would be in one place.
> 
> The current alternative of requiring fragile async implementations of
> system calls has a compelling history of failure. fs/aio.c has been
> around for a decade and has not seen significant use outside of its
> initial supported operation.

Hi Zach,

As I am a newcomer to this problem, any suggestions are welcome, and
that is also the reason I raised it as a topic for this year's LSF
summit. Currently we provide these semantics and some very important
applications try to use them, but the blocking has annoyed them for a
long time. For example, InnoDB in MySQL uses io_submit in a background
thread to improve write performance. With it, MySQL performance can
improve by 10%. Nginx uses io_submit to read and write files. So, IMHO,
we need to think about how to improve it rather than make applications
modify their programs, especially since MySQL and Nginx are so widely
used in the world.

My employer has now allocated some resources for me to try to resolve
it, so let us discuss a roadmap; I volunteer to work on it. My very
first attempt is trivial: just let the user decide what to do. If there
are any blocking issues, we can report them to the caller and let the
caller decide whether to endure the delay or hand the request to
another thread.
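
A minimal sketch of that policy, as a userspace model rather than a
proposed kernel interface: a non-blocking submit reports that it would
have to sleep, and the caller chooses between enduring the delay and
handing the request off. All demo_* names are hypothetical. Using
-EAGAIN for "would block" is an assumption of this sketch, and the
congested queue is modelled as a simple counter.

```c
#include <assert.h>
#include <errno.h>

/* Toy request queue: congested once in_flight reaches limit. */
struct demo_queue {
    int in_flight;
    int limit;
};

enum demo_policy {
    DEMO_ENDURE_DELAY,    /* caller accepts blocking in submit */
    DEMO_HAND_OFF         /* caller passes the request to another thread */
};

enum demo_outcome {
    DEMO_SUBMITTED,
    DEMO_SUBMITTED_AFTER_BLOCKING,
    DEMO_HANDED_OFF
};

/* Non-blocking attempt: 0 on success, -EAGAIN if it would have to sleep. */
static int demo_submit_nowait(struct demo_queue *q)
{
    assert(q->limit > 0);
    if (q->in_flight >= q->limit)
        return -EAGAIN;
    q->in_flight++;
    return 0;
}

/*
 * Blocking submit. A real implementation would sleep until a slot is
 * free; here one completion is simulated by decrementing the counter.
 */
static void demo_submit_blocking(struct demo_queue *q)
{
    if (q->in_flight >= q->limit)
        q->in_flight--;   /* stands in for waiting on one completion */
    q->in_flight++;
}

/*
 * Try without blocking first, then apply the caller's policy.
 * *handed_off counts requests that would go to a helper thread.
 */
static enum demo_outcome demo_submit(struct demo_queue *q,
                                     enum demo_policy policy,
                                     int *handed_off)
{
    if (demo_submit_nowait(q) == 0)
        return DEMO_SUBMITTED;

    if (policy == DEMO_ENDURE_DELAY) {
        demo_submit_blocking(q);
        return DEMO_SUBMITTED_AFTER_BLOCKING;
    }

    (*handed_off)++;      /* stands in for queueing to a helper thread */
    return DEMO_HANDED_OFF;
}
```

The caller's thread only ever blocks when it has asked for
DEMO_ENDURE_DELAY. With DEMO_HAND_OFF it gets an immediate answer
either way.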

> 
> I should really get the ogg of my LCA presentation (more of a jet-lagged
> rant :)) on this posted somewhere.

Never mind, Google helped me find it:
http://mirror.linux.org.au/pub/linux.conf.au/2009/Thursday/131.ogg
Thanks for the advice.

Regards,
Zheng

> 
> - z
