* [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
@ 2013-02-28 20:07 Ankit Jain
From: Ankit Jain @ 2013-02-28 20:07 UTC (permalink / raw)
To: lsf-pc, linux-fsdevel; +Cc: Jan Kara, Zach Brown
Hi,
I'm interested in discussing how to improve the async IO API in the
kernel, specifically io_submit latencies.
I am working on trying to make io_submit non-blocking. I had posted a
patch[1] for this earlier on fsdevel and there was some discussion on
it. I have made some of the improvements suggested there.
The approach attempted in that patch essentially tries to service the
requests on a separate kernel thread. It was pointed out that this would
need to ensure that there aren't any unknown task_struct references or
dependencies under f_op->aio* which might get confused because of the
kernel thread. Would this kind of full audit be enough, or would it be
considered too fragile?
I would like to discuss whether this is the best approach for solving
this problem, and/or some of the other possible approaches to this
issue.
This has been discussed in the past but we don't seem to have a solution
as of now.
References:
1. http://comments.gmane.org/gmane.linux.kernel.aio.general/3142
Regards,
--
Ankit Jain
SUSE Labs
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
From: Kent Overstreet @ 2013-02-28 21:03 UTC (permalink / raw)
To: Ankit Jain; +Cc: lsf-pc, linux-fsdevel, Jan Kara, Zach Brown, tytso
On Fri, Mar 01, 2013 at 01:37:55AM +0530, Ankit Jain wrote:
> Hi,
>
> I'm interested in discussing how to improve the async IO API in the
> kernel, specifically io_submit latencies.
>
> I am working on trying to make io_submit non-blocking. I had posted a
> patch[1] for this earlier on fsdevel and there was some discussion on
> it. I have made some of the improvements suggested there.
>
> The approach attempted in that patch essentially tries to service the
> requests on a separate kernel thread. It was pointed out that this would
> need to ensure that there aren't any unknown task_struct references or
> dependencies under f_op->aio* which might get confused because of the
> kernel thread. Would this kind of full audit be enough, or would it be
> considered too fragile?
Was just talking about this. Completely agreed that we need to do
something about it, but personally I don't think punting everything to
workqueue is a realistic solution.
One problem with the approach is that sometimes we _do_ need to block.
The primary reason we block in submit_bio if the request queue is too
full is that our current IO schedulers can't cope with unbounded queue
depth; other processes will be starved and see unbounded IO latencies.
This is even worse when a filesystem is involved and metadata operations
get stuck at the end of a huge queue. By punting everything to
workqueue, all that's been accomplished is to hide the queueing and
shove it up a layer.
A similar problem exists with kernel memory usage, but it's even worse
there because most users aren't using memcg. If we're short on memory,
the process doing aio really needs to be throttled in io_submit() ->
get_user_pages(); if it's punting everything to workqueue, now the other
processes may have to compete against 1000 worker threads calling
get_user_pages() simultaneously instead of just the process doing aio.
Also, punting everything to workqueue introduces a real performance
cost. Workqueues are fast, and it's not going to be noticed with hard
drives or even SATA SSDs - but high end SSDs are pushing over a million
iops these days and automatically punting everything to workqueue is
going to be unacceptable there.
That said, I think for filesystems blocking in get_blocks(), another
kernel thread is probably the only practical solution.
What I'd really like is a way to spawn a worker thread automagically
only if and when we block. The thought of trying to implement that
scares me though, I'm pretty sure it'd require deep magic.
In the short term, though, Ted implemented a hack in ext4 to pin all the
metadata for a given file in memory, and bumping up the request queue
depth shouldn't be a big deal if that's an issue (at least as a
configurable option).
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
From: Zach Brown @ 2013-02-28 23:49 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Ankit Jain, lsf-pc, linux-fsdevel, Jan Kara, tytso
> What I'd really like is a way to spawn a worker thread automagically
> only if and when we block. The thought of trying to implement that
> scares me though, I'm pretty sure it'd require deep magic.
Yeah, there have been some experiments along those lines.
I messed with juggling kernel stacks when blocking with "fibrils":
http://lwn.net/Articles/219954/
Ingo created a new thread and returned it to userspace when a syscall
blocks with "syslets":
http://lwn.net/Articles/221887/
- z
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
From: Jeff Moyer @ 2013-03-01 15:03 UTC (permalink / raw)
To: Kent Overstreet
Cc: Ankit Jain, lsf-pc, linux-fsdevel, Jan Kara, Zach Brown, tytso,
Tejun Heo, Jens Axboe
Kent Overstreet <koverstreet@google.com> writes:
> On Fri, Mar 01, 2013 at 01:37:55AM +0530, Ankit Jain wrote:
>> Hi,
>>
>> I'm interested in discussing how to improve the async IO API in the
>> kernel, specifically io_submit latencies.
>>
>> I am working on trying to make io_submit non-blocking. I had posted a
>> patch[1] for this earlier on fsdevel and there was some discussion on
>> it. I have made some of the improvements suggested there.
>>
>> The approach attempted in that patch essentially tries to service the
>> requests on a separate kernel thread. It was pointed out that this would
>> need to ensure that there aren't any unknown task_struct references or
>> dependencies under f_op->aio* which might get confused because of the
>> kernel thread. Would this kind of full audit be enough, or would it be
>> considered too fragile?
>
> Was just talking about this. Completely agreed that we need to do
> something about it, but personally I don't think punting everything to
> workqueue is a realistic solution.
>
> One problem with the approach is that sometimes we _do_ need to block.
> The primary reason we block in submit_bio if the request queue is too
> full is that our current IO schedulers can't cope with unbounded queue
> depth; other processes will be starved and see unbounded IO latencies.
The I/O schedulers have no problem coping with a larger queue depth. In
fact, the more I/O you let through to the scheduler, the better chance
you have of getting fairness between processes (not the other way around
as you suggest). The sleeping on nr_requests is done to prevent the I/O
subsystem from eating up all of your kernel memory.
> This is even worse when a filesystem is involved and metadata operations
> get stuck at the end of a huge queue. By punting everything to
> workqueue, all that's been accomplished is to hide the queueing and
> shove it up a layer.
Slightly different issue there, but no need to hash it out in this
thread. One thing I do agree with is that, when you punt I/O to a
workqueue, you lose the ability to account that I/O to the proper
process.
> A similar problem exists with kernel memory usage, but it's even worse
> there because most users aren't using memcg. If we're short on memory,
> the process doing aio really needs to be throttled in io_submit() ->
> get_user_pages(); if it's punting everything to workqueue, now the other
> processes may have to compete against 1000 worker threads calling
> get_user_pages() simultaneously instead of just the process doing aio.
Right, this hits on the inability to track the i/o to the original
submitting process. I thought we had a plan to fix that (and I have
some really old patches for this that I never quite finished). Tejun?
Jens?
Cheers,
Jeff
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
From: Tejun Heo @ 2013-03-01 16:20 UTC (permalink / raw)
To: Jeff Moyer
Cc: Kent Overstreet, Ankit Jain, lsf-pc, linux-fsdevel, Jan Kara,
Zach Brown, tytso, Jens Axboe, Li Zefan, cgroups
(cc'ing Li and cgroups ML)
Hey, guys.
On Fri, Mar 01, 2013 at 10:03:47AM -0500, Jeff Moyer wrote:
> > This is even worse when a filesystem is involved and metadata operations
> > get stuck at the end of a huge queue. By punting everything to
> > workqueue, all that's been accomplished is to hide the queueing and
> > shove it up a layer.
>
> Slightly different issue there, but no need to hash it out in this
> thread. One thing I do agree with is that, when you punt I/O to a
> workqueue, you lose the ability to account that I/O to the proper
> process.
The block layer now supports tagging bios with the issuer's identity:
int bio_associate_current(struct bio *bio);
void bio_disassociate_task(struct bio *bio);
After bio_associate_current() is performed on a bio, the block layer
will treat the bio, in terms of ioctx and blkcg, as if it were being
issued by the task that was %current at the time of association, no
matter which task ends up doing the actual submission.
Async IO handling in blkcg is still utterly broken, though, so this
isn't all that useful at this point yet.
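As a rough sketch of how a punt-to-workqueue path might use this tagging
(kernel context, not runnable as-is; everything except the bio_*, work
queue, and submit_bio calls is a hypothetical name made up for
illustration):

```
/* Hypothetical punt-to-workqueue path using the bio tagging above. */
struct punted_io {
	struct work_struct work;
	struct bio *bio;
};

/* Runs in the context of the task calling io_submit(): tag the bio
 * with %current so the block layer charges it to this task's
 * io_context and blkcg, then hand it off to a worker. */
static void punt_bio(struct workqueue_struct *wq, struct punted_io *pio)
{
	bio_associate_current(pio->bio);
	queue_work(wq, &pio->work);
}

/* Runs on the worker thread; the bio is still accounted to the
 * original issuer despite being submitted from here. */
static void punted_io_fn(struct work_struct *work)
{
	struct punted_io *pio = container_of(work, struct punted_io, work);

	submit_bio(pio->bio->bi_rw, pio->bio);
}

/* bi_end_io callback: drop the task association on completion. */
static void punted_io_end(struct bio *bio, int error)
{
	bio_disassociate_task(bio);
	bio_put(bio);
}
```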
> > A similar problem exists with kernel memory usage, but it's even worse
> > there because most users aren't using memcg. If we're short on memory,
> > the process doing aio really needs to be throttled in io_submit() ->
> > get_user_pages(); if it's punting everything to workqueue, now the other
> > processes may have to compete against 1000 worker threads calling
> > get_user_pages() simultaneously instead of just the process doing aio.
>
> Right, this hits on the inability to track the i/o to the original
> submitting process. I thought we had a plan to fix that (and I have
> some really old patches for this that I never quite finished). Tejun?
> Jens?
For IO, I think bio tagging should be able to handle most of it,
eventually. For memory, ultimately, we want the workqueue tasks to be
able to assume the resource role of the work item issuer. Associating
dynamically is nasty given the variety of cgroups - e.g. there might
not be any CPUs common to the sets allowed to the workqueue and to the
issuer, so I'm unsure whether we can reach a general solution; however,
workqueue is currently growing worker pools with custom attributes
which will eventually cover cgroup association, and we can use that for
specific problem areas - i.e. create a matching workqueue for each aio
context (the backing pool is shared, so the overhead isn't big).
One obstacle there is we currently don't have a way to say "this
workqueue belongs to this cgroup" as there is no "this" cgroup defined
(awesome design). That part is being rectified but for the time being
we can probably say "this workqueue belongs to the same cgroups as
%current" which should be enough for aio contexts, I think.
Thanks.
--
tejun
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
From: Tejun Heo @ 2013-03-01 16:31 UTC (permalink / raw)
To: Jeff Moyer
Cc: Kent Overstreet, Ankit Jain, lsf-pc, linux-fsdevel, Jan Kara,
Zach Brown, tytso, Jens Axboe, Li Zefan, cgroups
On Fri, Mar 01, 2013 at 08:20:38AM -0800, Tejun Heo wrote:
> One obstacle there is we currently don't have a way to say "this
> workqueue belongs to this cgroup" as there is no "this" cgroup defined
> (awesome design). That part is being rectified but for the time being
> we can probably say "this workqueue belongs to the same cgroups as
> %current" which should be enough for aio contexts, I think.
Or maybe we should just add current->wq, which always matches the
cgroup associations of the task?
--
tejun
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
From: Ankit Jain @ 2013-03-04 19:55 UTC (permalink / raw)
To: Kent Overstreet; +Cc: lsf-pc, linux-fsdevel, Jan Kara, Zach Brown, tytso
On 03/01/2013 02:33 AM, Kent Overstreet wrote:
> On Fri, Mar 01, 2013 at 01:37:55AM +0530, Ankit Jain wrote:
>> Hi,
>>
>> I'm interested in discussing how to improve the async IO API in the
>> kernel, specifically io_submit latencies.
>>
>> I am working on trying to make io_submit non-blocking. I had posted a
>> patch[1] for this earlier on fsdevel and there was some discussion on
>> it. I have made some of the improvements suggested there.
>>
>> The approach attempted in that patch essentially tries to service the
>> requests on a separate kernel thread. It was pointed out that this would
>> need to ensure that there aren't any unknown task_struct references or
>> dependencies under f_op->aio* which might get confused because of the
>> kernel thread. Would this kind of full audit be enough, or would it be
>> considered too fragile?
>
> Was just talking about this. Completely agreed that we need to do
> something about it, but personally I don't think punting everything to
> workqueue is a realistic solution.
Sure. Like you and others mentioned in this thread, there are several
possible ways to solve this. I think it would be useful to discuss
those at the conference and settle on an approach to try.
Thanks,
--
Ankit Jain
SUSE Labs