* [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
@ 2013-02-28 20:07 Ankit Jain
  2013-02-28 21:03 ` Kent Overstreet
  0 siblings, 1 reply; 7+ messages in thread

From: Ankit Jain @ 2013-02-28 20:07 UTC (permalink / raw)
To: lsf-pc, linux-fsdevel; +Cc: Jan Kara, Zach Brown

Hi,

I'm interested in discussing how to improve the async io api in the
kernel, specifically io_submit latencies.

I am working on trying to make io_submit non-blocking. I had posted a
patch[1] for this earlier on fsdevel and there was some discussion on
it. I have made some of the improvements suggested there.

The approach attempted in that patch essentially tries to service the
requests on a separate kernel thread. It was pointed out that this would
need to ensure that there aren't any unknown task_struct references or
dependencies under f_op->aio* which might get confused because of the
kernel thread. Would this kind of full audit be enough, or would it be
considered too fragile?

I would like to discuss whether this is the best approach for solving
this problem, and/or discuss some of the other possible approaches to
solving this issue. This has been discussed in the past but we don't
seem to have a solution as of now.

References:
1. http://comments.gmane.org/gmane.linux.kernel.aio.general/3142

Regards,
--
Ankit Jain
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread
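[Editorial illustration] The submit/complete split under discussion can be pictured with a small userspace analogy. This is a hypothetical Python sketch of the io_submit()/io_getevents() model with requests serviced on a separate thread — roughly the shape of the approach in the patch, not the patch itself; all names here are invented:

```python
# Userspace analogy (hypothetical): a submission queue serviced by a
# worker thread, so that submit() returns without doing the actual work.
import queue
import threading

class AsyncCtx:
    def __init__(self):
        self.requests = queue.Queue()
        self.completions = queue.Queue()
        threading.Thread(target=self._service, daemon=True).start()

    def _service(self):
        # Stands in for the kernel thread doing the real f_op->aio_* work.
        while True:
            req_id, fn = self.requests.get()
            self.completions.put((req_id, fn()))

    def submit(self, req_id, fn):
        # Analogous to io_submit(): queue the request and return immediately.
        self.requests.put((req_id, fn))

    def getevents(self):
        # Analogous to io_getevents(): wait for one completion.
        return self.completions.get()

ctx = AsyncCtx()
ctx.submit(1, lambda: "data-1")
ctx.submit(2, lambda: "data-2")
results = dict(ctx.getevents() for _ in range(2))
print(results)  # -> {1: 'data-1', 2: 'data-2'}
```

The point of contention in the replies below is not this split itself, but what happens when the servicing side needs to block or throttle the submitter.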
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
  2013-02-28 20:07 [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies Ankit Jain
@ 2013-02-28 21:03 ` Kent Overstreet
  2013-02-28 23:49   ` Zach Brown
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread

From: Kent Overstreet @ 2013-02-28 21:03 UTC (permalink / raw)
To: Ankit Jain; +Cc: lsf-pc, linux-fsdevel, Jan Kara, Zach Brown, tytso

On Fri, Mar 01, 2013 at 01:37:55AM +0530, Ankit Jain wrote:
> Hi,
>
> I'm interested in discussing how to improve the async io api in the
> kernel, specifically io_submit latencies.
>
> I am working on trying to make io_submit non-blocking. I had posted a
> patch[1] for this earlier on fsdevel and there was some discussion on
> it. I have made some of the improvements suggested there.
>
> The approach attempted in that patch essentially tries to service the
> requests on a separate kernel thread. It was pointed out that this would
> need to ensure that there aren't any unknown task_struct references or
> dependencies under f_op->aio* which might get confused because of the
> kernel thread. Would this kind of full audit be enough, or would it be
> considered too fragile?

Was just talking about this. Completely agreed that we need to do
something about it, but personally I don't think punting everything to
workqueue is a realistic solution.

One problem with the approach is that sometimes we _do_ need to block.
The primary reason we block in submit_bio if the request queue is too
full is that our current IO schedulers can't cope with unbounded queue
depth; other processes will be starved and see unbounded IO latencies.

This is even worse when a filesystem is involved and metadata operations
get stuck at the end of a huge queue. By punting everything to
workqueue, all that's been accomplished is to hide the queueing and
shove it up a layer.

A similar problem exists with kernel memory usage, but it's even worse
there because most users aren't using memcg. If we're short on memory,
the process doing aio really needs to be throttled in io_submit() ->
get_user_pages(); if it's punting everything to workqueue, now the other
processes may have to compete against 1000 worker threads calling
get_user_pages() simultaneously instead of just the process doing aio.

Also, punting everything to workqueue introduces a real performance
cost. Workqueues are fast, and it's not going to be noticed with hard
drives or even SATA SSDs - but high end SSDs are pushing over a million
iops these days, and automatically punting everything to workqueue is
going to be unacceptable there.

That said, I think for filesystems blocking in get_blocks(), another
kernel thread is probably the only practical solution.

What I'd really like is a way to spawn a worker thread automagically
only if and when we block. The thought of trying to implement that
scares me though; I'm pretty sure it'd require deep magic.

In the short term, though, Ted implemented a hack in ext4 to pin all
metadata for a given file in memory, and bumping up the request queue
depth shouldn't be a big deal if that's an issue (at least
configurably).
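[Editorial illustration] The throttling argument — blocking the submitter when the queue is full instead of letting the backlog grow without bound — can be sketched in userspace. This is a hypothetical Python analogy; the queue size and timings are made up, and a bounded `queue.Queue` stands in for a request queue with a small nr_requests:

```python
# A bounded queue throttles the submitter; an unbounded one merely hides
# the backlog a layer up. Hypothetical userspace analogy, not kernel code.
import queue
import threading
import time

q = queue.Queue(maxsize=2)   # stands in for a request queue with nr_requests == 2
latencies = []

def consumer():
    # Stands in for the device draining requests slowly.
    for _ in range(4):
        time.sleep(0.1)
        q.get()

threading.Thread(target=consumer, daemon=True).start()

start = time.monotonic()
for i in range(4):
    q.put(i)                 # blocks once the queue is full: built-in backpressure
    latencies.append(time.monotonic() - start)

# The first submissions return immediately; the later ones wait for the
# consumer -- the submitter is throttled instead of queueing unboundedly.
print(latencies[0] < 0.1, latencies[3] >= 0.1)  # -> True True
```

With an unbounded queue (no maxsize), every put would return immediately and the backlog would simply accumulate — which is Kent's objection to punting everything to a workqueue.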
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
  2013-02-28 21:03 ` Kent Overstreet
@ 2013-02-28 23:49   ` Zach Brown
  2013-03-01 15:03   ` Jeff Moyer
  2013-03-04 19:55   ` Ankit Jain
  2 siblings, 0 replies; 7+ messages in thread

From: Zach Brown @ 2013-02-28 23:49 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Ankit Jain, lsf-pc, linux-fsdevel, Jan Kara, tytso

> What I'd really like is a way to spawn a worker thread automagically
> only if and when we block. The thought of trying to implement that
> scares me though, I'm pretty sure it'd require deep magic.

Yeah, there have been some experiments along those lines.

I messed with juggling kernel stacks when blocking with "fibrils":

  http://lwn.net/Articles/219954/

Ingo created a new thread and returned it to userspace when a syscall
blocks with "syslets":

  http://lwn.net/Articles/221887/

- z
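[Editorial illustration] The "spawn a worker only if and when we actually block" idea behind fibrils and syslets can be approximated in userspace: try the operation on the caller's thread first, and hand off to a new thread only when it reports it would block. A hypothetical Python sketch — `WouldBlock`, `cached_read`, and `cold_read` are invented stand-ins, and real fibrils/syslets work at the kernel-stack level, not with exceptions:

```python
# Fast path stays on the submitter; a worker is spawned only on the slow
# path. Hypothetical analogy of the fibril/syslet idea, not kernel code.
import queue
import threading

class WouldBlock(Exception):
    pass

def submit(op, completions):
    try:
        # Fast path: run synchronously, no thread created.
        completions.put(op(blocking_allowed=False))
    except WouldBlock:
        # Slow path: punt to a worker only now that we know we'd block.
        threading.Thread(
            target=lambda: completions.put(op(blocking_allowed=True))
        ).start()

def cached_read(blocking_allowed):
    return "cached"          # e.g. the page is already in the page cache

def cold_read(blocking_allowed):
    if not blocking_allowed:
        raise WouldBlock     # would have to wait for the disk
    return "from-disk"

done = queue.Queue()
submit(cached_read, done)    # completes inline
submit(cold_read, done)      # completes on a worker
results = {done.get(), done.get()}
print(results)  # -> {'cached', 'from-disk'}
```

This avoids the per-request thread handoff cost for the common cached case, which is the performance concern Kent raises about unconditionally punting to a workqueue.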
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
  2013-02-28 21:03 ` Kent Overstreet
  2013-02-28 23:49   ` Zach Brown
@ 2013-03-01 15:03   ` Jeff Moyer
  2013-03-01 16:20     ` Tejun Heo
  2013-03-04 19:55   ` Ankit Jain
  2 siblings, 1 reply; 7+ messages in thread

From: Jeff Moyer @ 2013-03-01 15:03 UTC (permalink / raw)
To: Kent Overstreet
Cc: Ankit Jain, lsf-pc, linux-fsdevel, Jan Kara, Zach Brown, tytso,
	Tejun Heo, Jens Axboe

Kent Overstreet <koverstreet@google.com> writes:

> On Fri, Mar 01, 2013 at 01:37:55AM +0530, Ankit Jain wrote:
>> Hi,
>>
>> I'm interested in discussing how to improve the async io api in the
>> kernel, specifically io_submit latencies.
>>
>> I am working on trying to make io_submit non-blocking. I had posted a
>> patch[1] for this earlier on fsdevel and there was some discussion on
>> it. I have made some of the improvements suggested there.
>>
>> The approach attempted in that patch essentially tries to service the
>> requests on a separate kernel thread. It was pointed out that this would
>> need to ensure that there aren't any unknown task_struct references or
>> dependencies under f_op->aio* which might get confused because of the
>> kernel thread. Would this kind of full audit be enough, or would it be
>> considered too fragile?
>
> Was just talking about this. Completely agreed that we need to do
> something about it, but personally I don't think punting everything to
> workqueue is a realistic solution.
>
> One problem with the approach is that sometimes we _do_ need to block.
> The primary reason we block in submit_bio if the request queue is too
> full is that our current IO schedulers can't cope with unbounded queue
> depth; other processes will be starved and see unbounded IO latencies.

The I/O schedulers have no problem coping with a larger queue depth. In
fact, the more I/O you let through to the scheduler, the better chance
you have of getting fairness between processes (not the other way
around, as you suggest). The sleeping on nr_requests is done to prevent
the I/O subsystem from eating up all of your kernel memory.

> This is even worse when a filesystem is involved and metadata operations
> get stuck at the end of a huge queue. By punting everything to
> workqueue, all that's been accomplished is to hide the queueing and
> shove it up a layer.

Slightly different issue there, but no need to hash it out in this
thread. One thing I do agree with is that, when you punt I/O to a
workqueue, you lose the ability to account that I/O to the proper
process.

> A similar problem exists with kernel memory usage, but it's even worse
> there because most users aren't using memcg. If we're short on memory,
> the process doing aio really needs to be throttled in io_submit() ->
> get_user_pages(); if it's punting everything to workqueue, now the other
> processes may have to compete against 1000 worker threads calling
> get_user_pages() simultaneously instead of just the process doing aio.

Right, this hits on the inability to track the I/O back to the original
submitting process. I thought we had a plan to fix that (and I have some
really old patches for this that I never quite finished). Tejun? Jens?

Cheers,
Jeff
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
  2013-03-01 15:03   ` Jeff Moyer
@ 2013-03-01 16:20     ` Tejun Heo
  2013-03-01 16:31       ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread

From: Tejun Heo @ 2013-03-01 16:20 UTC (permalink / raw)
To: Jeff Moyer
Cc: Kent Overstreet, Ankit Jain, lsf-pc, linux-fsdevel, Jan Kara,
	Zach Brown, tytso, Jens Axboe, Li Zefan, cgroups

(cc'ing Li and the cgroups ML)

Hey, guys.

On Fri, Mar 01, 2013 at 10:03:47AM -0500, Jeff Moyer wrote:
> > This is even worse when a filesystem is involved and metadata operations
> > get stuck at the end of a huge queue. By punting everything to
> > workqueue, all that's been accomplished is to hide the queueing and
> > shove it up a layer.
>
> Slightly different issue there, but no need to hash it out in this
> thread. One thing I do agree with is that, when you punt I/O to a
> workqueue, you lose the ability to account that I/O to the proper
> process.

The block layer now supports tagging bios with the issuer's identity:

  int bio_associate_current(struct bio *bio);
  void bio_disassociate_task(struct bio *bio);

After bio_associate_current() is performed on a bio, the block layer
will treat the bio, in terms of ioctx and blkcg, as if it were being
issued by the %current at the time of association, no matter which task
ends up doing the actual submission. Async IO handling of blkcg is
still utterly broken, so it isn't as useful at this point yet, tho.

> > A similar problem exists with kernel memory usage, but it's even worse
> > there because most users aren't using memcg. If we're short on memory,
> > the process doing aio really needs to be throttled in io_submit() ->
> > get_user_pages(); if it's punting everything to workqueue, now the other
> > processes may have to compete against 1000 worker threads calling
> > get_user_pages() simultaneously instead of just the process doing aio.
>
> Right, this hits on the inability to track the I/O back to the original
> submitting process. I thought we had a plan to fix that (and I have
> some really old patches for this that I never quite finished). Tejun?
> Jens?

For IO, I think bio tagging should be able to handle most of it,
eventually. For memory, ultimately, we want the workqueue tasks to be
able to assume the resource role of the work item's issuer. Associating
dynamically is nasty given the variety of cgroups - e.g. there might
not be any common CPUs between the sets allowed to the workqueue and to
the issuer - so I'm unsure whether we can reach a general solution;
however, workqueue is currently growing worker pools with custom
attributes, which will eventually cover cgroup association, and we can
use that for specific problem areas - i.e. create a matching workqueue
for each aio context (the backing pool is shared, so the overhead isn't
big).

One obstacle there is that we currently don't have a way to say "this
workqueue belongs to this cgroup", as there is no "this" cgroup defined
(awesome design). That part is being rectified, but for the time being
we can probably say "this workqueue belongs to the same cgroups as
%current", which should be enough for aio contexts, I think.

Thanks.

--
tejun
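[Editorial illustration] The association model Tejun describes can be pictured with a userspace analogy. This hypothetical Python sketch mimics bio_associate_current() in spirit only — tag a request with its issuer at submission time, so that whichever worker thread performs it charges the cost to the right account; the `Bio` class and accounting scheme here are invented:

```python
# Hypothetical analogy of bio_associate_current(): record the issuer
# while the submitting task is still "current", so later submission by a
# worker is accounted to the issuer, not the worker.
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Bio:
    sectors: int
    issuer: Optional[str] = None   # set at association time

accounting = defaultdict(int)      # per-"task" I/O accounting

def bio_associate(bio, current_task):
    # Analogous to bio_associate_current() at io_submit() time.
    bio.issuer = current_task

def worker_submit(bio):
    # A worker performs the actual submission; the charge goes to the
    # associated issuer if there is one, otherwise to the worker itself.
    accounting[bio.issuer if bio.issuer else "worker"] += bio.sectors

b1, b2 = Bio(sectors=8), Bio(sectors=16)
bio_associate(b1, "task-A")   # tagged at submission time
for b in (b1, b2):            # both actually submitted by the worker
    worker_submit(b)
print(dict(accounting))       # -> {'task-A': 8, 'worker': 16}
```

Without the association step, both I/Os would be charged to the worker — the accounting loss Jeff points out when everything is punted to a workqueue.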
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
  2013-03-01 16:20     ` Tejun Heo
@ 2013-03-01 16:31       ` Tejun Heo
  0 siblings, 0 replies; 7+ messages in thread

From: Tejun Heo @ 2013-03-01 16:31 UTC (permalink / raw)
To: Jeff Moyer
Cc: Kent Overstreet, Ankit Jain, lsf-pc, linux-fsdevel, Jan Kara,
	Zach Brown, tytso, Jens Axboe, Li Zefan, cgroups

On Fri, Mar 01, 2013 at 08:20:38AM -0800, Tejun Heo wrote:
> One obstacle there is that we currently don't have a way to say "this
> workqueue belongs to this cgroup", as there is no "this" cgroup defined
> (awesome design). That part is being rectified, but for the time being
> we can probably say "this workqueue belongs to the same cgroups as
> %current", which should be enough for aio contexts, I think.

Or maybe we should just add current->wq, which always matches the
cgroup associations of the task?

--
tejun
* Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies
  2013-02-28 21:03 ` Kent Overstreet
  2013-02-28 23:49   ` Zach Brown
  2013-03-01 15:03   ` Jeff Moyer
@ 2013-03-04 19:55   ` Ankit Jain
  2 siblings, 0 replies; 7+ messages in thread

From: Ankit Jain @ 2013-03-04 19:55 UTC (permalink / raw)
To: Kent Overstreet; +Cc: lsf-pc, linux-fsdevel, Jan Kara, Zach Brown, tytso

On 03/01/2013 02:33 AM, Kent Overstreet wrote:
> On Fri, Mar 01, 2013 at 01:37:55AM +0530, Ankit Jain wrote:
>> Hi,
>>
>> I'm interested in discussing how to improve the async io api in the
>> kernel, specifically io_submit latencies.
>>
>> I am working on trying to make io_submit non-blocking. I had posted a
>> patch[1] for this earlier on fsdevel and there was some discussion on
>> it. I have made some of the improvements suggested there.
>>
>> The approach attempted in that patch essentially tries to service the
>> requests on a separate kernel thread. It was pointed out that this would
>> need to ensure that there aren't any unknown task_struct references or
>> dependencies under f_op->aio* which might get confused because of the
>> kernel thread. Would this kind of full audit be enough, or would it be
>> considered too fragile?
>
> Was just talking about this. Completely agreed that we need to do
> something about it, but personally I don't think punting everything to
> workqueue is a realistic solution.

Sure. As you and others mentioned on this thread, there are possible
ways to solve this. I think it would be useful to discuss those at the
conference and figure out an approach to try.

Thanks,
--
Ankit Jain
SUSE Labs
end of thread, other threads: [~2013-03-04 19:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-28 20:07 [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies Ankit Jain
2013-02-28 21:03 ` Kent Overstreet
2013-02-28 23:49   ` Zach Brown
2013-03-01 15:03   ` Jeff Moyer
2013-03-01 16:20     ` Tejun Heo
2013-03-01 16:31       ` Tejun Heo
2013-03-04 19:55   ` Ankit Jain