From: Vivek Goyal <vgoyal@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-kernel@vger.kernel.org, jaxboe@fusionio.com,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
khlebnikov@openvz.org, jmoyer@redhat.com
Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups
Date: Tue, 28 Jun 2011 09:35:58 -0400
Message-ID: <20110628133557.GB17552@redhat.com>
In-Reply-To: <20110628024738.GJ32466@dastard>
On Tue, Jun 28, 2011 at 12:47:38PM +1000, Dave Chinner wrote:
>
> Vivek, I'm not sure this is a general solution. If we hand journal
> IO off to a workqueue, then we've got no idea what the "dependent
> task" is.
>
> I bring this up as I have a current patchset that moves all the XFS
> journal IO out of process context into a workqueue to solve
> process-visible operation latency (e.g. 1000 mkdir syscalls run at
> 1ms each, the 1001st triggers a journal checkpoint and takes 500ms)
> and background checkpoint submission races. This effectively means
> that XFS will trigger the same bad CFQ behaviour on fsync, but have
> no means of avoiding it because we don't have a specific task to
> yield to.
>
> And FWIW, we're going to be using workqueues more and more in XFS
> for asynchronous processing of operations. I'm looking to use WQs
> for speculative readahead of inodes, all our delayed metadata
> writeback, log IO submission, free space allocation requests,
> background inode allocation, background inode freeing, background
> EOF truncation, etc to process as much work asynchronously outside
> syscall context as possible (let's use all those CPU cores we
> have!).
>
> All of these things will push potentially dependent IO operations
> outside of the bounds of the process actually doing the operation,
> so some general solution to the "dependent IO in an undefined thread
> context" problem really needs to be solved sooner rather than
> later...
>
> As it is, I don't have any good ideas of how to solve this, but I
> thought it is worth bringing to your attention while you are trying
> to solve a similar issue.
Dave,

Couple of thoughts.

- We can introduce another block layer call where dependencies are set up
  from worker thread context. So when the process schedules the work, it can
  save the task information somewhere, and when the worker thread actually
  calls the specified function, that function can set up the dependency
  between the worker thread and the submitting task.

  Probably the original process can tear down the dependency connection
  when IO is done. I am assuming that the IO submitting process is waiting
  for all IO to finish.

  In the current framework one can specify multiple processes being dependent
  on one thread but not vice versa. I think we should be able to
  handle that by maintaining a linked list of dependent queues instead
  of a single pointer. So if a process submits a bunch of jobs with the help
  of a bunch of worker threads from multiple cpus, I think that case is
  manageable with some extension to the current patches.
- Or we can try something more exotic: when we schedule a work, one
  should be able to tell which cgroup the worker should run in.
  When the worker actually runs, it can migrate itself to the destination
  cgroup and submit IO. This does not take care of cases like the
  journalling thread, where multiple processes are dependent on a single
  kernel thread. In that case the dependent queue solution above should work
  well.
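
To make the dependent queue idea from the first point a bit more concrete,
here is a rough userspace sketch in C. It is not code from the posted
patches. struct io_queue, struct io_work, io_dep_setup(), io_dep_teardown()
and io_dep_depends_on() are all hypothetical names made up for illustration,
and malloc()/free() stand in for whatever allocation the block layer would
really use. The only point is the shape: the submitter records itself in the
work item, the worker sets up the link when it runs, and the submitter keeps
a list of queues it depends on instead of a single pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct dep_node;

/* Hypothetical stand-in for a per-task IO scheduler queue. */
struct io_queue {
    int id;
    struct dep_node *deps;  /* queues this queue's owner is waiting on */
    size_t nr_deps;
};

/* One link in the submitter's list of dependencies. */
struct dep_node {
    struct io_queue *target;
    struct dep_node *next;
};

/* Work item: the submitter saves its own queue when scheduling the work. */
struct io_work {
    struct io_queue *submitter;
};

/* Called from worker context: record that the submitter depends on `worker`.
 * Returns 0 on success, -1 if the list node cannot be allocated. */
int io_dep_setup(struct io_work *work, struct io_queue *worker)
{
    struct dep_node *n;

    assert(work != NULL && work->submitter != NULL && worker != NULL);
    n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->target = worker;
    n->next = work->submitter->deps;
    work->submitter->deps = n;
    work->submitter->nr_deps++;
    return 0;
}

/* Called by the submitter once all its IO is done: drop every dependency. */
void io_dep_teardown(struct io_queue *q)
{
    while (q->deps != NULL) {
        struct dep_node *n = q->deps;

        q->deps = n->next;
        free(n);
    }
    q->nr_deps = 0;
}

/* Scheduler-side check: is `q` waiting on `candidate`? Returns 1 or 0. */
int io_dep_depends_on(const struct io_queue *q, const struct io_queue *candidate)
{
    const struct dep_node *n;

    for (n = q->deps; n != NULL; n = n->next)
        if (n->target == candidate)
            return 1;
    return 0;
}
```

With this, one process that farms work out to several worker threads ends up
with one list entry per worker, and the scheduler can check the list when
deciding whether the submitter's queue should yield.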
So I think the above API can be extended to handle the case of work queues
as well, or we could look into migrating the worker into a user-specified
cgroup if that turns out to be a better solution.
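
The cgroup migration alternative could look roughly like the following, again
as a userspace sketch in C with made-up names (struct io_cgroup,
struct worker, struct cg_work, cg_work_init(), worker_run()). Nothing here is
an existing kernel interface. Migrating the worker back to its original group
after submission is my assumption; it is not something I have worked out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IO cgroup that simply counts the IOs charged to it. */
struct io_cgroup {
    const char *name;
    unsigned long ios_charged;
};

/* Hypothetical worker thread: it always runs in exactly one cgroup. */
struct worker {
    struct io_cgroup *cur;
};

/* Work item tagged with the cgroup the worker should run in. */
struct cg_work {
    struct io_cgroup *target;
    unsigned int nr_ios;
};

/* Submitter side: tag the work with the submitter's own cgroup. */
void cg_work_init(struct cg_work *w, struct io_cgroup *submitter_cg,
                  unsigned int nr_ios)
{
    assert(w != NULL && submitter_cg != NULL);
    w->target = submitter_cg;
    w->nr_ios = nr_ios;
}

/* Worker side: migrate to the target group, submit, then migrate back. */
void worker_run(struct worker *wk, struct cg_work *w)
{
    struct io_cgroup *home;

    assert(wk != NULL && wk->cur != NULL && w != NULL && w->target != NULL);
    home = wk->cur;
    wk->cur = w->target;                /* migrate to destination cgroup */
    wk->cur->ios_charged += w->nr_ios;  /* IO is charged to the current group */
    wk->cur = home;                     /* assumption: return to home group */
}
```

The IO gets charged to the submitter's group, so no yielding is needed, but a
single worker can only be in one group at a time, which is why this does not
help the journalling thread case.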
Thanks
Vivek
Thread overview: 18+ messages
2011-06-27 20:17 [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups Vivek Goyal
2011-06-27 20:17 ` [PATCH 1/3] block: A new interface for specifying IO dependencing among tasks Vivek Goyal
2011-06-27 20:17 ` [PATCH 2/3] ext4: Explicitly specify fsync dependency on journaling thread Vivek Goyal
2011-06-27 20:17 ` [PATCH 3/3] ext3: " Vivek Goyal
2011-06-28 1:18 ` [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups Shaohua Li
2011-06-28 1:40 ` Vivek Goyal
2011-06-28 2:03 ` Shaohua Li
2011-06-28 13:04 ` Vivek Goyal
2011-06-29 1:04 ` Shaohua Li
2011-06-29 1:29 ` Vivek Goyal
2011-06-30 0:29 ` Shaohua Li
2011-06-28 2:47 ` Dave Chinner
2011-06-28 13:35 ` Vivek Goyal [this message]
2011-06-28 11:00 ` Konstantin Khlebnikov
2011-06-28 13:45 ` Vivek Goyal
2011-06-28 14:42 ` Konstantin Khlebnikov
2011-06-28 14:47 ` Vivek Goyal
2011-06-28 21:20 ` Vivek Goyal