From: Tejun Heo <tj@kernel.org>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Milan Broz <gmazyland@gmail.com>,
Mikulas Patocka <mpatocka@redhat.com>,
dm-devel@redhat.com, Andi Kleen <andi@firstfloor.org>,
dm-crypt@saout.de, linux-kernel@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
Christian Schmidt <schmidt@digadd.de>,
Vivek Goyal <vgoyal@redhat.com>, Jens Axboe <axboe@kernel.dk>
Subject: Re: dm-crypt performance
Date: Thu, 28 Mar 2013 11:53:27 -0700
Message-ID: <20130328185327.GF14088@htj.dyndns.org>
In-Reply-To: <20130326202837.GA5599@redhat.com>
Hello,
(cc'ing Vivek and Jens for the iosched related bits)
On Tue, Mar 26, 2013 at 04:28:38PM -0400, Mike Snitzer wrote:
> On Tue, Mar 26 2013 at 4:05pm -0400,
> Milan Broz <gmazyland@gmail.com> wrote:
>
> > >On Mon, Mar 25, 2013 at 11:47:22PM -0400, Mikulas Patocka wrote:
> > >
> > >>For best performance we could use the unbound workqueue implementation
> > >>with request sorting, if people don't object to the request sorting being
> > >>done in dm-crypt.
> >
> > So again:
> >
> > - why IO scheduler is not working properly here? Do it need some extensions?
> > If fixed, it can help even is some other non-dmcrypt IO patterns.
> > (I mean dmcrypt can set some special parameter for underlying device queue
> > automagically to fine-tune sorting parameters.)
>
> Not sure, but IO scheduler changes are fairly slow to materialize given
> the potential for adverse side-effects. Are you so surprised that a
> shotgun blast of IOs might make the IO schduler less optimal than if
> some basic sorting were done at the layer above?
My memory is already pretty hazy, but Vivek should be able to correct
me if I say something nonsensical. The thing is, the order and timing
of IOs coming down from upper layers carry meaning for ioscheds,
and they exploit those patterns to do better scheduling.
Reordering IOs randomly actually loses certain information about the
IO stream and makes ioscheds mis-classify it -
e.g. what could have been classified as "mostly consecutive streaming
IO" could fail to be detected as such after the reordering. Sure,
ioscheds can probably be improved to compensate for such temporary
localized reorderings, but nothing is free, and given that most of the
upper stacks already do a pretty good job of issuing IOs in order when
possible, it would be a bit silly to do more than is usually necessary
in ioscheds.
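To make the mis-classification point concrete, here is a toy sketch.
It is not how any real iosched classifies streams; the "sequential
fraction" metric, the fixed 8-sector request size, and both function
names are assumptions made up for illustration. It only shows that
shuffling requests within small windows destroys most of the
back-to-back adjacency a detector could have relied on.

```python
import random


def sequential_fraction(sectors, size=8):
    """Fraction of requests that start exactly where the previous one ended.

    `sectors` is a list of start sectors in submission order; `size` is the
    (assumed constant) request length in sectors.
    """
    if len(sectors) < 2:
        return 0.0
    hits = sum(1 for prev, cur in zip(sectors, sectors[1:]) if cur == prev + size)
    return hits / (len(sectors) - 1)


def shuffle_in_windows(sectors, window, seed=0):
    """Randomly reorder requests inside consecutive windows of `window` items.

    This stands in for a stacking driver that completes work items out of
    order; requests never move outside their own window.
    """
    rng = random.Random(seed)
    out = []
    for i in range(0, len(sectors), window):
        chunk = sectors[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out


# A purely streaming workload: 1024 back-to-back 8-sector requests.
stream = [i * 8 for i in range(1024)]
# The same requests after localized reordering in windows of 16.
reordered = shuffle_in_windows(stream, window=16)
```

The set of requests is identical in both lists; only the submission
order differs, which is exactly the side-band information in question.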
So, no, I don't think maintaining IO order in stacking drivers is a
bad idea. I actually think all stacking drivers should do that;
otherwise, they really are destroying genuinely useful side-band
information.
> > - can we have some cpu-bound workqueue which automatically switch to unbound
> > (relocates work to another cpu) if it detects some saturation watermark etc?
> > (Again, this can be used in other code.
> > http://www.redhat.com/archives/dm-devel/2012-August/msg00288.html
> > (Yes, I see skepticism there :-)
>
> Question for Tejun? (now cc'd).
Unbound workqueues went through quite a few improvements lately and
are currently growing NUMA affinity support. Once merged, all unbound
work items issued on a NUMA node will be processed on the same NUMA
node, which should mitigate some, unfortunately not all, of the
disadvantages compared to per-cpu ones. Mikulas, can you share more
about your test setup? Was it a NUMA machine? Which wq branch did
you use?
NUMA affinity would have a similar, though less severe, issue as
per-cpu workqueues. If all IOs are being issued from one node while
other nodes are idle, that specific node can get saturated. NUMA
affinity can be adjusted both from inside the kernel and from userland
via sysfs, so there are control knobs for corner cases.
As for maintaining CPU or NUMA affinity until the CPU / node is
saturated and spilling to other CPUs/nodes beyond that, yeah, it's an
interesting idea. It's non-trivial and would have to incorporate a
lot of notions of "load" similar to the scheduler's. It really becomes
a generic load balancing problem, as it'd be pointless and actually
harmful to, say, spill work items back and forth between two saturated
NUMA nodes.
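A rough sketch of the "stay local until saturated, then spill" policy,
just to show where the load notion creeps in. Nothing like this exists
in the workqueue code; pick_node(), the per-node load and capacity
dicts, and the load/capacity ratio used to rank targets are all
hypothetical.

```python
def pick_node(home, load, capacity):
    """Choose the node that should run a work item issued on `home`.

    `load` and `capacity` map node id -> number of in-flight work items
    and the number at which the node counts as saturated (both assumed
    to be tracked by the caller).
    """
    # Keep affinity as long as the issuing node still has headroom.
    if load[home] < capacity[home]:
        return home
    # Home is saturated: consider only nodes that are not saturated.
    candidates = [n for n in load if n != home and load[n] < capacity[n]]
    if not candidates:
        # Everyone is saturated; spilling would only add cross-node
        # traffic without adding throughput, so stay home.
        return home
    # Spill to the relatively least loaded node.
    return min(candidates, key=lambda n: load[n] / capacity[n])
```

With two nodes of capacity 4, pick_node(0, {0: 4, 1: 1}, {0: 4, 1: 4})
spills to node 1, while the same call with both loads at 4 stays on
node 0. Even this toy version needs an up-to-date view of every node's
load, which is the part that makes the real thing non-trivial.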
So, if the brunt of scattering workload across random CPUs can be
avoided by NUMA affinity, that could be a reasonable tradeoff, I
think.
Thanks.
--
tejun