From: Dan Williams <dan.j.williams@intel.com>
To: Chris Mason <chris.mason@oracle.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-raid@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH 0/2] more raid456 thread pool experimentation
Date: Wed, 24 Mar 2010 11:06:40 -0700	[thread overview]
Message-ID: <e9c3a7c21003241106h10d1bc1am22b40fef4a335f70@mail.gmail.com> (raw)
In-Reply-To: <20100324155154.GG5021@think>

On Wed, Mar 24, 2010 at 8:51 AM, Chris Mason <chris.mason@oracle.com> wrote:
> On Wed, Mar 24, 2010 at 07:53:10AM -0700, Dan Williams wrote:
>> The current implementation with the async thread pool ends up spreading
>> the work over too many threads.  The btrfs workqueue is targeted at
>> high-cpu-utilization workloads and has a threshold mechanism to limit
>> thread spawning.  Unfortunately it still ends up increasing cpu utilization
>> without a comparable improvement in throughput.  Here are the numbers
>> relative to the multicore disabled case:
>>
>> idle_thresh   throughput      cycles
>> 4             +0%             +102%
>> 64            +4%             +63%
>> 128           +1%             +45%
>
> Interesting, do the btrfs workqueues improve things?  Or do you think they are
> just a better base for more tuning?

Both: throughput falls off a cliff with the async thread pool, and
there are more knobs to turn in this implementation.
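
For anyone following along, the main knob here is the pool's
idle_thresh.  The setup is roughly the sketch below (written against
the async-thread API as I read it today; btrfs_init_workers() has
grown and lost parameters across kernel versions, so treat the exact
signatures loosely):

static struct btrfs_workers raid_workers;

static int raid_pool_setup(void)
{
	/* cap the pool at one worker thread per online cpu */
	btrfs_init_workers(&raid_workers, "raid456", num_online_cpus());

	/*
	 * only spawn another thread once a worker's backlog of queued
	 * work items crosses this threshold
	 */
	raid_workers.idle_thresh = 64;

	/* start one thread now; the rest spawn on demand */
	return btrfs_start_workers(&raid_workers, 1);
}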

> I had always hoped to find more users for the work queues and tried to
> keep btrfs-specific features out of them.  The place I didn't entirely
> succeed was in the spin locks: the ordered queues take regular spin
> locks to avoid turning off irqs, since btrfs always does things outside
> of interrupt time.  It doesn't look like raid needs the ordered queues,
> so this should work pretty well.
>
>>
>> This appears to show that something more fundamental needs to happen to
>> take advantage of percpu raid processing.  More profiling is needed, but
>> the suspects in my mind are conf->device_lock contention and the fact
>> that all work is serialized through conf->handle_list with no method for
>> encouraging stripe_head-to-thread affinity.
>
> The big place I'd suggest looking at for optimization inside the btrfs
> async-thread.c is worker_loop().  For work that tends to be bursty and
> relatively short, we can have worker threads finish their work fairly
> quickly and go to sleep, only to be woken up very quickly again with
> another big batch of work.  The worker_loop code tries to wait around
> for a while, but the tuning here was btrfs specific.
>
> It might also help to tune the find_worker and next_worker code to prefer
> giving work to threads that are running but almost done with their
> queue.  Maybe they can be put onto a special hot list as they get near
> the end of their queue.
>
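
To make sure I'm looking at the right spot: the linger heuristic in
worker_loop() is, as I read it, roughly the pattern below (a simplified
sketch, not the verbatim btrfs code; run_one_work() and more_work_soon()
are made-up stand-ins for the real pending-list handling):

static int worker_loop(void *arg)
{
	struct worker *w = arg;

	while (!kthread_should_stop()) {
		/* drain everything currently on the pending list */
		while (run_one_work(w))
			;

		/*
		 * linger briefly before sleeping so a bursty producer
		 * doesn't pay a wakeup round trip for every batch
		 */
		set_current_state(TASK_INTERRUPTIBLE);
		if (!more_work_soon(w))
			schedule();
		__set_current_state(TASK_RUNNING);
	}
	return 0;
}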

Thanks, I'll take a look at these suggestions.  For these optimizations
to have a chance I think we need stripes to maintain affinity with the
first core that picks up the work.  Currently every stripe takes a trip
through the single-threaded raid5d when its reference count drops to
zero, only to be immediately reissued to the thread pool, potentially
on a different core (though I need to back this assumption up with more
profiling).
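
The shape of the change I have in mind is something like the following
(a hypothetical sketch only: neither sh->first_cpu nor
conf->percpu_handle_list exist in mainline raid5):

static void queue_stripe_affine(raid5_conf_t *conf, struct stripe_head *sh)
{
	int cpu = sh->first_cpu;

	/* remember the first core that picked up this stripe */
	if (cpu < 0)
		cpu = sh->first_cpu = raw_smp_processor_id();

	/* requeue to that core's list instead of the global handle_list */
	spin_lock(&conf->device_lock);
	list_add_tail(&sh->lru, per_cpu_ptr(conf->percpu_handle_list, cpu));
	spin_unlock(&conf->device_lock);
}

That keeps a stripe warm on the core that already owns its state, at
the cost of some load imbalance; whether it pays off is exactly what
the profiling needs to show.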

> There's a rule somewhere that patches renaming things must have replies
> questioning the new name.  The first reply isn't actually allowed to
> suggest a better name, which is good because I'm not very good at
> that kind of thing.
>
> Really though, btr_queue is fine by me, but don't feel obligated to keep
> some variation of btrfs in the name ;)

btr_queue seemed to make sense since it's spreading work like "butter" :-).

--
Dan
