diff for duplicates of <20100324193156.GH5021@think> diff --git a/a/1.txt b/N1/1.txt index af537ad..454bf12 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -1,28 +1,21 @@ On Wed, Mar 24, 2010 at 11:06:40AM -0700, Dan Williams wrote: -> On Wed, Mar 24, 2010 at 8:51 AM, Chris Mason <chris.mason@oracle.com>= - wrote: +> On Wed, Mar 24, 2010 at 8:51 AM, Chris Mason <chris.mason@oracle.com> wrote: > > On Wed, Mar 24, 2010 at 07:53:10AM -0700, Dan Williams wrote: -> >> The current implementation with the async thread pool ends up spre= -ading -> >> the work over too many threads. =A0The btrfs workqueue is targeted= - at high -> >> cpu utilization works and has a threshold mechanism to limit threa= -d -> >> spawning. =A0Unfortunately it still ends up increasing cpu utiliza= -tion -> >> without a comparable improvement in throughput. =A0Here are the nu= -mbers +> >> The current implementation with the async thread pool ends up spreading +> >> the work over too many threads. The btrfs workqueue is targeted at high +> >> cpu utilization works and has a threshold mechanism to limit thread +> >> spawning. Unfortunately it still ends up increasing cpu utilization +> >> without a comparable improvement in throughput. Here are the numbers > >> relative to the multicore disabled case: > >> -> >> idle_thresh =A0 throughput =A0 =A0 =A0cycles -> >> 4 =A0 =A0 =A0 =A0 =A0 =A0 +0% =A0 =A0 =A0 =A0 =A0 =A0 +102% -> >> 64 =A0 =A0 =A0 =A0 =A0 =A0+4% =A0 =A0 =A0 =A0 =A0 =A0 +63% -> >> 128 =A0 =A0 =A0 =A0 =A0 +1% =A0 =A0 =A0 =A0 =A0 =A0 +45% +> >> idle_thresh throughput cycles +> >> 4 +0% +102% +> >> 64 +4% +63% +> >> 128 +1% +45% > > -> > Interesting, do the btrfs workqueues improve things? =A0Or do you t= -hink they are +> > Interesting, do the btrfs workqueues improve things? Or do you think they are > > just a better base for more tuning? ->=20 +> > Both, throughput falls off a cliff with the async thread pool, and > there are more knobs to turn in this implementation. @@ -31,56 +24,38 @@ at 63% more cycles. That looks like something more than just cache affinity, but at least for btrfs I always had bigger problems than affinity, so I haven't dug into that part yet. ->=20 -> > I had always hoped to find more users for the work queues and tried= - to -> > keep btrfs specific features out of them. =A0The place I didn't ent= -irely +> +> > I had always hoped to find more users for the work queues and tried to +> > keep btrfs specific features out of them. The place I didn't entirely > > succeed was in the spin locks, the ordered queues take regular spin -> > locks to avoid turning off irqs where btrfs always does things outs= -ide -> > of interrupt time. =A0Doesn't look like raid needs the ordered queu= -es so +> > locks to avoid turning off irqs where btrfs always does things outside +> > of interrupt time. Doesn't look like raid needs the ordered queues so > > this should work pretty well. > > > >> -> >> This appears to show that something more fundamental needs to happ= -en to -> >> take advantage of percpu raid processing. =A0More profiling is nee= -ded, but -> >> the suspects in my mind are conf->device_lock contention and the f= -act -> >> that all work is serialized through conf->handle_list with no meth= -od for +> >> This appears to show that something more fundamental needs to happen to +> >> take advantage of percpu raid processing. More profiling is needed, but +> >> the suspects in my mind are conf->device_lock contention and the fact +> >> that all work is serialized through conf->handle_list with no method for > >> encouraging stripe_head to thread affinity. > > -> > The big place I'd suggest to look inside the btrfs async-thread.c f= -or -> > optimization is the worker_loop(). =A0For work that tends to be bur= -sty and -> > relatively short, we can have worker threads finish their work fair= -ly -> > quickly and go to sleep, only to be woken up very quickly again wit= -h -> > another big batch of work. =A0The worker_loop code tries to wait ar= -ound +> > The big place I'd suggest to look inside the btrfs async-thread.c for +> > optimization is the worker_loop(). For work that tends to be bursty and +> > relatively short, we can have worker threads finish their work fairly +> > quickly and go to sleep, only to be woken up very quickly again with +> > another big batch of work. The worker_loop code tries to wait around > > for a while, but the tuning here was btrfs specific. > > -> > It might also help to tune the find_worker and next_worker code to = -prefer +> > It might also help to tune the find_worker and next_worker code to prefer > > giving work to threads that are running but almost done with their -> > queue. =A0Maybe they can be put onto a special hot list as they get= - near +> > queue. Maybe they can be put onto a special hot list as they get near > > the end of their queue. > > ->=20 -> Thanks I'll take a look at these suggestions. For these optimization= -s -> to have a chance I think we need stripes to maintain affinity with th= -e +> +> Thanks I'll take a look at these suggestions. For these optimizations +> to have a chance I think we need stripes to maintain affinity with the > first core that picks up the work. Currently all stripes take a trip -> through the single-threaded raid5d when their reference count drops t= -o +> through the single-threaded raid5d when their reference count drops to > zero, only to be immediately reissued to the thread pool potentially > on a different core (but I need to back this assumption up with more > profiling). @@ -88,23 +63,18 @@ o Well, I'd definitely say the pool is going to give you a random core. ->=20 -> > There's a rule somewhere that patches renaming things must have rep= -lies -> > questioning the new name. =A0The first reply isn't actually allowed= - to +> +> > There's a rule somewhere that patches renaming things must have replies +> > questioning the new name. The first reply isn't actually allowed to > > suggest a better name, which is good because I'm not very good at > > that kind of thing. > > -> > Really though, btr_queue is fine by me, but don't feel obligated to= - keep +> > Really though, btr_queue is fine by me, but don't feel obligated to keep > > some variation of btrfs in the name ;) ->=20 -> btr_queue seemed to make sense since it's spreading work like "butter= -" :-). +> +> btr_queue seemed to make sense since it's spreading work like "butter" :-). -Grin, it doesn't take many butter jokes to sell me. Careful though, yo= -u +Grin, it doesn't take many butter jokes to sell me. Careful though, you can't have just one buttery thread pool. Just wait, you'll think of good reasons to add another real soon. @@ -112,14 +82,12 @@ Speaking of which, you want to use the atomic thread starting helper. Basically kthread_run does GFP allocations that can wait on disk, and if you're the disk it is waiting on things can go badly in a hurry. -We should probably integrate a single kthread run helper thread into th= -e +We should probably integrate a single kthread run helper thread into the pools instead of forcing people to make their own. I'll take a look. -chris -- -To unsubscribe from this list: send the line "unsubscribe linux-raid" i= -n +To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/a/content_digest b/N1/content_digest index 62f0af4..039d80a 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -10,30 +10,23 @@ "\00:1\0" "b\0" "On Wed, Mar 24, 2010 at 11:06:40AM -0700, Dan Williams wrote:\n" - "> On Wed, Mar 24, 2010 at 8:51 AM, Chris Mason <chris.mason@oracle.com>=\n" - " wrote:\n" + "> On Wed, Mar 24, 2010 at 8:51 AM, Chris Mason <chris.mason@oracle.com> wrote:\n" "> > On Wed, Mar 24, 2010 at 07:53:10AM -0700, Dan Williams wrote:\n" - "> >> The current implementation with the async thread pool ends up spre=\n" - "ading\n" - "> >> the work over too many threads. =A0The btrfs workqueue is targeted=\n" - " at high\n" - "> >> cpu utilization works and has a threshold mechanism to limit threa=\n" - "d\n" - "> >> spawning. =A0Unfortunately it still ends up increasing cpu utiliza=\n" - "tion\n" - "> >> without a comparable improvement in throughput. =A0Here are the nu=\n" - "mbers\n" + "> >> The current implementation with the async thread pool ends up spreading\n" + "> >> the work over too many threads. \302\240The btrfs workqueue is targeted at high\n" + "> >> cpu utilization works and has a threshold mechanism to limit thread\n" + "> >> spawning. \302\240Unfortunately it still ends up increasing cpu utilization\n" + "> >> without a comparable improvement in throughput. \302\240Here are the numbers\n" "> >> relative to the multicore disabled case:\n" "> >>\n" - "> >> idle_thresh =A0 throughput =A0 =A0 =A0cycles\n" - "> >> 4 =A0 =A0 =A0 =A0 =A0 =A0 +0% =A0 =A0 =A0 =A0 =A0 =A0 +102%\n" - "> >> 64 =A0 =A0 =A0 =A0 =A0 =A0+4% =A0 =A0 =A0 =A0 =A0 =A0 +63%\n" - "> >> 128 =A0 =A0 =A0 =A0 =A0 +1% =A0 =A0 =A0 =A0 =A0 =A0 +45%\n" + "> >> idle_thresh \302\240 throughput \302\240 \302\240 \302\240cycles\n" + "> >> 4 \302\240 \302\240 \302\240 \302\240 \302\240 \302\240 +0% \302\240 \302\240 \302\240 \302\240 \302\240 \302\240 +102%\n" + "> >> 64 \302\240 \302\240 \302\240 \302\240 \302\240 \302\240+4% \302\240 \302\240 \302\240 \302\240 \302\240 \302\240 +63%\n" + "> >> 128 \302\240 \302\240 \302\240 \302\240 \302\240 +1% \302\240 \302\240 \302\240 \302\240 \302\240 \302\240 +45%\n" "> >\n" - "> > Interesting, do the btrfs workqueues improve things? =A0Or do you t=\n" - "hink they are\n" + "> > Interesting, do the btrfs workqueues improve things? \302\240Or do you think they are\n" "> > just a better base for more tuning?\n" - ">=20\n" + "> \n" "> Both, throughput falls off a cliff with the async thread pool, and\n" "> there are more knobs to turn in this implementation.\n" "\n" @@ -42,56 +35,38 @@ "affinity, but at least for btrfs I always had bigger problems than\n" "affinity, so I haven't dug into that part yet.\n" "\n" - ">=20\n" - "> > I had always hoped to find more users for the work queues and tried=\n" - " to\n" - "> > keep btrfs specific features out of them. =A0The place I didn't ent=\n" - "irely\n" + "> \n" + "> > I had always hoped to find more users for the work queues and tried to\n" + "> > keep btrfs specific features out of them. \302\240The place I didn't entirely\n" "> > succeed was in the spin locks, the ordered queues take regular spin\n" - "> > locks to avoid turning off irqs where btrfs always does things outs=\n" - "ide\n" - "> > of interrupt time. =A0Doesn't look like raid needs the ordered queu=\n" - "es so\n" + "> > locks to avoid turning off irqs where btrfs always does things outside\n" + "> > of interrupt time. \302\240Doesn't look like raid needs the ordered queues so\n" "> > this should work pretty well.\n" "> >\n" "> >>\n" - "> >> This appears to show that something more fundamental needs to happ=\n" - "en to\n" - "> >> take advantage of percpu raid processing. =A0More profiling is nee=\n" - "ded, but\n" - "> >> the suspects in my mind are conf->device_lock contention and the f=\n" - "act\n" - "> >> that all work is serialized through conf->handle_list with no meth=\n" - "od for\n" + "> >> This appears to show that something more fundamental needs to happen to\n" + "> >> take advantage of percpu raid processing. \302\240More profiling is needed, but\n" + "> >> the suspects in my mind are conf->device_lock contention and the fact\n" + "> >> that all work is serialized through conf->handle_list with no method for\n" "> >> encouraging stripe_head to thread affinity.\n" "> >\n" - "> > The big place I'd suggest to look inside the btrfs async-thread.c f=\n" - "or\n" - "> > optimization is the worker_loop(). =A0For work that tends to be bur=\n" - "sty and\n" - "> > relatively short, we can have worker threads finish their work fair=\n" - "ly\n" - "> > quickly and go to sleep, only to be woken up very quickly again wit=\n" - "h\n" - "> > another big batch of work. =A0The worker_loop code tries to wait ar=\n" - "ound\n" + "> > The big place I'd suggest to look inside the btrfs async-thread.c for\n" + "> > optimization is the worker_loop(). \302\240For work that tends to be bursty and\n" + "> > relatively short, we can have worker threads finish their work fairly\n" + "> > quickly and go to sleep, only to be woken up very quickly again with\n" + "> > another big batch of work. \302\240The worker_loop code tries to wait around\n" "> > for a while, but the tuning here was btrfs specific.\n" "> >\n" - "> > It might also help to tune the find_worker and next_worker code to =\n" - "prefer\n" + "> > It might also help to tune the find_worker and next_worker code to prefer\n" "> > giving work to threads that are running but almost done with their\n" - "> > queue. =A0Maybe they can be put onto a special hot list as they get=\n" - " near\n" + "> > queue. \302\240Maybe they can be put onto a special hot list as they get near\n" "> > the end of their queue.\n" "> >\n" - ">=20\n" - "> Thanks I'll take a look at these suggestions. For these optimization=\n" - "s\n" - "> to have a chance I think we need stripes to maintain affinity with th=\n" - "e\n" + "> \n" + "> Thanks I'll take a look at these suggestions. For these optimizations\n" + "> to have a chance I think we need stripes to maintain affinity with the\n" "> first core that picks up the work. Currently all stripes take a trip\n" - "> through the single-threaded raid5d when their reference count drops t=\n" - "o\n" + "> through the single-threaded raid5d when their reference count drops to\n" "> zero, only to be immediately reissued to the thread pool potentially\n" "> on a different core (but I need to back this assumption up with more\n" "> profiling).\n" @@ -99,23 +74,18 @@ "Well, I'd definitely say the pool is going to give you a random\n" "core.\n" "\n" - ">=20\n" - "> > There's a rule somewhere that patches renaming things must have rep=\n" - "lies\n" - "> > questioning the new name. =A0The first reply isn't actually allowed=\n" - " to\n" + "> \n" + "> > There's a rule somewhere that patches renaming things must have replies\n" + "> > questioning the new name. \302\240The first reply isn't actually allowed to\n" "> > suggest a better name, which is good because I'm not very good at\n" "> > that kind of thing.\n" "> >\n" - "> > Really though, btr_queue is fine by me, but don't feel obligated to=\n" - " keep\n" + "> > Really though, btr_queue is fine by me, but don't feel obligated to keep\n" "> > some variation of btrfs in the name ;)\n" - ">=20\n" - "> btr_queue seemed to make sense since it's spreading work like \"butter=\n" - "\" :-).\n" + "> \n" + "> btr_queue seemed to make sense since it's spreading work like \"butter\" :-).\n" "\n" - "Grin, it doesn't take many butter jokes to sell me. Careful though, yo=\n" - "u\n" + "Grin, it doesn't take many butter jokes to sell me. Careful though, you\n" "can't have just one buttery thread pool. Just wait, you'll think of\n" "good reasons to add another real soon.\n" "\n" @@ -123,16 +93,14 @@ "Basically kthread_run does GFP allocations that can wait on disk, and\n" "if you're the disk it is waiting on things can go badly in a hurry.\n" "\n" - "We should probably integrate a single kthread run helper thread into th=\n" - "e\n" + "We should probably integrate a single kthread run helper thread into the\n" "pools instead of forcing people to make their own. I'll take a look.\n" "\n" "-chris\n" "\n" "--\n" - "To unsubscribe from this list: send the line \"unsubscribe linux-raid\" i=\n" - "n\n" + "To unsubscribe from this list: send the line \"unsubscribe linux-raid\" in\n" "the body of a message to majordomo@vger.kernel.org\n" More majordomo info at http://vger.kernel.org/majordomo-info.html -c048c3f2123735ee9e9bc3abb26ceb0fe8f2b9382d5d3f14f1b8e9221e4821e6 +5774136127d6c5ab54b05301f762422e7762fbfc0523a72c5014725e7939310b
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.