* [PATCH] vmscan: add block plug for page reclaim
From: Shaohua Li @ 2011-07-20 2:53 UTC
To: Jens Axboe; +Cc: Andrew Morton, mgorman, linux-mm, lkml

A per-task block plug can reduce block queue lock contention and increase
request merging. Currently page reclaim doesn't use it. I originally
thought page reclaim didn't need it, because the number of kswapd threads
is limited and file cache writeback is mostly done by the flusher threads.

When I tested a workload with heavy swap on a 4-node machine, every CPU
was doing direct page reclaim and swapping. This caused block queue lock
contention. In my test, without the patch below, CPU utilization is about
2% ~ 7%; with the patch, it is about 1% ~ 3%. Disk throughput isn't
changed.

This should improve normal kswapd writeback and file cache writeback too
(by increasing request merging, for example), but the effect might not be
as obvious as in the case above.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ed24b9..8ec04b2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1933,12 +1933,14 @@ static void shrink_zone(int priority, struct zone *zone,
 	enum lru_list l;
 	unsigned long nr_reclaimed, nr_scanned;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
+	struct blk_plug plug;
 
 restart:
 	nr_reclaimed = 0;
 	nr_scanned = sc->nr_scanned;
 	get_scan_count(zone, sc, nr, priority);
 
+	blk_start_plug(&plug);
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
 		for_each_evictable_lru(l) {
@@ -1962,6 +1964,7 @@ restart:
 		if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
 			break;
 	}
+	blk_finish_plug(&plug);
 	sc->nr_reclaimed += nr_reclaimed;
 
 	/*
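(For context on the mechanism under discussion: blk_start_plug() stashes
the plug in the submitting task via current->plug, so bios submitted
afterwards collect on a per-task list instead of each taking the request
queue lock; blk_finish_plug() then sorts, merges and dispatches the whole
batch. A minimal sketch of the pattern follows. The helper function and
its WRITE loop are made up for illustration; only the blk_start_plug(),
blk_finish_plug() and submit_bio() calls are the real block-layer API of
this era.)

#include <linux/blkdev.h>

/* Illustrative only: batch a set of bios under one per-task plug. */
static void submit_bios_plugged(struct bio **bios, int nr)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);		/* sets current->plug = &plug */
	for (i = 0; i < nr; i++)
		/* parked on the per-task plug list, not yet dispatched */
		submit_bio(WRITE, bios[i]);
	blk_finish_plug(&plug);		/* sort, merge and flush the list;
					 * the queue lock is taken once per
					 * batch rather than per request */
}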
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Minchan Kim @ 2011-07-20 5:53 UTC
To: Shaohua Li; +Cc: Jens Axboe, Andrew Morton, mgorman, linux-mm, lkml

On Wed, Jul 20, 2011 at 11:53 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> A per-task block plug can reduce block queue lock contention and increase
> request merging. Currently page reclaim doesn't use it. I originally
> thought page reclaim didn't need it, because the number of kswapd threads
> is limited and file cache writeback is mostly done by the flusher
> threads.
>
> When I tested a workload with heavy swap on a 4-node machine, every CPU
> was doing direct page reclaim and swapping. This caused block queue lock
> contention. In my test, without the patch below, CPU utilization is about
> 2% ~ 7%; with the patch, it is about 1% ~ 3%. Disk throughput isn't
> changed.

Why doesn't it enhance through? Does that mean merging is rare?

> This should improve normal kswapd writeback and file cache writeback too
> (by increasing request merging, for example), but the effect might not be
> as obvious as in the case above.

A CPU utilization improvement on a 4-node machine with heavy swap?
I don't think that is a common situation.

And I don't want to add new stack usage if it doesn't bring a benefit.
As you know, the direct reclaim path is prone to stack overflow.
These days Mel, Dave and Christoph are trying to remove the write path
from reclaim, both to solve the stack usage problem and to improve write
performance.

--
Kind regards,
Minchan Kim
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Shaohua Li @ 2011-07-20 6:10 UTC
To: Minchan Kim; +Cc: Jens Axboe, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On Wed, 2011-07-20 at 13:53 +0800, Minchan Kim wrote:
> On Wed, Jul 20, 2011 at 11:53 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > [...]
> > In my test, without the patch below, CPU utilization is about 2% ~ 7%;
> > with the patch, it is about 1% ~ 3%. Disk throughput isn't changed.
>
> Why doesn't it enhance through?

throughput? The disk isn't that fast. We can already drive it at full
speed; the CPU isn't the bottleneck here.

> Does that mean merging is rare?

Merging still happens even without my patch, but concurrent I/O may not be
able to build requests up to the maximum size.

> > This should improve normal kswapd writeback and file cache writeback
> > too (by increasing request merging, for example), but the effect might
> > not be as obvious as in the case above.
>
> A CPU utilization improvement on a 4-node machine with heavy swap?
> I don't think that is a common situation.
>
> And I don't want to add new stack usage if it doesn't bring a benefit.
> As you know, the direct reclaim path is prone to stack overflow.
> These days Mel, Dave and Christoph are trying to remove the write path
> from reclaim, both to solve the stack usage problem and to improve write
> performance.

It will use a little stack, yes. When I said the benefit isn't so obvious,
I didn't mean it has no benefit. For example, if kswapd and other threads
write to the same disk, plugging can still reduce lock contention and
increase request merging. Part of the reason I didn't see an obvious
effect for file cache writeback is that my disk is slow.

Thanks,
Shaohua
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Minchan Kim @ 2011-07-20 6:30 UTC
To: Shaohua Li; +Cc: Jens Axboe, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On Wed, Jul 20, 2011 at 3:10 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> On Wed, 2011-07-20 at 13:53 +0800, Minchan Kim wrote:
>> [...]
>> Why doesn't it enhance through?
>
> throughput? The disk isn't that fast. We can already drive it at full

Yes. Sorry for the typo.

> speed; the CPU isn't the bottleneck here.

But you are trying to optimize CPU usage, so your experiment doesn't make
a good case for the patch.

>> Does that mean merging is rare?
>
> Merging still happens even without my patch, but concurrent I/O may not
> be able to build requests up to the maximum size.
>
>> [...]
>> And I don't want to add new stack usage if it doesn't bring a benefit.
>> As you know, the direct reclaim path is prone to stack overflow.
>> These days Mel, Dave and Christoph are trying to remove the write path
>> from reclaim, both to solve the stack usage problem and to improve
>> write performance.
>
> It will use a little stack, yes. When I said the benefit isn't so
> obvious, I didn't mean it has no benefit. For example, if kswapd and
> other threads write to the same disk, plugging can still reduce lock
> contention and increase request merging. Part of the reason I didn't see
> an obvious effect for file cache writeback is that my disk is slow.

Once the machine starts swapping heavily, I think performance already
matters less. Your patch is simple enough to be mergeable (Andrew may well
merge it regardless of my comment), but its impact in your experiment is
small.

I suggest you test it with a fast disk like an SSD and show us the benefit
clearly. (I think you Intel guys have good SSDs, apparently :D)

--
Kind regards,
Minchan Kim
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Shaohua Li @ 2011-07-20 6:49 UTC
To: Minchan Kim; +Cc: Jens Axboe, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On Wed, 2011-07-20 at 14:30 +0800, Minchan Kim wrote:
> [...]
> But you are trying to optimize CPU usage, so your experiment doesn't
> make a good case for the patch.

It's not that good a case because the disk isn't fast. The swap test is
the workload with the most significant impact I can get.

> [...]
> I suggest you test it with a fast disk like an SSD and show us the
> benefit clearly. (I think you Intel guys have good SSDs, apparently :D)

I do have one, but it's not that good :). Its peak throughput is just like
a normal disk's. A hardware RAID would be preferable for this test, but I
don't have one.

Thanks,
Shaohua
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Jens Axboe @ 2011-07-21 19:32 UTC
To: Shaohua Li; +Cc: Minchan Kim, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On 2011-07-20 08:49, Shaohua Li wrote:
> [...]
> It's not that good a case because the disk isn't fast. The swap test is
> the workload with the most significant impact I can get.

Let me just interject here that a plug should be fine; from 3.1 we'll even
auto-unplug once a certain depth has been reached, so latency should not
be a worry. Personally I think the patch looks fine, though some numbers
would be interesting to see. Cycles spent submitting the actual IO,
combined with IO statistics on what kind of IO patterns were observed, for
plain and with the patch, would be good.

--
Jens Axboe
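(The 3.1 auto-unplug Jens refers to caps how deep a per-task plug list can
grow before it is flushed. The fragment below is a simplified paraphrase,
modelled on the 3.1-era request submission path in block/blk-core.c; it is
not a verbatim copy of that code.)

#define BLK_MAX_REQUEST_COUNT	16	/* depth limit in 3.1 */

	if (plug) {
		/*
		 * If the per-task list is already deep enough, flush it
		 * now rather than waiting for blk_finish_plug(), so the
		 * plug bounds latency instead of growing without limit.
		 */
		if (request_count >= BLK_MAX_REQUEST_COUNT)
			blk_flush_plug_list(plug, false);
		list_add_tail(&req->queuelist, &plug->list);
	}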
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Shaohua Li @ 2011-07-22 5:14 UTC
To: Jens Axboe; +Cc: Minchan Kim, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On Fri, 2011-07-22 at 03:32 +0800, Jens Axboe wrote:
> [...]
> Let me just interject here that a plug should be fine; from 3.1 we'll
> even auto-unplug once a certain depth has been reached, so latency
> should not be a worry. Personally I think the patch looks fine, though
> some numbers would be interesting to see. Cycles spent submitting the
> actual IO, combined with IO statistics on what kind of IO patterns were
> observed, for plain and with the patch, would be good.

I can observe a change in the average request size. Before the patch, the
average request size is about 90k according to iostat, but the variation
is big. With the patch, the request size is about 100k and the variation
is small.

How do I check the cycles spent submitting the I/O?

Thanks,
Shaohua
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Jens Axboe @ 2011-07-23 18:49 UTC
To: Shaohua Li; +Cc: Minchan Kim, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On 2011-07-22 07:14, Shaohua Li wrote:
> [...]
> I can observe a change in the average request size. Before the patch,
> the average request size is about 90k according to iostat, but the
> variation is big. With the patch, the request size is about 100k and the
> variation is small.

That's a good win right there, imho.

> How do I check the cycles spent submitting the I/O?

It's not easy; normal profiles are pretty much the only thing to go by.

I guess I'm just a bit puzzled by Minchan's reluctance towards the patch;
it seems like mostly goodness to me.

--
Jens Axboe
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Shaohua Li @ 2011-07-27 3:09 UTC
To: Jens Axboe; +Cc: Minchan Kim, Andrew Morton, mgorman@suse.de, linux-mm, lkml

2011/7/24 Jens Axboe <jaxboe@fusionio.com>:
> [...]
> I guess I'm just a bit puzzled by Minchan's reluctance towards the
> patch; it seems like mostly goodness to me.

akpm, would you consider picking this up?

Thanks,
Shaohua
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Andrew Morton @ 2011-07-27 23:45 UTC
To: Jens Axboe; +Cc: Shaohua Li, Minchan Kim, mgorman@suse.de, linux-mm, lkml

On Sat, 23 Jul 2011 20:49:10 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:
> > I can observe a change in the average request size. Before the patch,
> > the average request size is about 90k according to iostat, but the
> > variation is big. With the patch, the request size is about 100k and
> > the variation is small.
>
> That's a good win right there, imho.

Yup. Reduced CPU consumption on that path isn't terribly exciting IMO, but
the improved request size is significant.

Using an additional 44 bytes of stack on that path is also significant(ly
bad). But we need to fix that problem anyway. One way we could improve
things in mm/vmscan.c is to move the blk_plug into scan_control, then get
the scan_control off the stack in some manner. That's easy for kswapd:
allocate one scan_control per kswapd at startup. Doing it for direct
reclaim would be a bit trickier...

And I have the usual maintainability whine. If someone comes along to
vmscan.c and sees it calling blk_start_plug(), how are they supposed to
work out why that call is there? They go and look at the blk_start_plug()
definition and find it is undocumented. I think we can do better than
this?
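(To make the kswapd half of that suggestion concrete, here is a sketch; it
is entirely hypothetical (no such patch was posted in this thread) and
only shows the idea of allocating the scan_control, with the blk_plug
folded into it, once per kswapd instead of on the reclaim stack:)

/* hypothetical: one long-lived scan_control per kswapd thread */
static int kswapd(void *p)
{
	pg_data_t *pgdat = (pg_data_t *)p;
	struct scan_control *sc;

	sc = kzalloc(sizeof(*sc), GFP_KERNEL);	/* off the stack for good */
	if (!sc)
		return -ENOMEM;
	/*
	 * sc (with an embedded struct blk_plug) now lives as long as this
	 * kswapd; balance_pgdat() and friends would take it as an argument
	 * instead of declaring their own. Direct reclaim has no equally
	 * natural home for it, which is the tricky part mentioned above.
	 */
	/* ... the usual kswapd loop over pgdat would follow here ... */
	return 0;
}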
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Shaohua Li @ 2011-07-28 1:04 UTC
To: Andrew Morton; +Cc: Jens Axboe, Minchan Kim, mgorman@suse.de, linux-mm, lkml

On Thu, 2011-07-28 at 07:45 +0800, Andrew Morton wrote:
> [...]
> Using an additional 44 bytes of stack on that path is also
> significant(ly bad). But we need to fix that problem anyway. One way we
> could improve things in mm/vmscan.c is to move the blk_plug into
> scan_control, then get the scan_control off the stack in some manner.
> That's easy for kswapd: allocate one scan_control per kswapd at startup.
> Doing it for direct reclaim would be a bit trickier...

Unfortunately, the direct reclaim case is the one that cares about stack.

BTW, the scan_control could be put on a diet (sketched below):
may_unmap/may_swap/may_writepage can each be a single bit; swappiness is
< 100, so it fits in a char; order is <= 11, so it fits in a char too.
Should I do that to cut the size?

> And I have the usual maintainability whine. If someone comes along to
> vmscan.c and sees it calling blk_start_plug(), how are they supposed to
> work out why that call is there? They go and look at the
> blk_start_plug() definition and find it is undocumented. I think we can
> do better than this?

The block plug is a little tricky; we definitely should document it. Jens,
if you don't mind, I'll add comments there.

Thanks,
Shaohua
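(For concreteness, the diet described above would look something like the
following against the 3.0-era scan_control in mm/vmscan.c. This is a
hypothetical sketch that was never merged in this form; the original int
fields are noted in the comments:)

struct scan_control {
	unsigned long nr_scanned;	/* incremented for each scanned page */
	unsigned long nr_reclaimed;	/* pages freed so far */
	unsigned long nr_to_reclaim;	/* reclaim target */
	gfp_t gfp_mask;			/* allocation mask of the caller */
	unsigned int may_writepage:1;	/* was: int may_writepage */
	unsigned int may_unmap:1;	/* was: int may_unmap */
	unsigned int may_swap:1;	/* was: int may_swap */
	u8 swappiness;			/* 0..100, was: int swappiness */
	u8 order;			/* <= 11, was: int order */
	/* ... reclaim_mode, mem_cgroup, nodemask etc. unchanged ... */
};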
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Andrew Morton @ 2011-07-28 1:15 UTC
To: Shaohua Li; +Cc: Jens Axboe, Minchan Kim, mgorman@suse.de, linux-mm, lkml

On Thu, 28 Jul 2011 09:04:20 +0800 Shaohua Li <shaohua.li@intel.com> wrote:
> > Using an additional 44 bytes of stack on that path is also
> > significant(ly bad). But we need to fix that problem anyway. One way
> > we could improve things in mm/vmscan.c is to move the blk_plug into
> > scan_control, then get the scan_control off the stack in some manner.
> > That's easy for kswapd: allocate one scan_control per kswapd at
> > startup. Doing it for direct reclaim would be a bit trickier...
>
> Unfortunately, the direct reclaim case is the one that cares about
> stack.
>
> BTW, the scan_control could be put on a diet:
> may_unmap/may_swap/may_writepage can each be a single bit; swappiness is
> < 100, so it fits in a char; order is <= 11, so it fits in a char too.
> Should I do that to cut the size?

All five will fit in a single 32-bit word, at some expense in code size.
But I think it would be better to first work on a way of getting it all
off the stack, along with the blk_plug.

It could be done with a per-cpu array and CPU pinning, but CPU pinning is
a bit expensive nowadays. We could put a scan_control* into the
task_struct, but that's dopey.
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Shaohua Li @ 2011-07-28 1:34 UTC
To: Andrew Morton; +Cc: Jens Axboe, Minchan Kim, mgorman@suse.de, linux-mm, lkml

On Thu, 2011-07-28 at 09:15 +0800, Andrew Morton wrote:
> [...]
> All five will fit in a single 32-bit word, at some expense in code size.

Oh, I had missed that the code size will increase, so it's not a win then.

> But I think it would be better to first work on a way of getting it all
> off the stack, along with the blk_plug.
>
> It could be done with a per-cpu array and CPU pinning, but CPU pinning
> is a bit expensive nowadays. We could put a scan_control* into the
> task_struct, but that's dopey.

It looks like it has to be per task, as reclaim can sleep. Neither putting
it into task_struct nor allocating it dynamically is good.

Thanks,
Shaohua
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Minchan Kim @ 2011-07-29 8:38 UTC
To: Andrew Morton; +Cc: Jens Axboe, Shaohua Li, mgorman@suse.de, linux-mm, lkml

On Wed, Jul 27, 2011 at 04:45:23PM -0700, Andrew Morton wrote:
> On Sat, 23 Jul 2011 20:49:10 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:
> > [...]
> > That's a good win right there, imho.
>
> Yup. Reduced CPU consumption on that path isn't terribly exciting IMO,
> but the improved request size is significant.

Fair enough. He didn't write that down in the description. At the least,
the description should report the request size and its variation rather
than just the CPU consumption.

Shaohua, please rewrite the description, even though it's annoying.

> Using an additional 44 bytes of stack on that path is also
> significant(ly bad). But we need to fix that problem anyway. One way we
> could improve things in mm/vmscan.c is to move the blk_plug into
> scan_control, then get the scan_control off the stack in some manner.
> That's easy for kswapd: allocate one scan_control per kswapd at startup.
> Doing it for direct reclaim would be a bit trickier...

A stack diet in direct reclaim... Of course the stack usage matters, as I
pointed out for this patch, but frankly speaking it's very annoying to
have to think about stack usage whenever we add something to the direct
reclaim path. I think the better solution is to avoid writing from direct
reclaim altogether, following Mel's approach; I vote for that approach. So
for now I won't complain about the stack usage in this patch and will
focus on Mel's patches.

> And I have the usual maintainability whine. [...]

--
Kind regards,
Minchan Kim
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Shaohua Li @ 2011-07-29 10:30 UTC
To: Minchan Kim; +Cc: Andrew Morton, Jens Axboe, mgorman@suse.de, linux-mm, lkml

On Fri, Jul 29, 2011 at 04:38:47PM +0800, Minchan Kim wrote:
> [...]
> Shaohua, please rewrite the description, even though it's annoying.

That's fine. Here is the patch again with an expanded description.

A per-task block plug can reduce block queue lock contention and increase
request merging. Currently page reclaim doesn't use it. I originally
thought page reclaim didn't need it, because the number of kswapd threads
is limited and file cache writeback is mostly done by the flusher threads.

When I tested a workload with heavy swap on a 4-node machine, every CPU
was doing direct page reclaim and swapping. This caused block queue lock
contention. In my test, without the patch below, CPU utilization is about
2% ~ 7%; with the patch, it is about 1% ~ 3%. Disk throughput isn't
changed.

From iostat, the average request size is increased too. Before the patch,
the average request size is about 90k and the variation is big; with the
patch, the size is about 100k and the variation is small.

This should improve normal kswapd writeback and file cache writeback too
(by increasing request merging, for example), but the effect might not be
as obvious as in the case above.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ed24b9..8ec04b2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1933,12 +1933,14 @@ static void shrink_zone(int priority, struct zone *zone,
 	enum lru_list l;
 	unsigned long nr_reclaimed, nr_scanned;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
+	struct blk_plug plug;
 
 restart:
 	nr_reclaimed = 0;
 	nr_scanned = sc->nr_scanned;
 	get_scan_count(zone, sc, nr, priority);
 
+	blk_start_plug(&plug);
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
 		for_each_evictable_lru(l) {
@@ -1962,6 +1964,7 @@ restart:
 		if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
 			break;
 	}
+	blk_finish_plug(&plug);
 	sc->nr_reclaimed += nr_reclaimed;
 
 	/*
* Re: [PATCH] vmscan: add block plug for page reclaim
From: Dave Chinner @ 2011-07-29 10:43 UTC
To: Minchan Kim; +Cc: Andrew Morton, Jens Axboe, Shaohua Li, mgorman@suse.de, linux-mm, lkml

On Fri, Jul 29, 2011 at 05:38:47PM +0900, Minchan Kim wrote:
> On Wed, Jul 27, 2011 at 04:45:23PM -0700, Andrew Morton wrote:
> > Using an additional 44 bytes of stack on that path is also
> > significant(ly bad). But we need to fix that problem anyway. [...]
>
> A stack diet in direct reclaim... Of course the stack usage matters, as
> I pointed out for this patch, but frankly speaking it's very annoying to
> have to think about stack usage whenever we add something to the direct
> reclaim path.

It's a fact of life that direct reclaim has to live with: memory
allocation can occur with a lot of stack already consumed. If you don't
want to care about stack usage, then let's increase the default stack size
to 16k for x86-64...

> I think the better solution is to avoid writing from direct reclaim
> altogether, following Mel's approach.

Yeah, and we should probably stop swapping from the direct reclaim path
too, because I've seen the stack usage from memory allocation through to
swap I/O issue exceed 4k on x86-64...

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
Thread overview: 16+ messages (newest: 2011-07-29 10:43 UTC)

2011-07-20  2:53 [PATCH] vmscan: add block plug for page reclaim Shaohua Li
2011-07-20  5:53 ` Minchan Kim
2011-07-20  6:10   ` Shaohua Li
2011-07-20  6:30     ` Minchan Kim
2011-07-20  6:49       ` Shaohua Li
2011-07-21 19:32         ` Jens Axboe
2011-07-22  5:14           ` Shaohua Li
2011-07-23 18:49             ` Jens Axboe
2011-07-27  3:09               ` Shaohua Li
2011-07-27 23:45               ` Andrew Morton
2011-07-28  1:04                 ` Shaohua Li
2011-07-28  1:15                   ` Andrew Morton
2011-07-28  1:34                     ` Shaohua Li
2011-07-29  8:38                 ` Minchan Kim
2011-07-29 10:30                   ` Shaohua Li
2011-07-29 10:43                   ` Dave Chinner