linux-mm.kvack.org archive mirror
* [PATCH]vmscan: add block plug for page reclaim
@ 2011-07-20  2:53 Shaohua Li
  2011-07-20  5:53 ` Minchan Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Shaohua Li @ 2011-07-20  2:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, mgorman, linux-mm, lkml

A per-task block plug can reduce block queue lock contention and increase
request merging. Currently page reclaim doesn't use it. I originally thought
page reclaim didn't need it, because the number of kswapd threads is limited
and file cache writeback is mostly done by the flusher threads.
When I tested a workload with heavy swap on a 4-node machine, every CPU was
doing direct page reclaim and swapping out pages, which caused block queue
lock contention. In my test, without the patch below, CPU utilization is about
2% ~ 7%; with the patch it is about 1% ~ 3%. Disk throughput is unchanged.
This should also improve normal kswapd writeback and file cache writeback
(by increasing request merging, for example), though the effect may be less
obvious than in the case above.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ed24b9..8ec04b2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1933,12 +1933,14 @@ static void shrink_zone(int priority, struct zone *zone,
 	enum lru_list l;
 	unsigned long nr_reclaimed, nr_scanned;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
+	struct blk_plug plug;
 
 restart:
 	nr_reclaimed = 0;
 	nr_scanned = sc->nr_scanned;
 	get_scan_count(zone, sc, nr, priority);
 
+	blk_start_plug(&plug);
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
 		for_each_evictable_lru(l) {
@@ -1962,6 +1964,7 @@ restart:
 		if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
 			break;
 	}
+	blk_finish_plug(&plug);
 	sc->nr_reclaimed += nr_reclaimed;
 
 	/*


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-20  2:53 [PATCH]vmscan: add block plug for page reclaim Shaohua Li
@ 2011-07-20  5:53 ` Minchan Kim
  2011-07-20  6:10   ` Shaohua Li
  0 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2011-07-20  5:53 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Jens Axboe, Andrew Morton, mgorman, linux-mm, lkml

On Wed, Jul 20, 2011 at 11:53 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> per-task block plug can reduce block queue lock contention and increase request
> merge. Currently page reclaim doesn't support it. I originally thought page
> reclaim doesn't need it, because kswapd thread count is limited and file cache
> write is done at flusher mostly.
> When I test a workload with heavy swap in a 4-node machine, each CPU is doing
> direct page reclaim and swap. This causes block queue lock contention. In my
> test, without below patch, the CPU utilization is about 2% ~ 7%. With the
> patch, the CPU utilization is about 1% ~ 3%. Disk throughput isn't changed.

Why doesn't it enhance through?
It means merge is rare?

> This should improve normal kswapd write and file cache write too (increase
> request merge for example), but might not be so obvious as I explain above.

A CPU utilization improvement on a 4-node machine with heavy swap?
I don't think that's a common situation.

And I don't want to add new stack usage if there is no benefit.
As you know, the direct reclaim path already has stack overflow problems.
These days, Mel, Dave and Christoph are trying to remove the write path
from reclaim, to solve the stack usage and improve write performance.

-- 
Kind regards,
Minchan Kim


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-20  5:53 ` Minchan Kim
@ 2011-07-20  6:10   ` Shaohua Li
  2011-07-20  6:30     ` Minchan Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Shaohua Li @ 2011-07-20  6:10 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Jens Axboe, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On Wed, 2011-07-20 at 13:53 +0800, Minchan Kim wrote:
> On Wed, Jul 20, 2011 at 11:53 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > per-task block plug can reduce block queue lock contention and increase request
> > merge. Currently page reclaim doesn't support it. I originally thought page
> > reclaim doesn't need it, because kswapd thread count is limited and file cache
> > write is done at flusher mostly.
> > When I test a workload with heavy swap in a 4-node machine, each CPU is doing
> > direct page reclaim and swap. This causes block queue lock contention. In my
> > test, without below patch, the CPU utilization is about 2% ~ 7%. With the
> > patch, the CPU utilization is about 1% ~ 3%. Disk throughput isn't changed.
> 
> Why doesn't it enhance through?
throughput? The disk isn't that fast; we can already drive it at full
speed, so CPU isn't the bottleneck here.

> It means merge is rare?
Merging still happens even without my patch, but it may not be able to
build the largest possible requests under concurrent I/O.

> > This should improve normal kswapd write and file cache write too (increase
> > request merge for example), but might not be so obvious as I explain above.
> 
> CPU utilization enhance on  4-node machine with heavy swap?
> I think it isn't common situation.
> 
> And I don't want to add new stack usage if it doesn't have a benefit.
> As you know, direct reclaim path has a stack overflow.
> These days, Mel, Dave and Christoph try to remove write path in
> reclaim for solving stack usage and enhance write performance.
It will use a little stack, yes. When I said the benefit isn't so
obvious, I didn't mean it has no benefit. For example, if kswapd and
other threads write to the same disk, this can still reduce lock contention
and increase request merging. Part of the reason I didn't see an obvious
effect for file cache is that my disk is slow.

Thanks,
Shaohua


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-20  6:10   ` Shaohua Li
@ 2011-07-20  6:30     ` Minchan Kim
  2011-07-20  6:49       ` Shaohua Li
  0 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2011-07-20  6:30 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Jens Axboe, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On Wed, Jul 20, 2011 at 3:10 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> On Wed, 2011-07-20 at 13:53 +0800, Minchan Kim wrote:
>> On Wed, Jul 20, 2011 at 11:53 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>> > per-task block plug can reduce block queue lock contention and increase request
>> > merge. Currently page reclaim doesn't support it. I originally thought page
>> > reclaim doesn't need it, because kswapd thread count is limited and file cache
>> > write is done at flusher mostly.
>> > When I test a workload with heavy swap in a 4-node machine, each CPU is doing
>> > direct page reclaim and swap. This causes block queue lock contention. In my
>> > test, without below patch, the CPU utilization is about 2% ~ 7%. With the
>> > patch, the CPU utilization is about 1% ~ 3%. Disk throughput isn't changed.
>>
>> Why doesn't it enhance through?
> throughput? The disk isn't that fast. We already can make it run in full

Yes. Sorry for the typo.

> speed, CPU isn't bottleneck here.

But you are trying to optimize CPU usage, so your experiment doesn't demonstrate the benefit well.

>
>> It means merge is rare?
> Merge is still there even without my patch, but maybe not be able to
> make the request size biggest in cocurrent I/O.
>
>> > This should improve normal kswapd write and file cache write too (increase
>> > request merge for example), but might not be so obvious as I explain above.
>>
>> CPU utilization enhance on  4-node machine with heavy swap?
>> I think it isn't common situation.
>>
>> And I don't want to add new stack usage if it doesn't have a benefit.
>> As you know, direct reclaim path has a stack overflow.
>> These days, Mel, Dave and Christoph try to remove write path in
>> reclaim for solving stack usage and enhance write performance.
> it will use a little stack, yes. When I said the benefit isn't so
> obvious, it doesn't mean it has no benefit. For example, if kswapd and
> other threads write the same disk, this can still reduce lock contention
> and increase request merge. Part reason I didn't see obvious affect for
> file cache is my disk is slow.

Once swapping begins, I think performance matters less anyway.
Your patch is simple enough that it would be mergeable (maybe Andrew
will merge it regardless of my comment), but the impact in your
experiment is small.

I suggest you test it with a fast disk like an SSD and show us the
benefit clearly. (I assume you Intel guys have a good SSD, apparently :D)

-- 
Kind regards,
Minchan Kim


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-20  6:30     ` Minchan Kim
@ 2011-07-20  6:49       ` Shaohua Li
  2011-07-21 19:32         ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Shaohua Li @ 2011-07-20  6:49 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Jens Axboe, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On Wed, 2011-07-20 at 14:30 +0800, Minchan Kim wrote:
> On Wed, Jul 20, 2011 at 3:10 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> > On Wed, 2011-07-20 at 13:53 +0800, Minchan Kim wrote:
> >> On Wed, Jul 20, 2011 at 11:53 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> >> > per-task block plug can reduce block queue lock contention and increase request
> >> > merge. Currently page reclaim doesn't support it. I originally thought page
> >> > reclaim doesn't need it, because kswapd thread count is limited and file cache
> >> > write is done at flusher mostly.
> >> > When I test a workload with heavy swap in a 4-node machine, each CPU is doing
> >> > direct page reclaim and swap. This causes block queue lock contention. In my
> >> > test, without below patch, the CPU utilization is about 2% ~ 7%. With the
> >> > patch, the CPU utilization is about 1% ~ 3%. Disk throughput isn't changed.
> >>
> >> Why doesn't it enhance through?
> > throughput? The disk isn't that fast. We already can make it run in full
> 
> Yes. Sorry for the typo.
> 
> > speed, CPU isn't bottleneck here.
> 
> But you try to optimize CPU. so your experiment is not good.
It's not that good, because the disk isn't fast. The swap test is the
workload with the most significant impact I could get.

> >> It means merge is rare?
> > Merge is still there even without my patch, but maybe not be able to
> > make the request size biggest in cocurrent I/O.
> >
> >> > This should improve normal kswapd write and file cache write too (increase
> >> > request merge for example), but might not be so obvious as I explain above.
> >>
> >> CPU utilization enhance on  4-node machine with heavy swap?
> >> I think it isn't common situation.
> >>
> >> And I don't want to add new stack usage if it doesn't have a benefit.
> >> As you know, direct reclaim path has a stack overflow.
> >> These days, Mel, Dave and Christoph try to remove write path in
> >> reclaim for solving stack usage and enhance write performance.
> > it will use a little stack, yes. When I said the benefit isn't so
> > obvious, it doesn't mean it has no benefit. For example, if kswapd and
> > other threads write the same disk, this can still reduce lock contention
> > and increase request merge. Part reason I didn't see obvious affect for
> > file cache is my disk is slow.
> 
> If it begin swapping, I think the the performance would be less important,
> But your patch is so simple that it would be mergable(Maybe Andrew
> would merge regardless of my comment) but impact is a little in your
> experiment.
> 
> I suggest you test it with fast disk like SSD and show the benefit to
> us certainly. (I think you intel guy have a good SSD, apparently :D )
I do have one, but it's not a good one :). Its peak throughput is just like
a normal disk's. A hardware RAID would be preferred for the test, but I
don't have one.

Thanks,
Shaohua


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-20  6:49       ` Shaohua Li
@ 2011-07-21 19:32         ` Jens Axboe
  2011-07-22  5:14           ` Shaohua Li
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2011-07-21 19:32 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Minchan Kim, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On 2011-07-20 08:49, Shaohua Li wrote:
> On Wed, 2011-07-20 at 14:30 +0800, Minchan Kim wrote:
>> On Wed, Jul 20, 2011 at 3:10 PM, Shaohua Li <shaohua.li@intel.com> wrote:
>>> On Wed, 2011-07-20 at 13:53 +0800, Minchan Kim wrote:
>>>> On Wed, Jul 20, 2011 at 11:53 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>>>>> per-task block plug can reduce block queue lock contention and increase request
>>>>> merge. Currently page reclaim doesn't support it. I originally thought page
>>>>> reclaim doesn't need it, because kswapd thread count is limited and file cache
>>>>> write is done at flusher mostly.
>>>>> When I test a workload with heavy swap in a 4-node machine, each CPU is doing
>>>>> direct page reclaim and swap. This causes block queue lock contention. In my
>>>>> test, without below patch, the CPU utilization is about 2% ~ 7%. With the
>>>>> patch, the CPU utilization is about 1% ~ 3%. Disk throughput isn't changed.
>>>>
>>>> Why doesn't it enhance through?
>>> throughput? The disk isn't that fast. We already can make it run in full
>>
>> Yes. Sorry for the typo.
>>
>>> speed, CPU isn't bottleneck here.
>>
>> But you try to optimize CPU. so your experiment is not good.
> it's not that good, because the disk isn't fast. The swap test is the
> workload with most significant impact I can get.

Let me just interject here that a plug should be fine; from 3.1 we'll
even auto-unplug once a certain queue depth has been reached, so latency
should not be a worry. Personally I think the patch looks fine, though some
numbers would be interesting to see. Cycles spent submitting the actual
IO, combined with IO statistics showing what kind of IO patterns were
observed plain and with the patch, would be good.

-- 
Jens Axboe


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-21 19:32         ` Jens Axboe
@ 2011-07-22  5:14           ` Shaohua Li
  2011-07-23 18:49             ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Shaohua Li @ 2011-07-22  5:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Minchan Kim, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On Fri, 2011-07-22 at 03:32 +0800, Jens Axboe wrote:
> On 2011-07-20 08:49, Shaohua Li wrote:
> > On Wed, 2011-07-20 at 14:30 +0800, Minchan Kim wrote:
> >> On Wed, Jul 20, 2011 at 3:10 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> >>> On Wed, 2011-07-20 at 13:53 +0800, Minchan Kim wrote:
> >>>> On Wed, Jul 20, 2011 at 11:53 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> >>>>> per-task block plug can reduce block queue lock contention and increase request
> >>>>> merge. Currently page reclaim doesn't support it. I originally thought page
> >>>>> reclaim doesn't need it, because kswapd thread count is limited and file cache
> >>>>> write is done at flusher mostly.
> >>>>> When I test a workload with heavy swap in a 4-node machine, each CPU is doing
> >>>>> direct page reclaim and swap. This causes block queue lock contention. In my
> >>>>> test, without below patch, the CPU utilization is about 2% ~ 7%. With the
> >>>>> patch, the CPU utilization is about 1% ~ 3%. Disk throughput isn't changed.
> >>>>
> >>>> Why doesn't it enhance through?
> >>> throughput? The disk isn't that fast. We already can make it run in full
> >>
> >> Yes. Sorry for the typo.
> >>
> >>> speed, CPU isn't bottleneck here.
> >>
> >> But you try to optimize CPU. so your experiment is not good.
> > it's not that good, because the disk isn't fast. The swap test is the
> > workload with most significant impact I can get.
> 
> Let me just interject here that a plug should be fine, from 3.1 we'll
> even auto-unplug if a certain depth has been reached. So latency should
> not be a worry. Personally I think the patch looks fine, though some
> numbers would be interesting to see. Cycles spent submitting the actual
> IO, combined with IO statistics what kind of IO patterns were observed
> for plain and with patch would be good.
I can observe a change in the average request size. Before the patch, the
average request size is about 90k according to iostat (but the variation
is big). With the patch, the request size is about 100k and the variation
is small.
How do I check the cycles spent submitting the I/O?

Thanks,
Shaohua


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-22  5:14           ` Shaohua Li
@ 2011-07-23 18:49             ` Jens Axboe
  2011-07-27  3:09               ` Shaohua Li
  2011-07-27 23:45               ` Andrew Morton
  0 siblings, 2 replies; 16+ messages in thread
From: Jens Axboe @ 2011-07-23 18:49 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Minchan Kim, Andrew Morton, mgorman@suse.de, linux-mm, lkml

On 2011-07-22 07:14, Shaohua Li wrote:
> On Fri, 2011-07-22 at 03:32 +0800, Jens Axboe wrote:
>> On 2011-07-20 08:49, Shaohua Li wrote:
>>> On Wed, 2011-07-20 at 14:30 +0800, Minchan Kim wrote:
>>>> On Wed, Jul 20, 2011 at 3:10 PM, Shaohua Li <shaohua.li@intel.com> wrote:
>>>>> On Wed, 2011-07-20 at 13:53 +0800, Minchan Kim wrote:
>>>>>> On Wed, Jul 20, 2011 at 11:53 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>>>>>>> per-task block plug can reduce block queue lock contention and increase request
>>>>>>> merge. Currently page reclaim doesn't support it. I originally thought page
>>>>>>> reclaim doesn't need it, because kswapd thread count is limited and file cache
>>>>>>> write is done at flusher mostly.
>>>>>>> When I test a workload with heavy swap in a 4-node machine, each CPU is doing
>>>>>>> direct page reclaim and swap. This causes block queue lock contention. In my
>>>>>>> test, without below patch, the CPU utilization is about 2% ~ 7%. With the
>>>>>>> patch, the CPU utilization is about 1% ~ 3%. Disk throughput isn't changed.
>>>>>>
>>>>>> Why doesn't it enhance through?
>>>>> throughput? The disk isn't that fast. We already can make it run in full
>>>>
>>>> Yes. Sorry for the typo.
>>>>
>>>>> speed, CPU isn't bottleneck here.
>>>>
>>>> But you try to optimize CPU. so your experiment is not good.
>>> it's not that good, because the disk isn't fast. The swap test is the
>>> workload with most significant impact I can get.
>>
>> Let me just interject here that a plug should be fine, from 3.1 we'll
>> even auto-unplug if a certain depth has been reached. So latency should
>> not be a worry. Personally I think the patch looks fine, though some
>> numbers would be interesting to see. Cycles spent submitting the actual
>> IO, combined with IO statistics what kind of IO patterns were observed
>> for plain and with patch would be good.
> I can observe the average request size changes. Before the patch, the
> average request size is about 90k from iostat (but the variation is
> big). With the patch, the request size is about 100k and variation is
> small.

That's a good win right there, imho.

> how to check the cycles spend submitting the I/O?

It's not easy; normal profiling is pretty much the only thing to go by.

I guess I'm just a bit puzzled by Minchan's reluctance towards the
patch, it seems like mostly goodness to me.

-- 
Jens Axboe


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-23 18:49             ` Jens Axboe
@ 2011-07-27  3:09               ` Shaohua Li
  2011-07-27 23:45               ` Andrew Morton
  1 sibling, 0 replies; 16+ messages in thread
From: Shaohua Li @ 2011-07-27  3:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Minchan Kim, Andrew Morton, mgorman@suse.de, linux-mm, lkml

2011/7/24 Jens Axboe <jaxboe@fusionio.com>:
> On 2011-07-22 07:14, Shaohua Li wrote:
>> On Fri, 2011-07-22 at 03:32 +0800, Jens Axboe wrote:
>>> On 2011-07-20 08:49, Shaohua Li wrote:
>>>> On Wed, 2011-07-20 at 14:30 +0800, Minchan Kim wrote:
>>>>> On Wed, Jul 20, 2011 at 3:10 PM, Shaohua Li <shaohua.li@intel.com> wrote:
>>>>>> On Wed, 2011-07-20 at 13:53 +0800, Minchan Kim wrote:
>>>>>>> On Wed, Jul 20, 2011 at 11:53 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>>>>>>>> per-task block plug can reduce block queue lock contention and increase request
>>>>>>>> merge. Currently page reclaim doesn't support it. I originally thought page
>>>>>>>> reclaim doesn't need it, because kswapd thread count is limited and file cache
>>>>>>>> write is done at flusher mostly.
>>>>>>>> When I test a workload with heavy swap in a 4-node machine, each CPU is doing
>>>>>>>> direct page reclaim and swap. This causes block queue lock contention. In my
>>>>>>>> test, without below patch, the CPU utilization is about 2% ~ 7%. With the
>>>>>>>> patch, the CPU utilization is about 1% ~ 3%. Disk throughput isn't changed.
>>>>>>>
>>>>>>> Why doesn't it enhance through?
>>>>>> throughput? The disk isn't that fast. We already can make it run in full
>>>>>
>>>>> Yes. Sorry for the typo.
>>>>>
>>>>>> speed, CPU isn't bottleneck here.
>>>>>
>>>>> But you try to optimize CPU. so your experiment is not good.
>>>> it's not that good, because the disk isn't fast. The swap test is the
>>>> workload with most significant impact I can get.
>>>
>>> Let me just interject here that a plug should be fine, from 3.1 we'll
>>> even auto-unplug if a certain depth has been reached. So latency should
>>> not be a worry. Personally I think the patch looks fine, though some
>>> numbers would be interesting to see. Cycles spent submitting the actual
>>> IO, combined with IO statistics what kind of IO patterns were observed
>>> for plain and with patch would be good.
>> I can observe the average request size changes. Before the patch, the
>> average request size is about 90k from iostat (but the variation is
>> big). With the patch, the request size is about 100k and variation is
>> small.
>
> That's a good win right there, imho.
>
>> how to check the cycles spend submitting the I/O?
>
> It's not easy, normal profiles is pretty much the only thing to go by.
>
> I guess I'm just a bit puzzled by Minchan's reluctance towards the
> patch, it seems like mostly goodness to me.
akpm,
would you consider picking this up?

Thanks,
Shaohua


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-23 18:49             ` Jens Axboe
  2011-07-27  3:09               ` Shaohua Li
@ 2011-07-27 23:45               ` Andrew Morton
  2011-07-28  1:04                 ` Shaohua Li
  2011-07-29  8:38                 ` Minchan Kim
  1 sibling, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2011-07-27 23:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Shaohua Li, Minchan Kim, mgorman@suse.de, linux-mm, lkml

On Sat, 23 Jul 2011 20:49:10 +0200
Jens Axboe <jaxboe@fusionio.com> wrote:

> > I can observe the average request size changes. Before the patch, the
> > average request size is about 90k from iostat (but the variation is
> > big). With the patch, the request size is about 100k and variation is
> > small.
> 
> That's a good win right there, imho.

yup.  Reduced CPU consumption on that path isn't terribly exciting IMO,
but improved request size is significant.

Using an additional 44 bytes of stack on that path is also
significant(ly bad).  But we need to fix that problem anyway.  One way
we could improve things in mm/vmscan.c is to move the blk_plug into
scan_control then get the scan_control off the stack in some manner. 
That's easy for kswapd: allocate one scan_control per kswapd at
startup.  Doing it for direct-reclaim would be a bit trickier...



And I have the usual maintainability whine.  If someone comes up to
vmscan.c and sees it calling blk_start_plug(), how are they supposed to
work out why that call is there?  They go look at the blk_start_plug()
definition and it is undocumented.  I think we can do better than this?


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-27 23:45               ` Andrew Morton
@ 2011-07-28  1:04                 ` Shaohua Li
  2011-07-28  1:15                   ` Andrew Morton
  2011-07-29  8:38                 ` Minchan Kim
  1 sibling, 1 reply; 16+ messages in thread
From: Shaohua Li @ 2011-07-28  1:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jens Axboe, Minchan Kim, mgorman@suse.de, linux-mm, lkml

On Thu, 2011-07-28 at 07:45 +0800, Andrew Morton wrote:
> On Sat, 23 Jul 2011 20:49:10 +0200
> Jens Axboe <jaxboe@fusionio.com> wrote:
> 
> > > I can observe the average request size changes. Before the patch, the
> > > average request size is about 90k from iostat (but the variation is
> > > big). With the patch, the request size is about 100k and variation is
> > > small.
> > 
> > That's a good win right there, imho.
> 
> yup.  Reduced CPU consumption on that path isn't terribly exciting IMO,
> but improved request size is significant.
> 
> Using an additional 44 bytes of stack on that path is also
> significant(ly bad).  But we need to fix that problem anyway.  One way
> we could improve things in mm/vmscan.c is to move the blk_plug into
> scan_control then get the scan_control off the stack in some manner. 
> That's easy for kswapd: allocate one scan_control per kswapd at
> startup.  Doing it for direct-reclaim would be a bit trickier...
Unfortunately, the direct-reclaim case is exactly the one that cares
about stack usage.

BTW, scan_control could be put on a diet: may_unmap/may_swap/may_writepage
can each be a single bit; swappiness < 100, so it fits in a char; order <= 11,
so it fits in a char too. Should I do that to cut the size?

> And I have the usual maintainability whine.  If someone comes up to
> vmscan.c and sees it calling blk_start_plug(), how are they supposed to
> work out why that call is there?  They go look at the blk_start_plug()
> definition and it is undocumented.  I think we can do better than this?
The block plug is a little tricky; we definitely should document it.
Jens, if you don't mind, I'll add comments there.

Thanks,
Shaohua


* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-28  1:04                 ` Shaohua Li
@ 2011-07-28  1:15                   ` Andrew Morton
  2011-07-28  1:34                     ` Shaohua Li
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2011-07-28  1:15 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Jens Axboe, Minchan Kim, mgorman@suse.de, linux-mm, lkml

On Thu, 28 Jul 2011 09:04:20 +0800 Shaohua Li <shaohua.li@intel.com> wrote:

> > Using an additional 44 bytes of stack on that path is also
> > significant(ly bad).  But we need to fix that problem anyway.  One way
> > we could improve things in mm/vmscan.c is to move the blk_plug into
> > scan_control then get the scan_control off the stack in some manner. 
> > That's easy for kswapd: allocate one scan_control per kswapd at
> > startup.  Doing it for direct-reclaim would be a bit trickier...
> unfortunately, the direct-reclaim case is what cares about stack.
> 
> BTW, the scan_control can be dieted. may_unmap/may_swap/may_writepage
> can be a bit. swappiness < 100, so can be a char. order <= 11, can be a
> char. should I do it to cut the size?

All five will fit in a 32-bit word, at some expense in code size.

But I think first it would be better to work on a way of getting it all
off the stack, along with the blk_plug.

Could be done with a per-cpu array and CPU pinning, but CPU pinning is
a bit expensive nowadays.  Could put a scan_control* into the
task_struct, but that's dopey.



* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-28  1:15                   ` Andrew Morton
@ 2011-07-28  1:34                     ` Shaohua Li
  0 siblings, 0 replies; 16+ messages in thread
From: Shaohua Li @ 2011-07-28  1:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jens Axboe, Minchan Kim, mgorman@suse.de, linux-mm, lkml

On Thu, 2011-07-28 at 09:15 +0800, Andrew Morton wrote:
> On Thu, 28 Jul 2011 09:04:20 +0800 Shaohua Li <shaohua.li@intel.com> wrote:
> 
> > > Using an additional 44 bytes of stack on that path is also
> > > significant(ly bad).  But we need to fix that problem anyway.  One way
> > > we could improve things in mm/vmscan.c is to move the blk_plug into
> > > scan_control then get the scan_control off the stack in some manner. 
> > > That's easy for kswapd: allocate one scan_control per kswapd at
> > > startup.  Doing it for direct-reclaim would be a bit trickier...
> > unfortunately, the direct-reclaim case is what cares about stack.
> > 
> > BTW, scan_control could be slimmed down: may_unmap/may_swap/may_writepage
> > can each be a single bit; swappiness is < 100, so it fits in a char; and
> > order is <= 11, so it also fits in a char. Should I do that to cut the size?
> 
> All five will fit in a 32-bit word, at some expense in code size.
oh, I missed that the code size would increase, so it's not good then.

> But I think first it would be better to work on a way of getting it all
> off the stack, along with the blk_plug.
> 
> Could be done with a per-cpu array and CPU pinning, but CPU pinning is
> a bit expensive nowadays.  Could put a scan_control* into the
> task_struct, but that's dopey.
Looks like it has to be per task, since reclaim can sleep. Either putting
it in task_struct or allocating it dynamically; neither is good.

Thanks,
Shaohua



* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-27 23:45               ` Andrew Morton
  2011-07-28  1:04                 ` Shaohua Li
@ 2011-07-29  8:38                 ` Minchan Kim
  2011-07-29 10:30                   ` Shaohua Li
  2011-07-29 10:43                   ` Dave Chinner
  1 sibling, 2 replies; 16+ messages in thread
From: Minchan Kim @ 2011-07-29  8:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jens Axboe, Shaohua Li, mgorman@suse.de, linux-mm, lkml

On Wed, Jul 27, 2011 at 04:45:23PM -0700, Andrew Morton wrote:
> On Sat, 23 Jul 2011 20:49:10 +0200
> Jens Axboe <jaxboe@fusionio.com> wrote:
> 
> > > I can observe the average request size changes. Before the patch, the
> > > average request size is about 90k from iostat (but the variation is
> > > big). With the patch, the request size is about 100k and variation is
> > > small.
> > 
> > That's a good win right there, imho.
> 
> yup.  Reduced CPU consumption on that path isn't terribly exciting IMO,
> but improved request size is significant.

Fair enough.
He didn't write it down in the description, though. The description
should include the request size and variation numbers rather than the
CPU consumption figures.

Shaohua, please rewrite the description, even though it's annoying.

> 
> Using an additional 44 bytes of stack on that path is also
> significant(ly bad).  But we need to fix that problem anyway.  One way
> we could improve things in mm/vmscan.c is to move the blk_plug into
> scan_control then get the scan_control off the stack in some manner. 
> That's easy for kswapd: allocate one scan_control per kswapd at
> startup.  Doing it for direct-reclaim would be a bit trickier...

Stack diet in direct reclaim...
Of course, it matters, as I pointed out on this patch, but frankly
speaking, it's very annoying to have to consider stack usage whenever
we add something to the direct reclaim path.
I think a better solution is to avoid writeback in direct reclaim, as
in Mel's approach. I vote for that approach, so I won't complain about
the stack usage in this patch and will focus on Mel's patches instead.

> 
> 
> 
> And I have the usual maintainability whine.  If someone comes up to
> vmscan.c and sees it calling blk_start_plug(), how are they supposed to
> work out why that call is there?  They go look at the blk_start_plug()
> definition and it is undocumented.  I think we can do better than this?

-- 
Kind regards,
Minchan Kim



* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-29  8:38                 ` Minchan Kim
@ 2011-07-29 10:30                   ` Shaohua Li
  2011-07-29 10:43                   ` Dave Chinner
  1 sibling, 0 replies; 16+ messages in thread
From: Shaohua Li @ 2011-07-29 10:30 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, Jens Axboe, mgorman@suse.de, linux-mm, lkml

On Fri, Jul 29, 2011 at 04:38:47PM +0800, Minchan Kim wrote:
> On Wed, Jul 27, 2011 at 04:45:23PM -0700, Andrew Morton wrote:
> > On Sat, 23 Jul 2011 20:49:10 +0200
> > Jens Axboe <jaxboe@fusionio.com> wrote:
> > 
> > > > I can observe the average request size changes. Before the patch, the
> > > > average request size is about 90k from iostat (but the variation is
> > > > big). With the patch, the request size is about 100k and variation is
> > > > small.
> > > 
> > > That's a good win right there, imho.
> > 
> > yup.  Reduced CPU consumption on that path isn't terribly exciting IMO,
> > but improved request size is significant.
> 
> Fair enough.
> He didn't write it down in the description, though. The description
> should include the request size and variation numbers rather than the
> CPU consumption figures.
> 
> Shaohua, please rewrite the description, even though it's annoying.
That's fine. I've added more description here.



The per-task block plug can reduce block queue lock contention and increase
request merging. Currently page reclaim doesn't use it. I originally thought
page reclaim doesn't need it, because the kswapd thread count is limited and
file cache writeback is mostly done by the flusher threads.
When I tested a workload with heavy swap on a 4-node machine, every CPU was
doing direct page reclaim and swapping out, which causes block queue lock
contention. In my test, without the patch below, the CPU utilization is
about 2% ~ 7%; with the patch it is about 1% ~ 3%. Disk throughput isn't
changed. From iostat, the average request size increases too: before the
patch it is about 90k with big variation; with the patch it is about 100k
with small variation.
This should also improve normal kswapd writeback and file cache writeback
(by increasing request merging, for example), but perhaps not as visibly as
in the swap case I describe above.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ed24b9..8ec04b2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1933,12 +1933,14 @@ static void shrink_zone(int priority, struct zone *zone,
 	enum lru_list l;
 	unsigned long nr_reclaimed, nr_scanned;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
+	struct blk_plug plug;
 
 restart:
 	nr_reclaimed = 0;
 	nr_scanned = sc->nr_scanned;
 	get_scan_count(zone, sc, nr, priority);
 
+	blk_start_plug(&plug);
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
 		for_each_evictable_lru(l) {
@@ -1962,6 +1964,7 @@ restart:
 		if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
 			break;
 	}
+	blk_finish_plug(&plug);
 	sc->nr_reclaimed += nr_reclaimed;
 
 	/*



* Re: [PATCH]vmscan: add block plug for page reclaim
  2011-07-29  8:38                 ` Minchan Kim
  2011-07-29 10:30                   ` Shaohua Li
@ 2011-07-29 10:43                   ` Dave Chinner
  1 sibling, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2011-07-29 10:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Jens Axboe, Shaohua Li, mgorman@suse.de, linux-mm,
	lkml

On Fri, Jul 29, 2011 at 05:38:47PM +0900, Minchan Kim wrote:
> On Wed, Jul 27, 2011 at 04:45:23PM -0700, Andrew Morton wrote:
> > Using an additional 44 bytes of stack on that path is also
> > significant(ly bad).  But we need to fix that problem anyway.  One way
> > we could improve things in mm/vmscan.c is to move the blk_plug into
> > scan_control then get the scan_control off the stack in some manner. 
> > That's easy for kswapd: allocate one scan_control per kswapd at
> > startup.  Doing it for direct-reclaim would be a bit trickier...
> 
> Stack diet in direct reclaim...
> Of course, it matters, as I pointed out on this patch, but frankly
> speaking, it's very annoying to have to consider stack usage whenever
> we add something to the direct reclaim path.

It's a fact of life that direct reclaim has to live with - memory
allocation can occur with a lot of stack already consumed. If you
don't want to care about stack usage, then let's increase the default
stack size to 16k for x86-64.....

> I think a better solution is to avoid writeback in direct reclaim, as in Mel's approach.

Yeah, and we should probably stop swapping in the direct reclaim
path, too, because I've seen the stack usage from memory allocation
to swap IO issue exceed 4k on x86-64....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



end of thread, other threads:[~2011-07-29 10:43 UTC | newest]

Thread overview: 16+ messages
2011-07-20  2:53 [PATCH]vmscan: add block plug for page reclaim Shaohua Li
2011-07-20  5:53 ` Minchan Kim
2011-07-20  6:10   ` Shaohua Li
2011-07-20  6:30     ` Minchan Kim
2011-07-20  6:49       ` Shaohua Li
2011-07-21 19:32         ` Jens Axboe
2011-07-22  5:14           ` Shaohua Li
2011-07-23 18:49             ` Jens Axboe
2011-07-27  3:09               ` Shaohua Li
2011-07-27 23:45               ` Andrew Morton
2011-07-28  1:04                 ` Shaohua Li
2011-07-28  1:15                   ` Andrew Morton
2011-07-28  1:34                     ` Shaohua Li
2011-07-29  8:38                 ` Minchan Kim
2011-07-29 10:30                   ` Shaohua Li
2011-07-29 10:43                   ` Dave Chinner
