Re: hugepage compaction causes performance drop

From: Vlastimil Babka <vbabka@suse.cz>
To: lkp@lists.01.org
Subject: Re: hugepage compaction causes performance drop
Date: Thu, 19 Nov 2015 14:29:10 +0100
Message-ID: <564DCEA6.3000802@suse.cz>
In-Reply-To: <20151119092920.GA11806@aaronlu.sh.intel.com>

+CC Andrea, David, Joonsoo

On 11/19/2015 10:29 AM, Aaron Lu wrote:
> Hi,
>
> One VM-related test case run by LKP on a Haswell EP with 128GiB memory
> showed that the compaction code causes a performance drop of about 30%.
> To illustrate the problem, I've simplified the test with a program called
> usemem (see attached). The test goes like this:
> 1 Boot up the server;
> 2 modprobe scsi_debug (a module that can use memory as a SCSI device),
>    with dev_size set to 4/5 of free memory, i.e. about 100GiB. Use it as
>    swap.
> 3 Run the usemem test, which uses mmap to map a MAP_PRIVATE | MAP_ANON
>    region with the size set to 3/4 of (remaining_free_memory + swap), and
>    then writes to that region sequentially to trigger page faults and
>    swap-out.
>
> The above test runs with two configs regarding the below two sysfs files:
> /sys/kernel/mm/transparent_hugepage/enabled
> /sys/kernel/mm/transparent_hugepage/defrag
> 1 transparent hugepage and defrag are both set to always, let's call it
>    always-always case;
> 2 transparent hugepage is set to always while defrag is set to never,
>    let's call it always-never case.
>
> The output from the always-always case is:
> Setting up swapspace version 1, size = 104627196 KiB
> no label, UUID=aafa53ae-af9e-46c9-acb9-8b3d4f57f610
> cmdline: /lkp/aaron/src/bin/usemem 99994672128
> 99994672128 transferred in 95 seconds, throughput: 1003 MB/s
>
> And the output from the always-never case is:
> Setting up swapspace version 1, size = 104629244 KiB
> no label, UUID=60563c82-d1c6-4d86-b9fa-b52f208097e9
> cmdline: /lkp/aaron/src/bin/usemem 99995965440
> 99995965440 transferred in 67 seconds, throughput: 1423 MB/s
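
As a quick sanity check on the "about 30%" figure, the two quoted
throughput numbers can be compared directly. This is only arithmetic on
the values quoted above; the helper name is made up for illustration,
and the reading of "MB/s" as bytes / seconds / 2^20 is an inference from
the quoted byte counts and run times, not something stated in the report.

```python
def relative_drop(baseline, degraded):
    """Fraction by which `degraded` falls short of `baseline`."""
    return (baseline - degraded) / baseline


# Throughput figures quoted above, in MB/s as printed by usemem.
ALWAYS_NEVER = 1423   # defrag=never
ALWAYS_ALWAYS = 1003  # defrag=always

drop = relative_drop(ALWAYS_NEVER, ALWAYS_ALWAYS)

# Assumption: the printed throughput is bytes / seconds / 2**20, truncated.
mib_always = int(99994672128 / 95 / 2**20)
mib_never = int(99995965440 / 67 / 2**20)

if __name__ == "__main__":
    print(f"throughput drop: {drop:.1%}")   # prints "throughput drop: 29.5%"
    print(mib_always, mib_never)            # prints "1003 1423"
```

Under that assumption the recomputed values match the quoted ones, so
the roughly 30% difference follows from the 95 s versus 67 s run times.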

So yeah, this is an example of a workload that gets no benefit from THPs
but pays all the cost. Fixing that is non-trivial, and I admit I haven't
pushed my prior efforts there much lately...
But it's also possible that there are still actual compaction bugs making
the issue worse.
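
For readers without the attachment, the access pattern described in the
report (map a MAP_PRIVATE | MAP_ANON region, then write to it
sequentially) can be sketched roughly as below. This is not the attached
usemem program: the function name, the one-byte-per-page write, and the
default size are assumptions made for illustration only.

```python
import mmap
import time


def touch_sequentially(size_bytes, step=mmap.PAGESIZE):
    """Map an anonymous private region and write one byte every `step` bytes.

    Returns (number_of_writes, elapsed_seconds). Hypothetical stand-in for
    the usemem test, not its actual implementation.
    """
    region = mmap.mmap(-1, size_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    writes = 0
    start = time.monotonic()
    try:
        for offset in range(0, size_bytes, step):
            region[offset] = 1  # first write to a page triggers a page fault
            writes += 1
        elapsed = time.monotonic() - start
    finally:
        region.close()
    return writes, elapsed


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # small illustrative size, far below the report's
    writes, elapsed = touch_sequentially(size)
    print(f"{writes} pages touched in {elapsed:.3f} s")
```

Each page is written exactly once and never read back, which is why a
huge page backing the region has little chance to pay for the compaction
work done at fault time.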

> The vmstat and perf-profile are also attached, please let me know if you
> need any more information, thanks.

Output from vmstat (the tool) isn't very useful here; a periodic "cat
/proc/vmstat" would be much better.
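
A minimal sketch of such periodic sampling follows. It assumes the usual
"name value" per-line layout of /proc/vmstat; the function names, the
one-second interval, and the choice to filter on the "compact_" and
"thp_" prefixes are illustrative assumptions, not a prescribed procedure.

```python
import time


def parse_vmstat(text):
    """Parse 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def counter_deltas(before, after, prefixes=("compact_", "thp_")):
    """Return counters with the given prefixes that changed between samples."""
    deltas = {}
    for name, value in after.items():
        if name.startswith(prefixes):
            delta = value - before.get(name, 0)
            if delta != 0:
                deltas[name] = delta
    return deltas


def sample(path="/proc/vmstat", interval=1.0, count=5):
    """Print per-interval deltas of compaction and THP counters."""
    with open(path) as f:
        previous = parse_vmstat(f.read())
    for _ in range(count):
        time.sleep(interval)
        with open(path) as f:
            current = parse_vmstat(f.read())
        print(time.strftime("%H:%M:%S"), counter_deltas(previous, current))
        previous = current


if __name__ == "__main__":
    sample()
```

Looking at deltas rather than absolute values makes it easier to see in
which phase of the run the compaction counters grow.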
The perf profiles are somewhat weirdly sorted by children cost (?), but 
I noticed a very high cost (46%) in pageblock_pfn_to_page(). This could 
be due to a very large but sparsely populated zone. Could you provide 
/proc/zoneinfo?
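
One way to check the "large but sparsely populated zone" hypothesis is
to compare the "spanned" and "present" page counts that /proc/zoneinfo
reports per zone. The sketch below assumes the usual layout of that file
("Node N, zone NAME" headers followed by indented fields); the function
names and the 0.5 threshold are arbitrary choices for illustration.

```python
import re

ZONE_HEADER = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")
FIELD = re.compile(r"^\s+(spanned|present)\s+(\d+)\s*$")


def zone_population(text):
    """Map (node, zone_name) to its 'spanned' and 'present' page counts."""
    zones = {}
    current = None
    for line in text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            zones[current] = {}
            continue
        field = FIELD.match(line)
        if field and current is not None:
            zones[current][field.group(1)] = int(field.group(2))
    return zones


def sparse_zones(zones, threshold=0.5):
    """List zones whose present/spanned ratio is below `threshold`."""
    result = []
    for zone, fields in zones.items():
        spanned = fields.get("spanned", 0)
        present = fields.get("present", 0)
        if spanned > 0 and present / spanned < threshold:
            result.append((zone, present / spanned))
    return sorted(result, key=lambda item: item[1])


if __name__ == "__main__":
    with open("/proc/zoneinfo") as f:
        for zone, ratio in sparse_zones(zone_population(f.read())):
            print(zone, f"{ratio:.2f}")
```

A zone with a low present/spanned ratio contains large holes, so a
scanner walking its pfn range visits many pageblocks with no usable
memory behind them.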
If the compaction scanners behave strangely due to a bug, enabling the
ftrace compaction tracepoints should help find the cause. That might
produce very large output, but maybe it would be enough to see some
parts of it (e.g. towards the beginning, middle, and end of the experiment).
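
A rough sketch of both steps follows: switching the compaction event
group on, and cutting a long trace down to a few excerpts. It assumes
tracefs is reachable at /sys/kernel/debug/tracing and that the caller
has root privileges; the function names and excerpt sizes are made up
for illustration.

```python
import os


def set_compaction_tracing(enabled, tracing_root="/sys/kernel/debug/tracing"):
    """Enable or disable all tracepoints in the 'compaction' event group."""
    path = os.path.join(tracing_root, "events", "compaction", "enable")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path


def excerpt(lines, n=3):
    """Return n lines each from the beginning, middle, and end of `lines`."""
    if len(lines) <= 3 * n:
        return {"beginning": list(lines), "middle": [], "end": []}
    mid = len(lines) // 2 - n // 2
    return {
        "beginning": lines[:n],
        "middle": lines[mid:mid + n],
        "end": lines[-n:],
    }


if __name__ == "__main__":
    root = "/sys/kernel/debug/tracing"  # assumed tracefs mount point
    set_compaction_tracing(True, root)
    # ... run the experiment here ...
    set_compaction_tracing(False, root)
    with open(os.path.join(root, "trace")) as f:
        parts = excerpt(f.read().splitlines(), n=20)
    for name, chunk in parts.items():
        print(f"--- {name} ---")
        print("\n".join(chunk))
```

For a run as long as the one reported, the ring buffer may overwrite
early events, so reading trace_pipe continuously into a file could be a
better fit than reading "trace" once at the end.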

Vlastimil

WARNING: multiple messages have this Message-ID

From: Vlastimil Babka <vbabka@suse.cz>
To: Aaron Lu <aaron.lu@intel.com>, linux-mm@kvack.org
Cc: Huang Ying <ying.huang@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	lkp@lists.01.org, Andrea Arcangeli <aarcange@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: hugepage compaction causes performance drop
Date: Thu, 19 Nov 2015 14:29:10 +0100
Message-ID: <564DCEA6.3000802@suse.cz>
In-Reply-To: <20151119092920.GA11806@aaronlu.sh.intel.com>
