Re: [patch 2/2] MM: allow per-cpu vmstat_threshold and vmstat_worker configuration

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Luiz Capitulino <lcapitulino@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Rik van Riel <riel@redhat.com>,
	Linux RT Users <linux-rt-users@vger.kernel.org>,
	cl@linux.com, cmetcalf@mellanox.com
Subject: Re: [patch 2/2] MM: allow per-cpu vmstat_threshold and vmstat_worker configuration
Date: Tue, 2 May 2017 13:15:27 -0400	[thread overview]
Message-ID: <20170502131527.7532fc2e@redhat.com> (raw)
In-Reply-To: <20170502165159.GA5457@amt.cnet>

On Tue, 2 May 2017 13:52:00 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> > I have several questions about the tunables:
> > 
> >  - What does the vmstat_threshold value mean? What are the implications
> >    of changing this value? What's the difference in choosing 1, 2, 3
> >    or 500?  
> 
> Its the maximum value for a vmstat statistics counter to hold. After
> that value, the statistics are transferred to the global counter:
> 
> void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
>                                 long delta)
> {
>         struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
>         s8 __percpu *p = pcp->vm_node_stat_diff + item;
>         long x;
>         long t;
> 
>         x = delta + __this_cpu_read(*p);
> 
>         t = __this_cpu_read(pcp->stat_threshold);
> 
>         if (unlikely(x > t || x < -t)) {
>                 node_page_state_add(x, pgdat, item);
>                 x = 0;
>         }
>         __this_cpu_write(*p, x);
> }
> EXPORT_SYMBOL(__mod_node_page_state);
> 
> BTW, there is a bug there, should change that to:
> 
>         if (unlikely(x >= t || x <= -t)) {
> 
> Increasing the threshold value does two things:
> 	1) It decreases the number of inter-processor accesses.
> 	2) It increases how much the global counters stay out of
> 	   sync relative to actual current values.

OK, but I'm mostly concerned with the sysadmin who will have
to change the tunable. So, I think it's a good idea to improve
the doc to contain that information.

> >  - If the purpose of having vmstat_threshold is to allow disabling
> >    the vmstat kworker, why can't the kernel pick a value automatically?  
> 
> Because it might be acceptable for the user to accept a small 
> out of syncedness of the global counters in favour of performance
> (one would have to analyze the situation).
> 
> Setting vmstat_threshold == 1 means the global counter is always
> in sync with the page counter state of the pCPU.

IMHO, if vmstat_threshold == 1 is the required setting for
disabling the vmstat kworker then I'd go with only one tunable
for now. But that's just a suggestion.

> 
> >  - What are the implications of disabling the vmstat kworker? Will vm
> >    stats still be collected someway or will it be completely off for
> >    the CPU?  
> 
> It will not be necessary to collect vmstats because at every modification
> of the vm statistics, pCPUs with vmstat_threshold=1 transfer their 
> values to the global counters (that is, there is no queueing of statistics
> locally to improve performance).

Ah, OK. Got this now. I'll give this patch a try. But I think we want
to hear from Christoph (who worked on reducing the vmstat interruptions
in the past).

> > Also, shouldn't this patch be split into two?  
> 
> First add one sysfs file, then add another sysfs file, you mean?

Yes, one tunable per patch.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Luiz Capitulino <lcapitulino@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Rik van Riel <riel@redhat.com>,
	Linux RT Users <linux-rt-users@vger.kernel.org>,
	cl@linux.com, cmetcalf@mellanox.com
Subject: Re: [patch 2/2] MM: allow per-cpu vmstat_threshold and vmstat_worker configuration
Date: Tue, 2 May 2017 13:15:27 -0400	[thread overview]
Message-ID: <20170502131527.7532fc2e@redhat.com> (raw)
In-Reply-To: <20170502165159.GA5457@amt.cnet>

On Tue, 2 May 2017 13:52:00 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> > I have several questions about the tunables:
> > 
> >  - What does the vmstat_threshold value mean? What are the implications
> >    of changing this value? What's the difference in choosing 1, 2, 3
> >    or 500?  
> 
> Its the maximum value for a vmstat statistics counter to hold. After
> that value, the statistics are transferred to the global counter:
> 
> void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
>                                 long delta)
> {
>         struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
>         s8 __percpu *p = pcp->vm_node_stat_diff + item;
>         long x;
>         long t;
> 
>         x = delta + __this_cpu_read(*p);
> 
>         t = __this_cpu_read(pcp->stat_threshold);
> 
>         if (unlikely(x > t || x < -t)) {
>                 node_page_state_add(x, pgdat, item);
>                 x = 0;
>         }
>         __this_cpu_write(*p, x);
> }
> EXPORT_SYMBOL(__mod_node_page_state);
> 
> BTW, there is a bug there, should change that to:
> 
>         if (unlikely(x >= t || x <= -t)) {
> 
> Increasing the threshold value does two things:
> 	1) It decreases the number of inter-processor accesses.
> 	2) It increases how much the global counters stay out of
> 	   sync relative to actual current values.

OK, but I'm mostly concerned with the sysadmin who will have
to change the tunable. So, I think it's a good idea to improve
the doc to contain that information.

> >  - If the purpose of having vmstat_threshold is to allow disabling
> >    the vmstat kworker, why can't the kernel pick a value automatically?  
> 
> Because it might be acceptable for the user to accept a small 
> out of syncedness of the global counters in favour of performance
> (one would have to analyze the situation).
> 
> Setting vmstat_threshold == 1 means the global counter is always
> in sync with the page counter state of the pCPU.

IMHO, if vmstat_threshold == 1 is the required setting for
disabling the vmstat kworker then I'd go with only one tunable
for now. But that's just a suggestion.

> 
> >  - What are the implications of disabling the vmstat kworker? Will vm
> >    stats still be collected someway or will it be completely off for
> >    the CPU?  
> 
> It will not be necessary to collect vmstats because at every modification
> of the vm statistics, pCPUs with vmstat_threshold=1 transfer their 
> values to the global counters (that is, there is no queueing of statistics
> locally to improve performance).

Ah, OK. Got this now. I'll give this patch a try. But I think we want
to hear from Christoph (who worked on reducing the vmstat interruptions
in the past).

> > Also, shouldn't this patch be split into two?  
> 
> First add one sysfs file, then add another sysfs file, you mean?

Yes, one tunable per patch.

next prev parent reply	other threads:[~2017-05-02 17:15 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-25 13:57 [patch 0/2] per-CPU vmstat thresholds and vmstat worker disablement Marcelo Tosatti
2017-04-25 13:57 ` Marcelo Tosatti
2017-04-25 13:57 ` [patch 1/2] MM: remove unused quiet_vmstat function Marcelo Tosatti
2017-04-25 13:57   ` Marcelo Tosatti
2017-04-25 13:57 ` [patch 2/2] MM: allow per-cpu vmstat_threshold and vmstat_worker configuration Marcelo Tosatti
2017-04-25 13:57   ` Marcelo Tosatti
2017-04-25 19:29   ` Rik van Riel
2017-04-25 19:29     ` Rik van Riel
2017-04-25 19:29     ` Rik van Riel
2017-04-25 19:36     ` Marcelo Tosatti
2017-04-25 19:36       ` Marcelo Tosatti
2017-04-25 19:36       ` Marcelo Tosatti
2017-05-02 14:28   ` Luiz Capitulino
2017-05-02 14:28     ` Luiz Capitulino
2017-05-02 16:52     ` Marcelo Tosatti
2017-05-02 16:52       ` Marcelo Tosatti
2017-05-02 17:15       ` Luiz Capitulino [this message]
2017-05-02 17:15         ` Luiz Capitulino
2017-05-02 17:21         ` Marcelo Tosatti
2017-05-02 17:21           ` Marcelo Tosatti
2017-05-02 17:21           ` Marcelo Tosatti
2017-05-11 15:37         ` Christoph Lameter
2017-05-11 15:37           ` Christoph Lameter
2017-05-12 12:27           ` Marcelo Tosatti
2017-05-12 12:27             ` Marcelo Tosatti
2017-05-12 15:11             ` Christoph Lameter
2017-05-12 15:11               ` Christoph Lameter
2017-05-12 15:40               ` Marcelo Tosatti
2017-05-12 15:40                 ` Marcelo Tosatti
2017-05-12 16:03                 ` Christoph Lameter
2017-05-12 16:03                   ` Christoph Lameter
2017-05-12 16:07                 ` Christoph Lameter
2017-05-12 16:07                   ` Christoph Lameter
2017-05-12 16:19                   ` Marcelo Tosatti
2017-05-12 16:19                     ` Marcelo Tosatti
2017-05-12 16:57                     ` Christoph Lameter
2017-05-12 16:57                       ` Christoph Lameter
2017-05-15 19:15                       ` Marcelo Tosatti
2017-05-15 19:15                         ` Marcelo Tosatti
2017-05-16 13:37                         ` Christoph Lameter
2017-05-16 13:37                           ` Christoph Lameter
2017-05-19 14:34                           ` Marcelo Tosatti
2017-05-19 14:34                             ` Marcelo Tosatti
2017-05-19 17:13                             ` Christoph Lameter
2017-05-19 17:13                               ` Christoph Lameter
2017-05-19 17:49                               ` Luiz Capitulino
2017-05-19 17:49                                 ` Luiz Capitulino
2017-05-22 16:35                                 ` Christoph Lameter
2017-05-22 16:35                                   ` Christoph Lameter
2017-05-25 19:35                                 ` Marcelo Tosatti
2017-05-25 19:35                                   ` Marcelo Tosatti
2017-05-25 19:35                                   ` Marcelo Tosatti
2017-05-26  3:24                                   ` Christoph Lameter
2017-05-26  3:24                                     ` Christoph Lameter
2017-05-26 19:09                                     ` Marcelo Tosatti
2017-05-26 19:09                                       ` Marcelo Tosatti
2017-05-30 18:17                                       ` Christoph Lameter
2017-05-30 18:17                                         ` Christoph Lameter
2017-07-10 15:05                                         ` Marcelo Tosatti
2017-07-10 15:05                                           ` Marcelo Tosatti
2017-05-20  8:26                               ` Marcelo Tosatti
2017-05-20  8:26                                 ` Marcelo Tosatti
2017-05-22 16:38                                 ` Christoph Lameter
2017-05-22 16:38                                   ` Christoph Lameter
2017-05-22 21:13                                   ` Marcelo Tosatti
2017-05-22 21:13                                     ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170502131527.7532fc2e@redhat.com \
    --to=lcapitulino@redhat.com \
    --cc=cl@linux.com \
    --cc=cmetcalf@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.