Re: [PATCH v4 2/2] mm: make lru_add_drain_all() selective

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Chris Metcalf <cmetcalf@tilera.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Tejun Heo <tj@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Cody P Schafer <cody@linux.vnet.ibm.com>
Subject: Re: [PATCH v4 2/2] mm: make lru_add_drain_all() selective
Date: Mon, 12 Aug 2013 21:53:11 -0400	[thread overview]
Message-ID: <52099187.80301@tilera.com> (raw)
In-Reply-To: <20130812140520.c6a2255d2176a690fadf9ba7@linux-foundation.org>

On 8/12/2013 5:05 PM, Andrew Morton wrote:
> On Wed, 7 Aug 2013 16:52:22 -0400 Chris Metcalf <cmetcalf@tilera.com> wrote:
>
>> This change makes lru_add_drain_all() only selectively interrupt
>> the cpus that have per-cpu free pages that can be drained.
>>
>> This is important in nohz mode where calling mlockall(), for
>> example, otherwise will interrupt every core unnecessarily.
>>
>> ...
>>
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -405,6 +405,11 @@ static void activate_page_drain(int cpu)
>>  		pagevec_lru_move_fn(pvec, __activate_page, NULL);
>>  }
>>  
>> +static bool need_activate_page_drain(int cpu)
>> +{
>> +	return pagevec_count(&per_cpu(activate_page_pvecs, cpu)) != 0;
>> +}
>> +
>>  void activate_page(struct page *page)
>>  {
>>  	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
>> @@ -422,6 +427,11 @@ static inline void activate_page_drain(int cpu)
>>  {
>>  }
>>  
>> +static bool need_activate_page_drain(int cpu)
>> +{
>> +	return false;
>> +}
>> +
>>  void activate_page(struct page *page)
>>  {
>>  	struct zone *zone = page_zone(page);
>> @@ -683,7 +693,32 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
>>   */
>>  int lru_add_drain_all(void)
>>  {
>> -	return schedule_on_each_cpu(lru_add_drain_per_cpu);
>> +	cpumask_var_t mask;
>> +	int cpu, rc;
>> +
>> +	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
>> +		return -ENOMEM;
> Newly adding a GFP_KERNEL allocation attempt into lru_add_drain_all()
> is dangerous and undesirable.  I took a quick look at all the callsites
> and didn't immediately see a bug, but it's hard because they're
> splattered all over the place.  It would be far better if we were to
> not do this.

I think it should be safe, given that we already did alloc_percpu() to do
schedule_on_each_cpu(), and that is documented as doing GFP_KERNEL allocation
(pcpu_create_chunk will call alloc_pages with GFP_KERNEL).

> Rather than tossing this hang-grenade in there we should at a reluctant
> minimum change lru_add_drain_all() to take a gfp_t argument and then
> carefully review and update the callers.
>
>> +	cpumask_clear(mask);
>> +
>> +	/*
>> +	 * Figure out which cpus need flushing.  It's OK if we race
>> +	 * with changes to the per-cpu lru pvecs, since it's no worse
>> +	 * than if we flushed all cpus, since a cpu could still end
>> +	 * up putting pages back on its pvec before we returned.
>> +	 * And this avoids interrupting other cpus unnecessarily.
>> +	 */
>> +	for_each_online_cpu(cpu) {
>> +		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
>> +		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
>> +		    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
>> +		    need_activate_page_drain(cpu))
>> +			cpumask_set_cpu(cpu, mask);
>> +	}
>> +
>> +	rc = schedule_on_cpu_mask(lru_add_drain_per_cpu, mask);
> And it seems pretty easy to avoid the allocation.  Create a single
> cpumask at boot (or, preferably, at compile-time) and whenever we add a
> page to a drainable pagevec, do
>
> 	cpumask_set_cpu(smp_processor_id(), global_cpumask);
>
> and to avoid needlessly dirtying a cacheline,
>
> 	if (!cpu_isset(smp_processor_id(), global_cpumask))
> 		cpumask_set_cpu(smp_processor_id(), global_cpumask);
>
>
> This means that lru_add_drain_all() will need to clear the mask at some
> point and atomicity issues arise.  It would be better to do the
> clearing within schedule_on_cpu_mask() itself, using
> cpumask_test_and_clear_cpu().

The atomicity issue isn't that big a deal (given that the drain is
racy anyway, you just need to make sure to do cpumask_set_cpu after
the pagevec_add), but you do need to clear the cpumask before doing
the actual drain, and that either means inflicting that semantics
on schedule_on_cpu_mask(), which seems a little unnatural, or else
doing a copy of the mask, which gets us back to where we started
with GFP_KERNEL allocations.

Alternately, you could imagine a workqueue API that just took a function
pointer that returned for each cpu whether or not to schedule work on
that cpu:

  typedef bool (*check_work_func_t)(void *data, int cpu);
  schedule_on_some_cpus(work_func_t func, check_work_func_t checker, void *data);

For the lru stuff we wouldn't need to use a "data" pointer but I'd include
it since it's cheap, pretty standard, and makes the API more general.

Or, I suppose, one other possibility is just a compile-time struct
cpumask that we guard with a lock.  It seems a bit like overkill for
the very common case of not specifying CPUMASK_OFFSTACK.

All that said, I still tend to like the simple cpumask data-driven approach,
assuming you're comfortable with the possible GFP_KERNEL allocation.

> Also, what's up with the get_online_cpus() handling? 
> schedule_on_each_cpu() does it, lru_add_drain_all() does not do it and
> the schedule_on_cpu_mask() documentation forgot to mention it.

The missing get_online_cpus() for lru_add_drain_all() is in v6 of the
patch from Aug 9 (v5 had Tejun's feedback for doing validity-checking
on the schedule_on_cpu_mask() mask argument, and v6 added his Ack
and the missing get/put_online_cpus).

schedule_on_each_cpu() obviously uses get/put_online_cpus and needs it;
I would argue that there's no need to mention it in the docs for
schedule_on_cpu_mask() since if you're going to pass the cpu_online_mask
you'd better know that you should get/put_online_cpus().

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Chris Metcalf <cmetcalf@tilera.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	Tejun Heo <tj@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Cody P Schafer <cody@linux.vnet.ibm.com>
Subject: Re: [PATCH v4 2/2] mm: make lru_add_drain_all() selective
Date: Mon, 12 Aug 2013 21:53:11 -0400	[thread overview]
Message-ID: <52099187.80301@tilera.com> (raw)
In-Reply-To: <20130812140520.c6a2255d2176a690fadf9ba7@linux-foundation.org>

On 8/12/2013 5:05 PM, Andrew Morton wrote:
> On Wed, 7 Aug 2013 16:52:22 -0400 Chris Metcalf <cmetcalf@tilera.com> wrote:
>
>> This change makes lru_add_drain_all() only selectively interrupt
>> the cpus that have per-cpu free pages that can be drained.
>>
>> This is important in nohz mode where calling mlockall(), for
>> example, otherwise will interrupt every core unnecessarily.
>>
>> ...
>>
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -405,6 +405,11 @@ static void activate_page_drain(int cpu)
>>  		pagevec_lru_move_fn(pvec, __activate_page, NULL);
>>  }
>>  
>> +static bool need_activate_page_drain(int cpu)
>> +{
>> +	return pagevec_count(&per_cpu(activate_page_pvecs, cpu)) != 0;
>> +}
>> +
>>  void activate_page(struct page *page)
>>  {
>>  	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
>> @@ -422,6 +427,11 @@ static inline void activate_page_drain(int cpu)
>>  {
>>  }
>>  
>> +static bool need_activate_page_drain(int cpu)
>> +{
>> +	return false;
>> +}
>> +
>>  void activate_page(struct page *page)
>>  {
>>  	struct zone *zone = page_zone(page);
>> @@ -683,7 +693,32 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
>>   */
>>  int lru_add_drain_all(void)
>>  {
>> -	return schedule_on_each_cpu(lru_add_drain_per_cpu);
>> +	cpumask_var_t mask;
>> +	int cpu, rc;
>> +
>> +	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
>> +		return -ENOMEM;
> Newly adding a GFP_KERNEL allocation attempt into lru_add_drain_all()
> is dangerous and undesirable.  I took a quick look at all the callsites
> and didn't immediately see a bug, but it's hard because they're
> splattered all over the place.  It would be far better if we were to
> not do this.

I think it should be safe, given that we already did alloc_percpu() to do
schedule_on_each_cpu(), and that is documented as doing GFP_KERNEL allocation
(pcpu_create_chunk will call alloc_pages with GFP_KERNEL).

> Rather than tossing this hang-grenade in there we should at a reluctant
> minimum change lru_add_drain_all() to take a gfp_t argument and then
> carefully review and update the callers.
>
>> +	cpumask_clear(mask);
>> +
>> +	/*
>> +	 * Figure out which cpus need flushing.  It's OK if we race
>> +	 * with changes to the per-cpu lru pvecs, since it's no worse
>> +	 * than if we flushed all cpus, since a cpu could still end
>> +	 * up putting pages back on its pvec before we returned.
>> +	 * And this avoids interrupting other cpus unnecessarily.
>> +	 */
>> +	for_each_online_cpu(cpu) {
>> +		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
>> +		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
>> +		    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
>> +		    need_activate_page_drain(cpu))
>> +			cpumask_set_cpu(cpu, mask);
>> +	}
>> +
>> +	rc = schedule_on_cpu_mask(lru_add_drain_per_cpu, mask);
> And it seems pretty easy to avoid the allocation.  Create a single
> cpumask at boot (or, preferably, at compile-time) and whenever we add a
> page to a drainable pagevec, do
>
> 	cpumask_set_cpu(smp_processor_id(), global_cpumask);
>
> and to avoid needlessly dirtying a cacheline,
>
> 	if (!cpu_isset(smp_processor_id(), global_cpumask))
> 		cpumask_set_cpu(smp_processor_id(), global_cpumask);
>
>
> This means that lru_add_drain_all() will need to clear the mask at some
> point and atomicity issues arise.  It would be better to do the
> clearing within schedule_on_cpu_mask() itself, using
> cpumask_test_and_clear_cpu().

The atomicity issue isn't that big a deal (given that the drain is
racy anyway, you just need to make sure to do cpumask_set_cpu after
the pagevec_add), but you do need to clear the cpumask before doing
the actual drain, and that either means inflicting that semantics
on schedule_on_cpu_mask(), which seems a little unnatural, or else
doing a copy of the mask, which gets us back to where we started
with GFP_KERNEL allocations.

Alternately, you could imagine a workqueue API that just took a function
pointer that returned for each cpu whether or not to schedule work on
that cpu:

  typedef bool (*check_work_func_t)(void *data, int cpu);
  schedule_on_some_cpus(work_func_t func, check_work_func_t checker, void *data);

For the lru stuff we wouldn't need to use a "data" pointer but I'd include
it since it's cheap, pretty standard, and makes the API more general.

Or, I suppose, one other possibility is just a compile-time struct
cpumask that we guard with a lock.  It seems a bit like overkill for
the very common case of not specifying CPUMASK_OFFSTACK.

All that said, I still tend to like the simple cpumask data-driven approach,
assuming you're comfortable with the possible GFP_KERNEL allocation.

> Also, what's up with the get_online_cpus() handling? 
> schedule_on_each_cpu() does it, lru_add_drain_all() does not do it and
> the schedule_on_cpu_mask() documentation forgot to mention it.

The missing get_online_cpus() for lru_add_drain_all() is in v6 of the
patch from Aug 9 (v5 had Tejun's feedback for doing validity-checking
on the schedule_on_cpu_mask() mask argument, and v6 added his Ack
and the missing get/put_online_cpus).

schedule_on_each_cpu() obviously uses get/put_online_cpus and needs it;
I would argue that there's no need to mention it in the docs for
schedule_on_cpu_mask() since if you're going to pass the cpu_online_mask
you'd better know that you should get/put_online_cpus().

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

next prev parent reply	other threads:[~2013-08-13  1:53 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-06 20:22 [PATCH] mm: make lru_add_drain_all() selective Chris Metcalf
2013-08-06 20:22 ` Chris Metcalf
2013-08-06 20:22 ` [PATCH v2] " Chris Metcalf
2013-08-06 20:22   ` Chris Metcalf
2013-08-07 20:45   ` Tejun Heo
2013-08-07 20:45     ` Tejun Heo
2013-08-07 20:49     ` [PATCH v3 1/2] workqueue: add new schedule_on_cpu_mask() API Chris Metcalf
2013-08-07 20:49       ` Chris Metcalf
2013-08-07 20:52     ` [PATCH v3 2/2] mm: make lru_add_drain_all() selective Chris Metcalf
2013-08-07 20:52       ` Chris Metcalf
2013-08-07 22:48   ` [PATCH v2] " Cody P Schafer
2013-08-07 22:48     ` Cody P Schafer
2013-08-07 20:49     ` [PATCH v4 1/2] workqueue: add new schedule_on_cpu_mask() API Chris Metcalf
2013-08-07 20:49       ` Chris Metcalf
2013-08-09 15:02       ` Tejun Heo
2013-08-09 15:02         ` Tejun Heo
2013-08-09 16:12         ` Chris Metcalf
2013-08-09 16:12           ` Chris Metcalf
2013-08-09 16:30           ` Tejun Heo
2013-08-09 16:30             ` Tejun Heo
2013-08-07 20:49             ` [PATCH v5 " Chris Metcalf
2013-08-07 20:49               ` Chris Metcalf
2013-08-09 17:40               ` Tejun Heo
2013-08-09 17:40                 ` Tejun Heo
2013-08-09 17:49                 ` [PATCH v6 " Chris Metcalf
2013-08-09 17:49                   ` Chris Metcalf
2013-08-09 17:52                 ` [PATCH v6 2/2] mm: make lru_add_drain_all() selective Chris Metcalf
2013-08-09 17:52                   ` Chris Metcalf
2013-08-07 20:52             ` [PATCH v5 " Chris Metcalf
2013-08-07 20:52               ` Chris Metcalf
2013-08-07 20:52     ` [PATCH v4 " Chris Metcalf
2013-08-07 20:52       ` Chris Metcalf
2013-08-12 21:05       ` Andrew Morton
2013-08-12 21:05         ` Andrew Morton
2013-08-13  1:53         ` Chris Metcalf [this message]
2013-08-13  1:53           ` Chris Metcalf
2013-08-13 19:35           ` Andrew Morton
2013-08-13 19:35             ` Andrew Morton
2013-08-13 20:19             ` Tejun Heo
2013-08-13 20:19               ` Tejun Heo
2013-08-13 20:31               ` Andrew Morton
2013-08-13 20:31                 ` Andrew Morton
2013-08-13 20:59                 ` Chris Metcalf
2013-08-13 20:59                   ` Chris Metcalf
2013-08-13 21:13                   ` Andrew Morton
2013-08-13 21:13                     ` Andrew Morton
2013-08-13 22:13                     ` Chris Metcalf
2013-08-13 22:13                       ` Chris Metcalf
2013-08-13 22:26                       ` Andrew Morton
2013-08-13 22:26                         ` Andrew Morton
2013-08-13 23:04                         ` Chris Metcalf
2013-08-13 23:04                           ` Chris Metcalf
2013-08-13 22:51                       ` [PATCH v7 1/2] workqueue: add schedule_on_each_cpu_cond Chris Metcalf
2013-08-13 22:51                         ` Chris Metcalf
2013-08-13 22:53                       ` [PATCH v7 2/2] mm: make lru_add_drain_all() selective Chris Metcalf
2013-08-13 22:53                         ` Chris Metcalf
2013-08-13 23:29                         ` Tejun Heo
2013-08-13 23:29                           ` Tejun Heo
2013-08-13 23:32                           ` Chris Metcalf
2013-08-13 23:32                             ` Chris Metcalf
2013-08-14  6:46                             ` Andrew Morton
2013-08-14  6:46                               ` Andrew Morton
2013-08-14 13:05                               ` Tejun Heo
2013-08-14 13:05                                 ` Tejun Heo
2013-08-14 16:03                               ` Chris Metcalf
2013-08-14 16:03                                 ` Chris Metcalf
2013-08-14 16:57                                 ` Tejun Heo
2013-08-14 16:57                                   ` Tejun Heo
2013-08-14 17:18                                   ` Chris Metcalf
2013-08-14 17:18                                     ` Chris Metcalf
2013-08-14 20:07                                     ` Tejun Heo
2013-08-14 20:07                                       ` Tejun Heo
2013-08-14 20:22                                       ` [PATCH v8] " Chris Metcalf
2013-08-14 20:22                                         ` Chris Metcalf
2013-08-14 20:44                                         ` Andrew Morton
2013-08-14 20:44                                           ` Andrew Morton
2013-08-14 20:50                                           ` Tejun Heo
2013-08-14 20:50                                             ` Tejun Heo
2013-08-14 21:03                                             ` Andrew Morton
2013-08-14 21:03                                               ` Andrew Morton
2013-08-14 21:07                                             ` Andrew Morton
2013-08-14 21:07                                               ` Andrew Morton
2013-08-14 21:12                                         ` Andrew Morton
2013-08-14 21:12                                           ` Andrew Morton
2013-08-14 21:23                                           ` Chris Metcalf
2013-08-14 21:23                                             ` Chris Metcalf
2013-08-13 23:44                           ` [PATCH v7 2/2] " Chris Metcalf
2013-08-13 23:44                             ` Chris Metcalf
2013-08-13 23:51                             ` Tejun Heo
2013-08-13 23:51                               ` Tejun Heo
2013-08-13 21:07                 ` [PATCH v4 " Tejun Heo
2013-08-13 21:07                   ` Tejun Heo
2013-08-13 21:16                   ` Andrew Morton
2013-08-13 21:16                     ` Andrew Morton
2013-08-13 22:07                     ` Tejun Heo
2013-08-13 22:07                       ` Tejun Heo
2013-08-13 22:18                       ` Andrew Morton
2013-08-13 22:18                         ` Andrew Morton
2013-08-13 22:33                         ` Tejun Heo
2013-08-13 22:33                           ` Tejun Heo
2013-08-13 22:47                           ` Andrew Morton
2013-08-13 22:47                             ` Andrew Morton
2013-08-13 23:03                             ` Tejun Heo
2013-08-13 23:03                               ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52099187.80301@tilera.com \
    --to=cmetcalf@tilera.com \
    --cc=akpm@linux-foundation.org \
    --cc=cody@linux.vnet.ibm.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.