From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org
Subject: Re: needed lru_add_drain_all() change
Date: Fri, 29 Jun 2012 12:24:33 +0900 [thread overview]
Message-ID: <4FED1FF1.7000901@jp.fujitsu.com> (raw)
In-Reply-To: <4FECEBF4.7010202@kernel.org>
(2012/06/29 8:42), Minchan Kim wrote:
> On 06/28/2012 04:43 PM, Kamezawa Hiroyuki wrote:
>
>> (2012/06/27 6:37), Andrew Morton wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=43811
>>>
>>> lru_add_drain_all() uses schedule_on_each_cpu(). But
>>> schedule_on_each_cpu() hangs if a realtime thread is spinning, pinned
>>> to a CPU. There's no intention to change the scheduler behaviour, so I
>>> think we should remove schedule_on_each_cpu() from the kernel.
>>>
>>> The biggest user of schedule_on_each_cpu() is lru_add_drain_all().
>>>
>>> Does anyone have any thoughts on how we can do this? The obvious
>>> approach is to declare these:
>>>
>>> static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
>>> static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
>>> static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
>>>
>>> to be irq-safe and use on_each_cpu(). lru_rotate_pvecs is already
>>> irq-safe and converting lru_add_pvecs and lru_deactivate_pvecs looks
>>> pretty simple.
>>>
>>> Thoughts?
>>>
>>
>> How about this kind of RCU synchronization ?
>> ==
>> /*
>>  * Double-buffered pagevecs for quick drain.
>>  * The usual per-cpu pvec users need to take rcu_read_lock() before
>>  * accessing. An external drainer of pvecs will replace the pvec vector,
>>  * call synchronize_rcu(), and then drain all pages on the unused pvecs
>>  * in turn.
>>  */
>> static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS * 2], lru_pvecs);
>>
>> atomic_t pvec_idx; /* must be placed at some aligned address... */
>>
>>
>> struct pagevec *my_pagevec(enum lru_list lru)
>> {
>> 	return &__get_cpu_var(lru_pvecs)[lru + atomic_read(&pvec_idx) * NR_LRU_LISTS];
>> }
>>
>> /*
>> * percpu pagevec access should be surrounded by these calls.
>> */
>> static inline void pagevec_start_access()
>> {
>> rcu_read_lock();
>> }
>>
>> static inline void pagevec_end_access()
>> {
>> rcu_read_unlock();
>> }
>>
>>
>> /*
>> * changing pagevec array vec 0 <-> 1
>> */
>> static void lru_pvec_update()
>> {
>> if (atomic_read(&pvec_idx))
>> atomic_set(&pvec_idx, 0);
>> else
>> atomic_set(&pvec_idx, 1);
>> }
>>
>> /*
>> * drain all LRUS on per-cpu pagevecs.
>> */
>> DEFINE_MUTEX(lru_add_drain_all_mutex);
>> static void lru_add_drain_all()
>> {
>> 	mutex_lock(&lru_add_drain_all_mutex);
>> 	lru_pvec_update();
>> 	synchronize_rcu(); /* wait until all accessors of the old pvecs have finished */
>
>
> I don't know the RCU internals, but conceptually I understand that
> synchronize_rcu() needs a context switch on every CPU. If that's even
> partly true, it could be a problem, too.
>
Hmm, from Documentation/RCU/stallwarn.txt:
==
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule().
o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
happen to preempt a low-priority task in the middle of an RCU
read-side critical section. This is especially damaging if
that low-priority task is not permitted to run on any other CPU,
in which case the next RCU grace period can never complete, which
will eventually cause the system to run out of memory and hang.
While the system is in the process of running itself out of
memory, you might see stall-warning messages.
o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
is running at a higher priority than the RCU softirq threads.
This will prevent RCU callbacks from ever being invoked,
and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
RCU grace periods from ever completing. Either way, the
system will eventually run out of memory and hang. In the
CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
messages.
==
you're right. (The RCU stall warning seems to be printed every 60 seconds
by default.) I'm wondering whether we can do the synchronization without RCU...
==
pvec_start_access(struct pagevec *pvec)
{
atomic_inc(&pvec->using);
}
pvec_end_access(struct pagevec *pvec)
{
atomic_dec(&pvec->using);
}
synchronize_pvec()
{
for_each_cpu(cpu)
wait for pvec->using to be 0.
}
static void lru_add_drain_all()
{
mutex_lock();
lru_pvec_update(); //switch pvec
synchronize_pvec(); // wait for all user exits
for_each_cpu()
drain pages in pvec
mutex_unlock()
}
==
"disable_irq() + interrupt()" would be easier.
What is the cost of disabling IRQs vs. an atomic_inc() on a local variable...
Regards,
-Kame