Re: [RFC][PATCH] lru_add_drain_all() don't use schedule_on_each_cpu()

From: Peter Zijlstra <peterz@infradead.org>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>,
	Nick Piggin <npiggin@suse.de>,
	linux-kernel@vger.kernel.org, Hugh Dickins <hugh@veritas.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Lee Schermerhorn <lee.schermerhorn@hp.com>,
	linux-mm@kvack.org, Christoph Lameter <cl@linux-foundation.org>,
	Gautham Shenoy <ego@in.ibm.com>, Oleg Nesterov <oleg@tv-sign.ru>,
	Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [RFC][PATCH] lru_add_drain_all() don't use schedule_on_each_cpu()
Date: Sun, 26 Oct 2008 12:06:16 +0100
Message-ID: <1225019176.32713.5.camel@twins>
In-Reply-To: <20081023235425.9C40.KOSAKI.MOTOHIRO@jp.fujitsu.com>

On Fri, 2008-10-24 at 00:00 +0900, KOSAKI Motohiro wrote:

> It is because of the following three-way circular locking dependency:
> 
> Some VM paths have
>       mmap_sem -> keventd_wq via lru_add_drain_all()
> 
> net/core/dev.c::dev_ioctl() has
>      rtnl_lock  ->  mmap_sem        (*) the ioctl calls copy_from_user(), which can take a page fault.
> 
> linkwatch_event has
>      keventd_wq -> rtnl_lock
> 
> 
> Actually, schedule_on_each_cpu() is a very problematic function.
> It makes the caller depend on every work item running on keventd_wq,
> but we can't know which locks those workers hold, because
> keventd_wq is widely used, including by out-of-tree drivers.
> 
> So, a task that holds any lock shouldn't wait on keventd_wq.
> Such a task should use its own special-purpose workqueue.
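The three dependencies above close a cycle: mmap_sem -> keventd_wq -> rtnl_lock -> mmap_sem. The following userspace C sketch illustrates this with a toy lock-order graph. It is far simpler than the kernel's lockdep, and the enum names, `struct lock_graph` and the helper functions are all hypothetical. The sketch checks for a cycle with a depth-first search in two configurations: the graph as reported, and the graph with the lru_add_drain_all() edge redirected to a dedicated vm_wq, as the patch proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical toy model, not lockdep: an edge A -> B means "B is
 * acquired (or waited on) while A is held". A cycle in the graph means
 * the orderings can deadlock.
 */
enum lock_id { MMAP_SEM, RTNL_LOCK, KEVENTD_WQ, VM_WQ, NR_LOCKS };

struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

static void add_edge(struct lock_graph *g, enum lock_id from, enum lock_id to)
{
	assert(from < NR_LOCKS && to < NR_LOCKS);
	g->edge[from][to] = true;
}

/* DFS; state: 0 = unvisited, 1 = on the current path, 2 = finished. */
static bool dfs_finds_cycle(const struct lock_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: a cycle */
		if (state[next] == 0 && dfs_finds_cycle(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool has_cycle(const struct lock_graph *g)
{
	int state[NR_LOCKS] = { 0 };

	for (int node = 0; node < NR_LOCKS; node++)
		if (state[node] == 0 && dfs_finds_cycle(g, node, state))
			return true;
	return false;
}

/* The three reported dependencies; use_vm_wq models the proposed patch. */
static void build_report_graph(struct lock_graph *g, bool use_vm_wq)
{
	memset(g, 0, sizeof(*g));
	/* lru_add_drain_all() waits on a workqueue while mmap_sem is held */
	add_edge(g, MMAP_SEM, use_vm_wq ? VM_WQ : KEVENTD_WQ);
	/* dev_ioctl(): copy_from_user() may fault (mmap_sem) under rtnl_lock */
	add_edge(g, RTNL_LOCK, MMAP_SEM);
	/* linkwatch_event runs on keventd_wq and takes rtnl_lock */
	add_edge(g, KEVENTD_WQ, RTNL_LOCK);
}
```

With `use_vm_wq` false, `has_cycle()` reports the cycle. With it true, the remaining edges form the chain keventd_wq -> rtnl_lock -> mmap_sem -> vm_wq, which has no cycle. That result holds only under the model's assumption that nothing queued on vm_wq takes rtnl_lock or mmap_sem.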
> 
> 
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> CC: Christoph Lameter <cl@linux-foundation.org>
> CC: Nick Piggin <npiggin@suse.de>
> CC: Hugh Dickins <hugh@veritas.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Rik van Riel <riel@redhat.com>
> CC: Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
>  linux-2.6.27-git10-vm_wq/include/linux/workqueue.h |    1 
>  linux-2.6.27-git10-vm_wq/kernel/workqueue.c        |   37 +++++++++++++++++++++
>  linux-2.6.27-git10-vm_wq/mm/swap.c                 |    8 +++-
>  3 files changed, 45 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.27-git10-vm_wq/include/linux/workqueue.h
> ===================================================================
> --- linux-2.6.27-git10-vm_wq.orig/include/linux/workqueue.h	2008-10-23 21:01:38.000000000 +0900
> +++ linux-2.6.27-git10-vm_wq/include/linux/workqueue.h	2008-10-23 22:34:20.000000000 +0900
> @@ -195,6 +195,7 @@ extern int schedule_delayed_work(struct 
>  extern int schedule_delayed_work_on(int cpu, struct delayed_work *work,
>  					unsigned long delay);
>  extern int schedule_on_each_cpu(work_func_t func);
> +int queue_work_on_each_cpu(struct workqueue_struct *wq, work_func_t func);
>  extern int current_is_keventd(void);
>  extern int keventd_up(void);
>  
> Index: linux-2.6.27-git10-vm_wq/kernel/workqueue.c
> ===================================================================
> --- linux-2.6.27-git10-vm_wq.orig/kernel/workqueue.c	2008-10-23 21:01:38.000000000 +0900
> +++ linux-2.6.27-git10-vm_wq/kernel/workqueue.c	2008-10-23 22:34:20.000000000 +0900
> @@ -674,6 +674,8 @@ EXPORT_SYMBOL(schedule_delayed_work_on);
>   * Returns -ve errno on failure.
>   *
>   * schedule_on_each_cpu() is very slow.
> + * The caller should NOT hold any lock; otherwise waiting for the work
> + * on keventd_wq (flush_work()) can deadlock.

I think this is too strong.

>   */
>  int schedule_on_each_cpu(work_func_t func)
>  {
> @@ -698,6 +700,41 @@ int schedule_on_each_cpu(work_func_t fun
>  	return 0;
>  }
>  
> +/**
> + * queue_work_on_each_cpu - call a function on each online CPU
> + *
> + * @wq:   the workqueue
> + * @func: the function to call
> + *
> + * Returns zero on success.
> + * Returns -ve errno on failure.
> + *
> + * Similar to schedule_on_each_cpu(), but takes the workqueue as an argument.
> + * queue_work_on_each_cpu() is very slow.
> + */
> +int queue_work_on_each_cpu(struct workqueue_struct *wq, work_func_t func)
> +{
> +	int cpu;
> +	struct work_struct *works;
> +
> +	works = alloc_percpu(struct work_struct);
> +	if (!works)
> +		return -ENOMEM;
> +
> +	get_online_cpus();
> +	for_each_online_cpu(cpu) {
> +		struct work_struct *work = per_cpu_ptr(works, cpu);
> +
> +		INIT_WORK(work, func);
> +		queue_work_on(cpu, wq, work);
> +	}
> +	for_each_online_cpu(cpu)
> +		flush_work(per_cpu_ptr(works, cpu));
> +	put_online_cpus();
> +	free_percpu(works);
> +	return 0;
> +}
> +

Which gives us the opportunity to implement schedule_on_each_cpu() on
top of this, as queue_work_on_each_cpu(keventd_wq, func).

>  void flush_scheduled_work(void)
>  {
>  	flush_workqueue(keventd_wq);
> Index: linux-2.6.27-git10-vm_wq/mm/swap.c
> ===================================================================
> --- linux-2.6.27-git10-vm_wq.orig/mm/swap.c	2008-10-23 21:01:38.000000000 +0900
> +++ linux-2.6.27-git10-vm_wq/mm/swap.c	2008-10-23 22:53:27.000000000 +0900
> @@ -39,6 +39,8 @@ int page_cluster;
>  static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
>  static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
>  
> +static struct workqueue_struct *vm_wq __read_mostly;
> +
>  /*
>   * This path almost never happens for VM activity - pages are normally
>   * freed via pagevecs.  But it gets used by networking.
> @@ -310,7 +312,7 @@ static void lru_add_drain_per_cpu(struct
>   */
>  int lru_add_drain_all(void)
>  {
> -	return schedule_on_each_cpu(lru_add_drain_per_cpu);
> +	return queue_work_on_each_cpu(vm_wq, lru_add_drain_per_cpu);
>  }
>  
>  #else
> @@ -611,4 +613,8 @@ void __init swap_setup(void)
>  #ifdef CONFIG_HOTPLUG_CPU
>  	hotcpu_notifier(cpu_swap_callback, 0);
>  #endif
> +
> +	vm_wq = create_workqueue("vm_work");
> +	BUG_ON(!vm_wq);
> +
>  }

While I really hate adding yet another per-cpu thread for this, I don't
see another way out atm.

Oleg, Rusty, ego: you lot were discussing a similar extra per-cpu
workqueue; can we merge the two?


