Change to invalidate_bdev() may break emergency remount R/O

* Change to invalidate_bdev() may break emergency remount R/O
@ 2010-05-26 15:01 David Howells
  2010-05-27  9:57 ` [PATCH] fs: run emergency remount on dedicated workqueue Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: David Howells @ 2010-05-26 15:01 UTC (permalink / raw)
  To: Tejun Heo, davem, jens.axboe; +Cc: dhowells, linux-kernel, torvalds


The following commit may be a problem for emergency_remount() [Alt+SysRq+U]:

	commit fa4b9074cd8428958c2adf9dc0c831f46e27c193
	Author: Tejun Heo <tj@kernel.org>
	Date:   Sat May 15 20:09:27 2010 +0200

	    buffer: make invalidate_bdev() drain all percpu LRU add caches

	    invalidate_bdev() should release all page cache pages which are clean
	    and not being used; however, if some pages are still in the percpu LRU
	    add caches on other cpus, those pages are considered in use and don't
	    get released.  Fix it by calling lru_add_drain_all() before trying to
	    invalidate pages.

	    This problem was discovered while testing block automatic native
	    capacity unlocking.  Null pages which were read before automatic
	    unlocking didn't get released by invalidate_bdev() and ended up
	    interfering with partition scan after unlocking.

	    Signed-off-by: Tejun Heo <tj@kernel.org>
	    Acked-by: David S. Miller <davem@davemloft.net>
	    Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

The symptoms are a lockdep warning:

SysRq : Emergency Remount R/O

=============================================
[ INFO: possible recursive locking detected ]
2.6.34-cachefs #101
---------------------------------------------
events/0/9 is trying to acquire lock:
 (events){+.+.+.}, at: [<ffffffff81042cf0>] flush_work+0x34/0xec

but task is already holding lock:
 (events){+.+.+.}, at: [<ffffffff81042264>] worker_thread+0x19a/0x2e2

other info that might help us debug this:
3 locks held by events/0/9:
 #0:  (events){+.+.+.}, at: [<ffffffff81042264>] worker_thread+0x19a/0x2e2
 #1:  ((work)#3){+.+...}, at: [<ffffffff81042264>] worker_thread+0x19a/0x2e2
 #2:  (&type->s_umount_key#30){++++..}, at: [<ffffffff810b3fc8>] do_emergency_remount+0x54/0xda

stack backtrace:
Pid: 9, comm: events/0 Not tainted 2.6.34-cachefs #101
Call Trace:
 [<ffffffff81054e80>] validate_chain+0x584/0xd23
 [<ffffffff81052ad4>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff8101c264>] ? flat_send_IPI_mask+0x74/0x86
 [<ffffffff81055ea8>] __lock_acquire+0x889/0x8fa
 [<ffffffff8102bcc9>] ? try_to_wake_up+0x23b/0x24d
 [<ffffffff8108fe45>] ? lru_add_drain_per_cpu+0x0/0xb
 [<ffffffff81055f70>] lock_acquire+0x57/0x6d
 [<ffffffff81042cf0>] ? flush_work+0x34/0xec
 [<ffffffff81042d1c>] flush_work+0x60/0xec
 [<ffffffff81042cf0>] ? flush_work+0x34/0xec
 [<ffffffff8108fbc1>] ? ____pagevec_lru_add+0x140/0x156
 [<ffffffff8108fe45>] ? lru_add_drain_per_cpu+0x0/0xb
 [<ffffffff8108fdc5>] ? lru_add_drain+0x3b/0x8f
 [<ffffffff81042eba>] schedule_on_each_cpu+0x112/0x152
 [<ffffffff8108fc5a>] lru_add_drain_all+0x10/0x12
 [<ffffffff810d509e>] invalidate_bdev+0x28/0x3a
 [<ffffffff810b3ef3>] do_remount_sb+0x129/0x14e
 [<ffffffff810b3ff3>] do_emergency_remount+0x7f/0xda
 [<ffffffff810422b9>] worker_thread+0x1ef/0x2e2
 [<ffffffff81042264>] ? worker_thread+0x19a/0x2e2
 [<ffffffff810b3f74>] ? do_emergency_remount+0x0/0xda
 [<ffffffff81045fcd>] ? autoremove_wake_function+0x0/0x34
 [<ffffffff810420ca>] ? worker_thread+0x0/0x2e2
 [<ffffffff81045be7>] kthread+0x7a/0x82
 [<ffffffff81002cd4>] kernel_thread_helper+0x4/0x10
 [<ffffffff813e2c3c>] ? restore_args+0x0/0x30
 [<ffffffff81045b6d>] ? kthread+0x0/0x82
 [<ffffffff81002cd0>] ? kernel_thread_helper+0x0/0x10
Emergency Remount complete

David

* [PATCH] fs: run emergency remount on dedicated workqueue
  2010-05-26 15:01 Change to invalidate_bdev() may break emergency remount R/O David Howells
@ 2010-05-27  9:57 ` Tejun Heo
  2010-05-27 14:59   ` Américo Wang
  2010-06-01 23:46   ` Andrew Morton
  0 siblings, 2 replies; 10+ messages in thread
From: Tejun Heo @ 2010-05-27  9:57 UTC (permalink / raw)
  To: David Howells; +Cc: davem, jens.axboe, linux-kernel, torvalds, viro, akpm

Commit fa4b9074cd8428958c2adf9dc0c831f46e27c193 made s_umount depend
on keventd; however, emergency remount schedules works to keventd
which grabs s_umount creating a circular dependency.  Run emergency
remount on a separate workqueue to break it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Unless someone objects, Andrew, can you please take this patch?

Thanks.

 fs/super.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index 69688b1..1ada607 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -575,6 +575,11 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 	return 0;
 }

+/*
+ * For emergency remount
+ */
+static struct workqueue_struct *emergency_remount_wq;
+
 static void do_emergency_remount(struct work_struct *work)
 {
 	struct super_block *sb, *n;
@@ -605,13 +610,25 @@ void emergency_remount(void)
 {
 	struct work_struct *work;

+	if (!emergency_remount_wq)
+		return;
+
 	work = kmalloc(sizeof(*work), GFP_ATOMIC);
 	if (work) {
 		INIT_WORK(work, do_emergency_remount);
-		schedule_work(work);
+		queue_work(emergency_remount_wq, work);
 	}
 }

+static int __init emergency_remount_init(void)
+{
+	emergency_remount_wq = create_singlethread_workqueue("emerg-remount");
+	if (!emergency_remount_wq)
+		pr_warn("failed to create emergency remount workqueue\n");
+	return 0;
+}
+subsys_initcall(emergency_remount_init);
+
 /*
  * Unnamed block devices are dummy devices used by virtual
  * filesystems which don't use real block-devices.  -- jrs
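
As a rough illustration of the dependency described in the patch description, the situation can be modelled in userspace C by tracking which lock classes a task holds. This is a toy sketch under stated assumptions, not how lockdep works internally: the helper names toy_acquire() and toy_emergency_remount() are hypothetical, the check is a plain "class already held" test, and the model assumes (as the reported trace suggests) that a worker holds its own workqueue's class while a work item runs and that lru_add_drain_all() ends up acquiring the "events" class through flush_work().

```c
#include <string.h>

#define MAX_HELD 8

/* Toy per-task record of held lock classes (names are illustrative only). */
struct task {
	const char *held[MAX_HELD];
	int nheld;
};

/*
 * Record that the task acquires a lock class.
 * Returns 0 on success, -1 if the class is already held (possible
 * recursive locking) or if the table is full.
 */
static int toy_acquire(struct task *t, const char *class)
{
	for (int i = 0; i < t->nheld; i++)
		if (strcmp(t->held[i], class) == 0)
			return -1;
	if (t->nheld == MAX_HELD)
		return -1;
	t->held[t->nheld++] = class;
	return 0;
}

/*
 * Model the emergency remount work item running on the workqueue
 * named wq. Returns 0 if no class is requested twice, -1 otherwise.
 */
int toy_emergency_remount(const char *wq)
{
	struct task t = { .nheld = 0 };

	if (toy_acquire(&t, wq))		/* held by the worker thread */
		return -1;
	if (toy_acquire(&t, "work"))		/* the work item itself */
		return -1;
	if (toy_acquire(&t, "s_umount"))	/* taken in do_emergency_remount() */
		return -1;
	/* invalidate_bdev() -> lru_add_drain_all() -> flush_work() on "events" */
	return toy_acquire(&t, "events");
}
```

In this model, toy_emergency_remount("events") returns -1 because "events" is requested while it is already held, whereas toy_emergency_remount("emerg-remount") returns 0, which is the effect the dedicated workqueue is meant to have.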


* Re: [PATCH] fs: run emergency remount on dedicated workqueue
  2010-05-27  9:57 ` [PATCH] fs: run emergency remount on dedicated workqueue Tejun Heo
@ 2010-05-27 14:59   ` Américo Wang
  2010-05-27 17:03     ` Tejun Heo
  2010-06-01 23:46   ` Andrew Morton
  1 sibling, 1 reply; 10+ messages in thread
From: Américo Wang @ 2010-05-27 14:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Howells, davem, jens.axboe, linux-kernel, torvalds, viro,
	akpm

On Thu, May 27, 2010 at 11:57:23AM +0200, Tejun Heo wrote:
>Commit fa4b9074cd8428958c2adf9dc0c831f46e27c193 made s_umount depend
>on keventd; however, emergency remount schedules works to keventd
>which grabs s_umount creating a circular dependency.  Run emergency
>remount on a separate workqueue to break it.
>


I have a stupid question, why using workqueue instead of
calling do_remount_sb() directly in emergency_remount()?
Avoid blocking emergency_remount()?

Thanks.

-- 
Live like a child, think like the god.
 

* Re: [PATCH] fs: run emergency remount on dedicated workqueue
  2010-05-27 14:59   ` Américo Wang
@ 2010-05-27 17:03     ` Tejun Heo
  2010-05-28  6:46       ` Américo Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2010-05-27 17:03 UTC (permalink / raw)
  To: Américo Wang
  Cc: David Howells, davem, jens.axboe, linux-kernel, torvalds, viro,
	akpm

On 05/27/2010 04:59 PM, Américo Wang wrote:
> On Thu, May 27, 2010 at 11:57:23AM +0200, Tejun Heo wrote:
>> Commit fa4b9074cd8428958c2adf9dc0c831f46e27c193 made s_umount depend
>> on keventd; however, emergency remount schedules works to keventd
>> which grabs s_umount creating a circular dependency.  Run emergency
>> remount on a separate workqueue to break it.
>>
> 
> I have a stupid question, why using workqueue instead of
> calling do_remount_sb() directly in emergency_remount()?
> Avoid blocking emergency_remount()?

Umm... because it's called from interrupt handler?  Right?

-- 
tejun

* Re: [PATCH] fs: run emergency remount on dedicated workqueue
  2010-05-27 17:03     ` Tejun Heo
@ 2010-05-28  6:46       ` Américo Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Américo Wang @ 2010-05-28  6:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Américo Wang, David Howells, davem, jens.axboe, linux-kernel,
	torvalds, viro, akpm

On Thu, May 27, 2010 at 07:03:11PM +0200, Tejun Heo wrote:
>On 05/27/2010 04:59 PM, Américo Wang wrote:
>> On Thu, May 27, 2010 at 11:57:23AM +0200, Tejun Heo wrote:
>>> Commit fa4b9074cd8428958c2adf9dc0c831f46e27c193 made s_umount depend
>>> on keventd; however, emergency remount schedules works to keventd
>>> which grabs s_umount creating a circular dependency.  Run emergency
>>> remount on a separate workqueue to break it.
>>>
>> 
>> I have a stupid question, why using workqueue instead of
>> calling do_remount_sb() directly in emergency_remount()?
>> Avoid blocking emergency_remount()?
>
>Umm... because it's called from interrupt handler?  Right?

Ah, this is true, sysrq can be both triggered by keyboard and
/proc/sysrq-trigger.

Thanks!

* Re: [PATCH] fs: run emergency remount on dedicated workqueue
  2010-05-27  9:57 ` [PATCH] fs: run emergency remount on dedicated workqueue Tejun Heo
  2010-05-27 14:59   ` Américo Wang
@ 2010-06-01 23:46   ` Andrew Morton
  2010-06-01 23:57     ` Linus Torvalds
  2010-06-02  1:02     ` Dave Young
  1 sibling, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2010-06-01 23:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Howells, davem, jens.axboe, linux-kernel, torvalds, viro,
	Nick Piggin

On Thu, 27 May 2010 11:57:23 +0200
Tejun Heo <tj@kernel.org> wrote:

> Commit fa4b9074cd8428958c2adf9dc0c831f46e27c193 made s_umount depend
> on keventd;

For a while I thought you had the wrong commit ID, but I worked it out!

Please, always quote the patch title rather than a bare commit ID.  The
usual form is

    fa4b9074cd8428958c2adf9dc0c831f46e27c193 ("buffer: make
    invalidate_bdev() drain all percpu LRU add caches")

The main reason for this is so that people can more reliably and simply
identify the patch within a different tree.  I think.

> however, emergency remount schedules works to keventd
> which grabs s_umount creating a circular dependency.  Run emergency
> remount on a separate workqueue to break it.
> 
> ...
>
> index 69688b1..1ada607 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -575,6 +575,11 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
>  	return 0;
>  }
> 
> +/*
> + * For emergency remount
> + */
> +static struct workqueue_struct *emergency_remount_wq;
> +
>  static void do_emergency_remount(struct work_struct *work)
>  {
>  	struct super_block *sb, *n;
> @@ -605,13 +610,25 @@ void emergency_remount(void)
>  {
>  	struct work_struct *work;
> 
> +	if (!emergency_remount_wq)
> +		return;
> +
>  	work = kmalloc(sizeof(*work), GFP_ATOMIC);
>  	if (work) {
>  		INIT_WORK(work, do_emergency_remount);
> -		schedule_work(work);
> +		queue_work(emergency_remount_wq, work);
>  	}
>  }
> 
> +static int __init emergency_remount_init(void)
> +{
> +	emergency_remount_wq = create_singlethread_workqueue("emerg-remount");
> +	if (!emergency_remount_wq)
> +		pr_warn("failed to create emergency remount workqueue\n");
> +	return 0;
> +}
> +subsys_initcall(emergency_remount_init);
> +
>  /*
>   * Unnamed block devices are dummy devices used by virtual
>   * filesystems which don't use real block-devices.  -- jrs

gaah.  Do we really want to add Yet Another Kernel Thread just for that
dopey sysrq-U thing?

I assume (coz you didn't tell us) that it generates a lockdep spew? 
Perhaps it'd be better to just suppress that somehow rather than this...

And if we _do_ end up adding a new kernel thread for this, maybe it
would be better to use that thread for lru_add_drain_all() rather than
within the dopey do_emergency_remount(), so as to reduce the likelihood
that we'll need to add even more kernel threads to solve the same
problem elsewhere?  But this would require a new kernel thread on each
CPU, grr.

Another possibility might be to change lru_add_drain_all() to use IPI
interrupts rather than schedule_on_each_cpu().  That would greatly
speed up lru_add_drain_all().  I don't recall why we did it that way
and I don't immediately see a reason not to.  A few things in core mm
would need to be changed from spin_lock_irq() to spin_lock_irqsave().

But I do have vague memories that there was a reason for it.

<It's a huge PITA locating the commit which initially added
lru_add_drain_all()>

<ten minutes later>

: tree 05d7615894131a368fc4943f641b11acdd2ae694
: parent e236a166b2bc437769a9b8b5d19186a3761bde48
: author Nick Piggin <npiggin@suse.de> Thu, 19 Jan 2006 09:42:27 -0800
: committer Linus Torvalds <torvalds@g5.osdl.org> Thu, 19 Jan 2006 11:20:17 -0800
: 
: [PATCH] mm: migration page refcounting fix
: 
: Migration code currently does not take a reference to target page
: properly, so between unlocking the pte and trying to take a new
: reference to the page with isolate_lru_page, anything could happen to
: it.
: 
: Fix this by holding the pte lock until we get a chance to elevate the
: refcount.
: 
: Other small cleanups while we're here.

It didn't tell us.

<looks in the linux-mm archives>

Nope, no rationale is provided there either.

* Re: [PATCH] fs: run emergency remount on dedicated workqueue
  2010-06-01 23:46   ` Andrew Morton
@ 2010-06-01 23:57     ` Linus Torvalds
  2010-06-02  0:13       ` Tejun Heo
  2010-06-02  1:02     ` Dave Young
  1 sibling, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2010-06-01 23:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tejun Heo, David Howells, davem, jens.axboe, linux-kernel, viro,
	Nick Piggin



On Tue, 1 Jun 2010, Andrew Morton wrote:
> 
> Please, always quote the patch title rather than a bare commit ID.  The
> usual form is
> 
>     fa4b9074cd8428958c2adf9dc0c831f46e27c193 ("buffer: make
>     invalidate_bdev() drain all percpu LRU add caches")
> 
> The main reason for this is so that people can more reliably and simply
> identify the patch within a different tree.  I think.

Absolutely. Also, I think it's usually more readable to quote just the 
first 12 hex digits of the SHA1 - that's still going to be perfectly 
unique in any practical situation, and makes it way easier to flow the 
text to be readable.

> gaah.  Do we really want to add Yet Another Kernel Thread just for that
> dopey sysrq-U thing?

I do have to agree that it's disgusting. Can't we use an existing thread 
(slow-work?) or something like that?

		Linus
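
The citation style discussed in this message (the first 12 hex digits of the SHA1 followed by the quoted patch title) can be sketched as a small C helper. This is only an illustration: format_commit_ref() is a hypothetical name, not an existing kernel or git function, and it assumes the caller passes a full or partial SHA1 of at least 12 characters.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format a commit citation as: <first 12 hex digits> ("<title>")
 * Returns the number of characters written (excluding the NUL), or -1
 * if the SHA1 string is shorter than 12 characters or buf is too small.
 */
int format_commit_ref(char *buf, size_t len, const char *sha1,
		      const char *title)
{
	int n;

	if (strlen(sha1) < 12)
		return -1;
	/* "%.12s" copies at most the first 12 characters of sha1 */
	n = snprintf(buf, len, "%.12s (\"%s\")", sha1, title);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

For the commit under discussion, passing the full ID and its title yields fa4b9074cd84 ("buffer: make invalidate_bdev() drain all percpu LRU add caches"), which is short enough to reflow inside a paragraph.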

* Re: [PATCH] fs: run emergency remount on dedicated workqueue
  2010-06-01 23:57     ` Linus Torvalds
@ 2010-06-02  0:13       ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2010-06-02  0:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, David Howells, davem, jens.axboe, linux-kernel,
	viro, Nick Piggin

Hello,

On 06/02/2010 01:57 AM, Linus Torvalds wrote:
> On Tue, 1 Jun 2010, Andrew Morton wrote:
>>
>> Please, always quote the patch title rather than a bare commit ID.  The
>> usual form is
>>
>>     fa4b9074cd8428958c2adf9dc0c831f46e27c193 ("buffer: make
>>     invalidate_bdev() drain all percpu LRU add caches")
>>
>> The main reason for this is so that people can more reliably and simply
>> identify the patch within a different tree.  I think.
> 
> Absolutely. Also, I think it's usually more readable to quote just the 
> first 12 hex digits of the SHA1 - that's still going to be perfectly 
> unique in any practical situation, and makes it way easier to flow the 
> text to be readable.

Alright, will do so from now on.

>> gaah.  Do we really want to add Yet Another Kernel Thread just for that
>> dopey sysrq-U thing?
> 
> I do have to agree that it's disgusting. Can't we use an existing thread 
> (slow-work?) or something like that?

The dedicated workqueue can go away with cmwq.  As it's a temporary
measure until then, I wanted to keep it simple.  Would it be okay if I
note that the dedicated workqueue will go away soonish in the patch
description and comment?

Thanks.

-- 
tejun

* Re: [PATCH] fs: run emergency remount on dedicated workqueue
  2010-06-01 23:46   ` Andrew Morton
  2010-06-01 23:57     ` Linus Torvalds
@ 2010-06-02  1:02     ` Dave Young
  2010-06-02  1:57       ` Andrew Morton
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Young @ 2010-06-02  1:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tejun Heo, David Howells, davem, jens.axboe, linux-kernel,
	torvalds, viro, Nick Piggin

On Wed, Jun 2, 2010 at 7:46 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 27 May 2010 11:57:23 +0200
> Tejun Heo <tj@kernel.org> wrote:
>
>> Commit fa4b9074cd8428958c2adf9dc0c831f46e27c193 made s_umount depend
>> on keventd;
>
> For a while I thought you had the wrong commit ID, but I worked it out!
>
> Please, always quote the patch title rather than a bare commit ID.  The
> usual form is
>
>    fa4b9074cd8428958c2adf9dc0c831f46e27c193 ("buffer: make
>    invalidate_bdev() drain all percpu LRU add caches")
>
> The main reason for this is so that people can more reliably and simply
> identify the patch within a different tree.  I think.
>
>> however, emergency remount schedules works to keventd
>> which grabs s_umount creating a circular dependency.  Run emergency
>> remount on a separate workqueue to break it.
>>
>> ...
>>
>> index 69688b1..1ada607 100644
>> --- a/fs/super.c
>> +++ b/fs/super.c
>> @@ -575,6 +575,11 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
>>       return 0;
>>  }
>>
>> +/*
>> + * For emergency remount
>> + */
>> +static struct workqueue_struct *emergency_remount_wq;
>> +
>>  static void do_emergency_remount(struct work_struct *work)
>>  {
>>       struct super_block *sb, *n;
>> @@ -605,13 +610,25 @@ void emergency_remount(void)
>>  {
>>       struct work_struct *work;
>>
>> +     if (!emergency_remount_wq)
>> +             return;
>> +
>>       work = kmalloc(sizeof(*work), GFP_ATOMIC);
>>       if (work) {
>>               INIT_WORK(work, do_emergency_remount);
>> -             schedule_work(work);
>> +             queue_work(emergency_remount_wq, work);
>>       }
>>  }
>>
>> +static int __init emergency_remount_init(void)
>> +{
>> +     emergency_remount_wq = create_singlethread_workqueue("emerg-remount");
>> +     if (!emergency_remount_wq)
>> +             pr_warn("failed to create emergency remount workqueue\n");
>> +     return 0;
>> +}
>> +subsys_initcall(emergency_remount_init);
>> +
>>  /*
>>   * Unnamed block devices are dummy devices used by virtual
>>   * filesystems which don't use real block-devices.  -- jrs
>
> gaah.  Do we really want to add Yet Another Kernel Thread just for that
> dopey sysrq-U thing?
>
> I assume (coz you didn't tell us) that it generates a lockdep spew?
> Perhaps it'd be better to just suppress that somehow rather than this...
>
> And if we _do_ end up adding a new kernel thread for this, maybe it
> would be better to use that thread for lru_add_drain_all() rather than
> within the dopey do_emergency_remount(), so as to reduce the likelihood
> that we'll need to add even more kernel threads to solve the same
> problem elsewhere?  But this would require a new kernel thread on each
> CPU, grr.
>
> Another possibility might be to change lru_add_drain_all() to use IPI
> interrupts rather than schedule_on_each_cpu().  That would greatly
> speed up lru_add_drain_all().  I don't recall why we did it that way
> and I don't immediately see a reason not to.  A few things in core mm
> would need to be changed from spin_lock_irq() to spin_lock_irqsave().
>
> But I do have vague memories that there was a reason for it.
>
> <It's a huge PITA locating the commit which initially added
> lru_add_drain_all()>
>
> <ten minutes later>
>
> : tree 05d7615894131a368fc4943f641b11acdd2ae694
> : parent e236a166b2bc437769a9b8b5d19186a3761bde48
> : author Nick Piggin <npiggin@suse.de> Thu, 19 Jan 2006 09:42:27 -0800
> : committer Linus Torvalds <torvalds@g5.osdl.org> Thu, 19 Jan 2006 11:20:17 -0800
> :
> : [PATCH] mm: migration page refcounting fix
> :
> : Migration code currently does not take a reference to target page
> : properly, so between unlocking the pte and trying to take a new
> : reference to the page with isolate_lru_page, anything could happen to
> : it.
> :
> : Fix this by holding the pte lock until we get a chance to elevate the
> : refcount.
> :
> : Other small cleanups while we're here.
>
> It didn't tell us.
>
> <looks in the linux-mm archives>
>
> Nope, no rationale is provided there either.

Maybe this thread?

http://lkml.org/lkml/2008/10/23/226


-- 
Regards
dave

* Re: [PATCH] fs: run emergency remount on dedicated workqueue
  2010-06-02  1:02     ` Dave Young
@ 2010-06-02  1:57       ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2010-06-02  1:57 UTC (permalink / raw)
  To: Dave Young
  Cc: Tejun Heo, David Howells, davem, jens.axboe, linux-kernel,
	torvalds, viro, Nick Piggin, linux-mm

On Wed, 2 Jun 2010 09:02:40 +0800 Dave Young <hidave.darkstar@gmail.com> wrote:

> ...
>
> > Another possibility might be to change lru_add_drain_all() to use IPI
> > interrupts rather than schedule_on_each_cpu().  That would greatly
> > speed up lru_add_drain_all().  I don't recall why we did it that way
> > and I don't immediately see a reason not to.  A few things in core mm
> > would need to be changed from spin_lock_irq() to spin_lock_irqsave().
> >
> > But I do have vague memories that there was a reason for it.
> >
> > <It's a huge PITA locating the commit which initially added
> > lru_add_drain_all()>
> >
> > <ten minutes later>
> >
> > : tree 05d7615894131a368fc4943f641b11acdd2ae694
> > : parent e236a166b2bc437769a9b8b5d19186a3761bde48
> > : author Nick Piggin <npiggin@suse.de> Thu, 19 Jan 2006 09:42:27 -0800
> > : committer Linus Torvalds <torvalds@g5.osdl.org> Thu, 19 Jan 2006 11:20:17 -0800
> > :
> > : [PATCH] mm: migration page refcounting fix
> > :
> > : Migration code currently does not take a reference to target page
> > : properly, so between unlocking the pte and trying to take a new
> > : reference to the page with isolate_lru_page, anything could happen to
> > : it.
> > :
> > : Fix this by holding the pte lock until we get a chance to elevate the
> > : refcount.
> > :
> > : Other small cleanups while we're here.
> >
> > It didn't tell us.
> >
> > <looks in the linux-mm archives>
> >
> > Nope, no rationale is provided there either.
> 
> Maybe this thread?
> 
> http://lkml.org/lkml/2008/10/23/226

Close.  There's some talk there of using smp_call_function() (actually
on_each_cpu()) within lru_add_drain_all(), but nobody seems to have
tried it.

