* [PATCH] swap migration: Fix lru drain
@ 2005-12-07 21:55 Christoph Lameter
2005-12-07 23:27 ` Nick Piggin
2005-12-08 0:13 ` Andrew Morton
0 siblings, 2 replies; 12+ messages in thread
From: Christoph Lameter @ 2005-12-07 21:55 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, lhms-devel
isolate_page() currently uses an IPI to notify other processors that the lru
caches need to be drained if the page cannot be found on the LRU. The IPI
may interrupt a processor that is just processing lru requests and cause
a race condition.
This patch introduces a new function, schedule_on_each_cpu(), which uses keventd
to run the LRU draining on each processor. Processors disable preemption
when dealing with the LRU caches (these are per processor), and thus executing
LRU draining from another process is safe.
Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race
condition.
This makes the
preserve-irq-status-in-release_pages-__pagevec_lru_add.patch
in Andrew's tree no longer necessary.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.15-rc5-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/mm/vmscan.c 2005-12-05 11:32:21.000000000 -0800
+++ linux-2.6.15-rc5-mm1/mm/vmscan.c 2005-12-05 13:02:27.000000000 -0800
@@ -1038,7 +1038,7 @@ redo:
* Maybe this page is still waiting for a cpu to drain it
* from one of the lru lists?
*/
- on_each_cpu(lru_add_drain_per_cpu, NULL, 0, 1);
+ schedule_on_each_cpu(lru_add_drain_per_cpu, NULL);
if (PageLRU(page))
goto redo;
}
Index: linux-2.6.15-rc5-mm1/include/linux/workqueue.h
===================================================================
--- linux-2.6.15-rc5-mm1.orig/include/linux/workqueue.h 2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5-mm1/include/linux/workqueue.h 2005-12-05 13:02:07.000000000 -0800
@@ -65,6 +65,7 @@ extern int FASTCALL(schedule_work(struct
extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
extern int schedule_delayed_work_on(int cpu, struct work_struct *work, unsigned long delay);
+extern void schedule_on_each_cpu(void (*func)(void *info), void *info);
extern void flush_scheduled_work(void);
extern int current_is_keventd(void);
extern int keventd_up(void);
Index: linux-2.6.15-rc5-mm1/kernel/workqueue.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/workqueue.c 2005-12-05 11:15:24.000000000 -0800
+++ linux-2.6.15-rc5-mm1/kernel/workqueue.c 2005-12-06 17:50:44.000000000 -0800
@@ -424,6 +424,19 @@ int schedule_delayed_work_on(int cpu,
return ret;
}
+void schedule_on_each_cpu(void (*func) (void *info), void *info)
+{
+ int cpu;
+ struct work_struct * work = kmalloc(NR_CPUS * sizeof(struct work_struct), GFP_KERNEL);
+
+ for_each_online_cpu(cpu) {
+ INIT_WORK(work + cpu, func, info);
+ __queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu), work + cpu);
+ }
+ flush_workqueue(keventd_wq);
+ kfree(work);
+}
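The queue-then-flush pattern above can be illustrated with a small userspace analogue (a sketch only; the thread-per-"CPU" model and all names below, such as run_on_each() and bump(), are invented for illustration and are not kernel API): each "CPU" is modeled as a worker thread, queueing corresponds to starting the thread, and flush_workqueue() corresponds to joining them all before returning.

```c
/* Userspace sketch of the schedule_on_each_cpu() pattern: one work
 * item per "CPU", then a flush that waits for all of them to finish.
 * All names here are invented for illustration. */
#include <pthread.h>
#include <stdlib.h>

struct work {
	void (*func)(void *);
	void *info;
	pthread_t tid;
};

static void *worker(void *arg)
{
	struct work *w = arg;
	w->func(w->info);	/* runs in worker context, not via an IPI */
	return NULL;
}

/* Queue func(info) on every "CPU", then wait for all to complete. */
static int run_on_each(int ncpus, void (*func)(void *), void *info)
{
	struct work *work = malloc(ncpus * sizeof(*work));
	int cpu;

	if (!work)
		return -1;

	for (cpu = 0; cpu < ncpus; cpu++) {	/* "__queue_work()" */
		work[cpu].func = func;
		work[cpu].info = info;
		pthread_create(&work[cpu].tid, NULL, worker, &work[cpu]);
	}
	for (cpu = 0; cpu < ncpus; cpu++)	/* "flush_workqueue()" */
		pthread_join(work[cpu].tid, NULL);

	free(work);
	return 0;
}

static int counter;
static void bump(void *info)
{
	(void)info;
	__sync_fetch_and_add(&counter, 1);	/* each "CPU" drains once */
}
```

The property isolate_lru_page() relies on is exactly the join: when run_on_each() returns, every per-"CPU" drain has completed, so PageLRU() can be re-tested safely.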
+
void flush_scheduled_work(void)
{
flush_workqueue(keventd_wq);
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] swap migration: Fix lru drain
2005-12-07 21:55 [PATCH] swap migration: Fix lru drain Christoph Lameter
@ 2005-12-07 23:27 ` Nick Piggin
2005-12-08 0:43 ` Christoph Lameter
2005-12-08 0:13 ` Andrew Morton
1 sibling, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2005-12-07 23:27 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-kernel, lhms-devel
Christoph Lameter wrote:
> isolate_page() currently uses an IPI to notify other processors that the lru
> caches need to be drained if the page cannot be found on the LRU. The IPI
> may interrupt a processor that is just processing lru requests and cause
> a race condition.
>
> This patch introduces a new function, schedule_on_each_cpu(), which uses keventd
> to run the LRU draining on each processor. Processors disable preemption
> when dealing with the LRU caches (these are per processor), and thus executing
> LRU draining from another process is safe.
>
Couple of comments:
> ===================================================================
> --- linux-2.6.15-rc5-mm1.orig/kernel/workqueue.c 2005-12-05 11:15:24.000000000 -0800
> +++ linux-2.6.15-rc5-mm1/kernel/workqueue.c 2005-12-06 17:50:44.000000000 -0800
> @@ -424,6 +424,19 @@ int schedule_delayed_work_on(int cpu,
> return ret;
> }
>
> +void schedule_on_each_cpu(void (*func) (void *info), void *info)
> +{
> + int cpu;
> + struct work_struct * work = kmalloc(NR_CPUS * sizeof(struct work_struct), GFP_KERNEL);
> +
Do we need a lock_cpu_hotplug() around here?
> + for_each_online_cpu(cpu) {
> + INIT_WORK(work + cpu, func, info);
> + __queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu), work + cpu);
> + }
> + flush_workqueue(keventd_wq);
> + kfree(work);
> +}
> +
Can't this deadlock if 2 CPUs each send work to the other?
--
SUSE Labs, Novell Inc.
* Re: [PATCH] swap migration: Fix lru drain
2005-12-07 21:55 [PATCH] swap migration: Fix lru drain Christoph Lameter
2005-12-07 23:27 ` Nick Piggin
@ 2005-12-08 0:13 ` Andrew Morton
2005-12-08 0:41 ` Christoph Lameter
1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2005-12-08 0:13 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-kernel, lhms-devel
Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> isolate_page()
There's no such function. You're referring to isolate_lru_page().
btw, lru_add_drain() appears to be identical to lru_drain_cache().
> currently uses an IPI to notify other processors that the lru
> caches need to be drained if the page cannot be found on the LRU. The IPI
> may interrupt a processor that is just processing lru requests and cause
> a race condition.
>
> This patch introduces a new function, schedule_on_each_cpu(), which uses keventd
> to run the LRU draining on each processor. Processors disable preemption
> when dealing with the LRU caches (these are per processor), and thus executing
> LRU draining from another process is safe.
>
> Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race
> condition.
>
> This makes the
>
> preserve-irq-status-in-release_pages-__pagevec_lru_add.patch
>
> in Andrew's tree no longer necessary.
Why not just extend the irq protection into lru_cache_add[_active]() and
lru_add_drain()?
Answer: because
preserve-irq-status-in-release_pages-__pagevec_lru_add.patch sucks, and
extending it in this manner sucks more.
Being able to push everything up to process context as you're proposing
puts all the suckiness into this slowpath, so fine.
> +void schedule_on_each_cpu(void (*func) (void *info), void *info)
> +{
> + int cpu;
> + struct work_struct * work = kmalloc(NR_CPUS * sizeof(struct work_struct), GFP_KERNEL);
> +
> + for_each_online_cpu(cpu) {
> + INIT_WORK(work + cpu, func, info);
> + __queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu), work + cpu);
> + }
> + flush_workqueue(keventd_wq);
> + kfree(work);
> +}
Normally it's poor form for a library function to assume it can use
GFP_KERNEL. But in this case, the allocation has such a huge upper-bound
that there's no reasonable alternative, so OK.
kmalloc() can return NULL, y'know.
80-col xterms, please.
* Re: [PATCH] swap migration: Fix lru drain
2005-12-08 0:13 ` Andrew Morton
@ 2005-12-08 0:41 ` Christoph Lameter
2005-12-08 0:57 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Lameter @ 2005-12-08 0:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, lhms-devel
On Wed, 7 Dec 2005, Andrew Morton wrote:
> > in Andrew's tree no longer necessary.
>
> Why not just extend the irq protection into lru_cache_add[_active]() and
> lru_add_drain()?
>
> Answer: because
> preserve-irq-status-in-release_pages-__pagevec_lru_add.patch sucks, and
> extending it in this manner sucks more.
The concern was that doing so would mean disabling interrupts in more
cases.
> Being able to push everything up to process context as you're proposing
> puts all the suckiness into this slowpath, so fine.
Right.
> Normally it's poor form for a library function to assume it can use
> GFP_KERNEL. But in this case, the allocation has such a huge upper-bound
> that there's no reasonable alternative, so OK.
flush_workqueue() can sleep, so GFP_ATOMIC is not a possibility.
> kmalloc() can return NULL, y'know.
Fixed. Here is a new version:
isolate_lru_page() currently uses an IPI to notify other processors that
the lru caches need to be drained if the page cannot be found on the LRU.
The IPI may interrupt a processor that is just processing lru
requests and cause a race condition.
This patch introduces a new function, schedule_on_each_cpu(), which uses keventd
to run the LRU draining on each processor. Processors disable preemption
when dealing with the LRU caches (these are per processor), and thus executing
LRU draining from another process is safe.
Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race
condition.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.15-rc5-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/mm/vmscan.c 2005-12-05 11:32:21.000000000 -0800
+++ linux-2.6.15-rc5-mm1/mm/vmscan.c 2005-12-06 18:03:24.000000000 -0800
@@ -1038,7 +1038,7 @@ redo:
* Maybe this page is still waiting for a cpu to drain it
* from one of the lru lists?
*/
- on_each_cpu(lru_add_drain_per_cpu, NULL, 0, 1);
+ schedule_on_each_cpu(lru_add_drain_per_cpu, NULL);
if (PageLRU(page))
goto redo;
}
Index: linux-2.6.15-rc5-mm1/include/linux/workqueue.h
===================================================================
--- linux-2.6.15-rc5-mm1.orig/include/linux/workqueue.h 2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5-mm1/include/linux/workqueue.h 2005-12-07 16:29:47.000000000 -0800
@@ -65,6 +65,7 @@ extern int FASTCALL(schedule_work(struct
extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
extern int schedule_delayed_work_on(int cpu, struct work_struct *work, unsigned long delay);
+extern int schedule_on_each_cpu(void (*func)(void *info), void *info);
extern void flush_scheduled_work(void);
extern int current_is_keventd(void);
extern int keventd_up(void);
Index: linux-2.6.15-rc5-mm1/kernel/workqueue.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/workqueue.c 2005-12-05 11:15:24.000000000 -0800
+++ linux-2.6.15-rc5-mm1/kernel/workqueue.c 2005-12-07 16:38:54.000000000 -0800
@@ -424,6 +424,25 @@ int schedule_delayed_work_on(int cpu,
return ret;
}
+int schedule_on_each_cpu(void (*func) (void *info), void *info)
+{
+ int cpu;
+ struct work_struct *work;
+
+ work = kmalloc(NR_CPUS * sizeof(struct work_struct), GFP_KERNEL);
+
+ if (!work)
+ return 0;
+ for_each_online_cpu(cpu) {
+ INIT_WORK(work + cpu, func, info);
+ __queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu),
+ work + cpu);
+ }
+ flush_workqueue(keventd_wq);
+ kfree(work);
+ return 1;
+}
+
void flush_scheduled_work(void)
{
flush_workqueue(keventd_wq);
* Re: [PATCH] swap migration: Fix lru drain
2005-12-07 23:27 ` Nick Piggin
@ 2005-12-08 0:43 ` Christoph Lameter
2005-12-08 0:58 ` Andrew Morton
2005-12-08 0:59 ` Nick Piggin
0 siblings, 2 replies; 12+ messages in thread
From: Christoph Lameter @ 2005-12-08 0:43 UTC (permalink / raw)
To: Nick Piggin; +Cc: akpm, linux-kernel, lhms-devel
On Thu, 8 Dec 2005, Nick Piggin wrote:
> Do we need a lock_cpu_hotplug() around here?
Well, then we may need that lock for each "for_each_online_cpu" use?
> Can't this deadlock if 2 CPUs each send work to the other
Then we would need to fix the workqueue flushing function.
* Re: [PATCH] swap migration: Fix lru drain
2005-12-08 0:41 ` Christoph Lameter
@ 2005-12-08 0:57 ` Andrew Morton
2005-12-08 1:14 ` Christoph Lameter
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2005-12-08 0:57 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-kernel, lhms-devel
Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> +int schedule_on_each_cpu(void (*func) (void *info), void *info)
> +{
> + int cpu;
> + struct work_struct *work;
> +
> + work = kmalloc(NR_CPUS * sizeof(struct work_struct), GFP_KERNEL);
> +
> + if (!work)
> + return 0;
> + for_each_online_cpu(cpu) {
> + INIT_WORK(work + cpu, func, info);
> + __queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu),
> + work + cpu);
> + }
> + flush_workqueue(keventd_wq);
> + kfree(work);
> + return 1;
> +}
I'll change this to return 0 on success, or -ENOMEM. Bit more
conventional, no?
* Re: [PATCH] swap migration: Fix lru drain
2005-12-08 0:43 ` Christoph Lameter
@ 2005-12-08 0:58 ` Andrew Morton
2005-12-08 0:59 ` Nick Piggin
1 sibling, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2005-12-08 0:58 UTC (permalink / raw)
To: Christoph Lameter; +Cc: piggin, linux-kernel, lhms-devel
Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> On Thu, 8 Dec 2005, Nick Piggin wrote:
>
> > Do we need a lock_cpu_hotplug() around here?
>
> Well, then we may need that lock for each "for_each_online_cpu" use?
>
I suppose so...
> > Can't this deadlock if 2 CPUs each send work to the other
>
> Then we would need to fix the workqueue flushing function.
I don't think I can see a deadlock here.
* Re: [PATCH] swap migration: Fix lru drain
2005-12-08 0:43 ` Christoph Lameter
2005-12-08 0:58 ` Andrew Morton
@ 2005-12-08 0:59 ` Nick Piggin
1 sibling, 0 replies; 12+ messages in thread
From: Nick Piggin @ 2005-12-08 0:59 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-kernel, lhms-devel
Christoph Lameter wrote:
>On Thu, 8 Dec 2005, Nick Piggin wrote:
>
>
>>Do we need a lock_cpu_hotplug() around here?
>>
>
>Well, then we may need that lock for each "for_each_online_cpu" use?
>
I think it depends on where and how it is used?
e.g. for statistics gathering it doesn't matter so much. In this
case it would seem that you do want an actual online CPU... though
on looking at the workqueue code it seems that some of it would be
racy in a similar way, so perhaps this is handled elsewhere (I
can't see how, though).
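The hazard under discussion -- the online-CPU map changing while it is being walked -- can be shown with a small userspace analogue (a sketch; the mask, the mutex, and all names such as fake_cpu_down() are invented, not kernel API). Holding one lock around both the "hotplug" operation and the whole walk guarantees the iteration sees a consistent snapshot, which is what lock_cpu_hotplug() would provide around for_each_online_cpu():

```c
/* Userspace sketch of guarding an "online CPU" walk against a
 * concurrent hot-unplug.  All names are invented for illustration. */
#include <pthread.h>

static unsigned long online_mask = 0xf;		/* "CPUs" 0-3 online */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Analogue of cpu_down(): takes the same lock as the walkers. */
static void fake_cpu_down(int cpu)
{
	pthread_mutex_lock(&hotplug_lock);
	online_mask &= ~(1UL << cpu);
	pthread_mutex_unlock(&hotplug_lock);
}

/* Walk the online set under the lock, as in:
 *   lock_cpu_hotplug();
 *   for_each_online_cpu(cpu) ...;
 *   unlock_cpu_hotplug();
 * Returns the number of "CPUs" visited. */
static int visit_each_online(void (*fn)(int cpu))
{
	int cpu, visited = 0;

	pthread_mutex_lock(&hotplug_lock);
	for (cpu = 0; cpu < 8; cpu++)
		if (online_mask & (1UL << cpu)) {
			fn(cpu);
			visited++;
		}
	pthread_mutex_unlock(&hotplug_lock);
	return visited;
}

static int seen[8];
static void note(int cpu) { seen[cpu] = 1; }
```

Without the lock in visit_each_online(), a "CPU" could disappear between the mask test and the queueing step, which is the race the lock_cpu_hotplug() question is about.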
>
>
>>Can't this deadlock if 2 CPUs each send work to the other
>>
>
>Then we would need to fix the workqueue flushing function.
>
>
Oh, you're right.
* Re: [PATCH] swap migration: Fix lru drain
2005-12-08 0:57 ` Andrew Morton
@ 2005-12-08 1:14 ` Christoph Lameter
2005-12-08 5:02 ` [Lhms-devel] " KAMEZAWA Hiroyuki
2005-12-08 8:36 ` KAMEZAWA Hiroyuki
0 siblings, 2 replies; 12+ messages in thread
From: Christoph Lameter @ 2005-12-08 1:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, lhms-devel
On Wed, 7 Dec 2005, Andrew Morton wrote:
> I'll change this to return 0 on success, or -ENOMEM. Bit more
> conventional, no?
Ok. That also allows the addition of other error conditions in the future.
Need to revise isolate_lru_page to reflect that.
Index: linux-2.6.15-rc5-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/mm/vmscan.c 2005-12-06 18:03:24.000000000 -0800
+++ linux-2.6.15-rc5-mm1/mm/vmscan.c 2005-12-07 17:11:58.000000000 -0800
@@ -1038,8 +1038,8 @@ redo:
* Maybe this page is still waiting for a cpu to drain it
* from one of the lru lists?
*/
- schedule_on_each_cpu(lru_add_drain_per_cpu, NULL);
- if (PageLRU(page))
+ rc = schedule_on_each_cpu(lru_add_drain_per_cpu, NULL);
+ if (rc == 0 && PageLRU(page))
goto redo;
}
return rc;
Index: linux-2.6.15-rc5-mm1/kernel/workqueue.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/workqueue.c 2005-12-07 16:38:54.000000000 -0800
+++ linux-2.6.15-rc5-mm1/kernel/workqueue.c 2005-12-07 17:13:04.000000000 -0800
@@ -432,7 +432,7 @@ int schedule_on_each_cpu(void (*func) (v
work = kmalloc(NR_CPUS * sizeof(struct work_struct), GFP_KERNEL);
if (!work)
- return 0;
+ return -ENOMEM;
for_each_online_cpu(cpu) {
INIT_WORK(work + cpu, func, info);
__queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu),
@@ -440,7 +440,7 @@ int schedule_on_each_cpu(void (*func) (v
}
flush_workqueue(keventd_wq);
kfree(work);
- return 1;
+ return 0;
}
void flush_scheduled_work(void)
* Re: [Lhms-devel] Re: [PATCH] swap migration: Fix lru drain
2005-12-08 1:14 ` Christoph Lameter
@ 2005-12-08 5:02 ` KAMEZAWA Hiroyuki
2005-12-08 8:36 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2005-12-08 5:02 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel, lhms-devel
Christoph Lameter wrote:
> On Wed, 7 Dec 2005, Andrew Morton wrote:
>
>
>>I'll change this to return 0 on success, or -ENOMEM. Bit more
>>conventional, no?
>
>
> Ok. That also allows the addition of other error conditions in the future.
> Need to revise isolate_lru_page to reflect that.
>
I think this 'schedule_on_each_cpu(lru_add_drain_per_cpu, NULL);' will be used
by the memory hot-remove code and some other callers as well.
How about moving it to swap.c and naming it 'lru_add_drain_all()'?
(but there are no other users now....)
-- Kame
* Re: [Lhms-devel] Re: [PATCH] swap migration: Fix lru drain
2005-12-08 1:14 ` Christoph Lameter
2005-12-08 5:02 ` [Lhms-devel] " KAMEZAWA Hiroyuki
@ 2005-12-08 8:36 ` KAMEZAWA Hiroyuki
2005-12-08 8:43 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2005-12-08 8:36 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel, lhms-devel
Christoph Lameter wrote:
> On Wed, 7 Dec 2005, Andrew Morton wrote:
>
>
>>I'll change this to return 0 on success, or -ENOMEM. Bit more
>>conventional, no?
>
>
> Ok. That also allows the addition of other error conditions in the future.
> Need to revise isolate_lru_page to reflect that.
>
This patch was needed to make it compile.
-- Kame
==
Index: hotremove-2.6.15-rc5-mm1/include/linux/workqueue.h
===================================================================
--- hotremove-2.6.15-rc5-mm1.orig/include/linux/workqueue.h 2005-12-08 17:32:18.000000000 +0900
+++ hotremove-2.6.15-rc5-mm1/include/linux/workqueue.h 2005-12-08 17:32:43.000000000 +0900
@@ -65,7 +65,7 @@
extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
extern int schedule_delayed_work_on(int cpu, struct work_struct *work, unsigned long delay);
-extern void schedule_on_each_cpu(void (*func)(void *info), void *info);
+extern int schedule_on_each_cpu(void (*func)(void *info), void *info);
extern void flush_scheduled_work(void);
extern int current_is_keventd(void);
extern int keventd_up(void);
* Re: [Lhms-devel] Re: [PATCH] swap migration: Fix lru drain
2005-12-08 8:36 ` KAMEZAWA Hiroyuki
@ 2005-12-08 8:43 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2005-12-08 8:43 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Christoph Lameter, Andrew Morton, linux-kernel, lhms-devel
KAMEZAWA Hiroyuki wrote:
> Christoph Lameter wrote:
>
>> On Wed, 7 Dec 2005, Andrew Morton wrote:
>>
>>
>>> I'll change this to return 0 on success, or -ENOMEM. Bit more
>>> conventional, no?
>>
>>
>>
>> Ok. That also allows the addition of other error conditions in the
>> future.
>> Need to revise isolate_lru_page to reflect that.
>>
> This patch was needed to compile.
Sorry, I missed Christoph's previous patch. Please
ignore the patch I sent.
-- Kame