* [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
[not found] <alpine.LFD.1.00.0802151302210.9496@woody.linux-foundation.org>
@ 2008-02-16 6:10 ` Kamalesh Babulal
2008-02-17 19:29 ` Jens Axboe
2008-02-17 20:08 ` Rafael J. Wysocki
0 siblings, 2 replies; 14+ messages in thread
From: Kamalesh Babulal @ 2008-02-16 6:10 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Dhaval Giani, Jens Axboe, Srivatsa Vaddagiri, linuxppc-dev,
Ingo Molnar, Balbir Singh
Hi,
The softlockup is seen from 2.6.25-rc1-git{1,3} and is visible in the 2.6.25-rc2 kernel,
while booting up with the 2.6.25-rc1-git{1,3} and 2.6.25-rc2 kernel(s) on the powerbox.
Loading st.ko module
BUG: soft lockup - CPU#1 stuck for 61s! [insmod:379]
NIP: c0000000001b0620 LR: c0000000001a5dcc CTR: 0000000000000040
REGS: c00000077caab8a0 TRAP: 0901 Not tainted (2.6.25-rc2-autotest)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 84004088 XER: 20000000
TASK = c00000077cb450a0[379] 'insmod' THREAD: c00000077caa8000 CPU: 1
GPR00: c00000077c9d4000 c00000077caabb20 c000000000538a40 000000000000000b
GPR04: ffc0000000000000 c00000077e0c0000 0000000000000036 000000000000000a
GPR08: 0040000000000000 c00000077c9d4250 c000000000000000 0000000000000000
GPR12: c00000077c9d4230 c000000000481d00
NIP [c0000000001b0620] .radix_tree_gang_lookup+0x100/0x1e4
LR [c0000000001a5dcc] .call_for_each_cic+0x50/0x10c
Call Trace:
[c00000077caabb20] [c0000000001a5e2c] .call_for_each_cic+0xb0/0x10c (unreliable)
[c00000077caabc60] [c00000000019dba4] .exit_io_context+0xf0/0x110
[c00000077caabcf0] [c000000000061e38] .do_exit+0x820/0x850
[c00000077caabda0] [c000000000061f34] .do_group_exit+0xcc/0xe8
[c00000077caabe30] [c00000000000872c] syscall_exit+0x0/0x40
Instruction dump:
7d296214 39290018 e8090000 7caa2038 39290008 2fa00000 409e0018 7caa4215
396b0001 418200cc 424000b8 4bffffdc <79691f24> 7d296214 e9690018 2fab0000
INFO: task insmod:387 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
insmod D 000000001000e144 12144 387 1
Call Trace:
[c00000077cb97600] [c0000000008fae80] 0xc0000000008fae80 (unreliable)
[c00000077cb977d0] [c000000000010c7c] .__switch_to+0x11c/0x154
[c00000077cb97860] [c000000000344498] .schedule+0x5d0/0x6b0
[c00000077cb97950] [c0000000003447d8] .schedule_timeout+0x3c/0xe8
[c00000077cb97a20] [c000000000343d34] .wait_for_common+0x150/0x22c
[c00000077cb97ae0] [c00000000008ef00] .__stop_machine_run+0xbc/0xf0
[c00000077cb97bb0] [c00000000008ef70] .stop_machine_run+0x3c/0x80
[c00000077cb97c50] [c0000000000891f0] .sys_init_module+0x14e4/0x1af4
[c00000077cb97e30] [c00000000000872c] syscall_exit+0x0/0x40
-- 0:conmux-control -- time-stamp -- Feb/15/08 16:04:12 --
INFO: task insmod:387 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
insmod D 000000001000e144 12144 387 1
Call Trace:
[c00000077cb97600] [c0000000008fae80] 0xc0000000008fae80 (unreliable)
[c00000077cb977d0] [c000000000010c7c] .__switch_to+0x11c/0x154
[c00000077cb97860] [c000000000344498] .schedule+0x5d0/0x6b0
[c00000077cb97950] [c0000000003447d8] .schedule_timeout+0x3c/0xe8
[c00000077cb97a20] [c000000000343d34] .wait_for_common+0x150/0x22c
[c00000077cb97ae0] [c00000000008ef00] .__stop_machine_run+0xbc/0xf0
[c00000077cb97bb0] [c00000000008ef70] .stop_machine_run+0x3c/0x80
[c00000077cb97c50] [c0000000000891f0] .sys_init_module+0x14e4/0x1af4
[c00000077cb97e30] [c00000000000872c] syscall_exit+0x0/0x40
-- 0:conmux-control -- time-stamp -- Feb/15/08 16:06:21 --
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-16 6:10 ` [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc Kamalesh Babulal
@ 2008-02-17 19:29 ` Jens Axboe
2008-02-19 8:04 ` KAMEZAWA Hiroyuki
2008-02-17 20:08 ` Rafael J. Wysocki
1 sibling, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2008-02-17 19:29 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Dhaval Giani, Linux Kernel Mailing List, Srivatsa Vaddagiri,
linuxppc-dev, Ingo Molnar, Balbir Singh
On Sat, Feb 16 2008, Kamalesh Babulal wrote:
> Hi,
>
> The softlockup is seen from 2.6.25-rc1-git{1,3} and is visible in the 2.6.25-rc2 kernel,
> While booting up with the 2.6.25-rc1-git{1,3} and 2.6.25-rc2 kernel(s) on the powerbox
>
> Loading st.ko module
> BUG: soft lockup - CPU#1 stuck for 61s! [insmod:379]
> NIP: c0000000001b0620 LR: c0000000001a5dcc CTR: 0000000000000040
> REGS: c00000077caab8a0 TRAP: 0901 Not tainted (2.6.25-rc2-autotest)
> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 84004088 XER: 20000000
> TASK = c00000077cb450a0[379] 'insmod' THREAD: c00000077caa8000 CPU: 1
> GPR00: c00000077c9d4000 c00000077caabb20 c000000000538a40 000000000000000b
> GPR04: ffc0000000000000 c00000077e0c0000 0000000000000036 000000000000000a
> GPR08: 0040000000000000 c00000077c9d4250 c000000000000000 0000000000000000
> GPR12: c00000077c9d4230 c000000000481d00
> NIP [c0000000001b0620] .radix_tree_gang_lookup+0x100/0x1e4
> LR [c0000000001a5dcc] .call_for_each_cic+0x50/0x10c
> Call Trace:
> [c00000077caabb20] [c0000000001a5e2c] .call_for_each_cic+0xb0/0x10c (unreliable)
> [c00000077caabc60] [c00000000019dba4] .exit_io_context+0xf0/0x110
> [c00000077caabcf0] [c000000000061e38] .do_exit+0x820/0x850
> [c00000077caabda0] [c000000000061f34] .do_group_exit+0xcc/0xe8
> [c00000077caabe30] [c00000000000872c] syscall_exit+0x0/0x40
> Instruction dump:
> 7d296214 39290018 e8090000 7caa2038 39290008 2fa00000 409e0018 7caa4215
> 396b0001 418200cc 424000b8 4bffffdc <79691f24> 7d296214 e9690018 2fab0000
It's odd stuff. Could you perhaps try and add some printks to
block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
from radix_tree_gang_lookup() and the pointer value of cics[i] in the
for() loop after the lookup?
How many SCSI devices are online?
--
Jens Axboe
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-16 6:10 ` [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc Kamalesh Babulal
2008-02-17 19:29 ` Jens Axboe
@ 2008-02-17 20:08 ` Rafael J. Wysocki
1 sibling, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2008-02-17 20:08 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Jens Axboe, Ingo Molnar, Balbir Singh
On Saturday, 16 of February 2008, Kamalesh Babulal wrote:
> Hi,
Hi,
> The softlockup is seen from 2.6.25-rc1-git{1,3} and is visible in the 2.6.25-rc2 kernel,
> While booting up with the 2.6.25-rc1-git{1,3} and 2.6.25-rc2 kernel(s) on the powerbox
Can you update the Bugzilla entry at:
http://bugzilla.kernel.org/show_bug.cgi?id=9948
with the above information, please?
Rafael
> Loading st.ko module
> BUG: soft lockup - CPU#1 stuck for 61s! [insmod:379]
> NIP: c0000000001b0620 LR: c0000000001a5dcc CTR: 0000000000000040
> REGS: c00000077caab8a0 TRAP: 0901 Not tainted (2.6.25-rc2-autotest)
> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 84004088 XER: 20000000
> TASK = c00000077cb450a0[379] 'insmod' THREAD: c00000077caa8000 CPU: 1
> GPR00: c00000077c9d4000 c00000077caabb20 c000000000538a40 000000000000000b
> GPR04: ffc0000000000000 c00000077e0c0000 0000000000000036 000000000000000a
> GPR08: 0040000000000000 c00000077c9d4250 c000000000000000 0000000000000000
> GPR12: c00000077c9d4230 c000000000481d00
> NIP [c0000000001b0620] .radix_tree_gang_lookup+0x100/0x1e4
> LR [c0000000001a5dcc] .call_for_each_cic+0x50/0x10c
> Call Trace:
> [c00000077caabb20] [c0000000001a5e2c] .call_for_each_cic+0xb0/0x10c (unreliable)
> [c00000077caabc60] [c00000000019dba4] .exit_io_context+0xf0/0x110
> [c00000077caabcf0] [c000000000061e38] .do_exit+0x820/0x850
> [c00000077caabda0] [c000000000061f34] .do_group_exit+0xcc/0xe8
> [c00000077caabe30] [c00000000000872c] syscall_exit+0x0/0x40
> Instruction dump:
> 7d296214 39290018 e8090000 7caa2038 39290008 2fa00000 409e0018 7caa4215
> 396b0001 418200cc 424000b8 4bffffdc <79691f24> 7d296214 e9690018 2fab0000
> INFO: task insmod:387 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> insmod D 000000001000e144 12144 387 1
> Call Trace:
> [c00000077cb97600] [c0000000008fae80] 0xc0000000008fae80 (unreliable)
> [c00000077cb977d0] [c000000000010c7c] .__switch_to+0x11c/0x154
> [c00000077cb97860] [c000000000344498] .schedule+0x5d0/0x6b0
> [c00000077cb97950] [c0000000003447d8] .schedule_timeout+0x3c/0xe8
> [c00000077cb97a20] [c000000000343d34] .wait_for_common+0x150/0x22c
> [c00000077cb97ae0] [c00000000008ef00] .__stop_machine_run+0xbc/0xf0
> [c00000077cb97bb0] [c00000000008ef70] .stop_machine_run+0x3c/0x80
> [c00000077cb97c50] [c0000000000891f0] .sys_init_module+0x14e4/0x1af4
> [c00000077cb97e30] [c00000000000872c] syscall_exit+0x0/0x40
> -- 0:conmux-control -- time-stamp -- Feb/15/08 16:04:12 --
> INFO: task insmod:387 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> insmod D 000000001000e144 12144 387 1
> Call Trace:
> [c00000077cb97600] [c0000000008fae80] 0xc0000000008fae80 (unreliable)
> [c00000077cb977d0] [c000000000010c7c] .__switch_to+0x11c/0x154
> [c00000077cb97860] [c000000000344498] .schedule+0x5d0/0x6b0
> [c00000077cb97950] [c0000000003447d8] .schedule_timeout+0x3c/0xe8
> [c00000077cb97a20] [c000000000343d34] .wait_for_common+0x150/0x22c
> [c00000077cb97ae0] [c00000000008ef00] .__stop_machine_run+0xbc/0xf0
> [c00000077cb97bb0] [c00000000008ef70] .stop_machine_run+0x3c/0x80
> [c00000077cb97c50] [c0000000000891f0] .sys_init_module+0x14e4/0x1af4
> [c00000077cb97e30] [c00000000000872c] syscall_exit+0x0/0x40
> -- 0:conmux-control -- time-stamp -- Feb/15/08 16:06:21 --
--
"Premature optimization is the root of all evil." - Donald Knuth
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-17 19:29 ` Jens Axboe
@ 2008-02-19 8:04 ` KAMEZAWA Hiroyuki
2008-02-19 8:36 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-02-19 8:04 UTC (permalink / raw)
To: Jens Axboe
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Ingo Molnar, Kamalesh Babulal, Balbir Singh
On Sun, 17 Feb 2008 20:29:13 +0100
Jens Axboe <jens.axboe@oracle.com> wrote:
> It's odd stuff. Could you perhaps try and add some printks to
> block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
> from radix_tree_gang_lookup() and the pointer value of cics[i] in the
> for() loop after the lookup?
>
I hit the same issue on an ia64/NUMA box.
It seems cics[]->key is NULL and the index for radix_tree_gang_lookup() was always '1'.
The attached patch works well for me,
but I don't know much about cfq. Please confirm.
Regards,
-Kame
==
cics[]->key can be NULL.
In that case, cics[]->dead_key has key value.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: linux-2.6.25-rc2/block/cfq-iosched.c
===================================================================
--- linux-2.6.25-rc2.orig/block/cfq-iosched.c
+++ linux-2.6.25-rc2/block/cfq-iosched.c
@@ -1171,7 +1171,11 @@ call_for_each_cic(struct io_context *ioc
 			break;
 
 		called += nr;
-		index = 1 + (unsigned long) cics[nr - 1]->key;
+
+		if (!cics[nr - 1]->key)
+			index = 1 + (unsigned long) cics[nr - 1]->dead_key;
+		else
+			index = 1 + (unsigned long) cics[nr - 1]->key;
 
 		for (i = 0; i < nr; i++)
 			func(ioc, cics[i]);
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-19 8:04 ` KAMEZAWA Hiroyuki
@ 2008-02-19 8:36 ` Jens Axboe
2008-02-19 8:47 ` KAMEZAWA Hiroyuki
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Jens Axboe @ 2008-02-19 8:36 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Ingo Molnar, Kamalesh Babulal, Balbir Singh
On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
> On Sun, 17 Feb 2008 20:29:13 +0100
> Jens Axboe <jens.axboe@oracle.com> wrote:
>
> > It's odd stuff. Could you perhaps try and add some printks to
> > block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
> > from radix_tree_gang_lookup() and the pointer value of cics[i] in the
> > for() loop after the lookup?
> >
> I met the same issue on ia64/NUMA box.
> seems cics[]->key is NULL and index for radix_tree_gang_lookup() was
> always '1'.
Why does it keep repeating then? If ->key is NULL, the next lookup index
should be 1UL.
But I think the radix 'scan over entire tree' is a bit fragile. This
patch adds a parallel hlist for ease of properly browsing the members,
does that work for you? It compiles, but I haven't booted it here yet...
> Attached patch works well for me, but I don't know much about cfq.
> please confirm.
It doesn't make a lot of sense, I'm afraid.
 block/blk-ioc.c           |   35 +++++++++++++++--------------------
 block/cfq-iosched.c       |   37 +++++++++++--------------------------
 include/linux/iocontext.h |    2 ++
 3 files changed, 28 insertions(+), 46 deletions(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 80245dc..73c7002 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -17,17 +17,13 @@ static struct kmem_cache *iocontext_cachep;
 
 static void cfq_dtor(struct io_context *ioc)
 {
-	struct cfq_io_context *cic[1];
-	int r;
+	if (!hlist_empty(&ioc->cic_list)) {
+		struct cfq_io_context *cic;
 
-	/*
-	 * We don't have a specific key to lookup with, so use the gang
-	 * lookup to just retrieve the first item stored. The cfq exit
-	 * function will iterate the full tree, so any member will do.
-	 */
-	r = radix_tree_gang_lookup(&ioc->radix_root, (void **) cic, 0, 1);
-	if (r > 0)
-		cic[0]->dtor(ioc);
+		cic = list_entry(ioc->cic_list.first, struct cfq_io_context,
+								cic_list);
+		cic->dtor(ioc);
+	}
 }
 
 /*
@@ -57,18 +53,16 @@ EXPORT_SYMBOL(put_io_context);
 
 static void cfq_exit(struct io_context *ioc)
 {
-	struct cfq_io_context *cic[1];
-	int r;
-
 	rcu_read_lock();
-	/*
-	 * See comment for cfq_dtor()
-	 */
-	r = radix_tree_gang_lookup(&ioc->radix_root, (void **) cic, 0, 1);
-	rcu_read_unlock();
 
-	if (r > 0)
-		cic[0]->exit(ioc);
+	if (!hlist_empty(&ioc->cic_list)) {
+		struct cfq_io_context *cic;
+
+		cic = list_entry(ioc->cic_list.first, struct cfq_io_context,
+								cic_list);
+		cic->exit(ioc);
+	}
+	rcu_read_unlock();
 }
 
 /* Called by the exitting task */
@@ -105,6 +99,7 @@ struct io_context *alloc_io_context(gfp_t gfp_flags, int node)
 		ret->nr_batch_requests = 0; /* because this is 0 */
 		ret->aic = NULL;
 		INIT_RADIX_TREE(&ret->radix_root, GFP_ATOMIC | __GFP_HIGH);
+		INIT_HLIST_HEAD(&ret->cic_list);
 		ret->ioc_data = NULL;
 	}
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index ca198e6..62eda3f 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1145,38 +1145,19 @@ static void cfq_put_queue(struct cfq_queue *cfqq)
 /*
  * Call func for each cic attached to this ioc. Returns number of cic's seen.
  */
-#define CIC_GANG_NR	16
 static unsigned int
 call_for_each_cic(struct io_context *ioc,
 		  void (*func)(struct io_context *, struct cfq_io_context *))
 {
-	struct cfq_io_context *cics[CIC_GANG_NR];
-	unsigned long index = 0;
-	unsigned int called = 0;
-	int nr;
+	struct cfq_io_context *cic;
+	struct hlist_node *n;
+	int called = 0;
 
 	rcu_read_lock();
-
-	do {
-		int i;
-
-		/*
-		 * Perhaps there's a better way - this just gang lookups from
-		 * 0 to the end, restarting after each CIC_GANG_NR from the
-		 * last key + 1.
-		 */
-		nr = radix_tree_gang_lookup(&ioc->radix_root, (void **) cics,
-						index, CIC_GANG_NR);
-		if (!nr)
-			break;
-
-		called += nr;
-		index = 1 + (unsigned long) cics[nr - 1]->key;
-
-		for (i = 0; i < nr; i++)
-			func(ioc, cics[i]);
-	} while (nr == CIC_GANG_NR);
-
+	hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
+		func(ioc, cic);
+		called++;
+	}
 	rcu_read_unlock();
 
 	return called;
@@ -1190,6 +1171,7 @@ static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
 
 	spin_lock_irqsave(&ioc->lock, flags);
 	radix_tree_delete(&ioc->radix_root, cic->dead_key);
+	hlist_del_rcu(&cic->cic_list);
 	spin_unlock_irqrestore(&ioc->lock, flags);
 
 	kmem_cache_free(cfq_ioc_pool, cic);
@@ -1280,6 +1262,7 @@ cfq_alloc_io_context(struct cfq_data *cfqd, gfp_t gfp_mask)
 	if (cic) {
 		cic->last_end_request = jiffies;
 		INIT_LIST_HEAD(&cic->queue_list);
+		INIT_HLIST_NODE(&cic->cic_list);
 		cic->dtor = cfq_free_io_context;
 		cic->exit = cfq_exit_io_context;
 		elv_ioc_count_inc(ioc_count);
@@ -1501,6 +1484,7 @@ cfq_drop_dead_cic(struct cfq_data *cfqd, struct io_context *ioc,
 		rcu_assign_pointer(ioc->ioc_data, NULL);
 
 	radix_tree_delete(&ioc->radix_root, (unsigned long) cfqd);
+	hlist_del_rcu(&cic->cic_list);
 	spin_unlock_irqrestore(&ioc->lock, flags);
 
 	cfq_cic_free(cic);
@@ -1561,6 +1545,7 @@ static int cfq_cic_link(struct cfq_data *cfqd, struct io_context *ioc,
 		spin_lock_irqsave(&ioc->lock, flags);
 		ret = radix_tree_insert(&ioc->radix_root,
 						(unsigned long) cfqd, cic);
+		hlist_add_head_rcu(&cic->cic_list, &ioc->cic_list);
 		spin_unlock_irqrestore(&ioc->lock, flags);
 
 		radix_tree_preload_end();
diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
index 593b222..1b4ccf2 100644
--- a/include/linux/iocontext.h
+++ b/include/linux/iocontext.h
@@ -50,6 +50,7 @@ struct cfq_io_context {
 	sector_t seek_mean;
 
 	struct list_head queue_list;
+	struct hlist_node cic_list;
 
 	void (*dtor)(struct io_context *); /* destructor */
 	void (*exit)(struct io_context *); /* called on task exit */
@@ -77,6 +78,7 @@ struct io_context {
 
 	struct as_io_context *aic;
 	struct radix_tree_root radix_root;
+	struct hlist_head cic_list;
 	void *ioc_data;
 };
 
--
Jens Axboe
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-19 8:36 ` Jens Axboe
@ 2008-02-19 8:47 ` KAMEZAWA Hiroyuki
2008-02-19 8:58 ` Jens Axboe
2008-02-19 9:02 ` KAMEZAWA Hiroyuki
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-02-19 8:47 UTC (permalink / raw)
To: Jens Axboe
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Ingo Molnar, Kamalesh Babulal, Balbir Singh
On Tue, 19 Feb 2008 09:36:34 +0100
Jens Axboe <jens.axboe@oracle.com> wrote:
> On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
> > On Sun, 17 Feb 2008 20:29:13 +0100
> > Jens Axboe <jens.axboe@oracle.com> wrote:
> >
> > > It's odd stuff. Could you perhaps try and add some printks to
> > > block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
> > > from radix_tree_gang_lookup() and the pointer value of cics[i] in the
> > > for() loop after the lookup?
> > >
> > I met the same issue on ia64/NUMA box.
> > seems cics[]->key is NULL and index for radix_tree_gang_lookup() was
> > always '1'.
>
> Why does it keep repeating then? If ->key is NULL, the next lookup index
> should be 1UL.
>
when I inserted printk here
==
		for (i = 0; i < nr; i++)
			func(ioc, cics[i]);
		printk("%d %lx\n", nr, index);
==
index was always "1" and nr was always 32.
So, cics[31]->key was always NULL when index=1 is passed to radix_tree_gang_lookup().
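To illustrate the behaviour described above, here is a minimal user-space sketch, not kernel code. The names fake_cic, fake_gang_lookup(), fill_exited() and walk_rounds() are made up for this example, a sorted array stands in for the radix tree, and MAX_ROUNDS is only a safety cap for the demo. It assumes every entry already had ->key cleared, with the old value saved in ->dead_key. Under that assumption the restart index computed from ->key is always 1, so a full batch keeps coming back; with fewer entries than the batch size the loop ends after one round, which may be why the problem only shows up on some machines.

```c
#include <assert.h>
#include <stddef.h>

#define GANG_NR		16	/* batch size, as CIC_GANG_NR in the thread */
#define MAX_ROUNDS	1000	/* demo-only cap so the sketch cannot spin forever */

/* Stand-in for struct cfq_io_context; only the fields discussed here. */
struct fake_cic {
	unsigned long slot;	/* index the entry is stored under in the "tree" */
	unsigned long key;	/* 0 plays the role of a NULL ->key */
	unsigned long dead_key;	/* old key, kept once ->key is cleared */
};

/* Fill 'tree' with n entries whose ->key was already cleared. */
static void fill_exited(struct fake_cic *tree, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		tree[i].slot = 100 + i;
		tree[i].dead_key = tree[i].slot;
		tree[i].key = 0;
	}
}

/* Stand-in for a gang lookup: up to 'max' entries with slot >= first. */
static size_t fake_gang_lookup(struct fake_cic *tree, size_t n,
			       struct fake_cic **out,
			       unsigned long first, size_t max)
{
	size_t found = 0;

	assert(out != NULL);
	for (size_t i = 0; i < n && found < max; i++)
		if (tree[i].slot >= first)
			out[found++] = &tree[i];
	return found;
}

/*
 * Walk the tree in batches. Returns the number of lookup rounds, or -1
 * if the cap was hit. use_dead_key selects the ->dead_key fallback.
 */
static int walk_rounds(struct fake_cic *tree, size_t n, int use_dead_key)
{
	struct fake_cic *batch[GANG_NR];
	unsigned long index = 0;
	int rounds = 0;
	size_t nr;

	do {
		unsigned long k;

		nr = fake_gang_lookup(tree, n, batch, index, GANG_NR);
		if (!nr)
			break;
		if (++rounds >= MAX_ROUNDS)
			return -1;	/* the same batch keeps coming back */

		k = batch[nr - 1]->key;
		if (!k && use_dead_key)
			k = batch[nr - 1]->dead_key;
		index = 1 + k;
	} while (nr == GANG_NR);

	return rounds;
}
```

With 20 such entries, walk_rounds(tree, 20, 0) hits the cap and returns -1, while walk_rounds(tree, 20, 1) finishes in two rounds. With only 10 entries the loop ends after a single round even without the fallback.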
> But I think the radix 'scan over entire tree' is a bit fragile. This
> patch adds a parallel hlist for ease of properly browsing the members,
> does that work for you? It compiles, but I haven't booted it here yet...
>
will try. please wait a bit.
Thanks,
-Kame
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-19 8:47 ` KAMEZAWA Hiroyuki
@ 2008-02-19 8:58 ` Jens Axboe
2008-02-19 9:07 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2008-02-19 8:58 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Ingo Molnar, Kamalesh Babulal, Balbir Singh
On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
> On Tue, 19 Feb 2008 09:36:34 +0100
> Jens Axboe <jens.axboe@oracle.com> wrote:
>
> > On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
> > > On Sun, 17 Feb 2008 20:29:13 +0100
> > > Jens Axboe <jens.axboe@oracle.com> wrote:
> > >
> > > > It's odd stuff. Could you perhaps try and add some printks to
> > > > block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
> > > > from radix_tree_gang_lookup() and the pointer value of cics[i] in the
> > > > for() loop after the lookup?
> > > >
> > > I met the same issue on ia64/NUMA box.
> > > seems cics[]->key is NULL and index for radix_tree_gang_lookup() was
> > > always '1'.
> >
> > Why does it keep repeating then? If ->key is NULL, the next lookup index
> > should be 1UL.
> >
> when I inserted printk here
> ==
> for (i = 0; i < nr; i++)
> func(ioc, cics[i]);
> printk("%d %lx\n", nr, index);
> ==
> index was always "1" and nr was always 32.
>
> So, cics[31]->key was always NULL when index=1 is passed to
> radix_tree_gang_lookup().
Hang on, it returned 32? It should not return more than 16, since that
is what we have room for and asked for. Using ->dead_key when ->key is
NULL is correct btw, since that is the correct location in the tree once
the process has exited. But that should not happen until AFTER the
func() call, so I still think the list patch is safer.
> > But I think the radix 'scan over entire tree' is a bit fragile. This
> > patch adds a parallel hlist for ease of properly browsing the
> > members, does that work for you? It compiles, but I haven't booted
> > it here yet...
> >
> will try. please wait a bit.
It boots here, so at least it passes normal sanity tests. It should
solve your problem as well, hopefully.
--
Jens Axboe
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-19 9:02 ` KAMEZAWA Hiroyuki
@ 2008-02-19 9:01 ` Jens Axboe
0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2008-02-19 9:01 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Ingo Molnar, Kamalesh Babulal, Balbir Singh
On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
> On Tue, 19 Feb 2008 09:36:34 +0100
> Jens Axboe <jens.axboe@oracle.com> wrote:
>
> > On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
> > > On Sun, 17 Feb 2008 20:29:13 +0100
> > > Jens Axboe <jens.axboe@oracle.com> wrote:
> > >
> > > > It's odd stuff. Could you perhaps try and add some printks to
> > > > block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
> > > > from radix_tree_gang_lookup() and the pointer value of cics[i] in the
> > > > for() loop after the lookup?
> > > >
> > > I met the same issue on ia64/NUMA box.
> > > seems cics[]->key is NULL and index for radix_tree_gang_lookup() was
> > > always '1'.
> >
> > Why does it keep repeating then? If ->key is NULL, the next lookup index
> > should be 1UL.
> >
> > But I think the radix 'scan over entire tree' is a bit fragile. This
> > patch adds a parallel hlist for ease of properly browsing the members,
> > does that work for you? It compiles, but I haven't booted it here yet...
> >
> Works well for me and my box booted !
Super, I'll get it upstream. Thanks for testing and debugging!
--
Jens Axboe
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-19 8:36 ` Jens Axboe
2008-02-19 8:47 ` KAMEZAWA Hiroyuki
@ 2008-02-19 9:02 ` KAMEZAWA Hiroyuki
2008-02-19 9:01 ` Jens Axboe
2008-02-19 13:19 ` Kamalesh Babulal
2008-02-22 7:24 ` Andrew Morton
3 siblings, 1 reply; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-02-19 9:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Ingo Molnar, Kamalesh Babulal, Balbir Singh
On Tue, 19 Feb 2008 09:36:34 +0100
Jens Axboe <jens.axboe@oracle.com> wrote:
> On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
> > On Sun, 17 Feb 2008 20:29:13 +0100
> > Jens Axboe <jens.axboe@oracle.com> wrote:
> >
> > > It's odd stuff. Could you perhaps try and add some printks to
> > > block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
> > > from radix_tree_gang_lookup() and the pointer value of cics[i] in the
> > > for() loop after the lookup?
> > >
> > I met the same issue on ia64/NUMA box.
> > seems cics[]->key is NULL and index for radix_tree_gang_lookup() was
> > always '1'.
>
> Why does it keep repeating then? If ->key is NULL, the next lookup index
> should be 1UL.
>
> But I think the radix 'scan over entire tree' is a bit fragile. This
> patch adds a parallel hlist for ease of properly browsing the members,
> does that work for you? It compiles, but I haven't booted it here yet...
>
Works well for me and my box booted !
Thanks,
-Kame
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-19 8:58 ` Jens Axboe
@ 2008-02-19 9:07 ` KAMEZAWA Hiroyuki
2008-02-19 9:09 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-02-19 9:07 UTC (permalink / raw)
To: Jens Axboe
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Ingo Molnar, Kamalesh Babulal, Balbir Singh
On Tue, 19 Feb 2008 09:58:38 +0100
Jens Axboe <jens.axboe@oracle.com> wrote:
> > when I inserted printk here
> > ==
> > for (i = 0; i < nr; i++)
> > func(ioc, cics[i]);
> > printk("%d %lx\n", nr, index);
> > ==
> > index was always "1" and nr was always 32.
> >
> > So, cics[31]->key was always NULL when index=1 is passed to
> > radix_tree_gang_lookup().
>
> Hang on, it returned 32? It should not return more than 16, since that
> is what we have room for and asked for.
sorry. Of course, it was 16 ;(
your patch works well. thank you.
-Kame
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-19 9:07 ` KAMEZAWA Hiroyuki
@ 2008-02-19 9:09 ` Jens Axboe
0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2008-02-19 9:09 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Ingo Molnar, Kamalesh Babulal, Balbir Singh
On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
> On Tue, 19 Feb 2008 09:58:38 +0100
> Jens Axboe <jens.axboe@oracle.com> wrote:
> > > when I inserted printk here
> > > ==
> > > for (i = 0; i < nr; i++)
> > > func(ioc, cics[i]);
> > > printk("%d %lx\n", nr, index);
> > > ==
> > > index was always "1" and nr was always 32.
> > >
> > > So, cics[31]->key was always NULL when index=1 is passed to
> > > radix_tree_gang_lookup().
> >
> > Hang on, it returned 32? It should not return more than 16, since that
> > is what we have room for and asked for.
> sorry. Of course, it was 16 ;(
I expected so, otherwise we would have had far more serious problems :-)
> your patch works well. thank you.
It's committed now and posted in the relevant bugzilla as well (#9948).
--
Jens Axboe
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-19 8:36 ` Jens Axboe
2008-02-19 8:47 ` KAMEZAWA Hiroyuki
2008-02-19 9:02 ` KAMEZAWA Hiroyuki
@ 2008-02-19 13:19 ` Kamalesh Babulal
2008-02-22 7:24 ` Andrew Morton
3 siblings, 0 replies; 14+ messages in thread
From: Kamalesh Babulal @ 2008-02-19 13:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Dhaval Giani, Linux Kernel Mailing List, Srivatsa Vaddagiri,
linuxppc-dev, Ingo Molnar, KAMEZAWA Hiroyuki, Balbir Singh
Jens Axboe wrote:
> On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
>> On Sun, 17 Feb 2008 20:29:13 +0100
>> Jens Axboe <jens.axboe@oracle.com> wrote:
>>
>>> It's odd stuff. Could you perhaps try and add some printks to
>>> block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
>>> from radix_tree_gang_lookup() and the pointer value of cics[i] in the
>>> for() loop after the lookup?
>>>
>> I met the same issue on ia64/NUMA box.
>> seems cics[]->key is NULL and index for radix_tree_gang_lookup() was
>> always '1'.
>
> Why does it keep repeating then? If ->key is NULL, the next lookup index
> should be 1UL.
>
> But I think the radix 'scan over entire tree' is a bit fragile. This
> patch adds a parallel hlist for ease of properly browsing the members,
> does that work for you? It compiles, but I haven't booted it here yet...
>
>> Attached patch works well for me, but I don't know much about cfq.
>> please confirm.
>
> It doesn't make a lot of sense, I'm afraid.
>
> block/blk-ioc.c | 35 +++++++++++++++--------------------
> block/cfq-iosched.c | 37 +++++++++++--------------------------
> include/linux/iocontext.h | 2 ++
> 3 files changed, 28 insertions(+), 46 deletions(-)
>
> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index 80245dc..73c7002 100644
> --- a/block/blk-ioc.c
<snip>
Hi Jens,
Thanks for the patch. The patch works fine, machine boots up without the kernel panic.
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-19 8:36 ` Jens Axboe
` (2 preceding siblings ...)
2008-02-19 13:19 ` Kamalesh Babulal
@ 2008-02-22 7:24 ` Andrew Morton
2008-02-22 7:40 ` Jens Axboe
3 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2008-02-22 7:24 UTC (permalink / raw)
To: Jens Axboe
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Ingo Molnar, Kamalesh Babulal, KAMEZAWA Hiroyuki,
Balbir Singh
On Tue, 19 Feb 2008 09:36:34 +0100 Jens Axboe <jens.axboe@oracle.com> wrote:
> But I think the radix 'scan over entire tree' is a bit fragile.
eek, it had better not be. Was this an error in the caller? Hope so.
> This
> patch adds a parallel hlist for ease of properly browsing the members,
Even though io_contexts are fairly uncommon, adding more stuff to a data
structure was a pretty sad alternative to fixing a bug in
radix_tree_gang_lookup(), or to fixing a bug in a caller of it.
IOW: what exactly went wrong here??
* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
2008-02-22 7:24 ` Andrew Morton
@ 2008-02-22 7:40 ` Jens Axboe
0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2008-02-22 7:40 UTC (permalink / raw)
To: Andrew Morton
Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
linuxppc-dev, Ingo Molnar, Kamalesh Babulal, KAMEZAWA Hiroyuki,
Balbir Singh
On Thu, Feb 21 2008, Andrew Morton wrote:
> On Tue, 19 Feb 2008 09:36:34 +0100 Jens Axboe <jens.axboe@oracle.com> wrote:
>
> > But I think the radix 'scan over entire tree' is a bit fragile.
>
> eek, it had better not be. Was this an error in the caller? Hope so.
The cfq use of it, not the radix tree code! It juggled the keys and
wants to make sure that we see all users, modulo raced added ones (ok if
we see them, doesn't matter if we don't).
> > This
> > patch adds a parallel hlist for ease of properly browsing the members,
>
> Even though io_contexts are fairly uncommon, adding more stuff to a data
> structure was a pretty sad alternative to fixing a bug in
> radix_tree_gang_lookup(), or to fixing a bug in a caller of it.
>
> IOW: what exactly went wrong here??
I could not convince myself that the current code would always do the
right thing. We should not have been seeing ->key == NULL entries in
there, it implied a double exit of that process. So I decided to fix it
by making the code a lot more readable (the patch in question deleted a
lot more than it added), at the cost of that hlist head + node.
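A rough user-space sketch of that idea, under stated assumptions: the names fake_cic, fake_ioc, fake_link(), fake_exit_one() and fake_call_for_each() are hypothetical, a plain singly linked list stands in for the kernel hlist, and there is no RCU or locking. The point it tries to show is that the walk follows list links only, so it visits every entry exactly once whatever ->key holds, even if the exit step runs twice.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct cfq_io_context; only the fields discussed here. */
struct fake_cic {
	unsigned long key;	/* 0 plays the role of a NULL ->key */
	unsigned long dead_key;	/* old key, kept once ->key is cleared */
	struct fake_cic *next;	/* stand-in for the hlist_node cic_list */
};

/* Stand-in for struct io_context. */
struct fake_ioc {
	struct fake_cic *first;	/* stand-in for the hlist_head cic_list */
};

/* Add at the head of the list, in the spirit of hlist_add_head_rcu(). */
static void fake_link(struct fake_ioc *ioc, struct fake_cic *cic)
{
	assert(ioc && cic);
	cic->next = ioc->first;
	ioc->first = cic;
}

/* Simplified exit step: move ->key to ->dead_key; harmless if run twice. */
static void fake_exit_one(struct fake_cic *cic)
{
	if (cic->key) {
		cic->dead_key = cic->key;
		cic->key = 0;
	}
}

/* Call func for each entry; returns the number of entries seen. */
static unsigned int fake_call_for_each(struct fake_ioc *ioc,
				       void (*func)(struct fake_cic *))
{
	unsigned int called = 0;
	struct fake_cic *cic;

	for (cic = ioc->first; cic; cic = cic->next) {
		func(cic);
		called++;
	}
	return called;
}
```

Linking 20 entries and calling fake_call_for_each(&ioc, fake_exit_one) twice sees 20 entries both times; the second pass finds every ->key already cleared and still terminates, because no restart index is derived from the key.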
--
Jens Axboe