* dump runq with debug key 'r' may cause dead loop
@ 2011-03-04 9:40 Wei, Gang
2011-03-04 10:05 ` Keir Fraser
0 siblings, 1 reply; 3+ messages in thread
From: Wei, Gang @ 2011-03-04 9:40 UTC (permalink / raw)
To: xen-devel@lists.xensource.com; +Cc: Wei, Gang
Recently I found dump runq with debug key 'r' may cause dead loop like below:
(XEN) active vcpus:
(XEN) 1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
(XEN) 2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
(XEN) 3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
...
(XEN) xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
...
(XEN) xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
...
This means the active vcpu 0.2 became non-active just after it was access in the loop '2:', and that list element became empty state (head->next==next).
Should we always hold a lock before access any schedule related list, even in the debug purpose dump code? If it is not acceptable, then we'd better add a list_empty() check in the dump functions which access schedule related list at least to avoid such a dead loop.
Jimmy
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: dump runq with debug key 'r' may cause dead loop
2011-03-04 9:40 dump runq with debug key 'r' may cause dead loop Wei, Gang
@ 2011-03-04 10:05 ` Keir Fraser
2011-03-04 14:51 ` [PATCH]sched_credit: Hold lock while dump scheduler info (RE: dump runq with debug key 'r' may cause dead loop) Wei, Gang
0 siblings, 1 reply; 3+ messages in thread
From: Keir Fraser @ 2011-03-04 10:05 UTC (permalink / raw)
To: Wei, Gang, xen-devel@lists.xensource.com
On 04/03/2011 09:40, "Wei, Gang" <gang.wei@intel.com> wrote:
> Recently I found dump runq with debug key 'r' may cause dead loop like below:
>
> (XEN) active vcpus:
> (XEN) 1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
> (XEN) 2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
> (XEN) 3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
> ...
> (XEN) xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
> ...
> (XEN) xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
> ...
>
> This means the active vcpu 0.2 became non-active just after it was access in
> the loop '2:', and that list element became empty state (head->next==next).
>
> Should we always hold a lock before access any schedule related list, even in
> the debug purpose dump code? If it is not acceptable, then we'd better add a
> list_empty() check in the dump functions which access schedule related list at
> least to avoid such a dead loop.
The appropriate lock should be taken. Please send a patch.
-- Keir
> Jimmy
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH]sched_credit: Hold lock while dump scheduler info (RE: dump runq with debug key 'r' may cause dead loop)
2011-03-04 10:05 ` Keir Fraser
@ 2011-03-04 14:51 ` Wei, Gang
0 siblings, 0 replies; 3+ messages in thread
From: Wei, Gang @ 2011-03-04 14:51 UTC (permalink / raw)
To: Keir Fraser, xen-devel@lists.xensource.com; +Cc: George Dunlap, Wei, Gang
[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]
Here is the patch.
sched_credit: Hold lock while dump scheduler info
Dump runq with debug key 'r' may cause dead loop like below:
(XEN) active vcpus:
(XEN) 1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
(XEN) 2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
(XEN) 3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
...
(XEN) xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
...
(XEN) xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
...
This means the active vcpu 0.2 became non-active with the active list element empty just after it was accessed in the loop '2:'.
We should always hold a lock before access scheduler related list, even in the debug purpose dump code.
Signed-off-by: Wei Gang <gang.wei@intel.com>
diff -r 6241fa0ad1a9 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Mar 03 18:52:09 2011 +0000
+++ b/xen/common/sched_credit.c Sun Mar 06 04:31:57 2011 +0800
@@ -1452,6 +1452,10 @@ csched_dump(const struct scheduler *ops)
struct list_head *iter_sdom, *iter_svc;
struct csched_private *prv = CSCHED_PRIV(ops);
int loop;
+ unsigned long flags;
+
+ spin_lock_irqsave(&(prv->lock), flags);
+
#define idlers_buf keyhandler_scratch
printk("info:\n"
@@ -1500,6 +1504,8 @@ csched_dump(const struct scheduler *ops)
}
}
#undef idlers_buf
+
+ spin_unlock_irqrestore(&(prv->lock), flags);
}
static int
Keir Fraser wrote on 2011-03-04:
> On 04/03/2011 09:40, "Wei, Gang" <gang.wei@intel.com> wrote:
>
>> Recently I found dump runq with debug key 'r' may cause dead loop like
>> below:
>>
>> (XEN) active vcpus:
>> (XEN) 1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
>> (XEN) 2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
>> (XEN) 3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
>> ...
>> (XEN) xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256] ...
>> (XEN) xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256] ...
>>
>> This means the active vcpu 0.2 became non-active just after it was
>> access in the loop '2:', and that list element became empty state
>> (head->next==next).
>>
>> Should we always hold a lock before access any schedule related
>> list, even in the debug purpose dump code? If it is not acceptable,
>> then we'd better add a
>> list_empty() check in the dump functions which access schedule
>> related list at least to avoid such a dead loop.
>
> The appropriate lock should be taken. Please send a patch.
Jimmy
[-- Attachment #2: csched_dump_fix.patch --]
[-- Type: application/octet-stream, Size: 1371 bytes --]
sched_credit: Hold lock while dump scheduler info
Dump runq with debug key 'r' may cause dead loop like below:
(XEN) active vcpus:
(XEN) 1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
(XEN) 2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
(XEN) 3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
...
(XEN) xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
...
(XEN) xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
...
This means the active vcpu 0.2 became non-active with the active list element empty just after it was accessed in the loop '2:'.
We should always hold a lock before access scheduler related list, even in the debug purpose dump code.
Signed-off-by: Wei Gang <gang.wei@intel.com>
diff -r 6241fa0ad1a9 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Mar 03 18:52:09 2011 +0000
+++ b/xen/common/sched_credit.c Sun Mar 06 04:31:57 2011 +0800
@@ -1452,6 +1452,10 @@ csched_dump(const struct scheduler *ops)
struct list_head *iter_sdom, *iter_svc;
struct csched_private *prv = CSCHED_PRIV(ops);
int loop;
+ unsigned long flags;
+
+ spin_lock_irqsave(&(prv->lock), flags);
+
#define idlers_buf keyhandler_scratch
printk("info:\n"
@@ -1500,6 +1504,8 @@ csched_dump(const struct scheduler *ops)
}
}
#undef idlers_buf
+
+ spin_unlock_irqrestore(&(prv->lock), flags);
}
static int
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2011-03-04 14:51 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-04 9:40 dump runq with debug key 'r' may cause dead loop Wei, Gang
2011-03-04 10:05 ` Keir Fraser
2011-03-04 14:51 ` [PATCH]sched_credit: Hold lock while dump scheduler info (RE: dump runq with debug key 'r' may cause dead loop) Wei, Gang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.