xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* dump runq with debug key 'r' may cause dead loop
@ 2011-03-04  9:40 Wei, Gang
  2011-03-04 10:05 ` Keir Fraser
  0 siblings, 1 reply; 3+ messages in thread
From: Wei, Gang @ 2011-03-04  9:40 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com; +Cc: Wei, Gang

Recently I found dump runq with debug key 'r' may cause dead loop like below:

(XEN) active vcpus:
(XEN) 	  1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
(XEN) 	  2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
(XEN) 	  3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
...
(XEN)  xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
...
(XEN)  xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
...

This means the active vcpu 0.2 became non-active just after it was access in the loop '2:', and that list element became empty state (head->next==next).

Should we always hold a lock before access any schedule related list, even in the debug purpose dump code? If it is not acceptable, then we'd better add a list_empty() check in the dump functions which access schedule related list at least to avoid such a dead loop.

Jimmy

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: dump runq with debug key 'r' may cause dead loop
  2011-03-04  9:40 dump runq with debug key 'r' may cause dead loop Wei, Gang
@ 2011-03-04 10:05 ` Keir Fraser
  2011-03-04 14:51   ` [PATCH]sched_credit: Hold lock while dump scheduler info (RE: dump runq with debug key 'r' may cause dead loop) Wei, Gang
  0 siblings, 1 reply; 3+ messages in thread
From: Keir Fraser @ 2011-03-04 10:05 UTC (permalink / raw)
  To: Wei, Gang, xen-devel@lists.xensource.com

On 04/03/2011 09:40, "Wei, Gang" <gang.wei@intel.com> wrote:

> Recently I found dump runq with debug key 'r' may cause dead loop like below:
> 
> (XEN) active vcpus:
> (XEN)    1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
> (XEN)    2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
> (XEN)    3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
> ...
> (XEN)  xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
> ...
> (XEN)  xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
> ...
> 
> This means the active vcpu 0.2 became non-active just after it was access in
> the loop '2:', and that list element became empty state (head->next==next).
> 
> Should we always hold a lock before access any schedule related list, even in
> the debug purpose dump code? If it is not acceptable, then we'd better add a
> list_empty() check in the dump functions which access schedule related list at
> least to avoid such a dead loop.

The appropriate lock should be taken. Please send a patch.

 -- Keir

> Jimmy
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH]sched_credit: Hold lock while dump scheduler info (RE: dump runq with debug key 'r' may cause dead loop)
  2011-03-04 10:05 ` Keir Fraser
@ 2011-03-04 14:51   ` Wei, Gang
  0 siblings, 0 replies; 3+ messages in thread
From: Wei, Gang @ 2011-03-04 14:51 UTC (permalink / raw)
  To: Keir Fraser, xen-devel@lists.xensource.com; +Cc: George Dunlap, Wei, Gang

[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]

Here is the patch.

sched_credit: Hold lock while dump scheduler info

Dump runq with debug key 'r' may cause dead loop like below:

(XEN) active vcpus:
(XEN) 	  1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
(XEN) 	  2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
(XEN) 	  3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
...
(XEN)  xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
...
(XEN)  xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
...

This means the active vcpu 0.2 became non-active with the active list element empty just after it was accessed in the loop '2:'.

We should always hold a lock before access scheduler related list, even in the debug purpose dump code.

Signed-off-by: Wei Gang <gang.wei@intel.com>

diff -r 6241fa0ad1a9 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Thu Mar 03 18:52:09 2011 +0000
+++ b/xen/common/sched_credit.c	Sun Mar 06 04:31:57 2011 +0800
@@ -1452,6 +1452,10 @@ csched_dump(const struct scheduler *ops)
     struct list_head *iter_sdom, *iter_svc;
     struct csched_private *prv = CSCHED_PRIV(ops);
     int loop;
+    unsigned long flags;
+
+    spin_lock_irqsave(&(prv->lock), flags);
+
 #define idlers_buf keyhandler_scratch
 
     printk("info:\n"
@@ -1500,6 +1504,8 @@ csched_dump(const struct scheduler *ops)
         }
     }
 #undef idlers_buf
+
+    spin_unlock_irqrestore(&(prv->lock), flags);
 }
 
 static int

Keir Fraser wrote on 2011-03-04:
> On 04/03/2011 09:40, "Wei, Gang" <gang.wei@intel.com> wrote:
> 
>> Recently I found dump runq with debug key 'r' may cause dead loop like
>> below:
>> 
>> (XEN) active vcpus:
>> (XEN)    1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
>> (XEN)    2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
>> (XEN)    3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
>> ...
>> (XEN)  xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256] ...
>> (XEN)  xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256] ...
>> 
>> This means the active vcpu 0.2 became non-active just after it was
>> access in the loop '2:', and that list element became empty state
>> (head->next==next).
>> 
>> Should we always hold a lock before access any schedule related
>> list, even in the debug purpose dump code? If it is not acceptable,
>> then we'd better add a
>> list_empty() check in the dump functions which access schedule
>> related list at least to avoid such a dead loop.
> 
> The appropriate lock should be taken. Please send a patch.

Jimmy



[-- Attachment #2: csched_dump_fix.patch --]
[-- Type: application/octet-stream, Size: 1371 bytes --]

sched_credit: Hold lock while dump scheduler info

Dump runq with debug key 'r' may cause dead loop like below:

(XEN) active vcpus:
(XEN) 	  1: [1.0] pri=0 flags=0 cpu=0 credit=263 [w=256]
(XEN) 	  2: [0.2] pri=0 flags=0 cpu=5 credit=284 [w=256]
(XEN) 	  3: [0.2] pri=0 flags=0 cpu=5 credit=282 [w=256]
...
(XEN)  xxxxx: [0.2] pri=0 flags=0 cpu=2 credit=54 [w=256]
...
(XEN)  xxxxx: [0.2] pri=0 flags=0 cpu=3 credit=-48 [w=256]
...

This means the active vcpu 0.2 became non-active with the active list element empty just after it was accessed in the loop '2:'.

We should always hold a lock before access scheduler related list, even in the debug purpose dump code.

Signed-off-by: Wei Gang <gang.wei@intel.com>

diff -r 6241fa0ad1a9 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Thu Mar 03 18:52:09 2011 +0000
+++ b/xen/common/sched_credit.c	Sun Mar 06 04:31:57 2011 +0800
@@ -1452,6 +1452,10 @@ csched_dump(const struct scheduler *ops)
     struct list_head *iter_sdom, *iter_svc;
     struct csched_private *prv = CSCHED_PRIV(ops);
     int loop;
+    unsigned long flags;
+
+    spin_lock_irqsave(&(prv->lock), flags);
+
 #define idlers_buf keyhandler_scratch
 
     printk("info:\n"
@@ -1500,6 +1504,8 @@ csched_dump(const struct scheduler *ops)
         }
     }
 #undef idlers_buf
+
+    spin_unlock_irqrestore(&(prv->lock), flags);
 }
 
 static int

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-03-04 14:51 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-04  9:40 dump runq with debug key 'r' may cause dead loop Wei, Gang
2011-03-04 10:05 ` Keir Fraser
2011-03-04 14:51   ` [PATCH]sched_credit: Hold lock while dump scheduler info (RE: dump runq with debug key 'r' may cause dead loop) Wei, Gang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).