* Two problems with DomU reboot (cmdline, duplicate domains)
@ 2007-01-17 22:36 Florian Kirstein
  2007-01-19  9:41 ` [PATCH] fix: growing kernel commandline Florian Kirstein
  2007-01-21  8:59 ` Bug: Problematic DomU Duplication on reboot Florian Kirstein
  0 siblings, 2 replies; 5+ messages in thread
From: Florian Kirstein @ 2007-01-17 22:36 UTC (permalink / raw)
  To: xen-devel

Hi,

just upgraded my testsystem to 3.0.4 (using the provided source rpm,
xen-3.0.4.1-1.src.rpm, rebuilt it to run on non-PXE but no changes besides
that, and it doesn't look like the 3.0.4-testing HG has major differences
so far?) and I have two problems with rebooting domains (xm reboot, or
reboot from inside the domain).

1) the kernel commandline keeps growing. On the first boot it's OK, on
the second it appears three times as one long cmdline:
ip=172.16.37.9:1.2.3.4:172.16.37.19:255.255.255.0::eth0:off root=/dev/sda1 ro ip=172.16.37.9:1.2.3.4:172.16.37.19:255.255.255.0::eth0:off root=/dev/sda1 ro ip=172.16.37.9:1.2.3.4:172.16.37.19:255.255.255.0::eth0:off root=/dev/sda1 ro
and on the next reboot it's longer than the kernel supports, which usually
breaks networking as the kernel seems to use the last "ip=" parameter which
most probably is incomplete then. It also happens when I don't specify
networking there, using a config almost identical to xmexample1, after
the first reboot:
# cat /proc/cmdline 
root=/dev/sda1 ro root=/dev/sda1 ro root=/dev/sda1 ro 3
Interesting that the "3" from the "extra" parameter is not duplicated...

2) it has happened twice so far (I can't reproduce it yet) that after
a reboot I had the same DomU twice, using the same name and the same
block devices (LVM-based phy: devices). This of course resulted in
major data corruption and really doesn't make me feel well. I read
there were changes in the code which should prevent this, but
for me it seems to have gotten worse; I never had this before...
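
For problem 1), a quick way to spot the duplication in a guest's
/proc/cmdline is to count repeated key=value parameters. The following is
a minimal Python sketch; the helper name repeated_params is made up for
illustration and is not part of Xen.

```python
from collections import Counter

def repeated_params(cmdline):
    """Return {key: count} for key=value parameters seen more than once."""
    # Only tokens of the form key=value are considered; bare words such
    # as "ro" or the runlevel "3" are ignored.
    keys = [tok.split("=", 1)[0] for tok in cmdline.split() if "=" in tok]
    return {key: n for key, n in Counter(keys).items() if n > 1}

# The commandline reported above after the first reboot.
cmdline = "root=/dev/sda1 ro root=/dev/sda1 ro root=/dev/sda1 ro 3"
print(repeated_params(cmdline))
```

An empty result means no key=value parameter is repeated.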

Thanks for any ideas on this. I will try the current 3.0.4-testing
hg later, but as I currently can't reproduce "2)" I hope someone
here knows whether this is fixed and what could have triggered it?

(:ul8er, r@y
P.S: Oh, I could reproduce it a third time, think I just issued another
reboot on the domain. Probably while having a libvirt-based tool list
domains in the background, doing some tests there now... However, this
is how it looks:
DomU9                                     72    50     1     -b----      8.8
DomU9                                     73    50     1     -b----      8.1
and both are actually running, I can attach to both consoles using
/usr/lib/xen/bin/xenconsole 72 
/usr/lib/xen/bin/xenconsole 73
and they are indeed different instances running on the same block devices.
Ouch :)

P.P.S: Here's an excerpt from the xend.log (from the first time it
happened) which I suppose shows the part where the duplicate domain
was created; unfortunately the log level was set to INFO:

[2007-01-17 16:28:04 xend.XendDomainInfo 12761] INFO (XendDomainInfo:969) Domain has shutdown: name=DomU9 id=18 reason=reboot.
[2007-01-17 16:28:04 xend.XendDomainInfo 12761] INFO (XendDomainInfo:969) Domain has shutdown: name=DomU9 id=18 reason=reboot.
[2007-01-17 16:28:04 xend.XendDomainInfo 12761] INFO (XendDomainInfo:969) Domain has shutdown: name=DomU9 id=18 reason=reboot.
[2007-01-17 16:28:05 xend.XendDomainInfo 12761] ERROR (XendDomainInfo:1063) Xend failed during restart of domain None.  Refusing to restart to avoid loops.
[2007-01-17 16:28:05 xend.XendConfig 12761] WARNING (XendConfig:606) Unconverted key: cpus
[2007-01-17 16:28:05 xend 12761] INFO (image:125) buildDomain os=linux dom=19 vcpus=1
[2007-01-17 16:28:05 xend.XendDomainInfo 12761] INFO (XendDomainInfo:1194) createDevice: vbd : {'uuid': '5f42ac79-37a5-3511-5586-d215a356aa2d', 'driver': 'paravirtualised', 'dev': 'sda1:disk', 'uname': 'phy:/dev/vgrc/h_root_110', 'mode': 'w', 'backend': 0}
[2007-01-17 16:28:05 xend.XendDomainInfo 12761] INFO (XendDomainInfo:1194) createDevice: vbd : {'uuid': 'a6f7ccdb-f331-211f-1721-22a8d7aa2097', 'driver': 'paravirtualised', 'dev': 'sda2:disk', 'uname': 'phy:/dev/vgrc/swap110', 'mode': 'w', 'backend': 0}
[2007-01-17 16:28:05 xend.XendDomainInfo 12761] INFO (XendDomainInfo:1194) createDevice: vif : {'ip': '172.16.37.9', 'mac': '00:16:3e:00:32:f1', 'script': 'vif-route', 'uuid': '28a916b8-0e7c-cd43-81b6-fd35bf48ecae', 'backend': 0}
[2007-01-17 16:28:06 xend.XendConfig 12761] WARNING (XendConfig:606) Unconverted key: cpus
[2007-01-17 16:28:06 xend 12761] INFO (image:125) buildDomain os=linux dom=20 vcpus=1
[2007-01-17 16:28:06 xend.XendDomainInfo 12761] INFO (XendDomainInfo:1194) createDevice: vbd : {'uname': 'phy:/dev/vgrc/h_root_110', 'driver': 'paravirtualised', 'mode': 'w', 'dev': 'sda1', 'uuid': '5f42ac79-37a5-3511-5586-d215a356aa2d'}
[2007-01-17 16:28:06 xend.XendDomainInfo 12761] INFO (XendDomainInfo:1194) createDevice: vbd : {'uname': 'phy:/dev/vgrc/swap110', 'driver': 'paravirtualised', 'mode': 'w', 'dev': 'sda2', 'uuid': 'a6f7ccdb-f331-211f-1721-22a8d7aa2097'}
[2007-01-17 16:28:06 xend.XendDomainInfo 12761] INFO (XendDomainInfo:1194) createDevice: vif : {'ip': '172.16.37.9', 'mac': '00:16:3e:00:32:f1', 'script': 'vif-route', 'uuid': '28a916b8-0e7c-cd43-81b6-fd35bf48ecae', 'backend': 0}
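
The excerpt can be condensed with a few lines of Python to show that a
single shutdown was followed by two domain builds. This is only a sketch:
the regular expressions are my reading of the message formats in the log
above, not an official xend log grammar, and they assume every log record
has been joined onto a single line.

```python
import re

# Patterns derived from the messages in the excerpt (an assumption, see above).
SHUTDOWN = re.compile(r'Domain has shutdown: name=(\S+) id=(\d+) reason=(\w+)')
BUILD = re.compile(r'buildDomain os=\S+ dom=(\d+)')

def summarize(log_text):
    """Return (unique shutdown events, list of domain IDs that were built)."""
    shutdowns = sorted(set(SHUTDOWN.findall(log_text)))
    builds = [int(dom) for dom in BUILD.findall(log_text)]
    return shutdowns, builds

log_text = """\
[2007-01-17 16:28:04 xend.XendDomainInfo 12761] INFO (XendDomainInfo:969) Domain has shutdown: name=DomU9 id=18 reason=reboot.
[2007-01-17 16:28:05 xend 12761] INFO (image:125) buildDomain os=linux dom=19 vcpus=1
[2007-01-17 16:28:06 xend 12761] INFO (image:125) buildDomain os=linux dom=20 vcpus=1
"""

shutdowns, builds = summarize(log_text)
print(shutdowns)  # one reboot of DomU9 (id 18)
print(builds)     # two new domains built for it
```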


* [PATCH] fix: growing kernel commandline
  2007-01-17 22:36 Two problems with DomU reboot (cmdline, duplicate domains) Florian Kirstein
@ 2007-01-19  9:41 ` Florian Kirstein
  2007-01-19 10:29   ` Ian Campbell
  2007-01-21  8:59 ` Bug: Problematic DomU Duplication on reboot Florian Kirstein
  1 sibling, 1 reply; 5+ messages in thread
From: Florian Kirstein @ 2007-01-19  9:41 UTC (permalink / raw)
  To: xen-devel

Hi,

replying to myself... Obviously my problem No.1 (growing commandline)
is already known, but I didn't find much besides a small comment from
Ewan Mellor yesterday here on the list about it, after finding the
relevant source parts responsible.

So for the interested few, it's caused by John's Patch...
# User john.levon@sun.com                                                       
# Date 1167936545 28800                                                         
# Node ID acda3f65d9797126035cc8cae65d8804415c6036                              
...and is already fixed in xen-3.0-unstable. But unfortunately the released
3.0.4.1 is broken, so I transferred the patch from 3.0-unstable and modified
it a bit, so that the commandline really stays in the same order as in
previous xen versions (in testing, ip= and root= are swapped). That's for
whoever needs it; my scripts don't :).

Oh, and one more thing: what's the idea behind the ip=[^ ]+ regexp
in the test? Unlike the root= parameter check, this only matches
non-empty ip parameters, so if there's an empty ip= parameter, we add
our ip parameter anyway. That won't change a thing in the "new order",
as I think the kernel uses the last ip= parameter it finds, which is
then still the empty one? However, I left this unchanged...
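
To make the difference between the two checks concrete, here is a small
Python sketch using the same two regular expressions as the patch. The
wrapper functions has_ip and has_root are named by me for illustration
only.

```python
import re

def has_ip(kernel_args):
    # Same pattern as in the patch: needs at least one non-space
    # character after "ip=".
    return re.search(r'ip=[^ ]+', kernel_args) is not None

def has_root(kernel_args):
    # Same pattern as in the patch: an empty "root=" also counts.
    return re.search(r'root=', kernel_args) is not None

print(has_ip('ip=172.16.37.9 ro'))  # non-empty ip= is detected
print(has_ip('ip= ro'))             # empty ip= is not detected
print(has_root('root= ro'))         # empty root= is detected
```

So with an empty ip= already present, the ip check does not fire and the
configured ip parameter is still added, while an empty root= suppresses
the configured root parameter.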

So: I'm not sure if users are supposed to post patches for the
xen-3.0.4-testing.hg repository as well, but as most production environments
are probably using that instead of -unstable, here's the patch for
those who are annoyed by this problem and want to patch their local
sources:

# HG changeset patch
# User ray@build.ray.net
# Node ID c25e4e8a9668fc25c0424c2936d2e4f94345ab89
# Parent  f98a6a9df1b4ea6022d05cdb2d189cb7645408d2
Fix kernel commandline generation to prevent duplication of ip= and 
root= parameters on reboot, while preserving the parameter ordering
known from previous versions

Signed-off-by: Florian Kirstein <ray@ray.net>

diff -r f98a6a9df1b4 -r c25e4e8a9668 tools/python/xen/xend/XendConfig.py
--- a/tools/python/xen/xend/XendConfig.py       Mon Jan  8 12:54:41 2007 -0800
+++ b/tools/python/xen/xend/XendConfig.py       Fri Jan 19 10:12:20 2007 +0100
@@ -1104,19 +1104,15 @@ class XendConfig(dict):
 
         self['PV_kernel'] = sxp.child_value(image_sxp, 'kernel','')
         self['PV_ramdisk'] = sxp.child_value(image_sxp, 'ramdisk','')
-        kernel_args = ""
+        kernel_args = sxp.child_value(image_sxp, 'args', '')
         
         # attempt to extract extra arguments from SXP config
+        arg_root = sxp.child_value(image_sxp, 'root')
+        if arg_root and not re.search(r'root=', kernel_args):
+            kernel_args = 'root=%s ' % arg_root + kernel_args
         arg_ip = sxp.child_value(image_sxp, 'ip')
         if arg_ip and not re.search(r'ip=[^ ]+', kernel_args):
-            kernel_args += 'ip=%s ' % arg_ip
-        arg_root = sxp.child_value(image_sxp, 'root')
-        if arg_root and not re.search(r'root=', kernel_args):
-            kernel_args += 'root=%s ' % arg_root
-
-        # user-specified args must come last: previous releases did this and
-        # some domU kernels rely upon the ordering.
-        kernel_args += sxp.child_value(image_sxp, 'args', '')
+            kernel_args = 'ip=%s ' % arg_ip + kernel_args
 
         self['PV_args'] = kernel_args
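
The patched logic can be tried outside xend. The sketch below mirrors the
"+" lines of the diff but swaps the SXP lookups for a plain dict, and it
assumes that on reboot the previously generated line comes back as the
'args' value. Both are simplifications of mine and not how
XendConfig.py is structured; the function name build_kernel_args is
hypothetical.

```python
import re

def build_kernel_args(image):
    """image: plain dict standing in for the image SXP (an assumption)."""
    kernel_args = image.get('args', '')

    # Prepend root= and ip= only if they are not already present.
    arg_root = image.get('root')
    if arg_root and not re.search(r'root=', kernel_args):
        kernel_args = 'root=%s ' % arg_root + kernel_args
    arg_ip = image.get('ip')
    if arg_ip and not re.search(r'ip=[^ ]+', kernel_args):
        kernel_args = 'ip=%s ' % arg_ip + kernel_args
    return kernel_args

image = {
    'ip': '172.16.37.9:1.2.3.4:172.16.37.19:255.255.255.0::eth0:off',
    'root': '/dev/sda1 ro',
    'args': '3',
}
first = build_kernel_args(image)
# Simulated reboot: feed the generated line back in as 'args'.
second = build_kernel_args(dict(image, args=first))
print(first)
print(first == second)
```

Under these assumptions the second pass leaves the line unchanged, which
is the property the patch is after.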

Now I'm still stuck with my other problem (duplicated DomUs shredding the
filesystem); I will run tests to reproduce that later today...

(:ul8er, r@y


* Re: [PATCH] fix: growing kernel commandline
  2007-01-19  9:41 ` [PATCH] fix: growing kernel commandline Florian Kirstein
@ 2007-01-19 10:29   ` Ian Campbell
  2007-01-19 11:59     ` Florian Kirstein
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Campbell @ 2007-01-19 10:29 UTC (permalink / raw)
  To: Florian Kirstein; +Cc: xen-devel

On Fri, 2007-01-19 at 10:41 +0100, Florian Kirstein wrote:
> Hi,
> 
> replying to myself... Obviously my problem No.1 (growing commandline)
> is already known, but I didn't find much besides a small comment from
> Ewan Mellor yesterday here on the list about it, after finding the
> relevant source parts responsible.
> 
> So for the interested few, it's caused by John's Patch...
> # User john.levon@sun.com                                                       
> # Date 1167936545 28800                                                         
> # Node ID acda3f65d9797126035cc8cae65d8804415c6036                              
> ...and is already fixed in xen-3.0-unstable. But unfortunately the released
> 3.0.4.1 is broken, so I transfered the patch from 3.0-unstable and modified
> it a bit, so the commandline really remains in the same order as in
> previous xen versions (in testing ip= and root= are swapped). Whoever
> needs that, my scripts don't :). 

Thanks for doing this.

I'm actually just about to push a straight backport of the fix which
went into unstable. If further fixes are required on top of that we
should consider making them in xen-unstable first.

Ian.


* Re: [PATCH] fix: growing kernel commandline
  2007-01-19 10:29   ` Ian Campbell
@ 2007-01-19 11:59     ` Florian Kirstein
  0 siblings, 0 replies; 5+ messages in thread
From: Florian Kirstein @ 2007-01-19 11:59 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Campbell

Hi,

> I'm actually just about to push a straight backport of the fix which
> went into unstable. If further fixes are required on top of that we
> should consider making them in xen-unstable first.
I agree for anything further than what I've done. But the switching of
ip= and root= parameter in the processing, basically just an exchange of
the two blocks:

arg_ip = sxp.child_value(image_sxp, 'ip')
if arg_ip and not re.search(r'ip=[^ ]+', kernel_args):
   kernel_args = 'ip=%s ' % arg_ip + kernel_args

and:

arg_root = sxp.child_value(image_sxp, 'root')
if arg_root and not re.search(r'root=', kernel_args):
   kernel_args = 'root=%s ' % arg_root + kernel_args

seems risk-free. If you do this in addition to the -unstable patch
we should be on the safe side, I hope :) Otherwise the ip= and root= parameters
will change their order on the commandline (because they are now prepended
instead of appended), and I thought keeping that order intact for
bad parsers was the main reason for all of this... That's why I made
this change when backporting the fix.
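
A small illustration of the ordering argument, with a plain dict standing
in for the SXP config (an assumption made only to get a runnable example;
prepend_params is a made-up helper): because each parameter is prepended,
the block that runs last ends up first on the commandline.

```python
import re

def prepend_params(args, order, values):
    """Prepend the parameters named in `order`, one after another."""
    for key in order:
        # Same presence checks as in the two blocks quoted above.
        pattern = r'ip=[^ ]+' if key == 'ip' else r'%s=' % key
        if values.get(key) and not re.search(pattern, args):
            args = '%s=%s ' % (key, values[key]) + args
    return args

values = {'ip': '172.16.37.9', 'root': '/dev/sda1 ro'}

# ip block first, root block second: root= ends up in front.
print(prepend_params('3', ['ip', 'root'], values))
# root block first, ip block second: ip= ends up in front.
print(prepend_params('3', ['root', 'ip'], values))
```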

(:ul8er, r@y


* Bug: Problematic DomU Duplication on reboot
  2007-01-17 22:36 Two problems with DomU reboot (cmdline, duplicate domains) Florian Kirstein
  2007-01-19  9:41 ` [PATCH] fix: growing kernel commandline Florian Kirstein
@ 2007-01-21  8:59 ` Florian Kirstein
  1 sibling, 0 replies; 5+ messages in thread
From: Florian Kirstein @ 2007-01-21  8:59 UTC (permalink / raw)
  To: xen-devel

Hi,

OK, I did some more experiments and can now reproduce the duplication
of a domain on its reboot. It seems to be a race condition somewhere,
as I can trigger it by putting high load on xend.

The really bad thing: all instances of the domain are then actively
running on the same block devices, which almost certainly causes massive
data corruption :-( And: it also can happen in normal operation, I had
it at least twice in a "normal" environment without much load on xend,
possibly just a libvirt request at the wrong time during a DomU reboot.

If this is already known: sorry for the long mail then... Is there a fix
for 3.0.4-testing? :)

If not: I more or less see two bugs there:
1) why is the domain duplicated during the reboot?
2) why is it possible at all that it's started twice, using the same
devices? Could a check be added to prevent duplicate read-write use of
the same device, or is there already one which is failing in
this case?
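
As for 2), the kind of check meant here could look roughly like the
following Python sketch. It is purely hypothetical: the function name and
the {domid: [(uname, mode), ...]} input format are invented for
illustration and do not correspond to any existing xend code. The device
names and domain IDs are the ones from the xend.log excerpt earlier in
this thread.

```python
def writable_conflicts(domains):
    """Return {uname: [domids]} for devices opened writable by >1 domain.

    domains: {domid: [(uname, mode), ...]} (hypothetical format).
    """
    users = {}
    for domid, disks in domains.items():
        for uname, mode in disks:
            if 'w' in mode:
                # A set, so one domain listing a device twice is no conflict.
                users.setdefault(uname, set()).add(domid)
    return {u: sorted(ids) for u, ids in users.items() if len(ids) > 1}

domains = {
    19: [('phy:/dev/vgrc/h_root_110', 'w'), ('phy:/dev/vgrc/swap110', 'w')],
    20: [('phy:/dev/vgrc/h_root_110', 'w'), ('phy:/dev/vgrc/swap110', 'w')],
}
for uname, domids in sorted(writable_conflicts(domains).items()):
    print(uname, domids)
```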

Reproduction:
I was able to reproduce this quite reliably using the sample program
dump-info.pl from the perl-Sys-virt libvirt interface. I (as root) just run
while true; do ./dump-info.pl; done
in the examples dir to stress the system/xend. Building the loop inside
dump-info.pl and removing all "print"s makes it work even a bit "better"
and really messes things up, so try that if the other doesn't work. I
tested it on a P4 3 GHz and a dual-core A64 2.2 GHz; it's easier when
I use nosmp on the xen kernel on the A64, but it also works in the SMP
case.

While this is running I simply issue:
xm reboot DomU1
and most of the time it results in two or more DomU1s running
afterwards... Sometimes it also causes DomU1 to disappear, with an
entry in the log saying it was rebooting too fast (of course I waited long
enough with the reboot). If it "works" it looks like this:
DomU1                                     97   256     1     -b----     12.5
DomU1                                     98   256     1     -b----     12.9
afterwards. DomU1 being just a normal paravirtualized Linux Guest. 
Dom0 is a CentOS 4 in case it could matter.
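
To check for the duplication automatically while running the loop, the
listing can be scanned for names that appear with more than one ID. A
minimal Python sketch follows; it assumes the column layout of the two
lines shown above (name first, numeric ID second) and domain names
without spaces. The header and Domain-0 rows in the sample are only
illustrative, and duplicate_domains is a made-up helper.

```python
from collections import defaultdict

def duplicate_domains(listing):
    """Return {name: [ids]} for names listed with more than one ID."""
    ids_by_name = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header, blank lines and anything without a numeric ID.
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        ids_by_name[fields[0]].append(int(fields[1]))
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}

sample = """\
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0   512     1     r-----    100.0
DomU1                                     97   256     1     -b----     12.5
DomU1                                     98   256     1     -b----     12.9
"""
print(duplicate_domains(sample))
```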

Observations:
During the reboot sometimes multiple duplications were created, load
on Dom0 went up to about 30 and I saw lots of xen-backend hotplug agents:
10613 ?        S<     0:00  \_ /bin/sh /sbin/hotplug xen-backend
10617 ?        S<     0:01  |   \_ /bin/sh /etc/hotplug/xen-backend.agent
15018 ?        S<     0:00  \_ /bin/sh /sbin/hotplug xen-backend
15248 ?        S<     0:01  |   \_ /bin/sh /etc/hotplug/xen-backend.agent
14698 ?        S<     0:00  \_ /bin/sh /sbin/hotplug xen-backend
14702 ?        S<     0:00  |   \_ /bin/sh /etc/hotplug/xen-backend.agent
15091 ?        S<     0:00  \_ /bin/sh /sbin/hotplug xen-backend
(about 60 more lines like this - and I had just one domU). After everything
settled the result:
VM100                                     38   256     1     -b----     13.3
VM100                                     10   256     1     -b----     14.1
Note the large gap between IDs 10 and 38, meaning 27 domains were
partially created and then died; the domain I rebooted had ID 9.

Oh, and one more thing: when using "stress" to put load on the Dom0
system instead of the perl-Sys-virt tool, it usually causes the
DomU to disappear on reboot, but I couldn't reproduce the duplication
that way.

All this was done with the released 3.0.4.1-1. I will try xen-unstable next,
but possibly someone already has an idea what could be wrong here?

(:ul8er, r@y
