* [xen-unstable test] 19308: regressions - FAIL
From: xen.org @ 2013-09-15 6:09 UTC
To: xen-devel; +Cc: ian.jackson
flight 19308 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
test-amd64-i386-qemut-rhel6hvm-intel 11 leak-check/check fail in 19288 REGR. vs. 19208
test-amd64-amd64-xl-winxpsp3 8 guest-saverestore fail in 19288 REGR. vs. 19208
test-amd64-i386-xl-qemut-win7-amd64 12 guest-localmigrate/x10 fail in 19288 REGR. vs. 19208
Tests which are failing intermittently (not blocking):
test-amd64-i386-qemut-rhel6hvm-intel 7 redhat-install fail pass in 19288
test-amd64-amd64-xl-winxpsp3 7 windows-install fail pass in 19288
test-amd64-i386-xl-qemut-win7-amd64 7 windows-install fail pass in 19288
test-amd64-i386-rhel6hvm-intel 7 redhat-install fail in 19288 pass in 19308
test-amd64-amd64-xl-sedf 14 guest-localmigrate/x10 fail in 19288 pass in 19308
test-amd64-i386-qemut-rhel6hvm-amd 7 redhat-install fail in 19288 pass in 19308
test-amd64-i386-xl-win7-amd64 11 guest-localmigrate.2 fail in 19288 pass in 19308
test-amd64-amd64-xl-win7-amd64 7 windows-install fail in 19288 pass in 19308
test-amd64-amd64-xl-qemut-winxpsp3 7 windows-install fail in 19288 pass in 19308
Tests which did not succeed, but are not blocking:
test-armhf-armhf-xl 1 xen-build-check(1) blocked n/a
test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass
build-armhf-pvops 4 kernel-build fail never pass
test-amd64-i386-xend-winxpsp3 16 leak-check/check fail never pass
test-amd64-amd64-xl-qemuu-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xend-qemut-winxpsp3 16 leak-check/check fail never pass
test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-amd64-xl-qemut-win7-amd64 13 guest-stop fail never pass
version targeted for testing:
xen 593470233ff38385df9dcf5690cc58c7a4fb290d
baseline version:
xen cadbe2f9e768585fad52156be2433d49ec9feaf1
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli <dario.faggioli@citrix.com>
Fabio Fantoni <fabio.fantoni@m2r.biz>
George Dunlap <george.dunlap@eu.citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich <jbeulich@suse.com>
Juergen Gross <juergen.gross@ts.fujitsu.com>
Keir Fraser <keir@xen.org>
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Matthew Daley <mattjd@gmail.com>
Roger Pau Monné <roger.pau@citrix.com>
Samuel Thibault <samuel.thibault@ens-lyon.org>
Tim Deegan <tim@xen.org>
Wei Liu <wei.liu2@citrix.com>
------------------------------------------------------------
jobs:
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-pvops pass
build-armhf-pvops fail
build-i386-pvops pass
test-amd64-amd64-xl pass
test-armhf-armhf-xl blocked
test-amd64-i386-xl pass
test-amd64-i386-rhel6hvm-amd fail
test-amd64-i386-qemut-rhel6hvm-amd fail
test-amd64-i386-qemuu-rhel6hvm-amd fail
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-win7-amd64 fail
test-amd64-i386-xl-win7-amd64 fail
test-amd64-i386-xl-credit2 pass
test-amd64-amd64-xl-pcipt-intel fail
test-amd64-i386-rhel6hvm-intel fail
test-amd64-i386-qemut-rhel6hvm-intel fail
test-amd64-i386-qemuu-rhel6hvm-intel fail
test-amd64-i386-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-xl-sedf-pin pass
test-amd64-amd64-pv pass
test-amd64-i386-pv pass
test-amd64-amd64-xl-sedf pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 fail
test-amd64-i386-xl-winxpsp3-vcpus1 fail
test-amd64-i386-xend-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemuu-winxpsp3 fail
test-amd64-i386-xend-winxpsp3 fail
test-amd64-amd64-xl-winxpsp3 fail
------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images
Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs
Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Not pushing.
(No revision log; it would be 368 lines long.)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: [xen-unstable test] 19308: regressions - FAIL
From: Ian Campbell @ 2013-09-15 12:50 UTC
To: xen.org, Wei Liu, Roger Pau Monne; +Cc: xen-devel
On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
> flight 19308 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
These are due to /var/run/xen-hotplug/block getting leaked
> test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
These are:
libxl: error: libxl_device.c:894:device_backend_callback: unable
to add device with path /local/domain/0/backend/vbd/9/5632
libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
/var/log/xen/xenhotplug.log contains:
xenstore-read: couldn't read path backend/vbd/9/5632/node
For both of these I'm suspicious of:
11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
and to a lesser extent:
a508caf libxl: fix libxl__device_disk_from_xs_be to parse backend domid
It doesn't look like the bisector is looking at this, or else I'm
reading osstest's resource plan wrongly
* Re: [xen-unstable test] 19308: regressions - FAIL
From: Roger Pau Monné @ 2013-09-16 7:49 UTC
To: Ian Campbell; +Cc: Wei Liu, xen-devel, xen.org
On 15/09/13 14:50, Ian Campbell wrote:
> On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
>> flight 19308 xen-unstable real [real]
>> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
>> test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
>> test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
>> test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
>> test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
>
> These are due to /var/run/xen-hotplug/block getting leaked
>
>> test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
>> test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
>> test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
>
> These are:
> libxl: error: libxl_device.c:894:device_backend_callback: unable
> to add device with path /local/domain/0/backend/vbd/9/5632
> libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
>
> /var/log/xen/xenhotplug.log contains:
> xenstore-read: couldn't read path backend/vbd/9/5632/node
>
> For both of these I'm suspicious of:
> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
Hello,

I've tracked this down to libxl writing a wrong physical-device
xenstore node when using regular files. With block devices libxl can
write physical-device itself, because the value can be fetched without
running the block script. With regular files this is not true: we must
first run the block script so that it attaches the regular file to a
loop device, and only then fetch physical-device from that loop
device. The following patch solves the issue for me.
8<-------------------------------------------------------------------
From e150f00565bfe291809441e73630b243e21a52b0 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Mon, 16 Sep 2013 09:39:05 +0200
Subject: [PATCH] libxl: don't write physical-device vbd xenstore node in
libxl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
libxl used to write the physical-device xenstore node needed by the
phy backend type, because the phy backend type could only be used with
block devices. If libxl allows the backend type phy to be used with
regular files, it can no longer write physical-device because the
hotplug script has to be executed first in order to mount the regular
file into a loop device and then write the physical-device of the loop
device used.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
tools/libxl/libxl.c | 15 ---------------
1 files changed, 0 insertions(+), 15 deletions(-)
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0879f23..326a378 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2101,21 +2101,6 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
libxl__xen_script_dir_path());
flexarray_append_pair(back, "script", script);
- /* If the user did not supply a block script then we
- * write the physical-device node ourselves.
- *
- * If the user did supply a script then that script is
- * responsible for this since the block device may not
- * exist yet.
- */
- if (!disk->script &&
- disk->backend_domid == LIBXL_TOOLSTACK_DOMID) {
- int major, minor;
- libxl__device_physdisk_major_minor(dev, &major, &minor);
- flexarray_append_pair(back, "physical-device",
- libxl__sprintf(gc, "%x:%x", major, minor));
- }
-
assert(device->backend_kind == LIBXL__DEVICE_KIND_VBD);
break;
--
1.7.7.5 (Apple Git-26)
* Re: [xen-unstable test] 19308: regressions - FAIL
From: Wei Liu @ 2013-09-16 9:55 UTC
To: Roger Pau Monné; +Cc: Wei Liu, xen-devel, xen.org, Ian Campbell
On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
> On 15/09/13 14:50, Ian Campbell wrote:
> > On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
> >> flight 19308 xen-unstable real [real]
> >> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
> >>
> >> Regressions :-(
> >>
> >> Tests which did not succeed and are blocking,
> >> including tests which could not be run:
> >> test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> >> test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> >> test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> >> test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> >> test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> >
> > These are due to /var/run/xen-hotplug/block getting leaked
> >
The error message in XenStore shows that blkback tries to get hold of
block device 0:0, but there is no such device on the system.
> >> test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> >> test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> >> test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
> >
> > These are:
> > libxl: error: libxl_device.c:894:device_backend_callback: unable
> > to add device with path /local/domain/0/backend/vbd/9/5632
> > libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
> >
> > /var/log/xen/xenhotplug.log contains:
> > xenstore-read: couldn't read path backend/vbd/9/5632/node
> >
> > For both of these I'm suspicious of:
> > 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
>
> Hello,
>
> I've tracked this down to libxl writing a wrong physical-device
> xenstore node when using regular files. With block devices libxl can
> write physical-device itself, because the value can be fetched without
> running the block script. With regular files this is not true: we must
> first run the block script so that it attaches the regular file to a
> loop device, and only then fetch physical-device from that loop
> device. The following patch solves the issue for me.
>
Yes, that's the code in question, I think. That code snippet was introduced in:

commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
Author: Ian Campbell <Ian.Campbell@citrix.com>
Date: Tue Aug 7 14:26:29 2012 +0100

    libxl: write physical-device node if user did not supply a block script

    This reverts one of the intentional changes from 25733:353bc0801b11.
    That change exposed an issue with the xl migration protocol, which
    although safe triggers the hotplug scripts device sharing logic.

    For 4.2 we disable this logic by writing the physical-device xenstore
    node ourselves if a user did not supply a script. If the user did
    supply a script then we continue to rely on it to write the
    physical-device node (not least because the script may create the
    device and therefore it is not available before we run the script).

    This means that to support localhost migration a block hotplug script
    needs to be robust against adding a device twice and should not
    deactivate the device until it has been removed twice.

    This should be revisited for 4.3.

    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
    Committed-by: Ian Campbell <ian.campbell@citrix.com>

And in the commit message it says this behavior should be revisited.
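The localhost-migration requirement in that commit message (survive a second add of the same device, and only deactivate it on the second remove) amounts to per-device reference counting. A minimal C sketch of that bookkeeping, assuming a small in-memory table; the names and the fixed table size are hypothetical, and this is not taken from the real `/etc/xen/scripts/block` script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SHARED 16           /* hypothetical fixed table size */

struct shared_dev {
    char path[256];             /* backing image or block device */
    int refs;                   /* current users; 0 means slot is free */
};

static struct shared_dev table[MAX_SHARED];

/* Return 1 if this add really activated the device, 0 if it was already
 * active (e.g. the second add during localhost migration), -1 if full. */
static int dev_add(const char *path)
{
    struct shared_dev *slot = NULL;

    assert(path != NULL);
    for (int i = 0; i < MAX_SHARED; i++) {
        if (table[i].refs > 0 && strcmp(table[i].path, path) == 0) {
            table[i].refs++;
            return 0;
        }
        if (table[i].refs == 0 && slot == NULL)
            slot = &table[i];
    }
    if (slot == NULL)
        return -1;
    /* Paths longer than 255 bytes are truncated in this sketch. */
    snprintf(slot->path, sizeof(slot->path), "%s", path);
    slot->refs = 1;
    return 1;
}

/* Return 1 if this remove should really deactivate the device, 0 if other
 * users remain, -1 if the device was not known. */
static int dev_remove(const char *path)
{
    assert(path != NULL);
    for (int i = 0; i < MAX_SHARED; i++) {
        if (table[i].refs > 0 && strcmp(table[i].path, path) == 0) {
            table[i].refs--;
            return table[i].refs == 0 ? 1 : 0;
        }
    }
    return -1;
}
```

During a localhost migration the sequence would be add (source), add (destination), remove (source), remove (destination), and only the last remove tears the device down.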
Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11),
things look more complicated. One interesting snippet in the commit
message is:

- libxl should not write the "physical-device" node. This is the
responsibility of the block script. Writing the "physical-device"
node in libxl basically completely short-cuts the standard block
hotplug script which uses "physical-device" to know if it has run
already or not.

That makes me believe the following fix is the correct thing to do in
the long term.

I have to admit that I cannot fully digest the commit message of 25733
in one day, so unless you (Ian) can confirm that Roger's fix will not
cause further regressions, I would suggest reverting my change for now.
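The short-cut described in the 25733 snippet can be modelled in a few lines of C. This is a toy, hypothetical model (neither the struct nor the function is libxl or hotplug-script code): the script's "add" step treats an existing physical-device value as proof that it has already run, so a value pre-written by the toolstack means the real setup, such as attaching a loop device, never happens. `7:0` is used as an example loop-device number, since loop devices use major 7 on Linux.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one vbd backend directory. */
struct vbd_backend {
    const char *params;            /* image or device path */
    char physical_device[32];      /* "" means the node is not written */
    bool setup_done;               /* did the script do its real work? */
};

/* Toy stand-in for the block script's "add" action: an existing
 * physical-device value is taken to mean "already ran", so it returns
 * early; otherwise it does the setup (e.g. attach a loop device) and
 * records that device's major:minor. */
static void block_script_add(struct vbd_backend *be, const char *loop_majmin)
{
    assert(be != NULL && loop_majmin != NULL);
    if (strlen(be->physical_device) > 0)
        return;                    /* short-cut: looks like it already ran */
    be->setup_done = true;
    snprintf(be->physical_device, sizeof(be->physical_device), "%s",
             loop_majmin);
}
```

With an empty node the model attaches the image and records `7:0`; with a node pre-filled as `0:0` it returns early and the bogus value stays in place.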
Wei.
> 8<-------------------------------------------------------------------
> >From e150f00565bfe291809441e73630b243e21a52b0 Mon Sep 17 00:00:00 2001
> From: Roger Pau Monne <roger.pau@citrix.com>
> Date: Mon, 16 Sep 2013 09:39:05 +0200
> Subject: [PATCH] libxl: don't write physical-device vbd xenstore node in
> libxl
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> libxl used to write the physical-device xenstore node needed by the
> phy backend type, because the phy backend type could only be used with
> block devices. If libxl allows the backend type phy to be used with
> regular files, it can no longer write physical-device because the
> hotplug script has to be executed first in order to mount the regular
> file into a loop device and then write the physical-device of the loop
> device used.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> tools/libxl/libxl.c | 15 ---------------
> 1 files changed, 0 insertions(+), 15 deletions(-)
>
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 0879f23..326a378 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -2101,21 +2101,6 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
> libxl__xen_script_dir_path());
> flexarray_append_pair(back, "script", script);
>
> - /* If the user did not supply a block script then we
> - * write the physical-device node ourselves.
> - *
> - * If the user did supply a script then that script is
> - * responsible for this since the block device may not
> - * exist yet.
> - */
> - if (!disk->script &&
> - disk->backend_domid == LIBXL_TOOLSTACK_DOMID) {
> - int major, minor;
> - libxl__device_physdisk_major_minor(dev, &major, &minor);
> - flexarray_append_pair(back, "physical-device",
> - libxl__sprintf(gc, "%x:%x", major, minor));
> - }
> -
> assert(device->backend_kind == LIBXL__DEVICE_KIND_VBD);
> break;
>
> --
> 1.7.7.5 (Apple Git-26)
>
>
* Re: [xen-unstable test] 19308: regressions - FAIL
From: Roger Pau Monné @ 2013-09-16 10:19 UTC
To: Wei Liu; +Cc: xen-devel, xen.org, Ian Campbell
On 16/09/13 11:55, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
>> On 15/09/13 14:50, Ian Campbell wrote:
>>> On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
>>>> flight 19308 xen-unstable real [real]
>>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>> test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
>>>> test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
>>>> test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
>>>> test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
>>>> test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
>>>
>>> These are due to /var/run/xen-hotplug/block getting leaked
>>>
>
> The error message in XenStore shows that blkback tries to get hold of
> block device 0:0, but there is no such device on the system.
>
>>>> test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
>>>> test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
>>>> test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
>>>
>>> These are:
>>> libxl: error: libxl_device.c:894:device_backend_callback: unable
>>> to add device with path /local/domain/0/backend/vbd/9/5632
>>> libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
>>>
>>> /var/log/xen/xenhotplug.log contains:
>>> xenstore-read: couldn't read path backend/vbd/9/5632/node
>>>
>>> For both of these I'm suspicious of:
>>> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
>>
>> Hello,
>>
>> I've tracked this down to libxl writing a wrong physical-device
>> xenstore node when using regular files. With block devices libxl can
>> write physical-device itself, because the value can be fetched without
>> running the block script. With regular files this is not true: we must
>> first run the block script so that it attaches the regular file to a
>> loop device, and only then fetch physical-device from that loop
>> device. The following patch solves the issue for me.
>>
>
> Yes, that's the code in question, I think. That code snippet was introduced in:
>
> commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> Author: Ian Campbell <Ian.Campbell@citrix.com>
> Date: Tue Aug 7 14:26:29 2012 +0100
>
> libxl: write physical-device node if user did not supply a block script
>
> This reverts one of the intentional changes from 25733:353bc0801b11.
> That change exposed an issue with the xl migration protocol, which
> although safe triggers the hotplug scripts device sharing logic.
>
> For 4.2 we disable this logic by writing the physical-device xenstore
> node ourselves if a user did not supply a script. If the user did
> supply a script then we continue to rely on it to write the
> physical-device node (not least because the script may create the
> device and therefore it is not available before we run the script).
>
> This means that to support localhost migration a block hotplug script
> needs to be robust against adding a device twice and should not
> deactivate the device until it has been removed twice.
>
> This should be revisited for 4.3.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Committed-by: Ian Campbell <ian.campbell@citrix.com>
>
> And in the commit message it says this behavior should be revisited.
>
> Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> things look more complicated. One interesting snippet in the commit
> message is:
>
> - libxl should not write the "physical-device" node. This is the
> responsibility of the block script. Writing the "physical-device"
> node in libxl basically completely short-cuts the standard block
> hotplug script which uses "physical-device" to know if it has run
> already or not.
>
> That makes me believe the following fix is the correct thing to do in
> the long term.
>
> I have to admit that I cannot fully digest the commit message of 25733
> in one day, so unless you (Ian) can confirm that Roger's fix will not
> cause further regressions, I would suggest reverting my change for now.
My fix deals with one part of the problem, but will fail on local
migration (the block script will refuse to attach the same device
twice). This is indeed a tricky issue, and I cannot see an easy way to
deal with it.

The proper way to fix this would be to unplug the devices from the
suspended domain before creating the new domain, but I'm sure this is
not trivial (it would also imply reattaching the devices to the
original domain if migration fails).
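The ordering proposed above can be sketched as follows. This is a hypothetical C model, not libxl code: a single global records which domain currently holds the disk, `attach_disk()` refuses to share it (mirroring the block script refusing a second attach), and the `restore_ok` flag stands in for whether creating and restoring the new domain succeeded.

```c
#include <assert.h>
#include <stdbool.h>

/* Domain that currently has the disk attached; -1 means nobody. */
static int disk_owner = -1;

static bool detach_disk(int domid)
{
    if (disk_owner != domid)
        return false;
    disk_owner = -1;
    return true;
}

static bool attach_disk(int domid)
{
    if (disk_owner != -1)
        return false;           /* refuse to attach the same disk twice */
    disk_owner = domid;
    return true;
}

/* Detach from the suspended source first, attach to the new domain,
 * and hand the disk back to the source if any later step fails. */
static int local_migrate(int src, int dst, bool restore_ok)
{
    assert(src != dst);
    if (!detach_disk(src))
        return -1;
    if (!attach_disk(dst)) {
        attach_disk(src);                 /* roll back */
        return -1;
    }
    if (!restore_ok) {
        detach_disk(dst);
        attach_disk(src);                 /* roll back */
        return -1;
    }
    return 0;
}
```

With the ordering used at the time of this thread, the new domain's attach happens while the source still holds the disk, which is exactly the case `attach_disk()` rejects in this model.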
* Re: [xen-unstable test] 19308: regressions - FAIL
From: Ian Jackson @ 2013-09-16 10:43 UTC
To: Ian Campbell; +Cc: Wei Liu, xen-devel, xen.org, Roger Pau Monne
Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 19308: regressions - FAIL"):
> For both of these I'm suspicious of:
> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
>
> and to a lesser extent:
> a508caf libxl: fix libxl__device_disk_from_xs_be to parse backend domid
Yes.
> It doesn't look like the bisector is looking at this, or else I'm
> reading osstest's resource plan wrongly
The log volume had filled up with git caches. I'm pruning them.
Really this should be automatic, but more to the point there should be
one cache, not one per host.
Ian.
* Re: [xen-unstable test] 19308: regressions - FAIL
From: Ian Campbell @ 2013-09-16 12:43 UTC
To: Wei Liu; +Cc: xen-devel, xen.org, Roger Pau Monné
On Mon, 2013-09-16 at 10:55 +0100, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
> > On 15/09/13 14:50, Ian Campbell wrote:
> > > On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
> > >> flight 19308 xen-unstable real [real]
> > >> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
> > >>
> > >> Regressions :-(
> > >>
> > >> Tests which did not succeed and are blocking,
> > >> including tests which could not be run:
> > >> test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> > >> test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail REGR. vs. 19208
> > >> test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> > >> test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> > >> test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail REGR. vs. 19208
> > >
> > > These are due to /var/run/xen-hotplug/block getting leaked
> > >
>
> The error message in XenStore shows that blkback tries to get hold of
> block device 0:0, but there is no such device on the system.
>
> > >> test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> > >> test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10 fail REGR. vs. 19208
> > >> test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
> > >
> > > These are:
> > > libxl: error: libxl_device.c:894:device_backend_callback: unable
> > > to add device with path /local/domain/0/backend/vbd/9/5632
> > > libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
> > >
> > > /var/log/xen/xenhotplug.log contains:
> > > xenstore-read: couldn't read path backend/vbd/9/5632/node
> > >
> > > For both of these I'm suspicious of:
> > > 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
> >
> > Hello,
> >
> > I've tracked this down to libxl writing a wrong physical-device
> > xenstore node when using regular files. With block devices libxl can
> > write physical-device itself, because the value can be fetched without
> > running the block script. With regular files this is not true: we must
> > first run the block script so that it attaches the regular file to a
> > loop device, and only then fetch physical-device from that loop
> > device. The following patch solves the issue for me.
> >
>
> Yes, that's the code in question, I think. That code snippet was introduced in:
>
> commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> Author: Ian Campbell <Ian.Campbell@citrix.com>
> Date: Tue Aug 7 14:26:29 2012 +0100
>
> libxl: write physical-device node if user did not supply a block script
>
> This reverts one of the intentional changes from 25733:353bc0801b11.
> That change exposed an issue with the xl migration protocol, which
> although safe triggers the hotplug scripts device sharing logic.
>
> For 4.2 we disable this logic by writing the physical-device xenstore
> node ourselves if a user did not supply a script. If the user did
> supply a script then we continue to rely on it to write the
> physical-device node (not least because the script may create the
> device and therefore it is not available before we run the script).
>
> This means that to support localhost migration a block hotplug script
> needs to be robust against adding a device twice and should not
> deactivate the device until it has been removed twice.
>
> This should be revisited for 4.3.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Committed-by: Ian Campbell <ian.campbell@citrix.com>
>
> And in the commit message it says this behavior should be revisited.
Which never happened :-(

I don't remember exactly, but I think the real fix is a reworking of the
sequencing of block device attach/detach relative to the migration
stop-and-copy phase, rather than a simple tweak.
> Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> things look more complicated. One interesting snippet in the commit
> message is:
>
> - libxl should not write the "physical-device" node. This is the
> responsibility of the block script. Writing the "physical-device"
> node in libxl basically completely short-cuts the standard block
> hotplug script which uses "physical-device" to know if it has run
> already or not.
>
> That makes me believe the following fix is the correct thing to do in
> the long term.
>
> I have to admit that I cannot fully digest the commit message of 25733
> in one day, so unless you (Ian) can confirm that Roger's fix will not
> cause further regressions, I would suggest reverting my change for now.
Can you test some lifecycle operations, in particular localhost
migrations with both phy:// and file:// devices, to see whether it fixes
the problem? If not, then we can revert.

Perhaps, rather than removing that block entirely, it should be made
conditional on S_ISBLK?
Ian.
* Re: [xen-unstable test] 19308: regressions - FAIL
From: Wei Liu @ 2013-09-16 13:25 UTC
To: Ian Campbell; +Cc: xen.org, xen-devel, Wei Liu, Roger Pau Monné
On Mon, Sep 16, 2013 at 01:43:57PM +0100, Ian Campbell wrote:
[...]
> > > I've tracked this down to libxl writing a wrong physical-device
> > > xenstore node when using regular files. With block devices libxl can
> > > write physical-device itself, because the value can be fetched without
> > > running the block script. With regular files this is not true: we must
> > > first run the block script so that it attaches the regular file to a
> > > loop device, and only then fetch physical-device from that loop
> > > device. The following patch solves the issue for me.
> > >
> >
> > Yes, that's the code in question, I think. That code snippet was introduced in:
> >
> > commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> > Author: Ian Campbell <Ian.Campbell@citrix.com>
> > Date: Tue Aug 7 14:26:29 2012 +0100
> >
> > libxl: write physical-device node if user did not supply a block script
> >
> > This reverts one of the intentional changes from 25733:353bc0801b11.
> > That change exposed an issue with the xl migration protocol, which
> > although safe triggers the hotplug scripts device sharing logic.
> >
> > For 4.2 we disable this logic by writing the physical-device xenstore
> > node ourselves if a user did not supply a script. If the user did
> > supply a script then we continue to rely on it to write the
> > physical-device node (not least because the script may create the
> > device and therefore it is not available before we run the script).
> >
> > This means that to support localhost migration a block hotplug script
> > needs to be robust against adding a device twice and should not
> > deactivate the device until it has been removed twice.
> >
> > This should be revisited for 4.3.
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> > Committed-by: Ian Campbell <ian.campbell@citrix.com>
> >
> > And in the commit message it says this behavior should be revisited.
>
> Which never happened :-(
>
> I don't remember exactly, but I think the real fix is a reworking of the
> sequencing of block device attach/detach vs. the migration stop-and-copy
> phase, not a simple tweak.
>
> > Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11),
> > things look more complicated. One interesting snippet in the commit
> > message is:
> >
> > - libxl should not write the "physical-device" node. This is the
> > responsibility of the block script. Writing the "physical-device"
> > node in libxl basically completely short-cuts the standard block
> > hotplug script which uses "physical-device" to know if it has run
> > already or not.
> >
> > That makes me believe the following fix is the correct thing to do in
> > the long term.
> >
> > I have to admit that I cannot fully digest the commit message of 25733
> > in one day, so unless you (Ian) can confirm that Roger's fix will not cause
> > further regressions, I would suggest reverting my change for now.
>
> Can you test some lifecycle operations, in particular localhost
> migrations with both phy:// and file:// devices to see if it fixes it?
> If not then we can revert.
>
Unfortunately, with Roger's patch applied, local migration of a raw-format
file disk doesn't work:
xc: detail: Save exit of domid 69 with rc=0
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block add [8102] exited with error status 1
libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: File /data/s0.raw is loopback-mounted through /dev/loop0,
which is mounted in a guest domain,
and so cannot be mounted now.
libxl: error: libxl_create.c:932:domcreate_launch_dm: unable to add disk devices
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [8181] exited with error status 1
libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [8091] exited with error status 3
Migration failed, resuming at sender.
> Perhaps, rather than removing that block entirely, it should be
> conditional on S_ISBLK?
>
With the S_ISBLK conditional in place, local migration of a raw-format file
mounted on a loop device still breaks with the above error.
So for now, please revert that change.
Wei.
> Ian.
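Ian's S_ISBLK suggestion amounts to letting libxl write physical-device only when the disk path is already a block device, and otherwise leaving the node to the hotplug script. A minimal C sketch of that check, assuming the lowercase-hex "major:minor" form that the Linux block hotplug script writes; `format_physdev` and `physdev_for_path` are hypothetical helpers, not libxl functions. As Wei reports above, this check alone did not fix localhost migration of file-backed disks.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() on glibc/musl */

/* Hypothetical: format a device number as "<major>:<minor>" in hex
 * (assumed physical-device format). Returns the length written,
 * or -1 on error or truncation. */
int format_physdev(unsigned int maj, unsigned int min,
                   char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%x:%x", maj, min);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}

/* Hypothetical: only a block device has a device number that can be
 * read before the hotplug script runs. For a regular file the loop
 * device does not exist yet, so return -1 and let the script write
 * physical-device itself. */
int physdev_for_path(const char *path, char *buf, size_t len)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;              /* path missing or unreadable */
    if (!S_ISBLK(st.st_mode))
        return -1;              /* regular file, char device, dir... */
    return format_physdev(major(st.st_rdev), minor(st.st_rdev), buf, len);
}
```

For example, `/dev/loop0` (major 7, minor 0) would be formatted as `7:0`, while a raw image such as `/data/s0.raw` yields -1 and is left to the script.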
* Re: [xen-unstable test] 19308: regressions - FAIL
From: Ian Campbell @ 2013-09-16 13:30 UTC
To: Wei Liu; +Cc: xen-devel, xen.org, Roger Pau Monné
On Mon, 2013-09-16 at 14:25 +0100, Wei Liu wrote:
> [...]
> So for now, please revert that change.
Done.
Ian.
Thread overview: 9+ messages
2013-09-15 6:09 [xen-unstable test] 19308: regressions - FAIL xen.org
2013-09-15 12:50 ` Ian Campbell
2013-09-16 7:49 ` Roger Pau Monné
2013-09-16 9:55 ` Wei Liu
2013-09-16 10:19 ` Roger Pau Monné
2013-09-16 12:43 ` Ian Campbell
2013-09-16 13:25 ` Wei Liu
2013-09-16 13:30 ` Ian Campbell
2013-09-16 10:43 ` Ian Jackson