* [xen-unstable test] 19308: regressions - FAIL
@ 2013-09-15  6:09 xen.org
  2013-09-15 12:50 ` Ian Campbell
  0 siblings, 1 reply; 9+ messages in thread
From: xen.org @ 2013-09-15  6:09 UTC (permalink / raw)
  To: xen-devel; +Cc: ian.jackson


flight 19308 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check  fail REGR. vs. 19208
 test-amd64-i386-rhel6hvm-intel 11 leak-check/check        fail REGR. vs. 19208
 test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
 test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
 test-amd64-i386-rhel6hvm-amd 11 leak-check/check          fail REGR. vs. 19208
 test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10   fail REGR. vs. 19208
 test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10  fail REGR. vs. 19208
 test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
 test-amd64-i386-qemut-rhel6hvm-intel 11 leak-check/check fail in 19288 REGR. vs. 19208
 test-amd64-amd64-xl-winxpsp3 8 guest-saverestore fail in 19288 REGR. vs. 19208
 test-amd64-i386-xl-qemut-win7-amd64 12 guest-localmigrate/x10 fail in 19288 REGR. vs. 19208

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemut-rhel6hvm-intel  7 redhat-install      fail pass in 19288
 test-amd64-amd64-xl-winxpsp3  7 windows-install             fail pass in 19288
 test-amd64-i386-xl-qemut-win7-amd64  7 windows-install      fail pass in 19288
 test-amd64-i386-rhel6hvm-intel  7 redhat-install   fail in 19288 pass in 19308
 test-amd64-amd64-xl-sedf 14 guest-localmigrate/x10 fail in 19288 pass in 19308
 test-amd64-i386-qemut-rhel6hvm-amd 7 redhat-install fail in 19288 pass in 19308
 test-amd64-i386-xl-win7-amd64 11 guest-localmigrate.2 fail in 19288 pass in 19308
 test-amd64-amd64-xl-win7-amd64  7 windows-install  fail in 19288 pass in 19308
 test-amd64-amd64-xl-qemut-winxpsp3 7 windows-install fail in 19288 pass in 19308

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl           1 xen-build-check(1)           blocked  n/a
 test-amd64-amd64-xl-pcipt-intel  9 guest-start                 fail never pass
 build-armhf-pvops             4 kernel-build                 fail   never pass
 test-amd64-i386-xend-winxpsp3 16 leak-check/check             fail  never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 13 guest-stop               fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-stop             fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 16 leak-check/check        fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop               fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 13 guest-stop         fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 13 guest-stop             fail never pass

version targeted for testing:
 xen                  593470233ff38385df9dcf5690cc58c7a4fb290d
baseline version:
 xen                  cadbe2f9e768585fad52156be2433d49ec9feaf1

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Dario Faggioli <dario.faggioli@citrix.com>
  Fabio Fantoni <fabio.fantoni@m2r.biz>
  George Dunlap <george.dunlap@eu.citrix.com>
  Ian Campbell <ian.campbell@citrix.com>
  Ian Jackson <ian.jackson@eu.citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Juergen Gross <juergen.gross@ts.fujitsu.com>
  Keir Fraser <keir@xen.org>
  Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  Matthew Daley <mattjd@gmail.com>
  Roger Pau Monné <roger.pau@citrix.com>
  Samuel Thibault <samuel.thibault@ens-lyon.org>
  Tim Deegan <tim@xen.org>
  Wei Liu <wei.liu2@citrix.com>
------------------------------------------------------------

jobs:
 build-amd64                                                  pass    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-oldkern                                          pass    
 build-i386-oldkern                                           pass    
 build-amd64-pvops                                            pass    
 build-armhf-pvops                                            fail    
 build-i386-pvops                                             pass    
 test-amd64-amd64-xl                                          pass    
 test-armhf-armhf-xl                                          blocked 
 test-amd64-i386-xl                                           pass    
 test-amd64-i386-rhel6hvm-amd                                 fail    
 test-amd64-i386-qemut-rhel6hvm-amd                           fail    
 test-amd64-i386-qemuu-rhel6hvm-amd                           fail    
 test-amd64-amd64-xl-qemut-win7-amd64                         fail    
 test-amd64-i386-xl-qemut-win7-amd64                          fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-amd64-xl-win7-amd64                               fail    
 test-amd64-i386-xl-win7-amd64                                fail    
 test-amd64-i386-xl-credit2                                   pass    
 test-amd64-amd64-xl-pcipt-intel                              fail    
 test-amd64-i386-rhel6hvm-intel                               fail    
 test-amd64-i386-qemut-rhel6hvm-intel                         fail    
 test-amd64-i386-qemuu-rhel6hvm-intel                         fail    
 test-amd64-i386-xl-multivcpu                                 pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-amd64-amd64-xl-sedf-pin                                 pass    
 test-amd64-amd64-pv                                          pass    
 test-amd64-i386-pv                                           pass    
 test-amd64-amd64-xl-sedf                                     pass    
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1                     fail    
 test-amd64-i386-xl-winxpsp3-vcpus1                           fail    
 test-amd64-i386-xend-qemut-winxpsp3                          fail    
 test-amd64-amd64-xl-qemut-winxpsp3                           fail    
 test-amd64-amd64-xl-qemuu-winxpsp3                           fail    
 test-amd64-i386-xend-winxpsp3                                fail    
 test-amd64-amd64-xl-winxpsp3                                 fail    


------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
    http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
    http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 368 lines long.)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [xen-unstable test] 19308: regressions - FAIL
  2013-09-15  6:09 [xen-unstable test] 19308: regressions - FAIL xen.org
@ 2013-09-15 12:50 ` Ian Campbell
  2013-09-16  7:49   ` Roger Pau Monné
  2013-09-16 10:43   ` Ian Jackson
  0 siblings, 2 replies; 9+ messages in thread
From: Ian Campbell @ 2013-09-15 12:50 UTC (permalink / raw)
  To: xen.org, Wei Liu, Roger Pau Monne; +Cc: xen-devel

On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
> flight 19308 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check  fail REGR. vs. 19208
>  test-amd64-i386-rhel6hvm-intel 11 leak-check/check        fail REGR. vs. 19208
>  test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
>  test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
>  test-amd64-i386-rhel6hvm-amd 11 leak-check/check          fail REGR. vs. 19208

These are due to /var/run/xen-hotplug/block getting leaked.

>  test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10   fail REGR. vs. 19208
>  test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10  fail REGR. vs. 19208
>  test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208

These are:
        libxl: error: libxl_device.c:894:device_backend_callback: unable
        to add device with path /local/domain/0/backend/vbd/9/5632
        libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices

/var/log/xen/xenhotplug.log contains:
        xenstore-read: couldn't read path backend/vbd/9/5632/node

For both of these I'm suspicious of:
11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file

and to a lesser extent:
a508caf libxl: fix libxl__device_disk_from_xs_be to parse backend domid

It doesn't look like the bisector is looking at this, or else I'm
reading osstest's resource plan wrongly.


* Re: [xen-unstable test] 19308: regressions - FAIL
  2013-09-15 12:50 ` Ian Campbell
@ 2013-09-16  7:49   ` Roger Pau Monné
  2013-09-16  9:55     ` Wei Liu
  2013-09-16 10:43   ` Ian Jackson
  1 sibling, 1 reply; 9+ messages in thread
From: Roger Pau Monné @ 2013-09-16  7:49 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, xen-devel, xen.org

On 15/09/13 14:50, Ian Campbell wrote:
> On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
>> flight 19308 xen-unstable real [real]
>> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check  fail REGR. vs. 19208
>>  test-amd64-i386-rhel6hvm-intel 11 leak-check/check        fail REGR. vs. 19208
>>  test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
>>  test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
>>  test-amd64-i386-rhel6hvm-amd 11 leak-check/check          fail REGR. vs. 19208
> 
> These are due to /var/run/xen-hotplug/block getting leaked.
> 
>>  test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10   fail REGR. vs. 19208
>>  test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10  fail REGR. vs. 19208
>>  test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
> 
> These are:
>         libxl: error: libxl_device.c:894:device_backend_callback: unable
>         to add device with path /local/domain/0/backend/vbd/9/5632
>         libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
> 
> /var/log/xen/xenhotplug.log contains:
>         xenstore-read: couldn't read path backend/vbd/9/5632/node
> 
> For both of these I'm suspicious of:
> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file

Hello,

I've tracked this down to libxl writing a wrong physical-device
xenstore node when using regular files. With block devices, libxl can
write physical-device itself because the value can be obtained without
running the block script. With regular files that is not possible: the
block script must first attach the file to a loop device, and only then
can physical-device be read from that loop device. The following patch
solves the issue for me.

8<-------------------------------------------------------------------
From e150f00565bfe291809441e73630b243e21a52b0 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Mon, 16 Sep 2013 09:39:05 +0200
Subject: [PATCH] libxl: don't write physical-device vbd xenstore node in
 libxl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

libxl used to write the physical-device xenstore node needed by the
phy backend type, because the phy backend type could only be used with
block devices. If libxl allows the backend type phy to be used with
regular files, it can no longer write physical-device because the
hotplug script has to be executed first in order to mount the regular
file into a loop device and then write the physical-device of the loop
device used.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 tools/libxl/libxl.c |   15 ---------------
 1 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0879f23..326a378 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2101,21 +2101,6 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
                                          libxl__xen_script_dir_path());
                 flexarray_append_pair(back, "script", script);
 
-                /* If the user did not supply a block script then we
-                 * write the physical-device node ourselves.
-                 *
-                 * If the user did supply a script then that script is
-                 * responsible for this since the block device may not
-                 * exist yet.
-                 */
-                if (!disk->script &&
-                    disk->backend_domid == LIBXL_TOOLSTACK_DOMID) {
-                    int major, minor;
-                    libxl__device_physdisk_major_minor(dev, &major, &minor);
-                    flexarray_append_pair(back, "physical-device",
-                            libxl__sprintf(gc, "%x:%x", major, minor));
-                }
-
                 assert(device->backend_kind == LIBXL__DEVICE_KIND_VBD);
                 break;
 
-- 
1.7.7.5 (Apple Git-26)






* Re: [xen-unstable test] 19308: regressions - FAIL
  2013-09-16  7:49   ` Roger Pau Monné
@ 2013-09-16  9:55     ` Wei Liu
  2013-09-16 10:19       ` Roger Pau Monné
  2013-09-16 12:43       ` Ian Campbell
  0 siblings, 2 replies; 9+ messages in thread
From: Wei Liu @ 2013-09-16  9:55 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Wei Liu, xen-devel, xen.org, Ian Campbell

On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
> On 15/09/13 14:50, Ian Campbell wrote:
> > On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
> >> flight 19308 xen-unstable real [real]
> >> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
> >>
> >> Regressions :-(
> >>
> >> Tests which did not succeed and are blocking,
> >> including tests which could not be run:
> >>  test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check  fail REGR. vs. 19208
> >>  test-amd64-i386-rhel6hvm-intel 11 leak-check/check        fail REGR. vs. 19208
> >>  test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
> >>  test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
> >>  test-amd64-i386-rhel6hvm-amd 11 leak-check/check          fail REGR. vs. 19208
> > 
> > These are due to /var/run/xen-hotplug/block getting leaked.
> > 

The error message in XenStore shows that blkback tries to open block
device 0:0, but there is no such device on the system.

> >>  test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10   fail REGR. vs. 19208
> >>  test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10  fail REGR. vs. 19208
> >>  test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
> > 
> > These are:
> >         libxl: error: libxl_device.c:894:device_backend_callback: unable
> >         to add device with path /local/domain/0/backend/vbd/9/5632
> >         libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
> > 
> > /var/log/xen/xenhotplug.log contains:
> >         xenstore-read: couldn't read path backend/vbd/9/5632/node
> > 
> > For both of these I'm suspicious of:
> > 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
> 
> Hello,
> 
> I've tracked this down to libxl writing a wrong physical-device
> xenstore node when using regular files. With block devices, libxl can
> write physical-device itself because the value can be obtained without
> running the block script. With regular files that is not possible: the
> block script must first attach the file to a loop device, and only then
> can physical-device be read from that loop device. The following patch
> solves the issue for me.
>

Yes, I think that's the code in question. It was introduced in:

commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
Author: Ian Campbell <Ian.Campbell@citrix.com>
Date:   Tue Aug 7 14:26:29 2012 +0100

    libxl: write physical-device node if user did not supply a block script
    
    This reverts one of the intentional changes from 25733:353bc0801b11.
    That change exposed an issue with the xl migration protocol, which
    although safe triggers the hotplug scripts device sharing logic.
    
    For 4.2 we disable this logic by writing the physical-device xenstore
    node ourselves if a user did not supply a script. If the user did
    supply a script then we continue to rely on it to write the
    physical-device node (not least because the script may create the
    device and therefore it is not available before we run the script).
    
    This means that to support localhost migration a block hotplug script
    needs to be robust against adding a device twice and should not
    deactivate the device until it has been removed twice.
    
    This should be revisited for 4.3.
    
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
    Committed-by: Ian Campbell <ian.campbell@citrix.com>
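
The requirement in that message (survive adding a device twice, deactivate only after it has been removed twice) amounts to reference counting. A minimal sketch, written in C to match the rest of the thread even though the real hotplug scripts are shell scripts; the table and the `dev_add`/`dev_remove`/`dev_active` names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVICES 16

/* Hypothetical per-host table of devices activated by a block script. */
struct dev_entry {
    char name[64]; /* e.g. a loop device or image path (short names) */
    int refs;      /* number of outstanding "add" operations */
};

static struct dev_entry table[MAX_DEVICES];

static struct dev_entry *lookup(const char *name)
{
    for (int i = 0; i < MAX_DEVICES; i++)
        if (table[i].refs > 0 && strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* "add": activate the device on first use, otherwise just take a ref.
 * Returns true only if this call activated the device. */
bool dev_add(const char *name)
{
    struct dev_entry *e = lookup(name);
    if (e) {
        e->refs++;
        return false; /* already active: share it */
    }
    for (int i = 0; i < MAX_DEVICES; i++) {
        if (table[i].refs == 0) {
            strncpy(table[i].name, name, sizeof table[i].name - 1);
            table[i].name[sizeof table[i].name - 1] = '\0';
            table[i].refs = 1;
            return true; /* first user: set the device up here */
        }
    }
    assert(!"device table full");
    return false;
}

/* "remove": drop a ref; deactivate only when the last user goes away.
 * Returns true only if this call deactivated the device. */
bool dev_remove(const char *name)
{
    struct dev_entry *e = lookup(name);
    if (!e)
        return false;
    if (--e->refs > 0)
        return false; /* still used by the other domain */
    return true;      /* last user: tear the device down here */
}

bool dev_active(const char *name)
{
    return lookup(name) != NULL;
}
```

During a localhost migration the source and destination domains each take a reference, so tearing down the source leaves the device active for the destination.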

And in the commit message it says this behavior should be revisited.

Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
things look more complicated. One interesting snippet in the commit
message is:

- libxl should not write the "physical-device" node. This is the
  responsibility of the block script. Writing the "physical-device"
  node in libxl basically completely short-cuts the standard block
  hotplug script which uses "physical-device" to know if it has run
  already or not.
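
A simplified model of the behaviour described in that snippet, assuming the script treats any existing physical-device node as proof that it has already run. The `struct backend` stand-in for xenstore and the function name are hypothetical; this is not the real script's logic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one vbd backend directory in xenstore;
 * only the "physical-device" key matters here. */
struct backend {
    bool has_physical_device;
    char physical_device[16];
};

enum script_action { SCRIPT_SKIPPED, SCRIPT_RAN };

/* Hypothetical "have I already run?" check: if physical-device is
 * present, assume setup was done and skip it entirely. */
enum script_action block_script_add(struct backend *be,
                                    unsigned loop_major, unsigned loop_minor)
{
    assert(be != NULL);
    if (be->has_physical_device)
        return SCRIPT_SKIPPED; /* short-cut: no loop device is set up */

    /* Pretend the image was attached to a loop device; record it. */
    snprintf(be->physical_device, sizeof be->physical_device, "%x:%x",
             loop_major, loop_minor);
    be->has_physical_device = true;
    return SCRIPT_RAN;
}
```

In this model a pre-written value makes the script skip its own setup, so whatever the toolstack wrote, even 0:0, is what the backend sees.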

That makes me believe Roger's fix is the correct thing to do in the
long term.

I have to admit that I cannot fully digest the commit message of 25733
in one day, so unless you (Ian) can confirm that Roger's fix will not
cause further regressions, I would suggest reverting my change for now.

Wei.



* Re: [xen-unstable test] 19308: regressions - FAIL
  2013-09-16  9:55     ` Wei Liu
@ 2013-09-16 10:19       ` Roger Pau Monné
  2013-09-16 12:43       ` Ian Campbell
  1 sibling, 0 replies; 9+ messages in thread
From: Roger Pau Monné @ 2013-09-16 10:19 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, xen.org, Ian Campbell

On 16/09/13 11:55, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
>> On 15/09/13 14:50, Ian Campbell wrote:
>>> On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
>>>> flight 19308 xen-unstable real [real]
>>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>  test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check  fail REGR. vs. 19208
>>>>  test-amd64-i386-rhel6hvm-intel 11 leak-check/check        fail REGR. vs. 19208
>>>>  test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
>>>>  test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
>>>>  test-amd64-i386-rhel6hvm-amd 11 leak-check/check          fail REGR. vs. 19208
>>>
>>> These are due to /var/run/xen-hotplug/block getting leaked.
>>>
> 
> The error message in XenStore shows that blkback tries to open block
> device 0:0, but there is no such device on the system.
> 
>>>>  test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10   fail REGR. vs. 19208
>>>>  test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10  fail REGR. vs. 19208
>>>>  test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
>>>
>>> These are:
>>>         libxl: error: libxl_device.c:894:device_backend_callback: unable
>>>         to add device with path /local/domain/0/backend/vbd/9/5632
>>>         libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
>>>
>>> /var/log/xen/xenhotplug.log contains:
>>>         xenstore-read: couldn't read path backend/vbd/9/5632/node
>>>
>>> For both of these I'm suspicious of:
>>> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
>>
>> Hello,
>>
>> I've tracked this down to libxl writing a wrong physical-device
>> xenstore node when using regular files. With block devices, libxl can
>> write physical-device itself because the value can be obtained without
>> running the block script. With regular files that is not possible: the
>> block script must first attach the file to a loop device, and only then
>> can physical-device be read from that loop device. The following patch
>> solves the issue for me.
>>
> 
> Yes, I think that's the code in question. It was introduced in:
> 
> commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> Author: Ian Campbell <Ian.Campbell@citrix.com>
> Date:   Tue Aug 7 14:26:29 2012 +0100
> 
>     libxl: write physical-device node if user did not supply a block script
>     
>     This reverts one of the intentional changes from 25733:353bc0801b11.
>     That change exposed an issue with the xl migration protocol, which
>     although safe triggers the hotplug scripts device sharing logic.
>     
>     For 4.2 we disable this logic by writing the physical-device xenstore
>     node ourselves if a user did not supply a script. If the user did
>     supply a script then we continue to rely on it to write the
>     physical-device node (not least because the script may create the
>     device and therefore it is not available before we run the script).
>     
>     This means that to support localhost migration a block hotplug script
>     needs to be robust against adding a device twice and should not
>     deactivate the device until it has been removed twice.
>     
>     This should be revisited for 4.3.
>     
>     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>     Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
>     Committed-by: Ian Campbell <ian.campbell@citrix.com>
> 
> And in the commit message it says this behavior should be revisited.
> 
> Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> things look more complicated. One interesting snippet in the commit
> message is:
> 
> - libxl should not write the "physical-device" node. This is the
>   responsibility of the block script. Writing the "physical-device"
>   node in libxl basically completely short-cuts the standard block
>   hotplug script which uses "physical-device" to know if it has run
>   already or not.
> 
> That makes me believe Roger's fix is the correct thing to do in the
> long term.
> 
> I have to admit that I cannot fully digest the commit message of 25733
> in one day, so unless you (Ian) can confirm that Roger's fix will not
> cause further regressions, I would suggest reverting my change for now.

My fix deals with one part of the problem, but it will fail on localhost
migration (the block script will refuse to attach the same device
twice). This is indeed a tricky issue, and I cannot see an easy way to
deal with it.

The proper way to fix this would be to unplug the devices from the
suspended domain before creating the new domain, but I'm sure this is
not trivial (this would also imply reattaching the devices to the
original domain if migration fails).


* Re: [xen-unstable test] 19308: regressions - FAIL
  2013-09-15 12:50 ` Ian Campbell
  2013-09-16  7:49   ` Roger Pau Monné
@ 2013-09-16 10:43   ` Ian Jackson
  1 sibling, 0 replies; 9+ messages in thread
From: Ian Jackson @ 2013-09-16 10:43 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, xen-devel, xen.org, Roger Pau Monne

Ian Campbell writes ("Re: [Xen-devel] [xen-unstable test] 19308: regressions - FAIL"):
> For both of these I'm suspicious of:
> 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
> 
> and to a lesser extent:
> a508caf libxl: fix libxl__device_disk_from_xs_be to parse backend domid

Yes.

> It doesn't look like the bisector is looking at this, or else I'm
> reading osstest's resource plan wrongly.

The log volume had filled up with git caches. I'm pruning them.
Ideally this would be automatic, but really there should be one
cache, not one per host.

Ian.


* Re: [xen-unstable test] 19308: regressions - FAIL
  2013-09-16  9:55     ` Wei Liu
  2013-09-16 10:19       ` Roger Pau Monné
@ 2013-09-16 12:43       ` Ian Campbell
  2013-09-16 13:25         ` Wei Liu
  1 sibling, 1 reply; 9+ messages in thread
From: Ian Campbell @ 2013-09-16 12:43 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, xen.org, Roger Pau Monné

On Mon, 2013-09-16 at 10:55 +0100, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 09:49:42AM +0200, Roger Pau Monné wrote:
> > On 15/09/13 14:50, Ian Campbell wrote:
> > > On Sun, 2013-09-15 at 07:09 +0100, xen.org wrote:
> > >> flight 19308 xen-unstable real [real]
> > >> http://www.chiark.greenend.org.uk/~xensrcts/logs/19308/
> > >>
> > >> Regressions :-(
> > >>
> > >> Tests which did not succeed and are blocking,
> > >> including tests which could not be run:
> > >>  test-amd64-i386-qemuu-rhel6hvm-intel 11 leak-check/check  fail REGR. vs. 19208
> > >>  test-amd64-i386-rhel6hvm-intel 11 leak-check/check        fail REGR. vs. 19208
> > >>  test-amd64-i386-qemuu-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
> > >>  test-amd64-i386-qemut-rhel6hvm-amd 11 leak-check/check    fail REGR. vs. 19208
> > >>  test-amd64-i386-rhel6hvm-amd 11 leak-check/check          fail REGR. vs. 19208
> > > 
> > > These are due to /var/run/xen-hotplug/block getting leaked.
> > > 
> 
> The error message in XenStore shows that blkback tries to open block
> device 0:0, but there is no such device on the system.
> 
> > >>  test-amd64-i386-xl-win7-amd64 12 guest-localmigrate/x10   fail REGR. vs. 19208
> > >>  test-amd64-amd64-xl-win7-amd64 12 guest-localmigrate/x10  fail REGR. vs. 19208
> > >>  test-amd64-amd64-xl-qemut-winxpsp3 12 guest-localmigrate/x10 fail REGR. vs. 19208
> > > 
> > > These are:
> > >         libxl: error: libxl_device.c:894:device_backend_callback: unable
> > >         to add device with path /local/domain/0/backend/vbd/9/5632
> > >         libxl: error: libxl_create.c:935:domcreate_launch_dm: unable to add disk devices
> > > 
> > > /var/log/xen/xenhotplug.log contains:
> > >         xenstore-read: couldn't read path backend/vbd/9/5632/node
> > > 
> > > For both of these I'm suspicious of:
> > > 11a63a1 libxl, hotplug/Linux: default to phy backend for raw format file
> > 
> > Hello,
> > 
> > I've tracked this down to libxl writing a wrong physical-device
> > xenstore node when using regular files. With block devices, libxl can
> > write physical-device itself because the value can be obtained without
> > running the block script. With regular files that is not possible: the
> > block script must first attach the file to a loop device, and only then
> > can physical-device be read from that loop device. The following patch
> > solves the issue for me.
> >
> 
> Yes, I think that's the code in question. It was introduced in:
> 
> commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> Author: Ian Campbell <Ian.Campbell@citrix.com>
> Date:   Tue Aug 7 14:26:29 2012 +0100
> 
>     libxl: write physical-device node if user did not supply a block script
>     
>     This reverts one of the intentional changes from 25733:353bc0801b11.
>     That change exposed an issue with the xl migration protocol, which
>     although safe triggers the hotplug scripts device sharing logic.
>     
>     For 4.2 we disable this logic by writing the physical-device xenstore
>     node ourselves if a user did not supply a script. If the user did
>     supply a script then we continue to rely on it to write the
>     physical-device node (not least because the script may create the
>     device and therefore it is not available before we run the script).
>     
>     This means that to support localhost migration a block hotplug script
>     needs to be robust against adding a device twice and should not
>     deactivate the device until it has been removed twice.
>     
>     This should be revisited for 4.3.
>     
>     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>     Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
>     Committed-by: Ian Campbell <ian.campbell@citrix.com>
> 
> And in the commit message it says this behavior should be revisited.

Which never happened :-(

I don't remember exactly, but I think the real fix is to rework the
sequencing of block device attach/detach relative to the migration
stop-and-copy phase, rather than a simple tweak.
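The commit message above asks block hotplug scripts to tolerate a device being added twice and to keep it active until it has been removed twice, which is what a localhost migration does (the target adds the disk while the source still holds it). A minimal reference-counting sketch of that rule, using a hypothetical in-memory `struct backing_dev`; the real scripts are shell and keep their state elsewhere.

```c
#include <stdbool.h>

/* Hypothetical model of one backing device's state (not real script code). */
struct backing_dev {
    int users;      /* number of outstanding "add" operations */
    bool active;    /* e.g. loop device currently set up */
};

/* "add": really set the device up only on the first add;
 * later adds just take another reference. */
void dev_add(struct backing_dev *d)
{
    if (d->users++ == 0)
        d->active = true;
}

/* "remove": drop a reference and only tear the device down
 * when the last one goes away. */
void dev_remove(struct backing_dev *d)
{
    if (d->users == 0)
        return;             /* unbalanced remove: ignore */
    if (--d->users == 0)
        d->active = false;
}
```

With this rule the source domain's final remove after a localhost migration does not pull the device out from under the target.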

> Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> things look more complicated. One interesting snippet in the commit
> message is:
> 
> - libxl should not write the "physical-device" node. This is the
>   responsibility of the block script. Writing the "physical-device"
>   node in libxl basically completely short-cuts the standard block
>   hotplug script which uses "physical-device" to know if it has run
>   already or not.
> 
> That makes me believe the following fix is the correct thing to do in
> the long term.
> 
> I have to admit that I cannot fully digest the commit message of 25733
> in one day, so unless you (Ian) can confirm that Roger's fix will not
> cause further regressions, I would suggest reverting my change for now.

Can you test some lifecycle operations, in particular localhost
migrations with both phy:// and file:// devices to see if it fixes it?
If not then we can revert.

Perhaps rather than removing that block entirely it should be
conditional on S_ISBLK?
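One way to read the S_ISBLK suggestion, sketched as a hypothetical decision helper (not libxl code): libxl writes physical-device itself only when the user supplied no script and the disk path is already a block device; in every other case the node is left to the hotplug script.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: should libxl pre-write physical-device for `path`? */
bool should_libxl_write_physdev(const char *path, bool user_script)
{
    struct stat st;

    if (user_script)
        return false;              /* a user script owns the node */
    if (stat(path, &st) != 0)
        return false;              /* cannot tell: leave it to the script */
    return S_ISBLK(st.st_mode);    /* raw files need the loop setup first */
}
```

For a raw image file this returns false, so the block script runs in full, sharing check included.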

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [xen-unstable test] 19308: regressions - FAIL
  2013-09-16 12:43       ` Ian Campbell
@ 2013-09-16 13:25         ` Wei Liu
  2013-09-16 13:30           ` Ian Campbell
  0 siblings, 1 reply; 9+ messages in thread
From: Wei Liu @ 2013-09-16 13:25 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen.org, xen-devel, Wei Liu, Roger Pau Monné

On Mon, Sep 16, 2013 at 01:43:57PM +0100, Ian Campbell wrote:
[...]
> > > I've tracked this down to libxl writing a wrong physical-device
> > > xenstore node when using regular files. When using block devices,
> > > libxl can write the physical-device node because its value can be
> > > fetched without running the block script. With regular files this is
> > > not true: we must first run the block script to set up a loop device
> > > for the regular file, and only then fetch the physical-device from the
> > > loop device the image has been attached to. The following patch
> > > solves the issue for me.
> > >
> > 
> > Yes, that's the one in question, I think. That code snippet was introduced in:
> > 
> > commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> > Author: Ian Campbell <Ian.Campbell@citrix.com>
> > Date:   Tue Aug 7 14:26:29 2012 +0100
> > 
> >     libxl: write physical-device node if user did not supply a block script
> >     
> >     This reverts one of the intentional changes from 25733:353bc0801b11.
> >     That change exposed an issue with the xl migration protocol, which
> >     although safe triggers the hotplug scripts device sharing logic.
> >     
> >     For 4.2 we disable this logic by writing the physical-device xenstore
> >     node ourselves if a user did not supply a script. If the user did
> >     supply a script then we continue to rely on it to write the
> >     physical-device node (not least because the script may create the
> >     device and therefore it is not available before we run the script).
> >     
> >     This means that to support localhost migration a block hotplug script
> >     needs to be robust against adding a device twice and should not
> >     deactivate the device until it has been removed twice.
> >     
> >     This should be revisited for 4.3.
> >     
> >     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> >     Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> >     Committed-by: Ian Campbell <ian.campbell@citrix.com>
> > 
> > And in the commit message it says this behavior should be revisited.
> 
> Which never happened :-(
> 
> I don't remember exactly, but I think the real fix is to rework the
> sequencing of block device attach/detach relative to the migration
> stop-and-copy phase, rather than a simple tweak.
> 
> > Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> > things look more complicated. One interesting snippet in the commit
> > message is:
> > 
> > - libxl should not write the "physical-device" node. This is the
> >   responsibility of the block script. Writing the "physical-device"
> >   node in libxl basically completely short-cuts the standard block
> >   hotplug script which uses "physical-device" to know if it has run
> >   already or not.
> > 
> > That makes me believe the following fix is the correct thing to do in
> > the long term.
> > 
> > I have to admit that I cannot fully digest the commit message of 25733
> > in one day, so unless you (Ian) can confirm that Roger's fix will not
> > cause further regressions, I would suggest reverting my change for now.
> 
> Can you test some lifecycle operations, in particular localhost
> migrations with both phy:// and file:// devices to see if it fixes it?
> If not then we can revert.
> 

Unfortunately, with Roger's patch applied, local migration of a guest
with a raw-format file disk doesn't work:

xc: detail: Save exit of domid 69 with rc=0
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block add [8102] exited with error status 1
libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: File /data/s0.raw is loopback-mounted through /dev/loop0,
which is mounted in a guest domain,
and so cannot be mounted now.
libxl: error: libxl_create.c:932:domcreate_launch_dm: unable to add disk devices
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [8181] exited with error status 1
libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [8091] exited with error status 3
Migration failed, resuming at sender.
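The script error above comes from the block script's sharing check: during a localhost migration the source domain (domid 69 in the log) still holds /data/s0.raw through /dev/loop0 when the target domain's disk is added, so the second add is refused. A toy model of that refusal, assuming a single owner per image and ignoring access modes and anything else the real script may check; `struct image`, `image_attach` and `image_detach` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of one disk image and the domain using it. */
struct image {
    const char *path;
    int owner;      /* domid holding the image, or -1 if free */
};

/* Refuse the attach if another guest domain already uses the image,
 * mirroring "...is mounted in a guest domain, and so cannot be mounted now". */
bool image_attach(struct image *img, int domid)
{
    if (img->owner != -1 && img->owner != domid)
        return false;
    img->owner = domid;
    return true;
}

/* Release the image if `domid` is the current owner. */
void image_detach(struct image *img, int domid)
{
    if (img->owner == domid)
        img->owner = -1;
}
```

The target's attach fails while the source still owns the image and succeeds only after the source detaches, which is the attach/detach ordering problem Ian describes.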

> Perhaps rather than removing that block entirely it should be
> conditional on S_ISBLK?
> 

With that block made conditional on S_ISBLK, local migration of a
raw-format file attached through a loop device still breaks with the
error above.

So for now please revert that change.

Wei.

> Ian.


* Re: [xen-unstable test] 19308: regressions - FAIL
  2013-09-16 13:25         ` Wei Liu
@ 2013-09-16 13:30           ` Ian Campbell
  0 siblings, 0 replies; 9+ messages in thread
From: Ian Campbell @ 2013-09-16 13:30 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, xen.org, Roger Pau Monné

On Mon, 2013-09-16 at 14:25 +0100, Wei Liu wrote:
> On Mon, Sep 16, 2013 at 01:43:57PM +0100, Ian Campbell wrote:
> [...]
> > > > I've tracked this down to libxl writing a wrong physical-device
> > > > xenstore node when using regular files. When using block devices,
> > > > libxl can write the physical-device node because its value can be
> > > > fetched without running the block script. With regular files this is
> > > > not true: we must first run the block script to set up a loop device
> > > > for the regular file, and only then fetch the physical-device from the
> > > > loop device the image has been attached to. The following patch
> > > > solves the issue for me.
> > > >
> > > 
> > > Yes, that's the one in question, I think. That code snippet was introduced in:
> > > 
> > > commit 15116f1c254a8aa7774e2f73a3e1340a6decd867
> > > Author: Ian Campbell <Ian.Campbell@citrix.com>
> > > Date:   Tue Aug 7 14:26:29 2012 +0100
> > > 
> > >     libxl: write physical-device node if user did not supply a block script
> > >     
> > >     This reverts one of the intentional changes from 25733:353bc0801b11.
> > >     That change exposed an issue with the xl migration protocol, which
> > >     although safe triggers the hotplug scripts device sharing logic.
> > >     
> > >     For 4.2 we disable this logic by writing the physical-device xenstore
> > >     node ourselves if a user did not supply a script. If the user did
> > >     supply a script then we continue to rely on it to write the
> > >     physical-device node (not least because the script may create the
> > >     device and therefore it is not available before we run the script).
> > >     
> > >     This means that to support localhost migration a block hotplug script
> > >     needs to be robust against adding a device twice and should not
> > >     deactivate the device until it has been removed twice.
> > >     
> > >     This should be revisited for 4.3.
> > >     
> > >     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > >     Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> > >     Committed-by: Ian Campbell <ian.campbell@citrix.com>
> > > 
> > > And in the commit message it says this behavior should be revisited.
> > 
> > Which never happened :-(
> > 
> > I don't remember exactly, but I think the real fix is to rework the
> > sequencing of block device attach/detach relative to the migration
> > stop-and-copy phase, rather than a simple tweak.
> > 
> > > Tracing back to 25733 (http://xenbits.xen.org/hg/xen-unstable.hg/rev/353bc0801b11)
> > > things look more complicated. One interesting snippet in the commit
> > > message is:
> > > 
> > > - libxl should not write the "physical-device" node. This is the
> > >   responsibility of the block script. Writing the "physical-device"
> > >   node in libxl basically completely short-cuts the standard block
> > >   hotplug script which uses "physical-device" to know if it has run
> > >   already or not.
> > > 
> > > That makes me believe the following fix is the correct thing to do in
> > > the long term.
> > > 
> > > I have to admit that I cannot fully digest the commit message of 25733
> > > in one day, so unless you (Ian) can confirm that Roger's fix will not
> > > cause further regressions, I would suggest reverting my change for now.
> > 
> > Can you test some lifecycle operations, in particular localhost
> > migrations with both phy:// and file:// devices to see if it fixes it?
> > If not then we can revert.
> > 
> 
> Unfortunately, with Roger's patch applied, local migration of a guest
> with a raw-format file disk doesn't work:
> 
> xc: detail: Save exit of domid 69 with rc=0
> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block add [8102] exited with error status 1
> libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: File /data/s0.raw is loopback-mounted through /dev/loop0,
> which is mounted in a guest domain,
> and so cannot be mounted now.
> libxl: error: libxl_create.c:932:domcreate_launch_dm: unable to add disk devices
> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [8181] exited with error status 1
> libxl: error: libxl_device.c:1021:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
> migration target: Domain creation failed (code -3).
> libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
> libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [8091] exited with error status 3
> Migration failed, resuming at sender.
> 
> > Perhaps rather than removing that block entirely it should be
> > conditional on S_ISBLK?
> > 
> 
> With that block made conditional on S_ISBLK, local migration of a
> raw-format file attached through a loop device still breaks with the
> error above.
> 
> So for now please revert that change.

Done.

Ian.


end of thread, other threads:[~2013-09-16 13:30 UTC | newest]

Thread overview: 9+ messages
2013-09-15  6:09 [xen-unstable test] 19308: regressions - FAIL xen.org
2013-09-15 12:50 ` Ian Campbell
2013-09-16  7:49   ` Roger Pau Monné
2013-09-16  9:55     ` Wei Liu
2013-09-16 10:19       ` Roger Pau Monné
2013-09-16 12:43       ` Ian Campbell
2013-09-16 13:25         ` Wei Liu
2013-09-16 13:30           ` Ian Campbell
2013-09-16 10:43   ` Ian Jackson
