xen-devel.lists.xenproject.org archive mirror
* segfault in xl create for HVM with PCI passthrough
@ 2014-10-27 21:25 Atom2
  2014-10-28 10:59 ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-10-27 21:25 UTC (permalink / raw)
  To: xen-devel

Hi guys,
I have used XEN for quite some time and after a steep learning curve I 
have always been a very happy user! XEN is really a great product. 
Unfortunately I am now facing a problem that leaves me at a loss:

Using Gentoo as a rolling distribution, I recently upgraded to XEN 
4.3.3 (from 4.3.1) and also upgraded the gcc compiler to 4.8.3 (from 
4.7.3). Both packages are the latest stable versions available under Gentoo.

After emerging (that is the re-compilation and installation of XEN 4.3.3 
on my machine) following a toolchain upgrade to the new gcc I can't 
start my two HVM FreeBSD virtual machines anymore. Both use PCI 
passthrough devices and both the motherboard and the processor support 
VT-d. XEN PV Gentoo domUs (without passed-through PCI devices) still 
start up (but are useless for me at the moment as they depend on the 
services provided by the two HVM domUs).

The error when starting manifests itself as follows:
# xl create -c pfsense
Parsing config from 01:pfsense.1
xc: info: VIRTUAL MEMORY ARRANGEMENT:
   Loader:        0000000000100000->00000000001c12a4
   Modules:       0000000000000000->0000000000000000
   TOTAL:         0000000000000000->000000001f800000
   ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
   4KB PAGES: 0x0000000000000200
   2MB PAGES: 0x00000000000000fb
   1GB PAGES: 0x0000000000000000
Segmentation fault
#
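
The page counts printed by xc can be cross-checked against the TOTAL range. The following is only a minimal sketch in Python (the names PAGE_COUNTS and total_bytes are made up for illustration); it assumes that the three counts are hexadecimal numbers of pages of the stated size and that TOTAL starts at address 0, as shown in the output above.

```python
# Page counts copied from the "PHYSICAL MEMORY ALLOCATION" lines above.
# Keys are page sizes in bytes, values are the reported page counts.
PAGE_COUNTS = {
    4 * 1024: 0x200,            # 4KB pages
    2 * 1024 * 1024: 0xFB,      # 2MB pages
    1024 * 1024 * 1024: 0x0,    # 1GB pages
}


def total_bytes(page_counts):
    """Sum page_size * page_count over all page sizes."""
    return sum(size * count for size, count in page_counts.items())


total = total_bytes(PAGE_COUNTS)
print(hex(total))                      # compare with the end of the TOTAL range
print(total // (1024 * 1024), "MiB")   # compare with "memory" in the config
```

The sum comes to 0x1f800000 bytes (504 MiB), which matches the end of the TOTAL range, so the guest memory itself appears to have been populated before xl crashed.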

The domU is in the paused state for reasons unknown to me and does not 
use any CPU cycles:
# xl list
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                           0  4094     8     r-----      41.7
pfsense                            1   512     1     --p---       0.0
#

The expected output is actually the boot menu from FreeBSD (I do use a 
serial console in FreeBSD and that worked for months without any hiccup 
before the recent update). It also never entered a paused state ...

/var/log/messages shows the following line related to the segfault:
Oct 27 20:16:22 vm-host kernel: [  458.354314] xl[2906]: segfault at 
7fbc56b93eb0 ip 00007fbc54430b64 sp 00007fbc56b93eb0 error 6 in 
libgcc_s.so.1[7fbc54422000+16000]

If I destroy the paused domU, xl segfaults as well:
# xl destroy pfsense
Segmentation fault
#

An error similar to the one logged at start-up is again written to 
/var/log/messages:
Oct 27 22:06:59 vm-host kernel: [ 7095.794688] xl[3218]: segfault at 
7f22ced42eb0 ip 00007f22cc5cfb64 sp 00007f22ced42eb0 error 6 in 
libgcc_s.so.1[7f22cc5c1000+16000]
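
The two kernel lines can be decoded mechanically. What follows is a rough Python sketch, not an authoritative tool: the function name decode and the dictionary keys are invented here, the log lines are the ones quoted above with the mail wrapping undone, and the interpretation of the error code assumes the usual x86 page-fault bits (bit 0 = page present, bit 1 = write access, bit 2 = user mode).

```python
import re

# Kernel log lines from this message, with the e-mail line wrapping undone.
LINES = [
    "xl[2906]: segfault at 7fbc56b93eb0 ip 00007fbc54430b64 sp 00007fbc56b93eb0 "
    "error 6 in libgcc_s.so.1[7fbc54422000+16000]",
    "xl[3218]: segfault at 7f22ced42eb0 ip 00007f22cc5cfb64 sp 00007f22ced42eb0 "
    "error 6 in libgcc_s.so.1[7f22cc5c1000+16000]",
]

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode(line):
    """Extract the interesting facts from one kernel segfault line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line: %r" % line)
    addr, ip, sp, base = (int(m[k], 16) for k in ("addr", "ip", "sp", "base"))
    err = int(m["err"])
    return {
        "lib": m["lib"],
        "offset": ip - base,             # instruction offset inside the library mapping
        "fault_at_sp": addr == sp,       # did the access hit the stack pointer itself?
        "page_present": bool(err & 1),   # bit 0: page was present
        "write": bool(err & 2),          # bit 1: access was a write
        "user_mode": bool(err & 4),      # bit 2: fault happened in user mode
    }


for line in LINES:
    print(decode(line))
```

Under these assumptions both crashes are at the same offset (0xeb64) inside libgcc_s.so.1, and in both cases the faulting address equals the stack pointer and the access is a user-mode write to a page that is not mapped. That would be consistent with a thread running out of stack, but it is only a hint and not a diagnosis.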

The pfsense config file is pretty simple/basic, nothing fancy in there:
builder       = 'hvm'
cpus          = '2-7'
vcpus         = 1
cpu_weight    = 512
memory        = 512
name          = 'pfsense'
disk          = [ 'phy:/etc/xen/guests/disk.d/pfsense.disk,sda,w' ]
vif           = [ 'mac=00:16:3e:a1:64:01,bridge=xenbr0,model=e1000' ]
on_poweroff  = 'destroy'
on_reboot    = 'restart'
on_crash     = 'restart'
localtime    = 0
boot         = 'c'
vnc          = 0
nographic    = 1
serial       = 'pty'
nx           = 1
pci          = [ '04:00.0', '0a:08.0', '0a:0b.0' ]
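
For reference, each entry in the pci list is a bus:device.function (BDF) address in hexadecimal. Below is a small Python sketch of how such an entry could be split up; parse_bdf and sysfs_path are hypothetical helper names, and the sketch only handles the plain BB:DD.F form (optionally prefixed by a four-digit domain), not any per-device options that xl may accept after a comma.

```python
import re

# [DDDD:]BB:DD.F with hexadecimal fields; the PCI domain defaults to 0000.
BDF = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<dev>[0-9a-fA-F]{2})\.(?P<func>[0-7])$"
)


def parse_bdf(spec):
    """Return (domain, bus, device, function) as integers."""
    m = BDF.match(spec)
    if m is None:
        raise ValueError("not a plain BDF address: %r" % spec)
    domain = int(m["domain"] or "0", 16)
    bus = int(m["bus"], 16)
    dev = int(m["dev"], 16)
    func = int(m["func"], 16)
    if dev > 0x1F:
        raise ValueError("PCI device number out of range: %r" % spec)
    return domain, bus, dev, func


def sysfs_path(spec):
    """Path under which Linux exposes the device in sysfs."""
    domain, bus, dev, func = parse_bdf(spec)
    return "/sys/bus/pci/devices/%04x:%02x:%02x.%x" % (domain, bus, dev, func)


for entry in ["04:00.0", "0a:08.0", "0a:0b.0"]:
    print(entry, parse_bdf(entry), sysfs_path(entry))
```

The sysfs path can be handy in dom0 for checking which driver a device is currently bound to before it is passed through.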


In order to rule out any inconsistency after the gcc update I have today 
also emerged the complete world set (including kernel re-compilation) - 
unfortunately to no avail: The error persists. Other than this xl/XEN 
problem the machine operates without any issues.

I'd very much appreciate it if somebody could shed some light on this 
issue. In case you require any more information, I am more than happy to 
provide it.

Many thanks in advance and best regards

Atom2


* Re: segfault in xl create for HVM with PCI passthrough
  2014-10-27 21:25 segfault in xl create for HVM with PCI passthrough Atom2
@ 2014-10-28 10:59 ` Ian Campbell
  2014-10-28 15:39   ` Atom2
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-10-28 10:59 UTC (permalink / raw)
  To: Atom2; +Cc: xen-devel

On Mon, 2014-10-27 at 22:25 +0100, Atom2 wrote:
> Hi guys,
> I have used XEN for quite some time and after a steep learning curve I 
> have always been a very happy user! XEN is really a great product. 
> Unfortunately I am now facing a problem that leaves me at a loss:
> 
> Using Gentoo as a rolling distribution, I recently upgraded to XEN 
> 4.3.3 (from 4.3.1) and also upgraded the gcc compiler to 4.8.3 (from 
> 4.7.3). Both packages are the latest stable versions available under Gentoo.
> 
> After emerging (that is the re-compilation and installation of XEN 4.3.3 
> on my machine) following a toolchain upgrade to the new gcc I can't 
> start my two HVM FreeBSD virtual machines anymore. Both use PCI 
> passthrough devices and both the motherboard and the processor support 
> VT-d. XEN PV Gentoo domUs (without passed-through PCI devices) still 
> start up (but are useless for me at the moment as they depend on the 
> services provided by the two HVM domUs).
> 
> The error when starting manifests itself as follows:
> # xl create -c pfsense
> Parsing config from 01:pfsense.1
> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>    Loader:        0000000000100000->00000000001c12a4
>    Modules:       0000000000000000->0000000000000000
>    TOTAL:         0000000000000000->000000001f800000
>    ENTRY ADDRESS: 0000000000100000
> xc: info: PHYSICAL MEMORY ALLOCATION:
>    4KB PAGES: 0x0000000000000200
>    2MB PAGES: 0x00000000000000fb
>    1GB PAGES: 0x0000000000000000
> Segmentation fault
> #
> 
> The domU is in the paused state for reasons unknown to me and does not 
> use any CPU cycles:

Domains are created paused and then unpaused at the end of the creation
process, presumably this didn't happen because xl segfaulted first.
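
A leftover paused domain like this can be spotted from the "xl list" output. The snippet below is just an illustrative Python sketch (domain_states and STATE_FLAGS are names invented here); it assumes the six-character State column uses the letters r, b, p, s, c and d for running, blocked, paused, shutdown, crashed and dying, and that domain names contain no whitespace.

```python
# Meaning of the letters in the State column of "xl list" (assumed mapping).
STATE_FLAGS = {
    "r": "running",
    "b": "blocked",
    "p": "paused",
    "s": "shutdown",
    "c": "crashed",
    "d": "dying",
}

# Output copied from the original report.
XL_LIST = """\
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                           0  4094     8     r-----      41.7
pfsense                            1   512     1     --p---       0.0
"""


def domain_states(text):
    """Map each domain name to the list of state flags that are set."""
    states = {}
    for line in text.splitlines()[1:]:        # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue                          # ignore blank or malformed lines
        name, flags = fields[0], fields[4]
        states[name] = [STATE_FLAGS.get(ch, ch) for ch in flags if ch != "-"]
    return states


print(domain_states(XL_LIST))
```

A domain that reports only "paused" with 0.0 seconds of CPU time right after a failed "xl create" fits the explanation above: it was built but never unpaused.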

Please can you run the command under gdb and grab a back trace. It would
also be useful to "xl -vvv create pfsense".
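
One possible way of collecting such a backtrace without typing into an interactive gdb session is sketched below in Python. This is only a suggestion: gdb_backtrace_argv is a made-up helper, it assumes gdb is installed in dom0 and that xl was built with debug symbols, and it only builds and prints the command line rather than running it.

```python
import shlex


def gdb_backtrace_argv(program_argv):
    """Build a gdb command line that runs a program and prints backtraces.

    -batch makes gdb exit when its commands are done, each -ex gives one
    gdb command, and --args passes the remaining words to the program.
    """
    return [
        "gdb", "-batch",
        "-ex", "run",
        "-ex", "thread apply all bt",   # backtrace of every thread after the crash
        "--args", *program_argv,
    ]


argv = gdb_backtrace_argv(["xl", "-vvv", "create", "pfsense"])
print(shlex.join(argv))   # paste the printed command into a root shell
```

Using "thread apply all bt" rather than plain "bt" is a deliberate choice here, since xl is multi-threaded and the interesting frames may not be in the main thread.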

[...]
> pci          = [ '04:00.0', '0a:08.0', '0a:0b.0' ]

You say in $subject that the failure is with PCI, is that because you've
tried an HVM domain without and it is ok, or is it just that all your
HVM domains happen to have passthrough enabled?

Ian.


* Re: segfault in xl create for HVM with PCI passthrough
  2014-10-28 10:59 ` Ian Campbell
@ 2014-10-28 15:39   ` Atom2
  2014-10-28 16:04     ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-10-28 15:39 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 5874 bytes --]

Hi Ian,
thanks for your quick reply - please see below.
On 28.10.14 at 11:59, Ian Campbell wrote:
> On Mon, 2014-10-27 at 22:25 +0100, Atom2 wrote:
>> Hi guys,
>> I have used XEN for quite some time and after a steep learning curve I
>> have always been a very happy user! XEN is really a great product.
>> Unfortunately I am now facing a problem that leaves me at a loss:
>>
>> Using Gentoo as a rolling distribution, I recently upgraded to XEN
>> 4.3.3 (from 4.3.1) and also upgraded the gcc compiler to 4.8.3 (from
>> 4.7.3). Both packages are the latest stable versions available under Gentoo.
>>
>> After emerging (that is the re-compilation and installation of XEN 4.3.3
>> on my machine) following a toolchain upgrade to the new gcc I can't
>> start my two HVM FreeBSD virtual machines anymore. Both use PCI
>> passthrough devices and both the motherboard and the processor support
>> VT-d. XEN PV Gentoo domUs (without passed-through PCI devices) still
>> start up (but are useless for me at the moment as they depend on the
>> services provided by the two HVM domUs).
>>
>> The error when starting manifests itself as follows:
>> # xl create -c pfsense
>> Parsing config from 01:pfsense.1
>> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>>     Loader:        0000000000100000->00000000001c12a4
>>     Modules:       0000000000000000->0000000000000000
>>     TOTAL:         0000000000000000->000000001f800000
>>     ENTRY ADDRESS: 0000000000100000
>> xc: info: PHYSICAL MEMORY ALLOCATION:
>>     4KB PAGES: 0x0000000000000200
>>     2MB PAGES: 0x00000000000000fb
>>     1GB PAGES: 0x0000000000000000
>> Segmentation fault
>> #
>>
>> The domU is in the paused state for reasons unknown to me and does not
>> use any CPU cycles:
>
> Domains are created paused and then unpaused at the end of the creation
> process, presumably this didn't happen because xl segfaulted first.
I was not aware of that as this pausing/unpausing happens within a very 
short period of time and was never visible to me. But that at least 
explains why the domain is paused ... I again learned something new.
>
> Please can you run the command under gdb and grab a back trace. It would
> also be useful to "xl -vvv create pfsense".
>
First of all, attached please find the output of xl -vvv create pfsense. 
I decided to attach a file as most of the output lines are longer than 
80 chars and therefore would most likely be folded by email clients.
Judging by the last message before the segfault in my attached file, it 
seems to me that the bridge was set up correctly, as per the 
following commands:

# brctl show xenbr0
bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.00187d1d7274       no              bond0
                                                         vif2.0
                                                         vif2.0-emu
# ifconfig
<snip>
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
         inet 127.0.0.1  netmask 255.0.0.0
         inet6 ::1  prefixlen 128  scopeid 0x10<host>
         loop  txqueuelen 0  (Local Loopback)
         RX packets 118  bytes 11408 (11.1 KiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 118  bytes 11408 (11.1 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vif2.0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether fe:ff:ff:ff:ff:ff  txqueuelen 32  (Ethernet)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vif2.0-emu: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet6 fe80::fcff:ffff:feff:ffff  prefixlen 64  scopeid 0x20<link>
         ether fe:ff:ff:ff:ff:ff  txqueuelen 500  (Ethernet)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 598 overruns 0  carrier 0  collisions 0

xenbr0: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 1500
         inet 192.168.19.2  netmask 255.255.255.0  broadcast 192.168.19.255
         inet6 fe80::218:7dff:fe1d:7274  prefixlen 64  scopeid 0x20<link>
         ether 00:18:7d:1d:72:74  txqueuelen 0  (Ethernet)
         RX packets 58364  bytes 16721913 (15.9 MiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 13224  bytes 3090681 (2.9 MiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


With regard to gdb: I can certainly run the command under gdb after 
adding debug support to the executables - that's no big deal.
I would, however, ask for your advice as to what I need to recompile 
with debugger support? Is xen-tools (which includes xl) sufficient or 
would you think that I also need to include debug support for gcc as the 
library that is mentioned in /var/log/messages (libgcc_s.so.1) seems to 
belong to the gcc package? Or is this library a red herring that just 
works as the catch-all code getting and finally handling the segfault? 
Please advise. Tx.
> [...]
>> pci          = [ '04:00.0', '0a:08.0', '0a:0b.0' ]
>
> You say in $subject that the failure is with PCI, is that because you've
> tried an HVM domain without and it is ok, or is it just that all your
> HVM domains happen to have passthrough enabled?
I haven't tried HVM domains without PCI passthrough so far (only PV domains 
w/o PCI passthrough, and they did not segfault), as all my HVM domains 
require PCI devices (either at least a network card for pfsense - in 
actual fact more than one is being passed through - or a SATA 
controller for my second HVM, which is used as a storage VM).

If you think that after the gdb stuff it would still be beneficial to go 
down that route, I am sure I can come up with something.
>
> Ian.
>
Again many thanks Atom2

[-- Attachment #2: output --]
[-- Type: text/plain, Size: 7633 bytes --]

Parsing config from pfsense
libxl: debug: libxl_create.c:1243:do_domain_create: ao 0x7fdcfc226ce0: create: how=(nil) callback=(nil) poller=0x7fdcfc227690
libxl: debug: libxl_device.c:257:libxl__device_disk_set_backend: Disk vdev=sda spec.backend=unknown
libxl: debug: libxl_device.c:296:libxl__device_disk_set_backend: Disk vdev=sda, using backend phy
libxl: debug: libxl_create.c:699:initiate_domain_create: running bootloader
libxl: debug: libxl_bootloader.c:321:libxl__bootloader_run: not a PV domain, skipping bootloader
libxl: debug: libxl_event.c:608:libxl__ev_xswatch_deregister: watch w=0x7fdcfc228098: deregister unregistered
xc: detail: elf_parse_binary: phdr: paddr=0x100000 memsz=0xc12a4
xc: detail: elf_parse_binary: memory: 0x100000 -> 0x1c12a4
xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->00000000001c12a4
  Modules:       0000000000000000->0000000000000000
  TOTAL:         0000000000000000->000000001f800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x00000000000000fb
  1GB PAGES: 0x0000000000000000
xc: detail: elf_load_binary: phdr 0 at 0x7fdcfbe26000 -> 0x7fdcfbede12d
libxl: debug: libxl_device.c:257:libxl__device_disk_set_backend: Disk vdev=sda spec.backend=phy
libxl: debug: libxl_event.c:559:libxl__ev_xswatch_register: watch w=0x7fdcfc227398 wpath=/local/domain/0/backend/vbd/2/2048/state token=3/0: register slotnum=3
libxl: debug: libxl_create.c:1256:do_domain_create: ao 0x7fdcfc226ce0: inprogress: poller=0x7fdcfc227690, flags=i
libxl: debug: libxl_event.c:503:watchfd_callback: watch w=0x7fdcfc227398 wpath=/local/domain/0/backend/vbd/2/2048/state token=3/0: event epath=/local/domain/0/backend/vbd/2/2048/state
libxl: debug: libxl_event.c:647:devstate_watch_callback: backend /local/domain/0/backend/vbd/2/2048/state wanted state 2 still waiting state 1
libxl: debug: libxl_event.c:503:watchfd_callback: watch w=0x7fdcfc227398 wpath=/local/domain/0/backend/vbd/2/2048/state token=3/0: event epath=/local/domain/0/backend/vbd/2/2048/state
libxl: debug: libxl_event.c:643:devstate_watch_callback: backend /local/domain/0/backend/vbd/2/2048/state wanted state 2 ok
libxl: debug: libxl_event.c:596:libxl__ev_xswatch_deregister: watch w=0x7fdcfc227398 wpath=/local/domain/0/backend/vbd/2/2048/state token=3/0: deregister slotnum=3
libxl: debug: libxl_event.c:608:libxl__ev_xswatch_deregister: watch w=0x7fdcfc227398: deregister unregistered
libxl: debug: libxl_device.c:959:device_hotplug: calling hotplug script: /etc/xen/scripts/block add
libxl: debug: libxl_dm.c:1211:libxl__spawn_local_dm: Spawning device-model /usr/lib/xen/bin/qemu-system-i386 with arguments:
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   /usr/lib/xen/bin/qemu-system-i386
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -xen-domid
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   2
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -chardev
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-2,server,nowait
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -mon
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   chardev=libxl-cmd,mode=control
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -name
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   pfsense
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -global
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   isa-fdc.driveA=
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -serial
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   pty
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -nographic
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -vga
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   cirrus
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -global
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   vga.vram_size_mb=8
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -boot
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   order=c
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -device
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   e1000,id=nic0,netdev=net0,mac=00:16:3e:a1:64:01
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -netdev
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   type=tap,id=net0,ifname=vif2.0-emu,script=no,downscript=no
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -M
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   xenfv
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -m
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   504
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   -drive
libxl: debug: libxl_dm.c:1213:libxl__spawn_local_dm:   file=/etc/xen/guests/disk.d/pfsense.disk,if=scsi,bus=0,unit=0,format=raw,cache=writeback
libxl: debug: libxl_event.c:559:libxl__ev_xswatch_register: watch w=0x7fdcfc2282d0 wpath=/local/domain/0/device-model/2/state token=3/1: register slotnum=3
libxl: debug: libxl_event.c:503:watchfd_callback: watch w=0x7fdcfc2282d0 wpath=/local/domain/0/device-model/2/state token=3/1: event epath=/local/domain/0/device-model/2/state
libxl: debug: libxl_event.c:503:watchfd_callback: watch w=0x7fdcfc2282d0 wpath=/local/domain/0/device-model/2/state token=3/1: event epath=/local/domain/0/device-model/2/state
libxl: debug: libxl_event.c:596:libxl__ev_xswatch_deregister: watch w=0x7fdcfc2282d0 wpath=/local/domain/0/device-model/2/state token=3/1: deregister slotnum=3
libxl: debug: libxl_event.c:608:libxl__ev_xswatch_deregister: watch w=0x7fdcfc2282d0: deregister unregistered
libxl: debug: libxl_qmp.c:707:libxl__qmp_initialize: connected to /var/run/xen/qmp-libxl-2
libxl: debug: libxl_qmp.c:299:qmp_handle_response: message type: qmp
libxl: debug: libxl_qmp.c:555:qmp_send_prepare: next qmp command: '{
    "execute": "qmp_capabilities",
    "id": 1
}
'
libxl: debug: libxl_qmp.c:299:qmp_handle_response: message type: return
libxl: debug: libxl_qmp.c:555:qmp_send_prepare: next qmp command: '{
    "execute": "query-chardev",
    "id": 2
}
'
libxl: debug: libxl_qmp.c:299:qmp_handle_response: message type: return
libxl: debug: libxl_qmp.c:555:qmp_send_prepare: next qmp command: '{
    "execute": "query-vnc",
    "id": 3
}
'
libxl: debug: libxl_qmp.c:299:qmp_handle_response: message type: return
libxl: debug: libxl_event.c:559:libxl__ev_xswatch_register: watch w=0x7fdcfc22b8f8 wpath=/local/domain/0/backend/vif/2/0/state token=3/2: register slotnum=3
libxl: debug: libxl_event.c:503:watchfd_callback: watch w=0x7fdcfc22b8f8 wpath=/local/domain/0/backend/vif/2/0/state token=3/2: event epath=/local/domain/0/backend/vif/2/0/state
libxl: debug: libxl_event.c:647:devstate_watch_callback: backend /local/domain/0/backend/vif/2/0/state wanted state 2 still waiting state 1
libxl: debug: libxl_event.c:503:watchfd_callback: watch w=0x7fdcfc22b8f8 wpath=/local/domain/0/backend/vif/2/0/state token=3/2: event epath=/local/domain/0/backend/vif/2/0/state
libxl: debug: libxl_event.c:643:devstate_watch_callback: backend /local/domain/0/backend/vif/2/0/state wanted state 2 ok
libxl: debug: libxl_event.c:596:libxl__ev_xswatch_deregister: watch w=0x7fdcfc22b8f8 wpath=/local/domain/0/backend/vif/2/0/state token=3/2: deregister slotnum=3
libxl: debug: libxl_event.c:608:libxl__ev_xswatch_deregister: watch w=0x7fdcfc22b8f8: deregister unregistered
libxl: debug: libxl_device.c:959:device_hotplug: calling hotplug script: /etc/xen/scripts/vif-bridge online
libxl: debug: libxl_device.c:959:device_hotplug: calling hotplug script: /etc/xen/scripts/vif-bridge add
Segmentation fault
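
The "wanted state 2 still waiting state 1" lines in this log refer to xenbus device states. The following Python sketch decodes them; it is illustrative only (describe and XENBUS_STATES are names chosen here), and it assumes the standard xenbus state numbering where 1 is Initialising and 2 is InitWait.

```python
import re

# Assumed xenbus state numbering (XenbusState enum without its prefix).
XENBUS_STATES = {
    0: "Unknown",
    1: "Initialising",
    2: "InitWait",
    3: "Initialised",
    4: "Connected",
    5: "Closing",
    6: "Closed",
    7: "Reconfiguring",
    8: "Reconfigured",
}

WAIT = re.compile(
    r"backend (?P<path>\S+) wanted state (?P<want>\d+) "
    r"(?:still waiting state (?P<cur>\d+)|ok)"
)


def describe(line):
    """Return (backend path, current state, wanted state) or None."""
    m = WAIT.search(line)
    if m is None:
        return None
    want = int(m["want"])
    # "ok" means the backend has reached the wanted state.
    cur = int(m["cur"]) if m["cur"] is not None else want
    return m["path"], XENBUS_STATES[cur], XENBUS_STATES[want]


# Two lines taken from the log above.
SAMPLE = [
    "libxl: debug: libxl_event.c:647:devstate_watch_callback: backend "
    "/local/domain/0/backend/vif/2/0/state wanted state 2 still waiting state 1",
    "libxl: debug: libxl_event.c:643:devstate_watch_callback: backend "
    "/local/domain/0/backend/vif/2/0/state wanted state 2 ok",
]

for line in SAMPLE:
    print(describe(line))
```

Read this way, both the vbd and the vif backend reach the state libxl is waiting for, so the last thing the log shows before the crash is the vif-bridge hotplug script being called.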

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: segfault in xl create for HVM with PCI passthrough
  2014-10-28 15:39   ` Atom2
@ 2014-10-28 16:04     ` Ian Campbell
  2014-10-29  0:26       ` Atom2
  2014-11-09 23:03       ` Atom2
  0 siblings, 2 replies; 25+ messages in thread
From: Ian Campbell @ 2014-10-28 16:04 UTC (permalink / raw)
  To: Atom2; +Cc: xen-devel

On Tue, 2014-10-28 at 16:39 +0100, Atom2 wrote:
> > Please can you run the command under gdb and grab a back trace. It would
> > also be useful to "xl -vvv create pfsense".
> >
> First of all, attached please find the output of xl -vvv create pfsense. 
> I decided to attach a file as most of the output lines are longer than 
> 80 chars and therefore would most likely be folded by email clients.

Thanks.

> Judging by the last message before the segfault in my attached file, it 
> seems to me that the bridge was set up correctly, as per the 
> following commands:

Agreed, it seems the crash happens after that sometime.

> With regard to gdb: I can certainly run the command under gdb after 
> adding debug support to the executables - that's no big deal.
> I would, however, ask for your advice as to what I need to recompile 
> with debugger support? Is xen-tools (which includes xl) sufficient

I think just the Xen bits would be sufficient, at least to start with.

>  or 
> would you think that I also need to include debug support for gcc as the 
> library that is mentioned in /var/log/messages (libgcc_s.so.1) seems to 
> belong to the gcc package? Or is this library a red herring that just 
> works as the catch-all code getting and finally handling the segfault? 

I'd recommend ignoring it for now, in the event that the backtrace from
just the xen bits suggests a gcc issue that might change. My money right
now is on it being a xen issue though.

> Please advise. Tx.
> > [...]
> >> pci          = [ '04:00.0', '0a:08.0', '0a:0b.0' ]
> >
> > You say in $subject that the failure is with PCI, is that because you've
> > tried an HVM domain without and it is ok, or is it just that all your
> > HVM domains happen to have passthrough enabled?
> I haven't tried HVM domains without PCI passthrough so far (only PV domains 
> w/o PCI passthrough, and they did not segfault), as all my HVM domains 
> require PCI devices (either at least a network card for pfsense - in 
> actual fact more than one is being passed through - or a SATA 
> controller for my second HVM, which is used as a storage VM).

The VM doesn't need to be fully functional, it just needs to boot
without crashing the toolstack. Just running your existing VM with the
pci line commented out would be useful.

Ian


* Re: segfault in xl create for HVM with PCI passthrough
  2014-10-28 16:04     ` Ian Campbell
@ 2014-10-29  0:26       ` Atom2
  2014-10-30 23:05         ` Atom2
  2014-11-09 23:03       ` Atom2
  1 sibling, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-10-29  0:26 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 4721 bytes --]

To keep the thread together I am again submitting the relevant parts of 
my last answer (which due to an error on my part originally went out to 
Ian only; I forwarded it to the list only afterwards, which resulted in 
an out-of-thread appearance) together with the (new) results of my gdb 
exercise. Sorry for any confusion this may have caused.

On 28.10.14 at 17:04, Ian Campbell wrote:
[...]
>> With regard to gdb: I can certainly run the command under gdb after
>> adding debug support to the executables - that's no big deal.
>> I would, however, ask for your advice as to what I need to recompile
>> with debugger support? Is xen-tools (which includes xl) sufficient
>
> I think just the Xen bits would be sufficient, at least to start with.
>
>>   or
>> would you think that I also need to include debug support for gcc as the
>> library that is mentioned in /var/log/messages (libgcc_s.so.1) seems to
>> belong to the gcc package? Or is this library a red herring that just
>> works as the catch-all code getting and finally handling the segfault?
>
> I'd recommend ignoring it for now, in the event that the backtrace from
> just the xen bits suggests a gcc issue that might change. My money right
> now is on it being a xen issue though.
>
After recompiling xen-tools with gdb debug support I started the 
following command:
# gdb --args /usr/sbin/xl create pfsense -c

Please find the command's screen output after its start up to the 
segfault including the output of the bt command after the segfault in 
the attached document named "create".

Furthermore I did the same for the destroy command:
# gdb --args /usr/sbin/xl destroy pfsense

The output of this command is in the attached document named "destroy".

I haven't got much experience with gdb yet so I am unable to interpret 
the outcome of either. Also if there's more/different stuff required, 
please advise me what to do next. Tx.

>>> [...]
>>>> pci          = [ '04:00.0', '0a:08.0', '0a:0b.0' ]
>>>
>>> You say in $subject that the failure is with PCI, is that because you've
>>> tried an HVM domain without and it is ok, or is it just that all your
>>> HVM domains happen to have passthrough enabled?
>> I haven't tried HVM domains without PCI passthrough so far (only PV domains
>> w/o PCI passthrough, and they did not segfault), as all my HVM domains
>> require PCI devices (either at least a network card for pfsense - in
>> actual fact more than one is being passed through - or a SATA
>> controller for my second HVM, which is used as a storage VM).
>
> The VM doesn't need to be fully functional, it just needs to boot
> without crashing the toolstack. Just running your existing VM with the
> pci line commented out would be useful.
Before re-compiling the xen-tools I made a quick test as you suggested
and commented out the pci line from my config file ... and the boot menu 
showed up (which it did not before when the segfault happened).
I did not boot the pfsense VM any further as this might lead to a change 
in my configuration due to missing devices, but to me this at first 
sight seemed to indicate that it has to do with the PCI passthrough 
functionality.
However, as I did not want to boot the machine (and "xl shutdown" did
not work, not even with -F), I then decided to
     xl destroy pfsense
and that printed a segmentation fault message (in both the shell window
where I started the command from and the console window where the
boot-menu was shown) despite no PCI devices being passed through.

To also check PCI passthrough with a PV domain: I added a pci device to
a config file for a PV domain and started that with
     xl create voip -c
The boot menu appeared without issues. I then also tried
     xl destroy voip
from another window and that issued the following error messages in the
shell window (without using any -vvv option):

# xl destroy voip
libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
irq=17
libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend
/local/domain/0/backend/pci/4/0 not ready
libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
irq=16
libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend
/local/domain/0/backend/pci/4/0 not ready
libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
irq=23
libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend
/local/domain/0/backend/pci/4/0 not ready
Segmentation fault

The "Segmentation fault" message also appeared in both the console
window for the domU and the shell window.

This all seems a bit strange to me at the moment, but I am sure with
your help we will get to the bottom of this.

Thanks and regards Atom

[-- Attachment #2: create --]
[-- Type: text/plain, Size: 2863 bytes --]

GNU gdb (Gentoo 7.6.2 p1) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/sbin/xl...Reading symbols from /usr/lib64/debug/usr/sbin/xl.debug...done.
done.
(gdb) run
Starting program: /usr/sbin/xl create pfsense -c
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Parsing config from pfsense
xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->00000000001c12a4
  Modules:       0000000000000000->0000000000000000
  TOTAL:         0000000000000000->000000001f800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x00000000000000fb
  1GB PAGES: 0x0000000000000000
[New Thread 0x7ffff7ff5700 (LWP 13464)]
[New Thread 0x7ffff7fe6700 (LWP 13574)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7fe6700 (LWP 13574)]
0x00007ffff5882b64 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
(gdb) bt
#0  0x00007ffff5882b64 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#1  0x00007ffff58835cc in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#2  0x00007ffff5883945 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#3  0x00007ffff58845c6 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#4  0x00007ffff588494c in _Unwind_ForcedUnwind () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#5  0x00007ffff731a733 in __pthread_unwind () from /lib64/libpthread.so.0
#6  0x00007ffff7311b49 in sigcancel_handler () from /lib64/libpthread.so.0
#7  <signal handler called>
#8  0x00007ffff731ae4d in read () from /lib64/libpthread.so.0
#9  0x00007ffff6b17b53 in read (__nbytes=16, __buf=0x7fffe80008d0, __fd=18) at /usr/include/bits/unistd.h:44
#10 read_all (fd=18, data=data@entry=0x7fffe80008d0, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374
#11 0x00007ffff6b17c94 in read_message (h=h@entry=0x555555785670, nonblocking=nonblocking@entry=0) at xs.c:1139
#12 0x00007ffff6b18626 in read_thread (arg=0x555555785670) at xs.c:1211
#13 0x00007ffff731332d in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffff704a19d in clone () from /lib64/libc.so.6
(gdb)
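
To make the backtrace above easier to read, the function names can be pulled out of the frames. This is a hedged Python sketch only: frame_functions is a hypothetical helper, the frames are copied from the gdb output above, and the conclusion that sigcancel_handler followed by _Unwind_ForcedUnwind indicates a thread being cancelled with pthread_cancel is an interpretation based on how glibc usually implements cancellation, not something gdb states.

```python
import re

# "#N  [0xADDR in ]function (" at the start of a gdb backtrace line.
FRAME = re.compile(r"^#(?P<n>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>\S+) \(")

BACKTRACE = """\
#0  0x00007ffff5882b64 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#1  0x00007ffff58835cc in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#2  0x00007ffff5883945 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#3  0x00007ffff58845c6 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#4  0x00007ffff588494c in _Unwind_ForcedUnwind () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#5  0x00007ffff731a733 in __pthread_unwind () from /lib64/libpthread.so.0
#6  0x00007ffff7311b49 in sigcancel_handler () from /lib64/libpthread.so.0
#7  <signal handler called>
#8  0x00007ffff731ae4d in read () from /lib64/libpthread.so.0
#9  0x00007ffff6b17b53 in read (__nbytes=16, __buf=0x7fffe80008d0, __fd=18) at /usr/include/bits/unistd.h:44
#10 read_all (fd=18, data=data@entry=0x7fffe80008d0, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374
#11 0x00007ffff6b17c94 in read_message (h=h@entry=0x555555785670, nonblocking=nonblocking@entry=0) at xs.c:1139
#12 0x00007ffff6b18626 in read_thread (arg=0x555555785670) at xs.c:1211
#13 0x00007ffff731332d in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffff704a19d in clone () from /lib64/libc.so.6
"""


def frame_functions(text):
    """Return the function name of each frame, innermost first."""
    funcs = []
    for line in text.splitlines():
        m = FRAME.match(line)
        if m:
            funcs.append(m["func"])
        elif "<signal handler called>" in line:
            funcs.append("<signal handler>")
    return funcs


funcs = frame_functions(BACKTRACE)
# Frames that glibc typically goes through when a thread is cancelled.
cancelled = "sigcancel_handler" in funcs and "_Unwind_ForcedUnwind" in funcs
print(funcs)
print("thread was being cancelled:", cancelled)
```

Read from the bottom up, the crashing thread is the xenstore read_thread from xs.c, which was blocked in read() when a cancellation signal arrived; the fault then happens inside libgcc's unwinder while that thread's stack is being unwound.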

[-- Attachment #3: destroy --]
[-- Type: text/plain, Size: 2431 bytes --]

GNU gdb (Gentoo 7.6.2 p1) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/sbin/xl...Reading symbols from /usr/lib64/debug/usr/sbin/xl.debug...done.
done.
(gdb) run
Starting program: /usr/sbin/xl destroy pfsense
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7ff6700 (LWP 13639)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7ff6700 (LWP 13639)]
0x00007ffff5882b64 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
(gdb) bt
#0  0x00007ffff5882b64 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#1  0x00007ffff58835cc in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#2  0x00007ffff5883945 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#3  0x00007ffff58845c6 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#4  0x00007ffff588494c in _Unwind_ForcedUnwind () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
#5  0x00007ffff731a733 in __pthread_unwind () from /lib64/libpthread.so.0
#6  0x00007ffff7311b49 in sigcancel_handler () from /lib64/libpthread.so.0
#7  <signal handler called>
#8  0x00007ffff731ae4d in read () from /lib64/libpthread.so.0
#9  0x00007ffff6b17b53 in read (__nbytes=16, __buf=0x7ffff0000a10, __fd=10) at /usr/include/bits/unistd.h:44
#10 read_all (fd=10, data=data@entry=0x7ffff0000a10, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374
#11 0x00007ffff6b17c94 in read_message (h=h@entry=0x55555577ed40, nonblocking=nonblocking@entry=0) at xs.c:1139
#12 0x00007ffff6b18626 in read_thread (arg=0x55555577ed40) at xs.c:1211
#13 0x00007ffff731332d in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffff704a19d in clone () from /lib64/libc.so.6
(gdb)

[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: segfault in xl create for HVM with PCI passthrough
  2014-10-29  0:26       ` Atom2
@ 2014-10-30 23:05         ` Atom2
  2014-11-04 15:13           ` [BUG] XEN 4.3.3 - " Atom2
  0 siblings, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-10-30 23:05 UTC (permalink / raw)
  To: xen-devel, Ian Campbell

Ian,
apologies for pinging this, but I am not sure whether there's anything 
else over and above the answers in my last message (copied below) that 
you are expecting me to provide before being able to judge where and 
what the issue might be?

Many thanks in advance, Atom2

P.S. In case you again require the attachments to my last message, 
please let me know.

Am 29.10.14 um 01:26 schrieb Atom2:
> To keep the thread together I am again submitting the relevant parts of
> my last answer (which due to an error on my part originally went out to
> Ian only and which I only forwarded to the list afterwards, resulting in
> an out-of-thread appearance) together with the (new) results of my gdb
> exercise. Sorry for any confusion this may have caused.
>
> Am 28.10.14 um 17:04 schrieb Ian Campbell:
> [...]
>>> With regards to gdb: I can certainly run the command under gdb after
>>> including debug support to the executables - that's no big deal.
>>> I would, however, ask for your advice as to what I need to recompile
>>> with debugger support? Is xen-tools (which includes xl) sufficient
>>
>> I think just the Xen bits would be sufficient, at least to start with.
>>
>>>   or
>>> would you think that I also need to include debug support for gcc as the
>>> library that is mentioned in /var/log/messages (libgcc_s.so.1) seems to
>>> belong to the gcc package? Or is this library a red herring that just
>>> works as the catch-all code getting and finally handling the segfault?
>>
>> I'd recommend ignoring it for now, in the event that the backtrace from
>> just the xen bits suggests a gcc issue that might change. My money right
>> now is on it being a xen issue though.
>>
> After recompiling xen-tools with gdb debug support I started the
> following command:
> # gdb --args /usr/sbin/xl create pfsense -c
>
> Please find the command's screen output after its start up to the
> segfault including the output of the bt command after the segfault in
> the attached document named "create".
>
> Furthermore I did the same for the destroy command:
> # gdb --args /usr/sbin/xl destroy pfsense
>
> The output of this command is in the attached document named "destroy".
>
> I haven't got much experience with gdb yet so I am unable to interpret
> the outcome of either. Also if there's more/different stuff required,
> please advise me what to do next. Tx.
>
>>>> [...]
>>>>> pci          = [ '04:00.0', '0a:08.0', '0a:0b.0' ]
>>>>
>>>> You say in $subject that the failure is with PCI, is that because
>>>> you've
>>>> tried an HVM domain without and it is ok, or is it just that all your
>>>> HVM domains happen to have passthrough enabled?
>>> I haven't tried HVM domains without PCI passthrough (but PV domains w/o
>>> PCI passthrough and they did not segfault) so far as all my HVM domains
>>> require PCI devices (either at least a network card for pfsense - in
>>> actual facts it's more than one that's being passed through - or a SATA
>>> controller for my second HVM which is used as a storage VM).
>>
>> The VM doesn't need to be fully functional, it just needs to boot
>> without crashing the toolstack. Just running your existing VM with the
>> pci line commented out would be useful.
> Before re-compiling the xen-tools I made a quick test as you suggested
> and commented out the pci line from my config file ... and the boot menu
> showed up (which it did not before when the segfault happened).
> I did not boot the pfsense vm any further as this might lead to a change
> in my configuration due to missing devices, but to me this at first
> sight seemed to indicate that it has to do with the PCI passthrough
> functionality.
> Although as I did not want to boot the machine (and "xl shutdown" did
> not work, not even with -F) I then decided to
>      xl destroy pfsense
> and that printed a segmentation fault message (in both the shell window
> where I started the command from and the console window where the
> boot-menu was shown) despite no PCI devices being passed through.
>
> To also check PCI passthrough with a PV domain: I added a pci device to
> a config file for a PV domain and started that with
>      xl create voip -c
> The boot menu appeared without issues. I then also tried
>      xl destroy voip
> from another window and that issued the following error messages in the
> shell window (without using any -vvv option):
>
> # xl destroy voip
> libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
> irq=17
> libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend
> /local/domain/0/backend/pci/4/0 not ready
> libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
> irq=16
> libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend
> /local/domain/0/backend/pci/4/0 not ready
> libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
> irq=23
> libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend
> /local/domain/0/backend/pci/4/0 not ready
> Segmentation fault
>
> The "Segmentation fault" message also appeared in both the console
> window for the domU and the shell window.
>
> This all seems a bit strange to me at the moment, but I am sure with
> your help we will get to the bottom of this.
>
> Thanks and regards Atom

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-10-30 23:05         ` Atom2
@ 2014-11-04 15:13           ` Atom2
  2014-11-04 15:44             ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-11-04 15:13 UTC (permalink / raw)
  To: xen-devel, Ian Campbell

I assume it may be warranted to "upgrade" this issue to a bug status 
(obviously also in the hope that it attracts wider interest) by 
prefixing the subject line with a [BUG] prefix as per 
http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen_Project. I have 
exhausted all my options (including numerous IRC attempts), provided all 
the information I have been asked for but the issue persists and nobody 
seems to have an idea how to rectify the problem.

If you need any more information, I'll do my utmost to provide it asap.

I apologise if this is not the right approach, but at the moment, I'm 
just frustrated that a well working system under 4.3.1 simply stopped 
working after upgrading to 4.3.3 more than a week ago. I very much love 
XEN, it's a fabulous product, and I would simply like to get it working 
again.

Many thanks in advance for your support and your understanding

Atom2

Am 31.10.14 um 00:05 schrieb Atom2:
> Ian,
> apologies for pinging this, but I am not sure whether there's anything
> else over and above the answers in my last message (copied below) that
> you are expecting me to provide before being able to judge where and
> what the issue might be?
>
> Many thanks in advance, Atom2
>
> P.S. In case you again require the attachments to my last message,
> please let me know.
>
> Am 29.10.14 um 01:26 schrieb Atom2:
>> To keep the thread together I am again submitting the relevant parts of
>> my last answer (which due to an error on my part originally went out to
>> Ian only and I only forward it to the list afterwards which resulted in
>> an out-of-thread appeareance) together with the (new) results of my gdb
>> excercise. Sorry for any confusion this may(might have) cause(d).
>>
>> Am 28.10.14 um 17:04 schrieb Ian Campbell:
>> [...]
>>>> With regards to gdb: I can certainly run the command under gdb after
>>>> including debug support to the executables - that's no big deal.
>>>> I would, however, ask for your advice as to what I need to recompile
>>>> with debugger support? Is xen-tools (which includes xl) sufficient
>>>
>>> I think just the Xen bits would be sufficient, at least to start with.
>>>
>>>>   or
>>>> would you think that I also need to include debug support for gcc as
>>>> the
>>>> library that is mentioned in /var/log/messages (libgcc_s.so.1) seems to
>>>> belong to the gcc package? Or is this library a red herring that just
>>>> works as the catch-all code getting and finally handling the segfault?
>>>
>>> I'd recommend ignoring it for now, in the event that the backtrace from
>>> just the xen bits suggests a gcc issue that might change. My money right
>>> now is on it being a xen issue though.
>>>
>> After recompiling xen-tools with gdb debug support I started the
>> following command:
>> # gdb --args /usr/sbin/xl create pfsense -c
>>
>> Please find the command's screen output after its start up to the
>> segfault including the output of the bt command after the segfault in
>> the attached document named "create".
>>
>> Furthermore I did the same for the destroy command:
>> # gdb --args /usr/sbin/xl destroy pfsense
>>
>> The output of this command is in the attached document named "destroy".
>>
>> I haven't got much experience with gdb yet so I am unable to interpret
>> the outcome of either. Also if there's more/different stuff required,
>> please advise me what to do next. Tx.
>>
>>>>> [...]
>>>>>> pci          = [ '04:00.0', '0a:08.0', '0a:0b.0' ]
>>>>>
>>>>> You say in $subject that the failure is with PCI, is that because
>>>>> you've
>>>>> tried an HVM domain without and it is ok, or is it just that all your
>>>>> HVM domains happen to have passthrough enabled?
>>>> I haven't tried HVM domains without PCI passthrough (but PV domains w/o
>>>> PCI passthrough and they did not segfault) so far as all my HVM domains
>>>> require PCI devices (either at least a network card for pfsense - in
>>>> actual facts it's more than one that's being passed through - or a SATA
>>>> controller for my second HVM which is used as a storage VM).
>>>
>>> The VM doesn't need to be fully functional, it just needs to boot
>>> without crashing the toolstack. Just running your existing VM with the
>>> pci line commented out would be useful.
>> Before re-compiling the xen-tools I made a quick test as you suggested
>> and commented out the pci line from my config file ... and the boot menu
>> showed up (which it did not before when the segfault happened).
>> I did not boot the pfsense vm any further as this might lead to a change
>> in my configuration due to missing devices, but to me this at first
>> sight seemed to indicate that is has to do with the PCI passthrough
>> functionality.
>> Although as I did not want to boot the machine (and "xl shutdown" did
>> not work, not even with -F) I then decided to
>>      xl destroy pfsense
>> and that printed a segmentation fault message (in both the shell window
>> where I started the command from and the console window where the
>> boot-menu was shown) despite no PCI devices being passed through.
>>
>> To also check PCI passthrough with a PV domain: I added a pci device to
>> a config file for a PV domain and started that with
>>      xl create voip -c
>> The boot menu appeared without issues. I then also tried
>>      xl destroy voip
>> from another window and that issued the following error messages in the
>> shell window (without using any -vvv option):
>>
>> # xl destroy voip
>> libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
>> irq=17
>> libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend
>> /local/domain/0/backend/pci/4/0 not ready
>> libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
>> irq=16
>> libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend
>> /local/domain/0/backend/pci/4/0 not ready
>> libxl: error: libxl_pci.c:1247:do_pci_remove: xc_domain_irq_permission
>> irq=23
>> libxl: error: libxl_device.c:1127:libxl__wait_for_backend: Backend
>> /local/domain/0/backend/pci/4/0 not ready
>> Segmentation fault
>>
>> The "Segmentation fault" message also appeared in both the console
>> window for the domU and the shell window.
>>
>> This all seems a bit strange to me at the moement, but I am sure with
>> your help we will arrive at the grounds of this.
>>
>> Thanks and regards Atom

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-04 15:13           ` [BUG] XEN 4.3.3 - " Atom2
@ 2014-11-04 15:44             ` Ian Campbell
  2014-11-04 16:14               ` Atom2
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-11-04 15:44 UTC (permalink / raw)
  To: Atom2; +Cc: xen-devel

On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
> I assume it may be warranted to "upgrade" this issue to a bug status 
> (obviously also in the hope that it attractes wider interest) by 
> prefixing the subject line with a [BUG] prefix as per 
> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen_Project. I have 
> exhausted all my options (including numerous IRC attempts), provided all 
> the information I have been asked for but the issue persists and nobody 
> seems to have an idea how to rectify the problem.

Sorry for the delay, the issue is quite perplexing so I was intending to
sleep on it, but didn't get any inspiration in doing so...

In the gdb traces you provided there is:
#10 read_all (fd=10, data=data@entry=0x7ffff0000a10, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374

which seems to correspond to the 
        if (!read_all(h->fd, &msg->hdr, sizeof(msg->hdr), nonblocking)) { /* Cancellation point */
in read_message (because the size and offset seem to match this call, so
I think it is more likely than the other one, but the logic below
applies in either case).

The thing we are reading into has literally just been allocated, so I
can't think of any reason accessing it should fault.
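
Frames #5 to #7 of both traces (sigcancel_handler, __pthread_unwind,
_Unwind_ForcedUnwind) are the path glibc takes when a thread is cancelled
while it is blocked at a cancellation point such as read(). The fault may
therefore lie in the unwind step rather than in the access to the buffer.
The following is only a sketch of that situation and assumes nothing about
the xenstore code itself: blocked_reader and cancel_blocked_reader are
made-up names, and the idle pipe merely stands in for a descriptor with no
data pending. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Thread body: block in read() on a descriptor that never delivers data. */
static void *blocked_reader(void *arg)
{
    assert(arg != NULL);
    int fd = *(int *)arg;
    char buf[16];

    for (;;) {
        /* read() is a cancellation point: a pending cancel is acted on here. */
        if (read(fd, buf, sizeof(buf)) < 0)
            break;
    }
    return NULL;
}

/*
 * Start a reader on an idle pipe, cancel it, and report whether
 * pthread_join() saw the thread exit via cancellation.
 */
static bool cancel_blocked_reader(void)
{
    int fds[2];
    pthread_t tid;
    void *result = NULL;
    bool cancelled = false;

    if (pipe(fds) != 0)
        return false;

    if (pthread_create(&tid, NULL, blocked_reader, &fds[0]) == 0) {
        /* Request cancellation; the stack unwind runs in the reader thread. */
        if (pthread_cancel(tid) == 0 && pthread_join(tid, &result) == 0)
            cancelled = (result == PTHREAD_CANCELED);
    }

    close(fds[0]);
    close(fds[1]);
    return cancelled;
}
```

On a healthy system cancel_blocked_reader() returns true. If a small
program like this also crashed inside libgcc_s.so.1, that would point at
the toolchain rather than at xl, but that is a guess to be tested, not a
diagnosis.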

There is only one xenstore change between 4.3.1 and 4.3.3 which is 
        commit 014f9219f1dca3ee92948f0cfcda8d1befa6cbcd
        Author: Matthew Daley <mattd@bugfuzz.com>
        Date:   Sat Nov 30 13:20:04 2013 +1300
        
            xenstore: sanity check incoming message body lengths
            
            This is for the client-side receiving messages from xenstored, so there
            is no security impact, unlike XSA-72.
        
but I can't see any way that could possibly cause a segfault.

So, I'm afraid I'm completely mystified.

You could try running the xl command under valgrind, you may find "xl
create -F" (which keeps xl in the foreground) handy if you try this.
That might help catch any heap corruption etc.

A related thing to try might be to run "MALLOC_CHECK_=2 xl create ..."
which enables glibc's heap consistency checks (described at the end of
http://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html) which might give a clue.

Otherwise I think the next step would be to downgrade to 4.3.1 and see
if the problem persists, in order to rule out changes elsewhere in the
system. If the problem doesn't happen with a 4.3.1 rebuilt on your
current system then the next thing would probably be to bisect the
issue. There are only 31 toolstack changes in that range, so it ought to
only take 5-6 iterations.
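
The 5-6 figure follows from halving the candidate range on every
build-and-test round. A small sketch of that arithmetic (bisect_rounds is
a made-up helper, and the count that git bisect itself reports can differ
by one depending on the shape of the history):

```c
#include <assert.h>

/*
 * Worst-case number of build-and-test rounds needed to isolate one bad
 * change among n candidates when each round discards about half of them.
 */
static unsigned bisect_rounds(unsigned n)
{
    assert(n > 0);
    unsigned rounds = 0;

    while (n > 1) {
        n = (n + 1) / 2;   /* worst case: the larger half remains */
        rounds++;
    }
    return rounds;
}
```

For the 31 toolstack changes mentioned above, bisect_rounds(31) gives 5.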

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-04 15:44             ` Ian Campbell
@ 2014-11-04 16:14               ` Atom2
  2014-11-04 16:31                 ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-11-04 16:14 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

Am 04.11.14 um 16:44 schrieb Ian Campbell:
> On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
>> I assume it may be warranted to "upgrade" this issue to a bug status
>> (obviously also in the hope that it attractes wider interest) by
>> prefixing the subject line with a [BUG] prefix as per
>> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen_Project. I have
>> exhausted all my options (including numerous IRC attempts), provided all
>> the information I have been asked for but the issue persists and nobody
>> seems to have an idea how to rectify the problem.
>
> Sorry for the delay, the issue is quite perplexing so I was intending to
> sleep on it, but didn't get any inspiration in doing so...
Thanks for getting back ... obviously sometimes sleep is not the right cure.
>
> In the gdb traces you provided there is:
> #10 read_all (fd=10, data=data@entry=0x7ffff0000a10, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374
>
Just to be on the same page: That was for the destroy case. The 
corresponding line for the create case was:
#10 read_all (fd=18, data=data@entry=0x7fffe80008d0, len=len@entry=16, 
nonblocking=nonblocking@entry=0) at xs.c:374

I don't know whether that makes any difference though.
> which seems to correspond to the
>          if (!read_all(h->fd, &msg->hdr, sizeof(msg->hdr), nonblocking)) { /* Cancellation point */
I did have a look at the file xs.c as well in the source and there are 3 
source code files named xs.c:
	tools/xenstore/xs.c
	tools/python/xen/lowlevel/xs/xs.c
	extras/mini-os/lib/xs.c
Out of these only the first two do have at least 374 lines and only the 
first one has a non empty source code line at line 374. That line 
however reads as follows in my source:
	done = read(fd, data, len)
and is located in function
	static bool read_all(int fd, void *data, unsigned int len, int nonblocking)
starting at line 361

The line you refer to is located at line 1139 in the same file. I just 
wanted to bring this to your attention, but I might be on the wrong 
track here ...

> in read_message (because the size and offset seem matches this call, so
> I think it is more likely than the other one, but the logic below
> applies in either case).
>
> The thing we are reading into has literally just been allocated, so I
> can't think of any reason accessing it should fault.
>
> There is only one xenstore change between 4.3.1 and 4.3.3 which is
>          commit 014f9219f1dca3ee92948f0cfcda8d1befa6cbcd
>          Author: Matthew Daley <mattd@bugfuzz.com>
>          Date:   Sat Nov 30 13:20:04 2013 +1300
>
>              xenstore: sanity check incoming message body lengths
>
>              This is for the client-side receiving messages from xenstored, so there
>              is no security impact, unlike XSA-72.
>
> but I can't see any way that could possibly cause a segfault.
>
> So, I'm afraid I'm completely mystified.
>
> You could try running the xl command under valgrind, you may find "xl
> create -F" (which keeps xl in the foreground) handy if you try this.
> That might help catch any heap corruption etc.
I don't know what valgrind is, but I'll have a look and see how to deal 
with that ...
>
> A related thing to try might be to run "MALLOC_CHECK_=2 xl create ..."
> which enables glibc's heap consistency checks (described at the end of
> http://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html) which might give a clue.
I tried that, but the same segfault and no more messages on the screen - 
or should I have run this under gdb as well?
>
> Otherwise I think the next step would be to downgrade to 4.3.1 and see
> if the problem persists, in order to rule out changes elsewhere in the
> system. If the problem doesn't happen with a 4.3.1 rebuilt on your
> current system then the next thing would probably be to bisect the
> issue. There are only 31 toolstack changes in that range, so it ought to
> only take 5-6 iterations.
Unfortunately 4.3.1 is no longer available as an ebuild as 4.3.3 seemed 
to fix security issues and therefore 4.3.1 has been deleted from the 
repos. So it's not straightforward and I need to figure out how to get 
the old version back. But I am sure there's a way.

Thanks Atom2

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-04 16:14               ` Atom2
@ 2014-11-04 16:31                 ` Ian Campbell
  2014-11-04 16:48                   ` Atom2
  2014-11-04 17:30                   ` Atom2
  0 siblings, 2 replies; 25+ messages in thread
From: Ian Campbell @ 2014-11-04 16:31 UTC (permalink / raw)
  To: Atom2; +Cc: xen-devel

On Tue, 2014-11-04 at 17:14 +0100, Atom2 wrote:
> Am 04.11.14 um 16:44 schrieb Ian Campbell:
> > On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
> >> I assume it may be warranted to "upgrade" this issue to a bug status
> >> (obviously also in the hope that it attractes wider interest) by
> >> prefixing the subject line with a [BUG] prefix as per
> >> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen_Project. I have
> >> exhausted all my options (including numerous IRC attempts), provided all
> >> the information I have been asked for but the issue persists and nobody
> >> seems to have an idea how to rectify the problem.
> >
> > Sorry for the delay, the issue is quite perplexing so I was intending to
> > sleep on it, but didn't get any inspiration in doing so...
> Thanks for getting back ... obviously sometimes sleep is not the right cure.
> >
> > In the gdb traces you provided there is:
> > #10 read_all (fd=10, data=data@entry=0x7ffff0000a10, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374
> >
> Just to be on the same page: That was for the destroy case. The 
> corresponding line for the create case was:
> #10 read_all (fd=18, data=data@entry=0x7fffe80008d0, len=len@entry=16, 
> nonblocking=nonblocking@entry=0) at xs.c:374
> 
> I don't know whether that makes any difference though.
> > which seems to correspond to the
> >          if (!read_all(h->fd, &msg->hdr, sizeof(msg->hdr), nonblocking)) { /* Cancellation point */
> I did have a look at the file xs.c as well in the source and there are 3 
> source code files named xs.c:
> 	tools/xenstore/xs.c
> 	tools/python/xen/lowlevel/xs/xs.c
> 	extras/mini-os/lib/xs.c
> Out of these only the first two do have at least 374 lines and only the 
> first one has a non empty source code line at line 374. That line 
> however reads as follows in my source:
> 	done = read(fd, data, len)
> and is located in function
> 	static bool read_all(int fd, void *data, unsigned int len, int nonblocking)
> starting at line 361
> 
> The line you referr to is located at line 1139 in the same file. I just 
> wanted to bring this to your attention, but I might be on the wrong 
> track here ...

Right, the line at 1139 is the caller of the thing in stack frame #10. The
other potential caller is at 1150. It's the callers which ultimately
provide the buffer to read into (called "data" in read_all), which is
why I was interested in them.
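
To make the caller/callee relationship concrete, here is a minimal sketch
of a read_all-style helper and a caller that hands it a header buffer.
This is not the code from tools/xenstore/xs.c: the names read_all_sketch,
read_header_sketch and struct sketch_hdr are invented for illustration,
and the nonblocking argument is left out. The header is laid out as four
32-bit fields only so that its size is 16 bytes, matching len=16 in the
traces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical 16-byte header standing in for the real message header. */
struct sketch_hdr {
    uint32_t type, req_id, tx_id, len;
};

/* Read exactly len bytes from fd into data, retrying after short reads. */
static bool read_all_sketch(int fd, void *data, size_t len)
{
    assert(data != NULL);
    char *p = data;

    while (len > 0) {
        ssize_t done = read(fd, p, len);   /* blocking; a cancellation point */
        if (done < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted by a signal: retry */
            return false;
        }
        if (done == 0)
            return false;                  /* end of file before len bytes */
        p += done;
        len -= (size_t)done;
    }
    return true;
}

/* The caller owns the buffer; its address is what shows up as "data". */
static bool read_header_sketch(int fd, struct sketch_hdr *hdr)
{
    return read_all_sketch(fd, hdr, sizeof(*hdr));
}
```

Called on a descriptor holding at least 16 bytes, read_header_sketch()
fills the header and returns true. It returns false if the other end is
closed first.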

> > So, I'm afraid I'm completely mystified.
> >
> > You could try running the xl command under valgrind, you may find "xl
> > create -F" (which keeps xl in the foreground) handy if you try this.
> > That might help catch any heap corruption etc.
> I don't know what valgrind is, but I'll have a look and see how to deal 
> with that ...

Valgrind is a totally awesome memory allocation debugger ;-)

> > A related thing to try might be to run "MALLOC_CHECK_=2 xl create ..."
> > which enables glibc's heap consistency checks (described at the end of
> > http://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html) which might give a clue.
> I tried that, but the same segfault and no more messages on the screen - 
> or should I have run this under gdb as well?

No, just setting the var is all that is needed. The fact that nothing
has changed would suggest that it's not an obvious heap corruption at
least. Valgrind might find something more subtle, but TBH I doubt it.

> > Otherwise I think the next step would be to downgrade to 4.3.1 and see
> > if the problem persists, in order to rule out changes elsewhere in the
> > system. If the problem doesn't happen with a 4.3.1 rebuilt on your
> > current system then the next thing would probably be to bisect the
> > issue. There are only 31 toolstack changes in that range, so it ought to
> > only take 5-6 iterations.
> Unfortunately 4.3.1 is no longer available as an ebuild as 4.3.3 seemed 
> to fix security issues and therefore 4.3.1 has been deleted from the 
> repos. So it's not straightforward and I need to figure out how to get 
> the old version back. But I am sure there's a way.

I don't know anything about ebuilds, but if you end up needing to bisect
then building from xen.git might be useful anyway (unless you can get an
ebuild to trivially build some other version).

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-04 16:31                 ` Ian Campbell
@ 2014-11-04 16:48                   ` Atom2
  2014-11-05  9:33                     ` Ian Campbell
  2014-11-04 17:30                   ` Atom2
  1 sibling, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-11-04 16:48 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

Am 04.11.14 um 17:31 schrieb Ian Campbell:
> On Tue, 2014-11-04 at 17:14 +0100, Atom2 wrote:
>> Am 04.11.14 um 16:44 schrieb Ian Campbell:
>>> On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
>>> Otherwise I think the next step would be to downgrade to 4.3.1 and see
>>> if the problem persists, in order to rule out changes elsewhere in the
>>> system. If the problem doesn't happen with a 4.3.1 rebuilt on your
>>> current system then the next thing would probably be to bisect the
>>> issue. There are only 31 toolstack changes in that range, so it ought to
>>> only take 5-6 iterations.
>> Unfortunately 4.3.1 is no longer available as an ebuild as 4.3.3 seemed
>> to fix security issues and therefore 4.3.1 has been deleted from the
>> repos. So it's not straightforward and I need to figure out how to get
>> the old version back. But I am sure there's a way.
>
> I don't know anything about ebuilds, but if you end up needing to bisect
> then building from xen.git might be useful anyway (unless you can get an
> ebuild to trivially build some other version).
Before I go down that route would I also need to re-compile xen (i.e. 
the hypervisor at /boot/xen-4.3.3.gz that's being booted from grub) or 
only xen-tools? In other words does xen-tools 4.3.1 work with the 
hypervisor version 4.3.3 under /boot? I wouldn't want to end up with an 
unbootable system.

In terms of ebuilds: Adding patches to a version is pretty easy as 
gentoo works from source code. So if 4.3.3 is just different by a number 
of well defined patches from 4.3.1 then that should be straightforward 
as applying patches is really trivial.

Thanks Atom2

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-04 16:31                 ` Ian Campbell
  2014-11-04 16:48                   ` Atom2
@ 2014-11-04 17:30                   ` Atom2
  2014-11-05  9:45                     ` Ian Campbell
  1 sibling, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-11-04 17:30 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 905 bytes --]

Am 04.11.14 um 17:31 schrieb Ian Campbell:
> On Tue, 2014-11-04 at 17:14 +0100, Atom2 wrote:
>> Am 04.11.14 um 16:44 schrieb Ian Campbell:
>>> On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
>>> You could try running the xl command under valgrind, you may find "xl
>>> create -F" (which keeps xl in the foreground) handy if you try this.
>>> That might help catch any heap corruption etc.
>> I don't know what valgrind is, but I'll have a look and see how to deal
>> with that ...
>
> Valgrind is a totally awesome memory allocation debugger ;-)
>
In the attached file named 'valgrind' please find the output of
	# valgrind xl create -F -c pfsense

There's a lot of information in that file which is above my current 
level of expertise, but I am sure you are able to make sense out of it.

BTW the stuff at lines 6 and 111 to 119 is the usual output of the xl 
command before it segfaults.

Thanks Atom2

[-- Attachment #2: valgrind --]
[-- Type: text/plain, Size: 27603 bytes --]

==30524== Memcheck, a memory error detector
==30524== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==30524== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==30524== Command: xl create -F -c pfsense
==30524==
Parsing config from pfsense
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059E81: libxl_list_domain (libxl.c:560)
==30524==    by 0x507BF50: libxl_name_to_domid (libxl_utils.c:73)
==30524==    by 0x5059364: libxl__domain_rename (libxl.c:312)
==30524==    by 0x5069161: libxl__domain_make (libxl_create.c:471)
==30524==    by 0x5069F57: do_domain_create (libxl_create.c:643)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059E85: libxl_list_domain (libxl.c:566)
==30524==    by 0x507BF50: libxl_name_to_domid (libxl_utils.c:73)
==30524==    by 0x5059364: libxl__domain_rename (libxl.c:312)
==30524==    by 0x5069161: libxl__domain_make (libxl_create.c:471)
==30524==    by 0x5069F57: do_domain_create (libxl_create.c:643)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059F07: libxl_list_domain (libxl.c:566)
==30524==    by 0x507BF50: libxl_name_to_domid (libxl_utils.c:73)
==30524==    by 0x5059364: libxl__domain_rename (libxl.c:312)
==30524==    by 0x5069161: libxl__domain_make (libxl_create.c:471)
==30524==    by 0x5069F57: do_domain_create (libxl_create.c:643)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x507BF62: libxl_name_to_domid (libxl_utils.c:77)
==30524==    by 0x5059364: libxl__domain_rename (libxl.c:312)
==30524==    by 0x5069161: libxl__domain_make (libxl_create.c:471)
==30524==    by 0x5069F57: do_domain_create (libxl_create.c:643)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x507BFC3: libxl_name_to_domid (libxl_utils.c:77)
==30524==    by 0x5059364: libxl__domain_rename (libxl.c:312)
==30524==    by 0x5069161: libxl__domain_make (libxl_create.c:471)
==30524==    by 0x5069F57: do_domain_create (libxl_create.c:643)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x505A29B: libxl_list_vm (libxl.c:684)
==30524==    by 0x5069347: libxl__domain_make (libxl_create.c:506)
==30524==    by 0x5069F57: do_domain_create (libxl_create.c:643)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x505A29F: libxl_list_vm (libxl.c:688)
==30524==    by 0x5069347: libxl__domain_make (libxl_create.c:506)
==30524==    by 0x5069F57: do_domain_create (libxl_create.c:643)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x505A34F: libxl_list_vm (libxl.c:688)
==30524==    by 0x5069347: libxl__domain_make (libxl_create.c:506)
==30524==    by 0x5069F57: do_domain_create (libxl_create.c:643)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5072FA1: libxl__domain_cpupool (libxl_dom.c:67)
==30524==    by 0x507309E: libxl__domain_scheduler (libxl_dom.c:82)
==30524==    by 0x506A01F: do_domain_create (libxl_create.c:53)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5072E51: libxl__domain_type (libxl_dom.c:34)
==30524==    by 0x505F20D: libxl__device_nic_setdefault (libxl.c:2827)
==30524==    by 0x506A199: do_domain_create (libxl_create.c:684)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
--30524-- WARNING: unhandled __HYPERVISOR_domctl subop: 10
--30524-- You may be able to write your own handler.
--30524-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--30524-- Nevertheless we consider this a bug.  Please report
--30524-- it at http://valgrind.org/support/bug_reports.html &
--30524-- http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen.
xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->00000000001c12a4
  Modules:       0000000000000000->0000000000000000
  TOTAL:         0000000000000000->000000001f800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x00000000000000fb
  1GB PAGES: 0x0000000000000000
--30524-- WARNING: unhandled __HYPERVISOR_memory_op subop: 7
--30524-- You may be able to write your own handler.
--30524-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--30524-- Nevertheless we consider this a bug.  Please report
--30524-- it at http://valgrind.org/support/bug_reports.html &
--30524-- http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen.
xc: error: panic: xc_dom_boot.c:388: xc_dom_gnttab_hvm_seed: failed to add gnttab to physmap [errno=38]
: Internal error
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5072FA1: libxl__domain_cpupool (libxl_dom.c:67)
==30524==    by 0x507309E: libxl__domain_scheduler (libxl_dom.c:82)
==30524==    by 0x506546A: libxl_domain_sched_params_set (libxl.c:4597)
==30524==    by 0x50735CE: libxl__build_post (libxl_dom.c:267)
==30524==    by 0x5068D73: libxl__domain_build (libxl_create.c:375)
==30524==    by 0x5069C70: domcreate_bootloader_done (libxl_create.c:770)
==30524==    by 0x508AD4F: bootloader_local_detached_cb (libxl_bootloader.c:281)
==30524==    by 0x50583DD: local_device_detach_cb (libxl.c:2777)
==30524==    by 0x505EB05: libxl__device_disk_local_initiate_detach (libxl.c:2752)
==30524==    by 0x508B5A4: bootloader_callback (libxl_bootloader.c:265)
==30524==    by 0x508C5C0: libxl__bootloader_run (libxl_bootloader.c:392)
==30524==    by 0x506A2EB: do_domain_create (libxl_create.c:709)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x506563E: libxl_domain_sched_params_set (libxl.c:4370)
==30524==    by 0x50735CE: libxl__build_post (libxl_dom.c:267)
==30524==    by 0x5068D73: libxl__domain_build (libxl_create.c:375)
==30524==    by 0x5069C70: domcreate_bootloader_done (libxl_create.c:770)
==30524==    by 0x508AD4F: bootloader_local_detached_cb (libxl_bootloader.c:281)
==30524==    by 0x50583DD: local_device_detach_cb (libxl.c:2777)
==30524==    by 0x505EB05: libxl__device_disk_local_initiate_detach (libxl.c:2752)
==30524==    by 0x508B5A4: bootloader_callback (libxl_bootloader.c:265)
==30524==    by 0x508C5C0: libxl__bootloader_run (libxl_bootloader.c:392)
==30524==    by 0x506A2EB: do_domain_create (libxl_create.c:709)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5065690: libxl_domain_sched_params_set (libxl.c:4374)
==30524==    by 0x50735CE: libxl__build_post (libxl_dom.c:267)
==30524==    by 0x5068D73: libxl__domain_build (libxl_create.c:375)
==30524==    by 0x5069C70: domcreate_bootloader_done (libxl_create.c:770)
==30524==    by 0x508AD4F: bootloader_local_detached_cb (libxl_bootloader.c:281)
==30524==    by 0x50583DD: local_device_detach_cb (libxl.c:2777)
==30524==    by 0x505EB05: libxl__device_disk_local_initiate_detach (libxl.c:2752)
==30524==    by 0x508B5A4: bootloader_callback (libxl_bootloader.c:265)
==30524==    by 0x508C5C0: libxl__bootloader_run (libxl_bootloader.c:392)
==30524==    by 0x506A2EB: do_domain_create (libxl_create.c:709)
==30524==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30524==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5072E51: libxl__domain_type (libxl_dom.c:34)
==30524==    by 0x505D6DA: device_disk_add (libxl.c:2061)
==30524==    by 0x505DE65: libxl__device_disk_add (libxl.c:2229)
==30524==    by 0x5079723: libxl__add_disks (libxl_device.c:549)
==30524==    by 0x506771C: domcreate_rebuild_done (libxl_create.c:933)
==30524==    by 0x5069C7E: domcreate_bootloader_done (libxl_create.c:771)
==30524==    by 0x508AD4F: bootloader_local_detached_cb (libxl_bootloader.c:281)
==30524==    by 0x50583DD: local_device_detach_cb (libxl.c:2777)
==30524==    by 0x505EB05: libxl__device_disk_local_initiate_detach (libxl.c:2752)
==30524==    by 0x508B5A4: bootloader_callback (libxl_bootloader.c:265)
==30524==    by 0x508C5C0: libxl__bootloader_run (libxl_bootloader.c:392)
==30524==    by 0x506A2EB: do_domain_create (libxl_create.c:709)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059F8A: libxl_domain_info (libxl.c:579)
==30524==    by 0x5079FBA: libxl__wait_device_connection (libxl_device.c:723)
==30524==    by 0x505DDB8: device_disk_add (libxl.c:2215)
==30524==    by 0x505DE65: libxl__device_disk_add (libxl.c:2229)
==30524==    by 0x5079723: libxl__add_disks (libxl_device.c:549)
==30524==    by 0x506771C: domcreate_rebuild_done (libxl_create.c:933)
==30524==    by 0x5069C7E: domcreate_bootloader_done (libxl_create.c:771)
==30524==    by 0x508AD4F: bootloader_local_detached_cb (libxl_bootloader.c:281)
==30524==    by 0x50583DD: local_device_detach_cb (libxl.c:2777)
==30524==    by 0x505EB05: libxl__device_disk_local_initiate_detach (libxl.c:2752)
==30524==    by 0x508B5A4: bootloader_callback (libxl_bootloader.c:265)
==30524==    by 0x508C5C0: libxl__bootloader_run (libxl_bootloader.c:392)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059FCD: libxl_domain_info (libxl.c:583)
==30524==    by 0x5079FBA: libxl__wait_device_connection (libxl_device.c:723)
==30524==    by 0x505DDB8: device_disk_add (libxl.c:2215)
==30524==    by 0x505DE65: libxl__device_disk_add (libxl.c:2229)
==30524==    by 0x5079723: libxl__add_disks (libxl_device.c:549)
==30524==    by 0x506771C: domcreate_rebuild_done (libxl_create.c:933)
==30524==    by 0x5069C7E: domcreate_bootloader_done (libxl_create.c:771)
==30524==    by 0x508AD4F: bootloader_local_detached_cb (libxl_bootloader.c:281)
==30524==    by 0x50583DD: local_device_detach_cb (libxl.c:2777)
==30524==    by 0x505EB05: libxl__device_disk_local_initiate_detach (libxl.c:2752)
==30524==    by 0x508B5A4: bootloader_callback (libxl_bootloader.c:265)
==30524==    by 0x508C5C0: libxl__bootloader_run (libxl_bootloader.c:392)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5072E51: libxl__domain_type (libxl_dom.c:34)
==30524==    by 0x505F20D: libxl__device_nic_setdefault (libxl.c:2827)
==30524==    by 0x505F33D: libxl__device_nic_add (libxl.c:2871)
==30524==    by 0x50797CB: libxl__add_nics (libxl_device.c:550)
==30524==    by 0x5067B3E: domcreate_devmodel_started (libxl_create.c:1104)
==30524==    by 0x506A96F: device_model_spawn_outcome (libxl_dm.c:1295)
==30524==    by 0x506A9C4: device_model_detached (libxl_dm.c:1269)
==30524==    by 0x5076B21: spawn_middle_death (libxl_exec.c:465)
==30524==    by 0x508A139: childproc_reaped (libxl_fork.c:266)
==30524==    by 0x508A542: libxl__fork_selfpipe_woken (libxl_fork.c:302)
==30524==    by 0x5087F8B: afterpoll_internal (libxl_event.c:1008)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059F8A: libxl_domain_info (libxl.c:579)
==30524==    by 0x5079FBA: libxl__wait_device_connection (libxl_device.c:723)
==30524==    by 0x505F744: libxl__device_nic_add (libxl.c:2947)
==30524==    by 0x50797CB: libxl__add_nics (libxl_device.c:550)
==30524==    by 0x5067B3E: domcreate_devmodel_started (libxl_create.c:1104)
==30524==    by 0x506A96F: device_model_spawn_outcome (libxl_dm.c:1295)
==30524==    by 0x506A9C4: device_model_detached (libxl_dm.c:1269)
==30524==    by 0x5076B21: spawn_middle_death (libxl_exec.c:465)
==30524==    by 0x508A139: childproc_reaped (libxl_fork.c:266)
==30524==    by 0x508A542: libxl__fork_selfpipe_woken (libxl_fork.c:302)
==30524==    by 0x5087F8B: afterpoll_internal (libxl_event.c:1008)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059FCD: libxl_domain_info (libxl.c:583)
==30524==    by 0x5079FBA: libxl__wait_device_connection (libxl_device.c:723)
==30524==    by 0x505F744: libxl__device_nic_add (libxl.c:2947)
==30524==    by 0x50797CB: libxl__add_nics (libxl_device.c:550)
==30524==    by 0x5067B3E: domcreate_devmodel_started (libxl_create.c:1104)
==30524==    by 0x506A96F: device_model_spawn_outcome (libxl_dm.c:1295)
==30524==    by 0x506A9C4: device_model_detached (libxl_dm.c:1269)
==30524==    by 0x5076B21: spawn_middle_death (libxl_exec.c:465)
==30524==    by 0x508A139: childproc_reaped (libxl_fork.c:266)
==30524==    by 0x508A542: libxl__fork_selfpipe_woken (libxl_fork.c:302)
==30524==    by 0x5087F8B: afterpoll_internal (libxl_event.c:1008)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5072E51: libxl__domain_type (libxl_dom.c:34)
==30524==    by 0x506FEA8: libxl__device_pci_add (libxl_pci.c:1038)
==30524==    by 0x506782E: domcreate_attach_pci (libxl_create.c:1168)
==30524==    by 0x5067A0F: domcreate_attach_vtpms (libxl_create.c:1142)
==30524==    by 0x5078535: multidev_one_callback (libxl_device.c:512)
==30524==    by 0x5079A32: device_hotplug_done (libxl_device.c:1059)
==30524==    by 0x5079CD4: device_hotplug (libxl_device.c:982)
==30524==    by 0x5079E16: device_hotplug_child_death_cb (libxl_device.c:1036)
==30524==    by 0x508A139: childproc_reaped (libxl_fork.c:266)
==30524==    by 0x508A542: libxl__fork_selfpipe_woken (libxl_fork.c:302)
==30524==    by 0x5087F8B: afterpoll_internal (libxl_event.c:1008)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==
--30524-- WARNING: unhandled __HYPERVISOR_domctl subop: 45
--30524-- You may be able to write your own handler.
--30524-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--30524-- Nevertheless we consider this a bug.  Please report
--30524-- it at http://valgrind.org/support/bug_reports.html &
--30524-- http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen.
libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI device 0000:04:00.0 cannot be assigned - no IOMMU?
--30524-- WARNING: unhandled __HYPERVISOR_domctl subop: 45
--30524-- You may be able to write your own handler.
--30524-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--30524-- Nevertheless we consider this a bug.  Please report
--30524-- it at http://valgrind.org/support/bug_reports.html &
--30524-- http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen.
libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI device 0000:0a:08.0 cannot be assigned - no IOMMU?
--30524-- WARNING: unhandled __HYPERVISOR_domctl subop: 45
--30524-- You may be able to write your own handler.
--30524-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--30524-- Nevertheless we consider this a bug.  Please report
--30524-- it at http://valgrind.org/support/bug_reports.html &
--30524-- http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen.
libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI device 0000:0a:0b.0 cannot be assigned - no IOMMU?
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059F8A: libxl_domain_info (libxl.c:579)
==30524==    by 0x5072CC2: userdata_path (libxl_dom.c:1516)
==30524==    by 0x5075C05: libxl_userdata_store (libxl_dom.c:1577)
==30524==    by 0x119C44: create_domain (xl_cmdimpl.c:2053)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059FCD: libxl_domain_info (libxl.c:583)
==30524==    by 0x5072CC2: userdata_path (libxl_dom.c:1516)
==30524==    by 0x5075C05: libxl_userdata_store (libxl_dom.c:1577)
==30524==    by 0x119C44: create_domain (xl_cmdimpl.c:2053)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059F8A: libxl_domain_info (libxl.c:579)
==30524==    by 0x5072CC2: userdata_path (libxl_dom.c:1516)
==30524==    by 0x5075C3E: libxl_userdata_store (libxl_dom.c:1588)
==30524==    by 0x119C44: create_domain (xl_cmdimpl.c:2053)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5059FCD: libxl_domain_info (libxl.c:583)
==30524==    by 0x5072CC2: userdata_path (libxl_dom.c:1516)
==30524==    by 0x5075C3E: libxl_userdata_store (libxl_dom.c:1588)
==30524==    by 0x119C44: create_domain (xl_cmdimpl.c:2053)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5072E51: libxl__domain_type (libxl_dom.c:34)
==30524==    by 0x505AC35: libxl_domain_unpause (libxl.c:837)
==30524==    by 0x119C81: create_domain (xl_cmdimpl.c:2064)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
Waiting for domain pfsense (domid 16) to die [pid 30524]
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5058B00: domain_death_xswatch_callback (libxl.c:994)
==30524==    by 0x5087637: watchfd_callback (libxl_event.c:504)
==30524==    by 0x5087EC5: afterpoll_internal (libxl_event.c:995)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==    by 0x50891C6: libxl_event_wait (libxl_event.c:1465)
==30524==    by 0x11A14D: create_domain (xl_cmdimpl.c:1786)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x594EE12: vfprintf (vfprintf.c:1634)
==30524==    by 0x5A0DF0A: __vasprintf_chk (vasprintf_chk.c:66)
==30524==    by 0x507AD61: libxl__logv (stdio2.h:210)
==30524==    by 0x507AF1B: libxl__log (libxl_internal.c:209)
==30524==    by 0x5058BA1: domain_death_xswatch_callback (libxl.c:1002)
==30524==    by 0x5087637: watchfd_callback (libxl_event.c:504)
==30524==    by 0x5087EC5: afterpoll_internal (libxl_event.c:995)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==    by 0x50891C6: libxl_event_wait (libxl_event.c:1465)
==30524==    by 0x11A14D: create_domain (xl_cmdimpl.c:1786)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Use of uninitialised value of size 8
==30524==    at 0x594DCBB: _itoa_word (_itoa.c:179)
==30524==    by 0x5950E23: vfprintf (vfprintf.c:1634)
==30524==    by 0x5A0DF0A: __vasprintf_chk (vasprintf_chk.c:66)
==30524==    by 0x507AD61: libxl__logv (stdio2.h:210)
==30524==    by 0x507AF1B: libxl__log (libxl_internal.c:209)
==30524==    by 0x5058BA1: domain_death_xswatch_callback (libxl.c:1002)
==30524==    by 0x5087637: watchfd_callback (libxl_event.c:504)
==30524==    by 0x5087EC5: afterpoll_internal (libxl_event.c:995)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==    by 0x50891C6: libxl_event_wait (libxl_event.c:1465)
==30524==    by 0x11A14D: create_domain (xl_cmdimpl.c:1786)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x594DCC5: _itoa_word (_itoa.c:179)
==30524==    by 0x5950E23: vfprintf (vfprintf.c:1634)
==30524==    by 0x5A0DF0A: __vasprintf_chk (vasprintf_chk.c:66)
==30524==    by 0x507AD61: libxl__logv (stdio2.h:210)
==30524==    by 0x507AF1B: libxl__log (libxl_internal.c:209)
==30524==    by 0x5058BA1: domain_death_xswatch_callback (libxl.c:1002)
==30524==    by 0x5087637: watchfd_callback (libxl_event.c:504)
==30524==    by 0x5087EC5: afterpoll_internal (libxl_event.c:995)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==    by 0x50891C6: libxl_event_wait (libxl_event.c:1465)
==30524==    by 0x11A14D: create_domain (xl_cmdimpl.c:1786)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5950E73: vfprintf (vfprintf.c:1634)
==30524==    by 0x5A0DF0A: __vasprintf_chk (vasprintf_chk.c:66)
==30524==    by 0x507AD61: libxl__logv (stdio2.h:210)
==30524==    by 0x507AF1B: libxl__log (libxl_internal.c:209)
==30524==    by 0x5058BA1: domain_death_xswatch_callback (libxl.c:1002)
==30524==    by 0x5087637: watchfd_callback (libxl_event.c:504)
==30524==    by 0x5087EC5: afterpoll_internal (libxl_event.c:995)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==    by 0x50891C6: libxl_event_wait (libxl_event.c:1465)
==30524==    by 0x11A14D: create_domain (xl_cmdimpl.c:1786)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x594EEEA: vfprintf (vfprintf.c:1634)
==30524==    by 0x5A0DF0A: __vasprintf_chk (vasprintf_chk.c:66)
==30524==    by 0x507AD61: libxl__logv (stdio2.h:210)
==30524==    by 0x507AF1B: libxl__log (libxl_internal.c:209)
==30524==    by 0x5058BA1: domain_death_xswatch_callback (libxl.c:1002)
==30524==    by 0x5087637: watchfd_callback (libxl_event.c:504)
==30524==    by 0x5087EC5: afterpoll_internal (libxl_event.c:995)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==    by 0x50891C6: libxl_event_wait (libxl_event.c:1465)
==30524==    by 0x11A14D: create_domain (xl_cmdimpl.c:1786)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x594EF6D: vfprintf (vfprintf.c:1634)
==30524==    by 0x5A0DF0A: __vasprintf_chk (vasprintf_chk.c:66)
==30524==    by 0x507AD61: libxl__logv (stdio2.h:210)
==30524==    by 0x507AF1B: libxl__log (libxl_internal.c:209)
==30524==    by 0x5058BA1: domain_death_xswatch_callback (libxl.c:1002)
==30524==    by 0x5087637: watchfd_callback (libxl_event.c:504)
==30524==    by 0x5087EC5: afterpoll_internal (libxl_event.c:995)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==    by 0x50891C6: libxl_event_wait (libxl_event.c:1465)
==30524==    by 0x11A14D: create_domain (xl_cmdimpl.c:1786)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5058BF9: domain_death_xswatch_callback (libxl.c:1012)
==30524==    by 0x5087637: watchfd_callback (libxl_event.c:504)
==30524==    by 0x5087EC5: afterpoll_internal (libxl_event.c:995)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==    by 0x50891C6: libxl_event_wait (libxl_event.c:1465)
==30524==    by 0x11A14D: create_domain (xl_cmdimpl.c:1786)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5058C5E: domain_death_xswatch_callback (libxl.c:1017)
==30524==    by 0x5087637: watchfd_callback (libxl_event.c:504)
==30524==    by 0x5087EC5: afterpoll_internal (libxl_event.c:995)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==    by 0x50891C6: libxl_event_wait (libxl_event.c:1465)
==30524==    by 0x11A14D: create_domain (xl_cmdimpl.c:1786)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30524== Conditional jump or move depends on uninitialised value(s)
==30524==    at 0x5058C7B: domain_death_xswatch_callback (libxl.c:1022)
==30524==    by 0x5087637: watchfd_callback (libxl_event.c:504)
==30524==    by 0x5087EC5: afterpoll_internal (libxl_event.c:995)
==30524==    by 0x5088208: eventloop_iteration (libxl_event.c:1440)
==30524==    by 0x50891C6: libxl_event_wait (libxl_event.c:1465)
==30524==    by 0x11A14D: create_domain (xl_cmdimpl.c:1786)
==30524==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30524==    by 0x1122E6: main (xl.c:356)
==30524==
==30635== Conditional jump or move depends on uninitialised value(s)
==30635==    at 0x5072E51: libxl__domain_type (libxl_dom.c:34)
==30635==    by 0x50578BC: libxl__primary_console_find (libxl.c:1572)
==30635==    by 0x505C5DC: libxl_primary_console_exec (libxl.c:1603)
==30635==    by 0x113584: autoconnect_console (xl_cmdimpl.c:1776)
==30635==    by 0x50887D9: libxl__egc_cleanup (libxl_event.c:1161)
==30635==    by 0x5089756: libxl__ao_inprogress (libxl_event.c:1698)
==30635==    by 0x506A374: do_domain_create (libxl_create.c:1256)
==30635==    by 0x506A459: libxl_domain_create_new (libxl_create.c:1277)
==30635==    by 0x119AD7: create_domain (xl_cmdimpl.c:2020)
==30635==    by 0x11DB31: main_create (xl_cmdimpl.c:4211)
==30635==    by 0x1122E6: main (xl.c:356)
==30635==



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-04 16:48                   ` Atom2
@ 2014-11-05  9:33                     ` Ian Campbell
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2014-11-05  9:33 UTC (permalink / raw)
  To: Atom2; +Cc: xen-devel

On Tue, 2014-11-04 at 17:48 +0100, Atom2 wrote:
> On 04.11.14 at 17:31, Ian Campbell wrote:
> > On Tue, 2014-11-04 at 17:14 +0100, Atom2 wrote:
> >> On 04.11.14 at 16:44, Ian Campbell wrote:
> >>> On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
> >>> Otherwise I think the next step would be to downgrade to 4.3.1 and see
> >>> if the problem persists, in order to rule out changes elsewhere in the
> >>> system. If the problem doesn't happen with a 4.3.1 rebuilt on your
> >>> current system then the next thing would probably be to bisect the
> >>> issue. There are only 31 toolstack changes in that range, so it ought to
> >>> only take 5-6 iterations.
> >> Unfortunately 4.3.1 is no longer available as an ebuild as 4.3.3 seemed
> >> to fix security issues and therefore 4.3.1 has been deleted from the
> >> repos. So it's not straightforward and I need to figure out how to get
> >> the old version back. But I am sure there's a way.
> >
> > I don't know anything about ebuilds, but if you end up needing to bisect
> > then building from xen.git might be useful anyway (unless you can get an
> > ebuild to trivially build some other version).
> Before I go down that route would I also need to re-compile xen (i.e. 
> the hypervisor at /boot/xen-4.3.3.gz that's being booted from grub) or 
> only xen-tools? In other words does xen-tools 4.3.1 work with the 
> hypervisor version 4.3.3 under /boot?

I think in principle within a stable branch (i.e. 4.3.x) it ought to
work, but I wouldn't swear to it. I think you can probably risk it --
the failure mode in case of a bad mismatch will be very different (like
a permissions failure early in building a domU), so if you see anything
newly odd you could rebuild Xen.

> I wouldn't want to end up with an unbootable system.

A tools/hv mismatch won't ever stop dom0 booting, it would just stop you
starting a guest. You can always boot the kernel natively in any case.

> In terms of ebuilds: Adding patches to a version is pretty easy as 
> gentoo works from source code. So if 4.3.3 is just different by a number 
> of well defined patches from 4.3.1 then that should be straightforward 
> as applying patches is really trivial.

The git history from 4.3.1 to 4.3.3 is (as it happens) completely
linear, so something like git format-patch should trivially produce a
list of patches to apply (or perhaps unapply if it is easier to work
backwards from 4.3.3). I don't know what local patches the ebuild
includes or how much conflict there will be though.
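
As a rough sketch of where the "5-6 iterations" estimate for 31 toolstack
changes (quoted earlier in this message) comes from: with a linear history
each bisect step halves the remaining range, so the count is about log2 of
the number of candidates. The helper name bisect_steps is made up for
illustration, and the tag names in the trailing comment are an assumption
about how the releases are tagged in xen.git, not something stated in the
thread.

```shell
# Estimate how many bisect rounds are needed to isolate one bad change
# among n candidates: the smallest k with 2^k >= n.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 31   # prints 5 for the 31 toolstack changes

# The bisect itself might then be started along these lines (tag names
# are assumed, not taken from the thread):
#   git bisect start RELEASE-4.3.3 RELEASE-4.3.1 -- tools
```

In practice git bisect may need one more round to confirm the final
candidate, which fits the 5-6 range given above.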

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-04 17:30                   ` Atom2
@ 2014-11-05  9:45                     ` Ian Campbell
  2014-11-05 12:01                       ` Atom2
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-11-05  9:45 UTC (permalink / raw)
  To: Atom2; +Cc: xen-devel

On Tue, 2014-11-04 at 18:30 +0100, Atom2 wrote:
> On 04.11.14 at 17:31, Ian Campbell wrote:
> > On Tue, 2014-11-04 at 17:14 +0100, Atom2 wrote:
> >> On 04.11.14 at 16:44, Ian Campbell wrote:
> >>> On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
> >>> You could try running the xl command under valgrind, you may find "xl
> >>> create -F" (which keeps xl in the foreground) handy if you try this.
> >>> That might help catch any heap corruption etc.
> >> I don't know what valgrind is, but I'll have a look and see how to deal
> >> with that ...
> >
> > Valgrind is a totally awesome memory allocation debugger ;-)
> >
> In the attached file named 'valgrind' please find the output of
> 	# valgrind xl create -F -c pfsense

Sadly it looks like your version of valgrind doesn't know how to handle
the hypercalls made by the Xen toolstack, which means it produces a lot
of unrelated noise.

You seem to be using valgrind 3.9.0, which lacked knowledge of some of
the HVM related hypercalls that weren't added until 3.10.0. It's
probably not worth pursuing this angle any further (unless it is utterly
trivial to pull in the new version).

Apart from the valgrind output there is a new message from libxl:
        libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI device 0000:04:00.0 cannot be assigned - no IOMMU?
which suggests that it isn't passing things through (this might be
fallout from valgrind not understanding things) and no segfault.
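
One way to separate valgrind's own lines from what the toolstack printed,
as a sketch only: valgrind prefixes its lines with ==PID== or --PID--, so
filtering those out leaves the xl, libxc and libxl messages, and the
"unhandled ... subop" warnings can be tallied separately. The function
names are invented for illustration.

```shell
# Keep only lines not emitted by valgrind itself, i.e. drop everything
# that starts with ==PID== or --PID--.
toolstack_output() {
    grep -Ev '^(==|--)[0-9]+(==|--)'
}

# Tally which hypercall sub-operations valgrind reported as unhandled.
unhandled_hypercalls() {
    grep -Eo 'unhandled __HYPERVISOR_[a-z_]+ subop: [0-9]+' | sort | uniq -c
}
```

Both read the log on standard input, so with the attachment saved under
its name 'valgrind' they could be run as:
	# toolstack_output < valgrind
	# unhandled_hypercalls < valgrind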

Out of interest, what does "xl create -F ..." do without valgrind? (I'm
wondering if -F is responsible for the change in behaviour.)

> There's a lot of information in that file which is above my current 
> level of expertise, but I am sure you are able to make sense out of it.
> 
> BTW the stuff at lines 6 and 111 to 119 is the usual output of the xl 
> command before it segfaults.
> 
> Thanks Atom2

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-05  9:45                     ` Ian Campbell
@ 2014-11-05 12:01                       ` Atom2
  2014-11-05 12:39                         ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-11-05 12:01 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 4137 bytes --]

On 05.11.14 at 10:45, Ian Campbell wrote:
> On Tue, 2014-11-04 at 18:30 +0100, Atom2 wrote:
>> On 04.11.14 at 17:31, Ian Campbell wrote:
>>> On Tue, 2014-11-04 at 17:14 +0100, Atom2 wrote:
>>>> On 04.11.14 at 16:44, Ian Campbell wrote:
>>>>> On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
> Sadly it looks like your version of valgrind doesn't know how to handle
> the hypercalls made by the Xen toolstack, which means it produces a lot
> of unrelated noise.
>
> You seem to be using valgrind 3.9.0, which lacked knowledge of some of
> the HVM related hypercalls that weren't added until 3.10.0. It's
> probably not worth pursuing this angle any further (unless it is utterly
> trivial to pull in the new version).
Many thanks again for your quick answers.

You were right, I used valgrind-3.9.0 which is the latest stable version 
for gentoo. 3.10.0 is available under unstable and it was indeed trivial 
to pull that in instead. The unrelated noise seems to have disappeared, 
so attached please find the output of running
	# valgrind xl create -F -c pfsense

The strange thing was: No segfault at the start, but obviously also 
issues with passing through the PCI devices as evidenced by the same 
error messages you flagged below. Also the boot menu now showed up and I 
was able to boot the domain - but, as expected by the error message, no 
network devices have been passed through. Even a

	# xl shutdown -F pfsense
	Shutting down domain 2
	PV control interface not available: sending ACPI power button event.
	#

from another ssh connection to dom0 worked (no segfault message in that
session), so the attached file 'valgrind.out' contains the
complete screen output of the valgrind session from start to finish.
However, towards the end of that file (line 235) you'll see a SEGFAULT
message from valgrind. I hope you can make some sense of that ... or
should I rerun with some options to valgrind (like the ones mentioned in
the output):
	--leak-check=full
	-v

To me, it looks as if something is broken with the PCI passthrough stuff
and that this started with 4.3.3. Strangely, however, valgrind seems to
work around that issue insofar as no segfault happens. Is there any
explanation for the different behaviour between native execution of xl
and starting xl under valgrind's control?

In any case, I am positive that there hasn't been any change to the 
hardware of the system, not even a slot change of an add-on card. So I 
have no clue why the system after the upgrade misbehaves.
>
> Apart from the valgrind output there is a new message from libxl:
>          libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI device 0000:04:00.0 cannot be assigned - no IOMMU?
> which suggests that it isn't passing things through (this might be
> fallout from valgrind not understanding things) and no segfault.
>
> OOI what does "xl create -F ..." do without valgrind (I'm wondering if
> -F is responsible for the change in behaviour).
I tried that as well:

	vm-host auto [526] # xl create -F -c pfsense
	Parsing config from pfsense
	xc: info: VIRTUAL MEMORY ARRANGEMENT:
	  Loader:        0000000000100000->00000000001c12a4
	  Modules:       0000000000000000->0000000000000000
	  TOTAL:         0000000000000000->000000001f800000
	  ENTRY ADDRESS: 0000000000100000
	xc: info: PHYSICAL MEMORY ALLOCATION:
	  4KB PAGES: 0x0000000000000200
	  2MB PAGES: 0x00000000000000fb
	  1GB PAGES: 0x0000000000000000
	Segmentation fault
	vm-host auto [527] # xl list
	Name                        ID   Mem VCPUs      State   Time(s)
	Domain-0                     0  4094     8     r-----     451.5
	pfsense                      1   512     1     --p---       0.0
	vm-host auto [528] # xl destroy pfsense
	Segmentation fault
	vm-host auto [529] # xl list
	Name                        ID   Mem VCPUs      State   Time(s)
	Domain-0                     0  4096     8     r-----     452.1
	vm-host auto [529] #

and, as you can see, again had the segfault and the same status of the 
domU as back at the time when the issues started (i.e. paused - which 
you explained as being normal after a start).

Thanks Atom2

[-- Attachment #2: valgrind.out --]
[-- Type: text/plain, Size: 13682 bytes --]

vm-host auto [540] # valgrind xl create -F -c pfsense
==24982== Memcheck, a memory error detector
==24982== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==24982== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==24982== Command: xl create -F -c pfsense
==24982==
Parsing config from pfsense
--24982-- WARNING: unhandled __HYPERVISOR_domctl shadow(10) subop: 31
--24982-- You may be able to write your own handler.
--24982-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--24982-- Nevertheless we consider this a bug.  Please report
--24982-- it at http://valgrind.org/support/bug_reports.html &
--24982-- http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen.
xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->00000000001c12a4
  Modules:       0000000000000000->0000000000000000
  TOTAL:         0000000000000000->000000001f800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x00000000000000fb
  1GB PAGES: 0x0000000000000000
--24982-- WARNING: unhandled __HYPERVISOR_domctl subop: 45
--24982-- You may be able to write your own handler.
--24982-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--24982-- Nevertheless we consider this a bug.  Please report
--24982-- it at http://valgrind.org/support/bug_reports.html &
--24982-- http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen.
libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI device 0000:04:00.0 cannot be assigned - no IOMMU?
--24982-- WARNING: unhandled __HYPERVISOR_domctl subop: 45
--24982-- You may be able to write your own handler.
--24982-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--24982-- Nevertheless we consider this a bug.  Please report
--24982-- it at http://valgrind.org/support/bug_reports.html &
--24982-- http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen.
libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI device 0000:0a:08.0 cannot be assigned - no IOMMU?
--24982-- WARNING: unhandled __HYPERVISOR_domctl subop: 45
--24982-- You may be able to write your own handler.
--24982-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--24982-- Nevertheless we consider this a bug.  Please report
--24982-- it at http://valgrind.org/support/bug_reports.html &
--24982-- http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen.
libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI device 0000:0a:0b.0 cannot be assigned - no IOMMU?
Waiting for domain pfsense (domid 2) to die [pid 24982]
/boot/config: -DConsoles: internal video/keyboard  serial port
BIOS drive C: is disk0
BIOS 639kB/515068kB available memory

FreeBSD/x86 bootstrap loader, Revision 1.1
(root@pf2_1_1_amd64.pfsense.org, Mon Aug 25 08:18:48 EDT 2014)
Loading /boot/defaults/loader.conf
/boot/kernel/kernel data=0xbc1792 data=0x596478+0xe0ed0 syms=[0x8+0x125f70+0x8+0x113bc5]
\

 +-----------------------------------------+
 |                                         |
 |                                         |
 |                                         |
 |          Welcome to pfSense!            |
 |                                         |                 ______
 |                                         |                /      \
 |  1. Boot pfSense [default]              |          _____/    f   \
 |  2. Boot pfSense with ACPI disabled     |         /     \        /
 |  3. Boot pfSense using USB device       |        /   p   \______/  Sense
 |  4. Boot pfSense in Safe Mode           |        \       /      \
 |  5. Boot pfSense in single user mode    |         \_____/        \
 |  6. Boot pfSense with verbose logging   |               \        /
 |  7. Escape to loader prompt             |                \______/
 |  8. Reboot                              |
 |                                         |
 |                                         |
 |                                         |
 |  Select option, [Enter] for default     |
 |  or [Space] to pause timer  3           |
 +-----------------------------------------+


KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:27:11 EDT 2014
    root@pf2_1_1_amd64.pfsense.org:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.8 amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz (2394.57-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x206a7  Family = 6  Model = 2a  Stepping = 7
  Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x97ba2203<SSE3,PCLMULQDQ,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,AVX,HV>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 528482304 (504 MB)
avail memory = 483524608 (461 MB)
ACPI APIC Table: <HPQOEM SLIC-CPC>
ioapic0: Changing APIC ID to 1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-47 on motherboard
wlan: mac acl policy registered
ipw_bss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw/.
ipw_bss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_bss_fw, 0xffffffff804abaf0, 0) error 1
ipw_ibss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw/.
ipw_ibss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_ibss_fw, 0xffffffff804abb90, 0) error 1
ipw_monitor: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw/.
ipw_monitor: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_monitor_fw, 0xffffffff804abc30, 0) error 1
kbd1 at kbdmux0
cryptosoft0: <software crypto> on motherboard
padlock0: No ACE support.
acpi0: <HPQOEM SLIC-CPC> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0xb008-0xb00b on acpi0
cpu0: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc240-0xc24f at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata0: [ITHREAD]
ata1: <ATA channel> at channel 1 on atapci0
ata1: [ITHREAD]
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xf0000000-0xf1ffffff,0xf3052000-0xf3052fff at device 2.0 on pci0
pci0: <unknown> at device 3.0 (no driver attached)
sym0: <895a> port 0xc100-0xc1ff mem 0xf3053000-0xf30533ff,0xf3050000-0xf3051fff irq 32 at device 4.0 on pci0
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: [ITHREAD]
em0: <Intel(R) PRO/1000 Legacy Network Connection 1.0.6> port 0xc200-0xc23f mem 0xf3000000-0xf301ffff irq 36 at device 5.0 on pci0
em0: [FILTER]
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 62500000 Hz quality 900
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
uart0: console (115200,n,8,1)
ppc0: <Parallel port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppc0: [ITHREAD]
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
plip0: [ITHREAD]
lpt0: <Printer> on ppbus0
lpt0: [ITHREAD]
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
orm0: <ISA Option ROM> at iomem 0xed800-0xeffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2394570596 Hz quality 800
Timecounters tick every 10.000 msec
IPsec: Initialized Security Association Processing.
acd0: DVDROM <QEMU DVD-ROM/1.3.1> at ata1-master WDMA2
sym0: unknown interrupt(s) ignored, ISTAT=0x1 DSTAT=0x80 SIST=0x0
da0 at sym0 bus 0 scbus0 target 0 lun 0
da0: <QEMU QEMU HARDDISK 1.3.> Fixed Direct Access SCSI-5 device
da0: 3.300MB/s transfers
da0: Command Queueing enabled
da0: 8192MB (16777216 512 byte sectors: 255H 63S/T 1044C)
Trying to mount root from ufs:/dev/da0s1a
Configuring crash dumps...
Using /dev/da0s1b for dump device.
Mounting filesystems...
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS WARNING: Recommended minimum kmem_size is 512MB; expect unstable behavior.
             Consider tuning vm.kmem_size and vm.kmem_size_max
             in /boot/loader.conf.
ZFS filesystem version 5
ZFS storage pool version 28

     ___
 ___/ f \
/ p \___/ Sense
\___/   \
    \___/

Welcome to pfSense 2.1.5-RELEASE  ...

No core dumps found.
Creating symlinks......done.
>>> Under 512 megabytes of ram detected.  Not enabling APC.
External config loader 1.0 is now starting...
Launching the init system... done.
Initializing............................. done.
Starting device manager (devd)...done.
Loading configuration......done.
Warning: Configuration references interfaces that do not exist: ath0 rl0

Network interface mismatch -- Running interface assignment option.

Valid interfaces are:

em0   00:16:3e:a1:64:01   (up) Intel(R) PRO/1000 Legacy Network Connection 1.0.6

Do you want to set up VLANs first?

If you are not going to use VLANs, or only for optional interfaces, you should
say no here and use the webConfigurator to configure VLANs later, if required.

Do you want to set up VLANs now [y|n]? em0: link state changed to UP

pfSense is now shutting down ...

Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...0 done
All buffers synced.
Uptime: 4m13s
acpi0: Powering system off
Domain 2 has shut down, reason code 0 0x0
Action for shutdown reason code 0 is destroy
Domain 2 needs to be cleaned up: destroying the domain
==24982==
==24982== Process terminating with default action of signal 11 (SIGSEGV)
==24982==  Bad permissions for mapped region at address 0x4035D40
==24982==    at 0x56EDB95: sigcancel_handler (nptl-init.c:174)
==24982==
==24982== HEAP SUMMARY:
==24982==     in use at exit: 7,580 bytes in 50 blocks
==24982==   total heap usage: 1,688 allocs, 1,638 frees, 4,978,265 bytes allocated
==24982==
==24982== LEAK SUMMARY:
==24982==    definitely lost: 516 bytes in 7 blocks
==24982==    indirectly lost: 0 bytes in 0 blocks
==24982==      possibly lost: 576 bytes in 2 blocks
==24982==    still reachable: 6,488 bytes in 41 blocks
==24982==         suppressed: 0 bytes in 0 blocks
==24982== Rerun with --leak-check=full to see details of leaked memory
==24982==
==24982== For counts of detected and suppressed errors, rerun with: -v
==24982== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Killed


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-05 12:01                       ` Atom2
@ 2014-11-05 12:39                         ` Ian Campbell
  2014-11-05 12:45                           ` Andrew Cooper
  2014-11-06 15:11                           ` Atom2
  0 siblings, 2 replies; 25+ messages in thread
From: Ian Campbell @ 2014-11-05 12:39 UTC (permalink / raw)
  To: Atom2; +Cc: xen-devel

On Wed, 2014-11-05 at 13:01 +0100, Atom2 wrote:

Thanks for all that, sadly it's not giving me any clues what is going
wrong :-/

> To me, it looks as if something is broken with the PCI passthrough stuff 
> and that has started with 4.3.3. Strangely however, valgrind seems to 
> work around that issue insofar that no segfault happens. Is there any 
> explanation of the different behaviour between native execution of xl 
> and starting xl under valgrind's control?

Valgrind has its own memory allocator etc, but it's supposed to catch
errors, not hide them. I think even 3.10.0 is missing support for some
hypercalls which are being used by passthrough, which is why we are
continuing to see different behaviours.

I think we are reaching the point of diminishing returns with valgrind.
It probably is worth rerunning with "-v --leak-check=full", but after
that we'd be looking at adding valgrind patches for the new hypercalls,
which I don't think will be worthwhile (although I intend to write the
patches anyway).

So unless "-v --leak-check=full" tells me something (which I'm doubtful
of at this stage) I think we're back to bisecting the changes since
4.3.1, sorry.
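
For readers unfamiliar with the term: bisecting is a binary search over the ordered list of changes between a known-good and a known-bad version. In practice "git bisect" automates this; the sketch below only illustrates the idea. The change labels and the is_bad() check are hypothetical stand-ins for "build this revision and see whether xl create segfaults".

```python
def first_bad(changes, is_bad):
    """Return the first change for which is_bad(change) is true.

    Assumes `changes` is ordered oldest to newest, the newest one is bad,
    and every change after the first bad one is bad as well.
    """
    lo, hi = 0, len(changes) - 1  # the first bad change lies in changes[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid          # first bad change is at mid or earlier
        else:
            lo = mid + 1      # first bad change is after mid
    return changes[lo]


# Hypothetical example: ten changes, the regression arrives with "c6".
changes = ["c%d" % i for i in range(10)]
tested = []


def is_bad(change):
    tested.append(change)     # record which builds had to be tested
    return int(change[1:]) >= 6


culprit = first_bad(changes, is_bad)
print(culprit, "found after", len(tested), "tests")
```

The number of builds to test grows only with the logarithm of the number of changes, which is why bisecting stays practical even across a whole point release.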

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-05 12:39                         ` Ian Campbell
@ 2014-11-05 12:45                           ` Andrew Cooper
  2014-11-05 12:47                             ` Ian Campbell
  2014-11-06 15:11                           ` Atom2
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-11-05 12:45 UTC (permalink / raw)
  To: Ian Campbell, Atom2; +Cc: xen-devel

On 05/11/14 12:39, Ian Campbell wrote:
> On Wed, 2014-11-05 at 13:01 +0100, Atom2 wrote:
>
> Thanks for all that, sadly it's not giving me any clues what is going
> wrong :-/
>
>> To me, it looks as if something is broken with the PCI passthrough stuff 
>> and that has started with 4.3.3. Strangely however, valgrind seems to 
>> work around that issue insofar that no segfault happens. Is there any 
>> explanation of the different behaviour between native execution of xl 
>> and starting xl under valgrind's control?
> Valgrind has its own memory allocator etc, but it's supposed to catch
> errors, not hide them. I think even 3.10.0 is missing support for some
> hypercalls which are being used by passthrough, which is why we are
> continuing to see different behaviours.
>
> I think we are reaching the point of diminishing returns with valgrind.
> It probably is worth rerunning with "-v --leak-check=full", but after
> that we'd be looking at adding valgrind patches for the new hypercalls,
> which I don't think will be worthwhile (although I intend to write the
> patches anyway).
>
> So unless "-v --leak-check=full" tells me something (which I'm doubtful
> of at this stage) I think we're back to bisecting the changes since
> 4.3.1, sorry.

The lack of valgrind support for XEN_DOMCTL_test_assign is causing PCI
Passthrough to fail.

This is where "libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI
device 0000:0a:0b.0 cannot be assigned - no IOMMU?" comes from.
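
The connection is visible in the valgrind.out attachment earlier in the thread, where each "cannot be assigned" error is preceded by an "unhandled __HYPERVISOR_domctl subop" warning. A minimal sketch for tallying such warnings from a saved log follows. The regular expression is an assumption based only on the warning lines in that attachment, not on any documented valgrind output format, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of the warning, taken from the lines in valgrind.out:
#   --PID-- WARNING: unhandled __HYPERVISOR_<name> [qualifier] subop: <n>
WARNING_RE = re.compile(
    r"WARNING: unhandled (__HYPERVISOR_\w+)\s+(?:(\S+)\s+)?subop: (\d+)"
)


def unhandled_subops(lines):
    """Count unhandled-hypercall warnings per (hypercall, qualifier, subop)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            hypercall, qualifier, subop = match.groups()
            counts[(hypercall, qualifier or "", int(subop))] += 1
    return counts


# Sample lines copied from the valgrind.out attachment.
sample = [
    "--24982-- WARNING: unhandled __HYPERVISOR_domctl shadow(10) subop: 31",
    "--24982-- You may be able to write your own handler.",
    "--24982-- WARNING: unhandled __HYPERVISOR_domctl subop: 45",
    "libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI device 0000:04:00.0 cannot be assigned - no IOMMU?",
    "--24982-- WARNING: unhandled __HYPERVISOR_domctl subop: 45",
    "--24982-- WARNING: unhandled __HYPERVISOR_domctl subop: 45",
]

counts = unhandled_subops(sample)
for (hypercall, qualifier, subop), n in sorted(counts.items()):
    print(hypercall, qualifier, "subop", subop, "seen", n, "time(s)")
```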

While I do have 10 patches I really should get around to upstreaming
into valgrind, the passthrough hypercalls are not amongst them.  Fixing
XEN_DOMCTL_test_assign is only the first step.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-05 12:45                           ` Andrew Cooper
@ 2014-11-05 12:47                             ` Ian Campbell
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2014-11-05 12:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Atom2, xen-devel

On Wed, 2014-11-05 at 12:45 +0000, Andrew Cooper wrote:
> On 05/11/14 12:39, Ian Campbell wrote:
> > On Wed, 2014-11-05 at 13:01 +0100, Atom2 wrote:
> >
> > Thanks for all that, sadly it's not giving me any clues what is going
> > wrong :-/
> >
> >> To me, it looks as if something is broken with the PCI passthrough stuff 
> >> and that has started with 4.3.3. Strangely however, valgrind seems to 
> >> work around that issue insofar that no segfault happens. Is there any 
> >> explanation of the different behaviour between native execution of xl 
> >> and starting xl under valgrind's control?
> > Valgrind has its own memory allocator etc, but it's supposed to catch
> > errors, not hide them. I think even 3.10.0 is missing support for some
> > hypercalls which are being used by passthrough, which is why we are
> > continuing to see different behaviours.
> >
> > I think we are reaching the point of diminishing returns with valgrind.
> > It probably is worth rerunning with "-v --leak-check=full", but after
> > that we'd be looking at adding valgrind patches for the new hypercalls,
> > which I don't think will be worthwhile (although I intend to write the
> > patches anyway).
> >
> > So unless "-v --leak-check=full" tells me something (which I'm doubtful
> > of at this stage) I think we're back to bisecting the changes since
> > 4.3.1, sorry.
> 
> The lack of valgrind support for XEN_DOMCTL_test_assign is causing PCI
> Passthrough to fail.
> 
> This is where "libxl: error: libxl_pci.c:1045:libxl__device_pci_add: PCI
> device 0000:0a:0b.0 cannot be assigned - no IOMMU?" comes from.
> 
> While I do have 10 patches I really should get around to upstreaming
> into valgrind, the passthrough hypercalls are not amongst them.  Fixing
> XEN_DOMCTL_test_assign is only the first step.

I've just written (but not tested) that one.

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-05 12:39                         ` Ian Campbell
  2014-11-05 12:45                           ` Andrew Cooper
@ 2014-11-06 15:11                           ` Atom2
  2014-11-10 11:16                             ` Ian Campbell
  1 sibling, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-11-06 15:11 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

On 05.11.14 at 13:39, Ian Campbell wrote:
> On Wed, 2014-11-05 at 13:01 +0100, Atom2 wrote:
>
> Thanks for all that, sadly it's not giving me any clues what is going
> wrong :-/
>
> So unless "-v --leak-check=full" tells me something (which I'm doubtful
> of at this stage) I think we're back to bisecting the changes since
> 4.3.1, sorry.
Things are getting very strange at the moment.
After much work and research I have been able to download the source and
compile the old version which worked before (which incidentally was
not 4.3.1 but rather 4.3.2-r5 - sorry for any confusion that might have
caused). I initially thought that was good news because there are fewer
changes between 4.3.2 and 4.3.3, but after re-compiling 4.3.2-r5 I am now
experiencing the same segfault as with 4.3.3.

So my next step was to figure out what else had changed before the
problems started on 26.10.14 by working through the log files. The
sequence of relevant events was as follows:

11.10.14 04:13:	Last system reboot with working version (xen 4.3.2-r5)
		(xen-4.3.2-r5 was in use since 21.08.14)
18.10.14 22:50:	Last successful creation of HVM with PCI passthrough
		(that domU ran up to 26.10.14 as did another HVM)

Updates and new package installs since last reboot:
22.10.14:	app-misc/pax-utils-0.8.1 (update)
24.10.14:	dev-libs/libaio-0.3.110 (update)
		dev-libs/popt-1.16-r2 (update)
		sys-libs/libcap-ng-0.7.3 (new)
		dev-libs/libgcrypt-1.5.4-r1 (update)
		net-analyzer/tcpdump-4.6.2 (update)
25.10.14:	sys-devel/gcc-4.8.3 (update from 4.7.3-r1)
26.10.14:	app-emulation/xen-tools-4.3.3-r1 (update from 4.3.2-r5)
		app-emulation/xen-4.3.3-r1 (update from 4.3.2-r5)

26.10.14:	reboot - 1st segfault msg in syslog at shutdown time
		system reboots, can't start HVM PCI passthrough domUs
		segfault messages in syslog referring to libgcc_s.so.1
		problems persist despite world/kernel/system recompile

If I read this correctly I would come to the conclusion that the only 
package that is a dependency for both 4.3.2-r5 (the previously working, 
but now also non-working version) and 4.3.3-r1 (which never worked) is 
gcc which is required to compile the binaries from source. I don't think 
any of the other packages should have any influence.
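
The reasoning above amounts to filtering the update log by the window between the last successful domU creation and the first failure. A small sketch using only the dates and package names listed in this mail (the variable names are made up for illustration):

```python
from datetime import date

last_known_good = date(2014, 10, 18)  # last successful HVM creation
first_failure = date(2014, 10, 26)    # first segfault after the reboot

# Updates and new installs as listed in the timeline above.
updates = [
    (date(2014, 10, 22), "app-misc/pax-utils-0.8.1"),
    (date(2014, 10, 24), "dev-libs/libaio-0.3.110"),
    (date(2014, 10, 24), "dev-libs/popt-1.16-r2"),
    (date(2014, 10, 24), "sys-libs/libcap-ng-0.7.3"),
    (date(2014, 10, 24), "dev-libs/libgcrypt-1.5.4-r1"),
    (date(2014, 10, 24), "net-analyzer/tcpdump-4.6.2"),
    (date(2014, 10, 25), "sys-devel/gcc-4.8.3"),
    (date(2014, 10, 26), "app-emulation/xen-tools-4.3.3-r1"),
    (date(2014, 10, 26), "app-emulation/xen-4.3.3-r1"),
]

# Anything installed after the last good run and up to the first failure
# is a candidate for having introduced the regression.
suspects = [pkg for day, pkg in updates if last_known_good < day <= first_failure]

for pkg in suspects:
    print(pkg)
```

Every listed update falls inside that window, so the dates alone do not single out gcc; that conclusion rests on the observation that rebuilding the previously working 4.3.2-r5 now fails as well.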

Also the error message referring to libgcc_s.so.1 might hint towards a
problem with gcc. It's probably worth mentioning that the system apart
from XEN runs without any hiccups and is still rock solid. At the moment
it looks as if xen and gcc-4.8.3 don't co-operate well.

It's probably also worth mentioning that gcc is (and also was with the
older gcc-4.7.3) the hardened gcc version of gentoo which forces
position-independent executables (PIE), stack smashing protection (SSP)
and compile time buffer checks (see
http://wiki.gentoo.org/wiki/Hardened_Gentoo). The rest of Hardened (PaX,
grsecurity, SELinux) is not (and never was) in use so far. I don't know
whether any of this might have contributed to the problems I am
currently facing.

Now going back to an older version of gcc from a newer version is not
recommended and (according to my research on google) might create
numerous other issues - so there seems to be no easy route back
to gcc-4.7.3, and therefore getting back the binaries for 4.3.2-r5 in the
state they were in before the problems started seems impossible.

I am still at a loss and hope that the combined intelligence of the list
can get my system up and running again.

Many thanks Atom2

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: segfault in xl create for HVM with PCI passthrough
  2014-10-28 16:04     ` Ian Campbell
  2014-10-29  0:26       ` Atom2
@ 2014-11-09 23:03       ` Atom2
  1 sibling, 0 replies; 25+ messages in thread
From: Atom2 @ 2014-11-09 23:03 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 831 bytes --]

On 28.10.14 at 17:04, Ian Campbell wrote:
> On Tue, 2014-10-28 at 16:39 +0100, Atom2 wrote:
>>> Please can you run the command under gdb and grab a back trace.
>>>

I have now re-compiled a few more pieces with debugging support, namely
gcc-4.8.3 and glibc, and again ran the command
	xl create pfsense -c
under gdb. The new (full) backtrace output is attached to this mail and
might provide you with some more clues.

BTW the same problem also seems to exist for xen-4.4.1/gcc-4.8.3 and was
found independently of my report - please see further details and a
discussion at http://forums.gentoo.org/viewtopic-t-1003746.html and the
related bug report at https://bugs.gentoo.org/show_bug.cgi?id=528690.

Ian - if you (or anybody else) could add any more insight into this, it 
would be very much appreciated.

Thanks again Atom2

[-- Attachment #2: backtrace-xen-4.3.3-r1 --]
[-- Type: text/plain, Size: 11397 bytes --]

vm-host auto [512] # gdb --args xl create pfsense -c
GNU gdb (Gentoo 7.7.1 p1) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from xl...Reading symbols from /usr/lib64/debug//usr/sbin/xl.debug...done.
done.
(gdb) run
Starting program: /usr/sbin/xl create pfsense -c
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Parsing config from pfsense
xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->00000000001c10c4
  Modules:       0000000000000000->0000000000000000
  TOTAL:         0000000000000000->000000001f800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x00000000000000fb
  1GB PAGES: 0x0000000000000000
[New Thread 0x7ffff7ff5700 (LWP 2489)]
[New Thread 0x7ffff7fe6700 (LWP 2601)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7fe6700 (LWP 2601)]
0x00007ffff5892624 in execute_stack_op (op_ptr=0x7ffff7329b83 "w\240\001\006\020\b\002w(\020\t\002w0\020\n\002w8\020\v\003w\300",
    op_end=0x7ffff7329b87 "\020\b\002w(\020\t\002w0\020\n\002w8\020\v\003w\300", context=context@entry=0x7ffff7fe5190,
    initial=initial@entry=0) at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2.c:516
516     /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2.c: No such file or directory.
(gdb) bt full
#0  0x00007ffff5892624 in execute_stack_op (
    op_ptr=0x7ffff7329b83 "w\240\001\006\020\b\002w(\020\t\002w0\020\n\002w8\020\v\003w\300",
    op_end=0x7ffff7329b87 "\020\b\002w(\020\t\002w0\020\n\002w8\020\v\003w\300", context=context@entry=0x7ffff7fe5190,
    initial=initial@entry=0) at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2.c:516
        stack = {0 <repeats 44 times>, 140737354027168, 140737312803731, 140737354027184, 140737354027488, 140737340660732,
          140737340663016, 140737354027312, 140737312808747, 140737354027328, 140733193388035, 140737340663560, 352, 10, 167, 220,
          0, 0, 0, 0, 140737354129736}
        stack_elt = <optimized out>
#1  0x00007ffff589308c in uw_update_context_1 (context=context@entry=0x7ffff7fe55a0, fs=fs@entry=0x7ffff7fe52f0)
    at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2.c:1424
        exp = <optimized out>
        len = <optimized out>
        orig_context = {reg = {0x7ffff7fe5698, 0x7ffff7fe56a0, 0x0, 0x7ffff7fe56a8, 0x0, 0x0, 0x7ffff7fe56f0, 0x7ffff7fe5180, 0x0,
            0x0, 0x0, 0x0, 0x7ffff7fe56b0, 0x7ffff7fe56b8, 0x7ffff7fe56c0, 0x7ffff7fe56c8, 0x7ffff7fe56f8, 0x0},
          cfa = 0x7ffff7fe5700, ra = 0x7ffff7322e00 <__restore_rt>, lsda = 0x0, bases = {tbase = 0x0, dbase = 0x0,
            func = 0x7ffff7322dff}, flags = 4611686018427387904, version = 0, args_size = 0, by_value = '\000' <repeats 17 times>}
        cfa = <optimized out>
        i = <optimized out>
        tmp_sp = {ptr = 140737354028800, word = 140737354028800}
#2  0x00007ffff5893405 in uw_update_context (context=context@entry=0x7ffff7fe55a0, fs=fs@entry=0x7ffff7fe52f0)
    at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2.c:1506
No locals.
#3  0x00007ffff5894086 in uw_advance_context (fs=0x7ffff7fe52f0, context=0x7ffff7fe55a0)
    at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2.c:1529
No locals.
#4  _Unwind_ForcedUnwind_Phase2 (exc=exc@entry=0x7ffff7fe6d70, context=context@entry=0x7ffff7fe55a0)
    at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind.inc:185
        fs = {regs = {reg = {{loc = {reg = 140737340677076, offset = 140737340677076,
                  exp = 0x7ffff7329bd4 "\003w\220\001\020\002\003w\230\001\020\a\003w\240\001\020\020\003w\250\001"},
                how = REG_SAVED_EXP}, {loc = {reg = 140737340677070, offset = 140737340677070,
                  exp = 0x7ffff7329bce "\003w\210\001\020"}, how = REG_SAVED_EXP}, {loc = {reg = 140737340677082,
                  offset = 140737340677082, exp = 0x7ffff7329bda "\003w\230\001\020\a\003w\240\001\020\020\003w\250\001"},
                how = REG_SAVED_EXP}, {loc = {reg = 140737340677064, offset = 140737340677064,
                  exp = 0x7ffff7329bc8 "\003w\200\001\020\001\003w\210\001\020"}, how = REG_SAVED_EXP}, {loc = {
                  reg = 140737340677052, offset = 140737340677052, exp = 0x7ffff7329bbc "\003", <incomplete sequence \360>},
                how = REG_SAVED_EXP}, {loc = {reg = 140737340677046, offset = 140737340677046,
                  exp = 0x7ffff7329bb6 "\003", <incomplete sequence \350>}, how = REG_SAVED_EXP}, {loc = {reg = 140737340677058,
                  offset = 140737340677058, exp = 0x7ffff7329bc2 "\003", <incomplete sequence \370>}, how = REG_SAVED_EXP}, {
                loc = {reg = 140737340677088, offset = 140737340677088,
                  exp = 0x7ffff7329be0 "\003w\240\001\020\020\003w\250\001"}, how = REG_SAVED_EXP}, {loc = {reg = 140737340677001,
                  offset = 140737340677001, exp = 0x7ffff7329b89 "\002w(\020\t\002w0\020\n\002w8\020\v\003w\300"},
                how = REG_SAVED_EXP}, {loc = {reg = 140737340677006, offset = 140737340677006,
                  exp = 0x7ffff7329b8e "\002w0\020\n\002w8\020\v\003w\300"}, how = REG_SAVED_EXP}, {loc = {reg = 140737340677011,
                  offset = 140737340677011, exp = 0x7ffff7329b93 "\002w8\020\v\003w\300"}, how = REG_SAVED_EXP}, {loc = {
                  reg = 140737340677016, offset = 140737340677016, exp = 0x7ffff7329b98 "\003w\300"}, how = REG_SAVED_EXP}, {
                loc = {reg = 140737340677022, offset = 140737340677022, exp = 0x7ffff7329b9e "\003", <incomplete sequence \310>},
                how = REG_SAVED_EXP}, {loc = {reg = 140737340677028, offset = 140737340677028,
                  exp = 0x7ffff7329ba4 "\003", <incomplete sequence \320>}, how = REG_SAVED_EXP}, {loc = {reg = 140737340677034,
                  offset = 140737340677034, exp = 0x7ffff7329baa "\003", <incomplete sequence \330>}, how = REG_SAVED_EXP}, {
                loc = {reg = 140737340677040, offset = 140737340677040, exp = 0x7ffff7329bb0 "\003", <incomplete sequence \340>},
                how = REG_SAVED_EXP}, {loc = {reg = 140737340677094, offset = 140737340677094,
                  exp = 0x7ffff7329be6 "\003w\250\001"}, how = REG_SAVED_EXP}, {loc = {reg = 0, offset = 0, exp = 0x0},
                how = REG_UNSAVED}}, prev = 0x0, cfa_offset = 0, cfa_reg = 0,
            cfa_exp = 0x7ffff7329b82 "\004w\240\001\006\020\b\002w(\020\t\002w0\020\n\002w8\020\v\003w\300", cfa_how = CFA_EXP},
          pc = 0x7ffff7322dff, personality = 0x0, data_align = -8, code_align = 1, retaddr_column = 16, fde_encoding = 27 '\033',
          lsda_encoding = 255 '\377', saw_z = 1 '\001', signal_frame = 1 '\001', eh_ptr = 0x0}
        action = 10
        stop = 0x7ffff73215e0 <unwind_stop>
        stop_argument = 0x7ffff7fe5d30
        code = <optimized out>
        stop_code = <optimized out>
#5  0x00007ffff589440c in _Unwind_ForcedUnwind (exc=0x7ffff7fe6d70, stop=stop@entry=0x7ffff73215e0 <unwind_stop>,
    stop_argument=0x7ffff7fe5d30) at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind.inc:207
        this_context = {reg = {0x7ffff7fe5698, 0x7ffff7fe56a0, 0x0, 0x7ffff7fe56a8, 0x0, 0x0, 0x7ffff7fe56d0, 0x0, 0x0, 0x0, 0x0,
            0x0, 0x7ffff7fe56b0, 0x7ffff7fe56b8, 0x7ffff7fe56c0, 0x7ffff7fe56c8, 0x7ffff7fe56d8, 0x0}, cfa = 0x7ffff7fe56e0,
          ra = 0x7ffff7321773 <__GI___pthread_unwind+83>, lsda = 0x0, bases = {tbase = 0x0, dbase = 0x0,
            func = 0x7ffff58943a0 <_Unwind_ForcedUnwind>}, flags = 4611686018427387904, version = 0, args_size = 0,
          by_value = '\000' <repeats 17 times>}
        cur_context = {reg = {0x7ffff7fe5698, 0x7ffff7fe56a0, 0x0, 0x7ffff7fe56a8, 0x0, 0x0, 0x7ffff7fe56f0, 0x0, 0x0, 0x0, 0x0,
            0x0, 0x7ffff7fe56b0, 0x7ffff7fe56b8, 0x7ffff7fe56c0, 0x7ffff7fe56c8, 0x7ffff7fe56f8, 0x0}, cfa = 0x7ffff7fe5700,
          ra = 0x7ffff7322e00 <__restore_rt>, lsda = 0x0, bases = {tbase = 0x0, dbase = 0x0, func = 0x7ffff7322dff},
          flags = 4611686018427387904, version = 0, args_size = 0, by_value = '\000' <repeats 17 times>}
        code = <optimized out>
#6  0x00007ffff7321773 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:129
        ibuf = <optimized out>
        self = <optimized out>
#7  0x00007ffff7318b89 in __do_cancel () at ../nptl/pthreadP.h:280
No locals.
#8  sigcancel_handler (sig=<optimized out>, si=<optimized out>, ctx=<optimized out>) at nptl-init.c:214
        si = <optimized out>
        ctx = <optimized out>
        pid = <optimized out>
        oldval = <optimized out>
#9  <signal handler called>
No locals.
#10 0x00007ffff7321e8d in read () at ../sysdeps/unix/syscall-template.S:81
No locals.
#11 0x00007ffff6b247c3 in read (__nbytes=16, __buf=0x7fffe80008d0, __fd=14) at /usr/include/bits/unistd.h:44
No locals.
#12 read_all (fd=14, data=data@entry=0x7fffe80008d0, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374
        done = <optimized out>
#13 0x00007ffff6b24904 in read_message (h=h@entry=0x555555784280, nonblocking=nonblocking@entry=0) at xs.c:1139
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {93824994525824, 1282643245906007851, 1, 140737488341120,
                93824994524064, 140737354032896, 1282643245857773355, 1282645892206696235}, __mask_was_saved = 0}}, __pad = {
            0x7ffff7fe5ee0, 0x0, 0x0, 0x0}}
        __cancel_arg = 0x7fffe80008c0
        __not_first_call = <optimized out>
        msg = 0x7fffe80008c0
        body = 0x0
        saved_errno = 0
        ret = -1
#14 0x00007ffff6b25296 in read_thread (arg=0x555555784280) at xs.c:1211
        h = 0x555555784280
        fd = <optimized out>
#15 0x00007ffff731a36d in start_thread (arg=0x7ffff7fe6700) at pthread_create.c:309
        __res = <optimized out>
        pd = 0x7ffff7fe6700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737354032896, 1282643245910202155, 1, 140737488341120, 93824994524064,
                140737354032896, 1282643245897619243, 1282642588654638891}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0,
              0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        robust = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#16 0x00007ffff7052e0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.
(gdb)
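
The locals in a `bt full` dump like the one above make the call chain hard to see. Below is a minimal sketch in Python that keeps only the frame numbers and function names. The helper name `frame_names` and the abbreviated `SAMPLE` text are made up for illustration and are not part of gdb or Xen.

```python
import re

# First line of a normal frame, with or without a leading address:
#   "#6  0x00007ffff7321773 in __GI___pthread_unwind (buf=...) at unwind.c:129"
#   "#8  sigcancel_handler (sig=...) at nptl-init.c:214"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")
# gdb's marker for the frame that was set up to deliver a signal.
SIGNAL_RE = re.compile(r"^#(\d+)\s+(<signal handler called>)")


def frame_names(backtrace):
    """Return (frame number, function name) pairs from gdb backtrace text."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line) or SIGNAL_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


# A few frame lines copied from the backtrace above (locals left out).
SAMPLE = """\
#0  0x00007ffff5892624 in execute_stack_op (
#4  _Unwind_ForcedUnwind_Phase2 (exc=exc@entry=0x7ffff7fe6d70, context=context@entry=0x7ffff7fe55a0)
#6  0x00007ffff7321773 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:129
#8  sigcancel_handler (sig=<optimized out>, si=<optimized out>, ctx=<optimized out>) at nptl-init.c:214
#9  <signal handler called>
#10 0x00007ffff7321e8d in read () at ../sysdeps/unix/syscall-template.S:81
#14 0x00007ffff6b25296 in read_thread (arg=0x555555784280) at xs.c:1211
"""

for number, name in frame_names(SAMPLE):
    print(number, name)
```

Read from the bottom up, the condensed list shows the xenstore reader thread (`read_thread`) blocked in `read()`, a cancellation signal arriving, and the forced unwind then faulting inside libgcc's `execute_stack_op`.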



* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-06 15:11                           ` Atom2
@ 2014-11-10 11:16                             ` Ian Campbell
  2014-11-10 11:44                               ` Atom2
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-11-10 11:16 UTC (permalink / raw)
  To: Atom2; +Cc: xen-devel

On Thu, 2014-11-06 at 16:11 +0100, Atom2 wrote:

> It's probably also worth mentioning that gcc is (and also was with the
> older gcc-4.7.3) the hardened gcc version of gentoo which forces
> position-independent executables (PIE), stack smashing protection (SSP)
> and compile time buffer checks (see
> http://wiki.gentoo.org/wiki/Hardened_Gentoo). The rest of hardened (PaX,
> grsecurity, SELinux) is not (and never was) in use (so far). I don't know
> whether any of this might have contributed to the problems I am
> currently facing.

Is it at all possible to recompile at least the Xen toolstack bits with
these extra gcc features disabled? Either by using the old compiler or
somehow (CFLAGS?) disabling those features of the new one.

I'm afraid it's looking more and more like a toolchain issue. I'm no
expert on this side of things, but it looks to me like you are hitting an
issue with some sort of buffer overflow check gone wrong? I think you'll
need a gcc hardening person for this one.

Ian.


* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-10 11:16                             ` Ian Campbell
@ 2014-11-10 11:44                               ` Atom2
  2014-11-10 12:09                                 ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Atom2 @ 2014-11-10 11:44 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

Ian,
Thanks again for your reply.

Am 10.11.14 um 12:16 schrieb Ian Campbell:
> On Thu, 2014-11-06 at 16:11 +0100, Atom2 wrote:
>
> Is it at all possible to recompile at least the Xen toolstack bits with
> these extra gcc features disabled? Either by using the old compiler or
> somehow (CFLAGS?) disabling those features of the new one.
The old compiler (after I brought it in again) for reasons unknown to me
still seemed to use the version of libgcc_s.so.1 from the newer compiler
(which was part of the segfault issue - see my latest post from Sunday
with debugging enabled for gcc and glibc and a full backtrace). But
downgrading a compiler is something that everybody warns against anyway,
so I then reverted to gcc-4.8.3.
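
One way to see which libgcc_s.so.1 a running process has actually mapped is to read `/proc/<pid>/maps`. Below is a minimal sketch in Python, assuming Linux. The helper name `mapped_paths` is made up for illustration, and the addresses, inode numbers and library paths in `SAMPLE_MAPS` are invented sample data, not taken from the affected machine.

```python
def mapped_paths(maps_text, needle="libgcc_s.so"):
    """Return the distinct file paths in /proc/<pid>/maps text that contain needle."""
    paths = []
    for line in maps_text.splitlines():
        # Columns: address range, permissions, offset, device, inode, pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            path = fields[5].strip()
            if path not in paths:
                paths.append(path)
    return paths


# Invented sample in the /proc/<pid>/maps layout (not real data from the thread).
SAMPLE_MAPS = """\
7ffff5883000-7ffff5898000 r-xp 00000000 08:03 1234 /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
7ffff5898000-7ffff5a97000 ---p 00015000 08:03 1234 /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1
7ffff7308000-7ffff7320000 r-xp 00000000 08:03 5678 /lib64/libpthread-2.19.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
"""

print(mapped_paths(SAMPLE_MAPS))
```

For a live process the text would come from something like `open("/proc/%d/maps" % pid).read()`, where `pid` is the process under gdb.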

Re disabling the hardened features for the compiler: I have also tested
that over the weekend for xen-* stuff with the 4.8.3 compiler (I
selected the vanilla variant of gcc for the compile process of the
xen bits) and that did not change anything - it was still segfaulting.
But it's worth pointing out that in that test the rest of the system
(including kernel, glibc and the rest of world) was still using the
hardened toolchain.
>
> I'm afraid it's looking more and more like a toolchain issue. I'm no
> expert on this side of things, but it looks to me like you are hitting an
> issue with some sort of buffer overflow check gone wrong? I think you'll
> need a gcc hardening person for this one.
The issue is currently with the guys at gentoo (for links please again
see my latest post to the list from Sunday, which also seems to confirm
that the issue is not confined to 4.3.3 but also affects 4.4.1).

Thanks Atom2


* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-10 11:44                               ` Atom2
@ 2014-11-10 12:09                                 ` Ian Campbell
  2014-12-01  3:34                                   ` Dennis Lan (dlan)
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-11-10 12:09 UTC (permalink / raw)
  To: Atom2; +Cc: xen-devel

On Mon, 2014-11-10 at 12:44 +0100, Atom2 wrote:

> > I'm afraid it's looking more and more like a toolchain issue. I'm no
> > expert on this side of things, but it looks to me like you are hitting an
> > issue with some sort of buffer overflow check gone wrong? I think you'll
> > need a gcc hardening person for this one.
> The issue is currently with the guys at gentoo (for links please again
> see my latest post to the list from Sunday, which also seems to confirm
> that the issue is not confined to 4.3.3 but also affects 4.4.1).

OK, I'll wait and see what the gentoo folks have to say before looking
any closer then, thanks.

Ian.


* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-11-10 12:09                                 ` Ian Campbell
@ 2014-12-01  3:34                                   ` Dennis Lan (dlan)
  2014-12-01  9:38                                     ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Dennis Lan (dlan) @ 2014-12-01  3:34 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Atom2, xen-devel

On Mon, Nov 10, 2014 at 8:09 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Mon, 2014-11-10 at 12:44 +0100, Atom2 wrote:
>
>> > I'm afraid it's looking more and more like a toolchain issue. I'm no
>> > expert on this side of things, but it looks to me like you are hitting an
>> > issue with some sort of buffer overflow check gone wrong? I think you'll
>> > need a gcc hardening person for this one.
>> The issue is currently with the guys at gentoo (for links please again
>> see my latest post to the list from Sunday, which also seems to confirm
>> that the issue is not confined to 4.3.3 but also affects 4.4.1).
>
> OK, I'll wait and see what the gentoo folks have to say before looking
> any closer then, thanks.
>
Hi Ian,
 what we have found so far is that Gentoo's hardened toolchain turns
-fstack-check on by default, and compiling gcc itself with this flag
makes xl segfault (the crash is actually in libgcc_s.so).
 We have a patch that forces gcc to build libgcc (only this part) with
-fstack-check=no, which makes the segfault go away.
 More info can be found at https://bugs.gentoo.org/show_bug.cgi?id=528690
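
To make the flag interplay concrete: GCC accepts `-fstack-check`, `-fstack-check=no|generic|specific` and `-fno-stack-check`, and the bare form is documented as equivalent to `specific`. Below is a minimal sketch in Python that works out which mode is left in effect for a given flag string, assuming the usual rule that the last conflicting option wins. The function name `stack_check_mode` and its `default` parameter are made up for illustration. The hardened default is modelled only as a parameter and is not read from any real gcc spec file.

```python
def stack_check_mode(flags, default="no"):
    """Return the -fstack-check mode in effect after reading flags left to right.

    `default` stands in for whatever the compiler profile enables when the
    command line says nothing (assumption: "no" for a vanilla gcc).
    """
    mode = default
    for flag in flags.split():
        if flag == "-fstack-check":
            mode = "specific"  # bare form is equivalent to =specific
        elif flag == "-fno-stack-check":
            mode = "no"
        elif flag.startswith("-fstack-check="):
            mode = flag.split("=", 1)[1]
    return mode


# A profile that enables the check by default, overridden for one component.
print(stack_check_mode("-O2 -pipe", default="specific"))
print(stack_check_mode("-O2 -pipe -fstack-check=no", default="specific"))
```

The second call mirrors the idea of the patch described above: the default stays on for everything else, and an explicit `-fstack-check=no` switches it off for one part of the build.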

> Ian.


* Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
  2014-12-01  3:34                                   ` Dennis Lan (dlan)
@ 2014-12-01  9:38                                     ` Ian Campbell
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2014-12-01  9:38 UTC (permalink / raw)
  To: Dennis Lan (dlan); +Cc: Atom2, xen-devel

On Mon, 2014-12-01 at 11:34 +0800, Dennis Lan (dlan) wrote:
> On Mon, Nov 10, 2014 at 8:09 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Mon, 2014-11-10 at 12:44 +0100, Atom2 wrote:
> >
> >> > I'm afraid it's looking more and more like a toolchain issue. I'm no
> >> > expert on this side of things, but it looks to me like you are hitting an
> >> > issue with some sort of buffer overflow check gone wrong? I think you'll
> >> > need a gcc hardening person for this one.
> >> The issue is currently with the guys at gentoo (for links please again
> >> see my latest post to the list from Sunday, which also seems to confirm
> >> that the issue is not confined to 4.3.3 but also affects 4.4.1).
> >
> > OK, I'll wait and see what the gentoo folks have to say before looking
> > any closer then, thanks.
> >
> Hi Ian,
>  what we have found so far is that Gentoo's hardened toolchain turns
> -fstack-check on by default, and compiling gcc itself with this flag
> makes xl segfault (the crash is actually in libgcc_s.so).
>  We have a patch that forces gcc to build libgcc (only this part) with
> -fstack-check=no, which makes the segfault go away.
>  More info can be found at https://bugs.gentoo.org/show_bug.cgi?id=528690

Excellent, thanks for letting us know.

Just to be sure: This isn't (so far as anyone knows) the result of any
coding/build-system problem in Xen, right?

Ian.


Thread overview: 25+ messages
2014-10-27 21:25 segfault in xl create for HVM with PCI passthrough Atom2
2014-10-28 10:59 ` Ian Campbell
2014-10-28 15:39   ` Atom2
2014-10-28 16:04     ` Ian Campbell
2014-10-29  0:26       ` Atom2
2014-10-30 23:05         ` Atom2
2014-11-04 15:13           ` [BUG] XEN 4.3.3 - " Atom2
2014-11-04 15:44             ` Ian Campbell
2014-11-04 16:14               ` Atom2
2014-11-04 16:31                 ` Ian Campbell
2014-11-04 16:48                   ` Atom2
2014-11-05  9:33                     ` Ian Campbell
2014-11-04 17:30                   ` Atom2
2014-11-05  9:45                     ` Ian Campbell
2014-11-05 12:01                       ` Atom2
2014-11-05 12:39                         ` Ian Campbell
2014-11-05 12:45                           ` Andrew Cooper
2014-11-05 12:47                             ` Ian Campbell
2014-11-06 15:11                           ` Atom2
2014-11-10 11:16                             ` Ian Campbell
2014-11-10 11:44                               ` Atom2
2014-11-10 12:09                                 ` Ian Campbell
2014-12-01  3:34                                   ` Dennis Lan (dlan)
2014-12-01  9:38                                     ` Ian Campbell
2014-11-09 23:03       ` Atom2
