* pci passthrough xhci host controller
@ 2010-09-15 21:09 Sander Eikelenboom
  2010-09-20 20:33 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 9+ messages in thread
From: Sander Eikelenboom @ 2010-09-15 21:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel@lists.xensource.com

Hi Konrad,

I have changed my setup a bit, using my old workstation as a xen test platform at the moment.

I'm now running:
- Xen-unstable                       xen_changeset : Fri Sep 10 19:06:33 2010 +0100 22132:3985fea87987
- Dom0: pvops stable-2.6.32.x        last commit b297cdac0373625d3cd0e6f2b393570dcf2edba6
- DomU: Own merge of:
                  -linus 2.6.36(-rc4) tree last commit 9c03f1622af051004416dd3e24d8a0fa31e34178
                  -your pci-front 0.6 tree

- Only one domU is running (a copy of the one I used before on the other machine)
- Only one PCIe xHCI host controller is passed through (02:00.0)
- domU is booted with only iommu=soft

What happens:
     - domU boots fine, pci device is present, lsusb shows the card, but the grab util can't find the grabber on /dev/video0
     - The app keeps on trying ..
     - What I do see is a continuing stream of suspected kmemleaks in the domU

Any more debug info/output can of course be generated ...
--
Sander

unreferenced object 0xffff88002d7004c0 (size 32):
  comm "swapper", pid 1, jiffies 4294667951 (age 931.016s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff8105459a>] kzalloc_node+0xf/0x11
    [<ffffffff81054ef8>] get_one_free_irq_cfg+0x1a/0x46
    [<ffffffff81054f3e>] arch_init_chip_data+0x1a/0x3a
    [<ffffffff815f285a>] irq_to_desc_alloc_node+0x168/0x199
    [<ffffffff81298972>] xen_allocate_pirq+0x7e/0x110
    [<ffffffff814bac4c>] xen_setup_msi_irqs+0xec/0x17e
    [<ffffffff8124aa95>] pci_enable_msix+0x3b1/0x3c2
    [<ffffffff813fd538>] xhci_run+0x108/0x520
    [<ffffffff813e9aa1>] usb_add_hcd+0x34f/0x62e
    [<ffffffff813f60a6>] usb_hcd_pci_probe+0x23d/0x35b
    [<ffffffff812405e7>] local_pci_probe+0x48/0x91
    [<ffffffff81240995>] pci_device_probe+0x5f/0x89
    [<ffffffff81313998>] driver_probe_device+0xb2/0x16d
    [<ffffffff81313aaf>] __driver_attach+0x5c/0x7f
security:~# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88002fc025c0 (size 64):
  comm "swapper", pid 1, jiffies 4294667307 (age 934.525s)
  hex dump (first 32 bytes):
    f8 0c 00 00 00 00 00 00 ff 0c 00 00 00 00 00 00  ................
    a5 38 a8 81 ff ff ff ff 00 00 00 80 00 00 00 00  .8..............
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff81079ccb>] kzalloc+0xf/0x11
    [<ffffffff81079d20>] __request_region+0x53/0x18b
    [<ffffffff81db1b0d>] pci_direct_probe+0x3c/0x276
    [<ffffffff81db18df>] pci_arch_init+0xe/0x69
    [<ffffffff810020aa>] do_one_initcall+0x7c/0x15c
    [<ffffffff81d7e741>] kernel_init+0x158/0x1e2
    [<ffffffff81039b24>] kernel_thread_helper+0x4/0x10
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff88002fc02600 (size 64):
  comm "swapper", pid 1, jiffies 4294667307 (age 934.525s)
  hex dump (first 32 bytes):
    f8 0c 00 00 00 00 00 00 fb 0c 00 00 00 00 00 00  ................
    af 38 a8 81 ff ff ff ff 00 00 00 80 00 00 00 00  .8..............
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff81079ccb>] kzalloc+0xf/0x11
    [<ffffffff81079d20>] __request_region+0x53/0x18b
    [<ffffffff81db1c20>] pci_direct_probe+0x14f/0x276
    [<ffffffff81db18df>] pci_arch_init+0xe/0x69
    [<ffffffff810020aa>] do_one_initcall+0x7c/0x15c
    [<ffffffff81d7e741>] kernel_init+0x158/0x1e2
    [<ffffffff81039b24>] kernel_thread_helper+0x4/0x10
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff88002fc02640 (size 64):
  comm "swapper", pid 1, jiffies 4294667307 (age 934.525s)
  hex dump (first 32 bytes):
    00 c0 00 00 00 00 00 00 ff cf 00 00 00 00 00 00  ................
    af 38 a8 81 ff ff ff ff 00 00 00 80 00 00 00 00  .8..............
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff81079ccb>] kzalloc+0xf/0x11
    [<ffffffff81079d20>] __request_region+0x53/0x18b
    [<ffffffff81db1c4c>] pci_direct_probe+0x17b/0x276
    [<ffffffff81db18df>] pci_arch_init+0xe/0x69
    [<ffffffff810020aa>] do_one_initcall+0x7c/0x15c
    [<ffffffff81d7e741>] kernel_init+0x158/0x1e2
    [<ffffffff81039b24>] kernel_thread_helper+0x4/0x10
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff88002d700460 (size 32):
  comm "swapper", pid 1, jiffies 4294667948 (age 933.884s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff8105459a>] kzalloc_node+0xf/0x11
    [<ffffffff81054ef8>] get_one_free_irq_cfg+0x1a/0x46
    [<ffffffff81054f3e>] arch_init_chip_data+0x1a/0x3a
    [<ffffffff815f285a>] irq_to_desc_alloc_node+0x168/0x199
    [<ffffffff81298972>] xen_allocate_pirq+0x7e/0x110
    [<ffffffff814bad29>] xen_pcifront_enable_irq+0x4b/0x7a
    [<ffffffff814bc1c4>] pcibios_enable_device+0x29/0x2d
    [<ffffffff81240166>] do_pci_enable_device+0x28/0x40
    [<ffffffff812401d3>] __pci_enable_device_flags+0x55/0x69
    [<ffffffff812401f5>] pci_enable_device+0xe/0x10
    [<ffffffff813f5eab>] usb_hcd_pci_probe+0x42/0x35b
    [<ffffffff812405e7>] local_pci_probe+0x48/0x91
    [<ffffffff81240995>] pci_device_probe+0x5f/0x89
    [<ffffffff81313998>] driver_probe_device+0xb2/0x16d
unreferenced object 0xffff88002d7002a0 (size 32):
  comm "swapper", pid 1, jiffies 4294667951 (age 933.918s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff8105459a>] kzalloc_node+0xf/0x11
    [<ffffffff81054ef8>] get_one_free_irq_cfg+0x1a/0x46
    [<ffffffff81054f3e>] arch_init_chip_data+0x1a/0x3a
    [<ffffffff815f285a>] irq_to_desc_alloc_node+0x168/0x199
    [<ffffffff81298972>] xen_allocate_pirq+0x7e/0x110
    [<ffffffff814bac4c>] xen_setup_msi_irqs+0xec/0x17e
    [<ffffffff8124aa95>] pci_enable_msix+0x3b1/0x3c2
    [<ffffffff813fd538>] xhci_run+0x108/0x520
    [<ffffffff813e9aa1>] usb_add_hcd+0x34f/0x62e
    [<ffffffff813f60a6>] usb_hcd_pci_probe+0x23d/0x35b
    [<ffffffff812405e7>] local_pci_probe+0x48/0x91
    [<ffffffff81240995>] pci_device_probe+0x5f/0x89
    [<ffffffff81313998>] driver_probe_device+0xb2/0x16d
    [<ffffffff81313aaf>] __driver_attach+0x5c/0x7f
unreferenced object 0xffff88002d7004c0 (size 32):
  comm "swapper", pid 1, jiffies 4294667951 (age 933.918s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff8105459a>] kzalloc_node+0xf/0x11
    [<ffffffff81054ef8>] get_one_free_irq_cfg+0x1a/0x46
    [<ffffffff81054f3e>] arch_init_chip_data+0x1a/0x3a
    [<ffffffff815f285a>] irq_to_desc_alloc_node+0x168/0x199
    [<ffffffff81298972>] xen_allocate_pirq+0x7e/0x110
    [<ffffffff814bac4c>] xen_setup_msi_irqs+0xec/0x17e
    [<ffffffff8124aa95>] pci_enable_msix+0x3b1/0x3c2
    [<ffffffff813fd538>] xhci_run+0x108/0x520
    [<ffffffff813e9aa1>] usb_add_hcd+0x34f/0x62e
    [<ffffffff813f60a6>] usb_hcd_pci_probe+0x23d/0x35b
    [<ffffffff812405e7>] local_pci_probe+0x48/0x91
    [<ffffffff81240995>] pci_device_probe+0x5f/0x89
    [<ffffffff81313998>] driver_probe_device+0xb2/0x16d
    [<ffffffff81313aaf>] __driver_attach+0x5c/0x7f
security:~# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88002fc025c0 (size 64):
  comm "swapper", pid 1, jiffies 4294667307 (age 935.641s)
  hex dump (first 32 bytes):
    f8 0c 00 00 00 00 00 00 ff 0c 00 00 00 00 00 00  ................
    a5 38 a8 81 ff ff ff ff 00 00 00 80 00 00 00 00  .8..............
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff81079ccb>] kzalloc+0xf/0x11
    [<ffffffff81079d20>] __request_region+0x53/0x18b
    [<ffffffff81db1b0d>] pci_direct_probe+0x3c/0x276
    [<ffffffff81db18df>] pci_arch_init+0xe/0x69
    [<ffffffff810020aa>] do_one_initcall+0x7c/0x15c
    [<ffffffff81d7e741>] kernel_init+0x158/0x1e2
    [<ffffffff81039b24>] kernel_thread_helper+0x4/0x10
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff88002fc02600 (size 64):
  comm "swapper", pid 1, jiffies 4294667307 (age 935.641s)
  hex dump (first 32 bytes):
    f8 0c 00 00 00 00 00 00 fb 0c 00 00 00 00 00 00  ................
    af 38 a8 81 ff ff ff ff 00 00 00 80 00 00 00 00  .8..............
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff81079ccb>] kzalloc+0xf/0x11
    [<ffffffff81079d20>] __request_region+0x53/0x18b
    [<ffffffff81db1c20>] pci_direct_probe+0x14f/0x276
    [<ffffffff81db18df>] pci_arch_init+0xe/0x69
    [<ffffffff810020aa>] do_one_initcall+0x7c/0x15c
    [<ffffffff81d7e741>] kernel_init+0x158/0x1e2
    [<ffffffff81039b24>] kernel_thread_helper+0x4/0x10
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff88002fc02640 (size 64):
  comm "swapper", pid 1, jiffies 4294667307 (age 935.641s)
  hex dump (first 32 bytes):
    00 c0 00 00 00 00 00 00 ff cf 00 00 00 00 00 00  ................
    af 38 a8 81 ff ff ff ff 00 00 00 80 00 00 00 00  .8..............
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff81079ccb>] kzalloc+0xf/0x11
    [<ffffffff81079d20>] __request_region+0x53/0x18b
    [<ffffffff81db1c4c>] pci_direct_probe+0x17b/0x276
    [<ffffffff81db18df>] pci_arch_init+0xe/0x69
    [<ffffffff810020aa>] do_one_initcall+0x7c/0x15c
    [<ffffffff81d7e741>] kernel_init+0x158/0x1e2
    [<ffffffff81039b24>] kernel_thread_helper+0x4/0x10
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff88002d700460 (size 32):
  comm "swapper", pid 1, jiffies 4294667948 (age 935.008s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff8105459a>] kzalloc_node+0xf/0x11
    [<ffffffff81054ef8>] get_one_free_irq_cfg+0x1a/0x46
    [<ffffffff81054f3e>] arch_init_chip_data+0x1a/0x3a
    [<ffffffff815f285a>] irq_to_desc_alloc_node+0x168/0x199
    [<ffffffff81298972>] xen_allocate_pirq+0x7e/0x110
    [<ffffffff814bad29>] xen_pcifront_enable_irq+0x4b/0x7a
    [<ffffffff814bc1c4>] pcibios_enable_device+0x29/0x2d
    [<ffffffff81240166>] do_pci_enable_device+0x28/0x40
    [<ffffffff812401d3>] __pci_enable_device_flags+0x55/0x69
    [<ffffffff812401f5>] pci_enable_device+0xe/0x10
    [<ffffffff813f5eab>] usb_hcd_pci_probe+0x42/0x35b
    [<ffffffff812405e7>] local_pci_probe+0x48/0x91
    [<ffffffff81240995>] pci_device_probe+0x5f/0x89
    [<ffffffff81313998>] driver_probe_device+0xb2/0x16d
unreferenced object 0xffff88002d7002a0 (size 32):
  comm "swapper", pid 1, jiffies 4294667951 (age 935.035s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff8105459a>] kzalloc_node+0xf/0x11
    [<ffffffff81054ef8>] get_one_free_irq_cfg+0x1a/0x46
    [<ffffffff81054f3e>] arch_init_chip_data+0x1a/0x3a
    [<ffffffff815f285a>] irq_to_desc_alloc_node+0x168/0x199
    [<ffffffff81298972>] xen_allocate_pirq+0x7e/0x110
    [<ffffffff814bac4c>] xen_setup_msi_irqs+0xec/0x17e
    [<ffffffff8124aa95>] pci_enable_msix+0x3b1/0x3c2
    [<ffffffff813fd538>] xhci_run+0x108/0x520
    [<ffffffff813e9aa1>] usb_add_hcd+0x34f/0x62e
    [<ffffffff813f60a6>] usb_hcd_pci_probe+0x23d/0x35b
    [<ffffffff812405e7>] local_pci_probe+0x48/0x91
    [<ffffffff81240995>] pci_device_probe+0x5f/0x89
    [<ffffffff81313998>] driver_probe_device+0xb2/0x16d
    [<ffffffff81313aaf>] __driver_attach+0x5c/0x7f
unreferenced object 0xffff88002d7004c0 (size 32):
  comm "swapper", pid 1, jiffies 4294667951 (age 935.035s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815f3030>] kmemleak_alloc+0x21/0x3e
    [<ffffffff81117e5c>] __kmalloc+0x1c1/0x1eb
    [<ffffffff8105459a>] kzalloc_node+0xf/0x11
    [<ffffffff81054ef8>] get_one_free_irq_cfg+0x1a/0x46
    [<ffffffff81054f3e>] arch_init_chip_data+0x1a/0x3a
    [<ffffffff815f285a>] irq_to_desc_alloc_node+0x168/0x199
    [<ffffffff81298972>] xen_allocate_pirq+0x7e/0x110
    [<ffffffff814bac4c>] xen_setup_msi_irqs+0xec/0x17e
    [<ffffffff8124aa95>] pci_enable_msix+0x3b1/0x3c2
    [<ffffffff813fd538>] xhci_run+0x108/0x520
    [<ffffffff813e9aa1>] usb_add_hcd+0x34f/0x62e
    [<ffffffff813f60a6>] usb_hcd_pci_probe+0x23d/0x35b
    [<ffffffff812405e7>] local_pci_probe+0x48/0x91
    [<ffffffff81240995>] pci_device_probe+0x5f/0x89
    [<ffffffff81313998>] driver_probe_device+0xb2/0x16d
    [<ffffffff81313aaf>] __driver_attach+0x5c/0x7f
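
Because the same objects show up again in every `cat /sys/kernel/debug/kmemleak`, it can help to collapse the dumps by object address and allocation site before judging how much is really leaking. The following is a minimal sketch of that idea, assuming the report format shown above; the `summarize` helper and the `ALLOCATOR_FRAMES` set are illustrative names, not part of any kernel tooling.

```python
import re
from collections import Counter

# Frames that belong to the allocator itself rather than to the caller
# that requested the memory (assumption: this list fits the traces above).
ALLOCATOR_FRAMES = {"kmemleak_alloc", "__kmalloc", "kzalloc", "kzalloc_node"}

HEADER = re.compile(r"unreferenced object (0x[0-9a-f]+) \(size (\d+)\):")
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x")


def summarize(report):
    """Count distinct leaked objects per (first non-allocator frame, size)."""
    seen = {}      # object address -> (site, size); repeated dumps collapse
    addr = None
    size = None
    for line in report.splitlines():
        header = HEADER.search(line)
        if header:
            addr, size = header.group(1), int(header.group(2))
            continue
        frame = FRAME.search(line)
        if frame and addr is not None and frame.group(1) not in ALLOCATOR_FRAMES:
            seen.setdefault(addr, (frame.group(1), size))
            addr = None  # ignore the remaining frames of this object
    return Counter(seen.values())
```

Feeding it the saved output of several `cat` runs and printing `summarize(text).most_common()` gives one line per allocation site, which makes it easier to see whether new objects keep appearing or the same few are reported repeatedly.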




lspci domU:

02:00.0 USB Controller [0c03]: NEC Corporation Device [1033:0194] (rev 03) (prog-if 30)
        Subsystem: Micro-Star International Co., Ltd. Device [1462:4257]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at fea00000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=0 offset=00001000
                PBA: BAR=0 offset=00001080
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 unlimited
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
        Capabilities: [150] #18
        Kernel driver in use: xhci_hcd


 lspci dom0:

 02:00.0 USB Controller [0c03]: NEC Corporation Device [1033:0194] (rev 03) (prog-if 30)
        Subsystem: Micro-Star International Co., Ltd. Device [1462:4257]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at fea00000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] MSI-X: Enable+ Mask- TabSize=8
                Vector table: BAR=0 offset=00001000
                PBA: BAR=0 offset=00001080
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 unlimited
                        ClockPM+ Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting <?>
        Capabilities: [140] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
        Capabilities: [150] #18
        Kernel driver in use: pciback


* Re: pci passthrough xhci host controller
  2010-09-15 21:09 pci passthrough xhci host controller Sander Eikelenboom
@ 2010-09-20 20:33 ` Konrad Rzeszutek Wilk
  2010-09-21 20:03   ` Sander Eikelenboom
  0 siblings, 1 reply; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-09-20 20:33 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel@lists.xensource.com

On Wed, Sep 15, 2010 at 11:09:35PM +0200, Sander Eikelenboom wrote:
> Hi Konrad,
> 
> I have changed my setup a bit, using my old workstation as a xen test platform at the moment.
> 
> I'm now running:
> - Xen-unstable                       xen_changeset : Fri Sep 10 19:06:33 2010 +0100 22132:3985fea87987
> - Dom0: pvops stable-2.6.32.x        last commit b297cdac0373625d3cd0e6f2b393570dcf2edba6
> - DomU: Own merge of:
>                   -linus 2.6.36(-rc4) tree last commit 9c03f1622af051004416dd3e24d8a0fa31e34178
>                   -your pci-front 0.6 tree
> 
> - Only one domU is running (a copy of the one I used before on the other machine)
> - Only one PCIe xHCI host controller is passed through (02:00.0)
> - domU is booted with only iommu=soft
> 
> What happens:
>      - domU boots fine, pci device is present, lsusb shows the card, but the grab util can't find the grabber on /dev/video0
>      - The app keeps on trying ..

What was the error with /dev/video0?
The same as before, where the em_8xx died a horrible death?

>      - What I do see is a continuing stream of suspected kmemleaks in the domU

Hmm.. They aren't huge, they are actually all quite small (64 bytes and 32 bytes).
Is that all that happens when the DomU dies due to OOM going wild?


* Re: pci passthrough xhci host controller
  2010-09-20 20:33 ` Konrad Rzeszutek Wilk
@ 2010-09-21 20:03   ` Sander Eikelenboom
  2010-09-27 15:59     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 9+ messages in thread
From: Sander Eikelenboom @ 2010-09-21 20:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel@lists.xensource.com

Hi Konrad,

I indeed have the feeling the memleaks aren't huge, and adding the various kernel hacking debug options ended up doing more harm than good.
I have turned off the options I added and reinstated "swiotlb=force" in the domU config to see if it goes from a working to a freezing config, but I have the feeling it will not make a difference.

Then I have 4 differences left:

- A different dom0 kernel since the tests that resulted in continuous freezes of my server
- A different domU kernel since the tests that resulted in continuous freezes of my server
- A different workload (the server is running more VMs)
- Different physical hardware
        - the server is an AMD Phenom X6, the current config an Intel quad core
        - Both have their IOMMU disabled
        - Both are 64-bit capable CPUs with 64-bit Xen, dom0 and domU

        - But most notably perhaps, the Intel has only 2GB RAM, the server 8GB

Could the available physical RAM be an issue here?
I limit the RAM for dom0 with dom0_mem=

After this test succeeds on the Intel machine, I will retry the same Xen, dom0 kernel and domU kernel on the AMD config.
Is there anything in particular I can log/configure/debug to get more detail to see if the 8GB could be the problem?

--

Sander


Monday, September 20, 2010, 10:33:44 PM, you wrote:

> On Wed, Sep 15, 2010 at 11:09:35PM +0200, Sander Eikelenboom wrote:
>> Hi Konrad,
>> 
>> I have changed my setup a bit, using my old workstation as a xen test platform at the moment.
>> 
>> I'm now running:
>> - Xen-unstable                       xen_changeset : Fri Sep 10 19:06:33 2010 +0100 22132:3985fea87987
>> - Dom0: pvops stable-2.6.32.x        last commit b297cdac0373625d3cd0e6f2b393570dcf2edba6
>> - DomU: Own merge of:
>>                   -linus 2.6.36(-rc4) tree last commit 9c03f1622af051004416dd3e24d8a0fa31e34178
>>                   -your pci-front 0.6 tree
>> 
>> - Only one domU is running (a copy of the one I used before on the other machine)
>> - Only one PCIe xHCI host controller is passed through (02:00.0)
>> - domU is booted with only iommu=soft
>> 
>> What happens:
>>      - domU boots fine, pci device is present, lsusb shows the card, but the grab util can't find the grabber on /dev/video0
>>      - The app keeps on trying ..

> What was the error with /dev/video0?
> The same as before, where the em_8xx died a horrible death?

>>      - What I do see is a continuing stream of suspected kmemleaks in the domU

> Hmm.. They aren't huge, they are actually all quite small (64 bytes and 32 bytes).
> Is that all that happens when the DomU dies due to OOM going wild?



-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


* Re: Re: pci passthrough xhci host controller
  2010-09-21 20:03   ` Sander Eikelenboom
@ 2010-09-27 15:59     ` Konrad Rzeszutek Wilk
  2010-09-27 20:35       ` Sander Eikelenboom
  2010-09-30 19:24       ` Sander Eikelenboom
  0 siblings, 2 replies; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-09-27 15:59 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel@lists.xensource.com

On Tue, Sep 21, 2010 at 10:03:10PM +0200, Sander Eikelenboom wrote:
> Hi Konrad,
> 
> I indeed have the feeling the memleaks aren't huge, and adding the various kernel hacking debug options ended up doing more harm than good.
> I have turned off the options I added and reinstated "swiotlb=force" in the domU config to see if it goes from a working to a freezing config, but I have the feeling it will not make a difference.
> 
> Then I have 4 differences left:
> 
> - A different dom0 kernel since the tests that resulted in continuous freezes of my server
> - A different domU kernel since the tests that resulted in continuous freezes of my server
> - A different workload (the server is running more VMs)
> - Different physical hardware
>         - the server is an AMD Phenom X6, the current config an Intel quad core
>         - Both have their IOMMU disabled
>         - Both are 64-bit capable CPUs with 64-bit Xen, dom0 and domU
> 
>         - But most notably perhaps, the Intel has only 2GB RAM, the server 8GB
> 
> Could the available physical RAM be an issue here?
> I limit the RAM for dom0 with dom0_mem=

OK, but that would not limit where the guests get their memory from. I think
you might need this in conjunction with maxmem, say: maxmem=4GB dom0_mem=max:512MB

This way your 8GB machine has 4GB of memory available for both dom0 and the guest.

> 
> After this test succeeds on the Intel machine, I will retry the same Xen, dom0 kernel and domU kernel on the AMD config.
> Is there anything in particular I can log/configure/debug to get more detail to see if the 8GB could be the problem?

I think we have concluded that the device in question (USB 3.0 PCIe host controller) can do
64-bit DMA. In which case the SWIOTLB is only used as an address translation system
(pfn -> mfn, and vice-versa). If it was 32-bit it would also be utilized for bouncing
the DMA buffers - there are sometimes cases where the driver does not sync after the bounce
(perfect examples are the existing radeon/nouveau drivers), ending up with a corrupted/hung
device. But those show up early in development, and this is the new USB controller that
can do 64-bit instead of the dreaded 32-bit limit that all other USB controllers are stuck
with.
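
The difference between "translation only" and "bouncing" can be illustrated with a small toy model. This is only a sketch, not the kernel's SWIOTLB code: `needs_bounce` and the two mask constants are names made up for the illustration, and it ignores everything except whether the buffer's machine address range fits under the device's DMA mask.

```python
# Toy model: a buffer has to be bounced when the device cannot address
# the machine memory it lives in. Names here are illustrative only.
DMA_BIT_MASK_32 = (1 << 32) - 1
DMA_BIT_MASK_64 = (1 << 64) - 1


def needs_bounce(machine_addr, length, dma_mask):
    """Return True if the last byte of the buffer lies above the DMA mask."""
    return machine_addr + length - 1 > dma_mask


# A 4 KiB buffer whose machine address is at 5 GiB:
buf = 5 << 30
print(needs_bounce(buf, 4096, DMA_BIT_MASK_32))  # 32-bit device
print(needs_bounce(buf, 4096, DMA_BIT_MASK_64))  # 64-bit device
```

Under this model a 64-bit capable device never takes the bounce path, so only the pfn/mfn translation remains, which is the case described above.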

The memory difference might be a red herring. It could be the workload - more VMs
and a latency issue (say we are waiting for an IRQ and it comes just a bit too late)?
I think the idea of narrowing down the amount of memory on the AMD machine could help.

What is the exact model of your USB capture device and the USB PCI device?


* Re: Re: pci passthrough xhci host controller
  2010-09-27 15:59     ` Konrad Rzeszutek Wilk
@ 2010-09-27 20:35       ` Sander Eikelenboom
  2010-09-30 19:24       ` Sander Eikelenboom
  1 sibling, 0 replies; 9+ messages in thread
From: Sander Eikelenboom @ 2010-09-27 20:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel@lists.xensource.com

Hi Konrad,

Since it all seemed to work on the Intel machine, I returned to the AMD ..
And with the same hypervisor, dom0 and domU versions, I did experience a freeze already (within 2 hours after a fresh boot), with the same boot options.
Currently I'm testing the suggestions you made below, I hope I can give some news in about 2 days ...

Thanks again !

Monday, September 27, 2010, 5:59:52 PM, you wrote:

> On Tue, Sep 21, 2010 at 10:03:10PM +0200, Sander Eikelenboom wrote:
>> Hi Konrad,
>> 
>> I indeed have the feeling the memleaks aren't huge, and adding the various kernel hacking debug options ended up doing more harm than good.
>> I have turned off the options I added and reinstated "swiotlb=force" in the domU config to see if it goes from a working to a freezing config, but I have the feeling it will not make a difference.
>> 
>> Then I have 4 differences left:
>> 
>> - A different dom0 kernel since the tests that resulted in continuous freezes of my server
>> - A different domU kernel since the tests that resulted in continuous freezes of my server
>> - A different workload (the server is running more VMs)
>> - Different physical hardware
>>         - the server is an AMD Phenom X6, the current config an Intel quad core
>>         - Both have their IOMMU disabled
>>         - Both are 64-bit capable CPUs with 64-bit Xen, dom0 and domU
>> 
>>         - But most notably perhaps, the Intel has only 2GB RAM, the server 8GB
>> 
>> Could the available physical RAM be an issue here?
>> I limit the RAM for dom0 with dom0_mem=

> OK, but that would not limit where the guests get their memory from. I think
> you might need this in conjunction with maxmem, say: maxmem=4GB dom0_mem=max:512MB

> This way your 8GB machine has 4GB of memory available for both dom0 and the guest.

I have used mem=4G and dom0_mem=768M; this does limit the available RAM to less than 4G and makes dom0 768M.
I also used "noirqbalance" for Xen and followed Pasi's suggestion (libata.noacpi=1). Both dom0 and domU were booted with "iommu=soft" only, with no swiotlb=force specified.


>> 
>> After this test succeeds on the Intel machine, I will retry the same Xen, dom0 kernel and domU kernel on the AMD config.
>> Is there anything in particular I can log/configure/debug to get more detail to see if the 8GB could be the problem?

> I think we have concluded that the device in question (USB 3.0 PCIe host controller) can do
> 64-bit DMA. In which case the SWIOTLB is only used as an address translation system
> (pfn -> mfn, and vice-versa). If it was 32-bit it would also be utilized for bouncing
> the DMA buffers - there are sometimes cases where the driver does not sync after the bounce
> (perfect examples are the existing radeon/nouveau drivers), ending up with a corrupted/hung
> device. But those show up early in development, and this is the new USB controller that
> can do 64-bit instead of the dreaded 32-bit limit that all other USB controllers are stuck
> with.

> The memory difference might be a red herring. It could be the workload - more VMs
> and a latency issue (say we are waiting for an IRQ and it comes just a bit too late)?
> I think the idea of narrowing down the amount of memory on the AMD machine could help.

> What is the exact model of your USB capture device and the USB PCI device?

Is there a way to detect if it's doing 32-bit or 64-bit DMA?

Although latency could be an issue if the xhci driver/hardware were more sensitive to it, or if it entered paths in the driver that haven't had much testing, the latency issues shouldn't be much different from USB2.
The capture device is the same, and uses the same driver and bandwidth in either case ...
That said, it could be a corner case in the driver that, in combination with more than 4G of RAM, does something wrong (and perhaps then only in combination with Xen).


When I look at /proc/buddyinfo in the dom0, I only see the figures on the DMA32 line changing (allocating and freeing)

Node 0, zone      DMA      7     13      6      5      7      1      2      3      3      1      1
Node 0, zone    DMA32    354    823    585    149     60     19     13      0      0      1      0
Node 0, zone   Normal     15      3      9      4      4      3      3      1      1      1      0

In the domU, I don't have the "Normal" line, and I only see changes on the DMA32 line
Node 0, zone      DMA      6      0      0      1      1      1      1      1      0      1      0
Node 0, zone    DMA32    552    151    119     33      1      0      0      0      0      0      0
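
A minimal sketch of how two /proc/buddyinfo snapshots could be compared to confirm which zones are changing, instead of eyeballing the columns. It assumes the line layout shown above ("Node N, zone NAME" followed by one free-block count per order); the function names are hypothetical.

```python
def parse_buddyinfo(text):
    """Map (node, zone) -> list of free-block counts, one per order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node 0, zone DMA32 552 151 119 ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = parts[1].rstrip(",")
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def changed_zones(before, after):
    """Zones whose free-block counts differ between two snapshots."""
    return sorted(key for key in after if before.get(key) != after[key])


def free_pages(counts):
    """Total free pages in a zone: a block of order k holds 2**k pages."""
    return sum(n << order for order, n in enumerate(counts))
```

Reading /proc/buddyinfo twice a few seconds apart while grabbing video and passing both results to changed_zones() would list the zones in which allocations are happening.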


And it is using MSI. /proc/interrupts on the domU shows the following (I don't see the normal IRQ the device has (33) listed here?):

 44:          0  xen-pirq-pcifront  ohci_hcd:usb2
 45:      20810  xen-pirq-pcifront  ohci_hcd:usb3
 46:          2  xen-pirq-pcifront  ehci_hcd:usb1
 86:          0  xen-pirq-pcifront-msi-x  xhci_hcd
 87:   72858256  xen-pirq-pcifront-msi-x  xhci_hcd
244:      12674   xen-dyn-event     eth0
245:     154352   xen-dyn-event     blkif
246:       7518   xen-dyn-event     blkif
247:         31   xen-dyn-event     blkif
248:       2189   xen-dyn-event     hvc_console
249:         41   xen-dyn-event     pcifront
250:        593   xen-dyn-event     xenbus
251:          0  xen-percpu-ipi       callfuncsingle0
252:          0  xen-percpu-virq      debug0
253:          0  xen-percpu-ipi       callfunc0
254:          0  xen-percpu-ipi       resched0
255:    4849041  xen-percpu-virq      timer0
NMI:          0   Non-maskable interrupts
LOC:          0   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
PND:          0   Performance pending work
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
MCE:          0   Machine check exceptions
MCP:          0   Machine check polls
ERR:          0
MIS:          0
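
To put a number on how busy the xhci_hcd vector is, a sketch along these lines could turn two /proc/interrupts snapshots into interrupts per second. It assumes the layout shown above (an IRQ label, a colon, one counter per CPU, then the chip and handler names); the function names are hypothetical.

```python
def parse_interrupts(text):
    """Map IRQ label -> (count summed over all CPUs, description)."""
    table = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # header line listing CPU names has no colon
        tokens = rest.split()
        counts = []
        # The leading all-digit tokens are the per-CPU counters.
        while tokens and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        table[label.strip()] = (sum(counts), " ".join(tokens))
    return table


def rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    return {
        irq: (after[irq][0] - before[irq][0]) / seconds
        for irq in after
        if irq in before
    }
```

Taking one snapshot, sleeping for a known interval, taking a second one and looking up the xhci_hcd entry in rates() gives a figure that can be compared between the USB2 and xhci runs.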

And /proc/interrupts on dom0:

            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
   1:          2          0          0          0          0          0  xen-pirq-ioapic-edge  i8042
   8:          0          0          0          0          0          0  xen-pirq-ioapic-edge  rtc0
   9:          0          0          0          0          0          0  xen-pirq-ioapic-edge  acpi
  12:          4          0          0          0          0          0  xen-pirq-ioapic-edge  i8042
  17:         12          0          0          0          0          0  xen-pirq-ioapic-level  ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
  18:          4          0          0          0          0          0  xen-pirq-ioapic-level  ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
  25:         18          0          0          0          0          0  xen-pirq-ioapic-level  HDA Intel
  33:          0          0          0          0          0          0  xen-pirq-ioapic-level  pciback[0000:07:00.0]
  44:          0          0          0          0          0          0  xen-pirq-ioapic-level  pciback[0000:09:01.0]
  45:      21068          0          0          0          0          0  xen-pirq-ioapic-level  pciback[0000:09:01.1]
  46:          2          0          0          0          0          0  xen-pirq-ioapic-level  pciback[0000:09:01.2]
1700:        215          0          0          0          0          0   xen-dyn-event     vif8.0
1701:        974          0          0          0          0          0   xen-dyn-event     blkif-backend
1702:         19          0          0          0          0          0   xen-dyn-event     blkif-backend
1703:     214484          0          0          0          0          0   xen-dyn-event     evtchn:xenconsoled
1704:        434          0          0          0          0          0   xen-dyn-event     evtchn:xenstored
1705:        215          0          0          0          0          0   xen-dyn-event     vif7.0
1706:       1058          0          0          0          0          0   xen-dyn-event     blkif-backend
1707:         19          0          0          0          0          0   xen-dyn-event     blkif-backend
1708:       1188          0          0          0          0          0   xen-dyn-event     evtchn:xenconsoled
1709:        416          0          0          0          0          0   xen-dyn-event     evtchn:xenstored
1712:       4816          0          0          0          0          0   xen-dyn-event     vif6.0
1713:       3203          0          0          0          0          0   xen-dyn-event     blkif-backend
1714:       5055          0          0          0          0          0   xen-dyn-event     blkif-backend
1715:         25          0          0          0          0          0   xen-dyn-event     blkif-backend
1716:       1365          0          0          0          0          0   xen-dyn-event     pciback
1717:     434518          0          0          0          0          0   xen-dyn-event     evtchn:xenconsoled
1718:        558          0          0          0          0          0   xen-dyn-event     evtchn:xenstored
1719:     299487          0          0          0          0          0   xen-dyn-event     vif5.0
1720:       4529          0          0          0          0          0   xen-dyn-event     blkif-backend
1721:         25          0          0          0          0          0   xen-dyn-event     blkif-backend
1722:        342          0          0          0          0          0   xen-dyn-event     evtchn:xenconsoled
1723:        321          0          0          0          0          0   xen-dyn-event     evtchn:xenstored
1724:       1448          0          0          0          0          0   xen-dyn-event     blkif-backend
1725:         23          0          0          0          0          0   xen-dyn-event     blkif-backend
1726:       1112          0          0          0          0          0   xen-dyn-event     vif4.0
1727:      14889          0          0          0          0          0   xen-dyn-event     evtchn:xenconsoled
1728:        391          0          0          0          0          0   xen-dyn-event     evtchn:xenstored
1729:        473          0          0          0          0          0   xen-dyn-event     vif3.0
1730:       2759          0          0          0          0          0   xen-dyn-event     blkif-backend
1731:         19          0          0          0          0          0   xen-dyn-event     blkif-backend
1732:       1051          0          0          0          0          0   xen-dyn-event     evtchn:xenconsoled
1733:        401          0          0          0          0          0   xen-dyn-event     evtchn:xenstored
1734:         80          0          0          0          0          0   xen-dyn-event     vif2.0
1735:        958          0          0          0          0          0   xen-dyn-event     blkif-backend
1736:         19          0          0          0          0          0   xen-dyn-event     blkif-backend
1737:       1023          0          0          0          0          0   xen-dyn-event     evtchn:xenconsoled
1738:        509          0          0          0          0          0   xen-dyn-event     evtchn:xenstored
1739:     158347          0          0          0          0          0   xen-dyn-event     vif1.0
1740:      17860          0          0          0          0          0   xen-dyn-event     blkif-backend
1741:         19          0          0          0          0          0   xen-dyn-event     blkif-backend
1742:       1009          0          0          0          0          0   xen-dyn-event     evtchn:xenconsoled
1743:        365          0          0          0          0          0   xen-dyn-event     evtchn:xenstored
1744:          0          0          0          0          0          0   xen-dyn-event     evtchn:xenstored
1745:      12740          0          0          0          0          0   xen-dyn-event     evtchn:xenstored
1746:     143704          0          0          0          0          0  xen-pirq-msi       eth1
1747:     129018          0          0          0          0          0  xen-pirq-msi       eth0
1748:          0          0          0          0          0          0  xen-pirq-msi       ahci
1749:     158608          0          0          0          0          0  xen-pirq-msi       ahci
1760:          0          0          0          0          0          0  xen-percpu-virq      pcpu
1761:      10069          0          0          0          0          0   xen-dyn-event     xenbus
1762:          0          0          0          0          0      10534  xen-percpu-ipi       callfuncsingle5
1763:          0          0          0          0          0          0  xen-percpu-virq      debug5
1764:          0          0          0          0          0        107  xen-percpu-ipi       callfunc5
1765:          0          0          0          0          0     179072  xen-percpu-ipi       resched5
1766:          0          0          0          0          0    3819179  xen-percpu-virq      timer5
1767:          0          0          0          0      20758          0  xen-percpu-ipi       callfuncsingle4
1768:          0          0          0          0          0          0  xen-percpu-virq      debug4
1769:          0          0          0          0        165          0  xen-percpu-ipi       callfunc4
1770:          0          0          0          0     176246          0  xen-percpu-ipi       resched4
1771:          0          0          0          0   10775783          0  xen-percpu-virq      timer4
1772:          0          0          0       8431          0          0  xen-percpu-ipi       callfuncsingle3
1773:          0          0          0          0          0          0  xen-percpu-virq      debug3
1774:          0          0          0        120          0          0  xen-percpu-ipi       callfunc3
1775:          0          0          0     219617          0          0  xen-percpu-ipi       resched3
1776:          0          0          0    3821742          0          0  xen-percpu-virq      timer3
1777:          0          0      11293          0          0          0  xen-percpu-ipi       callfuncsingle2
1778:          0          0          0          0          0          0  xen-percpu-virq      debug2
1779:          0          0        207          0          0          0  xen-percpu-ipi       callfunc2
1780:          0          0     239804          0          0          0  xen-percpu-ipi       resched2
1781:          0          0    4937213          0          0          0  xen-percpu-virq      timer2
1782:          0      34348          0          0          0          0  xen-percpu-ipi       callfuncsingle1
1783:          0          0          0          0          0          0  xen-percpu-virq      debug1
1784:          0        176          0          0          0          0  xen-percpu-ipi       callfunc1
1785:          0     220234          0          0          0          0  xen-percpu-ipi       resched1
1786:          0   10874047          0          0          0          0  xen-percpu-virq      timer1
1787:       6367          0          0          0          0          0  xen-percpu-ipi       callfuncsingle0
1788:          0          0          0          0          0          0  xen-percpu-virq      debug0
1789:         38          0          0          0          0          0  xen-percpu-ipi       callfunc0
1790:     178784          0          0          0          0          0  xen-percpu-ipi       resched0
1791:   10963806          0          0          0          0          0  xen-percpu-virq      timer0
 NMI:          0          0          0          0          0          0   Non-maskable interrupts
 LOC:          0          0          0          0          0          0   Local timer interrupts
 SPU:          0          0          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0          0          0   Performance monitoring interrupts
 PND:          0          0          0          0          0          0   Performance pending work
 RES:     178784     220234     239804     219617     176246     179072   Rescheduling interrupts
 CAL:       6405      34524      11500       8551      20923      10641   Function call interrupts
 TLB:          0          0          0          0          0          0   TLB shootdowns
 TRM:          0          0          0          0          0          0   Thermal event interrupts
 MCE:          0          0          0          0          0          0   Machine check exceptions
 MCP:         37         37         37         37         37         37   Machine check polls
 ERR:          0
 MIS:          0







I have tried 2 different USB 3 controllers, both previously caused the freezes, both have a NEC chip.

The USB controller in the AMD system is an ASUS U3S6, from which I only pass through the USB3 controller and not the S-ATA controller.
The other controller is an MSI, which only does USB3.

lspci (domU):
07:00.0 USB Controller [0c03]: NEC Corporation Device [1033:0194] (rev 03) (prog-if 30)
        Subsystem: ASUSTeK Computer Inc. Device [1043:8413]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 33
        Region 0: Memory at fe500000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=0 offset=00001000
                PBA: BAR=0 offset=00001080
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 unlimited
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
        Capabilities: [150] #18
        Kernel driver in use: xhci_hcd


The capture device is a Kworld k2800, which has an em28xx chip, and it's a USB 2 device.


* Re: Re: pci passthrough xhci host controller
  2010-09-27 15:59     ` Konrad Rzeszutek Wilk
  2010-09-27 20:35       ` Sander Eikelenboom
@ 2010-09-30 19:24       ` Sander Eikelenboom
  2010-10-01 20:54         ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 9+ messages in thread
From: Sander Eikelenboom @ 2010-09-30 19:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com

Hello Konrad,

I have done some more tests, the results:

- boot xen with mem=4G, > 2 days uptime with passthrough and videograbbing
- boot xen without mem=4G, < 1 day freeze with passthrough and videograbbing
- on both no problems as long as you don't grab video (so the controller doesn't do much)
- on both no problems when grabbing video with usb2, so it's xhci specific

I haven't changed anything else, same number of VM's running etc. etc., videograbbing is working on both (until the freeze or until i ended the test)
I'm reading some messages about msi(-x) interrupt problems with xen on xen-devel, and suggestions to try noirqbalance with xen, so on both i use noirqbalance.

So it seems to be related to the amount of mem available.
I do see one difference on the domU, with mem=4G i see some occasional warnings in syslog:
Sep 28 17:55:02 security kernel: [81744.078288] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.092653] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint

I don't see these warnings in the syslog when mem=4G is not used, so my hunch is that it goes wrong there, while the xhci code tries to clean something up.
It could do something "strange" that seems to work on bare metal and on Xen with mem=4G, but freezes everything with mem > 4G before the warning can be written to the syslog / disk.

in the syslog of dom0 i do see some occasional memleaks going by, but one set could be related:
Sep 28 17:55:19 localhost kernel: [81962.053321] kmemleak: 22 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

I will add a script that cats the content of /sys/kernel/debug/kmemleak to syslog when kmemleak reports new suspected leaks.
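
A rough sketch of what such a script could look like. Rather than waiting for the kernel message, it simply polls /sys/kernel/debug/kmemleak and forwards records it has not logged before. It assumes each record starts with an "unreferenced object" header line, as in the reports posted earlier in this thread; the function names, the "kmemleak-dump" syslog tag and the 60 second interval are arbitrary choices, not anything from an existing tool.

```python
import syslog
import time

KMEMLEAK = "/sys/kernel/debug/kmemleak"


def split_reports(text):
    """Split kmemleak output into one string per 'unreferenced object' record."""
    reports, current = [], []
    for line in text.splitlines():
        if line.startswith("unreferenced object") and current:
            reports.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        reports.append("\n".join(current))
    return reports


def new_reports(text, seen):
    """Return records whose header line is not in `seen`, and remember them."""
    fresh = []
    for report in split_reports(text):
        header = report.splitlines()[0]
        if header not in seen:
            seen.add(header)
            fresh.append(report)
    return fresh


def main(interval=60):
    """Poll the kmemleak file forever and copy unseen records to syslog."""
    seen = set()
    syslog.openlog("kmemleak-dump")
    while True:
        with open(KMEMLEAK) as f:
            text = f.read()
        for report in new_reports(text, seen):
            for line in report.splitlines():
                syslog.syslog(syslog.LOG_WARNING, line)
        time.sleep(interval)
```

Calling main() as root (debugfs has to be mounted) starts the loop. Deduplicating on the header line means an address that is freed and later reported again would be skipped, which seems acceptable for a debugging aid.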

Any suggestions to try to debug this further ?

I boot with:

title           xen-4.1-unstable.gz / Debian GNU/Linux, 2.6.32.21-xen-stable-2.6.32.x-20100914
root            (hd0,0)
kernel          /xen-4.1-unstable.gz mem=4G dom0_mem=768M loglvl=all loglvl_guest=all com1=115200,8n1 sync_console console_to_ring console_timestamps console=com1,vga iommu=soft noirqbalance irqbalance=off
module          /vmlinuz-2.6.32.21-xen-stable-2.6.32.x-20100914 root=/dev/mapper/serveerstertje-root ro earlyprintk=xen max_loop=255 loop_max_part=63 libata.noacpi=1 iommu=soft xen-pciback.hide=(03:06.0)(07:00.0)(09:01.0)(09:01.1)(09:01.2) pci=resource_alignment=03:06.0;07:00.0;09:01.0;09:01.1;09:01.2;
module          /initrd.img-2.6.32.21-xen-stable-2.6.32.x-20100914


--
Sander



Monday, September 27, 2010, 5:59:52 PM, you wrote:

> On Tue, Sep 21, 2010 at 10:03:10PM +0200, Sander Eikelenboom wrote:
>> Hi Konrad,
>> 
>> I indeed have the feeling the memleaks aren't huge, and adding the various kernel hacking debug options ended up doing more harm than good.
>> I have turned off the options I added and reinstated "swiotlb=force" in the domU config to see if it goes from a working to a freezing config, but I have the feeling it will not make a difference.
>> 
>> Then I have 4 differences left:
>> 
>> - Other dom0 kernel since the tests resulting in continuous freezes of my server
>> - Other domU kernel since the tests resulting in continuous freezes of my server
>> - Other workload (server is running more VMs)
>> - Other physical hardware
>>         - server is AMD Phenom X6, current config Intel quad core
>>         - Both have their iommu disabled
>>         - Both are 64-bit capable CPUs with 64-bit Xen, dom0 and domU
>> 
>>         - But most notably perhaps, the Intel has only 2GB RAM, the server 8GB
>> 
>> Could the available physical RAM be an issue here ?
>> I limit the ram for dom0 with dom0_mem=

> OK, but that would not limit the memory of where the guest get their memory. I think
> you might need this in conjunction with maxmem, say: maxmem=4GB dom0_mem=max:512MB

> This way your 8GB machine has 4GB of memory available for both dom0 and the guest.

>> 
>> After this test succeeds on the Intel machine, I will retry the same Xen, dom0 kernel and domU kernel on the AMD config.
>> Is there anything i can especially log/configure/debug to get more detail to see if the 8GB could be the problem ?

> I think we have concluded that the device in question (USB 3.0 PCIe host controller) can do
> 64-bit DMA. In that case the SWIOTLB is only used as an address translation system
> (pfn -> mfn, and vice versa). If it were 32-bit, it would also be used for bouncing
> the DMA buffers. There are sometimes cases where the driver does not sync after the bounce
> (perfect examples are the existing radeon/nouveau drivers), ending up with corruption or a hung
> device. But those show up early in development, and this is the new USB controller that
> can do 64-bit instead of the dreaded 32-bit limit that all other USB controllers are stuck
> with.

> The memory difference might be a red-herring. It could be the workload - more VMs
> and a latency issue (say we are waiting for an IRQ and it comes just a bit too late)?
> I think the idea of limiting the amount of memory on the AMD machine could help.

> What is the exact model of your USB capture device and the USB PCI device?



-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


* Re: Re: pci passthrough xhci host controller
  2010-09-30 19:24       ` Sander Eikelenboom
@ 2010-10-01 20:54         ` Konrad Rzeszutek Wilk
  2010-10-01 23:33           ` Sander Eikelenboom
  0 siblings, 1 reply; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-10-01 20:54 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com

On Thu, Sep 30, 2010 at 09:24:48PM +0200, Sander Eikelenboom wrote:
> Hello Konrad,
> 
> I have done some more tests, the results:
> 
> - boot xen with mem=4G, > 2 days uptime with passthrough and videograbbing
> - boot xen without mem=4G, < 1 day freeze with passthrough and videograbbing
> - on both no problems as long as you don't grab video (so the controller doesn't do much)
> - on both no problems when grabbing video with usb2, so it's xhci specific
> 
> I haven't changed anything else, same number of VM's running etc. etc., videograbbing is working on both (until the freeze or until i ended the test)
> I'm reading some messages about msi(-x) interrupt problems with xen on xen-devel, and suggestions to try noirqbalance with xen, so on both i use noirqbalance.
> 
> So it seems to be related to the amount of mem available.
> I do see one difference on the domU, with mem=4G i see some occasional warnings in syslog:
> Sep 28 17:55:02 security kernel: [81744.078288] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
> Sep 28 17:55:02 security kernel: [81744.092653] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
> Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
> Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
> Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
> 
> I don't see these warnings in the syslog when mem=4G is not used, so my hunch is that it goes wrong there, while the xhci code tries to clean something up.
> It could do something "strange" that seems to work on bare metal and on Xen with mem=4G, but freezes everything with mem > 4G before the warning can be written to the syslog / disk.
> 
> in the syslog of dom0 i do see some occasional memleaks going by, but one set could be related:
> Sep 28 17:55:19 localhost kernel: [81962.053321] kmemleak: 22 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> 
> I will add a script that cats the content of /sys/kernel/debug/kmemleak to syslog when kmemleak reports new suspected leaks.
> 
> Any suggestions to try to debug this further ?

<shakes his head>
Do you have the name of the grabber + USB3 device? If it is not too much I might
as well get it and see what happens on my boxes.


* Re: Re: pci passthrough xhci host controller
  2010-10-01 20:54         ` Konrad Rzeszutek Wilk
@ 2010-10-01 23:33           ` Sander Eikelenboom
  2010-10-02 17:44             ` Sander Eikelenboom
  0 siblings, 1 reply; 9+ messages in thread
From: Sander Eikelenboom @ 2010-10-01 23:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel@lists.xensource.com

Hmmm, I can get it to freeze with or without mem=4G now.

Letting the domU grab video while dom0 compiles a kernel with make -j6 makes the machine freeze after a very short while ..
With all the debug options enabled the machine seems a bit slow anyhow for a six core, but it seems to choke on the interrupts generated by the xhci controller.

With the host controller now using 32-bit instead of 64-bit DMA, it shows some warnings before freezing, with or without mem=4G:

Oct  2 00:23:07 security kernel: [  524.020717] xhci_hcd 0000:07:00.0: Spurious interrupt.
Oct  2 00:23:10 security kernel: [  526.926654] xhci_hcd 0000:07:00.0: Spurious interrupt.
Oct  2 00:23:11 security kernel: [  527.714567] xhci_hcd 0000:07:00.0: Spurious interrupt.
Oct  2 00:23:42 security kernel: [  558.402659] xhci_hcd 0000:07:00.0: Spurious interrupt.
Oct  2 00:25:00 security kernel: [  636.278406] xhci_hcd 0000:07:00.0: Spurious interrupt.

When i do the kernel compile with the domU started, but not grabbing video, the kernel compile completes without a problem.
With the domU running cpuburn, it does complete without a problem.
I do have the feeling the videograbbing does cause a lot of interrupts .. (this is still booting xen with noirqbalance and dom0 and domU with pci=nomsi).

So the 4G is then probably a red herring ...

--
Sander







-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


* Re: Re: pci passthrough xhci host controller
  2010-10-01 23:33           ` Sander Eikelenboom
@ 2010-10-02 17:44             ` Sander Eikelenboom
  0 siblings, 0 replies; 9+ messages in thread
From: Sander Eikelenboom @ 2010-10-02 17:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel@lists.xensource.com

OK, the freeze on a kernel compile with "make -j6" was a cpu0 stall; it locks up so badly that I can't even use ctrl-a to get into the hypervisor.
Removing noirqbalance makes it possible to compile the kernel in dom0 while videograbbing in the domU.

I can start a fish shop with all my red herrings :(


Saturday, October 2, 2010, 1:33:36 AM, you wrote:

> Hmmm i can get it to freeze with or without the mem=4G now.

> Letting the domU grab video, and let dom0 compile a kernel with make -j6 lets the machine freeze after a very short while ..
> With all the debug things the machine seems a bit slow any how for a six core, but it seems to choke on the interrupts generated by the xhci controller.

> With the host controller now using 32bit instead of 64bit DMA it now shows with or without the mem=4G some warnings before freezing:

> Oct  2 00:23:07 security kernel: [  524.020717] xhci_hcd 0000:07:00.0: Spurious interrupt.
> Oct  2 00:23:10 security kernel: [  526.926654] xhci_hcd 0000:07:00.0: Spurious interrupt.
> Oct  2 00:23:11 security kernel: [  527.714567] xhci_hcd 0000:07:00.0: Spurious interrupt.
> Oct  2 00:23:42 security kernel: [  558.402659] xhci_hcd 0000:07:00.0: Spurious interrupt.
> Oct  2 00:25:00 security kernel: [  636.278406] xhci_hcd 0000:07:00.0: Spurious interrupt.

> When i do the kernel compile with the domU started, but not grabbing video, the kernel compile completes without a problem.
> With the domU running cpuburn, it does complete without a problem.
> I do have the feeling the videograbbing does cause a lot of interrupts .. (this is still booting xen with noirqbalance and dom0 and domU with pci=nomsi).

> So the 4G is then probably a red herring ...

> --
> Sander




> Friday, October 1, 2010, 10:54:17 PM, you wrote:

>> On Thu, Sep 30, 2010 at 09:24:48PM +0200, Sander Eikelenboom wrote:
>>> Hello Konrad,
>>> 
>>> I have done some more tests, the results:
>>> 
>>> - boot xen with mem=4G, > 2 days uptime with passthrough and videograbbing
>>> - boot xen without mem=4G, < 1 day freeze with passthrough and videograbbing
>>> - on both no problems as long as you don't grab video (so the controller doesn't do much)
>>> - on both no problems when grabbing video with usb2, so it's xhci specific
>>> 
>>> I haven't changed anything else, same number of VMs running etc., videograbbing is working on both (until the freeze or until I ended the test).
>>> I have been reading some messages on xen-devel about msi(-x) interrupt problems with xen, with suggestions to try noirqbalance, so I use noirqbalance on both.
>>> 
>>> So it seems to be related to the amount of mem available.
>>> I do see one difference on the domU, with mem=4G i see some occasional warnings in syslog:
>>> Sep 28 17:55:02 security kernel: [81744.078288] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
>>> Sep 28 17:55:02 security kernel: [81744.092653] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
>>> Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
>>> Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
>>> Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
>>> 
>>> I don't see these warnings in the syslog when mem=4G is not used, so my hunch is that it goes wrong there, while the xhci code tries to clean something up.
>>> It could do something "strange" that seems to work on bare metal and on xen with mem=4G, but freezes everything with mem > 4G and leaves no time to write the warning to the syslog / disk.
>>> 
>>> in the syslog of dom0 i do see some occasional memleaks going by, but one set could be related:
>>> Sep 28 17:55:19 localhost kernel: [81962.053321] kmemleak: 22 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
>>> 
>>> I will add a script that cats the content of /sys/kernel/debug/kmemleak to syslog when kmemleak reports new suspected leaks.
>>> 
>>> Any suggestions to try to debug this further ?
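
The script mentioned above (copying /sys/kernel/debug/kmemleak to syslog) was not posted in this thread. A minimal sketch of one way to do it follows. It polls the file rather than reacting to the kernel message, and it remembers records by their first "unreferenced object" line so each one is logged only once. The function names and the dedup-by-first-line choice are assumptions for illustration.

```python
import syslog

KMEMLEAK_PATH = "/sys/kernel/debug/kmemleak"


def split_reports(text):
    """Split kmemleak output into one string per 'unreferenced object' record."""
    reports, current = [], []
    for line in text.splitlines():
        if line.startswith("unreferenced object") and current:
            reports.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        reports.append("\n".join(current))
    return reports


def new_reports(text, seen):
    """Return records not logged before, remembering them by their first line."""
    fresh = []
    for report in split_reports(text):
        key = report.splitlines()[0]
        if key not in seen:
            seen.add(key)
            fresh.append(report)
    return fresh


def log_new_leaks(seen, path=KMEMLEAK_PATH):
    """Read the kmemleak file (needs root and debugfs mounted) and log new records."""
    with open(path) as f:
        text = f.read()
    for report in new_reports(text, seen):
        for line in report.splitlines():
            syslog.syslog(syslog.LOG_WARNING, "kmemleak: " + line)
```

It would be used by creating an empty set once and calling log_new_leaks(seen) from a loop with a sleep in between, for example once a minute. An address that is freed and later reported again would be skipped by this dedup, which seems acceptable for a debugging aid.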

>> <shakes his head>
>> Do you have the name of the grabber + USB3 device? If it is not too much I might
>> as well get it and see what happens on my boxes.

-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


end of thread, other threads:[~2010-10-02 17:44 UTC | newest]

Thread overview: 9+ messages
2010-09-15 21:09 pci passthrough xhci host controller Sander Eikelenboom
2010-09-20 20:33 ` Konrad Rzeszutek Wilk
2010-09-21 20:03   ` Sander Eikelenboom
2010-09-27 15:59     ` Konrad Rzeszutek Wilk
2010-09-27 20:35       ` Sander Eikelenboom
2010-09-30 19:24       ` Sander Eikelenboom
2010-10-01 20:54         ` Konrad Rzeszutek Wilk
2010-10-01 23:33           ` Sander Eikelenboom
2010-10-02 17:44             ` Sander Eikelenboom
