All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Sander Eikelenboom <linux@eikelenboom.it>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: xen-swiotlb
Date: Thu, 26 Aug 2010 17:43:21 -0400	[thread overview]
Message-ID: <20100826214321.GA17372@phenom.dumpdata.com> (raw)
In-Reply-To: <1077869382.20100816213227@eikelenboom.it>

> > Looking at the output, the physical addresses that DMA-ed are:
> 
> > 0x1f2962dc0
> > 0x1f24f2e68
> 
> > and they look to be called quite often. In fact, there looks to be a
> > loop that does something like this:
> 
> > again:
> >   p = kmalloc(..)
> 
> >   dma = pci_map_single(p)
> >   pci_dma_mapping_error(dma);
> >   /* get some data.. */
> >   /* parse the: (pipe 0x80000280): IN:  c0 00 00 00 0c 00 01 00 */
> >   pci_unmap_sg(dma);
> >   goto again;
> 
> > As the virtual address sent to pci_map_single looks to be sequentially
> > increasing.

I am not sure what virtual address I was thinking off. Looking again
at the dmesg output it looks actually correct. SWIOTLB is giving out
chunks, increasing its own index, and then unmaping those pages right
away.


This stack-trace in the middle of the log:
 
Aug 15 23:14:02 security kernel: [  103.646497] em28xx #0 em28xx_isoc_copy :Video frame 1, length=960, odd
Aug 15 23:14:02 security kernel: [  103.650177] xhci_hcd 0000:08:00.0: Miss service interval error, set skip flag
Aug 15 23:14:02 security kernel: [  103.651371] em28xx #0 em28xx_irq_callback :urb completition error -18.
Aug 15 23:14:02 security kernel: [  103.651406] em28xx #0 print_err_status :URB status -18 [Unknown].
Aug 15 23:14:02 security kernel: [  103.651432] em28xx #0 em28xx_isoc_copy :Video frame 2, length=2888, even
Aug 15 23:14:02 security kernel: [  103.651454] em28xx #0 get_next_buf :No active queue to serve
Aug 15 23:14:02 security kernel: [  103.651477] em28xx #0 print_err_status :URB packet 63, status -18 [Unknown].
Aug 15 23:14:02 security kernel: [  103.653143] motion: page allocation failure. order:1, mode:0x20
Aug 15 23:14:02 security kernel: [  103.653177] Pid: 1418, comm: motion Not tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14
Aug 15 23:14:02 security kernel: [  103.653205] Call Trace:
Aug 15 23:14:02 security kernel: [  103.653219]  <IRQ>  [<ffffffff810a7ebf>] __alloc_pages_nodemask+0x667/0x6c7
Aug 15 23:14:02 security kernel: [  103.653272]  [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:02 security kernel: [  103.653300]  [<ffffffff810ca32f>] cache_alloc_refill+0x3b3/0x851
Aug 15 23:14:02 security kernel: [  103.653326]  [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf
Aug 15 23:14:02 security kernel: [  103.653352]  [<ffffffff81007202>] ? check_events+0x12/0x20
Aug 15 23:14:02 security kernel: [  103.653375]  [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:02 security kernel: [  103.653401]  [<ffffffff810ce9bd>] ? create_object+0x2e/0x2a1
Aug 15 23:14:02 security kernel: [  103.653425]  [<ffffffff810c9e3f>] kmem_cache_alloc+0xf7/0x1c4
Aug 15 23:14:02 security kernel: [  103.653451]  [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:02 security kernel: [  103.653475]  [<ffffffff810ce9bd>] create_object+0x2e/0x2a1
Aug 15 23:14:02 security kernel: [  103.653499]  [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:02 security kernel: [  103.653523]  [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:02 security kernel: [  103.653548]  [<ffffffff810c8f68>] ? cache_alloc_debugcheck_after+0x1bf/0x239
Aug 15 23:14:02 security kernel: [  103.653575]  [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:02 security kernel: [  103.653602]  [<ffffffff8143e7d7>] kmemleak_alloc+0x21/0x3e
Aug 15 23:14:02 security kernel: [  103.653625]  [<ffffffff810c9edf>] kmem_cache_alloc+0x197/0x1c4
Aug 15 23:14:02 security kernel: [  103.653649]  [<ffffffff8135e085>] xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:02 security kernel: [  103.653675]  [<ffffffff81340450>] usb_hcd_submit_urb+0xb1f/0xbf5
Aug 15 23:14:02 security kernel: [  103.653701]  [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:02 security kernel: [  103.653726]  [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf
Aug 15 23:14:02 security kernel: [  103.653751]  [<ffffffff81007202>] ? check_events+0x12/0x20
Aug 15 23:14:02 security kernel: [  103.653777]  [<ffffffff812b6267>] ? vt_console_print+0x43/0x346
Aug 15 23:14:02 security kernel: [  103.653802]  [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:02 security kernel: [  103.653830]  [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:02 security kernel: [  103.653857]  [<ffffffff8106d5b0>] ? lock_release+0x1e0/0x1ed
Aug 15 23:14:03 security kernel: [  103.653882]  [<ffffffff81340d45>] usb_submit_urb+0x249/0x260
Aug 15 23:14:03 security kernel: [  103.653919]  [<ffffffffa0060615>] em28xx_irq_callback+0xc9/0xfd [em28xx]
Aug 15 23:14:03 security kernel: [  103.653946]  [<ffffffff8133f16a>] usb_hcd_giveback_urb+0x84/0xb8
Aug 15 23:14:03 security kernel: [  103.653972]  [<ffffffff8136480a>] xhci_handle_event+0xced/0xdf7
Aug 15 23:14:03 security kernel: [  103.653998]  [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7
Aug 15 23:14:03 security kernel: [  103.654024]  [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7
Aug 15 23:14:03 security kernel: [  103.654050]  [<ffffffff81364a31>] xhci_irq+0x11d/0x1af
Aug 15 23:14:03 security kernel: [  103.654074]  [<ffffffff81364ade>] xhci_msi_irq+0x1b/0x1d
Aug 15 23:14:03 security kernel: [  103.654103]  [<ffffffff8108e4aa>] handle_IRQ_event+0x20/0xc7
Aug 15 23:14:03 security kernel: [  103.654128]  [<ffffffff810902a4>] handle_level_irq+0x98/0xf3
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff812997a8>] __xen_evtchn_do_upcall+0xe1/0x183
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8129a423>] xen_evtchn_do_upcall+0x2a/0x3c
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8100a9ee>] xen_do_hypervisor_callback+0x1e/0x30
Aug 15 23:14:03 security kernel: [  103.654151]  <EOI>  [<ffffffff810012eb>] ? hypercall_page+0x2eb/0x1006
Aug 15 23:14:03 security kernel: [  103.654151] Mem-Info:
Aug 15 23:14:03 security kernel: [  103.654151] DMA per-cpu:
Aug 15 23:14:03 security kernel: [  103.654151] CPU    0: hi:    0, btch:   1 usd:   0
Aug 15 23:14:03 security kernel: [  103.654151] DMA32 per-cpu:
Aug 15 23:14:03 security kernel: [  103.654151] CPU    0: hi:  186, btch:  31 usd: 168
Aug 15 23:14:03 security kernel: [  103.654151] active_anon:4503 inactive_anon:4767 isolated_anon:0
Aug 15 23:14:03 security kernel: [  103.654151]  active_file:4071 inactive_file:9087 isolated_file:0
Aug 15 23:14:03 security kernel: [  103.654151]  unevictable:0 dirty:1081 writeback:0 unstable:0
Aug 15 23:14:03 security kernel: [  103.654151]  free:1584 slab_reclaimable:5991 slab_unreclaimable:153956
Aug 15 23:14:03 security kernel: [  103.654151]  mapped:3401 shmem:48 pagetables:1452 bounce:0
Aug 15 23:14:03 security kernel: [  103.654151] DMA free:3992kB min:60kB low:72kB high:88kB active_anon:84kB inactive_anon:436kB active_file:24kB inactive_file:148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15712kB mlocked:0kB dirty:20kB writeback:0kB mapped:16kB shmem:0kB slab_reclaimable:264kB slab_unreclaimable:8932kB kernel_stack:0kB pagetables:384kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Aug 15 23:14:03 security kernel: [  103.654151] lowmem_reserve[]: 0 994 994 994
Aug 15 23:14:03 security kernel: [  103.654151] DMA32 free:2344kB min:4000kB low:5000kB high:6000kB active_anon:17928kB inactive_anon:18632kB active_file:16260kB inactive_file:36200kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018080kB mlocked:0kB dirty:4304kB writeback:0kB mapped:13588kB shmem:192kB slab_reclaimable:23700kB slab_unreclaimable:606892kB kernel_stack:720kB pagetables:5424kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Aug 15 23:14:03 security kernel: [  103.654151] lowmem_reserve[]: 0 0 0 0
Aug 15 23:14:03 security kernel: [  103.654151] DMA: 88*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3992kB
Aug 15 23:14:03 security kernel: [  103.654151] DMA32: 406*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2344kB
Aug 15 23:14:03 security kernel: [  103.654151] 13236 total pagecache pages
Aug 15 23:14:03 security kernel: [  103.654151] 21 pages in swap cache
Aug 15 23:14:03 security kernel: [  103.654151] Swap cache stats: add 32, delete 11, find 27/27
Aug 15 23:14:03 security kernel: [  103.654151] Free swap  = 524200kB
Aug 15 23:14:03 security kernel: [  103.654151] Total swap = 524284kB
Aug 15 23:14:03 security kernel: [  103.654151] 262128 pages RAM
Aug 15 23:14:03 security kernel: [  103.654151] 73203 pages reserved
Aug 15 23:14:03 security kernel: [  103.654151] 21103 pages shared
Aug 15 23:14:03 security kernel: [  103.654151] 105944 pages non-shared
Aug 15 23:14:03 security kernel: [  103.654151] kmemleak: Cannot allocate a kmemleak_object structure
Aug 15 23:14:03 security kernel: [  103.654151] Pid: 1418, comm: motion Not tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14
Aug 15 23:14:03 security kernel: [  103.654151] Call Trace:
Aug 15 23:14:03 security kernel: [  103.654151]  <IRQ>  [<ffffffff810ce9d8>] create_object+0x49/0x2a1
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810c8f68>] ? cache_alloc_debugcheck_after+0x1bf/0x239
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8143e7d7>] kmemleak_alloc+0x21/0x3e
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810c9edf>] kmem_cache_alloc+0x197/0x1c4
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81340450>] usb_hcd_submit_urb+0xb1f/0xbf5
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81007202>] ? check_events+0x12/0x20
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff812b6267>] ? vt_console_print+0x43/0x346
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8106d5b0>] ? lock_release+0x1e0/0x1ed
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81340d45>] usb_submit_urb+0x249/0x260
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffffa0060615>] em28xx_irq_callback+0xc9/0xfd [em28xx]
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8133f16a>] usb_hcd_giveback_urb+0x84/0xb8
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8136480a>] xhci_handle_event+0xced/0xdf7
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81364a31>] xhci_irq+0x11d/0x1af
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81364ade>] xhci_msi_irq+0x1b/0x1d
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8108e4aa>] handle_IRQ_event+0x20/0xc7
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810902a4>] handle_level_irq+0x98/0xf3
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff812997a8>] __xen_evtchn_do_upcall+0xe1/0x183
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8129a423>] xen_evtchn_do_upcall+0x2a/0x3c
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8100a9ee>] xen_do_hypervisor_callback+0x1e/0x30
Aug 15 23:14:03 security kernel: [  103.654151]  <EOI>  [<ffffffff810012eb>] ? hypercall_page+0x2eb/0x1006
Aug 15 23:14:03 security kernel: [  103.654151] kmemleak: Kernel memory leak detector disabled
Aug 15 23:14:03 security kernel: [  103.654151] xhci_hcd 0000:08:00.0: Found td. Clear skip flag.
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB packet 0, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB packet 1, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB packet 2, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB packet 3, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB packet 4, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB packet 5, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB packet 6, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB packet 7, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.659299] xhci_hcd 0000:08:00.0: Miss service interval error, set skip flag
Aug 15 23:14:03 security kernel: [  103.659307] xhci_hcd 0000:08:00.0: Found td. Clear skip flag.
Aug 15 23:14:03 security kernel: [  103.661748] kmemleak: Automatic memory scanning thread ended
Aug 15 23:14:03 security kernel: [  103.755620] em28xx #0 print_err_status :URB packet 6, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.755630] em28xx #0 print_err_status :URB packet 7, status -18 [Unknown].


The first fault is kmemleaker not being able to allocate its structure,
and then it throws an error too.

Then the em28xx gets horribly confused that it could not send an URB
anymore and from there one things get worst.

There is something causing this memory problem,and I think we can narrow
it down if you output the kmemleaker output. You might have to
unload/reload the em28xx driver to get an idea of where the leak is.

> So what i have as domU kernel is:
> - from Linus tree 2.6.36-pre with latest commit 8357422d4bf33bc2c35884d4016c3fc9efbbc1d2
> - pulled your devel/xen-pcifront-0.5 tree, and fixed the merge conflicts due to the pv on hvm patches, in the same way Jeremy did for his 2.6.36 branch.
>     Conflicts:
>         drivers/xen/events.c
>         include/xen/events.h
> 
> - Added a patch for xhci isoc that fixes another bug. (attached 0001-xHCI-update-ring-dequeue-pointer-when-process-missed.patch)
> - Added a patch that shows extra debug info for xhci from the author of the xhci-isoc pathes (attached isoc_length5.patch)
> 
> - Changed some debug lines from dbg to warn level, (just enabling xhci-debug in kernel config floods the logs too fast, so i just enabled them in xhci-mem.c)
>   (Attached a patch with all my changes to xhci*, including the 2 patches above applied.)
> 
> - Added some printk's to swiotlb-xen.c to see which functions were used, and let some of them print the address as well, in the hope i could find some debug info there.(attached)

So the other thing I realized is that xhci can actually do 64bit
transfers. Try booting without the 'swiotlb=force' (but still have
iommu=soft).

> 
> Apart from fixing the xhci in the end, is there a way to prevent xen from freezing altogether without leaving a trace ?

I am still puzzled as why Xen hypervisor would freeze. It _should_ work
just fine.

> Even an Oops is much easier to debug than a freeze. Due to the nature of DMA that could perhaps be difficult, although there is an DMA API ...
> 
> Had a fruitful LinuxCon ?

Quite so!
> 
> --
> Sander

  reply	other threads:[~2010-08-26 21:43 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-15 21:28 xen-swiotlb Sander Eikelenboom
2010-08-16 14:43 ` xen-swiotlb Konrad Rzeszutek Wilk
2010-08-16 19:32   ` xen-swiotlb Sander Eikelenboom
2010-08-26 21:43     ` Konrad Rzeszutek Wilk [this message]
2010-08-27  6:46       ` xen-swiotlb Pasi Kärkkäinen
2010-08-27 14:19         ` Konrad Rzeszutek Wilk
2010-08-27 14:32           ` Pasi Kärkkäinen
2010-09-13  6:59             ` Sander Eikelenboom
2010-09-13  6:57       ` xen-swiotlb Sander Eikelenboom
2010-09-19 13:14       ` xen-swiotlb Sander Eikelenboom

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100826214321.GA17372@phenom.dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=linux@eikelenboom.it \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.