From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Mark Hurenkamp <mark.hurenkamp@xs4all.nl>
Cc: xen-devel@lists.xensource.com
Subject: Re: xennet: skb rides the rocket messages in domU dmesg
Date: Tue, 01 Jun 2010 09:42:16 -0700
Message-ID: <4C053868.4010400@goop.org>
In-Reply-To: <4C018A9A.7060806@xs4all.nl>

On 05/29/2010 02:43 PM, Mark Hurenkamp wrote:
>> That appears to mean that you're getting single packets which are larger
>> than 18 pages long (72k).  I'm not quite sure how that's possible, since
>> I thought the datagram limit is 64k..
>>
>> Are you using nfs over udp or tcp?  (I think tcp, from your stack
>> trace.)
>>
>> Does turning off tso/gso with ethtool make a difference?
>>    
> Ok, i tried this on the running system, and it did seem to improve
> things, but still i'd see some (other) messages.
> After a reboot, with the new xen/stable-2.6.32.13.x based kernel
> and switching tso and gso off with ethtool, these messages are
> now completely gone (have the system up for about a day now).

Hm.  I don't think disabling them should be necessary, but the only
downside in doing so is slightly higher per-packet processing cost.
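
For reference, a minimal sketch of the arithmetic behind the "18 pages (72k)" figure quoted above, together with the ethtool invocation being discussed. It assumes 4 KiB pages (which is what 72k / 18 implies) and uses `eth0` purely as a placeholder interface name; the helper names are made up for illustration.

```python
PAGE_SIZE = 4096            # assumption: 4 KiB pages, consistent with 18 pages == 72k
DATAGRAM_LIMIT = 64 * 1024  # the 64k datagram limit mentioned in the quoted text


def pages_to_bytes(pages, page_size=PAGE_SIZE):
    """Return the size in bytes of a packet spanning `pages` full pages."""
    return pages * page_size


def offload_off_argv(iface):
    """Build the ethtool command line that switches TSO and GSO off on `iface`."""
    return ["ethtool", "-K", iface, "tso", "off", "gso", "off"]


if __name__ == "__main__":
    size = pages_to_bytes(18)
    print("18 pages =", size, "bytes;", size - DATAGRAM_LIMIT, "bytes over 64k")
    # "eth0" is a placeholder; substitute the domU's actual interface name.
    print(" ".join(offload_off_argv("eth0")))
```

The command is only printed here, not executed; running it needs root, and the setting does not persist across reboots unless it is added to the network configuration.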

>
> I do notice something else though (might have been there before,
> but now it is the only message in domU dmesg), just after starting
> nfs during boot of the domU:
>
> BUG: unable to handle kernel paging request at 00000002dcf32198
> IP: [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
> PGD a777067 PUD 0
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/pci-0/pci0000:08/0000:08:02.0/local_cpus

What device is 0000:08:02.0?
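
One way to answer that from inside the guest, sketched under the assumption that the device shows up under the usual `/sys/bus/pci/devices` directory (`lspci -s 08:02.0` would report the same information in readable form). The function name and the `sysfs_root` parameter are illustrative only.

```python
import os


def read_pci_ids(address, sysfs_root="/sys/bus/pci/devices"):
    """Read the vendor, device and class IDs that sysfs exposes for a PCI address.

    `address` is the full domain:bus:slot.function string, e.g. "0000:08:02.0".
    """
    dev_dir = os.path.join(sysfs_root, address)
    ids = {}
    for attr in ("vendor", "device", "class"):
        with open(os.path.join(dev_dir, attr)) as f:
            ids[attr] = f.read().strip()
    return ids


if __name__ == "__main__":
    addr = "0000:08:02.0"
    if os.path.isdir(os.path.join("/sys/bus/pci/devices", addr)):
        print(addr, read_pci_ids(addr))
    else:
        print(addr, "is not present on this machine")
```

Note that this reads only the ID attributes and deliberately avoids `local_cpus`, since reading that attribute is what triggered the oops.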

> CPU 0
> Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss
> autofs4 ipv6 wm8775 tea5767 cx25840 tuner_simple sunrpc tuner_types
> tda9887 tda8290 tuner msp3400 saa7127 saa7115 ivtv i2c_algo_bit
> cx2341x v4l2_common videodev v4l1_compat xen_fbfront
> v4l2_compat_ioctl32 fb_sys_fops tveeprom sysimgblt joydev i2c_core
> sysfillrect xen_kbdfront syscopyarea xen_netfront raid10 raid456
> async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
> async_tx raid1 raid0 multipath linear
> Pid: 3468, comm: irqbalance Not tainted 2.6.32.13m7.1 #1
> RIP: e030:[<ffffffff811cf09a>]  [<ffffffff811cf09a>]
> bitmap_scnprintf+0x5c/0xb6
> RSP: e02b:ffff88001cbd9e18  EFLAGS: 00010246
> RAX: ffffffff81527f2b RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000ffe RDI: 0000000000000000
> RBP: ffff88001cbd9e48 R08: 0000000000000010 R09: 0000000000000001
> R10: 0000000000000357 R11: dead000000200200 R12: 0000000000000000
> R13: 0000000000000ffe R14: 00000002dcf32198 R15: ffff880002bbd000
> FS:  00007fc142b6d720(0000) GS:ffff8800046e0000(0000)
> knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000002dcf32198 CR3: 000000001ca58000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process irqbalance (pid: 3468, threadinfo ffff88001cbd8000, task
> ffff88001ded2920)
> Stack:
>  0000000000000200 ffff880002bbd000 ffff88001cbd9f58 ffff880002eeb858
> <0> ffff88001ce8ed10 ffffffff81616230 ffff88001cbd9e68 ffffffff811dd333
> <0> ffff880002eeb878 ffffffff81606368 ffff88001cbd9e98 ffffffff81273574
> Call Trace:
>  [<ffffffff811dd333>] local_cpus_show+0x44/0x57
>  [<ffffffff81273574>] dev_attr_show+0x22/0x49
>  [<ffffffff810a4e8e>] ? __get_free_pages+0x9/0x46
>  [<ffffffff8112fbc2>] sysfs_read_file+0xb4/0x139
>  [<ffffffff810da927>] vfs_read+0xa6/0x103
>  [<ffffffff810daa3a>] sys_read+0x45/0x69
>  [<ffffffff81011b02>] system_call_fastpath+0x16/0x1b
> Code: e0 48 c7 c0 2b 7f 52 81 41 83 ec 20 31 db eb 60 44 89 e2 44 89
> e1 48 63 fb 83 e1 3f c1 fa 06 41 b9 01 00 00 00 48 63 d2 44 89 ee <49>
> 8b 14 d6 29 de 48 d3 ea 49 8d 3c 3f 44 88 c1 41 83 ec 20 49
> RIP  [<ffffffff811cf09a>] bitmap_scnprintf+0x5c/0xb6
>  RSP <ffff88001cbd9e18>
> CR2: 00000002dcf32198
> ---[ end trace 5f520ed1e48e5394 ]---
>
>
> During boot of dom0 i see the following when it is starting my domU
> (seems to be more of a warning):
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.

Interesting.  That looks like a bug in the core kernel's mmu notifier
machinery that we're using, but the only side-effect is that it will
disable lockdep checking.

> Pid: 5861, comm: qemu-dm Not tainted 2.6.32.13m7.1 #1
> Call Trace:
>  [<ffffffff8106a625>] __lock_acquire+0x431/0x459
>  [<ffffffff810b029d>] ? vma_prio_tree_remove+0x27/0xda
>  [<ffffffff8106a6b1>] lock_acquire+0x64/0x81
>  [<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
>  [<ffffffff813cdb70>] _spin_lock_nest_lock+0x31/0x66
>  [<ffffffff810b939d>] ? mm_take_all_locks+0xe5/0x11c
>  [<ffffffff813ccc0e>] ? mutex_lock_nested+0x34/0x39
>  [<ffffffff810b939d>] mm_take_all_locks+0xe5/0x11c
>  [<ffffffff810cbcbc>] ? do_mmu_notifier_register+0x56/0x113
>  [<ffffffff810cbcc4>] do_mmu_notifier_register+0x5e/0x113
>  [<ffffffff810cbd94>] mmu_notifier_register+0xe/0x10
>  [<ffffffff8123acdb>] gntdev_open+0x8f/0xcc
>  [<ffffffff81257dc2>] misc_open+0x188/0x21e
>  [<ffffffff810dd1f6>] chrdev_open+0x164/0x185
>  [<ffffffff810dd092>] ? chrdev_open+0x0/0x185
>  [<ffffffff810d8bd5>] __dentry_open+0x149/0x27f
>  [<ffffffff810d8dd1>] nameidata_to_filp+0x3d/0x4e
>  [<ffffffff810e59ed>] do_filp_open+0x4ee/0x9e9
>  [<ffffffff8100e871>] ? xen_force_evtchn_callback+0xd/0xf
>  [<ffffffff8100eff2>] ? check_events+0x12/0x20
>  [<ffffffff811d0637>] ? _raw_spin_unlock+0x8f/0x98
>  [<ffffffff813cdb3a>] ? _spin_unlock+0x26/0x2b
>  [<ffffffff810eedf2>] ? alloc_fd+0x111/0x123
>  [<ffffffff810d89a3>] do_sys_open+0x5e/0x10a
>  [<ffffffff810d8a78>] sys_open+0x1b/0x1d
>  [<ffffffff81011b02>] system_call_fastpath+0x16/0x1b
>
>
> Probably not related, i see the following message in my dom0 from time
> to time, and if it appears at the 'wrong' moment, it causes my system
> to become completely unusable as soon as a process needs disk access.
>
> ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata4.00: BMDMA stat 0x64
> ata4.00: failed command: READ DMA
> ata4.00: cmd c8/00:08:99:13:5c/00:00:00:00:00/ef tag 0 dma 4096 in
>          res 51/40:00:a0:13:5c/00:00:00:00:00/ef Emask 0x9 (media error)
> ata4.00: status: { DRDY ERR }
> ata4.00: error: { UNC }
> ata4.00: configured for UDMA/133
> ata4.01: configured for UDMA/133
> ata4: EH complete
>
> Not sure if this is related though, it could be just a bad disk (it
> seems to be always related to the same disk), i'm going to replace the
> disk, and see if that makes a difference.

That looks like a real disk error - it's getting uncorrectable read errors.
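
As an illustration of how that reading falls out of the log, here is a small sketch that decodes the first two fields of the `res` line (status register, then error register). The bit tables cover only the flags libata names in these messages, and the function names are made up for this example.

```python
import re

# Bit masks for the ATA status and error registers (subset that libata reports).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode(value, table):
    """Return the names of all flags from `table` that are set in `value`."""
    return [name for mask, name in table.items() if value & mask]


def parse_res(line):
    """Decode the status/error bytes of a libata 'res ss/ee:...' line."""
    m = re.search(r"res ([0-9a-fA-F]{2})/([0-9a-fA-F]{2})", line)
    if m is None:
        raise ValueError("no 'res ss/ee' field found")
    status = int(m.group(1), 16)
    error = int(m.group(2), 16)
    return decode(status, STATUS_BITS), decode(error, ERROR_BITS)


if __name__ == "__main__":
    line = "res 51/40:00:a0:13:5c/00:00:00:00:00/ef Emask 0x9 (media error)"
    print(parse_res(line))
```

For the line in this report, the status byte 0x51 has DRDY and ERR set and the error byte 0x40 is UNC, matching the `{ DRDY ERR }` and `{ UNC }` lines the kernel printed.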

    J

Thread overview: 4+ messages
2010-05-26 21:21 xennet: skb rides the rocket messages in domU dmesg Mark Hurenkamp
2010-05-26 22:39 ` Jeremy Fitzhardinge
2010-05-29 21:43   ` Mark Hurenkamp
2010-06-01 16:42     ` Jeremy Fitzhardinge [this message]
