From: Nomen Nescio <info@nomennesc.io>
To: xen-devel@lists.xensource.com
Subject: Re: remus trouble
Date: Wed, 7 Jul 2010 14:12:09 +0200	[thread overview]
Message-ID: <20100707121209.GO9918@puscii.nl> (raw)
In-Reply-To: <20100706180212.GE13388@kremvax.cs.ubc.ca>

Hey Brendan & all,

> > I ran into some problems trying remus on xen4.0.1rc4 with the 2.6.31.13
> > dom0 (checkout from yesterday):
> 
> What's your domU kernel? pvops support was recently added to dom0, but
> still doesn't work for domU.

Ah, that explains a few things. However, similar behaviour occurs with
HVM. Remus starts and spits out the following output:

qemu logdirty mode: enable
 1: sent 267046, skipped 218, delta 8962ms, dom0 68%, target 0%, sent 976Mb/s, dirtied 1Mb/s 290 pages
 2: sent 290, skipped 0, delta 12ms, dom0 66%, target 0%, sent 791Mb/s, dirtied 43Mb/s 16 pages
 3: sent 16, skipped 0, Start last iteration
PROF: suspending at 1278503125.101352
issuing HVM suspend hypercall
suspend hypercall returned 0
pausing QEMU
SUSPEND shinfo 000fffff
delta 11ms, dom0 18%, target 0%, sent 47Mb/s, dirtied 47Mb/s 16 pages
 4: sent 16, skipped 0, delta 5ms, dom0 20%, target 0%, sent 104Mb/s, dirtied 104Mb/s 16 pages
Total pages sent= 267368 (0.25x)
(of which 0 were fixups)
All memory is saved
PROF: resumed at 1278503125.111614
resuming QEMU
Sending 6017 bytes of QEMU state
PROF: flushed memory at 1278503125.112014
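
For anyone skimming: the output above follows the usual Remus
checkpoint cycle as I understand it. A rough Python sketch of one
epoch (every function here is an illustrative stub of mine, not the
real xen.remus or libxc API):

    def send_dirty_pages():
        """Copy pages dirtied since the last pass; return how many went out."""
        return 16

    def suspend_domain():   pass    # "issuing HVM suspend hypercall"
    def pause_qemu():       pass    # "pausing QEMU"
    def resume_domain():    pass    # "PROF: resumed at ..."
    def send_qemu_state():  pass    # "Sending 6017 bytes of QEMU state"
    def flush_to_backup():  pass    # "PROF: flushed memory at ..."

    def checkpoint_once(threshold=50):
        # Pre-copy rounds while the guest keeps running (iterations 1-2).
        while send_dirty_pages() > threshold:
            pass
        suspend_domain()        # stop the guest for the last iteration
        pause_qemu()
        send_dirty_pages()      # final, consistent copy of what's left
        resume_domain()         # the primary resumes speculatively
        send_qemu_state()       # device-model state follows the memory
        flush_to_backup()       # the epoch commits on the backup

    if __name__ == "__main__":
        checkpoint_once()       # remus repeats this every checkpoint interval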


and then it seems to become inactive. The ps output looks like this:

root      4756  0.4  0.1  82740 11040 pts/0    SLl+ 13:45   0:03 /usr/bin/python /usr/bin/remus --no-net remus1 backup


According to strace, it's stuck reading fd 6, which is a FIFO:
/var/run/tap/remus_nas1_9000.msg
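
In case it helps anyone reproduce the diagnosis, this is roughly how I
mapped the fd to that FIFO, plus a non-blocking probe of the same path
(a sketch, assuming Python in dom0; the pid is the one from the ps
output above):

    import os
    import sys

    def fd_targets(pid):
        """Map each open fd of `pid` to the path it points at (needs root)."""
        fd_dir = "/proc/%d/fd" % pid
        return {int(name): os.readlink(os.path.join(fd_dir, name))
                for name in os.listdir(fd_dir)}

    if __name__ == "__main__":
        pid = int(sys.argv[1])              # e.g. 4756, the remus pid above
        for fd, target in sorted(fd_targets(pid).items()):
            print("fd %d -> %s" % (fd, target))

        # A blocking read on a FIFO waits until a writer connects; opening
        # with O_NONBLOCK turns the same check into an immediate probe.
        fifo = "/var/run/tap/remus_nas1_9000.msg"
        try:
            fd = os.open(fifo, os.O_RDONLY | os.O_NONBLOCK)
            print(os.read(fd, 4096) or b"(no data; writer side not connected)")
            os.close(fd)
        except OSError as err:
            print("probe failed: %s" % err)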


The domU comes up in the blocked state on the backup machine and seems
to run fine there. However, xm list on the primary shows no state flags
at all:

Domain-0                                     0 10208    12     r-----    468.6
remus1                                       1  1024     1     ------     41.8


and after a Ctrl-C, remus segfaults:
remus[4756]: segfault at 0 ip 00007f3f49cc7376 sp 00007fffec999fd8 error 4 in libc-2.11.1.so[7f3f49ba1000+178000]
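
For what it's worth, the pattern I'd expect around that blocking read
looks roughly like this (a minimal sketch, not the actual remus code):
catch the interrupt and close the fd before unwinding, rather than
letting teardown trip over a dead channel:

    import os
    import sys

    def drain(fifo_path):
        fd = os.open(fifo_path, os.O_RDONLY)   # blocks until a writer appears
        try:
            while True:
                data = os.read(fd, 4096)       # b'' means the writer closed
                if not data:
                    break
        except KeyboardInterrupt:
            sys.stderr.write("interrupted; closing the FIFO before exit\n")
        finally:
            os.close(fd)                       # always release the fd

    if __name__ == "__main__":
        drain(sys.argv[1])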


> Are these in dom0 or the primary domU? Looks a bit like dom0, but I
> haven't seen these before.

Those were in dom0. This time dmesg shows the following after
destroying the domU on the primary:

[ 1920.059226] INFO: task xenwatch:55 blocked for more than 120 seconds.
[ 1920.059262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1920.059315] xenwatch      D 0000000000000000     0    55      2 0x00000000
[ 1920.059363]  ffff8802e2e656c0 0000000000000246 0000000000011200 0000000000000000
[ 1920.059439]  ffff8802e2e65720 0000000000000000 ffff8802d55d20c0 00000001001586b3
[ 1920.059520]  ffff8802e2e683b0 000000000000f668 00000000000153c0 ffff8802e2e683b0
[ 1920.059592] Call Trace:
[ 1920.059626]  [<ffffffff8157553d>] io_schedule+0x2d/0x40
[ 1920.059661]  [<ffffffff812afbc9>] get_request_wait+0xe9/0x1c0
[ 1920.059695]  [<ffffffff810af240>] ? autoremove_wake_function+0x0/0x40
[ 1920.059732]  [<ffffffff812a3e87>] ? elv_merge+0x37/0x200
[ 1920.059765]  [<ffffffff812afd41>] __make_request+0xa1/0x470
[ 1920.059800]  [<ffffffff810389ff>] ? xen_restore_fl_direct_end+0x0/0x1
[ 1920.059837]  [<ffffffff8103ed5d>] ? retint_restore_args+0x5/0x6
[ 1920.059874]  [<ffffffff812ae5dc>] generic_make_request+0x17c/0x4a0
[ 1920.059909]  [<ffffffff8111bdf6>] ? mempool_alloc+0x56/0x140
[ 1920.059946]  [<ffffffff8103819d>] ? xen_force_evtchn_callback+0xd/0x10
[ 1920.059979]  [<ffffffff812ae978>] submit_bio+0x78/0xf0
[ 1920.060013]  [<ffffffff81180489>] submit_bh+0xf9/0x140
[ 1920.060046]  [<ffffffff81182600>] __block_write_full_page+0x1e0/0x3a0
[ 1920.060080]  [<ffffffff811819c0>] ? end_buffer_async_write+0x0/0x1f0
[ 1920.060116]  [<ffffffff81186980>] ? blkdev_get_block+0x0/0x70
[ 1920.060151]  [<ffffffff81186980>] ? blkdev_get_block+0x0/0x70
[ 1920.060186]  [<ffffffff811819c0>] ? end_buffer_async_write+0x0/0x1f0
[ 1920.060222]  [<ffffffff81182ec1>] block_write_full_page_endio+0xe1/0x120
[ 1920.060259]  [<ffffffff81038a12>] ? check_events+0x12/0x20
[ 1920.060294]  [<ffffffff81182f15>] block_write_full_page+0x15/0x20
[ 1920.060330]  [<ffffffff81187928>] blkdev_writepage+0x18/0x20
[ 1920.060365]  [<ffffffff81120937>] __writepage+0x17/0x40
[ 1920.060399]  [<ffffffff81121897>] write_cache_pages+0x227/0x4d0
[ 1920.060434]  [<ffffffff81120920>] ? __writepage+0x0/0x40
[ 1920.060469]  [<ffffffff810389ff>] ? xen_restore_fl_direct_end+0x0/0x1
[ 1920.060504]  [<ffffffff81121b64>] generic_writepages+0x24/0x30
[ 1920.060539]  [<ffffffff81121b9d>] do_writepages+0x2d/0x50
[ 1920.060576]  [<ffffffff81119beb>] __filemap_fdatawrite_range+0x5b/0x60
[ 1920.060613]  [<ffffffff8111a1ff>] filemap_fdatawrite+0x1f/0x30
[ 1920.060646]  [<ffffffff8111a245>] filemap_write_and_wait+0x35/0x50
[ 1920.060681]  [<ffffffff81187ba4>] __sync_blockdev+0x24/0x50
[ 1920.060716]  [<ffffffff81187be3>] sync_blockdev+0x13/0x20
[ 1920.060748]  [<ffffffff81187cc8>] __blkdev_put+0xa8/0x1a0
[ 1920.060784]  [<ffffffff81187dd0>] blkdev_put+0x10/0x20
[ 1920.060819]  [<ffffffff81344fea>] vbd_free+0x2a/0x40
[ 1920.060851]  [<ffffffff81344499>] blkback_remove+0x59/0x90
[ 1920.060885]  [<ffffffff8133e890>] xenbus_dev_remove+0x50/0x70
[ 1920.060921]  [<ffffffff8138b9d8>] __device_release_driver+0x58/0xb0
[ 1920.060956]  [<ffffffff8138bb4d>] device_release_driver+0x2d/0x40
[ 1920.060991]  [<ffffffff8138ac0a>] bus_remove_device+0x9a/0xc0
[ 1920.061027]  [<ffffffff81388da7>] device_del+0x127/0x1d0
[ 1920.061061]  [<ffffffff81388e66>] device_unregister+0x16/0x30
[ 1920.061095]  [<ffffffff813441a0>] frontend_changed+0x90/0x2a0
[ 1920.061131]  [<ffffffff8133eb82>] xenbus_otherend_changed+0xb2/0xc0
[ 1920.061167]  [<ffffffff81577aa7>] ? _spin_unlock_irqrestore+0x37/0x60
[ 1920.061209]  [<ffffffff8133f150>] frontend_changed+0x10/0x20
[ 1920.061243]  [<ffffffff8133c794>] xenwatch_thread+0xb4/0x190
[ 1920.061281]  [<ffffffff810af240>] ? autoremove_wake_function+0x0/0x40
[ 1920.061314]  [<ffffffff8133c6e0>] ? xenwatch_thread+0x0/0x190
[ 1920.061349]  [<ffffffff810aecb6>] kthread+0xa6/0xb0
[ 1920.061383]  [<ffffffff8103f3ea>] child_rip+0xa/0x20
[ 1920.061415]  [<ffffffff8103e5d7>] ? int_ret_from_sys_call+0x7/0x1b
[ 1920.061451]  [<ffffffff8103ed5d>] ? retint_restore_args+0x5/0x6
[ 1920.061485]  [<ffffffff8103f3e0>] ? child_rip+0x0/0x20


From the trace it looks like the xenwatch thread is stuck in
sync_blockdev while blkback tears down the vbd, i.e. the request queue
behind the tap device never drains. Any idea what's going wrong? Thanks!

Cheers,

NN
