From: Pete Ashdown <pashdown@xmission.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: kvm@vger.kernel.org, Aaron Toponce <atoponce@xmission.com>
Subject: Re: kvm + raid1 showstopper bug
Date: Sun, 19 Feb 2012 11:17:43 -0700
Message-ID: <4F413CC7.1020904@xmission.com>
In-Reply-To: <CAJSP0QUBnnmQepv8AZuXdg+mEXY6ckPMeUTuW_drDD3yLEtouw@mail.gmail.com>

On 2/18/12 6:25 AM, Stefan Hajnoczi wrote:
>> In my case, it is drbd+RAID10, but the bug still applies. It doesn't
>> happen every time checkarray runs, but whenever checkarray decides to do
>> a resync, all I/O blocks somewhere before the end of the resync. Then,
>> yes, it isn't long before the guests start to fail due to their
>> inability to read/write.
> I have not attempted to reproduce this yet, but I have taken a look at
> the drivers/md/raid10.c resync code. md resync uses a similar mechanism
> for RAID1 and RAID10: while a block is being synced, regular I/O
> requests to the entire device are forced to wait. There are tunables
> that let you rate-limit resyncing; I think they can solve your problem.
> Perhaps the resync is so aggressive that it impacts regular I/O enough
> for the guest to warn about it. See Documentation/md.txt for
> sync_speed_max and other sysfs attributes.
Is sync_speed_max independent of dev.raid.speed_limit_max? I ask because
I already tried the latter, to no avail.
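On the question above: as Documentation/md.txt describes it, the per-array sysfs attributes sync_speed_min and sync_speed_max behave like the system-wide /proc/sys/dev/raid/speed_limit_min and speed_limit_max (the dev.raid.* sysctls), but apply to one array only. Writing the word "system" reverts an array to the global value. All values are in KiB/s. The following is a minimal sketch, assuming a hypothetical array name such as md0 and root privileges. As far as I understand md's throttling, resync is only slowed down while it runs above the minimum rate, so lowering the maximum alone may not help if the minimum is set high.

```shell
#!/bin/sh
# Sketch: throttle md resync/check speed (all values in KiB/s).
# Assumptions: the array name (e.g. md0) is hypothetical; run as root.

# Path of a per-array md attribute in sysfs (see Documentation/md.txt).
md_attr_path() {
    # $1 = array name, e.g. md0; $2 = attribute, e.g. sync_speed_max
    printf '/sys/block/%s/md/%s\n' "$1" "$2"
}

# Per-array limits: override the system-wide values for this array only.
# Passing "system" instead of a number reverts to the system-wide value.
set_array_sync_limits() {
    # $1 = array name, $2 = min KiB/s, $3 = max KiB/s
    echo "$2" > "$(md_attr_path "$1" sync_speed_min)"
    echo "$3" > "$(md_attr_path "$1" sync_speed_max)"
}

# System-wide defaults: the same knobs as sysctl dev.raid.speed_limit_min/max.
set_system_sync_limits() {
    # $1 = min KiB/s, $2 = max KiB/s
    echo "$1" > /proc/sys/dev/raid/speed_limit_min
    echo "$2" > /proc/sys/dev/raid/speed_limit_max
}

# Print the current sync action, its measured speed, and the active limits.
show_sync_state() {
    # $1 = array name
    for attr in sync_action sync_speed sync_speed_min sync_speed_max; do
        printf '%s: %s\n' "$attr" "$(cat "$(md_attr_path "$1" "$attr")")"
    done
}
```

Usage, with illustrative numbers: `set_array_sync_limits md0 1000 10000`, then `show_sync_state md0` while the check is running to see whether sync_speed actually drops.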
> The bug report suggests qemu-kvm itself is operating fine because the
> guest is still executing and VNC/monitor are alive. After a while the
> guest warns about the stuck I/O.
>
> Networking may become unresponsive if disk I/O is required, e.g. the
> ssh daemon reading keys for a user. Your best bet for testing that
> theory is ICMP ping, because it shouldn't involve disk I/O.
>
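The ping-versus-ssh comparison suggested above can be scripted so it can be repeated during a resync. This is a rough sketch, not a definitive diagnostic: it assumes a hypothetical guest address passed as an argument, Linux iputils ping (for -W), coreutils timeout, and key-based ssh access. An ssh failure can have other causes, so the result is only a hint.

```shell
#!/bin/sh
# Sketch: tell "guest network is dead" apart from "guest is stuck on disk I/O".
# Assumptions: the guest address is hypothetical; key-based ssh works.

# Map the two probe exit statuses to a rough diagnosis.
classify() {
    # $1 = ping exit status, $2 = ssh exit status
    if [ "$1" -ne 0 ]; then
        echo "no-icmp"        # the guest or its network stack is unreachable
    elif [ "$2" -ne 0 ]; then
        echo "disk-stalled"   # ICMP answers, but ssh (which needs disk I/O) does not
    else
        echo "ok"
    fi
}

# Probe one guest: one ICMP echo, then a trivial ssh command with a timeout.
probe_guest() {
    # $1 = guest hostname or IP
    ping -c 1 -W 2 "$1" > /dev/null 2>&1; p=$?
    timeout 15 ssh -o BatchMode=yes "$1" true > /dev/null 2>&1; s=$?
    classify "$p" "$s"
}
```

For example, `probe_guest 192.0.2.10` run in a loop while checkarray is active; a switch from "ok" to "disk-stalled" would support the theory that only disk-bound services hang.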
> It would be interesting to start resync and then run the following on
> the host:
>
>   time dd if=/dev/zero of=/path/to/device/tmpfile oflag=sync bs=4k count=1
>
> You don't even need qemu-kvm for this test. I suspect this single 4 KB
> write to the file system will take many seconds or even minutes. That
> would show that the problem is in the host: regular I/O gets too little
> time, which causes guest operating systems and applications to freak out.
>
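The host-side test above can be wrapped in two small functions. This is a minimal sketch under stated assumptions: the array name and target file are hypothetical, it must run as root, and it starts a pass by writing `check` to sync_action (to my knowledge what Debian's checkarray does), which may not reproduce exactly the resync case described earlier in the thread. Timing uses whole seconds from GNU `date +%s`, which is enough to spot multi-second stalls.

```shell
#!/bin/sh
# Sketch: start an md check, then time one synchronous 4 KiB write.
# Assumptions: array name and file path are hypothetical; run as root.

# Ask md to start a check pass on the given array.
start_md_check() {
    # $1 = array name, e.g. md0
    echo check > "/sys/block/$1/md/sync_action"
}

# Write 4 KiB with O_SYNC and print the elapsed time in whole seconds.
time_sync_write() {
    # $1 = file on a filesystem backed by the array
    start=$(date +%s)
    dd if=/dev/zero of="$1" oflag=sync bs=4k count=1 2>/dev/null
    end=$(date +%s)
    echo $((end - start))
}
```

Usage: `start_md_check md0`, then repeat `time_sync_write /path/to/device/tmpfile` every few seconds while `cat /proc/mdstat` shows progress.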
> Another approach to testing is running a guest without RAID resync
> underneath. Use dm-delay to insert an artificial delay on I/O
> requests (try 130 seconds). My guess is the guest operating system
> will react in the same way because its I/O requests take an extremely
> long time.
>
I will try these other tests when I get a chance.
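For the dm-delay test quoted above, the device-mapper delay target takes a table line of the form `<start> <length> delay <device> <offset> <delay>`, with the length in 512-byte sectors and the delay in milliseconds, so 130 seconds is 130000. A minimal sketch, assuming a hypothetical scratch device (e.g. /dev/sdX1) that holds nothing important, root privileges, and the dm-delay module being available:

```shell
#!/bin/sh
# Sketch: expose a block device through dm-delay so every I/O is held back.
# Assumptions: device and mapping names are hypothetical; run as root.

# Build a dm-delay table line covering the whole device.
delay_table() {
    # $1 = device, $2 = size in 512-byte sectors, $3 = delay in milliseconds
    printf '0 %s delay %s 0 %s\n' "$2" "$1" "$3"
}

# Create /dev/mapper/<name> on top of the device; dmsetup reads the table
# from stdin when no table file is given.
create_delayed_dev() {
    # $1 = device, $2 = mapping name, $3 = delay in milliseconds
    sectors=$(blockdev --getsz "$1")
    delay_table "$1" "$sectors" "$3" | dmsetup create "$2"
}
```

Usage: `create_delayed_dev /dev/sdX1 slowdisk 130000`, give the guest /dev/mapper/slowdisk as its disk, and remove the mapping afterwards with `dmsetup remove slowdisk`.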
Thread overview: 6+ messages
2012-02-17 4:57 kvm + raid1 showstopper bug Pete Ashdown
2012-02-17 11:30 ` Stefan Hajnoczi
2012-02-17 15:31 ` Pete Ashdown
2012-02-18 13:25 ` Stefan Hajnoczi
2012-02-19 18:17 ` Pete Ashdown [this message]
2012-02-21 12:40 ` Jes Sorensen