Re: Random data corruption in VM, possibly caused by rbd

From: Josh Durgin <josh.durgin@inktank.com>
To: Guido Winkelmann <guido-ceph@thisisnotatest.de>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Random data corruption in VM, possibly caused by rbd
Date: Thu, 07 Jun 2012 12:48:05 -0700
Message-ID: <4FD10575.7010300@inktank.com>
In-Reply-To: <21601270.dfB0BsVfyn@pc10>

On 06/07/2012 11:04 AM, Guido Winkelmann wrote:
> Hi,
>
> I'm using Ceph with RBD to provide network-transparent disk images for KVM-
> based virtual servers. The last two days, I've been hunting some weird elusive
> bug where data in the virtual machines would be corrupted in weird ways. It
> usually manifests in files having some random data - usually zeroes - at the
> start before the actual contents that should be in there start.

I definitely want to figure out what's going on with this.
A few questions:

Are you using rbd caching? If so, what settings?

Either way, does the corruption still occur if you
switch caching on or off? The cached and uncached cases take
different I/O paths, so this might tell us whether the problem
is on the client side.

Another thing to try is turning off sparse reads on the osd by setting
filestore fiemap threshold = 0
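
In ceph.conf that would look something like the fragment below. Placing
it in the [osd] section is an assumption about how your config is laid
out; adapt it to your file, and restart the osds afterwards so they pick
up the change.

```ini
[osd]
    ; assumption: applied to all osds via the [osd] section
    filestore fiemap threshold = 0
```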

> To track this down, I wrote a simple io tester. It does the following:
>
> - Create 1 Megabyte of random data
> - Calculate the SHA256 hash of that data
> - Write the data to a file on the harddisk, in a given directory, using the
> hash as the filename
> - Repeat until the disk is full
> - Delete the last file (because it is very likely to be incompletely written)
> - Read and delete all the files just written while checking that their sha256
> sums are equal to their filenames
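
For anyone who wants to reproduce this, the steps above can be sketched
roughly as follows. This is a minimal Python sketch, not your actual
tester: the function names and the max_files safety limit are
hypothetical, and it treats ENOSPC on write as "disk full".

```python
import errno
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB of random data per file


def write_until_full(directory, max_files=None):
    """Write random 1 MiB files, each named by its SHA-256 hex digest.

    Stops when the disk is full (ENOSPC) or after max_files files.
    max_files is a hypothetical safety limit, not part of the original
    description. Returns the paths of all completely written files.
    """
    written = []
    while max_files is None or len(written) < max_files:
        data = os.urandom(CHUNK_SIZE)
        path = os.path.join(directory, hashlib.sha256(data).hexdigest())
        try:
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
        except OSError as e:
            if e.errno != errno.ENOSPC:
                raise
            # Disk full: the last file is probably incomplete, so drop it.
            if os.path.exists(path):
                os.remove(path)
            break
        written.append(path)
    return written


def verify_and_delete(paths):
    """Read each file back, compare its SHA-256 to its name, delete it.

    Returns the paths whose contents did not match their filename.
    """
    corrupted = []
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest != os.path.basename(path):
            corrupted.append(path)
        os.remove(path)
    return corrupted
```

Calling verify_and_delete(write_until_full("/some/test/dir")) inside the
guest returns the list of corrupted files. Note that unless the data
written exceeds the guest's RAM, some reads may be served from the page
cache and never touch rbd.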
>
> When running this io tester in a VM that uses a qcow2 file on a local harddisk
> for its virtual disk, no errors are found. When the same VM is running using
> rbd, the io tester finds on average about one corruption every 200 Megabytes,
> reproducibly.
>
> (As an interesting aside, the io tester also prints how long it took to
> read or write 100 MB, and it turns out reading the data back in again is about
> three times slower than writing them in the first place...)
>
> Ceph is version 0.47.2. Qemu KVM is 1.0, compiled with the spec file from
> http://pkgs.fedoraproject.org/gitweb/?p=qemu.git;a=summary
> (And compiled after ceph 0.47.2 was installed on that machine, so it would use
> the correct headers...)
> Both the Ceph cluster and the KVM host machines are running on Fedora 16, with
> a fairly recent 3.3.x kernel.

Those versions should all work.

> The ceph cluster uses btrfs for the osd's data dirs. The journal is on a tmpfs.
> (This is not a production setup - luckily.)
> The virtual machine is using ext4 as its filesystem.
> There were no obvious other problems with either the ceph cluster or the KVM
> host machines.

Were any of the nodes with osds restarted during the test runs? I wonder
if it's a problem with losing the tmpfs journal, which would not survive
a reboot.

As Oliver suggested, switching the osd data dir filesystem might help
too.

> I have attached a copy of the ceph.conf in use, in case it might be helpful.
>
> This is a huge problem, and any help in tracking it down would be much
> appreciated.

Agreed, and I'm happy to help.

Josh

> Regards,
>
> 	Guido
