From: Josh Durgin <josh.durgin@inktank.com>
To: Sage Weil <sage@inktank.com>
Cc: Oliver Francke <Oliver.Francke@filoo.de>,
	Guido Winkelmann <guido-ceph@thisisnotatest.de>,
	ceph-devel@vger.kernel.org
Subject: Re: Random data corruption in VM, possibly caused by rbd
Date: Fri, 08 Jun 2012 07:50:36 -0700
Message-ID: <4FD2113C.3070906@inktank.com>
In-Reply-To: <Pine.LNX.4.64.1206080652031.10292@cobra.newdream.net>

On 06/08/2012 06:55 AM, Sage Weil wrote:
> On Fri, 8 Jun 2012, Oliver Francke wrote:
>> Hi Guido,
>>
>> yeah, there is something weird going on. I just started setting up some
>> test VMs, freshly imported from running *.qcow2 images.
>> Kernel panics with INIT, segfaults and other "funny" stuff.
>>
>> I just added rbd_cache=true to my config and, voila, everything is
>> fast-n-up-n-running...
>> All my testing was done with the cache enabled, since our errors all came from
>> rbd_writeback in former ceph versions...
>
> Are you guys able to reproduce the corruption with 'debug osd = 20' and
> 'debug ms = 1'?  Ideally we'd like to:
>
>   - reproduce from a fresh vm, with osd logs
>   - identify the bad file
>   - map that file to a block offset (see
>     http://ceph.com/qa/fiemap.[ch], linux_fiemap.h)
>   - use that to identify the badness in the log
>
> I suspect the cache is just masking the problem because it submits fewer
> IOs...
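
The "map that file to a block offset" step above ends with a byte offset into the disk image, which still has to be matched to an object name in the OSD logs. A minimal sketch of that arithmetic follows. It assumes the default 4 MiB object size (order 22) and format-1 style object names of the form <block_name_prefix>.<12 hex digits>. The function names and the example prefix are illustrative and do not come from this thread; `rbd info` reports the real order and prefix for an image.

```python
# Sketch only: map a byte offset inside an RBD image to its backing object.
# Assumptions (not from the thread): 4 MiB objects (order 22) and
# object names "<block_name_prefix>.<12 hex digits>".

def image_offset_from_guest(fs_physical_offset, partition_start_sector=0,
                            sector_size=512):
    """Convert a physical offset reported inside a guest partition (e.g. by
    fiemap) into a byte offset from the start of the whole disk image."""
    return partition_start_sector * sector_size + fs_physical_offset


def rbd_object_for_offset(image_offset, block_name_prefix, order=22):
    """Return (object_name, offset_within_object) for a byte offset."""
    if image_offset < 0:
        raise ValueError("offset must be non-negative")
    object_size = 1 << order              # 4 MiB when order == 22
    index = image_offset // object_size   # which object holds the byte
    within = image_offset % object_size   # where in that object it sits
    return f"{block_name_prefix}.{index:012x}", within
```

For example, with the hypothetical prefix "rb.0.1", rbd_object_for_offset(4194309, "rb.0.1") returns ("rb.0.1.000000000001", 5): the byte lives 5 bytes into the second object.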

The cache also doesn't do sparse reads. Is it still reproducible with
a fresh vm when you set filestore_fiemap_threshold = 0 for the osds,
and run without rbd caching?

Josh

> sage
>
>
>>
>> Josh? Sage? Help?!
>>
>> Oliver.
>>
>> On 06/08/2012 02:55 PM, Guido Winkelmann wrote:
>>> On Thursday, 7 June 2012, 12:48:05, you wrote:
>>>> On 06/07/2012 11:04 AM, Guido Winkelmann wrote:
>>>>> Hi,
>>>>>
>>>>> I'm using Ceph with RBD to provide network-transparent disk images for
>>>>> KVM-based virtual servers. For the last two days, I've been hunting a weird,
>>>>> elusive bug where data in the virtual machines gets corrupted in
>>>>> weird ways. It usually manifests as files having some random data,
>>>>> usually zeroes, at the start, before the actual contents that should be
>>>>> in there begin.
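
A check for the symptom described above (unexpected zeroes in front of the real contents) could look roughly like the following. This is a hedged sketch, not the tester used in this thread: the function name and the result strings are made up for illustration, and it only compares a buffer read back from disk against the bytes that were supposed to be written.

```python
# Sketch only: classify a read-back buffer against the expected contents.
# The name `classify` and its return strings are hypothetical.

def classify(actual: bytes, expected: bytes) -> str:
    """Return "ok", "leading-zeros:<n>", or "mismatch"."""
    if actual == expected:
        return "ok"
    # Count zero bytes at the start of each buffer.
    zeros = len(actual) - len(actual.lstrip(b"\x00"))
    expected_zeros = len(expected) - len(expected.lstrip(b"\x00"))
    # More leading zeroes than were written matches the reported symptom.
    if zeros > expected_zeros:
        return f"leading-zeros:{zeros}"
    return "mismatch"
```

Reading the file with open(path, "rb").read() and passing the result as `actual` is enough to use this on a real file.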
>>>> I definitely want to figure out what's going on with this.
>>>> A few questions:
>>>>
>>>> Are you using rbd caching? If so, what settings?
>>>>
>>>> In either case, does the corruption still occur if you
>>>> switch caching on/off? There are different I/O paths here,
>>>> and this might tell us if the problem is on the client side.
>>> Okay, I've tried enabling rbd caching now, and so far, the problem appears
>>> to be gone.
>>>
>>> I am using libvirt for starting and managing the virtual machines, and what
>>> I did was change the <source> element for the virtual disk from
>>>
>>> <source protocol='rbd' name='rbd/name_of_image'>
>>>
>>> to
>>>
>>> <source protocol='rbd' name='rbd/name_of_image:rbd_cache=true'>
>>>
>>> and then restart the VM.
>>> (I found that in one of your mails on this list; there does not appear to be
>>> any proper documentation on this...)
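
The change to the <source> element described above can also be scripted. The following is a sketch only, using Python's standard xml.etree.ElementTree on the domain XML (for example as printed by `virsh dumpxml`). The function name set_rbd_cache is hypothetical, and the code does nothing beyond adding or replacing the rbd_cache option in the name attribute of rbd sources.

```python
import xml.etree.ElementTree as ET

# Sketch only: append ":rbd_cache=true" (or "=false") to every
# <source protocol='rbd' name='...'> in a libvirt domain XML string.
# `set_rbd_cache` is an illustrative name, not a libvirt API.

def set_rbd_cache(domain_xml: str, enabled: bool = True) -> str:
    root = ET.fromstring(domain_xml)
    value = "true" if enabled else "false"
    for source in root.iter("source"):
        if source.get("protocol") != "rbd":
            continue
        # The name has the form "pool/image[:option=value[:...]]".
        image, *options = source.get("name", "").split(":")
        # Drop any existing rbd_cache setting so the edit is idempotent.
        options = [o for o in options if not o.startswith("rbd_cache=")]
        options.append(f"rbd_cache={value}")
        source.set("name", ":".join([image] + options))
    return ET.tostring(root, encoding="unicode")
```

As in the original message, the VM has to be restarted with the new definition before the setting takes effect.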
>>>
>>> The iotester does not find any corruptions with these settings.
>>>
>>> The VM is still horribly broken, but that's probably lingering filesystem
>>> damage from yesterday. I'll try with a fresh image next.
>>>
>>> I did not change anything else in the setup. In particular, the OSDs still
>>> use btrfs. One of the OSDs has been restarted, though. I will run another
>>> test with a VM without rbd caching, to make sure it wasn't by random chance
>>> restarting that one OSD that made the real difference.
>>>
>>> Enabling rbd caching did not appear to make any difference wrt performance, but
>>> that's probably because my tests mostly create sustained sequential IO, for
>>> which caches are generally not very helpful.
>>>
>>> Enabling rbd caching is not a solution I particularly like, for two reasons:
>>>
>>> 1. In my setup, migrating VMs from one host to another is a normal part of
>>> operation, and I still don't know how to prevent data corruption (in the form
>>> of silently lost writes) when combining rbd caching and migration.
>>>
>>> 2. I'm not really looking into speeding up a single VM; I'm more
>>> interested in just how many VMs I can run before performance starts
>>> degrading for everyone, and I don't think rbd caching will help with that.
>>>
>>> Regards,
>>> 	Guido
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>> --
>>
>> Oliver Francke
>>
>> filoo GmbH
>> Moltkestraße 25a
>> 33330 Gütersloh
>> HRB4355 AG Gütersloh
>>
>> Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz
>>
>> Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh
>>
