From: Oliver Francke
Subject: Re: Random data corruption in VM, possibly caused by rbd
Date: Fri, 08 Jun 2012 17:39:17 +0200
Message-ID: <4FD21CA5.9030402@filoo.de>
References: <21601270.dfB0BsVfyn@pc10> <4FD10575.7010300@inktank.com> <6535521.l6e0muMKBm@pc10> <4FD1FFD2.6050707@filoo.de> <4FD2113C.3070906@inktank.com>
In-Reply-To: <4FD2113C.3070906@inktank.com>
To: Josh Durgin
Cc: Sage Weil, Guido Winkelmann, ceph-devel@vger.kernel.org

Well then, I'm quite busy with some other stuff, too, but...

On 06/08/2012 04:50 PM, Josh Durgin wrote:
> On 06/08/2012 06:55 AM, Sage Weil wrote:
>> On Fri, 8 Jun 2012, Oliver Francke wrote:
>>> Hi Guido,
>>>
>>> yeah, there is something weird going on. I just started to set up some
>>> test VMs, freshly imported from running *.qcow2 images:
>>> kernel panics with INIT, segfaults and other "funny" stuff.
>>>
>>> Just added rbd_cache=true to my config, voila: all is
>>> fast-n-up-n-running...
>>> All my testing was done with the cache enabled, since our errors all came
>>> from rbd_writeback in former Ceph versions...
>>
>> Are you guys able to reproduce the corruption with 'debug osd = 20' and
>> 'debug ms = 1'? Ideally we'd like to:
>>
>> - reproduce from a fresh vm, with osd logs
>> - identify the bad file
>> - map that file to a block offset (see
>>   http://ceph.com/qa/fiemap.[ch], linux_fiemap.h)
>> - use that to identify the badness in the log

A logfile with debugging enabled is available on our local store...
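The "map that file to a block offset" step can be sketched as plain arithmetic. This is a hypothetical helper, not code from Ceph or from this thread. It assumes an RBD format 1 image, whose data objects are named "<block_name_prefix>.<object number as 12 hex digits>", the default object size of 2^22 bytes (order 22), and a guest filesystem on a plain partition with no LVM or other mapping layer in between. The partition start can be read in 512-byte sectors from /sys/class/block/<partition>/start inside the guest, and "rb.0.hypothetical" stands in for the real block_name_prefix shown by `rbd info`.

```python
# Hypothetical helpers (not part of Ceph): map a file extent inside the guest
# to the RADOS objects backing an RBD image.
# Assumptions: RBD format 1 naming "<block_name_prefix>.<12 hex digits>",
# default object size 2**22 bytes (order 22), plain partition, no LVM.


def disk_offset(partition_start_sector, fe_physical, sector_size=512):
    """Byte offset on the whole virtual disk for a FIEMAP physical offset.

    fe_physical is relative to the partition that holds the guest
    filesystem, so the partition's start (in 512-byte sectors) is added.
    """
    return partition_start_sector * sector_size + fe_physical


def rbd_objects_for_range(prefix, offset, length, order=22):
    """Yield (object_name, offset_in_object, length_in_object) for a disk range."""
    obj_size = 1 << order
    pos, end = offset, offset + length
    while pos < end:
        objno = pos >> order            # which object holds this byte
        off = pos & (obj_size - 1)      # position inside that object
        chunk = min(obj_size - off, end - pos)
        yield "%s.%012x" % (prefix, objno), off, chunk
        pos += chunk


if __name__ == "__main__":
    # Made-up example: partition starts at sector 2048 and FIEMAP reported an
    # 8 KiB extent at physical byte 3141632 within that partition.
    start = disk_offset(2048, 3141632)
    for name, off, n in rbd_objects_for_range("rb.0.hypothetical", start, 8192):
        print(name, off, n)
```

In the made-up example the extent straddles an object boundary, so it prints two lines: 4096 bytes at offset 4190208 of object ...000000000000 and 4096 bytes at offset 0 of object ...000000000001. Those object names are what one would then search for in the OSD logs.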
>>
>> I suspect the cache is just masking the problem because it submits fewer
>> IOs...
>
> The cache also doesn't do sparse reads. Is it still reproducible with
> a fresh vm when you set filestore_fiemap_threshold = 0 for the osds,
> and run without rbd caching?

I restarted the OSDs with this setting, but without rbd_cache I still get
errors. *sigh*

Oliver.

>
> Josh
>
>> sage
>>
>>
>>>
>>> Josh? Sage? Help?!
>>>
>>> Oliver.
>>>
>>> On 06/08/2012 02:55 PM, Guido Winkelmann wrote:
>>>> On Thursday, 7 June 2012, 12:48:05, you wrote:
>>>>> On 06/07/2012 11:04 AM, Guido Winkelmann wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm using Ceph with RBD to provide network-transparent disk images for
>>>>>> KVM-based virtual servers. For the last two days, I've been hunting a
>>>>>> weird, elusive bug where data in the virtual machines gets corrupted in
>>>>>> strange ways. It usually manifests as files having some random data
>>>>>> (usually zeroes) at the start, before the contents that should actually
>>>>>> be there begin.
>>>>> I definitely want to figure out what's going on with this.
>>>>> A few questions:
>>>>>
>>>>> Are you using rbd caching? If so, what settings?
>>>>>
>>>>> In either case, does the corruption still occur if you
>>>>> switch caching on/off? There are different I/O paths here,
>>>>> and this might tell us if the problem is on the client side.
>>>> Okay, I've tried enabling rbd caching now, and so far, the problem
>>>> appears to be gone.
>>>>
>>>> I am using libvirt for starting and managing the virtual machines, and
>>>> what I did was change the element for the virtual disk from
>>>>
>>>>
>>>>
>>>> to
>>>>
>>>>
>>>>
>>>> and then restart the VM.
>>>> (I found that in one of your mails on this list; there does not appear
>>>> to be any proper documentation on this...)
>>>>
>>>> The iotester does not find any corruptions with these settings.
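The settings requested in this exchange could be collected in ceph.conf roughly as below. This is a sketch assuming the standard [osd] and [client] sections; the option names are copied from the messages above (ceph.conf accepts spaces or underscores in option names), and the rbd cache line is the switch for the run with and without client-side caching.

```ini
# Sketch only: ceph.conf settings discussed in this thread (assumed layout).
[osd]
    # verbose OSD and messenger logging for the reproduction run
    debug osd = 20
    debug ms = 1
    # Josh's suggested setting for the OSDs
    filestore fiemap threshold = 0

[client]
    # false for the run without rbd caching, true for the comparison run
    rbd cache = false
```

The OSDs need to be restarted to pick up the [osd] changes, as was done above.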
>>>>
>>>> The VM is still horribly broken, but that's probably lingering
>>>> filesystem damage from yesterday. I'll try with a fresh image next.
>>>>
>>>> I did not change anything else in the setup. In particular, the OSDs
>>>> still use btrfs. One of the OSDs has been restarted, though. I will run
>>>> another test with a VM without rbd caching, to make sure it wasn't the
>>>> restart of that one OSD that made the real difference by random chance.
>>>>
>>>> Enabling rbd caching did not appear to make any difference wrt
>>>> performance, but that's probably because my tests mostly create
>>>> sustained sequential IO, for which caches are generally not very helpful.
>>>>
>>>> Enabling rbd caching is not a solution I particularly like, for two
>>>> reasons:
>>>>
>>>> 1. In my setup, migrating VMs from one host to another is a normal part
>>>> of operation, and I still don't know how to prevent data corruption (in
>>>> the form of silently lost writes) when combining rbd caching and
>>>> migration.
>>>>
>>>> 2. I'm not really looking into speeding up a single VM; I'm more
>>>> interested in just how many VMs I can run before performance starts
>>>> degrading for everyone, and I don't think rbd caching will help with that.
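Guido's iotester itself is not shown in the thread. As a hypothetical stand-in (not his tool), the sketch below writes sequential, position-dependent blocks, reads them back and classifies mismatching blocks, so that the "zeroes at the start of a file" symptom shows up as a specific block index. The block size, the SHA-256-based pattern and all function names are assumptions made for illustration.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size for the test pattern


def block_pattern(seed, index):
    """Deterministic content for block `index`, different for every block."""
    out = bytearray()
    counter = 0
    while len(out) < BLOCK_SIZE:
        out += hashlib.sha256(b"%d:%d:%d" % (seed, index, counter)).digest()
        counter += 1
    return bytes(out[:BLOCK_SIZE])


def write_pattern(path, nblocks, seed=1):
    """Sequentially write nblocks patterned blocks and fsync them."""
    with open(path, "wb") as f:
        for i in range(nblocks):
            f.write(block_pattern(seed, i))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, nblocks, seed=1):
    """Return [(block_index, kind)] for every block that does not match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(nblocks):
            data = f.read(BLOCK_SIZE)
            if data == block_pattern(seed, i):
                continue
            if len(data) < BLOCK_SIZE:
                kind = "short read"
            elif data == bytes(BLOCK_SIZE):
                kind = "zeroes"      # the symptom described in this thread
            else:
                kind = "garbage"
            bad.append((i, kind))
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "pattern.bin")
        write_pattern(path, 256)          # 1 MiB of patterned data
        print(verify_pattern(path, 256))  # empty list if nothing is corrupted
```

Reading back immediately after writing may be served from the guest's page cache rather than from RBD. Running the verify step separately, after dropping caches (writing 3 to /proc/sys/vm/drop_caches as root) or after a reboot, is more likely to exercise the RBD I/O path.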
>>>>
>>>> Regards,
>>>> Guido
>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>> -- 
>>>
>>> Oliver Francke
>>>
>>> filoo GmbH
>>> Moltkestraße 25a
>>> 33330 Gütersloh
>>> HRB4355 AG Gütersloh
>>>
>>> Managing directors: S.Grewing | J.Rehpöhler | C.Kunz
>>>
>>> Follow us on Twitter: http://twitter.com/filoogmbh