From: Josh Durgin
Subject: Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
Date: Tue, 26 Mar 2013 01:30:02 -0700
Message-ID: <51515C8A.1080107@inktank.com>
References: <34E007C3-D952-4350-83FA-F9BC34294EEF@filoo.de> <514CB14F.8040209@inktank.com> <51502118.7060906@filoo.de>
In-Reply-To: <51502118.7060906@filoo.de>
To: Oliver Francke
Cc: "ceph-devel@vger.kernel.org"

On 03/25/2013 03:04 AM, Oliver Francke wrote:
> Hi josh,
>
> logfile is attached...

Thanks. It shows nothing out of the ordinary, but I just reproduced the
incorrect rollback locally, so it shouldn't be hard to track down
from here.

I opened http://tracker.ceph.com/issues/4551 to track it.

Josh

> On 03/22/2013 08:30 PM, Josh Durgin wrote:
>> On 03/22/2013 12:09 PM, Oliver Francke wrote:
>>> Hi Josh, all,
>>>
>>> I did not want to hijack the thread dealing with a crashing VM, but
>>> perhaps there are some common things.
>>>
>>> Today I installed a fresh cluster with mkcephfs, went fine, imported a
>>> "master" debian 6.0 image with "format 2", made a snapshot, protected
>>> it, and made some clones.
>>> Clones mounted with qemu-nbd, fiddled a bit with
>>> IP/interfaces/hosts/net.rules…etc and cleanly unmounted, VM started,
>>> took 2 secs and the VM was up n running. Cool.
>>>
>>> Now an ordinary shutdown was performed, made a snapshot of this
>>> image. Started again, did some "apt-get update… install s/t…".
>>> Shutdown -> rbd rollback -> startup again -> login -> install s/t
>>> else… filesystem showed "many" ext3-errors, fell into read-only mode,
>>> massive corruption.
>>
>> This sounds like it might be a bug in rollback. Could you try cloning
>> and snapshotting again, but export the image before booting, and after
>> rolling back, and compare the md5sums?
>
> Done, first MD5-mismatch after 32 4MB blocks, checked with dd and a bs
> of 4MB.
>
>>
>> Running the rollback with:
>>
>> --debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log
>>
>> might help too. Does your ceph.conf where you ran the rollback have
>> anything related to rbd_cache in it?
>
> No cache settings in global ceph.conf.
>
> Hope it helps,
>
> Oliver.
>
>>
>>> qemu config was with ":rbd_cache=false" if it matters. Above scenario
>>> is reproducible, and as I stated out, no crash detected.
>>>
>>> Perhaps it is in the same area as in the crash-thread, otherwise I
>>> will provide logfiles as needed.
>>
>> It's unrelated, the other thread is an issue with the cache, which does
>> not cause corruption but triggers a crash.
>>
>> Josh
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
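
For anyone who wants to repeat the block-wise comparison described in the quoted text (exports taken before booting and after rolling back, checked in 4MB blocks), here is a minimal sketch in Python rather than dd and md5sum. It assumes both images have already been exported to local files. The function name first_mismatch and the file names in the usage note are hypothetical, not something used in this thread.

```python
import hashlib

# 4 MiB, matching the dd block size mentioned in the thread.
BLOCK_SIZE = 4 * 1024 * 1024


def first_mismatch(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the 0-based index of the first block whose MD5 differs.

    Returns None when both files are identical. A length difference
    counts as a mismatch in the block where the shorter file ends.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if not block_a and not block_b:
                return None  # both files exhausted without a difference
            if hashlib.md5(block_a).digest() != hashlib.md5(block_b).digest():
                return index
            index += 1
```

With two exports named, say, before.img and after.img, calling first_mismatch("before.img", "after.img") gives the number of identical blocks preceding the first difference, so a result of 32 would correspond to "first MD5-mismatch after 32 4MB blocks".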