* Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
@ 2013-03-22 19:09 Oliver Francke
  2013-03-22 19:30 ` Josh Durgin
  0 siblings, 1 reply; 5+ messages in thread
From: Oliver Francke @ 2013-03-22 19:09 UTC
To: ceph-devel@vger.kernel.org; +Cc: josh.durgin@inktank.com

Hi Josh, all,

I did not want to hijack the thread dealing with a crashing VM, but
perhaps the two issues have something in common.

Today I installed a fresh cluster with mkcephfs, which went fine,
imported a "master" Debian 6.0 image with "format 2", made a snapshot,
protected it, and made some clones.
I mounted the clones with qemu-nbd, fiddled a bit with
IP/interfaces/hosts/net.rules etc., and cleanly unmounted them. The VM
started, and within 2 seconds it was up and running. Cool.

Then I performed an ordinary shutdown and made a snapshot of this image.
I started the VM again and did some "apt-get update… install
something…".
Shutdown -> rbd rollback -> startup again -> login -> install something
else… and the filesystem showed "many" ext3 errors and fell into
read-only mode: massive corruption.

The qemu config had ":rbd_cache=false", if it matters. The above
scenario is reproducible and, as I said, no crash was detected.

Perhaps it is in the same area as the crash thread; otherwise I will
provide logfiles as needed.

Kind regards,

Oliver.

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
  2013-03-22 19:09 Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing Oliver Francke
@ 2013-03-22 19:30 ` Josh Durgin
  [not found] ` <51502118.7060906@filoo.de>
  0 siblings, 1 reply; 5+ messages in thread
From: Josh Durgin @ 2013-03-22 19:30 UTC
To: Oliver Francke; +Cc: ceph-devel@vger.kernel.org

On 03/22/2013 12:09 PM, Oliver Francke wrote:
> Hi Josh, all,
>
> I did not want to hijack the thread dealing with a crashing VM, but
> perhaps the two issues have something in common.
>
> Today I installed a fresh cluster with mkcephfs, which went fine,
> imported a "master" Debian 6.0 image with "format 2", made a snapshot,
> protected it, and made some clones.
> I mounted the clones with qemu-nbd, fiddled a bit with
> IP/interfaces/hosts/net.rules etc., and cleanly unmounted them. The VM
> started, and within 2 seconds it was up and running. Cool.
>
> Then I performed an ordinary shutdown and made a snapshot of this
> image. I started the VM again and did some "apt-get update… install
> something…".
> Shutdown -> rbd rollback -> startup again -> login -> install something
> else… and the filesystem showed "many" ext3 errors and fell into
> read-only mode: massive corruption.

This sounds like it might be a bug in rollback. Could you try cloning
and snapshotting again, but export the image before booting and again
after rolling back, and compare the md5sums?

Running the rollback with:

--debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log

might help too. Does the ceph.conf on the host where you ran the
rollback have anything related to rbd_cache in it?

> The qemu config had ":rbd_cache=false", if it matters. The above
> scenario is reproducible and, as I said, no crash was detected.
>
> Perhaps it is in the same area as the crash thread; otherwise I will
> provide logfiles as needed.

It's unrelated; the other thread is about an issue with the cache, which
does not cause corruption but triggers a crash.

Josh
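The check Josh asks for (export before booting, roll back, export again, compare md5sums) could be scripted roughly as follows. This is only a sketch: the image spec, snapshot name, and file paths are hypothetical placeholders, the helper names are made up for illustration, and the command list is merely built, not executed. The debug options are the ones quoted in the message above.

```python
import hashlib


def md5_of_file(path, chunk_size=4 * 1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rollback_check_commands(image_spec, snap_name, before_path, after_path):
    """Build (but do not run) the rbd commands for the rollback check.

    image_spec is a hypothetical "pool/image" name; the VM is assumed to be
    booted, modified, and shut down between the first export and the rollback.
    """
    return [
        # Export right after taking the snapshot, before booting the VM.
        ["rbd", "export", image_spec, before_path],
        # Roll back with the debug options suggested in the thread.
        ["rbd", "snap", "rollback", f"{image_spec}@{snap_name}",
         "--debug-ms", "1", "--debug-rbd", "20",
         "--log-file", "rbd-rollback.log"],
        # Export again; a correct rollback should reproduce the first export.
        ["rbd", "export", image_spec, after_path],
    ]


def exports_match(before_path, after_path):
    """True if both exported images have the same MD5 checksum."""
    return md5_of_file(before_path) == md5_of_file(after_path)
```

Once both exports exist, `exports_match("before.img", "after.img")` returning False would indicate that the rollback did not restore the snapshot contents.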
[parent not found: <51502118.7060906@filoo.de>]
* Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
  [not found] ` <51502118.7060906@filoo.de>
@ 2013-03-26 8:30 ` Josh Durgin
  2013-03-26 8:33 ` Oliver Francke
  0 siblings, 1 reply; 5+ messages in thread
From: Josh Durgin @ 2013-03-26 8:30 UTC
To: Oliver Francke; +Cc: ceph-devel@vger.kernel.org

On 03/25/2013 03:04 AM, Oliver Francke wrote:
> Hi Josh,
>
> logfile is attached...

Thanks. It shows nothing out of the ordinary, but I just reproduced the
incorrect rollback locally, so it shouldn't be hard to track down from
here.

I opened http://tracker.ceph.com/issues/4551 to track it.

Josh

> On 03/22/2013 08:30 PM, Josh Durgin wrote:
>> On 03/22/2013 12:09 PM, Oliver Francke wrote:
>>> Hi Josh, all,
>>>
>>> I did not want to hijack the thread dealing with a crashing VM, but
>>> perhaps the two issues have something in common.
>>>
>>> Today I installed a fresh cluster with mkcephfs, which went fine,
>>> imported a "master" Debian 6.0 image with "format 2", made a
>>> snapshot, protected it, and made some clones.
>>> I mounted the clones with qemu-nbd, fiddled a bit with
>>> IP/interfaces/hosts/net.rules etc., and cleanly unmounted them. The
>>> VM started, and within 2 seconds it was up and running. Cool.
>>>
>>> Then I performed an ordinary shutdown and made a snapshot of this
>>> image. I started the VM again and did some "apt-get update… install
>>> something…".
>>> Shutdown -> rbd rollback -> startup again -> login -> install
>>> something else… and the filesystem showed "many" ext3 errors and
>>> fell into read-only mode: massive corruption.
>>
>> This sounds like it might be a bug in rollback. Could you try cloning
>> and snapshotting again, but export the image before booting and again
>> after rolling back, and compare the md5sums?
>
> Done. The first MD5 mismatch comes after 32 4MB blocks, checked with dd
> and a bs of 4MB.
>
>>
>> Running the rollback with:
>>
>> --debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log
>>
>> might help too. Does the ceph.conf on the host where you ran the
>> rollback have anything related to rbd_cache in it?
>
> No cache settings in the global ceph.conf.
>
> Hope it helps,
>
> Oliver.
>
>>
>>> The qemu config had ":rbd_cache=false", if it matters. The above
>>> scenario is reproducible and, as I said, no crash was detected.
>>>
>>> Perhaps it is in the same area as the crash thread; otherwise I
>>> will provide logfiles as needed.
>>
>> It's unrelated; the other thread is about an issue with the cache,
>> which does not cause corruption but triggers a crash.
>>
>> Josh
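Oliver located the first differing 4MB block with dd and md5sum. A comparable check can be sketched in Python as below; it compares the raw bytes of each block directly instead of hashing them, which finds the same block. The function name is made up for illustration, and the 4 MiB default only mirrors the block size mentioned in the message.

```python
def first_mismatching_block(path_a, path_b, block_size=4 * 1024 * 1024):
    """Return the zero-based index of the first block that differs.

    Returns None when both files are byte-for-byte identical. A length
    difference counts as a mismatch in the block where one file ends first.
    """
    index = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                return None  # reached the end of both files without a difference
            if block_a != block_b:
                return index
            index += 1
```

For example, `first_mismatching_block("before.img", "after.img")` on the two exports (hypothetical file names) gives the block index, which multiplied by the block size is the byte offset where the images start to diverge.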
* Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
  2013-03-26 8:30 ` Josh Durgin
@ 2013-03-26 8:33 ` Oliver Francke
  0 siblings, 0 replies; 5+ messages in thread
From: Oliver Francke @ 2013-03-26 8:33 UTC
To: Josh Durgin; +Cc: ceph-devel@vger.kernel.org

Hi Josh,

thanks for the quick response and...

On 03/26/2013 09:30 AM, Josh Durgin wrote:
> On 03/25/2013 03:04 AM, Oliver Francke wrote:
>> Hi Josh,
>>
>> logfile is attached...
>
> Thanks. It shows nothing out of the ordinary, but I just reproduced the
> incorrect rollback locally, so it shouldn't be hard to track down from
> here.
>
> I opened http://tracker.ceph.com/issues/4551 to track it.

the good news.

Oliver.

--
Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Managing directors: S.Grewing | J.Rehpöhler | C.Kunz

Follow us on Twitter: http://twitter.com/filoogmbh