From: Konrad Rzeszutek Wilk
Subject: Re: BUG: ext3 corruption in domU
Date: Wed, 22 May 2013 16:10:44 -0400
Message-ID: <20130522201044.GA12372@phenom.dumpdata.com>
References: <1366203601.25579.24.camel@zakaz.uk.xensource.com> <1366633594.22143.60.camel@zakaz.uk.xensource.com>
In-Reply-To: <1366633594.22143.60.camel@zakaz.uk.xensource.com>
To: Ian Campbell
Cc: Anthony Sheetz, Roger Pau Monne, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Mon, Apr 22, 2013 at 01:26:34PM +0100, Ian Campbell wrote:
> Konrad is on vacation this week, so it'll probably be next week before
> this gets looked at by him.

And I finally got to this email in my 'vacation-mbox'.

>
> Ian.
>
> On Mon, 2013-04-22 at 13:22 +0100, Anthony Sheetz wrote:
> > I realize folks are pretty busy, but we're still interested in getting
> > this problem solved, and I want to be sure it's not lost in the
> > shuffle.
> > Any chance of getting some attention for it?
> >
> > On Wed, Apr 17, 2013 at 9:00 AM, Ian Campbell wrote:
> > > On Tue, 2013-04-16 at 18:39 +0100, Anthony Sheetz wrote:
> > >> (re-sending, first message seems to have gotten lost)
> > >>
> > >> I was referred here by Ian Campbell ijc@hellion.org.uk from bugs.debian.org.
> > >
> > > I'm here too (different hat ;-)), thanks for posting it here. I've added
> > > some people who know about the block stuff to the CC.
> > >
> > > Guys, my suspicion is that the issue is that barriers issued by ext3
> > > inside the guest aren't making it all the way down the
> > > ext3->blkfront->blkback->lvm->dm-crypt->disk chain leading the
> > > filesystem to eventually corrupt itself.
> > >
> > > The issue seems to relate to the use of dm-crypt since
> > > ext3->blkfront->blkback->lvm->disk is reported to work fine.
> > >
> > > However there is no problem with the local dom0 ext3 root filesystem
> > > which is also in the same lvm VG on the crypt device (i.e.
> > > ext3->lvm->dm-crypt->disk), so it's not purely a dm-crypt issue. I figure
> > > something is up at the blkfront->back link which causes the barriers
> > > which blkback is injecting into the block subsystem either don't make it
> > > to the dm-crypt layer or do not DTRT once they arrive.
> > >
> > > I'm not really sure how to proceed (or how to ask Anthony to
> > > proceed) with verifying any part of that hypothesis though.
> > >
> > > ISTR issues with old vs new style barriers or barriers with no data in
> > > them or something, could this be related to that? (or am I thinking of
> > > DISCARD?)

You are using two different kernel versions. The 2.6.32 domU is only
using WRITE_BARRIERs, while in the 3.2 kernels those have been completely
eliminated. The mechanism used there instead is called 'WRITE_FLUSH'.

The 3.2 kernel has a patch:

commit 29bde093787f3bdf7b9b4270ada6be7c8076e36b
Author: Konrad Rzeszutek Wilk
Date:   Mon Oct 10 00:42:22 2011 -0400

    xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.

which emulates the barrier request by draining all of the outstanding
I/Os and then sending the WRITE_FLUSH. But it looks like you are hitting
an issue here.

Just to make sure that is the case, what happens if you use the _same_
kernel in both dom0 and domU? Does it work then?

> > >
> > > The issue was initially reported with Squeeze (Jeremy 2.6.32 tree) domU
> > > on a Wheezy (mainline 3.2) dom0 but IIRC has also been repeated with
> > > Wheezy on Wheezy now so this isn't cross version confusion about barrier
> > > semantics AFAICT.
> > >
> > > Ian.
> > >
> > >> First, I'm happy to provide more information about this bug as
> > >> requested.
> > >> I recognize not all relevant data has
> > >> been collected yet.
> > >>
> > >> Detailed information about this bug can be found at
> > >> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705124.
> > >>
> > >> The executive summary is: Using Debian Testing (7.0, wheezy) dom0 with
> > >> LVM and full disk encryption with
> > >> Debian Stable (6.0, Squeeze) domU, transferring large files via scp or
> > >> rsync over openswan results in data corruption, with
> > >> eventual file system corruption. The culprit appears to be full disk
> > >> encryption; however, that evidence may not be conclusive.
> > >>
> > >> While I don't mind providing additional information, I'd hate to have
> > >> to repeat the information I've provided to the Debian bug hunting
> > >> folks.
> > >>
> > >> Thanks in advance for any help you can provide.
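The ordering Konrad describes for commit 29bde093 (drain every outstanding I/O, then send the flush, then the barrier's own write) can be illustrated with a minimal toy model. This is a hypothetical sketch, not the xen-blkback code: the class `ToyBackend` and its methods are invented for illustration, the simulated lower layer completes writes synchronously, and only ordering is modeled (no caches, FUA, LVM, or dm-crypt).

```python
from collections import deque


class ToyBackend:
    """Hypothetical toy model of emulating an old-style BARRIER.

    Not the real xen-blkback code. Writes stay 'in flight' until the
    simulated lower layer completes them; a barrier first drains every
    in-flight write, then issues a flush, and only then submits its data.
    """

    def __init__(self):
        self.in_flight = deque()  # writes submitted but not yet completed
        self.events = []          # order in which things reached the "disk"

    def submit(self, name):
        # Normal write: stays in flight until the lower layer completes it.
        self.in_flight.append(name)

    def complete_one(self):
        # Simulate the lower layer completing the oldest pending write.
        name = self.in_flight.popleft()
        self.events.append(f"complete {name}")

    def drain(self):
        # Wait for every outstanding write; this toy model simply
        # completes them synchronously.
        while self.in_flight:
            self.complete_one()

    def barrier(self, name):
        # Emulated barrier: drain, flush, then submit the barrier's write.
        self.drain()
        self.events.append("flush")
        self.submit(name)


be = ToyBackend()
be.submit("A")
be.submit("B")
be.barrier("C")  # A and B must complete before the flush and before C
be.drain()
print(be.events)
# ['complete A', 'complete B', 'flush', 'complete C']
```

In this model, "flush" can never be logged before the completion of a write submitted earlier. That ordering guarantee is the kind of thing Ian's hypothesis suggests may be getting lost somewhere between blkback and dm-crypt.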