public inbox for linux-kernel@vger.kernel.org
From: "Christopher S. Aker" <caker@theshore.net>
To: linux-kernel@vger.kernel.org
Cc: xen devel <xen-devel@lists.xensource.com>
Subject: ext3 directory corruption under Xen
Date: Mon, 23 Jun 2008 12:15:33 -0400
Message-ID: <485FCC25.7090401@theshore.net>

We've been seeing a rash of ext3 directory corruption occurring under
Xen.  All but one of the reports have been with filesystems formatted
with a 1024-byte block size.  We have one report, potentially the same
bug, on a filesystem with a 4096-byte block size (either way, it was
some type of corruption in that case).  In all cases, the filesystems
were mounted with ext3's default journaling mode.  No quotas or anything
else other than the default ext3 mount options.

It's happened on a number of different hosts, all of the same hardware 
and software configuration (Xen 3.2 64bit, 32bit pae dom0, 32bit pae 
domUs.  LVM backend with 3ware hardware RAID-1).  Some of those hosts 
were previously running non-virtualized Linux and UML, using the
identical guest images, and under that configuration never experienced 
this problem.

This has occurred under both 2.6.18-xenbits and the more recent pv_ops 
based kernels (2.6.24, 2.6.25), which I presume are all using the same 
blkfront driver code.

The common workloads in the reports seem to be active maildirs and rsync.

The initial errors reported back are all from fs/ext3/dir.c, in 
ext3_check_dir_entry(). Most commonly hit is the "rec_len % 4 != 0" 
check.  We've seen other checks trigger, but my assumption is that those 
happen after more stuff gets whacked out.
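
For readers who don't have fs/ext3/dir.c in front of them, here is a rough
userspace sketch of the kind of sanity checks ext3_check_dir_entry() applies
to each directory entry.  It is a Python approximation, not the kernel code:
the helper names (min_rec_len, check_dir_entry, scan_block) are made up for
illustration, and it assumes the standard on-disk entry layout (le32 inode,
le16 rec_len, u8 name_len, u8 file_type, then the name) and a raw directory
block already read into memory, e.g. 1024 bytes on the affected filesystems.

```python
import struct

# Fixed header of an ext3 on-disk directory entry (struct ext3_dir_entry_2):
# inode (le32), rec_len (le16), name_len (u8), file_type (u8), then the name.
HEADER = struct.Struct("<IHBB")


def min_rec_len(name_len):
    """Smallest valid rec_len: 8-byte header plus name, rounded up to 4."""
    return (name_len + HEADER.size + 3) & ~3


def check_dir_entry(block, offset, inodes_count):
    """Return None if the entry at `offset` looks sane, else a short reason."""
    inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, offset)
    if rec_len < min_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < min_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > len(block):
        return "directory entry across blocks"
    if inode > inodes_count:
        return "inode out of bounds"
    return None


def scan_block(block, inodes_count):
    """Walk one directory block; yield (offset, reason) for the first bad entry."""
    offset = 0
    while offset + HEADER.size <= len(block):
        reason = check_dir_entry(block, offset, inodes_count)
        if reason is not None:
            yield offset, reason
            return  # rec_len can no longer be trusted, so stop walking
        # A sane rec_len is at least 12, so the walk always advances.
        offset += HEADER.unpack_from(block, offset)[1]
```

Because each entry's rec_len is the only pointer to the next entry, a single
bad value makes the rest of the block unwalkable, which fits the pattern of
one check firing first and others following later.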

Eventually the fs will go read-only.  In extreme cases, the fs is chewed 
through enough that data is lost.

It's tricky to track down the trigger because you can only detect the 
corruption after it's happened.  Our attempts to reproduce this using 
various filesystem thrashing scripts haven't yielded a reliable way to
trigger it; however, we have been successful in triggering it twice -- in
two weeks :( .
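
To give an idea of the shape of such a thrashing script, below is a minimal
sketch of a maildir-style churn loop in Python.  It is not one of the scripts
we actually ran, and there is no claim that it triggers the corruption; the
function name and its parameters (rounds, max_live, seed) are hypothetical.
It only mimics the reported workload: many small files written into tmp/,
renamed into new/, and later unlinked, so directory entries of varying
length are constantly created and freed.

```python
import os
import random


def maildir_churn(root, rounds=1000, max_live=200, seed=0):
    """Deliver and expire small files maildir-style; return count left in new/."""
    rng = random.Random(seed)
    tmp_dir = os.path.join(root, "tmp")
    new_dir = os.path.join(root, "new")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(new_dir, exist_ok=True)
    live = []
    for i in range(rounds):
        # Variable-length names produce directory entries of differing rec_len.
        name = "%d.%s" % (i, "x" * rng.randint(1, 60))
        tmp_path = os.path.join(tmp_dir, name)
        with open(tmp_path, "wb") as f:
            f.write(os.urandom(rng.randint(1, 4096)))
            f.flush()
            os.fsync(f.fileno())  # push the data out to the block device
        os.rename(tmp_path, os.path.join(new_dir, name))  # maildir delivery
        live.append(name)
        if len(live) > max_live:
            # Expire a random message so directory slots get freed and reused.
            victim = live.pop(rng.randrange(len(live)))
            os.unlink(os.path.join(new_dir, victim))
    return len(os.listdir(new_dir))
```

Pointing root at a directory on a 1024-byte-block ext3 filesystem inside a
domU, and running several copies in parallel alongside an rsync of the same
tree, would be the closest match to the workloads in the reports.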

My hope is that this triggers an "a-ha" from someone in LKML or Xen
land who has experience with this code, or that this is a known issue
and a fix already exists.

We're scared.  Please help.

Thanks,
-Chris


Thread overview: 4+ messages
2008-06-23 16:15 Christopher S. Aker [this message]
2008-06-23 19:13 ` ext3 directory corruption under Xen adam radford
2008-06-23 19:53   ` Christopher S. Aker
2008-06-23 23:08     ` adam radford
