public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Yann Dupont <Yann.Dupont@univ-nantes.fr>
Cc: xfs@oss.sgi.com
Subject: Re: Problems with kernel 3.6.x (vm ?) (was : Is kernel 3.6.1 or filestreams option toxic ?)
Date: Mon, 29 Oct 2012 23:18:52 +1100	[thread overview]
Message-ID: <20121029121851.GQ29378@dastard> (raw)
In-Reply-To: <20121029012540.GO29378@dastard>

On Mon, Oct 29, 2012 at 12:25:40PM +1100, Dave Chinner wrote:
> On Mon, Oct 29, 2012 at 10:48:02AM +1100, Dave Chinner wrote:
> > On Sat, Oct 27, 2012 at 12:05:34AM +0200, Yann Dupont wrote:
> > > >On 26/10/2012 12:03, Yann Dupont wrote:
> > > > >On 25/10/2012 23:10, Dave Chinner wrote:
> > > - mkfs.xfs on it, with default options
> > > - mounted with default options
> > > - launch something that hammers this volume. I launched compilebench
> > > 0.6  on it
> > > - wait some time to fill memory,buffers, and be sure your disks are
> > > really busy. I waited some minutes after the initial 30 kernel
> > > unpacking in compilebench
> > > - hard reset the server (I'm using the Idrac of the server to
> > > generate a power cycle)
> > > - After some tries, I finally found it impossible to mount the
> > > xfs volume, with the error reported in previous mails. So far
> > > this is normal.
> > 
> > So it doesn't happen every time, and it may be power cycle related.
> > What is your "local disk"?
> 
> I can't reproduce this with a similar setup but using KVM (i.e.
> killing the VM instead of power cycling) or forcing a shutdown of
> the filesystem without flushing the log. The second case is very
> much the same as power cycling, but without the potential "power
> failure caused partial IOs to be written" problem.
> 
> The only thing I can see in the logprint that I haven't seen so far
> in my testing is that your log print indicates a checkpoint that
> wraps the end of the log. I haven't yet hit that situation by
> chance, so I'll keep trying to see if that's the case that is
> causing the problem....
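For anyone following along: a checkpoint "wrapping the end of the log" just means a single logically contiguous write that runs past the last block of the circular log and continues at block 0. A minimal Python sketch of that split (illustrative only, not XFS code; names and block counts are made up):

```python
def split_log_write(head_block, length, log_size):
    """Split a write of `length` blocks starting at `head_block` into
    the physical spans it occupies in a circular log of `log_size`
    blocks. A wrapped checkpoint comes back as two spans, the second
    starting at block 0."""
    if head_block + length <= log_size:
        return [(head_block, length)]
    first = log_size - head_block          # blocks up to the end of the log
    return [(head_block, first), (0, length - first)]
```

With a 2046-block log (matching the [00000 - 02045] range in the logprint output below), a 16-block write starting at block 2040 splits into a 6-block span at the end and a 10-block span at the start.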

Well, it's taken about 12 hours of randomly varying the parameters
in this loop:

mkfs.xfs -f /dev/vdb
mount /dev/vdb /mnt/scratch
./compilebench -D /mnt/scratch &
sleep <some period>
/home/dave/src/xfstests-dev/src/godown /mnt/scratch
sleep 5
umount /mnt/scratch
xfs_logprint -d /dev/vdb

to get a checkpoint that wraps the end of the log. That was with
<some period> equal to 36s. In all that time I hadn't seen a single
log mount failure, but the moment I got a wrapped log:

1917 HEADER Cycle 10 tail 9:018456 len  32256 ops 468
1981 HEADER Cycle 10 tail 9:018456 len  32256 ops 427
            ^^^^^^^^^^^^^^^
[00000 - 02045] Cycle 0x0000000a New Cycle 0x00000009
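The HEADER lines above are easy enough to pick apart mechanically. Here's a hypothetical Python sketch that parses them (the format is inferred from this output alone, not from any logprint documentation) and flags the wrap condition, i.e. the head cycle having advanced past the tail cycle:

```python
import re

# Field layout inferred from lines like:
#   1917 HEADER Cycle 10 tail 9:018456 len  32256 ops 468
HEADER_RE = re.compile(
    r"(?P<block>\d+)\s+HEADER\s+Cycle\s+(?P<cycle>\d+)\s+"
    r"tail\s+(?P<tail_cycle>\d+):(?P<tail_block>\d+)")

def parse_header(line):
    """Return the numeric fields of an xfs_logprint -d HEADER line,
    or None if the line doesn't match the inferred format."""
    m = HEADER_RE.search(line)
    if not m:
        return None
    return {k: int(v) for k, v in m.groupdict().items()}

def log_wraps(head_cycle, tail_cycle):
    # When the head is on a newer cycle than the tail, the active
    # region of the circular log spans the physical end of the log.
    return head_cycle > tail_cycle
```

Applied to the output above, block 1917 is on cycle 10 with the tail back at 9:018456, so the active region wraps.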

[  368.364232] XFS (vdb): Mounting Filesystem
[  369.096144] XFS (vdb): Starting recovery (logdev: internal)
[  369.126545] XFS (vdb): xlog_recover_process_data: bad clientid 0x2c
[  369.129522] XFS (vdb): log mount/recovery failed: error 5
[  369.131884] XFS (vdb): log mount failed
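For context on that "bad clientid 0x2c": recovery validates each log operation's client id against a small set of legal values, so a mis-decoded or corrupt record surfaces exactly like this. A sketch of that check (constants taken from the XFS log format headers of this era; worth double-checking against your tree):

```python
# Legal log operation client ids, as defined in the XFS log headers.
XFS_TRANSACTION = 0x69
XFS_VOLUME      = 0x2
XFS_LOG         = 0xaa

def valid_clientid(clientid):
    # xlog_recover_process_data rejects any op header whose client id
    # is outside this set, which is why decoding garbage at the wrong
    # offset shows up as "bad clientid 0x2c" rather than a crash.
    return clientid in (XFS_TRANSACTION, XFS_VOLUME, XFS_LOG)
```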

Ok, so no LVM, no power failure involved, etc. Dig deeper. Let's see
if logprint can dump the transactional record of the log:

# xfs_logprint -f log.img -t
.....
LOG REC AT LSN cycle 9 block 20312 (0x9, 0x4f58)

LOG REC AT LSN cycle 9 block 20376 (0x9, 0x4f98)
xfs_logprint: failed in xfs_do_recovery_pass, error: 12288
#
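Those LSNs are (cycle, block) pairs; on disk they are packed into a single 64-bit value with the cycle in the high half, so any cycle-9 record orders before any cycle-10 record regardless of block number. A quick sketch of the packing and comparison (intended to mirror XFS_LSN_CMP semantics):

```python
def lsn(cycle, block):
    # Pack a (cycle, block) pair into a 64-bit LSN: cycle in the high
    # 32 bits, block in the low 32 bits.
    return (cycle << 32) | block

def lsn_cmp(a, b):
    # Packed LSNs compare by cycle first, then block, so plain integer
    # comparison gives the right ordering. Returns -1, 0 or 1.
    return (a > b) - (a < b)
```

So the record logprint printed as (0x9, 0x4f58) packs to 0x900004f58, and the cycle-10 checkpoint that wraps the log is unambiguously newer than everything on cycle 9.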

Ok, xfs_logprint failed to decode the wrapped transaction at the end
of the log. I can't see anything obviously wrong with the contents
of the log off the top of my head (logprint is notoriously buggy),
but the above command can reproduce the problem (3 out of 3 so far),
so I should be able to track down the bug from this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 20+ messages
2012-10-22 14:14 Is kernel 3.6.1 or filestreams option toxic ? Yann Dupont
2012-10-23  8:24 ` Problems with kernel 3.6.x (vm ?) (was : Is kernel 3.6.1 or filestreams option toxic ?) Yann Dupont
2012-10-25 15:21   ` Yann Dupont
2012-10-25 20:55     ` Yann Dupont
2012-10-25 21:10     ` Dave Chinner
2012-10-26 10:03       ` Yann Dupont
2012-10-26 22:05         ` Yann Dupont
2012-10-28 23:48           ` Dave Chinner
2012-10-29  1:25             ` Dave Chinner
2012-10-29  8:11               ` Yann Dupont
2012-10-29 12:21                 ` Dave Chinner
2012-10-29 12:18               ` Dave Chinner [this message]
2012-10-29 12:43                 ` Yann Dupont
2012-10-30  1:33                   ` Dave Chinner
2012-10-31 11:45                     ` Gaudenz Steinlin
2012-11-05 13:57                     ` Yann Dupont
2012-10-29  8:07             ` Yann Dupont
2012-10-29  8:17               ` Yann Dupont
