public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Arkadiusz Bubała" <arkadiusz.bubala@open-e.com>
Cc: xfs@oss.sgi.com
Subject: Re: [BUG] Call trace during snapshot start/stop sequence
Date: Thu, 28 Nov 2013 09:19:23 +1100	[thread overview]
Message-ID: <20131127221923.GI10988@dastard> (raw)
In-Reply-To: <5295C307.6030804@open-e.com>

On Wed, Nov 27, 2013 at 11:01:43AM +0100, Arkadiusz Bubała wrote:
> Hello,
> 
> we're running test script that starts and stops
> snapshots in a loop while overfilling them.  After a few days of running
> system hangs. We've captured following call trace:
> 
> [116649.755761] XFS (dm-42): metadata I/O error: block 0xfa2b06
> ("xlog_iodone") error 5 buf count 1024
> [116649.947247] XFS (dm-42): Log I/O Error Detected.  Shutting down
> filesystem
> [116650.073881] XFS (dm-42): Please umount the filesystem and rectify
> the problem(s)

So, an EIO error on a log IO, resulting in a shutdown....

> [116650.207186] BUG: unable to handle kernel paging request at
> 00000000000010a8

That's an interesting offset - quite large for a null pointer
dereference.

> [116650.335185] IP: [<ffffffff8102e1d6>] __ticket_spin_lock+0x6/0x20
> [116650.451052] PGD 0
> [116650.518151] Oops: 0002 [#1] SMP
> [116650.599477] CPU 0
> [116650.622838] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O)
> drbd(O) twofish_x86_64 twofish_generic twofish_common
> serpent_sse2_x86_64 lrw xts gf1]
> [116651.479730]
> [116651.540674] Pid: 30173, comm: kworker/0:5 Tainted: G           O
> 3.4.63-oe64-00000-g1a33902 #38 Intel Corporation S1200BTL/S1200BTL

Running a custom built 3.4.63 kernel with a bunch of out of tree
modules installed. can you reproduce this on a vanilla 3.12 kernel?

> [116653.923833] Call Trace:
> [116653.995006]  [<ffffffff815f4b45>] ? _raw_spin_lock+0x5/0x10
> [116654.103462]  [<ffffffff812685f2>] ? xlog_state_done_syncing+0x32/0xc0
> [116654.221716]  [<ffffffff81051843>] ? process_one_work+0xf3/0x320
> [116654.333195]  [<ffffffff810534f2>] ? worker_thread+0xe2/0x280
> [116654.441031]  [<ffffffff81053410>] ? gcwq_mayday_timeout+0x80/0x80
> [116654.553512]  [<ffffffff8105776b>] ? kthread+0x9b/0xb0

Which is this line:

STATIC void
xlog_state_done_syncing(
        xlog_in_core_t  *iclog,
        int             aborted)
{
        struct xlog        *log = iclog->ic_log;

        spin_lock(&log->l_icloglock);

So, the icloglock is at offset 296 bytes into the struct xlog, and
the iclog structure is only 256 bytes in size itself, so that
structure offset is way outside anything the code should be trying
to access (ignoring the null pointer issue). Even if we assume that
the 0x1000 bit is a memory corruption, offset 0xa8 lands in a hole
in the struct xlog_in_core, and isin the middle of a bunch of log
size constants in the struct xlog (l_sectBBsize to be exact).

So this doesn't make much sense to me.

BTW, you should compile you kernels with frame pointers enabled so
that the kernel emits stack traces that can be trusted rather than
just dumping a list of symbols found on the stack...

> It looks like a race condition.

Looks more like memory corruption to me....

> Test script source:

I'll see if I can reproduce it locally.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2013-11-27 22:19 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-27 10:01 [BUG] Call trace during snapshot start/stop sequence Arkadiusz Bubała
2013-11-27 22:19 ` Dave Chinner [this message]
2013-11-27 23:06   ` Dave Chinner
2013-11-28 10:00     ` Arkadiusz Bubała
2013-11-28 21:16       ` Dave Chinner
2013-12-05  8:36         ` Arkadiusz Bubała

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131127221923.GI10988@dastard \
    --to=david@fromorbit.com \
    --cc=arkadiusz.bubala@open-e.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox