public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: status of userspace release
Date: Fri, 2 Nov 2012 16:51:02 +1100	[thread overview]
Message-ID: <20121102055102.GY29378@dastard> (raw)
In-Reply-To: <20121025151501.GV1377@sgi.com>

On Thu, Oct 25, 2012 at 10:15:01AM -0500, Ben Myers wrote:
> Hi Folks,
> 
> We're working toward a userspace release this month.  There are several patches
> that need to go in first, including backing out the xfsdump format version bump
> from Eric, fixes for the makefiles from Mike, and the Polish language update
> for xfsdump from Jakub.  If anyone knows of something else we need, now is the
> time to flame about it.  I will take a look around for other important patches
> too.
> 
> This time I'm going to tag an -rc1 (probably later today or tomorrow).  We'll
> give everyone a few working days to do a final test and/or pipe up if we have
> missed something important.  Then if all goes well we'll cut the release next
> Tuesday.

I think that dump/restore need more work/testing. I've just been
running with whatever xfsdump I have had installed on my test
machines for some time. I think it was 3.0.6 - whatever is in the
current debian unstable repository - or some version of 3.1.0 that I
built a while back.

I've already pointed Eric to the header checksum failures (forkoff
patch being needed), and that fixes the failures I've been seeing on
normal xfstests runs.

Running some large filesystem testing, however, I see more problems.
I'm using a 17TB filesystem and the --largefs patch series. This
results in a futex hang in 059 like so:

[ 4770.007858] xfsrestore      S ffff88021fc52d40  5504  3926   3487 0x00000000
[ 4770.007858]  ffff880212ea9c68 0000000000000082 ffff880207830140 ffff880212ea9fd8
[ 4770.007858]  ffff880212ea9fd8 ffff880212ea9fd8 ffff880216cec2c0 ffff880207830140
[ 4770.007858]  ffff880212ea9d08 ffff880212ea9d58 ffff880207830140 0000000000000000
[ 4770.007858] Call Trace:
[ 4770.007858]  [<ffffffff81b8a009>] schedule+0x29/0x70
[ 4770.007858]  [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
[ 4770.007858]  [<ffffffff810db809>] futex_wait+0x189/0x290
[ 4770.007858]  [<ffffffff8113acf7>] ? __free_pages+0x47/0x70
[ 4770.007858]  [<ffffffff810dd41c>] do_futex+0x11c/0xa80
[ 4770.007858]  [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
[ 4770.007858]  [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
[ 4770.007858]  [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
[ 4770.007858]  [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
[ 4770.007858]  [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
[ 4770.007858]  [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
[ 4770.007858] xfsrestore      S ffff88021fc52d40  5656  3927   3487 0x00000000
[ 4770.007858]  ffff880208f29c68 0000000000000082 ffff880208f84180 ffff880208f29fd8
[ 4770.007858]  ffff880208f29fd8 ffff880208f29fd8 ffff880216cec2c0 ffff880208f84180
[ 4770.007858]  ffff880208f29d08 ffff880208f29d58 ffff880208f84180 0000000000000000
[ 4770.007858] Call Trace:
[ 4770.007858]  [<ffffffff81b8a009>] schedule+0x29/0x70
[ 4770.007858]  [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
[ 4770.007858]  [<ffffffff810db809>] futex_wait+0x189/0x290
[ 4770.007858]  [<ffffffff810dd41c>] do_futex+0x11c/0xa80
[ 4770.007858]  [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
[ 4770.007858]  [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
[ 4770.007858]  [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
[ 4770.007858]  [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
[ 4770.007858]  [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
[ 4770.007858]  [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
[ 4770.007858] xfsrestore      S ffff88021fc92d40  5848  3928   3487 0x00000000
[ 4770.007858]  ffff880212d0dc68 0000000000000082 ffff880208e76240 ffff880212d0dfd8
[ 4770.007858]  ffff880212d0dfd8 ffff880212d0dfd8 ffff880216cf2300 ffff880208e76240
[ 4770.007858]  ffff880212d0dd08 ffff880212d0dd58 ffff880208e76240 0000000000000000
[ 4770.007858] Call Trace:
[ 4770.007858]  [<ffffffff81b8a009>] schedule+0x29/0x70
[ 4770.007858]  [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
[ 4770.007858]  [<ffffffff810db809>] futex_wait+0x189/0x290
[ 4770.007858]  [<ffffffff810dd41c>] do_futex+0x11c/0xa80
[ 4770.007858]  [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
[ 4770.007858]  [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
[ 4770.007858]  [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
[ 4770.007858]  [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
[ 4770.007858]  [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
[ 4770.007858]  [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b

I can't reliably reproduce it at this point, but there does appear
to be some kind of locking problem in the multistream support.

Speaking of which, most of the large filesystem dump/restore tests
are failing because of this output:

026 20s ... - output mismatch (see 026.out.bad)
--- 026.out     2012-10-05 11:37:51.000000000 +1000
+++ 026.out.bad 2012-11-02 16:20:17.000000000 +1100
@@ -20,6 +20,7 @@
 xfsdump: media file size NUM bytes
 xfsdump: dump size (non-dir files) : NUM bytes
 xfsdump: dump complete: SECS seconds elapsed
+xfsdump:   stream 0 DUMP_FILE OK (success)
 xfsdump: Dump Status: SUCCESS
 Restoring from file...
 xfsrestore  -f DUMP_FILE  -L stress_026 RESTORE_DIR
@@ -32,6 +33,7 @@
 xfsrestore: directory post-processing
 xfsrestore: restoring non-directory files
 xfsrestore: restore complete: SECS seconds elapsed
+xfsrestore:   stream 0 DUMP_FILE OK (success)
 xfsrestore: Restore Status: SUCCESS
 Comparing dump directory with restore directory
 Files DUMP_DIR/big and RESTORE_DIR/DUMP_SUBDIR/big are identical

That looks like output from the multistream code. Why it is emitted
for large filesystem testing and not for small filesystems, I'm not
sure yet.
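If that per-stream status line turns out to be intended output, one way
to keep the golden output stable across single- and multi-stream cases
would be to filter it before comparison, in the spirit of the existing
xfstests output filters. The sed pattern below is a sketch of mine, not
the actual filter in xfstests:

```shell
# Sketch only: drop the per-stream status line before diffing against
# the golden .out file. Sample input mimics the 026 output above.
printf '%s\n' \
  'xfsdump: dump complete: SECS seconds elapsed' \
  'xfsdump:   stream 0 DUMP_FILE OK (success)' \
  'xfsdump: Dump Status: SUCCESS' |
  sed -e '/stream [0-9].*OK (success)/d'
# prints the 'dump complete' and 'Dump Status' lines; the stream line is dropped
```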

In fact, with --largefs, I see this for the dump group:

Failures: 026 028 046 047 056 059 060 061 063 064 065 066 266 281
282 283
Failed 16 of 19 tests

And this for the normal sized (10GB) scratch device:

Passed all 18 tests

So there's something funky going on here....
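For reference, the failing set above can be rerun in one go by passing
the test numbers straight to ./check (standard xfstests usage; a
configured tree with TEST_DEV/SCRATCH_DEV set is assumed and not shown):

```shell
# Sketch: rerun only the tests that failed in the --largefs run above.
# Assumes a configured xfstests tree (TEST_DEV, SCRATCH_DEV, etc.).
fails="026 028 046 047 056 059 060 061 063 064 065 066 266 281 282 283"
echo "./check $fails"    # shows the command; drop the echo to run it
```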

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 12+ messages
2012-10-25 15:15 status of userspace release Ben Myers
2012-10-26 21:57 ` Ben Myers
2012-10-28 21:27   ` Dave Chinner
2012-10-29 16:17     ` Ben Myers
2012-11-02  5:51 ` Dave Chinner [this message]
2012-11-02 18:59   ` Ben Myers
2012-11-02 23:03     ` Dave Chinner
2012-11-03  0:16       ` Dave Chinner
2012-11-03  1:35         ` Eric Sandeen
2012-11-03  1:55           ` Dave Chinner
2012-11-03  3:16         ` Dave Chinner
2012-11-03  1:53       ` Dave Chinner
