From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: status of userspace release
Date: Fri, 2 Nov 2012 16:51:02 +1100
Message-ID: <20121102055102.GY29378@dastard>
In-Reply-To: <20121025151501.GV1377@sgi.com>
On Thu, Oct 25, 2012 at 10:15:01AM -0500, Ben Myers wrote:
> Hi Folks,
>
> We're working toward a userspace release this month. There are several patches
> that need to go in first, including backing out the xfsdump format version bump
> from Eric, fixes for the makefiles from Mike, and the Polish language update
> for xfsdump from Jakub. If anyone knows of something else we need, now is the
> time to flame about it. I will take a look around for other important patches
> too.
>
> This time I'm going to tag an -rc1 (probably later today or tomorrow). We'll
> give everyone a few working days to do a final test and/or pipe up if we have
> missed something important. Then if all goes well we'll cut the release next
> Tuesday.
I think that dump/restore needs more work/testing. I've just been
running with whatever xfsdump I've had installed on my test
machines for some time - I think it was 3.0.6 (whatever is in the
current Debian unstable repository) or some version of 3.1.0 that I
built a while back.
I've already pointed Eric at the header checksum failures (the
forkoff patch is needed), and that fixes the failures I've been
seeing on normal xfstests runs.
Running some large filesystem testing, however, I see more problems.
I'm using a 17TB filesystem and the --largefs patch series. This
results in a futex hang in test 059 like so:
[ 4770.007858] xfsrestore S ffff88021fc52d40 5504 3926 3487 0x00000000
[ 4770.007858] ffff880212ea9c68 0000000000000082 ffff880207830140 ffff880212ea9fd8
[ 4770.007858] ffff880212ea9fd8 ffff880212ea9fd8 ffff880216cec2c0 ffff880207830140
[ 4770.007858] ffff880212ea9d08 ffff880212ea9d58 ffff880207830140 0000000000000000
[ 4770.007858] Call Trace:
[ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
[ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
[ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
[ 4770.007858] [<ffffffff8113acf7>] ? __free_pages+0x47/0x70
[ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
[ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
[ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
[ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
[ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
[ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
[ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
[ 4770.007858] xfsrestore S ffff88021fc52d40 5656 3927 3487 0x00000000
[ 4770.007858] ffff880208f29c68 0000000000000082 ffff880208f84180 ffff880208f29fd8
[ 4770.007858] ffff880208f29fd8 ffff880208f29fd8 ffff880216cec2c0 ffff880208f84180
[ 4770.007858] ffff880208f29d08 ffff880208f29d58 ffff880208f84180 0000000000000000
[ 4770.007858] Call Trace:
[ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
[ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
[ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
[ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
[ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
[ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
[ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
[ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
[ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
[ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
[ 4770.007858] xfsrestore S ffff88021fc92d40 5848 3928 3487 0x00000000
[ 4770.007858] ffff880212d0dc68 0000000000000082 ffff880208e76240 ffff880212d0dfd8
[ 4770.007858] ffff880212d0dfd8 ffff880212d0dfd8 ffff880216cf2300 ffff880208e76240
[ 4770.007858] ffff880212d0dd08 ffff880212d0dd58 ffff880208e76240 0000000000000000
[ 4770.007858] Call Trace:
[ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
[ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
[ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
[ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
[ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
[ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
[ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
[ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
[ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
[ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
I can't reliably reproduce it at this point, but there does appear
to be some kind of locking problem in the multistream support.
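FWIW, a quick way to see where each restore thread is stuck is to
walk /proc for the hung process. A minimal sketch (the current
shell's pid stands in for a hung xfsrestore here, and reading
another process's stack normally needs root):

```shell
# Print the kernel stack for every thread of a given pid.
# $$ (this shell) is just a stand-in; substitute the pid of the
# hung xfsrestore, e.g. from pgrep xfsrestore.
pid=$$
for tid in /proc/"$pid"/task/*; do
    echo "--- tid ${tid##*/} ---"
    cat "$tid"/stack 2>/dev/null || echo "(stack unreadable without root)"
done
```

That should show the same futex_wait stacks as the sysrq-t output
above without needing a full task dump.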
Speaking of the multistream support, most large filesystem
dump/restore tests are failing because of this output:
026 20s ... - output mismatch (see 026.out.bad)
--- 026.out 2012-10-05 11:37:51.000000000 +1000
+++ 026.out.bad 2012-11-02 16:20:17.000000000 +1100
@@ -20,6 +20,7 @@
xfsdump: media file size NUM bytes
xfsdump: dump size (non-dir files) : NUM bytes
xfsdump: dump complete: SECS seconds elapsed
+xfsdump: stream 0 DUMP_FILE OK (success)
xfsdump: Dump Status: SUCCESS
Restoring from file...
xfsrestore -f DUMP_FILE -L stress_026 RESTORE_DIR
@@ -32,6 +33,7 @@
xfsrestore: directory post-processing
xfsrestore: restoring non-directory files
xfsrestore: restore complete: SECS seconds elapsed
+xfsrestore: stream 0 DUMP_FILE OK (success)
xfsrestore: Restore Status: SUCCESS
Comparing dump directory with restore directory
Files DUMP_DIR/big and RESTORE_DIR/DUMP_SUBDIR/big are identical
That looks like output from the multistream code. Why it is
emitted for large filesystem testing and not for small
filesystems, I'm not sure yet.
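If the extra line turns out to be expected behaviour from
multistream-aware binaries, the tests could simply filter it out. A
hypothetical sed filter matching the line format in the diff above
(the function name is illustrative, not an existing xfstests helper):

```shell
# Drop the per-stream status line that multistream-aware
# xfsdump/xfsrestore emit, so old and new golden output match.
filter_stream_status() {
    sed -e '/stream [0-9][0-9]* .* OK (success)/d'
}

# Example: feed it the two trailing lines from the diff above.
printf 'xfsdump: stream 0 DUMP_FILE OK (success)\nxfsdump: Dump Status: SUCCESS\n' \
    | filter_stream_status
```

which leaves only the "Dump Status: SUCCESS" line.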
In fact, with --largefs, I see this for the dump group:
Failures: 026 028 046 047 056 059 060 061 063 064 065 066 266 281
282 283
Failed 16 of 19 tests
And this for the normal sized (10GB) scratch device:
Passed all 18 tests
So there's something funky going on here....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs