From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: status of userspace release
Date: Fri, 2 Nov 2012 13:59:23 -0500
Message-ID: <20121102185923.GG9783@sgi.com>
In-Reply-To: <20121102055102.GY29378@dastard>
Hi Dave,
On Fri, Nov 02, 2012 at 04:51:02PM +1100, Dave Chinner wrote:
> On Thu, Oct 25, 2012 at 10:15:01AM -0500, Ben Myers wrote:
> > Hi Folks,
> >
> > We're working toward a userspace release this month. There are several patches
> > that need to go in first, including backing out the xfsdump format version bump
> > from Eric, fixes for the makefiles from Mike, and the Polish language update
> > for xfsdump from Jakub. If anyone knows of something else we need, now is the
> > time to flame about it. I will take a look around for other important patches
> > too.
> >
> > This time I'm going to tag an -rc1 (probably later today or tomorrow). We'll
> > give everyone a few working days to do a final test and/or pipe up if we have
> > missed something important. Then if all goes well we'll cut the release next
> > Tuesday.
>
> I think that dump/restore need more work/testing.
Sounds good. AFAIK there is no blazing hurry to release immediately.
> I've already pointed Eric to the header checksum failures (forkoff
> patch being needed), and that fixes the failures I've been seeing on
> normal xfstests runs.
I've pulled that patch in. Interesting that it doesn't reproduce on i586 but
is so reliable on x86_64. It's a good excuse to do some testing on a wider set
of arches before the release.
> Running some large filesystem testing, however, I see more problems.
> I'm using a 17TB filesystem and the --largefs patch series. This
> results in a futex hang in 059 like so:
>
> [ 4770.007858] xfsrestore S ffff88021fc52d40 5504 3926 3487 0x00000000
> [ 4770.007858] ffff880212ea9c68 0000000000000082 ffff880207830140 ffff880212ea9fd8
> [ 4770.007858] ffff880212ea9fd8 ffff880212ea9fd8 ffff880216cec2c0 ffff880207830140
> [ 4770.007858] ffff880212ea9d08 ffff880212ea9d58 ffff880207830140 0000000000000000
> [ 4770.007858] Call Trace:
> [ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
> [ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
> [ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
> [ 4770.007858] [<ffffffff8113acf7>] ? __free_pages+0x47/0x70
> [ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
> [ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
> [ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
> [ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
> [ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
> [ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
> [ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
> [ 4770.007858] xfsrestore S ffff88021fc52d40 5656 3927 3487 0x00000000
> [ 4770.007858] ffff880208f29c68 0000000000000082 ffff880208f84180 ffff880208f29fd8
> [ 4770.007858] ffff880208f29fd8 ffff880208f29fd8 ffff880216cec2c0 ffff880208f84180
> [ 4770.007858] ffff880208f29d08 ffff880208f29d58 ffff880208f84180 0000000000000000
> [ 4770.007858] Call Trace:
> [ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
> [ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
> [ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
> [ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
> [ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
> [ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
> [ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
> [ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
> [ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
> [ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
> [ 4770.007858] xfsrestore S ffff88021fc92d40 5848 3928 3487 0x00000000
> [ 4770.007858] ffff880212d0dc68 0000000000000082 ffff880208e76240 ffff880212d0dfd8
> [ 4770.007858] ffff880212d0dfd8 ffff880212d0dfd8 ffff880216cf2300 ffff880208e76240
> [ 4770.007858] ffff880212d0dd08 ffff880212d0dd58 ffff880208e76240 0000000000000000
> [ 4770.007858] Call Trace:
> [ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
> [ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
> [ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
> [ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
> [ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
> [ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
> [ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
> [ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
> [ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
> [ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
>
> I can't reliably reproduce it at this point, but there does appear
> to be some kind of locking problem in the multistream support.
One of my machines hit this overnight without --largefs, so it isn't specific
to that series. I wasn't able to get a core dump though. Just another data
point.
> Speaking of which, most large filesystems dump/restore tests are
> failing because of this output:
>
> 026 20s ... - output mismatch (see 026.out.bad)
> --- 026.out 2012-10-05 11:37:51.000000000 +1000
> +++ 026.out.bad 2012-11-02 16:20:17.000000000 +1100
> @@ -20,6 +20,7 @@
> xfsdump: media file size NUM bytes
> xfsdump: dump size (non-dir files) : NUM bytes
> xfsdump: dump complete: SECS seconds elapsed
> +xfsdump: stream 0 DUMP_FILE OK (success)
> xfsdump: Dump Status: SUCCESS
> Restoring from file...
> xfsrestore -f DUMP_FILE -L stress_026 RESTORE_DIR
> @@ -32,6 +33,7 @@
> xfsrestore: directory post-processing
> xfsrestore: restoring non-directory files
> xfsrestore: restore complete: SECS seconds elapsed
> +xfsrestore: stream 0 DUMP_FILE OK (success)
> xfsrestore: Restore Status: SUCCESS
> Comparing dump directory with restore directory
> Files DUMP_DIR/big and RESTORE_DIR/DUMP_SUBDIR/big are identical
>
> Which looks like output from the multistream code. Why it is
> emitting this for large filesystem testing and not for small
> filesystems, I'm not sure yet.
>
> In fact, with --largefs, I see this for the dump group:
>
> Failures: 026 028 046 047 056 059 060 061 063 064 065 066 266 281
> 282 283
> Failed 16 of 19 tests
>
> And this for the normal sized (10GB) scratch device:
>
> Passed all 18 tests
>
> So there's something funky going on here....
Rich also reported some golden output related changes with --largefs a while
back. I don't think he saw this one though.
The TODO list for userspace release currently stands at:
1) fix the header checksum failures (resolved by Eric's forkoff patch)
2) fix a futex hang in 059
3) fix the golden output changes related to multistream support in xfsdump
and --largefs
4) test on more platforms
Regards,
Ben
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
Thread overview: 12+ messages
2012-10-25 15:15 status of userspace release Ben Myers
2012-10-26 21:57 ` Ben Myers
2012-10-28 21:27 ` Dave Chinner
2012-10-29 16:17 ` Ben Myers
2012-11-02 5:51 ` Dave Chinner
2012-11-02 18:59 ` Ben Myers [this message]
2012-11-02 23:03 ` Dave Chinner
2012-11-03 0:16 ` Dave Chinner
2012-11-03 1:35 ` Eric Sandeen
2012-11-03 1:55 ` Dave Chinner
2012-11-03 3:16 ` Dave Chinner
2012-11-03 1:53 ` Dave Chinner