From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: status of userspace release
Date: Fri, 2 Nov 2012 13:59:23 -0500
Message-ID: <20121102185923.GG9783@sgi.com>
In-Reply-To: <20121102055102.GY29378@dastard>
Hi Dave,
On Fri, Nov 02, 2012 at 04:51:02PM +1100, Dave Chinner wrote:
> On Thu, Oct 25, 2012 at 10:15:01AM -0500, Ben Myers wrote:
> > Hi Folks,
> >
> > We're working toward a userspace release this month. There are several patches
> > that need to go in first, including backing out the xfsdump format version bump
> > from Eric, fixes for the makefiles from Mike, and the Polish language update
> > for xfsdump from Jakub. If anyone knows of something else we need, now is the
> > time to flame about it. I will take a look around for other important patches
> > too.
> >
> > This time I'm going to tag an -rc1 (probably later today or tomorrow). We'll
> > give everyone a few working days to do a final test and/or pipe up if we have
> > missed something important. Then if all goes well we'll cut the release next
> > Tuesday.
>
> I think that dump/restore need more work/testing.
Sounds good. AFAIK there is no blazing hurry to release immediately.
> I've already pointed Eric to the header checksum failures (forkoff
> patch being needed), and that fixes the failures I've been seeing on
> normal xfstests runs.
I've pulled that patch in. Interesting that it doesn't reproduce on i586 but
is so reliable on x86_64. It's a good excuse to do some testing on a wider set
of arches before the release.
> Running some large filesystem testing, however, I see more problems.
> I'm using a 17TB filesystem and the --largefs patch series. This
> results in a futex hang in 059 like so:
>
> [ 4770.007858] xfsrestore S ffff88021fc52d40 5504 3926 3487 0x00000000
> [ 4770.007858] ffff880212ea9c68 0000000000000082 ffff880207830140 ffff880212ea9fd8
> [ 4770.007858] ffff880212ea9fd8 ffff880212ea9fd8 ffff880216cec2c0 ffff880207830140
> [ 4770.007858] ffff880212ea9d08 ffff880212ea9d58 ffff880207830140 0000000000000000
> [ 4770.007858] Call Trace:
> [ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
> [ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
> [ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
> [ 4770.007858] [<ffffffff8113acf7>] ? __free_pages+0x47/0x70
> [ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
> [ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
> [ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
> [ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
> [ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
> [ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
> [ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
> [ 4770.007858] xfsrestore S ffff88021fc52d40 5656 3927 3487 0x00000000
> [ 4770.007858] ffff880208f29c68 0000000000000082 ffff880208f84180 ffff880208f29fd8
> [ 4770.007858] ffff880208f29fd8 ffff880208f29fd8 ffff880216cec2c0 ffff880208f84180
> [ 4770.007858] ffff880208f29d08 ffff880208f29d58 ffff880208f84180 0000000000000000
> [ 4770.007858] Call Trace:
> [ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
> [ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
> [ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
> [ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
> [ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
> [ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
> [ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
> [ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
> [ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
> [ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
> [ 4770.007858] xfsrestore S ffff88021fc92d40 5848 3928 3487 0x00000000
> [ 4770.007858] ffff880212d0dc68 0000000000000082 ffff880208e76240 ffff880212d0dfd8
> [ 4770.007858] ffff880212d0dfd8 ffff880212d0dfd8 ffff880216cf2300 ffff880208e76240
> [ 4770.007858] ffff880212d0dd08 ffff880212d0dd58 ffff880208e76240 0000000000000000
> [ 4770.007858] Call Trace:
> [ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
> [ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
> [ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
> [ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
> [ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
> [ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
> [ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
> [ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
> [ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
> [ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
>
> I can't reliably reproduce it at this point, but there does appear
> to be some kind of locking problem in the multistream support.
One of my machines hit this overnight without --largefs. I wasn't able to get
a dump, though. Just another data point.
> Speaking of which, most large filesystems dump/restore tests are
> failing because of this output:
>
> 026 20s ... - output mismatch (see 026.out.bad)
> --- 026.out 2012-10-05 11:37:51.000000000 +1000
> +++ 026.out.bad 2012-11-02 16:20:17.000000000 +1100
> @@ -20,6 +20,7 @@
> xfsdump: media file size NUM bytes
> xfsdump: dump size (non-dir files) : NUM bytes
> xfsdump: dump complete: SECS seconds elapsed
> +xfsdump: stream 0 DUMP_FILE OK (success)
> xfsdump: Dump Status: SUCCESS
> Restoring from file...
> xfsrestore -f DUMP_FILE -L stress_026 RESTORE_DIR
> @@ -32,6 +33,7 @@
> xfsrestore: directory post-processing
> xfsrestore: restoring non-directory files
> xfsrestore: restore complete: SECS seconds elapsed
> +xfsrestore: stream 0 DUMP_FILE OK (success)
> xfsrestore: Restore Status: SUCCESS
> Comparing dump directory with restore directory
> Files DUMP_DIR/big and RESTORE_DIR/DUMP_SUBDIR/big are identical
>
> Which looks like output from the multistream code. Why it is
> emitting this for large filesystem testing and not for small
> filesystems, I'm not sure yet.
>
> In fact, with --largefs, I see this for the dump group:
>
> Failures: 026 028 046 047 056 059 060 061 063 064 065 066 266 281
> 282 283
> Failed 16 of 19 tests
>
> And this for the normal sized (10GB) scratch device:
>
> Passed all 18 tests
>
> So there's something funky going on here....
Rich also reported some golden-output changes with --largefs a while back. I
don't think he saw this one, though.
The TODO list for userspace release currently stands at:
1) fix the header checksum failures (already resolved)
2) fix a futex hang in 059
3) fix the golden output changes related to multistream support in xfsdump
and --largefs
4) test on more platforms
Regards,
Ben
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs