From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 0/2 V4] Resubmit items failed during writeback
Date: Wed, 21 Jun 2017 06:15:26 -0400 [thread overview]
Message-ID: <20170621101526.GC28914@bfoster.bfoster> (raw)
In-Reply-To: <20170621004550.GF4733@birch.djwong.org>
On Tue, Jun 20, 2017 at 05:45:50PM -0700, Darrick J. Wong wrote:
> On Mon, Jun 19, 2017 at 02:51:11PM -0400, Brian Foster wrote:
> > On Mon, Jun 19, 2017 at 10:42:03AM -0700, Darrick J. Wong wrote:
> > > On Mon, Jun 19, 2017 at 09:51:22AM -0400, Brian Foster wrote:
> > > > On Fri, Jun 16, 2017 at 12:54:43PM +0200, Carlos Maiolino wrote:
> > > > > Hi,
> > > > >
> > > > > there goes a new version of this patchset based on previous reviews on V3.
> > > > >
> > > > > Changelogs added separated to each patch.
> > > > >
> > > >
> > > > Hi Carlos,
> > > >
> > > > I pointed out the last things that I'm aware of that I think need to be
> > > > fixed in this series (along with a few nits here and there). That said,
> > > > it's already been pointed out that we probably want an xfstests test
> > > > case to go along with this before it would be merged. Is that something
> > > > you are still planning on?
> > > >
> > > > I'd actually like to take this a bit farther than a regression test and
> > > > see if we can improve our ability to test the error handling code in
> > > > general. What do you (or anybody else..) think about including a patch
> > > > in this series that introduces the ability to inject metadata writeback
> > > > errors on DEBUG kernels? For example, consider something that just sets
> > > > b_error at the top of xfs_buf_iodone_callbacks() based on a random value
> > > > and configurable error frequency. This could use XFS_TEST_ERROR() or
> > > > something like a new DEBUG sysfs attribute in the error configuration
> > > > (see log_badcrc_factor for a similar example).
> > >
> > > Sounds reasonable.
> > >
> > > I wonder if it would be more useful to have individual knobs for each
> > > metadata object type so that you could have multiple xfstests, each of
> > > which runs the same software scenario but with different failures
> > > configured? Then we could test what happens when, say, only inode
> > > writes fail, or bmbt writes fail, etc. rather than one big hammer that's
> > > harder to control?
> > >
> >
> > I suppose you could do some of that in the test just by making certain
> > modifications in isolation (e.g., test an inode modification, test a
> > dquot, buffer, etc..). It might be harder to do that from a test script
> > at the granularity of things like bmbt buffers, though that may also be
> > the case for error injection. Did you have something in mind to filter
> > the buffer types.. the blf type perhaps?
> >
> > > For a moment I also wondered why not use the error injection mechanism
> > > that we already have, rather than inventing more sysfs knobs? But we
> > > have limited space in xfs_error_injection (29/32 bits used) so we
> > > probably should just switch everything to sysfs knobs.
> > >
> >
> > Yeah, I think we discussed this before as well. IIRC, I had already
> > implemented whatever the sysfs knob was and and corresponding test, and
> > it just wasn't really important enough to reimplement the whole thing.
>
> We probably did, back when the other error injection knobs (drop_writes,
> log/log_badcrc_factor) got added. Unfortunately now they're all over
> sysfs, which makes for sort of a mess.
>
> > I didn't really have a preference as to how this would be implemented.
> > On a quick look though, it looks like a downside to the existing
> > injection mechanism is that we can't control the error frequency..? If
> > that's the case, I like the idea of stepping back, perhaps audit the
> > granularity of the broader error injection framework and define a new
> > sysfs interface that also allows controlling things like the error
> > frequency dynamically.
>
> Yes, I've heard you ask for this twice now. I agree that being able to
> control the frequency would be a useful feature to enable longer-running
> tests so that you could, say, have the refcount_finish_one error trigger
> once every 1,000,000 updates instead of immediately afterwards.
>
Yep. It allows modeling for longevity tests as you note above, but I'm
also thinking about the ability to turn a particular error on with 100%
frequency in order to run fast, deterministic regression tests.
> So with that in mind I coded up a quick RFC to create sysfs knobs in
> /sys/fs/xfs/$device/errortag/$tagname ; the units are the same as the
> XFS_RANDOM_ values (i.e. inverted frequency). Now we're free of the
> limitation of only being able to inject 10 error types across all
> mounted fses, and we can individually disable injection too.
>
Nice, that sounds very interesting.. thanks!
Brian
> > (I would still like to get something targeted towards testing this
> > series in the meantime.)
>
> Same here.
>
> --D
>
> > Brian
> >
> > > --D
> > >
> > > > This would facilitate longer running tests where iodone callback errors
> > > > occur randomly and transiently and we can thus actually exercise the
> > > > error handling and recovery code. I'd love to run some fsstress testing,
> > > > for example, as I'm hoping that would help wring out any further issues
> > > > that could be lurking (particularly with the tricky xfs_iflush_done()
> > > > logic and whatnot). If implemented generally enough, that could also be
> > > > reused for a more simple xfstests regression test for the original
> > > > problem (e.g., mount fs, set error frequency = 100%, modify an inode,
> > > > set error frequency = 0, umount), albeit it would require debug mode.
> > > > Thoughts?
> > > >
> > > > Brian
> > > >
> > > > >
> > > > > --
> > > > > 2.9.4
> > > > >
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-06-21 10:15 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-16 10:54 [PATCH 0/2 V4] Resubmit items failed during writeback Carlos Maiolino
2017-06-16 10:54 ` [PATCH 1/2 V4] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino
2017-06-19 13:48 ` Brian Foster
2017-06-20 7:15 ` Carlos Maiolino
2017-06-16 10:54 ` [PATCH 2/2] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino
2017-06-16 11:06 ` Carlos Maiolino
2017-06-16 18:35 ` Luis R. Rodriguez
2017-06-16 19:24 ` Darrick J. Wong
2017-06-16 19:37 ` Luis R. Rodriguez
2017-06-16 19:45 ` Eric Sandeen
2017-06-19 10:59 ` Brian Foster
2017-06-20 16:52 ` Luis R. Rodriguez
2017-06-20 17:20 ` Brian Foster
2017-06-20 18:05 ` Luis R. Rodriguez
2017-06-21 10:10 ` Brian Foster
2017-06-21 15:25 ` Luis R. Rodriguez
2017-06-20 18:38 ` Luis R. Rodriguez
2017-06-20 7:01 ` Carlos Maiolino
2017-06-20 16:24 ` Luis R. Rodriguez
2017-06-21 11:51 ` Carlos Maiolino
2017-06-19 13:49 ` Brian Foster
2017-06-19 15:09 ` Brian Foster
2017-06-19 13:51 ` [PATCH 0/2 V4] Resubmit items failed during writeback Brian Foster
2017-06-19 17:42 ` Darrick J. Wong
2017-06-19 18:51 ` Brian Foster
2017-06-21 0:45 ` Darrick J. Wong
2017-06-21 10:15 ` Brian Foster [this message]
2017-06-21 11:03 ` Carlos Maiolino
2017-06-21 11:51 ` Brian Foster
2017-06-21 16:54 ` Darrick J. Wong
2017-06-22 12:05 ` Carlos Maiolino
2017-06-22 12:40 ` Brian Foster
2017-06-30 11:09 ` Carlos Maiolino
2017-06-30 11:33 ` Brian Foster
2017-06-30 12:22 ` Carlos Maiolino
2017-06-30 17:01 ` Darrick J. Wong
2017-07-03 8:37 ` Carlos Maiolino
2017-06-21 16:45 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170621101526.GC28914@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=cmaiolino@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).