linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing\
Date: Tue, 9 Mar 2021 21:10:47 -0500	[thread overview]
Message-ID: <YEgqp87obzpezjwT@bfoster> (raw)
In-Reply-To: <20210309043559.GT3419940@magnolia>

On Mon, Mar 08, 2021 at 08:35:59PM -0800, Darrick J. Wong wrote:
> On Tue, Mar 09, 2021 at 11:44:10AM +1100, Dave Chinner wrote:
> > On Fri, Mar 05, 2021 at 09:58:26AM -0500, Brian Foster wrote:
> > > On Fri, Mar 05, 2021 at 09:48:48AM +1100, Dave Chinner wrote:
> > > > > > > So not only does this patch expose subsystem internals to
> > > > > > > external layers and create more log forcing interfaces/behaviors to
> > > > > > 
> > > > > > Sorry, I don't follow. What "subsystem internals" are being exposed
> > > > > > and what external layer are they being exposed to?
> > > > > > 
> > > > > > > > Cleaning up the mess that is the xfs_log_* and xlog_* interfaces and
> > > > > > > > changing things like log force behaviour and implementation is for
> > > > > > > > a future series.
> > > > > > > > 
> > > > > > > 
> > > > > > > TBH I think this patch is kind of a mess on its own. I think the
> > > > > > > mechanism it wants to provide is sane, but I've not even got to the
> > > > > > > point of reviewing _that_ yet because of the seeming dismissal of higher
> > > > > > > level feedback. I'd rather not go around in circles on this so I'll just
> > > > > > > offer my summarized feedback to this patch...
> > > > > > 
> > > > > > I'm not dismissing review nor am I saying the API cannot or should
> > > > > > not be improved. I'm simply telling you that I think the additional
> > > > > > changes you are proposing are outside the scope of the problem I am
> > > > > > addressing here. I already plan to rework the log force API (and
> > > > > > others) but doing so it not something that this patchset needs to
> > > > > > address, or indeed should address.
> > > > > > 
> > > > > 
> > > > > I'm not proposing additional changes nor to rework the log force API.
> > > > > I'm pointing out that I find this implementation to be extremely and
> > > > > unnecessarily confusing.
> > > > 
> > > > And that's just fine - not everyone has to understand all of the
> > > > code all of the time. That's why we have a team....
> > > > 
> > > > > To improve it, I'm suggesting to either coopt
> > > > > the existing async log force API...
> > > > 
> > > > And I'm telling you that this is *future work*. As the person
> > > > actually doing the work, that's my decision and you need to accept
> > > > that. I understand your concerns and have a plan to address them -
> > > > you just have to accept that the plan to address them isn't going to
> > > > be exactly to your liking.
> > > > 
> > > 
> > > As a reviewer, I am pointing out that I object to how this patch is
> > > implemented and offered several fairly simple suggestions on how to
> > > address my concerns. Those suggestions have been rejected pretty much
> > > out of hand on the basis of the existence of some future plans.
> > 
> > We're allowed to disagree on the best approach. But to find a way
> > forward means that both the summitter and the reviewer need to
> > compromise. I've fixed up the obvious warts and bugs you've pointed
> > out, and agreed that it needs cleaning up and have committed to
> > cleaning it up in the near future.
> > 
> > > Future rework plans do not justify or change my view of this patch
> > > and the mess it adds (for other developers, for historical context
> > > and downstreams, etc.).  Therefore, if the only option you're
> > > willing to consider is "make this mess now and clean it up later
> > > in some broader rework," then I'd rather we just defer this patch
> > > until after that rework is available and avoid the mess that way.
> > 
> > So you won't review it until I have 100 outstanding patches in this
> > series and it's completely and utterly unreviewable?
> 
> Already unreviewable at 45, and I've only gotten through 2/3 of it.
> 
> > Then you'll ask me to split it up and re-order it into digestable
> > chunks, and so we'll go back to having to merge this because any of
> > the API rework that depends on the mechanism that this patch
> > introduces.
> 
> Here's something I haven't previously shared with all of you: Last cycle
> when we were going around and around on the ENOSPC/EDQUOT retry loop
> patches (which exploded from 13 to 41 patches) it was /very/ stressful
> to have to rework this and that part every day and a half for almost
> three weeks.
> 
> When I get a patchset ready for submission, I export the patches from my
> development branch into a topic branch based off the latest -rc.  Every
> time anyone makes a suggestion, I update the topic branch and send that
> to the fstests "cloud" to see if it broke anything.  If it succeeds, I
> push it to k.org and resubmit.
> 
> The part that's less obvious is the fact that rebasing is /not/ that
> simple.  Next I integrate the changes into my development tree and
> rebase everything that comes after /that/ to make sure it doesn't
> interfere with my longer term development goals.  Sometimes this is
> easy, sometimes this takes an entire day to work out the warts.
> Then I send that to the fstests "cloud".  This rebase is particularly
> painful because every change that everyone makes to inode
> initialization collides with a rework of inode initialization that I've
> been working on in preparation for metadata directory trees.
> 
> The part that's even /less/ obvious than that is that once the
> development tree tests ok, then I do the same to my xfsprogs dev tree to
> make sure that nothing broke.  Frequently there are ABI or cognitive
> mismatches between kernel and userspace such that the build breaks, and
> then I have to patch the two kernel trees and re-run everything.
> 
> So, that's really stressful for me because a minor tweak to an interface
> can result in an enormous amount of work.  And I reject the argument
> that I could just rebase less frequently -- Dave does that, which means
> that any time he hears that one of his topic branches is being asked
> about, he has to spend weeks rebasing the whole mess to the latest
> upstream.  Maybe that works for him, but for me, I would hate that even
> more than doing a daily rebase.
> 
> Add to that the fact that vger has been a total delivery sh*tshow all
> year.  Now in addition to feeling disconnected from my family and
> friends, I also feel disconnected from work people too.  This really did
> nearly push me over the edge three weeks ago.
> 
> Please remember, one of the big advantages of our open development
> processes is that we /do/ accept code with warty (but functional)
> interfaces now, and we can clean them up later.  This is (IMHO) a good
> stress-reduction tactic, because each of us (ideally) should concentrate
> on getting the core algorithms right, and not focusing on rebasing code
> and smoothing over the same d*** merge conflicts over and over.
> 
> Yes, it's true that people think that a maintainer's only real power is
> to say 'no' in the hopes of forcing developers to fix everything now
> because they can't trust that a dev will ever come back with the
> promised updates, but I reject that 110%.  I'm not going anywhere, and I
> /do/ trust that when the rest of you say that you'll be back with wart
> remover, you will.
> 
> > I'm not going to play that "now jump through this hoop" game.  We
> > add flags for on-off behaviours in internal functions -all the
> > time-. If this makes the interface so complex and confusing that you
> > don't understand it, then the interface was already too complex and
> > confusing. And fixing that is *not in the scope of this patchset*.
> > 
> > Demanding that code be made perfect before it can be merged is
> > really not very helpful. Especially when there are already plans to
> > rework the API but that rework is dependent on a bunch of other
> > changes than need to be done first.
> > 
> > iclogs are something that need to be moved behind the CIL, not sit
> > in front of CIL. The CIL abstracts the journal and writing to the
> > journal completely away from the transaction subsystem, yet the log
> > force code punches straight through that abstraction and walks
> > iclogs directly. The whole log force implementation needs to change,
> > and I plan for the API that wraps the log forces to get reworked at
> > that time.
> 
> So here's what I want to know: Do Dave's changes to the log force APIs
> introduce broken behavior?  If the interfaces are so confusing that
> /none/ of us understand it, can we introduce the appropriate wrappers
> and documentation so that the rest of us plugging away at the rest of
> the system can only call it the supported ways to achieve any of the
> supported outcomes?
> 

It looks to me that the changes to xlog_cil_force_seq() could
essentially be replaced with something like:

/*
 * Behold the magical async CIL force
 * ... <explain what this does> ...
 */
static inline void
xfs_cil_force_async(
	struct xlog	*log)
{
	struct xfs_cil	*cil = log->l_cilp;

	/* true for magic async CIL force */
	xlog_cil_push_now(log, cil->xc_current_sequence, true);
}

If so, we wouldn't have to hack up xlog_cil_force_seq() with a flags
parameter that provides inconsistent async semantics (and thus show
mercy on the log force call stack above it) and otherwise only serves to
bypass 95% of that function anyways. I also think that inverting the cil
push bool param (and calling it async_push or some such) is a bit more
clear because it separates it from XFS_LOG_SYNC (or !XFS_LOG_SYNC)
semantics and simultaneously maps directly to the underlying
->xc_push_async control.

Brian

> I'm willing to accept a bunch of documentation and "trivial" wrappers
> for the rest of us as a shoofly to enable the rest of the xfs developers
> to keep moving around a messy slow-moving log restructuring without
> falling into a work pit.
> 
> However, it's been difficult for me to check that without being able to
> reference a branch to see that at least the end result looks sane.  That
> was immensely helpful for reviewing Allison's deferred xattrs series.
> 
> (TLDR: git branch plz)
> 
> The other way to ease my fears, of course, would be to submit a ton of
> fstests to examine the log behavior for correctness, but that's
> difficult to pull off when the control surface is the userspace ABI.
> 
> > For example, if we want to direct map storage for log writes, then
> > iclog-based log force synchronisation needs to go away because we
> > don't need iclogs for buffering journal writes. Hence the log foce
> > code should interface only with the CIL, and only the CIL should
> > manage whatever mechanism it is using to write to stable storage.
> > The code is currently the way it is because the CIL, when first
> > implemented, had to co-exist with the old way of writing to the log.
> > We haven't used that old way for a decade now and we have very
> > different storage performance characteristics these days, so it's
> > about time we got rid of the mechanism designed to be optimal for
> > spinning disks and actually integrated the CIL and the log
> > efficiently.
> > 
> > There are a *lot* of steps to do this, and reworking the log force
> > implementation and API is part of that. But reworking that API is
> > premature because we haven't done all the necessary pre-work in
> > place to make such a change yet. This patch is actually part of that
> > pre-work to get the mechanisms that the log force rework will rely
> > on.
> > 
> > I have very good reasons for pushing back against your suggestions,
> > Brian. Your suggestions have merit but this patch is not the time to
> > be making the changes you suggest. Code does not need to be perfect
> > to be merged, nor does the entire larger change they will be part of
> > need to be complete and 100% tested and reviewed before preparation
> > and infrastructure patches can be merged. This is how we've done big
> > changes in the past - they've been staged across multiple kernel
> > cycles - and not everything needs to be done in the first
> > patchset of a larger body of work....
> 
> You mean there's even more beyond the 45 already on the list? /groan/
> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> 


  reply	other threads:[~2021-03-10  2:11 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-23  5:32 [PATCH 0/3] xfs: CIL improvements Dave Chinner
2021-02-23  5:32 ` [PATCH 1/3] xfs: xfs_log_force_lsn isn't passed a LSN Dave Chinner
2021-02-24 21:42   ` Darrick J. Wong
2021-02-24 22:19     ` Dave Chinner
2021-02-25 19:01     ` Christoph Hellwig
2021-03-02 18:12   ` Brian Foster
2021-02-23  5:32 ` [PATCH 2/3] xfs: AIL needs asynchronous CIL forcing Dave Chinner
2021-02-24 21:10   ` Dave Chinner
2021-02-24 23:26     ` [PATCH 2/3 v2] " Dave Chinner
2021-02-25  2:15       ` Dave Chinner
2021-03-02 21:44       ` Brian Foster
2021-03-03  0:57         ` Dave Chinner
2021-03-03 17:32           ` Brian Foster
2021-03-04  1:59             ` Dave Chinner
2021-03-04 13:13               ` Brian Foster
2021-03-04 22:48                 ` Dave Chinner
2021-03-05 14:58                   ` Brian Foster
2021-03-09  0:44                     ` Dave Chinner
2021-03-09  4:35                       ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing\ Darrick J. Wong
2021-03-10  2:10                         ` Brian Foster [this message]
2021-03-10 22:00                           ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing Dave Chinner
2021-03-10 15:13                         ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing\ Brian Foster
2021-03-11 12:41                         ` Christoph Hellwig
2021-03-10 14:49                       ` [PATCH 2/3 v2] xfs: AIL needs asynchronous CIL forcing Brian Foster
2021-02-25 13:12     ` [PATCH 2/3] " Brian Foster
2021-02-25 22:03       ` Dave Chinner
2021-02-27 16:25         ` Brian Foster
2021-03-01  4:54           ` Dave Chinner
2021-03-01 13:32             ` Brian Foster
2021-03-03  1:23               ` Dave Chinner
2021-03-03 17:20                 ` Brian Foster
2021-03-04  2:01                   ` Dave Chinner
2021-02-23  5:32 ` [PATCH 3/3] xfs: CIL work is serialised, not pipelined Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YEgqp87obzpezjwT@bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).