From: Leah Rumancik <leah.rumancik@gmail.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Dave Chinner <dchinner@redhat.com>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH 5.15 12/15] xfs: async CIL flushes need pending pushes to be made stable
Date: Mon, 13 Jun 2022 10:31:50 -0700
Message-ID: <Yqd0hk2n9vyp56OA@google.com>
In-Reply-To: <CAOQ4uxh__DXycqz+6AFZK3JxLw0Bb_xCNv3eAmX-FdTk0miq8g@mail.gmail.com>

On Wed, Jun 08, 2022 at 10:43:57AM +0300, Amir Goldstein wrote:
> On Mon, Jun 6, 2022 at 8:12 AM Leah Rumancik <leah.rumancik@gmail.com> wrote:
> >
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > [ Upstream commit 70447e0ad9781f84e60e0990888bd8c84987f44e ]
> >
> > When the AIL tries to flush the CIL, it relies on the CIL push
> > ending up on stable storage without having to wait for and
> > manipulate iclog state directly. However, if there is already a
> > pending CIL push when the AIL tries to flush the CIL, it won't set
> > the cil->xc_push_commit_stable flag and so the CIL push will not
> > actively flush the commit record iclog.
> >
> > generic/530 when run on a single CPU test VM can trigger this fairly
> > reliably. This test exercises unlinked inode recovery, and can
> > result in inodes being pinned in memory by ongoing modifications to
> > the inode cluster buffer to record unlinked list modifications. As a
> > result, the first inode unlinked in a buffer can pin the tail of the
> > log whilst the inode cluster buffer is pinned by the current
> > checkpoint that has been pushed but isn't on stable storage
> > because the cil->xc_push_commit_stable flag was not set. This results in
> > the log/AIL effectively deadlocking until something triggers the
> > commit record iclog to be pushed to stable storage (i.e. the
> > periodic log worker calling xfs_log_force()).
> >
> > The fix is two-fold - first we should always set the
> > cil->xc_push_commit_stable when xlog_cil_flush() is called,
> > regardless of whether there is already a pending push or not.
> >
> > Second, if the CIL is empty, we should trigger an iclog flush to
> > ensure that the iclogs of the last checkpoint have actually been
> > submitted to disk as that checkpoint may not have been run under
> > stable completion constraints.
> >
> > Reported-and-tested-by: Matthew Wilcox <willy@infradead.org>
> > Fixes: 0020a190cf3e ("xfs: AIL needs asynchronous CIL forcing")
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> > ---
> 
> Two questions/suggestions regarding backporting this patch.
> 
> DISCLAIMER: I am raising questions/suggestions.
> There is no presumption that I know the answers.
> The author of the patch is the best authority when it comes to answering
> those questions and w.r.t adopting or discarding my suggestions.
> 
> 1. I think the backport should also be tested with a single CPU VM as
>     described above
> 2. I wonder if it would make sense to backport the 3 "defensive fixes" that
>     Dave mentioned in the cover letter [1] along with this fix?
> 
> The rationale being that it is not enough to backport the fix itself.
> Anything that is required to test the fix reliably should be backported with it
> and since this issue involves subtle timing and races (maybe not as much
> on a single CPU VM?), the "defensive fixes" that change the timing and
> amount of wakeups/pushes may impact the ability to test the fix?
> 
> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/all/20220317053907.164160-1-david@fromorbit.com/

This patch has been postponed to the second set, but I can certainly
run the single-CPU test described above for that set and look into the
other fixes from the cover letter. Thanks for pointing that out.

Leah
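
The two-part fix described in the quoted commit message can be sketched
as a small userspace toy model. This is not the kernel code: struct
toy_cil, its fields, and the toy_* functions are hypothetical names made
up for illustration, and the "old" ordering is inferred only from the
commit message's description of the bug. The intent is just to show why
setting the flag before the "push already pending" early return matters,
and why an empty CIL still needs an iclog force.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CIL state named in the commit message. */
struct toy_cil {
	uint64_t current_seq;        /* sequence of the open checkpoint */
	uint64_t push_seq;           /* highest sequence already queued for push */
	unsigned int nr_items;       /* 0 models an empty CIL */
	bool push_commit_stable;     /* models cil->xc_push_commit_stable */
	bool iclog_force_requested;  /* models a call to xfs_log_force() */
};

/*
 * Ordering as the commit message describes the bug: the flag is only
 * set when this call queues a new push, so a push that is already
 * pending never learns that its commit record must be made stable.
 */
static void toy_push_now_old(struct toy_cil *cil, uint64_t seq, bool async)
{
	if (cil->nr_items == 0 || seq <= cil->push_seq)
		return;              /* early return: flag left untouched */
	cil->push_seq = seq;
	cil->push_commit_stable = async;
}

/* First half of the fix: set the flag before any early return. */
static void toy_push_now_fixed(struct toy_cil *cil, uint64_t seq, bool async)
{
	if (async)
		cil->push_commit_stable = true;
	if (cil->nr_items == 0 || seq <= cil->push_seq)
		return;
	cil->push_seq = seq;
}

/*
 * Second half of the fix: if the CIL is empty, request an iclog flush
 * so the previous checkpoint's iclogs are submitted to disk.
 */
static void toy_cil_flush(struct toy_cil *cil)
{
	toy_push_now_fixed(cil, cil->current_seq, true);
	if (cil->nr_items == 0)
		cil->iclog_force_requested = true;
}
```

In this model, calling toy_push_now_old() with a push already pending
(push_seq equal to the requested sequence) leaves push_commit_stable
false, which corresponds to the stall the commit message describes.
toy_cil_flush() sets the flag in that same situation, and additionally
requests the iclog force when nr_items is 0.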
