From: Brian Foster <bfoster@redhat.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Dave Chinner <david@fromorbit.com>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Josef Bacik <jbacik@fb.com>,
	"stable [v4.9]" <stable@vger.kernel.org>
Subject: Re: [PATCH] xfs: fix incorrect log_flushed on fsync
Date: Thu, 31 Aug 2017 12:39:54 -0400
Message-ID: <20170831163954.GH21939@bfoster.bfoster>
In-Reply-To: <CAOQ4uxgrdijqVpFm+rZ4JE_ZY0A_CZdSPHQEb8S1_R_dyJYzSg@mail.gmail.com>

On Thu, Aug 31, 2017 at 05:37:06PM +0300, Amir Goldstein wrote:
> On Thu, Aug 31, 2017 at 4:47 PM, Christoph Hellwig <hch@lst.de> wrote:
> > I think something like the following patch (totally untested,
> > just an idea) should fix the issue, right?
> 
> I think that is not enough.
> 
> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index c4893e226fd8..555fcae9a18f 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -135,7 +135,7 @@ xfs_file_fsync(
> >         struct xfs_inode        *ip = XFS_I(inode);
> >         struct xfs_mount        *mp = ip->i_mount;
> >         int                     error = 0;
> > -       int                     log_flushed = 0;
> > +       unsigned int            flushseq;
> >         xfs_lsn_t               lsn = 0;
> >
> >         trace_xfs_file_fsync(ip);
> > @@ -143,6 +143,7 @@ xfs_file_fsync(
> >         error = file_write_and_wait_range(file, start, end);
> >         if (error)
> >                 return error;
> > +       flushseq = READ_ONCE(mp->m_flushseq);
> 
> Imagine that a flush was submitted and completed before
> file_write_and_wait_range(), but m_flushseq was incremented after it.
> Maybe READ m_flush_submitted_seq here...
> 
> >
> >         if (XFS_FORCED_SHUTDOWN(mp))
> >                 return -EIO;
> > @@ -181,7 +182,7 @@ xfs_file_fsync(
> >         }
> >
> >         if (lsn) {
> > -               error = _xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, &log_flushed);
> > +               error = _xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, NULL);
> >                 ip->i_itemp->ili_fsync_fields = 0;
> >         }
> >         xfs_iunlock(ip, XFS_ILOCK_SHARED);
> > @@ -193,8 +194,9 @@ xfs_file_fsync(
> >          * an already allocated file and thus do not have any metadata to
> >          * commit.
> >          */
> > -       if (!log_flushed && !XFS_IS_REALTIME_INODE(ip) &&
> > -           mp->m_logdev_targp == mp->m_ddev_targp)
> > +       if (!XFS_IS_REALTIME_INODE(ip) &&
> > +           mp->m_logdev_targp == mp->m_ddev_targp &&
> > +           flushseq == READ_ONCE(mp->m_flushseq))
> 
> ... and here READ m_flush_completed_seq.
> If (m_flush_completed_seq > m_flush_submitted_seq),
> it is safe to skip issuing the flush.
> Then READ_ONCE() is probably not enough and we need smp_rmb()?
> 

IIUC, basically we need to guarantee that a flush is submitted after
file_write_and_wait_range() and completes before we return. If we do
something like the above, I wonder whether we could just wait for
submit == complete when we observe that submit was bumped since it was
initially sampled above, rather than issue another flush (which would
still be necessary if a submit hadn't occurred)?
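
To make the three outcomes concrete, here is a minimal userspace sketch
of that decision. It is not kernel code: flush_decide() and the FLUSH_*
names are hypothetical, and it assumes the submitted counter is bumped
before each flush is issued, the completed counter is bumped after each
flush finishes, and flushes complete in submission order.

```c
#include <stdint.h>

/* Hypothetical outcomes for the end of fsync; none of these names exist in XFS. */
enum flush_action {
	FLUSH_ISSUE,	/* no flush was submitted since the sample: issue our own */
	FLUSH_WAIT,	/* a later flush is in flight: wait for it to complete */
	FLUSH_SKIP,	/* a later flush already completed: nothing left to do */
};

/*
 * sampled_submitted: submitted counter read right after writeback finished.
 * submitted_now / completed_now: both counters re-read at the end of fsync.
 *
 * Assumes in-order completion, so completed_now >= sampled_submitted + 1
 * means the first flush submitted after the sample has finished.
 */
static enum flush_action
flush_decide(uint32_t sampled_submitted, uint32_t submitted_now,
	     uint32_t completed_now)
{
	if (submitted_now == sampled_submitted)
		return FLUSH_ISSUE;
	/* Signed difference keeps the comparison valid across wraparound. */
	if ((int32_t)(completed_now - sampled_submitted) > 0)
		return FLUSH_SKIP;
	return FLUSH_WAIT;
}
```

If flushes could complete out of order, a plain completed count would
not be enough to identify which flush finished, so the FLUSH_SKIP test
would need something stronger.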

If we do end up with something like this, I think it's a bit cleaner to
stuff the counter(s) in the xfs_buftarg structure and update them from
the generic buffer submit/completion code based on XBF_FLUSH. FWIW, I
suspect we could also update said counter(s) from
xfs_blkdev_issue_flush().
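
Roughly what I have in mind, as a userspace sketch using C11 atomics
rather than real kernel code. Everything prefixed demo_ or DEMO_ is a
made-up stand-in (for xfs_buftarg, the buffer submit/completion paths,
XBF_FLUSH and xfs_blkdev_issue_flush() respectively), and the default
sequentially consistent atomics are an assumption, not a statement about
which barriers the kernel version would need.

```c
#include <stdatomic.h>

#define DEMO_XBF_FLUSH	(1u << 0)	/* stand-in for the real XBF_FLUSH flag */

/* Hypothetical stand-in for the part of xfs_buftarg holding the counters. */
struct demo_buftarg {
	atomic_uint	bt_flush_submitted;	/* bumped before a flush is issued */
	atomic_uint	bt_flush_completed;	/* bumped after a flush finishes */
};

/* Would be called from the generic buffer submit path. */
static void
demo_buf_submit(struct demo_buftarg *btp, unsigned int flags)
{
	if (flags & DEMO_XBF_FLUSH)
		atomic_fetch_add(&btp->bt_flush_submitted, 1);
	/* ... the real code would hand the I/O to the block layer here ... */
}

/* Would be called from the generic buffer I/O completion path. */
static void
demo_buf_complete(struct demo_buftarg *btp, unsigned int flags)
{
	if (flags & DEMO_XBF_FLUSH)
		atomic_fetch_add(&btp->bt_flush_completed, 1);
}

/* Stand-in for a synchronous explicit flush: it bumps both counters. */
static void
demo_issue_flush(struct demo_buftarg *btp)
{
	demo_buf_submit(btp, DEMO_XBF_FLUSH);
	demo_buf_complete(btp, DEMO_XBF_FLUSH);
}
```

Keeping the counters per buftarg means fsync only has to look at the
device it cares about, and buffers without the flush flag never touch
them.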

Brian

> >                 xfs_blkdev_issue_flush(mp->m_ddev_targp);
> >
> >         return error;
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index bcb2f860e508..3c0cbb98581e 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -2922,6 +2922,8 @@ xlog_state_done_syncing(
> >                 iclog->ic_state = XLOG_STATE_DONE_SYNC;
> >         }
> >
> > +       log->l_mp->m_flushseq++;
> 
> I reckon this should use WRITE_ONCE or smp_wmb()
> and then also increment m_flush_submitted_seq *before*
> issuing the flush
> 
> If the state machine does not allow more than a single flush
> to be in flight (?) then the 2 seq counters could be reduced
> to a single seq counter with (m_flushseq % 2) == 1 for submitted
> and (m_flushseq % 2) == 0 for completed, and the test in fsync
> would be (flushseq % 2) == (READ_ONCE(mp->m_flushseq) % 2)
> 
> ... maybe?
> 
> Amir.
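
For reference, the odd/even encoding of a single counter can be sketched
in userspace C as below. All demo_ names are hypothetical, the counter is
a plain variable with no atomics or barriers, and at most one flush is
assumed to be in flight. Note that demo_flush_covered() is not the
"% 2" comparison quoted above; it is a stricter variant that asks
whether a flush submitted after the sample has also completed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single counter: odd = a flush is in flight, even = idle. */
static void
demo_flush_submit(uint32_t *seq)
{
	assert((*seq & 1) == 0);	/* only one flush may be in flight */
	(*seq)++;			/* even -> odd: submitted */
}

static void
demo_flush_complete(uint32_t *seq)
{
	assert((*seq & 1) == 1);	/* a flush must be in flight */
	(*seq)++;			/* odd -> even: completed */
}

/*
 * Returns nonzero once a flush submitted after 'sampled' was read has
 * also completed. Rounding the sample up to the next even value skips a
 * flush that was already in flight when the sample was taken; adding 2
 * then requires one full submit + complete cycle after that point.
 */
static int
demo_flush_covered(uint32_t sampled, uint32_t now)
{
	uint32_t target = ((sampled + 1) & ~UINT32_C(1)) + 2;

	/* Signed difference keeps the comparison valid across wraparound. */
	return (int32_t)(now - target) >= 0;
}
```

With an even (idle) sample the counter has to advance by 2 before the
check passes; with an odd sample it has to advance by 3, because the
flush that was already in flight may have started before our writes
reached the device.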
