From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, Eric Sandeen <sandeen@sandeen.net>,
Dave Chinner <dchinner@redhat.com>,
Surbhi Palande <csurbhi@gmail.com>,
Kamal Mostafa <kamal@canonical.com>,
Christoph Hellwig <hch@infradead.org>,
LKML <linux-kernel@vger.kernel.org>,
xfs@oss.sgi.com, linux-ext4@vger.kernel.org,
Ben Myers <bpm@sgi.com>, Alex Elder <elder@kernel.org>
Subject: Re: [PATCH 5/8] xfs: Protect xfs_file_aio_write() & xfs_setattr_size() with sb_start_write - sb_end_write
Date: Tue, 24 Jan 2012 18:19:26 +1100
Message-ID: <20120124071926.GM15102@dastard>
In-Reply-To: <1327091686-23177-6-git-send-email-jack@suse.cz>
On Fri, Jan 20, 2012 at 09:34:43PM +0100, Jan Kara wrote:
> Replace racy xfs_wait_for_freeze() check in xfs_file_aio_write() with
> a reliable sb_start_write() - sb_end_write() locking. Due to lock ranking
> dictated by the page fault code we have to call sb_start_write() after we
> acquire ilock.
It appears to me that you have indeed confused the ilock with the
iolock.
> Similarly we have to protect xfs_setattr_size() because it can modify last
> page of truncated file. Because ilock is dropped in xfs_setattr_size() we
> have to drop and retake write access as well to avoid deadlocks.
>
> CC: Ben Myers <bpm@sgi.com>
> CC: Alex Elder <elder@kernel.org>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/xfs/xfs_file.c | 6 ++++--
> fs/xfs/xfs_iops.c | 6 ++++++
> 2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 753ed9b..9efd153 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -862,9 +862,11 @@ xfs_file_dio_aio_write(
> *iolock = XFS_IOLOCK_SHARED;
> }
>
> + sb_start_write(inode->i_sb, SB_FREEZE_WRITE);
> trace_xfs_file_direct_write(ip, count, iocb->ki_pos, 0);
> ret = generic_file_direct_write(iocb, iovp,
> &nr_segs, pos, &iocb->ki_pos, count, ocount);
> + sb_end_write(inode->i_sb, SB_FREEZE_WRITE);
That's inside the iolock, not the ilock. Either way, it is
incorrect. This accounting should be outside the iolock - because
xfs_trans_alloc() can be called with the iolock held. Therefore the
freeze/lock order needs to be:

	sb_start_write(SB_FREEZE_WRITE)
	XFS(ip)->i_iolock
	XFS(ip)->i_ilock
	sb_end_write(SB_FREEZE_WRITE)

Which matches the current freeze/lock order.
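
The ordering rule above can be illustrated with a small userspace model. This is only a sketch: the rank values and the names model_rank and model_order_ok() are hypothetical and exist nowhere in the kernel. It treats freeze protection, the iolock and the ilock as ranked "locks" and checks that a sequence of acquisitions always moves from the outermost to the innermost.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical ranks, for illustration only. A lower rank must be
 * taken before a higher one.
 */
enum model_rank {
	RANK_FREEZE = 0,	/* sb_start_write() .. sb_end_write() */
	RANK_IOLOCK = 1,	/* XFS(ip)->i_iolock */
	RANK_ILOCK  = 2,	/* XFS(ip)->i_ilock */
};

/*
 * Return 1 if every acquisition in seq has a strictly higher rank than
 * the one before it (outermost first), 0 otherwise. Releases are not
 * modelled; seq lists acquisitions only.
 */
static int model_order_ok(const enum model_rank *seq, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (seq[i] <= seq[i - 1])
			return 0;
	return 1;
}
```

With this model, the sequence { RANK_FREEZE, RANK_IOLOCK, RANK_ILOCK } passes, while { RANK_IOLOCK, RANK_FREEZE }, which is what taking sb_start_write() inside the iolock amounts to, is rejected.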
> @@ -945,8 +949,6 @@ xfs_file_aio_write(
> if (ocount == 0)
> return 0;
>
> - xfs_wait_for_freeze(ip->i_mount, SB_FREEZE_WRITE);
> -
That's where sb_start_write() needs to be, and the sb_end_write()
call needs to be below the generic_write_sync() calls that will trigger
IO on O_SYNC writes. Otherwise it does not cover the whole IO path
correctly.
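
As a rough illustration of that placement, here is a userspace trace model, not kernel code. The names model_trace, model_record(), model_find() and model_aio_write() are hypothetical, and the position of the iolock release relative to the sync step is an assumption made for the sketch. The only point it makes is that the freeze reference is taken first and dropped after the O_SYNC IO has been issued.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_EVENTS 8

/* Hypothetical event log standing in for the real call sequence. */
struct model_trace {
	const char *ev[MODEL_MAX_EVENTS];
	int n;
};

static void model_record(struct model_trace *t, const char *what)
{
	assert(t->n < MODEL_MAX_EVENTS);
	t->ev[t->n++] = what;
}

/* Index of the first matching event, or -1 if it never happened. */
static int model_find(const struct model_trace *t, const char *what)
{
	int i;

	for (i = 0; i < t->n; i++)
		if (strcmp(t->ev[i], what) == 0)
			return i;
	return -1;
}

/* Model of the write path with the freeze reference held throughout. */
static void model_aio_write(struct model_trace *t, int o_sync)
{
	model_record(t, "sb_start_write");	/* where xfs_wait_for_freeze() was */
	model_record(t, "iolock");
	model_record(t, "write");
	model_record(t, "iounlock");
	if (o_sync)
		model_record(t, "generic_write_sync");	/* O_SYNC IO, still covered */
	model_record(t, "sb_end_write");	/* dropped only after all IO */
}
```

In this model "sb_start_write" is always the first event and "sb_end_write" always the last, so any IO triggered by the sync step falls inside the protected region.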
> if (XFS_FORCED_SHUTDOWN(ip->i_mount))
> return -EIO;
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 3579bc8..798b9c6 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -793,6 +793,7 @@ xfs_setattr_size(
> return xfs_setattr_nonsize(ip, iattr, 0);
> }
>
> + sb_start_write(inode->i_sb, SB_FREEZE_WRITE);
> /*
> * Make sure that the dquots are attached to the inode.
> */
> @@ -849,10 +850,14 @@ xfs_setattr_size(
> xfs_get_blocks);
> if (error)
> goto out_unlock;
> + /* Drop the write access to avoid lock inversion with ilock */
> + sb_end_write(inode->i_sb, SB_FREEZE_WRITE);
>
> xfs_ilock(ip, XFS_ILOCK_EXCL);
> lock_flags |= XFS_ILOCK_EXCL;
>
> + sb_start_write(inode->i_sb, SB_FREEZE_WRITE);
> +
This is caused by the previous problems I pointed out. You should
not need to drop the freeze reference here at all.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com