From: Matthew Bobrowski <mbobrowski@mbobrowski.org>
To: Jan Kara <jack@suse.cz>
Cc: "Theodore Y. Ts'o" <tytso@mit.edu>,
	adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, hch@infradead.org,
	david@fromorbit.com, darrick.wong@oracle.com
Subject: Re: [PATCH v5 00/12] ext4: port direct I/O to iomap infrastructure
Date: Wed, 23 Oct 2019 21:11:38 +1100
Message-ID: <20191023101138.GA6725@bobrowski>
In-Reply-To: <20191023100153.GB22307@quack2.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 3420 bytes --]

On Wed, Oct 23, 2019 at 12:01:53PM +0200, Jan Kara wrote:
> On Wed 23-10-19 13:35:19, Matthew Bobrowski wrote:
> > On Mon, Oct 21, 2019 at 09:43:30PM +0200, Jan Kara wrote:
> > > On Mon 21-10-19 09:31:12, Theodore Y. Ts'o wrote:
> > > > Hi Matthew, thanks for your work on this patch series!
> > > > 
> > > > I applied it against 4c3, and ran a quick test run on it, and found
> > > > the following locking problem.  To reproduce:
> > > > 
> > > > kvm-xfstests -c nojournal generic/113
> > > > 
> > > > generic/113		[09:27:19][    5.841937] run fstests generic/113 at 2019-10-21 09:27:19
> > > > [    7.959477] 
> > > > [    7.959798] ============================================
> > > > [    7.960518] WARNING: possible recursive locking detected
> > > > [    7.961225] 5.4.0-rc3-xfstests-00012-g7fe6ea084e48 #1238 Not tainted
> > > > [    7.961991] --------------------------------------------
> > > > [    7.962569] aio-stress/1516 is trying to acquire lock:
> > > > [    7.963129] ffff9fd4791148c8 (&sb->s_type->i_mutex_key#12){++++}, at: __generic_file_fsync+0x3e/0xb0
> > > > [    7.964109] 
> > > > [    7.964109] but task is already holding lock:
> > > > [    7.964740] ffff9fd4791148c8 (&sb->s_type->i_mutex_key#12){++++}, at: ext4_dio_write_iter+0x15b/0x430
> > > 
> > > This is going to be a tricky one. With iomap, the inode locking is handled
> > > by the filesystem while calling generic_write_sync() is done by
> > > iomap_dio_rw(). I would really prefer to avoid tweaking iomap_dio_rw() not
> > > to call generic_write_sync(). So we need to remove inode_lock from
> > > __generic_file_fsync() (used from ext4_sync_file()). This locking is mostly
> > > for legacy purposes and we don't need this in ext4 AFAICT - but removing
> > > the lock from __generic_file_fsync() would mean auditing all legacy
> > > filesystems that use this to make sure flushing inode & its metadata buffer
> > > list while it is possibly changing cannot result in something unexpected. I
> > > don't want to clutter this series with it so we are left with
> > > reimplementing __generic_file_fsync() inside ext4 without inode_lock. Not
> > > too bad but not great either. Thoughts?
> > 
> > So, I just looked at this on my lunch break and I think the simplest
> > approach would be to just transfer the necessary chunks of code from
> > within __generic_file_fsync() into ext4_sync_file() for !journal cases,
> > minus the inode lock, and minus calling into __generic_file_fsync(). I
> > don't foresee this causing any issues, but feel free to correct me if I'm
> > wrong.
> 
> Yes, that's what I'd suggest as well. In fact when doing that you can share
> file_write_and_wait_range() call with the one already in ext4_sync_file()
> used for other cases. Similarly with file_check_and_advance_wb_err(). So the
> copied bit will be really only:
> 
>         ret = sync_mapping_buffers(inode->i_mapping);
>         if (!(inode->i_state & I_DIRTY_ALL))
>                 goto out;
>         if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
>                 goto out;
> 
>         err = sync_inode_metadata(inode, 1);
>         if (ret == 0)
>                 ret = err;
> 
> > If this is deemed to be OK, then I will go ahead and include this as a
> > separate patch in my series.
> 
> Yes, please.

Heh!

I had just finished writing and testing exactly that (attached).
Anyway, I will include it in v6. :)
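
The recursion in the lockdep report Ted posted can be reproduced outside
the kernel with a small userspace model. This is only a sketch under
loose assumptions: a POSIX error-checking mutex stands in for
inode_lock() (which is really an rw_semaphore and would simply hang
rather than report an error), and struct model_inode, dio_write(),
fsync_with_lock() and fsync_without_lock() are hypothetical stand-ins
for ext4_dio_write_iter(), __generic_file_fsync() and the patched
no-journal path of ext4_sync_file().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for an inode; not a kernel structure. */
struct model_inode {
	pthread_mutex_t lock;	/* plays the role of inode_lock() */
	int dirty;		/* 1 while there is unsynced state */
};

static void model_inode_init(struct model_inode *inode)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* An error-checking mutex returns EDEADLK when the owning thread
	 * tries to lock it again, so the recursion is observable. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&inode->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	inode->dirty = 0;
}

/* The flushing work itself; locking is the caller's business. */
static int sync_core(struct model_inode *inode)
{
	inode->dirty = 0;
	return 0;
}

/* Old behaviour, like __generic_file_fsync(): takes the lock itself. */
static int fsync_with_lock(struct model_inode *inode)
{
	int err = pthread_mutex_lock(&inode->lock);

	if (err)
		return -err;	/* -EDEADLK when the caller already holds it */
	err = sync_core(inode);
	pthread_mutex_unlock(&inode->lock);
	return err;
}

/* New behaviour, like the patched no-journal path: no inode lock. */
static int fsync_without_lock(struct model_inode *inode)
{
	return sync_core(inode);
}

/* Like ext4_dio_write_iter(): holds the lock across the write and the
 * sync that iomap_dio_rw() triggers for O_SYNC/O_DSYNC writes. */
static int dio_write(struct model_inode *inode,
		     int (*fsync_fn)(struct model_inode *))
{
	int rc = pthread_mutex_lock(&inode->lock);
	int ret;

	assert(rc == 0);	/* the first acquisition must succeed */
	(void)rc;
	inode->dirty = 1;
	ret = fsync_fn(inode);
	pthread_mutex_unlock(&inode->lock);
	return ret;
}
```

In this model, dio_write(&inode, fsync_with_lock) returns -EDEADLK and
leaves the inode dirty, the userspace counterpart of the lockdep splat,
while dio_write(&inode, fsync_without_lock) returns 0 because the sync
work runs under the lock the writer already holds.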

--<M>--

[-- Attachment #2: 0001-ext4-update-ext4_sync_file-to-not-use-__generic_file.patch --]
[-- Type: text/x-patch, Size: 2318 bytes --]

From 4c82edb34324f91788c941956954d4e7e1886c2c Mon Sep 17 00:00:00 2001
From: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Date: Wed, 23 Oct 2019 17:43:23 +1100
Subject: [PATCH 1/2] ext4: update ext4_sync_file() to not use
 __generic_file_fsync()

When the filesystem is created without a journal, ext4_sync_file()
eventually calls into __generic_file_fsync() to write out all modified
in-core data to the permanent storage device. That function takes
inode_lock() while it synchronizes the file's buffers and their
associated metadata.

Generally this is fine. However, it becomes a problem when higher-level
code has already acquired inode_lock(), because the second acquisition
is a recursive lock. This is exactly what happens once direct I/O is
ported to the iomap infrastructure: ext4_dio_write_iter() takes
inode_lock() early in the I/O and holds it until the I/O has completed,
while iomap_dio_rw() calls generic_write_sync() for O_SYNC/O_DSYNC
writes, which ends up in ext4_sync_file() with the lock still held.

To avoid this, stop calling __generic_file_fsync() and perform the
necessary synchronization directly within ext4_sync_file(), without
taking inode_lock(). The file_write_and_wait_range() call is also moved
ahead of the !journal branch so that the journalled and non-journalled
paths share it.

Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
---
 fs/ext4/fsync.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 5508baa11bb6..9e11868e82f9 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -116,8 +116,21 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 	}
 
+	ret = file_write_and_wait_range(file, start, end);
+	if (ret)
+		return ret;
+
 	if (!journal) {
-		ret = __generic_file_fsync(file, start, end, datasync);
+		ret = sync_mapping_buffers(inode->i_mapping);
+		if (!(inode->i_state & I_DIRTY_ALL))
+			goto out;
+		if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
+			goto out;
+
+		err = sync_inode_metadata(inode, 1);
+		if (!ret)
+			ret = err;
+
 		if (!ret)
 			ret = ext4_sync_parent(inode);
 		if (test_opt(inode->i_sb, BARRIER))
@@ -125,9 +138,6 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 	}
 
-	ret = file_write_and_wait_range(file, start, end);
-	if (ret)
-		return ret;
 	/*
 	 * data=writeback,ordered:
 	 *  The caller's filemap_fdatawrite()/wait will sync the data.
-- 
2.20.1
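
The check that the patch copies into the no-journal branch decides
whether the inode itself still has to be written once the data and the
mapping's buffers have been flushed. Below is a minimal userspace model
of just that decision. The MODEL_I_DIRTY_* bit values are chosen for
this sketch only and should not be assumed to match the kernel's
I_DIRTY_* flags in include/linux/fs.h, and model_needs_inode_write() is
a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>

/* Sketch-only bit values; do not assume they match include/linux/fs.h. */
#define MODEL_I_DIRTY_SYNC	(1u << 0) /* change fdatasync may skip (e.g. timestamps) */
#define MODEL_I_DIRTY_DATASYNC	(1u << 1) /* change fdatasync must persist (e.g. i_size) */
#define MODEL_I_DIRTY_PAGES	(1u << 2) /* dirty page-cache pages */
#define MODEL_I_DIRTY_TIME	(1u << 3) /* lazytime-deferred timestamps */
#define MODEL_I_DIRTY_ALL	(MODEL_I_DIRTY_SYNC | MODEL_I_DIRTY_DATASYNC | \
				 MODEL_I_DIRTY_PAGES | MODEL_I_DIRTY_TIME)

/* True when the no-journal path would go on to call
 * sync_inode_metadata(), mirroring the two "goto out" checks. */
static bool model_needs_inode_write(unsigned int i_state, bool datasync)
{
	if (!(i_state & MODEL_I_DIRTY_ALL))
		return false;	/* nothing dirty at all */
	if (datasync && !(i_state & MODEL_I_DIRTY_DATASYNC))
		return false;	/* fdatasync() and no data-relevant change */
	return true;
}
```

For example, an fdatasync() on an inode whose only pending change is a
timestamp update (the I_DIRTY_SYNC case) skips sync_inode_metadata(),
which is how fdatasync() avoids an inode write that fsync() would
perform.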


