From: Dave Chinner <david@fromorbit.com>
To: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>, Jan Kara <jack@suse.cz>,
Eryu Guan <eguan@redhat.com>,
linux-kernel@vger.kernel.org, xfs@oss.sgi.com, axboe@fb.com,
Jan Kara <jack@suse.com>,
linux-fsdevel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes
Date: Tue, 25 Aug 2015 08:27:20 +1000 [thread overview]
Message-ID: <20150824222720.GD714@dastard> (raw)
In-Reply-To: <20150824181038.GA28944@mtj.duckdns.org>
On Mon, Aug 24, 2015 at 02:10:38PM -0400, Tejun Heo wrote:
> Hello, Dave.
>
> On Fri, Aug 21, 2015 at 09:04:51AM +1000, Dave Chinner wrote:
> > > Maybe I'm misunderstanding the code but all xfs_writepage() calls are
> > > from unbound workqueues - the writeback workers - while
> > > xfs_setfilesize() are from bound workqueues, so I wondered why that
> > > was and looked at the code and the setsize functions are run off of a
> > > separate work item which is queued from the end_bio callback and I
> > > can't tell who would be waiting for them. Dave, what am I missing?
> >
> > xfs_setfilesize runs transactions, so it can't be run from IO
> > completion context as it needs to block (i.e. on log space or inode
> > locks). It also can't block log IO completion, nor metadata Io
> > completion, as only log IO completion can free log space, and the
> > inode lock might be waiting on metadata buffer IO completion (e.g.
> > during delayed allocation). Hence we have multiple IO completion
> > workqueues to keep these things separated and deadlock free. i.e.
> > they all get punted to a workqueue where they are then processed in
> > a context that can block safely.
>
> I'm still a bit confused. What prevents the following from happening?
>
> 1. io completion of last dirty page of an inode and work item for
> xfs_setfilesize() is queued.
>
> 2. inode removed from dirty list.
The inode has already been removed from the dirty list - that
happens at inode writeback submission time, not IO completion.
> 3. __sync_filesystem() invokes sync_inodes_sb(). There are no dirty
> pages, so it finishes.
There are no dirty pages, but the pages aren't clean, either. i.e
they are still under writeback. Hence we need to invoke
wait_inodes_sb() to wait for writeback on all pages to complete
before returning.
> 4. xfs_fs_sync_fs() is called which calls _xfs_log_force() but the
> work item from #1 hasn't run yet, so the size update isn't written
> out.
The bug here is that wait_inodes_sb() has not been run, therefore
->syncfs is being run before IO completions have been processed and
pages marked clean.
> 5. Crash.
>
> Is it that _xfs_log_force() waits for the setfilesize transaction
> created during writepage?
No, it's wait_inodes_sb() that does the waiting for data IO
completion for sync.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Tejun Heo <tj@kernel.org>
Cc: Eryu Guan <eguan@redhat.com>, Jens Axboe <axboe@kernel.dk>,
Jan Kara <jack@suse.cz>,
linux-kernel@vger.kernel.org, xfs@oss.sgi.com, axboe@fb.com,
Jan Kara <jack@suse.com>,
linux-fsdevel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes
Date: Tue, 25 Aug 2015 08:27:20 +1000 [thread overview]
Message-ID: <20150824222720.GD714@dastard> (raw)
In-Reply-To: <20150824181038.GA28944@mtj.duckdns.org>
On Mon, Aug 24, 2015 at 02:10:38PM -0400, Tejun Heo wrote:
> Hello, Dave.
>
> On Fri, Aug 21, 2015 at 09:04:51AM +1000, Dave Chinner wrote:
> > > Maybe I'm misunderstanding the code but all xfs_writepage() calls are
> > > from unbound workqueues - the writeback workers - while
> > > xfs_setfilesize() are from bound workqueues, so I wondered why that
> > > was and looked at the code and the setsize functions are run off of a
> > > separate work item which is queued from the end_bio callback and I
> > > can't tell who would be waiting for them. Dave, what am I missing?
> >
> > xfs_setfilesize runs transactions, so it can't be run from IO
> > completion context as it needs to block (i.e. on log space or inode
> > locks). It also can't block log IO completion, nor metadata Io
> > completion, as only log IO completion can free log space, and the
> > inode lock might be waiting on metadata buffer IO completion (e.g.
> > during delayed allocation). Hence we have multiple IO completion
> > workqueues to keep these things separated and deadlock free. i.e.
> > they all get punted to a workqueue where they are then processed in
> > a context that can block safely.
>
> I'm still a bit confused. What prevents the following from happening?
>
> 1. io completion of last dirty page of an inode and work item for
> xfs_setfilesize() is queued.
>
> 2. inode removed from dirty list.
The inode has already been removed from the dirty list - that
happens at inode writeback submission time, not IO completion.
> 3. __sync_filesystem() invokes sync_inodes_sb(). There are no dirty
> pages, so it finishes.
There are no dirty pages, but the pages aren't clean, either. i.e
they are still under writeback. Hence we need to invoke
wait_inodes_sb() to wait for writeback on all pages to complete
before returning.
> 4. xfs_fs_sync_fs() is called which calls _xfs_log_force() but the
> work item from #1 hasn't run yet, so the size update isn't written
> out.
The bug here is that wait_inodes_sb() has not been run, therefore
->syncfs is being run before IO completions have been processed and
pages marked clean.
> 5. Crash.
>
> Is it that _xfs_log_force() waits for the setfilesize transaction
> created during writepage?
No, it's wait_inodes_sb() that does the waiting for data IO
completion for sync.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2015-08-24 22:27 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-12 10:12 generic/04[89] fail on XFS due to change in writeback code Eryu Guan
2015-08-12 10:27 ` Eryu Guan
2015-08-13 0:44 ` generic/04[89] fail on XFS due to change in writeback code [4.2-rc1 regression] Dave Chinner
2015-08-13 0:44 ` Dave Chinner
2015-08-13 15:34 ` Tejun Heo
2015-08-13 15:34 ` Tejun Heo
2015-08-13 19:16 ` Tejun Heo
2015-08-13 22:44 ` [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes Tejun Heo
2015-08-13 22:44 ` Tejun Heo
2015-08-14 11:14 ` Jan Kara
2015-08-14 11:14 ` Jan Kara
2015-08-14 15:14 ` Damien Wyart
2015-08-14 15:14 ` Damien Wyart
2015-08-17 20:00 ` Tejun Heo
2015-08-17 20:00 ` Tejun Heo
2015-08-18 5:33 ` Damien Wyart
2015-08-18 5:33 ` Damien Wyart
2015-08-17 20:02 ` Tejun Heo
2015-08-17 20:02 ` Tejun Heo
2015-08-18 9:16 ` Jan Kara
2015-08-18 9:16 ` Jan Kara
2015-08-18 17:47 ` Tejun Heo
2015-08-18 17:47 ` Tejun Heo
2015-08-18 19:54 ` Tejun Heo
2015-08-18 19:54 ` Tejun Heo
2015-08-18 21:56 ` Dave Chinner
2015-08-18 21:56 ` Dave Chinner
2015-08-20 6:12 ` Eryu Guan
2015-08-20 6:12 ` Eryu Guan
2015-08-20 14:01 ` Eryu Guan
2015-08-20 14:36 ` Eryu Guan
2015-08-20 14:36 ` Eryu Guan
2015-08-20 14:37 ` Eryu Guan
2015-08-20 14:37 ` Eryu Guan
2015-08-20 16:55 ` Tejun Heo
2015-08-20 16:55 ` Tejun Heo
2015-08-20 23:04 ` Dave Chinner
2015-08-20 23:04 ` Dave Chinner
2015-08-24 18:10 ` Tejun Heo
2015-08-24 18:10 ` Tejun Heo
2015-08-24 22:27 ` Dave Chinner [this message]
2015-08-24 22:27 ` Dave Chinner
2015-08-24 22:53 ` Tejun Heo
2015-08-24 22:53 ` Tejun Heo
2015-08-21 10:20 ` Eryu Guan
2015-08-21 10:20 ` Eryu Guan
2015-08-22 0:30 ` Dave Chinner
2015-08-22 0:30 ` Dave Chinner
2015-08-22 4:46 ` Eryu Guan
2015-08-22 4:46 ` Eryu Guan
2015-08-24 1:11 ` Dave Chinner
2015-08-24 1:11 ` Dave Chinner
2015-08-24 3:18 ` Eryu Guan
2015-08-24 3:18 ` Eryu Guan
2015-08-24 6:24 ` Dave Chinner
2015-08-24 6:24 ` Dave Chinner
2015-08-24 8:34 ` Eryu Guan
2015-08-24 8:34 ` Eryu Guan
2015-08-24 8:55 ` Dave Chinner
2015-08-24 8:55 ` Dave Chinner
2015-08-24 9:19 ` Jan Kara
2015-08-24 9:19 ` Jan Kara
2015-08-24 14:51 ` Tejun Heo
2015-08-24 14:51 ` Tejun Heo
2015-08-24 17:11 ` Tejun Heo
2015-08-24 17:11 ` Tejun Heo
2015-08-24 19:08 ` Jan Kara
2015-08-24 19:08 ` Jan Kara
2015-08-24 19:32 ` Tejun Heo
2015-08-24 19:32 ` Tejun Heo
2015-08-24 21:09 ` Jan Kara
2015-08-24 21:09 ` Jan Kara
2015-08-24 21:45 ` Tejun Heo
2015-08-24 21:45 ` Tejun Heo
2015-08-24 22:54 ` Tejun Heo
2015-08-24 22:54 ` Tejun Heo
2015-08-24 22:57 ` Dave Chinner
2015-08-24 22:57 ` Dave Chinner
2015-08-25 18:11 ` [PATCH v2 block/for-linus] writeback: sync_inodes_sb() must write out I_DIRTY_TIME inodes and always call wait_sb_inodes() Tejun Heo
2015-08-25 18:11 ` Tejun Heo
2015-08-25 20:37 ` Jens Axboe
2015-08-25 20:37 ` Jens Axboe
2015-08-26 9:00 ` Jan Kara
2015-08-26 9:00 ` Jan Kara
2015-08-13 23:24 ` generic/04[89] fail on XFS due to change in writeback code [4.2-rc1 regression] Tejun Heo
2015-08-13 23:24 ` Tejun Heo
2015-08-14 6:19 ` Eryu Guan
2015-08-14 6:19 ` Eryu Guan
2015-08-17 20:27 ` Tejun Heo
2015-08-17 20:27 ` Tejun Heo
2015-08-18 3:57 ` Eryu Guan
2015-08-18 3:57 ` Eryu Guan
2015-08-18 5:31 ` Eryu Guan
2015-08-18 5:31 ` Eryu Guan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150824222720.GD714@dastard \
--to=david@fromorbit.com \
--cc=axboe@fb.com \
--cc=axboe@kernel.dk \
--cc=eguan@redhat.com \
--cc=jack@suse.com \
--cc=jack@suse.cz \
--cc=kernel-team@fb.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tj@kernel.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.