linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Martijn Coenen <maco@android.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	Jens Axboe <axboe@kernel.dk>,
	miklos@szeredi.hu, tj@kernel.org, linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	android-storage-core@google.com, kernel-team@android.com
Subject: Re: Writeback bug causing writeback stalls
Date: Fri, 22 May 2020 16:41:00 +0200	[thread overview]
Message-ID: <20200522144100.GE14199@quack2.suse.cz> (raw)
In-Reply-To: <CAB0TPYGCOZmixbzrV80132X=V5TcyQwD6V7x-8PKg_BqCva8Og@mail.gmail.com>

Hi!

On Fri 22-05-20 11:57:42, Martijn Coenen wrote:
<snip>

> So, the sequence of events is something like this. Let's assume the inode is
> already on b_dirty_time for valid reasons. Then:
> 
> CPU1                                          CPU2
> fuse_flush()
>   write_inode_now()
>     writeback_single_inode()
>       sets I_SYNC
>         __writeback_single_inode()
>           writes back data
>           clears inode dirty flags
>           unlocks inode
>           calls mark_inode_dirty_sync()
>             sets I_DIRTY_SYNC, but doesn't
>             update wb list because I_SYNC is
>             still set
>                                               write() // somebody else writes
>                                               mark_inode_dirty(I_DIRTY_PAGES)
>                                               sets I_DIRTY_PAGES on i_state
>                                               doesn't update wb list,
>                                               because I_SYNC set
>       locks inode again
>       sees inode is still dirty,
>       doesn't touch WB list
>       clears I_SYNC
> 
> So now we have an inode on b_dirty_time with I_DIRTY_PAGES | I_DIRTY_SYNC set,
> and subsequent calls to mark_inode_dirty() with either I_DIRTY_PAGES or
> I_DIRTY_SYNC will do nothing to change that. The flusher won't touch
> the inode either,
> because it's not on a b_dirty or b_io list.

Thanks for the detailed analysis and explanation! I agree that what you
describe is a bug in the writeback code.

> The easiest way to fix this, I think, is to call requeue_inode() at the end of
> writeback_single_inode(), much like it is called from writeback_sb_inodes().
> However, requeue_inode() has the following ominous warning:
> 
> /*
>  * Find proper writeback list for the inode depending on its current state and
>  * possibly also change of its state while we were doing writeback.  Here we
>  * handle things such as livelock prevention or fairness of writeback among
>  * inodes. This function can be called only by flusher thread - noone else
>  * processes all inodes in writeback lists and requeueing inodes behind flusher
>  * thread's back can have unexpected consequences.
>  */
> 
> Obviously this is very critical code both from a correctness and a performance
> point of view, so I wanted to run this by the maintainers and folks who have
> contributed to this code first.

Sadly, the fix won't be so easy. The main problem with calling
requeue_inode() from writeback_single_inode() is that if there's parallel
sync(2) call, inode->i_io_list is used to track all inodes that need writing
before sync(2) can complete. So requeueing inodes in parallel while sync(2)
runs can result in breaking data integrity guarantees of it. But I agree
we need to find some mechanism to safely move inode to appropriate dirty
list reasonably quickly.

Probably I'd add an inode state flag telling that inode is queued for
writeback by flush worker and we won't touch dirty lists in that case,
otherwise we are safe to update current writeback list as needed. I'll work
on fixing this as when I was reading the code I've noticed there are other
quirks in the code as well. Thanks for the report!

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2020-05-22 14:41 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-22  9:57 Writeback bug causing writeback stalls Martijn Coenen
2020-05-22 14:41 ` Jan Kara [this message]
2020-05-22 15:23   ` Martijn Coenen
2020-05-22 15:36     ` Jan Kara
2020-05-23  8:15       ` Martijn Coenen
2020-05-25  7:31         ` Jan Kara
2020-05-27  8:14           ` Martijn Coenen
2020-05-29 15:20             ` Jan Kara
2020-05-29 19:37               ` Martijn Coenen
2020-06-01  9:09                 ` Jan Kara
2020-06-02 12:16                   ` Martijn Coenen
     [not found] ` <20200524140522.14196-1-hdanton@sina.com>
2020-05-25  7:38   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200522144100.GE14199@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=android-storage-core@google.com \
    --cc=axboe@kernel.dk \
    --cc=kernel-team@android.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maco@android.com \
    --cc=miklos@szeredi.hu \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).