From: Jan Kara <jack@suse.cz>
To: Martijn Coenen <maco@android.com>
Cc: Jan Kara <jack@suse.cz>, Al Viro <viro@zeniv.linux.org.uk>,
Jens Axboe <axboe@kernel.dk>,
miklos@szeredi.hu, tj@kernel.org, linux-fsdevel@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>,
kernel-team@android.com
Subject: Re: Writeback bug causing writeback stalls
Date: Fri, 22 May 2020 17:36:15 +0200 [thread overview]
Message-ID: <20200522153615.GF14199@quack2.suse.cz> (raw)
In-Reply-To: <CAB0TPYF+Nqd63Xf_JkuepSJV7CzndBw6_MUqcnjusy4ztX24hQ@mail.gmail.com>
On Fri 22-05-20 17:23:30, Martijn Coenen wrote:
> [ dropped android-storage-core@google.com from CC: since that list
> can't receive emails from outside google.com - sorry about that ]
>
> Hi Jan,
>
> On Fri, May 22, 2020 at 4:41 PM Jan Kara <jack@suse.cz> wrote:
> > > The easiest way to fix this, I think, is to call requeue_inode() at the end of
> > > writeback_single_inode(), much like it is called from writeback_sb_inodes().
> > > However, requeue_inode() has the following ominous warning:
> > >
> > > /*
> > > * Find proper writeback list for the inode depending on its current state and
> > > * possibly also change of its state while we were doing writeback. Here we
> > > * handle things such as livelock prevention or fairness of writeback among
> > > * inodes. This function can be called only by flusher thread - noone else
> > > * processes all inodes in writeback lists and requeueing inodes behind flusher
> > > * thread's back can have unexpected consequences.
> > > */
> > >
> > > Obviously this is very critical code both from a correctness and a performance
> > > point of view, so I wanted to run this by the maintainers and folks who have
> > > contributed to this code first.
> >
> > Sadly, the fix won't be so easy. The main problem with calling
> > requeue_inode() from writeback_single_inode() is that if there's parallel
> > sync(2) call, inode->i_io_list is used to track all inodes that need writing
> > before sync(2) can complete. So requeueing inodes in parallel while sync(2)
> > runs can result in breaking data integrity guarantees of it.
>
> Ah, makes sense.
>
> > But I agree
> > we need to find some mechanism to safely move inode to appropriate dirty
> > list reasonably quickly.
> >
> > Probably I'd add an inode state flag telling that inode is queued for
> > writeback by flush worker and we won't touch dirty lists in that case,
> > otherwise we are safe to update current writeback list as needed. I'll work
> > on fixing this as when I was reading the code I've noticed there are other
> > quirks in the code as well. Thanks for the report!
>
> Thanks! While looking at the code I also saw some other paths that
> appeared to be racy, though I haven't worked them out in detail to
> confirm that - the locking around the inode and writeback lists is
> tricky. What's the best way to follow up on those? Happy to post them
> to this same thread after I spend a bit more time looking at the code.
Sure, if you are aware some some other problems, just write them to this
thread. FWIW stuff that I've found so far:
1) __I_DIRTY_TIME_EXPIRED setting in move_expired_inodes() can get lost as
there are other places doing RMW modifications of inode->i_state.
2) sync(2) is prone to livelocks as when we queue inodes from b_dirty_time
list, we don't take dirtied_when into account (and that's the only thing
that makes sure aggressive dirtier cannot livelock sync).
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2020-05-22 15:36 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-22 9:57 Writeback bug causing writeback stalls Martijn Coenen
2020-05-22 14:41 ` Jan Kara
2020-05-22 15:23 ` Martijn Coenen
2020-05-22 15:36 ` Jan Kara [this message]
2020-05-23 8:15 ` Martijn Coenen
2020-05-25 7:31 ` Jan Kara
2020-05-27 8:14 ` Martijn Coenen
2020-05-29 15:20 ` Jan Kara
2020-05-29 19:37 ` Martijn Coenen
2020-06-01 9:09 ` Jan Kara
2020-06-02 12:16 ` Martijn Coenen
[not found] ` <20200524140522.14196-1-hdanton@sina.com>
2020-05-25 7:38 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200522153615.GF14199@quack2.suse.cz \
--to=jack@suse.cz \
--cc=axboe@kernel.dk \
--cc=kernel-team@android.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maco@android.com \
--cc=miklos@szeredi.hu \
--cc=tj@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.