From: Jan Kara <jack@suse.cz>
To: harshad shirwadkar <harshadshirwadkar@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
Ext4 Developers List <linux-ext4@vger.kernel.org>,
Ritesh Harjani <riteshh@linux.ibm.com>,
"Theodore Y. Ts'o" <tytso@mit.edu>
Subject: Re: [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait
Date: Mon, 21 Mar 2022 15:06:17 +0100 [thread overview]
Message-ID: <20220321140617.wc2ohrorqny4leyc@quack3.lan> (raw)
In-Reply-To: <CAD+ocbzGsf3=2OK5MD_MyF=SyV63q1Z7Vg5VtkaE5FzmZ7_qqw@mail.gmail.com>
Sorry for delayed reply, I've got dragged into other things and somewhat
forgot about this...
On Fri 11-03-22 00:25:48, harshad shirwadkar wrote:
> Hmm, after removing ext4_fc_track_inode() from ext4_journal_start(), I
> see one deadlock - there are some places in code where
> ext4_mark_inode_dirty gets called while holding i_data_sem. Commit
> path requires i_data_sem to commit inode data (via ext4_map_blocks).
> So, if an inode is being committed, ext4_mark_inode_dirty may start
> waiting for the inode to commit while holding i_data_sem and the
> commit path may wait on i_data_sem.
Indeed, that is a problem.
> The right way to fix this is to
> call ext4_fc_track_inode in such places before acquiring i_data_sem in
> write mode. But that would mean we sprinkle code with more
> ext4_fc_track_inode() calls which is something that I preferably
> wanted to avoid.
So I agree calling ext4_fc_track_inode() from ext4_reserve_inode_write()
isn't going to fly as is. We need to find a better way of doing things.
> This makes me wonder though, for fast commits, is it a terrible idea
> to extend the meaning of ext4_journal_start() from "start a new
> handle" to "start a new handle with an intention to modify the passed
> inode"? Most of the handles modify only one inode, and for other
> places where we do modify multiple inodes, ext4_reserve_inode_write()
> would ensure that those inodes are tracked as well.
Well but to avoid deadlocks like you've described above you would have to
start tracking inode with explicit ext4_fc_track_inode() calls before
grabbing i_data_sem. ext4_reserve_inode_write() would be only useful for an
assertion check that indeed the inode is already tracked.
> All of the
> existing places where inode gets modified after grabbing i_data_sem,
> i_data_sem is grabbed only after starting the handle. This would take
> care of the deadlock mentioned above and similar deadlocks. Another
> advantage with doing this is that developers wouldn't need to worry
> about adding explicit ext4_fc_track_inode() calls for future changes.
Well, at least for the simple case where only one inode is modified. But I
agree that's a majority.
> If we decide to do this, we would need to do a thorough code review to
> ensure that the above rule is followed everywhere. But note that
> ext4_fc_track_inode() is idempotent, so it doesn't matter if this
> function gets called multiple times in the same handle. So to avoid
> breaking fast commits, we can be super careful and in the first
> version, we can have ext4_fc_track_range() calls in
> ext4_reserve_inode_dirty(), ext4_journal_start(), inline.c and in
> handles where i_data_sem gets acquired in write mode. We can then
> carefully evaluate each code path and remove redundant
> ext4_fc_track_range() calls.
As I wrote above, if we go this path, I'd be for ext4_fc_track_inode()
calls in ext4_journal_start() and then adding explicit calls to
ext4_fc_track_inode() where additionally needed and have only assertion
checks in ext4_reserve_inode_dirty() and other places which modify inode
metadata, to catch places which need explicit calls to
ext4_fc_track_inode(). That way we won't have any hidden deadlocks in the
code waiting to happen.
I have also another proposal for consideration how we could handle this.
Mostly branstorming now: We could also drop the need for fastcommit code to
acquire i_data_sem during commit. We could use just information in extent
status tree to provide block mapping for the fastcommit code (that does not
need i_data_sem). The only thing is we'd need to make sure modified extents
from the status tree are not evicted from memory before the appropriate
transaction commits so that they are available for appropriate fastcommit -
for that we'd probably need to add TID of the transaction that last
modified an extent and add check into the shrinker to avoid shrinking
uncommitted extents. As a bonus, we could now add to fastcommit only
extents with appropriate TID and thus save on extent logging for sparsely
modified inode.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2022-03-21 14:13 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-08 16:33 [PATCH v2 0/5] ext4: improve commit path performance for fast commit Harshad Shirwadkar
2022-03-08 16:33 ` [PATCH v2 1/5] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
2022-03-09 10:10 ` Jan Kara
2022-03-11 4:02 ` harshad shirwadkar
2022-03-08 16:33 ` [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait Harshad Shirwadkar
2022-03-09 10:14 ` Jan Kara
2022-03-11 4:17 ` harshad shirwadkar
2022-03-11 8:25 ` harshad shirwadkar
2022-03-21 14:06 ` Jan Kara [this message]
2022-04-18 21:02 ` harshad shirwadkar
2022-03-08 16:33 ` [PATCH v2 3/5] ext4: rework fast commit commit path Harshad Shirwadkar
2022-03-09 10:17 ` Jan Kara
2022-03-08 16:33 ` [PATCH v2 4/5] ext4: drop i_fc_updates from inode fc info Harshad Shirwadkar
2022-03-09 10:17 ` Jan Kara
2022-03-08 16:33 ` [PATCH v2 5/5] ext4: update code documentation Harshad Shirwadkar
2022-03-09 10:19 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220321140617.wc2ohrorqny4leyc@quack3.lan \
--to=jack@suse.cz \
--cc=harshadshirwadkar@gmail.com \
--cc=linux-ext4@vger.kernel.org \
--cc=riteshh@linux.ibm.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox