From: Chris Mason <chris.mason@oracle.com>
To: Mingming <cmm@us.ibm.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC][PATCH]btrfs delete ordered inode handling fix
Date: Thu, 22 May 2008 10:11:19 -0400 [thread overview]
Message-ID: <200805221011.19568.chris.mason@oracle.com> (raw)
In-Reply-To: <1211408996.8596.1.camel@BVR-FS.beaverton.ibm.com>
On Wednesday 21 May 2008, Mingming wrote:
> Hi Chris, I thought I spotted a few bugs, While looking at how to
> properly remove inode from ordered tree, let me know if I got it right.
Hi Mingming, thanks for going through this code. The i_count rules in the
current code should work like this:
* btrfs_add_ordered_inode calls igrab when the inode is inserted into the
list. The whole time the inode is on the list, there's an extra reference on
i_count. There will be no final iput while the inode is on the list.
* btrfs_find_first_ordered_inode and btrfs_find_del_ordered_inode both
increment i_count before returning the inode. This gives the caller an extra
reference on i_count in addition to the one held by the list.
* ordered-data.c never drops i_count. This allows (and forces) the caller to
decide when it is safe to call iput because callers have different locks held
and because the last iput might can in theory trigger delete_inode.
All of the current callers of btrfs_del_ordered_inode already have a i_count
increased before they call it. So, they can safely just do an atomic_dec on
i_count and let their own iputs later do the delete_inode.
>
> * inode->i_count accounting issue
> - remove extra i_count_decrease after calling
> filemap_file_write_and_wait() in btrfs_write_ordered_inodes()
There's an atomic_dec of i_count to drop the reference from
btrfs_find_del_first_ordered_inode, and an iput to drop the reference added
by igrab when the inode was put on the list.
>
> - Remove extra i_count decrease after each call
> btrfs_del_ordered_inode(), it decrease the i_count again. I couldn't
> find the where the i_count being increased. The tree_lock should
> protecting it races with btrfs_write_ordered_inodes(), no?
The increase comes from igrab on list insertion
>
> * There is possible race with inode delete and
> btrfs_find_first_ordered_inode(). The inode could possibly in the
> process of freeing while we are trying to get hold of it during commit
> transaction. The fix is using igrab() instead, and search for next inode
> in the tree if the found one is in the middle of being released.
These kinds of races where the main reason why I had the list take a reference
on the inode. delete_inode won't be called while i_count is increased.
Over the long term I'd prefer to move the ordered-data list to a model where
the list doesn't have a reference and it is magically removed after all the
dirty pages are gone (by the end_io_hook handlers in inode.c). The end_io
hooks in inode.c may be sufficient for this.
>
> * get rid of btrfs_put_inode(), and move the functionality under the
> btrfs_del_ordered_inode() directly.
I like this change, thanks.
>
> * Remove the inode from ordered tree at last iput(). Did not do it at
> file release() time, as it may remove the inode from the ordered tree
> before ensure the ordering of write to the same inode from other
> process.
>
> Perhaps calling btrfs_del_ordered_inode() under unlink() is enough, but
> it would not be hurt to do it again at delete_inode() time.
I'm afraid we'll have to do it at file_release time, at least until the
ordered list is changed not to keep a reference.
>
>
> Here is the patch trying to address above issues... compiled, but not
> tested. Is ordered mode on by default?
>
Yes, there's actually no way to turn it off ;)
-chris
next prev parent reply other threads:[~2008-05-22 14:11 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-24 22:47 cloning file data Sage Weil
2008-04-25 13:41 ` [Btrfs-devel] " Chris Mason
2008-04-25 16:50 ` Zach Brown
2008-04-25 16:58 ` Chris Mason
2008-04-25 17:04 ` Zach Brown
2008-04-25 16:50 ` Zach Brown
2008-04-25 18:32 ` Sage Weil
2008-04-25 18:26 ` Sage Weil
2008-04-26 4:38 ` Sage Weil
2008-05-03 4:44 ` Yan Zheng
2008-05-03 6:16 ` Sage Weil
2008-05-03 6:48 ` Yan Zheng
2008-05-03 7:25 ` Yan Zheng
2008-05-05 10:27 ` Chris Mason
2008-05-05 15:57 ` Sage Weil
2008-05-21 17:19 ` btrfs_put_inode Mingming
2008-05-21 18:02 ` btrfs_put_inode Chris Mason
2008-05-21 18:45 ` btrfs_put_inode Mingming
2008-05-21 18:52 ` btrfs_put_inode Chris Mason
2008-05-21 22:29 ` [RFC][PATCH]btrfs delete ordered inode handling fix Mingming
2008-05-22 14:11 ` Chris Mason [this message]
2008-05-22 17:43 ` Mingming
2008-05-22 17:47 ` Chris Mason
2008-05-22 20:39 ` Mingming
2008-05-22 22:23 ` Chris Mason
2008-05-21 18:23 ` btrfs_put_inode Ryan Hope
2008-05-21 18:32 ` btrfs_put_inode Chris Mason
2008-05-21 19:02 ` btrfs_put_inode Mingming
2008-04-25 20:28 ` [Btrfs-devel] cloning file data Sage Weil
2008-04-29 20:52 ` Chris Mason
2008-05-02 20:50 ` Chris Mason
2008-05-02 21:38 ` Sage Weil
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200805221011.19568.chris.mason@oracle.com \
--to=chris.mason@oracle.com \
--cc=cmm@us.ibm.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.