* [PATCH] btrfs: add delayed_iput list head to btrfs inode
@ 2013-02-05 22:00 Eric Sandeen
2013-02-05 23:14 ` Zach Brown
0 siblings, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2013-02-05 22:00 UTC (permalink / raw)
To: linux-btrfs; +Cc: Jeff Mahoney
Following the lead from Jeff Mahoney's comment in the code:
/* JDM: If this is fs-wide, why can't we add a pointer to
* btrfs_inode instead and avoid the allocation? */
Remove the NOFAIL kmalloc in btrfs_add_delayed_iput(), and just
use a list head in the btrfs inode.
This does grow the btrfs inode by 16 bytes, but doesn't change
slab cache utilization on my machine. Rearranging the btrfs
inode could get back 8 bytes or so if people are worried about it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Jeff Mahoney <jeffm@suse.com>
---
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 2a8c242..3024006 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -86,6 +86,8 @@ struct btrfs_inode {
*/
struct list_head ordered_operations;
+ struct list_head delayed_iput;
+
/* node for the red-black tree that links inodes in subvolume root */
struct rb_node rb_node;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cc93b23..cac7f43 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2119,34 +2119,24 @@ zeroit:
return -EIO;
}
-struct delayed_iput {
- struct list_head list;
- struct inode *inode;
-};
-
-/* JDM: If this is fs-wide, why can't we add a pointer to
- * btrfs_inode instead and avoid the allocation? */
void btrfs_add_delayed_iput(struct inode *inode)
{
- struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
- struct delayed_iput *delayed;
+ struct btrfs_inode *b_inode = BTRFS_I(inode);
+ struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
if (atomic_add_unless(&inode->i_count, -1, 1))
return;
- delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
- delayed->inode = inode;
-
spin_lock(&fs_info->delayed_iput_lock);
- list_add_tail(&delayed->list, &fs_info->delayed_iputs);
+ list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
spin_unlock(&fs_info->delayed_iput_lock);
}
void btrfs_run_delayed_iputs(struct btrfs_root *root)
{
LIST_HEAD(list);
+ struct btrfs_inode *b_inode;
struct btrfs_fs_info *fs_info = root->fs_info;
- struct delayed_iput *delayed;
int empty;
spin_lock(&fs_info->delayed_iput_lock);
@@ -2160,10 +2150,9 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
spin_unlock(&fs_info->delayed_iput_lock);
while (!list_empty(&list)) {
- delayed = list_entry(list.next, struct delayed_iput, list);
- list_del(&delayed->list);
- iput(delayed->inode);
- kfree(delayed);
+ b_inode = list_entry(list.next, struct btrfs_inode, delayed_iput);
+ list_del(&b_inode->delayed_iput);
+ iput(&b_inode->vfs_inode);
}
}
@@ -7142,6 +7131,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
btrfs_ordered_inode_tree_init(&ei->ordered_tree);
INIT_LIST_HEAD(&ei->delalloc_inodes);
INIT_LIST_HEAD(&ei->ordered_operations);
+ INIT_LIST_HEAD(&ei->delayed_iput);
RB_CLEAR_NODE(&ei->rb_node);
return inode;
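For readers less familiar with the idiom the patch relies on, here is a minimal user-space sketch of an embedded list node, assuming simplified stand-ins for the kernel's list helpers. The toy_inode and toy_btrfs_inode types and queue_and_pop() are hypothetical and exist only for illustration; they are not the btrfs structures. The point is that the queue links the node that already lives inside the inode, and the container_of() arithmetic recovers the enclosing structure, so no per-request allocation is needed.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    struct list_head *prev = head->prev;

    head->prev = node;
    node->next = head;
    node->prev = prev;
    prev->next = node;
}

static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Hypothetical stand-ins for struct inode and struct btrfs_inode. */
struct toy_inode {
    unsigned long ino;
};

struct toy_btrfs_inode {
    struct list_head delayed_iput;  /* embedded node: nothing to allocate */
    struct toy_inode vfs_inode;
};

/*
 * Queue two toy inodes by their embedded nodes, pop the first entry, and
 * return the inode number reached through container_of().
 */
unsigned long queue_and_pop(unsigned long first, unsigned long second)
{
    struct list_head queue;
    struct toy_btrfs_inode a = { .vfs_inode = { .ino = first } };
    struct toy_btrfs_inode b = { .vfs_inode = { .ino = second } };
    struct toy_btrfs_inode *popped;

    init_list_head(&queue);
    list_add_tail(&a.delayed_iput, &queue);
    list_add_tail(&b.delayed_iput, &queue);

    popped = container_of(queue.next, struct toy_btrfs_inode, delayed_iput);
    list_del(&popped->delayed_iput);
    return popped->vfs_inode.ino;
}
```

Calling queue_and_pop(256, 257) hands back the first inode number queued. The cost of the scheme is that each inode has exactly one node, so it can sit on the queue only once at a time.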
* Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
2013-02-05 22:00 [PATCH] btrfs: add delayed_iput list head to btrfs inode Eric Sandeen
@ 2013-02-05 23:14 ` Zach Brown
2013-02-06 2:08 ` Liu Bo
0 siblings, 1 reply; 7+ messages in thread
From: Zach Brown @ 2013-02-05 23:14 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-btrfs, Jeff Mahoney
> + struct btrfs_inode *b_inode = BTRFS_I(inode);
> + struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
>
> if (atomic_add_unless(&inode->i_count, -1, 1))
> return;
>
> - delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
> - delayed->inode = inode;
> -
> spin_lock(&fs_info->delayed_iput_lock);
> - list_add_tail(&delayed->list, &fs_info->delayed_iputs);
> + list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
> spin_unlock(&fs_info->delayed_iput_lock);
> }
Hmm. I'm not great with inode life cycles, but isn't this only safe if
someone else can't get an i_count reference while this is in flight? It
looks like the final iput does the unhashing, and so on, so couldn't an
iget/iput race with this and try to add the inode's list_head twice?
- z
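To make the list_head concern concrete, here is a small user-space sketch, assuming simplified stand-ins for the kernel list helpers. It does not reproduce the iget/iput race itself; it only shows what would happen mechanically if the same embedded node were added a second time while still linked, compared with the old scheme where every request gets its own node. The struct wrapper type and both demo functions are hypothetical names used for illustration.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    struct list_head *prev = head->prev;

    head->prev = node;
    node->next = head;
    node->prev = prev;
    prev->next = node;
}

/* Count entries reachable via ->next; give up with -1 after limit steps. */
static int count_forward(const struct list_head *head, int limit)
{
    const struct list_head *pos;
    int n = 0;

    for (pos = head->next; pos != head; pos = pos->next)
        if (++n > limit)
            return -1;
    return n;
}

/* Old scheme: one separately allocated node per request (stack here). */
struct wrapper {
    struct list_head list;
    int inode_id;
};

/* Queueing the same inode twice is harmless: each request has its own node. */
int demo_separate_nodes(void)
{
    struct list_head queue;
    struct wrapper a1 = { .inode_id = 1 };
    struct wrapper b = { .inode_id = 2 };
    struct wrapper a2 = { .inode_id = 1 };

    init_list_head(&queue);
    list_add_tail(&a1.list, &queue);
    list_add_tail(&b.list, &queue);
    list_add_tail(&a2.list, &queue);
    return count_forward(&queue, 16);
}

/* Embedded scheme: re-adding a node that is still linked rewires the list. */
int demo_embedded_node_added_twice(void)
{
    struct list_head queue, a, b;

    init_list_head(&queue);
    list_add_tail(&a, &queue);
    list_add_tail(&b, &queue);
    list_add_tail(&a, &queue);  /* a is still on the queue */
    return count_forward(&queue, 16);
}
```

In this sketch the first demo sees all three requests, while the second can reach only one entry walking forward: b has dropped out of the forward chain, and the backward pointers no longer lead back to the head.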
* Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
2013-02-05 23:14 ` Zach Brown
@ 2013-02-06 2:08 ` Liu Bo
2013-02-06 14:14 ` Eric Sandeen
2013-02-06 15:53 ` Eric Sandeen
0 siblings, 2 replies; 7+ messages in thread
From: Liu Bo @ 2013-02-06 2:08 UTC (permalink / raw)
To: Zach Brown; +Cc: Eric Sandeen, linux-btrfs, Jeff Mahoney
On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
> > + struct btrfs_inode *b_inode = BTRFS_I(inode);
> > + struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
> >
> > if (atomic_add_unless(&inode->i_count, -1, 1))
> > return;
> >
> > - delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
> > - delayed->inode = inode;
> > -
> > spin_lock(&fs_info->delayed_iput_lock);
> > - list_add_tail(&delayed->list, &fs_info->delayed_iputs);
> > + list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
> > spin_unlock(&fs_info->delayed_iput_lock);
> > }
>
> Hmm. I'm not great with inode life cycles, but isn't this only safe if
> someone else can't get an i_count reference while this is in flight? It
> looks like the final iput does the unhashing, and so on, so couldn't an
> iget/iput race with this and try to add the inode's list_head twice?
Yeah, same concern here. Basically this will result in inodes still being
in use on unmount.
Actually I did a similar one; here is some discussion:
https://patchwork.kernel.org/patch/1824711/
thanks,
liubo
* Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
2013-02-06 2:08 ` Liu Bo
@ 2013-02-06 14:14 ` Eric Sandeen
2013-02-06 15:53 ` Eric Sandeen
1 sibling, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2013-02-06 14:14 UTC (permalink / raw)
To: bo.li.liu@oracle.com; +Cc: Zach Brown, Eric Sandeen, linux-btrfs, Jeff Mahoney
On Feb 5, 2013, at 8:11 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
>>> + struct btrfs_inode *b_inode = BTRFS_I(inode);
>>> + struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
>>>
>>> if (atomic_add_unless(&inode->i_count, -1, 1))
>>> return;
>>>
>>> - delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
>>> - delayed->inode = inode;
>>> -
>>> spin_lock(&fs_info->delayed_iput_lock);
>>> - list_add_tail(&delayed->list, &fs_info->delayed_iputs);
>>> + list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
>>> spin_unlock(&fs_info->delayed_iput_lock);
>>> }
>>
>> Hmm. I'm not great with inode life cycles, but isn't this only safe if
>> someone else can't get an i_count reference while this is in flight? It
>> looks like the final iput does the unhashing, and so on, so couldn't an
>> iget/iput race with this and try to add the inode's list_head twice?
>
> Yeah, same concern here. Basically this will result in inodes still being
> in use on unmount.
>
> Actually I did a similar one; here is some discussion:
>
> https://patchwork.kernel.org/patch/1824711/
>
Ok, thanks all. We should remove Jeff's comment then, it sure sounded like a good idea...
Eric
> thanks,
> liubo
* Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
2013-02-06 2:08 ` Liu Bo
2013-02-06 14:14 ` Eric Sandeen
@ 2013-02-06 15:53 ` Eric Sandeen
2013-02-06 16:02 ` Liu Bo
1 sibling, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2013-02-06 15:53 UTC (permalink / raw)
To: bo.li.liu; +Cc: Zach Brown, linux-btrfs, Jeff Mahoney
On 2/5/13 8:08 PM, Liu Bo wrote:
> On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
>>> + struct btrfs_inode *b_inode = BTRFS_I(inode);
>>> + struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
>>>
>>> if (atomic_add_unless(&inode->i_count, -1, 1))
>>> return;
>>>
>>> - delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
>>> - delayed->inode = inode;
>>> -
>>> spin_lock(&fs_info->delayed_iput_lock);
>>> - list_add_tail(&delayed->list, &fs_info->delayed_iputs);
>>> + list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
>>> spin_unlock(&fs_info->delayed_iput_lock);
>>> }
>>
>> Hmm. I'm not great with inode life cycles, but isn't this only safe if
>> someone else can't get an i_count reference while this is in flight? It
>> looks like the final iput does the unhashing, and so on, so couldn't an
>> iget/iput race with this and try to add the inode's list_head twice?
>
> Yeah, same concern here. Basically this will result in inodes still being
> in use on unmount.
>
> Actually I did a similar one; here is some discussion:
>
> https://patchwork.kernel.org/patch/1824711/
I read it, thanks. Did you try the counter approach?
-Eric
> thanks,
> liubo
* Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
2013-02-06 15:53 ` Eric Sandeen
@ 2013-02-06 16:02 ` Liu Bo
2013-02-12 7:34 ` Jeff Mahoney
0 siblings, 1 reply; 7+ messages in thread
From: Liu Bo @ 2013-02-06 16:02 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Zach Brown, linux-btrfs, Jeff Mahoney
On Wed, Feb 06, 2013 at 09:53:05AM -0600, Eric Sandeen wrote:
> On 2/5/13 8:08 PM, Liu Bo wrote:
> > On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
> >>> + struct btrfs_inode *b_inode = BTRFS_I(inode);
> >>> + struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
> >>>
> >>> if (atomic_add_unless(&inode->i_count, -1, 1))
> >>> return;
> >>>
> >>> - delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
> >>> - delayed->inode = inode;
> >>> -
> >>> spin_lock(&fs_info->delayed_iput_lock);
> >>> - list_add_tail(&delayed->list, &fs_info->delayed_iputs);
> >>> + list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
> >>> spin_unlock(&fs_info->delayed_iput_lock);
> >>> }
> >>
> >> Hmm. I'm not great with inode life cycles, but isn't this only safe if
> >> someone else can't get an i_count reference while this is in flight? It
> >> looks like the final iput does the unhashing, and so on, so couldn't an
> >> iget/iput race with this and try to add the inode's list_head twice?
> >
> > Yeah, same concern here. Basically this will result in inodes still being
> > in use on unmount.
> >
> > Actually I did a similar one; here is some discussion:
> >
> > https://patchwork.kernel.org/patch/1824711/
>
> I read it, thanks. Did you try the counter approach?
Yes, but it brings a tradeoff.
With a counter, we need to hold the list lock the whole time instead of
splicing the list and unlocking it. I think the splice would be
faster, so I didn't go further (I MIGHT be wrong on this).
thanks,
liubo
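The splice-then-unlock pattern referred to above can be sketched in user space as follows. This is an illustration under stated assumptions, not the btrfs code: the toy_queue type, its atomic_flag spinlock, the lock_acquisitions counter, and the demo functions are all hypothetical, and the list helpers are simplified stand-ins for the kernel's. The drain takes the lock once, moves every pending entry to a private list, drops the lock, and only then processes the entries.

```c
#include <stdatomic.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    struct list_head *prev = head->prev;

    head->prev = node;
    node->next = head;
    node->prev = prev;
    prev->next = node;
}

static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Move every entry from src onto the tail of dst and leave src empty. */
static void list_splice_tail_init(struct list_head *src, struct list_head *dst)
{
    if (list_empty(src))
        return;
    src->next->prev = dst->prev;
    dst->prev->next = src->next;
    src->prev->next = dst;
    dst->prev = src->prev;
    init_list_head(src);
}

/* Hypothetical queue guarded by a tiny spinlock; counts lock acquisitions. */
struct toy_queue {
    atomic_flag lock;
    struct list_head pending;
    int lock_acquisitions;
};

static void queue_lock(struct toy_queue *q)
{
    while (atomic_flag_test_and_set(&q->lock))
        ;  /* spin until the flag was previously clear */
    q->lock_acquisitions++;
}

static void queue_unlock(struct toy_queue *q)
{
    atomic_flag_clear(&q->lock);
}

/* Take the lock once, splice to a private list, then work unlocked. */
static int drain_with_splice(struct toy_queue *q)
{
    struct list_head local;
    int processed = 0;

    init_list_head(&local);
    queue_lock(q);
    list_splice_tail_init(&q->pending, &local);
    queue_unlock(q);

    while (!list_empty(&local)) {
        list_del(local.next);
        processed++;  /* stand-in for the per-entry iput() */
    }
    return processed;
}

static void run_demo(int *processed, int *locks_during_drain)
{
    struct toy_queue q;
    struct list_head nodes[3];
    int i, before;

    atomic_flag_clear(&q.lock);
    init_list_head(&q.pending);
    q.lock_acquisitions = 0;

    for (i = 0; i < 3; i++) {
        queue_lock(&q);
        list_add_tail(&nodes[i], &q.pending);
        queue_unlock(&q);
    }
    before = q.lock_acquisitions;
    *processed = drain_with_splice(&q);
    *locks_during_drain = q.lock_acquisitions - before;
}

int demo_drain_processed(void)
{
    int processed, locks;

    run_demo(&processed, &locks);
    return processed;
}

int demo_drain_lock_acquisitions(void)
{
    int processed, locks;

    run_demo(&processed, &locks);
    return locks;
}
```

However many entries are pending, the drain in this sketch acquires the lock a single time, which is the property that would be lost if each entry had to be examined or updated under the lock.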
* Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
2013-02-06 16:02 ` Liu Bo
@ 2013-02-12 7:34 ` Jeff Mahoney
0 siblings, 0 replies; 7+ messages in thread
From: Jeff Mahoney @ 2013-02-12 7:34 UTC (permalink / raw)
To: bo.li.liu; +Cc: Eric Sandeen, Zach Brown, linux-btrfs
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 2/6/13 11:02 AM, Liu Bo wrote:
> On Wed, Feb 06, 2013 at 09:53:05AM -0600, Eric Sandeen wrote:
>> On 2/5/13 8:08 PM, Liu Bo wrote:
>>> On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
>>>>> + struct btrfs_inode *b_inode = BTRFS_I(inode);
>>>>> + struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
>>>>>
>>>>> if (atomic_add_unless(&inode->i_count, -1, 1))
>>>>> return;
>>>>>
>>>>> - delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
>>>>> - delayed->inode = inode;
>>>>> -
>>>>> spin_lock(&fs_info->delayed_iput_lock);
>>>>> - list_add_tail(&delayed->list, &fs_info->delayed_iputs);
>>>>> + list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
>>>>> spin_unlock(&fs_info->delayed_iput_lock);
>>>>> }
>>>>
>>>> Hmm. I'm not great with inode life cycles, but isn't this
>>>> only safe if someone else can't get an i_count reference
>>>> while this is in flight? It looks like the final iput does
>>>> the unhashing, and so on, so couldn't an iget/iput race with
>>>> this and try to add the inode's list_head twice?
>>>
>>> Yeah, same concern here. Basically this will result in inodes
>>> still being in use on unmount.
>>>
>>> Actually I did a similar one; here is some discussion:
>>>
>>> https://patchwork.kernel.org/patch/1824711/
>>
>> I read it, thanks. Did you try the counter approach?
>
> Yes, it'll bring a tradeoff situation.
>
> With counter, we need to lock the list all the time instead of
> doing a splice on the list and unlocking it. I think splice would
> be faster so I didn't go further(I MIGHT be wrong on this)..
Thanks for looking into this. I left this note to myself during the
development of the error handling patches while on a tangent to try to
eliminate NOFAIL allocs. It's not the alloc/free that's the issue
(though eliminating these can probably only help), it's that NOFAIL
allocs essentially become locks when memory pressure is high enough
that the NOFAIL functionality gets invoked. OTOH, bailing out of that
path when we encounter an allocation failure is impossible.
- -Jeff
- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.18 (Darwin)
Comment: GPGTools - http://gpgtools.org
iQIcBAEBAgAGBQJRGfB3AAoJEB57S2MheeWy/E4QALVJ2YI1zbwCHnkUia+yuT40
LoYfyRJoTiKwnwiFeByy98tX9WxVnXGZUVpR8GMwVuLfDIMyVgQmaAicqiirHHHD
ySNV3jsyz8HCOb6ALu7eQyWy4F8yBD1HG75njvvzVO+zUlSsaKGmfvsXS0f4ubCk
hyxg7OujW++cWg+WOedCZsg2n7kF34MLPJiyjS1E1vw8DZW3tHKWgv/hyJIzp+JK
wIZQPrzNUTp0kS4N6+b8rJnXTNkj7zMhWPYeJdIMIG9/+oDr2r1N/XedYMY7fkdS
g7Gj28nmTtufYlTcgztL6MHFwxm/tRQNl85+lRU/zYFKIR0ok4+1kFrpZ5KcF97m
NZeGSsSiaZfMXE+t6B/AgagFJUws+y/RHBJ/V9paMNjsojLRUBVPQOdeHw355XVm
lJeTtyElA+SSawPkzf2115IEj1EgFmHIouSQJdUCPoTfS126NHhH0PYX2GHgAs8b
1ImyG9E/Z/JswVRzAxWGQSffdxzg5Vb8P8w7LzAlIdToVa0tM3Q2n9h3a0vcl83m
NQEqe3+GnsflB2xSVyoztVx+ZL8664HC1UzIjgb7oUihGHe7gJZ4uqDgaClGprKh
pQyvr8zsbjeMwpvlqv7gRQDFyY3JKK4W5UeS/pGjTM7ORS1LmEUTR5S4pQknTUgc
Qj/bH6806My5pW3VB5i5
=ZSdX
-----END PGP SIGNATURE-----