From: Boaz Harrosh <bharrosh@panasas.com>
To: Christoph Hellwig <hch@infradead.org>,
Dave Chinner <david@fromorbit.com>,
Nick Piggin <npiggin@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 06/18] exofs: use iput() for inode reference count decrements
Date: Sun, 24 Oct 2010 20:06:06 +0200
Message-ID: <4CC4758E.9020102@panasas.com>
In-Reply-To: <20101017012437.GA4227@infradead.org>

On 10/17/2010 03:24 AM, Christoph Hellwig wrote:
> On Wed, Oct 13, 2010 at 10:49:46AM -0400, Boaz Harrosh wrote:
>> I suspect it's not a bug but a useless inc/dec because in all my testing
>> I have not seen an inode leak. Let me investigate if it can be removed.
>>
>> So I do not think we need it for 2.6.36.
>>
>> I'll take this patch into my 2.6.37-rcX merge window. It should appear
>> in linux-next by tomorrow. Hopefully followed by a removal patch later.
>
> It's a very real bug. If an inode goes away in-core before the creation
> on the OSD has finished, e.g. by using the drop_caches file, the
> atomic_dec instead of the iput means you will never call iput_final
> and thus leak all resources associated with the inode, as well as
> leaving it on all lists. It's not easy to hit, but very nasty when
> it is hit.
>

Hi Christoph, Dave,

As I suspected, this fix is not good, for a simple reason: create_done()
is called from scsi_done(), which runs with IRQs disabled. So when the
iput() there drops the last reference and evict() is needed, we BUG on
trying to take the i_mutex.

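The trade-off between the two decrements can be sketched in plain userspace C. This is only a model, not kernel code: `struct model_inode`, `model_raw_dec()`, `model_iput()` and the `in_atomic` flag are hypothetical names invented for illustration, and the "would sleep" result stands in for the BUG described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model type; nothing here is a real kernel structure. */
struct model_inode {
	int  count;    /* stands in for inode->i_count */
	bool evicted;  /* final put ran the (sleeping) teardown */
	bool leaked;   /* count reached zero without any teardown */
};

enum put_result {
	PUT_OK,                    /* other references remain */
	PUT_EVICTED,               /* last reference, teardown done */
	PUT_LEAKED,                /* last reference, teardown skipped */
	PUT_WOULD_SLEEP_IN_ATOMIC  /* the model's stand-in for the BUG */
};

/* Models the bare atomic_dec(): safe in any context, but a final
 * decrement never triggers teardown, so the inode is leaked. */
static enum put_result model_raw_dec(struct model_inode *mi)
{
	assert(mi->count > 0);
	if (--mi->count == 0) {
		mi->leaked = true;
		return PUT_LEAKED;
	}
	return PUT_OK;
}

/* Models iput(): a final put must run teardown, which may sleep,
 * so it is refused when the caller is in atomic (IRQ-off) context. */
static enum put_result model_iput(struct model_inode *mi, bool in_atomic)
{
	assert(mi->count > 0);
	if (mi->count == 1 && in_atomic)
		return PUT_WOULD_SLEEP_IN_ATOMIC;
	if (--mi->count == 0) {
		mi->evicted = true;
		return PUT_EVICTED;
	}
	return PUT_OK;
}
```

In this model a non-final put is harmless in either context; only the last reference exposes the choice between leaking and sleeping, which is why neither decrement is a good fit for a completion callback.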
> Another option to fix it might be to drop the refcount games and just
> add a wait for the object creation in the evict_inode method to
> make sure we never remove the inode before the object creation
> has finished.
>

On the other hand, this solution works perfectly. Actually there was
already a "wait for the object creation" in exofs_evict_inode(), which
is why I've never seen an inode leak. Below is the patch I'm putting in
-next for push to 2.6.37. (So there was no bug in exofs after all, and
I'm not CCing stable@.)

Boaz
---
From: Boaz Harrosh <bharrosh@panasas.com>
Subject: [PATCH] exofs: remove inode->i_count ref/deref in exofs_new_inode/create_done

exofs_new_inode() was incrementing inode->i_count and decrementing
it in create_done(), in a bad attempt to make sure the inode would
still be there when the asynchronous create_done() finally arrives.
This was stupid because iput() was not called, so if that was the
final ref, the inode could leak.

However, none of this is needed, because exofs_evict_inode() already
waits for create_done() to return by waiting for the create_object
event. Therefore remove the extra ref counting and just thicken the
comment at exofs_evict_inode() a bit.

(Also use the ready-made __exofs_wait_obj_created() instead of
open-coding it.)

CC: Dave Chinner <dchinner@redhat.com>
CC: Christoph Hellwig <hch@lst.de>
CC: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
fs/exofs/inode.c | 19 ++++++-------------
1 files changed, 6 insertions(+), 13 deletions(-)
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 0ba9886..31e9164 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -1102,7 +1102,6 @@ static void create_done(struct exofs_io_state *ios, void *p)
 	set_obj_created(oi);
-	atomic_dec(&inode->i_count);
 	wake_up(&oi->i_wq);
 }
@@ -1153,17 +1152,11 @@ struct inode *exofs_new_inode(struct inode *dir, int mode)
 	ios->obj.id = exofs_oi_objno(oi);
 	exofs_make_credential(oi->i_cred, &ios->obj);
-	/* increment the refcount so that the inode will still be around when we
-	 * reach the callback
-	 */
-	atomic_inc(&inode->i_count);
-
 	ios->done = create_done;
 	ios->private = inode;
 	ios->cred = oi->i_cred;
 	ret = exofs_sbi_create(ios);
 	if (ret) {
-		atomic_dec(&inode->i_count);
 		exofs_put_io_state(ios);
 		return ERR_PTR(ret);
 	}
@@ -1321,12 +1314,12 @@ void exofs_evict_inode(struct inode *inode)
 	inode->i_size = 0;
 	end_writeback(inode);
-	/* if we are deleting an obj that hasn't been created yet, wait */
-	if (!obj_created(oi)) {
-		BUG_ON(!obj_2bcreated(oi));
-		wait_event(oi->i_wq, obj_created(oi));
-		/* ignore the error attempt a remove anyway */
-	}
+	/* if we are deleting an obj that hasn't been created yet, wait
+	 * This also makes sure that create_done cannot be called with an
+	 * already deleted inode.
+	 */
+	__exofs_wait_obj_created(oi);
+	/* ignore the error attempt a remove anyway */
 	/* Now Remove the OSD objects */
 	ret = exofs_get_io_state(&sbi->layout, &ios);
--
1.7.2.3
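
For readers who want to experiment with the idea outside the kernel, the "wait in evict instead of holding a reference" pattern from the patch can be modelled in userspace. This is a rough sketch under stated assumptions, not exofs code: a pthread mutex and condition variable stand in for `oi->i_wq`, the `created` flag stands in for `obj_created()`, and every `model_*` name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the exofs in-core inode. */
struct model_inode {
	pthread_mutex_t lock;
	pthread_cond_t  wq;             /* plays the role of oi->i_wq */
	bool            created;        /* plays the role of obj_created() */
	bool            freed;          /* set once the model evict finished */
	bool            done_saw_freed; /* true would mean use-after-free */
};

/* Models create_done(): holds no reference, only marks the object as
 * created and wakes any waiter. */
static void *model_create_done(void *p)
{
	struct model_inode *mi = p;

	pthread_mutex_lock(&mi->lock);
	mi->done_saw_freed = mi->freed; /* record whether evict overtook us */
	mi->created = true;
	pthread_cond_broadcast(&mi->wq);
	pthread_mutex_unlock(&mi->lock);
	return NULL;
}

/* Models the wait in exofs_evict_inode(): teardown may not proceed
 * until the asynchronous creation has completed. */
static void model_evict(struct model_inode *mi)
{
	pthread_mutex_lock(&mi->lock);
	while (!mi->created)
		pthread_cond_wait(&mi->wq, &mi->lock);
	assert(mi->created);
	mi->freed = true;
	pthread_mutex_unlock(&mi->lock);
}

/* Starts the completion on another thread, evicts concurrently, and
 * returns 0 if the completion never observed a freed inode. */
int model_run(void)
{
	struct model_inode mi = { .created = false };
	pthread_t t;

	pthread_mutex_init(&mi.lock, NULL);
	pthread_cond_init(&mi.wq, NULL);
	if (pthread_create(&t, NULL, model_create_done, &mi) != 0)
		return -1;
	model_evict(&mi);
	pthread_join(t, NULL);
	pthread_cond_destroy(&mi.wq);
	pthread_mutex_destroy(&mi.lock);
	return mi.done_saw_freed ? 1 : 0;
}
```

Because the model evict cannot set `freed` before `created` is true, the completion can never run against a freed object, whichever thread is scheduled first. That ordering guarantee is what makes the extra reference unnecessary in the model.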