All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: bugzilla-daemon@bugzilla.kernel.org, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [Bug 13232] ext3/4 with synchronous writes gets wedged by Postfix
Date: Mon, 18 May 2009 16:10:11 +0200	[thread overview]
Message-ID: <20090518141011.GE25232@duck.suse.cz> (raw)
In-Reply-To: <20090513181340.GA8128@ZenIV.linux.org.uk>

On Wed 13-05-09 19:13:40, Al Viro wrote:
> On Wed, May 13, 2009 at 05:52:54PM +0100, Al Viro wrote:
> > On Wed, May 13, 2009 at 03:48:02PM +0200, Jan Kara wrote:
> > > >   Here, we have started a transaction in ext3_create() and then wait in
> > > > find_inode_fast() for I_FREEING to be cleared (obviously we have
> > > > reallocated the inode and squeezed the allocation before journal_stop()
> > > > from the delete was called).
> > > >   Nasty deadlock and I don't see how to fix it now - have to go home for
> > > > today... Tomorrow I'll have a look what we can do about it.
> > >   OK, the deadlock has been introduced by ext3 variant of
> > > 261bca86ed4f7f391d1938167624e78da61dcc6b (adding Al to CC). The deadlock
> > > is really tough to avoid - we have to first allocate inode on disk so
> > > that we know the inode number. For this we need transaction open but we
> > > cannot afford waiting for old inode with same INO to be freed when we have
> > > transaction open because of the above deadlock. So we'd have to wait for
> > > inode release only after everything is done and we closed the transaction. But
> > > that would mean reordering a lot of code in ext3/namei.c so that all the
> > > dcache handling is done after all the IO is done.
> > >   Hmm, maybe we could change the delete side of the deadlock but that's
> > > going to be tricky as well :(.
> > >   Al, any idea if we could somehow get away without waiting on
> > > I_FREEING?
> > 
> > At which point do we actually run into deadlock on delete side?  We could,
> > in principle, skip everything like that in insert_inode_locked(), but
> > I would rather avoid the "two inodes in icache at the same time, with the
> > same inumber" situations completely.  We might get away with that, since
> > everything else *will* wait, so we can afford a bunch of inodes past the
> > point in foo_delete_inode() that has cleared it in bitmap + new locked
> > one, but if it's at all possible to avoid, I'd rather avoid it.
> 
> OK, that's probably the easiest way to do that, as much as I don't like it...
> Since iget() et.al. will not accept I_FREEING (will wait to go away
> and restart), and since we'd better have serialization between new/free
> on fs data structures anyway, we can afford simply skipping I_FREEING
> et.al. in insert_inode_locked().
> 
> We do that from new_inode, so it won't race with free_inode in any interesting
> ways and it won't race with iget (of any origin; nfsd or in case of fs
> corruption a lookup) since both still will wait for I_LOCK.
> 
> Tentative patch follow; folks, I would very much like review on that one,
> since I'm far too low on caffeine and the area is nasty.
  The patch looks fine. Everyone else will either get new inode and wait
for I_LOCK or get old inode and wait for I_FREEING so everything should be
fine... You can add.
  Acked-by: Jan Kara <jack@suse.cz>

									Honza
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 9d26490..4406952 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1053,13 +1053,22 @@ int insert_inode_locked(struct inode *inode)
>  	struct super_block *sb = inode->i_sb;
>  	ino_t ino = inode->i_ino;
>  	struct hlist_head *head = inode_hashtable + hash(sb, ino);
> -	struct inode *old;
>  
>  	inode->i_state |= I_LOCK|I_NEW;
>  	while (1) {
> +		struct hlist_node *node;
> +		struct inode *old = NULL;
>  		spin_lock(&inode_lock);
> -		old = find_inode_fast(sb, head, ino);
> -		if (likely(!old)) {
> +		hlist_for_each_entry(old, node, head, i_hash) {
> +			if (old->i_ino != ino)
> +				continue;
> +			if (old->i_sb != sb)
> +				continue;
> +			if (old->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
> +				continue;
> +			break;
> +		}
> +		if (likely(!node)) {
>  			hlist_add_head(&inode->i_hash, head);
>  			spin_unlock(&inode_lock);
>  			return 0;
> @@ -1081,14 +1090,24 @@ int insert_inode_locked4(struct inode *inode, unsigned long hashval,
>  {
>  	struct super_block *sb = inode->i_sb;
>  	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
> -	struct inode *old;
>  
>  	inode->i_state |= I_LOCK|I_NEW;
>  
>  	while (1) {
> +		struct hlist_node *node;
> +		struct inode *old = NULL;
> +
>  		spin_lock(&inode_lock);
> -		old = find_inode(sb, head, test, data);
> -		if (likely(!old)) {
> +		hlist_for_each_entry(old, node, head, i_hash) {
> +			if (old->i_sb != sb)
> +				continue;
> +			if (!test(old, data))
> +				continue;
> +			if (old->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
> +				continue;
> +			break;
> +		}
> +		if (likely(!node)) {
>  			hlist_add_head(&inode->i_hash, head);
>  			spin_unlock(&inode_lock);
>  			return 0;
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  parent reply	other threads:[~2009-05-18 14:10 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-05 21:58 [Bug 13232] New: ext3/4 with synchronous writes gets wedged by Postfix bugzilla-daemon
2009-05-05 23:16 ` [Bug 13232] " bugzilla-daemon
2009-05-12 16:56 ` bugzilla-daemon
2009-05-13 13:48   ` Jan Kara
2009-05-13 16:07     ` Theodore Tso
2009-05-18 12:45       ` Jan Kara
2009-05-13 16:52     ` Al Viro
2009-05-13 18:13       ` Al Viro
2009-05-18 13:15         ` Theodore Tso
2009-05-18 14:10         ` Jan Kara [this message]
2009-05-18 12:53       ` Jan Kara
2009-05-13 13:48 ` bugzilla-daemon
2009-05-13 16:07 ` bugzilla-daemon
2009-05-13 16:18 ` bugzilla-daemon
2009-05-13 16:52 ` bugzilla-daemon
2009-05-13 18:13 ` bugzilla-daemon
2009-05-18 12:45 ` bugzilla-daemon
2009-05-18 12:54 ` bugzilla-daemon
2009-05-18 13:16 ` bugzilla-daemon
2009-05-18 14:10 ` bugzilla-daemon
2009-05-19 20:38 ` bugzilla-daemon
2009-05-19 20:39 ` bugzilla-daemon
2009-06-07 19:56 ` bugzilla-daemon
2009-06-07 19:56 ` bugzilla-daemon
2009-06-07 20:44 ` bugzilla-daemon
  -- strict thread matches above, loose matches on Subject: below --
2009-05-16 19:58 2.6.30-rc6: Reported regressions 2.6.28 -> 2.6.29 Rafael J. Wysocki
2009-05-16 20:06 ` [Bug #13232] ext3/4 with synchronous writes gets wedged by Postfix Rafael J. Wysocki
2009-05-16 20:06   ` Rafael J. Wysocki
2009-05-18 13:25   ` Theodore Tso
2009-05-18 13:25     ` Theodore Tso
     [not found]     ` <20090518132517.GL32019-3s7WtUTddSA@public.gmane.org>
2009-05-19 17:17       ` David Watson
2009-05-19 17:17         ` David Watson
     [not found]         ` <20090519171713.GA3354-yvBcC19sZ6P0OyVTGvYuXB2eb7JE58TQ@public.gmane.org>
2009-05-19 17:53           ` Theodore Tso
2009-05-19 17:53             ` Theodore Tso
2009-05-19 18:27             ` John Stoffel
     [not found]               ` <18962.64002.324970.49512-HgN6juyGXH5AfugRpC6u6w@public.gmane.org>
2009-05-19 20:41                 ` Theodore Tso
2009-05-19 20:41                   ` Theodore Tso
     [not found]                   ` <20090519204152.GE9053-3s7WtUTddSA@public.gmane.org>
2009-05-20 16:53                     ` John Stoffel
2009-05-20 16:53                       ` John Stoffel
2009-05-24 19:27 2.6.30-rc7: Reported regressions 2.6.28 -> 2.6.29 Rafael J. Wysocki
2009-05-24 19:31 ` [Bug #13232] ext3/4 with synchronous writes gets wedged by Postfix Rafael J. Wysocki
2009-05-24 19:31   ` Rafael J. Wysocki
2009-05-30 19:50 2.6.30-rc7-git4: Reported regressions 2.6.28 -> 2.6.29 Rafael J. Wysocki
2009-05-30 19:55 ` [Bug #13232] ext3/4 with synchronous writes gets wedged by Postfix Rafael J. Wysocki
2009-05-30 19:55   ` Rafael J. Wysocki
2009-06-07 10:02 2.6.30-rc8-git4: Reported regressions 2.6.28 -> 2.6.29 Rafael J. Wysocki
2009-06-07 10:06 ` [Bug #13232] ext3/4 with synchronous writes gets wedged by Postfix Rafael J. Wysocki
2009-06-07 10:06   ` Rafael J. Wysocki
2009-06-07 17:14   ` Theodore Tso
2009-06-07 17:14     ` Theodore Tso
     [not found]     ` <20090607171418.GA22756-3s7WtUTddSA@public.gmane.org>
2009-06-07 17:17       ` Al Viro
2009-06-07 17:17         ` Al Viro
     [not found]         ` <20090607171739.GI8633-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2009-06-07 20:10           ` Theodore Tso
2009-06-07 20:10             ` Theodore Tso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090518141011.GE25232@duck.suse.cz \
    --to=jack@suse.cz \
    --cc=bugzilla-daemon@bugzilla.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.