[Ocfs2-devel] a bug about deadlock when enable quota on ocfs2

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Ocfs2-devel] a bug about deadlock when enable quota on ocfs2
       [not found] <4FFCF397.20108@oracle.com>
@ 2012-07-13 22:09 ` Jan Kara
  2012-07-16 23:39   ` Srinivas Eeda
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2012-07-13 22:09 UTC (permalink / raw)
  To: ocfs2-devel

  Hello,

On Wed 11-07-12 11:31:35, Tiger Yang wrote:
> As a ocfs2 user reported he met deadlock when enable quota on ocfs2,
> I done the same test and could easily reproduced the bug(the steps
> are at the bottom).
> Srinivas analysed the bug and addressed it the root cause is the
> commit ea455f8ab68338ba69f5d3362b342c115bea8e13,
  Yes, Wengang Wang already talked to me about problems with this commit
about an year ago. We worked on a fix but didn't end up anywhere and I
forgot about it.

> his comments:
> @ With those patches in, all other nodes will now queue downgrade of dentry
> @ locks to ocfs2_wq thread. Then Node 1 gets a lock is in use when it calls
> @ ocfs2_try_open_lock and so does other nodes and hence orphans lie
> around. Now
> @ orphans will keep growing and only gets cleared when all nodes umount the
> @ volume. This causes a different problems 1)space is not cleared 2)
> as orphans
> @ keep growing, orphan thread takes long time to scan all orphans(but still
> @ fails to clear oprhans because of open lock still around) and hence will
> @ block new unlinks for that duration because it gets a EX on orphan
> scan lock.
  I think the analysis is not completely correct (or I misunderstood it).
We defer only putting of inode reference to workqueue (lockres is freed
already in ocfs2_drop_dentry_lock()). However it is correct that the queue
of inodes to put can get long and the system gets into trouble.

> My questions are
> 1.) what kind of "potential deadlock" in your comment?
  Dropping inode reference can result in deletion of inode when this was
the last reference to an unlinked file. However ocfs2_delete_inode() needs
to take locks which rank above locks held when ocfs2_drop_dentry_lock() is
called. You can check this by removing my patches, enabling
CONFIG_PROVE_LOCKING and see the warning lockdep spits.

> 2.) I have tried removing this patch, ocfs2 became more durable,
> although it caused another panic but not get deadlock again, could
> we remove this patch and just to fix the new problem? may the new
> problem is the "potential deadlock" you mentioned.
  I talked about possible solutions to Wengang already. Basically before we
start unlinking we could check whether there aren't too many queued puts of
inode references and if yes, drop some of them directly from unlink process
before we acquire any cluster locks or so.

								Honza

PS: I CC'ed ocfs2-devel so that this discussion gets archived and other
developers can help as well.
PS2: I'm going for a longer vacation now so I won't be responding to email
for some time.
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Ocfs2-devel] a bug about deadlock when enable quota on ocfs2
  2012-07-13 22:09 ` [Ocfs2-devel] a bug about deadlock when enable quota on ocfs2 Jan Kara
@ 2012-07-16 23:39   ` Srinivas Eeda
  2012-08-15 22:12     ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Srinivas Eeda @ 2012-07-16 23:39 UTC (permalink / raw)
  To: ocfs2-devel

Hi Jan,

thanks for helping.

Jan Kara wrote:
>   Hello,
>> his comments:
>> @ With those patches in, all other nodes will now queue downgrade of dentry
>> @ locks to ocfs2_wq thread. Then Node 1 gets a lock is in use when it calls
>> @ ocfs2_try_open_lock and so does other nodes and hence orphans lie
>> around. Now
>> @ orphans will keep growing and only gets cleared when all nodes umount the
>> @ volume. This causes a different problems 1)space is not cleared 2)
>> as orphans
>> @ keep growing, orphan thread takes long time to scan all orphans(but still
>> @ fails to clear oprhans because of open lock still around) and hence will
>> @ block new unlinks for that duration because it gets a EX on orphan
>> scan lock.
>>     
>   I think the analysis is not completely correct (or I misunderstood it).
> We defer only putting of inode reference to workqueue (lockres is freed
> already in ocfs2_drop_dentry_lock()). However it is correct that the queue
> of inodes to put can get long and the system gets into trouble.
>   
Sorry for not being clear. This is an issue when thread running unlink 
and ocfs2_wq on other node end up running ocfs2_delete_inode at the same 
time. They both call ocfs2_try_open_lock during query wipe inode and get 
EAGAIN. So they both defer the actual clean up.

This will become a problem if a user deletes tons of files at the same 
time. Lot of  orphans gets queued and it becomes a problem when user 
continues to delete.
>> My questions are
>> 1.) what kind of "potential deadlock" in your comment?
>>     
>   Dropping inode reference can result in deletion of inode when this was
> the last reference to an unlinked file. However ocfs2_delete_inode() needs
> to take locks which rank above locks held when ocfs2_drop_dentry_lock() is
> called. You can check this by removing my patches, enabling
> CONFIG_PROVE_LOCKING and see the warning lockdep spits.
I am not familiar with which locks get out of order. Tiger can you 
please check this.
>> .) I have tried removing this patch, ocfs2 became more durable,
>> although it caused another panic but not get deadlock again, could
>> we remove this patch and just to fix the new problem? may the new
>> problem is the "potential deadlock" you mentioned.
>>     
>   I talked about possible solutions to Wengang already. Basically before we
> start unlinking we could check whether there aren't too many queued puts of
> inode references and if yes, drop some of them directly from unlink process
> before we acquire any cluster locks or so.
>   
We could do this but if there is a deadlock bug we could still run into 
it when we try deleting directly right?
> 								Honza
>
> PS: I CC'ed ocfs2-devel so that this discussion gets archived and other
> developers can help as well.
> PS2: I'm going for a longer vacation now so I won't be responding to email
> for some time.
>   
Have a good vacation :)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Ocfs2-devel] a bug about deadlock when enable quota on ocfs2
  2012-07-16 23:39   ` Srinivas Eeda
@ 2012-08-15 22:12     ` Jan Kara
       [not found]       ` <502D2E7C.6090508@oracle.com>
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2012-08-15 22:12 UTC (permalink / raw)
  To: ocfs2-devel

  Hi,

  I'm back from vacation:
On Mon 16-07-12 16:39:15, Srinivas Eeda wrote:
> Jan Kara wrote:
> >  Hello,
> >>his comments:
> >>@ With those patches in, all other nodes will now queue downgrade of dentry
> >>@ locks to ocfs2_wq thread. Then Node 1 gets a lock is in use when it calls
> >>@ ocfs2_try_open_lock and so does other nodes and hence orphans lie
> >>around. Now
> >>@ orphans will keep growing and only gets cleared when all nodes umount the
> >>@ volume. This causes a different problems 1)space is not cleared 2)
> >>as orphans
> >>@ keep growing, orphan thread takes long time to scan all orphans(but still
> >>@ fails to clear oprhans because of open lock still around) and hence will
> >>@ block new unlinks for that duration because it gets a EX on orphan
> >>scan lock.
> >  I think the analysis is not completely correct (or I misunderstood it).
> >We defer only putting of inode reference to workqueue (lockres is freed
> >already in ocfs2_drop_dentry_lock()). However it is correct that the queue
> >of inodes to put can get long and the system gets into trouble.
> Sorry for not being clear. This is an issue when thread running
> unlink and ocfs2_wq on other node end up running ocfs2_delete_inode
> at the same time. They both call ocfs2_try_open_lock during query
> wipe inode and get EAGAIN. So they both defer the actual clean up.
> 
> This will become a problem if a user deletes tons of files at the
> same time. Lot of  orphans gets queued and it becomes a problem when
> user continues to delete.
  I see. But then I see nothing ocfs2_wq specific in this race. It just
seems the race is always there because the open lock logic is racy.
ocfs2_wq might make it more likely to hit the race but I don't see why it
would create it. Look, the sequence in ocfs2_evict_inode() essentially is:
  ocfs2_inode_lock()
  ocfs2_try_open_lock() [tries to get open lock in ex mode]
  ocfs2_inode_unlock()
  ocfs2_open_unlock()   [drops shared open lock we hold]

  Now if two nodes happen to execute ocfs2_evict_inode() in parallel and
ocfs2_try_open_lock() happens on both nodes before ocfs2_open_unlock() is
called on any of them, ocfs2_try_open_lock() fails for both nodes... So I
don't see why removing my patch offloading inode removal to ocfs2_wq should
help anything.

  I would think that the code should be reorganized so that shared open
lock is dropped before we drop inode lock. Then the race could not happen.
But I'm not sure if something else would not break.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Ocfs2-devel] a bug about deadlock when enable quota on ocfs2
       [not found]       ` <502D2E7C.6090508@oracle.com>
@ 2012-08-16 21:57         ` Jan Kara
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2012-08-16 21:57 UTC (permalink / raw)
  To: ocfs2-devel

  Hello Srini,

On Thu 16-08-12 10:31:40, srinivas eeda wrote:
> If we do not offload work to ocfs2_wq then the unlink steps on other
> nodes are synchronous because ocfs2_delete_inode on other nodes will
> be called by the down covert thread. This will/should clear the
> inodes from other nodes if not in use. By the time the node that
> issued unlink calls ocfs2_delete_inode the inodes on other nodes
> should already be cleared and hence they gets cleared from orphan
> directory.
  I see but it's really rather fragile for inode deletion to depend on
the way how downconvert threads are called, isn't it? The fact that you
need to call d_drop() from ocfs2_dentry_convert_worker() to invalidate the
dentry and force inode reference to be dropped only proves my point. And I
see no reason why even d_drop() is enough because someone else could be
holding a reference to the dentry which further delays destruction of it
to the time that reference is dropped (which need not be from a downconvert
thread).

So although I believe you your patch makes customer workload work, I don't
think it's the right way of fixing the problem. Rather we should make it
safe for ocfs2_evict_inode() to be run in parallel on two nodes. And as I
wrote in my previous email it doesn't seem to require much - just dropping
of exclusive inode lock has to happen *after* we drop open lock. Not
before.

								Honza

PS: I've added ocfs2-devel list back to CC since I think it's good to have
this discussion archived for future reference and also so that other
ocfs2 developers can read it and offer their ideas.
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-08-16 21:57 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <4FFCF397.20108@oracle.com>
2012-07-13 22:09 ` [Ocfs2-devel] a bug about deadlock when enable quota on ocfs2 Jan Kara
2012-07-16 23:39   ` Srinivas Eeda
2012-08-15 22:12     ` Jan Kara
     [not found]       ` <502D2E7C.6090508@oracle.com>
2012-08-16 21:57         ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.