All of lore.kernel.org
 help / color / mirror / Atom feed
* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting
@ 2011-07-06  4:38 Wengang Wang
  2011-07-06  6:17 ` Sunil Mushran
  0 siblings, 1 reply; 10+ messages in thread
From: Wengang Wang @ 2011-07-06  4:38 UTC (permalink / raw)
  To: ocfs2-devel

There is a use case that the app deletes huge number(XX kilo) of files in every
5 minutes. The deletions of some specific files are extreamly slow(costing
xx~xxx seconds). That is unacceptable.

Reading out the dir entries and the relavent inodes cost time. And we are doing
that with i_mutex held, it causes unlink path waiting on the mutex for long time.

fix:
We drops and retake the mutex in the duration giving change to unlink to go on.
Also, for live nodes, one node only scan and recover this slot where the node
resides(helps performance). And always do it at each scan time. For those dead
(not mounted), we do it when we "should". And for dead slots, no dropping-retaking
mutex is needed.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting
  2011-07-06  4:38 [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting Wengang Wang
@ 2011-07-06  6:17 ` Sunil Mushran
  2011-07-06  6:41   ` Wengang Wang
  2011-07-07  6:19   ` Srinivas Eeda
  0 siblings, 2 replies; 10+ messages in thread
From: Sunil Mushran @ 2011-07-06  6:17 UTC (permalink / raw)
  To: ocfs2-devel

On 07/05/2011 09:38 PM, Wengang Wang wrote:
> There is a use case that the app deletes huge number(XX kilo) of files in every
> 5 minutes. The deletions of some specific files are extreamly slow(costing
> xx~xxx seconds). That is unacceptable.
>
> Reading out the dir entries and the relavent inodes cost time. And we are doing
> that with i_mutex held, it causes unlink path waiting on the mutex for long time.
>
> fix:
> We drops and retake the mutex in the duration giving change to unlink to go on.
> Also, for live nodes, one node only scan and recover this slot where the node
> resides(helps performance). And always do it at each scan time. For those dead
> (not mounted), we do it when we "should". And for dead slots, no dropping-retaking
> mutex is needed.

Yes, this is a good issue to tackle. I will read the patch in greater detail
later. But offhand, I have two comments.

1. "should" is not descriptive. I am assuming you mean do it only during
actual recovery. If so, that would be incorrect. Say node 0 unlinks a file
that was being used by node 1. Node 0 dies. Recovery will notice that
that inode is active and not delete it. If node 1 dies, or is unable to 
delete
the file for any other reason, then our only hope is orphan scan.

2. All nodes have to scan all slots. Even live slots. I remember we did for
a reason. And that reason should be in the comment in the patch written
by Srini.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting
  2011-07-06  6:17 ` Sunil Mushran
@ 2011-07-06  6:41   ` Wengang Wang
  2011-07-06  6:48     ` Wengang Wang
  2011-07-07  6:19   ` Srinivas Eeda
  1 sibling, 1 reply; 10+ messages in thread
From: Wengang Wang @ 2011-07-06  6:41 UTC (permalink / raw)
  To: ocfs2-devel

On 11-07-05 23:17, Sunil Mushran wrote:
> On 07/05/2011 09:38 PM, Wengang Wang wrote:
> >There is a use case that the app deletes huge number(XX kilo) of files in every
> >5 minutes. The deletions of some specific files are extreamly slow(costing
> >xx~xxx seconds). That is unacceptable.
> >
> >Reading out the dir entries and the relavent inodes cost time. And we are doing
> >that with i_mutex held, it causes unlink path waiting on the mutex for long time.
> >
> >fix:
> >We drops and retake the mutex in the duration giving change to unlink to go on.
> >Also, for live nodes, one node only scan and recover this slot where the node
> >resides(helps performance). And always do it at each scan time. For those dead
> >(not mounted), we do it when we "should". And for dead slots, no dropping-retaking
> >mutex is needed.
> 
> Yes, this is a good issue to tackle. I will read the patch in greater detail
> later. But offhand, I have two comments.
> 
> 1. "should" is not descriptive. I am assuming you mean do it only during
> actual recovery. If so, that would be incorrect. Say node 0 unlinks a file
> that was being used by node 1. Node 0 dies. Recovery will notice that
> that inode is active and not delete it. If node 1 dies, or is unable
> to delete
> the file for any other reason, then our only hope is orphan scan.

Sorry. the "should" doesn't mean a actual recovery. I meant when 
"os->os_seqno == seqno", the orginal condition determining whether we do
queue the scans.

> 
> 2. All nodes have to scan all slots. Even live slots. I remember we did for
> a reason. And that reason should be in the comment in the patch written
> by Srini.

Oh... I will check Srini's patch.

thanks,
wengang.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting
  2011-07-06  6:41   ` Wengang Wang
@ 2011-07-06  6:48     ` Wengang Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Wengang Wang @ 2011-07-06  6:48 UTC (permalink / raw)
  To: ocfs2-devel

On 11-07-06 14:41, Wengang Wang wrote:
> On 11-07-05 23:17, Sunil Mushran wrote:
> > On 07/05/2011 09:38 PM, Wengang Wang wrote:
> > >There is a use case that the app deletes huge number(XX kilo) of files in every
> > >5 minutes. The deletions of some specific files are extreamly slow(costing
> > >xx~xxx seconds). That is unacceptable.
> > >
> > >Reading out the dir entries and the relavent inodes cost time. And we are doing
> > >that with i_mutex held, it causes unlink path waiting on the mutex for long time.
> > >
> > >fix:
> > >We drops and retake the mutex in the duration giving change to unlink to go on.
> > >Also, for live nodes, one node only scan and recover this slot where the node
> > >resides(helps performance). And always do it at each scan time. For those dead
> > >(not mounted), we do it when we "should". And for dead slots, no dropping-retaking
> > >mutex is needed.
> > 
> > Yes, this is a good issue to tackle. I will read the patch in greater detail
> > later. But offhand, I have two comments.
> > 
> > 1. "should" is not descriptive. I am assuming you mean do it only during
> > actual recovery. If so, that would be incorrect. Say node 0 unlinks a file
> > that was being used by node 1. Node 0 dies. Recovery will notice that
> > that inode is active and not delete it. If node 1 dies, or is unable
> > to delete
> > the file for any other reason, then our only hope is orphan scan.
> 
> Sorry. the "should" doesn't mean a actual recovery. I meant when 
> "os->os_seqno == seqno", the orginal condition determining whether we do
> queue the scans.
> 
> > 
> > 2. All nodes have to scan all slots. Even live slots. I remember we did for
> > a reason. And that reason should be in the comment in the patch written
> > by Srini.
> 
> Oh... I will check Srini's patch.
The whole description is the following:

-------------
When a dentry is unlinked, the unlinking node takes an EX on the dentry
lock
before moving the dentry to the orphan directory. Other nodes that have
this dentry in cache have a PR on the same dentry lock.  When the EX is
requested, the other nodes flag the corresponding inode as
MAYBE_ORPHANED
during downconvert.  The inode is finally deleted when the last node to
iput
the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

A problem arises if a node is forced to free dentry locks because of
memory
pressure. If this happens, the node will no longer get downconvert
notifications for the dentries that have been unlinked on another node.
If it also happens that node is actively using the corresponding inode
and
happens to be the one performing the last iput on that inode, it will
fail
to delete the inode as it will not have the MAYBE_ORPHANED flag set.

This patch fixes this shortcoming by introducing a periodic scan of the
orphan directories to delete such inodes. Care has been taken to
distribute
the workload across the cluster so that no one node has to perform the
task
all the time.
---------------

I didn't see the reason All nodes have to scan all slots. If there is
one, the "load balance"? Or the real reason is not in the description?

thanks,
wengang.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting
  2011-07-06  6:17 ` Sunil Mushran
  2011-07-06  6:41   ` Wengang Wang
@ 2011-07-07  6:19   ` Srinivas Eeda
  2011-07-07 20:02     ` Sunil Mushran
  1 sibling, 1 reply; 10+ messages in thread
From: Srinivas Eeda @ 2011-07-07  6:19 UTC (permalink / raw)
  To: ocfs2-devel

On 7/5/2011 11:17 PM, Sunil Mushran wrote:
> 2. All nodes have to scan all slots. Even live slots. I remember we 
> did for
> a reason. And that reason should be in the comment in the patch written
> by Srini.
When a node unlinks a file it inserts an entry into it's own orphan 
slot. If another node is the last one to close the file and dentry got 
flushed then it will not do the cleanup as it doesn't know the file was 
orphaned. The file will remain in the orphan slot  till the node umounts 
and the same slot is reused again. To overcome this problem a node has 
to rescan all slots(including live slots) and try to do the cleanup.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting
  2011-07-07  6:19   ` Srinivas Eeda
@ 2011-07-07 20:02     ` Sunil Mushran
  2011-07-07 20:26       ` Sunil Mushran
  0 siblings, 1 reply; 10+ messages in thread
From: Sunil Mushran @ 2011-07-07 20:02 UTC (permalink / raw)
  To: ocfs2-devel

On 07/06/2011 11:19 PM, Srinivas Eeda wrote:
> On 7/5/2011 11:17 PM, Sunil Mushran wrote:
>> 2. All nodes have to scan all slots. Even live slots. I remember we
>> did for
>> a reason. And that reason should be in the comment in the patch written
>> by Srini.
> When a node unlinks a file it inserts an entry into it's own orphan
> slot. If another node is the last one to close the file and dentry got
> flushed then it will not do the cleanup as it doesn't know the file was
> orphaned. The file will remain in the orphan slot  till the node umounts
> and the same slot is reused again. To overcome this problem a node has
> to rescan all slots(including live slots) and try to do the cleanup.

The qs is not why are we scanning all live slots on all nodes. As in,
why not just recover the local slot. There was a reason for that.
Yes, we have to recover unused slots for the reason listed previously.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting
  2011-07-07 20:02     ` Sunil Mushran
@ 2011-07-07 20:26       ` Sunil Mushran
  2011-07-08  7:02         ` Srinivas Eeda
  0 siblings, 1 reply; 10+ messages in thread
From: Sunil Mushran @ 2011-07-07 20:26 UTC (permalink / raw)
  To: ocfs2-devel

On 07/07/2011 01:02 PM, Sunil Mushran wrote:
> On 07/06/2011 11:19 PM, Srinivas Eeda wrote:
>> On 7/5/2011 11:17 PM, Sunil Mushran wrote:
>>> 2. All nodes have to scan all slots. Even live slots. I remember we
>>> did for
>>> a reason. And that reason should be in the comment in the patch written
>>> by Srini.
>> When a node unlinks a file it inserts an entry into it's own orphan
>> slot. If another node is the last one to close the file and dentry got
>> flushed then it will not do the cleanup as it doesn't know the file was
>> orphaned. The file will remain in the orphan slot  till the node umounts
>> and the same slot is reused again. To overcome this problem a node has
>> to rescan all slots(including live slots) and try to do the cleanup.
> The qs is not why are we scanning all live slots on all nodes. As in,
> why not just recover the local slot. There was a reason for that.
> Yes, we have to recover unused slots for the reason listed previously.

bleh.... let me rephrase.

The qs is why are we scanning all live slots on all nodes. Wengangs
patch limits the scanning to the local (live) slot only. And I remember
we had a reason for it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting
  2011-07-07 20:26       ` Sunil Mushran
@ 2011-07-08  7:02         ` Srinivas Eeda
  2011-07-08 16:18           ` Sunil Mushran
  2011-07-28 10:14           ` Joel Becker
  0 siblings, 2 replies; 10+ messages in thread
From: Srinivas Eeda @ 2011-07-08  7:02 UTC (permalink / raw)
  To: ocfs2-devel

Below is excerpts from Joel's email for the same question :)

>	Currently, orphan scan just iterate all the slots and call
>  ocfs2_queue_recovery_completion, but I don't think it is proper for a node
>  to query another mounted one since that node will query it by
>  itself.

	Node 1 has an inode it was using.  The dentry went away due to
memory pressure.  Node 1 closes the inode, but it's on the free list.
The node has the open lock.
	Node 2 unlinks the inode.  It grabs the dentry lock to notify
others, but node 1 has no dentry and doesn't get the message.  It
trylocks the open lock, sees that another node has a PR, and does
nothing.
	Later node 2 runs its orphan dir.  It igets the inode, trylocks
the open lock, sees the PR still, and does nothing.
	Basically, we have to trigger an orphan iput on node 1.  The
only way for this to happen is if node 1 runs node 2's orphan dir.  This
patch exists because that wasn't happening.


On 7/7/2011 1:26 PM, Sunil Mushran wrote:
> On 07/07/2011 01:02 PM, Sunil Mushran wrote:
>> On 07/06/2011 11:19 PM, Srinivas Eeda wrote:
>>> On 7/5/2011 11:17 PM, Sunil Mushran wrote:
>>>> 2. All nodes have to scan all slots. Even live slots. I remember we
>>>> did for
>>>> a reason. And that reason should be in the comment in the patch 
>>>> written
>>>> by Srini.
>>> When a node unlinks a file it inserts an entry into it's own orphan
>>> slot. If another node is the last one to close the file and dentry got
>>> flushed then it will not do the cleanup as it doesn't know the file was
>>> orphaned. The file will remain in the orphan slot  till the node 
>>> umounts
>>> and the same slot is reused again. To overcome this problem a node has
>>> to rescan all slots(including live slots) and try to do the cleanup.
>> The qs is not why are we scanning all live slots on all nodes. As in,
>> why not just recover the local slot. There was a reason for that.
>> Yes, we have to recover unused slots for the reason listed previously.
>
> bleh.... let me rephrase.
>
> The qs is why are we scanning all live slots on all nodes. Wengangs
> patch limits the scanning to the local (live) slot only. And I remember
> we had a reason for it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting
  2011-07-08  7:02         ` Srinivas Eeda
@ 2011-07-08 16:18           ` Sunil Mushran
  2011-07-28 10:14           ` Joel Becker
  1 sibling, 0 replies; 10+ messages in thread
From: Sunil Mushran @ 2011-07-08 16:18 UTC (permalink / raw)
  To: ocfs2-devel

On 07/08/2011 12:02 AM, Srinivas Eeda wrote:
> Below is excerpts from Joel's email for the same question :)
>
>>     Currently, orphan scan just iterate all the slots and call
>>  ocfs2_queue_recovery_completion, but I don't think it is proper for a node
>>  to query another mounted one since that node will query it by
>>  itself.
>
>     Node 1 has an inode it was using.  The dentry went away due to
> memory pressure.  Node 1 closes the inode, but it's on the free list.
> The node has the open lock.
>     Node 2 unlinks the inode.  It grabs the dentry lock to notify
> others, but node 1 has no dentry and doesn't get the message.  It
> trylocks the open lock, sees that another node has a PR, and does
> nothing.
>     Later node 2 runs its orphan dir.  It igets the inode, trylocks
> the open lock, sees the PR still, and does nothing.
>     Basically, we have to trigger an orphan iput on node 1.  The
> only way for this to happen is if node 1 runs node 2's orphan dir.  This
> patch exists because that wasn't happening.

Thanks for the reminder.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting
  2011-07-08  7:02         ` Srinivas Eeda
  2011-07-08 16:18           ` Sunil Mushran
@ 2011-07-28 10:14           ` Joel Becker
  1 sibling, 0 replies; 10+ messages in thread
From: Joel Becker @ 2011-07-28 10:14 UTC (permalink / raw)
  To: ocfs2-devel

On Fri, Jul 08, 2011 at 12:02:57AM -0700, Srinivas Eeda wrote:
> Below is excerpts from Joel's email for the same question :)
> 
> >	Currently, orphan scan just iterate all the slots and call
> >  ocfs2_queue_recovery_completion, but I don't think it is proper for a node
> >  to query another mounted one since that node will query it by
> >  itself.
> 
> 	Node 1 has an inode it was using.  The dentry went away due to
> memory pressure.  Node 1 closes the inode, but it's on the free list.
> The node has the open lock.
> 	Node 2 unlinks the inode.  It grabs the dentry lock to notify
> others, but node 1 has no dentry and doesn't get the message.  It
> trylocks the open lock, sees that another node has a PR, and does
> nothing.
> 	Later node 2 runs its orphan dir.  It igets the inode, trylocks
> the open lock, sees the PR still, and does nothing.
> 	Basically, we have to trigger an orphan iput on node 1.  The
> only way for this to happen is if node 1 runs node 2's orphan dir.  This
> patch exists because that wasn't happening.

	What he said.  And yes, I know I'm late to the party.

Joel

-- 

"Lately I've been talking in my sleep.
 Can't imagine what I'd have to say.
 Except my world will be right
 When love comes back my way."

			http://www.jlbec.org/
			jlbec at evilplan.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-07-28 10:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-07-06  4:38 [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting Wengang Wang
2011-07-06  6:17 ` Sunil Mushran
2011-07-06  6:41   ` Wengang Wang
2011-07-06  6:48     ` Wengang Wang
2011-07-07  6:19   ` Srinivas Eeda
2011-07-07 20:02     ` Sunil Mushran
2011-07-07 20:26       ` Sunil Mushran
2011-07-08  7:02         ` Srinivas Eeda
2011-07-08 16:18           ` Sunil Mushran
2011-07-28 10:14           ` Joel Becker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.