* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting @ 2011-07-06 4:38 Wengang Wang 2011-07-06 6:17 ` Sunil Mushran 0 siblings, 1 reply; 10+ messages in thread From: Wengang Wang @ 2011-07-06 4:38 UTC (permalink / raw) To: ocfs2-devel There is a use case that the app deletes huge number(XX kilo) of files in every 5 minutes. The deletions of some specific files are extreamly slow(costing xx~xxx seconds). That is unacceptable. Reading out the dir entries and the relavent inodes cost time. And we are doing that with i_mutex held, it causes unlink path waiting on the mutex for long time. fix: We drops and retake the mutex in the duration giving change to unlink to go on. Also, for live nodes, one node only scan and recover this slot where the node resides(helps performance). And always do it at each scan time. For those dead (not mounted), we do it when we "should". And for dead slots, no dropping-retaking mutex is needed. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting 2011-07-06 4:38 [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting Wengang Wang @ 2011-07-06 6:17 ` Sunil Mushran 2011-07-06 6:41 ` Wengang Wang 2011-07-07 6:19 ` Srinivas Eeda 0 siblings, 2 replies; 10+ messages in thread From: Sunil Mushran @ 2011-07-06 6:17 UTC (permalink / raw) To: ocfs2-devel On 07/05/2011 09:38 PM, Wengang Wang wrote: > There is a use case that the app deletes huge number(XX kilo) of files in every > 5 minutes. The deletions of some specific files are extreamly slow(costing > xx~xxx seconds). That is unacceptable. > > Reading out the dir entries and the relavent inodes cost time. And we are doing > that with i_mutex held, it causes unlink path waiting on the mutex for long time. > > fix: > We drops and retake the mutex in the duration giving change to unlink to go on. > Also, for live nodes, one node only scan and recover this slot where the node > resides(helps performance). And always do it at each scan time. For those dead > (not mounted), we do it when we "should". And for dead slots, no dropping-retaking > mutex is needed. Yes, this is a good issue to tackle. I will read the patch in greater detail later. But offhand, I have two comments. 1. "should" is not descriptive. I am assuming you mean do it only during actual recovery. If so, that would be incorrect. Say node 0 unlinks a file that was being used by node 1. Node 0 dies. Recovery will notice that that inode is active and not delete it. If node 1 dies, or is unable to delete the file for any other reason, then our only hope is orphan scan. 2. All nodes have to scan all slots. Even live slots. I remember we did for a reason. And that reason should be in the comment in the patch written by Srini. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting 2011-07-06 6:17 ` Sunil Mushran @ 2011-07-06 6:41 ` Wengang Wang 2011-07-06 6:48 ` Wengang Wang 2011-07-07 6:19 ` Srinivas Eeda 1 sibling, 1 reply; 10+ messages in thread From: Wengang Wang @ 2011-07-06 6:41 UTC (permalink / raw) To: ocfs2-devel On 11-07-05 23:17, Sunil Mushran wrote: > On 07/05/2011 09:38 PM, Wengang Wang wrote: > >There is a use case that the app deletes huge number(XX kilo) of files in every > >5 minutes. The deletions of some specific files are extreamly slow(costing > >xx~xxx seconds). That is unacceptable. > > > >Reading out the dir entries and the relavent inodes cost time. And we are doing > >that with i_mutex held, it causes unlink path waiting on the mutex for long time. > > > >fix: > >We drops and retake the mutex in the duration giving change to unlink to go on. > >Also, for live nodes, one node only scan and recover this slot where the node > >resides(helps performance). And always do it at each scan time. For those dead > >(not mounted), we do it when we "should". And for dead slots, no dropping-retaking > >mutex is needed. > > Yes, this is a good issue to tackle. I will read the patch in greater detail > later. But offhand, I have two comments. > > 1. "should" is not descriptive. I am assuming you mean do it only during > actual recovery. If so, that would be incorrect. Say node 0 unlinks a file > that was being used by node 1. Node 0 dies. Recovery will notice that > that inode is active and not delete it. If node 1 dies, or is unable > to delete > the file for any other reason, then our only hope is orphan scan. Sorry. the "should" doesn't mean a actual recovery. I meant when "os->os_seqno == seqno", the orginal condition determining whether we do queue the scans. > > 2. All nodes have to scan all slots. Even live slots. I remember we did for > a reason. And that reason should be in the comment in the patch written > by Srini. Oh... I will check Srini's patch. thanks, wengang. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting 2011-07-06 6:41 ` Wengang Wang @ 2011-07-06 6:48 ` Wengang Wang 0 siblings, 0 replies; 10+ messages in thread From: Wengang Wang @ 2011-07-06 6:48 UTC (permalink / raw) To: ocfs2-devel On 11-07-06 14:41, Wengang Wang wrote: > On 11-07-05 23:17, Sunil Mushran wrote: > > On 07/05/2011 09:38 PM, Wengang Wang wrote: > > >There is a use case that the app deletes huge number(XX kilo) of files in every > > >5 minutes. The deletions of some specific files are extreamly slow(costing > > >xx~xxx seconds). That is unacceptable. > > > > > >Reading out the dir entries and the relavent inodes cost time. And we are doing > > >that with i_mutex held, it causes unlink path waiting on the mutex for long time. > > > > > >fix: > > >We drops and retake the mutex in the duration giving change to unlink to go on. > > >Also, for live nodes, one node only scan and recover this slot where the node > > >resides(helps performance). And always do it at each scan time. For those dead > > >(not mounted), we do it when we "should". And for dead slots, no dropping-retaking > > >mutex is needed. > > > > Yes, this is a good issue to tackle. I will read the patch in greater detail > > later. But offhand, I have two comments. > > > > 1. "should" is not descriptive. I am assuming you mean do it only during > > actual recovery. If so, that would be incorrect. Say node 0 unlinks a file > > that was being used by node 1. Node 0 dies. Recovery will notice that > > that inode is active and not delete it. If node 1 dies, or is unable > > to delete > > the file for any other reason, then our only hope is orphan scan. > > Sorry. the "should" doesn't mean a actual recovery. I meant when > "os->os_seqno == seqno", the orginal condition determining whether we do > queue the scans. > > > > > 2. All nodes have to scan all slots. Even live slots. I remember we did for > > a reason. And that reason should be in the comment in the patch written > > by Srini. > > Oh... I will check Srini's patch. The whole description is the following: ------------- When a dentry is unlinked, the unlinking node takes an EX on the dentry lock before moving the dentry to the orphan directory. Other nodes that have this dentry in cache have a PR on the same dentry lock. When the EX is requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED during downconvert. The inode is finally deleted when the last node to iput the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set. A problem arises if a node is forced to free dentry locks because of memory pressure. If this happens, the node will no longer get downconvert notifications for the dentries that have been unlinked on another node. If it also happens that node is actively using the corresponding inode and happens to be the one performing the last iput on that inode, it will fail to delete the inode as it will not have the MAYBE_ORPHANED flag set. This patch fixes this shortcoming by introducing a periodic scan of the orphan directories to delete such inodes. Care has been taken to distribute the workload across the cluster so that no one node has to perform the task all the time. --------------- I didn't see the reason All nodes have to scan all slots. If there is one, the "load balance"? Or the real reason is not in the description? thanks, wengang. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting 2011-07-06 6:17 ` Sunil Mushran 2011-07-06 6:41 ` Wengang Wang @ 2011-07-07 6:19 ` Srinivas Eeda 2011-07-07 20:02 ` Sunil Mushran 1 sibling, 1 reply; 10+ messages in thread From: Srinivas Eeda @ 2011-07-07 6:19 UTC (permalink / raw) To: ocfs2-devel On 7/5/2011 11:17 PM, Sunil Mushran wrote: > 2. All nodes have to scan all slots. Even live slots. I remember we > did for > a reason. And that reason should be in the comment in the patch written > by Srini. When a node unlinks a file it inserts an entry into it's own orphan slot. If another node is the last one to close the file and dentry got flushed then it will not do the cleanup as it doesn't know the file was orphaned. The file will remain in the orphan slot till the node umounts and the same slot is reused again. To overcome this problem a node has to rescan all slots(including live slots) and try to do the cleanup. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting 2011-07-07 6:19 ` Srinivas Eeda @ 2011-07-07 20:02 ` Sunil Mushran 2011-07-07 20:26 ` Sunil Mushran 0 siblings, 1 reply; 10+ messages in thread From: Sunil Mushran @ 2011-07-07 20:02 UTC (permalink / raw) To: ocfs2-devel On 07/06/2011 11:19 PM, Srinivas Eeda wrote: > On 7/5/2011 11:17 PM, Sunil Mushran wrote: >> 2. All nodes have to scan all slots. Even live slots. I remember we >> did for >> a reason. And that reason should be in the comment in the patch written >> by Srini. > When a node unlinks a file it inserts an entry into it's own orphan > slot. If another node is the last one to close the file and dentry got > flushed then it will not do the cleanup as it doesn't know the file was > orphaned. The file will remain in the orphan slot till the node umounts > and the same slot is reused again. To overcome this problem a node has > to rescan all slots(including live slots) and try to do the cleanup. The qs is not why are we scanning all live slots on all nodes. As in, why not just recover the local slot. There was a reason for that. Yes, we have to recover unused slots for the reason listed previously. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting 2011-07-07 20:02 ` Sunil Mushran @ 2011-07-07 20:26 ` Sunil Mushran 2011-07-08 7:02 ` Srinivas Eeda 0 siblings, 1 reply; 10+ messages in thread From: Sunil Mushran @ 2011-07-07 20:26 UTC (permalink / raw) To: ocfs2-devel On 07/07/2011 01:02 PM, Sunil Mushran wrote: > On 07/06/2011 11:19 PM, Srinivas Eeda wrote: >> On 7/5/2011 11:17 PM, Sunil Mushran wrote: >>> 2. All nodes have to scan all slots. Even live slots. I remember we >>> did for >>> a reason. And that reason should be in the comment in the patch written >>> by Srini. >> When a node unlinks a file it inserts an entry into it's own orphan >> slot. If another node is the last one to close the file and dentry got >> flushed then it will not do the cleanup as it doesn't know the file was >> orphaned. The file will remain in the orphan slot till the node umounts >> and the same slot is reused again. To overcome this problem a node has >> to rescan all slots(including live slots) and try to do the cleanup. > The qs is not why are we scanning all live slots on all nodes. As in, > why not just recover the local slot. There was a reason for that. > Yes, we have to recover unused slots for the reason listed previously. bleh.... let me rephrase. The qs is why are we scanning all live slots on all nodes. Wengangs patch limits the scanning to the local (live) slot only. And I remember we had a reason for it. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting 2011-07-07 20:26 ` Sunil Mushran @ 2011-07-08 7:02 ` Srinivas Eeda 2011-07-08 16:18 ` Sunil Mushran 2011-07-28 10:14 ` Joel Becker 0 siblings, 2 replies; 10+ messages in thread From: Srinivas Eeda @ 2011-07-08 7:02 UTC (permalink / raw) To: ocfs2-devel Below is excerpts from Joel's email for the same question :) > Currently, orphan scan just iterate all the slots and call > ocfs2_queue_recovery_completion, but I don't think it is proper for a node > to query another mounted one since that node will query it by > itself. Node 1 has an inode it was using. The dentry went away due to memory pressure. Node 1 closes the inode, but it's on the free list. The node has the open lock. Node 2 unlinks the inode. It grabs the dentry lock to notify others, but node 1 has no dentry and doesn't get the message. It trylocks the open lock, sees that another node has a PR, and does nothing. Later node 2 runs its orphan dir. It igets the inode, trylocks the open lock, sees the PR still, and does nothing. Basically, we have to trigger an orphan iput on node 1. The only way for this to happen is if node 1 runs node 2's orphan dir. This patch exists because that wasn't happening. On 7/7/2011 1:26 PM, Sunil Mushran wrote: > On 07/07/2011 01:02 PM, Sunil Mushran wrote: >> On 07/06/2011 11:19 PM, Srinivas Eeda wrote: >>> On 7/5/2011 11:17 PM, Sunil Mushran wrote: >>>> 2. All nodes have to scan all slots. Even live slots. I remember we >>>> did for >>>> a reason. And that reason should be in the comment in the patch >>>> written >>>> by Srini. >>> When a node unlinks a file it inserts an entry into it's own orphan >>> slot. If another node is the last one to close the file and dentry got >>> flushed then it will not do the cleanup as it doesn't know the file was >>> orphaned. The file will remain in the orphan slot till the node >>> umounts >>> and the same slot is reused again. To overcome this problem a node has >>> to rescan all slots(including live slots) and try to do the cleanup. >> The qs is not why are we scanning all live slots on all nodes. As in, >> why not just recover the local slot. There was a reason for that. >> Yes, we have to recover unused slots for the reason listed previously. > > bleh.... let me rephrase. > > The qs is why are we scanning all live slots on all nodes. Wengangs > patch limits the scanning to the local (live) slot only. And I remember > we had a reason for it. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting 2011-07-08 7:02 ` Srinivas Eeda @ 2011-07-08 16:18 ` Sunil Mushran 2011-07-28 10:14 ` Joel Becker 1 sibling, 0 replies; 10+ messages in thread From: Sunil Mushran @ 2011-07-08 16:18 UTC (permalink / raw) To: ocfs2-devel On 07/08/2011 12:02 AM, Srinivas Eeda wrote: > Below is excerpts from Joel's email for the same question :) > >> Currently, orphan scan just iterate all the slots and call >> ocfs2_queue_recovery_completion, but I don't think it is proper for a node >> to query another mounted one since that node will query it by >> itself. > > Node 1 has an inode it was using. The dentry went away due to > memory pressure. Node 1 closes the inode, but it's on the free list. > The node has the open lock. > Node 2 unlinks the inode. It grabs the dentry lock to notify > others, but node 1 has no dentry and doesn't get the message. It > trylocks the open lock, sees that another node has a PR, and does > nothing. > Later node 2 runs its orphan dir. It igets the inode, trylocks > the open lock, sees the PR still, and does nothing. > Basically, we have to trigger an orphan iput on node 1. The > only way for this to happen is if node 1 runs node 2's orphan dir. This > patch exists because that wasn't happening. Thanks for the reminder. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting 2011-07-08 7:02 ` Srinivas Eeda 2011-07-08 16:18 ` Sunil Mushran @ 2011-07-28 10:14 ` Joel Becker 1 sibling, 0 replies; 10+ messages in thread From: Joel Becker @ 2011-07-28 10:14 UTC (permalink / raw) To: ocfs2-devel On Fri, Jul 08, 2011 at 12:02:57AM -0700, Srinivas Eeda wrote: > Below is excerpts from Joel's email for the same question :) > > > Currently, orphan scan just iterate all the slots and call > > ocfs2_queue_recovery_completion, but I don't think it is proper for a node > > to query another mounted one since that node will query it by > > itself. > > Node 1 has an inode it was using. The dentry went away due to > memory pressure. Node 1 closes the inode, but it's on the free list. > The node has the open lock. > Node 2 unlinks the inode. It grabs the dentry lock to notify > others, but node 1 has no dentry and doesn't get the message. It > trylocks the open lock, sees that another node has a PR, and does > nothing. > Later node 2 runs its orphan dir. It igets the inode, trylocks > the open lock, sees the PR still, and does nothing. > Basically, we have to trigger an orphan iput on node 1. The > only way for this to happen is if node 1 runs node 2's orphan dir. This > patch exists because that wasn't happening. What he said. And yes, I know I'm late to the party. Joel -- "Lately I've been talking in my sleep. Can't imagine what I'd have to say. Except my world will be right When love comes back my way." http://www.jlbec.org/ jlbec at evilplan.org ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2011-07-28 10:14 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-07-06 4:38 [Ocfs2-devel] [PATCH 0/3] ocfs2: fix slow deleting Wengang Wang 2011-07-06 6:17 ` Sunil Mushran 2011-07-06 6:41 ` Wengang Wang 2011-07-06 6:48 ` Wengang Wang 2011-07-07 6:19 ` Srinivas Eeda 2011-07-07 20:02 ` Sunil Mushran 2011-07-07 20:26 ` Sunil Mushran 2011-07-08 7:02 ` Srinivas Eeda 2011-07-08 16:18 ` Sunil Mushran 2011-07-28 10:14 ` Joel Becker
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.