All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fixes for undeletable empty directory
@ 2012-12-04  8:09 Yan, Zheng
  2012-12-04  8:09 ` [PATCH 1/2] mds: don't create bloom filter for incomplete dir Yan, Zheng
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Yan, Zheng @ 2012-12-04  8:09 UTC (permalink / raw)
  To: ceph-devel, sage; +Cc: Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

These two patches fix MDS restart issues that may corrupt the fragstat accounting
and cause undeletable empty directory. These two patches, together with the patch
"mds: fix journaling issue regarding rstat accounting", have passed overnight
"fsstress + restart MDS every 120s" test. No undeletable directory was found.

These patches are also in:
  git://github.com/ukernel/ceph.git wip-mds

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] mds: don't create bloom filter for incomplete dir
  2012-12-04  8:09 [PATCH 0/2] fixes for undeletable empty directory Yan, Zheng
@ 2012-12-04  8:09 ` Yan, Zheng
  2012-12-04  8:09 ` [PATCH 2/2] mds: journal remote inode's projected parent Yan, Zheng
  2012-12-05  8:43 ` [PATCH 0/2] fixes for undeletable empty directory Amon Ott
  2 siblings, 0 replies; 5+ messages in thread
From: Yan, Zheng @ 2012-12-04  8:09 UTC (permalink / raw)
  To: ceph-devel, sage; +Cc: Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Creating bloom filter for incomplete dir that was added by log
replay will confuse subsequent dir lookup and can create null
dentry for existing file. The erroneous null dentry confuses the
fragstat accounting and causes undeletable empty directory.

The fix is check if the dir is complete before creating the bloom
filter. For the MDCache::trim_non_auth{,_subtree} cases, just do
not call CDir::add_to_bloom because bloom filter is useless for
replica.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 src/mds/CDir.cc    |  6 +++++-
 src/mds/MDCache.cc | 12 +++++++-----
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/mds/CDir.cc b/src/mds/CDir.cc
index c5220ed..835a0e9 100644
--- a/src/mds/CDir.cc
+++ b/src/mds/CDir.cc
@@ -615,8 +615,12 @@ void CDir::unlink_inode_work( CDentry *dn )
 
 void CDir::add_to_bloom(CDentry *dn)
 {
-  if (!bloom)
+  if (!bloom) {
+    /* not create bloom filter for incomplete dir that was added by log replay */
+    if (!is_complete())
+      return;
     bloom = new bloom_filter(100, 0.05, 0);
+  }
   /* This size and false positive probability is completely random.*/
   bloom->insert(dn->name.c_str(), dn->name.size());
 }
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 7152b5e..fcc1ecd 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -5498,9 +5498,10 @@ void MDCache::trim_dentry(CDentry *dn, map<int, MCacheExpire*>& expiremap)
   // adjust the dir state
   // NOTE: we can safely remove a clean, null dentry without effecting
   //       directory completeness.
-  // (do this _before_ we unlink the inode, below!)
+  // (check this _before_ we unlink the inode, below!)
+  bool clear_complete = false;
   if (!(dnl->is_null() && dn->is_clean())) 
-    dir->state_clear(CDir::STATE_COMPLETE); 
+    clear_complete = true;
   
   // unlink the dentry
   if (dnl->is_remote()) {
@@ -5520,6 +5521,9 @@ void MDCache::trim_dentry(CDentry *dn, map<int, MCacheExpire*>& expiremap)
   // remove dentry
   dir->add_to_bloom(dn);
   dir->remove_dentry(dn);
+
+  if (clear_complete)
+    dir->state_clear(CDir::STATE_COMPLETE);
   
   // reexport?
   if (dir->get_num_head_items() == 0 && dir->is_subtree_root())
@@ -5708,9 +5712,8 @@ void MDCache::trim_non_auth()
       else {
 	assert(dnl->is_null());
       }
-      dir->add_to_bloom(dn);
+
       dir->remove_dentry(dn);
-      
       // adjust the dir state
       dir->state_clear(CDir::STATE_COMPLETE);  // dir incomplete!
     }
@@ -5811,7 +5814,6 @@ bool MDCache::trim_non_auth_subtree(CDir *dir)
         dout(20) << "trim_non_auth_subtree(" << dir << ") removing inode " << in << " with dentry" << dn << dendl;
         dir->unlink_inode(dn);
         remove_inode(in);
-        dir->add_to_bloom(dn);
         dir->remove_dentry(dn);
       } else {
         dout(20) << "trim_non_auth_subtree(" << dir << ") keeping inode " << in << " with dentry " << dn <<dendl;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] mds: journal remote inode's projected parent
  2012-12-04  8:09 [PATCH 0/2] fixes for undeletable empty directory Yan, Zheng
  2012-12-04  8:09 ` [PATCH 1/2] mds: don't create bloom filter for incomplete dir Yan, Zheng
@ 2012-12-04  8:09 ` Yan, Zheng
  2012-12-05  8:43 ` [PATCH 0/2] fixes for undeletable empty directory Amon Ott
  2 siblings, 0 replies; 5+ messages in thread
From: Yan, Zheng @ 2012-12-04  8:09 UTC (permalink / raw)
  To: ceph-devel, sage; +Cc: Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Server::_rename_prepare() adds remote inode's parent instead of
projected parent to the journal. So during journal replay, the
journal entry for the rename operation will wrongly revert the
remote inode's projected rename. This issue can be reproduced by:

 touch file1
 ln file1 file2
 rm file1
 mv file2 file3

After journal replay, file1 reappears and directory's fragstat
gets corrupted.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 src/mds/Server.cc | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 34f40ec..7e099ab 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -5719,9 +5719,10 @@ void Server::_rename_prepare(MDRequest *mdr,
     } else if (destdnl->is_remote()) {
       if (oldin->is_auth()) {
 	// auth for targeti
-	metablob->add_dir_context(oldin->get_parent_dir());
-	mdcache->journal_cow_dentry(mdr, metablob, oldin->parent, CEPH_NOSNAP, 0, destdnl);
-	metablob->add_primary_dentry(oldin->parent, true, oldin);
+	metablob->add_dir_context(oldin->get_projected_parent_dir());
+	mdcache->journal_cow_dentry(mdr, metablob, oldin->get_projected_parent_dn(),
+				    CEPH_NOSNAP, 0, destdnl);
+	metablob->add_primary_dentry(oldin->get_projected_parent_dn(), true, oldin);
       }
       if (destdn->is_auth()) {
 	// auth for dn, not targeti
@@ -5740,10 +5741,10 @@ void Server::_rename_prepare(MDRequest *mdr,
 
       if (destdn->is_auth())
         metablob->add_remote_dentry(destdn, true, srcdnl->get_remote_ino(), srcdnl->get_remote_d_type());
-      if (srci->get_parent_dn()->is_auth()) { // it's remote
-	metablob->add_dir_context(srci->get_parent_dir());
-        mdcache->journal_cow_dentry(mdr, metablob, srci->get_parent_dn(), CEPH_NOSNAP, 0, srcdnl);
-	metablob->add_primary_dentry(srci->get_parent_dn(), true, srci);
+      if (srci->get_projected_parent_dn()->is_auth()) { // it's remote
+	metablob->add_dir_context(srci->get_projected_parent_dir());
+        mdcache->journal_cow_dentry(mdr, metablob, srci->get_projected_parent_dn(), CEPH_NOSNAP, 0, srcdnl);
+	metablob->add_primary_dentry(srci->get_projected_parent_dn(), true, srci);
       }
     } else {
       if (destdn->is_auth() && !destdnl->is_null())
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] fixes for undeletable empty directory
  2012-12-04  8:09 [PATCH 0/2] fixes for undeletable empty directory Yan, Zheng
  2012-12-04  8:09 ` [PATCH 1/2] mds: don't create bloom filter for incomplete dir Yan, Zheng
  2012-12-04  8:09 ` [PATCH 2/2] mds: journal remote inode's projected parent Yan, Zheng
@ 2012-12-05  8:43 ` Amon Ott
  2012-12-06 15:09   ` Yan, Zheng
  2 siblings, 1 reply; 5+ messages in thread
From: Amon Ott @ 2012-12-05  8:43 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

Am 04.12.2012 09:09, schrieb Yan, Zheng:
> These two patches fix MDS restart issues that may corrupt the fragstat accounting
> and cause undeletable empty directory. These two patches, together with the patch
> "mds: fix journaling issue regarding rstat accounting", have passed overnight
> "fsstress + restart MDS every 120s" test. No undeletable directory was found.
> 
> These patches are also in:
>   git://github.com/ukernel/ceph.git wip-mds

Thank you very much for working on MDS issues, your work gave me some
hope that our very long standing crashes at MDS failover could be fixed
soon.

Right now, we have our four node test cluster running with 0.55 and your
latest patches. Hopefully, today's stress test will work fine.

One question: with your latest patches, is a multiple MDS setup stable
in your tests? So far, we only tried with max mds = 1. As our work load
writes many small files, some load sharing for MDS would be nice to
have. I would be willing to help with tests and reports, if needed.

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH           Tel: +49 30 24342334
Am Köllnischen Park 1    Fax: +49 30 99296856
10179 Berlin             http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] fixes for undeletable empty directory
  2012-12-05  8:43 ` [PATCH 0/2] fixes for undeletable empty directory Amon Ott
@ 2012-12-06 15:09   ` Yan, Zheng
  0 siblings, 0 replies; 5+ messages in thread
From: Yan, Zheng @ 2012-12-06 15:09 UTC (permalink / raw)
  To: Amon Ott; +Cc: ceph-devel

On 12/05/2012 04:43 PM, Amon Ott wrote:
> Am 04.12.2012 09:09, schrieb Yan, Zheng:
>> These two patches fix MDS restart issues that may corrupt the fragstat accounting
>> and cause undeletable empty directory. These two patches, together with the patch
>> "mds: fix journaling issue regarding rstat accounting", have passed overnight
>> "fsstress + restart MDS every 120s" test. No undeletable directory was found.
>>
>> These patches are also in:
>>   git://github.com/ukernel/ceph.git wip-mds
> 
> Thank you very much for working on MDS issues, your work gave me some
> hope that our very long standing crashes at MDS failover could be fixed
> soon.
Hi,

Sorry for the delay

> 
> Right now, we have our four node test cluster running with 0.55 and your
> latest patches. Hopefully, today's stress test will work fine.
> 
> One question: with your latest patches, is a multiple MDS setup stable
> in your tests? So far, we only tried with max mds = 1. As our work load
> writes many small files, some load sharing for MDS would be nice to
> have. I would be willing to help with tests and reports, if needed.
> 
No, multiple MDS setup is still unstable.

Regards
Yan, Zheng

> Amon Ott
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-12-06 15:09 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-04  8:09 [PATCH 0/2] fixes for undeletable empty directory Yan, Zheng
2012-12-04  8:09 ` [PATCH 1/2] mds: don't create bloom filter for incomplete dir Yan, Zheng
2012-12-04  8:09 ` [PATCH 2/2] mds: journal remote inode's projected parent Yan, Zheng
2012-12-05  8:43 ` [PATCH 0/2] fixes for undeletable empty directory Amon Ott
2012-12-06 15:09   ` Yan, Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.