* [PATCH 0/2] fixes for undeletable empty directory
@ 2012-12-04 8:09 Yan, Zheng
2012-12-04 8:09 ` [PATCH 1/2] mds: don't create bloom filter for incomplete dir Yan, Zheng
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Yan, Zheng @ 2012-12-04 8:09 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
These two patches fix MDS restart issues that may corrupt the fragstat accounting
and cause undeletable empty directory. These two patches, together with the patch
"mds: fix journaling issue regarding rstat accounting", have passed overnight
"fsstress + restart MDS every 120s" test. No undeletable directory was found.
These patches are also in:
git://github.com/ukernel/ceph.git wip-mds
Regards
Yan, Zheng
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] mds: don't create bloom filter for incomplete dir
2012-12-04 8:09 [PATCH 0/2] fixes for undeletable empty directory Yan, Zheng
@ 2012-12-04 8:09 ` Yan, Zheng
2012-12-04 8:09 ` [PATCH 2/2] mds: journal remote inode's projected parent Yan, Zheng
2012-12-05 8:43 ` [PATCH 0/2] fixes for undeletable empty directory Amon Ott
2 siblings, 0 replies; 5+ messages in thread
From: Yan, Zheng @ 2012-12-04 8:09 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Creating bloom filter for incomplete dir that was added by log
replay will confuse subsequent dir lookup and can create null
dentry for existing file. The erroneous null dentry confuses the
fragstat accounting and causes undeletable empty directory.
The fix is check if the dir is complete before creating the bloom
filter. For the MDCache::trim_non_auth{,_subtree} cases, just do
not call CDir::add_to_bloom because bloom filter is useless for
replica.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
src/mds/CDir.cc | 6 +++++-
src/mds/MDCache.cc | 12 +++++++-----
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/src/mds/CDir.cc b/src/mds/CDir.cc
index c5220ed..835a0e9 100644
--- a/src/mds/CDir.cc
+++ b/src/mds/CDir.cc
@@ -615,8 +615,12 @@ void CDir::unlink_inode_work( CDentry *dn )
void CDir::add_to_bloom(CDentry *dn)
{
- if (!bloom)
+ if (!bloom) {
+ /* not create bloom filter for incomplete dir that was added by log replay */
+ if (!is_complete())
+ return;
bloom = new bloom_filter(100, 0.05, 0);
+ }
/* This size and false positive probability is completely random.*/
bloom->insert(dn->name.c_str(), dn->name.size());
}
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 7152b5e..fcc1ecd 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -5498,9 +5498,10 @@ void MDCache::trim_dentry(CDentry *dn, map<int, MCacheExpire*>& expiremap)
// adjust the dir state
// NOTE: we can safely remove a clean, null dentry without effecting
// directory completeness.
- // (do this _before_ we unlink the inode, below!)
+ // (check this _before_ we unlink the inode, below!)
+ bool clear_complete = false;
if (!(dnl->is_null() && dn->is_clean()))
- dir->state_clear(CDir::STATE_COMPLETE);
+ clear_complete = true;
// unlink the dentry
if (dnl->is_remote()) {
@@ -5520,6 +5521,9 @@ void MDCache::trim_dentry(CDentry *dn, map<int, MCacheExpire*>& expiremap)
// remove dentry
dir->add_to_bloom(dn);
dir->remove_dentry(dn);
+
+ if (clear_complete)
+ dir->state_clear(CDir::STATE_COMPLETE);
// reexport?
if (dir->get_num_head_items() == 0 && dir->is_subtree_root())
@@ -5708,9 +5712,8 @@ void MDCache::trim_non_auth()
else {
assert(dnl->is_null());
}
- dir->add_to_bloom(dn);
+
dir->remove_dentry(dn);
-
// adjust the dir state
dir->state_clear(CDir::STATE_COMPLETE); // dir incomplete!
}
@@ -5811,7 +5814,6 @@ bool MDCache::trim_non_auth_subtree(CDir *dir)
dout(20) << "trim_non_auth_subtree(" << dir << ") removing inode " << in << " with dentry" << dn << dendl;
dir->unlink_inode(dn);
remove_inode(in);
- dir->add_to_bloom(dn);
dir->remove_dentry(dn);
} else {
dout(20) << "trim_non_auth_subtree(" << dir << ") keeping inode " << in << " with dentry " << dn <<dendl;
--
1.7.11.7
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] mds: journal remote inode's projected parent
2012-12-04 8:09 [PATCH 0/2] fixes for undeletable empty directory Yan, Zheng
2012-12-04 8:09 ` [PATCH 1/2] mds: don't create bloom filter for incomplete dir Yan, Zheng
@ 2012-12-04 8:09 ` Yan, Zheng
2012-12-05 8:43 ` [PATCH 0/2] fixes for undeletable empty directory Amon Ott
2 siblings, 0 replies; 5+ messages in thread
From: Yan, Zheng @ 2012-12-04 8:09 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Server::_rename_prepare() adds remote inode's parent instead of
projected parent to the journal. So during journal replay, the
journal entry for the rename operation will wrongly revert the
remote inode's projected rename. This issue can be reproduced by:
touch file1
ln file1 file2
rm file1
mv file2 file3
After journal replay, file1 reappears and directory's fragstat
gets corrupted.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
src/mds/Server.cc | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 34f40ec..7e099ab 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -5719,9 +5719,10 @@ void Server::_rename_prepare(MDRequest *mdr,
} else if (destdnl->is_remote()) {
if (oldin->is_auth()) {
// auth for targeti
- metablob->add_dir_context(oldin->get_parent_dir());
- mdcache->journal_cow_dentry(mdr, metablob, oldin->parent, CEPH_NOSNAP, 0, destdnl);
- metablob->add_primary_dentry(oldin->parent, true, oldin);
+ metablob->add_dir_context(oldin->get_projected_parent_dir());
+ mdcache->journal_cow_dentry(mdr, metablob, oldin->get_projected_parent_dn(),
+ CEPH_NOSNAP, 0, destdnl);
+ metablob->add_primary_dentry(oldin->get_projected_parent_dn(), true, oldin);
}
if (destdn->is_auth()) {
// auth for dn, not targeti
@@ -5740,10 +5741,10 @@ void Server::_rename_prepare(MDRequest *mdr,
if (destdn->is_auth())
metablob->add_remote_dentry(destdn, true, srcdnl->get_remote_ino(), srcdnl->get_remote_d_type());
- if (srci->get_parent_dn()->is_auth()) { // it's remote
- metablob->add_dir_context(srci->get_parent_dir());
- mdcache->journal_cow_dentry(mdr, metablob, srci->get_parent_dn(), CEPH_NOSNAP, 0, srcdnl);
- metablob->add_primary_dentry(srci->get_parent_dn(), true, srci);
+ if (srci->get_projected_parent_dn()->is_auth()) { // it's remote
+ metablob->add_dir_context(srci->get_projected_parent_dir());
+ mdcache->journal_cow_dentry(mdr, metablob, srci->get_projected_parent_dn(), CEPH_NOSNAP, 0, srcdnl);
+ metablob->add_primary_dentry(srci->get_projected_parent_dn(), true, srci);
}
} else {
if (destdn->is_auth() && !destdnl->is_null())
--
1.7.11.7
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 0/2] fixes for undeletable empty directory
2012-12-04 8:09 [PATCH 0/2] fixes for undeletable empty directory Yan, Zheng
2012-12-04 8:09 ` [PATCH 1/2] mds: don't create bloom filter for incomplete dir Yan, Zheng
2012-12-04 8:09 ` [PATCH 2/2] mds: journal remote inode's projected parent Yan, Zheng
@ 2012-12-05 8:43 ` Amon Ott
2012-12-06 15:09 ` Yan, Zheng
2 siblings, 1 reply; 5+ messages in thread
From: Amon Ott @ 2012-12-05 8:43 UTC (permalink / raw)
To: Yan, Zheng; +Cc: ceph-devel
Am 04.12.2012 09:09, schrieb Yan, Zheng:
> These two patches fix MDS restart issues that may corrupt the fragstat accounting
> and cause undeletable empty directory. These two patches, together with the patch
> "mds: fix journaling issue regarding rstat accounting", have passed overnight
> "fsstress + restart MDS every 120s" test. No undeletable directory was found.
>
> These patches are also in:
> git://github.com/ukernel/ceph.git wip-mds
Thank you very much for working on MDS issues, your work gave me some
hope that our very long standing crashes at MDS failover could be fixed
soon.
Right now, we have our four node test cluster running with 0.55 and your
latest patches. Hopefully, today's stress test will work fine.
One question: with your latest patches, is a multiple MDS setup stable
in your tests? So far, we only tried with max mds = 1. As our work load
writes many small files, some load sharing for MDS would be nice to
have. I would be willing to help with tests and reports, if needed.
Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Am Köllnischen Park 1 Fax: +49 30 99296856
10179 Berlin http://www.m-privacy.de
Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky
GnuPG-Key-ID: 0x2DD3A649
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/2] fixes for undeletable empty directory
2012-12-05 8:43 ` [PATCH 0/2] fixes for undeletable empty directory Amon Ott
@ 2012-12-06 15:09 ` Yan, Zheng
0 siblings, 0 replies; 5+ messages in thread
From: Yan, Zheng @ 2012-12-06 15:09 UTC (permalink / raw)
To: Amon Ott; +Cc: ceph-devel
On 12/05/2012 04:43 PM, Amon Ott wrote:
> Am 04.12.2012 09:09, schrieb Yan, Zheng:
>> These two patches fix MDS restart issues that may corrupt the fragstat accounting
>> and cause undeletable empty directory. These two patches, together with the patch
>> "mds: fix journaling issue regarding rstat accounting", have passed overnight
>> "fsstress + restart MDS every 120s" test. No undeletable directory was found.
>>
>> These patches are also in:
>> git://github.com/ukernel/ceph.git wip-mds
>
> Thank you very much for working on MDS issues, your work gave me some
> hope that our very long standing crashes at MDS failover could be fixed
> soon.
Hi,
Sorry for the delay
>
> Right now, we have our four node test cluster running with 0.55 and your
> latest patches. Hopefully, today's stress test will work fine.
>
> One question: with your latest patches, is a multiple MDS setup stable
> in your tests? So far, we only tried with max mds = 1. As our work load
> writes many small files, some load sharing for MDS would be nice to
> have. I would be willing to help with tests and reports, if needed.
>
No, multiple MDS setup is still unstable.
Regards
Yan, Zheng
> Amon Ott
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2012-12-06 15:09 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-04 8:09 [PATCH 0/2] fixes for undeletable empty directory Yan, Zheng
2012-12-04 8:09 ` [PATCH 1/2] mds: don't create bloom filter for incomplete dir Yan, Zheng
2012-12-04 8:09 ` [PATCH 2/2] mds: journal remote inode's projected parent Yan, Zheng
2012-12-05 8:43 ` [PATCH 0/2] fixes for undeletable empty directory Amon Ott
2012-12-06 15:09 ` Yan, Zheng
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.