* [PATCH 00/14] fixes for MDS
@ 2012-12-11 8:30 Yan, Zheng
2012-12-11 8:30 ` [PATCH 01/14] mds: fix journaling issue regarding rstat accounting Yan, Zheng
` (14 more replies)
0 siblings, 15 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
The first patch fixes a journaling bug that can corrupt rstat accounting;
I think it should be included in the next release. The remaining patches
fix various issues I encountered when running 3 MDSs with thrash_exports==1.
Regards
Yan, Zheng
* [PATCH 01/14] mds: fix journaling issue regarding rstat accounting
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 02/14] mds: allow handle_client_readdir() fetching freezing dir Yan, Zheng
` (13 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
A rename operation can call predirty_journal_parents() several times,
so a directory fragment's rstat can also be modified several times.
But only the first modification is journaled, because EMetaBlob::add_dir()
does not update an existing dirlump.
For example, when handling 'mv a/b/c a/c', Server::_rename_prepare may
first decrease the nested file counts of directories a and b by one, then
increase directory a's nested file count by one.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
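Note for readers: the effect of the add_dir() change can be reproduced
with a small stand-alone sketch. Fnode, Lump, Blob and
journaled_nested_files() are hypothetical, heavily simplified stand-ins
for fnode_t, EMetaBlob::dirlump and EMetaBlob, and the starting count of
10 is arbitrary. It only illustrates the "first write wins" behaviour,
not the real journaling code.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified stand-ins for fnode_t and EMetaBlob::dirlump.
struct Fnode {
  long nested_files = 0;
  unsigned long version = 0;
};
struct Lump {
  Fnode fnode;
};

struct Blob {
  std::vector<int> lump_order;   // dirfrags in the order they were added
  std::map<int, Lump> lump_map;  // dirfrag id -> journaled lump

  // Old behaviour: the fnode is copied only when the lump is first created.
  Lump& add_dir_old(int df, const Fnode& pf, unsigned long pv) {
    if (lump_map.count(df) == 0) {
      lump_order.push_back(df);
      lump_map[df].fnode = pf;
      lump_map[df].fnode.version = pv;
    }
    return lump_map[df];
  }

  // Patched behaviour: every call refreshes the fnode of the lump.
  Lump& add_dir_new(int df, const Fnode& pf, unsigned long pv) {
    if (lump_map.count(df) == 0)
      lump_order.push_back(df);
    Lump& l = lump_map[df];
    l.fnode = pf;
    l.fnode.version = pv;
    return l;
  }
};

// Replays the two updates that 'mv a/b/c a/c' applies to directory a
// (count - 1, then count + 1) and returns the count that ends up journaled.
long journaled_nested_files(bool patched) {
  Blob blob;
  Fnode a;
  a.nested_files = 10;
  const int df = 1;
  const unsigned long pv = 2;

  a.nested_files -= 1;  // first predirty_journal_parents() pass
  if (patched) blob.add_dir_new(df, a, pv); else blob.add_dir_old(df, a, pv);

  a.nested_files += 1;  // second pass touches the same dirfrag again
  if (patched) blob.add_dir_new(df, a, pv); else blob.add_dir_old(df, a, pv);

  assert(blob.lump_order.size() == 1);  // the dirfrag is listed only once
  return blob.lump_map[df].fnode.nested_files;
}
```

With the old add_dir() the journaled count stays at the intermediate
value; with the patched one it matches the in-memory value.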
src/mds/events/EMetaBlob.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/mds/events/EMetaBlob.h b/src/mds/events/EMetaBlob.h
index 9c281e9..116b704 100644
--- a/src/mds/events/EMetaBlob.h
+++ b/src/mds/events/EMetaBlob.h
@@ -635,12 +635,12 @@ private:
dirty, complete, isnew);
}
dirlump& add_dir(dirfrag_t df, fnode_t *pf, version_t pv, bool dirty, bool complete=false, bool isnew=false) {
- if (lump_map.count(df) == 0) {
+ if (lump_map.count(df) == 0)
lump_order.push_back(df);
- lump_map[df].fnode = *pf;
- lump_map[df].fnode.version = pv;
- }
+
dirlump& l = lump_map[df];
+ l.fnode = *pf;
+ l.fnode.version = pv;
if (complete) l.mark_complete();
if (dirty) l.mark_dirty();
if (isnew) l.mark_new();
--
1.7.11.7
* [PATCH 02/14] mds: allow handle_client_readdir() fetching freezing dir.
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
2012-12-11 8:30 ` [PATCH 01/14] mds: fix journaling issue regarding rstat accounting Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 03/14] mds: properly mark dirfrag dirty Yan, Zheng
` (12 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
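When handle_client_readdir() finds the dir incomplete, the request has
already auth pinned and locked some objects. So CDir::fetch() should
ignore the can_auth_pin check and go ahead and fetch a freezing dir.
If the dir is already frozen, the request waits for it to unfreeze.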
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
src/mds/Server.cc | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index ba43656..c8c52e1 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -2733,9 +2733,14 @@ void Server::handle_client_readdir(MDRequest *mdr)
assert(dir->is_auth());
if (!dir->is_complete()) {
+ if (dir->is_frozen()) {
+ dout(7) << "dir is frozen " << *dir << dendl;
+ dir->add_waiter(CDir::WAIT_UNFREEZE, new C_MDS_RetryRequest(mdcache, mdr));
+ return;
+ }
// fetch
dout(10) << " incomplete dir contents for readdir on " << *dir << ", fetching" << dendl;
- dir->fetch(new C_MDS_RetryRequest(mdcache, mdr));
+ dir->fetch(new C_MDS_RetryRequest(mdcache, mdr), true);
return;
}
--
1.7.11.7
* [PATCH 03/14] mds: properly mark dirfrag dirty
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
2012-12-11 8:30 ` [PATCH 01/14] mds: fix journaling issue regarding rstat accounting Yan, Zheng
2012-12-11 8:30 ` [PATCH 02/14] mds: allow handle_client_readdir() fetching freezing dir Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 04/14] mds: no bloom filter for replica dir Yan, Zheng
` (11 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
If predirty_journal_parents() does not propagate changes in a dir's
fragstat into the corresponding inode's dirstat, it should mark the
inode as dirfrag dirty. This happens when we modify dir fragments
that are auth subtree roots.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
src/mds/MDCache.cc | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 58a8b8a..c8055ea 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -1993,6 +1993,11 @@ void MDCache::predirty_journal_parents(Mutation *mut, EMetaBlob *blob,
mds->locker->mark_updated_scatterlock(&pin->nestlock);
mut->ls->dirty_dirfrag_nest.push_back(&pin->item_dirty_dirfrag_nest);
mut->add_updated_lock(&pin->nestlock);
+ if (do_parent_mtime || linkunlink) {
+ mds->locker->mark_updated_scatterlock(&pin->filelock);
+ mut->ls->dirty_dirfrag_dir.push_back(&pin->item_dirty_dirfrag_dir);
+ mut->add_updated_lock(&pin->filelock);
+ }
break;
}
if (!mut->wrlocks.count(&pin->versionlock))
--
1.7.11.7
* [PATCH 04/14] mds: no bloom filter for replica dir
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (2 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 03/14] mds: properly mark dirfrag dirty Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 05/14] mds: set want_base_dir to false for MDCache::discover_ino() Yan, Zheng
` (10 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
We should delete a dir fragment's bloom filter after exporting the dir
fragment to another MDS. Otherwise the residual bloom filter may cause
problems if the MDS imports the dir fragment again later.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
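Note for readers: the intended bloom filter lifecycle can be sketched
as below. Dir is a hypothetical, simplified stand-in for CDir, a
std::set stands in for the real bloom filter, and trim_dentry(),
finish_export() and finish_import() only mimic the relevant lines of
the functions touched by this patch.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-in for CDir; a std::set replaces the bloom filter.
struct Dir {
  bool auth = true;
  std::unique_ptr<std::set<std::string>> bloom;

  bool has_bloom() const { return bloom != nullptr; }
  void add_to_bloom(const std::string& name) {
    if (!bloom) bloom.reset(new std::set<std::string>());
    bloom->insert(name);
  }
  void remove_bloom() { bloom.reset(); }
};

// Trimming a dentry: only the auth MDS records the name in the bloom.
void trim_dentry(Dir& dir, const std::string& name) {
  if (dir.auth)
    dir.add_to_bloom(name);
}

// Export: the dirfrag becomes a replica and its bloom is dropped.
void finish_export(Dir& dir) {
  dir.auth = false;
  dir.remove_bloom();
  assert(!dir.has_bloom());  // replicas never carry a bloom filter
}

// Re-import: the dirfrag becomes auth again and starts without a bloom.
void finish_import(Dir& dir) {
  dir.auth = true;
}
```

After finish_export() followed by finish_import(), the dirfrag has no
bloom left over from its earlier time as auth.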
src/mds/CDir.cc | 9 +++++++--
src/mds/CDir.h | 1 +
src/mds/MDCache.cc | 5 ++++-
src/mds/Migrator.cc | 2 ++
4 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/src/mds/CDir.cc b/src/mds/CDir.cc
index 4b1d3ef..cbda038 100644
--- a/src/mds/CDir.cc
+++ b/src/mds/CDir.cc
@@ -632,6 +632,12 @@ bool CDir::is_in_bloom(const string& name)
return bloom->contains(name.c_str(), name.size());
}
+void CDir::remove_bloom()
+{
+ delete bloom;
+ bloom = NULL;
+}
+
void CDir::remove_null_dentries() {
dout(12) << "remove_null_dentries " << *this << dendl;
@@ -1287,8 +1293,7 @@ void CDir::log_mark_dirty()
void CDir::mark_complete() {
state_set(STATE_COMPLETE);
- delete bloom;
- bloom = NULL;
+ remove_bloom();
}
void CDir::first_get()
diff --git a/src/mds/CDir.h b/src/mds/CDir.h
index 2222418..91e53d2 100644
--- a/src/mds/CDir.h
+++ b/src/mds/CDir.h
@@ -349,6 +349,7 @@ protected:
void add_to_bloom(CDentry *dn);
bool is_in_bloom(const string& name);
bool has_bloom() { return (bloom ? true : false); }
+ void remove_bloom();
private:
void link_inode_work( CDentry *dn, CInode *in );
void unlink_inode_work( CDentry *dn );
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index c8055ea..7733d0d 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -5524,7 +5524,8 @@ void MDCache::trim_dentry(CDentry *dn, map<int, MCacheExpire*>& expiremap)
}
// remove dentry
- dir->add_to_bloom(dn);
+ if (dir->is_auth())
+ dir->add_to_bloom(dn);
dir->remove_dentry(dn);
if (clear_complete)
@@ -5718,6 +5719,7 @@ void MDCache::trim_non_auth()
assert(dnl->is_null());
}
+ assert(!dir->has_bloom());
dir->remove_dentry(dn);
// adjust the dir state
dir->state_clear(CDir::STATE_COMPLETE); // dir incomplete!
@@ -5819,6 +5821,7 @@ bool MDCache::trim_non_auth_subtree(CDir *dir)
dout(20) << "trim_non_auth_subtree(" << dir << ") removing inode " << in << " with dentry" << dn << dendl;
dir->unlink_inode(dn);
remove_inode(in);
+ assert(!dir->has_bloom());
dir->remove_dentry(dn);
} else {
dout(20) << "trim_non_auth_subtree(" << dir << ") keeping inode " << in << " with dentry " << dn <<dendl;
diff --git a/src/mds/Migrator.cc b/src/mds/Migrator.cc
index a804eab..cc045b4 100644
--- a/src/mds/Migrator.cc
+++ b/src/mds/Migrator.cc
@@ -1196,6 +1196,7 @@ void Migrator::finish_export_dir(CDir *dir, list<Context*>& finished, utime_t no
// mark
assert(dir->is_auth());
dir->state_clear(CDir::STATE_AUTH);
+ dir->remove_bloom();
dir->replica_nonce = CDir::NONCE_EXPORT;
if (dir->is_dirty())
@@ -2006,6 +2007,7 @@ void Migrator::import_reverse(CDir *dir)
// dir
assert(cur->is_auth());
cur->state_clear(CDir::STATE_AUTH);
+ cur->remove_bloom();
cur->clear_replica_map();
if (cur->is_dirty())
cur->mark_clean();
--
1.7.11.7
* [PATCH 05/14] mds: set want_base_dir to false for MDCache::discover_ino()
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (3 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 04/14] mds: no bloom filter for replica dir Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 06/14] mds: fix error handling in MDCache::handle_discover_reply() Yan, Zheng
` (9 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
When a frozen inode is encountered, MDCache::handle_discover() sends
the reply immediately if the reply message is not empty. When handling
"discover ino" requests, the reply message always contains the base
directory fragment. But the requestor already has the base directory
fragment, so the only effect of the reply message is to wake the
requestor and make it send the same "discover ino" request again. The
requestor keeps sending "discover ino" requests but can't make any
progress.
The fix is to set want_base_dir to false in MDCache::discover_ino().
With want_base_dir set to false, the code that handles "discover ino"
errors also needs to be updated.
This patch also removes the unused error handling code for flag_error_dn.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
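Note for readers: the retry loop described above can be modelled in a
few lines. This is a toy model, not the MDCache code: Action,
on_frozen_inode(), reply_is_empty_at_frozen_inode() and
requests_until_parked() are hypothetical names, and the model assumes
that the base directory fragment is the only thing the responder can
put in the reply before it hits the frozen inode.

```cpp
#include <cassert>

enum class Action { ReplyNow, WaitForUnfreeze };

// Responder side: on a frozen inode, reply at once unless the reply
// would be empty, in which case park the request until the unfreeze.
Action on_frozen_inode(bool reply_is_empty) {
  return reply_is_empty ? Action::WaitForUnfreeze : Action::ReplyNow;
}

// Assumption: for "discover ino" the reply holds the base dirfrag if and
// only if want_base_dir was set, and nothing else before the frozen inode.
bool reply_is_empty_at_frozen_inode(bool want_base_dir) {
  return !want_base_dir;
}

// Number of identical requests sent before the responder parks one.
// Returns max_rounds if that never happens within the cap.
int requests_until_parked(bool want_base_dir, int max_rounds) {
  assert(max_rounds > 0);
  int sent = 0;
  while (sent < max_rounds) {
    ++sent;
    bool empty = reply_is_empty_at_frozen_inode(want_base_dir);
    if (on_frozen_inode(empty) == Action::WaitForUnfreeze)
      return sent;
    // Otherwise the reply only repeats the base dirfrag; the requestor
    // wakes up, learns nothing new and sends the same request again.
  }
  return sent;
}
```

With want_base_dir set, the model never leaves the loop before the cap;
with it cleared, the first request is parked until the inode unfreezes.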
src/mds/MDCache.cc | 38 +++++++++++++-------------------------
1 file changed, 13 insertions(+), 25 deletions(-)
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 7733d0d..893b651 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -8547,7 +8547,7 @@ void MDCache::discover_ino(CDir *base,
d.ino = base->ino();
d.frag = base->get_frag();
d.want_ino = want_ino;
- d.want_base_dir = true;
+ d.want_base_dir = false;
d.want_xlocked = want_xlocked;
_send_discover(d);
}
@@ -8890,6 +8890,18 @@ void MDCache::handle_discover_reply(MDiscoverReply *m)
}
}
+ // discover ino error
+ if (p.end() && m->is_flag_error_ino()) {
+ assert(cur->is_dir());
+ CDir *dir = cur->get_dirfrag(m->get_base_dir_frag());
+ if (dir) {
+ dout(7) << " flag_error on ino " << m->get_wanted_ino()
+ << ", triggering ino" << dendl;
+ dir->take_ino_waiting(m->get_wanted_ino(), error);
+ } else
+ assert(0);
+ }
+
// discover may start with an inode
if (!p.end() && next == MDiscoverReply::INODE) {
cur = add_replica_inode(p, NULL, finished);
@@ -8925,30 +8937,6 @@ void MDCache::handle_discover_reply(MDiscoverReply *m)
curdir = cur->get_dirfrag(m->get_base_dir_frag());
}
- // dentry error?
- if (p.end() && (m->is_flag_error_dn() || m->is_flag_error_ino())) {
- // error!
- assert(cur->is_dir());
- if (curdir) {
- if (m->get_error_dentry().length()) {
- dout(7) << " flag_error on dentry " << m->get_error_dentry()
- << ", triggering dentry" << dendl;
- curdir->take_dentry_waiting(m->get_error_dentry(),
- m->get_wanted_snapid(), m->get_wanted_snapid(), error);
- } else {
- dout(7) << " flag_error on ino " << m->get_wanted_ino()
- << ", triggering ino" << dendl;
- curdir->take_ino_waiting(m->get_wanted_ino(), error);
- }
- } else {
- dout(7) << " flag_error on dentry " << m->get_error_dentry()
- << ", triggering dir?" << dendl;
- cur->take_waiting(CInode::WAIT_DIR, error);
- }
- break;
- }
- assert(curdir);
-
if (p.end())
break;
--
1.7.11.7
* [PATCH 06/14] mds: fix error handling in MDCache::handle_discover_reply()
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (4 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 05/14] mds: set want_base_dir to false for MDCache::discover_ino() Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 07/14] mds: always send discover if want_xlocked is true Yan, Zheng
` (8 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
The error handling code in MDCache::handle_discover_reply() has two
main issues. First, it does not wake waiters if dir_auth_hint in the
reply message is equal to the MDS's own nodeid. This can happen if a
discover races with a subtree import. Second, it checks for the
existence of a cached directory fragment to decide whether to take
waiters from the inode or from the directory fragment. The check is
unreliable because a subtree import can add directory fragments to
the cache.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
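Note for readers: after this patch the waiters on the inode and the
waiters on the directory fragment are handled independently, each with
the same three-way decision. The sketch below restates that decision as
plain functions; Step, on_dir_waiters() and on_dentry_waiters() are
hypothetical names and the boolean parameters stand in for the
is_waiter_for(), is_auth(), get_dirfrag() and lookup() checks in the
diff.

```cpp
#include <cassert>

enum class Step { Nothing, WakeWaiters, Rediscover };

// Waiters on the inode (CInode::WAIT_DIR): wake them if this MDS is now
// auth for the inode or the dirfrag is already cached, else discover again.
Step on_dir_waiters(bool has_waiter, bool inode_is_auth, bool dirfrag_cached) {
  if (!has_waiter)
    return Step::Nothing;
  return (inode_is_auth || dirfrag_cached) ? Step::WakeWaiters
                                           : Step::Rediscover;
}

// Waiters on a dentry of a cached dirfrag: same rule, one level down.
Step on_dentry_waiters(bool has_waiter, bool dir_is_auth, bool dentry_cached) {
  if (!has_waiter)
    return Step::Nothing;
  return (dir_is_auth || dentry_cached) ? Step::WakeWaiters
                                        : Step::Rediscover;
}
```

The point is that the presence of a cached dirfrag no longer decides
which set of waiters is looked at; both sets are always checked.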
src/mds/MDCache.cc | 54 ++++++++++++++++++++++++++++++++++++------------------
1 file changed, 36 insertions(+), 18 deletions(-)
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 893b651..eb18eeb 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -8957,9 +8957,7 @@ void MDCache::handle_discover_reply(MDiscoverReply *m)
if (m->is_flag_error_dir() && !cur->is_dir()) {
// not a dir.
cur->take_waiting(CInode::WAIT_DIR, error);
- } else if (m->is_flag_error_dir() ||
- (m->get_dir_auth_hint() != CDIR_AUTH_UNKNOWN &&
- m->get_dir_auth_hint() != mds->get_nodeid())) {
+ } else if (m->is_flag_error_dir() || m->get_dir_auth_hint() != CDIR_AUTH_UNKNOWN) {
int who = m->get_dir_auth_hint();
if (who == mds->get_nodeid()) who = -1;
if (who >= 0)
@@ -8971,27 +8969,47 @@ void MDCache::handle_discover_reply(MDiscoverReply *m)
frag_t fg = cur->pick_dirfrag(m->get_error_dentry());
CDir *dir = cur->get_dirfrag(fg);
filepath relpath(m->get_error_dentry(), 0);
+
+ if (cur->is_waiter_for(CInode::WAIT_DIR)) {
+ if (cur->is_auth() || dir)
+ cur->take_waiting(CInode::WAIT_DIR, finished);
+ else
+ discover_path(cur, m->get_wanted_snapid(), relpath, 0, m->get_wanted_xlocked(), who);
+ } else
+ dout(7) << " doing nothing, nobody is waiting for dir" << dendl;
+
if (dir) {
// don't actaully need the hint, now
- if (dir->lookup(m->get_error_dentry()) == 0 &&
- dir->is_waiting_for_dentry(m->get_error_dentry().c_str(), m->get_wanted_snapid()))
- discover_path(dir, m->get_wanted_snapid(), relpath, 0, m->get_wanted_xlocked());
- else
- dout(7) << " doing nothing, have dir but nobody is waiting on dentry "
+ if (dir->is_waiting_for_dentry(m->get_error_dentry().c_str(), m->get_wanted_snapid())) {
+ if (dir->is_auth() || dir->lookup(m->get_error_dentry()))
+ dir->take_dentry_waiting(m->get_error_dentry(), m->get_wanted_snapid(),
+ m->get_wanted_snapid(), finished);
+ else
+ discover_path(dir, m->get_wanted_snapid(), relpath, 0, m->get_wanted_xlocked());
+ } else
+ dout(7) << " doing nothing, have dir but nobody is waiting on dentry "
<< m->get_error_dentry() << dendl;
- } else {
- if (cur->is_waiter_for(CInode::WAIT_DIR))
- discover_path(cur, m->get_wanted_snapid(), relpath, 0, m->get_wanted_xlocked(), who);
- else
- dout(7) << " doing nothing, nobody is waiting for dir" << dendl;
}
} else {
- // wanted just the dir
+ // wanted dir or ino
frag_t fg = m->get_base_dir_frag();
- if (cur->get_dirfrag(fg) == 0 && cur->is_waiter_for(CInode::WAIT_DIR))
- discover_dir_frag(cur, fg, 0, who);
- else
- dout(7) << " doing nothing, nobody is waiting for dir" << dendl;
+ CDir *dir = cur->get_dirfrag(fg);
+
+ if (cur->is_waiter_for(CInode::WAIT_DIR)) {
+ if (cur->is_auth() || dir)
+ cur->take_waiting(CInode::WAIT_DIR, finished);
+ else
+ discover_dir_frag(cur, fg, 0, who);
+ } else
+ dout(7) << " doing nothing, nobody is waiting for dir" << dendl;
+
+ if (dir && m->get_wanted_ino() && dir->is_waiting_for_ino(m->get_wanted_ino())) {
+ if (dir->is_auth() || get_inode(m->get_wanted_ino()))
+ dir->take_ino_waiting(m->get_wanted_ino(), finished);
+ else
+ discover_ino(dir, m->get_wanted_ino(), 0, m->get_wanted_xlocked());
+ } else
+ dout(7) << " doing nothing, nobody is waiting for ino" << dendl;
}
}
--
1.7.11.7
* [PATCH 07/14] mds: always send discover if want_xlocked is true
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (5 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 06/14] mds: fix error handling in MDCache::handle_discover_reply() Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 08/14] mds: re-issue caps after importing caps Yan, Zheng
` (7 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
If want_xlocked is true, we cannot rely on a previously sent discover,
because the previous discover is likely blocked on the xlocked
dentry.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
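Note for readers: the new "should we send another discover" conditions
can be read as two small predicates. The function and parameter names
below are hypothetical; already_waiting stands in for the
is_waiter_for() / is_waiting_for_dentry() / is_waiting_for_ino() checks
and have_onfinish for a non-null onfinish context.

```cpp
#include <cassert>

// discover_path(): a single-component path with want_xlocked always
// triggers a fresh discover, even if an earlier one is still pending.
bool must_send_path_discover(bool want_xlocked, unsigned path_depth,
                             bool already_waiting, bool have_onfinish) {
  return (want_xlocked && path_depth == 1) ||
         !already_waiting || !have_onfinish;
}

// discover_ino(): want_xlocked alone is enough to send again.
bool must_send_ino_discover(bool want_xlocked, bool already_waiting,
                            bool have_onfinish) {
  return want_xlocked || !already_waiting || !have_onfinish;
}
```

Without want_xlocked, a caller that already has a pending discover and
passes an onfinish context still just joins the existing waiters.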
src/mds/MDCache.cc | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index eb18eeb..326e9d7 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -8449,7 +8449,8 @@ void MDCache::discover_path(CInode *base,
return;
}
- if (!base->is_waiter_for(CInode::WAIT_DIR) || !onfinish) { // FIXME: weak!
+ if ((want_xlocked && want_path.depth() == 1) ||
+ !base->is_waiter_for(CInode::WAIT_DIR) || !onfinish) { // FIXME: weak!
discover_info_t& d = _create_discover(from);
d.ino = base->ino();
d.snap = snap;
@@ -8496,7 +8497,8 @@ void MDCache::discover_path(CDir *base,
return;
}
- if (!base->is_waiting_for_dentry(want_path[0].c_str(), snap) || !onfinish) {
+ if ((want_xlocked && want_path.depth() == 1) ||
+ !base->is_waiting_for_dentry(want_path[0].c_str(), snap) || !onfinish) {
discover_info_t& d = _create_discover(from);
d.ino = base->ino();
d.frag = base->get_frag();
@@ -8542,7 +8544,7 @@ void MDCache::discover_ino(CDir *base,
return;
}
- if (!base->is_waiting_for_ino(want_ino)) {
+ if (want_xlocked || !base->is_waiting_for_ino(want_ino) || !onfinish) {
discover_info_t& d = _create_discover(from);
d.ino = base->ino();
d.frag = base->get_frag();
@@ -8801,11 +8803,14 @@ void MDCache::handle_discover(MDiscover *dis)
// is this the last (tail) item in the discover traversal?
if (tailitem && dis->wants_xlocked()) {
dout(7) << "handle_discover allowing discovery of xlocked tail " << *dn << dendl;
- } else {
+ } else if (reply->is_empty()) {
dout(7) << "handle_discover blocking on xlocked " << *dn << dendl;
dn->lock.add_waiter(SimpleLock::WAIT_RD, new C_MDS_RetryMessage(mds, dis));
reply->put();
return;
+ } else {
+ dout(7) << "handle_discover non-empty reply, xlocked tail " << *dn << dendl;
+ break;
}
}
--
1.7.11.7
* [PATCH 08/14] mds: re-issue caps after importing caps
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (6 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 07/14] mds: always send discover if want_xlocked is true Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 09/14] mds: take export lock set before sending MExportDirDiscover Yan, Zheng
` (6 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
The imported caps may prevent unstable locks from entering stable
states. So we should call Locker::eval_gather() with the parameter
"first" set to true after caps are imported.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
src/mds/Locker.cc | 16 ++++++++--------
src/mds/Locker.h | 6 +++---
src/mds/Migrator.cc | 3 ++-
3 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
index a8ec19f..860577f 100644
--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -769,7 +769,7 @@ void Locker::eval_gather(SimpleLock *lock, bool first, bool *pneed_issue, list<C
}
-bool Locker::eval(CInode *in, int mask)
+bool Locker::eval(CInode *in, int mask, bool caps_imported)
{
bool need_issue = false;
@@ -790,19 +790,19 @@ bool Locker::eval(CInode *in, int mask)
retry:
if (mask & CEPH_LOCK_IFILE)
- eval_any(&in->filelock, &need_issue);
+ eval_any(&in->filelock, &need_issue, caps_imported);
if (mask & CEPH_LOCK_IAUTH)
- eval_any(&in->authlock, &need_issue);
+ eval_any(&in->authlock, &need_issue, caps_imported);
if (mask & CEPH_LOCK_ILINK)
- eval_any(&in->linklock, &need_issue);
+ eval_any(&in->linklock, &need_issue,caps_imported);
if (mask & CEPH_LOCK_IXATTR)
- eval_any(&in->xattrlock, &need_issue);
+ eval_any(&in->xattrlock, &need_issue, caps_imported);
if (mask & CEPH_LOCK_INEST)
- eval_any(&in->nestlock, &need_issue);
+ eval_any(&in->nestlock, &need_issue, caps_imported);
if (mask & CEPH_LOCK_IFLOCK)
- eval_any(&in->flocklock, &need_issue);
+ eval_any(&in->flocklock, &need_issue, caps_imported);
if (mask & CEPH_LOCK_IPOLICY)
- eval_any(&in->policylock, &need_issue);
+ eval_any(&in->policylock, &need_issue, caps_imported);
// drop loner?
if (in->is_auth() && in->is_head() && in->get_wanted_loner() != in->get_loner()) {
diff --git a/src/mds/Locker.h b/src/mds/Locker.h
index b3b9919..04a5252 100644
--- a/src/mds/Locker.h
+++ b/src/mds/Locker.h
@@ -99,9 +99,9 @@ public:
void eval_gather(SimpleLock *lock, bool first=false, bool *need_issue=0, list<Context*> *pfinishers=0);
void eval(SimpleLock *lock, bool *need_issue);
- void eval_any(SimpleLock *lock, bool *need_issue) {
+ void eval_any(SimpleLock *lock, bool *need_issue, bool first=false) {
if (!lock->is_stable())
- eval_gather(lock, false, need_issue);
+ eval_gather(lock, first, need_issue);
else if (lock->get_parent()->is_auth())
eval(lock, need_issue);
}
@@ -122,7 +122,7 @@ public:
void eval_cap_gather(CInode *in, set<CInode*> *issue_set=0);
- bool eval(CInode *in, int mask);
+ bool eval(CInode *in, int mask, bool caps_imported=false);
void try_eval(MDSCacheObject *p, int mask);
void try_eval(SimpleLock *lock, bool *pneed_issue);
diff --git a/src/mds/Migrator.cc b/src/mds/Migrator.cc
index cc045b4..c157279 100644
--- a/src/mds/Migrator.cc
+++ b/src/mds/Migrator.cc
@@ -2230,7 +2230,7 @@ void Migrator::import_finish(CDir *dir)
p != cap_imports.end();
p++)
if (p->first->is_auth())
- mds->locker->eval(p->first, CEPH_CAP_LOCKS);
+ mds->locker->eval(p->first, CEPH_CAP_LOCKS, true);
// send pending import_maps?
mds->mdcache->maybe_send_pending_resolves();
@@ -2614,6 +2614,7 @@ void Migrator::logged_import_caps(CInode *in,
assert(cap_imports.count(in));
finish_import_inode_caps(in, from, cap_imports[in]);
+ mds->locker->eval(in, CEPH_CAP_LOCKS, true);
mds->send_message_mds(new MExportCapsAck(in->ino()), from);
}
--
1.7.11.7
* [PATCH 09/14] mds: take export lock set before sending MExportDirDiscover
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (7 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 08/14] mds: re-issue caps after importing caps Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 10/14] mds: don't retry readdir request after issuing caps Yan, Zheng
` (5 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Migrator::export_dir() only checks whether it can take the export lock
set; it does not actually take the locks. So someone else can change the
path to the exporting dir and confuse Migrator::handle_export_discover().
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
src/mds/Migrator.cc | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/mds/Migrator.cc b/src/mds/Migrator.cc
index c157279..5db21cd 100644
--- a/src/mds/Migrator.cc
+++ b/src/mds/Migrator.cc
@@ -236,6 +236,8 @@ void Migrator::handle_mds_failure_or_stop(int who)
dir->unfreeze_tree(); // cancel the freeze
dir->auth_unpin(this);
export_state.erase(dir); // clean up
+ export_unlock(dir);
+ export_locks.erase(dir);
dir->state_clear(CDir::STATE_EXPORTING);
if (export_peer[dir] != who) // tell them.
mds->send_message_mds(new MExportDirCancel(dir->dirfrag()), export_peer[dir]);
@@ -663,6 +665,8 @@ void Migrator::export_dir(CDir *dir, int dest)
dout(7) << "export_dir can't rdlock needed locks, failing." << dendl;
return;
}
+ mds->locker->rdlock_take_set(locks);
+ export_locks[dir].swap(locks);
// ok.
assert(export_state.count(dir) == 0);
@@ -705,6 +709,9 @@ void Migrator::handle_export_discover_ack(MExportDirDiscoverAck *m)
export_peer[dir] != m->get_source().num()) {
dout(7) << "must have aborted" << dendl;
} else {
+ // release locks to avoid deadlock
+ export_unlock(dir);
+ export_locks.erase(dir);
// freeze the subtree
export_state[dir] = EXPORT_FREEZING;
dir->auth_unpin(this);
--
1.7.11.7
* [PATCH 10/14] mds: don't retry readdir request after issuing caps
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (8 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 09/14] mds: take export lock set before sending MExportDirDiscover Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 11/14] mds: delay processing cache expire when state >= EXPORT_EXPORTING Yan, Zheng
` (4 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
If a remote linkage without an inode is encountered after some caps have
been issued, Server::handle_client_readdir() should send the reply to the
client immediately instead of retrying the request after opening
the remote dentry. This is because the MDS may want to revoke these
caps before it succeeds in opening the remote dentry.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
src/mds/Server.cc | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index c8c52e1..c95344e 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -2830,14 +2830,22 @@ void Server::handle_client_readdir(MDRequest *mdr)
dout(10) << "skipping bad remote ino on " << *dn << dendl;
continue;
} else {
- mdcache->open_remote_dentry(dn, dnp, new C_MDS_RetryRequest(mdcache, mdr));
-
// touch everything i _do_ have
for (it = dir->begin();
it != dir->end();
it++)
if (!it->second->get_linkage()->is_null())
mdcache->lru.lru_touch(it->second);
+
+ // already issued caps and leases, reply immediately.
+ if (dnbl.length() > 0) {
+ mdcache->open_remote_dentry(dn, dnp, new C_NoopContext);
+ dout(10) << " open remote dentry after caps were issued, stopping at "
+ << dnbl.length() << " < " << bytes_left << dendl;
+ break;
+ }
+
+ mdcache->open_remote_dentry(dn, dnp, new C_MDS_RetryRequest(mdcache, mdr));
return;
}
}
--
1.7.11.7
* [PATCH 11/14] mds: delay processing cache expire when state >= EXPORT_EXPORTING
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (9 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 10/14] mds: don't retry readdir request after issuing caps Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 12/14] mds: fix file existing check in Server::handle_client_openc() Yan, Zheng
` (3 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
The MDS can also receive a cache expire message in the
EXPORT_LOGGINGFINISH and EXPORT_NOTIFYING states, so delay processing
in every state >= EXPORT_EXPORTING.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
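Note for readers: the one-character change matters because the export
states are ordered. The sketch below covers only the "auth and
exporting" branch of the condition in handle_cache_expire(). The
numeric values of ExportState are illustrative assumptions; the only
property relied on is that EXPORT_LOGGINGFINISH and EXPORT_NOTIFYING
compare greater than EXPORT_EXPORTING, which the patch itself implies.
delay_old() and delay_new() are hypothetical names.

```cpp
#include <cassert>

// Illustrative values; only the relative order matters here.
enum ExportState {
  EXPORT_WARNING = 4,
  EXPORT_EXPORTING = 5,
  EXPORT_LOGGINGFINISH = 6,
  EXPORT_NOTIFYING = 7
};

// Before the patch: only EXPORT_EXPORTING itself delayed the expire.
bool delay_old(ExportState s, bool sender_was_warned) {
  return (s == EXPORT_WARNING && sender_was_warned) ||
         s == EXPORT_EXPORTING;
}

// After the patch: every later export state delays it as well.
bool delay_new(ExportState s, bool sender_was_warned) {
  return (s == EXPORT_WARNING && sender_was_warned) ||
         s >= EXPORT_EXPORTING;
}
```

The two predicates differ only for the states after EXPORT_EXPORTING.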
src/mds/MDCache.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 326e9d7..fe100f9 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -5924,7 +5924,7 @@ void MDCache::handle_cache_expire(MCacheExpire *m)
(parent_dir->is_auth() && parent_dir->is_exporting() &&
((migrator->get_export_state(parent_dir) == Migrator::EXPORT_WARNING &&
migrator->export_has_warned(parent_dir,from)) ||
- migrator->get_export_state(parent_dir) == Migrator::EXPORT_EXPORTING))) {
+ migrator->get_export_state(parent_dir) >= Migrator::EXPORT_EXPORTING))) {
// not auth.
dout(7) << "delaying nonauth|warned expires for " << *parent_dir << dendl;
assert(parent_dir->is_frozen_tree_root());
--
1.7.11.7
* [PATCH 12/14] mds: fix file existing check in Server::handle_client_openc()
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (10 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 11/14] mds: delay processing cache expire when state >= EXPORT_EXPORTING Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:30 ` [PATCH 13/14] mds: fix race between send_dentry_link() and cache expire Yan, Zheng
` (2 subsequent siblings)
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Creating a new file needs to be handled by the directory fragment's auth
MDS; opening an existing file in write mode needs to be handled by the
corresponding inode's auth MDS. If a file is a remote link, its parent
directory fragment's auth MDS can be different from the corresponding
inode's auth MDS. So which MDS handles a create file request can
depend on whether the file already exists.
handle_client_openc() calls rdlock_path_xlock_dentry() at the very
beginning. It always assumes the request needs to be handled by the
directory fragment's auth MDS. When handling a create file request,
if the file already exists and is remotely linked to a non-auth inode,
handle_client_openc() falls back to handle_client_open(), and
handle_client_open() forwards the request because the MDS is not the
inode's auth MDS. Then when the request arrives at the inode's auth MDS,
rdlock_path_xlock_dentry() is called again and forwards the request
back.
The fix traverses the path first when O_EXCL is not set and calls
handle_client_open() directly if the file already exists.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
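Note for readers: the forwarding ping-pong can be shown with a toy model
of two MDS ranks. This is not the Server code: kDirAuth, kInodeAuth,
forwards_old() and forwards_new() are hypothetical, and the model
assumes that on the inode's auth MDS the new path traverse resolves the
existing file without forwarding the request again.

```cpp
#include <cassert>

// Hypothetical ranks: 0 is auth for the parent dirfrag, 1 for the inode.
constexpr int kDirAuth = 0;
constexpr int kInodeAuth = 1;

// Old flow. Returns the number of forwards before the request is served,
// or -1 if it is still bouncing after max_hops forwards.
int forwards_old(int max_hops) {
  int at = kDirAuth;
  int hops = 0;
  while (hops < max_hops) {
    if (at != kDirAuth) {
      // rdlock_path_xlock_dentry() insists on the dirfrag's auth MDS.
      at = kDirAuth;
    } else {
      // The file exists and is a remote link: handle_client_open()
      // forwards to the inode's auth MDS.
      at = kInodeAuth;
    }
    ++hops;
  }
  return -1;
}

// New flow: traverse the path first; an existing file goes straight to
// handle_client_open(), which serves it on the inode's auth MDS.
int forwards_new(int max_hops) {
  int at = kDirAuth;
  int hops = 0;
  while (hops < max_hops) {
    if (at == kInodeAuth)
      return hops;  // served locally, no further forward
    at = kInodeAuth;
    ++hops;
  }
  return -1;
}
```

In this model the old flow never settles, while the new flow is served
after a single forward.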
src/mds/MDCache.cc | 9 +++++++--
src/mds/Server.cc | 33 ++++++++++++++++++++++++++-------
2 files changed, 33 insertions(+), 9 deletions(-)
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index fe100f9..43a3954 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -6717,13 +6717,18 @@ int MDCache::path_traverse(MDRequest *mdr, Message *req, Context *fin, // wh
// can we conclude ENOENT?
if (dnl && dnl->is_null()) {
- if (mds->locker->rdlock_try(&dn->lock, client, NULL)) {
+ if (dn->lock.can_read(client) ||
+ (dn->lock.is_xlocked() && dn->lock.get_xlock_by() == mdr)) {
dout(10) << "traverse: miss on null+readable dentry " << path[depth] << " " << *dn << dendl;
return -ENOENT;
- } else {
+ } else if (curdir->is_auth()) {
dout(10) << "miss on dentry " << *dn << ", can't read due to lock" << dendl;
dn->lock.add_waiter(SimpleLock::WAIT_RD, _get_waiter(mdr, req, fin));
return 1;
+ } else {
+ // non-auth and can not read, treat this as no dentry
+ dn = NULL;
+ dnl = NULL;
}
}
diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index c95344e..60d3793 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -2589,6 +2589,29 @@ void Server::handle_client_openc(MDRequest *mdr)
return;
}
+ if (!(req->head.args.open.flags & O_EXCL)) {
+ int r = mdcache->path_traverse(mdr, NULL, NULL, req->get_filepath(),
+ &mdr->dn[0], NULL, MDS_TRAVERSE_FORWARD);
+ if (r > 0) return;
+ if (r == 0) {
+ // it existed.
+ handle_client_open(mdr);
+ return;
+ }
+ if (r < 0 && r != -ENOENT) {
+ if (r == -ESTALE) {
+ dout(10) << "FAIL on ESTALE but attempting recovery" << dendl;
+ Context *c = new C_MDS_TryFindInode(this, mdr);
+ mdcache->find_ino_peers(req->get_filepath().get_ino(), c);
+ } else {
+ dout(10) << "FAIL on error " << r << dendl;
+ reply_request(mdr, r);
+ }
+ return;
+ }
+ // r == -ENOENT
+ }
+
bool excl = (req->head.args.open.flags & O_EXCL);
set<SimpleLock*> rdlocks, wrlocks, xlocks;
ceph_file_layout *dir_layout = NULL;
@@ -2630,13 +2653,9 @@ void Server::handle_client_openc(MDRequest *mdr)
if (!dnl->is_null()) {
// it existed.
- if (req->head.args.open.flags & O_EXCL) {
- dout(10) << "O_EXCL, target exists, failing with -EEXIST" << dendl;
- reply_request(mdr, -EEXIST, dnl->get_inode(), dn);
- return;
- }
-
- handle_client_open(mdr);
+ assert(req->head.args.open.flags & O_EXCL);
+ dout(10) << "O_EXCL, target exists, failing with -EEXIST" << dendl;
+ reply_request(mdr, -EEXIST, dnl->get_inode(), dn);
return;
}
--
1.7.11.7
* [PATCH 13/14] mds: fix race between send_dentry_link() and cache expire
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (11 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 12/14] mds: fix file existing check in Server::handle_client_openc() Yan, Zheng
@ 2012-12-11 8:30 ` Yan, Zheng
2012-12-11 8:31 ` [PATCH 14/14] mds: compare sessionmap version before replaying imported sessions Yan, Zheng
2012-12-11 8:33 ` [PATCH 00/14] fixes for MDS Yan, Zheng
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:30 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
The MDentryLink message can race with cache expire. When it arrives at
the target MDS, there may be no corresponding dentry in the cache. If
this race happens, we should expire the replica inode encoded in the
MDentryLink message. But to expire an inode, the MDS needs to know
which subtree the inode belongs to, so modify the MDentryLink message
to include this information.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
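The receiving side of the race can be sketched roughly as follows.
This is a toy model under stated assumptions, not the MDCache code:
LinkMsg, ExpireMsg and ReplicaCache are hypothetical types, dirfrags
are plain strings, and a bare inode number stands in for the encoded
replica inode.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for MDentryLink.
struct LinkMsg {
  std::string subtree;  // subtree root dirfrag (the field this patch adds)
  std::string dirfrag;  // dirfrag that contains the dentry
  std::string dn;       // dentry name
  bool is_primary;      // primary link carries a replica inode
  unsigned long ino;    // stands in for the encoded inode
  int from;             // rank of the sending MDS
};

// Hypothetical stand-in for MCacheExpire.
struct ExpireMsg {
  int to;
  std::string subtree;
  unsigned long ino;
};

struct ReplicaCache {
  // dirfrag -> (dentry name -> linked ino, 0 meaning a null dentry)
  std::map<std::string, std::map<std::string, unsigned long>> dirs;
  std::vector<ExpireMsg> sent;  // expire messages we would send

  void handle_link(const LinkMsg& m) {
    auto d = dirs.find(m.dirfrag);
    if (d != dirs.end()) {
      auto e = d->second.find(m.dn);
      if (e != d->second.end()) {
        e->second = m.ino;  // normal case: link the replica dentry
        return;
      }
    }
    // The dentry (or its dirfrag) was already trimmed. A primary link
    // carried a replica inode, so tell the sender we do not hold it.
    // The subtree field says which subtree the expire is filed under.
    if (m.is_primary)
      sent.push_back({m.from, m.subtree, m.ino});
  }
};
```

A remote link that misses the dentry needs no expire in this model,
since no replica inode was carried in the message.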
src/mds/MDCache.cc | 85 ++++++++++++++++++++++++++++++----------------
src/messages/MDentryLink.h | 7 +++-
2 files changed, 61 insertions(+), 31 deletions(-)
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 43a3954..3579261 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -9269,14 +9269,15 @@ void MDCache::send_dentry_link(CDentry *dn)
{
dout(7) << "send_dentry_link " << *dn << dendl;
+ CDir *subtree = get_subtree_root(dn->get_dir());
for (map<int,int>::iterator p = dn->replicas_begin();
p != dn->replicas_end();
p++) {
if (mds->mdsmap->get_state(p->first) < MDSMap::STATE_REJOIN)
continue;
CDentry::linkage_t *dnl = dn->get_linkage();
- MDentryLink *m = new MDentryLink(dn->get_dir()->dirfrag(), dn->name,
- dnl->is_primary());
+ MDentryLink *m = new MDentryLink(subtree->dirfrag(), dn->get_dir()->dirfrag(),
+ dn->name, dnl->is_primary());
if (dnl->is_primary()) {
dout(10) << " primary " << *dnl->get_inode() << dendl;
replicate_inode(dnl->get_inode(), p->first, m->bl);
@@ -9295,32 +9296,48 @@ void MDCache::send_dentry_link(CDentry *dn)
/* This function DOES put the passed message before returning */
void MDCache::handle_dentry_link(MDentryLink *m)
{
- CDir *dir = get_dirfrag(m->get_dirfrag());
- assert(dir);
- CDentry *dn = dir->lookup(m->get_dn());
- assert(dn);
- dout(7) << "handle_dentry_link on " << *dn << dendl;
- CDentry::linkage_t *dnl = dn->get_linkage();
+ CDentry *dn = NULL;
+ CDir *dir = get_dirfrag(m->get_dirfrag());
+ if (!dir) {
+ dout(7) << "handle_dentry_link don't have dirfrag " << m->get_dirfrag() << dendl;
+ } else {
+ dn = dir->lookup(m->get_dn());
+ if (!dn) {
+ dout(7) << "handle_dentry_link don't have dentry " << *dir << " dn " << m->get_dn() << dendl;
+ } else {
+ dout(7) << "handle_dentry_link on " << *dn << dendl;
+ CDentry::linkage_t *dnl = dn->get_linkage();
- assert(!dn->is_auth());
- assert(dnl->is_null());
+ assert(!dn->is_auth());
+ assert(dnl->is_null());
+ }
+ }
bufferlist::iterator p = m->bl.begin();
list<Context*> finished;
-
- if (m->get_is_primary()) {
- // primary link.
- add_replica_inode(p, dn, finished);
- } else {
- // remote link, easy enough.
- inodeno_t ino;
- __u8 d_type;
- ::decode(ino, p);
- ::decode(d_type, p);
- dir->link_remote_inode(dn, ino, d_type);
+ if (dn) {
+ if (m->get_is_primary()) {
+ // primary link.
+ add_replica_inode(p, dn, finished);
+ } else {
+ // remote link, easy enough.
+ inodeno_t ino;
+ __u8 d_type;
+ ::decode(ino, p);
+ ::decode(d_type, p);
+ dir->link_remote_inode(dn, ino, d_type);
+ }
+ } else if (m->get_is_primary()) {
+ CInode *in = add_replica_inode(p, NULL, finished);
+ assert(in->get_num_ref() == 0);
+ assert(in->get_parent_dn() == NULL);
+ MCacheExpire* expire = new MCacheExpire(mds->get_nodeid());
+ expire->add_inode(m->get_subtree(), in->vino(), in->get_replica_nonce());
+ mds->send_message_mds(expire, m->get_source().num());
+ remove_inode(in);
}
-
+
if (!finished.empty())
mds->queue_waiters(finished);
@@ -9352,6 +9369,11 @@ void MDCache::send_dentry_unlink(CDentry *dn, CDentry *straydn, MDRequest *mdr)
/* This function DOES put the passed message before returning */
void MDCache::handle_dentry_unlink(MDentryUnlink *m)
{
+ // straydn
+ CDentry *straydn = NULL;
+ if (m->straybl.length())
+ straydn = add_replica_stray(m->straybl, m->get_source().num());
+
CDir *dir = get_dirfrag(m->get_dirfrag());
if (!dir) {
dout(7) << "handle_dentry_unlink don't have dirfrag " << m->get_dirfrag() << dendl;
@@ -9363,13 +9385,6 @@ void MDCache::handle_dentry_unlink(MDentryUnlink *m)
dout(7) << "handle_dentry_unlink on " << *dn << dendl;
CDentry::linkage_t *dnl = dn->get_linkage();
- // straydn
- CDentry *straydn = NULL;
- if (m->straybl.length()) {
- int from = m->get_source().num();
- straydn = add_replica_stray(m->straybl, from);
- }
-
// open inode?
if (dnl->is_primary()) {
CInode *in = dnl->get_inode();
@@ -9392,8 +9407,9 @@ void MDCache::handle_dentry_unlink(MDentryUnlink *m)
migrator->export_caps(in);
lru.lru_bottouch(straydn); // move stray to end of lru
-
+ straydn = NULL;
} else {
+ assert(!straydn);
assert(dnl->is_remote());
dn->dir->unlink_inode(dn);
}
@@ -9404,6 +9420,15 @@ void MDCache::handle_dentry_unlink(MDentryUnlink *m)
}
}
+ // race with trim_dentry()
+ if (straydn) {
+ assert(straydn->get_num_ref() == 0);
+ assert(straydn->get_linkage()->is_null());
+ map<int, MCacheExpire*> expiremap;
+ trim_dentry(straydn, expiremap);
+ send_expire_messages(expiremap);
+ }
+
m->put();
return;
}
diff --git a/src/messages/MDentryLink.h b/src/messages/MDentryLink.h
index ed02bc2..b351532 100644
--- a/src/messages/MDentryLink.h
+++ b/src/messages/MDentryLink.h
@@ -17,11 +17,13 @@
#define CEPH_MDENTRYLINK_H
class MDentryLink : public Message {
+ dirfrag_t subtree;
dirfrag_t dirfrag;
string dn;
bool is_primary;
public:
+ dirfrag_t get_subtree() { return subtree; }
dirfrag_t get_dirfrag() { return dirfrag; }
string& get_dn() { return dn; }
bool get_is_primary() { return is_primary; }
@@ -30,8 +32,9 @@ class MDentryLink : public Message {
MDentryLink() :
Message(MSG_MDS_DENTRYLINK) { }
- MDentryLink(dirfrag_t df, string& n, bool p) :
+ MDentryLink(dirfrag_t r, dirfrag_t df, string& n, bool p) :
Message(MSG_MDS_DENTRYLINK),
+ subtree(r),
dirfrag(df),
dn(n),
is_primary(p) {}
@@ -46,12 +49,14 @@ public:
void decode_payload() {
bufferlist::iterator p = payload.begin();
+ ::decode(subtree, p);
::decode(dirfrag, p);
::decode(dn, p);
::decode(is_primary, p);
::decode(bl, p);
}
void encode_payload(uint64_t features) {
+ ::encode(subtree, payload);
::encode(dirfrag, payload);
::encode(dn, payload);
::encode(is_primary, payload);
--
1.7.11.7
* [PATCH 14/14] mds: compare sessionmap version before replaying imported sessions
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (12 preceding siblings ...)
2012-12-11 8:30 ` [PATCH 13/14] mds: fix race between send_dentry_link() and cache expire Yan, Zheng
@ 2012-12-11 8:31 ` Yan, Zheng
2012-12-11 8:33 ` [PATCH 00/14] fixes for MDS Yan, Zheng
14 siblings, 0 replies; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:31 UTC (permalink / raw)
To: ceph-devel, sage; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Otherwise we may wrongly increase mds->sessionmap.version, which
will confuse future journal replays that involve the sessionmap.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
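The version check on replay can be sketched as below. This is a
simplified model, not the journal code: SessionMap, UpdateEvent and
replay_sessions() are hypothetical, clients are plain integers, and the
sketch assumes that opening the imported sessions brings the table
exactly to the journaled version.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the MDS session table.
struct SessionMap {
  uint64_t version = 0;
  uint64_t projected = 0;
  std::vector<int> sessions;
};

// Hypothetical stand-in for an EUpdate that imported client sessions.
struct UpdateEvent {
  std::vector<int> client_map;  // imported clients, empty if none
  uint64_t cmapv = 0;           // sessionmap version the event produced
};

// Replay the imported sessions only if the table has not already
// reached the journaled version. Returns true if anything was applied.
bool replay_sessions(SessionMap& sm, const UpdateEvent& ev) {
  if (ev.client_map.empty())
    return false;
  if (sm.version >= ev.cmapv)
    return false;  // table already reflects this event, skip it
  for (int client : ev.client_map)
    sm.sessions.push_back(client);
  sm.version = ev.cmapv;  // assumption: one version step per event
  assert(sm.version == ev.cmapv);
  sm.projected = sm.version;
  return true;
}
```

Replaying the same event twice leaves the table unchanged the second
time, which is the property the patch is after.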
src/mds/Server.cc | 2 ++
src/mds/events/EUpdate.h | 8 ++++++--
src/mds/journal.cc | 24 +++++++++++++++++-------
3 files changed, 25 insertions(+), 9 deletions(-)
diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 60d3793..2b0f22f 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -5441,6 +5441,8 @@ void Server::handle_client_rename(MDRequest *mdr)
}
_rename_prepare(mdr, &le->metablob, &le->client_map, srcdn, destdn, straydn);
+ if (le->client_map.length())
+ le->cmapv = mds->sessionmap.projected;
// -- commit locally --
C_MDS_rename_finish *fin = new C_MDS_rename_finish(mds, mdr, srcdn, destdn, straydn);
diff --git a/src/mds/events/EUpdate.h b/src/mds/events/EUpdate.h
index 6ce18fe..a302a5a 100644
--- a/src/mds/events/EUpdate.h
+++ b/src/mds/events/EUpdate.h
@@ -23,13 +23,14 @@ public:
EMetaBlob metablob;
string type;
bufferlist client_map;
+ version_t cmapv;
metareqid_t reqid;
bool had_slaves;
EUpdate() : LogEvent(EVENT_UPDATE) { }
EUpdate(MDLog *mdlog, const char *s) :
LogEvent(EVENT_UPDATE), metablob(mdlog),
- type(s), had_slaves(false) { }
+ type(s), cmapv(0), had_slaves(false) { }
void print(ostream& out) {
if (type.length())
@@ -38,12 +39,13 @@ public:
}
void encode(bufferlist &bl) const {
- __u8 struct_v = 2;
+ __u8 struct_v = 3;
::encode(struct_v, bl);
::encode(stamp, bl);
::encode(type, bl);
::encode(metablob, bl);
::encode(client_map, bl);
+ ::encode(cmapv, bl);
::encode(reqid, bl);
::encode(had_slaves, bl);
}
@@ -55,6 +57,8 @@ public:
::decode(type, bl);
::decode(metablob, bl);
::decode(client_map, bl);
+ if (struct_v >= 3)
+ ::decode(cmapv, bl);
::decode(reqid, bl);
::decode(had_slaves, bl);
}
diff --git a/src/mds/journal.cc b/src/mds/journal.cc
index 46adbf2..b25096c 100644
--- a/src/mds/journal.cc
+++ b/src/mds/journal.cc
@@ -996,14 +996,24 @@ void EUpdate::replay(MDS *mds)
mds->mdcache->add_uncommitted_master(reqid, _segment, slaves);
}
- // open client sessions?
- map<client_t,entity_inst_t> cm;
- map<client_t, uint64_t> seqm;
if (client_map.length()) {
- bufferlist::iterator blp = client_map.begin();
- ::decode(cm, blp);
- mds->server->prepare_force_open_sessions(cm, seqm);
- mds->server->finish_force_open_sessions(cm, seqm);
+ if (mds->sessionmap.version >= cmapv) {
+ dout(10) << "EUpdate.replay sessionmap v " << cmapv
+ << " <= table " << mds->sessionmap.version << dendl;
+ } else {
+ dout(10) << "EUpdate.replay sessionmap " << mds->sessionmap.version
+ << " < " << cmapv << dendl;
+ // open client sessions?
+ map<client_t,entity_inst_t> cm;
+ map<client_t, uint64_t> seqm;
+ bufferlist::iterator blp = client_map.begin();
+ ::decode(cm, blp);
+ mds->server->prepare_force_open_sessions(cm, seqm);
+ mds->server->finish_force_open_sessions(cm, seqm);
+
+ assert(mds->sessionmap.version == cmapv);
+ mds->sessionmap.projected = mds->sessionmap.version;
+ }
}
}
--
1.7.11.7
* Re: [PATCH 00/14] fixes for MDS
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
` (13 preceding siblings ...)
2012-12-11 8:31 ` [PATCH 14/14] mds: compare sessionmap version before replaying imported sessions Yan, Zheng
@ 2012-12-11 8:33 ` Yan, Zheng
2012-12-11 17:11 ` Sage Weil
14 siblings, 1 reply; 17+ messages in thread
From: Yan, Zheng @ 2012-12-11 8:33 UTC (permalink / raw)
To: ceph-devel, sage
On 12/11/2012 04:30 PM, Yan, Zheng wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>
> The first patch fixes a journal bug that may corrupt the rstat accounting,
> I think it should be included in the next release. The rest patches fix
> various issues I encountered when running 3 MDSs with thrash_exports==1.
>
These patches are also in:
git://github.com/ukernel/ceph.git wip-mds
> Regards
> Yan, Zheng
>
* Re: [PATCH 00/14] fixes for MDS
2012-12-11 8:33 ` [PATCH 00/14] fixes for MDS Yan, Zheng
@ 2012-12-11 17:11 ` Sage Weil
0 siblings, 0 replies; 17+ messages in thread
From: Sage Weil @ 2012-12-11 17:11 UTC (permalink / raw)
To: Yan, Zheng; +Cc: ceph-devel
On Tue, 11 Dec 2012, Yan, Zheng wrote:
> On 12/11/2012 04:30 PM, Yan, Zheng wrote:
> > From: "Yan, Zheng" <zheng.z.yan@intel.com>
> >
> > The first patch fixes a journal bug that may corrupt the rstat accounting,
> > I think it should be included in the next release. The rest patches fix
> > various issues I encountered when running 3 MDSs with thrash_exports==1.
> >
Applied to next, thanks!
> These patches are also in:
> git://github.com/ukernel/ceph.git wip-mds
Ah, much easier than git-am. :)
sage
>
> > Regards
> > Yan, Zheng
> >
>
>
end of thread, other threads:[~2012-12-11 17:11 UTC | newest]
Thread overview: 17+ messages
2012-12-11 8:30 [PATCH 00/14] fixes for MDS Yan, Zheng
2012-12-11 8:30 ` [PATCH 01/14] mds: fix journaling issue regarding rstat accounting Yan, Zheng
2012-12-11 8:30 ` [PATCH 02/14] mds: alllow handle_client_readdir() fetching freezing dir Yan, Zheng
2012-12-11 8:30 ` [PATCH 03/14] mds: properly mark dirfrag dirty Yan, Zheng
2012-12-11 8:30 ` [PATCH 04/14] mds: no bloom filter for replica dir Yan, Zheng
2012-12-11 8:30 ` [PATCH 05/14] mds: set want_base_dir to false for MDCache::discover_ino() Yan, Zheng
2012-12-11 8:30 ` [PATCH 06/14] mds: fix error hanlding in MDCache::handle_discover_reply() Yan, Zheng
2012-12-11 8:30 ` [PATCH 07/14] mds: always send discover if want_xlocked is true Yan, Zheng
2012-12-11 8:30 ` [PATCH 08/14] mds: re-issue caps after importing caps Yan, Zheng
2012-12-11 8:30 ` [PATCH 09/14] mds: take export lock set before sending MExportDirDiscover Yan, Zheng
2012-12-11 8:30 ` [PATCH 10/14] mds: don't retry readdir request after issuing caps Yan, Zheng
2012-12-11 8:30 ` [PATCH 11/14] mds: delay processing cache expire when state >= EXPORT_EXPORTING Yan, Zheng
2012-12-11 8:30 ` [PATCH 12/14] mds: fix file existing check in Server::handle_client_openc() Yan, Zheng
2012-12-11 8:30 ` [PATCH 13/14] mds: fix race between send_dentry_link() and cache expire Yan, Zheng
2012-12-11 8:31 ` [PATCH 14/14] mds: compare sessionmap version before replaying imported sessions Yan, Zheng
2012-12-11 8:33 ` [PATCH 00/14] fixes for MDS Yan, Zheng
2012-12-11 17:11 ` Sage Weil