From: "Yan, Zheng" <zheng.z.yan@intel.com>
To: Sage Weil <sage@inktank.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: [PATCH 1/2] mds: Don't acquire replica object's versionlock
Date: Tue, 06 Nov 2012 16:22:10 +0800 [thread overview]
Message-ID: <5098C8B2.5060404@intel.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1211051048520.30301@cobra.newdream.net>
On 11/06/2012 02:52 AM, Sage Weil wrote:
> On Thu, 1 Nov 2012, Yan, Zheng wrote:
>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>
>> Both CInode and CDentry's versionlocks are of type LocalLock.
>> Acquiring LocalLock in replica object is useless and problematic.
>> For example, if two requests try acquiring a replica object's
>> versionlock, the first request succeeds, the second request
>> is added to wait queue. Later when the first request finishes,
>> MDCache::request_drop_foreign_locks() finds the lock's parent is
>> non-auth, it skips waking requests in the wait queue. So the
>> second request hangs.
>
> I don't remmeber the details, but the iversion locking on replicas came up
> while testing renaming and export thrashing. i.e., running with
>
> mds thrash exports = 1
>
> and doing some rename workload (fsstress maybe?).
I saw assertion in Server::handle_slave_rename_prep() was triggered when I applied
a wrong fix. but never see it again with this patch.
>
> Maybe the fix is just to wake the requests in the queue?
>
If I'm not wrong the version locks are used to wait for all rename operation on
the paths to srcdn/destdn to complete. So witness see consistent paths for srcdn
and destdn no matter how many times the OP_RENAMEPREP slave request is dispatched.
I think the main reason we need version lock is that: For an auth dentry, we may
rdlock it even it is already xlocked. But for non-auth dentry, we only can rdlock
it when the lock is sync state, it guarantees the dentry is not xlocked.
I find a bug in previous patch. 'assert(dn->is_auth())' in Locker::acquire_locks
should be if '(!dn->is_auth()) continue'.
Regards
Yan, Zheng
---
From 6683482d9f9517b990d3e4bae18af275f32491e4 Mon Sep 17 00:00:00 2001
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Date: Thu, 1 Nov 2012 13:21:21 +0800
Subject: [PATCH] mds: Don't acquire replica object's versionlock
Both CInode and CDentry's versionlocks are of type LocalLock.
Acquiring LocalLock in replica object is useless and problematic.
For example, if two requests try acquiring a replica object's
versionlock, the first request succeeds, the second request
is added to wait queue. Later when the first request finishes,
MDCache::request_drop_foreign_locks() finds the lock's parent is
non-auth, it skips waking requests in the wait queue. So the
second request hangs.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
src/mds/Locker.cc | 7 +++++++
src/mds/Server.cc | 25 ++++++++++---------------
2 files changed, 17 insertions(+), 15 deletions(-)
diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
index 7b6d449..a1f957a 100644
--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -196,6 +196,8 @@ bool Locker::acquire_locks(MDRequest *mdr,
// augment xlock with a versionlock?
if ((*p)->get_type() == CEPH_LOCK_DN) {
CDentry *dn = (CDentry*)(*p)->get_parent();
+ if (!dn->is_auth())
+ continue;
if (xlocks.count(&dn->versionlock))
continue; // we're xlocking the versionlock too; don't wrlock it!
@@ -213,6 +215,8 @@ bool Locker::acquire_locks(MDRequest *mdr,
if ((*p)->get_type() > CEPH_LOCK_IVERSION) {
// inode version lock?
CInode *in = (CInode*)(*p)->get_parent();
+ if (!in->is_auth())
+ continue;
if (mdr->is_master()) {
// master. wrlock versionlock so we can pipeline inode updates to journal.
wrlocks.insert(&in->versionlock);
@@ -3899,6 +3903,7 @@ void Locker::local_wrlock_grab(LocalLock *lock, Mutation *mut)
dout(7) << "local_wrlock_grab on " << *lock
<< " on " << *lock->get_parent() << dendl;
+ assert(lock->get_parent()->is_auth());
assert(lock->can_wrlock());
assert(!mut->wrlocks.count(lock));
lock->get_wrlock(mut->get_client());
@@ -3911,6 +3916,7 @@ bool Locker::local_wrlock_start(LocalLock *lock, MDRequest *mut)
dout(7) << "local_wrlock_start on " << *lock
<< " on " << *lock->get_parent() << dendl;
+ assert(lock->get_parent()->is_auth());
if (lock->can_wrlock()) {
assert(!mut->wrlocks.count(lock));
lock->get_wrlock(mut->get_client());
@@ -3942,6 +3948,7 @@ bool Locker::local_xlock_start(LocalLock *lock, MDRequest *mut)
dout(7) << "local_xlock_start on " << *lock
<< " on " << *lock->get_parent() << dendl;
+ assert(lock->get_parent()->is_auth());
if (!lock->can_xlock_local()) {
lock->add_waiter(SimpleLock::WAIT_WR|SimpleLock::WAIT_STABLE, new C_MDS_RetryRequest(mdcache, mut));
return false;
diff --git a/src/mds/Server.cc b/src/mds/Server.cc
index 4642a13..45c890a 100644
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -5204,25 +5204,20 @@ void Server::handle_client_rename(MDRequest *mdr)
wrlocks.insert(&straydn->get_dir()->inode->nestlock);
}
- // xlock versionlock on srci if remote?
- // this ensures it gets safely remotely auth_pinned, avoiding deadlock;
- // strictly speaking, having the slave node freeze the inode is
- // otherwise sufficient for avoiding conflicts with inode locks, etc.
- if (!srcdn->is_auth() && srcdnl->is_primary()) // xlock versionlock on srci if there are any witnesses
- xlocks.insert(&srci->versionlock);
-
// xlock versionlock on dentries if there are witnesses.
// replicas can't see projected dentry linkages, and will get
// confused if we try to pipeline things.
if (!witnesses.empty()) {
- if (srcdn->is_projected())
- xlocks.insert(&srcdn->versionlock);
- if (destdn->is_projected())
- xlocks.insert(&destdn->versionlock);
- // also take rdlock on all ancestor dentries for destdn. this ensures that the
- // destdn can be traversed to by the witnesses.
- for (int i=0; i<(int)desttrace.size(); i++)
- xlocks.insert(&desttrace[i]->versionlock);
+ // take xlock on all projected dentries for srcdn and destdn. this ensures
+ // that the srcdn and destdn can be traversed to by the witnesses.
+ for (int i= 0; i<(int)srctrace.size(); i++) {
+ if (srctrace[i]->is_auth() && srctrace[i]->is_projected())
+ xlocks.insert(&srctrace[i]->versionlock);
+ }
+ for (int i=0; i<(int)desttrace.size(); i++) {
+ if (desttrace[i]->is_auth() && desttrace[i]->is_projected())
+ xlocks.insert(&desttrace[i]->versionlock);
+ }
}
// we need to update srci's ctime. xlock its least contended lock to do that...
--
1.7.11.7
next prev parent reply other threads:[~2012-11-06 8:22 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-01 9:03 [PATCH 1/2] ceph: Don't update i_max_size when handling non-auth cap Yan, Zheng
2012-11-01 9:03 ` [PATCH 1/2] mds: Don't acquire replica object's versionlock Yan, Zheng
2012-11-05 18:52 ` Sage Weil
2012-11-06 8:22 ` Yan, Zheng [this message]
2012-11-01 9:03 ` [PATCH 2/2] ceph: Fix i_size update race Yan, Zheng
2012-11-01 9:03 ` [PATCH 2/2] mds: Allow try_eval to eval unstable locks in freezing object Yan, Zheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5098C8B2.5060404@intel.com \
--to=zheng.z.yan@intel.com \
--cc=ceph-devel@vger.kernel.org \
--cc=sage@inktank.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.