* [PATCH 1/2] ceph: Don't update i_max_size when handling non-auth cap
@ 2012-11-01 9:03 Yan, Zheng
2012-11-01 9:03 ` [PATCH 1/2] mds: Don't acquire replica object's versionlock Yan, Zheng
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Yan, Zheng @ 2012-11-01 9:03 UTC (permalink / raw)
To: sage, ceph-devel; +Cc: Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
The cap from non-auth mds doesn't have a meaningful max_size value.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
fs/ceph/caps.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 3251e9c..c633d1d 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2388,7 +2388,7 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
&atime);
/* max size increase? */
- if (max_size != ci->i_max_size) {
+ if (ci->i_auth_cap == cap && max_size != ci->i_max_size) {
dout("max_size %lld -> %llu\n", ci->i_max_size, max_size);
ci->i_max_size = max_size;
if (max_size >= ci->i_wanted_max_size) {
--
1.7.11.7
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH 1/2] mds: Don't acquire replica object's versionlock 2012-11-01 9:03 [PATCH 1/2] ceph: Don't update i_max_size when handling non-auth cap Yan, Zheng @ 2012-11-01 9:03 ` Yan, Zheng 2012-11-05 18:52 ` Sage Weil 2012-11-01 9:03 ` [PATCH 2/2] ceph: Fix i_size update race Yan, Zheng 2012-11-01 9:03 ` [PATCH 2/2] mds: Allow try_eval to eval unstable locks in freezing object Yan, Zheng 2 siblings, 1 reply; 6+ messages in thread From: Yan, Zheng @ 2012-11-01 9:03 UTC (permalink / raw) To: sage, ceph-devel; +Cc: Yan, Zheng From: "Yan, Zheng" <zheng.z.yan@intel.com> Both CInode and CDentry's versionlocks are of type LocalLock. Acquiring LocalLock in replica object is useless and problematic. For example, if two requests try acquiring a replica object's versionlock, the first request succeeds, the second request is added to wait queue. Later when the first request finishes, MDCache::request_drop_foreign_locks() finds the lock's parent is non-auth, it skips waking requests in the wait queue. So the second request hangs. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> --- src/mds/Locker.cc | 6 ++++++ src/mds/Server.cc | 25 ++++++++++--------------- 3 files changed, 17 insertions(+), 16 deletions(-) diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc index e033bbe..6474743 100644 --- a/src/mds/Locker.cc +++ b/src/mds/Locker.cc @@ -196,6 +196,7 @@ bool Locker::acquire_locks(MDRequest *mdr, // augment xlock with a versionlock? if ((*p)->get_type() == CEPH_LOCK_DN) { CDentry *dn = (CDentry*)(*p)->get_parent(); + assert(dn->is_auth()); if (xlocks.count(&dn->versionlock)) continue; // we're xlocking the versionlock too; don't wrlock it! @@ -213,6 +214,8 @@ bool Locker::acquire_locks(MDRequest *mdr, if ((*p)->get_type() > CEPH_LOCK_IVERSION) { // inode version lock? CInode *in = (CInode*)(*p)->get_parent(); + if (!in->is_auth()) + continue; if (mdr->is_master()) { // master. wrlock versionlock so we can pipeline inode updates to journal. wrlocks.insert(&in->versionlock); @@ -3899,6 +3902,7 @@ void Locker::local_wrlock_grab(LocalLock *lock, Mutation *mut) dout(7) << "local_wrlock_grab on " << *lock << " on " << *lock->get_parent() << dendl; + assert(lock->get_parent()->is_auth()); assert(lock->can_wrlock()); assert(!mut->wrlocks.count(lock)); lock->get_wrlock(mut->get_client()); @@ -3911,6 +3915,7 @@ bool Locker::local_wrlock_start(LocalLock *lock, MDRequest *mut) dout(7) << "local_wrlock_start on " << *lock << " on " << *lock->get_parent() << dendl; + assert(lock->get_parent()->is_auth()); if (lock->can_wrlock()) { assert(!mut->wrlocks.count(lock)); lock->get_wrlock(mut->get_client()); @@ -3942,6 +3947,7 @@ bool Locker::local_xlock_start(LocalLock *lock, MDRequest *mut) dout(7) << "local_xlock_start on " << *lock << " on " << *lock->get_parent() << dendl; + assert(lock->get_parent()->is_auth()); if (!lock->can_xlock_local()) { lock->add_waiter(SimpleLock::WAIT_WR|SimpleLock::WAIT_STABLE, new C_MDS_RetryRequest(mdcache, mut)); return false; diff --git a/src/mds/Server.cc b/src/mds/Server.cc index 4642a13..45c890a 100644 --- a/src/mds/Server.cc +++ b/src/mds/Server.cc @@ -5204,25 +5204,20 @@ void Server::handle_client_rename(MDRequest *mdr) wrlocks.insert(&straydn->get_dir()->inode->nestlock); } - // xlock versionlock on srci if remote? - // this ensures it gets safely remotely auth_pinned, avoiding deadlock; - // strictly speaking, having the slave node freeze the inode is - // otherwise sufficient for avoiding conflicts with inode locks, etc. - if (!srcdn->is_auth() && srcdnl->is_primary()) // xlock versionlock on srci if there are any witnesses - xlocks.insert(&srci->versionlock); - // xlock versionlock on dentries if there are witnesses. // replicas can't see projected dentry linkages, and will get // confused if we try to pipeline things. if (!witnesses.empty()) { - if (srcdn->is_projected()) - xlocks.insert(&srcdn->versionlock); - if (destdn->is_projected()) - xlocks.insert(&destdn->versionlock); - // also take rdlock on all ancestor dentries for destdn. this ensures that the - // destdn can be traversed to by the witnesses. - for (int i=0; i<(int)desttrace.size(); i++) - xlocks.insert(&desttrace[i]->versionlock); + // take xlock on all projected dentries for srcdn and destdn. this ensures + // that the srcdn and destdn can be traversed to by the witnesses. + for (int i= 0; i<(int)srctrace.size(); i++) { + if (srctrace[i]->is_auth() && srctrace[i]->is_projected()) + xlocks.insert(&srctrace[i]->versionlock); + } + for (int i=0; i<(int)desttrace.size(); i++) { + if (desttrace[i]->is_auth() && desttrace[i]->is_projected()) + xlocks.insert(&desttrace[i]->versionlock); + } } // we need to update srci's ctime. xlock its least contended lock to do that... -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] mds: Don't acquire replica object's versionlock 2012-11-01 9:03 ` [PATCH 1/2] mds: Don't acquire replica object's versionlock Yan, Zheng @ 2012-11-05 18:52 ` Sage Weil 2012-11-06 8:22 ` Yan, Zheng 0 siblings, 1 reply; 6+ messages in thread From: Sage Weil @ 2012-11-05 18:52 UTC (permalink / raw) To: Yan, Zheng; +Cc: ceph-devel On Thu, 1 Nov 2012, Yan, Zheng wrote: > From: "Yan, Zheng" <zheng.z.yan@intel.com> > > Both CInode and CDentry's versionlocks are of type LocalLock. > Acquiring LocalLock in replica object is useless and problematic. > For example, if two requests try acquiring a replica object's > versionlock, the first request succeeds, the second request > is added to wait queue. Later when the first request finishes, > MDCache::request_drop_foreign_locks() finds the lock's parent is > non-auth, it skips waking requests in the wait queue. So the > second request hangs. I don't remmeber the details, but the iversion locking on replicas came up while testing renaming and export thrashing. i.e., running with mds thrash exports = 1 and doing some rename workload (fsstress maybe?). Maybe the fix is just to wake the requests in the queue? s > > Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> > --- > src/mds/Locker.cc | 6 ++++++ > src/mds/Server.cc | 25 ++++++++++--------------- > 3 files changed, 17 insertions(+), 16 deletions(-) > > diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc > index e033bbe..6474743 100644 > --- a/src/mds/Locker.cc > +++ b/src/mds/Locker.cc > @@ -196,6 +196,7 @@ bool Locker::acquire_locks(MDRequest *mdr, > // augment xlock with a versionlock? > if ((*p)->get_type() == CEPH_LOCK_DN) { > CDentry *dn = (CDentry*)(*p)->get_parent(); > + assert(dn->is_auth()); > > if (xlocks.count(&dn->versionlock)) > continue; // we're xlocking the versionlock too; don't wrlock it! > @@ -213,6 +214,8 @@ bool Locker::acquire_locks(MDRequest *mdr, > if ((*p)->get_type() > CEPH_LOCK_IVERSION) { > // inode version lock? > CInode *in = (CInode*)(*p)->get_parent(); > + if (!in->is_auth()) > + continue; > if (mdr->is_master()) { > // master. wrlock versionlock so we can pipeline inode updates to journal. > wrlocks.insert(&in->versionlock); > @@ -3899,6 +3902,7 @@ void Locker::local_wrlock_grab(LocalLock *lock, Mutation *mut) > dout(7) << "local_wrlock_grab on " << *lock > << " on " << *lock->get_parent() << dendl; > > + assert(lock->get_parent()->is_auth()); > assert(lock->can_wrlock()); > assert(!mut->wrlocks.count(lock)); > lock->get_wrlock(mut->get_client()); > @@ -3911,6 +3915,7 @@ bool Locker::local_wrlock_start(LocalLock *lock, MDRequest *mut) > dout(7) << "local_wrlock_start on " << *lock > << " on " << *lock->get_parent() << dendl; > > + assert(lock->get_parent()->is_auth()); > if (lock->can_wrlock()) { > assert(!mut->wrlocks.count(lock)); > lock->get_wrlock(mut->get_client()); > @@ -3942,6 +3947,7 @@ bool Locker::local_xlock_start(LocalLock *lock, MDRequest *mut) > dout(7) << "local_xlock_start on " << *lock > << " on " << *lock->get_parent() << dendl; > > + assert(lock->get_parent()->is_auth()); > if (!lock->can_xlock_local()) { > lock->add_waiter(SimpleLock::WAIT_WR|SimpleLock::WAIT_STABLE, new C_MDS_RetryRequest(mdcache, mut)); > return false; > diff --git a/src/mds/Server.cc b/src/mds/Server.cc > index 4642a13..45c890a 100644 > --- a/src/mds/Server.cc > +++ b/src/mds/Server.cc > @@ -5204,25 +5204,20 @@ void Server::handle_client_rename(MDRequest *mdr) > wrlocks.insert(&straydn->get_dir()->inode->nestlock); > } > > - // xlock versionlock on srci if remote? > - // this ensures it gets safely remotely auth_pinned, avoiding deadlock; > - // strictly speaking, having the slave node freeze the inode is > - // otherwise sufficient for avoiding conflicts with inode locks, etc. > - if (!srcdn->is_auth() && srcdnl->is_primary()) // xlock versionlock on srci if there are any witnesses > - xlocks.insert(&srci->versionlock); > - > // xlock versionlock on dentries if there are witnesses. > // replicas can't see projected dentry linkages, and will get > // confused if we try to pipeline things. > if (!witnesses.empty()) { > - if (srcdn->is_projected()) > - xlocks.insert(&srcdn->versionlock); > - if (destdn->is_projected()) > - xlocks.insert(&destdn->versionlock); > - // also take rdlock on all ancestor dentries for destdn. this ensures that the > - // destdn can be traversed to by the witnesses. > - for (int i=0; i<(int)desttrace.size(); i++) > - xlocks.insert(&desttrace[i]->versionlock); > + // take xlock on all projected dentries for srcdn and destdn. this ensures > + // that the srcdn and destdn can be traversed to by the witnesses. > + for (int i= 0; i<(int)srctrace.size(); i++) { > + if (srctrace[i]->is_auth() && srctrace[i]->is_projected()) > + xlocks.insert(&srctrace[i]->versionlock); > + } > + for (int i=0; i<(int)desttrace.size(); i++) { > + if (desttrace[i]->is_auth() && desttrace[i]->is_projected()) > + xlocks.insert(&desttrace[i]->versionlock); > + } > } > > // we need to update srci's ctime. xlock its least contended lock to do that... > -- > 1.7.11.7 > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] mds: Don't acquire replica object's versionlock 2012-11-05 18:52 ` Sage Weil @ 2012-11-06 8:22 ` Yan, Zheng 0 siblings, 0 replies; 6+ messages in thread From: Yan, Zheng @ 2012-11-06 8:22 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel On 11/06/2012 02:52 AM, Sage Weil wrote: > On Thu, 1 Nov 2012, Yan, Zheng wrote: >> From: "Yan, Zheng" <zheng.z.yan@intel.com> >> >> Both CInode and CDentry's versionlocks are of type LocalLock. >> Acquiring LocalLock in replica object is useless and problematic. >> For example, if two requests try acquiring a replica object's >> versionlock, the first request succeeds, the second request >> is added to wait queue. Later when the first request finishes, >> MDCache::request_drop_foreign_locks() finds the lock's parent is >> non-auth, it skips waking requests in the wait queue. So the >> second request hangs. > > I don't remmeber the details, but the iversion locking on replicas came up > while testing renaming and export thrashing. i.e., running with > > mds thrash exports = 1 > > and doing some rename workload (fsstress maybe?). I saw assertion in Server::handle_slave_rename_prep() was triggered when I applied a wrong fix. but never see it again with this patch. > > Maybe the fix is just to wake the requests in the queue? > If I'm not wrong the version locks are used to wait for all rename operation on the paths to srcdn/destdn to complete. So witness see consistent paths for srcdn and destdn no matter how many times the OP_RENAMEPREP slave request is dispatched. I think the main reason we need version lock is that: For an auth dentry, we may rdlock it even it is already xlocked. But for non-auth dentry, we only can rdlock it when the lock is sync state, it guarantees the dentry is not xlocked. I find a bug in previous patch. 'assert(dn->is_auth())' in Locker::acquire_locks should be if '(!dn->is_auth()) continue'. Regards Yan, Zheng --- From 6683482d9f9517b990d3e4bae18af275f32491e4 Mon Sep 17 00:00:00 2001 From: "Yan, Zheng" <zheng.z.yan@intel.com> Date: Thu, 1 Nov 2012 13:21:21 +0800 Subject: [PATCH] mds: Don't acquire replica object's versionlock Both CInode and CDentry's versionlocks are of type LocalLock. Acquiring LocalLock in replica object is useless and problematic. For example, if two requests try acquiring a replica object's versionlock, the first request succeeds, the second request is added to wait queue. Later when the first request finishes, MDCache::request_drop_foreign_locks() finds the lock's parent is non-auth, it skips waking requests in the wait queue. So the second request hangs. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> --- src/mds/Locker.cc | 7 +++++++ src/mds/Server.cc | 25 ++++++++++--------------- 2 files changed, 17 insertions(+), 15 deletions(-) diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc index 7b6d449..a1f957a 100644 --- a/src/mds/Locker.cc +++ b/src/mds/Locker.cc @@ -196,6 +196,8 @@ bool Locker::acquire_locks(MDRequest *mdr, // augment xlock with a versionlock? if ((*p)->get_type() == CEPH_LOCK_DN) { CDentry *dn = (CDentry*)(*p)->get_parent(); + if (!dn->is_auth()) + continue; if (xlocks.count(&dn->versionlock)) continue; // we're xlocking the versionlock too; don't wrlock it! @@ -213,6 +215,8 @@ bool Locker::acquire_locks(MDRequest *mdr, if ((*p)->get_type() > CEPH_LOCK_IVERSION) { // inode version lock? CInode *in = (CInode*)(*p)->get_parent(); + if (!in->is_auth()) + continue; if (mdr->is_master()) { // master. wrlock versionlock so we can pipeline inode updates to journal. wrlocks.insert(&in->versionlock); @@ -3899,6 +3903,7 @@ void Locker::local_wrlock_grab(LocalLock *lock, Mutation *mut) dout(7) << "local_wrlock_grab on " << *lock << " on " << *lock->get_parent() << dendl; + assert(lock->get_parent()->is_auth()); assert(lock->can_wrlock()); assert(!mut->wrlocks.count(lock)); lock->get_wrlock(mut->get_client()); @@ -3911,6 +3916,7 @@ bool Locker::local_wrlock_start(LocalLock *lock, MDRequest *mut) dout(7) << "local_wrlock_start on " << *lock << " on " << *lock->get_parent() << dendl; + assert(lock->get_parent()->is_auth()); if (lock->can_wrlock()) { assert(!mut->wrlocks.count(lock)); lock->get_wrlock(mut->get_client()); @@ -3942,6 +3948,7 @@ bool Locker::local_xlock_start(LocalLock *lock, MDRequest *mut) dout(7) << "local_xlock_start on " << *lock << " on " << *lock->get_parent() << dendl; + assert(lock->get_parent()->is_auth()); if (!lock->can_xlock_local()) { lock->add_waiter(SimpleLock::WAIT_WR|SimpleLock::WAIT_STABLE, new C_MDS_RetryRequest(mdcache, mut)); return false; diff --git a/src/mds/Server.cc b/src/mds/Server.cc index 4642a13..45c890a 100644 --- a/src/mds/Server.cc +++ b/src/mds/Server.cc @@ -5204,25 +5204,20 @@ void Server::handle_client_rename(MDRequest *mdr) wrlocks.insert(&straydn->get_dir()->inode->nestlock); } - // xlock versionlock on srci if remote? - // this ensures it gets safely remotely auth_pinned, avoiding deadlock; - // strictly speaking, having the slave node freeze the inode is - // otherwise sufficient for avoiding conflicts with inode locks, etc. - if (!srcdn->is_auth() && srcdnl->is_primary()) // xlock versionlock on srci if there are any witnesses - xlocks.insert(&srci->versionlock); - // xlock versionlock on dentries if there are witnesses. // replicas can't see projected dentry linkages, and will get // confused if we try to pipeline things. if (!witnesses.empty()) { - if (srcdn->is_projected()) - xlocks.insert(&srcdn->versionlock); - if (destdn->is_projected()) - xlocks.insert(&destdn->versionlock); - // also take rdlock on all ancestor dentries for destdn. this ensures that the - // destdn can be traversed to by the witnesses. - for (int i=0; i<(int)desttrace.size(); i++) - xlocks.insert(&desttrace[i]->versionlock); + // take xlock on all projected dentries for srcdn and destdn. this ensures + // that the srcdn and destdn can be traversed to by the witnesses. + for (int i= 0; i<(int)srctrace.size(); i++) { + if (srctrace[i]->is_auth() && srctrace[i]->is_projected()) + xlocks.insert(&srctrace[i]->versionlock); + } + for (int i=0; i<(int)desttrace.size(); i++) { + if (desttrace[i]->is_auth() && desttrace[i]->is_projected()) + xlocks.insert(&desttrace[i]->versionlock); + } } // we need to update srci's ctime. xlock its least contended lock to do that... -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] ceph: Fix i_size update race 2012-11-01 9:03 [PATCH 1/2] ceph: Don't update i_max_size when handling non-auth cap Yan, Zheng 2012-11-01 9:03 ` [PATCH 1/2] mds: Don't acquire replica object's versionlock Yan, Zheng @ 2012-11-01 9:03 ` Yan, Zheng 2012-11-01 9:03 ` [PATCH 2/2] mds: Allow try_eval to eval unstable locks in freezing object Yan, Zheng 2 siblings, 0 replies; 6+ messages in thread From: Yan, Zheng @ 2012-11-01 9:03 UTC (permalink / raw) To: sage, ceph-devel; +Cc: Yan, Zheng From: "Yan, Zheng" <zheng.z.yan@intel.com> ceph_aio_write() has an optimization that marks cap EPH_CAP_FILE_WR dirty before data is copied to page cache and inode size is updated. If sceph_check_caps() flushes the dirty cap before the inode size is updated, MDS can miss the new inode size. The fix is move ceph_{get,put}_cap_refs() into ceph_write_{begin,end}() and call __ceph_mark_dirty_caps() after inode size is updated. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> --- fs/ceph/addr.c | 43 ++++++++++++++++++++++++++++++++++-- fs/ceph/file.c | 69 +++++++++++++++++++++++----------------------------------- 2 files changed, 68 insertions(+), 44 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 22b6e45..8dd8d05 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -1078,23 +1078,47 @@ static int ceph_write_begin(struct file *file, struct address_space *mapping, struct page **pagep, void **fsdata) { struct inode *inode = file->f_dentry->d_inode; + struct ceph_inode_info *ci = ceph_inode(inode); + struct ceph_file_info *fi = file->private_data; struct page *page; pgoff_t index = pos >> PAGE_CACHE_SHIFT; - int r; + int r, want, got = 0; + + if (fi->fmode & CEPH_FILE_MODE_LAZY) + want = CEPH_CAP_FILE_BUFFER | CEPH_CAP_FILE_LAZYIO; + else + want = CEPH_CAP_FILE_BUFFER; + r = ceph_get_caps(ci, CEPH_CAP_FILE_WR, want, &got, pos+len); + if (r < 0) + return r; + + dout("write_begin %p %llx.%llx %llu~%u got cap refs on %s\n", + inode, ceph_vinop(inode), pos, len, ceph_cap_string(got)); + if (!(got & (CEPH_CAP_FILE_BUFFER|CEPH_CAP_FILE_LAZYIO))) { + ceph_put_cap_refs(ci, got); + return -EAGAIN; + } do { /* get a page */ page = grab_cache_page_write_begin(mapping, index, 0); if (!page) return -ENOMEM; - *pagep = page; dout("write_begin file %p inode %p page %p %d~%d\n", file, inode, page, (int)pos, (int)len); r = ceph_update_writeable_page(file, pos, len, page); + if (r) + page_cache_release(page); } while (r == -EAGAIN); + if (r) { + ceph_put_cap_refs(ci, got); + } else { + *pagep = page; + *(int *)fsdata = got; + } return r; } @@ -1108,10 +1132,12 @@ static int ceph_write_end(struct file *file, struct address_space *mapping, struct page *page, void *fsdata) { struct inode *inode = file->f_dentry->d_inode; + struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); struct ceph_mds_client *mdsc = fsc->mdsc; unsigned from = pos & (PAGE_CACHE_SIZE - 1); int check_cap = 0; + int got = (unsigned long)fsdata; dout("write_end file %p inode %p page %p %d~%d (%d)\n", file, inode, page, (int)pos, (int)copied, (int)len); @@ -1134,6 +1160,19 @@ static int ceph_write_end(struct file *file, struct address_space *mapping, up_read(&mdsc->snap_rwsem); page_cache_release(page); + if (copied > 0) { + int dirty; + spin_lock(&ci->i_ceph_lock); + dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_FILE_WR); + spin_unlock(&ci->i_ceph_lock); + if (dirty) + __mark_inode_dirty(inode, dirty); + } + + dout("write_end %p %llx.%llx %llu~%u dropping cap refs on %s\n", + inode, ceph_vinop(inode), pos, len, ceph_cap_string(got)); + ceph_put_cap_refs(ci, got); + if (check_cap) ceph_check_caps(ceph_inode(inode), CHECK_CAPS_AUTHONLY, NULL); diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 5840d2a..266f6e0 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -712,63 +712,49 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov, struct ceph_osd_client *osdc = &ceph_sb_to_client(inode->i_sb)->client->osdc; loff_t endoff = pos + iov->iov_len; - int want, got = 0; - int ret, err; + int got = 0; + int ret, err, written; if (ceph_snap(inode) != CEPH_NOSNAP) return -EROFS; retry_snap: + written = 0; if (ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_FULL)) return -ENOSPC; __ceph_do_pending_vmtruncate(inode); - dout("aio_write %p %llx.%llx %llu~%u getting caps. i_size %llu\n", - inode, ceph_vinop(inode), pos, (unsigned)iov->iov_len, - inode->i_size); - if (fi->fmode & CEPH_FILE_MODE_LAZY) - want = CEPH_CAP_FILE_BUFFER | CEPH_CAP_FILE_LAZYIO; - else - want = CEPH_CAP_FILE_BUFFER; - ret = ceph_get_caps(ci, CEPH_CAP_FILE_WR, want, &got, endoff); - if (ret < 0) - goto out_put; - - dout("aio_write %p %llx.%llx %llu~%u got cap refs on %s\n", - inode, ceph_vinop(inode), pos, (unsigned)iov->iov_len, - ceph_cap_string(got)); - - if ((got & (CEPH_CAP_FILE_BUFFER|CEPH_CAP_FILE_LAZYIO)) == 0 || - (iocb->ki_filp->f_flags & O_DIRECT) || - (inode->i_sb->s_flags & MS_SYNCHRONOUS) || - (fi->flags & CEPH_F_SYNC)) { - ret = ceph_sync_write(file, iov->iov_base, iov->iov_len, - &iocb->ki_pos); - } else { - /* - * buffered write; drop Fw early to avoid slow - * revocation if we get stuck on balance_dirty_pages - */ - int dirty; - - spin_lock(&ci->i_ceph_lock); - dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_FILE_WR); - spin_unlock(&ci->i_ceph_lock); - ceph_put_cap_refs(ci, got); + if (!(iocb->ki_filp->f_flags & O_DIRECT) && + !(inode->i_sb->s_flags & MS_SYNCHRONOUS) && + !(fi->flags & CEPH_F_SYNC)) { ret = generic_file_aio_write(iocb, iov, nr_segs, pos); + if (ret >= 0) + written = ret; + if ((ret >= 0 || ret == -EIOCBQUEUED) && ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host) || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) { - err = vfs_fsync_range(file, pos, pos + ret - 1, 1); + err = vfs_fsync_range(file, pos, pos + written - 1, 1); if (err < 0) ret = err; } - - if (dirty) - __mark_inode_dirty(inode, dirty); - goto out; + if ((ret < 0 && ret != -EAGAIN) || pos + written >= endoff) + goto out; } + dout("aio_write %p %llx.%llx %llu~%u getting caps. i_size %llu\n", + inode, ceph_vinop(inode), pos + written, + (unsigned)iov->iov_len - written, inode->i_size); + ret = ceph_get_caps(ci, CEPH_CAP_FILE_WR, 0, &got, endoff); + if (ret < 0) + goto out_put; + + dout("aio_write %p %llx.%llx %llu~%u got cap refs on %s\n", + inode, ceph_vinop(inode), pos + written, + (unsigned)iov->iov_len - written, ceph_cap_string(got)); + + ret = ceph_sync_write(file, iov->iov_base + written, + iov->iov_len - written, &iocb->ki_pos); if (ret >= 0) { int dirty; spin_lock(&ci->i_ceph_lock); @@ -780,10 +766,9 @@ retry_snap: out_put: dout("aio_write %p %llx.%llx %llu~%u dropping cap refs on %s\n", - inode, ceph_vinop(inode), pos, (unsigned)iov->iov_len, - ceph_cap_string(got)); + inode, ceph_vinop(inode), pos + written, + (unsigned)iov->iov_len - written, ceph_cap_string(got)); ceph_put_cap_refs(ci, got); - out: if (ret == -EOLDSNAPC) { dout("aio_write %p %llx.%llx %llu~%u got EOLDSNAPC, retrying\n", -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] mds: Allow try_eval to eval unstable locks in freezing object 2012-11-01 9:03 [PATCH 1/2] ceph: Don't update i_max_size when handling non-auth cap Yan, Zheng 2012-11-01 9:03 ` [PATCH 1/2] mds: Don't acquire replica object's versionlock Yan, Zheng 2012-11-01 9:03 ` [PATCH 2/2] ceph: Fix i_size update race Yan, Zheng @ 2012-11-01 9:03 ` Yan, Zheng 2 siblings, 0 replies; 6+ messages in thread From: Yan, Zheng @ 2012-11-01 9:03 UTC (permalink / raw) To: sage, ceph-devel; +Cc: Yan, Zheng From: "Yan, Zheng" <zheng.z.yan@intel.com> Unstable locks hold auth_pins on the object, it prevents the freezing object become frozen and then unfreeze. So try_eval() should not wait for freezing object Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> --- src/mds/Locker.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc index 6474743..68ecc16 100644 --- a/src/mds/Locker.cc +++ b/src/mds/Locker.cc @@ -837,8 +837,8 @@ void Locker::try_eval(MDSCacheObject *p, int mask) return; } - if (p->is_auth() && !p->can_auth_pin()) { - dout(7) << "try_eval can't auth_pin, waiting on " << *p << dendl; + if (p->is_auth() && p->is_frozen()) { + dout(7) << "try_eval frozen, waiting on " << *p << dendl; p->add_waiter(MDSCacheObject::WAIT_UNFREEZE, new C_Locker_Eval(this, p, mask)); return; } -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-11-06 8:22 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-11-01 9:03 [PATCH 1/2] ceph: Don't update i_max_size when handling non-auth cap Yan, Zheng 2012-11-01 9:03 ` [PATCH 1/2] mds: Don't acquire replica object's versionlock Yan, Zheng 2012-11-05 18:52 ` Sage Weil 2012-11-06 8:22 ` Yan, Zheng 2012-11-01 9:03 ` [PATCH 2/2] ceph: Fix i_size update race Yan, Zheng 2012-11-01 9:03 ` [PATCH 2/2] mds: Allow try_eval to eval unstable locks in freezing object Yan, Zheng
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.