From: "Yan, Zheng" <zheng.z.yan@intel.com>
To: Sage Weil <sage@inktank.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: [PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work
Date: Mon, 07 Jan 2013 13:31:39 +0800
Message-ID: <50EA5DBB.7040505@intel.com>
In-Reply-To: <alpine.DEB.2.00.1301062101250.15430@cobra.newdream.net>
On 01/07/2013 01:05 PM, Sage Weil wrote:
> On Sun, 6 Jan 2013, Yan, Zheng wrote:
>> On 01/06/2013 02:00 PM, Sage Weil wrote:
>>> On Fri, 4 Jan 2013, Yan, Zheng wrote:
>>>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>>>
>>>> In commit 22cddde104, ceph_get_caps() was moved into ceph_write_begin().
>>>> So ceph_get_caps() can be called while i_mutex is locked. If there is a
>>>> pending vmtruncate, ceph_get_caps() will wait for it to complete, but
>>>> ceph_vmtruncate_work() is blocked by the locked i_mutex.
>>>
>>> Hmm... :/
>>>
>>>> There are several places that call __ceph_do_pending_vmtruncate()
>>>> without holding the i_mutex, so I think it's OK not to acquire the
>>>> i_mutex in ceph_vmtruncate_work().
>>>
>>> The intention was that that function would only be called under i_mutex.
>>> I did a quick look through the callers and that appears to be the case
>>> (for things like llseek and setattr, the vfs should be taking i_mutex).
>>
>> Both ceph_aio_read() and ceph_aio_write() call __ceph_do_pending_vmtruncate()
>> without holding the i_mutex.
>
> Hrm.. that's not how I remember it working. I'm pretty sure i_mutex was
> providing the serialization around truncation.
>
>>> IIRC, this is to serialize the page cache truncation with truncate
>>> operations; this work can only be sanely done under i_mutex, so we defer
>>> it to the work queue or next person who takes i_mutex and cares about the
>>> mapping and i_size being consistent.
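The deferral scheme described above can be modelled in user space as a small sketch. This is not kernel code: `struct model_inode`, `queue_vmtruncate()`, `do_pending_vmtruncate()` and `vmtruncate_work()` are hypothetical names, pthread mutexes stand in for i_mutex and i_ceph_lock, and "truncating the page cache" is reduced to shrinking a byte count. What it shows is that the pending flag makes the truncation idempotent, so whichever i_mutex holder gets there first (the work queue or a later caller) does the work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified inode; not struct ceph_inode_info. */
struct model_inode {
	pthread_mutex_t i_mutex;   /* stands in for the VFS i_mutex */
	pthread_mutex_t i_lock;    /* stands in for i_ceph_lock */
	size_t cached_bytes;       /* extent of data in the "page cache" */
	size_t truncate_to;        /* size requested by the MDS */
	bool truncate_pending;
};

static void model_inode_init(struct model_inode *in, size_t cached)
{
	pthread_mutex_init(&in->i_mutex, NULL);
	pthread_mutex_init(&in->i_lock, NULL);
	in->cached_bytes = cached;
	in->truncate_to = 0;
	in->truncate_pending = false;
}

/* A truncate notification arrives in a context that cannot take
 * i_mutex: only record the request. */
static void queue_vmtruncate(struct model_inode *in, size_t new_size)
{
	pthread_mutex_lock(&in->i_lock);
	in->truncate_to = new_size;
	in->truncate_pending = true;
	pthread_mutex_unlock(&in->i_lock);
}

/* Caller must hold i_mutex. Finishes a deferred truncation if one is
 * pending and is a no-op otherwise. */
static void do_pending_vmtruncate(struct model_inode *in)
{
	pthread_mutex_lock(&in->i_lock);
	if (in->truncate_pending) {
		if (in->cached_bytes > in->truncate_to)
			in->cached_bytes = in->truncate_to;
		in->truncate_pending = false;
	}
	pthread_mutex_unlock(&in->i_lock);
}

/* The work-queue item: take i_mutex, then finish the truncation. */
static void vmtruncate_work(struct model_inode *in)
{
	pthread_mutex_lock(&in->i_mutex);
	do_pending_vmtruncate(in);
	pthread_mutex_unlock(&in->i_mutex);
}
```

Because do_pending_vmtruncate() checks the flag under the inner lock, calling it from both the work item and another i_mutex holder (an llseek or setattr path, say) is harmless; the second caller finds nothing pending.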
>>>
>>> What was the deadlock you observed?
>>
>> generic_file_aio_write() locks the i_mutex, then indirectly calls ceph_get_caps()
>> through ceph_write_begin(). ceph_get_caps() waits for the pending vmtruncate to complete,
>> but the work queue is blocked by the i_mutex.
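The lock cycle can be reproduced deterministically in a small user-space model. This is a hedged sketch rather than kernel code: `struct inode_model`, `work_try_run()` and `writer_can_finish()` are hypothetical, and pthread_mutex_trylock() is used so the model reports "would block" instead of actually hanging. The writer holds i_mutex (as generic_file_aio_write() does) and can only proceed once the pending truncation is done, but the work item that would do it cannot take i_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the state involved in the deadlock. */
struct inode_model {
	pthread_mutex_t i_mutex;   /* stands in for the VFS i_mutex */
	bool truncate_pending;     /* a vmtruncate is queued */
};

/* Pre-patch ceph_vmtruncate_work(): it needs i_mutex before it can run.
 * For a non-recursive mutex, trylock fails if the mutex is held by any
 * thread, including the caller. */
static bool work_try_run(struct inode_model *in)
{
	if (pthread_mutex_trylock(&in->i_mutex) != 0)
		return false;              /* would block on i_mutex */
	in->truncate_pending = false;  /* the truncation itself */
	pthread_mutex_unlock(&in->i_mutex);
	return true;
}

/* Write path: i_mutex is taken first, then ceph_get_caps() (reached via
 * ceph_write_begin()) needs the pending truncation to be finished.
 * Returns whether that wait could ever end. */
static bool writer_can_finish(struct inode_model *in)
{
	bool ok;

	pthread_mutex_lock(&in->i_mutex);
	ok = !in->truncate_pending || work_try_run(in);
	pthread_mutex_unlock(&in->i_mutex);
	return ok;
}
```

With a truncation pending, writer_can_finish() returns false: each side waits for something only the other can release. Once the work item runs without the writer holding i_mutex, the same call succeeds.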
>
> ...but it seems clear that that's not a workable solution. I suspect the
> right solution here is to introduce an inner i_trunc_mutex in the ceph
> inode and use that to protect truncation events in the page cache. That
> will touch a lot of different code paths, unfortunately, but I think
> something like that is necessary if we need to block with i_mutex held by
> the VFS.
>
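One way to picture the inner-lock idea is the model below. It is a sketch under assumptions, not a design agreed in this thread: `i_trunc_mutex` does not exist yet, the struct and function names are hypothetical, and the page cache is again reduced to a byte count. Truncation is serialized only by the inner lock, which nests inside i_mutex (lock order i_mutex, then i_trunc_mutex), so the work item never waits for i_mutex and a writer that already holds i_mutex can finish a pending truncation itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model. Lock order: i_mutex -> i_trunc_mutex. */
struct trunc_inode {
	pthread_mutex_t i_mutex;        /* VFS lock, may already be held */
	pthread_mutex_t i_trunc_mutex;  /* serializes page-cache truncation */
	size_t cached_bytes;
	size_t truncate_to;
	bool truncate_pending;          /* protected by i_trunc_mutex */
};

static void trunc_inode_init(struct trunc_inode *in, size_t cached)
{
	pthread_mutex_init(&in->i_mutex, NULL);
	pthread_mutex_init(&in->i_trunc_mutex, NULL);
	in->cached_bytes = cached;
	in->truncate_to = 0;
	in->truncate_pending = false;
}

static void queue_truncate(struct trunc_inode *in, size_t new_size)
{
	pthread_mutex_lock(&in->i_trunc_mutex);
	in->truncate_to = new_size;
	in->truncate_pending = true;
	pthread_mutex_unlock(&in->i_trunc_mutex);
}

/* Caller must hold i_trunc_mutex; i_mutex is not required. */
static void truncate_locked(struct trunc_inode *in)
{
	if (in->truncate_pending) {
		if (in->cached_bytes > in->truncate_to)
			in->cached_bytes = in->truncate_to;
		in->truncate_pending = false;
	}
}

/* Work item: only the inner lock, so it cannot block behind i_mutex. */
static void truncate_work(struct trunc_inode *in)
{
	pthread_mutex_lock(&in->i_trunc_mutex);
	truncate_locked(in);
	pthread_mutex_unlock(&in->i_trunc_mutex);
}

/* Write path: i_mutex is held by the VFS. The writer finishes any pending
 * truncation itself and keeps i_trunc_mutex while it extends the cache,
 * so a truncation cannot interleave with the page-cache update. */
static void model_write(struct trunc_inode *in, size_t end)
{
	pthread_mutex_lock(&in->i_mutex);
	pthread_mutex_lock(&in->i_trunc_mutex);
	truncate_locked(in);
	if (end > in->cached_bytes)
		in->cached_bytes = end;     /* stand-in for filling pages */
	pthread_mutex_unlock(&in->i_trunc_mutex);
	pthread_mutex_unlock(&in->i_mutex);
}
```

The cost mentioned above shows up even in this toy: every path that touches the page cache has to agree on the inner lock and on its ordering relative to i_mutex.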
Maybe ceph_vmtruncate_work() can wait until no one is using the write cap. I think it's safe to
truncate the page cache in that case.
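A rough model of this alternative, under stated assumptions: `struct cap_inode`, `wr_ref` and the helper functions are hypothetical; a mutex and condition variable stand in for i_ceph_lock and the cap wait queue; and the truncation is done under the lock only to keep the model small (real code would not truncate pages while holding a spinlock). New writers wait in get_write_cap() while a truncation is pending, and the work item waits for existing write-cap users to drain. Neither side needs i_mutex, so a writer that blocks in get_write_cap() with i_mutex held does not complete a cycle: it holds no write-cap reference yet.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the "wait for write-cap users" idea. */
struct cap_inode {
	pthread_mutex_t lock;       /* stands in for i_ceph_lock */
	pthread_cond_t cond;        /* signalled when wr_ref or the flag changes */
	int wr_ref;                 /* current users of the write cap */
	bool truncate_pending;
	size_t cached_bytes;
	size_t truncate_to;
};

static void cap_inode_init(struct cap_inode *in, size_t cached)
{
	pthread_mutex_init(&in->lock, NULL);
	pthread_cond_init(&in->cond, NULL);
	in->wr_ref = 0;
	in->truncate_pending = false;
	in->cached_bytes = cached;
	in->truncate_to = 0;
}

static void queue_truncate(struct cap_inode *in, size_t new_size)
{
	pthread_mutex_lock(&in->lock);
	in->truncate_to = new_size;
	in->truncate_pending = true;
	pthread_mutex_unlock(&in->lock);
}

/* Like ceph_get_caps(): no new write-cap users while a truncate is pending. */
static void get_write_cap(struct cap_inode *in)
{
	pthread_mutex_lock(&in->lock);
	while (in->truncate_pending)
		pthread_cond_wait(&in->cond, &in->lock);
	in->wr_ref++;
	pthread_mutex_unlock(&in->lock);
}

static void put_write_cap(struct cap_inode *in)
{
	pthread_mutex_lock(&in->lock);
	if (--in->wr_ref == 0)
		pthread_cond_broadcast(&in->cond);
	pthread_mutex_unlock(&in->lock);
}

/* Work item: no i_mutex; wait until nobody holds the write cap. */
static void vmtruncate_work(struct cap_inode *in)
{
	pthread_mutex_lock(&in->lock);
	while (in->wr_ref > 0)
		pthread_cond_wait(&in->cond, &in->lock);
	if (in->truncate_pending) {
		if (in->cached_bytes > in->truncate_to)
			in->cached_bytes = in->truncate_to;
		in->truncate_pending = false;
		pthread_cond_broadcast(&in->cond);  /* release waiting writers */
	}
	pthread_mutex_unlock(&in->lock);
}

/* Adapter so the work item can run on its own thread. */
static void *vmtruncate_thread(void *arg)
{
	vmtruncate_work(arg);
	return NULL;
}
```

In use, a writer that already holds the cap keeps the truncation waiting until it calls put_write_cap(); only then does the work item shrink the cache and let blocked writers in.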
Regards
Yan, Zheng