All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Liu <jeff.liu@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] unlink performance: wait for open lock in case of dirs
Date: Tue, 02 Jul 2013 15:11:08 +0800	[thread overview]
Message-ID: <51D27D0C.8040903@oracle.com> (raw)
In-Reply-To: <CAJtY7HXQHsKJT1JvTdx5__3XMuWPvSuwAWYg44PquHrkg03DNA@mail.gmail.com>

On 07/02/2013 12:44 AM, Goldwyn Rodrigues wrote:

> On Sun, Jun 30, 2013 at 1:44 AM, Jeff Liu <jeff.liu@oracle.com> wrote:
>> On 06/29/2013 07:12 AM, Andrew Morton wrote:
>>
>>> On Fri, 14 Jun 2013 21:05:10 -0500 Goldwyn Rodrigues <rgoldwyn@gmail.com> wrote:
>>>
>>>> This patch is to improve the unlink performance. Here is the scenario:
>>>>
>>>> On node A, create multiple directories say d1-d8, and each have 3
>>>> files under it f1, f2 and f3.
>>>> On node B, delete all directories using rm -Rf d*
>>>>
>>>> The FS first unlinks f1, f2 and f3. However, when it performs
>>>> ocfs2_evict_inode() -> ocfs2_delete_inode() ->
>>>> ocfs2_query_inode_wipe() -> ocfs2_try_open_lock() on d1, it fails
>>>> with -EAGAIN. The open lock fails because on the remote node
>>>> a PR->EX convert takes longer than a simple EX grant.
>>>>
>>>> This starts a checkpoint because OCFS2_INODE_DELETED flag is not set
>>>> on the directory inode. Now, a checkpoint interferes with the journaling
>>>> of the inodes deleted in the following unlinks, in our case,
>>>> directories d2-d8 and the files contained in it.
>>>>
>>>> With this patch, We wait on a directory EX lock only if we already
>>>> have an open_lock in PR mode. This way we will avoid the ABBA locking.
>>>>
>>>> By waiting for the open_lock on the directory, I am getting a unlink
>>>> performance improvement of a rm -Rf of 50-60% in the usual case.
>>>>
>>>> Also, folded ocfs2_open_lock and ocfs2_try_open_lock into one.
>>>>
>>>> Let me know if you would like to see the test case.
>>>
>>> We need some more review and test of this patch, please.
>>>
>>> The patch doesn't apply to current kernels - please redo, retest and
>>> resend it.
> 
> Yes, I sent it against an older kernel. I will update it once I
> incorporate the other review comments. Thanks for this comment.
> 
>>>
>>> In particular, the kernel you're patching doesn't have the
>>> ocfs2_is_hard_readonly() tests in ocfs2_open_lock() and
>>> ocfs2_try_open_lock().
>>>
>>> And looking at those tests, I wonder about ocfs2_open_lock().
>>>
>>> : int ocfs2_open_lock(struct inode *inode)
>>> : {
>>> :     int status = 0;
>>> :     struct ocfs2_lock_res *lockres;
>>> :     struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>> :
>>> :     BUG_ON(!inode);
>>> :
>>> :     mlog(0, "inode %llu take PRMODE open lock\n",
>>> :          (unsigned long long)OCFS2_I(inode)->ip_blkno);
>>> :
>>> :     if (ocfs2_is_hard_readonly(osb) || ocfs2_mount_local(osb))
>>> :             goto out;
>>> :
>>> :     lockres = &OCFS2_I(inode)->ip_open_lockres;
>>> :
>>> :     status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres,
>>> :                                 DLM_LOCK_PR, 0, 0);
>>> :     if (status < 0)
>>> :             mlog_errno(status);
>>> :
>>> : out:
>>> :     return status;
>>> : }
>>>
>>> It will return zero if ocfs2_is_hard_readonly().  Is that correct?
>>
>>> Should it return -EROFS for writes?
> 
> Yes, it should.
> 
> 
>>
>> Hi Andrew,
>>
>> That is correct for the current design, because ocfs2_open_lock() is
>> implemented to grant a lock in protect read mode.
>>
>> The ocfs2_is_hard_readonly() has been introduced to ocfs2_open_lock()
>> by commit: 03efed8a2a1b8e00164eb4720a82a7dd5e368a8e
>>
>>     ocfs2: Bugfix for hard readonly mount
>>         ocfs2 cannot currently mount a device that is readonly at the media
>>     ("hard readonly").  Fix the broken places.
>>     see detail: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1322
>>
>>> After your patch, this code does
>>> know whether the attempt was for a write.
>>
>>
>>
>> Yes, with this change, it need a fix indeed.
>>
>>>
>>> Please check all that.
>>
>>
>>
>>
>> Hi Goldwyn,
>>
>> Sorry for the too late response as I have mentioned that I'll take a look at it,
>> but some internal tasks really racked my brains in the past couple of weeks.
> 
> No problem.
> 
>>
>> Just as Andrew's comments, this patch can not be applied properly against the
>> upstream development tree, could you please re-base it.
> 
> Yes, will do. I have other comments from Mark as well.
> 
>>
>> If we merge ocfs2_open_lock()/ocfs2_open_try_lock() into one, looks the semantics of
>> ocfs2_open_lock() is broke with the 'write' option is introduced, but this is not a
>> big issue as we can tweak up the comments as well as those functions which would be
>> affected, but we need to run ocfs2-test(at least cover all the code path which are
>> involved in this change) to avoid any regression, it can be download from:
>> https://oss.oracle.com/projects/ocfs2-test/source.html
>> https://oss.oracle.com/osswiki/OCFS2/Ocfs2TestList.html
>>
>> Another thing come to mind since this patch would affect ocfs2_evict_inode() with
>> ocfs2_try_open_lock(), i.e, the following call chains:
>>
>> ocfs2_evict_inode()
>> |
>> |-------ocfs2_delete_inode()
>>         |
>>         |-------ocfs2_query_inode_wipe()
>>         |       |
>>         |       |-------ocfs2_try_open_lock()
>>         |       |
>>         |-------ocfs2_clear_inode()
>>                 |
>>                 |---------ocfs2_open_unlock()
>>
>> OCFS2 has a DEAD LOCK issue when creating/removing tons of files in parallel if the
>> disk quota is enabled, please refer to:
>> https://oss.oracle.com/bugzilla/show_bug.cgi?id=1339
>>
>> As per the old discussions, Jan Kara pointed out that there is a race in
>> ocfs2_evict_inode() if consider the following lock sequences:
>>
>> ocfs2_evict_inode()
>> |
>> |-------ocfs2_inode_lock()
>>         |
>>         |-------ocfs2_try_open_lock() [ try to get open lock in EX mode ]
>>                 |
>>                 |-------ocfs2_inode_unlock()
>>                         |
>>                         |-------ocfs2_open_unlock() [ drops shared open lock that is hold ]
>>
>> Quotas from Jan:
>> """
>> Now if two nodes happen to execute ocfs2_evict_inode() in parallel and
>> ocfs2_try_open_lock() happens on both nodes before ocfs2_open_unlock() is
>> called on any of them, ocfs2_try_open_lock() fails for both nodes...
>>
>> I would think that the code should be reorganized so that shared open
>> lock is dropped before we drop inode lock. Then the race could not happen.
>> But I'm not sure if something else would not break.
>> """
> 
> I could not find the reference of this in the bug. Is it discussed on
> the ML. Do you have a link so I can look at the history?
> 
>>
>> This is deserve to give a try IMHO, especially there would be kind of dependencies
>> with your fix, and that would be wonderful if we can have your improvement as well
>> as lock sequences adjustments that might fix the quota problems.
>>
> 
> Yes, I understand. Please provide the references if possible so I get
> the bigger picture.

https://oss.oracle.com/pipermail/ocfs2-devel/2012-August/008709.html

Thanks,
-Jeff

      reply	other threads:[~2013-07-02  7:11 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-15  2:05 [Ocfs2-devel] [PATCH] unlink performance: wait for open lock in case of dirs Goldwyn Rodrigues
2013-06-18  1:30 ` shencanquan
2013-06-18  2:17   ` Goldwyn Rodrigues
2013-06-28 23:12 ` Andrew Morton
2013-06-30  6:44   ` Jeff Liu
2013-07-01 16:44     ` Goldwyn Rodrigues
2013-07-02  7:11       ` Jeff Liu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51D27D0C.8040903@oracle.com \
    --to=jeff.liu@oracle.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.