public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: piaojun <piaojun@huawei.com>
To: Gang He <ghe@suse.com>, <jlbec@evilplan.org>, <mfasheh@versity.com>
Cc: <ocfs2-devel@oss.oracle.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE
Date: Thu, 28 Dec 2017 10:45:01 +0800	[thread overview]
Message-ID: <5A445AAD.6090406@huawei.com> (raw)
In-Reply-To: <5A44C34A020000F9000A0736@prv-mh.provo.novell.com>

Hi Gang,

You cleared my doubt. Should we handle the errno of ocfs2_inode_lock()
or just use mlog_errno()?

thanks,
Jun

On 2017/12/28 10:11, Gang He wrote:
> Hi Jun,
> 
> 
>>>>
>> Hi Gang,
>>
>> Thanks for your explaination, and I just have one more question. Could
>> we use 'ocfs2_inode_lock' instead of 'ocfs2_inode_lock_full' to avoid
>> -EAGAIN circularly?
> No, please see the comments above the function  ocfs2_inode_lock_with_page(),
> there will be probably a deadlock between tasks acquiring DLM
> locks while holding a page lock and the downconvert thread which
> blocks dlm lock acquiry while acquiring page locks.
> Then, the OCFS2_LOCK_NONBLOCK flag was introduced as a workaround to
> avoid this case.
> 
> Thanks
> Gang
> 
>>
>> thanks,
>> Jun
>>
>> On 2017/12/27 18:37, Gang He wrote:
>>> Hi Jun,
>>>
>>>
>>>>>>
>>>> Hi Gang,
>>>>
>>>> Do you mean that too many retrys in loop cast losts of CPU-time and
>>>> block page-fault interrupt? We should not add any delay in
>>>> ocfs2_fault(), right? And I still feel a little confused why your
>>>> method can solve this problem.
>>> You can see the related code in function filemap_fault(), if ocfs2 fails to 
>> read a page since 
>>> it can not get a inode lock with non-block mode, the VFS layer code will 
>> invoke ocfs2
>>> read page call back function circularly, this will lead to a softlockup 
>> problem (like the below back trace).
>>> So, we should get a blocking lock to let the dlm lock to this node and also 
>> can avoid CPU loop,
>>> second, base on my testing, the patch also can improve the efficiency in 
>> case modifying the same
>>> file frequently from multiple nodes, since the lock acquisition chance is 
>> more fair.
>>> In fact, the code was modified by a patch 1cce4df04f37 ("ocfs2: do not 
>> lock/unlock() inode DLM lock"),
>>> before that patch, the code is the same, this patch can be considered to 
>> revert that patch, except adding more
>>> clear comments.
>>>  
>>> Thanks
>>> Gang
>>>
>>>
>>>>
>>>> thanks,
>>>> Jun
>>>>
>>>> On 2017/12/27 17:29, Gang He wrote:
>>>>> If we can't get inode lock immediately in the function
>>>>> ocfs2_inode_lock_with_page() when reading a page, we should not
>>>>> return directly here, since this will lead to a softlockup problem.
>>>>> The method is to get a blocking lock and immediately unlock before
>>>>> returning, this can avoid CPU resource waste due to lots of retries,
>>>>> and benefits fairness in getting lock among multiple nodes, increase
>>>>> efficiency in case modifying the same file frequently from multiple
>>>>> nodes.
>>>>> The softlockup problem looks like,
>>>>> Kernel panic - not syncing: softlockup: hung tasks
>>>>> CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
>>>>> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>>>> Call Trace:
>>>>>   <IRQ>
>>>>>   dump_stack+0x5c/0x82
>>>>>   panic+0xd5/0x21e
>>>>>   watchdog_timer_fn+0x208/0x210
>>>>>   ? watchdog_park_threads+0x70/0x70
>>>>>   __hrtimer_run_queues+0xcc/0x200
>>>>>   hrtimer_interrupt+0xa6/0x1f0
>>>>>   smp_apic_timer_interrupt+0x34/0x50
>>>>>   apic_timer_interrupt+0x96/0xa0
>>>>>   </IRQ>
>>>>>  RIP: 0010:unlock_page+0x17/0x30
>>>>>  RSP: 0000:ffffaf154080bc88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
>>>>>  RAX: dead000000000100 RBX: fffff21e009f5300 RCX: 0000000000000004
>>>>>  RDX: dead0000000000ff RSI: 0000000000000202 RDI: fffff21e009f5300
>>>>>  RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaf154080bb00
>>>>>  R10: ffffaf154080bc30 R11: 0000000000000040 R12: ffff993749a39518
>>>>>  R13: 0000000000000000 R14: fffff21e009f5300 R15: fffff21e009f5300
>>>>>   ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
>>>>>   ocfs2_readpage+0x41/0x2d0 [ocfs2]
>>>>>   ? pagecache_get_page+0x30/0x200
>>>>>   filemap_fault+0x12b/0x5c0
>>>>>   ? recalc_sigpending+0x17/0x50
>>>>>   ? __set_task_blocked+0x28/0x70
>>>>>   ? __set_current_blocked+0x3d/0x60
>>>>>   ocfs2_fault+0x29/0xb0 [ocfs2]
>>>>>   __do_fault+0x1a/0xa0
>>>>>   __handle_mm_fault+0xbe8/0x1090
>>>>>   handle_mm_fault+0xaa/0x1f0
>>>>>   __do_page_fault+0x235/0x4b0
>>>>>   trace_do_page_fault+0x3c/0x110
>>>>>   async_page_fault+0x28/0x30
>>>>>  RIP: 0033:0x7fa75ded638e
>>>>>  RSP: 002b:00007ffd6657db18 EFLAGS: 00010287
>>>>>  RAX: 000055c7662fb700 RBX: 0000000000000001 RCX: 000055c7662fb700
>>>>>  RDX: 0000000000001770 RSI: 00007fa75e909000 RDI: 000055c7662fb700
>>>>>  RBP: 0000000000000003 R08: 000000000000000e R09: 0000000000000000
>>>>>  R10: 0000000000000483 R11: 00007fa75ded61b0 R12: 00007fa75e90a770
>>>>>  R13: 000000000000000e R14: 0000000000001770 R15: 0000000000000000
>>>>>
>>>>> Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
>>>>> Signed-off-by: Gang He <ghe@suse.com>
>>>>> ---
>>>>>  fs/ocfs2/dlmglue.c | 9 +++++++++
>>>>>  1 file changed, 9 insertions(+)
>>>>>
>>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>>>> index 4689940..5193218 100644
>>>>> --- a/fs/ocfs2/dlmglue.c
>>>>> +++ b/fs/ocfs2/dlmglue.c
>>>>> @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>>>>>  	ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>>>>>  	if (ret == -EAGAIN) {
>>>>>  		unlock_page(page);
>>>>> +		/*
>>>>> +		 * If we can't get inode lock immediately, we should not return
>>>>> +		 * directly here, since this will lead to a softlockup problem.
>>>>> +		 * The method is to get a blocking lock and immediately unlock
>>>>> +		 * before returning, this can avoid CPU resource waste due to
>>>>> +		 * lots of retries, and benefits fairness in getting lock.
>>>>> +		 */
>>>>> +		if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
>>>>> +			ocfs2_inode_unlock(inode, ex);
>>>>>  		ret = AOP_TRUNCATED_PAGE;
>>>>>  	}
>>>>>  
>>>>>
>>> .
>>>
> .
> 

  reply	other threads:[~2017-12-28  2:45 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-27  9:29 [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE Gang He
2017-12-27 10:15 ` [Ocfs2-devel] " piaojun
2017-12-27 10:37   ` Gang He
2017-12-28  1:30     ` piaojun
2017-12-28  2:11       ` Gang He
2017-12-28  2:45         ` piaojun [this message]
2017-12-28  2:58           ` Gang He
2017-12-28  3:34             ` piaojun
2017-12-28  2:02     ` alex chen
2017-12-28  2:48       ` Gang He
2017-12-28  6:13         ` alex chen
2017-12-27 12:32 ` Changwei Ge
2017-12-28  3:59 ` Eric Ren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5A445AAD.6090406@huawei.com \
    --to=piaojun@huawei.com \
    --cc=ghe@suse.com \
    --cc=jlbec@evilplan.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mfasheh@versity.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox