From: Xue jiufei <xuejiufei@huawei.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] A deadlock when system do not has sufficient memory
Date: Fri, 29 Aug 2014 15:22:11 +0800 [thread overview]
Message-ID: <54002A23.6050708@huawei.com> (raw)
In-Reply-To: <53FFF2FB.20706@oracle.com>
On 2014/8/29 11:26, Junxiao Bi wrote:
> On 08/28/2014 04:16 PM, Xue jiufei wrote:
>> Hi Junxiao,
>> On 2014/8/25 9:50, Junxiao Bi wrote:
>>> Hi Jiufei,
>>>
>>> Maybe you can consider using the PF_FSTRANS flag: set it before
>>> allocating memory with the GFP_KERNEL flag and clear it after the
>>> allocation. Then check this flag in ocfs2 when it tries to free pages
>>> during direct memory reclaim. See the example in upstream commit
>>> 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 (nfs: skip commit in
>>> releasepage if we're freeing memory for fs-related reasons).
>>>
>>> Thanks,
>>> Junxiao.
>>>
>> Thank you very much for your suggestion. But in our situation,
>> o2net_wq is evicting an inode during direct memory reclaim. It cannot
>> return an error or do nothing, because the VFS would call destroy_inode
>> after evict while we haven't dropped the inode lock yet.
> How about checking the flag in vfs like this? And you can set PF_FSTRANS
> flag in o2net_wq context where GFP_NOFS flag can't be set.
>
>
> commit 8d27fdec5ce234d2f02e4582d340d231396b92af
> Author: Junxiao Bi <junxiao.bi@oracle.com>
> Date: Fri Aug 29 11:05:25 2014 +0800
>
> super: stop shrinker for processes with PF_FSTRANS flag
>
> For some cluster fs, like ocfs2, it may be impossible to
> set GFP_NOFS for some memory allocation, as the allocation
> is in network common code, like sock_alloc() and in this
> case, the shrinker will call back into the fs and cause
> deadlock when available memory is not enough.
>
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>
> diff --git a/fs/super.c b/fs/super.c
> index b9a214d..c4a8dc1 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -71,6 +71,9 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
>  	if (!(sc->gfp_mask & __GFP_FS))
>  		return SHRINK_STOP;
>  
> +	if (current->flags & PF_FSTRANS)
> +		return SHRINK_STOP;
> +
>  	if (!grab_super_passive(sb))
>  		return SHRINK_STOP;
>
>
> Thanks,
> Junxiao.
>
Yes, this patch can resolve our problem. Thanks a lot.
Have you sent this patch to the fs-devel list?
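
A minimal userspace sketch of the idea discussed above, not kernel code:
the worker sets a per-task flag around an allocation it cannot mark
GFP_NOFS, and the superblock shrinker backs off when it sees that flag.
Every MODEL_* constant and model_* function below is a hypothetical
stand-in (the values are arbitrary); only the two checks mirror the
patch quoted above.

```c
/*
 * Userspace model only, NOT kernel code. All MODEL_* names/values and
 * all model_* functions are hypothetical stand-ins for illustration.
 */
#include <assert.h>

#define MODEL_PF_FSTRANS  0x1u    /* stands in for PF_FSTRANS */
#define MODEL_GFP_FS      0x2u    /* stands in for __GFP_FS */
#define MODEL_SHRINK_STOP (~0UL)  /* stands in for SHRINK_STOP */

/* Stand-in for the per-task flags reached through "current". */
struct model_task {
	unsigned int flags;
};

struct model_task model_current; /* zero-initialized: no flags set */

/* Mirrors the two early-exit checks of super_cache_scan() in the patch. */
unsigned long model_super_cache_scan(unsigned int gfp_mask,
				     unsigned long nr_to_scan)
{
	if (!(gfp_mask & MODEL_GFP_FS))
		return MODEL_SHRINK_STOP;

	/* The added check: do not call back into the fs for flagged tasks. */
	if (model_current.flags & MODEL_PF_FSTRANS)
		return MODEL_SHRINK_STOP;

	/* Pretend every scanned object was freed. */
	return nr_to_scan;
}

/*
 * Models an allocation made from o2net_wq context that enters direct
 * reclaim: the flag is set for the duration and restored afterwards,
 * so a flag set by an outer caller is preserved.
 */
unsigned long model_scan_from_o2net_wq(unsigned int gfp_mask,
				       unsigned long nr_to_scan)
{
	unsigned int saved = model_current.flags & MODEL_PF_FSTRANS;
	unsigned long freed;

	model_current.flags |= MODEL_PF_FSTRANS;
	freed = model_super_cache_scan(gfp_mask, nr_to_scan);
	model_current.flags = (model_current.flags & ~MODEL_PF_FSTRANS) | saved;

	assert((model_current.flags & MODEL_PF_FSTRANS) == saved);
	return freed;
}
```

In this model an ordinary caller with the FS bit set still reclaims,
while the same request issued from the flagged worker context gets the
stop value and never re-enters the filesystem. Saving and restoring the
previous flag state, rather than clearing it unconditionally, is a
design choice of the sketch so that nested use stays correct.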
>>
>> Thanks
>> Xuejiufei
>>
>>> On 08/22/2014 04:30 PM, Xue jiufei wrote:
>>>> On 2014/8/20 11:57, Xue jiufei wrote:
>>>>> Hi all,
>>>>> We found that a deadlock may occur when the system does not have
>>>>> sufficient memory. Here's the situation, with nodes N1 and N2:
>>>>>
>>>>> N2:
>>>>>   Sends a message to N1.
>>>>>
>>>>> N1:
>>>>>   o2net_wq (kworker) receives the message and calls the
>>>>>   corresponding handler. The handler may need to allocate memory
>>>>>   (using GFP_NOFS or GFP_KERNEL), but free memory is insufficient,
>>>>>   lower than the min watermark. So it wakes up kswapd to reclaim
>>>>>   memory, and may itself call __alloc_pages_direct_reclaim() to
>>>>>   try to free some pages.
>>>>>
>>>>>   Direct reclaim tries to free the ocfs2 inode cache and calls
>>>>>   ocfs2_drop_lock()->dlmunlock() to drop the inode lock, sending
>>>>>   an unlock message to the lock master, say N2. When the reply
>>>>>   arrives, sc_rx_work is queued and waits for o2net_wq to handle
>>>>>   it. However, o2net_wq is still handling the previous message, so
>>>>>   it cannot process the reply. It waits on o2net_nsw_completed()
>>>>>   in o2net_send_message_vec() forever.
>>>>>
>>>>>   The kswapd thread encounters the same situation.
>>>>>
>>>>>
>>>>> Is there any advice on how to solve this deadlock?
>>>>> And how likely is kmalloc to return ENOMEM when the GFP_ATOMIC flag is used?
>>>>>
>>>>> Thanks.
>>>>>
>>>> To avoid this deadlock, we want to allocate memory with the GFP_ATOMIC
>>>> flag in all handlers and return ENOMEM to the peer on failure. The peer
>>>> will then resend the message, so o2net_wq can handle other messages.
>>>> However, this cannot solve every case. For example, if o2net_wq is
>>>> processing sc_connect_work, which calls sock_alloc_inode() to allocate
>>>> a socket_alloc with the GFP_KERNEL flag, and memory is insufficient, it
>>>> enters the reclaim path and triggers the same deadlock. We cannot
>>>> change this allocation flag.
>>>> We have no solution for this case. Are there any better ideas?
>>>> Thanks very much.
>>>> xuejiufei
>>>>> _______________________________________________
>>>>> Ocfs2-devel mailing list
>>>>> Ocfs2-devel at oss.oracle.com
>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>>>