[Ocfs2-devel] A deadlock when system do not has sufficient memory


* [Ocfs2-devel] A deadlock when system do not has sufficient memory
@ 2014-08-20  3:57 Xue jiufei
  2014-08-22  8:30 ` Xue jiufei
  0 siblings, 1 reply; 16+ messages in thread
From: Xue jiufei @ 2014-08-20  3:57 UTC (permalink / raw)
  To: ocfs2-devel

Hi all,
We found a possible deadlock when the system does not have sufficient
memory. Here's the situation:
            N1                                      N2
                                             send message to N1
      o2net_wq (kworker)
receives the message and calls the
corresponding handler to process it. The
handler may need to allocate memory (with
GFP_NOFS or GFP_KERNEL), but free memory is
below the min watermark, so it wakes kswapd
to reclaim memory and may also call
__alloc_pages_direct_reclaim() itself to
free some pages.

It (o2net_wq, now in direct reclaim) tries
to free the ocfs2 inode cache and calls
ocfs2_drop_lock()->dlmunlock() to drop the
inode lock, sending an unlock message to the
master, say N2. When the reply comes back,
sc_rx_work is queued and has to wait for
o2net_wq to run it. However, o2net_wq is
still handling the previous message, so it
cannot process the reply; it waits forever
for o2net_nsw_completed() in
o2net_send_message_vec().
The kswapd thread runs into the same situation.
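A minimal, single-threaded C sketch of the self-deadlock above, assuming (as the mail implies) that o2net_wq is one worker running queued items in FIFO order. All names (struct wq, queue_work_item(), rx_reply(), handle_msg_with_reclaim(), run_worker()) are hypothetical and only model the pattern; this is not ocfs2 code. Instead of blocking forever, the handler flags the wait it could never finish.

```c
/*
 * Toy model of the o2net_wq self-deadlock (hypothetical names, not ocfs2
 * code). One worker runs queued work items in FIFO order; nothing else
 * runs while an item is executing.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WQ_MAX 8

struct wq;
typedef void (*work_fn)(struct wq *wq);

struct wq {
    work_fn items[WQ_MAX];  /* pending work, FIFO */
    size_t head, tail;
    bool reply_done;        /* stands in for o2net_nsw_completed() */
    bool deadlocked;        /* set when a wait could never finish */
};

static void queue_work_item(struct wq *wq, work_fn fn)
{
    assert(wq->tail < WQ_MAX);
    wq->items[wq->tail++] = fn;
}

/* Reply processing (like sc_rx_work): runs only when the worker is free. */
static void rx_reply(struct wq *wq)
{
    wq->reply_done = true;
}

/*
 * Handler whose allocation falls into reclaim, which sends an unlock
 * and then waits for the reply. The reply work is queued behind this
 * item on the same worker, so it cannot run during the wait.
 */
static void handle_msg_with_reclaim(struct wq *wq)
{
    queue_work_item(wq, rx_reply);  /* reply arrives, work is queued */
    if (!wq->reply_done)
        wq->deadlocked = true;      /* the kernel would sleep forever here */
}

/* Run the worker until the queue drains; false if a wait would hang. */
static bool run_worker(struct wq *wq)
{
    while (wq->head < wq->tail) {
        work_fn fn = wq->items[wq->head++];
        fn(wq);
        if (wq->deadlocked)
            return false;
    }
    return true;
}
```

Queuing handle_msg_with_reclaim() and running the worker stops with deadlocked set and reply_done still false, because rx_reply() sits behind the very item that is waiting for it.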

Is there any advice on how to solve this deadlock?
Also, how likely is kmalloc() to fail (return NULL) when called with GFP_ATOMIC?

Thanks.


* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-20  3:57 [Ocfs2-devel] A deadlock when system do not has sufficient memory Xue jiufei
@ 2014-08-22  8:30 ` Xue jiufei
  2014-08-22 17:08   ` Sunil Mushran
  2014-08-25  1:50   ` Junxiao Bi
  0 siblings, 2 replies; 16+ messages in thread
From: Xue jiufei @ 2014-08-22  8:30 UTC (permalink / raw)
  To: ocfs2-devel

To avoid this deadlock, we want to allocate memory with GFP_ATOMIC
in all handlers and return ENOMEM to the peer when the allocation fails.
The peer will then resend the message, and o2net_wq can handle other
messages in the meantime.
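A hedged user-space sketch of this GFP_ATOMIC workaround: the handler only attempts a non-blocking allocation and returns -ENOMEM instead of entering reclaim, and the sender resends. struct node_mem, try_alloc_nowait(), handle_msg() and send_with_retry() are hypothetical. The model ignores that a real GFP_ATOMIC allocation may also draw on emergency reserves, and it treats memory freed elsewhere (for example by kswapd) as a fixed number of pages per retry.

```c
/*
 * Fail-fast handler plus resending peer (hypothetical helpers, not
 * ocfs2/o2net functions). "Non-blocking" here means: never reclaim,
 * just fail when not enough free pages are available.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct node_mem {
    size_t free_pages;  /* pages available without any reclaim */
};

/* Non-blocking allocation: never reclaims, fails when short. */
static int try_alloc_nowait(struct node_mem *m, size_t pages)
{
    if (m->free_pages < pages)
        return -ENOMEM;
    m->free_pages -= pages;
    return 0;
}

/* Receiving handler on N1: fail fast so the worker never blocks in reclaim. */
static int handle_msg(struct node_mem *m, size_t pages)
{
    int ret = try_alloc_nowait(m, pages);
    if (ret)
        return ret;          /* status sent back to the peer */
    /* ... process the message, then release the memory ... */
    m->free_pages += pages;
    return 0;
}

/*
 * Sender on N2: resend on -ENOMEM, up to max_tries. Between attempts,
 * reclaim done elsewhere is modeled as freeing pages_per_retry pages.
 * Returns the attempt that succeeded, or -ENOMEM if all failed.
 */
static int send_with_retry(struct node_mem *peer, size_t pages,
                           int max_tries, size_t pages_per_retry)
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        int ret = handle_msg(peer, pages);
        if (ret == 0)
            return attempt;
        assert(ret == -ENOMEM);
        peer->free_pages += pages_per_retry;  /* memory freed meanwhile */
    }
    return -ENOMEM;
}
```

With 0 free pages, a 4-page request and one page freed per retry, send_with_retry(&m, 4, 10, 1) succeeds on the fifth attempt; capped at three attempts it returns -ENOMEM. Failing fast keeps the worker from blocking inside reclaim, which is what lets it go on to process other messages.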
However, this cannot solve every problem. For example, if o2net_wq is
processing sc_connect_work, which calls sock_alloc_inode() to allocate
a socket_alloc with the GFP_KERNEL flag, and memory is insufficient, it
enters reclaim and triggers the same deadlock. We cannot change this
allocation flag.
We have no idea how to handle this. Are there any better ideas?
Thanks very much.
xuejiufei
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 


* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-22  8:30 ` Xue jiufei
@ 2014-08-22 17:08   ` Sunil Mushran
  2014-08-25  2:05     ` Xue jiufei
  2014-08-25  1:50   ` Junxiao Bi
  1 sibling, 1 reply; 16+ messages in thread
From: Sunil Mushran @ 2014-08-22 17:08 UTC (permalink / raw)
  To: ocfs2-devel

Allocs made via GFP_NOFS, by definition, should not trigger any reclaim from the fs.
So this situation should never arise. That's why all allocs in the dlm have NOFS.
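A toy C model of this point, assuming made-up flag values (M_GFP_IO, M_GFP_FS, M_GFP_NOFS and M_GFP_KERNEL are hypothetical stand-ins, not the kernel's __GFP_* definitions): direct reclaim only calls back into filesystem shrinkers when the allocation mask includes the FS bit, the same idea as the __GFP_FS check in super_cache_scan() in fs/super.c, so a GFP_NOFS allocation cannot recurse into ocfs2 inode eviction.

```c
/*
 * Toy model: direct reclaim only calls filesystem shrinkers when the
 * allocation mask carries the FS bit. Flag values are hypothetical
 * stand-ins, not the kernel's __GFP_* definitions.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define M_GFP_IO     0x1u                /* stand-in for __GFP_IO */
#define M_GFP_FS     0x2u                /* stand-in for __GFP_FS */
#define M_GFP_NOFS   (M_GFP_IO)          /* may do I/O, must not enter the fs */
#define M_GFP_KERNEL (M_GFP_IO | M_GFP_FS)

struct reclaim_stats {
    int fs_shrinker_calls;  /* times reclaim recursed into the fs */
};

/* Same idea as the "!(sc->gfp_mask & __GFP_FS)" bail-out in super_cache_scan(). */
static bool fs_shrinker_may_run(unsigned int gfp_mask)
{
    return (gfp_mask & M_GFP_FS) != 0;
}

/* Direct reclaim on behalf of an allocation made with gfp_mask. */
static void direct_reclaim(unsigned int gfp_mask, struct reclaim_stats *st)
{
    assert(st != NULL);
    if (fs_shrinker_may_run(gfp_mask))
        st->fs_shrinker_calls++;  /* would evict inodes and drop dlm locks */
}
```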

* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-22  8:30 ` Xue jiufei
  2014-08-22 17:08   ` Sunil Mushran
@ 2014-08-25  1:50   ` Junxiao Bi
  2014-08-28  8:16     ` Xue jiufei
  1 sibling, 1 reply; 16+ messages in thread
From: Junxiao Bi @ 2014-08-25  1:50 UTC (permalink / raw)
  To: ocfs2-devel

Hi Jiufei,

Maybe you can consider using the PF_FSTRANS flag: set it before
allocating memory with GFP_KERNEL, clear it after the allocation, and
check it in ocfs2 when trying to free pages during direct memory
reclaim. See upstream commit 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56
("nfs: skip commit in releasepage if we're freeing memory for
fs-related reasons") for an example.
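A minimal C sketch of the per-task flag idea, assuming hypothetical names (struct task, M_PF_FSTRANS, fstrans_begin(), fstrans_end(), fs_may_reclaim(), alloc_in_fstrans_scope()); in the kernel the bit would live in current->flags. Saving and restoring the old flags, rather than blindly clearing the bit, is an assumption added here so that nested scopes stay correct. It models only the flag bookkeeping, not what the filesystem must do instead of reclaiming.

```c
/*
 * Per-task "in fs transaction" flag, modeled in user space. All names
 * are hypothetical; in the kernel the bit would sit in current->flags.
 */
#include <assert.h>
#include <stdbool.h>

#define M_PF_FSTRANS 0x1u  /* stand-in for PF_FSTRANS */

struct task {
    unsigned int flags;
};

/* Mark the task; return the previous flags so nested scopes can restore. */
static unsigned int fstrans_begin(struct task *t)
{
    unsigned int old = t->flags;
    t->flags |= M_PF_FSTRANS;
    return old;
}

/* Restore only our bit: clear it unless an outer scope had already set it. */
static void fstrans_end(struct task *t, unsigned int old)
{
    assert(t->flags & M_PF_FSTRANS);
    t->flags = (t->flags & ~M_PF_FSTRANS) | (old & M_PF_FSTRANS);
}

/* Checked on the fs side of reclaim: skip work that could hit the DLM. */
static bool fs_may_reclaim(const struct task *t)
{
    return !(t->flags & M_PF_FSTRANS);
}

/*
 * Example caller: an allocation whose GFP_KERNEL mask cannot be changed
 * (e.g. inside common socket code). Returns what reclaim would observe.
 */
static bool alloc_in_fstrans_scope(struct task *t)
{
    unsigned int old = fstrans_begin(t);
    bool fs_reclaim_allowed = fs_may_reclaim(t);  /* the allocation goes here */
    fstrans_end(t, old);
    return fs_reclaim_allowed;
}
```

Inside the scope fs_may_reclaim() reports false, so the fs side would skip work that could recurse into the DLM; after the scope ends the task's flags are back to what they were, even when an outer scope had already set the bit.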

Thanks,
Junxiao.



* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-22 17:08   ` Sunil Mushran
@ 2014-08-25  2:05     ` Xue jiufei
  2014-08-25  5:00       ` Sunil Mushran
  0 siblings, 1 reply; 16+ messages in thread
From: Xue jiufei @ 2014-08-25  2:05 UTC (permalink / raw)
  To: ocfs2-devel

Hi Sunil,
On 2014/8/23 1:08, Sunil Mushran wrote:
> Allocs made via GFP_NOFS, by definition, should not trigger any reclaim from the fs.
> So this situation should never arise. That's why all allocs in the dlm have NOFS.
>
Thanks for your reply. I hadn't noticed that before. So I think
dlm_query_region_handler() should also use GFP_NOFS instead of GFP_KERNEL,
right?

Thanks.
Xuejiufei

* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-25  2:05     ` Xue jiufei
@ 2014-08-25  5:00       ` Sunil Mushran
  2014-08-25  5:41         ` Joseph Qi
  0 siblings, 1 reply; 16+ messages in thread
From: Sunil Mushran @ 2014-08-25  5:00 UTC (permalink / raw)
  To: ocfs2-devel

Functions in dlmdomain.c are only triggered during mount. So they cannot
trigger the deadlock as described above in this thread. I would leave them
as is.

* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-25  5:00       ` Sunil Mushran
@ 2014-08-25  5:41         ` Joseph Qi
  2014-08-25  5:45           ` Sunil Mushran
  0 siblings, 1 reply; 16+ messages in thread
From: Joseph Qi @ 2014-08-25  5:41 UTC (permalink / raw)
  To: ocfs2-devel

On 2014/8/25 13:00, Sunil Mushran wrote:
> Functions in dlmdomain.c are only triggered during mount. So they cannot trigger the deadlock as described above in this thread. I would leave them as is.
> 
It is possible if a node mounts multiple volumes.


* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-25  5:41         ` Joseph Qi
@ 2014-08-25  5:45           ` Sunil Mushran
  2014-08-25  6:05             ` Joseph Qi
  0 siblings, 1 reply; 16+ messages in thread
From: Sunil Mushran @ 2014-08-25  5:45 UTC (permalink / raw)
  To: ocfs2-devel

Could you please expand on that?

* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-25  5:45           ` Sunil Mushran
@ 2014-08-25  6:05             ` Joseph Qi
  2014-08-25 17:13               ` Sunil Mushran
  0 siblings, 1 reply; 16+ messages in thread
From: Joseph Qi @ 2014-08-25  6:05 UTC (permalink / raw)
  To: ocfs2-devel

On 2014/8/25 13:45, Sunil Mushran wrote:
> Please could you expand on that.
> 
In our scenario, one node can mount multiple volumes across the
cluster.
For instance, N1 has mounted the ocfs2 volumes volume1, volume2 and
volume3, and volume3 may be unmounted and mounted again while the
other volumes are in use.


* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-25  6:05             ` Joseph Qi
@ 2014-08-25 17:13               ` Sunil Mushran
  2014-08-27  1:57                 ` Xue jiufei
  0 siblings, 1 reply; 16+ messages in thread
From: Sunil Mushran @ 2014-08-25 17:13 UTC (permalink / raw)
  To: ocfs2-devel

On Sun, Aug 24, 2014 at 11:05 PM, Joseph Qi <joseph.qi@huawei.com> wrote:

> On 2014/8/25 13:45, Sunil Mushran wrote:
> > Please could you expand on that.
> >
> In our scenario, one node can mount multiple volumes across the
> cluster.
> For instance, N1 has mounted ocfs2 volumes say volume1, volume2,
> volume3. And volume3 may do umount/mount during runtime of other
> volumes.


I meant: expand on the deadlock. Say we are mounting a new volume and that
triggers an inode cleanup. The inode being cleaned up will have to be from
one of the mounted volumes. How can this lead to a deadlock?

Two variations:
a) Node death leading to recovery during the mount.
b) Mount atop a mount.

But I still cannot see a deadlock in either scenario.

* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-25 17:13               ` Sunil Mushran
@ 2014-08-27  1:57                 ` Xue jiufei
  2014-08-28  1:16                   ` Sunil Mushran
  0 siblings, 1 reply; 16+ messages in thread
From: Xue jiufei @ 2014-08-27  1:57 UTC (permalink / raw)
  To: ocfs2-devel

Hi, Sunil
On 2014/8/26 1:13, Sunil Mushran wrote:
> I meant expand on the deadlock. Say we are mounting a new volume and that triggers a inode cleanup. That inode being cleaned up will have to be from one of the mounted volumes. How can this lead to a deadlock?
> 
> Two variations:
> a) Node death leading to recovery during the mount.
> b) Mount atop a mount.
> 
> But I cannot still see a deadlock in either scenario.
The deadlock is the same as the one I described in my first mail:
o2net_wq
-> dlm_query_region_handler
-> kmalloc() (insufficient memory)
-> triggers ocfs2 inode cleanup
-> ocfs2_drop_lock
-> calls o2net_send_message to send the unlock message
-> wait_event(nsw.ns_wq, o2net_nsw_completed(nn, &nsw))
   to wait for the reply from the master
-> the tcp layer receives the reply and calls o2net_data_ready
-> sc_rx_work is queued, but o2net_wq cannot handle this work
So this triggers the deadlock: o2net_wq is waiting for itself to
handle the unlock reply and complete the nsw.
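The chain above is a cycle in the "waits on" relation. A small C sketch, using hypothetical enum labels for the three parties, follows those edges from o2net_wq and reports whether they lead back to it.

```c
/*
 * Wait-for model of the call chain. Each party waits on at most one
 * other; the labels are hypothetical names for this sketch only.
 */
#include <assert.h>
#include <stdbool.h>

enum party { O2NET_WQ, NSW_COMPLETION, SC_RX_WORK, N_PARTIES };
#define NOT_WAITING (-1)

/* waits_on[p] is the party p is blocked on, or NOT_WAITING. */
static bool has_wait_cycle(const int waits_on[N_PARTIES], int start)
{
    int cur = start;
    /* A cycle through start closes within N_PARTIES hops. */
    for (int hops = 0; hops < N_PARTIES; hops++) {
        assert(cur >= 0 && cur < N_PARTIES);
        cur = waits_on[cur];
        if (cur == NOT_WAITING)
            return false;
        if (cur == start)
            return true;
    }
    return false;
}
```

With the edges from the mail (o2net_wq waits for the nsw completion, which needs sc_rx_work, which needs o2net_wq), has_wait_cycle() returns true; if the reply could be processed without o2net_wq, the last edge disappears and it returns false.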

Thanks.
Xuejiufei


* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-27  1:57                 ` Xue jiufei
@ 2014-08-28  1:16                   ` Sunil Mushran
  0 siblings, 0 replies; 16+ messages in thread
From: Sunil Mushran @ 2014-08-28  1:16 UTC (permalink / raw)
  To: ocfs2-devel

Hi,

What is o2net_wq waiting on that is preventing it from processing the reply
for unlock?
Sorry for being a bit slow. Do you have the raw stacks of the deadlock?

I mean, making that alloc NOFS should not be an issue. But I would like to
understand whether we are fixing the actual problem or not.

Sunil



* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-25  1:50   ` Junxiao Bi
@ 2014-08-28  8:16     ` Xue jiufei
  2014-08-29  3:26       ` Junxiao Bi
  0 siblings, 1 reply; 16+ messages in thread
From: Xue jiufei @ 2014-08-28  8:16 UTC (permalink / raw)
  To: ocfs2-devel

Hi Junxiao,
On 2014/8/25 9:50, Junxiao Bi wrote:
> Hi Jiufei,
> 
> Maybe you can consider using PF_FSTRANS flag, set this flag before
> allocating memory with GFP_KERNEL flag and unset after the allocation.
> Checking this flag in ocfs2 when trying to free some pages during memory
> direct reclaim. See an example from upstream commit
> 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 (nfs: skip commit in
> releasepage if we're freeing memory for fs-related reasons) .
> 
> Thanks,
> Junxiao.
> 
Thank you very much for your suggestion. But in our situation,
o2net_wq is evicting an inode during direct memory reclaim. Eviction
cannot return an error or simply do nothing, because the VFS will call
destroy_inode() after evict, while we have not yet dropped the inode lock.

Thanks
Xuejiufei


* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-28  8:16     ` Xue jiufei
@ 2014-08-29  3:26       ` Junxiao Bi
  2014-08-29  7:22         ` Xue jiufei
  0 siblings, 1 reply; 16+ messages in thread
From: Junxiao Bi @ 2014-08-29  3:26 UTC (permalink / raw)
  To: ocfs2-devel

On 08/28/2014 04:16 PM, Xue jiufei wrote:
> Hi Junxiao,
> Thank you very much for your suggestion. But in our situation,
> o2net_wq is evicting inode during memory direct reclaim, which cannot
> return error or do nothing because vfs would destroy_inode after evict,
> but we haven't drop inode lock yet.
How about checking the flag in the VFS, like this? Then you can set the
PF_FSTRANS flag in the o2net_wq context wherever GFP_NOFS can't be used.


commit 8d27fdec5ce234d2f02e4582d340d231396b92af
Author: Junxiao Bi <junxiao.bi@oracle.com>
Date:   Fri Aug 29 11:05:25 2014 +0800

    super: stop shrinker for processes with PF_FSTRANS flag

    For some cluster fs, like ocfs2, it may be impossible to
    set GFP_NOFS for some memory allocation, as the allocation
    is in network common code, like sock_alloc() and in this
    case, the shrinker will call back into the fs and cause
    deadlock when available memory is not enough.

    Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>

diff --git a/fs/super.c b/fs/super.c
index b9a214d..c4a8dc1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -71,6 +71,9 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
        if (!(sc->gfp_mask & __GFP_FS))
                return SHRINK_STOP;

+       if (current->flags & PF_FSTRANS)
+               return SHRINK_STOP;
+
        if (!grab_super_passive(sb))
                return SHRINK_STOP;


Thanks,
Junxiao.


* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-29  3:26       ` Junxiao Bi
@ 2014-08-29  7:22         ` Xue jiufei
  2014-08-29  7:30           ` Junxiao Bi
  0 siblings, 1 reply; 16+ messages in thread
From: Xue jiufei @ 2014-08-29  7:22 UTC (permalink / raw)
  To: ocfs2-devel

On 2014/8/29 11:26, Junxiao Bi wrote:
> How about checking the flag in vfs like this? And you can set PF_FSTRANS
> flag in o2net_wq context where GFP_NOFS flag can't be set.
> 
> 
> commit 8d27fdec5ce234d2f02e4582d340d231396b92af
> Author: Junxiao Bi <junxiao.bi@oracle.com>
> Date:   Fri Aug 29 11:05:25 2014 +0800
> 
>     super: stop shrinker for processes with PF_FSTRANS flag
> 
>     For some cluster fs, like ocfs2, it may be impossible to
>     set GFP_NOFS for some memory allocation, as the allocation
>     is in network common code, like sock_alloc() and in this
>     case, the shrinker will call back into the fs and cause
>     deadlock when available memory is not enough.
> 
>     Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> 
> diff --git a/fs/super.c b/fs/super.c
> index b9a214d..c4a8dc1 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -71,6 +71,9 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
>         if (!(sc->gfp_mask & __GFP_FS))
>                 return SHRINK_STOP;
> 
> +       if (current->flags & PF_FSTRANS)
> +               return SHRINK_STOP;
> +
>         if (!grab_super_passive(sb))
>                 return SHRINK_STOP;
> 
> 
> Thanks,
> Junxiao.
> 
Yes, this patch resolves our problem. Thanks a lot.
Have you sent this patch to the fs-devel list?
>>
>> Thanks
>> Xuejiufei
>>
>>> On 08/22/2014 04:30 PM, Xue jiufei wrote:
>>>> On 2014/8/20 11:57, Xue jiufei wrote:
>>>>> Hi all,
>>>>> We found that a deadlock may occur when the system does not have
>>>>> sufficient memory. Here's the situation:
>>>>>             N1                                      N2
>>>>>                                              send message to N1
>>>>>       o2net_wq(kworker)
>>>>> receives the message and calls the corresponding
>>>>> handler to handle it. The handler may
>>>>> need to allocate some memory (using GFP_NOFS or GFP_KERNEL),
>>>>> but there is not sufficient memory: free memory is below the
>>>>> min watermark. So it wakes up kswapd to reclaim memory,
>>>>> and may also call
>>>>> __alloc_pages_direct_reclaim() itself, trying to
>>>>> free some pages.
>>>>>
>>>>> It tries to free the ocfs2 inode
>>>>> cache and calls ocfs2_drop_lock()->dlmunlock()
>>>>> to drop the inode lock, sending an unlock message to the
>>>>> master, say N2. When the reply comes, sc_rx_work is queued
>>>>> and waits for o2net_wq to handle it. However,
>>>>> o2net_wq is still handling the previous message, so it cannot
>>>>> process the reply message. It will wait for
>>>>> o2net_nsw_completed() in o2net_send_message_vec()
>>>>> forever.
>>>>> The kswapd thread encounters the same situation.
>>>>>
>>>>>
>>>>> So is there any advice on how to solve this deadlock?
>>>>> And what is the probability that kmalloc returns ENOMEM when using the GFP_ATOMIC flag?
>>>>>
>>>>> Thanks.
>>>>>
>>>> To avoid this deadlock, we want to allocate memory with the GFP_ATOMIC
>>>> flag in all handlers and return ENOMEM to the peer when allocation
>>>> fails. The peer will then resend the message, and o2net_wq can handle
>>>> other messages in the meantime.
>>>> However, this cannot solve all problems. For example, if o2net_wq is
>>>> processing sc_connect_work, which calls sock_alloc_inode() to allocate
>>>> socket_alloc with the GFP_KERNEL flag, and memory is insufficient, it
>>>> enters the reclaim process and also triggers the deadlock. We cannot
>>>> change this allocation flag.
>>>> We have no idea how to solve it. Are there any better ideas?
>>>> Thanks very much.
>>>> xuejiufei
>>>>
>>>
>>> .
>>>
>>
>>
> 
> .
> 


* [Ocfs2-devel] A deadlock when system do not has sufficient memory
  2014-08-29  7:22         ` Xue jiufei
@ 2014-08-29  7:30           ` Junxiao Bi
  0 siblings, 0 replies; 16+ messages in thread
From: Junxiao Bi @ 2014-08-29  7:30 UTC (permalink / raw)
  To: ocfs2-devel

On 08/29/2014 03:22 PM, Xue jiufei wrote:
> On 2014/8/29 11:26, Junxiao Bi wrote:
>> On 08/28/2014 04:16 PM, Xue jiufei wrote:
>>> Hi Junxiao,
>>> On 2014/8/25 9:50, Junxiao Bi wrote:
>>>> Hi Jiufei,
>>>>
>>>> Maybe you can consider using the PF_FSTRANS flag: set it before
>>>> allocating memory with the GFP_KERNEL flag and clear it after the
>>>> allocation, then check this flag in ocfs2 when trying to free pages
>>>> during direct memory reclaim. See an example in upstream commit
>>>> 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 (nfs: skip commit in
>>>> releasepage if we're freeing memory for fs-related reasons).
>>>>
>>>> Thanks,
>>>> Junxiao.
>>>>
>>> Thank you very much for your suggestion. But in our situation,
>>> o2net_wq is evicting an inode during direct memory reclaim. Eviction
>>> cannot return an error or simply do nothing, because the VFS calls
>>> destroy_inode() after evict, and we haven't dropped the inode lock yet.
>> How about checking the flag in the VFS like this? Then you can set the
>> PF_FSTRANS flag in the o2net_wq context, where the GFP_NOFS flag can't be used.
>>
>>
>> commit 8d27fdec5ce234d2f02e4582d340d231396b92af
>> Author: Junxiao Bi <junxiao.bi@oracle.com>
>> Date:   Fri Aug 29 11:05:25 2014 +0800
>>
>>     super: stop shrinker for processes with PF_FSTRANS flag
>>
>>     For some cluster filesystems, like ocfs2, it may be impossible to
>>     set GFP_NOFS for some memory allocations, because the allocation
>>     happens in common network code, like sock_alloc(). In this case,
>>     the shrinker will call back into the fs and cause a deadlock when
>>     available memory is insufficient.
>>
>>     Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>>
>> diff --git a/fs/super.c b/fs/super.c
>> index b9a214d..c4a8dc1 100644
>> --- a/fs/super.c
>> +++ b/fs/super.c
>> @@ -71,6 +71,9 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
>>         if (!(sc->gfp_mask & __GFP_FS))
>>                 return SHRINK_STOP;
>>
>> +       if (current->flags & PF_FSTRANS)
>> +               return SHRINK_STOP;
>> +
>>         if (!grab_super_passive(sb))
>>                 return SHRINK_STOP;
>>
>>
>> Thanks,
>> Junxiao.
>>
> Yes, this patch resolves our problem. Thanks a lot.
> Have you sent this patch to the fs-devel list?
No. Could you send it together with your ocfs2 fix? I think it is more
convincing alongside the ocfs2 deadlock case. I will follow the
discussion in case there are any concerns about it.

Thanks,
Junxiao.


end of thread, other threads:[~2014-08-29  7:30 UTC | newest]

Thread overview: 16+ messages
2014-08-20  3:57 [Ocfs2-devel] A deadlock when system do not has sufficient memory Xue jiufei
2014-08-22  8:30 ` Xue jiufei
2014-08-22 17:08   ` Sunil Mushran
2014-08-25  2:05     ` Xue jiufei
2014-08-25  5:00       ` Sunil Mushran
2014-08-25  5:41         ` Joseph Qi
2014-08-25  5:45           ` Sunil Mushran
2014-08-25  6:05             ` Joseph Qi
2014-08-25 17:13               ` Sunil Mushran
2014-08-27  1:57                 ` Xue jiufei
2014-08-28  1:16                   ` Sunil Mushran
2014-08-25  1:50   ` Junxiao Bi
2014-08-28  8:16     ` Xue jiufei
2014-08-29  3:26       ` Junxiao Bi
2014-08-29  7:22         ` Xue jiufei
2014-08-29  7:30           ` Junxiao Bi
