* [Ocfs2-devel] [PATCH] ocfs2: retry once dlm_dispatch_assert_master failed with ENOMEM
From: Joseph Qi @ 2014-04-03 12:45 UTC
  To: ocfs2-devel

If dlm_dispatch_assert_master() fails in dlm_master_requery_handler(),
the only possible cause is ENOMEM. So retry instead of calling BUG().

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
---
 fs/ocfs2/dlm/dlmrecovery.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 7035af0..f772d64 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -1685,6 +1685,7 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
 
 	hash = dlm_lockid_hash(req->name, req->namelen);
 
+retry:
 	spin_lock(&dlm->spinlock);
 	res = __dlm_lookup_lockres(dlm, req->name, req->namelen, hash);
 	if (res) {
@@ -1693,10 +1694,14 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
 		if (master == dlm->node_num) {
 			int ret = dlm_dispatch_assert_master(dlm, res,
 							     0, 0, flags);
+			/* ENOMEM returns, just retry */
 			if (ret < 0) {
-				mlog_errno(-ENOMEM);
-				/* retry!? */
-				BUG();
+				spin_unlock(&res->spinlock);
+				dlm_lockres_put(res);
+				spin_unlock(&dlm->spinlock);
+				mlog_errno(ret);
+				msleep(50);
+				goto retry;
 			}
 		} else /* put.. incase we are not the master */
 			dlm_lockres_put(res);
-- 
1.8.4.3


* [Ocfs2-devel] [PATCH] ocfs2: retry once dlm_dispatch_assert_master failed with ENOMEM
From: Wengang @ 2014-04-04  1:26 UTC
  To: ocfs2-devel

O2net uses a single-threaded work queue to process network requests, so
blocking in a handler blocks all network processing.
As you can see, the memory allocation uses GFP_NOFS; if the first attempt
fails, the following retries may fail as well. The handler could therefore
block for a while, which is not good.

How about limiting the retries to, say, 3 or 5? If the allocation still
fails, return an error to the peer and let the peer decide whether to
retry or give up.
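
The bounded-retry idea above can be sketched in plain userspace C. This is
only an illustration, not the kernel patch: fake_dispatch(), dispatch_bounded(),
MAX_ATTEMPTS and RETRY_DELAY_MS are hypothetical names, the limit of 3 is one of
the values suggested here, and the 50 ms delay is taken from the msleep(50) in
the patch. The delay is accumulated in a counter rather than actually slept, so
the worst-case stall of the single-threaded queue is visible.

```c
#include <assert.h>
#include <errno.h>

#define MAX_ATTEMPTS   3   /* assumed bound; the thread suggests 3 or 5 */
#define RETRY_DELAY_MS 50  /* mirrors the msleep(50) in the patch */

/*
 * Hypothetical stand-in for dlm_dispatch_assert_master(): fails with
 * -ENOMEM while *fail_count is positive, then succeeds.
 */
static int fake_dispatch(int *fail_count)
{
	if (*fail_count > 0) {
		(*fail_count)--;
		return -ENOMEM;
	}
	return 0;
}

/*
 * Try the dispatch at most MAX_ATTEMPTS times. *stalled_ms records how
 * long the handler would have slept, i.e. how long a single-threaded
 * queue would have been blocked by this one request.
 */
static int dispatch_bounded(int *fail_count, int *stalled_ms)
{
	int ret = 0;

	assert(fail_count && stalled_ms);
	*stalled_ms = 0;

	for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
		ret = fake_dispatch(fail_count);
		if (ret == 0)
			return 0;
		/* No point in sleeping after the final failed attempt. */
		if (attempt < MAX_ATTEMPTS)
			*stalled_ms += RETRY_DELAY_MS;
	}

	/* Still failing: hand the error back so the peer can decide. */
	return ret;
}
```

With these assumed values the handler stalls the queue for at most
(MAX_ATTEMPTS - 1) * RETRY_DELAY_MS = 100 ms before it returns the error,
instead of looping without bound.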

thanks,
wengang



* [Ocfs2-devel] [PATCH] ocfs2: retry once dlm_dispatch_assert_master failed with ENOMEM
From: Joseph Qi @ 2014-04-04  2:06 UTC
  To: ocfs2-devel

Thanks for your advice.
I thought about returning DLM_LOCK_RES_OWNER_UNKNOWN, but that would
confuse the message sender.
I'll take your suggestion of limiting the retries to 3 and resend this patch.

