From: Wengang Wang <wen.gang.wang@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [SUGGESSTION 1/1] OCFS2: runtime tunable network idle timeout
Date: Tue, 09 Jun 2009 11:34:54 +0800
Message-ID: <4A2DD85E.7060505@oracle.com>
In-Reply-To: <4A2D51EF.3030606@oracle.com>

Sunil,

Sunil Mushran wrote:
> wengang wang wrote:
>> background:
>>     there is a network idle timeout after which a node is considered
>> dead or a network partition is assumed to have occurred.
>> problem:
>>     in some production environments, there is a special time of day
>> during which a backup runs over the private network. while the backup
>> is running, the load on the network is very high. this can lead to an
>> ocfs2 network idle timeout, and when a node can't reconnect in time,
>> some nodes have to be fenced out of the cluster domain, which is not
>> really what we want.
> 
> Bug#? SR? Have we ruled out a bug in our code? The last time I saw one
> of these we determined it was because of a bug.

one of the bugs is:
https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=8443612

oh, sorry, I didn't realize it could be caused by a bug. I will get
tcpdumps and analyze it further.

> 
>>     there is a configuration O2CB_IDLE_TIMEOUT_MS by which we can set
>> the timeout value, but it looks like it takes effect only when the
>> o2cb service is restarted, so it's not possible to change it on an
>> already running system.
>>
>> suggestion:
>>     it would be better if we could modify the timeout value at
>> runtime. we can add a proc file under /proc/fs/ocfs2_nodemanager, for
>> example idle_timeout, so that a userspace application (such as
>> debugfs.ocfs2) can read/write the timeout value. before the customer
>> runs the backup, they set it to a large value (or to no limit), and
>> set it back when the backup has finished.
>>     the content of /proc/fs/ocfs2_nodemanager/idle_timeout is the
>> timeout value in milliseconds. 0 means no limit.
>>
>> if it's good, I'm glad to do it.
> 
> One cannot just set this value on one node. It would have to be set
> atomically on all nodes.
> 

Yes, I know that.
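
To make the suggested workflow concrete (raise the timeout before the
backup, restore it afterwards), here is a rough userspace sketch. It
assumes the proposed file /proc/fs/ocfs2_nodemanager/idle_timeout, which
does not exist today, and the helper name temporary_idle_timeout is made
up for illustration. It only changes the value on the local node, so it
does not address the requirement that all nodes change together.

```python
import contextlib
from pathlib import Path

# Hypothetical path: this is only the interface proposed in this thread,
# not a file that exists in any kernel.
PROPOSED_IDLE_TIMEOUT = Path("/proc/fs/ocfs2_nodemanager/idle_timeout")


@contextlib.contextmanager
def temporary_idle_timeout(timeout_ms, path=PROPOSED_IDLE_TIMEOUT):
    """Set the idle timeout in ms (0 = no limit); restore the old value on exit."""
    if timeout_ms < 0:
        raise ValueError("timeout must be >= 0 ms")
    path = Path(path)
    old_value = path.read_text().strip()   # remember the current setting
    path.write_text(f"{timeout_ms}\n")     # apply the temporary setting
    try:
        yield int(old_value)
    finally:
        # Runs even if the backup raises, so the old value always comes back.
        path.write_text(f"{old_value}\n")
```

An administrator's backup script would wrap the backup in
"with temporary_idle_timeout(0): ..." on each node.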

> While that can still be done, my issue is as to why one cannot set that
> timeout up front. Asking clients to "set" the timeout dynamically before
> certain fs operations is not at all friendly. Especially when the user
> has no idea as to what workload a certain operation entails.

if the timeout is set to too large a value, I think it will make the
response slower when a real timeout happens (a true node death or network
partition) under normal network load. for a production environment,
that's not good.

and yes, it's difficult for clients to recognize a high network load
unless they have a very cool administrator -- that's a problem.

Ok, then let's put this aside for now and bring it up again once we
clearly understand the problem.

thanks
wengang.

-- 
--just begin to learn, you are never too late...


Thread overview: 4+ messages
2009-06-08  5:36 [Ocfs2-devel] [SUGGESSTION 1/1] OCFS2: runtime tunable network idle timeout wengang wang
2009-06-08 18:01 ` Sunil Mushran
2009-06-09  3:34   ` Wengang Wang [this message]
2009-06-09 21:12     ` Sunil Mushran
