From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 2/3] ocfs2/cluster: Increase the live threshold
Date: Thu, 21 Apr 2011 15:13:46 -0700
Message-ID: <4DB0AC1A.5020303@oracle.com>
In-Reply-To: <20110421214928.GL13325@wotan.suse.de>
We have seen a few isolated cases of o2hb not detecting all live
nodes on startup. One plausible reason is that the other node had a
heartbeat I/O delay at the same time. The live threshold is currently
2, which is as low as it can be. We set it that low because we start
the heartbeat on mount, and a higher threshold would delay the mount.
With global heartbeat, we can afford to increase the threshold. The
patch increases it only for global heartbeat, and even then only for
the first heartbeat region.
Makes sense?
On 04/21/2011 02:49 PM, Mark Fasheh wrote:
> The patch itself looks fine. Can I ask you to add more reasoning for the
> change in the comment? Pretend that we're looking at that code in 2 years
> scratching our heads going "why did this get changed"
> --Mark
>
> On Tue, Apr 05, 2011 at 03:21:08PM -0700, Sunil Mushran wrote:
>> Double the live threshold for the first region in the global heartbeat
>> mode. The default, 2, is bare minimum. Increasing it will affect all
>> mounts in the local heartbeat mode. Instead we increase it only for
>> the global heartbeat mode and that too only the first region. This is
>> only to increase the margin of safety.
>>
>> Addresses internal Oracle bug#10635585.
>>
>> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
>> ---
>> fs/ocfs2/cluster/heartbeat.c | 13 ++++++++++++-
>> 1 files changed, 12 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>> index 2461eb3..ec50121 100644
>> --- a/fs/ocfs2/cluster/heartbeat.c
>> +++ b/fs/ocfs2/cluster/heartbeat.c
>> @@ -1690,6 +1690,7 @@ static ssize_t o2hb_region_dev_write(struct o2hb_region *reg,
>> struct file *filp = NULL;
>> struct inode *inode = NULL;
>> ssize_t ret = -EINVAL;
>> + int live_threshold;
>>
>> if (reg->hr_bdev)
>> goto out;
>> @@ -1766,8 +1767,18 @@ static ssize_t o2hb_region_dev_write(struct o2hb_region *reg,
>> * A node is considered live after it has beat LIVE_THRESHOLD
>> * times. We're not steady until we've given them a chance
>> * _after_ our first read.
>> + * The default threshold is bare minimum so as to limit the delay
>> + * during mounts. For global heartbeat, the threshold is doubled for the
>> + * first region.
>> */
>> - atomic_set(&reg->hr_steady_iterations, O2HB_LIVE_THRESHOLD + 1);
>> + live_threshold = O2HB_LIVE_THRESHOLD;
>> + if (o2hb_global_heartbeat_active()) {
>> + spin_lock(&o2hb_live_lock);
>> + if (o2hb_pop_count(&o2hb_region_bitmap, O2NM_MAX_REGIONS) == 1)
>> + live_threshold <<= 1;
>> + spin_unlock(&o2hb_live_lock);
>> + }
>> + atomic_set(&reg->hr_steady_iterations, live_threshold + 1);
>>
>> hb_task = kthread_run(o2hb_thread, reg, "o2hb-%s",
>> reg->hr_item.ci_name);
>> --
>> 1.7.1
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
> --
> Mark Fasheh