From: Wido den Hollander <wido@widodh.nl>
To: Gregory Farnum <gregory.farnum@dreamhost.com>
Cc: ceph-devel@vger.kernel.org, Sage Weil <sage@newdream.net>
Subject: Re: Repeated messages of "heartbeat_check: no heartbeat from"
Date: Tue, 28 Feb 2012 16:42:03 +0100
Message-ID: <4F4CF5CB.9060006@widodh.nl>
In-Reply-To: <CAF3hT9BPkWi6XXrKeyAv3pXXdPk+eRt=faQCv3ri7eDZey63qw@mail.gmail.com>

Hi,

On 02/24/2012 06:18 AM, Gregory Farnum wrote:
> On Thu, Feb 23, 2012 at 2:45 AM, Wido den Hollander <wido@widodh.nl> wrote:
>> Hi,
>>
>>
>>
>> On 02/22/2012 07:08 PM, Gregory Farnum wrote:
>>>
>>> Wido,
>>> Sorry we lost track of this last week — we were all distracted by FAST 12!
>>> :)
>>>
>> No problem!
>>
>>
>>> So it looks like they're both on the same map and osd.4 is sending
>>> pings to osd.19, but osd.19 is just ignoring them? Or do you really
>>> have debug_os on and not debug_osd? :)
>>
>>
>> That was a typo, I have debug_osd set to 20.
>>
>> I haven't rebooted the OSDs since, and now osd.4 and osd.19 are not
>> complaining anymore, but it's now a different set of OSDs that are saying
>> the other one is down.
>>
>> I'm still running v0.41, btw. I'm not going to touch the cluster until this
>> one is tracked down, since it keeps coming back.
>>
>> Suggestions?
>
> Well, like Sage said long ago, this will be easiest to diagnose if
> there are logs available for both OSDs that cover the entire time
> after one requested heartbeats from the other.
>
> If you do have these and can post them somewhere, I'm sure Sage or I
> will find it interesting enough to look through...  ;)
> If not, I'm out of ideas, although I'm not super-familiar with the
> heartbeat code since Sage rewrote it so we may be able to come up with
> something if we discuss it more.
> -Greg

I created an issue for this with logs attached:
http://tracker.newdream.net/issues/2116

Thanks,

Wido
