From: Gregory Farnum <gregory.farnum@dreamhost.com>
To: Jim Schutt <jaschut@sandia.gov>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: cosd multi-second stalls cause "wrongly marked me down"
Date: Wed, 23 Feb 2011 12:27:17 -0800
Message-ID: <8CE75CCEC22A4A2F8A76180F4DB7F4ED@gmail.com>
In-Reply-To: <1298489031.25491.250.camel@sale659.sandia.gov>
On Wednesday, February 23, 2011 at 11:23 AM, Jim Schutt wrote:
> > I have managed to get OSDs to wrongly mark each other down during startup when they're peering large numbers of PGs/pools, because they disagree about which peers they need to heartbeat (due to the slow handling of new OSD maps and PG creates). If you're mostly seeing OSDs get incorrectly marked down at low epochs (your original email said epoch 7), this is probably what you're hitting.
>
> What I've been trying to look for is heartbeat stalls after I
> start up a bunch of clients writing. I'm really not sure why that
> original log caught one at such an early epoch; maybe there are
> two things going on?
>
That wouldn't surprise me too much, but it's something to keep in mind when observing. :)
> > We still have no idea what could be causing the stall *inside* of tick(), though. :/
>
> I think that one was just lucky. Most of the stalls I've
> collected are between ticks.
Stalls between ticks make a lot of sense, since tick() requires the osd_lock and some of our functions hold it for far too long. But as far as we can tell, a stalled tick() shouldn't break anything. Heartbeats are sent independently of it, and the processing of received heartbeats (where down OSDs are detected) is done inside tick() in a way that can't lose heartbeat deliveries. So a late tick() by itself shouldn't be a problem!
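A minimal sketch of why a late tick() alone shouldn't cause false "down" reports, assuming (hypothetically) that each received ping is timestamped on arrival under a lock separate from osd_lock, and that tick() only compares those stored timestamps against a grace period. The names `HeartbeatTracker`, `handle_ping`, `check_failures`, `hb_lock` and the 20-second grace are illustrative only, not Ceph's actual code.

```python
import threading


class HeartbeatTracker:
    """Hypothetical sketch: ping arrivals are recorded under their own lock,
    separate from the big OSD lock that tick() takes."""

    def __init__(self, grace):
        self.grace = grace               # seconds of silence before a peer is reported
        self.hb_lock = threading.Lock()  # guards last_rx only
        self.last_rx = {}                # peer osd id -> arrival time (seconds)

    def handle_ping(self, peer, now):
        # Runs on the message-dispatch path. It does not take osd_lock,
        # so a long osd_lock holder cannot delay recording the arrival.
        with self.hb_lock:
            self.last_rx[peer] = now

    def check_failures(self, now):
        # Runs from tick() while osd_lock is held. It compares against the
        # recorded arrival times, so a late tick only delays the check; it
        # does not make a peer that kept pinging look silent.
        with self.hb_lock:
            return sorted(p for p, t in self.last_rx.items() if now - t > self.grace)


osd_lock = threading.Lock()  # stands in for the OSD's big lock
hb = HeartbeatTracker(grace=20.0)

# Peers 1 and 2 ping every 6 s up to t=30; peer 3 goes silent after t=6.
for t in range(0, 31, 6):
    hb.handle_ping(1, t)
    hb.handle_ping(2, t)
    if t <= 6:
        hb.handle_ping(3, t)

# tick() was stalled behind a long osd_lock holder and only runs at t=31.
with osd_lock:
    failed = hb.check_failures(now=31)
print(failed)  # [3]
```

Only the genuinely silent peer is reported, even though the check ran late. In this sketch, only a stall on the path that records arrivals (handle_ping) could make a live peer look silent.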