From: "Jim Schutt" <jaschut@sandia.gov>
To: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
Subject: Re: [PATCH 2/2] libceph: fix handle_timeout() racing with con_work()/try_write()
Date: Thu, 19 May 2011 11:31:31 -0600
Message-ID: <4DD553F3.4020201@sandia.gov>
In-Reply-To: <Pine.LNX.4.64.1105181635580.1988@cobra.newdream.net>

Hi Sage,

Sage Weil wrote:
> On Wed, 18 May 2011, Jim Schutt wrote:
>> Sage Weil wrote:
>>
>>> I pushed a patch to the msgr_race branch that catches all four cases (I
>>> think).  Does the fix make sense given what you saw?
>> Sorry, I haven't completed much testing; it took me a while
>> to figure out the fix needs this:
>>
>> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
>> index 9c0a9bd..b140dd3 100644
>> --- a/net/ceph/messenger.c
>> +++ b/net/ceph/messenger.c
>> @@ -2013,6 +2013,7 @@ done:
>>  	mutex_unlock(&con->mutex);
>>  done_unlocked:
>>  	con->ops->put(con);
>> +	return;
>>
>>  fault:
>>  	mutex_unlock(&con->mutex);
>>
>>
>> Still testing....
> 
> Good catch.  Let us know how it goes!
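
For anyone following along, the one-line fix quoted above matters
because a C label does not end the code that precedes it: without
the return, the success path runs straight on into whatever follows
"fault:".  Below is a minimal userspace sketch of that pattern.  The
names (fake_con, lock_depth, work_buggy, work_fixed) are made up for
illustration and are not the messenger.c code; a plain counter stands
in for the mutex.

```c
#include <stdio.h>

/* Hypothetical stand-in for a connection; NOT the kernel's struct. */
struct fake_con {
	int lock_depth;	/* +1 on "lock", -1 on "unlock"; 0 means balanced */
	int puts;	/* how many times the reference was dropped */
	int faults;	/* how many times the fault path ran */
};

/* Success path falls through into the fault path: no return. */
static void work_buggy(struct fake_con *con, int fail)
{
	con->lock_depth++;		/* stands in for mutex_lock() */
	if (fail)
		goto fault;
	goto done;

done:
	con->lock_depth--;		/* stands in for mutex_unlock() */
	con->puts++;			/* stands in for con->ops->put() */
	/* missing return: execution continues into fault: below */
fault:
	con->lock_depth--;		/* second unlock on the success path */
	con->faults++;
}

/* Same code with the return added before the fault label. */
static void work_fixed(struct fake_con *con, int fail)
{
	con->lock_depth++;
	if (fail)
		goto fault;
	goto done;

done:
	con->lock_depth--;
	con->puts++;
	return;				/* success path stops here */

fault:
	con->lock_depth--;
	con->faults++;
}

/* Print the counters so the two variants can be compared by eye. */
static void show(const char *name, const struct fake_con *con)
{
	printf("%s: lock_depth=%d puts=%d faults=%d\n",
	       name, con->lock_depth, con->puts, con->faults);
}
```

Calling work_buggy() with fail == 0 leaves lock_depth unbalanced and
runs the fault path even though nothing failed; work_fixed() keeps
the two paths separate.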

I've been testing commit a30413af363 from your msgr_race
branch, and I think it is ready.  In my testing I've seen
none of the symptoms I associate with the race we were
trying to fix.

However, with that issue fixed I'm now running into a
different type of stall under a heavy write load.

If the load is heavy enough to trigger that issue where
heartbeat processing is delayed, and an osd is wrongly
marked down, then I see a flurry of messages with bad
tags.

So far I can't generate a high enough load with much client
debugging enabled, but what I have done is turn on client
debugging after I see the "wrongly marked me down" from
ceph -w, and the bad message tags in the osd logs.

When I do that I see some clients with a few messages
queued up to send.  Sending of the first message
starts as expected, but before it completes it stalls,
as though the socket buffer fills up and isn't being
emptied.  Then the message times out, the socket is
closed/reopened, and the partially-sent message is
resent from the beginning.  And then sending stalls again.

I'll let you know when I've got some logs that suggest
what the problem might be...

FWIW, on the server side I'm running the stable branch
(commit b6cccc741a3).

-- Jim

> 
> sage
> 
> 



