CEPH filesystem development
From: Alex Elder <elder@inktank.com>
To: Sage Weil <sage@inktank.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: [PATCH 3/3] libceph: WARN, don't BUG on unexpected connection states
Date: Thu, 27 Dec 2012 20:01:47 -0600
Message-ID: <50DCFD8B.7000603@inktank.com>
In-Reply-To: <alpine.DEB.2.00.1212271655190.22449@cobra.newdream.net>

On 12/27/2012 06:57 PM, Sage Weil wrote:
> I agree that we should do BUG -> WARN on con->state everywhere.
> 
> But I don't think we should drop any of them, yet.  For Ugis's particular 
> crash, it was fail_protocol()'s fault... see my other patch.  The rest of 
> the time, a socket close should be caught at the top of con_work().

I looked at it again, and I think you're right.  I'd rather keep
the constraints in anyway.  So I will remove that last hunk from
this patch.
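Sage's point that a socket close should be caught at the top of con_work() can be illustrated with a small userspace sketch. This is not the messenger code: `struct sketch_con`, the `sock_closed` flag, `sketch_fault()`, `sketch_con_work()` and the choice to return to PREOPEN after a fault are all hypothetical stand-ins for the real connection, its flag bits and ceph_fault().

```c
#include <stdbool.h>

/* Hypothetical model of a connection; not the kernel's struct. */
enum sketch_state {
	SKETCH_CLOSED,
	SKETCH_PREOPEN,
	SKETCH_CONNECTING,
	SKETCH_NEGOTIATING,
	SKETCH_OPEN,
};

struct sketch_con {
	enum sketch_state state;
	bool sock_closed; /* set when the peer closes the socket */
	int faults;       /* number of times the fault path has run */
};

/* Fault path: may be entered from any state, so it asserts nothing
 * about con->state.  Going back to PREOPEN (ready to reconnect) is an
 * assumption of this sketch. */
static void sketch_fault(struct sketch_con *con)
{
	con->faults++;
	con->state = SKETCH_PREOPEN;
}

/* Work function: a socket close is handled first, before any
 * state-specific step that might otherwise assert on con->state. */
static void sketch_con_work(struct sketch_con *con)
{
	if (con->sock_closed) {
		con->sock_closed = false;
		sketch_fault(con);
		return;
	}

	switch (con->state) {
	case SKETCH_CONNECTING:
		con->state = SKETCH_NEGOTIATING; /* banner exchanged */
		break;
	case SKETCH_NEGOTIATING:
		con->state = SKETCH_OPEN;        /* connect reply accepted */
		break;
	default:
		break;                           /* nothing to do here */
	}
}
```

Because the close is dispatched before the switch, the per-state branches only ever run on a live socket, which is what allows them to keep their state constraints.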

> For any other cases where we see con->state changing when we don't expect 
> it to, let's look at them on a case-by-case basis and address them in 
> separate patches?

Good plan.  With WARN_ON() rather than BUG_ON(), if we find something
we got wrong, it won't be a serious problem.
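The practical difference can be modeled in userspace. `MOCK_WARN_ON()` and `MOCK_BUG_ON()` below are hypothetical stand-ins, not the kernel macros: the first logs the failed check and lets the caller carry on (and, like the kernel's WARN_ON(), evaluates to the condition so it can be used in an `if`), while the second aborts the process, roughly as the patch describes BUG_ON() killing the system. `sketch_fail_protocol()` is only loosely modeled on fail_protocol() after this patch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel macros. */
static int mock_warn_count; /* warnings issued so far */

/* Report a failed check and return it, so callers can branch on it. */
static int mock_warn(int cond, const char *expr, const char *file, int line)
{
	if (cond) {
		mock_warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

/* Logs and continues; evaluates to the (normalized) condition. */
#define MOCK_WARN_ON(cond) mock_warn(!!(cond), #cond, __FILE__, __LINE__)

/* Logs and stops the whole process. */
#define MOCK_BUG_ON(cond)                                         \
	do {                                                      \
		if (cond) {                                       \
			fprintf(stderr, "BUG: %s at %s:%d\n",     \
				#cond, __FILE__, __LINE__);       \
			abort();                                  \
		}                                                 \
	} while (0)

enum proto_state { PROTO_CLOSED, PROTO_NEGOTIATING, PROTO_OPEN };

struct proto_con {
	enum proto_state state;
};

/* An unexpected state is reported, but the connection is still
 * closed.  With MOCK_BUG_ON() here instead, the same call would abort. */
static void sketch_fail_protocol(struct proto_con *con)
{
	MOCK_WARN_ON(con->state != PROTO_NEGOTIATING);
	con->state = PROTO_CLOSED;
}
```

Calling `sketch_fail_protocol()` on a connection in `PROTO_OPEN` prints one warning and still leaves it closed; the program keeps running.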

					-Alex

> sage
> 
> 
> On Thu, 27 Dec 2012, Alex Elder wrote:
> 
>> A number of assertions in the ceph messenger are implemented with
>> BUG_ON(), killing the system if the connection's state doesn't match
>> what's expected.  At this point our state model is (evidently) not
>> understood well enough to justify having these assertions trigger a
>> BUG().
>> Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...)
>> so we learn about these issues without killing the machine.
>>
>> We now recognize that a connection fault can occur due to a socket
>> closure at any time, regardless of the state of the connection.  So
>> there is really nothing we can assert about the state of the
>> connection at that point, so eliminate that assertion.
>>
>> Reported-by: Ugis <ugis22@gmail.com>
>> Tested-by: Ugis <ugis22@gmail.com>
>> Signed-off-by: Alex Elder <elder@inktank.com>
>> ---
>>  net/ceph/messenger.c |   13 ++++---------
>>  1 file changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
>> index 4d111fd..075b9fd 100644
>> --- a/net/ceph/messenger.c
>> +++ b/net/ceph/messenger.c
>> @@ -561,7 +561,7 @@ void ceph_con_open(struct ceph_connection *con,
>>  	mutex_lock(&con->mutex);
>>  	dout("con_open %p %s\n", con, ceph_pr_addr(&addr->in_addr));
>>
>> -	BUG_ON(con->state != CON_STATE_CLOSED);
>> +	WARN_ON(con->state != CON_STATE_CLOSED);
>>  	con->state = CON_STATE_PREOPEN;
>>
>>  	con->peer_name.type = (__u8) entity_type;
>> @@ -1509,7 +1509,7 @@ static int process_banner(struct ceph_connection *con)
>>  static void fail_protocol(struct ceph_connection *con)
>>  {
>>  	reset_connection(con);
>> -	BUG_ON(con->state != CON_STATE_NEGOTIATING);
>> +	WARN_ON(con->state != CON_STATE_NEGOTIATING);
>>  	con->state = CON_STATE_CLOSED;
>>  }
>>
>> @@ -1635,7 +1635,7 @@ static int process_connect(struct ceph_connection *con)
>>  			return -1;
>>  		}
>>
>> -		BUG_ON(con->state != CON_STATE_NEGOTIATING);
>> +		WARN_ON(con->state != CON_STATE_NEGOTIATING);
>>  		con->state = CON_STATE_OPEN;
>>
>>  		con->peer_global_seq = le32_to_cpu(con->in_reply.global_seq);
>> @@ -2132,7 +2132,6 @@ more:
>>  		if (ret < 0)
>>  			goto out;
>>
>> -		BUG_ON(con->state != CON_STATE_CONNECTING);
>>  		con->state = CON_STATE_NEGOTIATING;
>>
>>  		/*
>> @@ -2160,7 +2159,7 @@ more:
>>  		goto more;
>>  	}
>>
>> -	BUG_ON(con->state != CON_STATE_OPEN);
>> +	WARN_ON(con->state != CON_STATE_OPEN);
>>
>>  	if (con->in_base_pos < 0) {
>>  		/*
>> @@ -2382,10 +2381,6 @@ static void ceph_fault(struct ceph_connection *con)
>>  	dout("fault %p state %lu to peer %s\n",
>>  	     con, con->state, ceph_pr_addr(&con->peer_addr.in_addr));
>>
>> -	BUG_ON(con->state != CON_STATE_CONNECTING &&
>> -	       con->state != CON_STATE_NEGOTIATING &&
>> -	       con->state != CON_STATE_OPEN);
>> -
>>  	con_close_socket(con);
>>
>>  	if (test_bit(CON_FLAG_LOSSYTX, &con->flags)) {
>> -- 
>> 1.7.9.5
>>
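The assertions touched by the patch encode a handful of expected state transitions. A minimal table-driven sketch of just those transitions follows; `enum model_state`, `struct transition` and `transition_expected()` are hypothetical names, and the table covers only the transitions guarded in the quoted hunks. The OPEN check before reading (old line 2160) and the removed ceph_fault() check test a state rather than a transition, so they are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the CON_STATE_* values named in the patch. */
enum model_state {
	MODEL_CLOSED,
	MODEL_PREOPEN,
	MODEL_CONNECTING,
	MODEL_NEGOTIATING,
	MODEL_OPEN,
};

struct transition {
	enum model_state from;
	enum model_state to;
};

/* Only the transitions guarded by assertions in the quoted hunks. */
static const struct transition expected[] = {
	{ MODEL_CLOSED,      MODEL_PREOPEN },     /* ceph_con_open() */
	{ MODEL_CONNECTING,  MODEL_NEGOTIATING }, /* hunk at old line 2132 */
	{ MODEL_NEGOTIATING, MODEL_OPEN },        /* process_connect() */
	{ MODEL_NEGOTIATING, MODEL_CLOSED },      /* fail_protocol() */
};

/* True if moving from 'from' to 'to' matches one of the table entries. */
static bool transition_expected(enum model_state from, enum model_state to)
{
	for (size_t i = 0; i < sizeof(expected) / sizeof(expected[0]); i++) {
		if (expected[i].from == from && expected[i].to == to)
			return true;
	}
	return false;
}
```

In the spirit of this patch, such a check would be paired with a warning rather than a crash, e.g. `WARN_ON(!transition_expected(old, new))`, assuming the table were rewritten over the real CON_STATE_* constants.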



Thread overview: 8+ messages
2012-12-27 23:09 [PATCH 0/3] libceph: three bug fixes Alex Elder
2012-12-27 23:17 ` [PATCH 1/3] libceph: move linger requests sooner in kick_requests() Alex Elder
2012-12-28  0:54   ` Sage Weil
2012-12-27 23:17 ` [PATCH 2/3] libceph: always reset osds when kicking Alex Elder
2012-12-28  0:55   ` Sage Weil
2012-12-27 23:17 ` [PATCH 3/3] libceph: WARN, don't BUG on unexpected connection states Alex Elder
2012-12-28  0:57   ` Sage Weil
2012-12-28  2:01     ` Alex Elder [this message]
