CEPH filesystem development
 help / color / mirror / Atom feed
* [PATCH 0/4] libceph: message skipping fixes
@ 2016-02-20 16:45 Ilya Dryomov
  2016-02-20 16:45 ` [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message Ilya Dryomov
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-20 16:45 UTC (permalink / raw)
  To: ceph-devel; +Cc: Varada Kari

Hello,

Patch 1 hopefully fixes the problem that Varada ran into [1].
Patches 2 and 3 are surrounding fixes, but still important enough to
backport.

[1] http://www.spinics.net/lists/ceph-devel/msg28712.html

Thanks,

                Ilya


Ilya Dryomov (4):
  libceph: don't bail early from try_read() when skipping a message
  libceph: use the right footer size when skipping a message
  libceph: don't spam dmesg with stray reply warnings
  libceph: use sizeof_footer() in ceph_msg_revoke()

 net/ceph/messenger.c  | 26 +++++++++++++-------------
 net/ceph/osd_client.c |  4 ++--
 2 files changed, 15 insertions(+), 15 deletions(-)

-- 
2.4.3


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message
  2016-02-20 16:45 [PATCH 0/4] libceph: message skipping fixes Ilya Dryomov
@ 2016-02-20 16:45 ` Ilya Dryomov
  2016-02-20 18:28   ` Alex Elder
  2016-02-20 16:45 ` [PATCH 2/4] libceph: use the right footer size " Ilya Dryomov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-20 16:45 UTC (permalink / raw)
  To: ceph-devel; +Cc: Varada Kari

The contract between try_read() and try_write() is that when called
each processes as much data as possible.  When instructed by osd_client
to skip a message, try_read() is violating this contract by returning
after receiving and discarding a single message instead of checking for
more.  try_write() then gets a chance to write out more requests,
generating more replies/skips for try_read() to handle, forcing the
messenger into a starvation loop.

Cc: stable@vger.kernel.org # 3.10+
Reported-by: Varada Kari <Varada.Kari@sandisk.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Varada Kari <Varada.Kari@sandisk.com>
---
 net/ceph/messenger.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 9cfedf565f5b..fec20819a5ea 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -2337,7 +2337,7 @@ static int read_partial_message(struct ceph_connection *con)
 		con->in_base_pos = -front_len - middle_len - data_len -
 			sizeof(m->footer);
 		con->in_tag = CEPH_MSGR_TAG_READY;
-		return 0;
+		return 1;
 	} else if ((s64)seq - (s64)con->in_seq > 1) {
 		pr_err("read_partial_message bad seq %lld expected %lld\n",
 		       seq, con->in_seq + 1);
@@ -2363,7 +2363,7 @@ static int read_partial_message(struct ceph_connection *con)
 				sizeof(m->footer);
 			con->in_tag = CEPH_MSGR_TAG_READY;
 			con->in_seq++;
-			return 0;
+			return 1;
 		}
 
 		BUG_ON(!con->in_msg);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/4] libceph: use the right footer size when skipping a message
  2016-02-20 16:45 [PATCH 0/4] libceph: message skipping fixes Ilya Dryomov
  2016-02-20 16:45 ` [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message Ilya Dryomov
@ 2016-02-20 16:45 ` Ilya Dryomov
  2016-02-20 18:44   ` Alex Elder
  2016-02-20 16:45 ` [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings Ilya Dryomov
  2016-02-20 16:45 ` [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() Ilya Dryomov
  3 siblings, 1 reply; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-20 16:45 UTC (permalink / raw)
  To: ceph-devel; +Cc: Varada Kari

ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13.
Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated.

Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 net/ceph/messenger.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index fec20819a5ea..dd7c1b7f932b 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -2284,6 +2284,13 @@ static int read_partial_msg_data(struct ceph_connection *con)
 	return 1;	/* must return > 0 to indicate success */
 }
 
+static size_t sizeof_footer(struct ceph_connection *con)
+{
+	return (con->peer_features & CEPH_FEATURE_MSG_AUTH) ?
+	    sizeof(struct ceph_msg_footer) :
+	    sizeof(struct ceph_msg_footer_old);
+}
+
 /*
  * read (part of) a message.
  */
@@ -2335,7 +2342,7 @@ static int read_partial_message(struct ceph_connection *con)
 			ceph_pr_addr(&con->peer_addr.in_addr),
 			seq, con->in_seq + 1);
 		con->in_base_pos = -front_len - middle_len - data_len -
-			sizeof(m->footer);
+			sizeof_footer(con);
 		con->in_tag = CEPH_MSGR_TAG_READY;
 		return 1;
 	} else if ((s64)seq - (s64)con->in_seq > 1) {
@@ -2360,7 +2367,7 @@ static int read_partial_message(struct ceph_connection *con)
 			/* skip this message */
 			dout("alloc_msg said skip message\n");
 			con->in_base_pos = -front_len - middle_len - data_len -
-				sizeof(m->footer);
+				sizeof_footer(con);
 			con->in_tag = CEPH_MSGR_TAG_READY;
 			con->in_seq++;
 			return 1;
@@ -2402,11 +2409,7 @@ static int read_partial_message(struct ceph_connection *con)
 	}
 
 	/* footer */
-	if (need_sign)
-		size = sizeof(m->footer);
-	else
-		size = sizeof(m->old_footer);
-
+	size = sizeof_footer(con);
 	end += size;
 	ret = read_partial(con, end, size, &m->footer);
 	if (ret <= 0)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings
  2016-02-20 16:45 [PATCH 0/4] libceph: message skipping fixes Ilya Dryomov
  2016-02-20 16:45 ` [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message Ilya Dryomov
  2016-02-20 16:45 ` [PATCH 2/4] libceph: use the right footer size " Ilya Dryomov
@ 2016-02-20 16:45 ` Ilya Dryomov
  2016-02-20 18:46   ` Alex Elder
  2016-02-20 16:45 ` [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() Ilya Dryomov
  3 siblings, 1 reply; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-20 16:45 UTC (permalink / raw)
  To: ceph-devel; +Cc: Varada Kari

Commit d15f9d694b77 ("libceph: check data_len in ->alloc_msg()")
mistakenly bumped the log level on the "tid %llu unknown, skipping"
message.  Turn it back into a dout() - stray replies are perfectly
normal when OSDs flap, crash, get killed for testing purposes, etc.

Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 net/ceph/osd_client.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 7c1a5d1734c3..32355d9d0103 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2890,8 +2890,8 @@ static struct ceph_msg *get_reply(struct ceph_connection *con,
 	mutex_lock(&osdc->request_mutex);
 	req = __lookup_request(osdc, tid);
 	if (!req) {
-		pr_warn("%s osd%d tid %llu unknown, skipping\n",
-			__func__, osd->o_osd, tid);
+		dout("%s osd%d tid %llu unknown, skipping\n", __func__,
+		     osd->o_osd, tid);
 		m = NULL;
 		*skip = 1;
 		goto out;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke()
  2016-02-20 16:45 [PATCH 0/4] libceph: message skipping fixes Ilya Dryomov
                   ` (2 preceding siblings ...)
  2016-02-20 16:45 ` [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings Ilya Dryomov
@ 2016-02-20 16:45 ` Ilya Dryomov
  2016-02-20 18:47   ` Alex Elder
  2016-02-20 19:56   ` Alex Elder
  3 siblings, 2 replies; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-20 16:45 UTC (permalink / raw)
  To: ceph-devel; +Cc: Varada Kari

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 net/ceph/messenger.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index dd7c1b7f932b..43edf897c9eb 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -3085,10 +3085,7 @@ void ceph_msg_revoke(struct ceph_msg *msg)
 			con->out_skip += con_out_kvec_skip(con);
 		} else {
 			BUG_ON(!msg->data_length);
-			if (con->peer_features & CEPH_FEATURE_MSG_AUTH)
-				con->out_skip += sizeof(msg->footer);
-			else
-				con->out_skip += sizeof(msg->old_footer);
+			con->out_skip += sizeof_footer(con);
 		}
 		/* data, middle, front */
 		if (msg->data_length)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message
  2016-02-20 16:45 ` [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message Ilya Dryomov
@ 2016-02-20 18:28   ` Alex Elder
  2016-02-20 20:21     ` Ilya Dryomov
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Elder @ 2016-02-20 18:28 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel; +Cc: Varada Kari

On 02/20/2016 10:45 AM, Ilya Dryomov wrote:
> The contract between try_read() and try_write() is that when called
> each processes as much data as possible.  When instructed by osd_client
> to skip a message, try_read() is violating this contract by returning
> after receiving and discarding a single message instead of checking for
> more.  try_write() then gets a chance to write out more requests,
> generating more replies/skips for try_read() to handle, forcing the
> messenger into a starvation loop.

This makes sense.  If (for example) two messages are pending
(either arriving or already arrived and buffered), and the
first is going to be discarded, we should still process the
second one to the extent possible.


I'm going to review the way this stuff works. (I always have
to do that anyway it seems, so I document it in the process...)


There are two places when read_partial_message() will decide
to discard a message.  One is if a completely valid message
header arrives, but the sequence number is wrong (this means
re-sent message has arrived).  The second place the messenger
discards an incoming message is if the connection's message
buffer allocation method indicates the message should be
skipped.  The MDS msg_alloc method never calls for a buffer
to be skipped.  But the mon client and the OSD client do.

The mon client and OSD client issue requests to their
servers which are expected to receive response messages.
When a mon or OSD request is set up, a buffer to hold its
corresponding response message is preallocated.  So when
receiving a message on a mon or OSD connection, the message
allocation function used by the messenger looks up and
returns that preallocated buffer.  If a response message
arrives from a mon server or OSD was not expected (i.e.,
no outstanding request is waiting for that response) the
incoming response message should be skipped.

Additionally, an OSD client expects to receive only a few
types of messages.  It peeks at an incoming message to
determine how it should be handled.  If an incoming message
is not of a type recognized by the OSD client, the message
should be skipped.  Even if an incoming message is an OSD
response that was expected, it is possible that the buffer
allocated to receive the response is too small for the
incoming message; and in that case the incoming message
should be skipped.

So...  There are two places in try_read_message() where
incoming messages should be discarded, and they represent:
receipt of a duplicate (re-sent) message; receipt of an
unexpected message type; receipt of a response message
that was not expected; or receipt of a response that was
too large for the buffer that had been allocated to hold it.


A stream of data coming in a connection consists of blocks
of data, each of which begins with a "tag" that indicates
what is contained within the block that follows. A
connection's "in_tag" indicates what we're expecting to
see next on input, and if it's READY it means we've
finished reading a previous block (and so the next
byte should be a tag).

If a connection should discard some incoming data, this is
represented by having its "in_base_pos" value be negative.
The value of "in_base_pos" in this case is negative the
number of bytes to be discarded.  Whenever a negative
value is recorded in "in_base_pos", the connection's
"in_tag" field is set to READY, indicating that immediately
following the discarded bytes is expected to be a tag.

When a connection is in OPEN state, the first thing
try_read() does is discard the incoming message data if
it is supposed to be skipped.  Once this is done, it looks
at "in_tag" to determine what to do next, and if it's READY
it sets "in_tag" to the next byte on input, and prepares
for receiving the block of data that follows accordingly.


OK, now coming back to the issue at hand.  In try_read(),
if the current "in_tag" is MSG, a message is expected on
input, and read_partial_message() is called.  If that
returns 0 (or negative), try_read() completes and returns
to its caller.  Otherwise, if the connection's "in_tag"
is READY, try_read() jumps back to the top again, to
continue reading the next block of data on the input.

Currently, if read_partial_message() ever finds that the
incoming message should be skipped, it indicates that
by setting the negated number of bytes to be skipped in
the connection's "in_base_pos" field, and setting its
"in_tag" to READY.  HOWEVER, read_partial_message()
currently returns 0 in this case, which causes its
caller, try_read(), to quit reading.


This fix changes the two places in read_partial_message()
that arrange to skip the current incoming message so they
return 1 rather than 0.  The result is that when control
returns to try_read(), it will return to its top.  That
will cause the incoming message to be discarded, *and*
will cause the next block of incoming data to be processed.

Without this change, try_write() will get called again
before try_read() gets a chance to discard the current
message and read the next one.


In other words...

Looks good.

Reviewed-by: Alex Elder <elder@linaro.org>

> Cc: stable@vger.kernel.org # 3.10+
> Reported-by: Varada Kari <Varada.Kari@sandisk.com>
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> Tested-by: Varada Kari <Varada.Kari@sandisk.com>
> ---
>  net/ceph/messenger.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index 9cfedf565f5b..fec20819a5ea 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -2337,7 +2337,7 @@ static int read_partial_message(struct ceph_connection *con)
>  		con->in_base_pos = -front_len - middle_len - data_len -
>  			sizeof(m->footer);
>  		con->in_tag = CEPH_MSGR_TAG_READY;
> -		return 0;
> +		return 1;
>  	} else if ((s64)seq - (s64)con->in_seq > 1) {
>  		pr_err("read_partial_message bad seq %lld expected %lld\n",
>  		       seq, con->in_seq + 1);
> @@ -2363,7 +2363,7 @@ static int read_partial_message(struct ceph_connection *con)
>  				sizeof(m->footer);
>  			con->in_tag = CEPH_MSGR_TAG_READY;
>  			con->in_seq++;
> -			return 0;
> +			return 1;
>  		}
>  
>  		BUG_ON(!con->in_msg);
> k


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] libceph: use the right footer size when skipping a message
  2016-02-20 16:45 ` [PATCH 2/4] libceph: use the right footer size " Ilya Dryomov
@ 2016-02-20 18:44   ` Alex Elder
  2016-02-20 20:05     ` Ilya Dryomov
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Elder @ 2016-02-20 18:44 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel; +Cc: Varada Kari

+On 02/20/2016 10:45 AM, Ilya Dryomov wrote:
> ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13.
> Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated.

This looks good, but I have a few suggestions.

You could done some factoring in prepare_write_message_footer()
to use the new function for setting iov_len and updating
out_kvec_bytes.

Consider my suggestions, but otherwise I believe this
is correct.

Reviewed-by: Alex Elder <elder@linaro.org>

> Cc: stable@vger.kernel.org # 3.19+
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> ---
>  net/ceph/messenger.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index fec20819a5ea..dd7c1b7f932b 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -2284,6 +2284,13 @@ static int read_partial_msg_data(struct ceph_connection *con)
>  	return 1;	/* must return > 0 to indicate success */
>  }
>  
> +static size_t sizeof_footer(struct ceph_connection *con)

I understand why this is named this way.  But it's supplying a
connection as argument, but a footer is a message attribute.
So I might call this con_msg_footer_size().

> +{
> +	return (con->peer_features & CEPH_FEATURE_MSG_AUTH) ?
> +	    sizeof(struct ceph_msg_footer) :
> +	    sizeof(struct ceph_msg_footer_old);
> +}
> +
>  /*
>   * read (part of) a message.
>   */
> @@ -2335,7 +2342,7 @@ static int read_partial_message(struct ceph_connection *con)
>  			ceph_pr_addr(&con->peer_addr.in_addr),
>  			seq, con->in_seq + 1);
>  		con->in_base_pos = -front_len - middle_len - data_len -
> -			sizeof(m->footer);
> +			sizeof_footer(con);
>  		con->in_tag = CEPH_MSGR_TAG_READY;
>  		return 1;
>  	} else if ((s64)seq - (s64)con->in_seq > 1) {
> @@ -2360,7 +2367,7 @@ static int read_partial_message(struct ceph_connection *con)
>  			/* skip this message */
>  			dout("alloc_msg said skip message\n");
>  			con->in_base_pos = -front_len - middle_len - data_len -
> -				sizeof(m->footer);
> +				sizeof_footer(con);
>  			con->in_tag = CEPH_MSGR_TAG_READY;
>  			con->in_seq++;
>  			return 1;
> @@ -2402,11 +2409,7 @@ static int read_partial_message(struct ceph_connection *con)
>  	}
>  
>  	/* footer */
> -	if (need_sign)
> -		size = sizeof(m->footer);
> -	else
> -		size = sizeof(m->old_footer);
> -
> +	size = sizeof_footer(con);
>  	end += size;
>  	ret = read_partial(con, end, size, &m->footer);
>  	if (ret <= 0)
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings
  2016-02-20 16:45 ` [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings Ilya Dryomov
@ 2016-02-20 18:46   ` Alex Elder
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Elder @ 2016-02-20 18:46 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel; +Cc: Varada Kari

On 02/20/2016 10:45 AM, Ilya Dryomov wrote:
> Commit d15f9d694b77 ("libceph: check data_len in ->alloc_msg()")
> mistakenly bumped the log level on the "tid %llu unknown, skipping"
> message.  Turn it back into a dout() - stray replies are perfectly
> normal when OSDs flap, crash, get killed for testing purposes, etc.
> 
> Cc: stable@vger.kernel.org # 4.3+
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Looks good.

Reviewed-by: Alex Elder <elder@linaro.org>

> ---
>  net/ceph/osd_client.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 7c1a5d1734c3..32355d9d0103 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -2890,8 +2890,8 @@ static struct ceph_msg *get_reply(struct ceph_connection *con,
>  	mutex_lock(&osdc->request_mutex);
>  	req = __lookup_request(osdc, tid);
>  	if (!req) {
> -		pr_warn("%s osd%d tid %llu unknown, skipping\n",
> -			__func__, osd->o_osd, tid);
> +		dout("%s osd%d tid %llu unknown, skipping\n", __func__,
> +		     osd->o_osd, tid);
>  		m = NULL;
>  		*skip = 1;
>  		goto out;
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke()
  2016-02-20 16:45 ` [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() Ilya Dryomov
@ 2016-02-20 18:47   ` Alex Elder
  2016-02-20 19:51     ` Ilya Dryomov
  2016-02-20 19:56   ` Alex Elder
  1 sibling, 1 reply; 18+ messages in thread
From: Alex Elder @ 2016-02-20 18:47 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel; +Cc: Varada Kari

On 02/20/2016 10:45 AM, Ilya Dryomov wrote:
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Please just reorder your patches and squash this in with
your [2/4] patch.

					-Alex

> ---
>  net/ceph/messenger.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index dd7c1b7f932b..43edf897c9eb 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -3085,10 +3085,7 @@ void ceph_msg_revoke(struct ceph_msg *msg)
>  			con->out_skip += con_out_kvec_skip(con);
>  		} else {
>  			BUG_ON(!msg->data_length);
> -			if (con->peer_features & CEPH_FEATURE_MSG_AUTH)
> -				con->out_skip += sizeof(msg->footer);
> -			else
> -				con->out_skip += sizeof(msg->old_footer);
> +			con->out_skip += sizeof_footer(con);
>  		}
>  		/* data, middle, front */
>  		if (msg->data_length)
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke()
  2016-02-20 18:47   ` Alex Elder
@ 2016-02-20 19:51     ` Ilya Dryomov
  0 siblings, 0 replies; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-20 19:51 UTC (permalink / raw)
  To: Alex Elder; +Cc: Ceph Development, Varada Kari

On Sat, Feb 20, 2016 at 7:47 PM, Alex Elder <elder@ieee.org> wrote:
> On 02/20/2016 10:45 AM, Ilya Dryomov wrote:
>> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
>
> Please just reorder your patches and squash this in with
> your [2/4] patch.

Can't do - this code in ceph_msg_revoke() is new in 4.5, while 2/4
needs to be backported.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke()
  2016-02-20 16:45 ` [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() Ilya Dryomov
  2016-02-20 18:47   ` Alex Elder
@ 2016-02-20 19:56   ` Alex Elder
  1 sibling, 0 replies; 18+ messages in thread
From: Alex Elder @ 2016-02-20 19:56 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel; +Cc: Varada Kari

On 02/20/2016 10:45 AM, Ilya Dryomov wrote:
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Looks good.

Reviewed-by: Alex Elder <elder@linaro.org>

> ---
>  net/ceph/messenger.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index dd7c1b7f932b..43edf897c9eb 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -3085,10 +3085,7 @@ void ceph_msg_revoke(struct ceph_msg *msg)
>  			con->out_skip += con_out_kvec_skip(con);
>  		} else {
>  			BUG_ON(!msg->data_length);
> -			if (con->peer_features & CEPH_FEATURE_MSG_AUTH)
> -				con->out_skip += sizeof(msg->footer);
> -			else
> -				con->out_skip += sizeof(msg->old_footer);
> +			con->out_skip += sizeof_footer(con);
>  		}
>  		/* data, middle, front */
>  		if (msg->data_length)
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] libceph: use the right footer size when skipping a message
  2016-02-20 18:44   ` Alex Elder
@ 2016-02-20 20:05     ` Ilya Dryomov
  2016-02-20 20:15       ` Alex Elder
  2016-02-21 14:31       ` Ilya Dryomov
  0 siblings, 2 replies; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-20 20:05 UTC (permalink / raw)
  To: Alex Elder; +Cc: Ceph Development, Varada Kari

On Sat, Feb 20, 2016 at 7:44 PM, Alex Elder <elder@ieee.org> wrote:
> +On 02/20/2016 10:45 AM, Ilya Dryomov wrote:
>> ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13.
>> Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated.
>
> This looks good, but I have a few suggestions.
>
> You could done some factoring in prepare_write_message_footer()
> to use the new function for setting iov_len and updating
> out_kvec_bytes.

I considered it, but we'd still need the if on MSG_AUTH bit, so
I decided it wasn't worth it.  TBH I'm more inclined to rip this
old_footer stuff entirely - it's supported since v0.55, that's
pre-bobtail...

>
> Consider my suggestions, but otherwise I believe this
> is correct.
>
> Reviewed-by: Alex Elder <elder@linaro.org>
>
>> Cc: stable@vger.kernel.org # 3.19+
>> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
>> ---
>>  net/ceph/messenger.c | 17 ++++++++++-------
>>  1 file changed, 10 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
>> index fec20819a5ea..dd7c1b7f932b 100644
>> --- a/net/ceph/messenger.c
>> +++ b/net/ceph/messenger.c
>> @@ -2284,6 +2284,13 @@ static int read_partial_msg_data(struct ceph_connection *con)
>>       return 1;       /* must return > 0 to indicate success */
>>  }
>>
>> +static size_t sizeof_footer(struct ceph_connection *con)
>
> I understand why this is named this way.  But it's supplying a
> connection as argument, but a footer is a message attribute.
> So I might call this con_msg_footer_size().

The *size* of footer is a connection attribute though ;)

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] libceph: use the right footer size when skipping a message
  2016-02-20 20:05     ` Ilya Dryomov
@ 2016-02-20 20:15       ` Alex Elder
  2016-02-21 14:31       ` Ilya Dryomov
  1 sibling, 0 replies; 18+ messages in thread
From: Alex Elder @ 2016-02-20 20:15 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Ceph Development, Varada Kari

On 02/20/2016 02:05 PM, Ilya Dryomov wrote:
>>> >>
>>> >> +static size_t sizeof_footer(struct ceph_connection *con)
>> >
>> > I understand why this is named this way.  But it's supplying a
>> > connection as argument, but a footer is a message attribute.
>> > So I might call this con_msg_footer_size().
> The *size* of footer is a connection attribute though ;)

Yeah, I know.  That's why I suggested the con_msg_***().
But in any case, the code looks correct regardless of the
name.

					-Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message
  2016-02-20 18:28   ` Alex Elder
@ 2016-02-20 20:21     ` Ilya Dryomov
  2016-02-20 20:38       ` Alex Elder
  0 siblings, 1 reply; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-20 20:21 UTC (permalink / raw)
  To: Alex Elder; +Cc: Ceph Development, Varada Kari

On Sat, Feb 20, 2016 at 7:28 PM, Alex Elder <elder@ieee.org> wrote:
> On 02/20/2016 10:45 AM, Ilya Dryomov wrote:
>> The contract between try_read() and try_write() is that when called
>> each processes as much data as possible.  When instructed by osd_client
>> to skip a message, try_read() is violating this contract by returning
>> after receiving and discarding a single message instead of checking for
>> more.  try_write() then gets a chance to write out more requests,
>> generating more replies/skips for try_read() to handle, forcing the
>> messenger into a starvation loop.
>
> This makes sense.  If (for example) two messages are pending
> (either arriving or already arrived and buffered), and the
> first is going to be discarded, we should still process the
> second one to the extent possible.
>
>
> I'm going to review the way this stuff works. (I always have
> to do that anyway it seems, so I document it in the process...)
>
>
> There are two places when read_partial_message() will decide
> to discard a message.  One is if a completely valid message
> header arrives, but the sequence number is wrong (this means
> re-sent message has arrived).  The second place the messenger
> discards an incoming message is if the connection's message
> buffer allocation method indicates the message should be
> skipped.  The MDS msg_alloc method never calls for a buffer
> to be skipped.  But the mon client and the OSD client do.
>
> The mon client and OSD client issue requests to their
> servers which are expected to receive response messages.
> When a mon or OSD request is set up, a buffer to hold its
> corresponding response message is preallocated.  So when
> receiving a message on a mon or OSD connection, the message
> allocation function used by the messenger looks up and
> returns that preallocated buffer.  If a response message
> arrives from a mon server or OSD was not expected (i.e.,
> no outstanding request is waiting for that response) the
> incoming response message should be skipped.
>
> Additionally, an OSD client expects to receive only a few
> types of messages.  It peeks at an incoming message to
> determine how it should be handled.  If an incoming message
> is not of a type recognized by the OSD client, the message
> should be skipped.  Even if an incoming message is an OSD
> response that was expected, it is possible that the buffer
> allocated to receive the response is too small for the
> incoming message; and in that case the incoming message
> should be skipped.
>
>
> So...  There are two places in try_read_message() where
> incoming messages should be discarded, and they represent:
> receipt of a duplicate (re-sent) message; receipt of an
> unexpected message type; receipt of a response message
> that was not expected; or receipt of a response that was
> too large for the buffer that had been allocated to hold it.

Just a note: it's less about the preallocated buffer being too small
part (in theory we could allocate everything in alloc_msg(), although
that would be stupid) and more about the "no outstanding request is
waiting for that response" part.  Stray and/or duplicate replies from
the OSDs are expected in various cases.

>
>
> A stream of data coming in a connection consists of blocks
> of data, each of which begins with a "tag" that indicates
> what is contained within the block that follows. A
> connection's "in_tag" indicates what we're expecting to
> see next on input, and if it's READY it means we've
> finished reading a previous block (and so the next
> byte should be a tag).
>
> If a connection should discard some incoming data, this is
> represented by having its "in_base_pos" value be negative.
> The value of "in_base_pos" in this case is negative the
> number of bytes to be discarded.  Whenever a negative
> value is recorded in "in_base_pos", the connection's
> "in_tag" field is set to READY, indicating that immediately
> following the discarded bytes is expected to be a tag.
>
> When a connection is in OPEN state, the first thing
> try_read() does is discard the incoming message data if
> it is supposed to be skipped.  Once this is done, it looks
> at "in_tag" to determine what to do next, and if it's READY
> it sets "in_tag" to the next byte on input, and prepares
> for receiving the block of data that follows accordingly.
>
>
> OK, now coming back to the issue at hand.  In try_read(),
> if the current "in_tag" is MSG, a message is expected on
> input, and read_partial_message() is called.  If that
> returns 0 (or negative), try_read() completes and returns
> to its caller.  Otherwise, if the connection's "in_tag"
> is READY, try_read() jumps back to the top again, to
> continue reading the next block of data on the input.
>
> Currently, if read_partial_message() ever finds that the
> incoming message should be skipped, it indicates that
> by setting the negated number of bytes to be skipped in
> the connection's "in_base_pos" field, and setting its
> "in_tag" to READY.  HOWEVER, read_partial_message()
> currently returns 0 in this case, which causes its
> caller, try_read(), to quit reading.
>
>
> This fix changes the two places in read_partial_message()
> that arrange to skip the current incoming message so they
> return 1 rather than 0.  The result is that when control
> returns to try_read(), it will return to its top.  That
> will cause the incoming message to be discarded, *and*
> will cause the next block of incoming data to be processed.
>
> Without this change, try_write() will get called again
> before try_read() gets a chance to discard the current
> message and read the next one.
>
>
> In other words...
>
> Looks good.
>
> Reviewed-by: Alex Elder <elder@linaro.org>

Thanks for the review, Alex!  These walkthroughs of yours on the list
have been extremely helpful at times ;)

                Ilya

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message
  2016-02-20 20:21     ` Ilya Dryomov
@ 2016-02-20 20:38       ` Alex Elder
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Elder @ 2016-02-20 20:38 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Ceph Development, Varada Kari

On 02/20/2016 02:21 PM, Ilya Dryomov wrote:
>> > So...  There are two places in try_read_message() where
>> > incoming messages should be discarded, and they represent:
>> > receipt of a duplicate (re-sent) message; receipt of an
>> > unexpected message type; receipt of a response message
>> > that was not expected; or receipt of a response that was
>> > too large for the buffer that had been allocated to hold it.
> Just a note: it's less about the preallocated buffer being too small
> part (in theory we could allocate everything in alloc_msg(), although
> that would be stupid) and more about the "no outstanding request is
> waiting for that response" part.  Stray and/or duplicate replies from
> the OSDs are expected in various cases.

Agreed.  That one case (buffer too large) is anomalous.  But
the code *does* say "skip the message" in that case, which
is why I mentioned it.

Like I said before, when it's been a while since I've reviewed
code I need to go immerse myself in it for a little while.  And
while I'm at it, I figure I might as well explain it all for the
benefit of other people.  This is especially true for these tiny
patches--sometimes the smaller the patch, the bigger the context
needed to understand why it fixes a problem.

					-Alex


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] libceph: use the right footer size when skipping a message
  2016-02-20 20:05     ` Ilya Dryomov
  2016-02-20 20:15       ` Alex Elder
@ 2016-02-21 14:31       ` Ilya Dryomov
  2016-02-21 14:35         ` [PATCH 4/4] libceph: use sizeof_footer() more Ilya Dryomov
  1 sibling, 1 reply; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-21 14:31 UTC (permalink / raw)
  To: Alex Elder; +Cc: Ceph Development, Varada Kari

On Sat, Feb 20, 2016 at 9:05 PM, Ilya Dryomov <idryomov@gmail.com> wrote:
> On Sat, Feb 20, 2016 at 7:44 PM, Alex Elder <elder@ieee.org> wrote:
>> +On 02/20/2016 10:45 AM, Ilya Dryomov wrote:
>>> ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13.
>>> Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated.
>>
>> This looks good, but I have a few suggestions.
>>
>> You could done some factoring in prepare_write_message_footer()
>> to use the new function for setting iov_len and updating
>> out_kvec_bytes.
>
> I considered it, but we'd still need the if on MSG_AUTH bit, so
> I decided it wasn't worth it.  TBH I'm more inclined to rip this
> old_footer stuff entirely - it's supported since v0.55, that's
> pre-bobtail...

What I missed yesterday was that using sizeof_footer() makes it
possible to use con_out_kvec_add().  I updated 4/4 and also pulled the
need_sign -> sizeof_footer() hunk from 2/4 into it.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 4/4] libceph: use sizeof_footer() more
  2016-02-21 14:31       ` Ilya Dryomov
@ 2016-02-21 14:35         ` Ilya Dryomov
  2016-02-21 15:08           ` Alex Elder
  0 siblings, 1 reply; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-21 14:35 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel, Varada Kari

Don't open-code sizeof_footer() in read_partial_message() and
ceph_msg_revoke().  Also, after switching to sizeof_footer(), it's now
possible to use con_out_kvec_add() in prepare_write_message_footer().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 net/ceph/messenger.c | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 9382619a405b..d9681bc839c7 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1221,25 +1221,19 @@ static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
 static void prepare_write_message_footer(struct ceph_connection *con)
 {
 	struct ceph_msg *m = con->out_msg;
-	int v = con->out_kvec_left;
 
 	m->footer.flags |= CEPH_MSG_FOOTER_COMPLETE;
 
 	dout("prepare_write_message_footer %p\n", con);
-	con->out_kvec[v].iov_base = &m->footer;
+	con_out_kvec_add(con, sizeof_footer(con), &m->footer);
 	if (con->peer_features & CEPH_FEATURE_MSG_AUTH) {
 		if (con->ops->sign_message)
 			con->ops->sign_message(m);
 		else
 			m->footer.sig = 0;
-		con->out_kvec[v].iov_len = sizeof(m->footer);
-		con->out_kvec_bytes += sizeof(m->footer);
 	} else {
 		m->old_footer.flags = m->footer.flags;
-		con->out_kvec[v].iov_len = sizeof(m->old_footer);
-		con->out_kvec_bytes += sizeof(m->old_footer);
 	}
-	con->out_kvec_left++;
 	con->out_more = m->more_to_follow;
 	con->out_msg_done = true;
 }
@@ -2409,11 +2403,7 @@ static int read_partial_message(struct ceph_connection *con)
 	}
 
 	/* footer */
-	if (need_sign)
-		size = sizeof(m->footer);
-	else
-		size = sizeof(m->old_footer);
-
+	size = sizeof_footer(con);
 	end += size;
 	ret = read_partial(con, end, size, &m->footer);
 	if (ret <= 0)
@@ -3089,10 +3079,7 @@ void ceph_msg_revoke(struct ceph_msg *msg)
 			con->out_skip += con_out_kvec_skip(con);
 		} else {
 			BUG_ON(!msg->data_length);
-			if (con->peer_features & CEPH_FEATURE_MSG_AUTH)
-				con->out_skip += sizeof(msg->footer);
-			else
-				con->out_skip += sizeof(msg->old_footer);
+			con->out_skip += sizeof_footer(con);
 		}
 		/* data, middle, front */
 		if (msg->data_length)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] libceph: use sizeof_footer() more
  2016-02-21 14:35         ` [PATCH 4/4] libceph: use sizeof_footer() more Ilya Dryomov
@ 2016-02-21 15:08           ` Alex Elder
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Elder @ 2016-02-21 15:08 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel, Varada Kari

On 02/21/2016 08:35 AM, Ilya Dryomov wrote:
> Don't open-code sizeof_footer() in read_partial_message() and
> ceph_msg_revoke().  Also, after switching to sizeof_footer(), it's now
> possible to use con_out_kvec_add() in prepare_write_message_footer().
> 
> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

There you go, that's a nice result!

Reviewed-by: Alex Elder <elder@linaro.org>

> ---
>  net/ceph/messenger.c | 19 +++----------------
>  1 file changed, 3 insertions(+), 16 deletions(-)
> 
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index 9382619a405b..d9681bc839c7 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -1221,25 +1221,19 @@ static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
>  static void prepare_write_message_footer(struct ceph_connection *con)
>  {
>  	struct ceph_msg *m = con->out_msg;
> -	int v = con->out_kvec_left;
>  
>  	m->footer.flags |= CEPH_MSG_FOOTER_COMPLETE;
>  
>  	dout("prepare_write_message_footer %p\n", con);
> -	con->out_kvec[v].iov_base = &m->footer;
> +	con_out_kvec_add(con, sizeof_footer(con), &m->footer);
>  	if (con->peer_features & CEPH_FEATURE_MSG_AUTH) {
>  		if (con->ops->sign_message)
>  			con->ops->sign_message(m);
>  		else
>  			m->footer.sig = 0;
> -		con->out_kvec[v].iov_len = sizeof(m->footer);
> -		con->out_kvec_bytes += sizeof(m->footer);
>  	} else {
>  		m->old_footer.flags = m->footer.flags;
> -		con->out_kvec[v].iov_len = sizeof(m->old_footer);
> -		con->out_kvec_bytes += sizeof(m->old_footer);
>  	}
> -	con->out_kvec_left++;
>  	con->out_more = m->more_to_follow;
>  	con->out_msg_done = true;
>  }
> @@ -2409,11 +2403,7 @@ static int read_partial_message(struct ceph_connection *con)
>  	}
>  
>  	/* footer */
> -	if (need_sign)
> -		size = sizeof(m->footer);
> -	else
> -		size = sizeof(m->old_footer);
> -
> +	size = sizeof_footer(con);
>  	end += size;
>  	ret = read_partial(con, end, size, &m->footer);
>  	if (ret <= 0)
> @@ -3089,10 +3079,7 @@ void ceph_msg_revoke(struct ceph_msg *msg)
>  			con->out_skip += con_out_kvec_skip(con);
>  		} else {
>  			BUG_ON(!msg->data_length);
> -			if (con->peer_features & CEPH_FEATURE_MSG_AUTH)
> -				con->out_skip += sizeof(msg->footer);
> -			else
> -				con->out_skip += sizeof(msg->old_footer);
> +			con->out_skip += sizeof_footer(con);
>  		}
>  		/* data, middle, front */
>  		if (msg->data_length)
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-02-21 17:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-20 16:45 [PATCH 0/4] libceph: message skipping fixes Ilya Dryomov
2016-02-20 16:45 ` [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message Ilya Dryomov
2016-02-20 18:28   ` Alex Elder
2016-02-20 20:21     ` Ilya Dryomov
2016-02-20 20:38       ` Alex Elder
2016-02-20 16:45 ` [PATCH 2/4] libceph: use the right footer size " Ilya Dryomov
2016-02-20 18:44   ` Alex Elder
2016-02-20 20:05     ` Ilya Dryomov
2016-02-20 20:15       ` Alex Elder
2016-02-21 14:31       ` Ilya Dryomov
2016-02-21 14:35         ` [PATCH 4/4] libceph: use sizeof_footer() more Ilya Dryomov
2016-02-21 15:08           ` Alex Elder
2016-02-20 16:45 ` [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings Ilya Dryomov
2016-02-20 18:46   ` Alex Elder
2016-02-20 16:45 ` [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() Ilya Dryomov
2016-02-20 18:47   ` Alex Elder
2016-02-20 19:51     ` Ilya Dryomov
2016-02-20 19:56   ` Alex Elder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox