* [PATCH 0/4] libceph: message skipping fixes
@ 2016-02-20 16:45 Ilya Dryomov
2016-02-20 16:45 ` [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message Ilya Dryomov
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Ilya Dryomov @ 2016-02-20 16:45 UTC (permalink / raw)
To: ceph-devel; +Cc: Varada Kari
Hello,
Patch 1 hopefully fixes the problem that Varada ran into [1].
Patches 2 and 3 are surrounding fixes, but still important enough to
backport.
[1] http://www.spinics.net/lists/ceph-devel/msg28712.html
Thanks,
Ilya
Ilya Dryomov (4):
libceph: don't bail early from try_read() when skipping a message
libceph: use the right footer size when skipping a message
libceph: don't spam dmesg with stray reply warnings
libceph: use sizeof_footer() in ceph_msg_revoke()
net/ceph/messenger.c | 26 +++++++++++++-------------
net/ceph/osd_client.c | 4 ++--
2 files changed, 15 insertions(+), 15 deletions(-)
--
2.4.3
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message 2016-02-20 16:45 [PATCH 0/4] libceph: message skipping fixes Ilya Dryomov @ 2016-02-20 16:45 ` Ilya Dryomov 2016-02-20 18:28 ` Alex Elder 2016-02-20 16:45 ` [PATCH 2/4] libceph: use the right footer size " Ilya Dryomov ` (2 subsequent siblings) 3 siblings, 1 reply; 18+ messages in thread From: Ilya Dryomov @ 2016-02-20 16:45 UTC (permalink / raw) To: ceph-devel; +Cc: Varada Kari The contract between try_read() and try_write() is that when called each processes as much data as possible. When instructed by osd_client to skip a message, try_read() is violating this contract by returning after receiving and discarding a single message instead of checking for more. try_write() then gets a chance to write out more requests, generating more replies/skips for try_read() to handle, forcing the messenger into a starvation loop. Cc: stable@vger.kernel.org # 3.10+ Reported-by: Varada Kari <Varada.Kari@sandisk.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Varada Kari <Varada.Kari@sandisk.com> --- net/ceph/messenger.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index 9cfedf565f5b..fec20819a5ea 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -2337,7 +2337,7 @@ static int read_partial_message(struct ceph_connection *con) con->in_base_pos = -front_len - middle_len - data_len - sizeof(m->footer); con->in_tag = CEPH_MSGR_TAG_READY; - return 0; + return 1; } else if ((s64)seq - (s64)con->in_seq > 1) { pr_err("read_partial_message bad seq %lld expected %lld\n", seq, con->in_seq + 1); @@ -2363,7 +2363,7 @@ static int read_partial_message(struct ceph_connection *con) sizeof(m->footer); con->in_tag = CEPH_MSGR_TAG_READY; con->in_seq++; - return 0; + return 1; } BUG_ON(!con->in_msg); -- 2.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message 2016-02-20 16:45 ` [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message Ilya Dryomov @ 2016-02-20 18:28 ` Alex Elder 2016-02-20 20:21 ` Ilya Dryomov 0 siblings, 1 reply; 18+ messages in thread From: Alex Elder @ 2016-02-20 18:28 UTC (permalink / raw) To: Ilya Dryomov, ceph-devel; +Cc: Varada Kari On 02/20/2016 10:45 AM, Ilya Dryomov wrote: > The contract between try_read() and try_write() is that when called > each processes as much data as possible. When instructed by osd_client > to skip a message, try_read() is violating this contract by returning > after receiving and discarding a single message instead of checking for > more. try_write() then gets a chance to write out more requests, > generating more replies/skips for try_read() to handle, forcing the > messenger into a starvation loop. This makes sense. If (for example) two messages are pending (either arriving or already arrived and buffered), and the first is going to be discarded, we should still process the second one to the extent possible. I'm going to review the way this stuff works. (I always have to do that anyway it seems, so I document it in the process...) There are two places when read_partial_message() will decide to discard a message. One is if a completely valid message header arrives, but the sequence number is wrong (this means re-sent message has arrived). The second place the messenger discards an incoming message is if the connection's message buffer allocation method indicates the message should be skipped. The MDS msg_alloc method never calls for a buffer to be skipped. But the mon client and the OSD client do. The mon client and OSD client issue requests to their servers which are expected to receive response messages. When a mon or OSD request is set up, a buffer to hold its corresponding response message is preallocated. So when receiving a message on a mon or OSD connection, the message allocation function used by the messenger looks up and returns that preallocated buffer. If a response message arrives from a mon server or OSD was not expected (i.e., no outstanding request is waiting for that response) the incoming response message should be skipped. Additionally, an OSD client expects to receive only a few types of messages. It peeks at an incoming message to determine how it should be handled. If an incoming message is not of a type recognized by the OSD client, the message should be skipped. Even if an incoming message is an OSD response that was expected, it is possible that the buffer allocated to receive the response is too small for the incoming message; and in that case the incoming message should be skipped. So... There are two places in try_read_message() where incoming messages should be discarded, and they represent: receipt of a duplicate (re-sent) message; receipt of an unexpected message type; receipt of a response message that was not expected; or receipt of a response that was too large for the buffer that had been allocated to hold it. A stream of data coming in a connection consists of blocks of data, each of which begins with a "tag" that indicates what is contained within the block that follows. A connection's "in_tag" indicates what we're expecting to see next on input, and if it's READY it means we've finished reading a previous block (and so the next byte should be a tag). If a connection should discard some incoming data, this is represented by having its "in_base_pos" value be negative. The value of "in_base_pos" in this case is negative the number of bytes to be discarded. Whenever a negative value is recorded in "in_base_pos", the connection's "in_tag" field is set to READY, indicating that immediately following the discarded bytes is expected to be a tag. When a connection is in OPEN state, the first thing try_read() does is discard the incoming message data if it is supposed to be skipped. Once this is done, it looks at "in_tag" to determine what to do next, and if it's READY it sets "in_tag" to the next byte on input, and prepares for receiving the block of data that follows accordingly. OK, now coming back to the issue at hand. In try_read(), if the current "in_tag" is MSG, a message is expected on input, and read_partial_message() is called. If that returns 0 (or negative), try_read() completes and returns to its caller. Otherwise, if the connection's "in_tag" is READY, try_read() jumps back to the top again, to continue reading the next block of data on the input. Currently, if read_partial_message() ever finds that the incoming message should be skipped, it indicates that by setting the negated number of bytes to be skipped in the connection's "in_base_pos" field, and setting its "in_tag" to READY. HOWEVER, read_partial_message() currently returns 0 in this case, which causes its caller, try_read(), to quit reading. This fix changes the two places in read_partial_message() that arrange to skip the current incoming message so they return 1 rather than 0. The result is that when control returns to try_read(), it will return to its top. That will cause the incoming message to be discarded, *and* will cause the next block of incoming data to be processed. Without this change, try_write() will get called again before try_read() gets a chance to discard the current message and read the next one. In other words... Looks good. Reviewed-by: Alex Elder <elder@linaro.org> > Cc: stable@vger.kernel.org # 3.10+ > Reported-by: Varada Kari <Varada.Kari@sandisk.com> > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> > Tested-by: Varada Kari <Varada.Kari@sandisk.com> > --- > net/ceph/messenger.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > index 9cfedf565f5b..fec20819a5ea 100644 > --- a/net/ceph/messenger.c > +++ b/net/ceph/messenger.c > @@ -2337,7 +2337,7 @@ static int read_partial_message(struct ceph_connection *con) > con->in_base_pos = -front_len - middle_len - data_len - > sizeof(m->footer); > con->in_tag = CEPH_MSGR_TAG_READY; > - return 0; > + return 1; > } else if ((s64)seq - (s64)con->in_seq > 1) { > pr_err("read_partial_message bad seq %lld expected %lld\n", > seq, con->in_seq + 1); > @@ -2363,7 +2363,7 @@ static int read_partial_message(struct ceph_connection *con) > sizeof(m->footer); > con->in_tag = CEPH_MSGR_TAG_READY; > con->in_seq++; > - return 0; > + return 1; > } > > BUG_ON(!con->in_msg); > k ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message 2016-02-20 18:28 ` Alex Elder @ 2016-02-20 20:21 ` Ilya Dryomov 2016-02-20 20:38 ` Alex Elder 0 siblings, 1 reply; 18+ messages in thread From: Ilya Dryomov @ 2016-02-20 20:21 UTC (permalink / raw) To: Alex Elder; +Cc: Ceph Development, Varada Kari On Sat, Feb 20, 2016 at 7:28 PM, Alex Elder <elder@ieee.org> wrote: > On 02/20/2016 10:45 AM, Ilya Dryomov wrote: >> The contract between try_read() and try_write() is that when called >> each processes as much data as possible. When instructed by osd_client >> to skip a message, try_read() is violating this contract by returning >> after receiving and discarding a single message instead of checking for >> more. try_write() then gets a chance to write out more requests, >> generating more replies/skips for try_read() to handle, forcing the >> messenger into a starvation loop. > > This makes sense. If (for example) two messages are pending > (either arriving or already arrived and buffered), and the > first is going to be discarded, we should still process the > second one to the extent possible. > > > I'm going to review the way this stuff works. (I always have > to do that anyway it seems, so I document it in the process...) > > > There are two places when read_partial_message() will decide > to discard a message. One is if a completely valid message > header arrives, but the sequence number is wrong (this means > re-sent message has arrived). The second place the messenger > discards an incoming message is if the connection's message > buffer allocation method indicates the message should be > skipped. The MDS msg_alloc method never calls for a buffer > to be skipped. But the mon client and the OSD client do. > > The mon client and OSD client issue requests to their > servers which are expected to receive response messages. > When a mon or OSD request is set up, a buffer to hold its > corresponding response message is preallocated. So when > receiving a message on a mon or OSD connection, the message > allocation function used by the messenger looks up and > returns that preallocated buffer. If a response message > arrives from a mon server or OSD was not expected (i.e., > no outstanding request is waiting for that response) the > incoming response message should be skipped. > > Additionally, an OSD client expects to receive only a few > types of messages. It peeks at an incoming message to > determine how it should be handled. If an incoming message > is not of a type recognized by the OSD client, the message > should be skipped. Even if an incoming message is an OSD > response that was expected, it is possible that the buffer > allocated to receive the response is too small for the > incoming message; and in that case the incoming message > should be skipped. > > > So... There are two places in try_read_message() where > incoming messages should be discarded, and they represent: > receipt of a duplicate (re-sent) message; receipt of an > unexpected message type; receipt of a response message > that was not expected; or receipt of a response that was > too large for the buffer that had been allocated to hold it. Just a note: it's less about the preallocated buffer being too small part (in theory we could allocate everything in alloc_msg(), although that would be stupid) and more about the "no outstanding request is waiting for that response" part. Stray and/or duplicate replies from the OSDs are expected in various cases. > > > A stream of data coming in a connection consists of blocks > of data, each of which begins with a "tag" that indicates > what is contained within the block that follows. A > connection's "in_tag" indicates what we're expecting to > see next on input, and if it's READY it means we've > finished reading a previous block (and so the next > byte should be a tag). > > If a connection should discard some incoming data, this is > represented by having its "in_base_pos" value be negative. > The value of "in_base_pos" in this case is negative the > number of bytes to be discarded. Whenever a negative > value is recorded in "in_base_pos", the connection's > "in_tag" field is set to READY, indicating that immediately > following the discarded bytes is expected to be a tag. > > When a connection is in OPEN state, the first thing > try_read() does is discard the incoming message data if > it is supposed to be skipped. Once this is done, it looks > at "in_tag" to determine what to do next, and if it's READY > it sets "in_tag" to the next byte on input, and prepares > for receiving the block of data that follows accordingly. > > > OK, now coming back to the issue at hand. In try_read(), > if the current "in_tag" is MSG, a message is expected on > input, and read_partial_message() is called. If that > returns 0 (or negative), try_read() completes and returns > to its caller. Otherwise, if the connection's "in_tag" > is READY, try_read() jumps back to the top again, to > continue reading the next block of data on the input. > > Currently, if read_partial_message() ever finds that the > incoming message should be skipped, it indicates that > by setting the negated number of bytes to be skipped in > the connection's "in_base_pos" field, and setting its > "in_tag" to READY. HOWEVER, read_partial_message() > currently returns 0 in this case, which causes its > caller, try_read(), to quit reading. > > > This fix changes the two places in read_partial_message() > that arrange to skip the current incoming message so they > return 1 rather than 0. The result is that when control > returns to try_read(), it will return to its top. That > will cause the incoming message to be discarded, *and* > will cause the next block of incoming data to be processed. > > Without this change, try_write() will get called again > before try_read() gets a chance to discard the current > message and read the next one. > > > In other words... > > Looks good. > > Reviewed-by: Alex Elder <elder@linaro.org> Thanks for the review, Alex! These walkthroughs of yours on the list have been extremely helpful at times ;) Ilya ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message 2016-02-20 20:21 ` Ilya Dryomov @ 2016-02-20 20:38 ` Alex Elder 0 siblings, 0 replies; 18+ messages in thread From: Alex Elder @ 2016-02-20 20:38 UTC (permalink / raw) To: Ilya Dryomov; +Cc: Ceph Development, Varada Kari On 02/20/2016 02:21 PM, Ilya Dryomov wrote: >> > So... There are two places in try_read_message() where >> > incoming messages should be discarded, and they represent: >> > receipt of a duplicate (re-sent) message; receipt of an >> > unexpected message type; receipt of a response message >> > that was not expected; or receipt of a response that was >> > too large for the buffer that had been allocated to hold it. > Just a note: it's less about the preallocated buffer being too small > part (in theory we could allocate everything in alloc_msg(), although > that would be stupid) and more about the "no outstanding request is > waiting for that response" part. Stray and/or duplicate replies from > the OSDs are expected in various cases. Agreed. That one case (buffer too large) is anomalous. But the code *does* say "skip the message" in that case, which is why I mentioned it. Like I said before, when it's been a while since I've reviewed code I need to go immerse myself in it for a little while. And while I'm at it, I figure I might as well explain it all for the benefit of other people. This is especially true for these tiny patches--sometimes the smaller the patch, the bigger the context needed to understand why it fixes a problem. -Alex ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/4] libceph: use the right footer size when skipping a message 2016-02-20 16:45 [PATCH 0/4] libceph: message skipping fixes Ilya Dryomov 2016-02-20 16:45 ` [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message Ilya Dryomov @ 2016-02-20 16:45 ` Ilya Dryomov 2016-02-20 18:44 ` Alex Elder 2016-02-20 16:45 ` [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings Ilya Dryomov 2016-02-20 16:45 ` [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() Ilya Dryomov 3 siblings, 1 reply; 18+ messages in thread From: Ilya Dryomov @ 2016-02-20 16:45 UTC (permalink / raw) To: ceph-devel; +Cc: Varada Kari ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13. Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated. Cc: stable@vger.kernel.org # 3.19+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> --- net/ceph/messenger.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index fec20819a5ea..dd7c1b7f932b 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -2284,6 +2284,13 @@ static int read_partial_msg_data(struct ceph_connection *con) return 1; /* must return > 0 to indicate success */ } +static size_t sizeof_footer(struct ceph_connection *con) +{ + return (con->peer_features & CEPH_FEATURE_MSG_AUTH) ? + sizeof(struct ceph_msg_footer) : + sizeof(struct ceph_msg_footer_old); +} + /* * read (part of) a message. */ @@ -2335,7 +2342,7 @@ static int read_partial_message(struct ceph_connection *con) ceph_pr_addr(&con->peer_addr.in_addr), seq, con->in_seq + 1); con->in_base_pos = -front_len - middle_len - data_len - - sizeof(m->footer); + sizeof_footer(con); con->in_tag = CEPH_MSGR_TAG_READY; return 1; } else if ((s64)seq - (s64)con->in_seq > 1) { @@ -2360,7 +2367,7 @@ static int read_partial_message(struct ceph_connection *con) /* skip this message */ dout("alloc_msg said skip message\n"); con->in_base_pos = -front_len - middle_len - data_len - - sizeof(m->footer); + sizeof_footer(con); con->in_tag = CEPH_MSGR_TAG_READY; con->in_seq++; return 1; @@ -2402,11 +2409,7 @@ static int read_partial_message(struct ceph_connection *con) } /* footer */ - if (need_sign) - size = sizeof(m->footer); - else - size = sizeof(m->old_footer); - + size = sizeof_footer(con); end += size; ret = read_partial(con, end, size, &m->footer); if (ret <= 0) -- 2.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/4] libceph: use the right footer size when skipping a message 2016-02-20 16:45 ` [PATCH 2/4] libceph: use the right footer size " Ilya Dryomov @ 2016-02-20 18:44 ` Alex Elder 2016-02-20 20:05 ` Ilya Dryomov 0 siblings, 1 reply; 18+ messages in thread From: Alex Elder @ 2016-02-20 18:44 UTC (permalink / raw) To: Ilya Dryomov, ceph-devel; +Cc: Varada Kari +On 02/20/2016 10:45 AM, Ilya Dryomov wrote: > ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13. > Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated. This looks good, but I have a few suggestions. You could done some factoring in prepare_write_message_footer() to use the new function for setting iov_len and updating out_kvec_bytes. Consider my suggestions, but otherwise I believe this is correct. Reviewed-by: Alex Elder <elder@linaro.org> > Cc: stable@vger.kernel.org # 3.19+ > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> > --- > net/ceph/messenger.c | 17 ++++++++++------- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > index fec20819a5ea..dd7c1b7f932b 100644 > --- a/net/ceph/messenger.c > +++ b/net/ceph/messenger.c > @@ -2284,6 +2284,13 @@ static int read_partial_msg_data(struct ceph_connection *con) > return 1; /* must return > 0 to indicate success */ > } > > +static size_t sizeof_footer(struct ceph_connection *con) I understand why this is named this way. But it's supplying a connection as argument, but a footer is a message attribute. So I might call this con_msg_footer_size(). > +{ > + return (con->peer_features & CEPH_FEATURE_MSG_AUTH) ? > + sizeof(struct ceph_msg_footer) : > + sizeof(struct ceph_msg_footer_old); > +} > + > /* > * read (part of) a message. > */ > @@ -2335,7 +2342,7 @@ static int read_partial_message(struct ceph_connection *con) > ceph_pr_addr(&con->peer_addr.in_addr), > seq, con->in_seq + 1); > con->in_base_pos = -front_len - middle_len - data_len - > - sizeof(m->footer); > + sizeof_footer(con); > con->in_tag = CEPH_MSGR_TAG_READY; > return 1; > } else if ((s64)seq - (s64)con->in_seq > 1) { > @@ -2360,7 +2367,7 @@ static int read_partial_message(struct ceph_connection *con) > /* skip this message */ > dout("alloc_msg said skip message\n"); > con->in_base_pos = -front_len - middle_len - data_len - > - sizeof(m->footer); > + sizeof_footer(con); > con->in_tag = CEPH_MSGR_TAG_READY; > con->in_seq++; > return 1; > @@ -2402,11 +2409,7 @@ static int read_partial_message(struct ceph_connection *con) > } > > /* footer */ > - if (need_sign) > - size = sizeof(m->footer); > - else > - size = sizeof(m->old_footer); > - > + size = sizeof_footer(con); > end += size; > ret = read_partial(con, end, size, &m->footer); > if (ret <= 0) > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/4] libceph: use the right footer size when skipping a message 2016-02-20 18:44 ` Alex Elder @ 2016-02-20 20:05 ` Ilya Dryomov 2016-02-20 20:15 ` Alex Elder 2016-02-21 14:31 ` Ilya Dryomov 0 siblings, 2 replies; 18+ messages in thread From: Ilya Dryomov @ 2016-02-20 20:05 UTC (permalink / raw) To: Alex Elder; +Cc: Ceph Development, Varada Kari On Sat, Feb 20, 2016 at 7:44 PM, Alex Elder <elder@ieee.org> wrote: > +On 02/20/2016 10:45 AM, Ilya Dryomov wrote: >> ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13. >> Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated. > > This looks good, but I have a few suggestions. > > You could done some factoring in prepare_write_message_footer() > to use the new function for setting iov_len and updating > out_kvec_bytes. I considered it, but we'd still need the if on MSG_AUTH bit, so I decided it wasn't worth it. TBH I'm more inclined to rip this old_footer stuff entirely - it's supported since v0.55, that's pre-bobtail... > > Consider my suggestions, but otherwise I believe this > is correct. > > Reviewed-by: Alex Elder <elder@linaro.org> > >> Cc: stable@vger.kernel.org # 3.19+ >> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> >> --- >> net/ceph/messenger.c | 17 ++++++++++------- >> 1 file changed, 10 insertions(+), 7 deletions(-) >> >> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c >> index fec20819a5ea..dd7c1b7f932b 100644 >> --- a/net/ceph/messenger.c >> +++ b/net/ceph/messenger.c >> @@ -2284,6 +2284,13 @@ static int read_partial_msg_data(struct ceph_connection *con) >> return 1; /* must return > 0 to indicate success */ >> } >> >> +static size_t sizeof_footer(struct ceph_connection *con) > > I understand why this is named this way. But it's supplying a > connection as argument, but a footer is a message attribute. > So I might call this con_msg_footer_size(). The *size* of footer is a connection attribute though ;) Thanks, Ilya ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/4] libceph: use the right footer size when skipping a message 2016-02-20 20:05 ` Ilya Dryomov @ 2016-02-20 20:15 ` Alex Elder 2016-02-21 14:31 ` Ilya Dryomov 1 sibling, 0 replies; 18+ messages in thread From: Alex Elder @ 2016-02-20 20:15 UTC (permalink / raw) To: Ilya Dryomov; +Cc: Ceph Development, Varada Kari On 02/20/2016 02:05 PM, Ilya Dryomov wrote: >>> >> >>> >> +static size_t sizeof_footer(struct ceph_connection *con) >> > >> > I understand why this is named this way. But it's supplying a >> > connection as argument, but a footer is a message attribute. >> > So I might call this con_msg_footer_size(). > The *size* of footer is a connection attribute though ;) Yeah, I know. That's why I suggested the con_msg_***(). But in any case, the code looks correct regardless of the name. -Alex ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/4] libceph: use the right footer size when skipping a message 2016-02-20 20:05 ` Ilya Dryomov 2016-02-20 20:15 ` Alex Elder @ 2016-02-21 14:31 ` Ilya Dryomov 2016-02-21 14:35 ` [PATCH 4/4] libceph: use sizeof_footer() more Ilya Dryomov 1 sibling, 1 reply; 18+ messages in thread From: Ilya Dryomov @ 2016-02-21 14:31 UTC (permalink / raw) To: Alex Elder; +Cc: Ceph Development, Varada Kari On Sat, Feb 20, 2016 at 9:05 PM, Ilya Dryomov <idryomov@gmail.com> wrote: > On Sat, Feb 20, 2016 at 7:44 PM, Alex Elder <elder@ieee.org> wrote: >> +On 02/20/2016 10:45 AM, Ilya Dryomov wrote: >>> ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13. >>> Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated. >> >> This looks good, but I have a few suggestions. >> >> You could done some factoring in prepare_write_message_footer() >> to use the new function for setting iov_len and updating >> out_kvec_bytes. > > I considered it, but we'd still need the if on MSG_AUTH bit, so > I decided it wasn't worth it. TBH I'm more inclined to rip this > old_footer stuff entirely - it's supported since v0.55, that's > pre-bobtail... What I missed yesterday was that using sizeof_footer() makes it possible to use con_out_kvec_add(). I updated 4/4 and also pulled the need_sign -> sizeof_footer() hunk from 2/4 into it. Thanks, Ilya ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 4/4] libceph: use sizeof_footer() more 2016-02-21 14:31 ` Ilya Dryomov @ 2016-02-21 14:35 ` Ilya Dryomov 2016-02-21 15:08 ` Alex Elder 0 siblings, 1 reply; 18+ messages in thread From: Ilya Dryomov @ 2016-02-21 14:35 UTC (permalink / raw) To: Alex Elder; +Cc: ceph-devel, Varada Kari Don't open-code sizeof_footer() in read_partial_message() and ceph_msg_revoke(). Also, after switching to sizeof_footer(), it's now possible to use con_out_kvec_add() in prepare_write_message_footer(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> --- net/ceph/messenger.c | 19 +++---------------- 1 file changed, 3 insertions(+), 16 deletions(-) diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index 9382619a405b..d9681bc839c7 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -1221,25 +1221,19 @@ static void prepare_message_data(struct ceph_msg *msg, u32 data_len) static void prepare_write_message_footer(struct ceph_connection *con) { struct ceph_msg *m = con->out_msg; - int v = con->out_kvec_left; m->footer.flags |= CEPH_MSG_FOOTER_COMPLETE; dout("prepare_write_message_footer %p\n", con); - con->out_kvec[v].iov_base = &m->footer; + con_out_kvec_add(con, sizeof_footer(con), &m->footer); if (con->peer_features & CEPH_FEATURE_MSG_AUTH) { if (con->ops->sign_message) con->ops->sign_message(m); else m->footer.sig = 0; - con->out_kvec[v].iov_len = sizeof(m->footer); - con->out_kvec_bytes += sizeof(m->footer); } else { m->old_footer.flags = m->footer.flags; - con->out_kvec[v].iov_len = sizeof(m->old_footer); - con->out_kvec_bytes += sizeof(m->old_footer); } - con->out_kvec_left++; con->out_more = m->more_to_follow; con->out_msg_done = true; } @@ -2409,11 +2403,7 @@ static int read_partial_message(struct ceph_connection *con) } /* footer */ - if (need_sign) - size = sizeof(m->footer); - else - size = sizeof(m->old_footer); - + size = sizeof_footer(con); end += size; ret = read_partial(con, end, size, &m->footer); if (ret <= 0) @@ -3089,10 +3079,7 @@ void ceph_msg_revoke(struct ceph_msg *msg) con->out_skip += con_out_kvec_skip(con); } else { BUG_ON(!msg->data_length); - if (con->peer_features & CEPH_FEATURE_MSG_AUTH) - con->out_skip += sizeof(msg->footer); - else - con->out_skip += sizeof(msg->old_footer); + con->out_skip += sizeof_footer(con); } /* data, middle, front */ if (msg->data_length) -- 2.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 4/4] libceph: use sizeof_footer() more 2016-02-21 14:35 ` [PATCH 4/4] libceph: use sizeof_footer() more Ilya Dryomov @ 2016-02-21 15:08 ` Alex Elder 0 siblings, 0 replies; 18+ messages in thread From: Alex Elder @ 2016-02-21 15:08 UTC (permalink / raw) To: Ilya Dryomov; +Cc: ceph-devel, Varada Kari On 02/21/2016 08:35 AM, Ilya Dryomov wrote: > Don't open-code sizeof_footer() in read_partial_message() and > ceph_msg_revoke(). Also, after switching to sizeof_footer(), it's now > possible to use con_out_kvec_add() in prepare_write_message_footer(). > > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> There you go, that's a nice result! Reviewed-by: Alex Elder <elder@linaro.org> > --- > net/ceph/messenger.c | 19 +++---------------- > 1 file changed, 3 insertions(+), 16 deletions(-) > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > index 9382619a405b..d9681bc839c7 100644 > --- a/net/ceph/messenger.c > +++ b/net/ceph/messenger.c > @@ -1221,25 +1221,19 @@ static void prepare_message_data(struct ceph_msg *msg, u32 data_len) > static void prepare_write_message_footer(struct ceph_connection *con) > { > struct ceph_msg *m = con->out_msg; > - int v = con->out_kvec_left; > > m->footer.flags |= CEPH_MSG_FOOTER_COMPLETE; > > dout("prepare_write_message_footer %p\n", con); > - con->out_kvec[v].iov_base = &m->footer; > + con_out_kvec_add(con, sizeof_footer(con), &m->footer); > if (con->peer_features & CEPH_FEATURE_MSG_AUTH) { > if (con->ops->sign_message) > con->ops->sign_message(m); > else > m->footer.sig = 0; > - con->out_kvec[v].iov_len = sizeof(m->footer); > - con->out_kvec_bytes += sizeof(m->footer); > } else { > m->old_footer.flags = m->footer.flags; > - con->out_kvec[v].iov_len = sizeof(m->old_footer); > - con->out_kvec_bytes += sizeof(m->old_footer); > } > - con->out_kvec_left++; > con->out_more = m->more_to_follow; > con->out_msg_done = true; > } > @@ -2409,11 +2403,7 @@ static int read_partial_message(struct ceph_connection *con) > } > > /* footer */ > - if (need_sign) > - size = sizeof(m->footer); > - else > - size = sizeof(m->old_footer); > - > + size = sizeof_footer(con); > end += size; > ret = read_partial(con, end, size, &m->footer); > if (ret <= 0) > @@ -3089,10 +3079,7 @@ void ceph_msg_revoke(struct ceph_msg *msg) > con->out_skip += con_out_kvec_skip(con); > } else { > BUG_ON(!msg->data_length); > - if (con->peer_features & CEPH_FEATURE_MSG_AUTH) > - con->out_skip += sizeof(msg->footer); > - else > - con->out_skip += sizeof(msg->old_footer); > + con->out_skip += sizeof_footer(con); > } > /* data, middle, front */ > if (msg->data_length) > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings 2016-02-20 16:45 [PATCH 0/4] libceph: message skipping fixes Ilya Dryomov 2016-02-20 16:45 ` [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message Ilya Dryomov 2016-02-20 16:45 ` [PATCH 2/4] libceph: use the right footer size " Ilya Dryomov @ 2016-02-20 16:45 ` Ilya Dryomov 2016-02-20 18:46 ` Alex Elder 2016-02-20 16:45 ` [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() Ilya Dryomov 3 siblings, 1 reply; 18+ messages in thread From: Ilya Dryomov @ 2016-02-20 16:45 UTC (permalink / raw) To: ceph-devel; +Cc: Varada Kari Commit d15f9d694b77 ("libceph: check data_len in ->alloc_msg()") mistakenly bumped the log level on the "tid %llu unknown, skipping" message. Turn it back into a dout() - stray replies are perfectly normal when OSDs flap, crash, get killed for testing purposes, etc. Cc: stable@vger.kernel.org # 4.3+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> --- net/ceph/osd_client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 7c1a5d1734c3..32355d9d0103 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -2890,8 +2890,8 @@ static struct ceph_msg *get_reply(struct ceph_connection *con, mutex_lock(&osdc->request_mutex); req = __lookup_request(osdc, tid); if (!req) { - pr_warn("%s osd%d tid %llu unknown, skipping\n", - __func__, osd->o_osd, tid); + dout("%s osd%d tid %llu unknown, skipping\n", __func__, + osd->o_osd, tid); m = NULL; *skip = 1; goto out; -- 2.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings 2016-02-20 16:45 ` [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings Ilya Dryomov @ 2016-02-20 18:46 ` Alex Elder 0 siblings, 0 replies; 18+ messages in thread From: Alex Elder @ 2016-02-20 18:46 UTC (permalink / raw) To: Ilya Dryomov, ceph-devel; +Cc: Varada Kari On 02/20/2016 10:45 AM, Ilya Dryomov wrote: > Commit d15f9d694b77 ("libceph: check data_len in ->alloc_msg()") > mistakenly bumped the log level on the "tid %llu unknown, skipping" > message. Turn it back into a dout() - stray replies are perfectly > normal when OSDs flap, crash, get killed for testing purposes, etc. > > Cc: stable@vger.kernel.org # 4.3+ > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Looks good. Reviewed-by: Alex Elder <elder@linaro.org> > --- > net/ceph/osd_client.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c > index 7c1a5d1734c3..32355d9d0103 100644 > --- a/net/ceph/osd_client.c > +++ b/net/ceph/osd_client.c > @@ -2890,8 +2890,8 @@ static struct ceph_msg *get_reply(struct ceph_connection *con, > mutex_lock(&osdc->request_mutex); > req = __lookup_request(osdc, tid); > if (!req) { > - pr_warn("%s osd%d tid %llu unknown, skipping\n", > - __func__, osd->o_osd, tid); > + dout("%s osd%d tid %llu unknown, skipping\n", __func__, > + osd->o_osd, tid); > m = NULL; > *skip = 1; > goto out; > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() 2016-02-20 16:45 [PATCH 0/4] libceph: message skipping fixes Ilya Dryomov ` (2 preceding siblings ...) 2016-02-20 16:45 ` [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings Ilya Dryomov @ 2016-02-20 16:45 ` Ilya Dryomov 2016-02-20 18:47 ` Alex Elder 2016-02-20 19:56 ` Alex Elder 3 siblings, 2 replies; 18+ messages in thread From: Ilya Dryomov @ 2016-02-20 16:45 UTC (permalink / raw) To: ceph-devel; +Cc: Varada Kari Signed-off-by: Ilya Dryomov <idryomov@gmail.com> --- net/ceph/messenger.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index dd7c1b7f932b..43edf897c9eb 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -3085,10 +3085,7 @@ void ceph_msg_revoke(struct ceph_msg *msg) con->out_skip += con_out_kvec_skip(con); } else { BUG_ON(!msg->data_length); - if (con->peer_features & CEPH_FEATURE_MSG_AUTH) - con->out_skip += sizeof(msg->footer); - else - con->out_skip += sizeof(msg->old_footer); + con->out_skip += sizeof_footer(con); } /* data, middle, front */ if (msg->data_length) -- 2.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() 2016-02-20 16:45 ` [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() Ilya Dryomov @ 2016-02-20 18:47 ` Alex Elder 2016-02-20 19:51 ` Ilya Dryomov 2016-02-20 19:56 ` Alex Elder 1 sibling, 1 reply; 18+ messages in thread From: Alex Elder @ 2016-02-20 18:47 UTC (permalink / raw) To: Ilya Dryomov, ceph-devel; +Cc: Varada Kari On 02/20/2016 10:45 AM, Ilya Dryomov wrote: > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Please just reorder your patches and squash this in with your [2/4] patch. -Alex > --- > net/ceph/messenger.c | 5 +---- > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > index dd7c1b7f932b..43edf897c9eb 100644 > --- a/net/ceph/messenger.c > +++ b/net/ceph/messenger.c > @@ -3085,10 +3085,7 @@ void ceph_msg_revoke(struct ceph_msg *msg) > con->out_skip += con_out_kvec_skip(con); > } else { > BUG_ON(!msg->data_length); > - if (con->peer_features & CEPH_FEATURE_MSG_AUTH) > - con->out_skip += sizeof(msg->footer); > - else > - con->out_skip += sizeof(msg->old_footer); > + con->out_skip += sizeof_footer(con); > } > /* data, middle, front */ > if (msg->data_length) > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() 2016-02-20 18:47 ` Alex Elder @ 2016-02-20 19:51 ` Ilya Dryomov 0 siblings, 0 replies; 18+ messages in thread From: Ilya Dryomov @ 2016-02-20 19:51 UTC (permalink / raw) To: Alex Elder; +Cc: Ceph Development, Varada Kari On Sat, Feb 20, 2016 at 7:47 PM, Alex Elder <elder@ieee.org> wrote: > On 02/20/2016 10:45 AM, Ilya Dryomov wrote: >> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> > > Please just reorder your patches and squash this in with > your [2/4] patch. Can't do - this code in ceph_msg_revoke() is new in 4.5, while 2/4 needs to be backported. Thanks, Ilya ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() 2016-02-20 16:45 ` [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() Ilya Dryomov 2016-02-20 18:47 ` Alex Elder @ 2016-02-20 19:56 ` Alex Elder 1 sibling, 0 replies; 18+ messages in thread From: Alex Elder @ 2016-02-20 19:56 UTC (permalink / raw) To: Ilya Dryomov, ceph-devel; +Cc: Varada Kari On 02/20/2016 10:45 AM, Ilya Dryomov wrote: > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Looks good. Reviewed-by: Alex Elder <elder@linaro.org> > --- > net/ceph/messenger.c | 5 +---- > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > index dd7c1b7f932b..43edf897c9eb 100644 > --- a/net/ceph/messenger.c > +++ b/net/ceph/messenger.c > @@ -3085,10 +3085,7 @@ void ceph_msg_revoke(struct ceph_msg *msg) > con->out_skip += con_out_kvec_skip(con); > } else { > BUG_ON(!msg->data_length); > - if (con->peer_features & CEPH_FEATURE_MSG_AUTH) > - con->out_skip += sizeof(msg->footer); > - else > - con->out_skip += sizeof(msg->old_footer); > + con->out_skip += sizeof_footer(con); > } > /* data, middle, front */ > if (msg->data_length) > ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2016-02-21 17:56 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-02-20 16:45 [PATCH 0/4] libceph: message skipping fixes Ilya Dryomov 2016-02-20 16:45 ` [PATCH 1/4] libceph: don't bail early from try_read() when skipping a message Ilya Dryomov 2016-02-20 18:28 ` Alex Elder 2016-02-20 20:21 ` Ilya Dryomov 2016-02-20 20:38 ` Alex Elder 2016-02-20 16:45 ` [PATCH 2/4] libceph: use the right footer size " Ilya Dryomov 2016-02-20 18:44 ` Alex Elder 2016-02-20 20:05 ` Ilya Dryomov 2016-02-20 20:15 ` Alex Elder 2016-02-21 14:31 ` Ilya Dryomov 2016-02-21 14:35 ` [PATCH 4/4] libceph: use sizeof_footer() more Ilya Dryomov 2016-02-21 15:08 ` Alex Elder 2016-02-20 16:45 ` [PATCH 3/4] libceph: don't spam dmesg with stray reply warnings Ilya Dryomov 2016-02-20 18:46 ` Alex Elder 2016-02-20 16:45 ` [PATCH 4/4] libceph: use sizeof_footer() in ceph_msg_revoke() Ilya Dryomov 2016-02-20 18:47 ` Alex Elder 2016-02-20 19:51 ` Ilya Dryomov 2016-02-20 19:56 ` Alex Elder
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox