* GSS unwrapping breaks the DRC @ 2020-04-15 17:05 Chuck Lever 2020-04-15 19:25 ` Bruce Fields 0 siblings, 1 reply; 11+ messages in thread From: Chuck Lever @ 2020-04-15 17:05 UTC (permalink / raw) To: Bruce Fields, Jeff Layton; +Cc: Linux NFS Mailing List Hi Bruce and Jeff: Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i or krb5p results in a pretty quick workload failure. Closer examination shows that the client is able to overrun the GSS sequence window with some regularity. When that happens, the server drops the connection. However, when the client retransmits requests with lost replies, they never hit in the DRC, and that results in unexpected failures of non- idempotent requests. The retransmitted XIDs are found in the DRC, but the retransmitted request has a different checksum than the original. We're hitting the "mismatch" case in nfsd_cache_key_cmp for these requests. I tracked down the problem to the way the DRC computes the length of the part of the buffer it wants to checksum. nfsd_cache_csum uses head.iov_len + page_len and then caps that at RC_CSUMLEN. That works fine for krb5 and sys, but the GSS unwrap functions (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len properly. So nfsd_cache_csum's length computation is significantly larger than the clear-text message, and that allows stale parts of the xdr_buf to be included in the checksum. Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf lengths properly and fixes the situation for krb5i. I don't see a similar solution for priv_unwrap_data: there's no MIC len available, and priv_len is not the actual length of the clear-text message. Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere where the relationship between the buf's head/len and how svc_defer works is authoritatively documented. It's not clear exactly how priv_unwrap_data is supposed to accommodate svc_defer, or whether integ_unwrap_data also needs to accommodate it. So I can't tell if the GSS unwrap functions are wrong or if there's a more accurate way to compute the message length in nfsd_cache_csum. I suspect both could use some improvement, but I'm not certain exactly what that might be. -- Chuck Lever ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GSS unwrapping breaks the DRC 2020-04-15 17:05 GSS unwrapping breaks the DRC Chuck Lever @ 2020-04-15 19:25 ` Bruce Fields 2020-04-15 20:06 ` Chuck Lever 0 siblings, 1 reply; 11+ messages in thread From: Bruce Fields @ 2020-04-15 19:25 UTC (permalink / raw) To: Chuck Lever; +Cc: Jeff Layton, Linux NFS Mailing List On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote: > Hi Bruce and Jeff: > > Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i > or krb5p results in a pretty quick workload failure. Closer examination > shows that the client is able to overrun the GSS sequence window with > some regularity. When that happens, the server drops the connection. > > However, when the client retransmits requests with lost replies, they > never hit in the DRC, and that results in unexpected failures of non- > idempotent requests. > > The retransmitted XIDs are found in the DRC, but the retransmitted request > has a different checksum than the original. We're hitting the "mismatch" > case in nfsd_cache_key_cmp for these requests. > > I tracked down the problem to the way the DRC computes the length of the > part of the buffer it wants to checksum. nfsd_cache_csum uses > > head.iov_len + page_len > > and then caps that at RC_CSUMLEN. > > That works fine for krb5 and sys, but the GSS unwrap functions > (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len > properly. So nfsd_cache_csum's length computation is significantly larger > than the clear-text message, and that allows stale parts of the xdr_buf > to be included in the checksum. > > Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf > lengths properly and fixes the situation for krb5i. > > I don't see a similar solution for priv_unwrap_data: there's no MIC len > available, and priv_len is not the actual length of the clear-text message. > > Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere > where the relationship between the buf's head/len and how svc_defer works is > authoritatively documented. It's not clear exactly how priv_unwrap_data is > supposed to accommodate svc_defer, or whether integ_unwrap_data also needs > to accommodate it. > > So I can't tell if the GSS unwrap functions are wrong or if there's a more > accurate way to compute the message length in nfsd_cache_csum. I suspect > both could use some improvement, but I'm not certain exactly what that > might be. I don't know, I tried looking through that code and didn't get any further than you. The gss unwrap code does look suspect to me. It needs some kind of proper design, as it stands it's just an accumulation of fixes. --b. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GSS unwrapping breaks the DRC 2020-04-15 19:25 ` Bruce Fields @ 2020-04-15 20:06 ` Chuck Lever 2020-04-15 21:58 ` Bruce Fields 0 siblings, 1 reply; 11+ messages in thread From: Chuck Lever @ 2020-04-15 20:06 UTC (permalink / raw) To: Bruce Fields; +Cc: Jeff Layton, Linux NFS Mailing List > On Apr 15, 2020, at 3:25 PM, Bruce Fields <bfields@fieldses.org> wrote: > > On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote: >> Hi Bruce and Jeff: >> >> Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i >> or krb5p results in a pretty quick workload failure. Closer examination >> shows that the client is able to overrun the GSS sequence window with >> some regularity. When that happens, the server drops the connection. >> >> However, when the client retransmits requests with lost replies, they >> never hit in the DRC, and that results in unexpected failures of non- >> idempotent requests. >> >> The retransmitted XIDs are found in the DRC, but the retransmitted request >> has a different checksum than the original. We're hitting the "mismatch" >> case in nfsd_cache_key_cmp for these requests. >> >> I tracked down the problem to the way the DRC computes the length of the >> part of the buffer it wants to checksum. nfsd_cache_csum uses >> >> head.iov_len + page_len >> >> and then caps that at RC_CSUMLEN. >> >> That works fine for krb5 and sys, but the GSS unwrap functions >> (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len >> properly. So nfsd_cache_csum's length computation is significantly larger >> than the clear-text message, and that allows stale parts of the xdr_buf >> to be included in the checksum. >> >> Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf >> lengths properly and fixes the situation for krb5i. >> >> I don't see a similar solution for priv_unwrap_data: there's no MIC len >> available, and priv_len is not the actual length of the clear-text message. >> >> Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere >> where the relationship between the buf's head/len and how svc_defer works is >> authoritatively documented. It's not clear exactly how priv_unwrap_data is >> supposed to accommodate svc_defer, or whether integ_unwrap_data also needs >> to accommodate it. >> >> So I can't tell if the GSS unwrap functions are wrong or if there's a more >> accurate way to compute the message length in nfsd_cache_csum. I suspect >> both could use some improvement, but I'm not certain exactly what that >> might be. > > I don't know, I tried looking through that code and didn't get any > further than you. The gss unwrap code does look suspect to me. It > needs some kind of proper design, as it stands it's just an accumulation > of fixes. Having recently completed overhauling the client-side equivalents, I agree with you there. I've now convinced myself that because nfsd_cache_csum might need to advance into the first page of the Call message, rq_arg.head.iov_len must contain an accurate length so that csum_partial is applied correctly to the head buffer. Therefore it is the preceding code that needs to set up rq_arg.head.iov_len correctly. The GSS unwrap functions have to do this, and therefore these must be fixed. I would theorize that svc_defer also depends on head.iov_len being set correctly. As far as how rq_arg.len needs to be set for svc_defer, I think I need to have some kind of test case to examine how that path is triggered. Any advice appreciated. -- Chuck Lever ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GSS unwrapping breaks the DRC 2020-04-15 20:06 ` Chuck Lever @ 2020-04-15 21:58 ` Bruce Fields 2020-04-15 22:23 ` Chuck Lever 0 siblings, 1 reply; 11+ messages in thread From: Bruce Fields @ 2020-04-15 21:58 UTC (permalink / raw) To: Chuck Lever; +Cc: Jeff Layton, Linux NFS Mailing List On Wed, Apr 15, 2020 at 04:06:17PM -0400, Chuck Lever wrote: > > > > On Apr 15, 2020, at 3:25 PM, Bruce Fields <bfields@fieldses.org> wrote: > > > > On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote: > >> Hi Bruce and Jeff: > >> > >> Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i > >> or krb5p results in a pretty quick workload failure. Closer examination > >> shows that the client is able to overrun the GSS sequence window with > >> some regularity. When that happens, the server drops the connection. > >> > >> However, when the client retransmits requests with lost replies, they > >> never hit in the DRC, and that results in unexpected failures of non- > >> idempotent requests. > >> > >> The retransmitted XIDs are found in the DRC, but the retransmitted request > >> has a different checksum than the original. We're hitting the "mismatch" > >> case in nfsd_cache_key_cmp for these requests. > >> > >> I tracked down the problem to the way the DRC computes the length of the > >> part of the buffer it wants to checksum. nfsd_cache_csum uses > >> > >> head.iov_len + page_len > >> > >> and then caps that at RC_CSUMLEN. > >> > >> That works fine for krb5 and sys, but the GSS unwrap functions > >> (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len > >> properly. So nfsd_cache_csum's length computation is significantly larger > >> than the clear-text message, and that allows stale parts of the xdr_buf > >> to be included in the checksum. > >> > >> Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf > >> lengths properly and fixes the situation for krb5i. > >> > >> I don't see a similar solution for priv_unwrap_data: there's no MIC len > >> available, and priv_len is not the actual length of the clear-text message. > >> > >> Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere > >> where the relationship between the buf's head/len and how svc_defer works is > >> authoritatively documented. It's not clear exactly how priv_unwrap_data is > >> supposed to accommodate svc_defer, or whether integ_unwrap_data also needs > >> to accommodate it. > >> > >> So I can't tell if the GSS unwrap functions are wrong or if there's a more > >> accurate way to compute the message length in nfsd_cache_csum. I suspect > >> both could use some improvement, but I'm not certain exactly what that > >> might be. > > > > I don't know, I tried looking through that code and didn't get any > > further than you. The gss unwrap code does look suspect to me. It > > needs some kind of proper design, as it stands it's just an accumulation > > of fixes. > > Having recently completed overhauling the client-side equivalents, I > agree with you there. > > I've now convinced myself that because nfsd_cache_csum might need to > advance into the first page of the Call message, rq_arg.head.iov_len > must contain an accurate length so that csum_partial is applied > correctly to the head buffer. > > Therefore it is the preceding code that needs to set up rq_arg.head.iov_len > correctly. The GSS unwrap functions have to do this, and therefore these > must be fixed. I would theorize that svc_defer also depends on head.iov_len > being set correctly. > > As far as how rq_arg.len needs to be set for svc_defer, I think I need > to have some kind of test case to examine how that path is triggered. Any > advice appreciated. It's triggered on upcalls, so for example if you flush the export caches with exports -f and then send an rpc with a filehandle, it should call svc_defer on that request. --b. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GSS unwrapping breaks the DRC 2020-04-15 21:58 ` Bruce Fields @ 2020-04-15 22:23 ` Chuck Lever 2020-04-16 0:00 ` Bruce Fields 2020-04-17 21:48 ` Chuck Lever 0 siblings, 2 replies; 11+ messages in thread From: Chuck Lever @ 2020-04-15 22:23 UTC (permalink / raw) To: Bruce Fields; +Cc: Jeff Layton, Linux NFS Mailing List > On Apr 15, 2020, at 5:58 PM, Bruce Fields <bfields@fieldses.org> wrote: > > On Wed, Apr 15, 2020 at 04:06:17PM -0400, Chuck Lever wrote: >> >> >>> On Apr 15, 2020, at 3:25 PM, Bruce Fields <bfields@fieldses.org> wrote: >>> >>> On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote: >>>> Hi Bruce and Jeff: >>>> >>>> Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i >>>> or krb5p results in a pretty quick workload failure. Closer examination >>>> shows that the client is able to overrun the GSS sequence window with >>>> some regularity. When that happens, the server drops the connection. >>>> >>>> However, when the client retransmits requests with lost replies, they >>>> never hit in the DRC, and that results in unexpected failures of non- >>>> idempotent requests. >>>> >>>> The retransmitted XIDs are found in the DRC, but the retransmitted request >>>> has a different checksum than the original. We're hitting the "mismatch" >>>> case in nfsd_cache_key_cmp for these requests. >>>> >>>> I tracked down the problem to the way the DRC computes the length of the >>>> part of the buffer it wants to checksum. nfsd_cache_csum uses >>>> >>>> head.iov_len + page_len >>>> >>>> and then caps that at RC_CSUMLEN. >>>> >>>> That works fine for krb5 and sys, but the GSS unwrap functions >>>> (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len >>>> properly. So nfsd_cache_csum's length computation is significantly larger >>>> than the clear-text message, and that allows stale parts of the xdr_buf >>>> to be included in the checksum. >>>> >>>> Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf >>>> lengths properly and fixes the situation for krb5i. >>>> >>>> I don't see a similar solution for priv_unwrap_data: there's no MIC len >>>> available, and priv_len is not the actual length of the clear-text message. >>>> >>>> Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere >>>> where the relationship between the buf's head/len and how svc_defer works is >>>> authoritatively documented. It's not clear exactly how priv_unwrap_data is >>>> supposed to accommodate svc_defer, or whether integ_unwrap_data also needs >>>> to accommodate it. >>>> >>>> So I can't tell if the GSS unwrap functions are wrong or if there's a more >>>> accurate way to compute the message length in nfsd_cache_csum. I suspect >>>> both could use some improvement, but I'm not certain exactly what that >>>> might be. >>> >>> I don't know, I tried looking through that code and didn't get any >>> further than you. The gss unwrap code does look suspect to me. It >>> needs some kind of proper design, as it stands it's just an accumulation >>> of fixes. >> >> Having recently completed overhauling the client-side equivalents, I >> agree with you there. >> >> I've now convinced myself that because nfsd_cache_csum might need to >> advance into the first page of the Call message, rq_arg.head.iov_len >> must contain an accurate length so that csum_partial is applied >> correctly to the head buffer. >> >> Therefore it is the preceding code that needs to set up rq_arg.head.iov_len >> correctly. The GSS unwrap functions have to do this, and therefore these >> must be fixed. I would theorize that svc_defer also depends on head.iov_len >> being set correctly. >> >> As far as how rq_arg.len needs to be set for svc_defer, I think I need >> to have some kind of test case to examine how that path is triggered. Any >> advice appreciated. > > It's triggered on upcalls, so for example if you flush the export caches > with exports -f and then send an rpc with a filehandle, it should call > svc_defer on that request. /me puts a brown paper bag over his head Reverting 241b1f419f0e ("SUNRPC: Remove xdr_buf_trim()") seems to fix both krb5i and krb5p. I'll post an official patch once I've done a little more testing. I promise to get the Fixes: tag right :-) -- Chuck Lever ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GSS unwrapping breaks the DRC 2020-04-15 22:23 ` Chuck Lever @ 2020-04-16 0:00 ` Bruce Fields 2020-04-16 14:07 ` Chuck Lever 2020-04-17 21:48 ` Chuck Lever 1 sibling, 1 reply; 11+ messages in thread From: Bruce Fields @ 2020-04-16 0:00 UTC (permalink / raw) To: Chuck Lever; +Cc: Jeff Layton, Linux NFS Mailing List On Wed, Apr 15, 2020 at 06:23:57PM -0400, Chuck Lever wrote: > > > > On Apr 15, 2020, at 5:58 PM, Bruce Fields <bfields@fieldses.org> wrote: > > > > On Wed, Apr 15, 2020 at 04:06:17PM -0400, Chuck Lever wrote: > >> > >> > >>> On Apr 15, 2020, at 3:25 PM, Bruce Fields <bfields@fieldses.org> wrote: > >>> > >>> On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote: > >>>> Hi Bruce and Jeff: > >>>> > >>>> Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i > >>>> or krb5p results in a pretty quick workload failure. Closer examination > >>>> shows that the client is able to overrun the GSS sequence window with > >>>> some regularity. When that happens, the server drops the connection. > >>>> > >>>> However, when the client retransmits requests with lost replies, they > >>>> never hit in the DRC, and that results in unexpected failures of non- > >>>> idempotent requests. > >>>> > >>>> The retransmitted XIDs are found in the DRC, but the retransmitted request > >>>> has a different checksum than the original. We're hitting the "mismatch" > >>>> case in nfsd_cache_key_cmp for these requests. > >>>> > >>>> I tracked down the problem to the way the DRC computes the length of the > >>>> part of the buffer it wants to checksum. nfsd_cache_csum uses > >>>> > >>>> head.iov_len + page_len > >>>> > >>>> and then caps that at RC_CSUMLEN. > >>>> > >>>> That works fine for krb5 and sys, but the GSS unwrap functions > >>>> (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len > >>>> properly. So nfsd_cache_csum's length computation is significantly larger > >>>> than the clear-text message, and that allows stale parts of the xdr_buf > >>>> to be included in the checksum. > >>>> > >>>> Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf > >>>> lengths properly and fixes the situation for krb5i. > >>>> > >>>> I don't see a similar solution for priv_unwrap_data: there's no MIC len > >>>> available, and priv_len is not the actual length of the clear-text message. > >>>> > >>>> Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere > >>>> where the relationship between the buf's head/len and how svc_defer works is > >>>> authoritatively documented. It's not clear exactly how priv_unwrap_data is > >>>> supposed to accommodate svc_defer, or whether integ_unwrap_data also needs > >>>> to accommodate it. > >>>> > >>>> So I can't tell if the GSS unwrap functions are wrong or if there's a more > >>>> accurate way to compute the message length in nfsd_cache_csum. I suspect > >>>> both could use some improvement, but I'm not certain exactly what that > >>>> might be. > >>> > >>> I don't know, I tried looking through that code and didn't get any > >>> further than you. The gss unwrap code does look suspect to me. It > >>> needs some kind of proper design, as it stands it's just an accumulation > >>> of fixes. > >> > >> Having recently completed overhauling the client-side equivalents, I > >> agree with you there. > >> > >> I've now convinced myself that because nfsd_cache_csum might need to > >> advance into the first page of the Call message, rq_arg.head.iov_len > >> must contain an accurate length so that csum_partial is applied > >> correctly to the head buffer. > >> > >> Therefore it is the preceding code that needs to set up rq_arg.head.iov_len > >> correctly. The GSS unwrap functions have to do this, and therefore these > >> must be fixed. I would theorize that svc_defer also depends on head.iov_len > >> being set correctly. > >> > >> As far as how rq_arg.len needs to be set for svc_defer, I think I need > >> to have some kind of test case to examine how that path is triggered. Any > >> advice appreciated. > > > > It's triggered on upcalls, so for example if you flush the export caches > > with exports -f and then send an rpc with a filehandle, it should call > > svc_defer on that request. > > /me puts a brown paper bag over his head > > Reverting 241b1f419f0e ("SUNRPC: Remove xdr_buf_trim()") seems to fix both > krb5i and krb5p. Well, it has my ack too.... > I'll post an official patch once I've done a little more testing. I promise > to get the Fixes: tag right :-) I did have it in the back of my mind that one of us had fixed a similar bug before. Indeed, Jeff's: 4c190e2f913f "sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer" explains exactly the bug you saw. Maybe some of that changelog should move into a code comment instead. And I still think the code is more accidents waiting to happen. --b. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GSS unwrapping breaks the DRC 2020-04-16 0:00 ` Bruce Fields @ 2020-04-16 14:07 ` Chuck Lever 2020-04-16 14:28 ` Bruce Fields 0 siblings, 1 reply; 11+ messages in thread From: Chuck Lever @ 2020-04-16 14:07 UTC (permalink / raw) To: Bruce Fields; +Cc: Jeff Layton, Linux NFS Mailing List > On Apr 15, 2020, at 8:00 PM, Bruce Fields <bfields@fieldses.org> wrote: > > On Wed, Apr 15, 2020 at 06:23:57PM -0400, Chuck Lever wrote: >> >> >>> On Apr 15, 2020, at 5:58 PM, Bruce Fields <bfields@fieldses.org> wrote: >>> >>> On Wed, Apr 15, 2020 at 04:06:17PM -0400, Chuck Lever wrote: >>>> >>>> >>>>> On Apr 15, 2020, at 3:25 PM, Bruce Fields <bfields@fieldses.org> wrote: >>>>> >>>>> On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote: >>>>>> Hi Bruce and Jeff: >>>>>> >>>>>> Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i >>>>>> or krb5p results in a pretty quick workload failure. Closer examination >>>>>> shows that the client is able to overrun the GSS sequence window with >>>>>> some regularity. When that happens, the server drops the connection. >>>>>> >>>>>> However, when the client retransmits requests with lost replies, they >>>>>> never hit in the DRC, and that results in unexpected failures of non- >>>>>> idempotent requests. >>>>>> >>>>>> The retransmitted XIDs are found in the DRC, but the retransmitted request >>>>>> has a different checksum than the original. We're hitting the "mismatch" >>>>>> case in nfsd_cache_key_cmp for these requests. >>>>>> >>>>>> I tracked down the problem to the way the DRC computes the length of the >>>>>> part of the buffer it wants to checksum. nfsd_cache_csum uses >>>>>> >>>>>> head.iov_len + page_len >>>>>> >>>>>> and then caps that at RC_CSUMLEN. >>>>>> >>>>>> That works fine for krb5 and sys, but the GSS unwrap functions >>>>>> (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len >>>>>> properly. So nfsd_cache_csum's length computation is significantly larger >>>>>> than the clear-text message, and that allows stale parts of the xdr_buf >>>>>> to be included in the checksum. >>>>>> >>>>>> Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf >>>>>> lengths properly and fixes the situation for krb5i. >>>>>> >>>>>> I don't see a similar solution for priv_unwrap_data: there's no MIC len >>>>>> available, and priv_len is not the actual length of the clear-text message. >>>>>> >>>>>> Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere >>>>>> where the relationship between the buf's head/len and how svc_defer works is >>>>>> authoritatively documented. It's not clear exactly how priv_unwrap_data is >>>>>> supposed to accommodate svc_defer, or whether integ_unwrap_data also needs >>>>>> to accommodate it. >>>>>> >>>>>> So I can't tell if the GSS unwrap functions are wrong or if there's a more >>>>>> accurate way to compute the message length in nfsd_cache_csum. I suspect >>>>>> both could use some improvement, but I'm not certain exactly what that >>>>>> might be. >>>>> >>>>> I don't know, I tried looking through that code and didn't get any >>>>> further than you. The gss unwrap code does look suspect to me. It >>>>> needs some kind of proper design, as it stands it's just an accumulation >>>>> of fixes. >>>> >>>> Having recently completed overhauling the client-side equivalents, I >>>> agree with you there. >>>> >>>> I've now convinced myself that because nfsd_cache_csum might need to >>>> advance into the first page of the Call message, rq_arg.head.iov_len >>>> must contain an accurate length so that csum_partial is applied >>>> correctly to the head buffer. >>>> >>>> Therefore it is the preceding code that needs to set up rq_arg.head.iov_len >>>> correctly. The GSS unwrap functions have to do this, and therefore these >>>> must be fixed. I would theorize that svc_defer also depends on head.iov_len >>>> being set correctly. >>>> >>>> As far as how rq_arg.len needs to be set for svc_defer, I think I need >>>> to have some kind of test case to examine how that path is triggered. Any >>>> advice appreciated. >>> >>> It's triggered on upcalls, so for example if you flush the export caches >>> with exports -f and then send an rpc with a filehandle, it should call >>> svc_defer on that request. >> >> /me puts a brown paper bag over his head >> >> Reverting 241b1f419f0e ("SUNRPC: Remove xdr_buf_trim()") seems to fix both >> krb5i and krb5p. > > Well, it has my ack too.... > >> I'll post an official patch once I've done a little more testing. I promise >> to get the Fixes: tag right :-) > > I did have it in the back of my mind that one of us had fixed a similar > bug before. Indeed, Jeff's: > > 4c190e2f913f "sunrpc: trim off trailing checksum before > returning decrypted or integrity authenticated buffer" > > explains exactly the bug you saw. Yep, I agree with the purpose of 4c190e2f913f. The operation of xdr_bufs is different on the client and server in many subtle ways, which I probably still don't completely understand. The piece I missed was that the server ignores buf->len in many cases, preferring instead to use the xdr_buf segment lengths. So simply updating buf->len is insufficient on the server side. > Maybe some of that changelog should move into a code comment instead. The patch I'm testing is a mechanical revert. It doesn't improve the comments, and restores a BUG_ON that 241b1f419f0e removed. I'm putting off clean-ups for the moment because I'd like to see this fix in 5.7-rc. > And I still think the code is more accidents waiting to happen. The bigger picture is updating the server to use xdr_stream throughout its XDR stack. That's the main reason I worked on the client-side GSS wrap and unwrap functions last year. Using xdr_stream would move the server and client sides closer together in style and implementation, then hopefully we could share more code. -- Chuck Lever ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GSS unwrapping breaks the DRC 2020-04-16 14:07 ` Chuck Lever @ 2020-04-16 14:28 ` Bruce Fields 0 siblings, 0 replies; 11+ messages in thread From: Bruce Fields @ 2020-04-16 14:28 UTC (permalink / raw) To: Chuck Lever; +Cc: Jeff Layton, Linux NFS Mailing List On Thu, Apr 16, 2020 at 10:07:27AM -0400, Chuck Lever wrote: > The bigger picture is updating the server to use xdr_stream throughout > its XDR stack. That's the main reason I worked on the client-side GSS > wrap and unwrap functions last year. > > Using xdr_stream would move the server and client sides closer together > in style and implementation, then hopefully we could share more code. I'm all for that, though I'm not sure it's the same problem. The krb5i/krb5p implementation isn't based on xdr_stream on either the client or the server. But yes maybe it would force thinking about what the different xdr_buf fields mean in a way that would clarify things. I don't know. --b. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GSS unwrapping breaks the DRC 2020-04-15 22:23 ` Chuck Lever 2020-04-16 0:00 ` Bruce Fields @ 2020-04-17 21:48 ` Chuck Lever 2020-04-23 19:34 ` Bruce Fields 1 sibling, 1 reply; 11+ messages in thread From: Chuck Lever @ 2020-04-17 21:48 UTC (permalink / raw) To: Bruce Fields, Trond Myklebust; +Cc: Jeff Layton, Linux NFS Mailing List > On Apr 15, 2020, at 6:23 PM, Chuck Lever <chuck.lever@oracle.com> wrote: > > > >> On Apr 15, 2020, at 5:58 PM, Bruce Fields <bfields@fieldses.org> wrote: >> >> On Wed, Apr 15, 2020 at 04:06:17PM -0400, Chuck Lever wrote: >>> >>> >>>> On Apr 15, 2020, at 3:25 PM, Bruce Fields <bfields@fieldses.org> wrote: >>>> >>>> On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote: >>>>> Hi Bruce and Jeff: >>>>> >>>>> Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i >>>>> or krb5p results in a pretty quick workload failure. Closer examination >>>>> shows that the client is able to overrun the GSS sequence window with >>>>> some regularity. When that happens, the server drops the connection. >>>>> >>>>> However, when the client retransmits requests with lost replies, they >>>>> never hit in the DRC, and that results in unexpected failures of non- >>>>> idempotent requests. >>>>> >>>>> The retransmitted XIDs are found in the DRC, but the retransmitted request >>>>> has a different checksum than the original. We're hitting the "mismatch" >>>>> case in nfsd_cache_key_cmp for these requests. >>>>> >>>>> I tracked down the problem to the way the DRC computes the length of the >>>>> part of the buffer it wants to checksum. nfsd_cache_csum uses >>>>> >>>>> head.iov_len + page_len >>>>> >>>>> and then caps that at RC_CSUMLEN. >>>>> >>>>> That works fine for krb5 and sys, but the GSS unwrap functions >>>>> (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len >>>>> properly. So nfsd_cache_csum's length computation is significantly larger >>>>> than the clear-text message, and that allows stale parts of the xdr_buf >>>>> to be included in the checksum. >>>>> >>>>> Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf >>>>> lengths properly and fixes the situation for krb5i. >>>>> >>>>> I don't see a similar solution for priv_unwrap_data: there's no MIC len >>>>> available, and priv_len is not the actual length of the clear-text message. >>>>> >>>>> Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere >>>>> where the relationship between the buf's head/len and how svc_defer works is >>>>> authoritatively documented. It's not clear exactly how priv_unwrap_data is >>>>> supposed to accommodate svc_defer, or whether integ_unwrap_data also needs >>>>> to accommodate it. >>>>> >>>>> So I can't tell if the GSS unwrap functions are wrong or if there's a more >>>>> accurate way to compute the message length in nfsd_cache_csum. I suspect >>>>> both could use some improvement, but I'm not certain exactly what that >>>>> might be. >>>> >>>> I don't know, I tried looking through that code and didn't get any >>>> further than you. The gss unwrap code does look suspect to me. It >>>> needs some kind of proper design, as it stands it's just an accumulation >>>> of fixes. >>> >>> Having recently completed overhauling the client-side equivalents, I >>> agree with you there. >>> >>> I've now convinced myself that because nfsd_cache_csum might need to >>> advance into the first page of the Call message, rq_arg.head.iov_len >>> must contain an accurate length so that csum_partial is applied >>> correctly to the head buffer. >>> >>> Therefore it is the preceding code that needs to set up rq_arg.head.iov_len >>> correctly. The GSS unwrap functions have to do this, and therefore these >>> must be fixed. I would theorize that svc_defer also depends on head.iov_len >>> being set correctly. >>> >>> As far as how rq_arg.len needs to be set for svc_defer, I think I need >>> to have some kind of test case to examine how that path is triggered. Any >>> advice appreciated. >> >> It's triggered on upcalls, so for example if you flush the export caches >> with exports -f and then send an rpc with a filehandle, it should call >> svc_defer on that request. > > /me puts a brown paper bag over his head > > Reverting 241b1f419f0e ("SUNRPC: Remove xdr_buf_trim()") seems to fix both > krb5i and krb5p. > > I'll post an official patch once I've done a little more testing. I promise > to get the Fixes: tag right :-) I've hit a snag here. I reverted 241b1f419f0e on my server, and all tests completed successfully. I reverted 241b1f419f0e on my client, and now krb5p is failing. Using xdr_buf_trim does the right thing on the server, but on the client it has exposed a latent bug in gss_unwrap_resp_priv() (ie, the bug does not appear to be harmful until 241b1f419f0e has been reverted). The calculation of au_ralign in that function is wrong, and that forces rpc_prepare_reply_pages to allocate a zero-length tail. xdr_buf_trim() then lops off the end of each subsequent clear-text RPC message, and eventually a short READ results in test failures. After experimenting with this for a day, I don't see any way for gss_unwrap_resp_priv() to estimate au_ralign based on what gss_unwrap() has done to the xdr_buf. The kerberos_v1 unwrap method does not appear to have any trailing checksum, for example, but v2 does. The best approach for now seems to be to have the pseudoflavor-specific unwrap methods return the correct ralign value. A straightforward way to do this would be to add a *int parameter to ->gss_unwrap that would be set to the proper value; or hide that value somewhere in the xdr_buf. Any other thoughts or random bits of inspiration? -- Chuck Lever ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GSS unwrapping breaks the DRC 2020-04-17 21:48 ` Chuck Lever @ 2020-04-23 19:34 ` Bruce Fields 2020-04-23 19:41 ` Chuck Lever 0 siblings, 1 reply; 11+ messages in thread From: Bruce Fields @ 2020-04-23 19:34 UTC (permalink / raw) To: Chuck Lever; +Cc: Trond Myklebust, Jeff Layton, Linux NFS Mailing List On Fri, Apr 17, 2020 at 05:48:35PM -0400, Chuck Lever wrote: > I've hit a snag here. > > I reverted 241b1f419f0e on my server, and all tests completed > successfully. > > I reverted 241b1f419f0e on my client, and now krb5p is failing. Using > xdr_buf_trim does the right thing on the server, but on the client it > has exposed a latent bug in gss_unwrap_resp_priv() (ie, the bug does > not appear to be harmful until 241b1f419f0e has been reverted). > > The calculation of au_ralign in that function is wrong, and that forces > rpc_prepare_reply_pages to allocate a zero-length tail. xdr_buf_trim() > then lops off the end of each subsequent clear-text RPC message, and > eventually a short READ results in test failures. > > After experimenting with this for a day, I don't see any way for > gss_unwrap_resp_priv() to estimate au_ralign based on what gss_unwrap() > has done to the xdr_buf. The kerberos_v1 unwrap method does not appear > to have any trailing checksum, for example, but v2 does. > > The best approach for now seems to be to have the pseudoflavor-specific > unwrap methods return the correct ralign value. A straightforward way > to do this would be to add a *int parameter to ->gss_unwrap that would > be set to the proper value; or hide that value somewhere in the xdr_buf. > > Any other thoughts or random bits of inspiration? No. Among other things, a quick skim wasn't enough to remind me what au_ralign is and why we have both that and au_rslack.... Sorry! I've got not much to offer but sympathy. ... I have a random thought out of left field: after the xdr_stream conversion, fs/nfs/nfs4xdr.c mostly doesn't deal directly with the reply buffer any more. It calls xdr_inline_decode(xdr, n) and gets back a pointer to the next n bytes of rpc data. Or it calls xdr_read_pages() after which read data's supposed to be moved to the right place. Would it be possible to delay rpcsec_gss decoding until then? Would that make things simpler or more complicated? Eh, I think the answer is probably "more complicated". --b. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GSS unwrapping breaks the DRC 2020-04-23 19:34 ` Bruce Fields @ 2020-04-23 19:41 ` Chuck Lever 0 siblings, 0 replies; 11+ messages in thread From: Chuck Lever @ 2020-04-23 19:41 UTC (permalink / raw) To: Bruce Fields; +Cc: Trond Myklebust, Jeff Layton, Linux NFS Mailing List Hi Bruce- > On Apr 23, 2020, at 3:34 PM, Bruce Fields <bfields@fieldses.org> wrote: > > On Fri, Apr 17, 2020 at 05:48:35PM -0400, Chuck Lever wrote: >> I've hit a snag here. >> >> I reverted 241b1f419f0e on my server, and all tests completed >> successfully. >> >> I reverted 241b1f419f0e on my client, and now krb5p is failing. Using >> xdr_buf_trim does the right thing on the server, but on the client it >> has exposed a latent bug in gss_unwrap_resp_priv() (ie, the bug does >> not appear to be harmful until 241b1f419f0e has been reverted). >> >> The calculation of au_ralign in that function is wrong, and that forces >> rpc_prepare_reply_pages to allocate a zero-length tail. xdr_buf_trim() >> then lops off the end of each subsequent clear-text RPC message, and >> eventually a short READ results in test failures. >> >> After experimenting with this for a day, I don't see any way for >> gss_unwrap_resp_priv() to estimate au_ralign based on what gss_unwrap() >> has done to the xdr_buf. The kerberos_v1 unwrap method does not appear >> to have any trailing checksum, for example, but v2 does. >> >> The best approach for now seems to be to have the pseudoflavor-specific >> unwrap methods return the correct ralign value. A straightforward way >> to do this would be to add a *int parameter to ->gss_unwrap that would >> be set to the proper value; or hide that value somewhere in the xdr_buf. >> >> Any other thoughts or random bits of inspiration? > > No. Among other things, a quick skim wasn't enough to remind me what > au_ralign is and why we have both that and au_rslack.... Sorry! I've > got not much to offer but sympathy. I added au_ralign last year to deal with where the beginning of the encapsulated upper layer message lands. So: au_verfsize - the size of the RPC header's verifier field au_ralign - the offset of the start of the upper layer results au_rslack - the total size of the RPC header and results Note that the krb5_v2 unwrapper deals with this already. It's got headskip and tailskip. The v1 unwrapper doesn't need to worry about trailing GSS data. The purpose of this is to ensure that the upper layer's data payload (ie, what is supposed to land in the buf->pages most of the time) is not going to need to be re-aligned via a memcpy. au_ralign is used to this effect in rpc_prepare_reply_pages when setting up rq_rcv_buf. > ... > > I have a random thought out of left field: after the xdr_stream > conversion, fs/nfs/nfs4xdr.c mostly doesn't deal directly with the reply > buffer any more. It calls xdr_inline_decode(xdr, n) and gets back a > pointer to the next n bytes of rpc data. Or it calls xdr_read_pages() > after which read data's supposed to be moved to the right place. > > Would it be possible to delay rpcsec_gss decoding until then? Would > that make things simpler or more complicated? > > Eh, I think the answer is probably "more complicated". > > --b. -- Chuck Lever ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-04-23 19:44 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-04-15 17:05 GSS unwrapping breaks the DRC Chuck Lever 2020-04-15 19:25 ` Bruce Fields 2020-04-15 20:06 ` Chuck Lever 2020-04-15 21:58 ` Bruce Fields 2020-04-15 22:23 ` Chuck Lever 2020-04-16 0:00 ` Bruce Fields 2020-04-16 14:07 ` Chuck Lever 2020-04-16 14:28 ` Bruce Fields 2020-04-17 21:48 ` Chuck Lever 2020-04-23 19:34 ` Bruce Fields 2020-04-23 19:41 ` Chuck Lever
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.