* GSS sequence number window @ 2017-05-30 18:58 Chuck Lever 2017-05-30 19:34 ` J. Bruce Fields 0 siblings, 1 reply; 14+ messages in thread From: Chuck Lever @ 2017-05-30 18:58 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Linux NFS Mailing List Hey Bruce! While testing with sec=krb5 and sec=krb5i, I noticed a lot of spurious connection loss, especially when I wanted to run a CPU-intensive workload on my NFS server at the same time I was testing. I added a pr_err() in gss_check_seq_num, and ran a fio job on a vers=3,sec=sys,proto=tcp mount (server is exporting a tmpfs). On the server, I rebuilt a kernel source tree cscope database at the same time. May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128 May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128 May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128 May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128 May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128 May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128 May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128 May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128 May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128 May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128 RFC 2203 suggests there's no risk to using a large window. My first thought was to make the sequence window larger (say 2048) but I've seen stragglers outside even that large a window. Any thoughts about why there are these sequence number outliers? -- Chuck Lever ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-05-30 18:58 GSS sequence number window Chuck Lever @ 2017-05-30 19:34 ` J. Bruce Fields 2017-05-30 20:11 ` Benjamin Coddington 2017-05-30 21:03 ` Chuck Lever 0 siblings, 2 replies; 14+ messages in thread From: J. Bruce Fields @ 2017-05-30 19:34 UTC (permalink / raw) To: Chuck Lever; +Cc: Linux NFS Mailing List On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote: > Hey Bruce! > > While testing with sec=krb5 and sec=krb5i, I noticed a lot of > spurious connection loss, especially when I wanted to run a > CPU-intensive workload on my NFS server at the same time I > was testing. > > I added a pr_err() in gss_check_seq_num, and ran a fio job > on a vers=3,sec=sys,proto=tcp mount (server is exporting a > tmpfs). On the server, I rebuilt a kernel source tree cscope > database at the same time. > > May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128 > May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128 > May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128 > May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128 > May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128 > May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128 > May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128 > May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128 > May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128 > May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128 > > RFC 2203 suggests there's no risk to using a large window. > My first thought was to make the sequence window larger > (say 2048) but I've seen stragglers outside even that large > a window. > > Any thoughts about why there are these sequence number > outliers? No, alas. The server's where it's easier to catch, but it'd be interesting to know what's going on on the client side when this happens. Might be interesting to know just what the operation (or compound ops) are in the bad cases, in case there's some pattern. Might also be interesting to check where exactly in the client code the sequence number is assigned and where in the server code it's checked, and think about where they might get queued up or reordered in between. Is it possible there are retries involved? Does increasing the number of server threads help? I don't think it'd be a problem to increase the sequence window. --b. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-05-30 19:34 ` J. Bruce Fields @ 2017-05-30 20:11 ` Benjamin Coddington 2017-05-31 19:22 ` J. Bruce Fields 2017-05-30 21:03 ` Chuck Lever 1 sibling, 1 reply; 14+ messages in thread From: Benjamin Coddington @ 2017-05-30 20:11 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Chuck Lever, Linux NFS Mailing List On 30 May 2017, at 15:34, J. Bruce Fields wrote: > On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote: >> Hey Bruce! >> >> While testing with sec=krb5 and sec=krb5i, I noticed a lot of >> spurious connection loss, especially when I wanted to run a >> CPU-intensive workload on my NFS server at the same time I >> was testing. >> >> I added a pr_err() in gss_check_seq_num, and ran a fio job >> on a vers=3,sec=sys,proto=tcp mount (server is exporting a >> tmpfs). On the server, I rebuilt a kernel source tree cscope >> database at the same time. >> >> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, >> sd_max = 250291, GSS_SEQ_WIN = 128 >> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, >> sd_max = 938171, GSS_SEQ_WIN = 128 >> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, >> sd_max = 938727, GSS_SEQ_WIN = 128 >> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, >> sd_max = 938727, GSS_SEQ_WIN = 128 >> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, >> sd_max = 939549, GSS_SEQ_WIN = 128 >> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, >> sd_max = 965176, GSS_SEQ_WIN = 128 >> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, >> sd_max = 1799982, GSS_SEQ_WIN = 128 >> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, >> sd_max = 1831353, GSS_SEQ_WIN = 128 >> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, >> sd_max = 1883761, GSS_SEQ_WIN = 128 >> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, >> sd_max = 1959447, GSS_SEQ_WIN = 128 >> >> RFC 2203 suggests there's no risk to using a large window. >> My first thought was to make the sequence window larger >> (say 2048) but I've seen stragglers outside even that large >> a window. >> >> Any thoughts about why there are these sequence number >> outliers? > > No, alas. I noticed some slow allocations on the server with krb5 last year - but never got around to doing anything about it: http://marc.info/?t=146032122900006&r=1&w=2 Could be the same thing? Ben ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-05-30 20:11 ` Benjamin Coddington @ 2017-05-31 19:22 ` J. Bruce Fields 2017-05-31 20:27 ` Chuck Lever 0 siblings, 1 reply; 14+ messages in thread From: J. Bruce Fields @ 2017-05-31 19:22 UTC (permalink / raw) To: Benjamin Coddington; +Cc: Chuck Lever, Linux NFS Mailing List On Tue, May 30, 2017 at 04:11:20PM -0400, Benjamin Coddington wrote: > On 30 May 2017, at 15:34, J. Bruce Fields wrote: > > >On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote: > >>Hey Bruce! > >> > >>While testing with sec=krb5 and sec=krb5i, I noticed a lot of > >>spurious connection loss, especially when I wanted to run a > >>CPU-intensive workload on my NFS server at the same time I > >>was testing. > >> > >>I added a pr_err() in gss_check_seq_num, and ran a fio job > >>on a vers=3,sec=sys,proto=tcp mount (server is exporting a > >>tmpfs). On the server, I rebuilt a kernel source tree cscope > >>database at the same time. > >> > >>May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = > >>250098, sd_max = 250291, GSS_SEQ_WIN = 128 > >>May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = > >>937816, sd_max = 938171, GSS_SEQ_WIN = 128 > >>May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = > >>938544, sd_max = 938727, GSS_SEQ_WIN = 128 > >>May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = > >>938543, sd_max = 938727, GSS_SEQ_WIN = 128 > >>May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = > >>939344, sd_max = 939549, GSS_SEQ_WIN = 128 > >>May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = > >>965007, sd_max = 965176, GSS_SEQ_WIN = 128 > >>May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = > >>1799710, sd_max = 1799982, GSS_SEQ_WIN = 128 > >>May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = > >>1831165, sd_max = 1831353, GSS_SEQ_WIN = 128 > >>May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = > >>1883583, sd_max = 1883761, GSS_SEQ_WIN = 128 > >>May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = > >>1959316, sd_max = 1959447, GSS_SEQ_WIN = 128 > >> > >>RFC 2203 suggests there's no risk to using a large window. > >>My first thought was to make the sequence window larger > >>(say 2048) but I've seen stragglers outside even that large > >>a window. > >> > >>Any thoughts about why there are these sequence number > >>outliers? > > > >No, alas. > > I noticed some slow allocations on the server with krb5 last year - but > never got around to doing anything about it: > http://marc.info/?t=146032122900006&r=1&w=2 > > Could be the same thing? I don't think it would be too hard to eliminate the need for allocations there. Or maybe there's even a quick hack that would let Chuck test whether that's the problem (different GFP flags on those allocations?) --b. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-05-31 19:22 ` J. Bruce Fields @ 2017-05-31 20:27 ` Chuck Lever 2017-06-06 19:35 ` Chuck Lever 0 siblings, 1 reply; 14+ messages in thread From: Chuck Lever @ 2017-05-31 20:27 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Benjamin Coddington, Linux NFS Mailing List > On May 31, 2017, at 3:22 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Tue, May 30, 2017 at 04:11:20PM -0400, Benjamin Coddington wrote: >> On 30 May 2017, at 15:34, J. Bruce Fields wrote: >> >>> On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote: >>>> Hey Bruce! >>>> >>>> While testing with sec=krb5 and sec=krb5i, I noticed a lot of >>>> spurious connection loss, especially when I wanted to run a >>>> CPU-intensive workload on my NFS server at the same time I >>>> was testing. >>>> >>>> I added a pr_err() in gss_check_seq_num, and ran a fio job >>>> on a vers=3,sec=sys,proto=tcp mount (server is exporting a >>>> tmpfs). On the server, I rebuilt a kernel source tree cscope >>>> database at the same time. >>>> >>>> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = >>>> 250098, sd_max = 250291, GSS_SEQ_WIN = 128 >>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = >>>> 937816, sd_max = 938171, GSS_SEQ_WIN = 128 >>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = >>>> 938544, sd_max = 938727, GSS_SEQ_WIN = 128 >>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = >>>> 938543, sd_max = 938727, GSS_SEQ_WIN = 128 >>>> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = >>>> 939344, sd_max = 939549, GSS_SEQ_WIN = 128 >>>> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = >>>> 965007, sd_max = 965176, GSS_SEQ_WIN = 128 >>>> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = >>>> 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128 >>>> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = >>>> 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128 >>>> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = >>>> 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128 >>>> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = >>>> 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128 >>>> >>>> RFC 2203 suggests there's no risk to using a large window. >>>> My first thought was to make the sequence window larger >>>> (say 2048) but I've seen stragglers outside even that large >>>> a window. >>>> >>>> Any thoughts about why there are these sequence number >>>> outliers? >>> >>> No, alas. >> >> I noticed some slow allocations on the server with krb5 last year - but >> never got around to doing anything about it: >> http://marc.info/?t=146032122900006&r=1&w=2 >> >> Could be the same thing? > > I don't think it would be too hard to eliminate the need for allocations > there. Or maybe there's even a quick hack that would let Chuck test > whether that's the problem (different GFP flags on those allocations?) This doesn't feel like a memory allocator issue. My server has loads of free memory, and there's just one NUMA node. Since increasing nfsd thread count seems to improve the situation, my guess is that having more threads available makes it less likely that an RPC request will be reordered. -- Chuck Lever ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-05-31 20:27 ` Chuck Lever @ 2017-06-06 19:35 ` Chuck Lever 2017-06-06 19:41 ` J. Bruce Fields 0 siblings, 1 reply; 14+ messages in thread From: Chuck Lever @ 2017-06-06 19:35 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Benjamin Coddington, Linux NFS Mailing List > On May 31, 2017, at 4:27 PM, Chuck Lever <chuck.lever@oracle.com> wrote: > >> >> On May 31, 2017, at 3:22 PM, J. Bruce Fields <bfields@fieldses.org> wrote: >> >> On Tue, May 30, 2017 at 04:11:20PM -0400, Benjamin Coddington wrote: >>> On 30 May 2017, at 15:34, J. Bruce Fields wrote: >>> >>>> On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote: >>>>> Hey Bruce! >>>>> >>>>> While testing with sec=krb5 and sec=krb5i, I noticed a lot of >>>>> spurious connection loss, especially when I wanted to run a >>>>> CPU-intensive workload on my NFS server at the same time I >>>>> was testing. >>>>> >>>>> I added a pr_err() in gss_check_seq_num, and ran a fio job >>>>> on a vers=3,sec=sys,proto=tcp mount (server is exporting a >>>>> tmpfs). On the server, I rebuilt a kernel source tree cscope >>>>> database at the same time. >>>>> >>>>> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = >>>>> 250098, sd_max = 250291, GSS_SEQ_WIN = 128 >>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = >>>>> 937816, sd_max = 938171, GSS_SEQ_WIN = 128 >>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = >>>>> 938544, sd_max = 938727, GSS_SEQ_WIN = 128 >>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = >>>>> 938543, sd_max = 938727, GSS_SEQ_WIN = 128 >>>>> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = >>>>> 939344, sd_max = 939549, GSS_SEQ_WIN = 128 >>>>> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = >>>>> 965007, sd_max = 965176, GSS_SEQ_WIN = 128 >>>>> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = >>>>> 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128 >>>>> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = >>>>> 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128 >>>>> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = >>>>> 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128 >>>>> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = >>>>> 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128 >>>>> >>>>> RFC 2203 suggests there's no risk to using a large window. >>>>> My first thought was to make the sequence window larger >>>>> (say 2048) but I've seen stragglers outside even that large >>>>> a window. >>>>> >>>>> Any thoughts about why there are these sequence number >>>>> outliers? >>>> >>>> No, alas. >>> >>> I noticed some slow allocations on the server with krb5 last year - but >>> never got around to doing anything about it: >>> http://marc.info/?t=146032122900006&r=1&w=2 >>> >>> Could be the same thing? >> >> I don't think it would be too hard to eliminate the need for allocations >> there. Or maybe there's even a quick hack that would let Chuck test >> whether that's the problem (different GFP flags on those allocations?) > > This doesn't feel like a memory allocator issue. My server has > loads of free memory, and there's just one NUMA node. > > Since increasing nfsd thread count seems to improve the situation, > my guess is that having more threads available makes it less likely > that an RPC request will be reordered. I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306 To check memory allocation latency, I could always construct a framework around kmalloc and alloc_page. I've also found some bad behavior around proto=rdma,sec=krb5i. When I run a heavy I/O workload (fio, for example), every so often a read operation fails with EIO. I dug into it a little and MIC verification fails for these replies on the client. I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=307 to document this second issue. I'm not sure what a next step would be. My suspicion is that either the server or the client is mishandling the RPC reply buffer, which causes the checksums to be different. Not sure why this would be so intermittent, though. -- Chuck Lever ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-06-06 19:35 ` Chuck Lever @ 2017-06-06 19:41 ` J. Bruce Fields 2017-06-06 19:45 ` Chuck Lever 0 siblings, 1 reply; 14+ messages in thread From: J. Bruce Fields @ 2017-06-06 19:41 UTC (permalink / raw) To: Chuck Lever; +Cc: Benjamin Coddington, Linux NFS Mailing List On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote: > I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306 > > To check memory allocation latency, I could always construct > a framework around kmalloc and alloc_page. > > > I've also found some bad behavior around proto=rdma,sec=krb5i. > When I run a heavy I/O workload (fio, for example), every so > often a read operation fails with EIO. I dug into it a little > and MIC verification fails for these replies on the client. Do we still have the problem that the read data can change between the time we calculate the MIC and the time we transmit the data to the client? --b. > > I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=307 > to document this second issue. > > I'm not sure what a next step would be. My suspicion is that > either the server or the client is mishandling the RPC reply > buffer, which causes the checksums to be different. Not sure > why this would be so intermittent, though. > > > -- > Chuck Lever > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-06-06 19:41 ` J. Bruce Fields @ 2017-06-06 19:45 ` Chuck Lever 2017-06-06 20:15 ` J. Bruce Fields 0 siblings, 1 reply; 14+ messages in thread From: Chuck Lever @ 2017-06-06 19:45 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Benjamin Coddington, Linux NFS Mailing List > On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote: >> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306 >> >> To check memory allocation latency, I could always construct >> a framework around kmalloc and alloc_page. >> >> >> I've also found some bad behavior around proto=rdma,sec=krb5i. >> When I run a heavy I/O workload (fio, for example), every so >> often a read operation fails with EIO. I dug into it a little >> and MIC verification fails for these replies on the client. > > Do we still have the problem that the read data can change between the > time we calculate the MIC and the time we transmit the data to the > client? I don't see a problem with krb5p, which, if IIUC, would also fall victim to this situation, unless there is much stricter request serialization going on with krb5p. Even so, how would I detect if this issue was present? > --b. > >> >> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=307 >> to document this second issue. >> >> I'm not sure what a next step would be. My suspicion is that >> either the server or the client is mishandling the RPC reply >> buffer, which causes the checksums to be different. Not sure >> why this would be so intermittent, though. -- Chuck Lever ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-06-06 19:45 ` Chuck Lever @ 2017-06-06 20:15 ` J. Bruce Fields 2017-06-06 20:16 ` Chuck Lever 0 siblings, 1 reply; 14+ messages in thread From: J. Bruce Fields @ 2017-06-06 20:15 UTC (permalink / raw) To: Chuck Lever; +Cc: Benjamin Coddington, Linux NFS Mailing List On Tue, Jun 06, 2017 at 03:45:59PM -0400, Chuck Lever wrote: > > > On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote: > >> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306 > >> > >> To check memory allocation latency, I could always construct > >> a framework around kmalloc and alloc_page. > >> > >> > >> I've also found some bad behavior around proto=rdma,sec=krb5i. > >> When I run a heavy I/O workload (fio, for example), every so > >> often a read operation fails with EIO. I dug into it a little > >> and MIC verification fails for these replies on the client. > > > > Do we still have the problem that the read data can change between the > > time we calculate the MIC and the time we transmit the data to the > > client? > > I don't see a problem with krb5p, which, if IIUC, would also > fall victim to this situation, unless there is much stricter > request serialization going on with krb5p. We turn off zero-copy by clearing RQ_SPLICE_OK in the krb5p case. > Even so, how would I detect if this issue was present? Good question. If you knew the data and mic in the bad case, and had some way to guess what the previous data might have been based on what you knew about the test, then you could try mic's of likely older versions of the data and see if you get a match.... That sounds hard. --b. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-06-06 20:15 ` J. Bruce Fields @ 2017-06-06 20:16 ` Chuck Lever 2017-06-06 20:23 ` J. Bruce Fields 0 siblings, 1 reply; 14+ messages in thread From: Chuck Lever @ 2017-06-06 20:16 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Benjamin Coddington, Linux NFS Mailing List > On Jun 6, 2017, at 4:15 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Tue, Jun 06, 2017 at 03:45:59PM -0400, Chuck Lever wrote: >> >>> On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <bfields@fieldses.org> wrote: >>> >>> On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote: >>>> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306 >>>> >>>> To check memory allocation latency, I could always construct >>>> a framework around kmalloc and alloc_page. >>>> >>>> >>>> I've also found some bad behavior around proto=rdma,sec=krb5i. >>>> When I run a heavy I/O workload (fio, for example), every so >>>> often a read operation fails with EIO. I dug into it a little >>>> and MIC verification fails for these replies on the client. >>> >>> Do we still have the problem that the read data can change between the >>> time we calculate the MIC and the time we transmit the data to the >>> client? >> >> I don't see a problem with krb5p, which, if IIUC, would also >> fall victim to this situation, unless there is much stricter >> request serialization going on with krb5p. > > We turn off zero-copy by clearing RQ_SPLICE_OK in the krb5p case. Seems like this is the right answer for krb5i too. Shall I try that? >> Even so, how would I detect if this issue was present? > > Good question. If you knew the data and mic in the bad case, and had > some way to guess what the previous data might have been based on what > you knew about the test, then you could try mic's of likely older > versions of the data and see if you get a match.... That sounds hard. > > --b. -- Chuck Lever ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-06-06 20:16 ` Chuck Lever @ 2017-06-06 20:23 ` J. Bruce Fields 2017-06-06 20:54 ` Chuck Lever 0 siblings, 1 reply; 14+ messages in thread From: J. Bruce Fields @ 2017-06-06 20:23 UTC (permalink / raw) To: Chuck Lever; +Cc: Benjamin Coddington, Linux NFS Mailing List On Tue, Jun 06, 2017 at 04:16:53PM -0400, Chuck Lever wrote: > > > On Jun 6, 2017, at 4:15 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Tue, Jun 06, 2017 at 03:45:59PM -0400, Chuck Lever wrote: > >> > >>> On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > >>> > >>> On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote: > >>>> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306 > >>>> > >>>> To check memory allocation latency, I could always construct > >>>> a framework around kmalloc and alloc_page. > >>>> > >>>> > >>>> I've also found some bad behavior around proto=rdma,sec=krb5i. > >>>> When I run a heavy I/O workload (fio, for example), every so > >>>> often a read operation fails with EIO. I dug into it a little > >>>> and MIC verification fails for these replies on the client. > >>> > >>> Do we still have the problem that the read data can change between the > >>> time we calculate the MIC and the time we transmit the data to the > >>> client? > >> > >> I don't see a problem with krb5p, which, if IIUC, would also > >> fall victim to this situation, unless there is much stricter > >> request serialization going on with krb5p. > > > > We turn off zero-copy by clearing RQ_SPLICE_OK in the krb5p case. > > Seems like this is the right answer for krb5i too. Shall I try that? Sure! Just grep around for RQ_SPLICE_OK, I think it should be easy to figure out. --b. > > > >> Even so, how would I detect if this issue was present? > > > > Good question. If you knew the data and mic in the bad case, and had > > some way to guess what the previous data might have been based on what > > you knew about the test, then you could try mic's of likely older > > versions of the data and see if you get a match.... That sounds hard. > > > > --b. > > -- > Chuck Lever > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-06-06 20:23 ` J. Bruce Fields @ 2017-06-06 20:54 ` Chuck Lever 2017-06-06 20:56 ` J. Bruce Fields 0 siblings, 1 reply; 14+ messages in thread From: Chuck Lever @ 2017-06-06 20:54 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Benjamin Coddington, Linux NFS Mailing List > On Jun 6, 2017, at 4:23 PM, J. Bruce Fields <bfields@fieldses.org> = wrote: >=20 > On Tue, Jun 06, 2017 at 04:16:53PM -0400, Chuck Lever wrote: >>=20 >>> On Jun 6, 2017, at 4:15 PM, J. Bruce Fields <bfields@fieldses.org> = wrote: >>>=20 >>> On Tue, Jun 06, 2017 at 03:45:59PM -0400, Chuck Lever wrote: >>>>=20 >>>>> On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <bfields@fieldses.org> = wrote: >>>>>=20 >>>>> On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote: >>>>>> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=3D306 >>>>>>=20 >>>>>> To check memory allocation latency, I could always construct >>>>>> a framework around kmalloc and alloc_page. >>>>>>=20 >>>>>>=20 >>>>>> I've also found some bad behavior around proto=3Drdma,sec=3Dkrb5i. >>>>>> When I run a heavy I/O workload (fio, for example), every so >>>>>> often a read operation fails with EIO. I dug into it a little >>>>>> and MIC verification fails for these replies on the client. >>>>>=20 >>>>> Do we still have the problem that the read data can change between = the >>>>> time we calculate the MIC and the time we transmit the data to the >>>>> client? >>>>=20 >>>> I don't see a problem with krb5p, which, if IIUC, would also >>>> fall victim to this situation, unless there is much stricter >>>> request serialization going on with krb5p. >>>=20 >>> We turn off zero-copy by clearing RQ_SPLICE_OK in the krb5p case. >>=20 >> Seems like this is the right answer for krb5i too. Shall I try that? >=20 > Sure! Just grep around for RQ_SPLICE_OK, I think it should be easy to > figure out. I added clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); at the top of unwrap_integ_data() in net/sunrpc/auth_gss/svcauth_gss.c. I haven't seen a failure yet, which is a good sign. > --b. >>=20 >>=20 >>>> Even so, how would I detect if this issue was present? >>>=20 >>> Good question. If you knew the data and mic in the bad case, and = had >>> some way to guess what the previous data might have been based on = what >>> you knew about the test, then you could try mic's of likely older >>> versions of the data and see if you get a match.... That sounds = hard. >>>=20 >>> --b. >>=20 >> -- >> Chuck Lever -- Chuck Lever ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-06-06 20:54 ` Chuck Lever @ 2017-06-06 20:56 ` J. Bruce Fields 0 siblings, 0 replies; 14+ messages in thread From: J. Bruce Fields @ 2017-06-06 20:56 UTC (permalink / raw) To: Chuck Lever; +Cc: Benjamin Coddington, Linux NFS Mailing List On Tue, Jun 06, 2017 at 04:54:51PM -0400, Chuck Lever wrote: > > > On Jun 6, 2017, at 4:23 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Tue, Jun 06, 2017 at 04:16:53PM -0400, Chuck Lever wrote: > >> > >>> On Jun 6, 2017, at 4:15 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > >>> > >>> On Tue, Jun 06, 2017 at 03:45:59PM -0400, Chuck Lever wrote: > >>>> > >>>>> On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > >>>>> > >>>>> On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote: > >>>>>> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306 > >>>>>> > >>>>>> To check memory allocation latency, I could always construct > >>>>>> a framework around kmalloc and alloc_page. > >>>>>> > >>>>>> > >>>>>> I've also found some bad behavior around proto=rdma,sec=krb5i. > >>>>>> When I run a heavy I/O workload (fio, for example), every so > >>>>>> often a read operation fails with EIO. I dug into it a little > >>>>>> and MIC verification fails for these replies on the client. > >>>>> > >>>>> Do we still have the problem that the read data can change between the > >>>>> time we calculate the MIC and the time we transmit the data to the > >>>>> client? > >>>> > >>>> I don't see a problem with krb5p, which, if IIUC, would also > >>>> fall victim to this situation, unless there is much stricter > >>>> request serialization going on with krb5p. > >>> > >>> We turn off zero-copy by clearing RQ_SPLICE_OK in the krb5p case. > >> > >> Seems like this is the right answer for krb5i too. Shall I try that? > > > > Sure! Just grep around for RQ_SPLICE_OK, I think it should be easy to > > figure out. > > I added > > clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); > > at the top of unwrap_integ_data() in net/sunrpc/auth_gss/svcauth_gss.c. > > I haven't seen a failure yet, which is a good sign. Well, unfortunately it means the only easy fix we have right now may cause a performance regression. Anyway, maybe that's what we should do. --b. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: GSS sequence number window 2017-05-30 19:34 ` J. Bruce Fields 2017-05-30 20:11 ` Benjamin Coddington @ 2017-05-30 21:03 ` Chuck Lever 1 sibling, 0 replies; 14+ messages in thread From: Chuck Lever @ 2017-05-30 21:03 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Linux NFS Mailing List > On May 30, 2017, at 3:34 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote: >> Hey Bruce! >> >> While testing with sec=krb5 and sec=krb5i, I noticed a lot of >> spurious connection loss, especially when I wanted to run a >> CPU-intensive workload on my NFS server at the same time I >> was testing. >> >> I added a pr_err() in gss_check_seq_num, and ran a fio job >> on a vers=3,sec=sys,proto=tcp mount (server is exporting a >> tmpfs). On the server, I rebuilt a kernel source tree cscope >> database at the same time. >> >> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128 >> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128 >> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128 >> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128 >> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128 >> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128 >> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128 >> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128 >> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128 >> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128 >> >> RFC 2203 suggests there's no risk to using a large window. >> My first thought was to make the sequence window larger >> (say 2048) but I've seen stragglers outside even that large >> a window. >> >> Any thoughts about why there are these sequence number >> outliers? > > No, alas. > > The server's where it's easier to catch, but it'd be interesting to know > what's going on on the client side when this happens. > > Might be interesting to know just what the operation (or compound ops) > are in the bad cases, in case there's some pattern. It appears to happen more frequently with a 100% read workload that accesses multiple files. > Might also be interesting to check where exactly in the client code the > sequence number is assigned and where in the server code it's checked, > and think about where they might get queued up or reordered in between. > > Is it possible there are retries involved? It's hard to tell, since disconnect means there will be a recorded retransmission. > Does increasing the number of server threads help? Yes, but it doesn't fully address the issue. > I don't think it'd be a problem to increase the sequence window. Increasing the window to 1024 does not seem to help. -- Chuck Lever ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-06-06 20:56 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-05-30 18:58 GSS sequence number window Chuck Lever 2017-05-30 19:34 ` J. Bruce Fields 2017-05-30 20:11 ` Benjamin Coddington 2017-05-31 19:22 ` J. Bruce Fields 2017-05-31 20:27 ` Chuck Lever 2017-06-06 19:35 ` Chuck Lever 2017-06-06 19:41 ` J. Bruce Fields 2017-06-06 19:45 ` Chuck Lever 2017-06-06 20:15 ` J. Bruce Fields 2017-06-06 20:16 ` Chuck Lever 2017-06-06 20:23 ` J. Bruce Fields 2017-06-06 20:54 ` Chuck Lever 2017-06-06 20:56 ` J. Bruce Fields 2017-05-30 21:03 ` Chuck Lever
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.