* NFS troubles
@ 2018-04-02 17:50 Orion Poplawski
  2018-04-02 18:30 ` Benjamin Coddington
  0 siblings, 1 reply; 12+ messages in thread
From: Orion Poplawski @ 2018-04-02 17:50 UTC
  To: linux-nfs@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

I'm having a lot of trouble with NFS going out to lunch between my RHEL7
machines. Users cannot access files, and get errors like:

$ touch blah
touch: cannot touch ‘blah’: Input/output error

I'm attaching a pcap trace of the above touch during the problem. It seems
that the server is returning NFS4ERR_EXPIRED.

Reboots/restarts of nfs help for a bit but then the problems return.

Other symptoms of trouble are messages like:

RPC: fragment too large: 613351424

on the client.

Any help with trying to track this down would be greatly appreciated.

--
Orion Poplawski
Manager of NWRA Technical Systems          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion@nwra.com
Boulder, CO 80301                 https://www.nwra.com/

[-- Attachment #2: nfs-bad.pcap --]
[-- Type: application/vnd.tcpdump.pcap, Size: 2212 bytes --]
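An aside on the "fragment too large" number in the message above: RPC over TCP frames each record fragment with a 4-byte big-endian marker whose top bit flags the last fragment and whose low 31 bits carry the fragment length (the record marking standard in RFC 5531). The sketch below is an illustration only and is not from the thread; `decode_record_marker` is a hypothetical helper, and reading 613351424 as the length field of such a marker is an assumption about where that number comes from.

```python
import struct

LAST_FRAGMENT_BIT = 0x80000000   # top bit of the 4-byte marker
LENGTH_MASK = 0x7FFFFFFF         # low 31 bits: fragment length in bytes


def decode_record_marker(header: bytes):
    """Split a 4-byte RPC record marker into (is_last_fragment, length)."""
    if len(header) != 4:
        raise ValueError("record marker must be exactly 4 bytes")
    (word,) = struct.unpack(">I", header)  # network byte order
    return bool(word & LAST_FRAGMENT_BIT), word & LENGTH_MASK


# Assumption: the value from the log line is the decoded length field.
reported = 613351424
marker = struct.pack(">I", reported)
print(marker.hex())                    # the four bytes that would decode to it
print(decode_record_marker(marker))    # (last-fragment flag, length)
print(round(reported / 2**20), "MiB")  # size the marker would be claiming
```

Under that assumption the marker bytes were 0x24 0x8f 0x00 0x00 (0xa4 in the first byte if the last-fragment bit was set), claiming a fragment of roughly 585 MiB. That may suggest the receiver was interpreting something other than a real record header at that point in the stream, but the thread itself does not establish the cause.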
* Re: NFS troubles
  2018-04-02 17:50 NFS troubles Orion Poplawski
@ 2018-04-02 18:30 ` Benjamin Coddington
  2018-04-03 15:44 ` Orion Poplawski
  0 siblings, 1 reply; 12+ messages in thread
From: Benjamin Coddington @ 2018-04-02 18:30 UTC
  To: Orion Poplawski; +Cc: linux-nfs

NFS4ERR_EXPIRED means the client is trying to use state that the server
believes to have expired or that has been supplanted by newer state. Can we
get kernel versions for the client and server? Have you talked to your Red
Hat support channel about this?

This capture doesn't appear to show any bugs or bad behaviors, but a longer
capture may..

Ben

On 2 Apr 2018, at 13:50, Orion Poplawski wrote:

> I'm having a lot of trouble with NFS going out to lunch between my RHEL7
> machines. Users cannot access files, and get errors like:
>
> $ touch blah
> touch: cannot touch ‘blah’: Input/output error
>
> I'm attaching a pcap trace of the above touch during the problem. It seems
> that the server is returning NFS4ERR_EXPIRED.
>
> Reboots/restarts of nfs help for a bit but then the problems return.
>
> Other symptoms of trouble are messages like:
>
> RPC: fragment too large: 613351424
>
> on the client.
>
> Any help with trying to track this down would be greatly appreciated.
>
> --
> Orion Poplawski
> Manager of NWRA Technical Systems          720-772-5637
> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
> 3380 Mitchell Lane                       orion@nwra.com
> Boulder, CO 80301                 https://www.nwra.com/
* Re: NFS troubles
  2018-04-02 18:30 ` Benjamin Coddington
@ 2018-04-03 15:44 ` Orion Poplawski
  2018-04-04 14:08 ` Olga Kornievskaia
  2018-04-06 16:07 ` Orion Poplawski
  0 siblings, 2 replies; 12+ messages in thread
From: Orion Poplawski @ 2018-04-03 15:44 UTC
  To: Benjamin Coddington; +Cc: linux-nfs

Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
systems.

I discovered that I'd been forcing vers=4.0 mounts in order to work around a
mounting issue. I'm moving back to the default 4.1 mounts as it seems to work
better for this issue. If the issue returns I'll try to grab a longer trace.

Thanks.

On 04/02/2018 12:30 PM, Benjamin Coddington wrote:
> NFS4ERR_EXPIRED means the client is trying to use state that the server
> believes to have expired or that has been supplanted by newer state. Can we
> get kernel versions for the client and server? Have you talked to your Red
> Hat support channel about this?
>
> This capture doesn't appear to show any bugs or bad behaviors, but a longer
> capture may..
>
> Ben
>
> On 2 Apr 2018, at 13:50, Orion Poplawski wrote:
>
>> I'm having a lot of trouble with NFS going out to lunch between my RHEL7
>> machines. Users cannot access files, and get errors like:
>>
>> $ touch blah
>> touch: cannot touch ‘blah’: Input/output error
>>
>> I'm attaching a pcap trace of the above touch during the problem. It seems
>> that the server is returning NFS4ERR_EXPIRED.
>>
>> Reboots/restarts of nfs help for a bit but then the problems return.
>>
>> Other symptoms of trouble are messages like:
>>
>> RPC: fragment too large: 613351424
>>
>> on the client.
>>
>> Any help with trying to track this down would be greatly appreciated.
>>
>> --
>> Orion Poplawski
>> Manager of NWRA Technical Systems          720-772-5637
>> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
>> 3380 Mitchell Lane                       orion@nwra.com
>> Boulder, CO 80301                 https://www.nwra.com/

--
Orion Poplawski
Manager of NWRA Technical Systems          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion@nwra.com
Boulder, CO 80301                 https://www.nwra.com/
* Re: NFS troubles
  2018-04-03 15:44 ` Orion Poplawski
@ 2018-04-04 14:08 ` Olga Kornievskaia
  2018-04-06 16:07 ` Orion Poplawski
  1 sibling, 0 replies; 12+ messages in thread
From: Olga Kornievskaia @ 2018-04-04 14:08 UTC
  To: Orion Poplawski; +Cc: Benjamin Coddington, linux-nfs

I wonder if this is the issue we ran into during the NFS bakeathon
testing last week. Problem was that a previous NFS4.0 mount left state
so that umount didn't actually unmount. Next mount only did a PUTROOTFH
and there was no SETCLIENTID, then any operations that tried to use
the clientid got ERR_EXPIRED. We will be trying to reproduce it again
and trying to fix it.

On Tue, Apr 3, 2018 at 11:44 AM, Orion Poplawski <orion@nwra.com> wrote:
> Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
> systems.
>
> I discovered that I'd been forcing vers=4.0 mounts in order to work around a
> mounting issue. I'm moving back to the default 4.1 mounts as it seems to work
> better for this issue. If the issue returns I'll try to grab a longer trace.
>
> Thanks.
>
> On 04/02/2018 12:30 PM, Benjamin Coddington wrote:
>> NFS4ERR_EXPIRED means the client is trying to use state that the server
>> believes to have expired or that has been supplanted by newer state. Can we
>> get kernel versions for the client and server? Have you talked to your Red
>> Hat support channel about this?
>>
>> This capture doesn't appear to show any bugs or bad behaviors, but a longer
>> capture may..
>>
>> Ben
>>
>> On 2 Apr 2018, at 13:50, Orion Poplawski wrote:
>>
>>> I'm having a lot of trouble with NFS going out to lunch between my RHEL7
>>> machines. Users cannot access files, and get errors like:
>>>
>>> $ touch blah
>>> touch: cannot touch ‘blah’: Input/output error
>>>
>>> I'm attaching a pcap trace of the above touch during the problem. It seems
>>> that the server is returning NFS4ERR_EXPIRED.
>>>
>>> Reboots/restarts of nfs help for a bit but then the problems return.
>>>
>>> Other symptoms of trouble are messages like:
>>>
>>> RPC: fragment too large: 613351424
>>>
>>> on the client.
>>>
>>> Any help with trying to track this down would be greatly appreciated.
>>>
>>> --
>>> Orion Poplawski
>>> Manager of NWRA Technical Systems          720-772-5637
>>> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
>>> 3380 Mitchell Lane                       orion@nwra.com
>>> Boulder, CO 80301                 https://www.nwra.com/
>
>
> --
> Orion Poplawski
> Manager of NWRA Technical Systems          720-772-5637
> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
> 3380 Mitchell Lane                       orion@nwra.com
> Boulder, CO 80301                 https://www.nwra.com/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: NFS troubles
  2018-04-03 15:44 ` Orion Poplawski
  2018-04-04 14:08 ` Olga Kornievskaia
@ 2018-04-06 16:07 ` Orion Poplawski
  2018-04-06 16:24 ` Chuck Lever
  2018-04-07  0:15 ` Chuck Lever
  1 sibling, 2 replies; 12+ messages in thread
From: Orion Poplawski @ 2018-04-06 16:07 UTC
  To: Benjamin Coddington; +Cc: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 1723 bytes --]

On 04/03/2018 09:44 AM, Orion Poplawski wrote:
> Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
> systems.
>
> I discovered that I'd been forcing vers=4.0 mounts in order to work around a
> mounting issue.

And I'm back to seeing the mount issue at boot. Here's the situation - we're
forcing kerberos on the public network, but allowing sec=sys on some private
networks:

/etc/exports:
/ -ro,async,fsid=0 192.168.1.0/24(sec=sys) 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
/export/home -rw,async,nohide 192.168.1.0/24(sec=sys) 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)

So for a while after boot, attempts to mount with sec=sys fail:

# mount -t nfs4 -s -o sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1 earthib.cora.nwra.com:/export/home/greg /mnt
mount.nfs4: Operation not permitted

But then later they work:

# mount -t nfs4 -s -o sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1 earthib.cora.nwra.com:/export/home/greg /mnt
# umount /mnt

This can cycle back and forth.

I've attached a packet capture of some failed mount attempts. It seems that
even with specifying sec=sys, some kerberos stuff is going on.

It appears to be related to mounting a different sec=krb5 mount over the
public network from the same server. While that mount is active, the sec=sys
mounts fail. When it is unmounted, they work. At least now I think I can
work around this...

--
Orion Poplawski
Manager of NWRA Technical Systems          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion@nwra.com
Boulder, CO 80301                 https://www.nwra.com/

[-- Attachment #2: mount-fail.pcap --]
[-- Type: application/vnd.tcpdump.pcap, Size: 9628 bytes --]
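For readers less familiar with the exports layout quoted in the message above, the following is a rough sketch of how the per-client `sec=` options are meant to divide clients: private-subnet addresses get sec=sys and `*.nwra.com` hosts get sec=krb5. It is not how exportfs or the kernel actually resolves a client. `parse_export_line` and `sec_for_client` are hypothetical helpers, the first-match ordering is a simplification, and Python's `fnmatch` lets `*` cross dots, which may differ from the real wildcard rules. The client address and host name in the example are made up.

```python
import fnmatch
import ipaddress


def parse_export_line(line):
    """Return (path, default_opts, [(client_pattern, client_opts), ...])."""
    fields = line.split()
    path, default_opts, clients = fields[0], [], []
    for field in fields[1:]:
        if field.startswith("-"):
            # "-ro,async,..." sets defaults for the clients that follow
            default_opts = field[1:].split(",")
        else:
            pattern, _, opts = field.partition("(")
            clients.append((pattern, opts.rstrip(")").split(",") if opts else []))
    return path, default_opts, clients


def sec_for_client(line, ip, hostname):
    """sec= flavor of the first client spec matching ip or hostname (simplified)."""
    _, _, clients = parse_export_line(line)
    for pattern, opts in clients:
        try:
            matched = ipaddress.ip_address(ip) in ipaddress.ip_network(pattern)
        except ValueError:
            # Not an IP network: treat the spec as a hostname wildcard.
            matched = fnmatch.fnmatch(hostname, pattern)
        if matched:
            return next((o[4:] for o in opts if o.startswith("sec=")), None)
    return None


EXPORT = ("/export/home -rw,async,nohide 192.168.1.0/24(sec=sys) "
          "192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)")

# Hypothetical client seen on the private subnet, then on a public address.
print(sec_for_client(EXPORT, "192.168.1.10", "client.nwra.com"))
print(sec_for_client(EXPORT, "10.0.0.5", "client.nwra.com"))
```

The point of the sketch is only that the same client can legitimately land on either flavor depending on which interface it arrives from, which is the situation the rest of the thread is trying to untangle.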
* Re: NFS troubles
  2018-04-06 16:07 ` Orion Poplawski
@ 2018-04-06 16:24 ` Chuck Lever
  2018-04-06 18:16 ` J. Bruce Fields
  2018-04-06 22:05 ` Orion Poplawski
  2018-04-07  0:15 ` Chuck Lever
  1 sibling, 2 replies; 12+ messages in thread
From: Chuck Lever @ 2018-04-06 16:24 UTC
  To: Orion Poplawski; +Cc: Benjamin Coddington, Linux NFS Mailing List

> On Apr 6, 2018, at 12:07 PM, Orion Poplawski <orion@nwra.com> wrote:
>
> On 04/03/2018 09:44 AM, Orion Poplawski wrote:
>> Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
>> systems.
>>
>> I discovered that I'd been forcing vers=4.0 mounts in order to work around a
>> mounting issue.
>
> And I'm back to seeing the mount issue at boot. Here's the situation - we're
> forcing kerberos on the public network, but allowing sec=sys on some private
> networks:
>
> /etc/exports:
> / -ro,async,fsid=0 192.168.1.0/24(sec=sys)
> 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
> /export/home -rw,async,nohide 192.168.1.0/24(sec=sys)
> 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
>
> So for a while after boot, attempts to mount with sec=sys fail:
>
> # mount -t nfs4 -s -o
> sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
> earthib.cora.nwra.com:/export/home/greg /mnt
> mount.nfs4: Operation not permitted
>
> But then later they work:
>
> # mount -t nfs4 -s -o
> sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
> earthib.cora.nwra.com:/export/home/greg /mnt
> # umount /mnt
>
> This can cycle back and forth.
>
> I've attached a packet capture of some failed mount attempts. It seems that
> even with specifying sec=sys, some kerberos stuff is going on.
>
> It appears to be related to mounting a different sec=krb5 mount over the
> public network from the same server. While that mount is active, the sec=sys
> mounts fail. When it is unmounted, they work. At least now I think I can
> work around this...

For NFSv4, the client is going to use krb5i to do lease management even
on sec=sys mounts. An NFSv4 server has to know for sure when it is talking
to the same client on different network interfaces or with different
security flavors. Thus the client has to use the same security flavor for
lease management on all of its mounts of that server. That's not controlled
by the sec= mount option.

I assume that "but then later" lasts only a few multiples of the server's
lease time (90 seconds by default)?

Clients that use only the private network interface should be able to use
sec=sys. But clients that use both the public and private interfaces should
need to use sec=krb5 on both.

--
Chuck Lever
* Re: NFS troubles
  2018-04-06 16:24 ` Chuck Lever
@ 2018-04-06 18:16 ` J. Bruce Fields
  2018-04-06 18:18 ` Chuck Lever
  2018-04-06 22:05 ` Orion Poplawski
  1 sibling, 1 reply; 12+ messages in thread
From: J. Bruce Fields @ 2018-04-06 18:16 UTC
  To: Chuck Lever; +Cc: Orion Poplawski, Benjamin Coddington, Linux NFS Mailing List

On Fri, Apr 06, 2018 at 12:24:21PM -0400, Chuck Lever wrote:
>
>
> > On Apr 6, 2018, at 12:07 PM, Orion Poplawski <orion@nwra.com> wrote:
> >
> > On 04/03/2018 09:44 AM, Orion Poplawski wrote:
> >> Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
> >> systems.
> >>
> >> I discovered that I'd been forcing vers=4.0 mounts in order to work around a
> >> mounting issue.
> >
> > And I'm back to seeing the mount issue at boot. Here's the situation - we're
> > forcing kerberos on the public network, but allowing sec=sys on some private
> > networks:
> >
> > /etc/exports:
> > / -ro,async,fsid=0 192.168.1.0/24(sec=sys)
> > 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
> > /export/home -rw,async,nohide 192.168.1.0/24(sec=sys)
> > 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
> >
> > So for a while after boot, attempts to mount with sec=sys fail:
> >
> > # mount -t nfs4 -s -o
> > sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
> > earthib.cora.nwra.com:/export/home/greg /mnt
> > mount.nfs4: Operation not permitted
> >
> > But then later they work:
> >
> > # mount -t nfs4 -s -o
> > sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
> > earthib.cora.nwra.com:/export/home/greg /mnt
> > # umount /mnt
> >
> > This can cycle back and forth.
> >
> > I've attached a packet capture of some failed mount attempts. It seems that
> > even with specifying sec=sys, some kerberos stuff is going on.
> >
> > It appears to be related to mounting a different sec=krb5 mount over the
> > public network from the same server. While that mount is active, the sec=sys
> > mounts fail. When it is unmounted, they work. At least now I think I can
> > work around this...
>
> For NFSv4, the client is going to use krb5i to do lease management even
> on sec=sys mounts. An NFSv4 server has to know for sure when it is talking
> to the same client on different network interfaces or with different
> security flavors. Thus the client has to use the same security flavor for
> lease management on all of its mounts of that server. That's not controlled
> by the sec= mount option.
>
> I assume that "but then later" lasts only a few multiples of the server's
> lease time (90 seconds by default)?
>
> Clients that use only the private network interface should be able to use
> sec=sys. But clients that use both the public and private interfaces should
> need to use sec=krb5 on both.

Are you saying that the behavior he's seeing is expected?

I'd expect sec=sys and sec=krb5 mounts to the same server to coexist and
both use krb5i to manage the (shared) lease state.

--b.
* Re: NFS troubles
  2018-04-06 18:16 ` J. Bruce Fields
@ 2018-04-06 18:18 ` Chuck Lever
  0 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2018-04-06 18:18 UTC
  To: Bruce Fields; +Cc: Orion Poplawski, Benjamin Coddington, Linux NFS Mailing List

> On Apr 6, 2018, at 2:16 PM, bfields@fieldses.org wrote:
>
> On Fri, Apr 06, 2018 at 12:24:21PM -0400, Chuck Lever wrote:
>>
>>
>>> On Apr 6, 2018, at 12:07 PM, Orion Poplawski <orion@nwra.com> wrote:
>>>
>>> On 04/03/2018 09:44 AM, Orion Poplawski wrote:
>>>> Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
>>>> systems.
>>>>
>>>> I discovered that I'd been forcing vers=4.0 mounts in order to work around a
>>>> mounting issue.
>>>
>>> And I'm back to seeing the mount issue at boot. Here's the situation - we're
>>> forcing kerberos on the public network, but allowing sec=sys on some private
>>> networks:
>>>
>>> /etc/exports:
>>> / -ro,async,fsid=0 192.168.1.0/24(sec=sys)
>>> 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
>>> /export/home -rw,async,nohide 192.168.1.0/24(sec=sys)
>>> 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
>>>
>>> So for a while after boot, attempts to mount with sec=sys fail:
>>>
>>> # mount -t nfs4 -s -o
>>> sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
>>> earthib.cora.nwra.com:/export/home/greg /mnt
>>> mount.nfs4: Operation not permitted
>>>
>>> But then later they work:
>>>
>>> # mount -t nfs4 -s -o
>>> sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
>>> earthib.cora.nwra.com:/export/home/greg /mnt
>>> # umount /mnt
>>>
>>> This can cycle back and forth.
>>>
>>> I've attached a packet capture of some failed mount attempts. It seems that
>>> even with specifying sec=sys, some kerberos stuff is going on.
>>>
>>> It appears to be related to mounting a different sec=krb5 mount over the
>>> public network from the same server. While that mount is active, the sec=sys
>>> mounts fail. When it is unmounted, they work. At least now I think I can
>>> work around this...
>>
>> For NFSv4, the client is going to use krb5i to do lease management even
>> on sec=sys mounts. An NFSv4 server has to know for sure when it is talking
>> to the same client on different network interfaces or with different
>> security flavors. Thus the client has to use the same security flavor for
>> lease management on all of its mounts of that server. That's not controlled
>> by the sec= mount option.
>>
>> I assume that "but then later" lasts only a few multiples of the server's
>> lease time (90 seconds by default)?
>>
>> Clients that use only the private network interface should be able to use
>> sec=sys. But clients that use both the public and private interfaces should
>> need to use sec=krb5 on both.
>
> Are you saying that the behavior he's seeing is expected?

I spoke without looking at the PCAP, perhaps I was hasty.

> I'd expect sec=sys and sec=krb5 mounts to the same server to coexist and
> both use krb5i to manage the (shared) lease state.

Me too, if the NFS client's trunking detection is working.

--
Chuck Lever
* Re: NFS troubles
  2018-04-06 16:24 ` Chuck Lever
  2018-04-06 18:16 ` J. Bruce Fields
@ 2018-04-06 22:05 ` Orion Poplawski
  1 sibling, 0 replies; 12+ messages in thread
From: Orion Poplawski @ 2018-04-06 22:05 UTC
  To: Chuck Lever; +Cc: Benjamin Coddington, Linux NFS Mailing List

On 04/06/2018 10:24 AM, Chuck Lever wrote:
>
>
>> On Apr 6, 2018, at 12:07 PM, Orion Poplawski <orion@nwra.com> wrote:
>>
>> On 04/03/2018 09:44 AM, Orion Poplawski wrote:
>>> Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
>>> systems.
>>>
>>> I discovered that I'd been forcing vers=4.0 mounts in order to work around a
>>> mounting issue.
>>
>> And I'm back to seeing the mount issue at boot. Here's the situation - we're
>> forcing kerberos on the public network, but allowing sec=sys on some private
>> networks:
>>
>> /etc/exports:
>> / -ro,async,fsid=0 192.168.1.0/24(sec=sys)
>> 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
>> /export/home -rw,async,nohide 192.168.1.0/24(sec=sys)
>> 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
>>
>> So for a while after boot, attempts to mount with sec=sys fail:
>>
>> # mount -t nfs4 -s -o
>> sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
>> earthib.cora.nwra.com:/export/home/greg /mnt
>> mount.nfs4: Operation not permitted
>>
>> But then later they work:
>>
>> # mount -t nfs4 -s -o
>> sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
>> earthib.cora.nwra.com:/export/home/greg /mnt
>> # umount /mnt
>>
>> This can cycle back and forth.
>>
>> I've attached a packet capture of some failed mount attempts. It seems that
>> even with specifying sec=sys, some kerberos stuff is going on.
>>
>> It appears to be related to mounting a different sec=krb5 mount over the
>> public network from the same server. While that mount is active, the sec=sys
>> mounts fail. When it is unmounted, they work. At least now I think I can
>> work around this...
>
> For NFSv4, the client is going to use krb5i to do lease management even
> on sec=sys mounts. An NFSv4 server has to know for sure when it is talking
> to the same client on different network interfaces or with different
> security flavors. Thus the client has to use the same security flavor for
> lease management on all of its mounts of that server. That's not controlled
> by the sec= mount option.
>
> I assume that "but then later" lasts only a few multiples of the server's
> lease time (90 seconds by default)?
>
> Clients that use only the private network interface should be able to use
> sec=sys. But clients that use both the public and private interfaces should
> need to use sec=krb5 on both.

Testing again with RHEL 7.5 beta:

3.10.0-830.el7.x86_64
nfs-utils-1.3.0-0.52.el7.x86_64

Near as I can tell, with NFS version 4.1 mounts, as long as there is an active
sec=krb5 mount to a server, sec=sys mounts to that server fail (with the above
/etc/exports). As soon as I unmount the sec=krb5 mount, the sec=sys mount
works. And vice-versa - an active sec=sys mount prevents sec=krb5 mounts.
Waiting does not appear to help. The sporadic behavior noted before was due
to the use of the automounter - as mounts came and went different ones would
break.

If I use NFS version 4.0, I can have both. But then I get the locking bug in
my original post.

--
Orion Poplawski
Manager of NWRA Technical Systems          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion@nwra.com
Boulder, CO 80301                 https://www.nwra.com/
* Re: NFS troubles
  2018-04-06 16:07 ` Orion Poplawski
  2018-04-06 16:24 ` Chuck Lever
@ 2018-04-07  0:15 ` Chuck Lever
  2018-04-07  2:46 ` Bruce Fields
  1 sibling, 1 reply; 12+ messages in thread
From: Chuck Lever @ 2018-04-07 0:15 UTC
  To: Bruce Fields; +Cc: Linux NFS Mailing List

> On Apr 6, 2018, at 12:07 PM, Orion Poplawski <orion@nwra.com> wrote:
>
> On 04/03/2018 09:44 AM, Orion Poplawski wrote:
>> Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
>> systems.
>>
>> I discovered that I'd been forcing vers=4.0 mounts in order to work around a
>> mounting issue.
>
> And I'm back to seeing the mount issue at boot. Here's the situation - we're
> forcing kerberos on the public network, but allowing sec=sys on some private
> networks:
>
> /etc/exports:
> / -ro,async,fsid=0 192.168.1.0/24(sec=sys)
> 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
> /export/home -rw,async,nohide 192.168.1.0/24(sec=sys)
> 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
>
> So for a while after boot, attempts to mount with sec=sys fail:
>
> # mount -t nfs4 -s -o
> sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
> earthib.cora.nwra.com:/export/home/greg /mnt
> mount.nfs4: Operation not permitted
>
> But then later they work:
>
> # mount -t nfs4 -s -o
> sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
> earthib.cora.nwra.com:/export/home/greg /mnt
> # umount /mnt
>
> This can cycle back and forth.
>
> I've attached a packet capture of some failed mount attempts. It seems that
> even with specifying sec=sys, some kerberos stuff is going on.

> It appears to be related to mounting a different sec=krb5 mount over the
> public network from the same server. While that mount is active, the sec=sys
> mounts fail. When it is unmounted, they work. At least now I think I can
> work around this...

Bruce-

I examined the attached network capture. There are two attempts to do an
EXCHANGE_ID operation. Both times:

- a fresh GSS context is established successfully
- a fresh TCP connection is established by the client
- EXCHANGE_ID is sent using krb5i and the previously established GSS context
-- client owner verifier is 0x5ac794e81d0a1d81
-- client owner is "Linux NFSv4.1 qcomp1.cora.nwra.com"
-- state protection is SP4_MACH_CRED
- the server responds NFS4_OK; the CONFIRMED_R, PNFS_MDS, and MOVED_REFER flags are set
- the client destroys the GSS context
- the client closes the TCP connection

--
Chuck Lever
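The flag names in the capture summary above are bits of the EXCHANGE_ID result's flags word. As a hedged illustration (not taken from the capture), the sketch below decodes such a word using the EXCHGID4_FLAG_* constants defined for NFSv4.1 in RFC 5661. `decode_exchgid_flags` is a hypothetical helper, and the sample value is constructed to match the three flags named in the summary; the actual value on the wire is not given in the thread.

```python
# EXCHGID4_FLAG_* bit values as defined in RFC 5661 (NFSv4.1).
EXCHGID4_FLAGS = {
    0x00000001: "SUPP_MOVED_REFER",
    0x00000002: "SUPP_MOVED_MIGR",
    0x00000100: "BIND_PRINC_STATEID",
    0x00010000: "USE_NON_PNFS",
    0x00020000: "USE_PNFS_MDS",
    0x00040000: "USE_PNFS_DS",
    0x40000000: "UPD_CONFIRMED_REC_A",
    0x80000000: "CONFIRMED_R",
}


def decode_exchgid_flags(flags):
    """Return the names of the known flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(EXCHGID4_FLAGS.items()) if flags & bit]


# Constructed example: CONFIRMED_R | USE_PNFS_MDS | SUPP_MOVED_REFER.
sample = 0x80000000 | 0x00020000 | 0x00000001
print(hex(sample), decode_exchgid_flags(sample))
```

Per RFC 5661, CONFIRMED_R in a reply indicates that the server already holds a confirmed record for this client owner, which would fit a client that already has another mount of the same server; that reading is an interpretation, not something the thread states.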
* Re: NFS troubles
  2018-04-07  0:15 ` Chuck Lever
@ 2018-04-07  2:46 ` Bruce Fields
  2018-04-07 21:23 ` Chuck Lever
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce Fields @ 2018-04-07 2:46 UTC
  To: Chuck Lever; +Cc: Linux NFS Mailing List

On Fri, Apr 06, 2018 at 08:15:35PM -0400, Chuck Lever wrote:
>
> > On Apr 6, 2018, at 12:07 PM, Orion Poplawski <orion@nwra.com> wrote:
> >
> > On 04/03/2018 09:44 AM, Orion Poplawski wrote:
> >> Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
> >> systems.
> >>
> >> I discovered that I'd been forcing vers=4.0 mounts in order to work around a
> >> mounting issue.
> >
> > And I'm back to seeing the mount issue at boot. Here's the situation - we're
> > forcing kerberos on the public network, but allowing sec=sys on some private
> > networks:
> >
> > /etc/exports:
> > / -ro,async,fsid=0 192.168.1.0/24(sec=sys)
> > 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
> > /export/home -rw,async,nohide 192.168.1.0/24(sec=sys)
> > 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
> >
> > So for a while after boot, attempts to mount with sec=sys fail:
> >
> > # mount -t nfs4 -s -o
> > sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
> > earthib.cora.nwra.com:/export/home/greg /mnt
> > mount.nfs4: Operation not permitted
> >
> > But then later they work:
> >
> > # mount -t nfs4 -s -o
> > sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
> > earthib.cora.nwra.com:/export/home/greg /mnt
> > # umount /mnt
> >
> > This can cycle back and forth.
> >
> > I've attached a packet capture of some failed mount attempts. It seems that
> > even with specifying sec=sys, some kerberos stuff is going on.
>
> > It appears to be related to mounting a different sec=krb5 mount over the
> > public network from the same server. While that mount is active, the sec=sys
> > mounts fail. When it is unmounted, they work. At least now I think I can
> > work around this...
>
> Bruce-
>
> I examined the attached network capture. There are two attempts to do an
> EXCHANGE_ID operation. Both times:
>
> - a fresh GSS context is established successfully
> - a fresh TCP connection is established by the client
> - EXCHANGE_ID is sent using krb5i and the previously established GSS context
> -- client owner verifier is 0x5ac794e81d0a1d81
> -- client owner is "Linux NFSv4.1 qcomp1.cora.nwra.com"
> -- state protection is SP4_MACH_CRED
> - the server responds NFS4_OK; the CONFIRMED_R, PNFS_MDS, and MOVED_REFER flags are set
> - the client destroys the GSS context
> - the client closes the TCP connection

Huh. If this is a second mount to the same server, it shouldn't need to
do another EXCHANGE_ID at all, should it? I suppose the trunking
detection code's being overzealous. Anyway, doesn't sound like the
trace tells us much. Sounds easy to reproduce, so maybe we just need to
try it and see where exactly the client code is failing.

--b.
* Re: NFS troubles
  2018-04-07  2:46 ` Bruce Fields
@ 2018-04-07 21:23 ` Chuck Lever
  0 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2018-04-07 21:23 UTC
  To: Bruce Fields; +Cc: Linux NFS Mailing List

> On Apr 6, 2018, at 10:46 PM, Bruce Fields <bfields@fieldses.org> wrote:
>
> On Fri, Apr 06, 2018 at 08:15:35PM -0400, Chuck Lever wrote:
>>
>>> On Apr 6, 2018, at 12:07 PM, Orion Poplawski <orion@nwra.com> wrote:
>>>
>>> On 04/03/2018 09:44 AM, Orion Poplawski wrote:
>>>> Kernel is 3.10.0-693.21.1.el7.x86_64 I don't have Red Hat support for these
>>>> systems.
>>>>
>>>> I discovered that I'd been forcing vers=4.0 mounts in order to work around a
>>>> mounting issue.
>>>
>>> And I'm back to seeing the mount issue at boot. Here's the situation - we're
>>> forcing kerberos on the public network, but allowing sec=sys on some private
>>> networks:
>>>
>>> /etc/exports:
>>> / -ro,async,fsid=0 192.168.1.0/24(sec=sys)
>>> 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
>>> /export/home -rw,async,nohide 192.168.1.0/24(sec=sys)
>>> 192.168.2.0/24(sec=sys) *.nwra.com(sec=krb5)
>>>
>>> So for a while after boot, attempts to mount with sec=sys fail:
>>>
>>> # mount -t nfs4 -s -o
>>> sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
>>> earthib.cora.nwra.com:/export/home/greg /mnt
>>> mount.nfs4: Operation not permitted
>>>
>>> But then later they work:
>>>
>>> # mount -t nfs4 -s -o
>>> sec=sys,intr,rsize=262144,wsize=262144,noatime,lookupcache=positive,actimeo=1
>>> earthib.cora.nwra.com:/export/home/greg /mnt
>>> # umount /mnt
>>>
>>> This can cycle back and forth.
>>>
>>> I've attached a packet capture of some failed mount attempts. It seems that
>>> even with specifying sec=sys, some kerberos stuff is going on.
>>
>>> It appears to be related to mounting a different sec=krb5 mount over the
>>> public network from the same server. While that mount is active, the sec=sys
>>> mounts fail. When it is unmounted, they work. At least now I think I can
>>> work around this...
>>
>> Bruce-
>>
>> I examined the attached network capture. There are two attempts to do an
>> EXCHANGE_ID operation. Both times:
>>
>> - a fresh GSS context is established successfully
>> - a fresh TCP connection is established by the client
>> - EXCHANGE_ID is sent using krb5i and the previously established GSS context
>> -- client owner verifier is 0x5ac794e81d0a1d81
>> -- client owner is "Linux NFSv4.1 qcomp1.cora.nwra.com"
>> -- state protection is SP4_MACH_CRED
>> - the server responds NFS4_OK; the CONFIRMED_R, PNFS_MDS, and MOVED_REFER flags are set
>> - the client destroys the GSS context
>> - the client closes the TCP connection
>
> Huh. If this is a second mount to the same server, it shouldn't need to
> do another EXCHANGE_ID at all, should it?

The EXCHANGE_ID attempts are five seconds apart. It could be that there
were two separate mount attempts.

> I suppose the trunking
> detection code's being overzealous. Anyway, doesn't sound like the
> trace tells us much. Sounds easy to reproduce, so maybe we just need to
> try it and see where exactly the client code is failing.

--
Chuck Lever
Thread overview: 12+ messages
2018-04-02 17:50 NFS troubles Orion Poplawski
2018-04-02 18:30 ` Benjamin Coddington
2018-04-03 15:44   ` Orion Poplawski
2018-04-04 14:08     ` Olga Kornievskaia
2018-04-06 16:07     ` Orion Poplawski
2018-04-06 16:24       ` Chuck Lever
2018-04-06 18:16         ` J. Bruce Fields
2018-04-06 18:18           ` Chuck Lever
2018-04-06 22:05         ` Orion Poplawski
2018-04-07  0:15       ` Chuck Lever
2018-04-07  2:46         ` Bruce Fields
2018-04-07 21:23           ` Chuck Lever