* NooB Assitance with debugging NFSv4 client requested
@ 2010-12-10 15:43 Tim Watts
[not found] ` <4D024A8E.50407-AQ/gCgVxFfnQzY9nttDBhA@public.gmane.org>
2010-12-16 22:03 ` Tim Watts
0 siblings, 2 replies; 7+ messages in thread
From: Tim Watts @ 2010-12-10 15:43 UTC (permalink / raw)
To: linux-nfs
Hi,
I have an NFSv4 client set up on Ubuntu 10.04.1 LTS x86. The NFSv4
server is running CentOS 5.5 and we use MIT Kerberos and LDAP for
users/groups. This seems to work well with CentOS 5.5 clients.
All works fine with my Ubuntu client, except after a while my client
acts like it loses its authentication - symptom: home directory mount
drops to "nobody" - I see the mount as "other" - no write access, can
read files that have world read bit set etc.
This can happen anywhere between 2 and 48 hours after a full client
reboot. It seems to be triggered by active use of Thunderbird via the
NFSv4-mounted home dir, which suggests it may be load sensitive.
When it happens, if I unmount my home dir (killing the desktop, of
course) and then remount, the fault is cleared and I can work again.
What doesn't work is just doing a kinit -f or restarting idmapd or gssd.
I have run rpc.gssd in foreground debug mode and that doesn't say much
during the problem times, ditto idmapd. We are using OpenLDAP for passwd
and group lookups, cached locally with nscd.
I have tried upping kernel debugging:
rpcdebug -m nfs -s vfs dircache lookupcache pagecache proc xdr file root callback client mount all
but I'm not sure what I'm looking for.
The symptoms feel like the kernel is losing the ticket or timing it out,
or possibly the ID mapping is failing - is there any way to examine
the state of the kernel ticket cache, or anything else I could be looking
for?
I am tempted to say this is a bug, possibly in the Ubuntu build, but I
would like to investigate further.
Any pointers much appreciated as to how I might isolate the fault further.
Cheers
Tim
--
Tim Watts
* Re: NooB Assitance with debugging NFSv4 client requested
[not found] ` <4D024A8E.50407-AQ/gCgVxFfnQzY9nttDBhA@public.gmane.org>
@ 2010-12-13 10:22 ` Tim Watts
2010-12-13 11:43 ` Tim Watts
0 siblings, 1 reply; 7+ messages in thread
From: Tim Watts @ 2010-12-13 10:22 UTC (permalink / raw)
To: linux-nfs
Bump - anyone :)
Please note - I'm not requesting anyone fix a deeply obscure problem.
I'm a sysadmin - I'd just be really grateful for some pointers to help
me continue debugging this myself...
Ta,
Tim
--
Tim Watts
Linux Sysadmin, High Energy Physics, Imperial College London
Tel: 020 759 47809
* Re: NooB Assitance with debugging NFSv4 client requested
2010-12-13 10:22 ` Tim Watts
@ 2010-12-13 11:43 ` Tim Watts
2010-12-13 18:55 ` J. Bruce Fields
0 siblings, 1 reply; 7+ messages in thread
From: Tim Watts @ 2010-12-13 11:43 UTC (permalink / raw)
To: linux-nfs
Hmm,
Wonder if this is a clue (not noticed this before, but running idmapd in
the foreground, it just whined):
rpc.idmapd: nfscb: read(/var/lib/nfs/rpc_pipefs/nfs/clnt0/idmap): No such file or directory
(Repeated several times).
I'm veering towards this issue being an idmap problem.
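One way to follow up on that message is to check whether the pipe that idmapd complains about is actually present. The sketch below is only an illustration: the directory layout (one `clnt*` directory per client, each with an `idmap` entry) is inferred from the single path in the error above and is an assumption, and `clients_missing_idmap` is a hypothetical helper name.

```python
import os


def clients_missing_idmap(pipefs_nfs_dir):
    """Return the clnt* directories under `pipefs_nfs_dir` that have no `idmap` entry.

    Assumption: clients show up as directories named clnt0, clnt1, ...
    as suggested by the path in the rpc.idmapd error message.
    """
    missing = []
    for name in sorted(os.listdir(pipefs_nfs_dir)):
        client_dir = os.path.join(pipefs_nfs_dir, name)
        # Only look at client directories; ignore anything else in there.
        if not (name.startswith("clnt") and os.path.isdir(client_dir)):
            continue
        if not os.path.exists(os.path.join(client_dir, "idmap")):
            missing.append(name)
    return missing
```

On the affected client this could be called as `clients_missing_idmap("/var/lib/nfs/rpc_pipefs/nfs")`. A non-empty result while idmapd is logging the error might indicate that the client directory was recreated underneath the daemon, but that interpretation is a guess.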
Cheers
tim
--
Tim Watts
Linux Sysadmin, High Energy Physics, Imperial College London
Tel: 020 759 47809
* Re: NooB Assitance with debugging NFSv4 client requested
2010-12-13 11:43 ` Tim Watts
@ 2010-12-13 18:55 ` J. Bruce Fields
2010-12-13 19:15 ` Tim Watts
0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2010-12-13 18:55 UTC (permalink / raw)
To: Tim Watts; +Cc: linux-nfs
On Mon, Dec 13, 2010 at 11:43:12AM +0000, Tim Watts wrote:
> Hmm,
>
> Wonder if this is a clue (not noticed this before, but running
> idmapd in the foreground, it just whined):
>
> rpc.idmapd: nfscb: read(/var/lib/nfs/rpc_pipefs/nfs/clnt0/idmap): No
> such file or directory
>
> (Repeated several times).
>
> I'm veering towards this issue being an idmap problem.
All permissions are enforced on the server side--the effect of an
idmapper failure should just be that "ls -l" shows bad results (e.g.,
"nobody" as the owner of every file) and that commands like "chown"
fail.
If you're actually being prevented from performing normal filesystem
operations, then the most likely culprit is with gssd and krb5.
Might be worth filing a bug with Ubuntu.
You've already tried rpc.gssd debugging. Sniffing client/server and
client/KDC traffic with wireshark might also show some more information.
I dunno. Maybe strace rpc.gssd during the failure?
--b.
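The distinction drawn above (an idmapper failure only affects what "ls -l" shows, while a gssd/krb5 failure actually blocks operations) can be turned into a quick check to run on the client during a failure. This is a rough sketch, not a diagnostic tool: the function names are hypothetical and the wording of the verdicts is just a summary of the reasoning in this message.

```python
import os
import pwd
import tempfile


def owner_name(path):
    """Return the owner's user name as the local system resolves it, or None."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def can_write(directory):
    """Try to create (and automatically remove) a scratch file in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def triage(shows_nobody, write_ok):
    """Map the two observations onto the distinction drawn in the thread."""
    if not write_ok:
        return "access denied: look at rpc.gssd / krb5 first"
    if shows_nobody:
        return "display only: look at idmapd / name lookups"
    return "no fault observed"
```

For example, `triage(owner_name(home) == "nobody", can_write(home))` with `home` set to the mounted home directory would separate the two cases. Losing write access, as described in the first message, falls on the gssd/krb5 side.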
* Re: NooB Assitance with debugging NFSv4 client requested
2010-12-13 18:55 ` J. Bruce Fields
@ 2010-12-13 19:15 ` Tim Watts
0 siblings, 0 replies; 7+ messages in thread
From: Tim Watts @ 2010-12-13 19:15 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
Hi Bruce,
On 13/12/10 18:55, J. Bruce Fields wrote:
> On Mon, Dec 13, 2010 at 11:43:12AM +0000, Tim Watts wrote:
>> Hmm,
>>
>> Wonder if this is a clue (not noticed this before, but running
>> idmapd in the foreground, it just whined):
>>
>> rpc.idmapd: nfscb: read(/var/lib/nfs/rpc_pipefs/nfs/clnt0/idmap): No
>> such file or directory
>>
>> (Repeated several times).
>>
>> I'm veering towards this issue being an idmap problem.
>
> All permissions are enforced on the server side--the effect of an
> idmapper failure should just be that "ls -l" shows bad results (e.g.,
> "nobody" as the owner of every file) and that commands like "chown"
> fail.
Thanks very much for your reply.
Thanks for confirming the logic - I certainly now believe I have an
issue with idmapd - sometimes the user will change to "nobody" and the
group is good, and sometimes the reverse will happen.
We are using LDAP - with nscd - I might give myself static /etc/passwd
and /etc/group entries to see if that makes the problem go away.
I'll run idmapd in debug plus under strace to see what's happening.
> If you're actually being prevented from performing normal filesystem
> operations, then the most likely culprit is with gssd and krb5.
I think I *may* have discovered the cause, eventually.
I have a couple of bash functions that maintain a user Kerberos principal
and a root one - the root one being in /tmp/krb5cc_<uid>_root - and the
bash function switches the env var to suit the mode I want to be in.
Seems that having two files of the form /tmp/krb5cc_<uid>* might be
confusing something, so I have given the root one a totally
non-conforming name to see if that was the cause.
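To see at a glance whether more than one file shares the /tmp/krb5cc_<uid> prefix, something like the following could be used. It is a minimal sketch: `matching_ccaches` is a hypothetical helper, and it only lists files by name. Whether rpc.gssd really picks among such files in a way that causes the fault is the hypothesis under test here, not something this code establishes.

```python
import glob
import os


def matching_ccaches(uid, directory="/tmp"):
    """List files in `directory` whose names start with krb5cc_<uid>.

    Note that a plain prefix match also catches longer uids
    (krb5cc_1000* matches krb5cc_10001), so check the results by eye.
    """
    pattern = os.path.join(glob.escape(directory), "krb5cc_%d*" % uid)
    return sorted(glob.glob(pattern))
```

Calling `matching_ccaches(os.getuid())` before and after the rename should show two entries shrinking to one if the renamed file no longer matches.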
> Might be worth filing a bug with ubuntu.
I did - it got classified under "nfs4_acl_tools"(!). I don't think
Ubuntu are big on NFSv4 and I'm happy to try to get to the bottom of
this, but it's pretty new to me (I've run a lot of NFSv3 before though).
> rpc.gssd debugging you've tried. Sniffing client/server and client/kdc
> traffic with wireshark might also show some more information.
Yes - that would be worth doing.
> I dunno. Maybe strace rpc.gssd during the failure?
Yes - might be worth doing that too.
Luckily during the failure conditions, the desktop is usable enough so I
can at least get around and look at stuff.
Many thanks for the pointers :)
Cheers
Tim
--
Tim Watts
Linux Sysadmin, High Energy Physics, Imperial College London
Tel: 020 759 47809
* Re: NooB Assitance with debugging NFSv4 client requested
2010-12-10 15:43 NooB Assitance with debugging NFSv4 client requested Tim Watts
[not found] ` <4D024A8E.50407-AQ/gCgVxFfnQzY9nttDBhA@public.gmane.org>
@ 2010-12-16 22:03 ` Tim Watts
[not found] ` <4D0A8CC9.9050306-AQ/gCgVxFfnQzY9nttDBhA@public.gmane.org>
1 sibling, 1 reply; 7+ messages in thread
From: Tim Watts @ 2010-12-16 22:03 UTC (permalink / raw)
To: linux-nfs
Right - thanks for all the discussions...
The security (krb tickets) seems stable now that I have renamed my root
principal cache file to /tmp/kerberos5cc_tjw_root rather than
/tmp/krb5cc_myuid_root.
idmap still seems unstable - often my file groups go to nobody despite
my file owners being correct.
Our LDAP server setup is new (it replaced NIS) and, once in a blue moon,
other servers (e.g. the mailserver) don't get an LDAP query through in a
timely manner.
I would like to see idmapd try a bit harder and/or not cache bad/missing
answers for so long, but I think that is roughly where the other problem
lies - this is debuggable and soluble.
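The behaviour wished for here (retry a failed lookup a few times, and forget a bad or missing answer sooner than a good one) can be illustrated in a few lines. This is a sketch of the idea only, not a description of how idmapd caches anything: the class name, the retry count and both timeouts are made-up values for illustration.

```python
import time


class NameCache:
    """Cache lookups, keeping failed answers for a shorter time than good ones."""

    def __init__(self, lookup, positive_ttl=600.0, negative_ttl=10.0,
                 retries=2, clock=time.monotonic):
        self._lookup = lookup              # callable: key -> value, or None on failure
        self._positive_ttl = positive_ttl  # seconds to keep a good answer (hypothetical)
        self._negative_ttl = negative_ttl  # seconds to keep a failure (hypothetical)
        self._retries = retries            # extra attempts after the first failure
        self._clock = clock
        self._entries = {}                 # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]
        value = None
        for _ in range(1 + self._retries):
            value = self._lookup(key)
            if value is not None:
                break
        # A missing answer expires quickly so a slow LDAP server gets another chance.
        ttl = self._positive_ttl if value is not None else self._negative_ttl
        self._entries[key] = (value, now + ttl)
        return value
```

With a short negative timeout, a group that briefly resolves to nothing would be looked up again after a few seconds instead of staying mapped to "nobody" for the full cache lifetime.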
Thanks for the help guys!
Cheers
Tim
--
Tim Watts
Linux Sysadmin, High Energy Physics, Imperial College London
Tel: 020 759 47809
* Re: NooB Assitance with debugging NFSv4 client requested
[not found] ` <4D0A8CC9.9050306-AQ/gCgVxFfnQzY9nttDBhA@public.gmane.org>
@ 2010-12-17 19:16 ` J. Bruce Fields
0 siblings, 0 replies; 7+ messages in thread
From: J. Bruce Fields @ 2010-12-17 19:16 UTC (permalink / raw)
To: Tim Watts; +Cc: linux-nfs
On Thu, Dec 16, 2010 at 10:03:53PM +0000, Tim Watts wrote:
>
> Right - thanks for all the discussions...
>
> The security (krb tickets) seems stable now that I have renamed my root
> principal cache file to /tmp/kerberos5cc_tjw_root rather than
> /tmp/krb5cc_myuid_root.
>
> idmap still seems unstable - often my file groups go to nobody
> despite my file owners being correct.
>
> Our LDAP server setup is new (it replaced NIS) and, once in a blue
> moon, other servers (e.g. the mailserver) don't get an LDAP query
> through in a timely manner.
>
> I would like to see idmapd try a bit harder and/or not cache
> bad/missing answers for so long, but I think that is roughly where
> the other problem lies - this is debuggable and soluble.
Let us know what you figure out.
--b.