* Re: ENOENT on first reference to an automounted file
From: Greg Earle @ 2007-10-20 1:28 UTC
To: autofs

On Oct 19, 2007, at 16:02 AM, Ian Kent <raven@themaw.net> wrote:

> On Thu, 2007-10-18 at 03:36 -0700, Greg Earle wrote:
>> I am getting the impression from the bug reports (and posts in
>> this thread) that this bug is *not* fixed in 2.6.9-55.0.9; and might
>> not be until some point in the future when 2.6.9-61 is available via
>> "up2date". Am I correct in that assumption?
>>
>> If so, we may have little choice but to rollback to Update 4 by
>> doing complete reinstalls from scratch (groan). Is there any
>> info on when this bug first crept in, and is Update 4 - with
>> autofs-4.1.3-187 - safe to roll back to? The natives are restless,
>> and they've already shown up outside my office door with torches
>> and pitchforks. I've got a lot of unhappy Flight Projects reps
>> on my hands. We need to make a command decision here Real Soon Now.
>>
>> Any illumination much appreciated.
>
> Well, if we can't confirm the problem and resolution then I have no
> case to put for an update.

I can provide you (off-line, as it has potentially sensitive host
information) an overly-wordy e-mail I sent out to others in my
organization about the problem, illustrated with "strace" output and
"tshark" packet traces, if you like.

I do have an update for you, though. And it's very bizarre.

In Dan's original report to this list, he mentions having turned
*off* "--ghost" with his maps sourced from LDAP. We use NIS for our
maps, but just for fun, I decided to test turning "--ghost" *on* (we
default to it off, and we also use "-nobrowse" on our Suns, so we like
to keep them consistent), and ... It cured the problem!
So for now, we are using that as a temporary kludge/workaround. My
tests, which could trigger the bug anywhere from as little as 4 1/2
minutes to (at most) 15 minutes, now run for an entire day without
error.

Another twist: in our "/home" map, we have about 127 individual
entries, but most of those are for people outside of our particular
organization with non-wildcard-able home directory paths. But for the
rest of us, we *do* use a single wildcard "*" entry:

    *    rhel4u5server:/export/home/&

Using "--ghost" has, as expected, produced 127 phantom directories
under "/home" after a fresh reboot of the test system with it enabled.

But here is what's of interest to me - the test script I used
pummeled this system with requests from an account which is one of
the wildcarded ones. I would think there would be a potential problem
with mounting that one for the first time, since with "--ghost" it
still doesn't appear under "/home" until the account actually tickles
it. In short, I'd expect to see behavior like Dan's original query -
"ENOENT on first reference". But I'm not.

Our test setup is very simple:

rhel4u5server - NFS server, has the home directories, RHEL 4 Update 5
rhel4u5client - NFS client, test machine, gets maps and password etc.
                entries from NIS, so user is everywhere, home directory
                is "/home/user", et al.
othermachine  - Doesn't matter what it is, just used to run SSH to client

The user whose uid we've been using for the tests has a bunch of
machines in his $HOME/.ssh/authorized_keys file. He's got "ssh-agent"
set up and running so that he can fire up SSH sessions to remote hosts
to run things - usually out of a local "cron" job - without needing a
password to do so, since these are automated out of "crontab".
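A toy Python model of the map semantics Greg relies on may help here: an exact key is used before the "*" wildcard, each "&" in the wildcard entry is replaced by the key being looked up, and "--ghost" can only pre-create directories for keys it can enumerate, which is why wildcard-matched users do not show up under "/home" in advance. This is a sketch under the assumption that a map is just a key-to-location dictionary; the function names and the "alice" entry are hypothetical, and real autofs also handles options, multi-mounts and escaping.

```python
from typing import Dict, List, Optional


def resolve(map_entries: Dict[str, str], key: str) -> Optional[str]:
    """Return the location autofs would mount for `key`, or None.

    An exact key is used if present; otherwise the "*" wildcard entry is
    consulted and every "&" in it is replaced by the key being looked up.
    """
    if key in map_entries:
        return map_entries[key]
    wildcard = map_entries.get("*")
    if wildcard is None:
        return None
    return wildcard.replace("&", key)


def ghost_dirs(map_entries: Dict[str, str]) -> List[str]:
    """Directories a --ghost (browsable) indirect mount would pre-create.

    The "*" key cannot be enumerated, so it yields no directory: a user
    matched only by the wildcard appears under /home on first reference.
    """
    return sorted(key for key in map_entries if key != "*")


if __name__ == "__main__":
    # Hypothetical map: one explicit entry plus Greg's wildcard line.
    home_map = {
        "alice": "otherserver:/export/users/alice",
        "*": "rhel4u5server:/export/home/&",
    }
    print(ghost_dirs(home_map))       # ['alice']
    print(resolve(home_map, "user"))  # rhel4u5server:/export/home/user
```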
So, basically, all our test script - which is run on "othermachine" -
does is fire up constant SSH commands inside a loop:

    ssh -l user rhel4u5client ls -l .ssh

Since rhel4u5client is automounting /home from rhel4u5server, then
obviously in the normal case, this works just fine. But when the bug
is triggered, I see an expiry run from the automounter on
"rhel4u5client"; it umounts the user's home directory and returns
ENOENT to the (rhel4u5client) SSH daemon when the daemon tries to open
the user's "authorized_keys" file in ".ssh". The result is that,
instead of working, the script back on "othermachine" gets stopped in
its tracks when it's prompted for a passphrase. So it's really easy to
tell when the bug has occurred.

What's odd is, once the ENOENT is returned, the SSH daemon goes and
looks for "authorized_keys2" as a backup, and that fails - but by now,
the home directory has been magically remounted by the automounter, so
a real live NFS lookup occurs (unlike with what I just described
above), and NFS3ERR_NOENT gets returned (the user doesn't have one),
which is normal. But after that, it asks for "authorized_keys"
*again*, and I have no idea why. This NFS lookup succeeds, but by then
the SSH daemon has already taken the original ENOENT returned while
/home/user was temporarily unmounted, and thrown up the demand for the
passphrase - so the fact that "authorized_keys" exists (again) no
longer matters at that point. And the test script stops.

I would be happy to forward the e-mail with the strace output and the
packet traces to you and Jeff off-line, if you like.

> No-one has volunteered to try the patches I referred to in this thread
> and that's why I haven't posted them, so how about it, someone?

Due to the nature of my organization, I am not really in a position to
test patches - but surely someone else can? (We're a 24/7 Operations
environment - getting downtimes to even do simple reboots is like
pulling teeth, involving committees.)
- Greg
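Greg's reproducer is just a tight SSH loop that stops at the first failure. A minimal Python sketch of such a loop follows, assuming the same user and host names as his setup. One deliberate change is labeled: it adds the OpenSSH option "BatchMode=yes", so a missing "authorized_keys" makes ssh exit with an error instead of waiting at a passphrase prompt. The helper names are hypothetical, and any non-zero exit (including a plain network error) stops the loop.

```python
import subprocess
import sys
import time
from typing import List, Optional


def build_ssh_cmd(user_at_host: str) -> List[str]:
    """Greg's loop command, plus BatchMode=yes so that ssh exits with an
    error instead of stopping at a passphrase prompt."""
    return ["ssh", "-o", "BatchMode=yes", user_at_host, "ls", "-l", ".ssh"]


def run_until_failure(cmd: List[str], max_runs: int, delay_s: float = 0.0) -> Optional[int]:
    """Run `cmd` up to `max_runs` times; return the 1-based number of the
    first run that exits non-zero, or None if every run succeeds."""
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            # Timestamp the failure so it can be lined up with client-side logs.
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp}: run {run} exited with status {result.returncode}",
                  file=sys.stderr)
            return run
        if delay_s > 0:
            time.sleep(delay_s)
    return None
```

Run from a machine like "othermachine" as `run_until_failure(build_ssh_cmd("user@rhel4u5client"), max_runs=100000)`. The printed timestamp can then be compared with the automount debug log on the client to see whether an expire ran just before the failure.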
* Re: ENOENT on first reference to an automounted file
From: Dan Halbert @ 2007-11-07 1:53 UTC
To: autofs

Greg Earle wrote:
> We use NIS for our maps, but just for fun, I decided to test
> turning "--ghost" *on* (we default to it off, and we also use
> "-nobrowse" on our Suns, so we like to keep them consistent),
> and ...
>
Greg, try the latest kernel, 2.6.9-55.0.12, which is now available from
RedHat and has also gone downstream to various other RH-source-based
distributions. This works for us. I agree your ghost/non-ghost
differences are odd and do not match my experience. But I see
differences also based on client load.

Dan
* Re: ENOENT on first reference to an automounted file
From: Greg Earle @ 2007-11-17 21:25 UTC
To: autofs

On Nov 6, 2007, at 8:53 PM EST, Dan Halbert <halbert@everyzing.com> wrote:

> Greg Earle wrote:
>> We use NIS for our maps, but just for fun, I decided to test
>> turning "--ghost" *on* (we default to it off, and we also use
>> "-nobrowse" on our Suns, so we like to keep them consistent),
>> and ...
>>
> Greg, try the latest kernel, 2.6.9-55.0.12, which is now available
> from RedHat and has also gone downstream to various other
> RH-source-based distributions. This works for us. I agree your
> ghost/non-ghost differences are odd and do not match my experience.
> But I see differences also based on client load.

I see that Red Hat just announced/released RHEL 4 Update 6
yesterday:

https://www.redhat.com/archives/nahant-list/2007-November/msg00068.html

It appears that this includes kernel 2.6.9-67. Can I safely
assume that this new release quashes this pesky ENOENT bug
once and for all?

More interestingly/importantly, RHEL 4 Update 6 includes autofs5
as a "Technology Preview". How does the autofs5 code in this
new release compare with the mainline code in RHEL 5 Update 1,
and is it considered robust enough to use in a production
environment that depends heavily (as in, "life or death" -
we use the automounter for *everything*) on automounting?

We started our RHEL 4 Update 5 upgrade cycle a month and a
half ago but were stopped dead in our tracks by this bug.

Now that we have a workaround ("--ghost"), we are planning
on pushing ahead, but I need to know whether I should try
recommending that we instead move to Update 6 rather than
continue to use Update 5 with a Band-Aid.

Thanks,

- Greg
* Re: ENOENT on first reference to an automounted file
From: Ian Kent @ 2007-11-18 2:46 UTC
To: autofs

On Sat, 2007-11-17 at 13:25 -0800, Greg Earle wrote:
> On Nov 6, 2007, at 8:53 PM EST, Dan Halbert <halbert@everyzing.com>
> wrote:
>
> > Greg Earle wrote:
> >> We use NIS for our maps, but just for fun, I decided to test
> >> turning "--ghost" *on* (we default to it off, and we also use
> >> "-nobrowse" on our Suns, so we like to keep them consistent),
> >> and ...
> >>
> > Greg, try the latest kernel, 2.6.9-55.0.12, which is now available
> > from RedHat and has also gone downstream to various other
> > RH-source-based distributions. This works for us. I agree your
> > ghost/non-ghost differences are odd and do not match my experience.
> > But I see differences also based on client load.
>
> I see that Red Hat just announced/released RHEL 4 Update 6
> yesterday:
>
> https://www.redhat.com/archives/nahant-list/2007-November/msg00068.html
>
> It appears that this includes kernel 2.6.9-67. Can I safely
> assume that this new release quashes this pesky ENOENT bug
> once and for all?

Try it out. All I can say is that the 4.6 release kernel has the
patches that were used in 2.6.9-55.0.12 to resolve the problem.

>
> More interestingly/importantly, RHEL 4 Update 6 includes autofs5
> as a "Technology Preview". How does the autofs5 code in this
> new release compare with the mainline code in RHEL 5 Update 1,
> and is it considered robust enough to use in a production
> environment that depends heavily (as in, "life or death" -
> we use the automounter for *everything*) on automounting?

It's the same as is in RHEL 5 U1, except for some changes to allow
autofs and autofs5 to be installed at the same time. You still need to
use "one or the other", not both. I'll be keeping RHEL 4 autofs5 in
sync with RHEL 5 autofs.
Tech Preview was our only option to get this into RHEL4, as autofs
version 4 is already included as a core package, which must continue
to be included.

>
> We started our RHEL 4 Update 5 upgrade cycle a month and a
> half ago but were stopped dead in our tracks by this bug.
>
> Now that we have a workaround ("--ghost"), we are planning
> on pushing ahead, but I need to know whether I should try
> recommending that we instead move to Update 6 rather than
> continue to use Update 5 with a Band-Aid.

The kernel revision 2.6.9-55.0.12 isn't really a band-aid; it contains
a correction.

Whether you go to 4.6 with autofs5 is a decision you'll need to make
yourself after suitable testing. There is of course the issue that a
Tech Preview isn't officially supported, so you may have trouble
logging bugs. But then you can always report them here and I can log
bugs if needed.

Ian
* Re: ENOENT on first reference to an automounted file
From: Greg Earle @ 2007-10-18 10:36 UTC
To: autofs

Dan, Ian, Jeff:

I work at a large U.S. Government Lab and after updating some of our
Ops systems from RHEL 4 Update 4 to Update 5, all Hell broke loose
as we have been plagued by this bug ever since - we are a Sun and
Red Hat shop, and our software architecture is heavily dependent upon
lots of automounts. (I can very easily replicate it in our own
environment with a simple test script that usually provokes the
race condition in about 10-15 minutes, tops.)

I am getting the impression from the bug reports (and posts in
this thread) that this bug is *not* fixed in 2.6.9-55.0.9; and might
not be until some point in the future when 2.6.9-61 is available via
"up2date". Am I correct in that assumption?

If so, we may have little choice but to rollback to Update 4 by
doing complete reinstalls from scratch (groan). Is there any
info on when this bug first crept in, and is Update 4 - with
autofs-4.1.3-187 - safe to roll back to? The natives are restless,
and they've already shown up outside my office door with torches
and pitchforks. I've got a lot of unhappy Flight Projects reps
on my hands. We need to make a command decision here Real Soon Now.

Any illumination much appreciated.

- Greg Earle
* Re: ENOENT on first reference to an automounted file
From: Ian Kent @ 2007-10-19 8:02 UTC
To: autofs

On Thu, 2007-10-18 at 03:36 -0700, Greg Earle wrote:
> Dan, Ian, Jeff:
>
> I work at a large U.S. Government Lab and after updating some of our
> Ops systems from RHEL 4 Update 4 to Update 5, all Hell broke loose
> as we have been plagued by this bug ever since - we are a Sun and
> Red Hat shop, and our software architecture is heavily dependent upon
> lots of automounts. (I can very easily replicate it in our own
> environment with a simple test script that usually provokes the
> race condition in about 10-15 minutes, tops.)
>
> I am getting the impression from the bug reports (and posts in
> this thread) that this bug is *not* fixed in 2.6.9-55.0.9; and might
> not be until some point in the future when 2.6.9-61 is available via
> "up2date". Am I correct in that assumption?
>
> If so, we may have little choice but to rollback to Update 4 by
> doing complete reinstalls from scratch (groan). Is there any
> info on when this bug first crept in, and is Update 4 - with
> autofs-4.1.3-187 - safe to roll back to? The natives are restless,
> and they've already shown up outside my office door with torches
> and pitchforks. I've got a lot of unhappy Flight Projects reps
> on my hands. We need to make a command decision here Real Soon Now.
>
> Any illumination much appreciated.

Well, if we can't confirm the problem and resolution then I have no case
to put for an update.

No-one has volunteered to try the patches I referred to in this thread
and that's why I haven't posted them, so how about it, someone?

Ian
* Re: ENOENT on first reference to an automounted file
From: Dan Halbert @ 2007-10-19 13:20 UTC
To: autofs

Ian Kent wrote:
>
> Well, if we can't confirm the problem and resolution then I have no case
> to put for an update.
>
> No-one has volunteered to try the patches I referred to in this thread
> and that's why I haven't posted them, so how about it, someone?

Ian (& Greg & Jeff),

Maybe there's a bit of cross-purpose communication here. In an earlier
message, Jeff said he had reproduced the problem by using exactly our
kernel (2.6.9-55.0.9.ELsmp x86_64), and that the problem did NOT happen
with a later kernel he had (which was the one he originally tried). See
http://linux.kernel.org/pipermail/autofs/2007-October/004133.html.

So I think Jeff has confirmed the problem and resolution. Am I telling
you something you already know?

Jeff said his successful test kernel has patches for bug 248126. Comment
#24 in that bug says the patch was put in 2.6.9-55.0.7. So I'd expect
the patch to be in 2.6.9-55.0.9 and for the problem to be fixed already.
Since it isn't fixed, either the patch was pulled between .7 and .9, or
the fix is more complicated than that single patch. Also, the bug
comments refer to several different patch sets and other bugs, so it's
not clear to me which patches Jeff actually has in his test kernel.

My group has various workarounds, so we're not dead in the water. We
also might move up to 5.x, but are waiting for a completely different
fix as well (kernel.org bug #7768), which is not yet in the released
upstream kernels.

Dan
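Dan's reasoning depends on ordering RHEL kernel release strings, and plain string comparison gets that wrong: "2.6.9-55.0.12" sorts before "2.6.9-55.0.9". Below is a sketch of a numeric comparison, assuming only the leading dotted-number fields matter and suffixes such as ".ELsmp" can be ignored. `rhel_kernel_key` is a hypothetical helper, not an RPM API. Ordering alone says nothing about which patches a build contains, since Ian explains that the fix was reverted in 2.6.9-55.0.9 even though it had gone into 2.6.9-55.0.7.

```python
import re
from typing import Tuple

# Two dotted-number groups separated by "-", e.g. "2.6.9" and "55.0.12".
_RELEASE = re.compile(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)")


def rhel_kernel_key(release: str) -> Tuple[int, ...]:
    """Map e.g. '2.6.9-55.0.12.ELsmp' to (2, 6, 9, 55, 0, 12).

    Tuples of ints compare field by field, so 55.0.12 sorts after 55.0.9.
    Assumption: non-numeric suffixes such as '.ELsmp' are irrelevant.
    """
    match = _RELEASE.match(release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version, rel = match.groups()
    return tuple(int(part) for part in f"{version}.{rel}".split("."))


if __name__ == "__main__":
    releases = ["2.6.9-67", "2.6.9-55.0.9.ELsmp", "2.6.9-55.0.12", "2.6.9-55.0.7"]
    print(sorted(releases))                       # string order puts 55.0.12 first
    print(sorted(releases, key=rhel_kernel_key))  # numeric order
```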
* Re: ENOENT on first reference to an automounted file
From: Ian Kent @ 2007-10-19 14:37 UTC
To: Dan Halbert; Cc: autofs

On Fri, 2007-10-19 at 09:20 -0400, Dan Halbert wrote:
> Ian Kent wrote:
> >
> > Well, if we can't confirm the problem and resolution then I have no case
> > to put for an update.
> >
> > No-one has volunteered to try the patches I referred to in this thread
> > and that's why I haven't posted them, so how about it, someone?
>
> Ian (& Greg & Jeff),
>
> Maybe there's a bit of cross-purpose communication here. In an earlier
> message, Jeff said he had reproduced the problem by using exactly our
> kernel (2.6.9-55.0.9.ELsmp x86_64), and that the problem did NOT happen
> with a later kernel he had (which was the one he originally tried). See
> http://linux.kernel.org/pipermail/autofs/2007-October/004133.html.
>
> So I think Jeff has confirmed the problem and resolution. Am I telling
> you something you already know?

Well, to be honest, I had forgotten about that comment, but that's
partly good. The curious thing, of course, is that hitting this problem
is quite odd because it shouldn't be that prone to occur.

>
> Jeff said his successful test kernel has patches for bug 248126. Comment
> #24 in that bug says the patch was put in 2.6.9-55.0.7. So I'd expect
> the patch to be in 2.6.9-55.0.9 and for the problem to be fixed already.
> Since it isn't fixed, either the patch was pulled between .7 and .9, or
> the fix is more complicated than that single patch. Also, the bug
> comments refer to several different patch sets and other bugs, so it's
> not clear to me which patches Jeff actually has in his test kernel.
I mentioned before (although I may not have been clear on exactly what)
that the patch for the mount/expire race had been reverted in
2.6.9-55.0.9 and the patches in the bug Jeff referred to are corrections
to that patch.

Anyway, the story just gets worse because there's another patch that
depends on these that should also be included and isn't.

To this end I've built a RHEL4 kernel with all the patches that
"should" be included. If you're interested in testing it we just need
to find a way to get it to you.

It would be good to get some clear information on this because several
people are having, and will continue to have (including possibly
RHEL5), odd little problems that end up being quite serious, and I have
no solid case to lobby for inclusion of the reverted or missing patches.

Ian
* Re: ENOENT on first reference to an automounted file
From: Jeff Moyer @ 2007-10-19 15:22 UTC
To: Ian Kent; Cc: autofs

Ian Kent <raven@themaw.net> writes:

> On Fri, 2007-10-19 at 09:20 -0400, Dan Halbert wrote:
>> Ian Kent wrote:
>> >
>> > Well, if we can't confirm the problem and resolution then I have no case
>> > to put for an update.
>> >
>> > No-one has volunteered to try the patches I referred to in this thread
>> > and that's why I haven't posted them, so how about it, someone?
>>
>> Ian (& Greg & Jeff),
>>
>> Maybe there's a bit of cross-purpose communication here. In an earlier
>> message, Jeff said he had reproduced the problem by using exactly our
>> kernel (2.6.9-55.0.9.ELsmp x86_64), and that the problem did NOT happen
>> with a later kernel he had (which was the one he originally tried). See
>> http://linux.kernel.org/pipermail/autofs/2007-October/004133.html.
>>
>> So I think Jeff has confirmed the problem and resolution. Am I telling
>> you something you already know?
>
> Well, to be honest, I had forgotten about that comment, but that's
> partly good. The curious thing, of course, is that hitting this problem
> is quite odd because it shouldn't be that prone to occur.

I verified that the latest release-candidate kernel for RHEL 4 U6
fixes the problem.

In the meantime, you can work around the bug by turning off ghosting.

Cheers,
Jeff
* Re: ENOENT on first reference to an automounted file
From: Dan Halbert @ 2007-10-19 17:05 UTC
To: Ian Kent; Cc: autofs

Ian Kent wrote:

>Well, to be honest, I had forgotten about that comment, but that's
>partly good. The curious thing, of course, is that hitting this problem
>is quite odd because it shouldn't be that prone to occur.

I think the original diagnosis of an umount/mount race is only one
possible way to hit the bug. We use very long timeouts and would never
have hit that particular race. Since we see it with LDAP but not with a
local map, I wonder if it is due to some slight additional delay caused
by the LDAP lookup.

>To this end I've built a RHEL4 kernel with all the patches that
>"should" be included. If you're interested in testing it we just need
>to find a way to get it to you.

Jeff Moyer wrote:
> I verified that the latest release-candidate kernel for RHEL 4 U6
> fixes the problem.
>
> In the meantime, you can work around the bug by turning off ghosting.

Ian, should I try your test kernel, or is it moot now, given what Jeff
says? I can privately give you an FTP location if you would still like
it tested.

Dan
* Re: ENOENT on first reference to an automounted file
From: Ian Kent @ 2007-10-19 17:21 UTC
To: Dan Halbert; Cc: autofs

On Fri, 2007-10-19 at 13:05 -0400, Dan Halbert wrote:
> Ian Kent wrote:
>
> >Well, to be honest, I had forgotten about that comment, but that's
> >partly good. The curious thing, of course, is that hitting this problem
> >is quite odd because it shouldn't be that prone to occur.
>
> I think the original diagnosis of an umount/mount race is only one
> possible way to hit the bug. We use very long timeouts and would never
> have hit that particular race. Since we see it with LDAP but not with a
> local map, I wonder if it is due to some slight additional delay caused
> by the LDAP lookup.

We understand what is happening now. Jeff worked it out. It has to do
with the use of a wildcard entry and the interaction of the daemon with
the autofs internal cache and the directory create/remove done by the
daemon. This also explains why the kernel patch prevents the problem
from happening.

>
> >To this end I've built a RHEL4 kernel with all the patches that
> >"should" be included. If you're interested in testing it we just need
> >to find a way to get it to you.
>
> Jeff Moyer wrote:
> > I verified that the latest release-candidate kernel for RHEL 4 U6
> > fixes the problem.
> >
> > In the meantime, you can work around the bug by turning off ghosting.
>
> Ian, should I try your test kernel, or is it moot now, given what Jeff
> says? I can privately give you an FTP location if you would still like
> it tested.

We probably still have to make our case for getting this into the
revision 55 series kernel, I think, given that the patches have been
reverted. You're not the only one seeing this, and I expect not
everyone will be able or comfortable upgrading to a later revision
kernel, so perhaps we should.
It's pretty much up to you, as anyone who can't go to the U6 kernel
and needs this will have to test it for themselves anyway. I have the
build now anyway.

Ian
* Re: ENOENT on first reference to an automounted file
From: Dan Halbert @ 2007-11-03 15:27 UTC
To: autofs

Ian,

I see that http://rhn.redhat.com/errata/RHSA-2007-0939.html, dated
2007-11-01, claims to include a fix for autofs bug 248126, with a new
kernel-2.6.9-55.0.12. Do you know if this kernel contains effectively
the same patches that are in the test kernel I got from you? If so,
great! We have been running your test kernel on about thirty machines
with no problems, no ENOENTs, etc.

Dan
* Re: ENOENT on first reference to an automounted file
From: Ian Kent @ 2007-11-04 5:12 UTC
To: Dan Halbert; Cc: autofs

On Sat, 2007-11-03 at 11:27 -0400, Dan Halbert wrote:
> Ian,
>
> I see that http://rhn.redhat.com/errata/RHSA-2007-0939.html, dated
> 2007-11-01, claims to include a fix for autofs bug 248126, with a new
> kernel-2.6.9-55.0.12. Do you know if this kernel contains effectively
> the same patches that are in the test kernel I got from you? If so,
> great! We have been running your test kernel on about thirty machines
> with no problems, no ENOENT's, etc.

Yep, it looks good to me.

Ian
* ENOENT on first reference to an automounted file
From: Dan Halbert @ 2007-10-05 21:14 UTC
To: autofs
I have what looks like an automount race condition, and am very puzzled.
Any suggestions would be appreciated.
The first time I reference an automounted file, it is not there
(ENOENT). On the second and later try, the file is there. For instance:
$ cat /net/fileserver/fs/somefile
cat: /net/fileserver/fs/somefile: No such file or directory
$ cat /net/fileserver/fs/somefile
Contents of somefile.
I watched the log on fileserver, and the automount request is logged
seemingly immediately after the first "cat" prints its error.
This causes havoc with our applications, which expect files to be there
the first time they look for them.
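Until a fixed kernel is in place, one application-side stopgap is to treat ENOENT on an automounted path as possibly transient and retry briefly. The thread does not say what workarounds Dan's group used, so this Python sketch is an illustration only. `retry_on_enoent` and `read_text` are hypothetical helpers, and the attempt count and delay are arbitrary assumptions. It only retries ENOENT, and a file that is genuinely missing is still reported, just up to `attempts * delay_s` seconds later.

```python
import errno
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_enoent(op: Callable[[], T], attempts: int = 3, delay_s: float = 0.5) -> T:
    """Call op(), retrying only when it fails with ENOENT.

    Any other error, or ENOENT on the final attempt, is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except OSError as exc:
            if exc.errno != errno.ENOENT or attempt == attempts:
                raise
            time.sleep(delay_s)  # give the automount a moment to complete
    raise RuntimeError("unreachable")


def read_text(path: str) -> str:
    """Read a whole file; the file is closed even if reading fails."""
    with open(path) as f:
        return f.read()
```

For Dan's example this would be called as `retry_on_enoent(lambda: read_text("/net/fileserver/fs/somefile"))`.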
I can repeat the problem after umounting the filesystem.
I see this problem on a CentOS 4.x system running their standard
autofs-4.1.3-199.3. I do NOT see it on CentOS 5.x, using
autofs-5.0.1-0.rc2.43.0.2. Instead I see a slight pause before "cat"
prints the contents of the file, presumably as the automount completes.
Both the CentOS4 and CentOS5 systems are completely up-to-date.
I also only see this problem with our Linux NFS servers (FC5 and FC6),
but not with a non-Fedora NAS server we have.
So I am not sure this is an automount problem, per se. Perhaps it's some
kind of NFS version problem?
The automount options include --ghost. At first I thought it might be
due to --ghost, because the very first time I reference the file, say
after a reboot or restarting autofs, I don't get an ENOENT. The first
time, the mountpoint dir does not yet exist. But removing --ghost from
the automount options does not seem to fix it.
Gory details about the automount maps are below.
Thanks for any help,
Dan Halbert
---------------
More details:
Our automount maps are stored in ldap. The entry in auto.master for
fileserver (for cn=/net/fileserver) is:
ldap:ou=auto.fileserver,ou=autofs,dc=example,dc=com --timeout=86400 --ghost -o rw,hard,async,noatime,intr,retrans=4,timeo=100,rsize=8192,wsize=8192
The auto.fileserver is (for cn=*):
fileserver.example.com:/export/&
We are not using the fancy executable /net maps that come with these
systems.
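Dan's master-map value packs automount daemon options and NFS mount options into one string. The Python parser sketch below separates them, assuming the value always follows the layout of his entry. It is not a general autofs parser, and the function names are hypothetical. Two units are worth noting. `--timeout=86400` is in seconds, so it sets a 24-hour expire timeout, the "very long timeouts" Dan mentions elsewhere in the thread. The NFS `timeo=100` is in tenths of a second, so it means 10 seconds before the client retries a request.

```python
from typing import Any, Dict


def parse_master_entry(info: str) -> Dict[str, Any]:
    """Split a master-map value laid out like Dan's into its parts.

    Assumed layout: MAPSOURCE, then daemon options (--timeout=N, --ghost),
    then "-o" followed by one comma-separated NFS mount-option string.
    """
    tokens = info.split()
    entry: Dict[str, Any] = {"map": tokens[0], "timeout_s": None,
                             "ghost": False, "mount": {}}
    rest = iter(tokens[1:])
    for tok in rest:
        if tok.startswith("--timeout="):
            entry["timeout_s"] = int(tok.split("=", 1)[1])  # seconds
        elif tok == "--ghost":
            entry["ghost"] = True
        elif tok == "-o":
            # The next token holds the mount options; bare flags map to True.
            for opt in next(rest, "").split(","):
                if opt:
                    key, sep, value = opt.partition("=")
                    entry["mount"][key] = value if sep else True
    return entry


def nfs_timeo_seconds(mount_opts: Dict[str, Any]) -> float:
    """NFS 'timeo' is expressed in tenths of a second (see nfs(5))."""
    return int(mount_opts["timeo"]) / 10
```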
* Re: ENOENT on first reference to an automounted file
From: Ian Kent @ 2007-10-06 4:48 UTC
To: Dan Halbert; Cc: autofs

On Fri, 2007-10-05 at 17:14 -0400, Dan Halbert wrote:
> I have what looks like an automount race condition, and am very puzzled.
> Any suggestions would be appreciated.
>
> The first time I reference an automounted file, it is not there
> (ENOENT). On the second and later try, the file is there. For instance:
>
> $ cat /net/fileserver/fs/somefile
> cat: /net/fileserver/fs/somefile: No such file or directory
> $ cat /net/fileserver/fs/somefile
> Contents of somefile.
>
> I watched the log on fileserver, and the automount request is logged
> seemingly immediately after the first "cat" prints its error.
>
> This causes havoc with our applications, which expect files to be there
> the first time they look for them.
>
> I can repeat the problem after umounting the filesystem.
>
> I see this problem on a CentOS 4.x system running their standard
> autofs-4.1.3-199.3. I do NOT see it on CentOS 5.x, using
> autofs-5.0.1-0.rc2.43.0.2. Instead I see a slight pause before "cat"
> prints the contents of the file, presumably as the automount completes.
> Both the CentOS4 and CentOS5 systems are completely up-to-date.
>
> I also only see this problem with our Linux NFS servers (FC5 and FC6),
> but not with a non-Fedora NAS server we have.
>
> So I am not sure this is an automount problem, per se. Perhaps it's some
> kind of NFS version problem?
>
> The automount options include --ghost. At first I thought it might be
> due to --ghost, because the very first time I reference the file, say
> after a reboot or restarting autofs, I don't get an ENOENT. The first
> time, the mountpoint dir does not yet exist. But removing --ghost from
> the automount options does not seem to fix it.

We've seen this from time to time for various reasons but to be honest I
have trouble remembering so we'll need to check through a debug log.

Jeff may recall this?

Also, you don't mention the kernel versions?

>
> Gory details about the automount maps are below.
>
> Thanks for any help,
> Dan Halbert
>
> ---------------
> More details:
>
> Our automount maps are stored in ldap. The entry in auto.master for
> fileserver (for cn=/net/fileserver) is:
>
> ldap:ou=auto.fileserver,ou=autofs,dc=example,dc=com --timeout=86400 --ghost -o rw,hard,async,noatime,intr,retrans=4,timeo=100,rsize=8192,wsize=8192
>
>
> The auto.fileserver is (for cn=*):
>
> fileserver.example.com:/export/&

We really must have a debug log, include everything and give some
indication of when the problem occurred.

See http://people.redhat.com/jmoyer for info.

Ian
* Re: ENOENT on first reference to an automounted file
From: Jeff Moyer @ 2007-10-08 15:15 UTC
To: Ian Kent; +Cc: autofs

Ian Kent <raven@themaw.net> writes:

> On Fri, 2007-10-05 at 17:14 -0400, Dan Halbert wrote:
>> I have what looks like an automount race condition, and am very puzzled.
>> Any suggestions would be appreciated.
>>
>> The first time I reference an automounted file, it is not there
>> (ENOENT). On the second and later try, the file is there. For instance:
>>
>> $ cat /net/fileserver/fs/somefile
>> cat: /net/fileserver/fs/somefile: No such file or directory
>> $ cat /net/fileserver/fs/somefile
>> Contents of somefile.
>>
>> I watched the log on fileserver, and the automount request is logged
>> seemingly immediately after the first "cat" prints its error.
>>
>> This causes havoc with our applications, which expect files to be there
>> the first time they look for them.
>>
>> I can repeat the problem after umounting the filesystem.
>>
>> I see this problem on a CentOS 4.x system running their standard
>> autofs-4.1.3-199.3. I do NOT see it on CentOS 5.x, using
>> autofs-5.0.1-0.rc2.43.0.2. Instead I see a slight pause before "cat"
>> prints the contents of the file, presumably as the automount completes.
>> Both the CentOS4 and CentOS5 systems are completely up-to-date.
>>
>> I also only see this problem with our Linux NFS servers (FC5 and FC6),
>> but not with a non-Fedora NAS server we have.
>>
>> So I am not sure this is an automount problem, per se. Perhaps it's some
>> kind of NFS version problem?
>>
>> The automount options include --ghost. At first I thought it might be
>> due to --ghost, because the very first time I reference the file, say
>> after a reboot or restarting autofs, I don't get an ENOENT. The first
>> time, the mountpoint dir does not yet exist. But removing --ghost from
>> the automount options does not seem to fix it.
>
> We've seen this from time to time for various reasons but to be honest I
> have trouble remembering so we'll need to check through a debug log.
>
> Jeff may recall this?

I think that the last time we looked at this, the problem was that there was a replicated server entry, and the first picked entry failed to mount. Then, the second succeeded, but we returned the wrong dentry from lookup. This resulted in a reported failure, even though the mount was successful.

I'm not convinced this is the same problem. I'll try to reproduce it.

Cheers,
Jeff
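Jeff's description of the earlier bug is a failover logic error: the second replica mounts, but the lookup hands back the result of the first, failed attempt. The sketch below is a toy user-space model of that pattern only. The names lookup_result and mount_replicas() are hypothetical, and the real autofs4 code deals in dentries rather than integers; the point is just how a successful mount can still be reported to the caller as ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model, not kernel code: the outcome of one mount attempt. */
struct lookup_result {
    int error;       /* 0 on success, otherwise a positive errno value */
    size_t replica;  /* index of the replica this result came from */
};

/*
 * Try replicas in order until one mounts. outcomes[i] is the
 * (hypothetical) result of mounting replica i. When buggy is nonzero,
 * the function returns the first attempt's result instead of the
 * final one, mimicking "we returned the wrong dentry from lookup".
 */
static struct lookup_result mount_replicas(const int *outcomes, size_t n,
                                           int buggy)
{
    struct lookup_result first = { ENOENT, 0 };
    struct lookup_result last = { ENOENT, 0 };

    for (size_t i = 0; i < n; i++) {
        struct lookup_result r = { outcomes[i], i };

        if (i == 0)
            first = r;
        last = r;
        if (r.error == 0)
            break;              /* this replica mounted: stop trying */
    }
    return buggy ? first : last;
}
```

With per-replica outcomes {ENOENT, 0}, the correct path reports success from replica 1, while the buggy path reports ENOENT from replica 0 even though replica 1 did mount.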
* Re: ENOENT on first reference to an automounted file
From: Jeff Moyer @ 2007-10-08 16:29 UTC
To: Dan Halbert; +Cc: autofs

Dan Halbert <halbert@everyzing.com> writes:

> Ian Kent wrote:
> > On Fri, 2007-10-05 at 17:14 -0400, Dan Halbert wrote:
> >> I have what looks like an automount race condition, and am very puzzled...
> >>
> >> The first time I reference an automounted file, it is not there
> >> (ENOENT). On the second and later try, the file is there...
> >
> > We really must have a debug log, include everything and give some
> > indication of when the problem occurred....
>
> Details you requested follow.
> autofs-4.1.3-199.3
> (I also reproduced the problem with autofs-4.1.3-214, the latest version
> at jmoyer's webpage.)
> kernel:
> 2.6.9-55.0.9.ELsmp x86_64

I can't reproduce the problem. Would you be willing to enable
debugging in the kernel module. This will generate oodles of output. Alternatively, I could try to come up with some pointed kprobes to get the information we most likely need. The first option will definitely be quicker.

Cheers,
Jeff
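Since the problem did not reproduce at first, a small check program can make the symptom unambiguous on a given client. The sketch below is a hypothetical helper, not something posted to the thread: it assumes POSIX stat() and looks up the same path twice, back to back, classifying the result. FIRST_ENOENT_ONLY corresponds to the behaviour Dan showed with cat; note that it uses stat() rather than cat's open()/read().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/* Outcome of looking up the same path twice in a row. */
enum first_ref_outcome {
    BOTH_OK,            /* normal behaviour */
    FIRST_ENOENT_ONLY,  /* the symptom reported in this thread */
    BOTH_FAIL,          /* path really missing, or another persistent error */
    OTHER               /* any other combination */
};

/*
 * stat() the path twice and classify the pair of results. For a
 * meaningful check, the path should lie under an autofs mount point
 * that is not currently mounted (e.g. right after a umount).
 */
static enum first_ref_outcome check_first_reference(const char *path)
{
    struct stat st;
    /* errno is read only when the corresponding stat() call failed. */
    int err1 = stat(path, &st) == 0 ? 0 : errno;
    int err2 = stat(path, &st) == 0 ? 0 : errno;

    if (err1 == 0 && err2 == 0)
        return BOTH_OK;
    if (err1 == ENOENT && err2 == 0)
        return FIRST_ENOENT_ONLY;
    if (err1 != 0 && err2 != 0)
        return BOTH_FAIL;
    return OTHER;
}
```

For example, check_first_reference("/net/fileserver/fs/somefile") run immediately after unmounting that filesystem would return FIRST_ENOENT_ONLY on an affected kernel.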
* Re: ENOENT on first reference to an automounted file
From: Dan Halbert @ 2007-10-08 16:35 UTC
To: autofs

Jeff Moyer wrote:
> I can't reproduce the problem. Would you be willing to enable
> debugging in the kernel module.
>
Sure, happy to do this, though you'll need to tell me how to turn that
on. I'm testing on a machine I have complete control over.

Dan
* Re: ENOENT on first reference to an automounted file
From: Jeff Moyer @ 2007-10-08 16:43 UTC
To: Dan Halbert; +Cc: autofs

Dan Halbert <halbert@everyzing.com> writes:

> Jeff Moyer wrote:
>> I can't reproduce the problem. Would you be willing to enable
>> debugging in the kernel module.
>>
> Sure, happy to do this, though you'll need to tell me how to turn that
> on. I'm testing on a machine I have complete control over.

Apply the following patch and rebuild the kernel module.

Cheers,
Jeff

--- linux-2.6.9/fs/autofs4/autofs_i.h.orig	2007-10-08 12:43:00.000000000 -0400
+++ linux-2.6.9/fs/autofs4/autofs_i.h	2007-10-08 12:43:06.000000000 -0400
@@ -30,7 +30,7 @@
 #include <asm/current.h>
 #include <asm/uaccess.h>
 
-/* #define DEBUG */
+#define DEBUG
 
 #ifdef DEBUG
 #define DPRINTK(fmt,args...) do { printk(KERN_DEBUG "pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); } while(0)
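The patch only uncomments #define DEBUG, which makes each DPRINTK(...) call in the autofs4 module expand to printk(KERN_DEBUG ...) with the pid and calling function prepended, so the messages land in the kernel log. The #else branch that disables DPRINTK otherwise lies outside the patch context, so its exact form is assumed here. The user-space sketch below mimics the same compile-time switch; debug_log() and lookup_name() are hypothetical stand-ins for printk and an autofs4 function, the pid prefix is dropped, and standard C99 __func__ and __VA_ARGS__ replace the GNU __FUNCTION__ and args... extensions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Same switch as the patch: delete this line to compile DPRINTK away. */
#define DEBUG

/* Holds the most recent debug message so it can be inspected; the
 * kernel macro instead passes the text to printk(KERN_DEBUG ...). */
static char last_debug_msg[256];

static void debug_log(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_debug_msg, sizeof last_debug_msg, fmt, ap);
    va_end(ap);
    fprintf(stderr, "%s\n", last_debug_msg);
}

#ifdef DEBUG
/* Prefix each message with the calling function's name. Standard C99
 * __VA_ARGS__ needs at least one argument after fmt. */
#define DPRINTK(fmt, ...) \
    do { debug_log("%s: " fmt, __func__, __VA_ARGS__); } while (0)
#else
/* Debugging off: the call and its arguments disappear entirely. */
#define DPRINTK(fmt, ...) do { } while (0)
#endif

/* Hypothetical caller standing in for an autofs4 lookup routine. */
static int lookup_name(const char *name)
{
    DPRINTK("name=%s", name);
    return (int)strlen(name);
}
```

Calling lookup_name("fs") with DEBUG defined records and prints "lookup_name: name=fs"; with the define removed, the same call produces no output at all, which is why the module has to be rebuilt to get the log.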
* Re: ENOENT on first reference to an automounted file 2007-10-08 16:35 ` Dan Halbert 2007-10-08 16:43 ` Jeff Moyer @ 2007-10-08 17:20 ` Jeff Moyer 2007-10-08 18:00 ` Dan Halbert 1 sibling, 1 reply; 22+ messages in thread From: Jeff Moyer @ 2007-10-08 17:20 UTC (permalink / raw) To: Dan Halbert; +Cc: ikent, autofs Dan Halbert <halbert@everyzing.com> writes: > Jeff Moyer wrote: >> I can't reproduce the problem. Would you be willing to enable >> debugging in the kernel module. >> > Sure, happy to do this, though you'll need to tell me how to turn that > on. I'm testing on a machine I have complete control over. I was able to reproduce it. It turns out that I had a kernel installed that had a fix for the following bug: Bugzilla Bug 248126: autofs problem with symbolic links When I moved to the exact kernel you were running, I hit the problem. So, it's a known issue, and it had better be addressed in the next update (kernel 2.6.9-61.EL). Cheers, Jeff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ENOENT on first reference to an automounted file
From: Dan Halbert @ 2007-10-08 18:00 UTC
To: Jeff Moyer; +Cc: ikent, autofs

Jeff Moyer wrote:
> I was able to reproduce it. It turns out that I had a kernel
> installed that had a fix for the following bug:
> Bugzilla Bug 248126: autofs problem with symbolic links
>
> When I moved to the exact kernel you were running, I hit the problem.
> So, it's a known issue, and it had better be addressed in the next
> update (kernel 2.6.9-61.EL).
>
Thanks! You saved me from a module rebuild, which I have not done in
quite a few years.

I had searched the existing bugs and seen 248126, but it did not seem to
me that I could have had a simultaneous expire, since we have such long
timeouts on the automounts (usually --timeout=86400). But I must have
misunderstood what "expire" means in this case.

Also, from 248126 and the bug it references, 174821, it appeared that
these patches were already incorporated into my kernel 2.6.9-55.0.9-smp
(248126 comment #24 mentions 55.0.7, for instance). But apparently not!

Is this bug non-existent in the latest updated RHEL5.0? I am trying to
think of a workaround until 2.6.9-61 comes out. We have a tried cron job
to provoke the automount more often than its timeout, but I am not sure
that would solve the problem.

Dan
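Until a fixed kernel is available, an application-side retry is one possible stopgap, since in Dan's reports the second reference always succeeded. This is not something proposed in the thread, and it only hides the symptom; the fix actually discussed is the kernel patch for Bugzilla 248126. The sketch assumes POSIX open() and nanosleep(); open_retry_enoent() and its retries and delay_ms parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical workaround, not from the thread: open() a path and, if
 * an attempt fails with ENOENT, retry up to `retries` more times with
 * a short pause in between. Any other error is returned immediately.
 * Returns a file descriptor, or -1 with errno set by the last open().
 */
static int open_retry_enoent(const char *path, int flags, int retries,
                             long delay_ms)
{
    struct timespec interval = { delay_ms / 1000,
                                 (delay_ms % 1000) * 1000000L };

    for (int attempt = 0; ; attempt++) {
        int fd = open(path, flags);

        if (fd >= 0 || errno != ENOENT || attempt >= retries)
            return fd;
        /* Give the automount a moment to complete before retrying. */
        nanosleep(&interval, NULL);
    }
}
```

For example, open_retry_enoent("/net/fileserver/fs/somefile", O_RDONLY, 1, 100) would try once more after 100 ms if the first open reports ENOENT. Retrying only on ENOENT keeps other failures, such as permission errors, visible to the caller.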
* Re: ENOENT on first reference to an automounted file
From: Ian Kent @ 2007-10-09 3:11 UTC
To: Dan Halbert; +Cc: autofs

On Mon, 2007-10-08 at 14:00 -0400, Dan Halbert wrote:
> Jeff Moyer wrote:
> > I was able to reproduce it. It turns out that I had a kernel
> > installed that had a fix for the following bug:
> > Bugzilla Bug 248126: autofs problem with symbolic links
> >
> > When I moved to the exact kernel you were running, I hit the problem.
> > So, it's a known issue, and it had better be addressed in the next
> > update (kernel 2.6.9-61.EL).
> >
> Thanks! You saved me from a module rebuild, which I have not done in
> quite a few years.
>
> I had searched the existing bugs and seen 248126, but it did not seem to
> me that I could have had a simultaneous expire, since we have such long
> timeouts on the automounts (usually --timeout=86400). But I must have
> misunderstood what "expire" means in this case.

Yes, that's a bit puzzling.

>
> Also, from 248126 and the bug it references, 174821, it appeared that
> these patches were already incorporated into my kernel 2.6.9-55.0.9-smp
> (248126 comment #24 mentions 55.0.7, for instance). But apparently not!

The patch was reverted in this revision. I'm not sure why.

>
> Is this bug non-existent in the latest updated RHEL5.0? I am trying to
> think of a workaround until 2.6.9-61 comes out. We have a tried cron job
> to provoke the automount more often than its timeout, but I am not sure
> that would solve the problem.

I'd need to check but I believe the patch is present in the RHEL5 kernel (but I think there are a couple of corrections missing).

You could apply the patch(es) yourself.

Ian
Thread overview: 22+ messages
[not found] <mailman.1.1192795201.25176.autofs@linux.kernel.org>
2007-10-20 1:28 ` ENOENT on first reference to an automounted file To: autofs@linux.kernel.org Greg Earle
2007-11-07 1:53 ` ENOENT on first reference to an automounted file Dan Halbert
[not found] <mailman.446.1194400455.3098.autofs@linux.kernel.org>
2007-11-17 21:25 ` Greg Earle
2007-11-18 2:46 ` Ian Kent
2007-10-18 10:36 Greg Earle
2007-10-19 8:02 ` Ian Kent
2007-10-19 13:20 ` Dan Halbert
2007-10-19 14:37 ` Ian Kent
2007-10-19 15:22 ` Jeff Moyer
2007-10-19 17:05 ` Dan Halbert
2007-10-19 17:21 ` Ian Kent
2007-11-03 15:27 ` Dan Halbert
2007-11-04 5:12 ` Ian Kent
-- strict thread matches above, loose matches on Subject: below --
2007-10-05 21:14 Dan Halbert
2007-10-06 4:48 ` Ian Kent
2007-10-08 15:15 ` Jeff Moyer
[not found] ` <47081453.7000709@everyzing.com>
2007-10-08 16:29 ` Jeff Moyer
2007-10-08 16:35 ` Dan Halbert
2007-10-08 16:43 ` Jeff Moyer
2007-10-08 17:20 ` Jeff Moyer
2007-10-08 18:00 ` Dan Halbert
2007-10-09 3:11 ` Ian Kent