* Automount/NFS issues causing executables to appear corrupted
From: Venkata Ravella @ 2004-04-18 21:23 UTC
To: linux-kernel; +Cc: Ramki Balasubramanian, ab, hpa
The current kernel we use is the default 7.2 kernel with two modifications:
1) BM patch, applied to extend the address space of a single process to 3.6GB
2) mnt patch, applied to allow up to 1024 NFS mount points
uname -r output:
2.4.7-10mntBMsmp
Here is a detailed description of the problem, its symptoms, and a few
observations. Please let me know where I should look for solutions, any
pointers that could help debug this problem further, and any possible fixes.
Unfortunately, upgrading to a newer kernel is not an option for us at the
moment.
Problem Description:
Executables in a particular NFS directory appear corrupted. The problem is
limited to that one NFS filesystem. The analysis so far points to
automount/NFS on the local host as the culprit. Until a permanent fix is
found, the only way to clear the problem is to unmount and remount the NFS
directory or to restart automount. The problem cannot be reproduced on
demand; it shows up on our systems at random.
Symptoms:
The following are the symptoms of this problem. These symptoms may be very
misleading to the user.
Symptom 1
---------
The executable fails with one of the following errors:
error while loading shared libraries: unexpected PLT reloc type 0x00
or
error while loading shared libraries: unsupported version 0 of Verneed record
or
Memory fault, Segmentation fault or Illegal instruction
Symptom 2
---------
The executable fails with errors of the following kind:
/lib/libdl.so.2: version `' not found
/lib/i686/libm.so.6: version `' not found
/lib/i686/libpthread.so.0: version `' not found
/lib/i686/libc.so.6: version `' not found
Symptom 3
---------
Job output logs generated by SGE are truncated.
Detailed Analysis [Data Points Only]:
sum output differs between good and bad copies
----------------------------------------------
Comparison of the sum output for the same executable taken from a good
system and from a bad one:
$ sum qqq*
50340 1147 qqq.bad
48019 1147 qqq.good
$
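Both copies report 1147 blocks, so they are the same size to within 1 KiB and it is the contents that differ. As a rough illustration of what sum is comparing, here is a minimal Python sketch of the BSD-style 16-bit checksum that GNU sum uses by default (rotate the checksum right by one bit, add the next byte, keep 16 bits), together with the size rounded up to 1 KiB blocks. It assumes the machines run GNU sum without -s; the function name bsd_sum is hypothetical.

```python
import sys


def bsd_sum(data):
    """Sketch of the BSD checksum that GNU `sum` uses by default.

    Returns (checksum, blocks): a 16-bit rotating checksum and the
    size rounded up to whole 1 KiB blocks.
    """
    checksum = 0
    for byte in data:
        # Rotate the 16-bit checksum right by one bit...
        checksum = (checksum >> 1) + ((checksum & 1) << 15)
        # ...then add the next byte and keep only the low 16 bits.
        checksum = (checksum + byte) & 0xFFFF
    blocks = (len(data) + 1023) // 1024
    return checksum, blocks


if __name__ == "__main__":
    # Usage (hypothetical script name): python bsd_sum.py qqq.good qqq.bad
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            checksum, blocks = bsd_sum(f.read())
        print(f"{checksum:05d} {blocks:5d} {path}")
```

Because every byte feeds the running checksum, a zeroed region anywhere in the file changes the result even when the block count stays the same.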
Executable dies in the relocation phase
---------------------------------------
The following is the tail of the output from running the executable with
LD_DEBUG=all:
24416: symbol=stderr; lookup in file=/lib/i686/libm.so.6
24416: symbol=stderr; lookup in file=/lib/i686/libc.so.6
24416: binding file ./qqq.bad to /lib/i686/libc.so.6: normal symbol `stderr' [GLIBC_2.0]
24416: symbol=__ctype_toupper; lookup in file=/lib/i686/libm.so.6
24416: symbol=__ctype_toupper; lookup in file=/lib/i686/libc.so.6
24416: binding file ./qqq.bad to /lib/i686/libc.so.6: normal symbol `__ctype_toupper' [GLIBC_2.0]
24416: symbol=__ctype_b; lookup in file=/lib/i686/libm.so.6
24416: symbol=__ctype_b; lookup in file=/lib/i686/libc.so.6
24416: binding file ./qqq.bad to /lib/i686/libc.so.6: normal symbol `__ctype_b' [GLIBC_2.0]
./qqq.bad: error while loading shared libraries: unexpected PLT reloc type 0x00
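For context on the "unexpected PLT reloc type 0x00" message: on i386 each Elf32_Rel entry is two little-endian 32-bit words, r_offset and r_info. The symbol index sits in the upper 24 bits of r_info and the relocation type in the low 8 bits. Lazy PLT entries are normally of type R_386_JUMP_SLOT (7), while an all-zero entry decodes as type 0 (R_386_NONE). The following minimal sketch decodes entries with Python's struct module. The sample words are copied from the octal dump of the good executable in this message. Reading them as .rel.plt entries that start at file offset 12256 is an assumption. The decoded values, GOT-like addresses 4 bytes apart with type 7, look consistent with it. The name decode_rel is hypothetical.

```python
import struct

R_386_NONE = 0
R_386_JUMP_SLOT = 7


def decode_rel(data, start=0):
    """Decode consecutive little-endian Elf32_Rel entries from `data`.

    Returns (r_offset, ELF32_R_SYM(r_info), ELF32_R_TYPE(r_info)) tuples.
    """
    entries = []
    for off in range(start, len(data) - 7, 8):
        r_offset, r_info = struct.unpack_from("<II", data, off)
        entries.append((r_offset, r_info >> 8, r_info & 0xFF))
    return entries


# 16-bit octal words from the first two lines of `od -j 12250 qqq.good`;
# od prints host-order (little-endian on i386) 2-byte words.
GOOD_WORDS = [0o004023, 0o142007, 0o000000, 0o037340,
              0o004023, 0o142407, 0o000000, 0o037344,
              0o004023, 0o143007, 0o000000, 0o037350,
              0o004023, 0o143407, 0o000000, 0o037354]
good = struct.pack("<16H", *GOOD_WORDS)  # file bytes 12250..12281

# Assumption: entries begin 6 bytes in, i.e. at file offset 12256.
for r_offset, sym, rtype in decode_rel(good, start=6):
    print(f"r_offset=0x{r_offset:08x} sym={sym} type={rtype}")

# A zero-filled entry, like those in the bad copy's zeroed region.
print(decode_rel(bytes(8)))
```

Under that assumption the three decoded entries have r_offset 0x08133ee0, 0x08133ee4 and 0x08133ee8, all of type 7, while the zero-filled entry decodes as (0, 0, 0), i.e. type 0x00.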
cmp shows where the good and bad executables differ
---------------------------------------------------
$ cmp qqq.bad qqq.good
qqq.bad qqq.good differ: char 12289, line 40
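cmp numbers bytes and lines from 1, while od prints 0-based octal offsets, so char 12289 is file offset 12288 (0o30000, 0x3000). The following is a minimal Python sketch of the same first-difference check. It assumes both files fit in memory. The name first_difference is hypothetical. Unlike cmp, it does not report EOF when one file is a prefix of the other.

```python
import sys


def first_difference(a, b):
    """Return (byte, line) of the first differing byte, 1-based like cmp,
    or None if the inputs match over the length of the shorter one."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # cmp's line number is 1 + the newlines before the differing byte.
            line = a.count(b"\n", 0, index) + 1
            return index + 1, line
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python firstdiff.py qqq.bad qqq.good
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        result = first_difference(f1.read(), f2.read())
    if result:
        byte, line = result
        print(f"differ: char {byte}, line {line} (0-based offset {byte - 1:#o})")
```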
Octal dump of the bad executable shows null bytes from char 12289
-----------------------------------------------------------------
$ od -j 12250 qqq.bad | head -10
0027732 004023 142007 000000 037340 004023 142407 000000 037344
0027752 004023 143007 000000 037350 004023 143407 000000 037354
0027772 004023 144007 000000 000000 000000 000000 000000 000000
0030012 000000 000000 000000 000000 000000 000000 000000 000000
*
0037772 000000 000000 000000 175750 000316 164400 127560 000000
0040012 133215 000000 000000 106613 174324 177777 161676 010033
0040032 135410 015743 004020 010613 153611 003271 000000 176000
0040052 140061 123363 002164 140031 000414 140205 013564 164676
0040072 010033 104410 134727 000006 000000 124374 171400 007646
$ od -j 12250 qqq.good | head -10
0027732 004023 142007 000000 037340 004023 142407 000000 037344
0027752 004023 143007 000000 037350 004023 143407 000000 037354
0027772 004023 144007 000000 037360 004023 144407 000000 037364
0030012 004023 145007 000000 037370 004023 145407 000000 037374
0030032 004023 146007 000000 037400 004023 146407 000000 037404
0030052 004023 147007 000000 037410 004023 147407 000000 037414
0030072 004023 151007 000000 037420 004023 151407 000000 037424
0030112 004023 152007 000000 037430 004023 152407 000000 037434
0030132 004023 153007 000000 037440 004023 153407 000000 037444
0030152 004023 154007 000000 037450 004023 156407 000000 037454
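When the two dumps are read together, the bad copy appears to be zero from offset 0o30000 (12288, 0x3000), where it first differs from the good copy. The zeros continue up to 0o40000 (16384, 0x4000), where the next non-zero word appears. That span would be exactly one page-aligned 4 KiB page on i386. The good dump stops well before 16384, so this is an inference rather than something the output above proves. The following minimal sketch checks the whole file by comparing a good and a bad copy page by page. It flags pages that are entirely zero in the bad copy. The 4096-byte page size and the name differing_pages are assumptions.

```python
import sys

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on i386


def differing_pages(bad, good, page_size=PAGE_SIZE):
    """Return (offset, all_zero) for each page-sized chunk that differs.

    all_zero is True when the bad copy's chunk is non-empty and
    contains only NUL bytes.
    """
    result = []
    for offset in range(0, max(len(bad), len(good)), page_size):
        bad_chunk = bad[offset:offset + page_size]
        good_chunk = good[offset:offset + page_size]
        if bad_chunk != good_chunk:
            all_zero = bool(bad_chunk) and not any(bad_chunk)
            result.append((offset, all_zero))
    return result


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python pages.py qqq.bad qqq.good
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        for offset, all_zero in differing_pages(f1.read(), f2.read()):
            kind = "all zeros" if all_zero else "other data"
            print(f"page at {offset:#x} ({offset}) differs: {kind}")
```

If the inference holds, running this on qqq.bad and qqq.good should report a zero-filled page at 0x3000. Any other pages that differ would show whether the damage is always whole zeroed pages.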
Other Observations:
- sum gives different results for the executable on two different systems
that are both experiencing this behaviour.
- This affects only executables. Text files seem to be fine.
- Copying any binary into the affected NFS partition fails with an
input/output error:
$ cp /tmp/ppp.good .
cp: writing `./ppp.good': Input/output error
cp: closing `./ppp.good': Input/output error
$ cp /usr/bin/archive .
cp: closing `./archive': Input/output error
* Re: Automount/NFS issues causing executables to appear corrupted
From: H. Peter Anvin @ 2004-04-18 23:06 UTC
To: Venkata Ravella; +Cc: linux-kernel, Ramki Balasubramanian, ab
Venkata Ravella wrote:
> The current kernel we use is default 7.2 kernel with two modifications:
> 1) BM patch applied to extend address space for a single process to 3.6GB
> 2) mnt patch applied to allow upto 1024 nfs mount points
>
> uname -r output:
> 2.4.7-10mntBMsmp
In other words, you're using an ancient kernel with plenty of known
problems, applied two additional patches to it, and are surprised you're
having problems?
> Unfortunately, upgrading to a newer kernel is not an option for us at
> the moment.
Sucks to be you.
-hpa
* Re: Automount/NFS issues causing executables to appear corrupted
From: Ian Kent @ 2004-04-19 1:07 UTC
To: Venkata Ravella; +Cc: linux-kernel, Ramki Balasubramanian, ab, hpa
Please cc autofs questions to the list at autofs@linux.kernel.org.
On Sun, 18 Apr 2004, Venkata Ravella wrote:
>
> The current kernel we use is default 7.2 kernel with two modifications:
> 1) BM patch applied to extend address space for a single process to 3.6GB
> 2) mnt patch applied to allow upto 1024 nfs mount points
>
> uname -r output:
> 2.4.7-10mntBMsmp
What autofs version?
To be honest, it's a bit hard to see how this is an autofs issue.
Mind, having said that, ....
Ian
* Re: Automount/NFS issues causing executables to appear corrupted
From: Venkata Ravella @ 2004-04-20 0:08 UTC
To: raven; +Cc: linux-kernel, Ramki.Balasubramanium, ab, hpa, autofs
The autofs version is autofs-3.1.7-21.
One new update: we have started seeing a similar problem on a system
running kernel 2.4.18-e.12smp, which has the same version (3.1.7-21) of
autofs.
This may or may not be an autofs problem, but restarting autofs fixes it
temporarily.
* Re: Automount/NFS issues causing executables to appear corrupted
From: H. Peter Anvin @ 2004-04-20 0:24 UTC
To: Venkata Ravella; +Cc: raven, linux-kernel, Ramki.Balasubramanium, ab, autofs
Venkata Ravella wrote:
> autofs version is autofs-3.1.7-21
>
> I also have one new update. We started seeing similar problem on
> the system running the kernel 2.4.18-e.12smp which has the same
> version(3.1.7-21) of autofs as well.
>
> This may or may not be an autofs problem but, restarting autofs
> fixes this problem temporarily.
>
That will cause an NFS remount. This really feels much more like an NFS
problem.
-hpa
* Re: Automount/NFS issues causing executables to appear corrupted
From: Ian Kent @ 2004-04-20 1:27 UTC
To: H. Peter Anvin
Cc: Venkata Ravella, linux-kernel, Ramki.Balasubramanium, ab, autofs
On Mon, 19 Apr 2004, H. Peter Anvin wrote:
> Venkata Ravella wrote:
> > autofs version is autofs-3.1.7-21
> >
> > I also have one new update. We started seeing similar problem on
> > the system running the kernel 2.4.18-e.12smp which has the same
> > version(3.1.7-21) of autofs as well.
> >
> > This may or may not be an autofs problem but, restarting autofs
> > fixes this problem temporarily.
> >
>
> That will cause an NFS remount. This really feels much more like an NFS
> problem.
Certainly does.
Venkata,
Can you also forward this question to the NFS list at
nfs@lists.sourceforge.net? Sorry to ask you to post all over the place.
Please investigate the NFS client patches maintained by Trond Myklebust.
Check nfs.sourceforge.net. We found we had to use them in early 2.4 versions.
Ian