public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Venkata Ravella <Venkata.Ravella@synopsys.com>
To: linux-kernel@vger.kernel.org
Cc: Ramki Balasubramanian <Ramki.Balasubramanium@synopsys.com>,
	ab@californiadigital.com, hpa@zytor.com
Subject: Automount/NFS issues causing executables to appear corrupted
Date: Sun, 18 Apr 2004 14:23:46 -0700	[thread overview]
Message-ID: <20040418212346.GA23560@rearview.synopsys.com> (raw)


The current kernel we use is default 7.2 kernel with two modifications:
1) BM patch applied to extend address space for a single process to 3.6GB
2) mnt patch applied to allow upto 1024 nfs mount points

uname -r output:
2.4.7-10mntBMsmp

Here is the detailed description of the problem, it's symptoms and few
observations. Let me know where I should look for solutions, pointers 
that can help further debug this problem more and any possible solutions.
Unfortunately, upgrading to a newer kernel is not an option for us at the
moment. 


The problem Description:
The executables on a particular nfs directory appear corrupted. The problem
is limited to that one nfs filesystem only. Analysis done so far is pointing to 
automount/nfs on the local host as the culprit.  Until a permanent fix can be 
found, the nfs directory has to be unmounted and re-mounted or the automount 
has to be restarted to clear the problem.  This problem is not reproducible 
but, showing up on our systems at random.


Symptoms:
The following are the symptoms of this problem. These symptoms may be very
misleading to the user.

Symptom 1
---------
executable gives one of the following errors and fail:

error while loading shared libraries: unexpected PLT reloc type 0x00
or
error while loading shared libraries: unsupported version 0 of Verneed record
or
Memory fault, Segmentation fault or Illegal instruction

Symptom 2
---------
executable gives the following kind of errors and fail:

/lib/libdl.so.2: version `' not found
/lib/i686/libm.so.6: version `' not found
/lib/i686/libpthread.so.0: version `' not found
/lib/i686/libc.so.6: version `' not found


Symptom 3
---------
SGE generated job output logs are truncated.



Detailed Analysis [Data Points Only]:

Sum produces wrong result
-------------------------
Example comparison of sum output of the same executable extracted from a good
system with the executable extracted from a bad one:

$ sum qqq*
50340  1147 qqq.bad
48019  1147 qqq.good
$


Executable dies at at relocation phase
--------------------------------------
The following is the tail output from the executable run with LD_DEBUG=all
setting:

24416:  symbol=stderr;  lookup in file=/lib/i686/libm.so.6
24416:  symbol=stderr;  lookup in file=/lib/i686/libc.so.6
24416:  binding file ./qqq.bad to /lib/i686/libc.so.6: normal symbol `stderr'
[GLIBC_2.0]
24416:  symbol=__ctype_toupper;  lookup in file=/lib/i686/libm.so.6
24416:  symbol=__ctype_toupper;  lookup in file=/lib/i686/libc.so.6
24416:  binding file ./qqq.bad to /lib/i686/libc.so.6: normal symbol
`__ctype_toupper' [GLIBC_2.0]
24416:  symbol=__ctype_b;  lookup in file=/lib/i686/libm.so.6
24416:  symbol=__ctype_b;  lookup in file=/lib/i686/libc.so.6
24416:  binding file ./qqq.bad to /lib/i686/libc.so.6: normal symbol
`__ctype_b' [GLIBC_2.0]
./qqq.bad: error while loading shared libraries: unexpected PLT reloc type
0x00


cmp output between good and bad executable differ
-------------------------------------------------
$ cmp  qqq.bad qqq.good
qqq.bad qqq.good differ: char 12289, line 40


Object dump on bad executable shows null bytes from 12289
---------------------------------------------------------
$ od -j 12250 qqq.bad | head -10
0027732 004023 142007 000000 037340 004023 142407 000000 037344
0027752 004023 143007 000000 037350 004023 143407 000000 037354
0027772 004023 144007 000000 000000 000000 000000 000000 000000
0030012 000000 000000 000000 000000 000000 000000 000000 000000
*
0037772 000000 000000 000000 175750 000316 164400 127560 000000
0040012 133215 000000 000000 106613 174324 177777 161676 010033
0040032 135410 015743 004020 010613 153611 003271 000000 176000
0040052 140061 123363 002164 140031 000414 140205 013564 164676
0040072 010033 104410 134727 000006 000000 124374 171400 007646

$ od -j 12250 qqq.good  | head -10
0027732 004023 142007 000000 037340 004023 142407 000000 037344
0027752 004023 143007 000000 037350 004023 143407 000000 037354
0027772 004023 144007 000000 037360 004023 144407 000000 037364
0030012 004023 145007 000000 037370 004023 145407 000000 037374
0030032 004023 146007 000000 037400 004023 146407 000000 037404
0030052 004023 147007 000000 037410 004023 147407 000000 037414
0030072 004023 151007 000000 037420 004023 151407 000000 037424
0030112 004023 152007 000000 037430 004023 152407 000000 037434
0030132 004023 153007 000000 037440 004023 153407 000000 037444
0030152 004023 154007 000000 037450 004023 156407 000000 037454


Other Observations:

- sum output comparison of the executable between two different systems
experiencing this behaviour is
  different.

- This affects only executables. Text files seem to be fine.

- copying any binary into the affected nfs partition gives input/output
error:

$ cp /tmp/ppp.good .
cp: writing `./ppp.good': Input/output error
cp: closing `./ppp.good': Input/output error

$ cp /usr/bin/archive .
cp: closing `./archive': Input/output error




             reply	other threads:[~2004-04-18 21:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-04-18 21:23 Venkata Ravella [this message]
2004-04-18 23:06 ` Automount/NFS issues causing executables to appear corrupted H. Peter Anvin
2004-04-19  1:07 ` Ian Kent
  -- strict thread matches above, loose matches on Subject: below --
2004-04-20  0:08 Venkata Ravella
2004-04-20  0:24 ` H. Peter Anvin
2004-04-20  1:27   ` Ian Kent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040418212346.GA23560@rearview.synopsys.com \
    --to=venkata.ravella@synopsys.com \
    --cc=Ramki.Balasubramanium@synopsys.com \
    --cc=ab@californiadigital.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox