* vmlinux ELF header sometimes corrupt
@ 2020-01-22 17:52 ` Rasmus Villemoes
0 siblings, 0 replies; 13+ messages in thread
From: Rasmus Villemoes @ 2020-01-22 17:52 UTC (permalink / raw)
To: LKML; +Cc: Linux Kbuild mailing list, linuxppc-dev@lists.ozlabs.org
I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a
very hard to debug problem that maybe someone else has encountered. This
doesn't happen always, perhaps 1 in 8 times or something like that.
The issue is that when the build gets to do "${CROSS}objcopy -O binary
... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally
that fails with
powerpc-oe-linux-objcopy:vmlinux: file format not recognized
So I hacked link-vmlinux.sh to stash copies of vmlinux before and after
sortextable vmlinux. Both of those are proper ELF files, and comparing
the corrupted vmlinux to vmlinux.after_sort they are identical after the
first 52 bytes; in vmlinux, those first 52 bytes are all 0.
I also saved stat(1) info to see if vmlinux is being replaced or
modified in-place.
$ cat vmlinux.stat.after_sort
File: 'vmlinux'
Size: 8608456 Blocks: 16696 IO Block: 4096 regular file
Device: 811h/2065d Inode: 21919132 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user)
Access: 2020-01-22 10:52:38.946703081 +0000
Modify: 2020-01-22 10:52:38.954703105 +0000
Change: 2020-01-22 10:52:38.954703105 +0000
$ stat vmlinux
File: 'vmlinux'
Size: 8608456 Blocks: 16688 IO Block: 4096 regular file
Device: 811h/2065d Inode: 21919132 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user)
Access: 2020-01-22 17:20:00.650379057 +0000
Modify: 2020-01-22 10:52:38.954703105 +0000
Change: 2020-01-22 10:52:38.954703105 +0000
So the inode number and mtime/ctime are exactly the same, but for some
reason Blocks: has changed? This is on an ext4 filesystem, but I don't
suspect the filesystem to be broken, because it's always just vmlinux
that ends up corrupt, and always in exactly this way with the first 52
bytes having been wiped.
Any ideas?
Rasmus
^ permalink raw reply [flat|nested] 13+ messages in thread* vmlinux ELF header sometimes corrupt @ 2020-01-22 17:52 ` Rasmus Villemoes 0 siblings, 0 replies; 13+ messages in thread From: Rasmus Villemoes @ 2020-01-22 17:52 UTC (permalink / raw) To: LKML; +Cc: linuxppc-dev@lists.ozlabs.org, Linux Kbuild mailing list I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a very hard to debug problem that maybe someone else has encountered. This doesn't happen always, perhaps 1 in 8 times or something like that. The issue is that when the build gets to do "${CROSS}objcopy -O binary ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally that fails with powerpc-oe-linux-objcopy:vmlinux: file format not recognized So I hacked link-vmlinux.sh to stash copies of vmlinux before and after sortextable vmlinux. Both of those are proper ELF files, and comparing the corrupted vmlinux to vmlinux.after_sort they are identical after the first 52 bytes; in vmlinux, those first 52 bytes are all 0. I also saved stat(1) info to see if vmlinux is being replaced or modified in-place. $ cat vmlinux.stat.after_sort File: 'vmlinux' Size: 8608456 Blocks: 16696 IO Block: 4096 regular file Device: 811h/2065d Inode: 21919132 Links: 1 Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) Access: 2020-01-22 10:52:38.946703081 +0000 Modify: 2020-01-22 10:52:38.954703105 +0000 Change: 2020-01-22 10:52:38.954703105 +0000 $ stat vmlinux File: 'vmlinux' Size: 8608456 Blocks: 16688 IO Block: 4096 regular file Device: 811h/2065d Inode: 21919132 Links: 1 Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) Access: 2020-01-22 17:20:00.650379057 +0000 Modify: 2020-01-22 10:52:38.954703105 +0000 Change: 2020-01-22 10:52:38.954703105 +0000 So the inode number and mtime/ctime are exactly the same, but for some reason Blocks: has changed? This is on an ext4 filesystem, but I don't suspect the filesystem to be broken, because it's always just vmlinux that ends up corrupt, and always in exactly this way with the first 52 bytes having been wiped. Any ideas? Rasmus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt 2020-01-22 17:52 ` Rasmus Villemoes (?) @ 2020-01-24 10:50 ` Michael Ellerman -1 siblings, 0 replies; 13+ messages in thread From: Michael Ellerman @ 2020-01-24 10:50 UTC (permalink / raw) To: Rasmus Villemoes, LKML Cc: Linux Kbuild mailing list, linuxppc-dev@lists.ozlabs.org Rasmus Villemoes <linux@rasmusvillemoes.dk> writes: > I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a > very hard to debug problem that maybe someone else has encountered. This > doesn't happen always, perhaps 1 in 8 times or something like that. > > The issue is that when the build gets to do "${CROSS}objcopy -O binary > ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally > that fails with > > powerpc-oe-linux-objcopy:vmlinux: file format not recognized > > So I hacked link-vmlinux.sh to stash copies of vmlinux before and after > sortextable vmlinux. Both of those are proper ELF files, and comparing > the corrupted vmlinux to vmlinux.after_sort they are identical after the > first 52 bytes; in vmlinux, those first 52 bytes are all 0. > > I also saved stat(1) info to see if vmlinux is being replaced or > modified in-place. > > $ cat vmlinux.stat.after_sort > File: 'vmlinux' > Size: 8608456 Blocks: 16696 IO Block: 4096 regular file > Device: 811h/2065d Inode: 21919132 Links: 1 > Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) > Access: 2020-01-22 10:52:38.946703081 +0000 > Modify: 2020-01-22 10:52:38.954703105 +0000 > Change: 2020-01-22 10:52:38.954703105 +0000 > > $ stat vmlinux > File: 'vmlinux' > Size: 8608456 Blocks: 16688 IO Block: 4096 regular file > Device: 811h/2065d Inode: 21919132 Links: 1 > Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) > Access: 2020-01-22 17:20:00.650379057 +0000 > Modify: 2020-01-22 10:52:38.954703105 +0000 > Change: 2020-01-22 10:52:38.954703105 +0000 > > So the inode number and mtime/ctime are exactly the same, but for some > reason Blocks: has changed? This is on an ext4 filesystem, but I don't > suspect the filesystem to be broken, because it's always just vmlinux > that ends up corrupt, and always in exactly this way with the first 52 > bytes having been wiped. > > Any ideas? Not really sorry. Haven't seen or heard of that before. Are you doing a parallel make? If so does -j 1 fix it? If it seems like sortextable is at fault then strace'ing it would be my next step. cheers ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt @ 2020-01-24 10:50 ` Michael Ellerman 0 siblings, 0 replies; 13+ messages in thread From: Michael Ellerman @ 2020-01-24 10:50 UTC (permalink / raw) To: Rasmus Villemoes, LKML Cc: Linux Kbuild mailing list, linuxppc-dev@lists.ozlabs.org Rasmus Villemoes <linux@rasmusvillemoes.dk> writes: > I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a > very hard to debug problem that maybe someone else has encountered. This > doesn't happen always, perhaps 1 in 8 times or something like that. > > The issue is that when the build gets to do "${CROSS}objcopy -O binary > ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally > that fails with > > powerpc-oe-linux-objcopy:vmlinux: file format not recognized > > So I hacked link-vmlinux.sh to stash copies of vmlinux before and after > sortextable vmlinux. Both of those are proper ELF files, and comparing > the corrupted vmlinux to vmlinux.after_sort they are identical after the > first 52 bytes; in vmlinux, those first 52 bytes are all 0. > > I also saved stat(1) info to see if vmlinux is being replaced or > modified in-place. > > $ cat vmlinux.stat.after_sort > File: 'vmlinux' > Size: 8608456 Blocks: 16696 IO Block: 4096 regular file > Device: 811h/2065d Inode: 21919132 Links: 1 > Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) > Access: 2020-01-22 10:52:38.946703081 +0000 > Modify: 2020-01-22 10:52:38.954703105 +0000 > Change: 2020-01-22 10:52:38.954703105 +0000 > > $ stat vmlinux > File: 'vmlinux' > Size: 8608456 Blocks: 16688 IO Block: 4096 regular file > Device: 811h/2065d Inode: 21919132 Links: 1 > Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) > Access: 2020-01-22 17:20:00.650379057 +0000 > Modify: 2020-01-22 10:52:38.954703105 +0000 > Change: 2020-01-22 10:52:38.954703105 +0000 > > So the inode number and mtime/ctime are exactly the same, but for some > reason Blocks: has changed? This is on an ext4 filesystem, but I don't > suspect the filesystem to be broken, because it's always just vmlinux > that ends up corrupt, and always in exactly this way with the first 52 > bytes having been wiped. > > Any ideas? Not really sorry. Haven't seen or heard of that before. Are you doing a parallel make? If so does -j 1 fix it? If it seems like sortextable is at fault then strace'ing it would be my next step. cheers ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt @ 2020-01-24 10:50 ` Michael Ellerman 0 siblings, 0 replies; 13+ messages in thread From: Michael Ellerman @ 2020-01-24 10:50 UTC (permalink / raw) To: Rasmus Villemoes, LKML Cc: linuxppc-dev@lists.ozlabs.org, Linux Kbuild mailing list Rasmus Villemoes <linux@rasmusvillemoes.dk> writes: > I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a > very hard to debug problem that maybe someone else has encountered. This > doesn't happen always, perhaps 1 in 8 times or something like that. > > The issue is that when the build gets to do "${CROSS}objcopy -O binary > ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally > that fails with > > powerpc-oe-linux-objcopy:vmlinux: file format not recognized > > So I hacked link-vmlinux.sh to stash copies of vmlinux before and after > sortextable vmlinux. Both of those are proper ELF files, and comparing > the corrupted vmlinux to vmlinux.after_sort they are identical after the > first 52 bytes; in vmlinux, those first 52 bytes are all 0. > > I also saved stat(1) info to see if vmlinux is being replaced or > modified in-place. > > $ cat vmlinux.stat.after_sort > File: 'vmlinux' > Size: 8608456 Blocks: 16696 IO Block: 4096 regular file > Device: 811h/2065d Inode: 21919132 Links: 1 > Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) > Access: 2020-01-22 10:52:38.946703081 +0000 > Modify: 2020-01-22 10:52:38.954703105 +0000 > Change: 2020-01-22 10:52:38.954703105 +0000 > > $ stat vmlinux > File: 'vmlinux' > Size: 8608456 Blocks: 16688 IO Block: 4096 regular file > Device: 811h/2065d Inode: 21919132 Links: 1 > Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) > Access: 2020-01-22 17:20:00.650379057 +0000 > Modify: 2020-01-22 10:52:38.954703105 +0000 > Change: 2020-01-22 10:52:38.954703105 +0000 > > So the inode number and mtime/ctime are exactly the same, but for some > reason Blocks: has changed? This is on an ext4 filesystem, but I don't > suspect the filesystem to be broken, because it's always just vmlinux > that ends up corrupt, and always in exactly this way with the first 52 > bytes having been wiped. > > Any ideas? Not really sorry. Haven't seen or heard of that before. Are you doing a parallel make? If so does -j 1 fix it? If it seems like sortextable is at fault then strace'ing it would be my next step. cheers ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt 2020-01-24 10:50 ` Michael Ellerman @ 2020-01-24 11:16 ` Rasmus Villemoes -1 siblings, 0 replies; 13+ messages in thread From: Rasmus Villemoes @ 2020-01-24 11:16 UTC (permalink / raw) To: Michael Ellerman, LKML Cc: Linux Kbuild mailing list, linuxppc-dev@lists.ozlabs.org On 24/01/2020 11.50, Michael Ellerman wrote: > Rasmus Villemoes <linux@rasmusvillemoes.dk> writes: >> I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a >> very hard to debug problem that maybe someone else has encountered. This >> doesn't happen always, perhaps 1 in 8 times or something like that. >> >> The issue is that when the build gets to do "${CROSS}objcopy -O binary >> ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally >> that fails with >> >> powerpc-oe-linux-objcopy:vmlinux: file format not recognized >> >> >> Any ideas? > > Not really sorry. Haven't seen or heard of that before. > > Are you doing a parallel make? If so does -j 1 fix it? Hard to say, I'll have to try that a number of times to see if it can be reproduced with that setting. > If it seems like sortextable is at fault then strace'ing it would be my > next step. I don't think sortextable is at fault, that was just my first "I know that at least pokes around in the ELF file". I do "cp vmlinux vmlinux.before_sort" and "cp vmlinux vmlinux.after_sort", and both of those copies are proper ELF files - and the .after_sort is identical to the corrupt vmlinux apart from vmlinux ending up with its ELF header wiped. So it's something that happens during some later build step (Yocto has a lot of steps), perhaps "make modules" or "make modules_install" or something ends up somehow deciding "hey, vmlinux isn't quite uptodate, let's nuke it". I'm not even sure it's a Kbuild problem, but I've seen the same thing happen using another meta-build system called oe-lite, which is why I'm not primarily suspecting the Yocto logic. Rasmus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt @ 2020-01-24 11:16 ` Rasmus Villemoes 0 siblings, 0 replies; 13+ messages in thread From: Rasmus Villemoes @ 2020-01-24 11:16 UTC (permalink / raw) To: Michael Ellerman, LKML Cc: linuxppc-dev@lists.ozlabs.org, Linux Kbuild mailing list On 24/01/2020 11.50, Michael Ellerman wrote: > Rasmus Villemoes <linux@rasmusvillemoes.dk> writes: >> I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a >> very hard to debug problem that maybe someone else has encountered. This >> doesn't happen always, perhaps 1 in 8 times or something like that. >> >> The issue is that when the build gets to do "${CROSS}objcopy -O binary >> ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally >> that fails with >> >> powerpc-oe-linux-objcopy:vmlinux: file format not recognized >> >> >> Any ideas? > > Not really sorry. Haven't seen or heard of that before. > > Are you doing a parallel make? If so does -j 1 fix it? Hard to say, I'll have to try that a number of times to see if it can be reproduced with that setting. > If it seems like sortextable is at fault then strace'ing it would be my > next step. I don't think sortextable is at fault, that was just my first "I know that at least pokes around in the ELF file". I do "cp vmlinux vmlinux.before_sort" and "cp vmlinux vmlinux.after_sort", and both of those copies are proper ELF files - and the .after_sort is identical to the corrupt vmlinux apart from vmlinux ending up with its ELF header wiped. So it's something that happens during some later build step (Yocto has a lot of steps), perhaps "make modules" or "make modules_install" or something ends up somehow deciding "hey, vmlinux isn't quite uptodate, let's nuke it". I'm not even sure it's a Kbuild problem, but I've seen the same thing happen using another meta-build system called oe-lite, which is why I'm not primarily suspecting the Yocto logic. Rasmus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt 2020-01-22 17:52 ` Rasmus Villemoes (?) @ 2020-01-24 13:14 ` Andreas Schwab -1 siblings, 0 replies; 13+ messages in thread From: Andreas Schwab @ 2020-01-24 13:14 UTC (permalink / raw) To: Rasmus Villemoes Cc: LKML, Linux Kbuild mailing list, linuxppc-dev@lists.ozlabs.org On Jan 22 2020, Rasmus Villemoes wrote: > So the inode number and mtime/ctime are exactly the same, but for some > reason Blocks: has changed? This is on an ext4 filesystem, but I don't > suspect the filesystem to be broken, because it's always just vmlinux > that ends up corrupt, and always in exactly this way with the first 52 > bytes having been wiped. Note that the size of the ELF header (Elf32_Ehdr) is 52 bytes. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt @ 2020-01-24 13:14 ` Andreas Schwab 0 siblings, 0 replies; 13+ messages in thread From: Andreas Schwab @ 2020-01-24 13:14 UTC (permalink / raw) To: Rasmus Villemoes Cc: LKML, Linux Kbuild mailing list, linuxppc-dev@lists.ozlabs.org On Jan 22 2020, Rasmus Villemoes wrote: > So the inode number and mtime/ctime are exactly the same, but for some > reason Blocks: has changed? This is on an ext4 filesystem, but I don't > suspect the filesystem to be broken, because it's always just vmlinux > that ends up corrupt, and always in exactly this way with the first 52 > bytes having been wiped. Note that the size of the ELF header (Elf32_Ehdr) is 52 bytes. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt @ 2020-01-24 13:14 ` Andreas Schwab 0 siblings, 0 replies; 13+ messages in thread From: Andreas Schwab @ 2020-01-24 13:14 UTC (permalink / raw) To: Rasmus Villemoes Cc: linuxppc-dev@lists.ozlabs.org, LKML, Linux Kbuild mailing list On Jan 22 2020, Rasmus Villemoes wrote: > So the inode number and mtime/ctime are exactly the same, but for some > reason Blocks: has changed? This is on an ext4 filesystem, but I don't > suspect the filesystem to be broken, because it's always just vmlinux > that ends up corrupt, and always in exactly this way with the first 52 > bytes having been wiped. Note that the size of the ELF header (Elf32_Ehdr) is 52 bytes. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt 2020-01-22 17:52 ` Rasmus Villemoes @ 2020-01-28 8:12 ` Rasmus Villemoes -1 siblings, 0 replies; 13+ messages in thread From: Rasmus Villemoes @ 2020-01-28 8:12 UTC (permalink / raw) To: LKML; +Cc: Linux Kbuild mailing list, linuxppc-dev@lists.ozlabs.org, linux-ext4 On 22/01/2020 18.52, Rasmus Villemoes wrote: > I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a > very hard to debug problem that maybe someone else has encountered. This > doesn't happen always, perhaps 1 in 8 times or something like that. > > The issue is that when the build gets to do "${CROSS}objcopy -O binary > ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally > that fails with > > powerpc-oe-linux-objcopy:vmlinux: file format not recognized > > So I hacked link-vmlinux.sh to stash copies of vmlinux before and after > sortextable vmlinux. Both of those are proper ELF files, and comparing > the corrupted vmlinux to vmlinux.after_sort they are identical after the > first 52 bytes; in vmlinux, those first 52 bytes are all 0. > > I also saved stat(1) info to see if vmlinux is being replaced or > modified in-place. > > $ cat vmlinux.stat.after_sort > File: 'vmlinux' > Size: 8608456 Blocks: 16696 IO Block: 4096 regular file > Device: 811h/2065d Inode: 21919132 Links: 1 > Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) > Access: 2020-01-22 10:52:38.946703081 +0000 > Modify: 2020-01-22 10:52:38.954703105 +0000 > Change: 2020-01-22 10:52:38.954703105 +0000 > > $ stat vmlinux > File: 'vmlinux' > Size: 8608456 Blocks: 16688 IO Block: 4096 regular file > Device: 811h/2065d Inode: 21919132 Links: 1 > Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) > Access: 2020-01-22 17:20:00.650379057 +0000 > Modify: 2020-01-22 10:52:38.954703105 +0000 > Change: 2020-01-22 10:52:38.954703105 +0000 > > So the inode number and mtime/ctime are exactly the same, but for some > reason Blocks: has changed? This is on an ext4 filesystem, but I don't > suspect the filesystem to be broken, because it's always just vmlinux > that ends up corrupt, and always in exactly this way with the first 52 > bytes having been wiped. So, I think I take that last part back. I just hit a case where I built the kernel manually, made a copy of vmlinux to vmlinux.copy, and file(1) said both were fine (and cmp(1) agreed they were identical). Then I went off and did work elsewhere with a lot of I/O. When I came back to the linux build dir, vmlinux was broken, exactly as before. So I now suspect it to be some kind of "while the file is in the pagecache, everything is fine, but when it's read back from disk it's broken". My ext4 fs does have inline_data enabled, which could explain why the corruption happens in the beginning. It's just very odd that it only ever seems to trigger for vmlinux and not other files, but perhaps the I/O patterns that ld and/or sortextable does are exactly what are needed to trigger the bug. I've done a long overdue kernel update, and there are quite a few fs/ext4/ -stable patches in there, so now I'll see if it still happens. And if anything more comes of this, I'll remove the kbuild and ppc lists from cc, sorry for the noise. Rasmus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt @ 2020-01-28 8:12 ` Rasmus Villemoes 0 siblings, 0 replies; 13+ messages in thread From: Rasmus Villemoes @ 2020-01-28 8:12 UTC (permalink / raw) To: LKML; +Cc: linux-ext4, linuxppc-dev@lists.ozlabs.org, Linux Kbuild mailing list On 22/01/2020 18.52, Rasmus Villemoes wrote: > I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a > very hard to debug problem that maybe someone else has encountered. This > doesn't happen always, perhaps 1 in 8 times or something like that. > > The issue is that when the build gets to do "${CROSS}objcopy -O binary > ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally > that fails with > > powerpc-oe-linux-objcopy:vmlinux: file format not recognized > > So I hacked link-vmlinux.sh to stash copies of vmlinux before and after > sortextable vmlinux. Both of those are proper ELF files, and comparing > the corrupted vmlinux to vmlinux.after_sort they are identical after the > first 52 bytes; in vmlinux, those first 52 bytes are all 0. > > I also saved stat(1) info to see if vmlinux is being replaced or > modified in-place. > > $ cat vmlinux.stat.after_sort > File: 'vmlinux' > Size: 8608456 Blocks: 16696 IO Block: 4096 regular file > Device: 811h/2065d Inode: 21919132 Links: 1 > Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) > Access: 2020-01-22 10:52:38.946703081 +0000 > Modify: 2020-01-22 10:52:38.954703105 +0000 > Change: 2020-01-22 10:52:38.954703105 +0000 > > $ stat vmlinux > File: 'vmlinux' > Size: 8608456 Blocks: 16688 IO Block: 4096 regular file > Device: 811h/2065d Inode: 21919132 Links: 1 > Access: (0755/-rwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user) > Access: 2020-01-22 17:20:00.650379057 +0000 > Modify: 2020-01-22 10:52:38.954703105 +0000 > Change: 2020-01-22 10:52:38.954703105 +0000 > > So the inode number and mtime/ctime are exactly the same, but for some > reason Blocks: has changed? This is on an ext4 filesystem, but I don't > suspect the filesystem to be broken, because it's always just vmlinux > that ends up corrupt, and always in exactly this way with the first 52 > bytes having been wiped. So, I think I take that last part back. I just hit a case where I built the kernel manually, made a copy of vmlinux to vmlinux.copy, and file(1) said both were fine (and cmp(1) agreed they were identical). Then I went off and did work elsewhere with a lot of I/O. When I came back to the linux build dir, vmlinux was broken, exactly as before. So I now suspect it to be some kind of "while the file is in the pagecache, everything is fine, but when it's read back from disk it's broken". My ext4 fs does have inline_data enabled, which could explain why the corruption happens in the beginning. It's just very odd that it only ever seems to trigger for vmlinux and not other files, but perhaps the I/O patterns that ld and/or sortextable does are exactly what are needed to trigger the bug. I've done a long overdue kernel update, and there are quite a few fs/ext4/ -stable patches in there, so now I'll see if it still happens. And if anything more comes of this, I'll remove the kbuild and ppc lists from cc, sorry for the noise. Rasmus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: vmlinux ELF header sometimes corrupt 2020-01-28 8:12 ` Rasmus Villemoes (?) @ 2020-02-27 15:57 ` Rasmus Villemoes -1 siblings, 0 replies; 13+ messages in thread From: Rasmus Villemoes @ 2020-02-27 15:57 UTC (permalink / raw) Cc: linux-ext4 On 28/01/2020 09.12, Rasmus Villemoes wrote: > I've done a long overdue kernel update, and there are quite a few > fs/ext4/ -stable patches in there, so now I'll see if it still happens. OK, I still see the vmlinux corruption. To recap: My laptop uses a Ubuntu 4.15 kernel. I'm building a ppc32 kernel export ARCH=powerpc export CROSS_COMPILE=powerpc-linux-gnu- make 83xx/mpc8315_rdb_defconfig make -j8 Then "file vmlinux" says all is good: "ELF 32-bit MSB executable, PowerPC or ..." Time passes, I/O in other parts of the system etc., now suddenly "file vmlinux" says "data". Inspecting the file shows that the first 52 bytes (aka exactly the ELF header) are 0. Instrumenting the kernel build [1] to save some intermediate stages tells me that the vmlinux file is fine beyond 52 bytes. I.e., $ cmp -i 52 vmlinux vmlinux.after_sort is fine. Now comparing stat(1) on vmlinux immediately after sortextable to stat(1) on the corrupted vmlinux shows something a bit odd in the Blocks: field, but ino, mtime are unchanged (as one would expect when nothing should have touched that file). $ cat vmlinux.stat.after_sort File: vmlinux Size: 8055408 Blocks: 15624 IO Block: 4096 regular file Device: 811h/2065d Inode: 85736362 Links: 1 Access: (0775/-rwxrwxr-x) Uid: ( 1000/ ravi) Gid: ( 1000/ ravi) Access: 2020-02-27 13:15:10.989950361 +0100 Modify: 2020-02-27 13:15:11.033950470 +0100 Change: 2020-02-27 13:15:11.033950470 +0100 Birth: - $ stat vmlinux File: vmlinux Size: 8055408 Blocks: 15616 IO Block: 4096 regular file Device: 811h/2065d Inode: 85736362 Links: 1 Access: (0775/-rwxrwxr-x) Uid: ( 1000/ ravi) Gid: ( 1000/ ravi) Access: 2020-02-27 15:46:55.354409876 +0100 Modify: 2020-02-27 13:15:11.033950470 +0100 Change: 2020-02-27 13:15:11.033950470 +0100 Birth: - Oddly enough, I never seem to have problems with any other files, so I'm thinking there's some particular I/O pattern involved when ld creates the vmlinux file. It does lseek(6, 0, SEEK_SET) = 0 read(6, "", 52) = 0 lseek(6, 52, SEEK_CUR) = 52 initially, and then much later actually writes the header lseek(6, 0, SEEK_SET) = 0 write(6, "\177ELF\1\2\1\0\0\0\0\0\0\0\0\0\0\2\0\24\0\0\0\1\300\0\0\0\0\0\0004"..., 52) = 52 Then sortextable does some poking around, but that's just via mmap(), so not much to see in strace. I also don't seem to be able to reproduce it when I build an x86_64 kernel instead of ppc32 - perhaps the bigger ELF header matters? I'll try some other 32 bit platforms. I have tried creating an ext4 image in a 50 GB file with the exact same file system options [2] and did a "make O=..." build to see if I could make an image file containing the corrupted state, but so far no luck. fsck on /dev/sdb1 says my filesystem is just fine. This is starting to drive me a bit crazy, so I really hope someone goes "yeah, you should run at least v5.2, that has commit abc123def which fixes that". Thanks, Rasmus [1] I've applied this on top of v4.19.82: diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index c8cf45362bd6..6a6d8ac79607 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -94,8 +94,13 @@ vmlinux_link() ${KBUILD_VMLINUX_LIBS} \ --end-group \ ${1}" + if [ "${2}" = vmlinux ] ; then + strace="strace -o ${objtree}/vmlinux.strace" + else + strace="" + fi - ${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} -o ${2} \ + $strace ${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} -o ${2} \ -T ${lds} ${objects} else objects="-Wl,--whole-archive \ @@ -151,7 +156,7 @@ mksysmap() sortextable() { - ${objtree}/scripts/sortextable ${1} + strace -o ${objtree}/sortextable.strace ${objtree}/scripts/sortextable ${1} } # Delete output files in case of error @@ -283,7 +288,13 @@ vmlinux_link "${kallsymso}" vmlinux if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then info SORTEX vmlinux + stat vmlinux > vmlinux.stat.before_sort + cp --preserve=all vmlinux vmlinux.before_sort + file vmlinux >&2 sortextable vmlinux + stat vmlinux > vmlinux.stat.after_sort + cp --preserve=all vmlinux vmlinux.after_sort + file vmlinux >&2 fi info SYSMAP System.map [2] $ sudo dumpe2fs -h /dev/sdb1 [sudo] password for ravi: dumpe2fs 1.44.1 (24-Mar-2018) Filesystem volume name: <none> Last mounted on: /mnt/ext4 Filesystem UUID: 2b70b7ba-fdd7-4616-8431-2b04ea31d803 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr dir_index filetype needs_recovery extent 64bit flex_bg metadata_csum_seed inline_data sparse_super large_file extra_isize metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 117440512 Block count: 469762048 Reserved block count: 23488102 Free blocks: 223934526 Free inodes: 88111686 First block: 0 Block size: 4096 Fragment size: 4096 Group descriptor size: 64 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 8192 Inode blocks per group: 1024 Flex block group size: 16 Filesystem created: Thu Jun 13 08:04:11 2019 Last mount time: Wed Jan 29 21:01:18 2020 Last write time: Wed Jan 29 21:01:18 2020 Mount count: 1 Maximum mount count: -1 Last checked: Wed Jan 29 20:49:04 2020 Check interval: 0 (<none>) Lifetime writes: 1563 GB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 512 Required extra isize: 32 Desired extra isize: 32 Journal inode: 8 First orphan inode: 87165617 Default directory hash: half_md4 Directory Hash Seed: d1ac8c45-70f3-4e6e-9710-b7e434bb59ba Journal backup: inode blocks Checksum type: crc32c Checksum: 0x950a844d Checksum seed: 0x143063d3 Journal features: journal_incompat_revoke journal_64bit journal_checksum_v3 Journal size: 1024M Journal length: 262144 Journal sequence: 0x0003218a Journal start: 219662 Journal checksum type: crc32c Journal checksum: 0xea20aea2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2020-02-27 15:57 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-01-22 17:52 vmlinux ELF header sometimes corrupt Rasmus Villemoes 2020-01-22 17:52 ` Rasmus Villemoes 2020-01-24 10:50 ` Michael Ellerman 2020-01-24 10:50 ` Michael Ellerman 2020-01-24 10:50 ` Michael Ellerman 2020-01-24 11:16 ` Rasmus Villemoes 2020-01-24 11:16 ` Rasmus Villemoes 2020-01-24 13:14 ` Andreas Schwab 2020-01-24 13:14 ` Andreas Schwab 2020-01-24 13:14 ` Andreas Schwab 2020-01-28 8:12 ` Rasmus Villemoes 2020-01-28 8:12 ` Rasmus Villemoes 2020-02-27 15:57 ` Rasmus Villemoes
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.