* stat of /proc fails after CPU hot-unplug with EOVERFLOW in 2.6.18
From: Martin Devera @ 2006-09-27 7:55 UTC
To: linux-kernel
Hello,
I have a 2-way Opteron machine. I did this:
echo 0 > /sys/devices/system/cpu/cpu1/online
and then ran strace stat /proc:
[snip]
personality(PER_LINUX) = 4194304
getpid() = 14926
brk(0) = 0x804b000
brk(0x804b1a0) = 0x804b1a0
brk(0x804c000) = 0x804c000
stat("/proc", 0xbf8e7490) = -1 EOVERFLOW
When I do echo 1 > ... to bring the CPU back online, stat starts
working again ... Weird.
Martin
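The failing call can also be checked without strace. The following is a minimal sketch in C, assuming only the standard stat(2) interface; the helper name try_stat is made up for illustration and does not come from the thread. It reports which errno came back, so an EOVERFLOW is easy to spot.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from the thread): stat a path and report the
 * result. Returns 0 on success, otherwise the errno value set by stat(2).
 */
static int try_stat(const char *path)
{
	struct stat st;
	int err;

	if (stat(path, &st) == 0) {
		printf("stat(\"%s\") ok, st_nlink=%lu\n", path,
		       (unsigned long)st.st_nlink);
		return 0;
	}

	err = errno;	/* save errno before printf can disturb it */
	printf("stat(\"%s\") failed: %s%s\n", path, strerror(err),
	       err == EOVERFLOW ? " (EOVERFLOW)" : "");
	return err;
}
```

Calling try_stat("/proc") from a small main() before and after toggling cpu1 should show whether the failure follows the hot-unplug.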

* Re: stat of /proc fails after CPU hot-unplug with EOVERFLOW in 2.6.18
From: Andrew Morton @ 2006-09-27 8:13 UTC
To: Martin Devera; +Cc: linux-kernel

On Wed, 27 Sep 2006 09:55:47 +0200
Martin Devera <devik@cdi.cz> wrote:

> Hello,
>
> I have 2way Opteron machine. I've done this:
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> and then strace stat /proc:
>
> [snip]
> personality(PER_LINUX) = 4194304
> getpid() = 14926
> brk(0) = 0x804b000
> brk(0x804b1a0) = 0x804b1a0
> brk(0x804c000) = 0x804c000
> stat("/proc", 0xbf8e7490) = -1 EOVERFLOW
>
> When I do echo 1 > ... to start cpu again then the stat starts
> to work again ... Weird.
>

boggle.

Can you add this patch, see where it's going bad?

 fs/stat.c |   30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff -puN fs/stat.c~a fs/stat.c
--- a/fs/stat.c~a
+++ a/fs/stat.c
@@ -18,6 +18,8 @@
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 
+#define D() printk("%s:%d\n", __FILE__, __LINE__)
+
 void generic_fillattr(struct inode *inode, struct kstat *stat)
 {
 	stat->dev = inode->i_sb->s_dev;
@@ -141,14 +143,18 @@ static int cp_old_stat(struct kstat *sta
 	tmp.st_ino = stat->ino;
 	tmp.st_mode = stat->mode;
 	tmp.st_nlink = stat->nlink;
-	if (tmp.st_nlink != stat->nlink)
+	if (tmp.st_nlink != stat->nlink) {
+		D();
 		return -EOVERFLOW;
+	}
 	SET_UID(tmp.st_uid, stat->uid);
 	SET_GID(tmp.st_gid, stat->gid);
 	tmp.st_rdev = old_encode_dev(stat->rdev);
 #if BITS_PER_LONG == 32
-	if (stat->size > MAX_NON_LFS)
+	if (stat->size > MAX_NON_LFS) {
+		D();
 		return -EOVERFLOW;
+	}
 #endif
 	tmp.st_size = stat->size;
 	tmp.st_atime = stat->atime.tv_sec;
@@ -195,11 +201,15 @@ static int cp_new_stat(struct kstat *sta
 	struct stat tmp;
 
 #if BITS_PER_LONG == 32
-	if (!old_valid_dev(stat->dev) || !old_valid_dev(stat->rdev))
+	if (!old_valid_dev(stat->dev) || !old_valid_dev(stat->rdev)) {
+		D();
 		return -EOVERFLOW;
+	}
 #else
-	if (!new_valid_dev(stat->dev) || !new_valid_dev(stat->rdev))
+	if (!new_valid_dev(stat->dev) || !new_valid_dev(stat->rdev)) {
+		D();
 		return -EOVERFLOW;
+	}
 #endif
 
 	memset(&tmp, 0, sizeof(tmp));
@@ -211,8 +221,10 @@ static int cp_new_stat(struct kstat *sta
 	tmp.st_ino = stat->ino;
 	tmp.st_mode = stat->mode;
 	tmp.st_nlink = stat->nlink;
-	if (tmp.st_nlink != stat->nlink)
+	if (tmp.st_nlink != stat->nlink) {
+		D();
 		return -EOVERFLOW;
+	}
 	SET_UID(tmp.st_uid, stat->uid);
 	SET_GID(tmp.st_gid, stat->gid);
 #if BITS_PER_LONG == 32
@@ -221,8 +233,10 @@ static int cp_new_stat(struct kstat *sta
 	tmp.st_rdev = new_encode_dev(stat->rdev);
 #endif
 #if BITS_PER_LONG == 32
-	if (stat->size > MAX_NON_LFS)
+	if (stat->size > MAX_NON_LFS) {
+		D();
 		return -EOVERFLOW;
+	}
 #endif
 	tmp.st_size = stat->size;
 	tmp.st_atime = stat->atime.tv_sec;
@@ -337,8 +351,10 @@ static long cp_new_stat64(struct kstat *
 	memset(&tmp, 0, sizeof(struct stat64));
 #ifdef CONFIG_MIPS
 	/* mips has weird padding, so we don't get 64 bits there */
-	if (!new_valid_dev(stat->dev) || !new_valid_dev(stat->rdev))
+	if (!new_valid_dev(stat->dev) || !new_valid_dev(stat->rdev)) {
+		D();
 		return -EOVERFLOW;
+	}
 	tmp.st_dev = new_encode_dev(stat->dev);
 	tmp.st_rdev = new_encode_dev(stat->rdev);
 #else
_
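
The st_nlink checks instrumented by the patch share one pattern: assign the wide kernel value to a narrower user-visible field, read it back, and return -EOVERFLOW if the two differ. Below is a userspace sketch of that pattern, under stated assumptions: struct narrow_stat and copy_nlink are hypothetical names, the 16-bit field width is picked purely for illustration, and D() prints with fprintf rather than printk.

```c
#include <errno.h>
#include <stdio.h>

/* Userspace stand-in for the patch's D() macro: print file and line. */
#define D() fprintf(stderr, "%s:%d\n", __FILE__, __LINE__)

/* Hypothetical narrow struct; the 16-bit width is an assumption. */
struct narrow_stat {
	unsigned short st_nlink;
};

/*
 * Copy a wide link count into the narrow field.
 * Returns 0 if the value fits, -EOVERFLOW if bits were lost.
 */
static int copy_nlink(struct narrow_stat *tmp, unsigned int nlink)
{
	tmp->st_nlink = nlink;		/* implicitly truncated to the field width */
	if (tmp->st_nlink != nlink) {	/* read-back differs: it did not fit */
		D();
		return -EOVERFLOW;
	}
	return 0;
}
```

With this sketch a link count of 65535 is accepted, while 65536 or any wrapped-around large value is rejected, and the D() line shows which check fired.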

* Re: stat of /proc fails after CPU hot-unplug with EOVERFLOW in 2.6.18
From: Martin Devera @ 2006-09-27 14:44 UTC
To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> On Wed, 27 Sep 2006 09:55:47 +0200
> Martin Devera <devik@cdi.cz> wrote:
>
>> Hello,
>>
>> I have 2way Opteron machine. I've done this:
>> echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> and then strace stat /proc:
>>
>> [snip]
>> personality(PER_LINUX) = 4194304
>> getpid() = 14926
>> brk(0) = 0x804b000
>> brk(0x804b1a0) = 0x804b1a0
>> brk(0x804c000) = 0x804c000
>> stat("/proc", 0xbf8e7490) = -1 EOVERFLOW
>>
>> When I do echo 1 > ... to start cpu again then the stat starts
>> to work again ... Weird.
>>
>
> boggle.
>
> Can you add this patch, see where it's going bad?

Ehh .. I finally learned how to code a jprobe (I can't reboot the
machine now), tested it, installed it and ... guess what? The overflow
bug is gone :-( It simply works now.

I will reboot it next week and try again.

Thanks for the help, and sorry for wasting your time,
Martin

* Re: stat of /proc fails after CPU hot-unplug with EOVERFLOW in 2.6.18
From: Martin Devera @ 2006-10-03 8:40 UTC
To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> On Wed, 27 Sep 2006 09:55:47 +0200
> Martin Devera <devik@cdi.cz> wrote:
>
>> Hello,
>>
>> I have 2way Opteron machine. I've done this:
>> echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> and then strace stat /proc:
>>
>> [snip]
>> personality(PER_LINUX) = 4194304
>> getpid() = 14926
>> brk(0) = 0x804b000
>> brk(0x804b1a0) = 0x804b1a0
>> brk(0x804c000) = 0x804c000
>> stat("/proc", 0xbf8e7490) = -1 EOVERFLOW
>>
>> When I do echo 1 > ... to start cpu again then the stat starts
>> to work again ... Weird.

Hello,

I just want to make more info public. It seems that the problem is
deeper. The 2.6.18 kernel has crashed the machine 4 times so far.
Symptoms: the network kept working and ssh was functional, but I was
not able to run a single binary except "cat"; all others gave me
"permission denied" or "Bus error".
I was not doing any experiments with CPU hotplug this time. The machine
had been up on 2.6.17.1 for six months with no problems.

I also found weird errors on the console, such as tg3 watchdog
timeouts, SATA read errors (on all sectors), etc.
It looks like memory corruption to me. It is worth noting that the
lockup always occurred after high load.

We use an MSI Far2 dual Opteron motherboard. All related info is at
http://luxik.cdi.cz/~devik/files/2618-corrupt/
along with the 2.6.17.1 config (for comparison).

The main problem is that I have no similar server on which to
reproduce the problem off-site. So please take this report as mainly
informative; I hope to replace the server in a few weeks so I can
investigate further.
For now we are back on 2.6.17.1.

Martin