* Re: salinfo-0.4 patch
2004-01-28 21:22 salinfo-0.4 patch Ben Woodard
2004-01-28 21:50 ` Keith Owens
@ 2004-01-28 23:32 ` Ben Woodard
2004-01-30 2:36 ` Ben Woodard
2 siblings, 0 replies; 4+ messages in thread
From: Ben Woodard @ 2004-01-28 23:32 UTC
To: linux-ia64
On Wed, 2004-01-28 at 13:50, Keith Owens wrote:
> On 28 Jan 2004 13:22:08 -0800,
> Ben Woodard <woodard@redhat.com> wrote:
> >Here is a tiny patch for salinfo-0.4. There is a problem when it
> >deals with multiple errors. The decoded error messages fill up with
> >progressively more white space.
>
> That should not occur; the ++indent/--indent lines should be matched.
> Instead of forcing indent back to 0, please find the missing line and
> fix it. Correct code is better than an arbitrary reset.
>
> If you have a sample of nested indent that will help track down the
> bug, send it to kaos@sgi.com.
>
> >While poking around in the iprint function, I also found an error
> >message that looks like it is better suited for stderr rather than
> >stdout so I retargeted it.
>
> No. I deliberately used stdout so the error message would appear at
> the end of the truncated file on disk, to tell the viewer that this
> particular file was incomplete. stderr is useless when salinfo_decode
> is run from init.d.
OK, how about this patch? I think I found the missing --indent.
diff -u -r1.1 mca.c
--- mca.c 2004/01/28 23:28:03 1.1
+++ mca.c 2004/01/28 23:28:27
@@ -857,6 +857,7 @@
}
if (pcei->valid.oem_data)
platform_pci_comp_err_print(&pcei->header, p_oem_data);
+ --indent;
}
/* Format and log the platform specifie error record section data */
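To illustrate the ++indent/--indent pairing discussed above, here is a minimal sketch of how an early return can leave the indent counter unbalanced, and how a single exit label keeps it matched. The function names (print_indented, print_section_leaky, print_section_balanced) are hypothetical and are not taken from mca.c; only the global indent counter mirrors what the thread describes.

```c
#include <stdio.h>

static int indent;  /* current nesting depth, as in the thread */

/* Hypothetical stand-in for the decoder's indented print helper. */
static void print_indented(const char *msg)
{
    printf("%*s%s\n", indent * 2, "", msg);
}

/* Leaky shape: the early return skips the matching --indent. */
static void print_section_leaky(int record_is_valid)
{
    ++indent;
    if (!record_is_valid) {
        print_indented("invalid record");
        return;  /* indent stays incremented for every later record */
    }
    print_indented("record body");
    --indent;
}

/* Balanced shape: every exit goes through one label that undoes ++indent. */
static void print_section_balanced(int record_is_valid)
{
    ++indent;
    if (!record_is_valid) {
        print_indented("invalid record");
        goto out;
    }
    print_indented("record body");
out:
    --indent;
}
```

Calling the leaky variant repeatedly with an invalid record pushes each later message one level further right, which would match the "progressively more white space" symptom; the balanced variant leaves indent where it found it on both paths.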
* Re: salinfo-0.4 patch
From: Ben Woodard @ 2004-01-30 2:36 UTC
To: linux-ia64
How about this:
diff -ru salinfo-0.4/mca.c salinfo-0.4-new/mca.c
--- salinfo-0.4/mca.c 2003-12-04 12:03:18.000000000 -0800
+++ salinfo-0.4-new/mca.c 2004-01-29 14:13:25.000000000 -0800
@@ -834,7 +834,7 @@
iprintf("Invalid PCI Component Error Record format: length = %d, "
" Size PCI Data = %ld, Num Mem-Map/IO-Map Regs = %d/%d\n",
pcei->header.len, n_pci_data, n_mem_regs, n_io_regs);
- return;
+ goto out;
}
if (n_mem_regs) {
@@ -857,6 +857,8 @@
}
if (pcei->valid.oem_data)
platform_pci_comp_err_print(&pcei->header, p_oem_data);
+ out:
+ --indent;
}
/* Format and log the platform specifie error record section data */
Only in salinfo-0.4-new/: mca.c~
Only in salinfo-0.4-new/: mca.c,v
diff -ru salinfo-0.4/salinfo_decode.c salinfo-0.4-new/salinfo_decode.c
--- salinfo-0.4/salinfo_decode.c 2003-11-24 14:37:28.000000000 -0800
+++ salinfo-0.4-new/salinfo_decode.c 2004-01-29 15:14:50.000000000 -0800
@@ -276,10 +276,15 @@
cpu,
type,
suffix);
- if (!(freopen(filename, "w", stdout) && freopen(filename, "w", stderr))) {
- perror(filename);
+ if ((fd = open(filename, O_WRONLY|O_CREAT|O_EXCL, S_IRUSR|S_IWUSR)) < 0){
+ perror(filename);
goto out;
}
+ if ( dup2(fd,1) != 1 || dup2(fd,2) != 2){
+ perror(filename);
+ goto out;
+ }
+ close(fd);
printf("BEGIN HARDWARE ERROR STATE from %s on cpu %d\n", type, cpu);
platform_info_print(buffer, 1, fd_data, cpu, oemdata_fd);
I found another bug that I tripped over. Evidently, when running as a
daemon, the freopen fails because the file is closed or something. I
didn't investigate the error state carefully. When it later tries to
write in response to one of the printfs, it fails with EPIPE
and the program monitoring that class of bugs exits.
Obviously this problem disappears if you run salinfo_decode
interactively.
Doing the open and the dup2's seems to fix that problem but it hasn't
been extensively tested. We have this tiny little problem with the whole
salinfo system. It hangs the system.
When we ran the salinfo_decode code interactively it seemed to hang the
system because it was getting hung up on the freopen. However, when we
ran it with the init script and it made a daemon, then the system
appeared to stay up. It turns out that this was only because the CPE
version of salinfo_decode was exiting when it got to the freopen.
Therefore it was never getting to the bug (which Jim and I think is
likely a race condition of some kind) that hangs the system.
Now that I have fixed the problem with freopen when running as a daemon,
and salinfo_decode gets beyond that point even in daemon mode, we are
hanging the system almost every single time.
Jim is looking for more race conditions and for ways to fix the ones he
already sees. I will be working on that as well on Monday.
-ben
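For reference, the open() plus dup2() redirection described in this message can be sketched as a small standalone helper. This is only an illustration under assumptions: redirect_output is a hypothetical name, not a function in salinfo_decode.c, and the flush calls and the guard around close() are additions that the patch does not contain.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Point stdout and stderr at a newly created file.
 * Returns 0 on success and -1 on failure.
 * redirect_output() is a hypothetical helper, not part of salinfo.
 */
static int redirect_output(const char *path)
{
    /* O_EXCL makes open() fail if the file already exists. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    /* Flush anything still buffered for the old destinations. */
    fflush(stdout);
    fflush(stderr);

    /* Fail if EITHER duplication fails, hence || rather than &&. */
    if (dup2(fd, STDOUT_FILENO) < 0 || dup2(fd, STDERR_FILENO) < 0) {
        close(fd);
        return -1;
    }

    /*
     * Descriptors 1 and 2 now refer to the file. Close the extra one,
     * unless open() itself returned a standard descriptor, which can
     * happen when a daemon has already closed them.
     */
    if (fd > STDERR_FILENO)
        close(fd);
    return 0;
}
```

A caller would do something like `if (redirect_output(filename) < 0) { perror(filename); goto out; }`. The guard before close() matters in the daemon case: if descriptor 1 was already closed, open() can return 1, and an unconditional close(fd) would then close the freshly redirected stdout.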
On Wed, 2004-01-28 at 15:43, Keith Owens wrote:
> On 28 Jan 2004 15:23:28 -0800,
> Ben Woodard <woodard@redhat.com> wrote:
> >It is a race, let's see who can find the missing --indent first. ;-)
>
> --- mca.c.old 2004-01-29 10:43:05.000000000 +1100
> +++ mca.c 2004-01-29 10:43:08.000000000 +1100
> @@ -834,7 +834,7 @@
> iprintf("Invalid PCI Component Error Record format: length = %d, "
> " Size PCI Data = %ld, Num Mem-Map/IO-Map Regs = %d/%d\n",
> pcei->header.len, n_pci_data, n_mem_regs, n_io_regs);
> - return;
> + goto out;
> }
>
> if (n_mem_regs) {
> @@ -857,6 +857,8 @@
> }
> if (pcei->valid.oem_data)
> platform_pci_comp_err_print(&pcei->header, p_oem_data);
> +out:
> + --indent;
> }
>
> /* Format and log the platform specifie error record section data */