* Alpha "process table hang"
@ 2001-04-11 15:40 Bob McElrath
2001-04-11 16:44 ` Peter Rival
[not found] ` <200104111642.f3BGg6930131@kanga.hofr.at>
0 siblings, 2 replies; 16+ messages in thread
From: Bob McElrath @ 2001-04-11 15:40 UTC (permalink / raw)
To: linux-kernel
I've been experiencing a particular kind of hang for many versions
(since 2.3.99 days, recently seen with 2.4.1, 2.4.2, and 2.4.2-ac4) on
the alpha architecture. The symptom is that any program that tries to
access the process table will hang. (ps, w, top) The hang will go away
by itself after ~10minutes - 1 hour or so. When it hangs I run ps and
see that it gets halfway through the process list and hangs. The
process that comes next in the list (after hang goes away) almost always
has nonsensical memory numbers, like multi-gigabyte SIZE.
Linux draal.physics.wisc.edu 2.3.99-pre5 #8 Sun Apr 23 16:21:48 CDT 2000
alpha unknown
Gnu C 2.96
Gnu make 3.78.1
binutils 2.10.0.18
util-linux 2.11a
modutils 2.4.5
e2fsprogs 1.18
PPP 2.3.11
Linux C Library 2.2.1
Dynamic linker (ldd) 2.2.1
Procps 2.0.7
Net-tools 1.54
Kbd 0.94
Sh-utils 2.0
Modules Loaded nfsd lockd sunrpc af_packet msdos fat pas2 sound
soundcore
Has anyone else seen this? Is there a fix?
-- Bob
Bob McElrath (rsmcelrath@students.wisc.edu)
Univ. of Wisconsin at Madison, Department of Physics
* Re: Alpha "process table hang"
2001-04-11 15:40 Alpha "process table hang" Bob McElrath
@ 2001-04-11 16:44 ` Peter Rival
2001-04-11 17:00 ` Bob McElrath
[not found] ` <200104111642.f3BGg6930131@kanga.hofr.at>
1 sibling, 1 reply; 16+ messages in thread
From: Peter Rival @ 2001-04-11 16:44 UTC (permalink / raw)
To: Bob McElrath; +Cc: linux-kernel
You wouldn't happen to have khttpd loaded as a module, would you? I've seen
this type of problem caused by that before...
- Pete
Bob McElrath wrote:
> I've been experiencing a particular kind of hang for many versions
> (since 2.3.99 days, recently seen with 2.4.1, 2.4.2, and 2.4.2-ac4) on
> the alpha architecture. The symptom is that any program that tries to
> access the process table will hang. (ps, w, top) The hang will go away
> by itself after ~10minutes - 1 hour or so. When it hangs I run ps and
> see that it gets halfway through the process list and hangs. The
> process that comes next in the list (after hang goes away) almost always
> has nonsensical memory numbers, like multi-gigabyte SIZE.
>
> Linux draal.physics.wisc.edu 2.3.99-pre5 #8 Sun Apr 23 16:21:48 CDT 2000
> alpha unknown
>
> Gnu C 2.96
> Gnu make 3.78.1
> binutils 2.10.0.18
> util-linux 2.11a
> modutils 2.4.5
> e2fsprogs 1.18
> PPP 2.3.11
> Linux C Library 2.2.1
> Dynamic linker (ldd) 2.2.1
> Procps 2.0.7
> Net-tools 1.54
> Kbd 0.94
> Sh-utils 2.0
> Modules Loaded nfsd lockd sunrpc af_packet msdos fat pas2 sound
> soundcore
>
> Has anyone else seen this? Is there a fix?
>
> -- Bob
>
> Bob McElrath (rsmcelrath@students.wisc.edu)
> Univ. of Wisconsin at Madison, Department of Physics
>
* Re: Alpha "process table hang"
2001-04-11 16:44 ` Peter Rival
@ 2001-04-11 17:00 ` Bob McElrath
2001-04-11 17:18 ` Peter Rival
0 siblings, 1 reply; 16+ messages in thread
From: Bob McElrath @ 2001-04-11 17:00 UTC (permalink / raw)
To: Peter Rival; +Cc: linux-kernel
Peter Rival [frival@zk3.dec.com] wrote:
> You wouldn't happen to have khttpd loaded as a module, would you? I've seen
> this type of problem caused by that before...
Nope...
>
> - Pete
>
> Bob McElrath wrote:
>
> > I've been experiencing a particular kind of hang for many versions
> > (since 2.3.99 days, recently seen with 2.4.1, 2.4.2, and 2.4.2-ac4) on
> > the alpha architecture. The symptom is that any program that tries to
> > access the process table will hang. (ps, w, top) The hang will go away
> > by itself after ~10minutes - 1 hour or so. When it hangs I run ps and
> > see that it gets halfway through the process list and hangs. The
> > process that comes next in the list (after hang goes away) almost always
> > has nonsensical memory numbers, like multi-gigabyte SIZE.
> >
> > Linux draal.physics.wisc.edu 2.3.99-pre5 #8 Sun Apr 23 16:21:48 CDT 2000
> > alpha unknown
> >
> > Gnu C 2.96
> > Gnu make 3.78.1
> > binutils 2.10.0.18
> > util-linux 2.11a
> > modutils 2.4.5
> > e2fsprogs 1.18
> > PPP 2.3.11
> > Linux C Library 2.2.1
> > Dynamic linker (ldd) 2.2.1
> > Procps 2.0.7
> > Net-tools 1.54
> > Kbd 0.94
> > Sh-utils 2.0
> > Modules Loaded nfsd lockd sunrpc af_packet msdos fat pas2 sound
> > soundcore
> >
> > Has anyone else seen this? Is there a fix?
> >
> > -- Bob
> >
> > Bob McElrath (rsmcelrath@students.wisc.edu)
> > Univ. of Wisconsin at Madison, Department of Physics
> >
-- Bob
Bob McElrath (rsmcelrath@students.wisc.edu)
Univ. of Wisconsin at Madison, Department of Physics
* Re: Alpha "process table hang"
2001-04-11 17:00 ` Bob McElrath
@ 2001-04-11 17:18 ` Peter Rival
2001-04-11 17:57 ` Bob McElrath
0 siblings, 1 reply; 16+ messages in thread
From: Peter Rival @ 2001-04-11 17:18 UTC (permalink / raw)
To: Bob McElrath; +Cc: linux-kernel
Hmpf. Haven't seen this at all on any of the Alphas that I'm running. What
exact system are you seeing this on, and what are you running when it happens?
- Pete
Bob McElrath wrote:
> Peter Rival [frival@zk3.dec.com] wrote:
> > You wouldn't happen to have khttpd loaded as a module, would you? I've seen
> > this type of problem caused by that before...
>
> Nope...
>
> >
> > - Pete
> >
> > Bob McElrath wrote:
> >
> > > I've been experiencing a particular kind of hang for many versions
> > > (since 2.3.99 days, recently seen with 2.4.1, 2.4.2, and 2.4.2-ac4) on
> > > the alpha architecture. The symptom is that any program that tries to
> > > access the process table will hang. (ps, w, top) The hang will go away
> > > by itself after ~10minutes - 1 hour or so. When it hangs I run ps and
> > > see that it gets halfway through the process list and hangs. The
> > > process that comes next in the list (after hang goes away) almost always
> > > has nonsensical memory numbers, like multi-gigabyte SIZE.
> > >
> > > Linux draal.physics.wisc.edu 2.3.99-pre5 #8 Sun Apr 23 16:21:48 CDT 2000
> > > alpha unknown
> > >
> > > Gnu C 2.96
> > > Gnu make 3.78.1
> > > binutils 2.10.0.18
> > > util-linux 2.11a
> > > modutils 2.4.5
> > > e2fsprogs 1.18
> > > PPP 2.3.11
> > > Linux C Library 2.2.1
> > > Dynamic linker (ldd) 2.2.1
> > > Procps 2.0.7
> > > Net-tools 1.54
> > > Kbd 0.94
> > > Sh-utils 2.0
> > > Modules Loaded nfsd lockd sunrpc af_packet msdos fat pas2 sound
> > > soundcore
> > >
> > > Has anyone else seen this? Is there a fix?
> > >
> > > -- Bob
> > >
> > > Bob McElrath (rsmcelrath@students.wisc.edu)
> > > Univ. of Wisconsin at Madison, Department of Physics
> > >
> -- Bob
>
> Bob McElrath (rsmcelrath@students.wisc.edu)
> Univ. of Wisconsin at Madison, Department of Physics
>
* Re: Alpha "process table hang"
2001-04-11 17:18 ` Peter Rival
@ 2001-04-11 17:57 ` Bob McElrath
[not found] ` <E14nOzo-0007Ew-00@the-village.bc.nu>
0 siblings, 1 reply; 16+ messages in thread
From: Bob McElrath @ 2001-04-11 17:57 UTC (permalink / raw)
To: Peter Rival; +Cc: linux-kernel
Peter Rival [frival@zk3.dec.com] wrote:
> Hmpf. Haven't seen this at all on any of the Alphas that I'm running. What
> exact system are you seeing this on, and what are you running when it happens?
This is a LX164 system, 533 MHz.
I have a hunch it's related to the X server because I've seen it many,
many times while sitting at the console (in X), but never when I'm
logged on remotely. I've seen it with XFree86 3.3.6, 4.0.2, and 4.0.3,
on a Matrox Millennium II video card (8MB).
I'm also experiencing regular X crashes, but the process-table-hang
doesn't occur at the same time as an X crash (or vice versa). I sent a patch
to xfree86@xfree86.org a few days ago that seemed to fix (one of) the X
crashes (in the mga driver, ask if you want details).
(But since the X server shouldn't have the ability to corrupt the
kernel's process list, there has to be a problem in the kernel
somewhere)
Note that this system was completely stable with 2.2 kernels.
Cheers,
-- Bob
Bob McElrath (rsmcelrath@students.wisc.edu)
Univ. of Wisconsin at Madison, Department of Physics
* Re: Alpha "process table hang"
[not found] ` <200104111642.f3BGg6930131@kanga.hofr.at>
@ 2001-04-11 18:49 ` Bob McElrath
0 siblings, 0 replies; 16+ messages in thread
From: Bob McElrath @ 2001-04-11 18:49 UTC (permalink / raw)
To: Der Herr Hofrat; +Cc: linux-kernel
Well, here's the list of modules I have loaded:
nfsd 102496 8 (autoclean)
lockd 72976 1 (autoclean) [nfsd]
sunrpc 87984 1 (autoclean) [nfsd lockd]
nls_iso8859-1 4160 1 (autoclean)
nls_cp437 5664 1 (autoclean)
msdos 7728 1 (autoclean)
fat 42784 0 (autoclean) [msdos]
pas2 17488 1
sound 83184 1 [pas2]
soundcore 5568 5 [sound]
Are there any known problems with these? I have at times also used
matroxfb, and usb-uhci (along with visor, usb-storage), but I've seen
the process-table-hang with matroxfb and usb-uhci *not* installed, so I
don't think that's it. I have the above modules installed consistently
at each bootup.
Der Herr Hofrat [der.herr@hofr.at] wrote:
> > I've been experiencing a particular kind of hang for many versions
> > (since 2.3.99 days, recently seen with 2.4.1, 2.4.2, and 2.4.2-ac4) on
> > the alpha architecture. The symptom is that any program that tries to
> > access the process table will hang. (ps, w, top) The hang will go away
> > by itself after ~10minutes - 1 hour or so. When it hangs I run ps and
> > see that it gets halfway through the process list and hangs. The
> > process that comes next in the list (after hang goes away) almost always
> > has nonsensical memory numbers, like multi-gigabyte SIZE.
> >
> >
> I know this effect, independent of the platform, from cases where a proc
> entry is not correctly unregistered.
>
> (the code only compiles for 2.2.X, for 2.4.X you need to change
> the proc struct.)
>
> ---snip---
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/proc_fs.h>
>
> #define BUF_LEN 1024
> struct proc_dir_entry prockill_proc_file={
>         0,
>         0,
>         "prockill",
>         S_IFREG|S_IRUGO,
>         1,
>         0,
>         0,
>         BUF_LEN,
>         NULL,
>         NULL,
>         NULL,
> };
>
> int init_module(void) {
>         printk("prockill.o registering proc entry\n");
>         return proc_register(&proc_root,&prockill_proc_file);
> }
>
> /* Deliberately buggy: the proc entry stays registered after unload. */
> void cleanup_module(void) {
>         printk("prockill.o forgets to unregister proc entry\n");
> }
> ---snip---
> compile this as kernel module
>
> insmod proc_kill.o
> rmmod proc_kill
>
> and the system will run without error until you do something like
>
> ls /proc/<TAB><TAB> or
> ls -R /proc
>
> after this the system will drop dead for minutes to hours or even for good....
>
>
> any chance you have a faulty module ??
>
>
> hofrat
-- Bob
Bob McElrath (rsmcelrath@students.wisc.edu)
Univ. of Wisconsin at Madison, Department of Physics
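The failure mode hofrat describes comes down to a register call with no matching unregister. As a userspace analogy only (this is not the 2.2 or 2.4 procfs API, and every `sketch_*` name is hypothetical), a registry that pairs the two operations might look like:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a directory entry; not struct proc_dir_entry. */
struct sketch_entry {
    const char *name;
    struct sketch_entry *next;
};

static struct sketch_entry *sketch_root;   /* head of the registered list */

/* Link an entry into the list (what module init would do). */
static void sketch_register(struct sketch_entry *e)
{
    e->next = sketch_root;
    sketch_root = e;
}

/* Unlink an entry by name (what module cleanup must do before unload).
 * Returns 0 on success, -1 if no such entry is registered. */
static int sketch_unregister(const char *name)
{
    for (struct sketch_entry **pp = &sketch_root; *pp; pp = &(*pp)->next) {
        if (strcmp((*pp)->name, name) == 0) {
            *pp = (*pp)->next;
            return 0;
        }
    }
    return -1;
}

/* Walk the list, as a directory listing would. */
static int sketch_count(void)
{
    int n = 0;
    for (struct sketch_entry *e = sketch_root; e; e = e->next)
        n++;
    return n;
}

/* Small self-check of the register/unregister pairing. */
static void sketch_demo(void)
{
    static struct sketch_entry e = { "prockill", NULL };
    sketch_register(&e);
    assert(sketch_count() == 1);
    assert(sketch_unregister("prockill") == 0);
    assert(sketch_count() == 0);
}
```

If the unregister step is skipped and the memory holding the entry goes away, every later walk of the list follows a stale pointer, which is the userspace equivalent of the hang on listing /proc.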
* Re: Alpha "process table hang"
[not found] ` <E14nOzo-0007Ew-00@the-village.bc.nu>
@ 2001-04-13 13:48 ` Bob McElrath
2001-04-17 15:07 ` generic rwsem [Re: Alpha "process table hang"] Andrea Arcangeli
0 siblings, 1 reply; 16+ messages in thread
From: Bob McElrath @ 2001-04-13 13:48 UTC (permalink / raw)
To: linux-kernel
Alan Cox [alan@lxorguk.ukuu.org.uk] wrote:
> > (But since the X server shouldn't have the ability to corrupt the
> > kernel's process list, there has to be a problem in the kernel
> > somewhere)
>
> The X server has enough privilege to corrupt anything. It's unlikely to, and
> I do agree the two are likely to be unrelated.
Well, nix that idea. I just fell back to 2.2.19, and I see neither the
X crash nor the process-table-hang crash (which rules out hardware
problems, thankfully). The X crash is also kernel related, it seems.
I'm using XFree86 4.0.3 with the mga driver. It hangs in mga_storm.c on
a line that looks like:
while (MGAISBUSY()) {}
where:
#define MGAISBUSY() (INREG8(MGAREG_Status + 2) & 0x01)
Killing and restarting X causes it to immediately hang in the same
place. (I have to reboot to recover the console)
This would seem to be PCI related. Have any significant PCI code
changes been made to the alpha architecture, especially pyxis or
cabriolet code? I see that arch/alpha/kernel has been totally
rearranged, but since this doesn't crash in kernel code, I have no idea
how to debug it.
Thanks,
-- Bob
Bob McElrath (rsmcelrath@students.wisc.edu)
Univ. of Wisconsin at Madison, Department of Physics
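The `while (MGAISBUSY()) {}` loop quoted in this message spins forever if the busy bit never clears. A minimal sketch of a bounded poll that reports a stuck status bit instead, assuming the status read can be wrapped in a callback; `sketch_wait_idle`, `sketch_fake_status`, and the poll limit are hypothetical and not part of the XFree86 mga driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Callback that returns the current status byte (stand-in for INREG8). */
typedef uint8_t (*sketch_read_status_fn)(void *ctx);

/* Poll until bit 0 (busy) clears or max_polls reads have been made.
 * Returns true if the device went idle, false if it still looks busy. */
static bool sketch_wait_idle(sketch_read_status_fn read_status, void *ctx,
                             unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++) {
        if ((read_status(ctx) & 0x01) == 0)
            return true;
    }
    return false;
}

/* Fake device for the self-check: reports busy for *ctx more reads. */
static uint8_t sketch_fake_status(void *ctx)
{
    unsigned *remaining = ctx;
    if (*remaining == 0)
        return 0x00;
    (*remaining)--;
    return 0x01;
}

/* Small self-check: one device that recovers, one that stays busy. */
static void sketch_demo(void)
{
    unsigned recovers = 3;
    unsigned stuck = 1000;
    assert(sketch_wait_idle(sketch_fake_status, &recovers, 10));
    assert(!sketch_wait_idle(sketch_fake_status, &stuck, 10));
}
```

A bounded loop like this does not fix whatever keeps the bit set, but it would let the caller log the condition rather than hang the console.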
* generic rwsem [Re: Alpha "process table hang"]
2001-04-13 13:48 ` Bob McElrath
@ 2001-04-17 15:07 ` Andrea Arcangeli
2001-04-17 15:28 ` Bob McElrath
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Andrea Arcangeli @ 2001-04-17 15:07 UTC (permalink / raw)
To: Bob McElrath; +Cc: linux-kernel, Peter Rival, Linus Torvalds, David Howells
On Fri, Apr 13, 2001 at 08:48:05AM -0500, Bob McElrath wrote:
> Alan Cox [alan@lxorguk.ukuu.org.uk] wrote:
> > > (But since the X server shouldn't have the ability to corrupt the
> > > kernel's process list, there has to be a problem in the kernel
> > > somewhere)
> >
> > The X server has enough privilege to corrupt anything. It's unlikely to, and
> > I do agree the two are likely to be unrelated.
>
> Well, nix that idea. I just fell back to 2.2.19, and I see neither the
> X crash nor the process-table-hang crash (which rules out hardware
> problems, thankfully). The X crash is also kernel related, it seems.
>
> I'm using XFree86 4.0.3 with the mga driver. It hangs in mga_storm.c on
> a line that looks like:
> while (MGAISBUSY()) {}
> where:
> #define MGAISBUSY() (INREG8(MGAREG_Status + 2) & 0x01)
>
> Killing and restarting X causes it to immediately hang in the same
> place. (I have to reboot to recover the console)
>
> This would seem to be PCI related. Have any significant PCI code
> changes been made to the alpha architecture, especially pyxis or
> cabriolet code? I see that arch/alpha/kernel has been totally
> rearranged, but since this doesn't crash in kernel code, I have no idea
> how to debug it.
It seems it was an SMP race in the alpha rw semaphores. I rewrote the
rwsemaphores, starting from my first C implementation of them, which has now
been adopted by the ppc port (I added some scalability and locking
optimizations), and made them generic, dropping all the rwsem code that has
been included in 2.4.4pre[23] (the generic rwsemaphores in those kernels are
broken; try to use them on other archs or on x86 and you will notice). With
that I cannot reproduce the hang any longer.
My generic rwsems should also be cleaner and faster than the generic ones in
2.4.4pre3, and they can be turned off completely so an architecture can really
take over with its own asm implementation (with the 2.4.4pre3 design this
is obviously not possible, because lib/rwsem.c compilation isn't conditional and
that file knows the internals of struct rw_semaphore).
In the generic rwsem implementation below, the max limit of concurrent
readers in the critical section is 2^sizeof(int) and down_read is recursive.
There's no limit on the number of tasks sleeping in the slow path, either in
down_read or in down_write. The waitqueue wakeups are done without any
additional lock (the lock in the waitqueue is unused).
So please try to reproduce the hang with 2.4.4pre3 with those two
patches applied:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1
All alpha users should run with at least the above two patches applied
to compile their tree and to make sure to have rock solid rwsemaphores.
Both patches are suggested for inclusion. The arch optimizations can be done on
top of the cleaner and arch-friendly rwsem code (just copy the asm files from
2.4.4pre3 and set CONFIG_GENERIC_RWSEM to `n'), and the current lib/rwsem.c can be
moved into arch/i386/kernel without any problem. I didn't do that myself because
I wasn't going to audit every line of the x86 asm rwsem right now and I only
wanted obviously right stuff in my tree, but I'd appreciate it if David could do
that. Note that although my patch drops the asm stuff, I don't want to reject
the asm based implementation in the long run; I only care to provide a solid
and clean generic implementation that can be used as a fallback at any time on
any arch by only changing a configuration option.
The alpha-numa patch also fixes some mm bug in the common code.
Andrea
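To make the semantics described in this message concrete (readers share the lock, a nested down_read does not block behind other readers, writers are exclusive), here is a userspace analogy built on pthreads. It is a sketch under those assumptions, not the code in 00_rwsem-generic-1, and all `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a generic rw semaphore. */
struct sketch_rwsem {
    pthread_mutex_t lock;   /* protects the two fields below */
    pthread_cond_t  wait;   /* sleepers from both slow paths wait here */
    int  readers;           /* readers currently inside the critical section */
    bool writer;            /* true while a writer holds the semaphore */
};

static void sketch_init(struct sketch_rwsem *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->wait, NULL);
    s->readers = 0;
    s->writer = false;
}

/* Readers only wait for an active writer, so nested reads do not block. */
static void sketch_down_read(struct sketch_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->writer)
        pthread_cond_wait(&s->wait, &s->lock);
    s->readers++;
    pthread_mutex_unlock(&s->lock);
}

static void sketch_up_read(struct sketch_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    assert(s->readers > 0);
    if (--s->readers == 0)
        pthread_cond_broadcast(&s->wait);   /* last reader wakes sleepers */
    pthread_mutex_unlock(&s->lock);
}

/* Writers wait until there is no writer and no reader. */
static void sketch_down_write(struct sketch_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->writer || s->readers > 0)
        pthread_cond_wait(&s->wait, &s->lock);
    s->writer = true;
    pthread_mutex_unlock(&s->lock);
}

static void sketch_up_write(struct sketch_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    assert(s->writer);
    s->writer = false;
    pthread_cond_broadcast(&s->wait);
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking write attempt; handy for checking state in a single thread. */
static bool sketch_down_write_trylock(struct sketch_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    bool ok = !s->writer && s->readers == 0;
    if (ok)
        s->writer = true;
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

Because readers never wait behind a sleeping writer in this sketch, a steady stream of readers can starve writers; that is a simplification here and says nothing about the fairness of the real patch.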
* Re: generic rwsem [Re: Alpha "process table hang"]
2001-04-17 15:07 ` generic rwsem [Re: Alpha "process table hang"] Andrea Arcangeli
@ 2001-04-17 15:28 ` Bob McElrath
2001-04-19 16:21 ` Bob McElrath
2001-04-17 15:45 ` Christoph Hellwig
2001-04-17 16:59 ` David Howells
2 siblings, 1 reply; 16+ messages in thread
From: Bob McElrath @ 2001-04-17 15:28 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, Peter Rival, Linus Torvalds, David Howells
Andrea Arcangeli [andrea@suse.de] wrote:
>
> So please try to reproduce the hang with 2.4.4pre3 with those two
> patches applied:
>
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1
>
> All alpha users should run with at least the above two patches applied
> to compile their tree and to make sure to have rock solid rwsemaphores.
Excellent! I'll give it a try.
Note that I recently saw the X hang with the 2.2.19 kernel, but I still
haven't seen the process-table-hang with 2.2.19 (about 4 days running
with 2.2.19). It is *far* easier to get the X hang in 2.4 than 2.2.
(minutes for 2.4, days for 2.2) Also note that this is not an SMP
machine (single processor 21164a, LX164 mobo).
But I'll apply your patch tonight and let you know the results.
Cheers,
-- Bob
Bob McElrath (rsmcelrath@students.wisc.edu)
Univ. of Wisconsin at Madison, Department of Physics
* Re: generic rwsem [Re: Alpha "process table hang"]
2001-04-17 15:07 ` generic rwsem [Re: Alpha "process table hang"] Andrea Arcangeli
2001-04-17 15:28 ` Bob McElrath
@ 2001-04-17 15:45 ` Christoph Hellwig
2001-04-17 16:59 ` David Howells
2 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2001-04-17 15:45 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel
In article <20010417170717.H2696@athlon.random> you wrote:
> My generic rwsems should also be cleaner and faster than the generic ones in
> 2.4.4pre3, and they can be turned off completely so an architecture can really
> take over with its own asm implementation (with the 2.4.4pre3 design this
> is obviously not possible, because lib/rwsem.c compilation isn't conditional and
> that file knows the internals of struct rw_semaphore).
>
> In the below generic implementation of the rw sem the max limit of concurrent
> readers in the critical section is 2^sizeof(int) and down_read is recursive.
> There's no limit of tasks sleeping in the slow path either by down_read or
> down_write. The waitqueue wakeups are done without any additional lock (the
> lock in the waitqueue is unused).
>
> So please try to reproduce the hang with 2.4.4pre3 with those two
> patches applied:
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1
Hey it looks like someone finally fixed the rwsems :P
A little comment on the patch:
In lib/Makefile you should _always_ add rwsem.o to export-objs, not only if
CONFIG_GENERIC_RWSEM is 'y' - that's the whole idea behind export-objs.
Christoph
--
Of course it doesn't work. We've performed a software upgrade.
* Re: generic rwsem [Re: Alpha "process table hang"]
2001-04-17 15:07 ` generic rwsem [Re: Alpha "process table hang"] Andrea Arcangeli
2001-04-17 15:28 ` Bob McElrath
2001-04-17 15:45 ` Christoph Hellwig
@ 2001-04-17 16:59 ` David Howells
2001-04-17 17:55 ` Andrea Arcangeli
2 siblings, 1 reply; 16+ messages in thread
From: David Howells @ 2001-04-17 16:59 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Bob McElrath, linux-kernel, Peter Rival, Linus Torvalds,
David Howells
Andrea,
How did you generate the 00_rwsem-generic-1 patch? Against what did you diff?
You seem to have removed all the optimised i386 rwsem stuff... Did it not work
for you?
> (the generic rwsemaphores in those kernels are broken; try to use them on
> other archs or on x86 and you will notice) and I cannot reproduce the hang any
> longer.
Can you supply a test case that demonstrates it not working?
> My generic rwsems should also be cleaner and faster than the generic ones in
> 2.4.4pre3, and they can be turned off completely so an architecture can really
> take over with its own asm implementation.
A quick look says it shouldn't be faster (inline functions and all that).
However, I think you might be right about it being too dependent on the
algorithm I put in, and that is easy to change.
> (while with the 2.4.4pre3 design this is obviously not possible because
> lib/rwsem.c compilation isn't conditional and such file knows the internals
> of the struct rw_semaphore).
Could be very easily changed.
David
* Re: generic rwsem [Re: Alpha "process table hang"]
2001-04-17 16:59 ` David Howells
@ 2001-04-17 17:55 ` Andrea Arcangeli
0 siblings, 0 replies; 16+ messages in thread
From: Andrea Arcangeli @ 2001-04-17 17:55 UTC (permalink / raw)
To: David Howells; +Cc: Bob McElrath, linux-kernel, Peter Rival, Linus Torvalds
On Tue, Apr 17, 2001 at 05:59:13PM +0100, David Howells wrote:
> Andrea,
>
> How did you generate the 00_rwsem-generic-1 patch? Against what did you diff?
2.4.4pre3 from kernel.org.
> You seem to have removed all the optimised i386 rwsem stuff... Did it not work
> for you?
As I said, the design of the framework for plugging in per-arch rwsem
implementations isn't flexible enough, and the generic spinlock-based rwsems
are broken as well; try to use them if you can (yes, I tried that for the
alpha; it was just a mess and it was more productive to rewrite than to fix).
> > (the generic rwsemaphores in those kernels are broken; try to use them on
> > other archs or on x86 and you will notice) and I cannot reproduce the hang any
> > longer.
>
> Can you supply a test case that demonstrates it not working?
#define __RWSEM_INITIALIZER(name,count) \
                                 ^^^^^
        { RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, \
          ^^^^^^^^^^^^^^^^^^^^
        __WAIT_QUEUE_HEAD_INITIALIZER((name).wait) \
        __RWSEM_DEBUG_INIT __RWSEM_DEBUG_MINIT(name) }
#define __DECLARE_RWSEM_GENERIC(name,count) \
        struct rw_semaphore name = __RWSEM_INITIALIZER(name,count)
                                                            ^^^^^
#define DECLARE_RWSEM(name) __DECLARE_RWSEM_GENERIC(name,RW_LOCK_BIAS)
                                                         ^^^^^^^^^^^^
#define DECLARE_RWSEM_READ_LOCKED(name) __DECLARE_RWSEM_GENERIC(name,RW_LOCK_BIAS-1)
                                                                     ^^^^^^^^^^^^^^
#define DECLARE_RWSEM_WRITE_LOCKED(name) __DECLARE_RWSEM_GENERIC(name,0)
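A minimal, self-contained sketch of what the carets above appear to point at: the initializer takes a `count` argument but always expands to the unlocked value, so the READ_LOCKED and WRITE_LOCKED declarations would start out unlocked. The `sketch_*`/`SKETCH_*` names and the numeric value are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for the unlocked counter value. */
#define SKETCH_UNLOCKED_VALUE 0x01000000

struct sketch_sem {
    int count;
};

/* Broken shape: 'count' is accepted but never used in the expansion. */
#define SKETCH_BROKEN_INIT(name, count) { SKETCH_UNLOCKED_VALUE }

/* Shape that honours the argument. */
#define SKETCH_FIXED_INIT(name, count)  { (count) }

/* Both are declared "write locked", i.e. with an initial count of 0. */
static struct sketch_sem sketch_broken = SKETCH_BROKEN_INIT(sketch_broken, 0);
static struct sketch_sem sketch_fixed  = SKETCH_FIXED_INIT(sketch_fixed, 0);

/* Small self-check showing the difference between the two initializers. */
static void sketch_demo(void)
{
    /* The broken initializer silently yields an unlocked semaphore. */
    assert(sketch_broken.count == SKETCH_UNLOCKED_VALUE);
    /* The fixed initializer yields the requested starting state. */
    assert(sketch_fixed.count == 0);
}
```

The compiler accepts the broken form without complaint because an unused macro parameter is legal C, which is why a bug of this shape is easy to miss.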
> > My generic rwsems should also be cleaner and faster than the generic ones in
> > 2.4.4pre3, and they can be turned off completely so an architecture can really
> > take over with its own asm implementation.
>
> A quick look says it shouldn't be faster (inline functions and all that).
The spinlock based generic semaphores are quite large, so I don't want to waste
icache on inlining them; a call asm instruction isn't that costly (it's
obviously _very_ costly for a spinlock, because a spinlock is 1 asm instruction
in the fast path, but not for a C based rwsem). But the real point is that the
locking and the waitqueue mechanism are superior in my implementation (not
the non-inlining part).
It's also more readable and it's not bloated code: 65+110 lines compared to
156+148+174 lines.
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3aa/include/linux/rwsem.h
65 2.4.4pre3aa/include/linux/rwsem.h
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3aa/lib/rwsem.c
110 2.4.4pre3aa/lib/rwsem.c
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3/lib/rwsem.c
156 2.4.4pre3/lib/rwsem.c
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3/include/linux/rwsem.h
148 2.4.4pre3/include/linux/rwsem.h
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3/include/linux/rwsem-spinlock.h
174 2.4.4pre3/include/linux/rwsem-spinlock.h
andrea@athlon:~/devel/kernel >
I suggest you apply my patch, read my implementation, and tell me if you think
it's not more efficient and more readable. Then set CONFIG_RWSEM_GENERIC
to n in arch/i386/config.in and plug in your asm code taken from vanilla
2.4.4pre3 into include/asm-i386/rwsem.h and arch/i386/kernel/rwsem.c, and we're
done. If someone has problems with the asm code, with a one-liner he can
fall back to an obviously right and quite efficient implementation [even if the
fastpath is not 1 inlined asm instruction] (all archs will be allowed to do
that transparently to the arch dependent code).
The same can be done on alpha and other archs; resurrecting the inlined fast
paths based on the atomic_add_return APIs is easy too. In fact I'd _recommend_
that archs that can implement atomic_add_return and friends (including ia32
with xadd on >=586) implement the "fast path" version of the rwsem in C too,
in the common code, selectable with a CONFIG_RWSEM_ATOMIC_RETURN (plus we add
linux/include/linux/compiler.h with the builtin_expect macro to be able to
define the fast path in C too). Most archs have atomic_*_return and friends,
so they will be able to share the common code completely and have rwsem fast
paths as fast as ia32 without the risk of introducing bugs in the port. The
more we share, the less risk there is. After CONFIG_RWSEM_ATOMIC_RETURN is
implemented we can probably drop the file asm-i386/rwsem-xadd.h.
Andrea
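A rough userspace sketch of the kind of C fast path suggested in this message: an add-and-return-new-value primitive decides the uncontended case, and `__builtin_expect` marks it as the likely branch. It uses C11 atomics and GCC/Clang builtins in place of the kernel's atomic_add_return; the bias value, the trylock-only interface, and all `sketch_*` names are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define sketch_likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical bias a writer adds; readers each add 1. */
#define SKETCH_WRITE_BIAS (-0x10000)

struct sketch_rwsem {
    atomic_int count;   /* 0 = unlocked, >0 = readers, <0 = writer involved */
};

/* Add delta and return the NEW value, analogous to atomic_add_return. */
static inline int sketch_add_return(int delta, atomic_int *v)
{
    return atomic_fetch_add(v, delta) + delta;
}

/* Fast path for readers: one atomic add, then a sign check. */
static inline bool sketch_read_trylock(struct sketch_rwsem *s)
{
    if (sketch_likely(sketch_add_return(1, &s->count) > 0))
        return true;
    sketch_add_return(-1, &s->count);   /* undo; a real rwsem would sleep */
    return false;
}

static inline void sketch_read_unlock(struct sketch_rwsem *s)
{
    sketch_add_return(-1, &s->count);
}

/* Fast path for writers: succeeds only if the count was exactly 0. */
static inline bool sketch_write_trylock(struct sketch_rwsem *s)
{
    if (sketch_likely(sketch_add_return(SKETCH_WRITE_BIAS, &s->count)
                      == SKETCH_WRITE_BIAS))
        return true;
    sketch_add_return(-SKETCH_WRITE_BIAS, &s->count);   /* undo */
    return false;
}

static inline void sketch_write_unlock(struct sketch_rwsem *s)
{
    sketch_add_return(-SKETCH_WRITE_BIAS, &s->count);
}

/* Single-threaded self-check of the counting scheme. */
static void sketch_demo(void)
{
    struct sketch_rwsem s = { 0 };
    assert(sketch_read_trylock(&s));
    assert(sketch_read_trylock(&s));     /* second reader shares the lock */
    assert(!sketch_write_trylock(&s));   /* writer excluded by readers */
    sketch_read_unlock(&s);
    sketch_read_unlock(&s);
    assert(sketch_write_trylock(&s));
    assert(!sketch_read_trylock(&s));    /* reader excluded by writer */
    sketch_write_unlock(&s);
    assert(atomic_load(&s.count) == 0);
}
```

The sketch only covers the uncontended decision. The slow path (queueing and wakeups) is where the real work is, and under contention the add-then-undo step can make a concurrent trylock fail spuriously, which a sleeping implementation would handle by waiting.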
* Re: generic rwsem [Re: Alpha "process table hang"]
2001-04-17 15:28 ` Bob McElrath
@ 2001-04-19 16:21 ` Bob McElrath
2001-04-19 17:17 ` Andrea Arcangeli
0 siblings, 1 reply; 16+ messages in thread
From: Bob McElrath @ 2001-04-19 16:21 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel
Bob McElrath [mcelrath+linux@draal.physics.wisc.edu] wrote:
> Andrea Arcangeli [andrea@suse.de] wrote:
> >
> > So please try to reproduce the hang with 2.4.4pre3 with those two
> > patches applied:
> >
> > ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
> > ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1
> >
> > All alpha users should run with at least the above two patches applied
> > to compile their tree and to make sure to have rock solid rwsemaphores.
>
> Excellent! I'll give it a try.
>
> Note that I recently saw the X hang with the 2.2.19 kernel, but I still
> haven't seen the process-table-hang with 2.2.19 (about 4 days running
> with 2.2.19). It is *far* easier to get the X hang in 2.4 than 2.2.
> (minutes for 2.4, days for 2.2) Also note that this is not an SMP
> machine (single processor 21164a, LX164 mobo).
>
> But I'll apply your patch tonight and let you know the results.
Status report:
I'm at 2 days uptime now, and have not seen the process-table-hang.
Looks like this fixed it. Previously I would get a hang in the first
day or so. I'm using your alpha-numa-3 and rwsem-generic-4 against
2.4.4pre3.
Cheers,
-- Bob
Bob McElrath (rsmcelrath@students.wisc.edu)
Univ. of Wisconsin at Madison, Department of Physics
* Re: generic rwsem [Re: Alpha "process table hang"]
2001-04-19 16:21 ` Bob McElrath
@ 2001-04-19 17:17 ` Andrea Arcangeli
2001-04-23 23:27 ` Bob McElrath
0 siblings, 1 reply; 16+ messages in thread
From: Andrea Arcangeli @ 2001-04-19 17:17 UTC (permalink / raw)
To: Bob McElrath; +Cc: linux-kernel
On Thu, Apr 19, 2001 at 11:21:17AM -0500, Bob McElrath wrote:
> I'm at 2 days uptime now, and have not seen the process-table-hang.
> Looks like this fixed it. Previously I would get a hang in the first
> day or so. I'm using your alpha-numa-3 and rwsem-generic-4 against
> 2.4.4pre3.
good, thanks for the report.
BTW, if you upgrade to 2.4.4pre4 you can apply those two patches:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_alpha-numa-4
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_rwsem-generic-6
The first is not really necessary anymore unless you're using a wildfire. The
second also resurrects the optimized rwsemaphores for all archs but alpha and
ia32.
Andrea
* Re: generic rwsem [Re: Alpha "process table hang"]
2001-04-19 17:17 ` Andrea Arcangeli
@ 2001-04-23 23:27 ` Bob McElrath
2001-04-23 23:40 ` Andrea Arcangeli
0 siblings, 1 reply; 16+ messages in thread
From: Bob McElrath @ 2001-04-23 23:27 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel
Andrea Arcangeli [andrea@suse.de] wrote:
> On Thu, Apr 19, 2001 at 11:21:17AM -0500, Bob McElrath wrote:
> > I'm at 2 days uptime now, and have not seen the process-table-hang.
> > Looks like this fixed it. Previously I would get a hang in the first
> > day or so. I'm using your alpha-numa-3 and rwsem-generic-4 against
> > 2.4.4pre3.
>
> good, thanks for the report.
>
> BTW, if you upgrade to 2.4.4pre4 you can apply those two patches:
>
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_alpha-numa-4
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_rwsem-generic-6
>
> The first is not really necessary anymore unless you're using a wildfire. The
> second also resurrects the optimized rwsemaphores for all archs but alpha and
> ia32.
Well, take that back, I just got it to hang. Again, this is 2.4.4pre3
with alpha-numa-3 and rwsem-generic-4. I saw it upon starting mozilla.
I also saw some scary filesystem errors that may or may not be related:
Apr 23 18:09:40 draal kernel: EXT2-fs error (device sd(8,2)):
ext2_new_block: Free blocks count corrupted for block group 252
There has been a lot of discussion on the topic of rwsems (that,
admittedly, I haven't followed very closely). It looks like
rwsem-generic-6 is the latest from Andrea, I'll build a new 2.4.4pre4
kernel with these patches and let you know the results. Have you made
changes between rwsem-generic-4 and rwsem-generic-6 that would
fix/prevent a deadlock?
Let me know if there are any useful tests I could perform. Would it be
useful for me to run the rwsem benchmarks you've been using? Could
these detect a deadlock situation?
Cheers,
-- Bob
Bob McElrath (rsmcelrath@students.wisc.edu)
Univ. of Wisconsin at Madison, Department of Physics
* Re: generic rwsem [Re: Alpha "process table hang"]
2001-04-23 23:27 ` Bob McElrath
@ 2001-04-23 23:40 ` Andrea Arcangeli
0 siblings, 0 replies; 16+ messages in thread
From: Andrea Arcangeli @ 2001-04-23 23:40 UTC (permalink / raw)
To: Bob McElrath; +Cc: linux-kernel
On Mon, Apr 23, 2001 at 06:27:23PM -0500, Bob McElrath wrote:
> Well, take that back, I just got it to hang. Again, this is 2.4.4pre3
> with alpha-numa-3 and rwsem-generic-4. I saw it upon starting mozilla.
> I also saw some scary filesystem errors that may or may not be related:
> Apr 23 18:09:40 draal kernel: EXT2-fs error (device sd(8,2)):
> ext2_new_block: Free blocks count corrupted for block group 252
That is probably unrelated to the ps hang. I suspect you have been bitten by the
ext2 metadata corruption (2.4.4pre2 has just been fixed, but the previous
kernels weren't).
> rwsem-generic-6 is the latest from Andrea, I'll build a new 2.4.4pre4
> kernel with these patches and let you know the results. Have you made
Yes, that's safe.
> changes between rwsem-generic-4 and rwsem-generic-6 that would
> fix/prevent a deadlock?
No, but I think they are two separate issues.
> Let me know if there are any useful tests I could perform. Would it be
> useful for me to run the rwsem benchmarks you've been using? Could
> these detect a deadlock situation?
Yes. To be sure, you can run it without my patch and see if it hangs (I never
tried that myself, but I was able to reproduce the ps hang quite easily, it
was quite obviously due to the rwsemaphores, and it went away completely after
I used the generic semaphores).
Andrea