* linux 2.4.20 oops
@ 2003-04-29 9:18 Brasseur Valéry
2003-04-29 9:44 ` Danny Smith
0 siblings, 1 reply; 11+ messages in thread
From: Brasseur Valéry @ 2003-04-29 9:18 UTC (permalink / raw)
To: nfs
I have got this oops in my kernel (2.4.20 + NFS-ALL).
Note: the oopses are not logged in syslog (I don't know why!),
but the oops seems NFS-related. Any ideas?
Warning (compare_maps): mismatch on symbol ixp_mac_cache_timer_active ,
rxp.2.4.20-xfs.mp says f8b3161c, /usr/local/resonate/etc/rxp.2.4.20-xfs.mp
says f8b31404. Ignoring /usr/local/resonate/etc/rxp.2.4.20-xfs.mp entry
Reading Oops report from the terminal
Call Trace: [<c02b8dd4>] [<c02ba007>] [<c02b9f90>] [<c012268f>] [<c011e7cf>]
[<c010a30b>]
[<c0106d7c>] [<c0106dcb>] [<c011a019>] [<c011a29b>] [<c011a1a0>]
Code: 8b 40 2c 47 89 7c 24 10 b9 08 00 00 00 83 f8 09 0f 4c c8 b8
Using defaults from ksymoops -t elf32-i386 -a i386
Trace; c02b8dd4 <rpc_restart_call+1fec/28ec>
Trace; c02ba007 <xprt_destroy+3b3/80c>
Trace; c02b9f90 <xprt_destroy+33c/80c>
Trace; c012268f <del_timer_sync+75f/9c0>
Trace; c011e7cf <do_softirq+6f/cc>
Trace; c010a30b <enable_irq+17f/190>
Trace; c0106d7c <enable_hlt+34/150>
Trace; c0106dcb <enable_hlt+83/150>
Trace; c011a019 <__out_of_line_bug+579/5e8>
Trace; c011a29b <acquire_console_sem+d3/100>
Trace; c011a1a0 <printk+118/140>
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 8b 40 2c mov 0x2c(%eax),%eax
Code; 00000003 Before first symbol
3: 47 inc %edi
Code; 00000004 Before first symbol
4: 89 7c 24 10 mov %edi,0x10(%esp,1)
Code; 00000008 Before first symbol
8: b9 08 00 00 00 mov $0x8,%ecx
Code; 0000000d Before first symbol
d: 83 f8 09 cmp $0x9,%eax
Code; 00000010 Before first symbol
10: 0f 4c c8 cmovl %eax,%ecx
Code; 00000013 Before first symbol
13: b8 00 00 00 00 mov $0x0,%eax
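For readers decoding the Code: bytes, here is a rough C rendering of the first instructions. ksymoops places _EIP at offset 0, so the faulting instruction appears to be the load `mov 0x2c(%eax),%eax`, which would fault if %eax held a bad pointer. The `cmp`/`cmovl` pair then caps the loaded value at 8. This is only a sketch: `struct model_obj`, `field_at_0x2c` and `bounded_field` are hypothetical names, and only the 0x2c offset and the constants 8 and 9 come from the disassembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for whatever %eax pointed to. Only the 0x2c
 * offset is taken from the disassembly; the field name is invented. */
struct model_obj {
    char pad[0x2c];
    int  field_at_0x2c;
};

/* Mirrors: mov 0x2c(%eax),%eax ; mov $0x8,%ecx ; cmp $0x9,%eax ; cmovl %eax,%ecx */
static int bounded_field(const struct model_obj *obj)
{
    int value = obj->field_at_0x2c; /* the load that faults on a bad pointer */
    int bound = 8;                  /* mov $0x8,%ecx */

    if (value < 9)                  /* cmp $0x9,%eax ; cmovl is a signed "less than" */
        bound = value;
    return bound;
}

/* Small self-check of the sketch above. */
static void check_bounded_field(void)
{
    struct model_obj obj = { .field_at_0x2c = 3 };

    assert(offsetof(struct model_obj, field_at_0x2c) == 0x2c);
    assert(bounded_field(&obj) == 3);   /* values below 9 pass through */
    obj.field_at_0x2c = 20;
    assert(bounded_field(&obj) == 8);   /* larger values are capped at 8 */
}
```

The sketch says nothing about which kernel structure was involved; the trace symbols above are the only hint the report gives.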
thanks in advance
valery
-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: linux 2.4.20 oops
2003-04-29 9:18 linux 2.4.20 oops Brasseur Valéry
@ 2003-04-29 9:44 ` Danny Smith
2003-04-29 16:31 ` Philippe Troin
0 siblings, 1 reply; 11+ messages in thread
From: Danny Smith @ 2003-04-29 9:44 UTC (permalink / raw)
To: Brasseur Valéry; +Cc: nfs
Brasseur Valéry wrote:
>I have got this oops in my kernel (2.4.20 + NFS-ALL).
>Note: the oopses are not logged in syslog (I don't know why!),
>but the oops seems NFS-related. Any ideas?
>
We had almost identical oopses with the same setup on our dual render boxes,
which would often panic almost immediately afterwards.
We used this patch from Ulrich Weigand (weigand@informatik.uni-erlangen.de), and
haven't seen a problem since. See the archives for full details - basically seems
to be an SMP race in rpc_delete_timer(). If you're not on an SMP system,
it's probably NOT the right fix.
HTH,
Danny
Index: net/sunrpc/sched.c
===================================================================
RCS file: /home/cvs/linux-2.3/net/sunrpc/sched.c,v
retrieving revision 1.13
diff -u -p -r1.13 sched.c
--- net/sunrpc/sched.c 3 May 2001 16:18:18 -0000 1.13
+++ net/sunrpc/sched.c 8 Mar 2003 22:46:11 -0000
@@ -168,10 +168,8 @@ void rpc_add_timer(struct rpc_task *task
 static inline void
 rpc_delete_timer(struct rpc_task *task)
 {
-	if (timer_pending(&task->tk_timer)) {
+	if (del_timer_sync(&task->tk_timer))
 		dprintk("RPC: %4d deleting timer\n", task->tk_pid);
-		del_timer_sync(&task->tk_timer);
-	}
 }
 
 /*
--
Danny Smith
Senior Systems Administrator, Cinesite (Europe) Ltd
020 7973 4000 - x4055 / dannys@cinesite.co.uk
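
A note on why the pre-patch check can race on SMP, as far as the 2.4 timer code goes: timer_pending() only reports whether the timer is still queued. Once the timer has been dequeued so its handler can run, timer_pending() returns false even though the handler may still be executing on another CPU, so the old code skips del_timer_sync() exactly when waiting matters. The following is a toy user-space model of that reasoning, not kernel code. Every name in it (`model_timer`, `old_delete_waits`, `new_delete_waits`, `unsafe_to_free`) is hypothetical, and it assumes the caller goes on to release the task after deleting the timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one timer as seen from another CPU (not kernel code). */
struct model_timer {
    bool pending;          /* still queued, handler not yet started */
    bool handler_running;  /* handler currently executing on another CPU */
};

/* Pre-patch logic: synchronise only if the timer is still pending. */
static bool old_delete_waits(const struct model_timer *t)
{
    return t->pending;
}

/* Patched logic: always call the synchronising delete. */
static bool new_delete_waits(const struct model_timer *t)
{
    (void)t;
    return true;
}

/* Releasing the task is unsafe if a handler is still running and the
 * delete path did not wait for it to finish. */
static bool unsafe_to_free(const struct model_timer *t, bool waited)
{
    return t->handler_running && !waited;
}

/* The interesting state: dequeued (not pending) but handler still running. */
static void check_race_window(void)
{
    struct model_timer firing = { .pending = false, .handler_running = true };

    assert(unsafe_to_free(&firing, old_delete_waits(&firing)));   /* old code: window open */
    assert(!unsafe_to_free(&firing, new_delete_waits(&firing)));  /* patched code: closed */
}
```

On a uniprocessor kernel the "handler running on another CPU" state cannot occur, which fits the remark above that this is probably not the right fix for non-SMP systems.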
* RE: linux 2.4.20 oops
@ 2003-04-29 9:46 Brasseur Valéry
0 siblings, 0 replies; 11+ messages in thread
From: Brasseur Valéry @ 2003-04-29 9:46 UTC (permalink / raw)
To: 'Danny Smith'; +Cc: nfs
Thanks for the information! We are on SMP boxes, so I will try it ASAP!
> -----Original Message-----
> From: Danny Smith [mailto:dannys@cinesite.co.uk]
> Sent: Tuesday, April 29, 2003 11:45 AM
> To: Brasseur Valéry
> Cc: nfs@lists.sourceforge.net
> Subject: Re: [NFS] linux 2.4.20 oops
>
> Brasseur Valéry wrote:
>
> >I have got this oops in my kernel (2.4.20 + NFS-ALL).
> >Note: the oopses are not logged in syslog (I don't know why!),
> >but the oops seems NFS-related. Any ideas?
> >
> We had almost identical oopses with the same setup on our dual render boxes,
> which would often panic almost immediately afterwards.
>
> We used this patch from Ulrich Weigand (weigand@informatik.uni-erlangen.de), and
> haven't seen a problem since. See the archives for full details - basically seems
> to be an SMP race in rpc_delete_timer(). If you're not on an SMP system,
> it's probably NOT the right fix.
>
> HTH,
>
> Danny
>
> Index: net/sunrpc/sched.c
> ===================================================================
> RCS file: /home/cvs/linux-2.3/net/sunrpc/sched.c,v
> retrieving revision 1.13
> diff -u -p -r1.13 sched.c
> --- net/sunrpc/sched.c 3 May 2001 16:18:18 -0000 1.13
> +++ net/sunrpc/sched.c 8 Mar 2003 22:46:11 -0000
> @@ -168,10 +168,8 @@ void rpc_add_timer(struct rpc_task *task
>  static inline void
>  rpc_delete_timer(struct rpc_task *task)
>  {
> -	if (timer_pending(&task->tk_timer)) {
> +	if (del_timer_sync(&task->tk_timer))
>  		dprintk("RPC: %4d deleting timer\n", task->tk_pid);
> -		del_timer_sync(&task->tk_timer);
> -	}
>  }
>
>  /*
>
> --
> Danny Smith
> Senior Systems Administrator, Cinesite (Europe) Ltd
> 020 7973 4000 - x4055 / dannys@cinesite.co.uk
* Re: linux 2.4.20 oops
2003-04-29 9:44 ` Danny Smith
@ 2003-04-29 16:31 ` Philippe Troin
2003-04-29 16:44 ` Danny Smith
2003-04-29 16:53 ` Trond Myklebust
0 siblings, 2 replies; 11+ messages in thread
From: Philippe Troin @ 2003-04-29 16:31 UTC (permalink / raw)
To: Danny Smith; +Cc: Brasseur Valéry, nfs
Danny Smith <dannys@cinesite.co.uk> writes:
> Brasseur Valéry wrote:
>
> >I have got this oops in my kernel (2.4.20 + NFS-ALL).
> >Note: the oopses are not logged in syslog (I don't know why!),
> >but the oops seems NFS-related. Any ideas?
>
> We had almost identical oopses with the same setup on our dual
> render boxes, which would often panic almost immediately afterwards.
>
> We used this patch from Ulrich Weigand
> (weigand@informatik.uni-erlangen.de), and haven't seen a problem
> since. See the archives for full details - basically seems to be an
> SMP race in rpc_delete_timer(). If you're not on an SMP system, it's
> probably NOT the right fix.
Has this been pushed to 2.4.21?
Phil.
* Re: linux 2.4.20 oops
2003-04-29 16:31 ` Philippe Troin
@ 2003-04-29 16:44 ` Danny Smith
2003-04-29 16:53 ` Trond Myklebust
1 sibling, 0 replies; 11+ messages in thread
From: Danny Smith @ 2003-04-29 16:44 UTC (permalink / raw)
To: Philippe Troin; +Cc: Brasseur Valéry, nfs
Philippe Troin wrote:
>Danny Smith <dannys@cinesite.co.uk> writes:
>
>
>>We used this patch from Ulrich Weigand
>>(weigand@informatik.uni-erlangen.de), and haven't seen a problem
>>since. See the archives for full details - basically seems to be an
>>SMP race in rpc_delete_timer(). If you're not on an SMP system, it's
>>probably NOT the right fix.
>>
>>
>
>Has this been pushed to 2.4.21?
>
>
Not AFAIK.
From the comments in the original post I saw, it looks like Trond did
produce a similar patch, but I couldn't see anything when I was looking
for a resolution to our problems. Perhaps it's worth waiting for more
confirmation that this is
a) harmless in all cases, and
b) fixes the problems being seen
before trying to get this pushed up. I freely confess that I don't know
enough of this code at this level to feel confident beyond "it works
well for us".
Danny
--
Danny Smith
Senior Systems Administrator, Cinesite (Europe) Ltd
020 7973 4000 - x4055 / dannys@cinesite.co.uk
* Re: linux 2.4.20 oops
2003-04-29 16:31 ` Philippe Troin
2003-04-29 16:44 ` Danny Smith
@ 2003-04-29 16:53 ` Trond Myklebust
1 sibling, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2003-04-29 16:53 UTC (permalink / raw)
To: Danny Smith; +Cc: Brasseur Valéry, nfs
>>>>> " " == Philippe Troin <phil@fifi.org> writes:
> Has this been pushed to 2.4.21?
Yes.
Cheers,
Trond
* RE: linux 2.4.20 oops
@ 2003-04-29 16:56 Lever, Charles
2003-04-29 17:04 ` Trond Myklebust
0 siblings, 1 reply; 11+ messages in thread
From: Lever, Charles @ 2003-04-29 16:56 UTC (permalink / raw)
To: Trond Myklebust (E-mail); +Cc: nfs
the fix appears in 2.4.21-pre7, but was removed from
2.4.21-rc1. trond?
> -----Original Message-----
> From: Danny Smith [mailto:dannys@cinesite.co.uk]
> Sent: Tuesday, April 29, 2003 12:44 PM
> To: Philippe Troin
> Cc: Brasseur Valéry; nfs@lists.sourceforge.net
> Subject: Re: [NFS] linux 2.4.20 oops
>
> Philippe Troin wrote:
>
> >Danny Smith <dannys@cinesite.co.uk> writes:
> >
> >>We used this patch from Ulrich Weigand
> >>(weigand@informatik.uni-erlangen.de), and haven't seen a problem
> >>since. See the archives for full details - basically seems to be an
> >>SMP race in rpc_delete_timer(). If you're not on an SMP system, it's
> >>probably NOT the right fix.
> >>
> >
> >Has this been pushed to 2.4.21?
> >
> Not AFAIK.
> From the comments in the original post I saw, it looks like Trond did
> produce a similar patch, but I couldn't see anything when I was looking
> for a resolution to our problems. Perhaps it's worth waiting for more
> confirmation that this is
> a) harmless in all cases, and
> b) fixes the problems being seen
> before trying to get this pushed up. I freely confess that I don't know
> enough of this code at this level to feel confident beyond "it works
> well for us".
>
> Danny
>
> --
> Danny Smith
> Senior Systems Administrator, Cinesite (Europe) Ltd
> 020 7973 4000 - x4055 / dannys@cinesite.co.uk
* RE: linux 2.4.20 oops
2003-04-29 16:56 Lever, Charles
@ 2003-04-29 17:04 ` Trond Myklebust
0 siblings, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2003-04-29 17:04 UTC (permalink / raw)
To: Lever, Charles; +Cc: nfs
>>>>> " " == Charles Lever <Lever> writes:
> the fix appears in 2.4.21-pre7, but was removed from
> 2.4.21-rc1. trond?
Huh? From the latest BK pull linux/net/sunrpc/sched.c
/*
 * Delete any timer for the current task. Because we use del_timer_sync(),
 * this function should never be called while holding rpc_queue_lock.
 */
static inline void
rpc_delete_timer(struct rpc_task *task)
{
	dprintk("RPC: %4d deleting timer\n", task->tk_pid);
	del_timer_sync(&task->tk_timer);
}
So AFAICS it is still there. I certainly have no quarrel with that
patch. I hit upon the same problem + fix in a different bug-report at
~ the same time.
Cheers,
Trond
* RE: linux 2.4.20 oops
[not found] <6440EA1A6AA1D5118C6900902745938E07D5559A@black.eng.netapp.com>
@ 2003-04-29 17:21 ` Trond Myklebust
0 siblings, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2003-04-29 17:21 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Lever, Charles, NFS maillist
>>>>> " " == Charles Lever <Lever> writes:
> i built 2.4.21-rc1 from 2.4.20.tar.bz2 and the rc1 upgrade
> patch, and it's not in my version of rc1.
Hmmm... strange. AFAICS there are no NFS or RPC changes whatsoever in
the rc1 patch on ftp.kernel.org.
As I said, all the changes (including Ulrich Weigand's patch) appear
to still be in the bitkeeper repository (see for instance the kernel
source browser on http://linux.bkbits.net:8080/linux-2.4).
Marcelo, is this perhaps a problem with an incorrect generation of the
'official' 2.4.21-rc1 patch?
Cheers,
Trond
* RE: linux 2.4.20 oops
@ 2003-04-29 17:24 Lever, Charles
0 siblings, 0 replies; 11+ messages in thread
From: Lever, Charles @ 2003-04-29 17:24 UTC (permalink / raw)
To: trond.myklebust, Marcelo Tosatti; +Cc: NFS maillist
i will verify that i have created 2.4.21-rc1 correctly.
> -----Original Message-----
> From: Trond Myklebust [mailto:trond.myklebust@fys.uio.no]
> Sent: Tuesday, April 29, 2003 1:22 PM
> To: Marcelo Tosatti
> Cc: Lever, Charles; NFS maillist
> Subject: RE: [NFS] linux 2.4.20 oops
>
> >>>>> " " == Charles Lever <Lever> writes:
>
> > i built 2.4.21-rc1 from 2.4.20.tar.bz2 and the rc1 upgrade
> > patch, and it's not in my version of rc1.
>
> Hmmm... strange. AFAICS there are no NFS or RPC changes whatsoever in
> the rc1 patch on ftp.kernel.org.
> As I said, all the changes (including Ulrich Weigand's patch) appear
> to still be in the bitkeeper repository (see for instance the kernel
> source browser on http://linux.bkbits.net:8080/linux-2.4).
>
> Marcelo, is this perhaps a problem with an incorrect generation of the
> 'official' 2.4.21-rc1 patch?
>
> Cheers,
> Trond
* RE: linux 2.4.20 oops
@ 2003-04-29 17:31 Lever, Charles
0 siblings, 0 replies; 11+ messages in thread
From: Lever, Charles @ 2003-04-29 17:31 UTC (permalink / raw)
To: trond.myklebust; +Cc: Marcelo Tosatti, NFS maillist
ok, it looks like i was stupid and didn't apply the rc1 patch
to my rc1 tree. my copy of the rc1 pre-patch has the right
stuff in it.
sorry to be a bug.
> -----Original Message-----
> From: Trond Myklebust [mailto:trond.myklebust@fys.uio.no]
> Sent: Tuesday, April 29, 2003 1:22 PM
> To: Marcelo Tosatti
> Cc: Lever, Charles; NFS maillist
> Subject: RE: [NFS] linux 2.4.20 oops
>
> >>>>> " " == Charles Lever <Lever> writes:
>
> > i built 2.4.21-rc1 from 2.4.20.tar.bz2 and the rc1 upgrade
> > patch, and it's not in my version of rc1.
>
> Hmmm... strange. AFAICS there are no NFS or RPC changes whatsoever in
> the rc1 patch on ftp.kernel.org.
> As I said, all the changes (including Ulrich Weigand's patch) appear
> to still be in the bitkeeper repository (see for instance the kernel
> source browser on http://linux.bkbits.net:8080/linux-2.4).
>
> Marcelo, is this perhaps a problem with an incorrect generation of the
> 'official' 2.4.21-rc1 patch?
>
> Cheers,
> Trond