Hello all,

I think there is a race in the RPC code in 2.4.20, related to timeout
and congestion window handling.

The problem code is in net/sunrpc/xprt.c:

    static void
    xprt_timer(struct rpc_task *task)
    {
        struct rpc_rqst *req = task->tk_rqstp;
        struct rpc_xprt *xprt = req->rq_xprt;

        spin_lock(&xprt->sock_lock);
        if (req->rq_received)
            goto out;

        if (!xprt->nocong) {
            if (xprt_expbackoff(task, req)) {
                rpc_add_timer(task, xprt_timer);
                goto out_unlock;
            }
            rpc_inc_timeo(&task->tk_client->cl_rtt);
            xprt_adjust_cwnd(req->rq_xprt, -ETIMEDOUT);
        }
        req->rq_nresend++;

The call to xprt_adjust_cwnd is the problem. I experienced a panic
(null pointer dereference) in:

    static void
    xprt_adjust_cwnd(struct rpc_xprt *xprt, int result)
    {
        unsigned long cwnd;

        cwnd = xprt->cwnd;
        if (result >= 0 && cwnd <= xprt->cong) {

Here it is the "cwnd = xprt->cwnd" that causes the panic. xprt was 0.

This means that, in the xprt_timer code, req->rq_xprt must have been 0.

I guess this can happen because of the sequence:

    xprt = req->rq_xprt;
    spin_lock(&xprt->sock_lock);
    ...
    xprt_adjust_cwnd(req->rq_xprt);

We don't know whether req has been modified between the assignment and
the spin_lock.

Attached is a patch to solve the problem - please comment.

It does not solve the other potential (?) problem with:

    xprt = req->rq_xprt;
    spin_lock(&xprt->sock_lock);
    ...
    if (xprt_expbackoff(task, req)) {
    ...

Any suggestions for that one?

I cannot test whether my patch solves the problem, because this panic
has happened only once, on a *heavily* loaded box which has run 2.4.20
ever since it came out. The race is extremely rare. It is an SMP box,
by the way.

Thanks a lot to "baldrick" on kernel-newbies for the help!

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:
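
For readers following the sequence described above, here is a minimal
userspace sketch of one possible shape of a fix: read req->rq_xprt once
and pass the local copy on, instead of re-reading the field. This is not
the attached patch and not kernel code. All model_* names are
hypothetical stand-ins, the window-halving rule is a simplification, and
the "other CPU" is simulated by a plain function call. It also assumes
the transport itself stays allocated; if it could be freed, the cached
pointer would dangle and this pattern alone would not be enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rpc_xprt and struct rpc_rqst. */
struct model_xprt {
    unsigned long cwnd;
};

struct model_rqst {
    struct model_xprt *rq_xprt;
};

/* Simplified timeout adjustment: halve the window, never below 1. */
static void model_adjust_cwnd(struct model_xprt *xprt)
{
    assert(xprt != NULL);
    xprt->cwnd >>= 1;
    if (xprt->cwnd < 1)
        xprt->cwnd = 1;
}

/* Simulates another CPU clearing the request's back pointer. */
static void model_concurrent_release(struct model_rqst *req)
{
    req->rq_xprt = NULL;
}

/*
 * Timer path that snapshots req->rq_xprt once and only uses the
 * snapshot afterwards. 'interfere' triggers the simulated concurrent
 * modification between the snapshot and the adjustment.
 * Returns 0 on success, -1 if the request had no transport.
 */
static int model_timer(struct model_rqst *req, int interfere)
{
    struct model_xprt *xprt = req->rq_xprt;

    if (xprt == NULL)
        return -1;

    if (interfere)
        model_concurrent_release(req);

    /* Use the local copy, not req->rq_xprt, which may now be NULL. */
    model_adjust_cwnd(xprt);
    return 0;
}
```

Calling model_timer(&req, 1) on a request whose transport has cwnd 8
leaves cwnd at 4 even though req.rq_xprt has been cleared in between.
Passing req->rq_xprt to model_adjust_cwnd at that point would instead
hit the NULL pointer, which is the failure mode seen in the panic.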