Hello all,

I think there is a race in the RPC code in 2.4.20, related to timeout
and congestion window handling.

The problem code is in net/sunrpc/xprt.c:

static void
xprt_timer(struct rpc_task *task)
{       
        struct rpc_rqst *req = task->tk_rqstp;
        struct rpc_xprt *xprt = req->rq_xprt;

        spin_lock(&xprt->sock_lock);
        if (req->rq_received)
                goto out;

        if (!xprt->nocong) {
                if (xprt_expbackoff(task, req)) {
                        rpc_add_timer(task, xprt_timer);
                        goto out_unlock;
                }
                rpc_inc_timeo(&task->tk_client->cl_rtt);
                xprt_adjust_cwnd(req->rq_xprt, -ETIMEDOUT);
        }
        req->rq_nresend++;

The call to xprt_adjust_cwnd is the problem - I experienced a panic
(null pointer dereference) in 

static void
xprt_adjust_cwnd(struct rpc_xprt *xprt, int result)
{
        unsigned long   cwnd;

        cwnd = xprt->cwnd;
        if (result >= 0 && cwnd <= xprt->cong) {

Here it is the "cwnd = xprt->cwnd" that causes the panic. xprt was 0.

This means, in the xprt_timer code, that req->rq_xprt must have been 0.

I guess this can happen because of the sequence:

 xprt = req->rq_xprt;
 spin_lock(&xprt->sock_lock);
...
 xprt_adjust_cwnd(req->rq_xprt);

We don't know whether req has been modified between the assignment and
the spin_lock.

Attached is a patch to solve the problem - please comment.

It does not solve the other potential (?) problem with:

 xprt = req->rq_xprt;
 spin_lock(&xprt->sock_lock);
 ...
                if (xprt_expbackoff(task, req)) {
 ...

Any suggestions to that one?

I cannot test whether my patch solve the problem, because this panic has
happened once on a *heavily* loaded box which has run 2.4.20 ever since
it came out.  The race is extremely rare.  It is an SMP box by the way.

Thanks a lot to "baldrick" on kernel-newbies for the help!

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............: