* Bug in nfs client handling of EJUKEBOX
@ 2005-02-22 19:46 Jan Sanislo
2005-02-24 11:55 ` Olaf Kirch
0 siblings, 1 reply; 4+ messages in thread
From: Jan Sanislo @ 2005-02-22 19:46 UTC (permalink / raw)
To: nfs
There seems to be a bug in the 2.6.11-rc code related to the handling
of nfsd's EJUNKBOX error return. The code for __rpc_execute in
net/sunrpc/sched.c looks like:
=============================
static int __rpc_execute(struct rpc_task *task)
{
int status = 0;
dprintk("RPC: %4d rpc_execute flgs %x\n",
task->tk_pid, task->tk_flags);
BUG_ON(RPC_IS_QUEUED(task));
restarted:
while (1) {
/*
* Garbage collection of pending timers...
*/
rpc_delete_timer(task);
.....
if (task->tk_exit) {
lock_kernel();
>>>> task->tk_exit(task);
unlock_kernel();
/* If tk_action is non-null, the user wants us to restart */
if (task->tk_action) {
if (!RPC_ASSASSINATED(task)) {
/* Release RPC slot and buffer memory */
if (task->tk_rqstp)
xprt_release(task);
rpc_free(task);
goto restarted;
}
printk(KERN_ERR "RPC: dead task tries to walk away.\n");
}
}
=============================
If the tk_exit procedure (e.g., nfs_read_done) called at ">>>>" above
detects EJUKEBOX as the result of the call, it will attempt to do
an rpc_restart and rpc_delay on the task. The rpc_delay will set a timer.
However, when the main loop is re-entered via the "goto restarted",
the first thing that happens is that all timers for the task are removed.
This results in the RPC task hanging in an uninterruptible wait.
Am I missing something here or is this really a problem?
-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug in nfs client handling of EJUKEBOX
2005-02-22 19:46 Bug in nfs client handling of EJUKEBOX Jan Sanislo
@ 2005-02-24 11:55 ` Olaf Kirch
2005-02-24 20:02 ` Trond Myklebust
0 siblings, 1 reply; 4+ messages in thread
From: Olaf Kirch @ 2005-02-24 11:55 UTC (permalink / raw)
To: Jan Sanislo; +Cc: nfs
[-- Attachment #1: Type: text/plain, Size: 816 bytes --]
> If the tk_exit procedure (e.g., nfs_read_done) called at ">>>>" above
> detects EJUKEBOX as the result of the call, it will attempt to do
> an rpc_restart and rpc_delay on the task. The rpc_delay will set a timer.
> However, when the main loop is re-entered via the "goto restarted",
> the first thing that happens is that all timers for the task are removed.
> This results in the RPC task hanging in an uninterruptible wait.
>
> Am I missing something here or is this really a problem?
I think this is a genuine problem. Previously, we would clear the timers
only when the task was running, i.e. inside the if (RPC_RUNNING(task))
branch. Proposed patch attached.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
[-- Attachment #2: sunrpc-restart-delay-fix --]
[-- Type: text/plain, Size: 432 bytes --]
Index: linux-2.6.10/net/sunrpc/sched.c
===================================================================
--- linux-2.6.10.orig/net/sunrpc/sched.c
+++ linux-2.6.10/net/sunrpc/sched.c
@@ -569,7 +569,8 @@ static int __rpc_execute(struct rpc_task
/*
* Garbage collection of pending timers...
*/
- rpc_delete_timer(task);
+ if (RPC_IS_RUNNING(task))
+ rpc_delete_timer(task);
/*
* Execute any pending callback.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug in nfs client handling of EJUKEBOX
2005-02-24 11:55 ` Olaf Kirch
@ 2005-02-24 20:02 ` Trond Myklebust
2005-02-25 9:29 ` Olaf Kirch
0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2005-02-24 20:02 UTC (permalink / raw)
To: Olaf Kirch; +Cc: Jan Sanislo, nfs
[-- Attachment #1: Type: text/plain, Size: 566 bytes --]
to den 24.02.2005 Klokka 12:55 (+0100) skreiv Olaf Kirch:
> I think this is a genuine problem. Previously, we would clear the timers
> only when the task was running, i.e. inside the if (RPC_RUNNING(task))
> branch. Proposed patch attached.
I'm thinking it might make sense to move that code into rpc_delete_timer
itself, and perhaps change the condition to "!RPC_IS_QUEUED(task)", so
that we test if the task is queued on a waitqueue, rather than if it is
still running.
How about the following?
Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
[-- Attachment #2: linux-2.6.11-02-fix_rpc_delete.dif --]
[-- Type: text/plain, Size: 1344 bytes --]
RPC: Don't kill timers when calling rpc_restart_call() after rpc_delay()
Currently, if we restart an RPC call after having set an RPC delay (for
instance in the case where an NFS EJUKEBOX error has occurred) the call
to rpc_delete_timer() at the top of the rpc_execute() loop will
kill off our timer.
This patch causes rpc_delete_timer() to detect if the rpc_task is still
queued on a wait queue, and refuse to delete the timer if this is the case.
Problem diagnosed by Jan Sanislo and Olaf Kirch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
sched.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6.11-rc4/net/sunrpc/sched.c
===================================================================
--- linux-2.6.11-rc4.orig/net/sunrpc/sched.c
+++ linux-2.6.11-rc4/net/sunrpc/sched.c
@@ -132,9 +132,11 @@ __rpc_add_timer(struct rpc_task *task, r
* Delete any timer for the current task. Because we use del_timer_sync(),
* this function should never be called while holding queue->lock.
*/
-static inline void
+static void
rpc_delete_timer(struct rpc_task *task)
{
+ if (RPC_IS_QUEUED(task))
+ return;
if (test_and_clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate)) {
del_singleshot_timer_sync(&task->tk_timer);
dprintk("RPC: %4d deleting timer\n", task->tk_pid);
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug in nfs client handling of EJUKEBOX
2005-02-24 20:02 ` Trond Myklebust
@ 2005-02-25 9:29 ` Olaf Kirch
0 siblings, 0 replies; 4+ messages in thread
From: Olaf Kirch @ 2005-02-25 9:29 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Jan Sanislo, nfs
On Thu, Feb 24, 2005 at 12:02:53PM -0800, Trond Myklebust wrote:
> I'm thinking it might make sense to move that code into rpc_delete_timer
> itself, and perhaps change the condition to "!RPC_IS_QUEUED(task)", so
> that we test if the task is queued on a waitqueue, rather than if it is
> still running.
>
> How about the following?
Fine with me.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2005-02-25 9:30 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-02-22 19:46 Bug in nfs client handling of EJUKEBOX Jan Sanislo
2005-02-24 11:55 ` Olaf Kirch
2005-02-24 20:02 ` Trond Myklebust
2005-02-25 9:29 ` Olaf Kirch
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.