From: Benny Halevy <bhalevy@panasas.com>
To: "Labiaga, Ricardo" <Ricardo.Labiaga@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
pNFS Mailing List <pnfs@linux-nfs.org>,
NFS list <linux-nfs@vger.kernel.org>
Subject: Re: nfs41: sunrpc: handle clnt==NULL in call_status
Date: Wed, 05 Nov 2008 15:12:40 +0200 [thread overview]
Message-ID: <49119BC8.7020701@panasas.com> (raw)
In-Reply-To: <273FE88A07F5D445824060902F70034402974728-hX7t0kiaRRpT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
On Nov. 04, 2008, 23:24 +0200, "Labiaga, Ricardo" <Ricardo.Labiaga@netapp.com> wrote:
>> -----Original Message-----
>> From: Trond Myklebust [mailto:trond.myklebust@fys.uio.no]
>> Sent: Tuesday, November 04, 2008 10:19 AM
>> To: Benny Halevy
>> Cc: pNFS Mailing List; NFS list
>> Subject: Re: nfs41: sunrpc: handle clnt==NULL in call_status
>>
>> On Tue, 2008-11-04 at 17:52 +0200, Benny Halevy wrote:
>>> On the following error path task->tk_client == NULL causing NULL
> deref:
>>> Nov 4 14:14:32 tl1 kernel: Callback slot table overflowed
>>> Nov 4 14:14:32 tl1 kernel: BUG: unable to handle kernel NULL
> pointer
>> dereference at 0000000000000078
>>> Nov 4 14:14:32 tl1 kernel: IP: [<ffffffffa02b4dd2>]
>> call_status+0xe1/0x145 [sunrpc]
>>> Nov 4 14:14:32 tl1 kernel: PGD 2f973067 PUD 2f974067 PMD 0
>>> Nov 4 14:14:32 tl1 kernel: Oops: 0000 [1] SMP
>>> Nov 4 14:14:32 tl1 kernel: CPU 0
>>> Nov 4 14:14:32 tl1 kernel: Modules linked in: panfs(P)
> panlayoutdriver
>> nfslayoutdriver nfsd auth_rpcgss exportfs nfs lockd nfs_acl fuse
> sunrpc
>> ipv6 cpufreq_ondemand powernow_k8 freq_table dm_mirror dm_log
> dm_multipath
>> dm_mod snd_hda_intel snd_seq_dummy sr_mod snd_seq_oss cdrom
>> snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
>> snd_pcm snd_timer snd_page_alloc snd_hwdep snd pcspkr serio_raw k8temp
>> hwmon forcedeth soundcore pata_amd sg jedec_probe cfi_probe gen_probe
> mtd
>> i2c_nforce2 chipreg i2c_core button map_funcs sata_nv ata_generic
>> pata_acpi libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd
>> ehci_hcd [last unloaded: scsi_wait_scan]
>>> Nov 4 14:14:32 tl1 kernel: Pid: 3405, comm: nfsv4.1-svc Tainted: P
>> 2.6.27-pnfs #1
>>> Nov 4 14:14:32 tl1 kernel: RIP: 0010:[<ffffffffa02b4dd2>]
>> [<ffffffffa02b4dd2>] call_status+0xe1/0x145 [sunrpc]
>>> Nov 4 14:14:32 tl1 kernel: RSP: 0018:ffff88002a70bd60 EFLAGS:
> 00010246
>>> Nov 4 14:14:32 tl1 kernel: RAX: 0000000000000000 RBX:
> 00000000ffffff95
>> RCX: 0000000000000000
>>> Nov 4 14:14:32 tl1 kernel: RDX: ffff88002a70bd10 RSI:
> ffff88002fc52dc0
>> RDI: ffff88002fc52dc0
>>> Nov 4 14:14:32 tl1 kernel: RBP: ffff88002a70bd80 R08:
> ffff88002f9dae40
>> R09: ffffffff81489e60
>>> Nov 4 14:14:32 tl1 kernel: R10: ffff880038497800 R11:
> ffff88002a70bd30
>> R12: ffff88002fc52dc0
>>> Nov 4 14:14:32 tl1 kernel: R13: 0000000000000000 R14:
> ffff88002f57fa00
>> R15: ffff88002fc52e50
>>> Nov 4 14:14:32 tl1 kernel: FS: 00007fe78ffac6f0(0000)
>> GS:ffffffff8148da80(0000) knlGS:0000000000000000
>>> Nov 4 14:14:32 tl1 kernel: CS: 0010 DS: 0018 ES: 0018 CR0:
>> 000000008005003b
>>> Nov 4 14:14:32 tl1 kernel: CR2: 0000000000000078 CR3:
> 000000002f971000
>> CR4: 00000000000006e0
>>> Nov 4 14:14:32 tl1 kernel: DR0: 0000000000000000 DR1:
> 0000000000000000
>> DR2: 0000000000000000
>>> Nov 4 14:14:32 tl1 kernel: DR3: 0000000000000000 DR6:
> 00000000ffff0ff0
>> DR7: 0000000000000400
>>> Nov 4 14:14:32 tl1 kernel: Process nfsv4.1-svc (pid: 3405,
> threadinfo
>> ffff88002a70a000, task ffff88003694d900)
>>> Nov 4 14:14:32 tl1 kernel: Stack: 0000000000000000
> ffff88002f57fa00
>> ffff88002fc52dc0 ffff880030450ce8
>>> Nov 4 14:14:32 tl1 kernel: ffff88002a70bdc0 ffffffffa02bb9db
>> ffff88002a70bdf0 ffff88002fc52dc0
>>> Nov 4 14:14:32 tl1 kernel: ffff88002f57fa00 ffff88002f57fa00
>> ffff880030450ce8 ffff88002a734000
>>> Nov 4 14:14:32 tl1 kernel: Call Trace:
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffffa02bb9db>]
>> __rpc_execute+0x7e/0x21f [sunrpc]
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffffa02bbba4>]
> rpc_execute+0x28/0x2c
>> [sunrpc]
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffffa02b5596>]
>> rpc_run_bc_task+0x88/0xae [sunrpc]
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffffa02c923d>] bc_send+0x35/0x5a
>> [sunrpc]
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffffa02bdc92>]
>> bc_svc_process+0xf6/0x101 [sunrpc]
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffff81048a6c>] ?
>> prepare_to_wait+0x5c/0x65
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffffa033a4c9>]
>> nfs41_callback_svc+0xda/0x131 [nfs]
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffff810488a0>] ?
>> autoremove_wake_function+0x0/0x38
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffffa033a3ef>] ?
>> nfs41_callback_svc+0x0/0x131 [nfs]
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffff81048561>] kthread+0x49/0x76
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffff8100cf49>] child_rip+0xa/0x11
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffff8102a401>] ?
>> pick_next_task_fair+0x93/0xa3
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffff81048518>] ? kthread+0x0/0x76
>>> Nov 4 14:14:32 tl1 kernel: [<ffffffff8100cf3f>] ?
> child_rip+0x0/0x11
>>> Nov 4 14:14:32 tl1 kernel:
>>> Nov 4 14:14:32 tl1 kernel:
>>> Nov 4 14:14:32 tl1 kernel: Code: e8 7c 70 00 00 49 8b 44 24 20 49
> c7 44
>> 24 68 dc 48 2b a0 f6 40 78 02 74 6d 41 8b b6 34 01 00 00 48 8b 78 30
> e8 b9
>> 2a 00 00 eb 5b <41> f6 45 78 04 74 13 49 8b 55 30 48 8d 82 80 04 00 00
> f0
>> 80 a2
>>> Nov 4 14:14:32 tl1 kernel: RIP [<ffffffffa02b4dd2>]
>> call_status+0xe1/0x145 [sunrpc]
>>> Nov 4 14:14:32 tl1 kernel: RSP <ffff88002a70bd60>
>>> Nov 4 14:14:32 tl1 kernel: CR2: 0000000000000078
>>> Nov 4 14:14:32 tl1 kernel: ---[ end trace 1584c489f2573aaf ]---
>>>
>>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
>>> ---
>>>
>>> Trond, I'm not sure if this can happen without nfs41.
>>> However, please consider this patch for upstream since
>>> it is safe to do in any case.
>>>
>>> Benny
>>>
>>> net/sunrpc/clnt.c | 8 +++++---
>>> 1 files changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>> index 78fc483..b555d9f 100644
>>> --- a/net/sunrpc/clnt.c
>>> +++ b/net/sunrpc/clnt.c
>>> @@ -1206,7 +1206,8 @@ call_status(struct rpc_task *task)
>>> break;
>>> case -ECONNREFUSED:
>>> case -ENOTCONN:
>>> - rpc_force_rebind(clnt);
>>> + if (clnt)
>>> + rpc_force_rebind(clnt);
>>> task->tk_action = call_bind;
>>> break;
>>> case -EAGAIN:
>>> @@ -1217,9 +1218,10 @@ call_status(struct rpc_task *task)
>>> rpc_exit(task, status);
>>> break;
>>> default:
>>> - if (clnt->cl_chatty)
>>> + if (!clnt || clnt->cl_chatty)
>>> printk("%s: RPC call returned error %d\n",
>>> - clnt->cl_protname, -status);
>>> + clnt ? clnt->cl_protname : "<unknown
> protocol>",
>>> + -status);
>>> rpc_exit(task, status);
>>> }
>>> }
>> BIG NACK!
>>
>> How does even it make sense for a task to get past call_transmit and
>> call_status without having task->tk_client set? This sounds like
> serious
>> borkenness in the nfsv4.1 patches...
Ricardo,
rpc_run_bc_task sets no task_setup_data.rpc_client when calling rpc_new_task.
We might be able to use to fore channel rpc client, but
I'm still concerned that using this path for sending the callback
replies is wrong.
>
> The "Callback slot table overflowed" message means we couldn't obtain a
> pre-allocated rpc_rqst to process the callback. When this occurs, the
> client sets the XPRT_CLOSE_WAIT bit to close the connection. (As an
> aside, should we be using xprt_force_disconnect() instead?).
could be, but we might be doing the right thing on the client side
in this case already.
>
> This leads me to believe the server exceeded the number of outstanding
> requests allowed on the backchannel (slot count of one at the moment),
> the new request caused us to close the connection and pulled the rug
> from under the callback service. I'll investigate further.
We simply got ENOTCONN back (see value of RBX)
>
> Is this reproducible?
Not yet.
Benny
>
> - ricardo
>
>
>> Trond
>> ...
>>
next prev parent reply other threads:[~2008-11-05 13:12 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-04 15:52 nfs41: sunrpc: handle clnt==NULL in call_status Benny Halevy
2008-11-04 16:09 ` [pnfs] " Peter Staubach
2008-11-05 13:20 ` Benny Halevy
2008-11-04 18:19 ` Trond Myklebust
[not found] ` <1225822763.30407.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-11-04 21:24 ` Labiaga, Ricardo
[not found] ` <273FE88A07F5D445824060902F70034402974728-hX7t0kiaRRpT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-11-05 0:43 ` [pnfs] " Tom Tucker
2008-11-05 2:40 ` Labiaga, Ricardo
[not found] ` <273FE88A07F5D445824060902F700344029749C8-hX7t0kiaRRpT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-11-05 16:53 ` Chuck Lever
2008-11-05 18:05 ` Tom Tucker
[not found] ` <3682CC68-8C61-472B-8D75-B19F31576428@oracle.com>
2008-11-05 18:52 ` Talpey, Thomas
2008-11-05 18:08 ` Trond Myklebust
2008-11-05 17:59 ` Tom Tucker
2008-11-05 13:12 ` Benny Halevy [this message]
2008-11-06 3:05 ` Labiaga, Ricardo
[not found] ` <273FE88A07F5D445824060902F70034402975078-hX7t0kiaRRpT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-11-06 7:19 ` [pnfs] " Ricardo Labiaga
2008-11-06 7:48 ` Benny Halevy
2008-11-07 2:41 ` Labiaga, Ricardo
2008-11-07 11:23 ` Halevy, Benny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49119BC8.7020701@panasas.com \
--to=bhalevy@panasas.com \
--cc=Ricardo.Labiaga@netapp.com \
--cc=linux-nfs@vger.kernel.org \
--cc=pnfs@linux-nfs.org \
--cc=trond.myklebust@fys.uio.no \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox