From: "Steve Wise" <swise@opengridcomputing.com>
To: "'J. Bruce Fields'" <bfields@fieldses.org>,
"'Yan Burman'" <yanb@mellanox.com>
Cc: <linux-nfs@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
"'Or Gerlitz'" <ogerlitz@mellanox.com>
Subject: RE: NFS over RDMA crashing
Date: Fri, 7 Mar 2014 14:41:14 -0600
Message-ID: <005d01cf3a45$94ced0d0$be6c7270$@opengridcomputing.com>
In-Reply-To: <003601cf3a26$94523ee0$bcf6bca0$@opengridcomputing.com>
> >
> > Does this help?
> >
> > They must have added this for some reason, but I'm not seeing how it
> > could have ever done anything....
> >
> > --b.
> >
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > index 0ce7552..e8f25ec 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > @@ -520,13 +520,6 @@ next_sge:
> >  	for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages;
> >  	     ch_no++)
> >  		rqstp->rq_pages[ch_no] = NULL;
> >
> > -	/*
> > -	 * Detach res pages. If svc_release sees any it will attempt to
> > -	 * put them.
> > -	 */
> > -	while (rqstp->rq_next_page != rqstp->rq_respages)
> > -		*(--rqstp->rq_next_page) = NULL;
> > -
> >  	return err;
> >  }
> >
>
> I can reproduce this server crash readily on a recent net-next tree. I
> added the above change, and see a different crash:
>
> [ 192.764773] BUG: unable to handle kernel paging request at 0000100000000000
> [ 192.765688] IP: [<ffffffff8113c159>] put_page+0x9/0x50
> [ 192.765688] PGD 0
> [ 192.765688] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> [ 192.765688] Modules linked in: nfsd lockd nfs_acl exportfs
> auth_rpcgss oid_registry svcrdma tg3 ip6table_filter ip6_tables
> ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
> nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter
> ip_tables bridge stp llc autofs4 sunrpc rdma_ucm rdma_cm iw_cm ib_ipoib
> ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb4 iw_cxgb3 cxgb3 mdio
> ib_qib dca mlx4_en ib_mthca vhost_net macvtap macvlan vhost tun
> kvm_intel kvm uinput ipmi_si ipmi_msghandler iTCO_wdt
> iTCO_vendor_support dcdbas sg microcode pcspkr mlx4_ib ib_sa serio_raw
> ib_mad ib_core ib_addr ipv6 ptp pps_core lpc_ich mfd_core i5100_edac
> edac_core mlx4_core cxgb4 ext4 jbd2 mbcache sd_mod crc_t10dif
> crct10dif_common sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm
> drm_kms_helper drm i2c_algo_bit
> [ 192.765688] i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tg3]
> [ 192.765688] CPU: 1 PID: 6590 Comm: nfsd Not tainted 3.14.0-rc3-pending+ #5
> [ 192.765688] Hardware name: Dell Inc. PowerEdge R300/0TY179, BIOS 1.3.0 08/15/2008
> [ 192.765688] task: ffff8800b75c62c0 ti: ffff8801faa4a000 task.ti: ffff8801faa4a000
> [ 192.765688] RIP: 0010:[<ffffffff8113c159>] [<ffffffff8113c159>] put_page+0x9/0x50
> [ 192.765688] RSP: 0018:ffff8801faa4be28 EFLAGS: 00010206
> [ 192.765688] RAX: ffff8801fa9542a8 RBX: ffff8801fa954000 RCX: 0000000000000001
> [ 192.765688] RDX: ffff8801fa953e10 RSI: 0000000000000200 RDI: 0000100000000000
> [ 192.765688] RBP: ffff8801faa4be28 R08: 000000009b8d39b9 R09: 0000000000000017
> [ 192.765688] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800cb2e7c00
> [ 192.765688] R13: ffff8801fa954210 R14: 0000000000000000 R15: 0000000000000000
> [ 192.765688] FS: 0000000000000000(0000) GS:ffff88022ec80000(0000) knlGS:0000000000000000
> [ 192.765688] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 192.765688] CR2: 0000100000000000 CR3: 00000000b9a5a000 CR4: 00000000000007e0
> [ 192.765688] Stack:
> [ 192.765688] ffff8801faa4be58 ffffffffa0881f4e ffff880204dd0e00 ffff8801fa954000
> [ 192.765688] ffff880204dd0e00 ffff8800cb2e7c00 ffff8801faa4be88 ffffffffa08825f5
> [ 192.765688] ffff8801fa954000 ffff8800b75c62c0 ffffffff81ae5ac0 ffffffffa08cf930
> [ 192.765688] Call Trace:
> [ 192.765688] [<ffffffffa0881f4e>] svc_xprt_release+0x6e/0xf0 [sunrpc]
> [ 192.765688] [<ffffffffa08825f5>] svc_recv+0x165/0x190 [sunrpc]
> [ 192.765688] [<ffffffffa08cf930>] ? nfsd_pool_stats_release+0x60/0x60 [nfsd]
> [ 192.765688] [<ffffffffa08cf9e5>] nfsd+0xb5/0x160 [nfsd]
> [ 192.765688] [<ffffffffa08cf930>] ? nfsd_pool_stats_release+0x60/0x60 [nfsd]
> [ 192.765688] [<ffffffff8107471e>] kthread+0xce/0xf0
> [ 192.765688] [<ffffffff81074650>] ? kthread_freezable_should_stop+0x70/0x70
> [ 192.765688] [<ffffffff81584e2c>] ret_from_fork+0x7c/0xb0
> [ 192.765688] [<ffffffff81074650>] ? kthread_freezable_should_stop+0x70/0x70
> [ 192.765688] Code: 8d 7b 10 e8 ea fa ff ff 48 c7 03 00 00 00 00 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 <66> f7 07 00 c0 75 32 8b 47 1c 48 8d 57 1c 85 c0 74 1c f0 ff 0a
> [ 192.765688] RIP [<ffffffff8113c159>] put_page+0x9/0x50
> [ 192.765688] RSP <ffff8801faa4be28>
> [ 192.765688] CR2: 0000100000000000
> crash>
This new crash is here, calling put_page() on garbage, I guess:
static inline void svc_free_res_pages(struct svc_rqst *rqstp)
{
	while (rqstp->rq_next_page != rqstp->rq_respages) {
		struct page **pp = --rqstp->rq_next_page;
		if (*pp) {
			put_page(*pp);
			*pp = NULL;
		}
	}
}
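The interplay between the detach loop removed by the quoted patch and this release helper can be modelled in user space. What follows is a sketch under stated assumptions, not kernel code. The names struct model_rqst, model_put_page(), model_detach_res_pages() and model_free_res_pages() are hypothetical. void * stands in for struct page *, and MODEL_NPAGES is an arbitrary array size. The put routine only records the pointer it receives; it never dereferences it.

The register dump above fits this reading. On x86-64 the first argument is passed in RDI, and RDI and CR2 both hold 0000100000000000. The faulting instruction (66 f7 07 ..., a 16-bit TEST on the memory at [RDI]) reads through that pointer. Together these suggest the value handed to put_page() was itself bogus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NPAGES 8 /* hypothetical array size, not RPCSVC_MAXPAGES */

/* Hypothetical, heavily simplified stand-in for struct svc_rqst;
 * void * plays the role of struct page *. */
struct model_rqst {
	void *rq_pages[MODEL_NPAGES];
	void **rq_respages;   /* first reply-page slot */
	void **rq_next_page;  /* one past the last reply-page slot in use */
};

/* Hypothetical put_page() stand-in: it logs the pointer it is handed
 * instead of dereferencing it, so a bogus value is recorded, not faulted on. */
static void *model_put_log[MODEL_NPAGES];
static size_t model_put_count;

static void model_put_page(void *page)
{
	assert(model_put_count < MODEL_NPAGES);
	model_put_log[model_put_count++] = page;
}

/* Mirrors the loop the patch removes from svc_rdma_recvfrom.c: clear
 * every slot in [rq_respages, rq_next_page) and pull rq_next_page back
 * to rq_respages, leaving nothing for a later release walk to visit. */
static void model_detach_res_pages(struct model_rqst *rqstp)
{
	while (rqstp->rq_next_page != rqstp->rq_respages)
		*(--rqstp->rq_next_page) = NULL;
}

/* Same control flow as the svc_free_res_pages() quoted above: walk the
 * reply range backwards and "put" every non-NULL slot. */
static void model_free_res_pages(struct model_rqst *rqstp)
{
	while (rqstp->rq_next_page != rqstp->rq_respages) {
		void **pp = --rqstp->rq_next_page;
		if (*pp) {
			model_put_page(*pp);
			*pp = NULL;
		}
	}
}
```

With the detach loop in place, the release walk finds nothing to put. Without it, every non-NULL slot between rq_respages and rq_next_page reaches the put routine, stale or not. Whether those slots actually hold stale pointers at this point in the RDMA receive path is the open question in this thread, and the model does not settle it.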