From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Amir Vadai <amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Jack Morgenstein
<jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Subject: Re: mlx4 problems with 4.2-rc8
Date: Mon, 31 Aug 2015 10:09:31 +0300
Message-ID: <55E3FDAB.10706@mellanox.com>
In-Reply-To: <55E385DB.2-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On 8/31/2015 1:38 AM, Doug Ledford wrote:
> On 08/29/2015 09:13 PM, Or Gerlitz wrote:
>> On Fri, Aug 28, 2015 at 10:27 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>> I'm seeing this with rc8 on a dual port mlx4 adapter set to IB/Eth mode:
>>
>> mmm, both Amir and myself are just finishing vacations... so WB notes
>> are not always lovely as you want them to be, life
>>>
>>> [ 77.883513] IPv6: ADDRCONF(NETDEV_UP): mlx4_roce: link is not ready
>>> [ 77.892044] mlx4_en: mlx4_roce: frag:0 - size:1518 prefix:0 stride:1536
>>> [ 77.903129] genirq: Flags mismatch irq 135. 00000000
>>> (mlx4-65@0000:05:00.0) vs. 00000000 (mlx4-65@0000:05:00.0)
>>
>> Is this a strict regression from some known point in the past on this
>> system/config, i.e. 4.1 or 4.2-rc1?
>
> Yes. When I was submitting the 4.2-rc changes this machine worked.
> This is one of my IB/Eth SRIOV machines. I tested with SRIOV disabled
> and it didn't affect things.
>
>> Can you please send the mlx4 driver output when you load it with debug
>> prints on? also do things work if you set the ports type to be ib/ib
>> or eth/eth?
>
> It should work as ib/ib given that in ib/eth mode the ib port works. I
> doubt eth/eth would work, but I'll try and see. OK, Eth/Eth mode fails
> too (at least on the second port, I can't say on the first port for
> certain as I can't bring it up, it's still plugged into an IB switch).
> However, now in Eth/Eth mode, attempts to bring up the interface
> manually at the command line have hung, which it didn't do in IB/Eth mode.
>
> I'll try to pin things down further, but that's what I have so far.
>
> And as requested, the config is attached.
>
>>
>> send us your compressed .config
>>
>> Matan, any idea what goes wrong here?
>>
>> Or.
>>
>>
>>
>>> [ 77.914965] CPU: 0 PID: 1541 Comm: NetworkManager Not tainted
>>> 4.2.0-rc8 #58
>>> [ 77.923292] Hardware name: Dell Inc. PowerEdge R820/04K5X5, BIOS
>>> 2.2.3 07/09/2014
>>> [ 77.932205] 0000000000000000 00000000c16e3ce1 ffff8820365ab498
>>> ffffffff8167e6ff
>>> [ 77.941072] 0000000000000000 ffff8820339e9a00 ffff8820365ab4f8
>>> ffffffff810d2b6e
>>> [ 77.949938] 0000000000000246 ffff881032e67aa4 ffff881035e10ba0
>>> 00000000c16e3ce1
>>> [ 77.958812] Call Trace:
>>> [ 77.962109] [<ffffffff8167e6ff>] dump_stack+0x45/0x57
>>> [ 77.968412] [<ffffffff810d2b6e>] __setup_irq+0x51e/0x590
>>> [ 77.975018] [<ffffffffc03870a0>] ? mlx4_interrupt+0x80/0x80 [mlx4_core]
>>> [ 77.983072] [<ffffffff810d2d64>] request_threaded_irq+0xf4/0x1a0
>>> [ 77.990468] [<ffffffffc0385d55>] mlx4_assign_eq+0x135/0x360 [mlx4_core]
>>> [ 77.998513] [<ffffffffc0537537>] mlx4_en_activate_cq+0x2a7/0x310
>>> [mlx4_en]
>>> [ 78.006853] [<ffffffff8130a2c8>] ? alloc_cpumask_var_node+0x28/0x40
>>> [ 78.014542] [<ffffffff8131e8b9>] ? find_next_bit+0x19/0x20
>>> [ 78.021334] [<ffffffff8130a284>] ? cpumask_next_and+0x34/0x50
>>> [ 78.028425] [<ffffffffc053ae6b>] mlx4_en_start_port+0x1bb/0xb60
>>> [mlx4_en]
>>> [ 78.036689] [<ffffffffc037fe01>] ? mlx4_free_cmd_mailbox+0x31/0x40
>>> [mlx4_core]
>>> [ 78.045435] [<ffffffffc053bb59>] mlx4_en_open+0x349/0x630 [mlx4_en]
>>> [ 78.053107] [<ffffffff815732f9>] __dev_open+0xc9/0x140
>>> [ 78.059538] [<ffffffff81573621>] __dev_change_flags+0xa1/0x160
>>> [ 78.066718] [<ffffffff81573709>] dev_change_flags+0x29/0x60
>>> [ 78.073602] [<ffffffff81580dbe>] do_setlink+0x5be/0xa70
>>> [ 78.080097] [<ffffffffc01b158f>] ? mga_imageblit+0x2f/0x40 [mgag200]
>>> [ 78.087859] [<ffffffffc01b1456>] ? mga_dirty_update+0x1e6/0x2f0
>>> [mgag200]
>>> [ 78.096112] [<ffffffffc01b158f>] ? mga_imageblit+0x2f/0x40 [mgag200]
>>> [ 78.103873] [<ffffffff81582470>] rtnl_newlink+0x4f0/0x880
>>> [ 78.110586] [<ffffffff81582073>] ? rtnl_newlink+0xf3/0x880
>>> [ 78.117372] [<ffffffff81294238>] ? security_capable+0x48/0x60
>>> [ 78.124452] [<ffffffff81081b1d>] ? ns_capable+0x2d/0x60
>>> [ 78.130950] [<ffffffff8157f8c4>] rtnetlink_rcv_msg+0xa4/0x250
>>> [ 78.138028] [<ffffffff812987c0>] ? sock_has_perm+0x70/0x90
>>> [ 78.144824] [<ffffffff8157f820>] ? rtnetlink_rcv+0x40/0x40
>>> [ 78.151615] [<ffffffff815a2bdf>] netlink_rcv_skb+0xaf/0xc0
>>> [ 78.158425] [<ffffffff8157f80c>] rtnetlink_rcv+0x2c/0x40
>>> [ 78.164997] [<ffffffff815a22d1>] netlink_unicast+0x101/0x1f0
>>> [ 78.171937] [<ffffffff815a27c1>] netlink_sendmsg+0x401/0x660
>>> [ 78.178867] [<ffffffff81553e78>] sock_sendmsg+0x38/0x50
>>> [ 78.185335] [<ffffffff815547d5>] ___sys_sendmsg+0x275/0x290
>>> [ 78.192176] [<ffffffff81262c56>] ? sysctl_head_finish+0x46/0x50
>>> [ 78.199411] [<ffffffff81262e08>] ? proc_sys_call_handler+0x88/0xe0
>>> [ 78.206946] [<ffffffff8131854c>] ? lockref_put_or_lock+0x4c/0x80
>>> [ 78.214296] [<ffffffff81555197>] __sys_sendmsg+0x57/0xa0
>>> [ 78.220878] [<ffffffff815551f2>] SyS_sendmsg+0x12/0x20
>>> [ 78.227283] [<ffffffff8168536e>] entry_SYSCALL_64_fastpath+0x12/0x71
>>> [ 78.235114] mlx4_en 0000:05:00.0: Failed assigning an EQ to
>>> \xfffffff\xffffffb6Z6
>>> \xffffff88\xffffffff\xffffffff\xffffff84\xffffffa20\xffffff81\xffffffff\xffffffff\xffffffff\xffffffff
>>> [ 78.243732] mlx4_en: mlx4_roce: Failed activating Rx CQ
>>> [ 78.319027] mlx4_en: mlx4_roce: Failed starting port:2
>>>
>>> The interface in question is unusable.
>>>
>>> --
>>> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>> GPG KeyID: 0E572FDD
>>>
>>>
>
>
Actually, this looks like the stack dump we've seen before [1], which was
fixed. It happens when the mlx4 driver is used in setups where the number
of cores is >= 32.
Doug, is that the case?
[1] http://www.spinics.net/lists/netdev/msg341171.html
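A quick way to answer the core-count question is sketched below. This is only a minimal check, not part of the driver or the fix: the threshold of 32 comes from the message above, the helper name at_or_above_threshold is made up for illustration, and os.cpu_count() reports logical CPUs, which is assumed here to be the count that matters.

```python
import os

# Threshold taken from the message above ("number of cores >= 32").
CORE_THRESHOLD = 32


def at_or_above_threshold(cpu_count, threshold=CORE_THRESHOLD):
    """Return True when the CPU count reaches the threshold.

    Hypothetical helper for illustration only; a count of None
    (undetermined) is treated as "not at threshold".
    """
    return cpu_count is not None and cpu_count >= threshold


if __name__ == "__main__":
    # os.cpu_count() returns the number of logical CPUs, or None if it
    # cannot be determined. Assumption: logical CPUs are what counts here.
    cpus = os.cpu_count()
    print("logical CPUs:", cpus)
    print("at or above threshold:", at_or_above_threshold(cpus))
```

Running it on the affected machine prints the logical CPU count and whether it reaches 32; the CPU summary in the boot log gives the same figure.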