From: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Jason Gunthorpe <jgg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Bart Van Assche <Bart.VanAssche-Sjgp3cTcYWE@public.gmane.org>,
"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org"
<ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: Kernel v4.16 / v4.17 SRP and SRPT patches
Date: Wed, 10 Jan 2018 16:11:14 -0500
Message-ID: <1515618674.10153.6.camel@redhat.com>
In-Reply-To: <20180110205243.GP4776-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
On Wed, 2018-01-10 at 13:52 -0700, Jason Gunthorpe wrote:
> On Wed, Jan 10, 2018 at 02:30:39PM -0500, Laurence Oberman wrote:
>
> > Just to be clear, I have posted two types of stack traces: one where
> > I panic, and the other, here above, where I am not panicking.
>
> Guessing it is just luck which you hit.. Random corrupted memory and
> all..
>
> > This is not any special type of test. I booted the kernel, mapped
> > the SRP devices from the target server and proceeded to shut down
> > the client with shutdown -r now. This is part of the holistic test I
> > always do against new patches in Bart's tree. I start with reboots,
> > then rmmods etc. before I go on to perform I/O against the LUNs
> > from the target.
>
> Well, your shutdown is triggering the mlx driver shutdown code,
> then it looks like the SRP stuff gets cleaned up? That certainly is
> getting a bit exciting code-wise.
>
> I see there have been some changes in the mlx5 shutdown handling
> recently..
>
> As an experiment comment out the '.shutdown = shutdown' in
> drivers/net/ethernet/mellanox/mlx5/core/main.c?
>
> And it would be interesting to know if your past success kernels were
> printing the mlx5 shutdown message too? Perhaps something core kernel
> changed to enable this path for your test?
>
> Jason
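
The experiment suggested above relies on the shutdown hook being optional: if a driver registers no `.shutdown` callback, nothing driver-specific runs at reboot. The following is a minimal user-space sketch of that dispatch, not kernel code. `struct fake_pci_driver`, `fake_device_shutdown` and `mlx5_like_shutdown` are hypothetical stand-ins invented for illustration; only the idea of an optional `.shutdown` entry and the "Shutdown was called" message come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a PCI driver descriptor (NOT the kernel's struct pci_driver). */
struct fake_pci_driver {
	const char *name;
	void (*shutdown)(const char *bdf);	/* optional; NULL models the commented-out hook */
};

/* Counts how many times the hypothetical shutdown hook has run. */
static int shutdown_calls;

/* Stand-in for the driver's shutdown routine; it only logs and counts. */
static void mlx5_like_shutdown(const char *bdf)
{
	shutdown_calls++;
	printf("mlx5_core %s: Shutdown was called\n", bdf);
}

/*
 * Models the bus core at reboot time: invoke the hook only if the driver
 * registered one. Returns 1 if the hook ran, 0 otherwise.
 */
static int fake_device_shutdown(const struct fake_pci_driver *drv, const char *bdf)
{
	assert(bdf != NULL);

	if (drv && drv->shutdown) {
		drv->shutdown(bdf);
		return 1;
	}
	return 0;
}
```

In this model, a descriptor with `.shutdown` left unset makes `fake_device_shutdown()` return 0 and the "Shutdown was called" line is never printed, which is the outcome the experiment is meant to check for on the real kernel.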
It's a solid issue each time, on shutdown.
Here is rc6. I am building rc1 now and will then go to 4.14 to peel
this onion.
4.15.0-rc6
[ 150.600416] ---[ end trace fc9e16dc996e3246 ]---
[ 150.626405] mlx5_1:mlx5_ib_event:2992:(pid 14203): warning: event on port 0
[ 150.666308] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE 00000000ecb7c551
[ 150.712873] mlx5_core 0000:08:00.1: mlx5_enter_error_state:128:(pid 14203): end
[ 150.753463] mlx5_core 0000:08:00.0: Shutdown was called
[ 150.793126] mlx5_core 0000:08:00.0: mlx5_enter_error_state:121:(pid 14203): start
[ 150.835047] mlx5_0:mlx5_ib_event:2992:(pid 14203): warning: event on port 0
[ 150.874155] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE 00000000f7f26a7b
[ 150.919317] mlx5_core 0000:08:00.0: mlx5_enter_error_state:128:(pid 14203): end
[ 151.449010] reboot: Restarting system
[ 151.467644] reboot: machine restart
It almost looks like the changes made may require new firmware for my
CX4 card, because it's coming from here, and I don't like to see
pci_err** called.
static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev,
					      pci_channel_state_t state)
{
	struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
	struct mlx5_priv *priv = &dev->priv;

	dev_info(&pdev->dev, "%s was called\n", __func__);

	mlx5_enter_error_state(dev, false);
	mlx5_unload_one(dev, priv, false);
	/* In case of kernel call drain the health wq */
	if (state) {
		mlx5_drain_health_wq(dev);
		mlx5_pci_disable_device(dev);
	}

	return state == pci_channel_io_perm_failure ?
		PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
}
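
To make the handler's control flow easier to follow, here is a small user-space model of its two decisions: whether the health workqueue is drained and the device disabled, and which recovery result is returned. This is only a sketch. The enum names and numeric values are local, hypothetical mirrors chosen for illustration (an assumption, not taken from kernel headers), and both helper names are invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the channel states; names and values are assumptions of this sketch. */
enum model_channel_state {
	MODEL_IO_NONE = 0,		/* hypothetical "no state supplied" value */
	MODEL_IO_NORMAL = 1,
	MODEL_IO_FROZEN = 2,
	MODEL_IO_PERM_FAILURE = 3,
};

/* Local mirror of the two results the handler above can return. */
enum model_ers_result {
	MODEL_RESULT_NEED_RESET,
	MODEL_RESULT_DISCONNECT,
};

/* Mirrors `if (state)`: drain and disable happen for any nonzero state. */
static bool model_drains_and_disables(enum model_channel_state state)
{
	return state != MODEL_IO_NONE;
}

/* Mirrors the final ternary: only a permanent failure yields DISCONNECT. */
static enum model_ers_result model_err_detected_result(enum model_channel_state state)
{
	assert(state >= MODEL_IO_NONE && state <= MODEL_IO_PERM_FAILURE);

	return state == MODEL_IO_PERM_FAILURE ?
		MODEL_RESULT_DISCONNECT : MODEL_RESULT_NEED_RESET;
}
```

Under these assumptions, every state other than a permanent failure asks for a reset, and only a zero state skips the drain-and-disable step.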