public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Jason Gunthorpe <jgg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Bart Van Assche <Bart.VanAssche-Sjgp3cTcYWE@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org"
	<ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: Kernel v4.16 / v4.17 SRP and SRPT patches
Date: Wed, 10 Jan 2018 16:11:14 -0500	[thread overview]
Message-ID: <1515618674.10153.6.camel@redhat.com> (raw)
In-Reply-To: <20180110205243.GP4776-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

On Wed, 2018-01-10 at 13:52 -0700, Jason Gunthorpe wrote:
> On Wed, Jan 10, 2018 at 02:30:39PM -0500, Laurence Oberman wrote:
> 
> > Just to be clear, I have posted two types of stack traces, one
> > where I
> > panic the other here above where I am not panicking.
> 
> Guessing it is just luck which you hit.. Random corrupted memory and
> all..
> 
> > This is not any special type of test. I booted the kernel, mapped
> > the SRP devices from the target server and proceeded to shutdown
> > the
> > client with shutdown -r now.  This is part of my holistic test I
> > always do against new patches in Bart's tree.  I start with
> > reboots,
> > them rmmod's etc. before I go on to perform I/O against the LUNS
> > from the target.
> 
> Well, your shtudown is triggering the mlx driver shutdown code,
> then it looks like the SRP stuff gets cleaned up? That certainly is
> getting a bit exciting code wise
> 
> I see there have been some changes in the mlx5 shutdown handling
> recently..
> 
> As an experiment comment out the '.shutdown = shutdown' in
> drivers/net/ethernet/mellanox/mlx5/core/main.c?
> 
> And it would be interesting to know if your past success kernels were
> printing the mlx5 shutdown message too? Perhaps something core kernel
> changed to enable this path for your test?
> 
> Jason

Its a solid issue each time, the shutdown.

Here is rc6, I am building rc1 now and will then go to 4.14 to peel
this onion

4.15.0-rc6

[  150.600416] ---[ end trace fc9e16dc996e3246 ]---
[  150.626405] mlx5_1:mlx5_ib_event:2992:(pid 14203): warning: event on
port 0
[  150.666308] scsi host1: ib_srp: failed RECV status WR flushed (5)
for CQE 00000000ecb7c551
[  150.712873] mlx5_core 0000:08:00.1: mlx5_enter_error_state:128:(pid
14203): end
[  150.753463] mlx5_core 0000:08:00.0: Shutdown was called
[  150.793126] mlx5_core 0000:08:00.0: mlx5_enter_error_state:121:(pid
14203): start
[  150.835047] mlx5_0:mlx5_ib_event:2992:(pid 14203): warning: event on
port 0
[  150.874155] scsi host2: ib_srp: failed RECV status WR flushed (5)
for CQE 00000000f7f26a7b
[  150.919317] mlx5_core 0000:08:00.0: mlx5_enter_error_state:128:(pid
14203): end
[  151.449010] reboot: Restarting system
[  151.467644] reboot: machine restart


Almost looks like changes made may require new Firmware maybe for my
CX4 card because its coming from here and I dont like to see pci_err**
called.

static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev,
                                              pci_channel_state_t
state)
{
        struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
        struct mlx5_priv *priv = &dev->priv;

        dev_info(&pdev->dev, "%s was called\n", __func__);

        mlx5_enter_error_state(dev, false);
        mlx5_unload_one(dev, priv, false);
        /* In case of kernel call drain the health wq */
        if (state) {
                mlx5_drain_health_wq(dev);
                mlx5_pci_disable_device(dev);
        }

        return state == pci_channel_io_perm_failure ?
                PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
}

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2018-01-10 21:11 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-06  0:22 [PATCH 5/8] infiniband: fix ulp/srpt/ib_srpt.c kernel-doc notation Randy Dunlap
     [not found] ` <5a5016c0.4c0a620a.ed2b3.60da-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2018-01-06  0:36   ` Bart Van Assche
     [not found]     ` <fcc3f226-848d-abc4-2a81-f4fd821761c9-Sjgp3cTcYWE@public.gmane.org>
2018-01-06  5:55       ` Randy Dunlap
     [not found]         ` <31f69352-b8b1-9ed1-635b-2c654b49c775-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2018-01-06 16:50           ` Bart Van Assche
2018-01-09 20:15       ` Laurence Oberman
     [not found]         ` <1515528956.3919.3.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-09 20:31           ` Laurence Oberman
     [not found]             ` <1515529869.3919.4.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-09 20:51               ` Kernel v4.16 / v4.17 SRP and SRPT patches Bart Van Assche
     [not found]                 ` <1515531079.2721.26.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-09 21:00                   ` Laurence Oberman
     [not found]                     ` <1515531652.26021.1.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-09 22:40                       ` Laurence Oberman
     [not found]                         ` <1515537614.26021.3.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-10 13:42                           ` Laurence Oberman
     [not found]                             ` <1515591723.26021.6.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-10 18:26                               ` Jason Gunthorpe
     [not found]                                 ` <20180110182648.GI4518-uk2M96/98Pc@public.gmane.org>
2018-01-10 18:40                                   ` Bart Van Assche
     [not found]                                     ` <1515609623.2745.20.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-10 18:59                                       ` Laurence Oberman
     [not found]                                         ` <1515610750.10153.1.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-10 19:15                                           ` Jason Gunthorpe
     [not found]                                             ` <20180110191510.GK4518-uk2M96/98Pc@public.gmane.org>
2018-01-10 19:30                                               ` Laurence Oberman
     [not found]                                                 ` <1515612639.10153.3.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-10 20:52                                                   ` Jason Gunthorpe
     [not found]                                                     ` <20180110205243.GP4776-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2018-01-10 21:11                                                       ` Laurence Oberman [this message]
     [not found]                                                         ` <1515618674.10153.6.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-10 21:15                                                           ` Jason Gunthorpe
     [not found]                                                             ` <20180110211501.GS4776-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2018-01-11 13:02                                                               ` Laurence Oberman
     [not found]                                                                 ` <1515675741.21421.1.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-11 18:20                                                                   ` Laurence Oberman
     [not found]                                                                     ` <1515694855.21421.3.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-11 18:35                                                                       ` Patch: RDMA mlx5_core.c : mlx5_try_fast_unload causes panics Laurence Oberman
2018-01-11 20:43                                                                   ` Kernel v4.16 / v4.17 SRP and SRPT patches Laurence Oberman
     [not found]                                                                     ` <1515703435.21421.9.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-11 21:15                                                                       ` Bart Van Assche
     [not found]                                                                         ` <1515705340.2752.60.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-11 21:33                                                                           ` Laurence Oberman
     [not found]                                                                             ` <1515706433.21421.11.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-11 21:43                                                                               ` Bart Van Assche
2018-01-12 21:11                                                                               ` Bart Van Assche
     [not found]                                                                                 ` <1515791472.2396.57.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-13  0:09                                                                                   ` Laurence Oberman
     [not found]                                                                                     ` <1515802177.1566.1.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-13  1:57                                                                                       ` Laurence Oberman
     [not found]                                                                                         ` <1515808673.11354.1.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-13 14:53                                                                                           ` Laurence Oberman
     [not found]                                                                                             ` <1515855226.32050.1.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-15 16:12                                                                                               ` Bart Van Assche
     [not found]                                                                                                 ` <1516032762.3951.5.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-15 16:52                                                                                                   ` Laurence Oberman
2018-01-10 21:17                                                           ` Laurence Oberman
2018-01-10 19:17                                       ` Jason Gunthorpe
     [not found]                                         ` <20180110191758.GL4518-uk2M96/98Pc@public.gmane.org>
2018-01-10 19:32                                           ` Bart Van Assche
     [not found]                                             ` <1515612733.2745.27.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-10 22:43                                               ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1515618674.10153.6.camel@redhat.com \
    --to=loberman-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
    --cc=Bart.VanAssche-Sjgp3cTcYWE@public.gmane.org \
    --cc=ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=jgg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox