From: Leon Romanovsky <leon@kernel.org>
To: Jinpu Wang <jinpu.wang@ionos.com>
Cc: netdev <netdev@vger.kernel.org>,
	RDMA mailing list <linux-rdma@vger.kernel.org>,
	Moshe Shemesh <moshe@nvidia.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Tariq Toukan <tariqt@nvidia.com>,
	Maor Gottlieb <maorg@nvidia.com>, Shay Drory <shayd@nvidia.com>
Subject: Re: [BUG] mlx5_core general protection fault in mlx5_cmd_comp_handler
Date: Thu, 13 Oct 2022 13:27:38 +0300
Message-ID: <Y0foGrlwnYX8lJX2@unreal>
In-Reply-To: <CAMGffEmFCgKv-6XNXjAKzr5g6TtT_=wj6H62AdGCUXx4hruxBQ@mail.gmail.com>

On Thu, Oct 13, 2022 at 10:32:55AM +0200, Jinpu Wang wrote:
> On Thu, Oct 13, 2022 at 10:18 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Oct 12, 2022 at 01:55:55PM +0200, Jinpu Wang wrote:
> > > Hi Leon, hi Saeed,
> > >
> > > We have seen crashes during server shutdown on both kernel 5.10 and
> > > kernel 5.15 with GPF in mlx5 mlx5_cmd_comp_handler function.
> > >
> > > All of the crashes point to
> > >
> > > 1606                         memcpy(ent->out->first.data, ent->lay->out, sizeof(ent->lay->out));
> > >
> > > I guess it's a kind of use-after-free of the ent buffer. I tried to
> > > reproduce it by repeatedly rebooting the test servers, but no success so far.
> >
> > My guess is that the command interface is not flushed, but Moshe and I
> > didn't see how that can happen.
> >
> >   1206         INIT_DELAYED_WORK(&ent->cb_timeout_work, cb_timeout_handler);
> >   1207         INIT_WORK(&ent->work, cmd_work_handler);
> >   1208         if (page_queue) {
> >   1209                 cmd_work_handler(&ent->work);
> >   1210         } else if (!queue_work(cmd->wq, &ent->work)) {
> >                           ^^^^^^^ this is what is causing the splat
> >   1211                 mlx5_core_warn(dev, "failed to queue work\n");
> >   1212                 err = -EALREADY;
> >   1213                 goto out_free;
> >   1214         }
> >
> > <...>
> > >
> > > Is this problem known, maybe already fixed?
> >
> > I don't see any missing Fixes that exist in 6.0 and don't exist in 5.5.32.

Sorry it is 5.15.32

> > Is it possible to reproduce this on latest upstream code?
> I haven't been able to reproduce it. As mentioned above, I tried to
> reproduce it by simply rebooting in a loop, with no luck yet.
> Do you have suggestions to speed up the reproduction?

Maybe try shutting down while the command interface is being filled.
I think any query command will do the trick.
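
One way to keep the command interface busy before triggering the shutdown
could be sketched roughly as below. This is only an illustration under
assumptions: the hammer() helper is hypothetical, and the example command
"ethtool -S eth0" (with "eth0" as a placeholder interface name) is assumed
to make the driver issue firmware query commands. Any other query command
can be substituted.

```python
import subprocess
import threading
import time


def hammer(cmd, workers=4, duration=1.0):
    """Run `cmd` back to back from `workers` threads for `duration` seconds.

    Returns the total number of completed invocations. Hypothetical helper,
    not part of any mlx5 tooling.
    """
    stop = time.monotonic() + duration
    counts = [0] * workers

    def loop(i):
        # Each thread keeps re-issuing the command until the deadline.
        while time.monotonic() < stop:
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, check=False)
            counts[i] += 1

    threads = [threading.Thread(target=loop, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)


# Example (assumption: "eth0" is the mlx5 netdev on the test server):
#   hammer(["ethtool", "-S", "eth0"], workers=8, duration=600)
# then start the shutdown or reboot from another shell while it is running.
```

Several workers are used so that more than one command is likely to be in
flight when the shutdown starts; whether that is enough to hit the race is
not known.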

> Once I can reproduce, I can also try with kernel 6.0.

That would be great.

Thanks
