linux-rdma.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sagi Grimberg <sagi@grimberg.me>, Max Gurtovoy <maxg@nvidia.com>,
	<linux-rdma@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Doug Ledford <dledford@redhat.com>
Subject: Re: [PATCH] IB/iser: Remove in_interrupt() usage.
Date: Fri, 27 Nov 2020 09:03:14 -0400
Message-ID: <20201127130314.GE552508@nvidia.com>
In-Reply-To: <20201127123455.scnqc7xvuwwofdp2@linutronix.de>

On Fri, Nov 27, 2020 at 01:34:55PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-26 16:53:57 [-0400], Jason Gunthorpe wrote:
> > > +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> > > @@ -187,12 +187,8 @@ iser_initialize_task_headers(struct iscsi_task *task,
> > >  	struct iser_device *device = iser_conn->ib_conn.device;
> > >  	struct iscsi_iser_task *iser_task = task->dd_data;
> > >  	u64 dma_addr;
> > > -	const bool mgmt_task = !task->sc && !in_interrupt();
> > >  	int ret = 0;
> > 
> > Why do you think the task->sc doesn't matter?
> 
> Based on the call paths I checked, there was no evidence that
> state_mutex can be acquired. If I remove locking here then `mgmt_task'
> is no longer needed.

That only says there is no recursive deadlock.

> How should task->sc matter?

I was able to get the internal bug report that led to commit
7414dde0a6c3a.

The issue here is that the state_mutex is protecting this check:

	if (unlikely(iser_conn->state != ISER_CONN_UP)) {

which in turn guarantees that this call:

	dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,

won't crash because iser_conn->ib_conn is invalid. The notes say that
the iSCSI stack is in some state where data traffic won't flow but
management traffic is still possible. I suppose this is some fast path,
so it was "optimized" to eliminate the lock for data traffic.
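
To make the role of the lock concrete, here is a minimal userspace
sketch of the pattern, not the driver code. All names (fake_conn,
fake_device, fake_conn_teardown, fake_init_mgmt_task) are made up for
illustration, and a pthread mutex stands in for state_mutex. It only
assumes that teardown changes the state and invalidates the device
under the same lock that the management path takes before its check:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-ins for the connection states; only UP/DOWN are modeled. */
enum fake_state { FAKE_CONN_DOWN, FAKE_CONN_UP };

struct fake_device { int id; };

struct fake_conn {
	pthread_mutex_t state_mutex;
	enum fake_state state;
	struct fake_device *device;	/* only valid while state is UP */
};

/* Teardown: flip the state and drop the device under the lock. */
static void fake_conn_teardown(struct fake_conn *conn)
{
	pthread_mutex_lock(&conn->state_mutex);
	conn->state = FAKE_CONN_DOWN;
	conn->device = NULL;
	pthread_mutex_unlock(&conn->state_mutex);
}

/* Management path: check the state and use the device under the lock. */
static int fake_init_mgmt_task(struct fake_conn *conn, int *dev_id)
{
	int ret = 0;

	pthread_mutex_lock(&conn->state_mutex);
	if (conn->state != FAKE_CONN_UP) {
		ret = -1;	/* a real driver would return an errno */
		goto out;
	}
	/* Holding the lock means teardown cannot clear this in between. */
	assert(conn->device != NULL);
	*dev_id = conn->device->id;
out:
	pthread_mutex_unlock(&conn->state_mutex);
	return ret;
}
```

Without the lock, teardown could run between the state check and the
dereference, which is the kind of window the trace below points at.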

At least one call chain of interest for the lock is:

Nov  3 12:24:37 rsws10 BUG: unable to handle kernel NULL pointer dereference
Nov  3 12:24:37 rsws10 Pid: 5245, comm: scsi_eh_5 Tainted: GF          O 3.8.13-16.2.1.el6uek.x86_64 #1 IBM System x3550 M3 -[7944KEG]-/90Y4784
[..]
Nov  3 12:24:37 rsws10  [<ffffffffa069d628>] iscsi_iser_task_init+0x28/0x70 [ib_iser]
Nov  3 12:24:37 rsws10  [<ffffffffa0610029>] iscsi_prep_mgmt_task+0x129/0x150 [libiscsi]
Nov  3 12:24:37 rsws10  [<ffffffffa061354c>] __iscsi_conn_send_pdu+0x23c/0x310 [libiscsi]
Nov  3 12:24:37 rsws10  [<ffffffffa0614277>] iscsi_exec_task_mgmt_fn+0x37/0x290 [libiscsi]
Nov  3 12:24:37 rsws10  [<ffffffff813c2694>] ? scsi_send_eh_cmnd+0xd4/0x3a0
Nov  3 12:24:37 rsws10  [<ffffffff810c39df>] ? module_refcount+0x9f/0xc0
Nov  3 12:24:37 rsws10  [<ffffffffa061497b>] iscsi_eh_device_reset+0x1bb/0x2d0 [libiscsi]
Nov  3 12:24:37 rsws10  [<ffffffff813c3119>] scsi_eh_bus_device_reset+0xb9/0x1e0
Nov  3 12:24:37 rsws10  [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov  3 12:24:37 rsws10  [<ffffffff813c3cbe>] scsi_eh_ready_devs+0x5e/0x110
Nov  3 12:24:37 rsws10  [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov  3 12:24:37 rsws10  [<ffffffff813c3e5d>] scsi_unjam_host+0xed/0x1f0
Nov  3 12:24:37 rsws10  [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov  3 12:24:37 rsws10  [<ffffffff813c40c8>] scsi_error_handler+0x168/0x1c0
Nov  3 12:24:37 rsws10  [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov  3 12:24:37 rsws10  [<ffffffff81082a6e>] kthread+0xce/0xe0
Nov  3 12:24:37 rsws10  [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70
Nov  3 12:24:37 rsws10  [<ffffffff8159b66c>] ret_from_fork+0x7c/0xb0
Nov  3 12:24:37 rsws10  [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70

So I think the usual 'pass in an atomic context flag' approach is
really needed here.
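
As a rough userspace sketch of what such a flag could look like (an
assumption for illustration, not a proposed patch): the caller states
whether it may sleep, and the callee takes the mutex only in that case
instead of guessing with in_interrupt(). The names sketch_conn,
sketch_init_task_headers, sketch_send_mgmt, sketch_send_data and the
can_sleep parameter are all hypothetical, and lock_taken exists only
to make the behaviour observable:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_conn {
	pthread_mutex_t state_mutex;
	bool up;
	int lock_taken;	/* counts locked calls, for observation only */
};

/* The caller passes its context explicitly via can_sleep. */
static int sketch_init_task_headers(struct sketch_conn *conn, bool can_sleep)
{
	int ret = 0;

	assert(conn != NULL);
	if (can_sleep) {
		pthread_mutex_lock(&conn->state_mutex);
		conn->lock_taken++;
	}
	if (!conn->up)
		ret = -1;	/* a real driver would return an errno */
	if (can_sleep)
		pthread_mutex_unlock(&conn->state_mutex);
	return ret;
}

/* Process-context caller, such as an error handler thread: may sleep. */
static int sketch_send_mgmt(struct sketch_conn *conn)
{
	return sketch_init_task_headers(conn, true);
}

/* Caller that may be in atomic context: must not take a sleeping lock. */
static int sketch_send_data(struct sketch_conn *conn)
{
	return sketch_init_task_headers(conn, false);
}
```

The point of the design is that each call site knows its own context,
so the decision is made once at the top of the call chain and handed
down rather than inferred at the bottom.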

Jason
