From: Brian King <brking@linux.vnet.ibm.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Tyrel Datwyler <tyreld@linux.ibm.com>,
	martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org, james.smart@broadcom.com,
	james.bottomley@hansenpartnership.com, brking@linux.ibm.com,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
Date: Thu, 14 Jan 2021 11:24:35 -0600
Message-ID: <9c5f7786-cd13-6a49-2d71-d0c438318bcb@linux.vnet.ibm.com>
In-Reply-To: <20210114012738.GA237540@T590>

On 1/13/21 7:27 PM, Ming Lei wrote:
> On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
>> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
>>> On 1/12/21 2:54 PM, Brian King wrote:
>>>> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
>>>>> Introduce several new vhost fields for managing MQ state of the adapter
>>>>> as well as initial defaults for MQ enablement.
>>>>>
>>>>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>>>>> ---
>>>>>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
>>>>>  drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
>>>>>  2 files changed, 17 insertions(+)
>>>>>
>>>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> index ba95438a8912..9200fe49c57e 100644
>>>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>>>>>  	.max_sectors = IBMVFC_MAX_SECTORS,
>>>>>  	.shost_attrs = ibmvfc_attrs,
>>>>>  	.track_queue_depth = 1,
>>>>> +	.host_tagset = 1,
>>>>
>>>> This doesn't seem right. You are setting host_tagset, which means you want a
>>>> shared, host wide, tag set for commands. It also means that the total
>>>> queue depth for the host is can_queue. However, it looks like you are allocating
>>>> max_requests events for each Sub-CRQ, which means you are over-allocating memory.
>>>
>>> With the shared tagset, yes, the queue depth for the host is can_queue, but
>>> this also implies that the max queue depth for each hw queue is also
>>> can_queue. So, in the worst case, where all commands are queued down the same
>>> hw queue, we need an event pool with can_queue commands.
>>>
>>>>
>>>> Looking at this closer, we might have bigger problems. There is a host wide
>>>> max number of commands that the VFC host supports, which gets returned on
>>>> NPIV Login. This value can change across a live migration event.
>>>
>>> From what I understand, the max commands can only decrease.
>>>
>>>>
>>>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
>>>> can_queue on the scsi_host *after* the tag set has been allocated. This looks
>>>> to be a concern for ibmvfc (I'm not sure about lpfc), since nothing appears
>>>> to look at can_queue once the tag set is set up, and I'm not seeing a good
>>>> way to dynamically change the host queue depth after that point.
>>>>
>>>> Unless I'm missing something, our best options appear to be either
>>>> implementing our own host-wide busy reference counting, which doesn't sound
>>>> very good, or adding some API to block/scsi that allows us to dynamically
>>>> change can_queue.
>>>
>>> Changing can_queue won't do us any good with the shared tagset because each
>>> queue still needs to be able to queue can_queue commands in the worst
>>> case.
>>
>> The issue I'm trying to highlight here is the following scenario:
>>
>> 1. We set shost->can_queue, then call scsi_add_host, which allocates the tag set.
>>
>> 2. On our NPIV login response from the VIOS, we might get a lower value than we
>> initially set in shost->can_queue, so we update it, but nobody ever looks at it
>> again, and we don't have any protection against sending too many commands to the host.
>>
>>
>> Basically, we no longer have any code that ensures we don't send more
>> commands to the VIOS than we are told it supports. According to the architecture,
>> if we actually do this, the VIOS will do an h_free_crq, which would be a bit
>> of a bug on our part.
>>
>> I don't think it was ever clearly defined in the API that a driver can
>> change shost->can_queue after calling scsi_add_host, but up until
>> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
>> it doesn't. 
> 
> Actually it isn't related to commit 6eb045e092ef, because blk_mq_alloc_tag_set()
> uses .can_queue to create the driver tag sbitmap and request pool.
> 
> So even without 6eb045e092ef, an updated .can_queue can't work as
> expected, because the max driver tag depth has already been fixed by blk-mq.

There are two scenarios here. In the scenario of someone increasing can_queue
after the tag set is allocated, I agree, blk-mq will never take advantage
of this. However, in the scenario of someone *decreasing* can_queue after the
tag set is allocated, prior to 6eb045e092ef, the shost->host_busy code provided
this protection.
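
For reference, that protection looked roughly like this in
scsi_host_queue_ready() before the commit (paraphrased from memory, so
treat it as a sketch rather than the exact code):

	busy = atomic_inc_return(&shost->host_busy) - 1;
	...
	/*
	 * This read shost->can_queue at dispatch time, so a driver that
	 * lowered can_queue after scsi_add_host() still had a host-wide
	 * limit enforced here.
	 */
	if (shost->can_queue > 0 && busy >= shost->can_queue)
		goto starved;

Since blk-mq sizes the driver tags once at tag set allocation, removing
this check is what left a subsequently lowered can_queue unenforced.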

> 
> What 6eb045e092ef does is just remove the double check on the max
> host-wide allowed commands, because that is already respected by blk-mq
> driver tag allocation.
> 
>>
>> I started looking through drivers that do this, and so far, it looks like the
>> following drivers do: ibmvfc, lpfc, aic94xx, libfc, BusLogic, and likely others...
>>
>> We probably need an API that lets us change shost->can_queue dynamically.
> 
> I'd suggest confirming that changing .can_queue is a real use case.

For ibmvfc, the total number of commands that the scsi host supports is very
much a dynamic value. It can increase and it can decrease. Live migrating
a logical partition from one system to another is the usual cause of
such a capability change. For ibmvfc, at least, this only ever happens
when we've self-blocked the host and have sent back all outstanding I/O.
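
To make that concrete, the pattern in question is roughly the following,
from ibmvfc's NPIV login completion path (paraphrased; the exact field
names may differ):

	static void ibmvfc_npiv_login_done(struct ibmvfc_event *evt)
	{
		...
		/*
		 * The VIOS may report fewer supported commands than the
		 * value we passed to scsi_add_host(), e.g. after a live
		 * partition migration.
		 */
		vhost->host->can_queue =
			be32_to_cpu(rsp->max_cmds) - IBMVFC_NUM_INTERNAL_REQ;
		...
	}

Nothing below the driver re-reads can_queue at that point, which is the
gap I'm describing.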

However, looking at other drivers that modify can_queue dynamically,
that quiesced-host assumption doesn't always hold. libfc, for example,
appears to dynamically ramp can_queue up and down based on its ability
to handle requests.

There are certainly a number of other drivers that change can_queue
after the tag set has been allocated. Some of them could probably be
changed to avoid doing this, but changing them all will likely be
difficult.
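
If we do end up falling back to our own host-wide busy reference
counting, I'd expect it to look something like the sketch below. To be
clear, these names and fields are made up for illustration; this isn't
proposed code:

	/*
	 * cmds_outstanding: commands currently issued to the VIOS.
	 * cmds_limit: refreshed from the NPIV login response while the
	 * host is blocked and no I/O is outstanding.
	 */
	static int ibmvfc_get_host_slot(struct ibmvfc_host *vhost)
	{
		if (atomic_inc_return(&vhost->cmds_outstanding) >
		    READ_ONCE(vhost->cmds_limit)) {
			atomic_dec(&vhost->cmds_outstanding);
			/* caller returns SCSI_MLQUEUE_HOST_BUSY */
			return -EBUSY;
		}
		return 0;
	}

	static void ibmvfc_put_host_slot(struct ibmvfc_host *vhost)
	{
		atomic_dec(&vhost->cmds_outstanding);
	}

Of course, that reintroduces exactly the kind of per-command atomic in
the hot path that 6eb045e092ef removed, which is why I'd prefer a proper
API for this.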

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center


