Re: [PATCH 06/29] ibmvfc: add logic for protocol specific fabric logins

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Tyrel Datwyler" <tyreld@linux.ibm.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH 06/29] ibmvfc: add logic for protocol specific fabric logins
Date: Tue, 23 Jun 2026 01:50:55 +0000	[thread overview]
Message-ID: <20260623015056.4FC3D1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260623013035.3436640-7-tyreld@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:

New issues:
- [High] Host initialization stalls indefinitely if `do_enquiry` is false during NPIV login completion.
- [High] NVMe fabric login is entirely skipped if SCSI fabric login is not required.
- [High] Concurrent target discovery MADs cause a data race and state machine failure, dropping targets.
- [High] Stale `do_scsi_login` and `do_nvme_login` flags persist across NPIV logins, causing unsupported MAD requests.

Pre-existing issues:
- [High] Missing bounds check on VIOS-provided queue counts causes arbitrary out-of-bounds heap write.
- [High] Integer overflow in `max_cmds` leads to SCSI command queue depth under-allocation and negative `can_queue` bounds.
--

commit d57e35c4c58c31909c3240ab400f9b9c8c0b9a8b
Author: Tyrel Datwyler <tyreld@linux.ibm.com>

ibmvfc: add logic for protocol specific fabric logins

This commit adds support for the protocol-specific fabric login flow introduced
by the updated client/VIOS interface. It tracks whether SCSI and NVMe/FC fabric
logins are needed and negotiates queue counts for both protocols. The driver
now issues appropriate fabric login MADs before target discovery begins.

> diff --git a/drivers/scsi/ibmvscsi/ibmvfc-core.c b/drivers/scsi/ibmvscsi/ibmvfc-core.c
> index 6f5e8b3cbfc82..93c32fa162f83 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc-core.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc-core.c
[ ... ]
> @@ -1512,7 +1520,7 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost)
>  	max_cmds = scsi_qdepth + IBMVFC_NUM_INTERNAL_REQ;
>  	if (mq_enabled)
>  		max_cmds += (scsi_qdepth + IBMVFC_NUM_INTERNAL_SUBQ_REQ) *
> -			vhost->scsi_scrqs.desired_queues;
> +			(vhost->scsi_scrqs.desired_queues + vhost->nvme_scrqs.desired_queues);
>  
>  	memset(login_info, 0, sizeof(*login_info));

[Severity: High]
This is a pre-existing issue, but does max_cmds overflow here?

The max_cmds variable is a u16. When mq_enabled is true, adding the nvme
desired queues effectively doubles the multiplier. On systems with high core
counts, this calculation could silently overflow the 65,535 limit of the u16.

If the wrapped value is smaller than IBMVFC_NUM_INTERNAL_REQ, can_queue
becomes negative, which could induce failures during SCSI command allocations.

[ ... ]
> @@ -5058,22 +5201,28 @@ static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt)
[ ... ]
> -		active_queues = be32_to_cpu(setup->num_scsi_subq_channels);
> -		scrqs->active_queues = active_queues;
> +		scsi->active_queues = be32_to_cpu(setup->num_scsi_subq_channels);
> +		nvme->active_queues = be32_to_cpu(setup->num_nvme_subq_channels);
>  
>  		if (flags & IBMVFC_CHANNELS_CANCELED) {
>  			ibmvfc_dbg(vhost, "Channels Canceled\n");
>  			vhost->using_channels = 0;
> -		} else {
> -			if (active_queues)
> -				vhost->using_channels = 1;
> -			for (i = 0; i < active_queues; i++)
> -				scrqs->scrqs[i].vios_cookie =
> -					be64_to_cpu(setup->channel_handles[i]);
> -
> -			ibmvfc_dbg(vhost, "Using %u channels\n",
> -				   vhost->scsi_scrqs.active_queues);
> +			break;
>  		}
> +
> +		if (scsi->active_queues || nvme->active_queues)
> +			vhost->using_channels = 1;
> +		for (i = 0; i < scsi->active_queues; i++)
> +			scsi->scrqs[i].vios_cookie =
> +				be64_to_cpu(setup->channel_handles[i]);
> +		for (i = 0; i < nvme->active_queues; i++)
> +			nvme->scrqs[i].vios_cookie =
> +				be64_to_cpu(setup->channel_handles[scsi->active_queues + i]);

[Severity: High]
This isn't a bug introduced by this patch, but do these loops need bounds
checking on the VIOS-provided queue counts?

The driver assigns the untrusted num_scsi_subq_channels and
num_nvme_subq_channels values directly to active_queues. The code then uses
these values as boundary limits in the for loops to write into the scrqs
arrays.

Could a buggy or compromised VIOS returning a massive queue count cause an
out-of-bounds array write into the kernel heap?

[ ... ]
> @@ -5088,8 +5237,12 @@ static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt)
>  		return;
>  	}
>  
> -	ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_QUERY);
> -	wake_up(&vhost->work_wait_q);
> +	if (vhost->do_scsi_login) {
> +		ibmvfc_fabric_login_scsi(vhost);
> +	} else {
> +		ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_QUERY);
> +		wake_up(&vhost->work_wait_q);
> +	}
>  }

[Severity: High]
Does this bypass NVMe fabric login entirely if SCSI fabric login is not
required?

If a VIOS advertises IBMVFC_SUPPORT_NVMEOF but not IBMVFC_SUPPORT_SCSI,
vhost->do_scsi_login will be false while vhost->do_nvme_login is true.
Because vhost->do_scsi_login is false, execution falls into the else block
and advances the state to QUERY.

Since NVMe fabric login is only chained sequentially from the SCSI login
completion handler, does this prevent NVMe target discovery entirely on
NVMe-only setups?

[ ... ]
> @@ -5280,8 +5442,14 @@ static void ibmvfc_npiv_login_done(struct ibmvfc_event *evt)
>  	vhost->host->can_queue = be32_to_cpu(rsp->max_cmds) - IBMVFC_NUM_INTERNAL_REQ;
>  	vhost->host->max_sectors = npiv_max_sectors;
>  
> -	if (ibmvfc_check_caps(vhost, IBMVFC_CAN_SUPPORT_CHANNELS) && vhost->do_enquiry) {
> -		ibmvfc_channel_enquiry(vhost);
> +
> +	if (ibmvfc_check_caps(vhost, IBMVFC_CAN_SUPPORT_CHANNELS)) {
> +		if (ibmvfc_check_caps(vhost, IBMVFC_SUPPORT_SCSI))
> +			vhost->do_scsi_login = 1;
> +		if (ibmvfc_check_caps(vhost, IBMVFC_SUPPORT_NVMEOF))
> +			vhost->do_nvme_login = 1;

[Severity: High]
Can stale do_scsi_login and do_nvme_login flags persist across NPIV logins
here?

The flags are conditionally set based on NPIV login capabilities, but they are
never cleared before checking. If a VIOS drops these capabilities in a
subsequent NPIV login (for example, following a partition migration to a VIOS
with different support), the flags might erroneously remain set to 1.

Could this cause the driver to issue unsupported IBMVFC_FABRIC_LOGIN MADs to
the new VIOS?

> +		if (vhost->do_enquiry)
> +			ibmvfc_channel_enquiry(vhost);
>  	} else {
>  		vhost->do_enquiry = 0;
>  		ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_QUERY);

[Severity: High]
If vhost->do_enquiry is false, does the host initialization stall indefinitely?

The structural change placed the do_enquiry check inside the capabilities
check block. If IBMVFC_CAN_SUPPORT_CHANNELS is true but vhost->do_enquiry is
false (e.g. following a link flap), the driver sets the login flags and
completes execution of the if block.

Because it bypasses the else block, it skips setting IBMVFC_HOST_ACTION_QUERY
and waking the work queue. Does this leave the host state machine permanently
stalled in IBMVFC_HOST_ACTION_INIT_WAIT with no outstanding events to advance
it?

[Severity: High]
Since both protocols are now supported, do concurrent target discovery MADs
cause a data race?

Looking at ibmvfc_discover_targets(), it appears to issue both SCSI and NVMe
target discovery MADs to the VIOS concurrently:

ibmvfc_discover_targets()
    evt = ibmvfc_get_disc_event(&vhost->scsi_scrqs);
    ibmvfc_send_event(evt, vhost, default_timeout);

    evt = ibmvfc_get_disc_event(&vhost->nvme_scrqs);
    ibmvfc_send_event(evt, vhost, default_timeout);

When the first MAD completes, ibmvfc_discover_targets_done() immediately
advances the host action to ALLOC_TGTS and wakes the allocation worker
thread.

Does the worker thread iterate over both discovery buffers while the second
MAD is still in flight? Additionally, when the slower MAD completes, its
request to set the host action is ignored because the state is no longer
INIT_WAIT, potentially dropping the targets from the slower discovery
operation.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623013035.3436640-1-tyreld@linux.ibm.com?part=6

next prev parent reply	other threads:[~2026-06-23  1:50 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23  1:30 [PATCH 00/29] ibmvfc: Add NVMe-FC support Tyrel Datwyler
2026-06-23  1:30 ` [PATCH 01/29] ibmvfc: move target list from host to protocol specific channel groups Tyrel Datwyler
2026-06-23  1:49   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 02/29] ibmvfc: add NVMe/FC protocol interface definitions Tyrel Datwyler
2026-06-23  1:54   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 03/29] ibmvfc: split NVMe support into separate source file and add transport stubs Tyrel Datwyler
2026-06-23  1:50   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 04/29] ibmvfc: initialize NVMe channel configuration during driver probe Tyrel Datwyler
2026-06-23  1:51   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 05/29] ibmvfc: alloc/dealloc sub-queues for nvme channels Tyrel Datwyler
2026-06-23  1:55   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 06/29] ibmvfc: add logic for protocol specific fabric logins Tyrel Datwyler
2026-06-23  1:50   ` sashiko-bot [this message]
2026-06-23  1:30 ` [PATCH 07/29] ibmvfc: add wrapper to get vhost associated with a channel struct Tyrel Datwyler
2026-06-23  1:30 ` [PATCH 08/29] ibmvfc: add helper for creating protocol specific discovery event Tyrel Datwyler
2026-06-23  1:30 ` [PATCH 09/29] ibmvfc: add helper to check NVMe/FC support with active channels Tyrel Datwyler
2026-06-23  1:30 ` [PATCH 10/29] ibmvfc: allocate and free NVMe channel group discover buffer Tyrel Datwyler
2026-06-23  1:30 ` [PATCH 11/29] ibmvfc: send NVMe target discovery MAD Tyrel Datwyler
2026-06-23  1:52   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 12/29] ibmvfc: add NVMe/FC Implicit Logout and Move Login support Tyrel Datwyler
2026-06-23  1:49   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 13/29] ibmvfc: add NVMe/FC Port " Tyrel Datwyler
2026-06-23  1:53   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 14/29] ibmvfc: add NVMe/FC Process " Tyrel Datwyler
2026-06-23  1:52   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 15/29] ibmvfc: add NVMe/FC Query Target support Tyrel Datwyler
2026-06-23  1:52   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 16/29] ibmvfc: allocate targets based on protocol Tyrel Datwyler
2026-06-23  1:56   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 17/29] ibmvfc: delete NVMe/FC targets as well as SCSI Tyrel Datwyler
2026-06-23  1:51   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 18/29] ibmvfc: update state machine to process NVMe/FC targets Tyrel Datwyler
2026-06-23  1:55   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 19/29] ibmvfc: implement NVMe/FC stubs for local/remote port registration Tyrel Datwyler
2026-06-23  1:51   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 20/29] ibmvfc: register local nvme fc port after fabric login Tyrel Datwyler
2026-06-23  1:57   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 21/29] ibmvfc: process NVMe/FC rports in work thread Tyrel Datwyler
2026-06-23  2:00   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 22/29] ibmvfc: extend ibmvfc_debug visibility to ibmvfc-nvme.h Tyrel Datwyler
2026-06-23  1:51   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 23/29] ibmvfc: declare global function definitions Tyrel Datwyler
2026-06-23  2:04   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 24/29] ibmvfc: implement LLDD callbacks for mapping nvme-fc queues Tyrel Datwyler
2026-06-23  2:05   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 25/29] ibmvfc: implement nvme-fc LS submission transport callback Tyrel Datwyler
2026-06-23  2:08   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 26/29] ibmvfc: implement nvme-fc IO command submission callback Tyrel Datwyler
2026-06-23  2:09   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 27/29] ibmvfc: implement nvme-fc LS abort handling callback Tyrel Datwyler
2026-06-23  2:09   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 28/29] ibmvfc: implement nvme-fc FCP abort callback Tyrel Datwyler
2026-06-23  2:05   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 29/29] ibmvfc: fail nvme-fc fcp-io and ls requests during transport reset Tyrel Datwyler
2026-06-23  2:04   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623015056.4FC3D1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=tyreld@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.