* [PATCH v9 0/2] libsas: rediscover improvements for linkrate/sas_addr changes
@ 2026-06-24 6:32 Xingui Yang
2026-06-24 6:32 ` [PATCH v9 1/2] scsi: libsas: refactor sas_ex_to_ata() using new helper sas_ex_to_dev() Xingui Yang
2026-06-24 6:32 ` [PATCH v9 2/2] scsi: libsas: Add linkrate and sas_addr change detection in rediscover Xingui Yang
0 siblings, 2 replies; 5+ messages in thread
From: Xingui Yang @ 2026-06-24 6:32 UTC (permalink / raw)
To: john.g.garry, yanaijie, jejb, martin.petersen
Cc: linux-scsi, linux-kernel, linuxarm, liyihang9, yangxingui,
liuyonglong, kangfenglong
When a device attached to an expander phy experiences a linkrate change
(e.g., due to cable reconnection or negotiation), the current code in
sas_rediscover_dev() treats it as "broadcast flutter" and takes no action
if the SAS address and device type remain unchanged.
This series is based on John Garry's suggestion [1] to check the linkrate
and mark the device as gone and rediscover when flutter occurs, replacing
the previous v2 patch series that used lldd callbacks.
The previous v2 approach added lldd_dev_info_update callback which John
commented as "seem fragile and too specialized" [2]. This series adopts
a simpler approach that directly checks linkrate/sas_addr changes in
sas_rediscover_dev() and triggers rediscovery using libsas's standard
async discovery pattern.
This aligns with Jason Yan's earlier work [3] which was verified to
solve the linkrate change issue.
Additionally, per the discussion in v3 [4], the existing replace code
path also suffers from the same sysfs duplication issue:
sas_unregister_devs_sas_addr() only marks the device as gone, but the
actual sysfs cleanup happens later in sas_destruct_devices(). Calling
sas_discover_new() immediately after unregister causes sysfs_warn_dup()
errors. This series also optimizes the replace path to use the async
pattern, ensuring proper ordering for both flutter and replace cases.
Changes from v8:
- In sas_dev_is_flutter(), call sas_ex_phy_discover() before
sas_ex_to_dev() to ensure PHY state is always updated and to avoid
use-after-free since the child device pointer is obtained after the
sleeping SMP request completes, eliminating the need for kref
- Addressed Sashiko AI review [5][6] feedback on PHY discovery bypass
and TOCTOU concerns
Changes from v7:
Addressed issues identified by Sashiko AI review [5][6]:
- In sas_dev_is_flutter(), reorder sas_addr check before linkrate check
to ensure address restoration is not skipped when both change
simultaneously, preventing device leak
- In sas_ex_to_dev(), add defensive NULL check for ex_dev to guard
against callers passing a NULL device
Not addressed (pre-existing subsystem design):
- sas_find_dev_by_rphy() returns unreferenced pointer: subsystem-wide
pattern used by 10+ call sites, should be a separate patch
- ex_phy->port TOCTOU: discovery path is serialized by disco_mutex,
no race occurs in practice
Changes from v6:
- Add comment for restoring phy->attached_sas_addr to child_dev->sas_addr
- Optimize the conditional structure in sas_dev_is_flutter()
Changes from v5:
- In sas_addr change handling, restore phy->attached_sas_addr to
child_dev->sas_addr before returning false, ensuring
sas_unregister_devs_sas_addr() can properly match the device via
sas_phy_match_dev_addr() for correct device unregistration
Changes from v4:
- Rename sas_rediscover_phy to sas_rediscover_ex_phy for consistency
with expander phy symbol naming convention
- Rename sas_is_flutter to sas_dev_is_flutter per John's suggestion
- Check return value of sas_ex_phy_discover() for errors
- Factor out child_dev checks to improve code clarity
Changes from v3:
- Also optimize the replace code path to use async discovery pattern
- Introduce sas_is_flutter() and sas_rediscover_phy() helpers
to encapsulate the flutter handling logic and avoid function bloat
- Fix replace code path sysfs duplication issue
Changes from v2:
- Drop lldd_dev_info_update callback approach per John Garry's suggestion
- Drop hisi_sas specific changes (no longer needed without callback)
- Use libsas's async discovery pattern for rediscovery
- Add sas_addr change detection alongside linkrate change
Changes from v1:
- Split into three patches
[1] https://lore.kernel.org/linux-scsi/c4e4c99f-a13c-4e28-8650-48be1f96d7cf@oracle.com/
[2] https://lore.kernel.org/linux-scsi/28bd9d5b-f597-0aae-5340-bd951b2083aa@huawei.com/
[3] https://lore.kernel.org/linux-scsi/20190130082412.9357-6-yanaijie@huawei.com/
[4] https://lore.kernel.org/linux-scsi/b99cd59f-b986-432e-aaf1-3b757e1c4c34@oracle.com/
[5] https://lore.kernel.org/linux-scsi/20260611062530.3B6651F00898@smtp.kernel.org/
[6] https://lore.kernel.org/linux-scsi/20260611062833.357031F00893@smtp.kernel.org/
Xingui Yang (2):
scsi: libsas: refactor sas_ex_to_ata() using new helper
sas_ex_to_dev()
scsi: libsas: Add linkrate and sas_addr change detection in rediscover
drivers/scsi/libsas/sas_expander.c | 102 +++++++++++++++++++++++------
drivers/scsi/libsas/sas_internal.h | 1 +
2 files changed, 84 insertions(+), 19 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH v9 1/2] scsi: libsas: refactor sas_ex_to_ata() using new helper sas_ex_to_dev() 2026-06-24 6:32 [PATCH v9 0/2] libsas: rediscover improvements for linkrate/sas_addr changes Xingui Yang @ 2026-06-24 6:32 ` Xingui Yang 2026-06-24 6:47 ` sashiko-bot 2026-06-24 6:32 ` [PATCH v9 2/2] scsi: libsas: Add linkrate and sas_addr change detection in rediscover Xingui Yang 1 sibling, 1 reply; 5+ messages in thread From: Xingui Yang @ 2026-06-24 6:32 UTC (permalink / raw) To: john.g.garry, yanaijie, jejb, martin.petersen Cc: linux-scsi, linux-kernel, linuxarm, liyihang9, yangxingui, liuyonglong, kangfenglong Introduce sas_ex_to_dev() to return any device type attached to an expander phy. The new helper is then used by sas_ex_to_ata() to reduce code duplication. Also add a defensive NULL check for ex_dev to guard against callers passing a NULL device. Signed-off-by: Xingui Yang <yangxingui@huawei.com> --- drivers/scsi/libsas/sas_expander.c | 19 ++++++++++++++----- drivers/scsi/libsas/sas_internal.h | 1 + 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index f471ab464a78..fc6d8f3c9dca 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -345,13 +345,16 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, SAS_ADDR(phy->attached_sas_addr), type); } -/* check if we have an existing attached ata device on this expander phy */ -struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id) +/* return the domain device attached to an expander phy */ +struct domain_device *sas_ex_to_dev(struct domain_device *ex_dev, int phy_id) { - struct ex_phy *ex_phy = &ex_dev->ex_dev.ex_phy[phy_id]; - struct domain_device *dev; + struct ex_phy *ex_phy; struct sas_rphy *rphy; + if (!ex_dev) + return NULL; + + ex_phy = &ex_dev->ex_dev.ex_phy[phy_id]; if (!ex_phy->port) return NULL; @@ -359,7 +362,13 @@ struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id) if (!rphy) return NULL; - dev = sas_find_dev_by_rphy(rphy); + return sas_find_dev_by_rphy(rphy); +} + +/* check if we have an existing attached ata device on this expander phy */ +struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id) +{ + struct domain_device *dev = sas_ex_to_dev(ex_dev, phy_id); if (dev && dev_is_sata(dev)) return dev; diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h index 7dce0f587149..350a70484bde 100644 --- a/drivers/scsi/libsas/sas_internal.h +++ b/drivers/scsi/libsas/sas_internal.h @@ -91,6 +91,7 @@ int sas_smp_get_phy_events(struct sas_phy *phy); void sas_device_set_phy(struct domain_device *dev, struct sas_port *port); struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy); +struct domain_device *sas_ex_to_dev(struct domain_device *ex_dev, int phy_id); struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id); int sas_ex_phy_discover(struct domain_device *dev, int single); int sas_get_report_phy_sata(struct domain_device *dev, int phy_id, -- 2.43.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v9 1/2] scsi: libsas: refactor sas_ex_to_ata() using new helper sas_ex_to_dev() 2026-06-24 6:32 ` [PATCH v9 1/2] scsi: libsas: refactor sas_ex_to_ata() using new helper sas_ex_to_dev() Xingui Yang @ 2026-06-24 6:47 ` sashiko-bot 0 siblings, 0 replies; 5+ messages in thread From: sashiko-bot @ 2026-06-24 6:47 UTC (permalink / raw) To: Xingui Yang; +Cc: linux-scsi Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider: Pre-existing issues: - [High] Incomplete defensive NULL check in `sas_ex_to_dev()` defers a kernel panic to `transport_sas_phy_reset()`. - [High] Race condition in `sas_ex_to_dev()` due to unlocked access to `ex_phy->port`. - [High] Returning an un-refcounted `domain_device` pointer exposes callers to a Use-After-Free. -- commit cd70af1a5d33d82f2c6199f5fc26b59a70b7d45c Author: Xingui Yang <yangxingui@huawei.com> scsi: libsas: refactor sas_ex_to_ata() using new helper sas_ex_to_dev() Introduce sas_ex_to_dev() to return any device type attached to an expander phy. The new helper is then used by sas_ex_to_ata() to reduce code duplication. Also add a defensive NULL check for ex_dev to guard against callers passing a NULL device. > diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c > index f471ab464a78b..fc6d8f3c9dcae 100644 > --- a/drivers/scsi/libsas/sas_expander.c > +++ b/drivers/scsi/libsas/sas_expander.c > @@ -345,13 +345,16 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, > SAS_ADDR(phy->attached_sas_addr), type); > } > > -/* check if we have an existing attached ata device on this expander phy */ > -struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id) > +/* return the domain device attached to an expander phy */ > +struct domain_device *sas_ex_to_dev(struct domain_device *ex_dev, int phy_id) > { > - struct ex_phy *ex_phy = &ex_dev->ex_dev.ex_phy[phy_id]; > - struct domain_device *dev; > + struct ex_phy *ex_phy; > struct sas_rphy *rphy; > > + if (!ex_dev) > + return NULL; [Severity: High] This is a pre-existing issue, but does adding this defensive NULL check here fail to protect the caller from the same NULL pointer? When sas_ex_to_ata() safely returns NULL because ex_dev is NULL, could the caller logic fall through and crash by deferring the NULL pointer dereference? For example, in drivers/scsi/libsas/sas_init.c:transport_sas_phy_reset(): ddev = sas_ex_to_ata(ex_dev, phy->number); if (ddev) { ... } else { return sas_smp_phy_control(ddev, phy->number, reset_type, NULL); } Would this result in a kernel panic inside sas_smp_phy_control() if ddev is NULL (e.g., due to concurrent device removal)? > + > + ex_phy = &ex_dev->ex_dev.ex_phy[phy_id]; > if (!ex_phy->port) > return NULL; > > rphy = ex_phy->port->rphy; [Severity: High] This is a pre-existing issue, but is there a race condition here due to unlocked access to ex_phy->port? Device teardown paths can call sas_port_delete(phy->port) and set phy->port to NULL concurrently. Since we access ex_phy->port locklessly and without READ_ONCE(), could the port be freed between the check and the dereference of ex_phy->port->rphy, leading to a panic or use-after-free? > if (!rphy) > return NULL; > > - dev = sas_find_dev_by_rphy(rphy); > + return sas_find_dev_by_rphy(rphy); [Severity: High] This is a pre-existing issue, but does returning the domain_device pointer without incrementing its reference count expose callers to a use-after-free? Looking at drivers/scsi/libsas/sas_scsi_host.c:sas_find_dev_by_rphy(): spin_lock_irqsave(&ha->phy_port_lock, flags); ... found_dev = dev; ... spin_unlock_irqrestore(&ha->phy_port_lock, flags); return found_dev; If a device is concurrently unregistered, sas_put_device() can drop its kref to 0 and call kfree(dev). When sas_ex_to_dev() passes this un-refcounted pointer up to callers, won't they read freed memory when attempting to interact with the device? > +} [ ... ] -- Sashiko AI review · https://sashiko.dev/#/patchset/20260624063230.3264029-1-yangxingui@huawei.com?part=1 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v9 2/2] scsi: libsas: Add linkrate and sas_addr change detection in rediscover 2026-06-24 6:32 [PATCH v9 0/2] libsas: rediscover improvements for linkrate/sas_addr changes Xingui Yang 2026-06-24 6:32 ` [PATCH v9 1/2] scsi: libsas: refactor sas_ex_to_ata() using new helper sas_ex_to_dev() Xingui Yang @ 2026-06-24 6:32 ` Xingui Yang 2026-06-24 6:53 ` sashiko-bot 1 sibling, 1 reply; 5+ messages in thread From: Xingui Yang @ 2026-06-24 6:32 UTC (permalink / raw) To: john.g.garry, yanaijie, jejb, martin.petersen Cc: linux-scsi, linux-kernel, linuxarm, liyihang9, yangxingui, liuyonglong, kangfenglong Introduce sas_dev_is_flutter() and sas_rediscover_ex_phy() to improve flutter and device replace detection during rediscovery. sas_dev_is_flutter() calls sas_ex_phy_discover() before looking up the child device, which ensures the PHY state is always updated and avoids a use-after-free since the child device pointer is obtained after the sleeping SMP request completes. It adds validation for linkrate and sas_addr changes. When the SAS address changes, it restores phy->attached_sas_addr back to the original address before returning false, ensuring sas_unregister_devs_sas_addr() can properly match and unregister the old device via sas_phy_match_dev_addr(). The sas_addr check is ordered before the linkrate check to ensure the address restoration is not skipped when both change simultaneously. sas_rediscover_ex_phy() uses the async discovery pattern (sas_discover_event) instead of the synchronous sas_discover_new() to ensure proper ordering between device unregistration and rediscovery, avoiding sysfs_warn_dup() errors. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Suggested-by: John Garry <john.g.garry@oracle.com> --- drivers/scsi/libsas/sas_expander.c | 83 +++++++++++++++++++++++++----- 1 file changed, 69 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index fc6d8f3c9dca..e27953de2b4e 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -1967,6 +1967,72 @@ static bool dev_type_flutter(enum sas_device_type new, enum sas_device_type old) return false; } +static void sas_rediscover_ex_phy(struct domain_device *dev, int phy_id, + bool last) +{ + struct expander_device *ex = &dev->ex_dev; + struct ex_phy *phy = &ex->ex_phy[phy_id]; + + phy->phy_change_count = -1; + ex->ex_change_count = -1; + sas_unregister_devs_sas_addr(dev, phy_id, last); + sas_discover_event(dev->port, DISCE_REVALIDATE_DOMAIN); +} + +static bool sas_dev_is_flutter(struct domain_device *dev, int phy_id, + u8 *sas_addr, enum sas_device_type type) +{ + struct expander_device *ex = &dev->ex_dev; + struct ex_phy *phy = &ex->ex_phy[phy_id]; + struct domain_device *child_dev; + char *action = ""; + int res; + + if (SAS_ADDR(sas_addr) != SAS_ADDR(phy->attached_sas_addr) || + !dev_type_flutter(type, phy->attached_dev_type)) + return false; + + res = sas_ex_phy_discover(dev, phy_id); + if (res) + return false; + + child_dev = sas_ex_to_dev(dev, phy_id); + if (!child_dev) + goto out; + + if (dev_is_sata(child_dev) && + phy->attached_dev_type == SAS_SATA_PENDING) { + action = ", needs recovery"; + goto out; + } + + if (SAS_ADDR(child_dev->sas_addr) != SAS_ADDR(phy->attached_sas_addr)) { + pr_info("ex %016llx phy%02d sas_addr changed from %016llx to %016llx\n", + SAS_ADDR(dev->sas_addr), phy_id, + SAS_ADDR(child_dev->sas_addr), + SAS_ADDR(phy->attached_sas_addr)); + /* + * Device unregistering relies on address matching. Restore + * attached_sas_addr back to the original address so that the old + * device can be unregistered later + */ + memcpy(phy->attached_sas_addr, child_dev->sas_addr, SAS_ADDR_SIZE); + return false; + } + + if (child_dev->linkrate != phy->linkrate) { + pr_info("ex %016llx phy%02d linkrate changed from %d to %d\n", + SAS_ADDR(dev->sas_addr), phy_id, + child_dev->linkrate, phy->linkrate); + return false; + } + +out: + pr_debug("ex %016llx phy%02d broadcast flutter%s\n", + SAS_ADDR(dev->sas_addr), phy_id, action); + return true; +} + static int sas_rediscover_dev(struct domain_device *dev, int phy_id, bool last, int sibling) { @@ -2020,27 +2086,16 @@ static int sas_rediscover_dev(struct domain_device *dev, int phy_id, if (res == 0) sas_set_ex_phy(dev, phy_id, disc_resp); goto out_free_resp; - } else if (SAS_ADDR(sas_addr) == SAS_ADDR(phy->attached_sas_addr) && - dev_type_flutter(type, phy->attached_dev_type)) { - struct domain_device *ata_dev = sas_ex_to_ata(dev, phy_id); - char *action = ""; - - sas_ex_phy_discover(dev, phy_id); + } - if (ata_dev && phy->attached_dev_type == SAS_SATA_PENDING) - action = ", needs recovery"; - pr_debug("ex %016llx phy%02d broadcast flutter%s\n", - SAS_ADDR(dev->sas_addr), phy_id, action); + if (sas_dev_is_flutter(dev, phy_id, sas_addr, type)) goto out_free_resp; - } /* we always have to delete the old device when we went here */ pr_info("ex %016llx phy%02d replace %016llx\n", SAS_ADDR(dev->sas_addr), phy_id, SAS_ADDR(phy->attached_sas_addr)); - sas_unregister_devs_sas_addr(dev, phy_id, last); - - res = sas_discover_new(dev, phy_id); + sas_rediscover_ex_phy(dev, phy_id, last); out_free_resp: kfree(disc_resp); return res; -- 2.43.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v9 2/2] scsi: libsas: Add linkrate and sas_addr change detection in rediscover 2026-06-24 6:32 ` [PATCH v9 2/2] scsi: libsas: Add linkrate and sas_addr change detection in rediscover Xingui Yang @ 2026-06-24 6:53 ` sashiko-bot 0 siblings, 0 replies; 5+ messages in thread From: sashiko-bot @ 2026-06-24 6:53 UTC (permalink / raw) To: Xingui Yang; +Cc: linux-scsi Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: Pre-existing issues: - [High] The patch attempts to fix a Use-After-Free by moving `sas_ex_phy_discover()` before the `child_dev` lookup, but `sas_ex_to_dev()` still returns an unprotected, unreferenced pointer, leaving a lockless TOCTOU / UAF race. -- commit c64b715b8d5d34842e349c246a7de0dc1188f077 Author: Xingui Yang <yangxingui@huawei.com> scsi: libsas: Add linkrate and sas_addr change detection in rediscover Introduce sas_dev_is_flutter() and sas_rediscover_ex_phy() to improve flutter and device replace detection during rediscovery. > diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c > index fc6d8f3c9dcae..e27953de2b4e2 100644 > --- a/drivers/scsi/libsas/sas_expander.c > +++ b/drivers/scsi/libsas/sas_expander.c [ ... ] > +static bool sas_dev_is_flutter(struct domain_device *dev, int phy_id, > + u8 *sas_addr, enum sas_device_type type) > +{ > + struct expander_device *ex = &dev->ex_dev; > + struct ex_phy *phy = &ex->ex_phy[phy_id]; > + struct domain_device *child_dev; > + char *action = ""; > + int res; > + > + if (SAS_ADDR(sas_addr) != SAS_ADDR(phy->attached_sas_addr) || > + !dev_type_flutter(type, phy->attached_dev_type)) > + return false; > + > + res = sas_ex_phy_discover(dev, phy_id); > + if (res) > + return false; > + > + child_dev = sas_ex_to_dev(dev, phy_id); > + if (!child_dev) > + goto out; > + > + if (dev_is_sata(child_dev) && > + phy->attached_dev_type == SAS_SATA_PENDING) { [Severity: High] This is a pre-existing issue, but does sas_ex_to_dev() still leave us open to a use-after-free race here? It looks like sas_ex_to_dev() calls sas_find_dev_by_rphy() and returns the unreferenced device pointer without incrementing its reference count via kref_get(). If a hardware link drop event occurs concurrently, the event workqueue could trigger sas_deform_port(). Since sas_deform_port() doesn't acquire the disco_mutex before calling sas_unregister_domain_devices() and sas_destruct_devices(), it can tear down the device and free its memory. Because sas_deform_port() only flushes the disco_q after tearing down the devices, could child_dev be freed by the event workqueue before we dereference it here in dev_is_sata() or in the sas_addr check below? > + action = ", needs recovery"; > + goto out; > + } > + > + if (SAS_ADDR(child_dev->sas_addr) != SAS_ADDR(phy->attached_sas_addr)) { > + pr_info("ex %016llx phy%02d sas_addr changed from %016llx to %016llx\n", > + SAS_ADDR(dev->sas_addr), phy_id, > + SAS_ADDR(child_dev->sas_addr), > + SAS_ADDR(phy->attached_sas_addr)); -- Sashiko AI review · https://sashiko.dev/#/patchset/20260624063230.3264029-1-yangxingui@huawei.com?part=2 ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-24 6:53 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-24 6:32 [PATCH v9 0/2] libsas: rediscover improvements for linkrate/sas_addr changes Xingui Yang 2026-06-24 6:32 ` [PATCH v9 1/2] scsi: libsas: refactor sas_ex_to_ata() using new helper sas_ex_to_dev() Xingui Yang 2026-06-24 6:47 ` sashiko-bot 2026-06-24 6:32 ` [PATCH v9 2/2] scsi: libsas: Add linkrate and sas_addr change detection in rediscover Xingui Yang 2026-06-24 6:53 ` sashiko-bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox