* [PATCH v5 0/3] Fix the failure of adding phy with zero-address to port
@ 2023-12-04 12:29 Xingui Yang
2023-12-04 12:29 ` [PATCH v5 1/3] scsi: libsas: Add helper for port add ex_phy Xingui Yang
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Xingui Yang @ 2023-12-04 12:29 UTC (permalink / raw)
To: john.g.garry, yanaijie, jejb, martin.petersen, damien.lemoal
Cc: linux-scsi, linux-kernel, linuxarm, prime.zeng, chenxiang66,
kangfenglong
This series is to solve the problem of a BUG() when adding phy with zero
address to a new port.
v4 -> v5
1. Add new helper sas_port_add_ex_phy() based on John's suggestion.
2. Move sas_add_parent_port() to sas_expander.c based on John's suggestion.
3. Update the comments.
v3 -> v4:
1. Update patch title and comments based on John's suggestion.
v2 -> v3:
1. Set ex_dev->parent_port to NULL when the number of PHYs of the parent
port becomes 0.
2. Update the comments.
v1 -> v2:
1. Set ex_phy->port with parent_port when ex_phy is added to the parent
port.
2. Set ex_phy to NULL when free expander.
3. Update the comments.
Xingui Yang (3):
scsi: libsas: Add helper for port add ex_phy
scsi: libsas: Move sas_add_parent_port() to sas_expander.c
scsi: libsas: Fix the failure of adding phy with zero-address to port
drivers/scsi/libsas/sas_expander.c | 34 ++++++++++++++++++++++++------
drivers/scsi/libsas/sas_internal.h | 15 -------------
2 files changed, 28 insertions(+), 21 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH v5 1/3] scsi: libsas: Add helper for port add ex_phy 2023-12-04 12:29 [PATCH v5 0/3] Fix the failure of adding phy with zero-address to port Xingui Yang @ 2023-12-04 12:29 ` Xingui Yang 2023-12-04 13:08 ` John Garry 2023-12-04 12:29 ` [PATCH v5 2/3] scsi: libsas: Move sas_add_parent_port() to sas_expander.c Xingui Yang 2023-12-04 12:29 ` [PATCH v5 3/3] scsi: libsas: Fix the failure of adding phy with zero-address to port Xingui Yang 2 siblings, 1 reply; 10+ messages in thread From: Xingui Yang @ 2023-12-04 12:29 UTC (permalink / raw) To: john.g.garry, yanaijie, jejb, martin.petersen, damien.lemoal Cc: linux-scsi, linux-kernel, linuxarm, prime.zeng, chenxiang66, kangfenglong This moves the process of adding ex_phy to a port into a new helper. Signed-off-by: Xingui Yang <yangxingui@huawei.com> --- drivers/scsi/libsas/sas_expander.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index a2204674b680..1257f90130fb 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -26,6 +26,13 @@ static int sas_configure_phy(struct domain_device *dev, int phy_id, u8 *sas_addr, int include); static int sas_disable_routing(struct domain_device *dev, u8 *sas_addr); +static void sas_port_add_ex_phy(struct sas_port *port, struct ex_phy *ex_phy) +{ + sas_port_add_phy(port, ex_phy->phy); + ex_phy->port = port; + ex_phy->phy_state = PHY_DEVICE_DISCOVERED; +} + /* ---------- SMP task management ---------- */ /* Give it some long enough timeout. In seconds. */ @@ -857,9 +864,7 @@ static bool sas_ex_join_wide_port(struct domain_device *parent, int phy_id) if (!memcmp(phy->attached_sas_addr, ephy->attached_sas_addr, SAS_ADDR_SIZE) && ephy->port) { - sas_port_add_phy(ephy->port, phy->phy); - phy->port = ephy->port; - phy->phy_state = PHY_DEVICE_DISCOVERED; + sas_port_add_ex_phy(ephy->port, phy); return true; } } -- 2.17.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v5 1/3] scsi: libsas: Add helper for port add ex_phy 2023-12-04 12:29 ` [PATCH v5 1/3] scsi: libsas: Add helper for port add ex_phy Xingui Yang @ 2023-12-04 13:08 ` John Garry 0 siblings, 0 replies; 10+ messages in thread From: John Garry @ 2023-12-04 13:08 UTC (permalink / raw) To: Xingui Yang, yanaijie, jejb, martin.petersen, damien.lemoal Cc: linux-scsi, linux-kernel, linuxarm, prime.zeng, chenxiang66, kangfenglong On 04/12/2023 12:29, Xingui Yang wrote: > This moves the process of adding ex_phy to a port into a new helper. > > Signed-off-by: Xingui Yang<yangxingui@huawei.com> > --- Reviewed-by: John Garry <john.g.garry@oracle.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v5 2/3] scsi: libsas: Move sas_add_parent_port() to sas_expander.c 2023-12-04 12:29 [PATCH v5 0/3] Fix the failure of adding phy with zero-address to port Xingui Yang 2023-12-04 12:29 ` [PATCH v5 1/3] scsi: libsas: Add helper for port add ex_phy Xingui Yang @ 2023-12-04 12:29 ` Xingui Yang 2023-12-04 13:17 ` John Garry 2023-12-04 12:29 ` [PATCH v5 3/3] scsi: libsas: Fix the failure of adding phy with zero-address to port Xingui Yang 2 siblings, 1 reply; 10+ messages in thread From: Xingui Yang @ 2023-12-04 12:29 UTC (permalink / raw) To: john.g.garry, yanaijie, jejb, martin.petersen, damien.lemoal Cc: linux-scsi, linux-kernel, linuxarm, prime.zeng, chenxiang66, kangfenglong Move sas_add_parent_port() to sas_expander.c since it is only used in this file. Signed-off-by: Xingui Yang <yangxingui@huawei.com> --- drivers/scsi/libsas/sas_expander.c | 15 +++++++++++++++ drivers/scsi/libsas/sas_internal.h | 15 --------------- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index 1257f90130fb..7aa968b85e1e 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -33,6 +33,21 @@ static void sas_port_add_ex_phy(struct sas_port *port, struct ex_phy *ex_phy) ex_phy->phy_state = PHY_DEVICE_DISCOVERED; } +static void sas_add_parent_port(struct domain_device *dev, int phy_id) +{ + struct expander_device *ex = &dev->ex_dev; + struct ex_phy *ex_phy = &ex->ex_phy[phy_id]; + + if (!ex->parent_port) { + ex->parent_port = sas_port_alloc(&dev->rphy->dev, phy_id); + /* FIXME: error handling */ + BUG_ON(!ex->parent_port); + BUG_ON(sas_port_add(ex->parent_port)); + sas_port_mark_backlink(ex->parent_port); + } + sas_port_add_phy(ex->parent_port, ex_phy->phy); +} + /* ---------- SMP task management ---------- */ /* Give it some long enough timeout. In seconds. */ diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h index 3804aef165ad..85948963fb97 100644 --- a/drivers/scsi/libsas/sas_internal.h +++ b/drivers/scsi/libsas/sas_internal.h @@ -189,21 +189,6 @@ static inline void sas_phy_set_target(struct asd_sas_phy *p, struct domain_devic } } -static inline void sas_add_parent_port(struct domain_device *dev, int phy_id) -{ - struct expander_device *ex = &dev->ex_dev; - struct ex_phy *ex_phy = &ex->ex_phy[phy_id]; - - if (!ex->parent_port) { - ex->parent_port = sas_port_alloc(&dev->rphy->dev, phy_id); - /* FIXME: error handling */ - BUG_ON(!ex->parent_port); - BUG_ON(sas_port_add(ex->parent_port)); - sas_port_mark_backlink(ex->parent_port); - } - sas_port_add_phy(ex->parent_port, ex_phy->phy); -} - static inline struct domain_device *sas_alloc_device(void) { struct domain_device *dev = kzalloc(sizeof(*dev), GFP_KERNEL); -- 2.17.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v5 2/3] scsi: libsas: Move sas_add_parent_port() to sas_expander.c 2023-12-04 12:29 ` [PATCH v5 2/3] scsi: libsas: Move sas_add_parent_port() to sas_expander.c Xingui Yang @ 2023-12-04 13:17 ` John Garry 0 siblings, 0 replies; 10+ messages in thread From: John Garry @ 2023-12-04 13:17 UTC (permalink / raw) To: Xingui Yang, yanaijie, jejb, martin.petersen, damien.lemoal Cc: linux-scsi, linux-kernel, linuxarm, prime.zeng, chenxiang66, kangfenglong On 04/12/2023 12:29, Xingui Yang wrote: > Move sas_add_parent_port() to sas_expander.c since it is only used in this > file. > > Signed-off-by: Xingui Yang <yangxingui@huawei.com> Ignoring some comments, below: Reviewed-by: John Garry <john.g.garry@oracle.com> > --- > drivers/scsi/libsas/sas_expander.c | 15 +++++++++++++++ > drivers/scsi/libsas/sas_internal.h | 15 --------------- > 2 files changed, 15 insertions(+), 15 deletions(-) > > diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c > index 1257f90130fb..7aa968b85e1e 100644 > --- a/drivers/scsi/libsas/sas_expander.c > +++ b/drivers/scsi/libsas/sas_expander.c > @@ -33,6 +33,21 @@ static void sas_port_add_ex_phy(struct sas_port *port, struct ex_phy *ex_phy) > ex_phy->phy_state = PHY_DEVICE_DISCOVERED; > } > > +static void sas_add_parent_port(struct domain_device *dev, int phy_id) Can we have "_ex" in the name, like sas_ex_add_parent_port() > +{ > + struct expander_device *ex = &dev->ex_dev; > + struct ex_phy *ex_phy = &ex->ex_phy[phy_id]; > + > + if (!ex->parent_port) { > + ex->parent_port = sas_port_alloc(&dev->rphy->dev, phy_id); > + /* FIXME: error handling */ > + BUG_ON(!ex->parent_port); > + BUG_ON(sas_port_add(ex->parent_port)); how about fixing these at some stage? > + sas_port_mark_backlink(ex->parent_port); > + } > + sas_port_add_phy(ex->parent_port, ex_phy->phy); > +} > + > /* ---------- SMP task management ---------- */ > > /* Give it some long enough timeout. In seconds. */ > diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h > index 3804aef165ad..85948963fb97 100644 > --- a/drivers/scsi/libsas/sas_internal.h > +++ b/drivers/scsi/libsas/sas_internal.h > @@ -189,21 +189,6 @@ static inline void sas_phy_set_target(struct asd_sas_phy *p, struct domain_devic > } > } > > -static inline void sas_add_parent_port(struct domain_device *dev, int phy_id) > -{ > - struct expander_device *ex = &dev->ex_dev; > - struct ex_phy *ex_phy = &ex->ex_phy[phy_id]; > - > - if (!ex->parent_port) { > - ex->parent_port = sas_port_alloc(&dev->rphy->dev, phy_id); > - /* FIXME: error handling */ > - BUG_ON(!ex->parent_port); > - BUG_ON(sas_port_add(ex->parent_port)); > - sas_port_mark_backlink(ex->parent_port); > - } > - sas_port_add_phy(ex->parent_port, ex_phy->phy); > -} > - > static inline struct domain_device *sas_alloc_device(void) > { > struct domain_device *dev = kzalloc(sizeof(*dev), GFP_KERNEL); ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v5 3/3] scsi: libsas: Fix the failure of adding phy with zero-address to port 2023-12-04 12:29 [PATCH v5 0/3] Fix the failure of adding phy with zero-address to port Xingui Yang 2023-12-04 12:29 ` [PATCH v5 1/3] scsi: libsas: Add helper for port add ex_phy Xingui Yang 2023-12-04 12:29 ` [PATCH v5 2/3] scsi: libsas: Move sas_add_parent_port() to sas_expander.c Xingui Yang @ 2023-12-04 12:29 ` Xingui Yang 2023-12-04 18:05 ` John Garry 2 siblings, 1 reply; 10+ messages in thread From: Xingui Yang @ 2023-12-04 12:29 UTC (permalink / raw) To: john.g.garry, yanaijie, jejb, martin.petersen, damien.lemoal Cc: linux-scsi, linux-kernel, linuxarm, prime.zeng, chenxiang66, kangfenglong When the expander device which attached many SATA disks is connected to the host, first disable and then enable the local phy. The following BUG() will be triggered with a small probability: [562240.051046] sas: phy19 part of wide port with phy16 [562240.051197] sas: ex 500e004aaaaaaa1f phy19:U:0 attached: 0000000000000000 (no device) [562240.051203] sas: done REVALIDATING DOMAIN on port 0, pid:435909, res 0x0 <...> [562240.062536] sas: ex 500e004aaaaaaa1f phy0 new device attached [562240.062616] sas: ex 500e004aaaaaaa1f phy00:U:5 attached: 0000000000000000 (stp) [562240.062680] port-7:7:0: trying to add phy phy-7:7:19 fails: it's already part of another port [562240.085064] ------------[ cut here ]------------ [562240.096612] kernel BUG at drivers/scsi/scsi_transport_sas.c:1083! [562240.109611] Internal error: Oops - BUG: 0 [#1] SMP [562240.343518] Process kworker/u256:3 (pid: 435909, stack limit = 0x0000000003bcbebf) [562240.421714] Workqueue: 0000:b4:02.0_disco_q sas_revalidate_domain [libsas] [562240.437173] pstate: 40c00009 (nZcv daif +PAN +UAO) [562240.450478] pc : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] [562240.465283] lr : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] [562240.479751] sp : ffff0000300cfa70 [562240.674822] Call trace: [562240.682709] sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] [562240.694013] sas_ex_get_linkrate.isra.5+0xcc/0x128 [libsas] [562240.704957] sas_ex_discover_end_dev+0xfc/0x538 [libsas] [562240.715508] sas_ex_discover_dev+0x3cc/0x4b8 [libsas] [562240.725634] sas_ex_discover_devices+0x9c/0x1a8 [libsas] [562240.735855] sas_ex_revalidate_domain+0x2f0/0x450 [libsas] [562240.746123] sas_revalidate_domain+0x158/0x160 [libsas] [562240.756014] process_one_work+0x1b4/0x448 [562240.764548] worker_thread+0x54/0x468 [562240.772562] kthread+0x134/0x138 [562240.779989] ret_from_fork+0x10/0x18 What causes this problem: 1. For phy19, when the phy is attached and added to the parent wide port, the path is: sas_rediscover() ->sas_discover_new() ->sas_ex_discover_devices() ->sas_ex_discover_dev() -> sas_add_parent_port() ex_phy->port was not set and when it is removed from parent wide port the path is: sas_rediscover() ->sas_unregister_devs_sas_addr() Then the sas address of phy19 becomes 0, and since ex_phy->port is NULL, phy19 was not removed from the parent wide port's phy_list. 2. For phy0, it is connected to a new sata device and the path is: sas_rediscover() ->sas_discover_new()->sas_ex_phy_discover() ->sas_ex_phy_discover_helper() ->sas_set_ex_phy() ->sas_ex_discover_devices() ->sas_ex_discover_dev() ->sas_ex_discover_end_dev() ->sas_port_alloc() // Create port-7:7:0 ->sas_ex_get_linkrate() ->sas_port_add_phy() The type of the newly connected device is stp, but the linkrate is 5 which less than 1.5G, then the sas address is set to 0 in sas_set_ex_phy(). Subsequently, a new port port-7:7:0 was created and tried to add phy19 with the same zero-address to this new port. However, phy19 still belongs to another port, then a BUG() was triggered in sas_ex_get_linkrate(). Fix the problem as follows: 1. Use sas_port_add_ex_phy() instead of sas_port_add_phy() when ex_phy is added to the parent port. 2. Set ex_dev->parent_port to NULL when the number of phy on the port becomes 0. 3. When phy->attached_dev_type != NO_DEVICE, do not set the zero address for phy->attached_sas_addr. Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") Fixes: 7d1d86518118 ("[SCSI] libsas: fix false positive 'device attached' conditions") Signed-off-by: Xingui Yang <yangxingui@huawei.com> --- drivers/scsi/libsas/sas_expander.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index 7aa968b85e1e..9152152d5e10 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -45,7 +45,7 @@ static void sas_add_parent_port(struct domain_device *dev, int phy_id) BUG_ON(sas_port_add(ex->parent_port)); sas_port_mark_backlink(ex->parent_port); } - sas_port_add_phy(ex->parent_port, ex_phy->phy); + sas_port_add_ex_phy(ex->parent_port, ex_phy); } /* ---------- SMP task management ---------- */ @@ -261,8 +261,7 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, /* help some expanders that fail to zero sas_address in the 'no * device' case */ - if (phy->attached_dev_type == SAS_PHY_UNUSED || - phy->linkrate < SAS_LINK_RATE_1_5_GBPS) + if (phy->attached_dev_type == SAS_PHY_UNUSED) memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE); else memcpy(phy->attached_sas_addr, dr->attached_sas_addr, SAS_ADDR_SIZE); @@ -1864,9 +1863,12 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent, if (phy->port) { sas_port_delete_phy(phy->port, phy->phy); sas_device_set_phy(found, phy->port); - if (phy->port->num_phys == 0) + if (phy->port->num_phys == 0) { list_add_tail(&phy->port->del_list, &parent->port->sas_port_del_list); + if (ex_dev->parent_port == phy->port) + ex_dev->parent_port = NULL; + } phy->port = NULL; } } -- 2.17.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v5 3/3] scsi: libsas: Fix the failure of adding phy with zero-address to port 2023-12-04 12:29 ` [PATCH v5 3/3] scsi: libsas: Fix the failure of adding phy with zero-address to port Xingui Yang @ 2023-12-04 18:05 ` John Garry 2023-12-05 13:22 ` yangxingui 0 siblings, 1 reply; 10+ messages in thread From: John Garry @ 2023-12-04 18:05 UTC (permalink / raw) To: Xingui Yang, yanaijie, jejb, martin.petersen, damien.lemoal Cc: linux-scsi, linux-kernel, linuxarm, prime.zeng, chenxiang66, kangfenglong On 04/12/2023 12:29, Xingui Yang wrote: > When the expander device which attached many SATA disks is connected to > the host, first disable and then enable the local phy. The following BUG() > will be triggered with a small probability: > > [562240.051046] sas: phy19 part of wide port with phy16 Please use code from latest kernel. This again seems to be the old comment format. > [562240.051197] sas: ex 500e004aaaaaaa1f phy19:U:0 attached: 0000000000000000 (no device) The log at 562240.051046 tells that phy19 formed a wideport with phy16, but then here we see that phy19 has attached SAS address 0. How did we form a wideport with a phy with sas address 0? Sorry if I asked this before, but I looked through the thread and it is not clear. > [562240.051203] sas: done REVALIDATING DOMAIN on port 0, pid:435909, res 0x0 > <...> > [562240.062536] sas: ex 500e004aaaaaaa1f phy0 new device attached > [562240.062616] sas: ex 500e004aaaaaaa1f phy00:U:5 attached: 0000000000000000 (stp) > [562240.062680] port-7:7:0: trying to add phy phy-7:7:19 fails: it's already part of another port > [562240.085064] ------------[ cut here ]------------ > [562240.096612] kernel BUG at drivers/scsi/scsi_transport_sas.c:1083! > [562240.109611] Internal error: Oops - BUG: 0 [#1] SMP > [562240.343518] Process kworker/u256:3 (pid: 435909, stack limit = 0x0000000003bcbebf) > [562240.421714] Workqueue: 0000:b4:02.0_disco_q sas_revalidate_domain [libsas] > [562240.437173] pstate: 40c00009 (nZcv daif +PAN +UAO) > [562240.450478] pc : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] > [562240.465283] lr : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] > [562240.479751] sp : ffff0000300cfa70 > [562240.674822] Call trace: > [562240.682709] sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] > [562240.694013] sas_ex_get_linkrate.isra.5+0xcc/0x128 [libsas] > [562240.704957] sas_ex_discover_end_dev+0xfc/0x538 [libsas] > [562240.715508] sas_ex_discover_dev+0x3cc/0x4b8 [libsas] > [562240.725634] sas_ex_discover_devices+0x9c/0x1a8 [libsas] > [562240.735855] sas_ex_revalidate_domain+0x2f0/0x450 [libsas] > [562240.746123] sas_revalidate_domain+0x158/0x160 [libsas] > [562240.756014] process_one_work+0x1b4/0x448 > [562240.764548] worker_thread+0x54/0x468 > [562240.772562] kthread+0x134/0x138 > [562240.779989] ret_from_fork+0x10/0x18 > > What causes this problem: > 1. For phy19, when the phy is attached and added to the parent wide port, > the path is: > sas_rediscover() > ->sas_discover_new() > ->sas_ex_discover_devices() > ->sas_ex_discover_dev() > -> sas_add_parent_port() > > ex_phy->port was not set and when it is removed from parent wide port the > path is: > sas_rediscover() > ->sas_unregister_devs_sas_addr() Sorry, but that is not a callpath. Maybe you condensed it. Please expand it. > > Then the sas address of phy19 becomes 0, and since ex_phy->port is NULL, > phy19 was not removed from the parent wide port's phy_list. > > 2. For phy0, it is connected to a new sata device and the path is: > sas_rediscover() > ->sas_discover_new()->sas_ex_phy_discover() > ->sas_ex_phy_discover_helper() > ->sas_set_ex_phy() > ->sas_ex_discover_devices() > ->sas_ex_discover_dev() > ->sas_ex_discover_end_dev() > ->sas_port_alloc() // Create port-7:7:0 > ->sas_ex_get_linkrate() > ->sas_port_add_phy() > > The type of the newly connected device is stp, but the linkrate is 5 which > less than 1.5G, then the sas address is set to 0 in sas_set_ex_phy(). I don't understand why we do anything when in this state. linkrate == 5 means phy reset in progress. Can we just bail out until the SATA phy is in a decent shape? I assume that when the SATA phy is in "up" state that we get a broadcast event and can re-evaluate. > Subsequently, a new port port-7:7:0 was created and tried to add phy19 with > the same zero-address to this new port. However, phy19 still belongs to > another port, then a BUG() was triggered in sas_ex_get_linkrate(). > > Fix the problem as follows: > 1. Use sas_port_add_ex_phy() instead of sas_port_add_phy() when ex_phy is > added to the parent port. this seems ok > > 2. Set ex_dev->parent_port to NULL when the number of phy on the port > becomes 0. > > 3. When phy->attached_dev_type != NO_DEVICE, do not set the zero address > for phy->attached_sas_addr. > > Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") > Fixes: 7d1d86518118 ("[SCSI] libsas: fix false positive 'device attached' conditions") > Signed-off-by: Xingui Yang <yangxingui@huawei.com> > --- > drivers/scsi/libsas/sas_expander.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c > index 7aa968b85e1e..9152152d5e10 100644 > --- a/drivers/scsi/libsas/sas_expander.c > +++ b/drivers/scsi/libsas/sas_expander.c > @@ -45,7 +45,7 @@ static void sas_add_parent_port(struct domain_device *dev, int phy_id) > BUG_ON(sas_port_add(ex->parent_port)); > sas_port_mark_backlink(ex->parent_port); > } > - sas_port_add_phy(ex->parent_port, ex_phy->phy); > + sas_port_add_ex_phy(ex->parent_port, ex_phy); > } > > /* ---------- SMP task management ---------- */ > @@ -261,8 +261,7 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, > /* help some expanders that fail to zero sas_address in the 'no > * device' case > */ Please pay attention to this comment. It seems that some expanders require us to explicitly zero the SAS address. > - if (phy->attached_dev_type == SAS_PHY_UNUSED || > - phy->linkrate < SAS_LINK_RATE_1_5_GBPS) > + if (phy->attached_dev_type == SAS_PHY_UNUSED) > memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE); > else > memcpy(phy->attached_sas_addr, dr->attached_sas_addr, SAS_ADDR_SIZE); > @@ -1864,9 +1863,12 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent, > if (phy->port) { > sas_port_delete_phy(phy->port, phy->phy); > sas_device_set_phy(found, phy->port); > - if (phy->port->num_phys == 0) > + if (phy->port->num_phys == 0) { > list_add_tail(&phy->port->del_list, > &parent->port->sas_port_del_list); > + if (ex_dev->parent_port == phy->port) > + ex_dev->parent_port = NULL; This does not feel like the right place to do this. So the port which we queue to free is the ex_dev->parent_port, right? BTW, do you know why it's called ex_dev->parent_port and not ex_dev->port? I find the name parent_port confusing... Thanks, John ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 3/3] scsi: libsas: Fix the failure of adding phy with zero-address to port 2023-12-04 18:05 ` John Garry @ 2023-12-05 13:22 ` yangxingui 2023-12-05 18:13 ` John Garry 0 siblings, 1 reply; 10+ messages in thread From: yangxingui @ 2023-12-05 13:22 UTC (permalink / raw) To: John Garry, yanaijie, jejb, martin.petersen, damien.lemoal Cc: linux-scsi, linux-kernel, linuxarm, prime.zeng, chenxiang66, kangfenglong Hi, John On 2023/12/5 2:05, John Garry wrote: > On 04/12/2023 12:29, Xingui Yang wrote: >> When the expander device which attached many SATA disks is connected to >> the host, first disable and then enable the local phy. The following >> BUG() >> will be triggered with a small probability: >> >> [562240.051046] sas: phy19 part of wide port with phy16 > > Please use code from latest kernel. This again seems to be the old > comment format. Ok. > >> [562240.051197] sas: ex 500e004aaaaaaa1f phy19:U:0 attached: >> 0000000000000000 (no device) > > The log at 562240.051046 tells that phy19 formed a wideport with phy16, > but then here we see that phy19 has attached SAS address 0. How did we > form a wideport with a phy with sas address 0? Sorry if I asked this > before, but I looked through the thread and it is not clear. Ok, the early address of phy19 is not 0, and forms a wide port with phy16. But now phy19 has been unregistered and the sas address of phy19 is set to 0. > >> [562240.051203] sas: done REVALIDATING DOMAIN on port 0, pid:435909, >> res 0x0 >> <...> >> [562240.062536] sas: ex 500e004aaaaaaa1f phy0 new device attached >> [562240.062616] sas: ex 500e004aaaaaaa1f phy00:U:5 attached: >> 0000000000000000 (stp) >> [562240.062680] port-7:7:0: trying to add phy phy-7:7:19 fails: it's >> already part of another port >> [562240.085064] ------------[ cut here ]------------ >> [562240.096612] kernel BUG at drivers/scsi/scsi_transport_sas.c:1083! >> [562240.109611] Internal error: Oops - BUG: 0 [#1] SMP >> [562240.343518] Process kworker/u256:3 (pid: 435909, stack limit = >> 0x0000000003bcbebf) >> [562240.421714] Workqueue: 0000:b4:02.0_disco_q sas_revalidate_domain >> [libsas] >> [562240.437173] pstate: 40c00009 (nZcv daif +PAN +UAO) >> [562240.450478] pc : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >> [562240.465283] lr : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >> [562240.479751] sp : ffff0000300cfa70 >> [562240.674822] Call trace: >> [562240.682709] sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >> [562240.694013] sas_ex_get_linkrate.isra.5+0xcc/0x128 [libsas] >> [562240.704957] sas_ex_discover_end_dev+0xfc/0x538 [libsas] >> [562240.715508] sas_ex_discover_dev+0x3cc/0x4b8 [libsas] >> [562240.725634] sas_ex_discover_devices+0x9c/0x1a8 [libsas] >> [562240.735855] sas_ex_revalidate_domain+0x2f0/0x450 [libsas] >> [562240.746123] sas_revalidate_domain+0x158/0x160 [libsas] >> [562240.756014] process_one_work+0x1b4/0x448 >> [562240.764548] worker_thread+0x54/0x468 >> [562240.772562] kthread+0x134/0x138 >> [562240.779989] ret_from_fork+0x10/0x18 >> >> What causes this problem: >> 1. For phy19, when the phy is attached and added to the parent wide port, >> the path is: >> sas_rediscover() >> ->sas_discover_new() >> ->sas_ex_discover_devices() >> ->sas_ex_discover_dev() >> -> sas_add_parent_port() >> >> ex_phy->port was not set and when it is removed from parent wide port the >> path is: >> sas_rediscover() >> ->sas_unregister_devs_sas_addr() > > > Sorry, but that is not a callpath. Maybe you condensed it. Please expand > it. Ok. > >> >> Then the sas address of phy19 becomes 0, and since ex_phy->port is NULL, >> phy19 was not removed from the parent wide port's phy_list. >> >> 2. For phy0, it is connected to a new sata device and the path is: >> sas_rediscover() >> ->sas_discover_new()->sas_ex_phy_discover() >> ->sas_ex_phy_discover_helper() >> ->sas_set_ex_phy() >> ->sas_ex_discover_devices() >> ->sas_ex_discover_dev() >> ->sas_ex_discover_end_dev() >> ->sas_port_alloc() // Create >> port-7:7:0 >> ->sas_ex_get_linkrate() >> ->sas_port_add_phy() >> >> The type of the newly connected device is stp, but the linkrate is 5 >> which >> less than 1.5G, then the sas address is set to 0 in sas_set_ex_phy(). > > I don't understand why we do anything when in this state. linkrate == 5 > means phy reset in progress. Can we just bail out until the SATA phy is > in a decent shape? I assume that when the SATA phy is in "up" state that > we get a broadcast event and can re-evaluate. You are saying that we use a method similar to SAS_SATA_SPINUP_HOLD? > >> Subsequently, a new port port-7:7:0 was created and tried to add phy19 >> with >> the same zero-address to this new port. However, phy19 still belongs to >> another port, then a BUG() was triggered in sas_ex_get_linkrate(). >> >> Fix the problem as follows: >> 1. Use sas_port_add_ex_phy() instead of sas_port_add_phy() when ex_phy is >> added to the parent port. > > this seems ok > >> >> 2. Set ex_dev->parent_port to NULL when the number of phy on the port >> becomes 0. >> >> 3. When phy->attached_dev_type != NO_DEVICE, do not set the zero address >> for phy->attached_sas_addr. >> >> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") >> Fixes: 7d1d86518118 ("[SCSI] libsas: fix false positive 'device >> attached' conditions") >> Signed-off-by: Xingui Yang <yangxingui@huawei.com> >> --- >> drivers/scsi/libsas/sas_expander.c | 10 ++++++---- >> 1 file changed, 6 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/scsi/libsas/sas_expander.c >> b/drivers/scsi/libsas/sas_expander.c >> index 7aa968b85e1e..9152152d5e10 100644 >> --- a/drivers/scsi/libsas/sas_expander.c >> +++ b/drivers/scsi/libsas/sas_expander.c >> @@ -45,7 +45,7 @@ static void sas_add_parent_port(struct domain_device >> *dev, int phy_id) >> BUG_ON(sas_port_add(ex->parent_port)); >> sas_port_mark_backlink(ex->parent_port); >> } >> - sas_port_add_phy(ex->parent_port, ex_phy->phy); >> + sas_port_add_ex_phy(ex->parent_port, ex_phy); >> } >> /* ---------- SMP task management ---------- */ >> @@ -261,8 +261,7 @@ static void sas_set_ex_phy(struct domain_device >> *dev, int phy_id, >> /* help some expanders that fail to zero sas_address in the 'no >> * device' case >> */ > > Please pay attention to this comment. It seems that some expanders > require us to explicitly zero the SAS address. Yes, we have reviewed this point, and its modification is for some expanders to report that the sas address isn't zero in the "no device" case. The current modification does not affect its original problem fix, we just removed its linkrate judgment. > >> - if (phy->attached_dev_type == SAS_PHY_UNUSED || >> - phy->linkrate < SAS_LINK_RATE_1_5_GBPS) >> + if (phy->attached_dev_type == SAS_PHY_UNUSED) >> memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE); >> else >> memcpy(phy->attached_sas_addr, dr->attached_sas_addr, >> SAS_ADDR_SIZE); >> @@ -1864,9 +1863,12 @@ static void sas_unregister_devs_sas_addr(struct >> domain_device *parent, >> if (phy->port) { >> sas_port_delete_phy(phy->port, phy->phy); >> sas_device_set_phy(found, phy->port); >> - if (phy->port->num_phys == 0) >> + if (phy->port->num_phys == 0) { >> list_add_tail(&phy->port->del_list, >> &parent->port->sas_port_del_list); >> + if (ex_dev->parent_port == phy->port) >> + ex_dev->parent_port = NULL; > > This does not feel like the right place to do this. So the port which we > queue to free is the ex_dev->parent_port, right? Yes, we found that if ex_dev->parent_port is not set to NULL, after the port is released, if there is a new ex_phy connection, use-after-free problems will occur. And the current branch is to determine whether the number of phys on the port is 0. I think it is more appropriate to set parent_port. Do you have any better suggestions? > > BTW, do you know why it's called ex_dev->parent_port and not > ex_dev->port? I find the name parent_port confusing... It is the port connected to the upper-level device, so named parent_port. Thanks, Xingui ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 3/3] scsi: libsas: Fix the failure of adding phy with zero-address to port 2023-12-05 13:22 ` yangxingui @ 2023-12-05 18:13 ` John Garry 2024-03-12 14:26 ` yangxingui 0 siblings, 1 reply; 10+ messages in thread From: John Garry @ 2023-12-05 18:13 UTC (permalink / raw) To: yangxingui, yanaijie, jejb, martin.petersen, damien.lemoal Cc: linux-scsi, linux-kernel, linuxarm, prime.zeng, chenxiang66, kangfenglong On 05/12/2023 13:22, yangxingui wrote: > > On 2023/12/5 2:05, John Garry wrote: >> On 04/12/2023 12:29, Xingui Yang wrote: >>> When the expander device which attached many SATA disks is connected to >>> the host, first disable and then enable the local phy. The following >>> BUG() >>> will be triggered with a small probability: >>> >>> [562240.051046] sas: phy19 part of wide port with phy16 >> >> Please use code from latest kernel. This again seems to be the old >> comment format. > Ok. >> >>> [562240.051197] sas: ex 500e004aaaaaaa1f phy19:U:0 attached: >>> 0000000000000000 (no device) >> >> The log at 562240.051046 tells that phy19 formed a wideport with >> phy16, but then here we see that phy19 has attached SAS address 0. How >> did we form a wideport with a phy with sas address 0? Sorry if I asked >> this before, but I looked through the thread and it is not clear. > Ok, the early address of phy19 is not 0, and forms a wide port with > phy16. But now phy19 has been unregistered and the sas address of phy19 > is set to 0. ok, so the old logs are simply misleading: "sas: phy19 part of wide port with phy16" implies that we have joined phy19 to a wideport with phy16. Indeed, my change to that vague print is more than 4.5 years old now - see commit a5b38d3159. Sorry to say, but that does not fill me full of confidence that the changes in this series are suitable for a mainline kernel. Please don't do that. Test against the very recent mainline kernel. > >> >>> [562240.051203] sas: done REVALIDATING DOMAIN on port 0, pid:435909, >>> res 0x0 >>> <...> >>> [562240.062536] sas: ex 500e004aaaaaaa1f phy0 new device attached >>> [562240.062616] sas: ex 500e004aaaaaaa1f phy00:U:5 attached: >>> 0000000000000000 (stp) >>> [562240.062680] port-7:7:0: trying to add phy phy-7:7:19 fails: it's >>> already part of another port >>> [562240.085064] ------------[ cut here ]------------ >>> [562240.096612] kernel BUG at drivers/scsi/scsi_transport_sas.c:1083! >>> [562240.109611] Internal error: Oops - BUG: 0 [#1] SMP >>> [562240.343518] Process kworker/u256:3 (pid: 435909, stack limit = >>> 0x0000000003bcbebf) >>> [562240.421714] Workqueue: 0000:b4:02.0_disco_q sas_revalidate_domain >>> [libsas] >>> [562240.437173] pstate: 40c00009 (nZcv daif +PAN +UAO) >>> [562240.450478] pc : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >>> [562240.465283] lr : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >>> [562240.479751] sp : ffff0000300cfa70 >>> [562240.674822] Call trace: >>> [562240.682709] sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >>> [562240.694013] sas_ex_get_linkrate.isra.5+0xcc/0x128 [libsas] >>> [562240.704957] sas_ex_discover_end_dev+0xfc/0x538 [libsas] >>> [562240.715508] sas_ex_discover_dev+0x3cc/0x4b8 [libsas] >>> [562240.725634] sas_ex_discover_devices+0x9c/0x1a8 [libsas] >>> [562240.735855] sas_ex_revalidate_domain+0x2f0/0x450 [libsas] >>> [562240.746123] sas_revalidate_domain+0x158/0x160 [libsas] >>> [562240.756014] process_one_work+0x1b4/0x448 >>> [562240.764548] worker_thread+0x54/0x468 >>> [562240.772562] kthread+0x134/0x138 >>> [562240.779989] ret_from_fork+0x10/0x18 >>> >>> What causes this problem: >>> 1. For phy19, when the phy is attached and added to the parent wide >>> port, >>> the path is: >>> sas_rediscover() >>> ->sas_discover_new() >>> ->sas_ex_discover_devices() >>> ->sas_ex_discover_dev() >>> -> sas_add_parent_port() >>> >>> ex_phy->port was not set and when it is removed from parent wide port >>> the >>> path is: >>> sas_rediscover() >>> ->sas_unregister_devs_sas_addr() >> >> >> Sorry, but that is not a callpath. Maybe you condensed it. Please >> expand it. > Ok. >> >>> >>> Then the sas address of phy19 becomes 0, and since ex_phy->port is NULL, >>> phy19 was not removed from the parent wide port's phy_list. >>> >>> 2. For phy0, it is connected to a new sata device and the path is: >>> sas_rediscover() >>> ->sas_discover_new()->sas_ex_phy_discover() >>> ->sas_ex_phy_discover_helper() >>> ->sas_set_ex_phy() >>> ->sas_ex_discover_devices() >>> ->sas_ex_discover_dev() >>> ->sas_ex_discover_end_dev() >>> ->sas_port_alloc() // Create >>> port-7:7:0 >>> ->sas_ex_get_linkrate() >>> ->sas_port_add_phy() >>> >>> The type of the newly connected device is stp, but the linkrate is 5 >>> which >>> less than 1.5G, then the sas address is set to 0 in sas_set_ex_phy(). >> >> I don't understand why we do anything when in this state. linkrate == >> 5 means phy reset in progress. Can we just bail out until the SATA phy >> is in a decent shape? I assume that when the SATA phy is in "up" state >> that we get a broadcast event and can re-evaluate. > You are saying that we use a method similar to SAS_SATA_SPINUP_HOLD? Maybe. Can we simply re-use SAS_SATA_SPINUP_HOLD handling? Is it suitable? >> >>> Subsequently, a new port port-7:7:0 was created and tried to add >>> phy19 with >>> the same zero-address to this new port. However, phy19 still belongs to >>> another port, then a BUG() was triggered in sas_ex_get_linkrate(). >>> >>> Fix the problem as follows: >>> 1. Use sas_port_add_ex_phy() instead of sas_port_add_phy() when >>> ex_phy is >>> added to the parent port. >> >> this seems ok >> >>> >>> 2. Set ex_dev->parent_port to NULL when the number of phy on the port >>> becomes 0. >>> >>> 3. When phy->attached_dev_type != NO_DEVICE, do not set the zero address >>> for phy->attached_sas_addr. >>> >>> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") >>> Fixes: 7d1d86518118 ("[SCSI] libsas: fix false positive 'device >>> attached' conditions") >>> Signed-off-by: Xingui Yang <yangxingui@huawei.com> >>> --- >>> drivers/scsi/libsas/sas_expander.c | 10 ++++++---- >>> 1 file changed, 6 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/scsi/libsas/sas_expander.c >>> b/drivers/scsi/libsas/sas_expander.c >>> index 7aa968b85e1e..9152152d5e10 100644 >>> --- a/drivers/scsi/libsas/sas_expander.c >>> +++ b/drivers/scsi/libsas/sas_expander.c >>> @@ -45,7 +45,7 @@ static void sas_add_parent_port(struct >>> domain_device *dev, int phy_id) >>> BUG_ON(sas_port_add(ex->parent_port)); >>> sas_port_mark_backlink(ex->parent_port); >>> } >>> - sas_port_add_phy(ex->parent_port, ex_phy->phy); >>> + sas_port_add_ex_phy(ex->parent_port, ex_phy); >>> } >>> /* ---------- SMP task management ---------- */ >>> @@ -261,8 +261,7 @@ static void sas_set_ex_phy(struct domain_device >>> *dev, int phy_id, >>> /* help some expanders that fail to zero sas_address in the 'no >>> * device' case >>> */ >> >> Please pay attention to this comment. It seems that some expanders >> require us to explicitly zero the SAS address. > Yes, we have reviewed this point, and its modification is for some > expanders to report that the sas address isn't zero in the "no device" > case. The current modification does not affect its original problem fix, > we just removed its linkrate judgment. ok >> >>> - if (phy->attached_dev_type == SAS_PHY_UNUSED || >>> - phy->linkrate < SAS_LINK_RATE_1_5_GBPS) >>> + if (phy->attached_dev_type == SAS_PHY_UNUSED) >>> memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE); >>> else >>> memcpy(phy->attached_sas_addr, dr->attached_sas_addr, >>> SAS_ADDR_SIZE); >>> @@ -1864,9 +1863,12 @@ static void >>> sas_unregister_devs_sas_addr(struct domain_device *parent, >>> if (phy->port) { >>> sas_port_delete_phy(phy->port, phy->phy); >>> sas_device_set_phy(found, phy->port); >>> - if (phy->port->num_phys == 0) >>> + if (phy->port->num_phys == 0) { >>> list_add_tail(&phy->port->del_list, >>> &parent->port->sas_port_del_list); >>> + if (ex_dev->parent_port == phy->port) >>> + ex_dev->parent_port = NULL; >> >> This does not feel like the right place to do this. So the port which >> we queue to free is the ex_dev->parent_port, right? > Yes, we found that if ex_dev->parent_port is not set to NULL, after the > port is released, if there is a new ex_phy connection, use-after-free Do you mean really a memory use-after-free, like which KASAN would report? > problems will occur. And the current branch is to determine whether the > number of phys on the port is 0. I think it is more appropriate to set > parent_port. Do you have any better suggestions? Let me check again... >> >> BTW, do you know why it's called ex_dev->parent_port and not >> ex_dev->port? I find the name parent_port confusing... > It is the port connected to the upper-level device, so named parent_port. But isn't this just the sas_port for the expander device attachment to the host and more closely associated to the expander itself? Thanks, John ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 3/3] scsi: libsas: Fix the failure of adding phy with zero-address to port 2023-12-05 18:13 ` John Garry @ 2024-03-12 14:26 ` yangxingui 0 siblings, 0 replies; 10+ messages in thread From: yangxingui @ 2024-03-12 14:26 UTC (permalink / raw) To: John Garry, yanaijie, jejb, martin.petersen, damien.lemoal Cc: linux-scsi, linux-kernel, linuxarm, prime.zeng, chenxiang66, kangfenglong Hi John, On 2023/12/6 2:13, John Garry wrote: > On 05/12/2023 13:22, yangxingui wrote: >> >> On 2023/12/5 2:05, John Garry wrote: >>> On 04/12/2023 12:29, Xingui Yang wrote: >>>> When the expander device which attached many SATA disks is connected to >>>> the host, first disable and then enable the local phy. The following >>>> BUG() >>>> will be triggered with a small probability: >>>> >>>> [562240.051046] sas: phy19 part of wide port with phy16 >>> >>> Please use code from latest kernel. This again seems to be the old >>> comment format. >> Ok. >>> >>>> [562240.051197] sas: ex 500e004aaaaaaa1f phy19:U:0 attached: >>>> 0000000000000000 (no device) >>> >>> The log at 562240.051046 tells that phy19 formed a wideport with >>> phy16, but then here we see that phy19 has attached SAS address 0. >>> How did we form a wideport with a phy with sas address 0? Sorry if I >>> asked this before, but I looked through the thread and it is not clear. >> Ok, the early address of phy19 is not 0, and forms a wide port with >> phy16. But now phy19 has been unregistered and the sas address of >> phy19 is set to 0. > > ok, so the old logs are simply misleading: "sas: phy19 part of wide port > with phy16" implies that we have joined phy19 to a wideport with phy16. > > Indeed, my change to that vague print is more than 4.5 years old now - > see commit a5b38d3159. > > Sorry to say, but that does not fill me full of confidence that the > changes in this series are suitable for a mainline kernel. Please don't > do that. Test against the very recent mainline kernel. This problem will occasionally occur in new versions. I updated a version and split the patch to make this problem easier to deal with. > >> >>> >>>> [562240.051203] sas: done REVALIDATING DOMAIN on port 0, pid:435909, >>>> res 0x0 >>>> <...> >>>> [562240.062536] sas: ex 500e004aaaaaaa1f phy0 new device attached >>>> [562240.062616] sas: ex 500e004aaaaaaa1f phy00:U:5 attached: >>>> 0000000000000000 (stp) >>>> [562240.062680] port-7:7:0: trying to add phy phy-7:7:19 fails: >>>> it's already part of another port >>>> [562240.085064] ------------[ cut here ]------------ >>>> [562240.096612] kernel BUG at drivers/scsi/scsi_transport_sas.c:1083! >>>> [562240.109611] Internal error: Oops - BUG: 0 [#1] SMP >>>> [562240.343518] Process kworker/u256:3 (pid: 435909, stack limit = >>>> 0x0000000003bcbebf) >>>> [562240.421714] Workqueue: 0000:b4:02.0_disco_q >>>> sas_revalidate_domain [libsas] >>>> [562240.437173] pstate: 40c00009 (nZcv daif +PAN +UAO) >>>> [562240.450478] pc : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >>>> [562240.465283] lr : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >>>> [562240.479751] sp : ffff0000300cfa70 >>>> [562240.674822] Call trace: >>>> [562240.682709] sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >>>> [562240.694013] sas_ex_get_linkrate.isra.5+0xcc/0x128 [libsas] >>>> [562240.704957] sas_ex_discover_end_dev+0xfc/0x538 [libsas] >>>> [562240.715508] sas_ex_discover_dev+0x3cc/0x4b8 [libsas] >>>> [562240.725634] sas_ex_discover_devices+0x9c/0x1a8 [libsas] >>>> [562240.735855] sas_ex_revalidate_domain+0x2f0/0x450 [libsas] >>>> [562240.746123] sas_revalidate_domain+0x158/0x160 [libsas] >>>> [562240.756014] process_one_work+0x1b4/0x448 >>>> [562240.764548] worker_thread+0x54/0x468 >>>> [562240.772562] kthread+0x134/0x138 >>>> [562240.779989] ret_from_fork+0x10/0x18 >>>> >>>> What causes this problem: >>>> 1. For phy19, when the phy is attached and added to the parent wide >>>> port, >>>> the path is: >>>> sas_rediscover() >>>> ->sas_discover_new() >>>> ->sas_ex_discover_devices() >>>> ->sas_ex_discover_dev() >>>> -> sas_add_parent_port() >>>> >>>> ex_phy->port was not set and when it is removed from parent wide >>>> port the >>>> path is: >>>> sas_rediscover() >>>> ->sas_unregister_devs_sas_addr() >>> >>> >>> Sorry, but that is not a callpath. Maybe you condensed it. Please >>> expand it. I updated a version and split the patch to make this problem easier to deal with. >> Ok. >>> >>>> >>>> Then the sas address of phy19 becomes 0, and since ex_phy->port is >>>> NULL, >>>> phy19 was not removed from the parent wide port's phy_list. >>>> >>>> 2. For phy0, it is connected to a new sata device and the path is: >>>> sas_rediscover() >>>> ->sas_discover_new()->sas_ex_phy_discover() >>>> ->sas_ex_phy_discover_helper() >>>> ->sas_set_ex_phy() >>>> ->sas_ex_discover_devices() >>>> ->sas_ex_discover_dev() >>>> ->sas_ex_discover_end_dev() >>>> ->sas_port_alloc() // Create >>>> port-7:7:0 >>>> ->sas_ex_get_linkrate() >>>> ->sas_port_add_phy() >>>> >>>> The type of the newly connected device is stp, but the linkrate is 5 >>>> which >>>> less than 1.5G, then the sas address is set to 0 in sas_set_ex_phy(). >>> >>> I don't understand why we do anything when in this state. linkrate == >>> 5 means phy reset in progress. Can we just bail out until the SATA >>> phy is in a decent shape? I assume that when the SATA phy is in "up" >>> state that we get a broadcast event and can re-evaluate. >> You are saying that we use a method similar to SAS_SATA_SPINUP_HOLD? > > Maybe. Can we simply re-use SAS_SATA_SPINUP_HOLD handling? Is it suitable? From the SAS Protocol Layer - 4 (SPL-4) interpretation of RESET_IN_PROGRESS, Phy is enabled and the expander phy is performing an SMP PHY CONTROL function (see 9.4.3.28) phy operation of LINK RESET or HARD RESET. This value is returned if the specified phy contained a value of 8h to Fh in this field when an SMP PHY CONTROL function phy operation of LINK RESET or HARD RESET phy operation is processed. Maybe, we should not need to perform operations of LINK RESET. > >>> >>>> Subsequently, a new port port-7:7:0 was created and tried to add >>>> phy19 with >>>> the same zero-address to this new port. However, phy19 still belongs to >>>> another port, then a BUG() was triggered in sas_ex_get_linkrate(). >>>> >>>> Fix the problem as follows: >>>> 1. Use sas_port_add_ex_phy() instead of sas_port_add_phy() when >>>> ex_phy is >>>> added to the parent port. >>> >>> this seems ok >>> >>>> >>>> 2. Set ex_dev->parent_port to NULL when the number of phy on the port >>>> becomes 0. >>>> >>>> 3. When phy->attached_dev_type != NO_DEVICE, do not set the zero >>>> address >>>> for phy->attached_sas_addr. >>>> >>>> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") >>>> Fixes: 7d1d86518118 ("[SCSI] libsas: fix false positive 'device >>>> attached' conditions") >>>> Signed-off-by: Xingui Yang <yangxingui@huawei.com> >>>> --- >>>> drivers/scsi/libsas/sas_expander.c | 10 ++++++---- >>>> 1 file changed, 6 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/scsi/libsas/sas_expander.c >>>> b/drivers/scsi/libsas/sas_expander.c >>>> index 7aa968b85e1e..9152152d5e10 100644 >>>> --- a/drivers/scsi/libsas/sas_expander.c >>>> +++ b/drivers/scsi/libsas/sas_expander.c >>>> @@ -45,7 +45,7 @@ static void sas_add_parent_port(struct >>>> domain_device *dev, int phy_id) >>>> BUG_ON(sas_port_add(ex->parent_port)); >>>> sas_port_mark_backlink(ex->parent_port); >>>> } >>>> - sas_port_add_phy(ex->parent_port, ex_phy->phy); >>>> + sas_port_add_ex_phy(ex->parent_port, ex_phy); >>>> } >>>> /* ---------- SMP task management ---------- */ >>>> @@ -261,8 +261,7 @@ static void sas_set_ex_phy(struct domain_device >>>> *dev, int phy_id, >>>> /* help some expanders that fail to zero sas_address in the 'no >>>> * device' case >>>> */ >>> >>> Please pay attention to this comment. It seems that some expanders >>> require us to explicitly zero the SAS address. >> Yes, we have reviewed this point, and its modification is for some >> expanders to report that the sas address isn't zero in the "no device" >> case. The current modification does not affect its original problem >> fix, we just removed its linkrate judgment. > > ok > >>> >>>> - if (phy->attached_dev_type == SAS_PHY_UNUSED || >>>> - phy->linkrate < SAS_LINK_RATE_1_5_GBPS) >>>> + if (phy->attached_dev_type == SAS_PHY_UNUSED) >>>> memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE); >>>> else >>>> memcpy(phy->attached_sas_addr, dr->attached_sas_addr, >>>> SAS_ADDR_SIZE); >>>> @@ -1864,9 +1863,12 @@ static void >>>> sas_unregister_devs_sas_addr(struct domain_device *parent, >>>> if (phy->port) { >>>> sas_port_delete_phy(phy->port, phy->phy); >>>> sas_device_set_phy(found, phy->port); >>>> - if (phy->port->num_phys == 0) >>>> + if (phy->port->num_phys == 0) { >>>> list_add_tail(&phy->port->del_list, >>>> &parent->port->sas_port_del_list); >>>> + if (ex_dev->parent_port == phy->port) >>>> + ex_dev->parent_port = NULL; >>> >>> This does not feel like the right place to do this. So the port which >>> we queue to free is the ex_dev->parent_port, right? >> Yes, we found that if ex_dev->parent_port is not set to NULL, after >> the port is released, if there is a new ex_phy connection, use-after-free > > Do you mean really a memory use-after-free, like which KASAN would report? Yes, it will trigger panic. > > >> problems will occur. And the current branch is to determine whether >> the number of phys on the port is 0. I think it is more appropriate to >> set parent_port. Do you have any better suggestions? > > Let me check again... > >>> >>> BTW, do you know why it's called ex_dev->parent_port and not >>> ex_dev->port? I find the name parent_port confusing... >> It is the port connected to the upper-level device, so named >> parent_port. > > But isn't this just the sas_port for the expander device attachment to > the host and more closely associated to the expander itself? > > Thanks, > John > > . Thanks, Xingui ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-03-12 14:26 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-12-04 12:29 [PATCH v5 0/3] Fix the failure of adding phy with zero-address to port Xingui Yang 2023-12-04 12:29 ` [PATCH v5 1/3] scsi: libsas: Add helper for port add ex_phy Xingui Yang 2023-12-04 13:08 ` John Garry 2023-12-04 12:29 ` [PATCH v5 2/3] scsi: libsas: Move sas_add_parent_port() to sas_expander.c Xingui Yang 2023-12-04 13:17 ` John Garry 2023-12-04 12:29 ` [PATCH v5 3/3] scsi: libsas: Fix the failure of adding phy with zero-address to port Xingui Yang 2023-12-04 18:05 ` John Garry 2023-12-05 13:22 ` yangxingui 2023-12-05 18:13 ` John Garry 2024-03-12 14:26 ` yangxingui
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox