[for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time

stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
@ 2015-06-18  3:22 Dan Williams
  2015-06-21 14:27 ` Hannes Reinecke
  2015-07-22 18:28 ` James Bottomley
  0 siblings, 2 replies; 13+ messages in thread
From: Dan Williams @ 2015-06-18  3:22 UTC (permalink / raw)
  To: JBottomley, hch; +Cc: Praveen Murali, linux-scsi, stable

Praveen reports:

    After some debugging this is what I have found

    sas_phye_loss_of_signal gets triggered on phy_event from mvsas
        sas_phye_loss_of_signal calls sas_deform_port
             sas_deform_port posts a DISCE_DESTRUCT event (sas_unregister_domain_devices-> sas_unregister_dev)
             sas_deform_port calls sas_port_delete
                     sas_port_delete calls sas_port_delete_link
                             sysfs_remove_group: kobject 'port-X:Y'
                     sas_port_delete calls device_del
                             sysfs_remove_group: kobject 'port-X:Y'

    sas_destruct_devices gets triggered for the destruct event (DISCE_DESTRUCT)
        sas_destruct_devices calls sas_rphy_delete
            sas_rphy_delete calls scsi_remove_device
                 scsi_remove_device calls __scsi_remove_device
                         __scsi_remove_device calls bsg_unregister_queue
                                 bsg_unregister_queue -> device_unregister -> device_del -> sysfs_remove_group: kobject 'X:0:0:0'

    Since X:0:0:0 falls under port-X:Y (which got deleted during
    sas_port_delete), this call results in the warning. All the later
    warnings in the dmesg output I sent earlier are trying to delete objects
    under port-X:Y. Since port-X:Y got recursively deleted, all these calls
    result in warnings. Since, the PHY and DISC events are processed in two
    different work queues (and one triggers the other), is there any way
    other than checking if the object exists in sysfs (in device_del) before
    deleting?

    WARNING: CPU: 2 PID: 6 at fs/sysfs/group.c:219 device_del+0x40/0x1c0()
    sysfs group ffffffff818b97e0 not found for kobject '2:0:4:0'
    [..]
    CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: P        W  O  3.16.7-ckt9-logicube-ng.3 #1
    Hardware name: To be filled by O.E.M. To be filled by O.E.M./VT6085, BIOS 4.6.5 01/23/2015
    Workqueue: scsi_wq_2 sas_destruct_devices [libsas]
     0000000000000009 ffffffff8151cd18 ffff88011b35bcd8 ffffffff810687b7
     ffff88011a661400 ffff88011b35bd28 ffff8800c6e5e968 ffff880000028810
     ffff8800c89f2c00 ffffffff8106881c ffffffff81733b68 0000000000000028
    Call Trace:
     [<ffffffff8151cd18>] ? dump_stack+0x41/0x51
     [<ffffffff810687b7>] ? warn_slowpath_common+0x77/0x90
     [<ffffffff8106881c>] ? warn_slowpath_fmt+0x4c/0x50
     [<ffffffff813ad2d0>] ? device_del+0x40/0x1c0
     [<ffffffff813ad46a>] ? device_unregister+0x1a/0x70
     [<ffffffff812a535e>] ? bsg_unregister_queue+0x5e/0xb0
     [<ffffffffa00781a9>] ? __scsi_remove_device+0xa9/0xd0 [scsi_mod]

It appears we've always been double deleting the devices below sas_port,
but recent sysfs changes now exposes this problem.  Libsas should delete
all the devices from rphy down before deleting the parent port.

Cc: <stable@vger.kernel.org>
Reported-by: Praveen Murali <pmurali@logicube.com>
Tested-by: Praveen Murali <pmurali@logicube.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_discover.c |    6 +++---
 drivers/scsi/libsas/sas_port.c     |    1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 60de66252fa2..a4db770fe8b0 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -362,11 +362,14 @@ static void sas_destruct_devices(struct work_struct *work)
 	clear_bit(DISCE_DESTRUCT, &port->disc.pending);
 
 	list_for_each_entry_safe(dev, n, &port->destroy_list, disco_list_node) {
+		struct sas_port *sas_port = dev_to_sas_port(dev->rphy->dev.parent);
+
 		list_del_init(&dev->disco_list_node);
 
 		sas_remove_children(&dev->rphy->dev);
 		sas_rphy_delete(dev->rphy);
 		sas_unregister_common_dev(port, dev);
+		sas_port_delete(sas_port);
 	}
 }
 
@@ -400,9 +403,6 @@ void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
 
 	list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
 		sas_unregister_dev(port, dev);
-
-	port->port->rphy = NULL;
-
 }
 
 void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index d3c5297c6c89..9a25ae3a52a4 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
 
 	if (port->num_phys == 1) {
 		sas_unregister_domain_devices(port, gone);
-		sas_port_delete(port->port);
 		port->port = NULL;
 	} else {
 		sas_port_delete_phy(port->port, phy->phy);


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-06-18  3:22 [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time Dan Williams
@ 2015-06-21 14:27 ` Hannes Reinecke
  2015-07-22 18:28 ` James Bottomley
  1 sibling, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2015-06-21 14:27 UTC (permalink / raw)
  To: Dan Williams, JBottomley, hch; +Cc: Praveen Murali, linux-scsi, stable

On 06/18/2015 05:22 AM, Dan Williams wrote:
> Praveen reports:
> 
>     After some debugging this is what I have found
> 
>     sas_phye_loss_of_signal gets triggered on phy_event from mvsas
>         sas_phye_loss_of_signal calls sas_deform_port
>              sas_deform_port posts a DISCE_DESTRUCT event (sas_unregister_domain_devices-> sas_unregister_dev)
>              sas_deform_port calls sas_port_delete
>                      sas_port_delete calls sas_port_delete_link
>                              sysfs_remove_group: kobject 'port-X:Y'
>                      sas_port_delete calls device_del
>                              sysfs_remove_group: kobject 'port-X:Y'
> 
>     sas_destruct_devices gets triggered for the destruct event (DISCE_DESTRUCT)
>         sas_destruct_devices calls sas_rphy_delete
>             sas_rphy_delete calls scsi_remove_device
>                  scsi_remove_device calls __scsi_remove_device
>                          __scsi_remove_device calls bsg_unregister_queue
>                                  bsg_unregister_queue -> device_unregister -> device_del -> sysfs_remove_group: kobject 'X:0:0:0'
> 
>     Since X:0:0:0 falls under port-X:Y (which got deleted during
>     sas_port_delete), this call results in the warning. All the later
>     warnings in the dmesg output I sent earlier are trying to delete objects
>     under port-X:Y. Since port-X:Y got recursively deleted, all these calls
>     result in warnings. Since, the PHY and DISC events are processed in two
>     different work queues (and one triggers the other), is there any way
>     other than checking if the object exists in sysfs (in device_del) before
>     deleting?
> 
>     WARNING: CPU: 2 PID: 6 at fs/sysfs/group.c:219 device_del+0x40/0x1c0()
>     sysfs group ffffffff818b97e0 not found for kobject '2:0:4:0'
>     [..]
>     CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: P        W  O  3.16.7-ckt9-logicube-ng.3 #1
>     Hardware name: To be filled by O.E.M. To be filled by O.E.M./VT6085, BIOS 4.6.5 01/23/2015
>     Workqueue: scsi_wq_2 sas_destruct_devices [libsas]
>      0000000000000009 ffffffff8151cd18 ffff88011b35bcd8 ffffffff810687b7
>      ffff88011a661400 ffff88011b35bd28 ffff8800c6e5e968 ffff880000028810
>      ffff8800c89f2c00 ffffffff8106881c ffffffff81733b68 0000000000000028
>     Call Trace:
>      [<ffffffff8151cd18>] ? dump_stack+0x41/0x51
>      [<ffffffff810687b7>] ? warn_slowpath_common+0x77/0x90
>      [<ffffffff8106881c>] ? warn_slowpath_fmt+0x4c/0x50
>      [<ffffffff813ad2d0>] ? device_del+0x40/0x1c0
>      [<ffffffff813ad46a>] ? device_unregister+0x1a/0x70
>      [<ffffffff812a535e>] ? bsg_unregister_queue+0x5e/0xb0
>      [<ffffffffa00781a9>] ? __scsi_remove_device+0xa9/0xd0 [scsi_mod]
> 
> It appears we've always been double deleting the devices below sas_port,
> but recent sysfs changes now exposes this problem.  Libsas should delete
> all the devices from rphy down before deleting the parent port.
> 
> Cc: <stable@vger.kernel.org>
> Reported-by: Praveen Murali <pmurali@logicube.com>
> Tested-by: Praveen Murali <pmurali@logicube.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe stable" in

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-06-18  3:22 [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time Dan Williams
  2015-06-21 14:27 ` Hannes Reinecke
@ 2015-07-22 18:28 ` James Bottomley
  2015-07-22 21:08   ` Dan Williams
  1 sibling, 1 reply; 13+ messages in thread
From: James Bottomley @ 2015-07-22 18:28 UTC (permalink / raw)
  To: Dan Williams; +Cc: hch, Praveen Murali, linux-scsi, stable

On Wed, 2015-06-17 at 23:22 -0400, Dan Williams wrote:
> Praveen reports:
> 
>     After some debugging this is what I have found
> 
>     sas_phye_loss_of_signal gets triggered on phy_event from mvsas
>         sas_phye_loss_of_signal calls sas_deform_port
>              sas_deform_port posts a DISCE_DESTRUCT event (sas_unregister_domain_devices-> sas_unregister_dev)
>              sas_deform_port calls sas_port_delete
>                      sas_port_delete calls sas_port_delete_link
>                              sysfs_remove_group: kobject 'port-X:Y'
>                      sas_port_delete calls device_del
>                              sysfs_remove_group: kobject 'port-X:Y'
> 
>     sas_destruct_devices gets triggered for the destruct event (DISCE_DESTRUCT)
>         sas_destruct_devices calls sas_rphy_delete
>             sas_rphy_delete calls scsi_remove_device
>                  scsi_remove_device calls __scsi_remove_device
>                          __scsi_remove_device calls bsg_unregister_queue
>                                  bsg_unregister_queue -> device_unregister -> device_del -> sysfs_remove_group: kobject 'X:0:0:0'
> 
>     Since X:0:0:0 falls under port-X:Y (which got deleted during
>     sas_port_delete), this call results in the warning. All the later
>     warnings in the dmesg output I sent earlier are trying to delete objects
>     under port-X:Y. Since port-X:Y got recursively deleted, all these calls
>     result in warnings. Since, the PHY and DISC events are processed in two
>     different work queues (and one triggers the other), is there any way
>     other than checking if the object exists in sysfs (in device_del) before
>     deleting?
> 
>     WARNING: CPU: 2 PID: 6 at fs/sysfs/group.c:219 device_del+0x40/0x1c0()
>     sysfs group ffffffff818b97e0 not found for kobject '2:0:4:0'
>     [..]
>     CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: P        W  O  3.16.7-ckt9-logicube-ng.3 #1
>     Hardware name: To be filled by O.E.M. To be filled by O.E.M./VT6085, BIOS 4.6.5 01/23/2015
>     Workqueue: scsi_wq_2 sas_destruct_devices [libsas]
>      0000000000000009 ffffffff8151cd18 ffff88011b35bcd8 ffffffff810687b7
>      ffff88011a661400 ffff88011b35bd28 ffff8800c6e5e968 ffff880000028810
>      ffff8800c89f2c00 ffffffff8106881c ffffffff81733b68 0000000000000028
>     Call Trace:
>      [<ffffffff8151cd18>] ? dump_stack+0x41/0x51
>      [<ffffffff810687b7>] ? warn_slowpath_common+0x77/0x90
>      [<ffffffff8106881c>] ? warn_slowpath_fmt+0x4c/0x50
>      [<ffffffff813ad2d0>] ? device_del+0x40/0x1c0
>      [<ffffffff813ad46a>] ? device_unregister+0x1a/0x70
>      [<ffffffff812a535e>] ? bsg_unregister_queue+0x5e/0xb0
>      [<ffffffffa00781a9>] ? __scsi_remove_device+0xa9/0xd0 [scsi_mod]
> 
> It appears we've always been double deleting the devices below sas_port,
> but recent sysfs changes now exposes this problem.  Libsas should delete
> all the devices from rphy down before deleting the parent port.

There's a missing description of the fix here.

        So we make the DISCE_DESTROY event delete the port as well as
        all the underlying devices
?

> Cc: <stable@vger.kernel.org>
> Reported-by: Praveen Murali <pmurali@logicube.com>
> Tested-by: Praveen Murali <pmurali@logicube.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/scsi/libsas/sas_discover.c |    6 +++---
>  drivers/scsi/libsas/sas_port.c     |    1 -
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
> index 60de66252fa2..a4db770fe8b0 100644
> --- a/drivers/scsi/libsas/sas_discover.c
> +++ b/drivers/scsi/libsas/sas_discover.c
> @@ -362,11 +362,14 @@ static void sas_destruct_devices(struct work_struct *work)
>  	clear_bit(DISCE_DESTRUCT, &port->disc.pending);
>  
>  	list_for_each_entry_safe(dev, n, &port->destroy_list, disco_list_node) {
> +		struct sas_port *sas_port = dev_to_sas_port(dev->rphy->dev.parent);
> +

Do you need this? isn't what you've elaborately got here as sas_port,
simply port->port?  Assuming you don't NULL that out (see below) all
this goes away.

>  		list_del_init(&dev->disco_list_node);
>  
>  		sas_remove_children(&dev->rphy->dev);
>  		sas_rphy_delete(dev->rphy);
>  		sas_unregister_common_dev(port, dev);
> +		sas_port_delete(sas_port);

So this becomes sas_port_delete(port->port);

>  	}
>  }
>  
> @@ -400,9 +403,6 @@ void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
>  
>  	list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
>  		sas_unregister_dev(port, dev);
> -
> -	port->port->rphy = NULL;
> -

Why does this line need removing.  It's only used by ATA devices on an
expander, but it's logical that it removes the visibility of the device
being destroyed.

>  }
>  
>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
> index d3c5297c6c89..9a25ae3a52a4 100644
> --- a/drivers/scsi/libsas/sas_port.c
> +++ b/drivers/scsi/libsas/sas_port.c
> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>  
>  	if (port->num_phys == 1) {
>  		sas_unregister_domain_devices(port, gone);
> -		sas_port_delete(port->port);
>  		port->port = NULL;
>  	} else {
>  		sas_port_delete_phy(port->port, phy->phy);
> 

This should become

if (port->num_phys == 1)
	sas_unregister_domain_device(port, gone);

sas_port_delete_phy(port->port, phy->phy);

So we end up with a port scheduled for destruction with no phys rather
than making the last phy association hang around until the DISCE
workqueue runs.

James



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-07-22 18:28 ` James Bottomley
@ 2015-07-22 21:08   ` Dan Williams
  2015-07-27 15:17     ` James Bottomley
  0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2015-07-22 21:08 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Praveen Murali, linux-scsi,
	stable@vger.kernel.org

On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Wed, 2015-06-17 at 23:22 -0400, Dan Williams wrote:
>> Praveen reports:
>>
>>     After some debugging this is what I have found
>>
>>     sas_phye_loss_of_signal gets triggered on phy_event from mvsas
>>         sas_phye_loss_of_signal calls sas_deform_port
>>              sas_deform_port posts a DISCE_DESTRUCT event (sas_unregister_domain_devices-> sas_unregister_dev)
>>              sas_deform_port calls sas_port_delete
>>                      sas_port_delete calls sas_port_delete_link
>>                              sysfs_remove_group: kobject 'port-X:Y'
>>                      sas_port_delete calls device_del
>>                              sysfs_remove_group: kobject 'port-X:Y'
>>
>>     sas_destruct_devices gets triggered for the destruct event (DISCE_DESTRUCT)
>>         sas_destruct_devices calls sas_rphy_delete
>>             sas_rphy_delete calls scsi_remove_device
>>                  scsi_remove_device calls __scsi_remove_device
>>                          __scsi_remove_device calls bsg_unregister_queue
>>                                  bsg_unregister_queue -> device_unregister -> device_del -> sysfs_remove_group: kobject 'X:0:0:0'
>>
>>     Since X:0:0:0 falls under port-X:Y (which got deleted during
>>     sas_port_delete), this call results in the warning. All the later
>>     warnings in the dmesg output I sent earlier are trying to delete objects
>>     under port-X:Y. Since port-X:Y got recursively deleted, all these calls
>>     result in warnings. Since, the PHY and DISC events are processed in two
>>     different work queues (and one triggers the other), is there any way
>>     other than checking if the object exists in sysfs (in device_del) before
>>     deleting?
>>
>>     WARNING: CPU: 2 PID: 6 at fs/sysfs/group.c:219 device_del+0x40/0x1c0()
>>     sysfs group ffffffff818b97e0 not found for kobject '2:0:4:0'
>>     [..]
>>     CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: P        W  O  3.16.7-ckt9-logicube-ng.3 #1
>>     Hardware name: To be filled by O.E.M. To be filled by O.E.M./VT6085, BIOS 4.6.5 01/23/2015
>>     Workqueue: scsi_wq_2 sas_destruct_devices [libsas]
>>      0000000000000009 ffffffff8151cd18 ffff88011b35bcd8 ffffffff810687b7
>>      ffff88011a661400 ffff88011b35bd28 ffff8800c6e5e968 ffff880000028810
>>      ffff8800c89f2c00 ffffffff8106881c ffffffff81733b68 0000000000000028
>>     Call Trace:
>>      [<ffffffff8151cd18>] ? dump_stack+0x41/0x51
>>      [<ffffffff810687b7>] ? warn_slowpath_common+0x77/0x90
>>      [<ffffffff8106881c>] ? warn_slowpath_fmt+0x4c/0x50
>>      [<ffffffff813ad2d0>] ? device_del+0x40/0x1c0
>>      [<ffffffff813ad46a>] ? device_unregister+0x1a/0x70
>>      [<ffffffff812a535e>] ? bsg_unregister_queue+0x5e/0xb0
>>      [<ffffffffa00781a9>] ? __scsi_remove_device+0xa9/0xd0 [scsi_mod]
>>
>> It appears we've always been double deleting the devices below sas_port,
>> but recent sysfs changes now exposes this problem.  Libsas should delete
>> all the devices from rphy down before deleting the parent port.
>
> There's a missing description of the fix here.
>
>         So we make the DISCE_DESTROY event delete the port as well as
>         all the underlying devices
> ?

"We make DISCE_DESTROY responsible for deleting the child devices as
well as the port." == "Libsas should delete all the devices from rphy
down before deleting the parent port."

>> Cc: <stable@vger.kernel.org>
>> Reported-by: Praveen Murali <pmurali@logicube.com>
>> Tested-by: Praveen Murali <pmurali@logicube.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  drivers/scsi/libsas/sas_discover.c |    6 +++---
>>  drivers/scsi/libsas/sas_port.c     |    1 -
>>  2 files changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
>> index 60de66252fa2..a4db770fe8b0 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -362,11 +362,14 @@ static void sas_destruct_devices(struct work_struct *work)
>>       clear_bit(DISCE_DESTRUCT, &port->disc.pending);
>>
>>       list_for_each_entry_safe(dev, n, &port->destroy_list, disco_list_node) {
>> +             struct sas_port *sas_port = dev_to_sas_port(dev->rphy->dev.parent);
>> +
>
> Do you need this? isn't what you've elaborately got here as sas_port,
> simply port->port?

Yes, it's just an elaborate workaround since port->port is already torn down.

> Assuming you don't NULL that out (see below) all
> this goes away.

Not sure, I'd have to go look if libsas is prepared to have port->port
being valid for longer.

>
>>               list_del_init(&dev->disco_list_node);
>>
>>               sas_remove_children(&dev->rphy->dev);
>>               sas_rphy_delete(dev->rphy);
>>               sas_unregister_common_dev(port, dev);
>> +             sas_port_delete(sas_port);
>
> So this becomes sas_port_delete(port->port);
>
>>       }
>>  }
>>
>> @@ -400,9 +403,6 @@ void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
>>
>>       list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
>>               sas_unregister_dev(port, dev);
>> -
>> -     port->port->rphy = NULL;
>> -
>
> Why does this line need removing.  It's only used by ATA devices on an
> expander, but it's logical that it removes the visibility of the device
> being destroyed.

It's already performed by sas_port_delete()

>
>>  }
>>
>>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
>> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
>> index d3c5297c6c89..9a25ae3a52a4 100644
>> --- a/drivers/scsi/libsas/sas_port.c
>> +++ b/drivers/scsi/libsas/sas_port.c
>> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>>
>>       if (port->num_phys == 1) {
>>               sas_unregister_domain_devices(port, gone);
>> -             sas_port_delete(port->port);
>>               port->port = NULL;
>>       } else {
>>               sas_port_delete_phy(port->port, phy->phy);
>>
>
> This should become
>
> if (port->num_phys == 1)
>         sas_unregister_domain_device(port, gone);
>
> sas_port_delete_phy(port->port, phy->phy);
>
> So we end up with a port scheduled for destruction with no phys rather
> than making the last phy association hang around until the DISCE
> workqueue runs.

Sounds ok in theory.  I don't have a libsas environment handy, I
worked with Praveen to validate the version as submitted if you want
to re-work it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-07-22 21:08   ` Dan Williams
@ 2015-07-27 15:17     ` James Bottomley
  2015-07-27 15:48       ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: James Bottomley @ 2015-07-27 15:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Praveen Murali, linux-scsi,
	stable@vger.kernel.org

On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
> On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> >
> >>  }
> >>
> >>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
> >> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
> >> index d3c5297c6c89..9a25ae3a52a4 100644
> >> --- a/drivers/scsi/libsas/sas_port.c
> >> +++ b/drivers/scsi/libsas/sas_port.c
> >> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
> >>
> >>       if (port->num_phys == 1) {
> >>               sas_unregister_domain_devices(port, gone);
> >> -             sas_port_delete(port->port);
> >>               port->port = NULL;
> >>       } else {
> >>               sas_port_delete_phy(port->port, phy->phy);
> >>
> >
> > This should become
> >
> > if (port->num_phys == 1)
> >         sas_unregister_domain_device(port, gone);
> >
> > sas_port_delete_phy(port->port, phy->phy);
> >
> > So we end up with a port scheduled for destruction with no phys rather
> > than making the last phy association hang around until the DISCE
> > workqueue runs.
> 
> Sounds ok in theory.

It's not really a choice.  The specific problem you've introduced with
this patch is failure to cope with link flutter: a deform and form event
queued sequentially.  In the new scheme you're trying to introduce, the
destruct event gets queued from the deform but behind the form and the
link flutter results in a dead link.  I thought just forcing a zero phy
port would fix this, but it won't, either the destruct has to run in the
context of the deform event or the form has to be queued later than the
destruct.  I think coupled with the changes above, there needs to be

if (port->port) {
	/* dying port, requeue form event */
	resend the PORTE_BYTES_DMAED event
	return
}

inside the unmatched port loop in sas_port_form() if nothing is found as
well to close this.

>   I don't have a libsas environment handy, I worked with Praveen to
> validate the version as submitted if you want to re-work it.

A couple of days ago, this was so urgent as to have to go outside the
usual patch process ... now it's not important enough for you to bother
working on it; which is it?

James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-07-27 15:17     ` James Bottomley
@ 2015-07-27 15:48       ` Dan Williams
  2015-07-27 17:11         ` James Bottomley
  2015-07-27 18:07         ` James Bottomley
  0 siblings, 2 replies; 13+ messages in thread
From: Dan Williams @ 2015-07-27 15:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Praveen Murali, linux-scsi,
	stable@vger.kernel.org

On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
>> On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> >
>> >>  }
>> >>
>> >>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
>> >> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
>> >> index d3c5297c6c89..9a25ae3a52a4 100644
>> >> --- a/drivers/scsi/libsas/sas_port.c
>> >> +++ b/drivers/scsi/libsas/sas_port.c
>> >> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>> >>
>> >>       if (port->num_phys == 1) {
>> >>               sas_unregister_domain_devices(port, gone);
>> >> -             sas_port_delete(port->port);
>> >>               port->port = NULL;
>> >>       } else {
>> >>               sas_port_delete_phy(port->port, phy->phy);
>> >>
>> >
>> > This should become
>> >
>> > if (port->num_phys == 1)
>> >         sas_unregister_domain_device(port, gone);
>> >
>> > sas_port_delete_phy(port->port, phy->phy);
>> >
>> > So we end up with a port scheduled for destruction with no phys rather
>> > than making the last phy association hang around until the DISCE
>> > workqueue runs.
>>
>> Sounds ok in theory.
>
> It's not really a choice.  The specific problem you've introduced with
> this patch is failure to cope with link flutter: a deform and form event
> queued sequentially.  In the new scheme you're trying to introduce, the
> destruct event gets queued from the deform but behind the form and the
> link flutter results in a dead link.  I thought just forcing a zero phy
> port would fix this, but it won't, either the destruct has to run in the
> context of the deform event or the form has to be queued later than the
> destruct.  I think coupled with the changes above, there needs to be
>
> if (port->port) {
>         /* dying port, requeue form event */
>         resend the PORTE_BYTES_DMAED event
>         return
> }
>
> inside the unmatched port loop in sas_port_form() if nothing is found as
> well to close this.

I think it's too late.  Once the lldd has triggered libsas to start
tear down I seem to recall the lldd has the expectation that a new
PORTE_BYTES_DMAED triggers the creation of a new port instance for
that phy.  Once the flutter reaches libsas the race is already lost
and the port needs to be torn down, but I would need to take a closer
look.

>>   I don't have a libsas environment handy, I worked with Praveen to
>> validate the version as submitted if you want to re-work it.
>
> A couple of days ago, this was so urgent as to have to go outside the
> usual patch process ... now it's not important enough for you to bother
> working on it; which is it?

Neither, it was a reviewed patch that was idling in the process.  I'm
still of the opinion that pinging Andrew in a case like this *is* the
expected process, unless there's a place I can check that a patch is
still in the application queue?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-07-27 15:48       ` Dan Williams
@ 2015-07-27 17:11         ` James Bottomley
  2015-07-27 17:55           ` Dan Williams
  2015-07-27 18:07         ` James Bottomley
  1 sibling, 1 reply; 13+ messages in thread
From: James Bottomley @ 2015-07-27 17:11 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Praveen Murali, linux-scsi,
	stable@vger.kernel.org

On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
> On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
> >> On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
> >> <James.Bottomley@hansenpartnership.com> wrote:
> >> >
> >> >>  }
> >> >>
> >> >>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
> >> >> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
> >> >> index d3c5297c6c89..9a25ae3a52a4 100644
> >> >> --- a/drivers/scsi/libsas/sas_port.c
> >> >> +++ b/drivers/scsi/libsas/sas_port.c
> >> >> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
> >> >>
> >> >>       if (port->num_phys == 1) {
> >> >>               sas_unregister_domain_devices(port, gone);
> >> >> -             sas_port_delete(port->port);
> >> >>               port->port = NULL;
> >> >>       } else {
> >> >>               sas_port_delete_phy(port->port, phy->phy);
> >> >>
> >> >
> >> > This should become
> >> >
> >> > if (port->num_phys == 1)
> >> >         sas_unregister_domain_device(port, gone);
> >> >
> >> > sas_port_delete_phy(port->port, phy->phy);
> >> >
> >> > So we end up with a port scheduled for destruction with no phys rather
> >> > than making the last phy association hang around until the DISCE
> >> > workqueue runs.
> >>
> >> Sounds ok in theory.
> >
> > It's not really a choice.  The specific problem you've introduced with
> > this patch is failure to cope with link flutter: a deform and form event
> > queued sequentially.  In the new scheme you're trying to introduce, the
> > destruct event gets queued from the deform but behind the form and the
> > link flutter results in a dead link.  I thought just forcing a zero phy
> > port would fix this, but it won't, either the destruct has to run in the
> > context of the deform event or the form has to be queued later than the
> > destruct.  I think coupled with the changes above, there needs to be
> >
> > if (port->port) {
> >         /* dying port, requeue form event */
> >         resend the PORTE_BYTES_DMAED event
> >         return
> > }
> >
> > inside the unmatched port loop in sas_port_form() if nothing is found as
> > well to close this.
> 
> I think it's too late.  Once the lldd has triggered libsas to start
> tear down I seem to recall the lldd has the expectation that a new
> PORTE_BYTES_DMAED triggers the creation of a new port instance for
> that phy.  Once the flutter reaches libsas the race is already lost
> and the port needs to be torn down, but I would need to take a closer
> look.

I don't understand your reasoning.  The expectation is that
PORTE_BYTES_DMAED leads to port formation.  The proposal detects that
this event precedes DISCE_DESTRUCT for the port and requeues the event,
now after DISC_DESTRUCT, so it gets acted on.  Where is the problem?

James



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-07-27 17:11         ` James Bottomley
@ 2015-07-27 17:55           ` Dan Williams
  2015-07-27 18:38             ` James Bottomley
  0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2015-07-27 17:55 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Praveen Murali, linux-scsi,
	stable@vger.kernel.org

On Mon, Jul 27, 2015 at 10:11 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
>> On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
>> >> On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
>> >> <James.Bottomley@hansenpartnership.com> wrote:
>> >> >
>> >> >>  }
>> >> >>
>> >> >>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
>> >> >> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
>> >> >> index d3c5297c6c89..9a25ae3a52a4 100644
>> >> >> --- a/drivers/scsi/libsas/sas_port.c
>> >> >> +++ b/drivers/scsi/libsas/sas_port.c
>> >> >> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>> >> >>
>> >> >>       if (port->num_phys == 1) {
>> >> >>               sas_unregister_domain_devices(port, gone);
>> >> >> -             sas_port_delete(port->port);
>> >> >>               port->port = NULL;
>> >> >>       } else {
>> >> >>               sas_port_delete_phy(port->port, phy->phy);
>> >> >>
>> >> >
>> >> > This should become
>> >> >
>> >> > if (port->num_phys == 1)
>> >> >         sas_unregister_domain_device(port, gone);
>> >> >
>> >> > sas_port_delete_phy(port->port, phy->phy);
>> >> >
>> >> > So we end up with a port scheduled for destruction with no phys rather
>> >> > than making the last phy association hang around until the DISCE
>> >> > workqueue runs.
>> >>
>> >> Sounds ok in theory.
>> >
>> > It's not really a choice.  The specific problem you've introduced with
>> > this patch is failure to cope with link flutter: a deform and form event
>> > queued sequentially.  In the new scheme you're trying to introduce, the
>> > destruct event gets queued from the deform but behind the form and the
>> > link flutter results in a dead link.  I thought just forcing a zero phy
>> > port would fix this, but it won't, either the destruct has to run in the
>> > context of the deform event or the form has to be queued later than the
>> > destruct.  I think coupled with the changes above, there needs to be
>> >
>> > if (port->port) {
>> >         /* dying port, requeue form event */
>> >         resend the PORTE_BYTES_DMAED event
>> >         return
>> > }
>> >
>> > inside the unmatched port loop in sas_port_form() if nothing is found as
>> > well to close this.
>>
>> I think it's too late.  Once the lldd has triggered libsas to start
>> tear down I seem to recall the lldd has the expectation that a new
>> PORTE_BYTES_DMAED triggers the creation of a new port instance for
>> that phy.  Once the flutter reaches libsas the race is already lost
>> and the port needs to be torn down, but I would need to take a closer
>> look.
>
> I don't understand your reasoning.  The expectation is that
> PORTE_BYTES_DMAED leads to port formation.  The proposal detects that
> this event precedes DISCE_DESTRUCT for the port and requeues the event,
> now after DISC_DESTRUCT, so it gets acted on.  Where is the problem?
>

Ah, I read "flutter" and mistakenly thought "debounce".  You're
addressing the case where we're fully committed to the port going
down, but a "port up" event is injected between the "port down" and
"destruct" events.  Yes, I agree re-queuing needs to happen, however
don't we now have queue re-order problem with respect to a new "port
down" event?  It seems events need to be held off in-order while the
subordinate DISCE events are processed.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-07-27 15:48       ` Dan Williams
  2015-07-27 17:11         ` James Bottomley
@ 2015-07-27 18:07         ` James Bottomley
  2015-07-27 18:24           ` Dan Williams
  1 sibling, 1 reply; 13+ messages in thread
From: James Bottomley @ 2015-07-27 18:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Praveen Murali, linux-scsi,
	stable@vger.kernel.org

On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
> On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
> >>   I don't have a libsas environment handy, I worked with Praveen to
> >> validate the version as submitted if you want to re-work it.
> >
> > A couple of days ago, this was so urgent as to have to go outside the
> > usual patch process ... now it's not important enough for you to bother
> > working on it; which is it?
> 
> Neither, it was a reviewed patch that was idling in the process.  I'm
> still of the opinion that pinging Andrew in a case like this *is* the
> expected process, unless there's a place I can check that a patch is
> still in the application queue?

I didn't ask you to justify your process, I asked you how important you
thought the patch was mainly because of the conflicting signals you've
sent.  I get that you think I should treat all your patches as important
whether you do or not, but the world doesn't quite work like that: patch
application is a process of triage.  Patches, like this, which have
timing related issues potentially leading to races get looked at by me
as the last reviewer.  The speed of review depends on several factors,
but one of which is what type of user visible issue is this causing.
The user visible effects of this are a nasty warning message and nothing
more, I believe?  A useful indicator in this triage is how important the
submitter thinks the patch is, which was originally why I asked.

James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-07-27 18:07         ` James Bottomley
@ 2015-07-27 18:24           ` Dan Williams
  2015-07-27 19:53             ` Praveen Murali
  0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2015-07-27 18:24 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Praveen Murali, linux-scsi,
	stable@vger.kernel.org

On Mon, Jul 27, 2015 at 11:07 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
>> On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
>> >>   I don't have a libsas environment handy, I worked with Praveen to
>> >> validate the version as submitted if you want to re-work it.
>> >
>> > A couple of days ago, this was so urgent as to have to go outside the
>> > usual patch process ... now it's not important enough for you to bother
>> > working on it; which is it?
>>
>> Neither, it was a reviewed patch that was idling in the process.  I'm
>> still of the opinion that pinging Andrew in a case like this *is* the
>> expected process, unless there's a place I can check that a patch is
>> still in the application queue?
>
> I didn't ask you to justify your process, I asked you how important you
> thought the patch was mainly because of the conflicting signals you've
> sent.  I get that you think I should treat all your patches as important
> whether you do or not, but the world doesn't quite work like that: patch
> application is a process of triage.  Patches, like this, which have
> timing related issues potentially leading to races get looked at by me
> as the last reviewer.  The speed of review depends on several factors,
> but one of which is what type of user visible issue is this causing.
> The user visible effects of this are a nasty warning message and nothing
> more, I believe?  A useful indicator in this triage is how important the
> submitter thinks the patch is, which was originally why I asked.
>

That would be a question to Praveen.  It wasn't clear to me whether
this sysfs backtrace was a simply a warning or eventually fatal to the
box.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-07-27 17:55           ` Dan Williams
@ 2015-07-27 18:38             ` James Bottomley
  2015-07-27 20:52               ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: James Bottomley @ 2015-07-27 18:38 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Praveen Murali, linux-scsi,
	stable@vger.kernel.org

On Mon, 2015-07-27 at 10:55 -0700, Dan Williams wrote:
> On Mon, Jul 27, 2015 at 10:11 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
> >> On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
> >> <James.Bottomley@hansenpartnership.com> wrote:
> >> > On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
> >> >> On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
> >> >> <James.Bottomley@hansenpartnership.com> wrote:
> >> >> >
> >> >> >>  }
> >> >> >>
> >> >> >>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
> >> >> >> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
> >> >> >> index d3c5297c6c89..9a25ae3a52a4 100644
> >> >> >> --- a/drivers/scsi/libsas/sas_port.c
> >> >> >> +++ b/drivers/scsi/libsas/sas_port.c
> >> >> >> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
> >> >> >>
> >> >> >>       if (port->num_phys == 1) {
> >> >> >>               sas_unregister_domain_devices(port, gone);
> >> >> >> -             sas_port_delete(port->port);
> >> >> >>               port->port = NULL;
> >> >> >>       } else {
> >> >> >>               sas_port_delete_phy(port->port, phy->phy);
> >> >> >>
> >> >> >
> >> >> > This should become
> >> >> >
> >> >> > if (port->num_phys == 1)
> >> >> >         sas_unregister_domain_device(port, gone);
> >> >> >
> >> >> > sas_port_delete_phy(port->port, phy->phy);
> >> >> >
> >> >> > So we end up with a port scheduled for destruction with no phys rather
> >> >> > than making the last phy association hang around until the DISCE
> >> >> > workqueue runs.
> >> >>
> >> >> Sounds ok in theory.
> >> >
> >> > It's not really a choice.  The specific problem you've introduced with
> >> > this patch is failure to cope with link flutter: a deform and form event
> >> > queued sequentially.  In the new scheme you're trying to introduce, the
> >> > destruct event gets queued from the deform but behind the form and the
> >> > link flutter results in a dead link.  I thought just forcing a zero phy
> >> > port would fix this, but it won't, either the destruct has to run in the
> >> > context of the deform event or the form has to be queued later than the
> >> > destruct.  I think coupled with the changes above, there needs to be
> >> >
> >> > if (port->port) {
> >> >         /* dying port, requeue form event */
> >> >         resend the PORTE_BYTES_DMAED event
> >> >         return
> >> > }
> >> >
> >> > inside the unmatched port loop in sas_port_form() if nothing is found as
> >> > well to close this.
> >>
> >> I think it's too late.  Once the lldd has triggered libsas to start
> >> tear down I seem to recall the lldd has the expectation that a new
> >> PORTE_BYTES_DMAED triggers the creation of a new port instance for
> >> that phy.  Once the flutter reaches libsas the race is already lost
> >> and the port needs to be torn down, but I would need to take a closer
> >> look.
> >
> > I don't understand your reasoning.  The expectation is that
> > PORTE_BYTES_DMAED leads to port formation.  The proposal detects that
> > this event precedes DISCE_DESTRUCT for the port and requeues the event,
> > now after DISC_DESTRUCT, so it gets acted on.  Where is the problem?
> >
> 
> Ah, I read "flutter" and mistakenly thought "debounce".  You're
> addressing the case where we're fully committed to the port going
> down, but a "port up" event is injected between the "port down" and
> "destruct" events.  Yes, I agree re-queuing needs to happen, however
> don't we now have queue re-order problem with respect to a new "port
> down" event?  It seems events need to be held off in-order while the
> subordinate DISCE events are processed.

No, that seems to be the intent of the prior code.  The reason port
visibility goes immediately (along with all associated phys), is that
the port is ready for reuse as soon as sas_deform_port() returns.
Destroying the subtree is left to DISCE_DESTRUCT, but since the subtree
is not visible, a new port can form even as the elements associated with
the old port are being destroyed.  the DISCE_DISCOVER that populates it
will queue behind the destruct, so everything just works today (modulo
the new warning).

It would be ideal if we could detect in the destruct queue that the
visibility is already gone and make use of it, but we can't.  The patch
that causes this warning induces a mismatch between the state of the
kernfs tree and the kobjects ... we still have to call device_del which
leads to kobject_del (which triggers the warning) to bring this state
back into alignment.  So the disconnected subtree we originally used to
make all this work is what's suddenly been rendered invalid.

James



^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-07-27 18:24           ` Dan Williams
@ 2015-07-27 19:53             ` Praveen Murali
  0 siblings, 0 replies; 13+ messages in thread
From: Praveen Murali @ 2015-07-27 19:53 UTC (permalink / raw)
  To: Dan Williams, James Bottomley
  Cc: Christoph Hellwig, linux-scsi, stable@vger.kernel.org

> -----Original Message-----
> From: Dan Williams [mailto:dan.j.williams@intel.com]
> Sent: Monday, July 27, 2015 11:25 AM
> To: James Bottomley
> Cc: Christoph Hellwig; Praveen Murali; linux-scsi; stable@vger.kernel.org
> Subject: Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings
> at port teardown time
> 
> > I didn't ask you to justify your process, I asked you how important you
> > thought the patch was mainly because of the conflicting signals you've
> > sent.  I get that you think I should treat all your patches as important
> > whether you do or not, but the world doesn't quite work like that: patch
> > application is a process of triage.  Patches, like this, which have
> > timing related issues potentially leading to races get looked at by me
> > as the last reviewer.  The speed of review depends on several factors,
> > but one of which is what type of user visible issue is this causing.
> > The user visible effects of this are a nasty warning message and nothing
> > more, I believe?  A useful indicator in this triage is how important the
> > submitter thinks the patch is, which was originally why I asked.
> >
> 
> That would be a question to Praveen.  It wasn't clear to me whether
> this sysfs backtrace was a simply a warning or eventually fatal to the
> box.
As far as I remember, the issue was mostly with the sysfs backtraces. I don’t
remember it causing a fatal error; but that could very well be because I did not
run it long enough.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time
  2015-07-27 18:38             ` James Bottomley
@ 2015-07-27 20:52               ` Dan Williams
  0 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2015-07-27 20:52 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Praveen Murali, linux-scsi,
	stable@vger.kernel.org

On Mon, Jul 27, 2015 at 11:38 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> No, that seems to be the intent of the prior code.  The reason port
> visibility goes immediately (along with all associated phys), is that
> the port is ready for reuse as soon as sas_deform_port() returns.
> Destroying the subtree is left to DISCE_DESTRUCT, but since the subtree
> is not visible, a new port can form even as the elements associated with
> the old port are being destroyed.  the DISCE_DISCOVER that populates it
> will queue behind the destruct, so everything just works today (modulo
> the new warning).
>
> It would be ideal if we could detect in the destruct queue that the
> visibility is already gone and make use of it, but we can't.  The patch
> that causes this warning induces a mismatch between the state of the
> kernfs tree and the kobjects ... we still have to call device_del which
> leads to kobject_del (which triggers the warning) to bring this state
> back into alignment.  So the disconnected subtree we originally used to
> make all this work is what's suddenly been rendered invalid.
>

Looking at the original report this seemed new for the 3.16 kernel.
However, looking closer, that warning in sysfs_remove_group() has been
present for a long time before that.  I don't recall seeing it back
when we were doing focused hotplug testing with the isci driver for
the 3.5 / 3.6 kernels, so I believe it is newly uncovered since then.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-07-27 20:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-18  3:22 [for 4.1 PATCH resend] libsas: fix "sysfs group not found" warnings at port teardown time Dan Williams
2015-06-21 14:27 ` Hannes Reinecke
2015-07-22 18:28 ` James Bottomley
2015-07-22 21:08   ` Dan Williams
2015-07-27 15:17     ` James Bottomley
2015-07-27 15:48       ` Dan Williams
2015-07-27 17:11         ` James Bottomley
2015-07-27 17:55           ` Dan Williams
2015-07-27 18:38             ` James Bottomley
2015-07-27 20:52               ` Dan Williams
2015-07-27 18:07         ` James Bottomley
2015-07-27 18:24           ` Dan Williams
2015-07-27 19:53             ` Praveen Murali

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).