public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] scsi: scsi_transport_sas: fix error handling in sas_rphy_add()
@ 2022-11-11 14:44 Yang Yingliang
  2022-11-11 15:51 ` James Bottomley
  0 siblings, 1 reply; 6+ messages in thread
From: Yang Yingliang @ 2022-11-11 14:44 UTC (permalink / raw)
  To: linux-scsi; +Cc: jejb, martin.petersen, john.g.garry, yangyingliang

In sas_rphy_add(), if transport_add_device() fails, the device
is not added, the return value is not checked, it won't goto
error path, when removing rphy in normal remove path, it causes
null-ptr-deref, because transport_remove_device() is called to
remove the device that was not added.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000108
pc : device_del+0x54/0x3d0
lr : device_del+0x37c/0x3d0
Call trace:
 device_del+0x54/0x3d0
 attribute_container_class_device_del+0x28/0x38
 transport_remove_classdev+0x6c/0x80
 attribute_container_device_trigger+0x108/0x110
 transport_remove_device+0x28/0x38
 sas_rphy_remove+0x50/0x78 [scsi_transport_sas]
 sas_port_delete+0x30/0x148 [scsi_transport_sas]
 do_sas_phy_delete+0x78/0x80 [scsi_transport_sas]
 device_for_each_child+0x68/0xb0
 sas_remove_children+0x30/0x50 [scsi_transport_sas]
 sas_rphy_remove+0x38/0x78 [scsi_transport_sas]
 sas_port_delete+0x30/0x148 [scsi_transport_sas]
 do_sas_phy_delete+0x78/0x80 [scsi_transport_sas]
 device_for_each_child+0x68/0xb0
 sas_remove_children+0x30/0x50 [scsi_transport_sas]
 sas_remove_host+0x20/0x38 [scsi_transport_sas]
 scsih_remove+0xd8/0x420 [mpt3sas]

Fix this by checking and handling return value of transport_add_device()
in sas_rphy_add().

Fixes: c7ebbbce366c ("[SCSI] SAS transport class")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
v1 -> v2:
  Update commit message.
---
 drivers/scsi/scsi_transport_sas.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 74b99f2b0b74..accc0afa8f77 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -1526,7 +1526,11 @@ int sas_rphy_add(struct sas_rphy *rphy)
 	error = device_add(&rphy->dev);
 	if (error)
 		return error;
-	transport_add_device(&rphy->dev);
+	error = transport_add_device(&rphy->dev);
+	if (error) {
+		device_del(&rphy->dev);
+		return error;
+	}
 	transport_configure_device(&rphy->dev);
 	if (sas_bsg_initialize(shost, rphy))
 		printk("fail to a bsg device %s\n", dev_name(&rphy->dev));
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] scsi: scsi_transport_sas: fix error handling in sas_rphy_add()
  2022-11-11 14:44 [PATCH v2] scsi: scsi_transport_sas: fix error handling in sas_rphy_add() Yang Yingliang
@ 2022-11-11 15:51 ` James Bottomley
  2022-11-18  3:11   ` Yang Yingliang
  0 siblings, 1 reply; 6+ messages in thread
From: James Bottomley @ 2022-11-11 15:51 UTC (permalink / raw)
  To: Yang Yingliang, linux-scsi; +Cc: martin.petersen, john.g.garry

On Fri, 2022-11-11 at 22:44 +0800, Yang Yingliang wrote:
> In sas_rphy_add(), if transport_add_device() fails, the device
> is not added, the return value is not checked, it won't goto
> error path, when removing rphy in normal remove path, it causes
> null-ptr-deref, because transport_remove_device() is called to
> remove the device that was not added.
> 
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000108
> pc : device_del+0x54/0x3d0
> lr : device_del+0x37c/0x3d0
> Call trace:
>  device_del+0x54/0x3d0
>  attribute_container_class_device_del+0x28/0x38
>  transport_remove_classdev+0x6c/0x80
>  attribute_container_device_trigger+0x108/0x110
>  transport_remove_device+0x28/0x38
>  sas_rphy_remove+0x50/0x78 [scsi_transport_sas]
>  sas_port_delete+0x30/0x148 [scsi_transport_sas]
>  do_sas_phy_delete+0x78/0x80 [scsi_transport_sas]
>  device_for_each_child+0x68/0xb0
>  sas_remove_children+0x30/0x50 [scsi_transport_sas]
>  sas_rphy_remove+0x38/0x78 [scsi_transport_sas]
>  sas_port_delete+0x30/0x148 [scsi_transport_sas]
>  do_sas_phy_delete+0x78/0x80 [scsi_transport_sas]
>  device_for_each_child+0x68/0xb0
>  sas_remove_children+0x30/0x50 [scsi_transport_sas]
>  sas_remove_host+0x20/0x38 [scsi_transport_sas]
>  scsih_remove+0xd8/0x420 [mpt3sas]
> 
> Fix this by checking and handling return value of
> transport_add_device()
> in sas_rphy_add().
> 
> Fixes: c7ebbbce366c ("[SCSI] SAS transport class")
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
> v1 -> v2:
>   Update commit message.
> ---
>  drivers/scsi/scsi_transport_sas.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_transport_sas.c
> b/drivers/scsi/scsi_transport_sas.c
> index 74b99f2b0b74..accc0afa8f77 100644
> --- a/drivers/scsi/scsi_transport_sas.c
> +++ b/drivers/scsi/scsi_transport_sas.c
> @@ -1526,7 +1526,11 @@ int sas_rphy_add(struct sas_rphy *rphy)
>         error = device_add(&rphy->dev);
>         if (error)
>                 return error;
> -       transport_add_device(&rphy->dev);
> +       error = transport_add_device(&rphy->dev);
> +       if (error) {
> +               device_del(&rphy->dev);
> +               return error;
> +       }
>         transport_configure_device(&rphy->dev);
>         if (sas_bsg_initialize(shost, rphy))
>                 printk("fail to a bsg device %s\n", dev_name(&rphy-
> >dev));

There is a slight problem with doing this in that if
transport_device_add() ever fails it's likely because memory pressure
caused the allocation of the internal_container to fail. What that
means is that the visible sysfs attributes don't get added, but
otherwise the rphy is fully functional as far as the driver sees it, so
this condition doesn't have to be a fatal error which kills the device.

There are two ways of handling this:

   1. The above to move the condition from an ignored to a fatal error.
      It's so rare that we almost never see it in practice and if it
      ever happened, the machine is so low on memory that something
      else is bound to fail an allocation and kill the device anyway,
      so treating it as non-fatal likely serves no purpose.
   2. Simply to make the assumption that transport_remove_device() is
      idempotent true by adding a flag in the internal_class to signify
      removal is required. This would preserve current behaviour and
      have the bonus that it only requires a single patch, not one
      patch per transport class object that has this problem.

I'd probably prefer 2. since it's way less work, but others might have
different opinions.

James


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] scsi: scsi_transport_sas: fix error handling in sas_rphy_add()
  2022-11-11 15:51 ` James Bottomley
@ 2022-11-18  3:11   ` Yang Yingliang
  2022-11-18  9:18     ` John Garry
  0 siblings, 1 reply; 6+ messages in thread
From: Yang Yingliang @ 2022-11-18  3:11 UTC (permalink / raw)
  To: jejb, linux-scsi, Greg KH; +Cc: martin.petersen, john.g.garry, yangyingliang

+Cc: Greg

Hi Greg,

On 2022/11/11 23:51, James Bottomley wrote:
> On Fri, 2022-11-11 at 22:44 +0800, Yang Yingliang wrote:
>> In sas_rphy_add(), if transport_add_device() fails, the device
>> is not added, the return value is not checked, it won't goto
>> error path, when removing rphy in normal remove path, it causes
>> null-ptr-deref, because transport_remove_device() is called to
>> remove the device that was not added.
>>
>> Unable to handle kernel NULL pointer dereference at virtual address
>> 0000000000000108
>> pc : device_del+0x54/0x3d0
>> lr : device_del+0x37c/0x3d0
>> Call trace:
>>   device_del+0x54/0x3d0
>>   attribute_container_class_device_del+0x28/0x38
>>   transport_remove_classdev+0x6c/0x80
>>   attribute_container_device_trigger+0x108/0x110
>>   transport_remove_device+0x28/0x38
>>   sas_rphy_remove+0x50/0x78 [scsi_transport_sas]
>>   sas_port_delete+0x30/0x148 [scsi_transport_sas]
>>   do_sas_phy_delete+0x78/0x80 [scsi_transport_sas]
>>   device_for_each_child+0x68/0xb0
>>   sas_remove_children+0x30/0x50 [scsi_transport_sas]
>>   sas_rphy_remove+0x38/0x78 [scsi_transport_sas]
>>   sas_port_delete+0x30/0x148 [scsi_transport_sas]
>>   do_sas_phy_delete+0x78/0x80 [scsi_transport_sas]
>>   device_for_each_child+0x68/0xb0
>>   sas_remove_children+0x30/0x50 [scsi_transport_sas]
>>   sas_remove_host+0x20/0x38 [scsi_transport_sas]
>>   scsih_remove+0xd8/0x420 [mpt3sas]
>>
>> Fix this by checking and handling return value of
>> transport_add_device()
>> in sas_rphy_add().
>>
>> Fixes: c7ebbbce366c ("[SCSI] SAS transport class")
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>> ---
>> v1 -> v2:
>>    Update commit message.
>> ---
>>   drivers/scsi/scsi_transport_sas.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_transport_sas.c
>> b/drivers/scsi/scsi_transport_sas.c
>> index 74b99f2b0b74..accc0afa8f77 100644
>> --- a/drivers/scsi/scsi_transport_sas.c
>> +++ b/drivers/scsi/scsi_transport_sas.c
>> @@ -1526,7 +1526,11 @@ int sas_rphy_add(struct sas_rphy *rphy)
>>          error = device_add(&rphy->dev);
>>          if (error)
>>                  return error;
>> -       transport_add_device(&rphy->dev);
>> +       error = transport_add_device(&rphy->dev);
>> +       if (error) {
>> +               device_del(&rphy->dev);
>> +               return error;
>> +       }
>>          transport_configure_device(&rphy->dev);
>>          if (sas_bsg_initialize(shost, rphy))
>>                  printk("fail to a bsg device %s\n", dev_name(&rphy-
>>> dev));
> There is a slight problem with doing this in that if
> transport_device_add() ever fails it's likely because memory pressure
> caused the allocation of the internal_container to fail. What that
> means is that the visible sysfs attributes don't get added, but
> otherwise the rphy is fully functional as far as the driver sees it, so
> this condition doesn't have to be a fatal error which kills the device.
>
> There are two ways of handling this:
>
>     1. The above to move the condition from an ignored to a fatal error.
>        It's so rare that we almost never see it in practice and if it
>        ever happened, the machine is so low on memory that something
>        else is bound to fail an allocation and kill the device anyway,
>        so treating it as non-fatal likely serves no purpose.
>     2. Simply to make the assumption that transport_remove_device() is
>        idempotent true by adding a flag in the internal_class to signify
>        removal is required. This would preserve current behaviour and
>        have the bonus that it only requires a single patch, not one
>        patch per transport class object that has this problem.
>
> I'd probably prefer 2. since it's way less work, but others might have
> different opinions.
Current some callers ignore the return value of transport_add_device(), 
if it fails,
it will cause null-ptr-deref in transport_remove_device().

James suggested that add some check in transport_remove_device(), so all can
be fix in one patch.

Do you have any suggestion for this ?

Thanks,
Yang
>
> James
>
> .

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] scsi: scsi_transport_sas: fix error handling in sas_rphy_add()
  2022-11-18  3:11   ` Yang Yingliang
@ 2022-11-18  9:18     ` John Garry
  2022-11-19  8:58       ` Yang Yingliang
  0 siblings, 1 reply; 6+ messages in thread
From: John Garry @ 2022-11-18  9:18 UTC (permalink / raw)
  To: Yang Yingliang, jejb, linux-scsi, Greg KH; +Cc: martin.petersen

On 18/11/2022 03:11, Yang Yingliang wrote:
>>>> );
>> There is a slight problem with doing this in that if
>> transport_device_add() ever fails it's likely because memory pressure
>> caused the allocation of the internal_container to fail. What that
>> means is that the visible sysfs attributes don't get added, but
>> otherwise the rphy is fully functional as far as the driver sees it, so
>> this condition doesn't have to be a fatal error which kills the device.
>>
>> There are two ways of handling this:
>>
>>     1. The above to move the condition from an ignored to a fatal error.
>>        It's so rare that we almost never see it in practice and if it
>>        ever happened, the machine is so low on memory that something
>>        else is bound to fail an allocation and kill the device anyway,
>>        so treating it as non-fatal likely serves no purpose.
>>     2. Simply to make the assumption that transport_remove_device() is
>>        idempotent true by adding a flag in the internal_class to signify
>>        removal is required. This would preserve current behaviour and
>>        have the bonus that it only requires a single patch, not one
>>        patch per transport class object that has this problem.
>>
>> I'd probably prefer 2. since it's way less work, but others might have
>> different opinions.
> Current some callers ignore the return value of transport_add_device(), 
> if it fails,
> it will cause null-ptr-deref in transport_remove_device().
> 
> James suggested that add some check in transport_remove_device(), so all 
> can
> be fix in one patch.
> 
> Do you have any suggestion for this ?

Personally I prefer 1. However did you develop a prototype patch for how 
2. would look? And how many changes are still required for 1.?

Thanks,
John

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] scsi: scsi_transport_sas: fix error handling in sas_rphy_add()
  2022-11-18  9:18     ` John Garry
@ 2022-11-19  8:58       ` Yang Yingliang
  2022-11-21 12:51         ` John Garry
  0 siblings, 1 reply; 6+ messages in thread
From: Yang Yingliang @ 2022-11-19  8:58 UTC (permalink / raw)
  To: John Garry, jejb, linux-scsi, Greg KH; +Cc: martin.petersen, yangyingliang


On 2022/11/18 17:18, John Garry wrote:
> On 18/11/2022 03:11, Yang Yingliang wrote:
>>>>> );
>>> There is a slight problem with doing this in that if
>>> transport_device_add() ever fails it's likely because memory pressure
>>> caused the allocation of the internal_container to fail. What that
>>> means is that the visible sysfs attributes don't get added, but
>>> otherwise the rphy is fully functional as far as the driver sees it, so
>>> this condition doesn't have to be a fatal error which kills the device.
>>>
>>> There are two ways of handling this:
>>>
>>>     1. The above to move the condition from an ignored to a fatal 
>>> error.
>>>        It's so rare that we almost never see it in practice and if it
>>>        ever happened, the machine is so low on memory that something
>>>        else is bound to fail an allocation and kill the device anyway,
>>>        so treating it as non-fatal likely serves no purpose.
>>>     2. Simply to make the assumption that transport_remove_device() is
>>>        idempotent true by adding a flag in the internal_class to 
>>> signify
>>>        removal is required. This would preserve current behaviour and
>>>        have the bonus that it only requires a single patch, not one
>>>        patch per transport class object that has this problem.
>>>
>>> I'd probably prefer 2. since it's way less work, but others might have
>>> different opinions.
>> Current some callers ignore the return value of 
>> transport_add_device(), if it fails,
>> it will cause null-ptr-deref in transport_remove_device().
>>
>> James suggested that add some check in transport_remove_device(), so 
>> all can
>> be fix in one patch.
>>
>> Do you have any suggestion for this ?
>
> Personally I prefer 1. However did you develop a prototype patch for 
> how 2. would look? And how many changes are still required for 1.?
For 1, in total, there are 8 places need be checked
in drivers/scsi/scsi_transport_sas.c, 2 places
in drivers/scsi/scsi_sysfs.c, 3 places
in drivers/scsi/scsi_transport_fc.c, 2 places
in drivers/scsi/scsi_transport_srp.c, 1 place

For 2, I think we can use device_is_registered() to check if add 
operation is successful, may be like this (not test yet):

diff --git a/drivers/base/transport_class.c b/drivers/base/transport_class.c
index ccc86206e508..ac41be7b724e 100644
--- a/drivers/base/transport_class.c
+++ b/drivers/base/transport_class.c
@@ -227,9 +227,11 @@ static int transport_remove_classdev(struct 
attribute_container *cont,
          tclass->remove(tcont, dev, classdev);

      if (tclass->remove != anon_transport_dummy_function) {
-        if (tcont->statistics)
-            sysfs_remove_group(&classdev->kobj, tcont->statistics);
-        attribute_container_class_device_del(classdev);
+        if (device_is_registered(classdev)) {
+            if (tcont->statistics)
+                sysfs_remove_group(&classdev->kobj, tcont->statistics);
+            attribute_container_class_device_del(classdev);
+        }
      }

      return 0;

Thanks,
Yang
>
> Thanks,
> John
> .

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] scsi: scsi_transport_sas: fix error handling in sas_rphy_add()
  2022-11-19  8:58       ` Yang Yingliang
@ 2022-11-21 12:51         ` John Garry
  0 siblings, 0 replies; 6+ messages in thread
From: John Garry @ 2022-11-21 12:51 UTC (permalink / raw)
  To: Yang Yingliang, jejb, linux-scsi, Greg KH; +Cc: martin.petersen

On 19/11/2022 08:58, Yang Yingliang wrote:
>> Personally I prefer 1. However did you develop a prototype patch for 
>> how 2. would look? And how many changes are still required for 1.?
> For 1, in total, there are 8 places need be checked
> in drivers/scsi/scsi_transport_sas.c, 2 places
> in drivers/scsi/scsi_sysfs.c, 3 places
> in drivers/scsi/scsi_transport_fc.c, 2 places
> in drivers/scsi/scsi_transport_srp.c, 1 place

and in linux-next there are 4x places which do already check the return 
code...

Not sure what's best to do. I'll leave it to James' wisdom.

However, we do seem to have a common pattern:

error = device_add(dev);
if (error)
	return error;
transport_add_device(dev);
transport_configure_device(dev);

Could we make an "add" (which does as above pattern) and "remove" 
helper? It might simplify things such that we not only fixing the 
possible crash but also reducing code.

Thanks,
John

> 
> For 2, I think we can use device_is_registered() to check if add 
> operation is successful, may be like this (not test yet):
> 
> diff --git a/drivers/base/transport_class.c 
> b/drivers/base/transport_class.c
> index ccc86206e508..ac41be7b724e 100644
> --- a/drivers/base/transport_class.c
> +++ b/drivers/base/transport_class.c
> @@ -227,9 +227,11 @@ static int transport_remove_classdev(struct 
> attribute_container *cont,
>           tclass->remove(tcont, dev, classdev);
> 
>       if (tclass->remove != anon_transport_dummy_function) {
> -        if (tcont->statistics)
> -            sysfs_remove_group(&classdev->kobj, tcont->statistics);
> -        attribute_container_class_device_del(classdev);
> +        if (device_is_registered(classdev)) {
> +            if (tcont->statistics)
> +                sysfs_remove_group(&classdev->kobj, tcont->statistics);
> +            attribute_container_class_device_del(classdev);
> +        }
>       }
> 
>       return 0;


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-11-21 12:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-11 14:44 [PATCH v2] scsi: scsi_transport_sas: fix error handling in sas_rphy_add() Yang Yingliang
2022-11-11 15:51 ` James Bottomley
2022-11-18  3:11   ` Yang Yingliang
2022-11-18  9:18     ` John Garry
2022-11-19  8:58       ` Yang Yingliang
2022-11-21 12:51         ` John Garry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox