* [PATCH-next v2 0/2] scsi, driver core: fix iscsi rescan fails to create block device
@ 2023-01-28 9:41 Zhong Jinghua
From: Zhong Jinghua @ 2023-01-28 9:41 UTC
To: gregkh, jejb, martin.petersen, hare, bvanassche, emilne
Cc: linux-kernel, linux-scsi, zhongjinghua, yi.zhang, yukuai3

v1->v2: add a new patch that introduces the get_device_unless_zero() method.

Hello,

This patchset introduces a get_device_unless_zero() method, which avoids
raising a device's reference count from 0 to 1, because doing so causes
bugs in some parts of the kernel. We use this method to fix an issue with
the iSCSI delete order.

Zhong Jinghua (2):
  driver core: introduce get_device_unless_zero()
  scsi: fix iscsi rescan fails to create block device

 drivers/base/core.c       | 8 ++++++++
 drivers/scsi/scsi_sysfs.c | 4 +---
 include/linux/device.h    | 1 +
 3 files changed, 10 insertions(+), 3 deletions(-)

-- 
2.31.1

^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH-next v2 1/2] driver core: introduce get_device_unless_zero()
@ 2023-01-28 9:41 Zhong Jinghua
From: Zhong Jinghua @ 2023-01-28 9:41 UTC
To: gregkh, jejb, martin.petersen, hare, bvanassche, emilne
Cc: linux-kernel, linux-scsi, zhongjinghua, yi.zhang, yukuai3

When a device's reference count is 0, calling get_device() raises it from
0 to 1, which causes errors in some places in the kernel. So introduce a
get_device_unless_zero() method that returns NULL when the device's
reference count is 0.

Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
---
 drivers/base/core.c    | 8 ++++++++
 include/linux/device.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index d02501933467..6f17a93a3443 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3613,6 +3613,14 @@ struct device *get_device(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(get_device);
 
+struct device __must_check *get_device_unless_zero(struct device *dev)
+{
+	if (!dev || !kobject_get_unless_zero(&dev->kobj))
+		return NULL;
+	return dev;
+}
+EXPORT_SYMBOL_GPL(get_device_unless_zero);
+
 /**
  * put_device - decrement reference count.
  * @dev: device in question.
diff --git a/include/linux/device.h b/include/linux/device.h
index 424b55df0272..c63bac6d51c8 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1069,6 +1069,7 @@ extern int (*platform_notify_remove)(struct device *dev);
  *
  */
 struct device *get_device(struct device *dev);
+struct device __must_check *get_device_unless_zero(struct device *dev);
 void put_device(struct device *dev);
 bool kill_device(struct device *dev);
 
-- 
2.31.1
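As a rough illustration of the semantics this patch relies on, here is a user-space model (not kernel code) contrasting an unconditional get, as get_device() performs, with the increment-unless-zero behaviour of kobject_get_unless_zero(). The struct obj type and the obj_get*() helpers are hypothetical stand-ins, and C11 atomics replace the kernel's kref/refcount primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object; not struct device. */
struct obj {
	atomic_int refcount;
};

/* Models an unconditional get such as get_device(): it will take 0 -> 1. */
static struct obj *obj_get(struct obj *o)
{
	if (o)
		atomic_fetch_add(&o->refcount, 1);
	return o;
}

/*
 * Models the increment-unless-zero idea behind kobject_get_unless_zero():
 * the count is raised only if it is currently non-zero.
 */
static struct obj *obj_get_unless_zero(struct obj *o)
{
	int old;

	if (!o)
		return NULL;
	old = atomic_load(&o->refcount);
	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value; retry. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return o;
	}
	return NULL;
}
```

The model only captures the counter logic. As Greg points out in his reply, in the kernel a zero count normally means the release path has already run, so a helper like this is only meaningful if something else (for example a lock that the release path also takes) keeps the memory valid.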
* Re: [PATCH-next v2 1/2] driver core: introduce get_device_unless_zero()
@ 2023-01-28 10:43 Greg KH
From: Greg KH @ 2023-01-28 10:43 UTC
To: Zhong Jinghua
Cc: jejb, martin.petersen, hare, bvanassche, emilne, linux-kernel, linux-scsi, yi.zhang, yukuai3

On Sat, Jan 28, 2023 at 05:41:45PM +0800, Zhong Jinghua wrote:
> When the dev reference count is 0, calling get_device will go from 0 to 1,

You can NOT have a device reference count that is 0.  If you do, you are
doing something really really wrong, and there's a bug somewhere else.

> which will cause errors in some place of the kernel.

It's already an error in the kernel that tries to increment a reference
count of 0 as that device is already freed and you are working with
memory that is not present.

> So introduce a
> get_devcie_unless_zero method that returns NULL when the dev reference
> count is 0.

No, this is not ok, sorry, please never do this.  Fix the caller.

thanks,

greg k-h
* [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device
@ 2023-01-28 9:41 Zhong Jinghua
From: Zhong Jinghua @ 2023-01-28 9:41 UTC
To: gregkh, jejb, martin.petersen, hare, bvanassche, emilne
Cc: linux-kernel, linux-scsi, zhongjinghua, yi.zhang, yukuai3

When the three iSCSI operations delete, logout and rescan run
concurrently, adding the disk through device_add_disk() can fail. The
concurrent sequence is as follows:

T0: scan host      // echo 1 > /sys/devices/platform/host1/scsi_host/host1/scan
T1: delete target  // echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
T2: logout         // iscsiadm -m node --logout
T3: T2 scsi_queue_work
T4: T0 bus_probe_device

T0:
scsi_scan_target
mutex_lock(&shost->scan_mutex);
__scsi_scan_target
scsi_report_lun_scan
scsi_add_lun
scsi_sysfs_add_sdev
device_add
kobject_add
// create session1/target1:0:0/1:0:0:1/
...
bus_probe_device
// create the block device asynchronously
mutex_unlock(&shost->scan_mutex);

T1:
sdev_store_delete
scsi_remove_device
device_remove_file
mutex_lock(scan_mutex)
__scsi_remove_device
res = scsi_device_set_state(sdev, SDEV_CANCEL)

T2:
iscsi_if_recv_msg
scsi_queue_work

T3:
__iscsi_unbind_session
session->target_id = ISCSI_MAX_TARGET
__scsi_remove_target
sdev->sdev_state == SDEV_CANCEL
continue;
// end, kobject 1:0:0:1 is not deleted

T2:
iscsi_if_recv_msg
transport->destroy_session(session)
__iscsi_destroy_session
iscsi_session_teardown
iscsi_remove_session
__iscsi_unbind_session
iscsi_session_event
device_del
// delete session

T4:
// create the block device; its parent is 1:0:0:1
// if kobject 1:0:0:1 does not exist, it won't go further
__device_attach_async_helper
device_lock
...
__device_attach_driver
driver_probe_device
really_probe
sd_probe
device_add_disk
register_disk
device_add
// error

The block device is created after the session has been deleted. When T2
deletes the session, it marks the block device's parent, 1:0:0:1, as
unusable:

T2
device_del
 kobject_del
  sysfs_remove_dir
   __kernfs_remove
    // mark the children under the session as unusable
    while ((pos = kernfs_next_descendant_post(pos, kn)))
            if (kernfs_active(pos))
                    atomic_add(KN_DEACTIVATED_BIAS, &pos->active);

Then T4 creates the block device:

T4
device_add
 kobject_add
  kobject_add_varg
   kobject_add_internal
    create_dir
     sysfs_create_dir_ns
      kernfs_create_dir_ns
       kernfs_add_one
        if ((parent->flags & KERNFS_ACTIVATED) && !kernfs_active(parent))
                goto out_unlock;
        // return error

This error causes a warning:
kobject_add_internal failed for block (error: -2 parent: 1:0:0:1).
Older kernels (such as 5.10) lack the corresponding error handling, so
execution continues and triggers a kernel panic; hence cc stable.

Therefore, the block device must not be created after the session has
been deleted. More precisely, we should ensure that the target under the
session is deleted first, and only then the session.
In this way, there are two possibilities:

1) If the process deleting the target (T1) runs first, it grabs
   device_lock(), and the process creating the block device (T4) waits for
   the deletion to complete. By then the block device's parent, 1:0:0:1,
   has been deleted, so T4 won't go further.

2) If the process creating the block device (T4) runs first, it grabs
   device_lock(), and the process deleting the target (T1) waits for the
   block device creation to complete. The process deleting the session
   (T2) then has to wait for the target deletion to complete.

Fix it by removing the SDEV_CANCEL (and SDEV_DEL) state checks in
__scsi_remove_target() to enforce the deletion order. It will then wait
for the scan_mutex held by T1, and device_del() in __scsi_remove_device()
will wait for T4's device_lock(dev).

However, we found that this change alone reintroduces the problem fixed by
commit 81b6c9998979 ("scsi: core: check for device state in
__scsi_remove_target()"). So we use get_device_unless_zero() instead of
get_device() to avoid that problem.

Fixes: 81b6c9998979 ("scsi: core: check for device state in __scsi_remove_target()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
---
 drivers/scsi/scsi_sysfs.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index cac7c902cf70..a22109cdb8ef 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1535,9 +1535,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
 		if (sdev->channel != starget->channel ||
 		    sdev->id != starget->id)
 			continue;
-		if (sdev->sdev_state == SDEV_DEL ||
-		    sdev->sdev_state == SDEV_CANCEL ||
-		    !get_device(&sdev->sdev_gendev))
+		if (!get_device_unless_zero(&sdev->sdev_gendev))
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_remove_device(sdev);
-- 
2.31.1
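To make the failure mode in this commit message concrete, here is a heavily simplified user-space model of the kernfs behaviour it describes: removing a directory deactivates every descendant, and adding a child below an inactive parent is refused. struct node, deactivate_subtree() and add_child() are hypothetical; the real logic lives in __kernfs_remove() and kernfs_add_one(), and this sketch ignores the KERNFS_ACTIVATED flag. On Linux, -ENOENT is the -2 reported in the warning.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical, heavily simplified stand-in for a kernfs node. */
struct node {
	const char *name;
	bool active;
	struct node *parent;
	struct node *children[MAX_CHILDREN];
	int nr_children;
};

/* Roughly what __kernfs_remove() does: deactivate the whole subtree, post-order. */
static void deactivate_subtree(struct node *n)
{
	for (int i = 0; i < n->nr_children; i++)
		deactivate_subtree(n->children[i]);
	n->active = false;
}

/* Roughly the kernfs_add_one() check: refuse to add below an inactive parent. */
static int add_child(struct node *parent, struct node *child)
{
	if (!parent->active)
		return -ENOENT;
	if (parent->nr_children == MAX_CHILDREN)
		return -ENOSPC;
	child->parent = parent;
	child->active = true;
	parent->children[parent->nr_children++] = child;
	return 0;
}
```

In the model, building session1/target1:0:0/1:0:0:1, deactivating session1 and then adding "block" under 1:0:0:1 returns -ENOENT, which mirrors the T2-then-T4 ordering above.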
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device
@ 2023-01-28 10:45 Greg KH
From: Greg KH @ 2023-01-28 10:45 UTC
To: Zhong Jinghua
Cc: jejb, martin.petersen, hare, bvanassche, emilne, linux-kernel, linux-scsi, yi.zhang, yukuai3

On Sat, Jan 28, 2023 at 05:41:46PM +0800, Zhong Jinghua wrote:
> When the three iscsi operations delete, logout, and rescan are concurrent
> at the same time, there is a probability of failure to add disk through
> device_add_disk(). The concurrent process is as follows:
[...]
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index cac7c902cf70..a22109cdb8ef 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1535,9 +1535,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
>  		if (sdev->channel != starget->channel ||
>  		    sdev->id != starget->id)
>  			continue;
> -		if (sdev->sdev_state == SDEV_DEL ||
> -		    sdev->sdev_state == SDEV_CANCEL ||
> -		    !get_device(&sdev->sdev_gendev))
> +		if (!get_device_unless_zero(&sdev->sdev_gendev))

If sdev_gendev is 0 here, the object is gone and you are working with
memory that is already freed so something is _VERY_ wrong.

This isn't ok, sorry.

greg k-h
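The behavioural difference in the quoted hunk can be modelled in user space as two predicates that decide whether __scsi_remove_target() goes on to call scsi_remove_device() for a given sdev. The fake_sdev type, its enum and both functions are hypothetical; get_device() is modelled as always succeeding for a non-NULL pointer, and get_device_unless_zero() as refusing a zero count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the SCSI device states used in the hunk. */
enum fake_sdev_state { FAKE_SDEV_RUNNING, FAKE_SDEV_CANCEL, FAKE_SDEV_DEL };

struct fake_sdev {
	enum fake_sdev_state state;
	int refcount; /* stands in for the sdev_gendev kobject refcount */
};

/* Pre-patch: skip devices already being torn down, otherwise take a ref. */
static bool old_will_remove(struct fake_sdev *s)
{
	if (s->state == FAKE_SDEV_DEL || s->state == FAKE_SDEV_CANCEL)
		return false;
	s->refcount++; /* get_device() never fails for a non-NULL device */
	return true;
}

/* Patched: ignore the state and only refuse a zero reference count. */
static bool new_will_remove(struct fake_sdev *s)
{
	if (s->refcount == 0)
		return false;
	s->refcount++;
	return true;
}
```

With the patch, an sdev that T1 has already moved to SDEV_CANCEL is no longer skipped, so session teardown reaches scsi_remove_device() and has to wait for T1. The refcount == 0 branch is exactly the case Greg objects to: in the real kernel, a zero count means the memory is no longer guaranteed to be valid.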
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device
@ 2023-01-29 1:13 Yu Kuai
From: Yu Kuai @ 2023-01-29 1:13 UTC
To: Greg KH, Zhong Jinghua
Cc: jejb, martin.petersen, hare, bvanassche, emilne, linux-kernel, linux-scsi, yi.zhang, yukuai (C)

Hi, Greg

On 2023/01/28 18:45, Greg KH wrote:
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> [...]
>> +		if (!get_device_unless_zero(&sdev->sdev_gendev))
>
> If sdev_gendev is 0 here, the object is gone and you are working with
> memory that is already freed so something is _VERY_ wrong.

In fact, this patch works:

In __scsi_remove_target(), 'host_lock' is held to protect iteration over
the siblings list, and scsi_device_dev_release() has to take the same
lock before removing the device from that list. Hence sdev cannot be
freed until the lock is released.

Thanks,
Kuai
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device
@ 2023-01-29 6:46 Greg KH
From: Greg KH @ 2023-01-29 6:46 UTC
To: Yu Kuai
Cc: Zhong Jinghua, jejb, martin.petersen, hare, bvanassche, emilne, linux-kernel, linux-scsi, yi.zhang, yukuai (C)

On Sun, Jan 29, 2023 at 09:13:55AM +0800, Yu Kuai wrote:
> Hi, Greg
>
> On 2023/01/28 18:45, Greg KH wrote:
> [...]
> > If sdev_gendev is 0 here, the object is gone and you are working with
> > memory that is already freed so something is _VERY_ wrong.
>
> In fact, this patch will work:
>
> In __scsi_remove_target(), 'host_lock' is held to protect iterating
> siblings, and object will wait for this lock in
> scsi_device_dev_release() to remove siblings. Hence sdev will not be
> freed untill the lock is released.

Then you got lucky, as that is not how a reference counted object should
be working (i.e. the reference dropped to 0 and the object is still kept
alive).

Please fix up the scsi logic here, don't abuse the reference count code.

thanks,

greg k-h
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device
@ 2023-01-29 6:55 Yu Kuai
From: Yu Kuai @ 2023-01-29 6:55 UTC
To: Greg KH, Yu Kuai
Cc: Zhong Jinghua, jejb, martin.petersen, hare, bvanassche, emilne, linux-kernel, linux-scsi, yi.zhang, yukuai (C)

Hi,

On 2023/01/29 14:46, Greg KH wrote:
[...]
> Then you got lucky, as that is not how a reference counted object should
> be working (i.e. the reference dropped to 0 and it still be kept alive.)
>
> Please fix up the scsi logic here, don't abuse the reference count code.

Thanks for the reply. I agree that we should fix this in the scsi layer.

Kuai
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device
@ 2023-01-29 17:30 James Bottomley
From: James Bottomley @ 2023-01-29 17:30 UTC
To: Zhong Jinghua, gregkh, martin.petersen, hare, bvanassche, emilne
Cc: linux-kernel, linux-scsi, yi.zhang, yukuai3

On Sat, 2023-01-28 at 17:41 +0800, Zhong Jinghua wrote:
> This error will cause a warning:
> kobject_add_internal failed for block (error: -2 parent: 1:0:0:1).
> In the lower version (such as 5.10), there is no corresponding error
> handling, continuing
> to go down will trigger a kernel panic, so cc stable.

This is an important point. What you're saying is that this only panics
on kernels before 5.10 or so, because after that it's correctly failed by
the block device error handling, so there's nothing to fix in later
kernels?

In that case, isn't the correct fix to look at backporting the block
device error handling:

commit 83cbce9574462c6b4eed6797bdaf18fae6859ab3
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Wed Aug 18 16:45:40 2021 +0200

    block: add error handling for device_add_disk / add_disk

?

James
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device
@ 2023-01-30 3:07 Yu Kuai
From: Yu Kuai @ 2023-01-30 3:07 UTC
To: jejb, Zhong Jinghua, gregkh, martin.petersen, hare, bvanassche, emilne
Cc: linux-kernel, linux-scsi, yi.zhang, yukuai (C)

Hi,

On 2023/01/30 1:30, James Bottomley wrote:
> On Sat, 2023-01-28 at 17:41 +0800, Zhong Jinghua wrote:
> [...]
> Is this is important point and what you're saying is that this only
> panics on kernels before 5.10 or so because after that it's correctly
> failed by block device error handling so there's nothing to fix in
> later kernels?
>
> In that case, isn't the correct fix to look at backporting the block
> device error handling:

That commit is the last one of the series that adds error handling; it
depends on many other patches, and the block layer has been heavily
refactored since. It's not a good idea to backport the error handling to
older versions.

Although error handling can prevent the kernel crash in this case, I
still think it makes sense to make sure kobjects are deleted in order: a
parent should not be deleted before its child.

Thanks,
Kuai
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device
@ 2023-01-30 3:29 James Bottomley
From: James Bottomley @ 2023-01-30 3:29 UTC
To: Yu Kuai, Zhong Jinghua, gregkh, martin.petersen, hare, bvanassche, emilne
Cc: linux-kernel, linux-scsi, yi.zhang, yukuai (C)

On Mon, 2023-01-30 at 11:07 +0800, Yu Kuai wrote:
> Hi,
>
> On 2023/01/30 1:30, James Bottomley wrote:
> [...]
> This is the last commit that support error handling, and there are
> many relied patches, and there are lots of refactor in block layer.
> It's not a good idea to backport error handling to lower version.
>
> Althrough error handling can prevent kernel crash in this case, I
> still think it make sense to make sure kobject is deleted in order,
> parent should not be deleted before child.

Well, look, you've created a very artificial situation where a create
closely followed by a delete of the underlying sdev races with the
create of the block gendisk devices of sd that bind asynchronously to
the created sdev. The asynchronous nature of the bind gives the
elongated race window so the only real fix is some sort of check that
the sdev is still viable by the time the bind occurs ... probably in
sd_probe(), say a scsi_device_get of sdp at the top which would ensure
viability of the sdev for the entire bind or fail the probe if the sdev
can't be got.

James
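A user-space sketch of the shape of that suggestion, assuming a hypothetical fake_scsi_device_get() that, like the state checks removed by the patch, refuses devices in the SDEV_DEL or SDEV_CANCEL state. None of these names are the real SCSI API, the error codes are arbitrary choices for the model, and the probe body is reduced to a counter.

```c
#include <assert.h>
#include <errno.h>

enum fake_state { FAKE_RUNNING, FAKE_CANCEL, FAKE_DEL };

struct fake_sdev {
	enum fake_state state;
	int refcount;
	int disks_added; /* stands in for the device_add_disk() side effect */
};

/* Hypothetical model: refuse devices that are being torn down. */
static int fake_scsi_device_get(struct fake_sdev *s)
{
	if (s->state == FAKE_DEL || s->state == FAKE_CANCEL)
		return -ENXIO;
	s->refcount++;
	return 0;
}

static void fake_scsi_device_put(struct fake_sdev *s)
{
	s->refcount--;
}

/* Hypothetical probe: pin the device first and fail early if that is refused. */
static int fake_sd_probe(struct fake_sdev *s)
{
	if (fake_scsi_device_get(s))
		return -ENODEV;
	s->disks_added++; /* ...allocate and add the disk while pinned... */
	fake_scsi_device_put(s);
	return 0;
}
```

The check runs once at the top of the probe. Whether that is enough when an ancestor such as the session is removed afterwards is exactly what the rest of the thread questions.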
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device
@ 2023-01-30 3:46 Yu Kuai
From: Yu Kuai @ 2023-01-30 3:46 UTC
To: jejb, Yu Kuai, Zhong Jinghua, gregkh, martin.petersen, hare, bvanassche, emilne
Cc: linux-kernel, linux-scsi, yi.zhang, yukuai (C)

Hi,

On 2023/01/30 11:29, James Bottomley wrote:
> On Mon, 2023-01-30 at 11:07 +0800, Yu Kuai wrote:
> [...]
> Well, look, you've created a very artificial situation where a create
> closely followed by a delete of the underlying sdev races with the
> create of the block gendisk devices of sd that bind asynchronously to
> the created sdev. The asynchronous nature of the bind gives the
> elongated race window so the only real fix is some sort of check that
> the sdev is still viable by the time the bind occurs ... probably in
> sd_probe(), say a scsi_device_get of sdp at the top which would ensure
> viability of the sdev for the entire bind or fail the probe if the sdev
> can't be got.

Sorry, I don't follow here. 😟

I agree this is a very artificial situation; however, I can't tell our
testers not to test this way...

The problem is that the session kobject is deleted and then sd_probe()
tries to create a new kobject under hostx/sessionx/x:x:x:x/. I don't see
how scsi_device_get() can prevent that: it only takes a kobject
reference, which prevents the kobject from being released, but
kobject_del() can still be done.

In this patch, we make sure that session removal and sd_probe() can't run
concurrently: session removal waits for all child kobjects to be deleted.
What do you think?

Thanks,
Kuai
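The distinction drawn here, between holding a reference (which only delays freeing) and the object still being registered (which kobject_del() ends), can be modelled in user space. struct kobj_model and its helpers are hypothetical; in_sysfs stands in for the object's kernfs node still being active.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: lifetime (refcount) and visibility (in_sysfs) are separate. */
struct kobj_model {
	int refcount;
	bool in_sysfs;
};

/* Like a get: pins the memory, says nothing about registration. */
static void model_get(struct kobj_model *k)
{
	k->refcount++;
}

/* Like kobject_del(): unregisters regardless of outstanding references. */
static void model_del(struct kobj_model *k)
{
	k->in_sysfs = false;
}

/* Adding a child directory requires the parent to still be registered. */
static int model_add_child(const struct kobj_model *parent)
{
	return parent->in_sysfs ? 0 : -ENOENT;
}
```

Taking an extra reference before the deletion leaves the count at 2 afterwards, yet adding a child still fails with -ENOENT. That is the point being made about scsi_device_get().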
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device
@ 2023-01-30 13:17 James Bottomley
From: James Bottomley @ 2023-01-30 13:17 UTC
To: Yu Kuai, Zhong Jinghua, gregkh, martin.petersen, hare, bvanassche, emilne
Cc: linux-kernel, linux-scsi, yi.zhang, yukuai (C)

On Mon, 2023-01-30 at 11:46 +0800, Yu Kuai wrote:
> Hi,
>
> On 2023/01/30 11:29, James Bottomley wrote:
> [...]
> Sorry, I don't follow here. 😟

In the current kernel the race is mitigated because add_device fails
due to the parent being torn down. That parent is the sdev->gendev, so
it seems we can detect this in the probe by looking at the sdev->gendev
state, which scsi_device_get() will do.

> I agree this is a very artificial situation, however I can't tell our
> tester not to test this way...
>
> The problem is that kobject session is deleted and then sd_probe()
> tries to create a new kobject under hostx/sessionx/x:x:x:x/. I don't
> see how scsi_device_get() can prevent that, it only get a kobject
> reference and can prevent kobject to be released, however,
> kobject_del() can still be done.

So your contention is that there's no way we could make scsi_device_get
see the kernfs deactivation? I would have thought checking
sdev->sdev_gendev.kobj.sd->active would give that ... although the check
would have to be via an API since KN_DEACTIVATED_BIAS is internal.

James

> In this patch, we make sure remove session and sd_probe() won't
> concurrent, remove session will wait for all child kobject to be
> deleted, what do you think?
>
> Thanks,
> Kuai
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device 2023-01-30 13:17 ` James Bottomley @ 2023-01-31 1:43 ` Yu Kuai 2023-01-31 3:25 ` James Bottomley 0 siblings, 1 reply; 15+ messages in thread From: Yu Kuai @ 2023-01-31 1:43 UTC (permalink / raw) To: jejb, Yu Kuai, Zhong Jinghua, gregkh, martin.petersen, hare, bvanassche, emilne Cc: linux-kernel, linux-scsi, yi.zhang, yukuai (C) Hi, 在 2023/01/30 21:17, James Bottomley 写道: > On Mon, 2023-01-30 at 11:46 +0800, Yu Kuai wrote: >> Hi, >> >> 在 2023/01/30 11:29, James Bottomley 写道: >>> On Mon, 2023-01-30 at 11:07 +0800, Yu Kuai wrote: >>>> Hi, >>>> >>>> 在 2023/01/30 1:30, James Bottomley 写道: >>>>> On Sat, 2023-01-28 at 17:41 +0800, Zhong Jinghua wrote: >>>>>> This error will cause a warning: >>>>>> kobject_add_internal failed for block (error: -2 parent: >>>>>> 1:0:0:1). In the lower version (such as 5.10), there is no >>>>>> corresponding error handling, continuing to go down will >>>>>> trigger a kernel panic, so cc stable. >>>>> >>>>> Is this is important point and what you're saying is that this >>>>> only panics on kernels before 5.10 or so because after that >>>>> it's correctly failed by block device error handling so there's >>>>> nothing to fix in later kernels? >>>>> >>>>> In that case, isn't the correct fix to look at backporting the >>>>> block device error handling: >>>> >>>> This is the last commit that support error handling, and there >>>> are many relied patches, and there are lots of refactor in block >>>> layer. It's not a good idea to backport error handling to lower >>>> version. >>>> Althrough error handling can prevent kernel crash in this case, I >>>> still think it make sense to make sure kobject is deleted in >>>> order, parent should not be deleted before child. 
>>> >>> Well, look, you've created a very artificial situation where a >>> create closely followed by a delete of the underlying sdev races >>> with the create of the block gendisk devices of sd that bind >>> asynchronously to the created sdev. The asynchronous nature of the >>> bind gives the elongated race window so the only real fix is some >>> sort of check that the sdev is still viable by the time the bind >>> occurs ... probably in sd_probe(), say a scsi_device_get of sdp at >>> the top which would ensure viability of the sdev for the entire >>> bind or fail the probe if the sdev can't be got. >> >> Sorry, I don't follow here. 😟 > > In the current kernel the race is mitigated because add_device fails > due to the parent being torn down. That parent is the sdev->gendev so > it seems we can detect this in the probe by looking at the sdev->gendev > state, which scsi_device_get() will do. > >> I agree this is a very artificial situation, however I can't tell our >> tester not to test this way... >> >> The problem is that the session kobject is deleted and then sd_probe() >> tries to create a new kobject under hostx/sessionx/x:x:x:x/. I don't >> see how scsi_device_get() can prevent that; it only gets a kobject >> reference, which prevents the kobject from being released, but >> kobject_del() can still be done. > > So your contention is there's no way that we could make scsi_device_get > see the kernfs deactivation? I would have thought checking > sdev->sdev_gendev.kobj.sd.active would give that ... although the check > would have to be via an API since KN_DEACTIVATED_BIAS is internal. I'm still not sure if such a check is enough. session1/target1:0:0/1:0:0:0/block 1) t1 is deleting the target; t1 has already set 1:0:0:0 to SDEV_CANCEL, but 1:0:0:0 is not deleted yet. 2) t2 is deleting session1; since 1:0:0:0 is in SDEV_CANCEL, it is skipped, and session1 is deleted before 1:0:0:0, which makes 1:0:0:0 inactive.
3) t3 creates block, which can happen because 1:0:0:0 is still not deleted, and later kobject_add() finds that 1:0:0:0 is not active and hence fails. The problem is that deleting a parent kobject makes its child kobjects inactive, and in 3) device_lock is not held for the parents. Hence just checking whether this scsi_device is active is not enough; we have to make sure the parents won't be deleted concurrently. For example, with a small adjustment to the above procedure: 1) ...(the same) 2) t3 creates block; it checks that the kobject state is still active. 3) t2 deletes session1 ...(the same); 1:0:0:0 is not active anymore. 4) t3 continues to create block under 1:0:0:0, which fails. By the way, I think this problem exists because an SDEV_CANCEL state does not mean the device has been deleted, so simply skipping such a device while removing the session is not right. Do you see any other problems if we make sure that kobjects are deleted in order? Thanks, Kuai > > James > >> In this patch, we make sure session removal and sd_probe() won't run >> concurrently; session removal will wait for all child kobjects to be >> deleted, what do you think? >> >> Thanks, >> Kuai >>
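The interleavings Kuai describes can be replayed deterministically. The sketch below is a hypothetical userspace model, not kernel code: `replay()` runs t3's check-then-add sequence and performs t2's deletion of session1 at a chosen point, while the `serialize` flag models the proposed fix in which session removal waits for the child add and then tears down children before parents. The names `struct node`, `path_active()` and `enum delete_point` are invented; the -ENXIO and -ENOENT codes are chosen to mirror a failed scsi_device_get() and the reported kobject_add failure respectively.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sysfs node: unusable once it or any ancestor is deleted. */
struct node {
	struct node *parent;
	bool active;
};

static bool path_active(const struct node *n)
{
	for (; n != NULL; n = n->parent)
		if (!n->active)
			return false;
	return true;
}

/* Where t2 deletes session1 relative to t3's probe. */
enum delete_point { DELETE_BEFORE_CHECK, DELETE_BETWEEN, DELETE_AFTER_ADD };

/* Replays t3 creating "block" under 1:0:0:0 while t2 removes session1.
 * With serialize, a deletion that would land between the check and the add
 * is deferred until the add completes, then done child-first. */
static int replay(enum delete_point when, bool serialize)
{
	struct node session = { NULL, true };
	struct node sdev = { &session, true };
	struct node block = { NULL, false };
	bool deferred = false;

	if (when == DELETE_BEFORE_CHECK)
		session.active = false;

	/* t3, step 1: the early viability check. */
	if (!path_active(&sdev))
		return -ENXIO;

	if (when == DELETE_BETWEEN) {
		if (serialize)
			deferred = true;	/* t2 waits for t3 */
		else
			session.active = false;	/* t2 wins the race */
	}

	/* t3, step 2: the add re-validates the whole path. */
	if (!path_active(&sdev))
		return -ENOENT;
	block.parent = &sdev;
	block.active = true;

	if (when == DELETE_AFTER_ADD || deferred) {
		/* Ordered teardown: children before parents. */
		block.active = false;
		sdev.active = false;
		session.active = false;
	}
	return 0;
}
```

Unserialized `DELETE_BETWEEN` is the window Kuai points at: the early check passes and the add still fails. The model says nothing about how wide that window is on real hardware, which is the question the rest of the thread turns on.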
* Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device 2023-01-31 1:43 ` Yu Kuai @ 2023-01-31 3:25 ` James Bottomley 0 siblings, 0 replies; 15+ messages in thread From: James Bottomley @ 2023-01-31 3:25 UTC To: Yu Kuai, Zhong Jinghua, gregkh, martin.petersen, hare, bvanassche, emilne Cc: linux-kernel, linux-scsi, yi.zhang, yukuai (C) On Tue, 2023-01-31 at 09:43 +0800, Yu Kuai wrote: > Hi, > > On 2023/01/30 21:17, James Bottomley wrote: > > On Mon, 2023-01-30 at 11:46 +0800, Yu Kuai wrote: > > > Hi, > > > > > > On 2023/01/30 11:29, James Bottomley wrote: > > > > On Mon, 2023-01-30 at 11:07 +0800, Yu Kuai wrote: > > > > > Hi, > > > > > > > > > > On 2023/01/30 1:30, James Bottomley wrote: > > > > > > On Sat, 2023-01-28 at 17:41 +0800, Zhong Jinghua wrote: > > > > > > > This error will cause a warning: > > > > > > > kobject_add_internal failed for block (error: -2 parent: > > > > > > > 1:0:0:1). In the lower version (such as 5.10), there is > > > > > > > no corresponding error handling, continuing to go down > > > > > > > will trigger a kernel panic, so cc stable. > > > > > > > > > > > > Is this is important point and what you're saying is that > > > > > > this only panics on kernels before 5.10 or so because after > > > > > > that it's correctly failed by block device error handling > > > > > > so there's nothing to fix in later kernels? > > > > > > > > > > > > In that case, isn't the correct fix to look at backporting > > > > > > the block device error handling: > > > > > > > > > > This is the last commit that supports error handling; > > > > > there are many dependent patches and a lot of refactoring > > > > > in the block layer. It's not a good idea to backport error > > > > > handling to older versions. Although error handling can > > > > > prevent a kernel crash in this case, I still think it makes > > > > > sense to make sure kobjects are deleted in order: a parent > > > > > should not be deleted before its child.
> > > > > > > > Well, look, you've created a very artificial situation where a > > > > create closely followed by a delete of the underlying sdev > > > > races with the create of the block gendisk devices of sd that > > > > bind asynchronously to the created sdev. The asynchronous > > > > nature of the bind gives the elongated race window so the only > > > > real fix is some sort of check that the sdev is still viable by > > > > the time the bind occurs ... probably in sd_probe(), say a > > > > scsi_device_get of sdp at the top which would ensure viability > > > > of the sdev for the entire bind or fail the probe if the sdev > > > > can't be got. > > > > > > Sorry, I don't follow here. 😟 > > > > In the current kernel the race is mitigated because add_device > > fails due to the parent being torn down. That parent is the > > sdev->gendev so it seems we can detect this in the probe by looking at > > the sdev->gendev state, which scsi_device_get() will do. > > > > > I agree this is a very artificial situation, however I can't tell > > > our tester not to test this way... > > > > > > The problem is that the session kobject is deleted and then > > > sd_probe() tries to create a new kobject under > > > hostx/sessionx/x:x:x:x/. I don't see how scsi_device_get() can > > > prevent that; it only gets a kobject reference, which prevents the > > > kobject from being released, but kobject_del() can still be done. > > > > So your contention is there's no way that we could make > > scsi_device_get see the kernfs deactivation? I would have thought > > checking sdev->sdev_gendev.kobj.sd.active would give that ... > > although the check would have to be via an API since > > KN_DEACTIVATED_BIAS is internal. > > I'm still not sure if such a check is enough. It's the same check that causes the block device_add() to fail upstream, where, as far as I know, you haven't been able to trigger an oops.
The problem is that this doesn't reproduce upstream, and you say you need something simple to backport to stable kernels rather than trying to backport the device_add() error handling. The proposal doesn't completely close the race window, but I think it narrows it to the point where the add/remove race is almost impossible to trigger. > session1/target1:0:0/1:0:0:0/block > > 1) t1 is deleting the target; t1 has already set 1:0:0:0 to SDEV_CANCEL, > but 1:0:0:0 is not deleted yet. > 2) t2 is deleting session1; since 1:0:0:0 is in SDEV_CANCEL, it > is skipped, and session1 is deleted before 1:0:0:0, which makes > 1:0:0:0 inactive. > 3) t3 creates block, which can happen because 1:0:0:0 is still not > deleted, and later kobject_add() finds that 1:0:0:0 is not active and > hence fails. > > The problem is that deleting a parent kobject makes its child kobjects > inactive, and in 3) device_lock is not held for the parents. > Hence just checking whether this scsi_device is active is not enough; we > have to make sure the parents won't be deleted concurrently. For example, > with a small adjustment to the above procedure: > > 1) ...(the same) > 2) t3 creates block; it checks that the kobject state is still active. > 3) t2 deletes session1 ...(the same); 1:0:0:0 is not active anymore. > 4) t3 continues to create block under 1:0:0:0, which fails. > > By the way, I think this problem exists because an SDEV_CANCEL state > does not mean the device has been deleted, so simply skipping such a > device while removing the session is not right. > > Do you see any other problems if we make sure that kobjects are deleted > in order? Given there's nothing to fix upstream, coming up with elaborate ordering constraints on kobjects isn't going to pass muster for backporting to stable. James
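For comparison, the shape of James's suggestion (take a reference at the top of sd_probe() and fail the probe if it cannot be taken) can be modelled the same way. This is again a hypothetical userspace sketch: `struct model_sdev`, the `MODEL_SDEV_*` states, `model_scsi_device_get()` and `model_sd_probe()` are invented names. The state test is modelled on scsi_device_get() refusing devices in SDEV_CANCEL or SDEV_DEL, as we understand it, with its module and get_device() steps omitted; `finish_bind` is a placeholder for the rest of the bind.

```c
#include <errno.h>

/* Hypothetical subset of SCSI device states, modelled on scsi_device_state. */
enum model_sdev_state { MODEL_SDEV_RUNNING, MODEL_SDEV_CANCEL, MODEL_SDEV_DEL };

struct model_sdev {
	enum model_sdev_state state;
	int refcount;
};

/* Refuse a reference once removal has started, rather than only bumping
 * the refcount. */
static int model_scsi_device_get(struct model_sdev *sdev)
{
	if (sdev->state == MODEL_SDEV_DEL || sdev->state == MODEL_SDEV_CANCEL)
		return -ENXIO;
	sdev->refcount++;
	return 0;
}

static void model_scsi_device_put(struct model_sdev *sdev)
{
	sdev->refcount--;
}

/* Proposed probe shape: pin the device for the whole bind, or fail the
 * probe outright if it is already being torn down. */
static int model_sd_probe(struct model_sdev *sdev, int (*finish_bind)(void))
{
	int err = model_scsi_device_get(sdev);

	if (err)
		return err;
	/* May still fail if an ancestor is removed during the bind. */
	err = finish_bind();
	model_scsi_device_put(sdev);
	return err;
}
```

In Kuai's first sequence, where 1:0:0:0 is already in SDEV_CANCEL, this check fails the probe early. A bind that passes the check can still fail later if an ancestor is removed in the meantime, which matches James's description of the proposal as narrowing the race rather than closing it.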
Thread overview: 15+ messages 2023-01-28 9:41 [PATCH-next v2 0/2] scsi, driver core: fix iscsi rescan fails to create block device Zhong Jinghua 2023-01-28 9:41 ` [PATCH-next v2 1/2] driver core: introduce get_device_unless_zero() Zhong Jinghua 2023-01-28 10:43 ` Greg KH 2023-01-28 9:41 ` [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device Zhong Jinghua 2023-01-28 10:45 ` Greg KH 2023-01-29 1:13 ` Yu Kuai 2023-01-29 6:46 ` Greg KH 2023-01-29 6:55 ` Yu Kuai 2023-01-29 17:30 ` James Bottomley 2023-01-30 3:07 ` Yu Kuai 2023-01-30 3:29 ` James Bottomley 2023-01-30 3:46 ` Yu Kuai 2023-01-30 13:17 ` James Bottomley 2023-01-31 1:43 ` Yu Kuai 2023-01-31 3:25 ` James Bottomley