public inbox for linux-kernel@vger.kernel.org
* [PATCH v3] nvme_core: scan namespaces asynchronously
@ 2024-07-15 20:34 Stuart Hayes
  2024-07-15 22:28 ` Sagi Grimberg
  2024-07-16  6:37 ` Hannes Reinecke
  0 siblings, 2 replies; 4+ messages in thread
From: Stuart Hayes @ 2024-07-15 20:34 UTC (permalink / raw)
  To: linux-kernel, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, linux-nvme
  Cc: Hannes Reinecke, Martin Wilck, Ayush Siddarath, Stuart Hayes

Use async function calls to make namespace scanning happen in parallel.

Without the patch, NVMe namespaces are scanned serially, so it can take
a long time for all of a controller's namespaces to become available,
especially with a slower (TCP) interface with a large number of
namespaces.

It is not uncommon to have large numbers (hundreds or thousands) of
namespaces on nvme-of storage servers.

The time it took for all namespaces to show up after connecting (via
TCP) to a controller with 1002 namespaces was measured on one system:

network latency   without patch   with patch
     0                 6s            1s
    50ms             210s           10s
   100ms             417s           18s

Measurements taken on another system show the effect of the patch on the
time nvme_scan_work() took to complete, when connecting to a linux
nvme-of target with varying numbers of namespaces, on a network with
400us of latency.

namespaces    without patch   with patch
     1            16ms           14ms
     2            24ms           16ms
     4            49ms           22ms
     8           101ms           33ms
    16           207ms           56ms
   100           1.4s           0.6s
  1000          12.9s           2.0s

On the same system, connecting to a local PCIe NVMe drive (a Samsung
PM1733) instead of a network target:

namespaces    without patch   with patch
     1            13ms           12ms
     2            41ms           13ms

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
---
changes from V2:
  * make a separate function nvme_scan_ns_async() that calls
    nvme_scan_ns(), instead of modifying nvme_scan_ns()
  * only scan asynchronously from nvme_scan_ns_list(), not from
    nvme_scan_ns_sequential()
  * provide more timing data in the commit message

changes from V1:
  * remove module param to enable/disable async scanning
  * add scan time measurements to commit message


 drivers/nvme/host/core.c | 48 +++++++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 782090ce0bc1..dbf05cfea063 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4,6 +4,7 @@
  * Copyright (c) 2011-2014, Intel Corporation.
  */
 
+#include <linux/async.h>
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-integrity.h>
@@ -3952,6 +3953,30 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	}
 }
 
+/*
+ * struct async_scan_info - keeps track of controller & NSIDs to scan
+ * @ctrl:	Controller on which namespaces are being scanned
+ * @next_idx:	Index of next NSID to scan in ns_list
+ * @ns_list:	Pointer to list of NSIDs to scan
+ */
+struct async_scan_info {
+	struct nvme_ctrl *ctrl;
+	atomic_t next_idx;
+	__le32 *ns_list;
+};
+
+static void nvme_scan_ns_async(void *data, async_cookie_t cookie)
+{
+	struct async_scan_info *scan_info = data;
+	int idx;
+	u32 nsid;
+
+	idx = (u32)atomic_fetch_add(1, &scan_info->next_idx);
+	nsid = le32_to_cpu(scan_info->ns_list[idx]);
+
+	nvme_scan_ns(scan_info->ctrl, nsid);
+}
+
 static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 					unsigned nsid)
 {
@@ -3975,12 +4000,14 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
 {
 	const int nr_entries = NVME_IDENTIFY_DATA_SIZE / sizeof(__le32);
-	__le32 *ns_list;
+	struct async_scan_info scan_info;
 	u32 prev = 0;
 	int ret = 0, i;
+	ASYNC_DOMAIN(domain);
 
-	ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
-	if (!ns_list)
+	scan_info.ctrl = ctrl;
+	scan_info.ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
+	if (!scan_info.ns_list)
 		return -ENOMEM;
 
 	for (;;) {
@@ -3990,28 +4017,33 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
 			.identify.nsid		= cpu_to_le32(prev),
 		};
 
-		ret = nvme_submit_sync_cmd(ctrl->admin_q, &cmd, ns_list,
-					    NVME_IDENTIFY_DATA_SIZE);
+		ret = nvme_submit_sync_cmd(ctrl->admin_q, &cmd,
+					   scan_info.ns_list,
+					   NVME_IDENTIFY_DATA_SIZE);
 		if (ret) {
 			dev_warn(ctrl->device,
 				"Identify NS List failed (status=0x%x)\n", ret);
 			goto free;
 		}
 
+		atomic_set(&scan_info.next_idx, 0);
 		for (i = 0; i < nr_entries; i++) {
-			u32 nsid = le32_to_cpu(ns_list[i]);
+			u32 nsid = le32_to_cpu(scan_info.ns_list[i]);
 
 			if (!nsid)	/* end of the list? */
 				goto out;
-			nvme_scan_ns(ctrl, nsid);
+			async_schedule_domain(nvme_scan_ns_async, &scan_info,
+						&domain);
 			while (++prev < nsid)
 				nvme_ns_remove_by_nsid(ctrl, prev);
 		}
+		async_synchronize_full_domain(&domain);
 	}
  out:
 	nvme_remove_invalid_namespaces(ctrl, prev);
  free:
-	kfree(ns_list);
+	async_synchronize_full_domain(&domain);
+	kfree(scan_info.ns_list);
 	return ret;
 }
 
-- 
2.39.3



* Re: [PATCH v3] nvme_core: scan namespaces asynchronously
  2024-07-15 20:34 [PATCH v3] nvme_core: scan namespaces asynchronously Stuart Hayes
@ 2024-07-15 22:28 ` Sagi Grimberg
  2024-07-16  1:23   ` stuart hayes
  2024-07-16  6:37 ` Hannes Reinecke
  1 sibling, 1 reply; 4+ messages in thread
From: Sagi Grimberg @ 2024-07-15 22:28 UTC (permalink / raw)
  To: Stuart Hayes, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, linux-nvme
  Cc: Hannes Reinecke, Martin Wilck, Ayush Siddarath



On 15/07/2024 23:34, Stuart Hayes wrote:
> Use async function calls to make namespace scanning happen in parallel.
>
> Without the patch, NVMe namespaces are scanned serially, so it can take
> a long time for all of a controller's namespaces to become available,
> especially with a slower (TCP) interface with a large number of
> namespaces.
>
> It is not uncommon to have large numbers (hundreds or thousands) of
> namespaces on nvme-of storage servers.
>
> The time it took for all namespaces to show up after connecting (via
> TCP) to a controller with 1002 namespaces was measured on one system:
>
> network latency   without patch   with patch
>       0                 6s            1s
>      50ms             210s           10s
>     100ms             417s           18s
>
> Measurements taken on another system show the effect of the patch on the
> time nvme_scan_work() took to complete, when connecting to a linux
> nvme-of target with varying numbers of namespaces, on a network with
> 400us of latency.
>
> namespaces    without patch   with patch
>       1            16ms           14ms
>       2            24ms           16ms
>       4            49ms           22ms
>       8           101ms           33ms
>      16           207ms           56ms
>     100           1.4s           0.6s
>    1000          12.9s           2.0s

Not sure how common the 1000-namespace use case is, but dozens of
namespaces seems compelling enough.

>
> On the same system, connecting to a local PCIe NVMe drive (a Samsung
> PM1733) instead of a network target:
>
> namespaces    without patch   with patch
>       1            13ms           12ms
>       2            41ms           13ms
>
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> ---
> changes from V2:
>    * make a separate function nvme_scan_ns_async() that calls
>      nvme_scan_ns(), instead of modifying nvme_scan_ns()
>    * only scan asynchronously from nvme_scan_ns_list(), not from
>      nvme_scan_ns_sequential()
>    * provide more timing data in the commit message
>
> changes from V1:
>    * remove module param to enable/disable async scanning
>    * add scan time measurements to commit message
>
>
>   drivers/nvme/host/core.c | 48 +++++++++++++++++++++++++++++++++-------
>   1 file changed, 40 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 782090ce0bc1..dbf05cfea063 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4,6 +4,7 @@
>    * Copyright (c) 2011-2014, Intel Corporation.
>    */
>   
> +#include <linux/async.h>
>   #include <linux/blkdev.h>
>   #include <linux/blk-mq.h>
>   #include <linux/blk-integrity.h>
> @@ -3952,6 +3953,30 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>   	}
>   }
>   
> +/*
> + * struct async_scan_info - keeps track of controller & NSIDs to scan
> + * @ctrl:	Controller on which namespaces are being scanned
> + * @next_idx:	Index of next NSID to scan in ns_list
> + * @ns_list:	Pointer to list of NSIDs to scan
> + */
> +struct async_scan_info {
> +	struct nvme_ctrl *ctrl;
> +	atomic_t next_idx;

next_nsid ?

> +	__le32 *ns_list;
> +};
> +
> +static void nvme_scan_ns_async(void *data, async_cookie_t cookie)
> +{
> +	struct async_scan_info *scan_info = data;
> +	int idx;
> +	u32 nsid;
> +
> +	idx = (u32)atomic_fetch_add(1, &scan_info->next_idx);
> +	nsid = le32_to_cpu(scan_info->ns_list[idx]);
> +
> +	nvme_scan_ns(scan_info->ctrl, nsid);
> +}
> +
>   static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>   					unsigned nsid)
>   {
> @@ -3975,12 +4000,14 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>   static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
>   {
>   	const int nr_entries = NVME_IDENTIFY_DATA_SIZE / sizeof(__le32);
> -	__le32 *ns_list;
> +	struct async_scan_info scan_info;

What initializes next_idx?

>   	u32 prev = 0;
>   	int ret = 0, i;
> +	ASYNC_DOMAIN(domain);
>   
> -	ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
> -	if (!ns_list)
> +	scan_info.ctrl = ctrl;
> +	scan_info.ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
> +	if (!scan_info.ns_list)
>   		return -ENOMEM;

I think you can leave the local variable ns_list as is, and just assign
it to scan_info after; it's common practice to allocate to a local
pointer and use it to init a struct member.

Plus it will make the patch diff simpler.
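
For illustration (not part of the patch), something like:

	__le32 *ns_list;

	ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
	if (!ns_list)
		return -ENOMEM;
	scan_info.ctrl = ctrl;
	scan_info.ns_list = ns_list;

so the later references to ns_list don't have to change.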

>   
>   	for (;;) {
> @@ -3990,28 +4017,33 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
>   			.identify.nsid		= cpu_to_le32(prev),
>   		};
>   
> -		ret = nvme_submit_sync_cmd(ctrl->admin_q, &cmd, ns_list,
> -					    NVME_IDENTIFY_DATA_SIZE);
> +		ret = nvme_submit_sync_cmd(ctrl->admin_q, &cmd,
> +					   scan_info.ns_list,
> +					   NVME_IDENTIFY_DATA_SIZE);
>   		if (ret) {
>   			dev_warn(ctrl->device,
>   				"Identify NS List failed (status=0x%x)\n", ret);
>   			goto free;
>   		}
>   
> +		atomic_set(&scan_info.next_idx, 0);
>   		for (i = 0; i < nr_entries; i++) {
> -			u32 nsid = le32_to_cpu(ns_list[i]);
> +			u32 nsid = le32_to_cpu(scan_info.ns_list[i]);
>   
>   			if (!nsid)	/* end of the list? */
>   				goto out;
> -			nvme_scan_ns(ctrl, nsid);
> +			async_schedule_domain(nvme_scan_ns_async, &scan_info,
> +						&domain);
>   			while (++prev < nsid)
>   				nvme_ns_remove_by_nsid(ctrl, prev);
>   		}
> +		async_synchronize_full_domain(&domain);
>   	}
>    out:
>   	nvme_remove_invalid_namespaces(ctrl, prev);
>    free:
> -	kfree(ns_list);
> +	async_synchronize_full_domain(&domain);
> +	kfree(scan_info.ns_list);
>   	return ret;
>   }
>   



* Re: [PATCH v3] nvme_core: scan namespaces asynchronously
  2024-07-15 22:28 ` Sagi Grimberg
@ 2024-07-16  1:23   ` stuart hayes
  0 siblings, 0 replies; 4+ messages in thread
From: stuart hayes @ 2024-07-16  1:23 UTC (permalink / raw)
  To: Sagi Grimberg, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, linux-nvme
  Cc: Hannes Reinecke, Martin Wilck, Ayush Siddarath



On 7/15/2024 5:28 PM, Sagi Grimberg wrote:
> 
> 
> On 15/07/2024 23:34, Stuart Hayes wrote:
>> Use async function calls to make namespace scanning happen in parallel.
>>
>> Without the patch, NVMe namespaces are scanned serially, so it can take
>> a long time for all of a controller's namespaces to become available,
>> especially with a slower (TCP) interface with a large number of
>> namespaces.
>>
>> It is not uncommon to have large numbers (hundreds or thousands) of
>> namespaces on nvme-of storage servers.
>>
>> The time it took for all namespaces to show up after connecting (via
>> TCP) to a controller with 1002 namespaces was measured on one system:
>>
>> network latency   without patch   with patch
>>       0                 6s            1s
>>      50ms             210s           10s
>>     100ms             417s           18s
>>
>> Measurements taken on another system show the effect of the patch on the
>> time nvme_scan_work() took to complete, when connecting to a linux
>> nvme-of target with varying numbers of namespaces, on a network with
>> 400us of latency.
>>
>> namespaces    without patch   with patch
>>       1            16ms           14ms
>>       2            24ms           16ms
>>       4            49ms           22ms
>>       8           101ms           33ms
>>      16           207ms           56ms
>>     100           1.4s           0.6s
>>    1000          12.9s           2.0s
> 
> Not sure how common the 1000-namespace use case is, but dozens of namespaces
> seems compelling enough.
> 
>>
>> On the same system, connecting to a local PCIe NVMe drive (a Samsung
>> PM1733) instead of a network target:
>>
>> namespaces    without patch   with patch
>>       1            13ms           12ms
>>       2            41ms           13ms
>>
>> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
>> ---
>> changes from V2:
>>    * make a separate function nvme_scan_ns_async() that calls
>>      nvme_scan_ns(), instead of modifying nvme_scan_ns()
>>    * only scan asynchronously from nvme_scan_ns_list(), not from
>>      nvme_scan_ns_sequential()
>>    * provide more timing data in the commit message
>>
>> changes from V1:
>>    * remove module param to enable/disable async scanning
>>    * add scan time measurements to commit message
>>
>>
>>   drivers/nvme/host/core.c | 48 +++++++++++++++++++++++++++++++++-------
>>   1 file changed, 40 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 782090ce0bc1..dbf05cfea063 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -4,6 +4,7 @@
>>    * Copyright (c) 2011-2014, Intel Corporation.
>>    */
>> +#include <linux/async.h>
>>   #include <linux/blkdev.h>
>>   #include <linux/blk-mq.h>
>>   #include <linux/blk-integrity.h>
>> @@ -3952,6 +3953,30 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>>       }
>>   }
>> +/*
>> + * struct async_scan_info - keeps track of controller & NSIDs to scan
>> + * @ctrl:    Controller on which namespaces are being scanned
>> + * @next_idx:    Index of next NSID to scan in ns_list
>> + * @ns_list:    Pointer to list of NSIDs to scan
>> + */
>> +struct async_scan_info {
>> +    struct nvme_ctrl *ctrl;
>> +    atomic_t next_idx;
> 
> next_nsid ?
> 

OK!

>> +    __le32 *ns_list;
>> +};
>> +
>> +static void nvme_scan_ns_async(void *data, async_cookie_t cookie)
>> +{
>> +    struct async_scan_info *scan_info = data;
>> +    int idx;
>> +    u32 nsid;
>> +
>> +    idx = (u32)atomic_fetch_add(1, &scan_info->next_idx);
>> +    nsid = le32_to_cpu(scan_info->ns_list[idx]);
>> +
>> +    nvme_scan_ns(scan_info->ctrl, nsid);
>> +}
>> +
>>   static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>>                       unsigned nsid)
>>   {
>> @@ -3975,12 +4000,14 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>>   static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
>>   {
>>       const int nr_entries = NVME_IDENTIFY_DATA_SIZE / sizeof(__le32);
>> -    __le32 *ns_list;
>> +    struct async_scan_info scan_info;
> 
> What initializes next_idx?

See below--there's an atomic_set().  It is inside the outer "for" loop because there can
be multiple lists that have to be scanned, and the index has to be reset to 0 for each one.
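
The control flow is roughly (paraphrasing the patch):

	for (;;) {
		/* fetch the next page of NSIDs into scan_info.ns_list */
		...
		atomic_set(&scan_info.next_idx, 0);	/* restart for this page */
		for (i = 0; i < nr_entries; i++) {
			if (!le32_to_cpu(scan_info.ns_list[i]))
				goto out;	/* end of the list */
			async_schedule_domain(nvme_scan_ns_async, &scan_info,
						&domain);
		}
		/* wait for this page's scans before fetching the next */
		async_synchronize_full_domain(&domain);
	}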

> 
>>       u32 prev = 0;
>>       int ret = 0, i;
>> +    ASYNC_DOMAIN(domain);
>> -    ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
>> -    if (!ns_list)
>> +    scan_info.ctrl = ctrl;
>> +    scan_info.ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
>> +    if (!scan_info.ns_list)
>>           return -ENOMEM;
> 
> I think you can leave the local variable ns_list as is, and just assign it to scan_info
> after; it's common practice to allocate to a local pointer and use it to init a struct member.
> 
> Plus it will make the patch diff simpler.
> 

No problem, I agree.  I think someone suggested the opposite last time I submitted this.  :)

>>       for (;;) {
>> @@ -3990,28 +4017,33 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
>>               .identify.nsid        = cpu_to_le32(prev),
>>           };
>> -        ret = nvme_submit_sync_cmd(ctrl->admin_q, &cmd, ns_list,
>> -                        NVME_IDENTIFY_DATA_SIZE);
>> +        ret = nvme_submit_sync_cmd(ctrl->admin_q, &cmd,
>> +                       scan_info.ns_list,
>> +                       NVME_IDENTIFY_DATA_SIZE);
>>           if (ret) {
>>               dev_warn(ctrl->device,
>>                   "Identify NS List failed (status=0x%x)\n", ret);
>>               goto free;
>>           }
>> +        atomic_set(&scan_info.next_idx, 0);

This atomic_set is what initializes next_idx.

>>           for (i = 0; i < nr_entries; i++) {
>> -            u32 nsid = le32_to_cpu(ns_list[i]);
>> +            u32 nsid = le32_to_cpu(scan_info.ns_list[i]);
>>               if (!nsid)    /* end of the list? */
>>                   goto out;
>> -            nvme_scan_ns(ctrl, nsid);
>> +            async_schedule_domain(nvme_scan_ns_async, &scan_info,
>> +                        &domain);
>>               while (++prev < nsid)
>>                   nvme_ns_remove_by_nsid(ctrl, prev);
>>           }
>> +        async_synchronize_full_domain(&domain);
>>       }
>>    out:
>>       nvme_remove_invalid_namespaces(ctrl, prev);
>>    free:
>> -    kfree(ns_list);
>> +    async_synchronize_full_domain(&domain);
>> +    kfree(scan_info.ns_list);
>>       return ret;
>>   }
> 

Thank you for the feedback!


* Re: [PATCH v3] nvme_core: scan namespaces asynchronously
  2024-07-15 20:34 [PATCH v3] nvme_core: scan namespaces asynchronously Stuart Hayes
  2024-07-15 22:28 ` Sagi Grimberg
@ 2024-07-16  6:37 ` Hannes Reinecke
  1 sibling, 0 replies; 4+ messages in thread
From: Hannes Reinecke @ 2024-07-16  6:37 UTC (permalink / raw)
  To: Stuart Hayes, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme
  Cc: Martin Wilck, Ayush Siddarath

On 7/15/24 22:34, Stuart Hayes wrote:
> Use async function calls to make namespace scanning happen in parallel.
> 
> Without the patch, NVMe namespaces are scanned serially, so it can take
> a long time for all of a controller's namespaces to become available,
> especially with a slower (TCP) interface with a large number of
> namespaces.
>
> It is not uncommon to have large numbers (hundreds or thousands) of
> namespaces on nvme-of storage servers.
> 
> The time it took for all namespaces to show up after connecting (via
> TCP) to a controller with 1002 namespaces was measured on one system:
> 
> network latency   without patch   with patch
>       0                 6s            1s
>      50ms             210s           10s
>     100ms             417s           18s
> 
> Measurements taken on another system show the effect of the patch on the
> time nvme_scan_work() took to complete, when connecting to a linux
> nvme-of target with varying numbers of namespaces, on a network with
> 400us of latency.
> 
> namespaces    without patch   with patch
>       1            16ms           14ms
>       2            24ms           16ms
>       4            49ms           22ms
>       8           101ms           33ms
>      16           207ms           56ms
>     100           1.4s           0.6s
>    1000          12.9s           2.0s
> 
> On the same system, connecting to a local PCIe NVMe drive (a Samsung
> PM1733) instead of a network target:
> 
> namespaces    without patch   with patch
>       1            13ms           12ms
>       2            41ms           13ms
> 
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> ---
> changes from V2:
>    * make a separate function nvme_scan_ns_async() that calls
>      nvme_scan_ns(), instead of modifying nvme_scan_ns()
>    * only scan asynchronously from nvme_scan_ns_list(), not from
>      nvme_scan_ns_sequential()
>    * provide more timing data in the commit message
> 
> changes from V1:
>    * remove module param to enable/disable async scanning
>    * add scan time measurements to commit message
> 
> 
>   drivers/nvme/host/core.c | 48 +++++++++++++++++++++++++++++++++-------
>   1 file changed, 40 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 782090ce0bc1..dbf05cfea063 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4,6 +4,7 @@
>    * Copyright (c) 2011-2014, Intel Corporation.
>    */
>   
> +#include <linux/async.h>
>   #include <linux/blkdev.h>
>   #include <linux/blk-mq.h>
>   #include <linux/blk-integrity.h>
> @@ -3952,6 +3953,30 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>   	}
>   }
>   
> +/*
> + * struct async_scan_info - keeps track of controller & NSIDs to scan
> + * @ctrl:	Controller on which namespaces are being scanned
> + * @next_idx:	Index of next NSID to scan in ns_list
> + * @ns_list:	Pointer to list of NSIDs to scan
> + */
> +struct async_scan_info {
> +	struct nvme_ctrl *ctrl;
> +	atomic_t next_idx;
> +	__le32 *ns_list;
> +};
> +
> +static void nvme_scan_ns_async(void *data, async_cookie_t cookie)
> +{
> +	struct async_scan_info *scan_info = data;
> +	int idx;
> +	u32 nsid;
> +
> +	idx = (u32)atomic_fetch_add(1, &scan_info->next_idx);
> +	nsid = le32_to_cpu(scan_info->ns_list[idx]);
> +
> +	nvme_scan_ns(scan_info->ctrl, nsid);
> +}
> +
>   static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>   					unsigned nsid)
>   {
> @@ -3975,12 +4000,14 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>   static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
>   {
>   	const int nr_entries = NVME_IDENTIFY_DATA_SIZE / sizeof(__le32);
> -	__le32 *ns_list;
> +	struct async_scan_info scan_info;
>   	u32 prev = 0;
>   	int ret = 0, i;
> +	ASYNC_DOMAIN(domain);
>   
> -	ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
> -	if (!ns_list)
> +	scan_info.ctrl = ctrl;
> +	scan_info.ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
> +	if (!scan_info.ns_list)
>   		return -ENOMEM;
>   
>   	for (;;) {
> @@ -3990,28 +4017,33 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
>   			.identify.nsid		= cpu_to_le32(prev),
>   		};
>   
> -		ret = nvme_submit_sync_cmd(ctrl->admin_q, &cmd, ns_list,
> -					    NVME_IDENTIFY_DATA_SIZE);
> +		ret = nvme_submit_sync_cmd(ctrl->admin_q, &cmd,
> +					   scan_info.ns_list,
> +					   NVME_IDENTIFY_DATA_SIZE);
>   		if (ret) {
>   			dev_warn(ctrl->device,
>   				"Identify NS List failed (status=0x%x)\n", ret);
>   			goto free;
>   		}
>   
> +		atomic_set(&scan_info.next_idx, 0);
>   		for (i = 0; i < nr_entries; i++) {
> -			u32 nsid = le32_to_cpu(ns_list[i]);
> +			u32 nsid = le32_to_cpu(scan_info.ns_list[i]);
>   
>   			if (!nsid)	/* end of the list? */
>   				goto out;
> -			nvme_scan_ns(ctrl, nsid);
> +			async_schedule_domain(nvme_scan_ns_async, &scan_info,
> +						&domain);
>   			while (++prev < nsid)
>   				nvme_ns_remove_by_nsid(ctrl, prev);
>   		}
> +		async_synchronize_full_domain(&domain);

Let me see if I get this right ...
You allocate 'scan_info' on the stack, so every call to
'async_schedule_domain()' in the loop will be using the same
scan_info context, right?
So each instance of nvme_scan_ns_async() will be using
whichever value is in 'next_idx', right?
Effectively making 'nvme_scan_ns_async()' completely free-floating,
spawning 'nr_entries' instances, and letting each instance pick whichever
nsid is (at the time of execution) the next one.

If that's the case then I would welcome some comments in the code, as
this is somewhat non-obvious. And it also spells out clearly why the
atomic 'next_idx' value is absolutely crucial to that mechanism, and
we don't want anyone getting wrong ideas by 'optimizing' that away.
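
For instance, something along these lines above the scheduling call
(just a suggestion):

	/*
	 * scan_info is shared by every worker scheduled in the domain.
	 * Each nvme_scan_ns_async() instance claims the next unscanned
	 * NSID by atomically incrementing next_idx, so a worker doesn't
	 * necessarily scan the entry it was scheduled for; next_idx must
	 * stay atomic for this to work.
	 */
	async_schedule_domain(nvme_scan_ns_async, &scan_info, &domain);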

Otherwise:

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


