* [PATCH 0/2] block: Generalize physical entry definition
@ 2025-11-15 16:22 Leon Romanovsky
2025-11-15 16:22 ` [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes Leon Romanovsky
2025-11-15 16:22 ` [PATCH 2/2] types: move phys_vec definition to common header Leon Romanovsky
0 siblings, 2 replies; 9+ messages in thread
From: Leon Romanovsky @ 2025-11-15 16:22 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg
Cc: linux-block, linux-kernel, linux-nvme
The block layer code declares a "struct phys_vec" entry, which describes
a contiguous chunk of physical memory. That definition is useful for all
possible users of the DMA physical address-based API.
This series changes the NVMe code to support larger chunks of memory by
changing the length field from u32 to size_t, which is 64 bits wide on
64-bit platforms, and promotes "struct phys_vec" to a common header.
---
Leon Romanovsky (2):
nvme-pci: Use size_t for length fields to handle larger sizes
types: move phys_vec definition to common header
block/blk-mq-dma.c | 17 ++++++++---------
drivers/nvme/host/pci.c | 4 ++--
include/linux/types.h | 5 +++++
3 files changed, 15 insertions(+), 11 deletions(-)
---
base-commit: 79bd8c9814a273fa7ba43399e1c07adec3fc95db
change-id: 20251030-nvme-phys-types-988893249454
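The cover letter's claim — that size_t is always at least as wide as the u32 it replaces — can be checked in a few lines of C. This is a standalone illustration, not part of the series; the helper name is made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper (not in the patches): widening a u32 length into
 * size_t is always lossless, because the C standard guarantees size_t
 * can represent the size of any object, and on LP64 kernels it is
 * 64 bits wide, matching u64. */
size_t widen_len(uint32_t len)
{
	return (size_t)len; /* never truncates */
}
```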
* [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes
  2025-11-15 16:22 [PATCH 0/2] block: Generalize physical entry definition Leon Romanovsky
@ 2025-11-15 16:22 ` Leon Romanovsky
  2025-11-15 17:33   ` David Laight
  2025-11-15 22:25   ` Chaitanya Kulkarni
  2025-11-15 16:22 ` [PATCH 2/2] types: move phys_vec definition to common header Leon Romanovsky
  1 sibling, 2 replies; 9+ messages in thread
From: Leon Romanovsky @ 2025-11-15 16:22 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg
  Cc: linux-block, linux-kernel, linux-nvme

From: Leon Romanovsky <leonro@nvidia.com>

This patch changes the length variables from unsigned int to size_t.
Using size_t ensures that we can handle larger sizes, as size_t is
always equal to or larger than the previously used u32 type.

Originally, u32 was used because the blk-mq-dma code evolved from a
scatter-gather implementation, which uses unsigned int to describe length.
This change will also allow us to reuse the existing struct phys_vec in
places that don't need scatter-gather.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 block/blk-mq-dma.c      | 14 +++++++++-----
 drivers/nvme/host/pci.c |  4 ++--
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index e9108ccaf4b0..cc3e2548cc30 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -8,7 +8,7 @@

 struct phys_vec {
 	phys_addr_t paddr;
-	u32 len;
+	size_t len;
 };

 static bool __blk_map_iter_next(struct blk_map_iter *iter)
@@ -112,8 +112,8 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
 		struct phys_vec *vec)
 {
 	enum dma_data_direction dir = rq_dma_dir(req);
-	unsigned int mapped = 0;
 	unsigned int attrs = 0;
+	size_t mapped = 0;
 	int error;

 	iter->addr = state->addr;
@@ -296,8 +296,10 @@ int __blk_rq_map_sg(struct request *rq, struct scatterlist *sglist,
 	blk_rq_map_iter_init(rq, &iter);
 	while (blk_map_iter_next(rq, &iter, &vec)) {
 		*last_sg = blk_next_sg(last_sg, sglist);
-		sg_set_page(*last_sg, phys_to_page(vec.paddr), vec.len,
-			    offset_in_page(vec.paddr));
+
+		WARN_ON_ONCE(overflows_type(vec.len, unsigned int));
+		sg_set_page(*last_sg, phys_to_page(vec.paddr),
+			    (unsigned int)vec.len, offset_in_page(vec.paddr));
 		nsegs++;
 	}

@@ -416,7 +418,9 @@ int blk_rq_map_integrity_sg(struct request *rq, struct scatterlist *sglist)

 	while (blk_map_iter_next(rq, &iter, &vec)) {
 		sg = blk_next_sg(&sg, sglist);
-		sg_set_page(sg, phys_to_page(vec.paddr), vec.len,
+
+		WARN_ON_ONCE(overflows_type(vec.len, unsigned int));
+		sg_set_page(sg, phys_to_page(vec.paddr), (unsigned int)vec.len,
 			    offset_in_page(vec.paddr));
 		segments++;
 	}
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9085bed107fd..de512efa742d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -290,14 +290,14 @@ struct nvme_iod {
 	u8 flags;
 	u8 nr_descriptors;

-	unsigned int total_len;
+	size_t total_len;
 	struct dma_iova_state dma_state;
 	void *descriptors[NVME_MAX_NR_DESCRIPTORS];
 	struct nvme_dma_vec *dma_vecs;
 	unsigned int nr_dma_vecs;

 	dma_addr_t meta_dma;
-	unsigned int meta_total_len;
+	size_t meta_total_len;
 	struct dma_iova_state meta_dma_state;
 	struct nvme_sgl_desc *meta_descriptor;
 };
--
2.51.1
* Re: [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes
  2025-11-15 16:22 ` [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes Leon Romanovsky
@ 2025-11-15 17:33   ` David Laight
  2025-11-15 18:05     ` Leon Romanovsky
  2025-11-15 22:25   ` Chaitanya Kulkarni
  1 sibling, 1 reply; 9+ messages in thread
From: David Laight @ 2025-11-15 17:33 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	linux-block, linux-kernel, linux-nvme

On Sat, 15 Nov 2025 18:22:45 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> From: Leon Romanovsky <leonro@nvidia.com>
>
> This patch changes the length variables from unsigned int to size_t.
> Using size_t ensures that we can handle larger sizes, as size_t is
> always equal to or larger than the previously used u32 type.

Where are requests larger than 4GB going to come from?

> Originally, u32 was used because blk-mq-dma code evolved from
> scatter-gather implementation, which uses unsigned int to describe length.
> This change will also allow us to reuse the existing struct phys_vec in places
> that don't need scatter-gather.
>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  block/blk-mq-dma.c      | 14 +++++++++-----
>  drivers/nvme/host/pci.c |  4 ++--
>  2 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
> index e9108ccaf4b0..cc3e2548cc30 100644
> --- a/block/blk-mq-dma.c
> +++ b/block/blk-mq-dma.c
> @@ -8,7 +8,7 @@
>
>  struct phys_vec {
>  	phys_addr_t paddr;
> -	u32 len;
> +	size_t len;
>  };
>
>  static bool __blk_map_iter_next(struct blk_map_iter *iter)
> @@ -112,8 +112,8 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
>  		struct phys_vec *vec)
>  {
>  	enum dma_data_direction dir = rq_dma_dir(req);
> -	unsigned int mapped = 0;
>  	unsigned int attrs = 0;
> +	size_t mapped = 0;
>  	int error;
>
>  	iter->addr = state->addr;
> @@ -296,8 +296,10 @@ int __blk_rq_map_sg(struct request *rq, struct scatterlist *sglist,
>  	blk_rq_map_iter_init(rq, &iter);
>  	while (blk_map_iter_next(rq, &iter, &vec)) {
>  		*last_sg = blk_next_sg(last_sg, sglist);
> -		sg_set_page(*last_sg, phys_to_page(vec.paddr), vec.len,
> -			    offset_in_page(vec.paddr));
> +
> +		WARN_ON_ONCE(overflows_type(vec.len, unsigned int));

I'm not at all sure you need that test.
blk_map_iter_next() has to guarantee that vec.len is valid.
(probably even less than a page size?)
Perhaps this code should be using a different type for the addr:len pair?

> +		sg_set_page(*last_sg, phys_to_page(vec.paddr),
> +			    (unsigned int)vec.len, offset_in_page(vec.paddr));

You definitely don't need the explicit cast.

	David

>  		nsegs++;
>  	}
>
> @@ -416,7 +418,9 @@ int blk_rq_map_integrity_sg(struct request *rq, struct scatterlist *sglist)
>
>  	while (blk_map_iter_next(rq, &iter, &vec)) {
>  		sg = blk_next_sg(&sg, sglist);
> -		sg_set_page(sg, phys_to_page(vec.paddr), vec.len,
> +
> +		WARN_ON_ONCE(overflows_type(vec.len, unsigned int));
> +		sg_set_page(sg, phys_to_page(vec.paddr), (unsigned int)vec.len,
> 			    offset_in_page(vec.paddr));
>  		segments++;
>  	}
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 9085bed107fd..de512efa742d 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -290,14 +290,14 @@ struct nvme_iod {
> 	u8 flags;
> 	u8 nr_descriptors;
>
> -	unsigned int total_len;
> +	size_t total_len;
> 	struct dma_iova_state dma_state;
> 	void *descriptors[NVME_MAX_NR_DESCRIPTORS];
> 	struct nvme_dma_vec *dma_vecs;
> 	unsigned int nr_dma_vecs;
>
> 	dma_addr_t meta_dma;
> -	unsigned int meta_total_len;
> +	size_t meta_total_len;
> 	struct dma_iova_state meta_dma_state;
> 	struct nvme_sgl_desc *meta_descriptor;
> };
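For readers following the debate about the overflow test: the check can be modelled outside the kernel. The sketch below is a userspace stand-in for the kernel's overflows_type(x, T) macro (from include/linux/overflow.h), specialized to the unsigned int target that sg_set_page() takes; the function name is illustrative, not kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-in for overflows_type(len, unsigned int): true when
 * a size_t length cannot be represented in unsigned int. This is the
 * condition the WARN_ON_ONCE() in the patch is guarding against. */
int len_overflows_uint(size_t len)
{
	return len > UINT_MAX;
}
```

Any length up to UINT_MAX passes silently; only a length above 4 GiB (possible when size_t is 64 bits) would trip the warning.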
* Re: [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes
  2025-11-15 17:33 ` David Laight
@ 2025-11-15 18:05   ` Leon Romanovsky
  2025-11-15 22:28     ` David Laight
  0 siblings, 1 reply; 9+ messages in thread
From: Leon Romanovsky @ 2025-11-15 18:05 UTC (permalink / raw)
  To: David Laight
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	linux-block, linux-kernel, linux-nvme

On Sat, Nov 15, 2025 at 05:33:41PM +0000, David Laight wrote:
> On Sat, 15 Nov 2025 18:22:45 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
>
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > This patch changes the length variables from unsigned int to size_t.
> > Using size_t ensures that we can handle larger sizes, as size_t is
> > always equal to or larger than the previously used u32 type.
>
> Where are requests larger than 4GB going to come from?

The main goal is to reuse the phys_vec structure. It is going to represent
PCI regions exposed through the VFIO DMABUF interface. Their length is
more than u32.

> > Originally, u32 was used because blk-mq-dma code evolved from
> > scatter-gather implementation, which uses unsigned int to describe length.
> > This change will also allow us to reuse the existing struct phys_vec in places
> > that don't need scatter-gather.
> >
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  block/blk-mq-dma.c      | 14 +++++++++-----
> >  drivers/nvme/host/pci.c |  4 ++--
> >  2 files changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
> > index e9108ccaf4b0..cc3e2548cc30 100644
> > --- a/block/blk-mq-dma.c
> > +++ b/block/blk-mq-dma.c
> > @@ -8,7 +8,7 @@
> >
> >  struct phys_vec {
> >  	phys_addr_t paddr;
> > -	u32 len;
> > +	size_t len;
> >  };
> >
> >  static bool __blk_map_iter_next(struct blk_map_iter *iter)
> > @@ -112,8 +112,8 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
> >  		struct phys_vec *vec)
> >  {
> >  	enum dma_data_direction dir = rq_dma_dir(req);
> > -	unsigned int mapped = 0;
> >  	unsigned int attrs = 0;
> > +	size_t mapped = 0;
> >  	int error;
> >
> >  	iter->addr = state->addr;
> > @@ -296,8 +296,10 @@ int __blk_rq_map_sg(struct request *rq, struct scatterlist *sglist,
> >  	blk_rq_map_iter_init(rq, &iter);
> >  	while (blk_map_iter_next(rq, &iter, &vec)) {
> >  		*last_sg = blk_next_sg(last_sg, sglist);
> > -		sg_set_page(*last_sg, phys_to_page(vec.paddr), vec.len,
> > -			    offset_in_page(vec.paddr));
> > +
> > +		WARN_ON_ONCE(overflows_type(vec.len, unsigned int));
>
> I'm not at all sure you need that test.
> blk_map_iter_next() has to guarantee that vec.len is valid.
> (probably even less than a page size?)
> Perhaps this code should be using a different type for the addr:len pair?

I added this test for future proofing; that is why it doesn't "return" on
overflow, but prints a stack dump and continues. It can't happen.

>
> > +		sg_set_page(*last_sg, phys_to_page(vec.paddr),
> > +			    (unsigned int)vec.len, offset_in_page(vec.paddr));
>
> You definitely don't need the explicit cast.

We degrade the type from u64 to u32. Why don't we need a cast?

Thanks
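The disagreement over the cast comes down to C's conversion rules: passing a size_t argument to an unsigned int parameter performs the narrowing implicitly, so an explicit cast yields exactly the same value and only affects compiler warnings. A standalone illustration with made-up helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* take_len stands in for a callee with an unsigned int length
 * parameter, the way sg_set_page() has. */
unsigned int take_len(unsigned int len)
{
	return len;
}

unsigned int call_without_cast(size_t len)
{
	return take_len(len);               /* implicit narrowing */
}

unsigned int call_with_cast(size_t len)
{
	return take_len((unsigned int)len); /* explicit, same value */
}
```

Both callers produce identical results for every input; the cast changes nothing at runtime, it only documents (or hides) the narrowing.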
* Re: [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes
  2025-11-15 18:05 ` Leon Romanovsky
@ 2025-11-15 22:28   ` David Laight
  2025-11-16  7:14     ` Leon Romanovsky
  0 siblings, 1 reply; 9+ messages in thread
From: David Laight @ 2025-11-15 22:28 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	linux-block, linux-kernel, linux-nvme

On Sat, 15 Nov 2025 20:05:47 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> On Sat, Nov 15, 2025 at 05:33:41PM +0000, David Laight wrote:
> > On Sat, 15 Nov 2025 18:22:45 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> >
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > This patch changes the length variables from unsigned int to size_t.
> > > Using size_t ensures that we can handle larger sizes, as size_t is
> > > always equal to or larger than the previously used u32 type.
> >
> > Where are requests larger than 4GB going to come from?
>
> The main goal is to reuse phys_vec structure. It is going to represent PCI
> regions exposed through VFIO DMABUF interface. Their length is more than u32.

Unless you actually need to have the same structure (because some function
is used in both places) there isn't really any need to have a single
structure for a phys_addr:length pair.
Indeed keeping them separate can even remove bugs.

For instance (I think) blk_map_iter_next() returns an addr:len pair
that is only used for the following sg_set_page() call - which
has separate parameters for phys_to_page(addr) and len.
So unless there are other places it is used it doesn't need to be
the same structure at all.
(Other people might disagree...)

> > > Originally, u32 was used because blk-mq-dma code evolved from
> > > scatter-gather implementation, which uses unsigned int to describe length.
> > > This change will also allow us to reuse the existing struct phys_vec in places
> > > that don't need scatter-gather.
> > >
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > ---
> > >  block/blk-mq-dma.c      | 14 +++++++++-----
> > >  drivers/nvme/host/pci.c |  4 ++--
> > >  2 files changed, 11 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
> > > index e9108ccaf4b0..cc3e2548cc30 100644
> > > --- a/block/blk-mq-dma.c
> > > +++ b/block/blk-mq-dma.c
> > > @@ -8,7 +8,7 @@
> > >
> > >  struct phys_vec {
> > >  	phys_addr_t paddr;
> > > -	u32 len;
> > > +	size_t len;
> > >  };
> > >
> > >  static bool __blk_map_iter_next(struct blk_map_iter *iter)
> > > @@ -112,8 +112,8 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
> > >  		struct phys_vec *vec)
> > >  {
> > >  	enum dma_data_direction dir = rq_dma_dir(req);
> > > -	unsigned int mapped = 0;
> > >  	unsigned int attrs = 0;
> > > +	size_t mapped = 0;
> > >  	int error;
> > >
> > >  	iter->addr = state->addr;
> > > @@ -296,8 +296,10 @@ int __blk_rq_map_sg(struct request *rq, struct scatterlist *sglist,
> > >  	blk_rq_map_iter_init(rq, &iter);
> > >  	while (blk_map_iter_next(rq, &iter, &vec)) {
> > >  		*last_sg = blk_next_sg(last_sg, sglist);
> > > -		sg_set_page(*last_sg, phys_to_page(vec.paddr), vec.len,
> > > -			    offset_in_page(vec.paddr));
> > > +
> > > +		WARN_ON_ONCE(overflows_type(vec.len, unsigned int));
> >
> > I'm not at all sure you need that test.
> > blk_map_iter_next() has to guarantee that vec.len is valid.
> > (probably even less than a page size?)
> > Perhaps this code should be using a different type for the addr:len pair?
>
> I added this test for future proofing; that is why it doesn't "return" on
> overflow, but prints a stack dump and continues. It can't happen.

No, on a large number of installed systems it prints the stack and panics.
Were it to continue the effect would be all wrong anyway.
But blk_map_iter_next() guarantees to return a sane length.

>
> > > +		sg_set_page(*last_sg, phys_to_page(vec.paddr),
> > > +			    (unsigned int)vec.len, offset_in_page(vec.paddr));
> >
> > You definitely don't need the explicit cast.
>
> We degrade the type from u64 to u32. Why don't we need a cast?

Because you don't need to cast pretty much all integer conversions.
Any warnings compilers might output for such assignments really are best
disabled.
The more casts you add to code to remove 'silly' compiler warnings the
harder it is to find the ones that actually have a desired effect and/or
unwanted effects that are actually bugs.

I'm busy trying to fix a load of min_t(u32, a, b) which mask off high
significant bits from u64 values.
The casts got added (implicitly by using min_t() instead of min()) because
min() required that the types match - and in a lot of cases the programmer
picked the type of the result, not that of the larger parameter.
Others are just cut&paste of another line.
But the effect is the same: the casts add bugs rather than making the
code better.

I've even seen:
	uchar_buf[0] = (unsigned char)(int_val & 0xff);
(Presumably written to avoid compiler warnings.)
and looked at the object code to find the compiler (not gcc) anded the
value with 0xff for the '& 0xff', anded it with 0xff again for the cast,
and then did a memory write of the low bits.

casts could easily be the next 'bug'...

	David

>
> Thanks
* Re: [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes
  2025-11-15 22:28 ` David Laight
@ 2025-11-16  7:14   ` Leon Romanovsky
  0 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2025-11-16 7:14 UTC (permalink / raw)
  To: David Laight
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	linux-block, linux-kernel, linux-nvme

On Sat, Nov 15, 2025 at 10:28:50PM +0000, David Laight wrote:
> On Sat, 15 Nov 2025 20:05:47 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
>
> > On Sat, Nov 15, 2025 at 05:33:41PM +0000, David Laight wrote:
> > > On Sat, 15 Nov 2025 18:22:45 +0200
> > > Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > >
> > > > This patch changes the length variables from unsigned int to size_t.
> > > > Using size_t ensures that we can handle larger sizes, as size_t is
> > > > always equal to or larger than the previously used u32 type.
> > >
> > > Where are requests larger than 4GB going to come from?
> >
> > The main goal is to reuse phys_vec structure. It is going to represent PCI
> > regions exposed through VFIO DMABUF interface. Their length is more than u32.
>
> Unless you actually need to have the same structure (because some function
> is used in both places) there isn't really any need to have a single structure
> for a phys_addr:length pair.

Actually, we do plan to use them: in RDMA, and probably in the DMA API
as well, as I was asked to provide a general DMA map function that will
perform the mapping for an array of phys_vecs.

> Indeed keeping them separate can even remove bugs.

Or introduce them; it depends on the situation.

> For instance (I think) blk_map_iter_next() returns an addr:len pair
> that is only used for the following sg_set_page() call - which
> has separate parameters for phys_to_page(addr) and len.

It is temporary, because we needed to use the old SG interface.
At some point (after we finish the discussion/implementation of VFIO
and DMABUF), the blk_rq_map_*_sg() routines that are used for RDMA
will be changed to not use SG at all.

> So unless there are other places it is used it doesn't need to be
> the same structure at all.
> (Other people might disagree...)

Yes, VFIO, DMABUF and RDMA are other places, so it is better to move
struct phys_vec to a general place now, so that in the next cycle we
will be able to reuse it.

> > > > Originally, u32 was used because blk-mq-dma code evolved from
> > > > scatter-gather implementation, which uses unsigned int to describe length.
> > > > This change will also allow us to reuse the existing struct phys_vec in places
> > > > that don't need scatter-gather.
> > > >
> > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > ---
> > > >  block/blk-mq-dma.c      | 14 +++++++++-----
> > > >  drivers/nvme/host/pci.c |  4 ++--
> > > >  2 files changed, 11 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
> > > > index e9108ccaf4b0..cc3e2548cc30 100644
> > > > --- a/block/blk-mq-dma.c
> > > > +++ b/block/blk-mq-dma.c
> > > > @@ -8,7 +8,7 @@
> > > >
> > > >  struct phys_vec {
> > > >  	phys_addr_t paddr;
> > > > -	u32 len;
> > > > +	size_t len;
> > > >  };
> > > >
> > > >  static bool __blk_map_iter_next(struct blk_map_iter *iter)
> > > > @@ -112,8 +112,8 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
> > > >  		struct phys_vec *vec)
> > > >  {
> > > >  	enum dma_data_direction dir = rq_dma_dir(req);
> > > > -	unsigned int mapped = 0;
> > > >  	unsigned int attrs = 0;
> > > > +	size_t mapped = 0;
> > > >  	int error;
> > > >
> > > >  	iter->addr = state->addr;
> > > > @@ -296,8 +296,10 @@ int __blk_rq_map_sg(struct request *rq, struct scatterlist *sglist,
> > > >  	blk_rq_map_iter_init(rq, &iter);
> > > >  	while (blk_map_iter_next(rq, &iter, &vec)) {
> > > >  		*last_sg = blk_next_sg(last_sg, sglist);
> > > > -		sg_set_page(*last_sg, phys_to_page(vec.paddr), vec.len,
> > > > -			    offset_in_page(vec.paddr));
> > > > +
> > > > +		WARN_ON_ONCE(overflows_type(vec.len, unsigned int));
> > >
> > > I'm not at all sure you need that test.
> > > blk_map_iter_next() has to guarantee that vec.len is valid.
> > > (probably even less than a page size?)
> > > Perhaps this code should be using a different type for the addr:len pair?
> >
> > I added this test for future proofing; that is why it doesn't "return" on
> > overflow, but prints a stack dump and continues. It can't happen.
>
> No, on a large number of installed systems it prints the stack and panics.

It will only print such a stack if vec.len is more than u32, which is
not supposed to happen.

> Were it to continue the effect would be all wrong anyway.
> But blk_map_iter_next() guarantees to return a sane length.

It is not a guarantee. If I understand it correctly, the guarantee comes
from the upper layer, which limits the request size because of SG
limitations.

>
> > > > +		sg_set_page(*last_sg, phys_to_page(vec.paddr),
> > > > +			    (unsigned int)vec.len, offset_in_page(vec.paddr));
> > >
> > > You definitely don't need the explicit cast.
> >
> > We degrade the type from u64 to u32. Why don't we need a cast?
>
> Because you don't need to cast pretty much all integer conversions.
> Any warnings compilers might output for such assignments really are best
> disabled.
> The more casts you add to code to remove 'silly' compiler warnings the
> harder it is to find the ones that actually have a desired effect and/or
> unwanted effects that are actually bugs.
>
> I'm busy trying to fix a load of min_t(u32, a, b) which mask off high
> significant bits from u64 values.
> The casts got added (implicitly by using min_t() instead of min()) because
> min() required that the types match - and in a lot of cases the programmer
> picked the type of the result, not that of the larger parameter.
> Others are just cut&paste of another line.
> But the effect is the same: the casts add bugs rather than making the
> code better.
>
> I've even seen:
> 	uchar_buf[0] = (unsigned char)(int_val & 0xff);
> (Presumably written to avoid compiler warnings.)
> and looked at the object code to find the compiler (not gcc) anded the
> value with 0xff for the '& 0xff', anded it with 0xff again for the cast,
> and then did a memory write of the low bits.
>
> casts could easily be the next 'bug'...

I have no strong feelings about the cast here and can remove it.

Thanks

>
> 	David
* Re: [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes
  2025-11-15 16:22 ` [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes Leon Romanovsky
  2025-11-15 17:33   ` David Laight
@ 2025-11-15 22:25   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-15 22:25 UTC (permalink / raw)
  To: Leon Romanovsky, Jens Axboe, Keith Busch, Christoph Hellwig,
	Sagi Grimberg
  Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org

On 11/15/25 08:22, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> This patch changes the length variables from unsigned int to size_t.
> Using size_t ensures that we can handle larger sizes, as size_t is
> always equal to or larger than the previously used u32 type.
>
> Originally, u32 was used because blk-mq-dma code evolved from
> scatter-gather implementation, which uses unsigned int to describe length.
> This change will also allow us to reuse the existing struct phys_vec in places
> that don't need scatter-gather.
>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
* [PATCH 2/2] types: move phys_vec definition to common header
  2025-11-15 16:22 [PATCH 0/2] block: Generalize physical entry definition Leon Romanovsky
  2025-11-15 16:22 ` [PATCH 1/2] nvme-pci: Use size_t for length fields to handle larger sizes Leon Romanovsky
@ 2025-11-15 16:22 ` Leon Romanovsky
  2025-11-15 22:25   ` Chaitanya Kulkarni
  1 sibling, 1 reply; 9+ messages in thread
From: Leon Romanovsky @ 2025-11-15 16:22 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg
  Cc: linux-block, linux-kernel, linux-nvme

From: Leon Romanovsky <leonro@nvidia.com>

Move the struct phys_vec definition from block/blk-mq-dma.c to
include/linux/types.h to make it available for use across the kernel.

The phys_vec structure represents a physical address range with a
length, which is used by the new physical address-based DMA mapping
API. This structure is already used by the block layer and will be
needed for DMA phys API users.

Moving this definition to types.h provides a centralized location
for this common data structure and eliminates code duplication
across subsystems that need to work with physical address ranges.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 block/blk-mq-dma.c    | 5 -----
 include/linux/types.h | 5 +++++
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index cc3e2548cc30..ba7e77bbe7fa 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -6,11 +6,6 @@
 #include <linux/blk-mq-dma.h>
 #include "blk.h"

-struct phys_vec {
-	phys_addr_t paddr;
-	size_t len;
-};
-
 static bool __blk_map_iter_next(struct blk_map_iter *iter)
 {
 	if (iter->iter.bi_size)
diff --git a/include/linux/types.h b/include/linux/types.h
index 6dfdb8e8e4c3..6cc2d7cba9b3 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -170,6 +170,11 @@ typedef u64 phys_addr_t;
 typedef u32 phys_addr_t;
 #endif

+struct phys_vec {
+	phys_addr_t paddr;
+	size_t len;
+};
+
 typedef phys_addr_t resource_size_t;

 /*
--
2.51.1
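As a standalone sketch of how the now-shared structure might be consumed once it lives in a common header (phys_addr_t modelled here as uint64_t, matching a 64-bit kernel configuration; the consumer function is hypothetical, not part of the series):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of the moved definition: phys_addr_t stands in as
 * uint64_t, as it would be with CONFIG_PHYS_ADDR_T_64BIT. */
typedef uint64_t phys_addr_t;

struct phys_vec {
	phys_addr_t paddr; /* start of a contiguous physical range */
	size_t len;        /* length in bytes; size_t per this series */
};

/* Hypothetical consumer: total bytes covered by an array of ranges,
 * the kind of walk a phys_vec-based DMA mapping helper might do. */
size_t phys_vec_total_len(const struct phys_vec *vecs, size_t nr)
{
	size_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += vecs[i].len;
	return total;
}
```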
* Re: [PATCH 2/2] types: move phys_vec definition to common header
  2025-11-15 16:22 ` [PATCH 2/2] types: move phys_vec definition to common header Leon Romanovsky
@ 2025-11-15 22:25   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-15 22:25 UTC (permalink / raw)
  To: Leon Romanovsky, Jens Axboe, Keith Busch, Christoph Hellwig,
	Sagi Grimberg
  Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org

On 11/15/25 08:22, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Move the struct phys_vec definition from block/blk-mq-dma.c to
> include/linux/types.h to make it available for use across the kernel.
>
> The phys_vec structure represents a physical address range with a
> length, which is used by the new physical address-based DMA mapping
> API. This structure is already used by the block layer and will be
> needed for DMA phys API users.
>
> Moving this definition to types.h provides a centralized location
> for this common data structure and eliminates code duplication
> across subsystems that need to work with physical address ranges.
>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck