* [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type
2023-09-29 17:00 [PATCH 0/3] UIO_MEM_DMA_COHERENT for cnic/bnx2/bnx2x Chris Leech
@ 2023-09-29 17:00 ` Chris Leech
2023-09-30 7:10 ` Greg Kroah-Hartman
2023-10-02 6:09 ` Christoph Hellwig
2023-09-29 17:00 ` [PATCH 2/3] cnic,bnx2,bnx2x: page align uio mmap allocations Chris Leech
2023-09-29 17:00 ` [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT Chris Leech
2 siblings, 2 replies; 23+ messages in thread
From: Chris Leech @ 2023-09-29 17:00 UTC (permalink / raw)
To: Greg Kroah-Hartman, Christoph Hellwig, Rasesh Mody, Ariel Elior,
Sudarsana Kalluru, Manish Chopra
Cc: Nilesh Javali, Manish Rangankar, Jerry Snitselaar, John Meneghini,
Lee Duncan, Mike Christie, Hannes Reinecke, netdev, linux-kernel
Add a UIO memtype specificially for sharing dma_alloc_coherent
memory with userspace, backed by dma_mmap_coherent.
Signed-off-by: Chris Leech <cleech@redhat.com>
---
drivers/uio/uio.c | 34 ++++++++++++++++++++++++++++++++++
include/linux/uio_driver.h | 12 ++++++++++--
2 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 62082d64ece0..f8f1f7ba6378 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -24,6 +24,7 @@
#include <linux/kobject.h>
#include <linux/cdev.h>
#include <linux/uio_driver.h>
+#include <linux/dma-mapping.h>
#define UIO_MAX_DEVICES (1U << MINORBITS)
@@ -759,6 +760,36 @@ static int uio_mmap_physical(struct vm_area_struct *vma)
vma->vm_page_prot);
}
+static int uio_mmap_dma_coherent(struct vm_area_struct *vma)
+{
+ struct uio_device *idev = vma->vm_private_data;
+ int mi = uio_find_mem_index(vma);
+ struct uio_mem *mem;
+ int rc;
+
+ if (mi < 0)
+ return -EINVAL;
+ mem = idev->info->mem + mi;
+
+ if (mem->dma_addr & ~PAGE_MASK)
+ return -ENODEV;
+ if (vma->vm_end - vma->vm_start > mem->size)
+ return -EINVAL;
+
+ /*
+ * UIO uses offset to index into the maps for a device.
+ * We need to clear vm_pgoff for dma_mmap_coherent.
+ */
+ vma->vm_pgoff = 0;
+ rc = dma_mmap_coherent(mem->dma_device,
+ vma,
+ mem->virtual_addr,
+ mem->dma_addr,
+ vma->vm_end - vma->vm_start);
+ vma->vm_pgoff = mi;
+ return rc;
+}
+
static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
{
struct uio_listener *listener = filep->private_data;
@@ -806,6 +837,9 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
case UIO_MEM_VIRTUAL:
ret = uio_mmap_logical(vma);
break;
+ case UIO_MEM_DMA_COHERENT:
+ ret = uio_mmap_dma_coherent(vma);
+ break;
default:
ret = -EINVAL;
}
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 47c5962b876b..ede58e984658 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -36,11 +36,18 @@ struct uio_map;
*/
struct uio_mem {
const char *name;
- phys_addr_t addr;
+ union {
+ phys_addr_t addr;
+ dma_addr_t dma_addr;
+ };
unsigned long offs;
resource_size_t size;
int memtype;
- void __iomem *internal_addr;
+ union {
+ void __iomem *internal_addr;
+ void *virtual_addr;
+ };
+ struct device *dma_device;
struct uio_map *map;
};
@@ -158,6 +165,7 @@ extern int __must_check
#define UIO_MEM_LOGICAL 2
#define UIO_MEM_VIRTUAL 3
#define UIO_MEM_IOVA 4
+#define UIO_MEM_DMA_COHERENT 5
/* defines for uio_port->porttype */
#define UIO_PORT_NONE 0
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type
2023-09-29 17:00 ` [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type Chris Leech
@ 2023-09-30 7:10 ` Greg Kroah-Hartman
2023-09-30 18:08 ` Chris Leech
2023-10-02 6:09 ` Christoph Hellwig
1 sibling, 1 reply; 23+ messages in thread
From: Greg Kroah-Hartman @ 2023-09-30 7:10 UTC (permalink / raw)
To: Chris Leech
Cc: Christoph Hellwig, Rasesh Mody, Ariel Elior, Sudarsana Kalluru,
Manish Chopra, Nilesh Javali, Manish Rangankar, Jerry Snitselaar,
John Meneghini, Lee Duncan, Mike Christie, Hannes Reinecke,
netdev, linux-kernel
On Fri, Sep 29, 2023 at 10:00:21AM -0700, Chris Leech wrote:
> Add a UIO memtype specificially for sharing dma_alloc_coherent
> memory with userspace, backed by dma_mmap_coherent.
Are you sure that you can share this type of memory with userspace
safely? And you are saying what you are doing here, but not why you
want to do it and who will use it.
What are the userspace implications for accessing this type of memory?
>
> Signed-off-by: Chris Leech <cleech@redhat.com>
> ---
> drivers/uio/uio.c | 34 ++++++++++++++++++++++++++++++++++
> include/linux/uio_driver.h | 12 ++++++++++--
> 2 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
> index 62082d64ece0..f8f1f7ba6378 100644
> --- a/drivers/uio/uio.c
> +++ b/drivers/uio/uio.c
> @@ -24,6 +24,7 @@
> #include <linux/kobject.h>
> #include <linux/cdev.h>
> #include <linux/uio_driver.h>
> +#include <linux/dma-mapping.h>
>
> #define UIO_MAX_DEVICES (1U << MINORBITS)
>
> @@ -759,6 +760,36 @@ static int uio_mmap_physical(struct vm_area_struct *vma)
> vma->vm_page_prot);
> }
>
> +static int uio_mmap_dma_coherent(struct vm_area_struct *vma)
> +{
> + struct uio_device *idev = vma->vm_private_data;
> + int mi = uio_find_mem_index(vma);
> + struct uio_mem *mem;
> + int rc;
> +
> + if (mi < 0)
> + return -EINVAL;
> + mem = idev->info->mem + mi;
> +
> + if (mem->dma_addr & ~PAGE_MASK)
> + return -ENODEV;
> + if (vma->vm_end - vma->vm_start > mem->size)
> + return -EINVAL;
> +
> + /*
> + * UIO uses offset to index into the maps for a device.
> + * We need to clear vm_pgoff for dma_mmap_coherent.
> + */
> + vma->vm_pgoff = 0;
> + rc = dma_mmap_coherent(mem->dma_device,
> + vma,
> + mem->virtual_addr,
> + mem->dma_addr,
> + vma->vm_end - vma->vm_start);
> + vma->vm_pgoff = mi;
> + return rc;
> +}
> +
> static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
> {
> struct uio_listener *listener = filep->private_data;
> @@ -806,6 +837,9 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
> case UIO_MEM_VIRTUAL:
> ret = uio_mmap_logical(vma);
> break;
> + case UIO_MEM_DMA_COHERENT:
> + ret = uio_mmap_dma_coherent(vma);
> + break;
> default:
> ret = -EINVAL;
> }
> diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
> index 47c5962b876b..ede58e984658 100644
> --- a/include/linux/uio_driver.h
> +++ b/include/linux/uio_driver.h
> @@ -36,11 +36,18 @@ struct uio_map;
> */
> struct uio_mem {
> const char *name;
> - phys_addr_t addr;
> + union {
> + phys_addr_t addr;
> + dma_addr_t dma_addr;
> + };
> unsigned long offs;
> resource_size_t size;
> int memtype;
> - void __iomem *internal_addr;
> + union {
> + void __iomem *internal_addr;
> + void *virtual_addr;
> + };
> + struct device *dma_device;
Why are you adding a new struct device here?
And why the unions? How are you going to verify that they are being
used correctly? What space savings are you attempting to do here and
why?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type
2023-09-30 7:10 ` Greg Kroah-Hartman
@ 2023-09-30 18:08 ` Chris Leech
0 siblings, 0 replies; 23+ messages in thread
From: Chris Leech @ 2023-09-30 18:08 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Christoph Hellwig, Rasesh Mody, Ariel Elior, Sudarsana Kalluru,
Manish Chopra, Nilesh Javali, Manish Rangankar, Jerry Snitselaar,
John Meneghini, Lee Duncan, Mike Christie, Hannes Reinecke,
netdev, linux-kernel
On Sat, Sep 30, 2023 at 09:10:10AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Sep 29, 2023 at 10:00:21AM -0700, Chris Leech wrote:
> > Add a UIO memtype specificially for sharing dma_alloc_coherent
> > memory with userspace, backed by dma_mmap_coherent.
>
> Are you sure that you can share this type of memory with userspace
> safely? And you are saying what you are doing here, but not why you
> want to do it and who will use it.
>
> What are the userspace implications for accessing this type of memory?
Thanks for taking the time to look at this Greg.
I'm trying to help Marvell fix a regression with these drivers, by
figuring out what the right way to handle this type of mmap is.
The dma_mmap_coherent API exists for exactly this, so I thought making
the uio interface aware of it made sense. There are uio drivers sharing
dma_alloc_coherent memory (uio_dmem_genirq, uio_pruss) using
UIO_MEM_PHYS, but that falls apart in the face of an iommu.
> > struct uio_mem {
> > const char *name;
> > - phys_addr_t addr;
> > + union {
> > + phys_addr_t addr;
> > + dma_addr_t dma_addr;
> > + };
> > unsigned long offs;
> > resource_size_t size;
> > int memtype;
> > - void __iomem *internal_addr;
> > + union {
> > + void __iomem *internal_addr;
> > + void *virtual_addr;
> > + };
> > + struct device *dma_device;
>
> Why are you adding a new struct device here?
dma_mmap_coherent wants it.
> And why the unions? How are you going to verify that they are being
> used correctly? What space savings are you attempting to do here and
> why?
I should have expected that would be questioned, I was being paranoid
about mixing different pointer and address types. I can remove the
unions if putting a dma_addr_t in addr going to be OK.
- Chris
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type
2023-09-29 17:00 ` [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type Chris Leech
2023-09-30 7:10 ` Greg Kroah-Hartman
@ 2023-10-02 6:09 ` Christoph Hellwig
1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2023-10-02 6:09 UTC (permalink / raw)
To: Chris Leech
Cc: Greg Kroah-Hartman, Christoph Hellwig, Rasesh Mody, Ariel Elior,
Sudarsana Kalluru, Manish Chopra, Nilesh Javali, Manish Rangankar,
Jerry Snitselaar, John Meneghini, Lee Duncan, Mike Christie,
Hannes Reinecke, netdev, linux-kernel
This is the right way to map dma coherent memory to userspace.
I have to say I agree with the doubts on what it is actually
trying to do, though.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/3] cnic,bnx2,bnx2x: page align uio mmap allocations
2023-09-29 17:00 [PATCH 0/3] UIO_MEM_DMA_COHERENT for cnic/bnx2/bnx2x Chris Leech
2023-09-29 17:00 ` [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type Chris Leech
@ 2023-09-29 17:00 ` Chris Leech
2023-09-29 17:18 ` Jacob Keller
2023-10-02 6:11 ` Christoph Hellwig
2023-09-29 17:00 ` [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT Chris Leech
2 siblings, 2 replies; 23+ messages in thread
From: Chris Leech @ 2023-09-29 17:00 UTC (permalink / raw)
To: Greg Kroah-Hartman, Christoph Hellwig, Rasesh Mody, Ariel Elior,
Sudarsana Kalluru, Manish Chopra
Cc: Nilesh Javali, Manish Rangankar, Jerry Snitselaar, John Meneghini,
Lee Duncan, Mike Christie, Hannes Reinecke, netdev, linux-kernel
Allocations in these drivers that will be mmaped through a uio device
should be made in multiples of PAGE_SIZE to avoid exposing additional
kernel memory unintentionally.
Signed-off-by: Chris Leech <cleech@redhat.com>
---
drivers/net/ethernet/broadcom/bnx2.c | 1 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 8 ++++----
drivers/net/ethernet/broadcom/cnic.c | 9 +++++----
3 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 0d917a9699c5..84a04eec654a 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -837,6 +837,7 @@ bnx2_alloc_stats_blk(struct net_device *dev)
BNX2_SBLK_MSIX_ALIGN_SIZE);
bp->status_stats_size = status_blk_size +
sizeof(struct statistics_block);
+ bp->status_stats_size = PAGE_ALIGN(bp->status_stats_size);
status_blk = dma_alloc_coherent(&bp->pdev->dev, bp->status_stats_size,
&bp->status_blk_mapping, GFP_KERNEL);
if (!status_blk)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 0d8e61c63c7c..2fcde42a05c1 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -8270,10 +8270,10 @@ void bnx2x_free_mem_cnic(struct bnx2x *bp)
if (!CHIP_IS_E1x(bp))
BNX2X_PCI_FREE(bp->cnic_sb.e2_sb, bp->cnic_sb_mapping,
- sizeof(struct host_hc_status_block_e2));
+ PAGE_ALIGN(sizeof(struct host_hc_status_block_e2)));
else
BNX2X_PCI_FREE(bp->cnic_sb.e1x_sb, bp->cnic_sb_mapping,
- sizeof(struct host_hc_status_block_e1x));
+ PAGE_ALIGN(sizeof(struct host_hc_status_block_e1x)));
BNX2X_PCI_FREE(bp->t2, bp->t2_mapping, SRC_T2_SZ);
}
@@ -8316,12 +8316,12 @@ int bnx2x_alloc_mem_cnic(struct bnx2x *bp)
if (!CHIP_IS_E1x(bp)) {
/* size = the status block + ramrod buffers */
bp->cnic_sb.e2_sb = BNX2X_PCI_ALLOC(&bp->cnic_sb_mapping,
- sizeof(struct host_hc_status_block_e2));
+ PAGE_ALIGN(sizeof(struct host_hc_status_block_e2)));
if (!bp->cnic_sb.e2_sb)
goto alloc_mem_err;
} else {
bp->cnic_sb.e1x_sb = BNX2X_PCI_ALLOC(&bp->cnic_sb_mapping,
- sizeof(struct host_hc_status_block_e1x));
+ PAGE_ALIGN(sizeof(struct host_hc_status_block_e1x)));
if (!bp->cnic_sb.e1x_sb)
goto alloc_mem_err;
}
diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 7926aaef8f0c..67ec397bd171 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1026,13 +1026,14 @@ static int __cnic_alloc_uio_rings(struct cnic_uio_dev *udev, int pages)
return 0;
udev->l2_ring_size = pages * CNIC_PAGE_SIZE;
+ udev->l2_ring_size = PAGE_ALIGN(udev->l2_ring_size);
udev->l2_ring = dma_alloc_coherent(&udev->pdev->dev, udev->l2_ring_size,
&udev->l2_ring_map, GFP_KERNEL);
if (!udev->l2_ring)
return -ENOMEM;
udev->l2_buf_size = (cp->l2_rx_ring_size + 1) * cp->l2_single_buf_size;
- udev->l2_buf_size = CNIC_PAGE_ALIGN(udev->l2_buf_size);
+ udev->l2_buf_size = PAGE_ALIGN(udev->l2_buf_size);
udev->l2_buf = dma_alloc_coherent(&udev->pdev->dev, udev->l2_buf_size,
&udev->l2_buf_map, GFP_KERNEL);
if (!udev->l2_buf) {
@@ -1108,9 +1109,9 @@ static int cnic_init_uio(struct cnic_dev *dev)
uinfo->mem[1].addr = (unsigned long) cp->status_blk.gen &
CNIC_PAGE_MASK;
if (cp->ethdev->drv_state & CNIC_DRV_STATE_USING_MSIX)
- uinfo->mem[1].size = BNX2_SBLK_MSIX_ALIGN_SIZE * 9;
+ uinfo->mem[1].size = PAGE_ALIGN(BNX2_SBLK_MSIX_ALIGN_SIZE * 9);
else
- uinfo->mem[1].size = BNX2_SBLK_MSIX_ALIGN_SIZE;
+ uinfo->mem[1].size = PAGE_ALIGN(BNX2_SBLK_MSIX_ALIGN_SIZE);
uinfo->name = "bnx2_cnic";
} else if (test_bit(CNIC_F_BNX2X_CLASS, &dev->flags)) {
@@ -1118,7 +1119,7 @@ static int cnic_init_uio(struct cnic_dev *dev)
uinfo->mem[1].addr = (unsigned long) cp->bnx2x_def_status_blk &
CNIC_PAGE_MASK;
- uinfo->mem[1].size = sizeof(*cp->bnx2x_def_status_blk);
+ uinfo->mem[1].size = PAGE_ALIGN(sizeof(*cp->bnx2x_def_status_blk));
uinfo->name = "bnx2x_cnic";
}
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH 2/3] cnic,bnx2,bnx2x: page align uio mmap allocations
2023-09-29 17:00 ` [PATCH 2/3] cnic,bnx2,bnx2x: page align uio mmap allocations Chris Leech
@ 2023-09-29 17:18 ` Jacob Keller
2023-10-02 6:11 ` Christoph Hellwig
1 sibling, 0 replies; 23+ messages in thread
From: Jacob Keller @ 2023-09-29 17:18 UTC (permalink / raw)
To: Chris Leech, Greg Kroah-Hartman, Christoph Hellwig, Rasesh Mody,
Ariel Elior, Sudarsana Kalluru, Manish Chopra
Cc: Nilesh Javali, Manish Rangankar, Jerry Snitselaar, John Meneghini,
Lee Duncan, Mike Christie, Hannes Reinecke, netdev, linux-kernel
On 9/29/2023 10:00 AM, Chris Leech wrote:
> Allocations in these drivers that will be mmaped through a uio device
> should be made in multiples of PAGE_SIZE to avoid exposing additional
> kernel memory unintentionally.
>
> Signed-off-by: Chris Leech <cleech@redhat.com>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/3] cnic,bnx2,bnx2x: page align uio mmap allocations
2023-09-29 17:00 ` [PATCH 2/3] cnic,bnx2,bnx2x: page align uio mmap allocations Chris Leech
2023-09-29 17:18 ` Jacob Keller
@ 2023-10-02 6:11 ` Christoph Hellwig
1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2023-10-02 6:11 UTC (permalink / raw)
To: Chris Leech
Cc: Greg Kroah-Hartman, Christoph Hellwig, Rasesh Mody, Ariel Elior,
Sudarsana Kalluru, Manish Chopra, Nilesh Javali, Manish Rangankar,
Jerry Snitselaar, John Meneghini, Lee Duncan, Mike Christie,
Hannes Reinecke, netdev, linux-kernel
On Fri, Sep 29, 2023 at 10:00:22AM -0700, Chris Leech wrote:
> Allocations in these drivers that will be mmaped through a uio device
> should be made in multiples of PAGE_SIZE to avoid exposing additional
> kernel memory unintentionally.
dma coherent allocations are always rounded up to a page, although
the documentation for that is somewhat obscure and I wouldn't fault
anyone for not relying on it.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-09-29 17:00 [PATCH 0/3] UIO_MEM_DMA_COHERENT for cnic/bnx2/bnx2x Chris Leech
2023-09-29 17:00 ` [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type Chris Leech
2023-09-29 17:00 ` [PATCH 2/3] cnic,bnx2,bnx2x: page align uio mmap allocations Chris Leech
@ 2023-09-29 17:00 ` Chris Leech
2023-09-29 17:19 ` Jacob Keller
2023-09-30 7:06 ` Greg Kroah-Hartman
2 siblings, 2 replies; 23+ messages in thread
From: Chris Leech @ 2023-09-29 17:00 UTC (permalink / raw)
To: Greg Kroah-Hartman, Christoph Hellwig, Rasesh Mody, Ariel Elior,
Sudarsana Kalluru, Manish Chopra
Cc: Nilesh Javali, Manish Rangankar, Jerry Snitselaar, John Meneghini,
Lee Duncan, Mike Christie, Hannes Reinecke, netdev, linux-kernel
Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
for dma_alloc_coherent buffers.
The cnic l2_ring and l2_buf mmaps have caused page refcount issues since
the misuse of the __GFP_COMP flag was removed from their
dma_alloc_coherent calls. Fix that by having the uio device use
dma_mmap_coherent.
The bnx2 and bnx2x status block allocations are also dma_alloc_coherent,
and should use dma_mmap_coherent. They didn't allocate multiple pages,
but also didn't seem to work correctly with an iommu enabled.
Fixes: bb73955c0b1d ("cnic: don't pass bogus GFP_ flags to dma_alloc_coherent")
Signed-off-by: Chris Leech <cleech@redhat.com>
---
drivers/net/ethernet/broadcom/bnx2.c | 1 +
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 ++
drivers/net/ethernet/broadcom/cnic.c | 25 ++++++++++++-------
drivers/net/ethernet/broadcom/cnic.h | 1 +
drivers/net/ethernet/broadcom/cnic_if.h | 1 +
5 files changed, 21 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 84a04eec654a..490f88ad3bd2 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -367,6 +367,7 @@ static void bnx2_setup_cnic_irq_info(struct bnx2 *bp)
cp->irq_arr[0].status_blk = (void *)
((unsigned long) bnapi->status_blk.msi +
(BNX2_SBLK_MSIX_ALIGN_SIZE * sb_id));
+ cp->irq_arr[0].status_blk_map = bp->status_blk_mapping;
cp->irq_arr[0].status_blk_num = sb_id;
cp->num_irq = 1;
}
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 2fcde42a05c1..dbaa90b7f889 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -14912,9 +14912,11 @@ void bnx2x_setup_cnic_irq_info(struct bnx2x *bp)
else
cp->irq_arr[0].status_blk = (void *)bp->cnic_sb.e1x_sb;
+ cp->irq_arr[0].status_blk_map = bp->cnic_sb_mapping;
cp->irq_arr[0].status_blk_num = bnx2x_cnic_fw_sb_id(bp);
cp->irq_arr[0].status_blk_num2 = bnx2x_cnic_igu_sb_id(bp);
cp->irq_arr[1].status_blk = bp->def_status_blk;
+ cp->irq_arr[1].status_blk_map = bp->def_status_blk_mapping;
cp->irq_arr[1].status_blk_num = DEF_SB_ID;
cp->irq_arr[1].status_blk_num2 = DEF_SB_IGU_ID;
diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 67ec397bd171..05e28e00e916 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1106,8 +1106,8 @@ static int cnic_init_uio(struct cnic_dev *dev)
if (test_bit(CNIC_F_BNX2_CLASS, &dev->flags)) {
uinfo->mem[0].size = MB_GET_CID_ADDR(TX_TSS_CID +
TX_MAX_TSS_RINGS + 1);
- uinfo->mem[1].addr = (unsigned long) cp->status_blk.gen &
- CNIC_PAGE_MASK;
+ uinfo->mem[1].dma_addr = cp->status_blk_map;
+ uinfo->mem[1].virtual_addr = cp->status_blk.gen;
if (cp->ethdev->drv_state & CNIC_DRV_STATE_USING_MSIX)
uinfo->mem[1].size = PAGE_ALIGN(BNX2_SBLK_MSIX_ALIGN_SIZE * 9);
else
@@ -1117,22 +1117,27 @@ static int cnic_init_uio(struct cnic_dev *dev)
} else if (test_bit(CNIC_F_BNX2X_CLASS, &dev->flags)) {
uinfo->mem[0].size = pci_resource_len(dev->pcidev, 0);
- uinfo->mem[1].addr = (unsigned long) cp->bnx2x_def_status_blk &
- CNIC_PAGE_MASK;
+ uinfo->mem[1].dma_addr = cp->status_blk_map;
+ uinfo->mem[1].virtual_addr = cp->bnx2x_def_status_blk;
uinfo->mem[1].size = PAGE_ALIGN(sizeof(*cp->bnx2x_def_status_blk));
uinfo->name = "bnx2x_cnic";
}
- uinfo->mem[1].memtype = UIO_MEM_LOGICAL;
+ uinfo->mem[1].dma_device = &dev->pcidev->dev;
+ uinfo->mem[1].memtype = UIO_MEM_DMA_COHERENT;
- uinfo->mem[2].addr = (unsigned long) udev->l2_ring;
+ uinfo->mem[2].dma_addr = udev->l2_ring_map;
+ uinfo->mem[2].virtual_addr = udev->l2_ring;
uinfo->mem[2].size = udev->l2_ring_size;
- uinfo->mem[2].memtype = UIO_MEM_LOGICAL;
+ uinfo->mem[2].dma_device = &dev->pcidev->dev;
+ uinfo->mem[2].memtype = UIO_MEM_DMA_COHERENT;
- uinfo->mem[3].addr = (unsigned long) udev->l2_buf;
+ uinfo->mem[3].dma_addr = udev->l2_buf_map;
+ uinfo->mem[3].virtual_addr = udev->l2_buf;
uinfo->mem[3].size = udev->l2_buf_size;
- uinfo->mem[3].memtype = UIO_MEM_LOGICAL;
+ uinfo->mem[3].dma_device = &dev->pcidev->dev;
+ uinfo->mem[3].memtype = UIO_MEM_DMA_COHERENT;
uinfo->version = CNIC_MODULE_VERSION;
uinfo->irq = UIO_IRQ_CUSTOM;
@@ -1314,6 +1319,7 @@ static int cnic_alloc_bnx2x_resc(struct cnic_dev *dev)
return 0;
cp->bnx2x_def_status_blk = cp->ethdev->irq_arr[1].status_blk;
+ cp->status_blk_map = cp->ethdev->irq_arr[1].status_blk_map;
cp->l2_rx_ring_size = 15;
@@ -5324,6 +5330,7 @@ static int cnic_start_hw(struct cnic_dev *dev)
pci_dev_get(dev->pcidev);
cp->func = PCI_FUNC(dev->pcidev->devfn);
cp->status_blk.gen = ethdev->irq_arr[0].status_blk;
+ cp->status_blk_map = ethdev->irq_arr[0].status_blk_map;
cp->status_blk_num = ethdev->irq_arr[0].status_blk_num;
err = cp->alloc_resc(dev);
diff --git a/drivers/net/ethernet/broadcom/cnic.h b/drivers/net/ethernet/broadcom/cnic.h
index 4baea81bae7a..fedc84ada937 100644
--- a/drivers/net/ethernet/broadcom/cnic.h
+++ b/drivers/net/ethernet/broadcom/cnic.h
@@ -260,6 +260,7 @@ struct cnic_local {
#define SM_RX_ID 0
#define SM_TX_ID 1
} status_blk;
+ dma_addr_t status_blk_map;
struct host_sp_status_block *bnx2x_def_status_blk;
diff --git a/drivers/net/ethernet/broadcom/cnic_if.h b/drivers/net/ethernet/broadcom/cnic_if.h
index 789e5c7e9311..49a11ec80b36 100644
--- a/drivers/net/ethernet/broadcom/cnic_if.h
+++ b/drivers/net/ethernet/broadcom/cnic_if.h
@@ -190,6 +190,7 @@ struct cnic_ops {
struct cnic_irq {
unsigned int vector;
void *status_blk;
+ dma_addr_t status_blk_map;
u32 status_blk_num;
u32 status_blk_num2;
u32 irq_flags;
--
2.41.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-09-29 17:00 ` [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT Chris Leech
@ 2023-09-29 17:19 ` Jacob Keller
2023-09-30 7:06 ` Greg Kroah-Hartman
1 sibling, 0 replies; 23+ messages in thread
From: Jacob Keller @ 2023-09-29 17:19 UTC (permalink / raw)
To: Chris Leech, Greg Kroah-Hartman, Christoph Hellwig, Rasesh Mody,
Ariel Elior, Sudarsana Kalluru, Manish Chopra
Cc: Nilesh Javali, Manish Rangankar, Jerry Snitselaar, John Meneghini,
Lee Duncan, Mike Christie, Hannes Reinecke, netdev, linux-kernel
On 9/29/2023 10:00 AM, Chris Leech wrote:
> Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> for dma_alloc_coherent buffers.
>
> The cnic l2_ring and l2_buf mmaps have caused page refcount issues since
> the misuse of the __GFP_COMP flag was removed from their
> dma_alloc_coherent calls. Fix that by having the uio device use
> dma_mmap_coherent.
>
> The bnx2 and bnx2x status block allocations are also dma_alloc_coherent,
> and should use dma_mmap_coherent. They didn't allocate multiple pages,
> but also didn't seem to work correctly with an iommu enabled.
>
> Fixes: bb73955c0b1d ("cnic: don't pass bogus GFP_ flags to dma_alloc_coherent")
> Signed-off-by: Chris Leech <cleech@redhat.com>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-09-29 17:00 ` [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT Chris Leech
2023-09-29 17:19 ` Jacob Keller
@ 2023-09-30 7:06 ` Greg Kroah-Hartman
2023-09-30 9:10 ` Jerry Snitselaar
2023-09-30 18:19 ` Chris Leech
1 sibling, 2 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2023-09-30 7:06 UTC (permalink / raw)
To: Chris Leech
Cc: Christoph Hellwig, Rasesh Mody, Ariel Elior, Sudarsana Kalluru,
Manish Chopra, Nilesh Javali, Manish Rangankar, Jerry Snitselaar,
John Meneghini, Lee Duncan, Mike Christie, Hannes Reinecke,
netdev, linux-kernel
On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> for dma_alloc_coherent buffers.
Why are ethernet drivers messing around with UIO devices? That's not
what UIO is for, unless you are trying to do kernel bypass for these
devices without anyone noticing?
confused,
greg k-h
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-09-30 7:06 ` Greg Kroah-Hartman
@ 2023-09-30 9:10 ` Jerry Snitselaar
2023-09-30 18:29 ` Greg Kroah-Hartman
2023-09-30 18:19 ` Chris Leech
1 sibling, 1 reply; 23+ messages in thread
From: Jerry Snitselaar @ 2023-09-30 9:10 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Chris Leech, Christoph Hellwig, Rasesh Mody, Ariel Elior,
Sudarsana Kalluru, Manish Chopra, Nilesh Javali, Manish Rangankar,
John Meneghini, Lee Duncan, Mike Christie, Hannes Reinecke,
netdev, linux-kernel
On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > for dma_alloc_coherent buffers.
>
> Why are ethernet drivers messing around with UIO devices? That's not
> what UIO is for, unless you are trying to do kernel bypass for these
> devices without anyone noticing?
>
> confused,
>
> greg k-h
Hi Greg,
It is for iscsi offload [1], and apparently cnic has been handing off
dma alloc'd memory to uio for this since 2009, but it has just been
handing off the address from dma_alloc_coherent to uio, and giving the
dma handle the bnx2x device. That managed to avoid being an issue
until changes last year rightly cleaned up allowing __GFP_COMP to be
passed to the dma_alloc* calls, which cnic was passing to
dma_alloc_coherent. Now iscsiuio goes to mmap through the uio device
and the result is a BUG and stack trace like below.
It was my suggestion that it either needs to use dma_mmap_coherent to
mmap the memory, which possibly is a mistaken understanding on my
part but what dma_mmap_coherent says it is for, or possibly look to
do something similar to what qed is doing for uio.
Regards,
Jerry
[ 129.196731] page:ffffea0004cacb40 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x132b2d
[ 129.207285] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 129.214650] page_type: 0xffffffff()
[ 129.218584] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[ 129.227264] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 129.235937] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
[ 129.246632] ------------[ cut here ]------------
[ 129.251817] kernel BUG at include/linux/mm.h:1441!
[ 129.257209] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[ 129.263440] CPU: 1 PID: 1930 Comm: iscsiuio Kdump: loaded Not tainted 6.6.0-0.rc3.26.eln130.x86_64+debug #1
[ 129.274323] Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.4.2 01/29/2015
[ 129.282682] RIP: 0010:uio_vma_fault+0x40e/0x520 [uio]
[ 129.288345] Code: 49 83 ee 01 e9 db fe ff ff be 01 00 00 00 4c 89 f7 e8 96 7f ae e9 e9 2b ff ff ff 48 c7 c6 00 08 52 c1 4c 89 f7 e8 12 1b 8e e9 <0f> 0b e8 cb 7b a3 e9 e9 f6 fd ff ff bb 02 00 00 00 e9 2b ff ff ff
[ 129.309311] RSP: 0018:ffffc900022878b0 EFLAGS: 00010246
[ 129.315156] RAX: 000000000000005c RBX: ffffea0004cacb40 RCX: 0000000000000000
[ 129.323130] RDX: 0000000000000000 RSI: ffffffffad380f00 RDI: 0000000000000001
[ 129.331102] RBP: ffff8881a65da528 R08: 0000000000000001 R09: fffff52000450eca
[ 129.339074] R10: ffffc90002287657 R11: 0000000000000001 R12: ffffea0004cacb74
[ 129.347044] R13: ffffc900022879f8 R14: ffffea0004cacb40 R15: ffff8881946a0400
[ 129.355015] FS: 00007fc6dcafe640(0000) GS:ffff8885cb800000(0000) knlGS:0000000000000000
[ 129.364054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 129.370474] CR2: 000055b5c9f091a0 CR3: 0000000162292001 CR4: 00000000001706e0
[ 129.378446] Call Trace:
[ 129.381183] <TASK>
[ 129.383532] ? die+0x36/0x90
[ 129.386765] ? do_trap+0x199/0x240
[ 129.390573] ? uio_vma_fault+0x40e/0x520 [uio]
[ 129.395560] ? uio_vma_fault+0x40e/0x520 [uio]
[ 129.400539] ? do_error_trap+0xa3/0x170
[ 129.404830] ? uio_vma_fault+0x40e/0x520 [uio]
[ 129.409815] ? handle_invalid_op+0x2c/0x40
[ 129.414395] ? uio_vma_fault+0x40e/0x520 [uio]
[ 129.419361] ? exc_invalid_op+0x2d/0x40
[ 129.423657] ? asm_exc_invalid_op+0x1a/0x20
[ 129.428332] ? uio_vma_fault+0x40e/0x520 [uio]
[ 129.433301] __do_fault+0xf2/0x5a0
[ 129.437103] do_fault+0x68e/0xc30
[ 129.440804] ? __pfx___pte_offset_map_lock+0x10/0x10
[ 129.446350] __handle_mm_fault+0x8e0/0xeb0
[ 129.450939] ? follow_page_pte+0x29a/0xda0
[ 129.455513] ? __pfx___handle_mm_fault+0x10/0x10
[ 129.460668] ? __pfx_follow_page_pte+0x10/0x10
[ 129.465626] ? follow_pmd_mask.isra.0+0x1d4/0xab0
[ 129.470877] ? count_memcg_event_mm.part.0+0xc7/0x1f0
[ 129.476515] ? rcu_is_watching+0x15/0xb0
[ 129.480896] handle_mm_fault+0x2f2/0x8d0
[ 129.485277] ? check_vma_flags+0x1c2/0x420
[ 129.489852] __get_user_pages+0x333/0xa20
[ 129.494331] ? __pfx___get_user_pages+0x10/0x10
[ 129.499389] ? __pfx_mt_find+0x10/0x10
[ 129.503568] ? __mm_populate+0xe7/0x360
[ 129.507850] ? rcu_is_watching+0x15/0xb0
[ 129.512231] populate_vma_page_range+0x1e9/0x2d0
[ 129.517388] ? __pfx_populate_vma_page_range+0x10/0x10
[ 129.523125] ? __pfx_mmap_region+0x10/0x10
[ 129.527693] __mm_populate+0x1ff/0x360
[ 129.531882] ? __pfx___mm_populate+0x10/0x10
[ 129.536649] ? do_mmap+0x61d/0xcd0
[ 129.540446] ? __up_write+0x1a5/0x500
[ 129.544536] vm_mmap_pgoff+0x276/0x360
[ 129.548722] ? rcu_is_watching+0x15/0xb0
[ 129.553093] ? __pfx_vm_mmap_pgoff+0x10/0x10
[ 129.557861] ? rcu_is_watching+0x15/0xb0
[ 129.562239] ? lock_release+0x25c/0x300
[ 129.566522] ? __fget_files+0x1e0/0x380
[ 129.570807] ksys_mmap_pgoff+0x2e4/0x430
[ 129.575189] do_syscall_64+0x60/0x90
[ 129.579180] ? do_syscall_64+0x6c/0x90
[ 129.583357] ? lockdep_hardirqs_on+0x7d/0x100
[ 129.588235] ? do_syscall_64+0x6c/0x90
[ 129.592420] ? lockdep_hardirqs_on+0x7d/0x100
[ 129.597283] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[1] https://github.com/open-iscsi/open-iscsi/blob/master/iscsiuio/README
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-09-30 9:10 ` Jerry Snitselaar
@ 2023-09-30 18:29 ` Greg Kroah-Hartman
0 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2023-09-30 18:29 UTC (permalink / raw)
To: Jerry Snitselaar
Cc: Chris Leech, Christoph Hellwig, Rasesh Mody, Ariel Elior,
Sudarsana Kalluru, Manish Chopra, Nilesh Javali, Manish Rangankar,
John Meneghini, Lee Duncan, Mike Christie, Hannes Reinecke,
netdev, linux-kernel
On Sat, Sep 30, 2023 at 02:10:50AM -0700, Jerry Snitselaar wrote:
> [1] https://github.com/open-iscsi/open-iscsi/blob/master/iscsiuio/README
That's IP offload, not what UIO is supposed to be for at all (yes, I
know DPDK abuses this api as well, and I hate it.) But this is on a
network card, it shouldn't need UIO. Why is iscsi somehow "special" for
this?
still confused,
greg k-h
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-09-30 7:06 ` Greg Kroah-Hartman
2023-09-30 9:10 ` Jerry Snitselaar
@ 2023-09-30 18:19 ` Chris Leech
2023-09-30 18:28 ` Greg Kroah-Hartman
1 sibling, 1 reply; 23+ messages in thread
From: Chris Leech @ 2023-09-30 18:19 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Christoph Hellwig, Rasesh Mody, Ariel Elior, Sudarsana Kalluru,
Manish Chopra, Nilesh Javali, Manish Rangankar, Jerry Snitselaar,
John Meneghini, Lee Duncan, Mike Christie, Hannes Reinecke,
netdev, linux-kernel
On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > for dma_alloc_coherent buffers.
>
> Why are ethernet drivers messing around with UIO devices? That's not
> what UIO is for, unless you are trying to do kernel bypass for these
> devices without anyone noticing?
>
> confused,
It's confusing. The bnx2 driver stack included a cnic (converged nic?)
module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
offload drivers (iscsi, fcoe, rdma).
The iscsi module (bnx2i) uses a passthrough interface from cnic to
handle some network configuration that the device firmware doesn't do.
It uses a uio device and a userspace component called iscsiuio to do
that.
Questions beyond that will probably need to be answer by one of the many
Marvell engineers copied on this thread.
- Chris
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-09-30 18:19 ` Chris Leech
@ 2023-09-30 18:28 ` Greg Kroah-Hartman
2023-10-01 10:44 ` Hannes Reinecke
0 siblings, 1 reply; 23+ messages in thread
From: Greg Kroah-Hartman @ 2023-09-30 18:28 UTC (permalink / raw)
To: Chris Leech
Cc: Christoph Hellwig, Rasesh Mody, Ariel Elior, Sudarsana Kalluru,
Manish Chopra, Nilesh Javali, Manish Rangankar, Jerry Snitselaar,
John Meneghini, Lee Duncan, Mike Christie, Hannes Reinecke,
netdev, linux-kernel
On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
> On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > > for dma_alloc_coherent buffers.
> >
> > Why are ethernet drivers messing around with UIO devices? That's not
> > what UIO is for, unless you are trying to do kernel bypass for these
> > devices without anyone noticing?
> >
> > confused,
>
> It's confusing. The bnx2 driver stack included a cnic (converged nic?)
> module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
> offload drivers (iscsi, fcoe, rdma).
>
> The iscsi module (bnx2i) uses a passthrough interface from cnic to
> handle some network configuration that the device firmware doesn't do.
> It uses a uio device and a userspace component called iscsiuio to do
> that.
That's horrible, and not what the UIO api is for at all. Configure the
device like any other normal kernel device, don't poke at raw memory
values directly, that way lies madness.
Have a pointer to the userspace tool anywhere? All I found looks like a
full IP stack in userspace under that name, and surely that's not what
this api is for...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-09-30 18:28 ` Greg Kroah-Hartman
@ 2023-10-01 10:44 ` Hannes Reinecke
2023-10-01 11:57 ` Greg Kroah-Hartman
0 siblings, 1 reply; 23+ messages in thread
From: Hannes Reinecke @ 2023-10-01 10:44 UTC (permalink / raw)
To: Greg Kroah-Hartman, Chris Leech
Cc: Christoph Hellwig, Rasesh Mody, Ariel Elior, Sudarsana Kalluru,
Manish Chopra, Nilesh Javali, Manish Rangankar, Jerry Snitselaar,
John Meneghini, Lee Duncan, Mike Christie, Hannes Reinecke,
netdev, linux-kernel
On 9/30/23 20:28, Greg Kroah-Hartman wrote:
> On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
>> On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
>>> On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
>>>> Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
>>>> for dma_alloc_coherent buffers.
>>>
>>> Why are ethernet drivers messing around with UIO devices? That's not
>>> what UIO is for, unless you are trying to do kernel bypass for these
>>> devices without anyone noticing?
>>>
>>> confused,
>>
>> It's confusing. The bnx2 driver stack included a cnic (converged nic?)
>> module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
>> offload drivers (iscsi, fcoe, rdma).
>>
>> The iscsi module (bnx2i) uses a passthrough interface from cnic to
>> handle some network configuration that the device firmware doesn't do.
>> It uses a uio device and a userspace component called iscsiuio to do
>> that.
>
> That's horrible, and not what the UIO api is for at all. Configure the
> device like any other normal kernel device, don't poke at raw memory
> values directly, that way lies madness.
>
> Have a pointer to the userspace tool anywhere? All I found looks like a
> full IP stack in userspace under that name, and surely that's not what
> this api is for...
>
But that's how the interface is used, in particular for the bnx2i
driver. Problem is that the bnx2i iSCSI offload is just that, an iSCSI
offload. Not a TCP offload. So if the iSCSI interface is configured to
acquire the IP address via DHCP, someone has to run the DHCP protocol.
But the iSCSI offload can't, and the bnx2i PCI device is not a network
device so that the normal network stack can't be used.
And so the architects of the bnx2i card decided to use UIO to pass
the network traffic to userspace, and used the userspace 'iscsiuio'
application to run DHCP in userspace.
But's been that way for several years now; so long, in fact, that
the card itself has been out of support from Marvell (since quite some
years, too, IIRC). And even the successor of that card (the qedi driver)
is nearing EOL. Mind you, the qedi driver is using the same interface
(by using UIO to run DHCP in userspace), so singling out the bnx2i for
bad design can be construed as being unfair :-)
I agree, though, that the design is a mess.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-10-01 10:44 ` Hannes Reinecke
@ 2023-10-01 11:57 ` Greg Kroah-Hartman
2023-10-01 14:22 ` Jerry Snitselaar
0 siblings, 1 reply; 23+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-01 11:57 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Chris Leech, Christoph Hellwig, Rasesh Mody, Ariel Elior,
Sudarsana Kalluru, Manish Chopra, Nilesh Javali, Manish Rangankar,
Jerry Snitselaar, John Meneghini, Lee Duncan, Mike Christie,
Hannes Reinecke, netdev, linux-kernel
On Sun, Oct 01, 2023 at 12:44:05PM +0200, Hannes Reinecke wrote:
> On 9/30/23 20:28, Greg Kroah-Hartman wrote:
> > On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
> > > On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> > > > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > > > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > > > > for dma_alloc_coherent buffers.
> > > >
> > > > Why are ethernet drivers messing around with UIO devices? That's not
> > > > what UIO is for, unless you are trying to do kernel bypass for these
> > > > devices without anyone noticing?
> > > >
> > > > confused,
> > >
> > > It's confusing. The bnx2 driver stack included a cnic (converged nic?)
> > > module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
> > > offload drivers (iscsi, fcoe, rdma).
> > >
> > > The iscsi module (bnx2i) uses a passthrough interface from cnic to
> > > handle some network configuration that the device firmware doesn't do.
> > > It uses a uio device and a userspace component called iscsiuio to do
> > > that.
> >
> > That's horrible, and not what the UIO api is for at all. Configure the
> > device like any other normal kernel device, don't poke at raw memory
> > values directly, that way lies madness.
> >
> > Have a pointer to the userspace tool anywhere? All I found looks like a
> > full IP stack in userspace under that name, and surely that's not what
> > this api is for...
> >
> But that's how the interface is used, in particular for the bnx2i driver.
> Problem is that the bnx2i iSCSI offload is just that, an iSCSI offload. Not
> a TCP offload. So if the iSCSI interface is configured to
> acquire the IP address via DHCP, someone has to run the DHCP protocol.
> But the iSCSI offload can't, and the bnx2i PCI device is not a network
> device so that the normal network stack can't be used.
> And so the architects of the bnx2i card decided to use UIO to pass
> the network traffic to userspace, and used the userspace 'iscsiuio'
> application to run DHCP in userspace.
>
> But's been that way for several years now; so long, in fact, that
> the card itself has been out of support from Marvell (since quite some
> years, too, IIRC). And even the successor of that card (the qedi driver)
> is nearing EOL. Mind you, the qedi driver is using the same interface (by
> using UIO to run DHCP in userspace), so singling out the bnx2i for bad
> design can be construed as being unfair :-)
Ok, let's say they are all horrible! :)
> I agree, though, that the design is a mess.
Ok, so why are we papering over it and continuing to allow it to exist?
What "broke" to suddenly require this UIO change? If this has been
around for a very long time, what has caused this to now require the UIO
layer to change?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-10-01 11:57 ` Greg Kroah-Hartman
@ 2023-10-01 14:22 ` Jerry Snitselaar
2023-10-02 6:04 ` Christoph Hellwig
0 siblings, 1 reply; 23+ messages in thread
From: Jerry Snitselaar @ 2023-10-01 14:22 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Hannes Reinecke, Chris Leech, Christoph Hellwig, Rasesh Mody,
Ariel Elior, Sudarsana Kalluru, Manish Chopra, Nilesh Javali,
Manish Rangankar, John Meneghini, Lee Duncan, Mike Christie,
Hannes Reinecke, netdev, linux-kernel
On Sun, Oct 01, 2023 at 01:57:25PM +0200, Greg Kroah-Hartman wrote:
> On Sun, Oct 01, 2023 at 12:44:05PM +0200, Hannes Reinecke wrote:
> > On 9/30/23 20:28, Greg Kroah-Hartman wrote:
> > > On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
> > > > On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > > > > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > > > > > for dma_alloc_coherent buffers.
> > > > >
> > > > > Why are ethernet drivers messing around with UIO devices? That's not
> > > > > what UIO is for, unless you are trying to do kernel bypass for these
> > > > > devices without anyone noticing?
> > > > >
> > > > > confused,
> > > >
> > > > It's confusing. The bnx2 driver stack included a cnic (converged nic?)
> > > > module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
> > > > offload drivers (iscsi, fcoe, rdma).
> > > >
> > > > The iscsi module (bnx2i) uses a passthrough interface from cnic to
> > > > handle some network configuration that the device firmware doesn't do.
> > > > It uses a uio device and a userspace component called iscsiuio to do
> > > > that.
> > >
> > > That's horrible, and not what the UIO api is for at all. Configure the
> > > device like any other normal kernel device, don't poke at raw memory
> > > values directly, that way lies madness.
> > >
> > > Have a pointer to the userspace tool anywhere? All I found looks like a
> > > full IP stack in userspace under that name, and surely that's not what
> > > this api is for...
> > >
> > But that's how the interface is used, in particular for the bnx2i driver.
> > Problem is that the bnx2i iSCSI offload is just that, an iSCSI offload. Not
> > a TCP offload. So if the iSCSI interface is configured to
> > acquire the IP address via DHCP, someone has to run the DHCP protocol.
> > But the iSCSI offload can't, and the bnx2i PCI device is not a network
> > device so that the normal network stack can't be used.
> > And so the architects of the bnx2i card decided to use UIO to pass
> > the network traffic to userspace, and used the userspace 'iscsiuio'
> > application to run DHCP in userspace.
> >
> > But's been that way for several years now; so long, in fact, that
> > the card itself has been out of support from Marvell (since quite some
> > years, too, IIRC). And even the successor of that card (the qedi driver)
> > is nearing EOL. Mind you, the qedi driver is using the same interface (by
> > using UIO to run DHCP in userspace), so singling out the bnx2i for bad
> > design can be construed as being unfair :-)
>
> Ok, let's say they are all horrible! :)
>
> > I agree, though, that the design is a mess.
>
> Ok, so why are we papering over it and continuing to allow it to exist?
>
> What "broke" to suddenly require this UIO change? If this has been
> around for a very long time, what has caused this to now require the UIO
> layer to change?
>
> thanks,
>
> greg k-h
Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
in particular these two (from the e529d3507a93 dma-mapping pull for
6.2):
ffcb75458460 dma-mapping: reject __GFP_COMP in dma_alloc_attrs | 2022-11-21 | (Christoph Hellwig)
bb73955c0b1d cnic: don't pass bogus GFP_ flags to dma_alloc_coherent | 2022-11-21 | (Christoph Hellwig)
Regards,
Jerry
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-10-01 14:22 ` Jerry Snitselaar
@ 2023-10-02 6:04 ` Christoph Hellwig
2023-10-02 7:50 ` Jerry Snitselaar
0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2023-10-02 6:04 UTC (permalink / raw)
To: Jerry Snitselaar
Cc: Greg Kroah-Hartman, Hannes Reinecke, Chris Leech,
Christoph Hellwig, Rasesh Mody, Ariel Elior, Sudarsana Kalluru,
Manish Chopra, Nilesh Javali, Manish Rangankar, John Meneghini,
Lee Duncan, Mike Christie, Hannes Reinecke, netdev, linux-kernel
On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> in particular these two (from the e529d3507a93 dma-mapping pull for
> 6.2):
That's complete BS. The driver was broken since day 1 and always
ignored the DMA API requirement to never try to grab the page from the
dma coherent allocation because you generally speaking can't. It just
happened to accidentally work the trivial dma coherent allocator that
is used on x86.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-10-02 6:04 ` Christoph Hellwig
@ 2023-10-02 7:50 ` Jerry Snitselaar
2023-10-02 8:46 ` Greg Kroah-Hartman
0 siblings, 1 reply; 23+ messages in thread
From: Jerry Snitselaar @ 2023-10-02 7:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Greg Kroah-Hartman, Hannes Reinecke, Chris Leech, Rasesh Mody,
Ariel Elior, Sudarsana Kalluru, Manish Chopra, Nilesh Javali,
Manish Rangankar, John Meneghini, Lee Duncan, Mike Christie,
Hannes Reinecke, netdev, linux-kernel
On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
> On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> > Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> > in particular these two (from the e529d3507a93 dma-mapping pull for
> > 6.2):
>
> That's complete BS. The driver was broken since day 1 and always
> ignored the DMA API requirement to never try to grab the page from the
> dma coherent allocation because you generally speaking can't. It just
> happened to accidentally work the trivial dma coherent allocator that
> is used on x86.
>
re-sending since gmail decided to not send plain text:
Yes, I agree that it has been broken and misusing the API. Greg's
question was what changed though, and it was the clean up of
__GFP_COMP in dma-mapping that brought the problem in the driver to
light.
I already said the other day that cnic has been doing this for 14
years. I'm not blaming you or your __GFP_COMP cleanup commits, they
just uncovered that cnic was doing something wrong. My apologies if
you took it that way.
Regards,
Jerry
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-10-02 7:50 ` Jerry Snitselaar
@ 2023-10-02 8:46 ` Greg Kroah-Hartman
2023-10-02 8:59 ` Hannes Reinecke
0 siblings, 1 reply; 23+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-02 8:46 UTC (permalink / raw)
To: Jerry Snitselaar
Cc: Christoph Hellwig, Hannes Reinecke, Chris Leech, Rasesh Mody,
Ariel Elior, Sudarsana Kalluru, Manish Chopra, Nilesh Javali,
Manish Rangankar, John Meneghini, Lee Duncan, Mike Christie,
Hannes Reinecke, netdev, linux-kernel
On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote:
> On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
> > On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> > > Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> > > in particular these two (from the e529d3507a93 dma-mapping pull for
> > > 6.2):
> >
> > That's complete BS. The driver was broken since day 1 and always
> > ignored the DMA API requirement to never try to grab the page from the
> > dma coherent allocation because you generally speaking can't. It just
> > happened to accidentally work the trivial dma coherent allocator that
> > is used on x86.
> >
>
> re-sending since gmail decided to not send plain text:
>
> Yes, I agree that it has been broken and misusing the API. Greg's
> question was what changed though, and it was the clean up of
> __GFP_COMP in dma-mapping that brought the problem in the driver to
> light.
>
> I already said the other day that cnic has been doing this for 14
> years. I'm not blaming you or your __GFP_COMP cleanup commits, they
> just uncovered that cnic was doing something wrong. My apologies if
> you took it that way.
As these devices aren't being made anymore, and this api is really not a
good idea in the first place, why don't we just leave it broken and see
if anyone notices?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-10-02 8:46 ` Greg Kroah-Hartman
@ 2023-10-02 8:59 ` Hannes Reinecke
2023-10-05 10:39 ` Paolo Abeni
0 siblings, 1 reply; 23+ messages in thread
From: Hannes Reinecke @ 2023-10-02 8:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Jerry Snitselaar
Cc: Christoph Hellwig, Chris Leech, Rasesh Mody, Ariel Elior,
Sudarsana Kalluru, Manish Chopra, Nilesh Javali, Manish Rangankar,
John Meneghini, Lee Duncan, Mike Christie, Hannes Reinecke,
netdev, linux-kernel
On 10/2/23 10:46, Greg Kroah-Hartman wrote:
> On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote:
>> On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
>>> On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
>>>> Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
>>>> in particular these two (from the e529d3507a93 dma-mapping pull for
>>>> 6.2):
>>>
>>> That's complete BS. The driver was broken since day 1 and always
>>> ignored the DMA API requirement to never try to grab the page from the
>>> dma coherent allocation because you generally speaking can't. It just
>>> happened to accidentally work the trivial dma coherent allocator that
>>> is used on x86.
>>>
>>
>> re-sending since gmail decided to not send plain text:
>>
>> Yes, I agree that it has been broken and misusing the API. Greg's
>> question was what changed though, and it was the clean up of
>> __GFP_COMP in dma-mapping that brought the problem in the driver to
>> light.
>>
>> I already said the other day that cnic has been doing this for 14
>> years. I'm not blaming you or your __GFP_COMP cleanup commits, they
>> just uncovered that cnic was doing something wrong. My apologies if
>> you took it that way.
>
> As these devices aren't being made anymore, and this api is really not a
> good idea in the first place, why don't we just leave it broken and see
> if anyone notices?
>
Guess what triggered this mail thread.
Some customers did notice.
Problem is that these devices were built as the network interface in
some bladecenter machines, so you can't just replace them with a
different Ethernet card.
Cheers,
Hannes
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
2023-10-02 8:59 ` Hannes Reinecke
@ 2023-10-05 10:39 ` Paolo Abeni
0 siblings, 0 replies; 23+ messages in thread
From: Paolo Abeni @ 2023-10-05 10:39 UTC (permalink / raw)
To: Hannes Reinecke, Greg Kroah-Hartman, Jerry Snitselaar
Cc: Christoph Hellwig, Chris Leech, Rasesh Mody, Ariel Elior,
Sudarsana Kalluru, Manish Chopra, Nilesh Javali, Manish Rangankar,
John Meneghini, Lee Duncan, Mike Christie, Hannes Reinecke,
netdev, linux-kernel
On Mon, 2023-10-02 at 10:59 +0200, Hannes Reinecke wrote:
> On 10/2/23 10:46, Greg Kroah-Hartman wrote:
> > On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote:
> > > On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
> > > > On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> > > > > Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> > > > > in particular these two (from the e529d3507a93 dma-mapping pull for
> > > > > 6.2):
> > > >
> > > > That's complete BS. The driver was broken since day 1 and always
> > > > ignored the DMA API requirement to never try to grab the page from the
> > > > dma coherent allocation because you generally speaking can't. It just
> > > > happened to accidentally work the trivial dma coherent allocator that
> > > > is used on x86.
> > > >
> > >
> > > re-sending since gmail decided to not send plain text:
> > >
> > > Yes, I agree that it has been broken and misusing the API. Greg's
> > > question was what changed though, and it was the clean up of
> > > __GFP_COMP in dma-mapping that brought the problem in the driver to
> > > light.
> > >
> > > I already said the other day that cnic has been doing this for 14
> > > years. I'm not blaming you or your __GFP_COMP cleanup commits, they
> > > just uncovered that cnic was doing something wrong. My apologies if
> > > you took it that way.
> >
> > As these devices aren't being made anymore, and this api is really not a
> > good idea in the first place, why don't we just leave it broken and see
> > if anyone notices?
> >
> Guess what triggered this mail thread.
> Some customers did notice.
>
> Problem is that these devices were built as the network interface in
> some bladecenter machines, so you can't just replace them with a
> different Ethernet card.
This route looks a no-go.
Out of sheer ignorance, would the iommu hack hinted in the cover letter
require similar controversial changes?
Cheers,
Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread