* Re: Enabling peer to peer device transactions for PCIe devices
From: Logan Gunthorpe @ 2016-11-25 16:45 UTC (permalink / raw)
To: Christian König, Jason Gunthorpe, Dan Williams
Cc: Haggai Eran, Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Blinzer, Paul, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Sander, Ben,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <c60815a1-aaac-52eb-1714-66abb28bdc01-5C7GfCeVMHo@public.gmane.org>
On 25/11/16 06:06 AM, Christian König wrote:
> Well Serguei send me a couple of documents about QPI when we started to
> discuss this internally as well and that's exactly one of the cases I
> had in mind when writing this.
>
> If I understood it correctly for such systems P2P is technical possible,
> but not necessary a good idea. Usually it is faster to just use a
> bouncing buffer when the peers are a bit "father" apart.
>
> That this problem is solved on newer hardware is good, but doesn't helps
> us at all if we at want to support at least systems from the last five
> years or so.
Well we have been testing with Sandy Bridge, I think the problem was
supposed to be fixed in Ivy but we never tested it so I can't say what
the performance turned out to be. Ivy is nearly 5 years old. I expect
this is something that will be improved more and more with subsequent
generations.
A white list may end up being rather complicated if it has to cover
different CPU generations and system architectures. I feel this is a
decision user space could easily make.
Logan
^ permalink raw reply
* Re: [PATCH 0/2] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Martin K. Petersen @ 2016-11-25 15:19 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Bart Van Assche, James Bottomley, Doug Ledford, Christoph Hellwig,
Sagi Grimberg, Max Gurtovoy, linux-scsi@vger.kernel.org,
linux-rdma@vger.kernel.org
In-Reply-To: <yq1d1hjtwz5.fsf@sermon.lab.mkp.net>
>>>>> "Martin" == Martin K Petersen <martin.petersen@oracle.com> writes:
Hi Bart,
Martin> Applied to 4.10/scsi-queue.
2/2 needs a rebase and I'm not going to do another one this late in the
cycle. Please resend this patch once we hit 4.10 rc1.
Thanks!
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply
* Re: [PATCH 0/2] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Martin K. Petersen @ 2016-11-25 15:12 UTC (permalink / raw)
To: Bart Van Assche
Cc: James Bottomley, Martin K. Petersen, Doug Ledford,
Christoph Hellwig, Sagi Grimberg, Max Gurtovoy,
linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <39d8cb23-0406-e8c3-6e3a-a467ebe41470@sandisk.com>
>>>>> "Bart" == Bart Van Assche <bart.vanassche@sandisk.com> writes:
Bart> The SRP transport code must wait until ongoing .queuecommand() /
Bart> .queue_rq() callback function invocations have finished before
Bart> reconnecting at the transport layer level and also before invoking
Bart> .terminate_rport_io(). This is already the case for the single
Bart> queue path but not yet for the scsi-mq path. This patch series
Bart> realizes the proper serialization for the scsi-mq path. Compared
Bart> to last time these patches were posted, only the patch
Bart> descriptions and one comment have been changed.
Applied to 4.10/scsi-queue.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Christian König @ 2016-11-25 13:22 UTC (permalink / raw)
To: Jason Gunthorpe, Logan Gunthorpe
Cc: Haggai Eran, Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Blinzer, Paul, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Sander, Ben,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161124164249.GD20818-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Am 24.11.2016 um 17:42 schrieb Jason Gunthorpe:
> On Wed, Nov 23, 2016 at 06:25:21PM -0700, Logan Gunthorpe wrote:
>>
>> On 23/11/16 02:55 PM, Jason Gunthorpe wrote:
>>>>> Only ODP hardware allows changing the DMA address on the fly, and it
>>>>> works at the page table level. We do not need special handling for
>>>>> RDMA.
>>>> I am aware of ODP but, noted by others, it doesn't provide a general
>>>> solution to the points above.
>>> How do you mean?
>> I was only saying it wasn't general in that it wouldn't work for IB
>> hardware that doesn't support ODP or other hardware that doesn't do
>> similar things (like an NVMe drive).
> There are three cases to worry about:
> - Coherent long lived page table mirroring (RDMA ODP MR)
> - Non-coherent long lived page table mirroring (RDMA MR)
> - Short lived DMA mapping (everything else)
>
> Like you say below we have to handle short lived in the usual way, and
> that covers basically every device except IB MRs, including the
> command queue on a NVMe drive.
Well a problem which wasn't mentioned so far is that while GPUs do have
a page table to mirror the CPU page table, they usually can't recover
from page faults.
So what we do is making sure that all memory accessed by the GPU Jobs
stays in place while those jobs run (pretty much the same pinning you do
for the DMA).
But since this can lock down huge amounts of memory the whole command
submission to GPUs is bound to the memory management. So when to much
memory would get blocked by the GPU we block further command submissions
until the situation resolves.
>> any complex allocators (GPU or otherwise) should respect that. And that
>> seems like it should be the default way most of this works -- and I
>> think it wouldn't actually take too much effort to make it all work now
>> as is. (Our iopmem work is actually quite small and simple.)
> Yes, absolutely, some kind of page pinning like locking is a hard
> requirement.
>
>> Yeah, we've had RDMA and O_DIRECT transfers to PCIe backed ZONE_DEVICE
>> memory working for some time. I'd say it's a good fit. The main question
>> we've had is how to expose PCIe bars to userspace to be used as MRs and
>> such.
> Is there any progress on that?
>
> I still don't quite get what iopmem was about.. I thought the
> objection to uncachable ZONE_DEVICE & DAX made sense, so running DAX
> over iopmem and still ending up with uncacheable mmaps still seems
> like a non-starter to me...
>
> Serguei, what is your plan in GPU land for migration? Ie if I have a
> CPU mapped page and the GPU moves it to VRAM, it becomes non-cachable
> - do you still allow the CPU to access it? Or do you swap it back to
> cachable memory if the CPU touches it?
Depends on the policy in command, but currently it's the other way
around most of the time.
E.g. we allocate memory in VRAM, the CPU writes to it WC and avoids
reading because that is slow, the GPU in turn can access it with full speed.
When we run out of VRAM we move those allocations to system memory and
update both the CPU as well as the GPU page tables.
So that move is transparent for both userspace as well as shaders
running on the GPU.
> One approach might be to mmap the uncachable ZONE_DEVICE memory and
> mark it inaccessible to the CPU - DMA could still translate. If the
> CPU needs it then the kernel migrates it to system memory so it
> becomes cachable. ??
The whole purpose of this effort is that we can do I/O on VRAM directly
without migrating everything back to system memory.
Allowing this, but then doing the migration by the first touch of the
CPU is clearly not a good idea.
Regards,
Christian.
>
> Jason
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Christian König @ 2016-11-25 13:06 UTC (permalink / raw)
To: Logan Gunthorpe, Jason Gunthorpe, Dan Williams
Cc: Haggai Eran, Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Blinzer, Paul, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Sander, Ben,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <5e1de9ee-34f5-136d-a07e-f949d492864f-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
Am 24.11.2016 um 18:55 schrieb Logan Gunthorpe:
> Hey,
>
> On 24/11/16 02:45 AM, Christian König wrote:
>> E.g. it can happen that PCI device A exports it's BAR using ZONE_DEVICE.
>> Not PCI device B (a SATA device) can directly read/write to it because
>> it is on the same bus segment, but PCI device C (a network card for
>> example) can't because it is on a different bus segment and the bridge
>> can't handle P2P transactions.
> Yeah, that could be an issue but in our experience we have yet to see
> it. We've tested with two separate PCI buses on different CPUs connected
> through QPI links and it works fine. (It is rather slow but I understand
> Intel has improved the bottleneck in newer CPUs than the ones we tested.)
Well Serguei send me a couple of documents about QPI when we started to
discuss this internally as well and that's exactly one of the cases I
had in mind when writing this.
If I understood it correctly for such systems P2P is technical possible,
but not necessary a good idea. Usually it is faster to just use a
bouncing buffer when the peers are a bit "father" apart.
That this problem is solved on newer hardware is good, but doesn't helps
us at all if we at want to support at least systems from the last five
years or so.
> It may just be older hardware that has this issue. I expect that as long
> as a failed transfer can be handled gracefully by the initiator I don't
> see a need to predetermine whether a device can see another devices memory.
I don't want to predetermine whether a device can see another devices
memory at get_user_pages() time.
My thinking was more going into the direction of a whitelist to figure
out during dma_map_single()/dma_map_sg() time if we should use a
bouncing buffer or not.
Christian.
>
>
> Logan
^ permalink raw reply
* [Bug 189031] New: Function mlx4_ib_query_device() does not set error codes on some failures
From: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r @ 2016-11-25 11:18 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
https://bugzilla.kernel.org/show_bug.cgi?id=189031
Bug ID: 189031
Summary: Function mlx4_ib_query_device() does not set error
codes on some failures
Product: Drivers
Version: 2.5
Kernel Version: linux-4.9-rc6
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Infiniband/RDMA
Assignee: drivers_infiniband-rdma-ztI5WcYan/vQLgFONoPN62D2FQJk+8+b@public.gmane.org
Reporter: bianpan2010-AvrBmmDjM4YnDS1+zs4M5A@public.gmane.org
Regression: No
Functions kzalloc() and kmalloc() return NULL pointers if there is no enough
memory. They are called in the function mlx4_ib_query_device() defined in file
drivers/infiniband/hw/mlx4/main.c. If they return NULL pointers (see line 458),
function mlx4_ib_query_device() returns variable err. Because variable err is
checked at line 444, the value of err may be 0 when kzalloc() and kmalloc() are
called. As a result, 0 (indicates success) may be returned even on failures.
Though these errors may occur rarely, I think it is better to set correct error
code (e.g. -ENOMEM) when kzalloc() and kmalloc() returns NULL pointers. Codes
related to this bug are summarised as follows.
mlx4_ib_query_device @@ drivers/infiniband/hw/mlx4/main.c
426 static int mlx4_ib_query_device(struct ib_device *ibdev,
427 struct ib_device_attr *props,
428 struct ib_udata *uhw)
429 {
430 struct mlx4_ib_dev *dev = to_mdev(ibdev);
431 struct ib_smp *in_mad = NULL;
432 struct ib_smp *out_mad = NULL;
433 int err = -ENOMEM;
...
439 if (uhw->inlen) {
440 if (uhw->inlen < sizeof(cmd))
441 return -EINVAL;
442
443 err = ib_copy_from_udata(&cmd, uhw, sizeof(cmd));
444 if (err)
445 return err;
446
447 if (cmd.comp_mask)
448 return -EINVAL;
449
450 if (cmd.reserved)
451 return -EINVAL;
452 }
453
454 resp.response_length = offsetof(typeof(resp), response_length) +
455 sizeof(resp.response_length);
456 in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL);
457 out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
458 if (!in_mad || !out_mad)
// The value of err may be 0. Insert "err = -ENOMEM;" ?
459 goto out;
...
567 out:
568 kfree(in_mad);
569 kfree(out_mad);
570
571 return err;
572 }
Thanks very much!
--
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [Bug 188831] New: Function ocrdma_mbx_create_ah_tbl() does not set error code when the call to dma_alloc_coherent() fails
From: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r @ 2016-11-25 11:01 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
https://bugzilla.kernel.org/show_bug.cgi?id=188831
Bug ID: 188831
Summary: Function ocrdma_mbx_create_ah_tbl() does not set error
code when the call to dma_alloc_coherent() fails
Product: Drivers
Version: 2.5
Kernel Version: linux-4.9-rc6
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Infiniband/RDMA
Assignee: drivers_infiniband-rdma-ztI5WcYan/vQLgFONoPN62D2FQJk+8+b@public.gmane.org
Reporter: bianpan2010-AvrBmmDjM4YnDS1+zs4M5A@public.gmane.org
Regression: No
Function dma_alloc_coherent() returns a NULL pointer if there is no enough
memory. Function ocrdma_mbx_create_ah_tbl() defined in file
drivers/infiniband/hw/ocrdma/ocrdma_hw.c will return 0 on success or negative
error codes on failures. It calls function dma_alloc_coherent() twice and
checks the return values against NULL (at lines 1681 and 1686). The control
flow jumps to label "mem_err_ah" and returns the value of variable status. The
value of status is 0 (see the check of variable status at line 1645). As a
result, the caller of ocrdma_mbx_create_ah_tbl() will be misled to believe all
goes well even the memory allocation fails. Maybe it is better to assign
"-ENOMEM" to variable status before the jump instructions at lines 1682 and
1687, or simply initialize status with "-ENOMEM" rather than "0" at line 1645.
Codes related to this bug are summarised as follows.
ocrdma_mbx_create_ah_tbl @@ drivers/infiniband/hw/ocrdma/ocrdma_hw.c
1642 static int ocrdma_mbx_create_ah_tbl(struct ocrdma_dev *dev)
1643 {
1644 int i;
1645 int status = 0; // use "int status = -ENOMEM;" ?
...
1653 cmd = ocrdma_init_emb_mqe(OCRDMA_CMD_CREATE_AH_TBL, sizeof(*cmd));
1654 if (!cmd)
1655 return status;
...
1678 dev->av_tbl.pbl.va = dma_alloc_coherent(&pdev->dev, PAGE_SIZE,
1679 &dev->av_tbl.pbl.pa,
1680 GFP_KERNEL);
1681 if (dev->av_tbl.pbl.va == NULL)
1682 goto mem_err;
1683
1684 dev->av_tbl.va = dma_alloc_coherent(&pdev->dev, dev->av_tbl.size,
1685 &pa, GFP_KERNEL);
1686 if (dev->av_tbl.va == NULL)
1687 goto mem_err_ah;
...
1706 return 0;
1707
1708 mbx_err:
1709 dma_free_coherent(&pdev->dev, dev->av_tbl.size, dev->av_tbl.va,
1710 dev->av_tbl.pa);
1711 dev->av_tbl.va = NULL;
1712 mem_err_ah:
1713 dma_free_coherent(&pdev->dev, PAGE_SIZE, dev->av_tbl.pbl.va,
1714 dev->av_tbl.pbl.pa);
1715 dev->av_tbl.pbl.va = NULL;
1716 dev->av_tbl.size = 0;
1717 mem_err:
1718 kfree(cmd);
1719 return status;
1720 }
Thanks very much!
--
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [Bug 188821] New: Function c4iw_rdev_open() does not set error code when the call to __get_free_page() fails.
From: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r @ 2016-11-25 11:01 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
https://bugzilla.kernel.org/show_bug.cgi?id=188821
Bug ID: 188821
Summary: Function c4iw_rdev_open() does not set error code when
the call to __get_free_page() fails.
Product: Drivers
Version: 2.5
Kernel Version: linux-4.9-rc6
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Infiniband/RDMA
Assignee: drivers_infiniband-rdma-ztI5WcYan/vQLgFONoPN62D2FQJk+8+b@public.gmane.org
Reporter: bianpan2010-AvrBmmDjM4YnDS1+zs4M5A@public.gmane.org
Regression: No
Function __get_free_page() returns a NULL pointer if there is no enough memory.
In function c4iw_rdev_open defined in file
drivers/infiniband/hw/cxgb4/device.c, __get_free_page() is called and its
return value is checked against NULL (at line 831). When the return value is
NULL, the control flow jumps to label "destroy_ocqp_pool", and returns the
value of variable err. However, after the check of variable err at line 825,
the value of err must be 0. As a result, 0 (indicates success) may be returned
even when the memory allocation fails. Maybe it is better to assign "-ENOMEM"
to err before the jump instruction at line 832. Codes related to this bug are
summarised as follows.
c4iw_rdev_open @@ drivers/infiniband/hw/cxgb4/device.c
752 static int c4iw_rdev_open(struct c4iw_rdev *rdev)
753 {
754 int err;
...
824 err = c4iw_ocqp_pool_create(rdev);
825 if (err) {
826 printk(KERN_ERR MOD "error %d initializing ocqp pool\n", err);
827 goto destroy_rqtpool;
828 }
829 rdev->status_page = (struct t4_dev_status_page *)
830 __get_free_page(GFP_KERNEL);
831 if (!rdev->status_page)
832 goto destroy_ocqp_pool; // insert "err = -ENOMEM" before this
line?
...
851 return 0;
852 destroy_ocqp_pool:
853 c4iw_ocqp_pool_destroy(rdev);
854 destroy_rqtpool:
855 c4iw_rqtpool_destroy(rdev);
856 destroy_pblpool:
857 c4iw_pblpool_destroy(rdev);
858 destroy_resource:
859 c4iw_destroy_resource(&rdev->resource);
860 return err;
861 }
Thanks very much!
--
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [Bug 188601] New: Function ioat_xor_val_self_test() does not set error code when the call to dma_mapping_error() fails
From: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r @ 2016-11-25 10:40 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
https://bugzilla.kernel.org/show_bug.cgi?id=188601
Bug ID: 188601
Summary: Function ioat_xor_val_self_test() does not set error
code when the call to dma_mapping_error() fails
Product: Drivers
Version: 2.5
Kernel Version: linux-4.9-rc6
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Infiniband/RDMA
Assignee: drivers_infiniband-rdma-ztI5WcYan/vQLgFONoPN62D2FQJk+8+b@public.gmane.org
Reporter: bianpan2010-AvrBmmDjM4YnDS1+zs4M5A@public.gmane.org
Regression: No
Function dma_mapping_error() returns a NULL pointer on failure. In function
ioat_xor_val_self_test() defined in file drivers/dma/ioat/init.c, variable err
takes the error code. However, the value of err is 0 (indicates success) even
when the call to dma_mapping_error() fails at line 830. Though this error may
occur rarely, it is better to assign "-ENOMEM" to err when dma_mapping_error()
fails.
There are other 3 similar bugs when the call to dma_mapping_error() fail at
lines 838, 907 and 960. Codes related to these bugs are summarised as follows.
ioat_xor_val_self_test @@ drivers/dma/ioat/init.c
761 static int ioat_xor_val_self_test(struct ioatdma_device *ioat_dma)
762 {
...
775 int err = 0;
...
819 dma_chan = container_of(dma->channels.next, struct dma_chan,
820 device_node);
821 if (dma->device_alloc_chan_resources(dma_chan) < 1) {
822 err = -ENODEV;
823 goto out;
824 }
825
826 /* test xor */
827 op = IOAT_OP_XOR;
828
829 dest_dma = dma_map_page(dev, dest, 0, PAGE_SIZE, DMA_FROM_DEVICE);
830 if (dma_mapping_error(dev, dest_dma))
831 goto free_resources; // insert "err = -ENOMEM" before this jump
instruction?
832
833 for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
834 dma_srcs[i] = DMA_ERROR_CODE;
835 for (i = 0; i < IOAT_NUM_SRC_TEST; i++) {
836 dma_srcs[i] = dma_map_page(dev, xor_srcs[i], 0, PAGE_SIZE,
837 DMA_TO_DEVICE);
838 if (dma_mapping_error(dev, dma_srcs[i]))
839 goto dma_unmap; // insert "err = -ENOMEM" before this jump
instruction?
840 }
...
904 for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++) {
905 dma_srcs[i] = dma_map_page(dev, xor_val_srcs[i], 0, PAGE_SIZE,
906 DMA_TO_DEVICE);
907 if (dma_mapping_error(dev, dma_srcs[i]))
908 goto dma_unmap; // insert "err = -ENOMEM" before this jump
instruction?
909 }
...
957 for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++) {
958 dma_srcs[i] = dma_map_page(dev, xor_val_srcs[i], 0, PAGE_SIZE,
959 DMA_TO_DEVICE);
960 if (dma_mapping_error(dev, dma_srcs[i]))
961 goto dma_unmap; // insert "err = -ENOMEM" before this jump
instruction?
962 }
...
1003 dma_unmap:
1004 if (op == IOAT_OP_XOR) {
1005 if (dest_dma != DMA_ERROR_CODE)
1006 dma_unmap_page(dev, dest_dma, PAGE_SIZE,
1007 DMA_FROM_DEVICE);
1008 for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
1009 if (dma_srcs[i] != DMA_ERROR_CODE)
1010 dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
1011 DMA_TO_DEVICE);
1012 } else if (op == IOAT_OP_XOR_VAL) {
1013 for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
1014 if (dma_srcs[i] != DMA_ERROR_CODE)
1015 dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
1016 DMA_TO_DEVICE);
1017 }
1018 free_resources:
1019 dma->device_free_chan_resources(dma_chan);
1020 out:
1021 src_idx = IOAT_NUM_SRC_TEST;
1022 while (src_idx--)
1023 __free_page(xor_srcs[src_idx]);
1024 __free_page(dest);
1025 return err;
1026 }
Thanks very much!
--
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [Bug 188591] New: Function ioat_dma_self_test() does not set error code when the call to dma_mapping_error() fails
From: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r @ 2016-11-25 10:39 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
https://bugzilla.kernel.org/show_bug.cgi?id=188591
Bug ID: 188591
Summary: Function ioat_dma_self_test() does not set error code
when the call to dma_mapping_error() fails
Product: Drivers
Version: 2.5
Kernel Version: linux-4.9-rc6
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Infiniband/RDMA
Assignee: drivers_infiniband-rdma-ztI5WcYan/vQLgFONoPN62D2FQJk+8+b@public.gmane.org
Reporter: bianpan2010-AvrBmmDjM4YnDS1+zs4M5A@public.gmane.org
Regression: No
Function dma_mapping_error() returns a NULL pointer on failure. In function
ioat_dma_self_test() defined in file drivers/dma/ioat/init.c, variable err
takes the error code. However, the value of err is 0 (indicates success) even
when the calls to dma_mapping_error() fail at line 341 or 346. Though this
error may occur rarely, it is better to assgin "-ENOMEM" to err before the jump
instructions at lines 343 and 348. Codes related to these two bugs are
summarised as follows.
ioat_dma_self_test @@ drivers/dma/ioat/init.c
302 static int ioat_dma_self_test(struct ioatdma_device *ioat_dma)
303 {
...
313 int err = 0;
...
332 dma_chan = container_of(dma->channels.next, struct dma_chan,
333 device_node);
334 if (dma->device_alloc_chan_resources(dma_chan) < 1) {
335 dev_err(dev, "selftest cannot allocate chan resource\n");
336 err = -ENODEV;
337 goto out;
338 }
339
340 dma_src = dma_map_single(dev, src, IOAT_TEST_SIZE, DMA_TO_DEVICE);
341 if (dma_mapping_error(dev, dma_src)) {
342 dev_err(dev, "mapping src buffer failed\n");
343 goto free_resources; // insert "err = -ENOMEM" before this
jump instruction?
344 }
345 dma_dest = dma_map_single(dev, dest, IOAT_TEST_SIZE, DMA_FROM_DEVICE);
346 if (dma_mapping_error(dev, dma_dest)) {
347 dev_err(dev, "mapping dest buffer failed\n");
348 goto unmap_src; // insert "err = -ENOMEM" before this jump
instruction?
349 }
...
387 unmap_dma:
388 dma_unmap_single(dev, dma_dest, IOAT_TEST_SIZE, DMA_FROM_DEVICE);
389 unmap_src:
390 dma_unmap_single(dev, dma_src, IOAT_TEST_SIZE, DMA_TO_DEVICE);
391 free_resources:
392 dma->device_free_chan_resources(dma_chan);
393 out:
394 kfree(src);
395 kfree(dest);
396 return err;
397 }
Thanks very much!
--
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [Bug 188541] New: Function rsxx_pci_probe() does not set error code when the call to create_singlethread_workqueue() fails
From: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r @ 2016-11-25 10:31 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
https://bugzilla.kernel.org/show_bug.cgi?id=188541
Bug ID: 188541
Summary: Function rsxx_pci_probe() does not set error code when
the call to create_singlethread_workqueue() fails
Product: Drivers
Version: 2.5
Kernel Version: linux-4.9-rc6
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Infiniband/RDMA
Assignee: drivers_infiniband-rdma-ztI5WcYan/vQLgFONoPN62D2FQJk+8+b@public.gmane.org
Reporter: bianpan2010-AvrBmmDjM4YnDS1+zs4M5A@public.gmane.org
Regression: No
Function create_singlethread_workqueue() returns a NULL pointer on failure. In
function rsxx_pci_probe() defined in file drivers/block/rsxx/core.c, variable
st keeps the error code, its value must be 0 at line 888. As a result, 0
(indicates no error) will be returned even when the call to function
create_singlethread_workqueue() at line 890 fails. Though this error may occur
rarely, it's better to explicitly assign "-ENOMEM" to st on the true branch of
the if-statement at line 891.
758 static int rsxx_pci_probe(struct pci_dev *dev,
759 const struct pci_device_id *id)
760 {
761 struct rsxx_cardinfo *card;
762 int st;
...
876 card->ctrl = kzalloc(card->n_targets * sizeof(*card->ctrl),
GFP_KERNEL);
877 if (!card->ctrl) {
878 st = -ENOMEM;
879 goto failed_dma_setup;
880 }
881
882 st = rsxx_dma_setup(card);
883 if (st) {
884 dev_info(CARD_TO_DEV(card),
885 "Failed to setup DMA engine\n");
886 goto failed_dma_setup;
887 }
888 // the value of st must be 0 when the code reaches here
889 /************* Setup Card Event Handler *************/
890 card->event_wq = create_singlethread_workqueue(DRIVER_NAME"_event");
891 if (!card->event_wq) {
892 dev_err(CARD_TO_DEV(card), "Failed card event setup.\n");
893 goto failed_event_handler; // insert "st = -ENOMEM"
before the jump instruction?
894 }
...
961 return 0;
962
963 failed_create_dev:
964 destroy_workqueue(card->event_wq);
965 card->event_wq = NULL;
966 failed_event_handler:
967 rsxx_dma_destroy(card);
968 failed_dma_setup:
969 failed_compatiblity_check:
970 destroy_workqueue(card->creg_ctrl.creg_wq);
971 card->creg_ctrl.creg_wq = NULL;
...
990 failed_ida_get:
991 kfree(card);
992
993 return st;
994 }
Thanks very much!
--
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Christoph Hellwig @ 2016-11-25 7:58 UTC (permalink / raw)
To: Logan Gunthorpe
Cc: Haggai Eran, linux-rdma@vger.kernel.org,
linux-nvdimm@lists.01.org, Kuehling, Felix, Serguei Sagalovitch,
Blinzer, Paul, linux-kernel@vger.kernel.org,
dri-devel@lists.freedesktop.org, Sander, Ben, Jason Gunthorpe,
Suthikulpanit, Suravee, linux-pci@vger.kernel.org,
Deucher, Alexander, Dan Williams, Koenig, Christian,
Linux-media@vger.kernel.org
In-Reply-To: <9cc22068-ede8-c1bc-5d8b-cf6224a7ce05@deltatee.com>
On Thu, Nov 24, 2016 at 11:11:34AM -0700, Logan Gunthorpe wrote:
> * Regular DAX in the FS doesn't work at this time because the FS can
> move the file you think your transfer to out from under you. Though I
> understand there's been some work with XFS to solve that issue.
The file system will never move anything under locked down pages,
locking down pages is used exactly to protect against that. So as long
as we page structures available RDMA to/from device memory _from kernel
space_ is trivial, although for file systems to work properly you
really want a notification to the consumer if the file systems wants
to remove the mapping. We have implemented that using FL_LAYOUTS locks
for NFSD, but only XFS supports it so far. Without that a long term
locked down region of memory (e.g. a kernel MR) would prevent various
file operations that would simply hang.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply
* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Vishwanathapura, Niranjana @ 2016-11-25 2:13 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: ira.weiny, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, Dennis Dalessandro
In-Reply-To: <20161124161545.GA20818-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On Thu, Nov 24, 2016 at 09:15:45AM -0700, Jason Gunthorpe wrote:
>On Wed, Nov 23, 2016 at 04:08:25PM -0800, Vishwanathapura, Niranjana wrote:
>
>> In order to pass the hfi function pointers to the hfi_vnic ULP, I can,
>> a) Have hfi_vnic ULP define an interface API for hfi1 driver to call to
>> register its callback (as you pointed). Unfortunately there will be a module
>> dependency here.
>> Or,
>
>That is probably backwards
>
>> b) Add a new member ‘struct vnic_ops’ either to the ib_device structure or
>> ib_port_immutable structure. As it is hfi1 specific, only hfi1 driver will
>> set it. No module dependency here.
>
>You can add a hfi1_get_vnic_ops(struct ib_device *) and implement it
>in your module..
>
In order to be truely device independent the hfi_vnic ULP should not depend on
a device exported symbol. Instead device should register its functions with the
ULP. Hence the approaches a) and b).
Niranjana
>Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Logan Gunthorpe @ 2016-11-24 18:11 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Haggai Eran, Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch, Blinzer, Paul,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Sander, Ben, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Koenig, Christian,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161124164249.GD20818-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On 24/11/16 09:42 AM, Jason Gunthorpe wrote:
> There are three cases to worry about:
> - Coherent long lived page table mirroring (RDMA ODP MR)
> - Non-coherent long lived page table mirroring (RDMA MR)
> - Short lived DMA mapping (everything else)
>
> Like you say below we have to handle short lived in the usual way, and
> that covers basically every device except IB MRs, including the
> command queue on a NVMe drive.
Yes, this makes sense to me. Though I thought regular IB MRs with
regular memory currently pinned the pages (despite being long lived)
that's why we can run up against the "max locked memory" limit. It
doesn't seem so terrible if GPU memory had a similar restriction until
ODP like solutions get implemented.
>> Yeah, we've had RDMA and O_DIRECT transfers to PCIe backed ZONE_DEVICE
>> memory working for some time. I'd say it's a good fit. The main question
>> we've had is how to expose PCIe bars to userspace to be used as MRs and
>> such.
> Is there any progress on that?
Well, I guess there's some consensus building to do. The existing
options are:
* Device DAX: which could work but the problem I see with it is that it
only allows one application to do these transfers. Or there would have
to be some user-space coordination to figure which application gets what
memeroy.
* Regular DAX in the FS doesn't work at this time because the FS can
move the file you think your transfer to out from under you. Though I
understand there's been some work with XFS to solve that issue.
Though, we've been considering that the backed memory would be
non-volatile which adds some of this complexity. If the memory were
volatile the kernel would just need to do some relatively straight
forward allocation to user-space when asked. For example, with NVMe, the
kernel could give chunks of the CMB buffer to userspace via an mmap call
to /dev/nvmeX. Though I think there's been some push back against things
like that as well.
> I still don't quite get what iopmem was about.. I thought the
> objection to uncachable ZONE_DEVICE & DAX made sense, so running DAX
> over iopmem and still ending up with uncacheable mmaps still seems
> like a non-starter to me...
The latest incarnation of iopmem simply created a block device backed by
ZONE_DEVICE memory on a PCIe BAR. We then put a DAX FS on it and
user-space could mmap the files and send them to other devices to do P2P
transfers.
I don't think there was a hard objection to uncachable ZONE_DEVICE and
DAX. We did try our experimental hardware with cached ZONE_DEVICE and it
did work but the performance was beyond unusable (which may be a
hardware issue). In the end I feel the driver would have to decide the
most appropriate caching for the hardware and I don't understand why WC
or UC wouldn't work with ZONE_DEVICE.
Logan
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Logan Gunthorpe @ 2016-11-24 17:55 UTC (permalink / raw)
To: Christian König, Jason Gunthorpe, Dan Williams
Cc: Haggai Eran, Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Blinzer, Paul, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Sander, Ben,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <a33ec1cd-051f-8a24-0587-68707459c25c-5C7GfCeVMHo@public.gmane.org>
Hey,
On 24/11/16 02:45 AM, Christian König wrote:
> E.g. it can happen that PCI device A exports it's BAR using ZONE_DEVICE.
> Not PCI device B (a SATA device) can directly read/write to it because
> it is on the same bus segment, but PCI device C (a network card for
> example) can't because it is on a different bus segment and the bridge
> can't handle P2P transactions.
Yeah, that could be an issue but in our experience we have yet to see
it. We've tested with two separate PCI buses on different CPUs connected
through QPI links and it works fine. (It is rather slow but I understand
Intel has improved the bottleneck in newer CPUs than the ones we tested.)
It may just be older hardware that has this issue. I expect that as long
as a failed transfer can be handled gracefully by the initiator I don't
see a need to predetermine whether a device can see another devices memory.
Logan
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Serguei Sagalovitch @ 2016-11-24 17:00 UTC (permalink / raw)
To: Jason Gunthorpe, Christian König
Cc: Haggai Eran, Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Blinzer, Paul, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Sander, Ben,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161124162620.GC20818-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On 2016-11-24 11:26 AM, Jason Gunthorpe wrote:
> On Thu, Nov 24, 2016 at 10:45:18AM +0100, Christian König wrote:
>> Am 24.11.2016 um 00:25 schrieb Jason Gunthorpe:
>>> There is certainly nothing about the hardware that cares
>>> about ZONE_DEVICE vs System memory.
>> Well that is clearly not so simple. When your ZONE_DEVICE pages describe a
>> PCI BAR and another PCI device initiates a DMA to this address the DMA
>> subsystem must be able to check if the interconnection really works.
> I said the hardware doesn't care.. You are right, we still have an
> outstanding problem in Linux of how to generically DMA map a P2P
> address - which is a different issue from getting the P2P address from
> a __user pointer...
>
> Jason
I agreed but the problem is that one issue immediately introduce another
one
to solve and so on (if we do not want to cut corners). I would think that
a lot of them interconnected because the way how one problem could be
solved may impact solution for another.
btw: about "DMA map a p2p address": Right now to enable p2p between
devices
it is required/recommended to disable iommu support (e.g. intel iommu
driver
has special logic for graphics and comment "Reserve all PCI MMIO to avoid
peer-to-peer access").
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Jason Gunthorpe @ 2016-11-24 16:42 UTC (permalink / raw)
To: Logan Gunthorpe
Cc: Haggai Eran, Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch, Blinzer, Paul,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Sander, Ben, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Koenig, Christian,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <91d28749-bc64-622f-56a1-26c00e6b462a-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
On Wed, Nov 23, 2016 at 06:25:21PM -0700, Logan Gunthorpe wrote:
>
>
> On 23/11/16 02:55 PM, Jason Gunthorpe wrote:
> >>> Only ODP hardware allows changing the DMA address on the fly, and it
> >>> works at the page table level. We do not need special handling for
> >>> RDMA.
> >>
> >> I am aware of ODP but, noted by others, it doesn't provide a general
> >> solution to the points above.
> >
> > How do you mean?
>
> I was only saying it wasn't general in that it wouldn't work for IB
> hardware that doesn't support ODP or other hardware that doesn't do
> similar things (like an NVMe drive).
There are three cases to worry about:
- Coherent long lived page table mirroring (RDMA ODP MR)
- Non-coherent long lived page table mirroring (RDMA MR)
- Short lived DMA mapping (everything else)
Like you say below we have to handle short lived in the usual way, and
that covers basically every device except IB MRs, including the
command queue on a NVMe drive.
> any complex allocators (GPU or otherwise) should respect that. And that
> seems like it should be the default way most of this works -- and I
> think it wouldn't actually take too much effort to make it all work now
> as is. (Our iopmem work is actually quite small and simple.)
Yes, absolutely, some kind of page pinning like locking is a hard
requirement.
> Yeah, we've had RDMA and O_DIRECT transfers to PCIe backed ZONE_DEVICE
> memory working for some time. I'd say it's a good fit. The main question
> we've had is how to expose PCIe bars to userspace to be used as MRs and
> such.
Is there any progress on that?
I still don't quite get what iopmem was about.. I thought the
objection to uncachable ZONE_DEVICE & DAX made sense, so running DAX
over iopmem and still ending up with uncacheable mmaps still seems
like a non-starter to me...
Serguei, what is your plan in GPU land for migration? Ie if I have a
CPU mapped page and the GPU moves it to VRAM, it becomes non-cachable
- do you still allow the CPU to access it? Or do you swap it back to
cachable memory if the CPU touches it?
One approach might be to mmap the uncachable ZONE_DEVICE memory and
mark it inaccessible to the CPU - DMA could still translate. If the
CPU needs it then the kernel migrates it to system memory so it
becomes cachable. ??
Jason
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Jason Gunthorpe @ 2016-11-24 16:26 UTC (permalink / raw)
To: Christian König
Cc: Haggai Eran, Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Blinzer, Paul, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Sander, Ben,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <a33ec1cd-051f-8a24-0587-68707459c25c-5C7GfCeVMHo@public.gmane.org>
On Thu, Nov 24, 2016 at 10:45:18AM +0100, Christian König wrote:
> Am 24.11.2016 um 00:25 schrieb Jason Gunthorpe:
> >There is certainly nothing about the hardware that cares
> >about ZONE_DEVICE vs System memory.
> Well that is clearly not so simple. When your ZONE_DEVICE pages describe a
> PCI BAR and another PCI device initiates a DMA to this address the DMA
> subsystem must be able to check if the interconnection really works.
I said the hardware doesn't care.. You are right, we still have an
outstanding problem in Linux of how to generically DMA map a P2P
address - which is a different issue from getting the P2P address from
a __user pointer...
Jason
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Jason Gunthorpe @ 2016-11-24 16:24 UTC (permalink / raw)
To: Sagalovitch, Serguei
Cc: Haggai Eran, Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Blinzer, Paul,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Sander, Ben, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Koenig, Christian,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <SN1PR12MB0703CE07F4878243164299C4FEB70-z7L1TMIYDg6P/i5UxMCIqAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
On Thu, Nov 24, 2016 at 12:40:37AM +0000, Sagalovitch, Serguei wrote:
> On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote:
>
> > Perhaps I am not following what Serguei is asking for, but I
> > understood the desire was for a complex GPU allocator that could
> > migrate pages between GPU and CPU memory under control of the GPU
> > driver, among other things. The desire is for DMA to continue to work
> > even after these migrations happen.
>
> The main issue is to how to solve use cases when p2p is
> requested/initiated via CPU pointers where such pointers could
> point to non-system memory location e.g. VRAM.
Okay, but your list is conflating a whole bunch of problems..
1) How to go from a __user pointer to a p2p DMA address
a) How to validate, setup iommu and maybe worst case bounce buffer
these p2p DMAs
2) How to allow drivers (ie GPU allocator) dynamically
remap pages in a VMA to/from p2p DMA addresses
3) How to expose uncachable p2p DMA address to user space via mmap
> to allow "get_user_pages" to work transparently similar
> how it is/was done for "DAX Device" case. Unfortunately
> based on my understanding "DAX Device" implementation
> deal only with permanently "locked" memory (fixed location)
> unrelated to "get_user_pages"/"put_page" scope
> which doesn't satisfy requirements for "eviction" / "moving" of
> memory keeping CPU address intact.
Hurm, isn't that issue with DAX only to do with being coherent with
the page cache?
A GPU allocator would not use the page cache, it would have to
construct VMAs some other way.
> My understanding is that It will not solve RDMA MR issue where "lock"
> could be during the whole application life but (a) it will not make
> RDMA MR case worse (b) should be enough for all other cases for
> "get_user_pages"/"put_page" controlled by kernel.
Right. There is no solution to the RDMA MR issue on old hardware. Apps
that are using GPU+RDMA+Old hardware will have to use short lived MRs
and pay that performance cost, or give up on migration.
Jason
^ permalink raw reply
* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Jason Gunthorpe @ 2016-11-24 16:15 UTC (permalink / raw)
To: Vishwanathapura, Niranjana
Cc: ira.weiny, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, Dennis Dalessandro
In-Reply-To: <20161124000825.GA73280-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
On Wed, Nov 23, 2016 at 04:08:25PM -0800, Vishwanathapura, Niranjana wrote:
> In order to pass the hfi function pointers to the hfi_vnic ULP, I can,
> a) Have hfi_vnic ULP define an interface API for hfi1 driver to call to
> register its callback (as you pointed). Unfortunately there will be a module
> dependency here.
> Or,
That is probably backwards
> b) Add a new member ‘struct vnic_ops’ either to the ib_device structure or
> ib_port_immutable structure. As it is hfi1 specific, only hfi1 driver will
> set it. No module dependency here.
You can add a hfi1_get_vnic_ops(struct ib_device *) and implement it
in your module..
> And will move the hfi_vnic module under
> ‘drivers/infiniband/ulp/hfi_vnic’.
I would prefer drivers/net/ethernet
This is clearly not a ULP since it doesn't use verbs.
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [patch] net/mlx5: remove a duplicate condition
From: Saeed Mahameed @ 2016-11-24 15:30 UTC (permalink / raw)
To: Dan Carpenter
Cc: Saeed Mahameed, Matan Barak, Leon Romanovsky, Linux Netdev List,
linux-rdma, linux-kernel, kernel-janitors
In-Reply-To: <20161124110345.GC17225@mwanda>
On Thu, Nov 24, 2016 at 1:03 PM, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> We verified that MLX5_FLOW_CONTEXT_ACTION_COUNT was set on the first
> line of the function so we don't need to check again here.
>
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
^ permalink raw reply
* Re: [PATCH] net/mlx5: drop duplicate header delay.h
From: Saeed Mahameed @ 2016-11-24 15:25 UTC (permalink / raw)
To: Geliang Tang
Cc: Saeed Mahameed, Matan Barak, Leon Romanovsky, Linux Netdev List,
linux-rdma, linux-kernel
In-Reply-To: <03d5a2a0f03458cdb4f2b139cfabc11b5c334f95.1479990943.git.geliangtang@gmail.com>
On Thu, Nov 24, 2016 at 3:58 PM, Geliang Tang <geliangtang@gmail.com> wrote:
> Drop duplicate header delay.h from mlx5/core/main.c.
>
> Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
^ permalink raw reply
* Re: [PATCH] net/mlx5: drop duplicate header delay.h
From: Matan Barak @ 2016-11-24 14:12 UTC (permalink / raw)
To: Geliang Tang, Saeed Mahameed, Leon Romanovsky
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <03d5a2a0f03458cdb4f2b139cfabc11b5c334f95.1479990943.git.geliangtang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On 24/11/2016 15:58, Geliang Tang wrote:
> Drop duplicate header delay.h from mlx5/core/main.c.
>
> Signed-off-by: Geliang Tang <geliangtang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index f28df33..d7a55eb 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -46,7 +46,6 @@
> #include <linux/mlx5/srq.h>
> #include <linux/debugfs.h>
> #include <linux/kmod.h>
> -#include <linux/delay.h>
> #include <linux/mlx5/mlx5_ifc.h>
> #ifdef CONFIG_RFS_ACCEL
> #include <linux/cpu_rmap.h>
>
Thanks.
Acked-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH] net/mlx5: drop duplicate header delay.h
From: Geliang Tang @ 2016-11-24 13:58 UTC (permalink / raw)
To: Saeed Mahameed, Matan Barak, Leon Romanovsky
Cc: Geliang Tang, netdev, linux-rdma, linux-kernel
In-Reply-To: <15299de49216a2360976ca37ff774cae9d27d88b.1479991297.git.geliangtang@gmail.com>
Drop duplicate header delay.h from mlx5/core/main.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
---
drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index f28df33..d7a55eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -46,7 +46,6 @@
#include <linux/mlx5/srq.h>
#include <linux/debugfs.h>
#include <linux/kmod.h>
-#include <linux/delay.h>
#include <linux/mlx5/mlx5_ifc.h>
#ifdef CONFIG_RFS_ACCEL
#include <linux/cpu_rmap.h>
--
2.9.3
^ permalink raw reply related
* [patch] net/mlx5: remove a duplicate condition
From: Dan Carpenter @ 2016-11-24 11:03 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Matan Barak, Leon Romanovsky, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
kernel-janitors-u79uwXL29TY76Z2rM5mHXA
We verified that MLX5_FLOW_CONTEXT_ACTION_COUNT was set on the first
line of the function so we don't need to check again here.
Signed-off-by: Dan Carpenter <dan.carpenter-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
Not a bugfix so it would go into -next
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 68ec4ea..a263d89 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1307,8 +1307,7 @@ static bool counter_is_valid(struct mlx5_fc *counter, u32 action)
return false;
return (action & (MLX5_FLOW_CONTEXT_ACTION_DROP |
- MLX5_FLOW_CONTEXT_ACTION_FWD_DEST)) &&
- (action & MLX5_FLOW_CONTEXT_ACTION_COUNT);
+ MLX5_FLOW_CONTEXT_ACTION_FWD_DEST));
}
static bool dest_is_valid(struct mlx5_flow_destination *dest,
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox