* ConnectX-3 on non-coherent AARCH64 system
From: Stephen Warren @ 2017-03-09 23:59 UTC
To: Yishai Hadas; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
I have a pair of Mellanox ConnectX-3 cards. I can put these into a pair
of x86 machines, run an application using libibverbs on each, and
transfer data without issue.
However, when I run the same test on a (non-cache-coherent) AArch64
(ARMv8) machine (Cortex-A57, NVIDIA Tegra), the application does not
work. In particular, ibv_poll_cq() returns -2 when the application waits
for an ibv_post_send/recv() to complete.
Looking into libmlx4, it seems that the CQ memory content is
nonsensical; mainly all 0xcccccccc rather than the zero values that
mlx4_alloc_cq_buf() sets it to. I'm not familiar with this HW's CQE
format, but those values don't seem to match what the code expects. I'm
pretty sure the HW is writing this data, since the 0xcccccccc value only
appears after ibv_cmd_create_cq()'s call into the kernel to configure
the CQ in HW. Can anyone help me track down where that data is coming from?
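A minimal sketch of the kind of check described above, assuming the CQ buffer is simply scanned as 32-bit words and compared against the value the allocator is expected to have written. The helper name count_unexpected_words() is hypothetical and not part of libmlx4, and it only reports what the CPU sees through its own (possibly cached) mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of libmlx4: count the 32-bit words in a
 * buffer that differ from the value the allocator is expected to have
 * written (zero in the case described above). */
size_t count_unexpected_words(const void *buf, size_t len, uint32_t expected)
{
    const unsigned char *p = buf;
    size_t unexpected = 0;

    assert(buf != NULL);

    /* Walk the buffer one whole 32-bit word at a time; a trailing
     * partial word (len not a multiple of 4) is ignored. */
    for (size_t off = 0; off + sizeof(uint32_t) <= len; off += sizeof(uint32_t)) {
        uint32_t word;

        /* memcpy avoids alignment and aliasing problems. */
        memcpy(&word, p + off, sizeof(word));
        if (word != expected)
            unexpected++;
    }
    return unexpected;
}
```

Calling it with expected set to 0 right after allocation, and again after the CQ has been created, would show whether the contents changed in between.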
Perhaps related, I see the CQ memory is allocated via anonymous mmap(),
which will set up a cached mapping of the allocated pages. I'm not sure
how this can work given that HW appears to write to these physical pages
(rather than e.g. the kernel filling in the CQ), and there don't appear
to be any cache invalidation operations anywhere in libmlx4 (I'm not
even sure if cache manipulation is possible from user-space on ARM). I
also couldn't find anything in the kernel mlx4 driver that remaps the CQ
pages as non-cached in Linux kernel 4.4, but perhaps I just didn't find
it. Is this library/HW not supported on non-cache-coherent systems, or
am I missing something obvious?
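For illustration, the allocation pattern in question looks roughly like the sketch below. This is an assumption-laden stand-in, not the actual libmlx4 code: the function name alloc_queue_buf() and its signature are hypothetical. The point is that a plain anonymous mmap() gives ordinary cacheable memory, and nothing in this path asks for an uncached mapping.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for a provider's queue buffer allocation; the name
 * and signature are illustrative only. Returns NULL on failure and stores
 * the page-rounded length in *mapped_len on success. */
void *alloc_queue_buf(size_t size, size_t *mapped_len)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0 && size > 0 && mapped_len != NULL);

    /* Round the request up to a whole number of pages. */
    size_t len = (size + (size_t)page - 1) / (size_t)page * (size_t)page;

    /* Anonymous private mapping: normal cached memory from the CPU's
     * point of view. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Zero the buffer, matching the "zero values" expected above. */
    memset(buf, 0, len);
    *mapped_len = len;
    return buf;
}
```

The caller releases the buffer with munmap(buf, mapped_len).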
Note: I'm using the libibverbs/libmlx4/... packaged in Ubuntu 16.04. I'm
aware that /usr/include/infiniband/arch.h there is not new enough to
contain the AArch64 barrier support. However, this issue appears deeper
than just barriers. Equally, I don't see any relevant difference in how
the latest libmlx4 source code allocates buffers, so I'm not convinced
that upgrading all the Infiniband libraries to e.g. the latest git
version would solve anything for me. However, if there's an expectation
that an upgrade would help this specific issue, I'm willing to try.
Thanks for any help!
-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information. Any unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: ConnectX-3 on non-coherent AARCH64 system
From: Jason Gunthorpe @ 2017-03-10 17:06 UTC
To: Stephen Warren; +Cc: Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Thu, Mar 09, 2017 at 04:59:40PM -0700, Stephen Warren wrote:
> However, when I run the same test on a (non-cache-coherent) AArch64 (ARMv8)
> machine (Cortex-A57, NVIDIA Tegra), the application does not work. In
> particular, ibv_poll_cq() returns -2 when the application waits for an
> ibv_post_send/recv() to complete.
User space verbs are not supported on non-cache-coherent architectures
for kernel bypass adapters and probably never will be.
Ideally the kernel would not create the uverbs device for DMA drivers
on such architectures.. Is there a kernel API to detect cache
incoherence?
> than e.g. the kernel filling in the CQ), and there don't appear to be any
> cache invalidation operations anywhere in libmlx4 (I'm not even sure if
> cache manipulation is possible from user-space on ARM). I also
> couldn't find
Precisely.
Jason
* Re: ConnectX-3 on non-coherent AARCH64 system
From: Stephen Warren @ 2017-03-13 16:43 UTC
To: Jason Gunthorpe
Cc: Stephen Warren, Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 03/10/2017 10:06 AM, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2017 at 04:59:40PM -0700, Stephen Warren wrote:
>
>> However, when I run the same test on a (non-cache-coherent) AArch64 (ARMv8)
>> machine (Cortex-A57, NVIDIA Tegra), the application does not work. In
>> particular, ibv_poll_cq() returns -2 when the application waits for an
>> ibv_post_send/recv() to complete.
>
> User space verbs are not supported on non-cache-coherent architectures
> for kernel bypass adapters and probably never will be.
In my case, I primarily want another "DMA device" to access the data
read/written by Infiniband. If CPU access to the data buffers doesn't
work correctly, it might not be such a big deal. So, I'd be willing to
take a solution that only fixed the MMU setup issues for the pages used
for ConnectX HW control. I expect making those uncached would work, and
since the number of these control pages (compared to the data plane) is
small, the performance probably wouldn't be too bad. Would it be
possible for the kernel ConnectX driver to modify the process page
tables to flip all known control pages from cached to uncached when the
completion queues are created?
> Ideally the kernel would not create the uverbs device for DMA drivers
> on such architectures.. Is there a kernel API to detect cache
> incoherence?
I'm not aware of any right now (although I don't deal with Linux MM
much, so probably wouldn't be). I am aware of dma_alloc_coherent(), and
I assume that must choose uncached/cached/... based on the HW's
coherence, so perhaps that information could be found.
Once upon a time, we might have got away with assuming the answer based
on architecture, but I suspect that ARM==non-coherent is false in many
cases these days.
>> than e.g. the kernel filling in the CQ), and there don't appear to be any
>> cache invalidation operations anywhere in libmlx4 (I'm not even sure if
>> cache manipulation is possible from user-space on ARM). I also
>> couldn't find
>
> Precisely.
P.S. Sorry about the email disclaimer last time around; I hadn't noticed
that my email client defaulted to sending the message through the
corporate mail server.
* Re: ConnectX-3 on non-coherent AARCH64 system
From: Jason Gunthorpe @ 2017-03-13 16:55 UTC
To: Stephen Warren
Cc: Stephen Warren, Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Mon, Mar 13, 2017 at 10:43:04AM -0600, Stephen Warren wrote:
> In my case, I primarily want another "DMA device" to access the data
> read/written by Infiniband. If CPU access to the data buffers doesn't work
> correctly, it might not be such a big deal. So, I'd be willing to take a
> solution that only fixed the MMU setup issues for the pages used for
> ConnectX HW control. I expect making those uncached would work, and since
> the number of these control pages (compared to the data plane) is small, the
> performance probably wouldn't be too bad. Would it be possible for the
> kernel ConnectX driver to modify the process page tables to flip all known
> control pages from cached to uncached when the completion queues are
> created?
I don't know the mlx drivers well enough to say whether that is simple
or not. It looks like they rely on the userspace provider itself
allocating the DMA memory, rather than the driver in the kernel. Flipping
pre-allocated user pages to UC is probably quite difficult to do
properly.
> Once upon a time, we might have got away with assuming the answer based on
> architecture, but I suspect that ARM==non-coherent is false in many cases
> these days.
There are certainly cases.. Currently we do not even build the user
space side on ARM32 because nobody has asked for it, even though
surely there is hardware that can accommodate it.
Jason
* Re: ConnectX-3 on non-coherent AARCH64 system
From: Christoph Lameter @ 2017-03-13 17:38 UTC
To: Stephen Warren
Cc: Jason Gunthorpe, Stephen Warren, Yishai Hadas,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Mon, 13 Mar 2017, Stephen Warren wrote:
> > Ideally the kernel would not create the uverbs device for DMA drivers
> > on such architectures.. Is there a kernel API to detect cache
> > incoherence?
>
> I'm not aware of any right now (although I don't deal with Linux MM much, so
> probably wouldn't be). I am aware of dma_alloc_coherent(), and I assume that
> must choose uncached/cached/... based on the HW's coherence, so perhaps that
> information could be found.
>
> Once upon a time, we might have got away with assuming the answer based on
> architecture, but I suspect that ARM==non-coherent is false in many cases
> these days.
CONFIG_HAVE_GENERIC_DMA_COHERENT?
Look for DMA_COHERENT in the kernel header files