* [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
@ 2009-04-24 19:05 Steve Wise
[not found] ` <20090424190510.3134.90405.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Steve Wise @ 2009-04-24 19:05 UTC (permalink / raw)
To: Trond.Myklebust; +Cc: tmtalpey, tom, linux-nfs
A bad cast causes the iova_start, which in this case is a DMA bus address,
to be truncated on 32b systems. No cast is needed.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
net/sunrpc/xprtrdma/verbs.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 3b21e0c..3a374f5 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1489,7 +1489,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
memset(&frmr_wr, 0, sizeof frmr_wr);
frmr_wr.opcode = IB_WR_FAST_REG_MR;
frmr_wr.send_flags = 0; /* unsignaled */
- frmr_wr.wr.fast_reg.iova_start = (unsigned long)seg1->mr_dma;
+ frmr_wr.wr.fast_reg.iova_start = seg1->mr_dma;
frmr_wr.wr.fast_reg.page_list = seg1->mr_chunk.rl_mw->r.frmr.fr_pgl;
frmr_wr.wr.fast_reg.page_list_len = i;
frmr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <20090424190510.3134.90405.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
@ 2009-04-25 14:11 ` Steve Wise
2009-04-26 18:57 ` Steve Wise
0 siblings, 1 reply; 29+ messages in thread
From: Steve Wise @ 2009-04-25 14:11 UTC (permalink / raw)
To: Trond.Myklebust; +Cc: tmtalpey, tom, linux-nfs, vuhuong
Trond,
There is a similar bug in the server side too. I'll repost a single
patch that fixes both.
Steve.
Steve Wise wrote:
> A bad cast causes the iova_start, which in this case is a DMA bus address,
> to be truncated on 32b systems. No cast is needed.
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
>
> net/sunrpc/xprtrdma/verbs.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 3b21e0c..3a374f5 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1489,7 +1489,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> memset(&frmr_wr, 0, sizeof frmr_wr);
> frmr_wr.opcode = IB_WR_FAST_REG_MR;
> frmr_wr.send_flags = 0; /* unsignaled */
> - frmr_wr.wr.fast_reg.iova_start = (unsigned long)seg1->mr_dma;
> + frmr_wr.wr.fast_reg.iova_start = seg1->mr_dma;
> frmr_wr.wr.fast_reg.page_list = seg1->mr_chunk.rl_mw->r.frmr.fr_pgl;
> frmr_wr.wr.fast_reg.page_list_len = i;
> frmr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-04-25 14:11 ` Steve Wise
@ 2009-04-26 18:57 ` Steve Wise
2009-04-27 2:17 ` Tom Talpey
0 siblings, 1 reply; 29+ messages in thread
From: Steve Wise @ 2009-04-26 18:57 UTC (permalink / raw)
To: Trond.Myklebust; +Cc: tmtalpey, tom, linux-nfs, vuhuong
Hey Trond,
Turns out the server side is fine. So this patch is good to go.
Thanks,
Steve.
Steve Wise wrote:
> Trond,
>
> There is a similar bug in the server side too. I'll repost a single
> patch that fixes both.
>
> Steve.
>
> Steve Wise wrote:
>> A bad cast causes the iova_start, which in this case is a DMA bus
>> address,
>> to be truncated on 32b systems. No cast is needed.
>>
>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>> ---
>>
>> net/sunrpc/xprtrdma/verbs.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 3b21e0c..3a374f5 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -1489,7 +1489,7 @@ rpcrdma_register_frmr_external(struct
>> rpcrdma_mr_seg *seg,
>> memset(&frmr_wr, 0, sizeof frmr_wr);
>> frmr_wr.opcode = IB_WR_FAST_REG_MR;
>> frmr_wr.send_flags = 0; /* unsignaled */
>> - frmr_wr.wr.fast_reg.iova_start = (unsigned long)seg1->mr_dma;
>> + frmr_wr.wr.fast_reg.iova_start = seg1->mr_dma;
>> frmr_wr.wr.fast_reg.page_list =
>> seg1->mr_chunk.rl_mw->r.frmr.fr_pgl;
>> frmr_wr.wr.fast_reg.page_list_len = i;
>> frmr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-04-26 18:57 ` Steve Wise
@ 2009-04-27 2:17 ` Tom Talpey
[not found] ` <49f515a5.1d1e640a.1c82.6677-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Tom Talpey @ 2009-04-27 2:17 UTC (permalink / raw)
To: Steve Wise; +Cc: Trond.Myklebust, tom, linux-nfs, vuhuong
At 02:57 PM 4/26/2009, Steve Wise wrote:
>Hey Trond,
>
>Turns out the server side is fine. So this patch is good to go.
I'll ACK the patch, but I do wonder if the code will compile cleanly on all
platforms. The iova_start is a u64, whereas the mr_dma is a dma_addr_t,
which is variable sized, depending on platform. Would a (u64) cast be a
safer patch, warning-wise?
Tom.
>
>Thanks,
>
>Steve.
>
>Steve Wise wrote:
>> Trond,
>>
>> There is a similar bug in the server side too. I'll repost a single
>> patch that fixes both.
>>
>> Steve.
>>
>> Steve Wise wrote:
>>> A bad cast causes the iova_start, which in this case is a DMA bus
>>> address,
>>> to be truncated on 32b systems. No cast is needed.
>>>
>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>>> ---
>>>
>>> net/sunrpc/xprtrdma/verbs.c | 2 +-
>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index 3b21e0c..3a374f5 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -1489,7 +1489,7 @@ rpcrdma_register_frmr_external(struct
>>> rpcrdma_mr_seg *seg,
>>> memset(&frmr_wr, 0, sizeof frmr_wr);
>>> frmr_wr.opcode = IB_WR_FAST_REG_MR;
>>> frmr_wr.send_flags = 0; /* unsignaled */
>>> - frmr_wr.wr.fast_reg.iova_start = (unsigned long)seg1->mr_dma;
>>> + frmr_wr.wr.fast_reg.iova_start = seg1->mr_dma;
>>> frmr_wr.wr.fast_reg.page_list =
>>> seg1->mr_chunk.rl_mw->r.frmr.fr_pgl;
>>> frmr_wr.wr.fast_reg.page_list_len = i;
>>> frmr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <49f515a5.1d1e640a.1c82.6677-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
@ 2009-04-27 17:37 ` Steve Wise
2009-04-27 18:05 ` Trond Myklebust
0 siblings, 1 reply; 29+ messages in thread
From: Steve Wise @ 2009-04-27 17:37 UTC (permalink / raw)
To: Tom Talpey; +Cc: Trond.Myklebust, tom, linux-nfs, vuhuong
Tom Talpey wrote:
> At 02:57 PM 4/26/2009, Steve Wise wrote:
>
>> Hey Trond,
>>
>> Turns out the server side is fine. So this patch is good to go.
>>
>
> I'll ACK the patch, but I do wonder if the code will compile cleanly on all
> platforms. The iova_start is a u64, whereas the mr_dma is a dma_addr_t,
> which is variable sized, depending on platform. Would a (u64) cast be a
> safer patch, warning-wise?
>
>
Well, if dma_addr_t is smaller, I don't think you'll get a warning, will
you? And if its larger than 64b, then you _want_ the warning, because
nothing will work. :)
Its up to you and/or Trond.
Steve.
> Tom.
>
>
>> Thanks,
>>
>> Steve.
>>
>> Steve Wise wrote:
>>
>>> Trond,
>>>
>>> There is a similar bug in the server side too. I'll repost a single
>>> patch that fixes both.
>>>
>>> Steve.
>>>
>>> Steve Wise wrote:
>>>
>>>> A bad cast causes the iova_start, which in this case is a DMA bus
>>>> address,
>>>> to be truncated on 32b systems. No cast is needed.
>>>>
>>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>>>> ---
>>>>
>>>> net/sunrpc/xprtrdma/verbs.c | 2 +-
>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>>> index 3b21e0c..3a374f5 100644
>>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>>> @@ -1489,7 +1489,7 @@ rpcrdma_register_frmr_external(struct
>>>> rpcrdma_mr_seg *seg,
>>>> memset(&frmr_wr, 0, sizeof frmr_wr);
>>>> frmr_wr.opcode = IB_WR_FAST_REG_MR;
>>>> frmr_wr.send_flags = 0; /* unsignaled */
>>>> - frmr_wr.wr.fast_reg.iova_start = (unsigned long)seg1->mr_dma;
>>>> + frmr_wr.wr.fast_reg.iova_start = seg1->mr_dma;
>>>> frmr_wr.wr.fast_reg.page_list =
>>>> seg1->mr_chunk.rl_mw->r.frmr.fr_pgl;
>>>> frmr_wr.wr.fast_reg.page_list_len = i;
>>>> frmr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-04-27 17:37 ` Steve Wise
@ 2009-04-27 18:05 ` Trond Myklebust
[not found] ` <1240855510.8818.9.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2009-04-27 18:05 UTC (permalink / raw)
To: Steve Wise; +Cc: Tom Talpey, tom, linux-nfs, vuhuong
On Mon, 2009-04-27 at 12:37 -0500, Steve Wise wrote:
> Tom Talpey wrote:
> > At 02:57 PM 4/26/2009, Steve Wise wrote:
> >
> >> Hey Trond,
> >>
> >> Turns out the server side is fine. So this patch is good to go.
> >>
> >
> > I'll ACK the patch, but I do wonder if the code will compile cleanly on all
> > platforms. The iova_start is a u64, whereas the mr_dma is a dma_addr_t,
> > which is variable sized, depending on platform. Would a (u64) cast be a
> > safer patch, warning-wise?
> >
> >
>
> Well, if dma_addr_t is smaller, I don't think you'll get a warning, will
> you? And if its larger than 64b, then you _want_ the warning, because
> nothing will work. :)
>
> Its up to you and/or Trond.
It looks looks as though the bug is really that the IB code is using a
u64 to store dma handles. As an external user of the IB api, we really
shouldn't have to perform this sort of transformation. If it is
absolutely necessary, then it should be done by means of specialised
accessor functions to initialise/read iova_start value when given a
dma_addr_t.
I'd therefore prefer the no-cast version (with eventual compiler
warnings), in the hope that eventually the IB folks will fix their
interface.
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <1240855510.8818.9.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-04-27 18:23 ` Trond Myklebust
[not found] ` <1240856613.8818.16.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2009-04-27 18:23 UTC (permalink / raw)
To: Steve Wise; +Cc: Tom Talpey, tom, linux-nfs, vuhuong
On Mon, 2009-04-27 at 14:05 -0400, Trond Myklebust wrote:
> It looks looks as though the bug is really that the IB code is using a
> u64 to store dma handles. As an external user of the IB api, we really
> shouldn't have to perform this sort of transformation. If it is
> absolutely necessary, then it should be done by means of specialised
> accessor functions to initialise/read iova_start value when given a
> dma_addr_t.
>
> I'd therefore prefer the no-cast version (with eventual compiler
> warnings), in the hope that eventually the IB folks will fix their
> interface.
Translation: It looks to me as if the interface that we're using is a
bit too corrupted with IB low level implementation grime. In the future,
I'd like to see someone come up with a more high level interface for use
by external code such as the sunrpc module.
Cheers
Trond
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <1240856613.8818.16.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-04-27 19:32 ` Steve Wise
2009-04-27 19:42 ` Tom Talpey
2009-04-27 20:46 ` Trond Myklebust
0 siblings, 2 replies; 29+ messages in thread
From: Steve Wise @ 2009-04-27 19:32 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Tom Talpey, tom, linux-nfs, vuhuong
Trond Myklebust wrote:
> On Mon, 2009-04-27 at 14:05 -0400, Trond Myklebust wrote:
>
>> It looks looks as though the bug is really that the IB code is using a
>> u64 to store dma handles. As an external user of the IB api, we really
>> shouldn't have to perform this sort of transformation. If it is
>> absolutely necessary, then it should be done by means of specialised
>> accessor functions to initialise/read iova_start value when given a
>> dma_addr_t.
>>
>> I'd therefore prefer the no-cast version (with eventual compiler
>> warnings), in the hope that eventually the IB folks will fix their
>> interface.
>>
>
> Translation: It looks to me as if the interface that we're using is a
> bit too corrupted with IB low level implementation grime. In the future,
> I'd like to see someone come up with a more high level interface for use
> by external code such as the sunrpc module.
>
>
Clarification: The iova_start isn't used to store dma handles. The
iova_start is the "address" base value that is advertised to a peer to
describe the base address of a memory region. The contents of that can
be more than just a dma handle...its up to the application. For
instance, you could advertise a iova_start of zero or a kernel VA as the
rdma server does. Also, the type is u64 because that is the size used
on the wire as part of the rdma (IB and iWARP) protocols.
Steve.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-04-27 19:32 ` Steve Wise
@ 2009-04-27 19:42 ` Tom Talpey
[not found] ` <49f60ac4.1c1d640a.2d0a.61a7-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-04-27 20:46 ` Trond Myklebust
1 sibling, 1 reply; 29+ messages in thread
From: Tom Talpey @ 2009-04-27 19:42 UTC (permalink / raw)
To: Steve Wise; +Cc: Trond Myklebust, tom, linux-nfs, vuhuong
At 03:32 PM 4/27/2009, Steve Wise wrote:
>Trond Myklebust wrote:
>> On Mon, 2009-04-27 at 14:05 -0400, Trond Myklebust wrote:
>>
>>> It looks looks as though the bug is really that the IB code is using a
>>> u64 to store dma handles. As an external user of the IB api, we really
>>> shouldn't have to perform this sort of transformation. If it is
>>> absolutely necessary, then it should be done by means of specialised
>>> accessor functions to initialise/read iova_start value when given a
>>> dma_addr_t.
>>>
>>> I'd therefore prefer the no-cast version (with eventual compiler
>>> warnings), in the hope that eventually the IB folks will fix their
>>> interface.
>>>
>>
>> Translation: It looks to me as if the interface that we're using is a
>> bit too corrupted with IB low level implementation grime. In the future,
>> I'd like to see someone come up with a more high level interface for use
>> by external code such as the sunrpc module.
>>
>>
>
>Clarification: The iova_start isn't used to store dma handles. The
Agreed, it's more of a hardware register, that ends up on the wire as well.
I think the net of this is that the mr_dma should have a more sensible
up-cast that yields the right bits in the iova_start. Maybe a nice
machine-dependent macro, defined in the RDMA layer, would be a good
approach. Surely the other upper layers need it too.
While I have the floor, why doesn't the server have this issue? Looking
at the code, it has the same (unsigned long) cast as the client when
initializing its iova_start.
Tom.
>iova_start is the "address" base value that is advertised to a peer to
>describe the base address of a memory region. The contents of that can
>be more than just a dma handle...its up to the application. For
>instance, you could advertise a iova_start of zero or a kernel VA as the
>rdma server does. Also, the type is u64 because that is the size used
>on the wire as part of the rdma (IB and iWARP) protocols.
>
>
>Steve.
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <49f60ac4.1c1d640a.2d0a.61a7-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
@ 2009-04-27 19:50 ` Steve Wise
2009-04-27 20:06 ` Tom Talpey
0 siblings, 1 reply; 29+ messages in thread
From: Steve Wise @ 2009-04-27 19:50 UTC (permalink / raw)
To: Tom Talpey; +Cc: Trond Myklebust, tom, linux-nfs, vuhuong
Tom Talpey wrote:
> At 03:32 PM 4/27/2009, Steve Wise wrote:
>
>> Trond Myklebust wrote:
>>
>>> On Mon, 2009-04-27 at 14:05 -0400, Trond Myklebust wrote:
>>>
>>>
>>>> It looks looks as though the bug is really that the IB code is using a
>>>> u64 to store dma handles. As an external user of the IB api, we really
>>>> shouldn't have to perform this sort of transformation. If it is
>>>> absolutely necessary, then it should be done by means of specialised
>>>> accessor functions to initialise/read iova_start value when given a
>>>> dma_addr_t.
>>>>
>>>> I'd therefore prefer the no-cast version (with eventual compiler
>>>> warnings), in the hope that eventually the IB folks will fix their
>>>> interface.
>>>>
>>>>
>>> Translation: It looks to me as if the interface that we're using is a
>>> bit too corrupted with IB low level implementation grime. In the future,
>>> I'd like to see someone come up with a more high level interface for use
>>> by external code such as the sunrpc module.
>>>
>>>
>>>
>> Clarification: The iova_start isn't used to store dma handles. The
>>
>
> Agreed, it's more of a hardware register, that ends up on the wire as well.
>
> I think the net of this is that the mr_dma should have a more sensible
> up-cast that yields the right bits in the iova_start. Maybe a nice
> machine-dependent macro, defined in the RDMA layer, would be a good
> approach. Surely the other upper layers need it too.
>
> While I have the floor, why doesn't the server have this issue? Looking
> at the code, it has the same (unsigned long) cast as the client when
> initializing its iova_start.
>
>
The server isn't using the dma address as the iova_start, but rather a
kernel virtual address pointer, which is 32b on a i386 system. If you
take the cast off, then the the signed bit gets extended into the u64.
Apparently pointers are signed?
For instance, the server had a kva of 0xf5b75000. If you remove the
(unsigned long) cast and stuff that into a u64, it ends up as
0xfffffffff5b75000.
here was a trace I took of the server doing the first rdma write using
an frmr:
Apr 26 13:14:07 rac2 kernel: build_fastreg iova_start 0xfffffffff5b75000
rkey 0x500 len 4096
Apr 26 13:14:07 rac2 kernel: build_fastreg pbl[0] 0x35b75000
Apr 26 13:14:07 rac2 kernel: build_rdma_write sge[0] lkey 0x500 addr
0xf5b75000 len 24
Apr 26 13:14:07 rac2 kernel: post_qp_event - AE qpid 0x23 opcode 0
status 0x6 type 1 wrid.hi 0x1 wrid.lo 0x0
So the frmr registration ends up with 0xfffffffff5b75000 as the
iova_start, yet the rdma write work request has 0xf5b75000 as the sge
address entry. And the rnic fails this WR with a base/bounds violation
(status 0x6 in the chelsio Async Event).
Steve.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-04-27 19:50 ` Steve Wise
@ 2009-04-27 20:06 ` Tom Talpey
[not found] ` <49f61067.181e640a.3cb9.0e6c-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Tom Talpey @ 2009-04-27 20:06 UTC (permalink / raw)
To: Steve Wise; +Cc: Trond Myklebust, tom, linux-nfs, vuhuong
At 03:50 PM 4/27/2009, Steve Wise wrote:
>Tom Talpey wrote:
>> At 03:32 PM 4/27/2009, Steve Wise wrote:
>>
>>> Trond Myklebust wrote:
>>>
>>>> On Mon, 2009-04-27 at 14:05 -0400, Trond Myklebust wrote:
>>>>
>>>>
>>>>> It looks looks as though the bug is really that the IB code is using a
>>>>> u64 to store dma handles. As an external user of the IB api, we really
>>>>> shouldn't have to perform this sort of transformation. If it is
>>>>> absolutely necessary, then it should be done by means of specialised
>>>>> accessor functions to initialise/read iova_start value when given a
>>>>> dma_addr_t.
>>>>>
>>>>> I'd therefore prefer the no-cast version (with eventual compiler
>>>>> warnings), in the hope that eventually the IB folks will fix their
>>>>> interface.
>>>>>
>>>>>
>>>> Translation: It looks to me as if the interface that we're using is a
>>>> bit too corrupted with IB low level implementation grime. In the future,
>>>> I'd like to see someone come up with a more high level interface for use
>>>> by external code such as the sunrpc module.
>>>>
>>>>
>>>>
>>> Clarification: The iova_start isn't used to store dma handles. The
>>>
>>
>> Agreed, it's more of a hardware register, that ends up on the wire as well.
>>
>> I think the net of this is that the mr_dma should have a more sensible
>> up-cast that yields the right bits in the iova_start. Maybe a nice
>> machine-dependent macro, defined in the RDMA layer, would be a good
>> approach. Surely the other upper layers need it too.
>>
>> While I have the floor, why doesn't the server have this issue? Looking
>> at the code, it has the same (unsigned long) cast as the client when
>> initializing its iova_start.
>>
>>
>
>The server isn't using the dma address as the iova_start, but rather a
>kernel virtual address pointer, which is 32b on a i386 system. If you
>take the cast off, then the the signed bit gets extended into the u64.
>Apparently pointers are signed?
Why is the server using a u64 to store a naked pointer? That has to be
a bug. Casting to (unsigned long) is just hiding it.
Does this address get handed to the RNIC to perform some sort of local
DMA? If so, how does it work if there's an IOMMU in the system? The
kva isn't necessarily the same as the dma_addr, right?
BTW, pointers are unsigned, but the assignment to u64 causes the
compiler to convert the pointer into a ptrdiff_t, in effect evaluating
((pointer) - NULL). Then, since the ptrdiff_t is a signed 32 bits, the
promotion results in the sign extension. I think! IOW, bug.
Tom.
>
>For instance, the server had a kva of 0xf5b75000. If you remove the
>(unsigned long) cast and stuff that into a u64, it ends up as
>0xfffffffff5b75000.
>
>here was a trace I took of the server doing the first rdma write using
>an frmr:
>
>Apr 26 13:14:07 rac2 kernel: build_fastreg iova_start 0xfffffffff5b75000
>rkey 0x500 len 4096
>Apr 26 13:14:07 rac2 kernel: build_fastreg pbl[0] 0x35b75000
>Apr 26 13:14:07 rac2 kernel: build_rdma_write sge[0] lkey 0x500 addr
>0xf5b75000 len 24
>Apr 26 13:14:07 rac2 kernel: post_qp_event - AE qpid 0x23 opcode 0
>status 0x6 type 1 wrid.hi 0x1 wrid.lo 0x0
>
>
>So the frmr registration ends up with 0xfffffffff5b75000 as the
>iova_start, yet the rdma write work request has 0xf5b75000 as the sge
>address entry. And the rnic fails this WR with a base/bounds violation
>(status 0x6 in the chelsio Async Event).
>
>Steve.
>
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <49f61067.181e640a.3cb9.0e6c-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
@ 2009-04-27 20:20 ` Steve Wise
0 siblings, 0 replies; 29+ messages in thread
From: Steve Wise @ 2009-04-27 20:20 UTC (permalink / raw)
To: Tom Talpey; +Cc: Trond Myklebust, tom, linux-nfs, vuhuong
Tom Talpey wrote:
> At 03:50 PM 4/27/2009, Steve Wise wrote:
>
>> Tom Talpey wrote:
>>
>>> At 03:32 PM 4/27/2009, Steve Wise wrote:
>>>
>>>
>>>> Trond Myklebust wrote:
>>>>
>>>>
>>>>> On Mon, 2009-04-27 at 14:05 -0400, Trond Myklebust wrote:
>>>>>
>>>>>
>>>>>
>>>>>> It looks looks as though the bug is really that the IB code is using a
>>>>>> u64 to store dma handles. As an external user of the IB api, we really
>>>>>> shouldn't have to perform this sort of transformation. If it is
>>>>>> absolutely necessary, then it should be done by means of specialised
>>>>>> accessor functions to initialise/read iova_start value when given a
>>>>>> dma_addr_t.
>>>>>>
>>>>>> I'd therefore prefer the no-cast version (with eventual compiler
>>>>>> warnings), in the hope that eventually the IB folks will fix their
>>>>>> interface.
>>>>>>
>>>>>>
>>>>>>
>>>>> Translation: It looks to me as if the interface that we're using is a
>>>>> bit too corrupted with IB low level implementation grime. In the future,
>>>>> I'd like to see someone come up with a more high level interface for use
>>>>> by external code such as the sunrpc module.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> Clarification: The iova_start isn't used to store dma handles. The
>>>>
>>>>
>>> Agreed, it's more of a hardware register, that ends up on the wire as well.
>>>
>>> I think the net of this is that the mr_dma should have a more sensible
>>> up-cast that yields the right bits in the iova_start. Maybe a nice
>>> machine-dependent macro, defined in the RDMA layer, would be a good
>>> approach. Surely the other upper layers need it too.
>>>
>>> While I have the floor, why doesn't the server have this issue? Looking
>>> at the code, it has the same (unsigned long) cast as the client when
>>> initializing its iova_start.
>>>
>>>
>>>
>> The server isn't using the dma address as the iova_start, but rather a
>> kernel virtual address pointer, which is 32b on a i386 system. If you
>> take the cast off, then the the signed bit gets extended into the u64.
>> Apparently pointers are signed?
>>
>
> Why is the server using a u64 to store a naked pointer? That has to be
> a bug. Casting to (unsigned long) is just hiding it.
>
That is what it wants to use as the registration for its frmr, which in
this case is used as the source of an RDMA Write.
> Does this address get handed to the RNIC to perform some sort of local
> DMA?
No.
> If so, how does it work if there's an IOMMU in the system? The
> kva isn't necessarily the same as the dma_addr, right?
>
>
Correct.
This kva is used as the iova_start for the fast-registered memory
region. All it is used for is to mark the base value for "addresses"
passed in via sge entries in the work requests, and also for incoming
"addresses" in rdma packets. So you can use the kva when you fastreg
the mr, and then also use the kva + any offset in the sge entries of
your work requests that utilize it. Additionally, you can advertise the
fastreg rkey, iova_start, and length to the peer for doing rdma into
that region. The HW will validate any SGE entry in and any incoming
rdma packets to ensure that the rkey/addr/len in the sge/packet is
within the bounds of the fastregmr. Namely that the sge/packet address
and length fall within the iova_start and iova_start+fastreg_len.
> BTW, pointers are unsigned, but the assignment to u64 causes the
> compiler to convert the pointer into a ptrdiff_t, in effect evaluating
> ((pointer) - NULL). Then, since the ptrdiff_t is a signed 32 bits, the
> promotion results in the sign extension. I think! IOW, bug.
>
>
I see. And that's why the cast is needed for the server side.
Steve.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-04-27 19:32 ` Steve Wise
2009-04-27 19:42 ` Tom Talpey
@ 2009-04-27 20:46 ` Trond Myklebust
[not found] ` <1240865214.8818.73.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
1 sibling, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2009-04-27 20:46 UTC (permalink / raw)
To: Steve Wise; +Cc: Tom Talpey, tom, linux-nfs, vuhuong
On Mon, 2009-04-27 at 14:32 -0500, Steve Wise wrote:
> Clarification: The iova_start isn't used to store dma handles. The
> iova_start is the "address" base value that is advertised to a peer to
> describe the base address of a memory region. The contents of that can
> be more than just a dma handle...its up to the application. For
> instance, you could advertise a iova_start of zero or a kernel VA as the
> rdma server does. Also, the type is u64 because that is the size used
> on the wire as part of the rdma (IB and iWARP) protocols.
OK, but my point is we shouldn't be having this discussion at all. I
shouldn't be required to know that the wire protocol uses a 64-bit
unsigned little-endian/big endian integer in order to use the rdma api.
All I should need to know is that I can advertise either dma handles or
kernel VAs, and know that I can choose between two functions, say,
ib_send_wr_fastreg_dma_init() and ib_send_wr_fastreg_kva_init() to
initialise the ib_send_wr structure correctly.
Trond
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <1240865214.8818.73.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-04-27 20:49 ` Steve Wise
2009-05-11 22:25 ` Steve Wise
1 sibling, 0 replies; 29+ messages in thread
From: Steve Wise @ 2009-04-27 20:49 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Tom Talpey, tom, linux-nfs, vuhuong
Trond Myklebust wrote:
> On Mon, 2009-04-27 at 14:32 -0500, Steve Wise wrote:
>
>> Clarification: The iova_start isn't used to store dma handles. The
>> iova_start is the "address" base value that is advertised to a peer to
>> describe the base address of a memory region. The contents of that can
>> be more than just a dma handle...its up to the application. For
>> instance, you could advertise a iova_start of zero or a kernel VA as the
>> rdma server does. Also, the type is u64 because that is the size used
>> on the wire as part of the rdma (IB and iWARP) protocols.
>>
>
> OK, but my point is we shouldn't be having this discussion at all. I
> shouldn't be required to know that the wire protocol uses a 64-bit
> unsigned little-endian/big endian integer in order to use the rdma api.
>
> All I should need to know is that I can advertise either dma handles or
> kernel VAs, and know that I can choose between two functions, say,
> ib_send_wr_fastreg_dma_init() and ib_send_wr_fastreg_kva_init() to
> initialise the ib_send_wr structure correctly.
>
I hear ya.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <1240865214.8818.73.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-04-27 20:49 ` Steve Wise
@ 2009-05-11 22:25 ` Steve Wise
2009-05-11 22:50 ` Trond Myklebust
1 sibling, 1 reply; 29+ messages in thread
From: Steve Wise @ 2009-05-11 22:25 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Tom Talpey, tom, linux-nfs, vuhuong
Hey Trond,
Will this bug fix make 2.6.30?
Thanks,
Steve.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-05-11 22:25 ` Steve Wise
@ 2009-05-11 22:50 ` Trond Myklebust
[not found] ` <1242082203.1743.11.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2009-05-11 22:50 UTC (permalink / raw)
To: Steve Wise; +Cc: Tom Talpey, tom, linux-nfs, vuhuong
On Mon, 2009-05-11 at 17:25 -0500, Steve Wise wrote:
> Hey Trond,
>
> Will this bug fix make 2.6.30?
>
> Thanks,
>
> Steve.
Not in the form it is in now. As I've said earlier, I'm not happy about
the sunrpc layer having to circumvent ordinary type checking on
non-sunrpc structures.
Cheers
Trond
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <1242082203.1743.11.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-05-12 0:13 ` Steve Wise
2009-05-12 0:23 ` Tom Talpey
2009-05-12 0:44 ` Trond Myklebust
0 siblings, 2 replies; 29+ messages in thread
From: Steve Wise @ 2009-05-12 0:13 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Tom Talpey, tom, linux-nfs, vuhuong
Trond Myklebust wrote:
> On Mon, 2009-05-11 at 17:25 -0500, Steve Wise wrote:
>
>> Hey Trond,
>>
>> Will this bug fix make 2.6.30?
>>
>> Thanks,
>>
>> Steve.
>>
>
> Not in the form it is in now. As I've said earlier, I'm not happy about
> the sunrpc layer having to circumvent ordinary type checking on
> non-sunrpc structures.
>
> Cheers
> Trond
How is it circumventing? It's currently incorrectly casting a pointer
into a u64. That seems just broken to me. Also, its really the sunrpc
rdma transport layer. It deals specifically with rdma. It _should_
know about rdma interfaces and types.
But I'll whip up get/put type accessor methods for this field if that's
what you require. In the meantime, customers will just crash I guess.
Steve.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-05-12 0:13 ` Steve Wise
@ 2009-05-12 0:23 ` Tom Talpey
[not found] ` <4a08c1b5.151e640a.0a99.fffff868-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-05-12 0:44 ` Trond Myklebust
1 sibling, 1 reply; 29+ messages in thread
From: Tom Talpey @ 2009-05-12 0:23 UTC (permalink / raw)
To: Steve Wise; +Cc: Trond Myklebust, tom, linux-nfs, vuhuong
At 08:13 PM 5/11/2009, Steve Wise wrote:
>Trond Myklebust wrote:
>> On Mon, 2009-05-11 at 17:25 -0500, Steve Wise wrote:
>>
>>> Hey Trond,
>>>
>>> Will this bug fix make 2.6.30?
>>>
>>> Thanks,
>>>
>>> Steve.
>>>
>>
>> Not in the form it is in now. As I've said earlier, I'm not happy about
>> the sunrpc layer having to circumvent ordinary type checking on
>> non-sunrpc structures.
>>
>> Cheers
>> Trond
>How is it circumventing? It's currently incorrectly casting a pointer
>into a u64. That seems just broken to me.
The cast is definitely broken, and since your patch removes it I agree
with the change, for the record. Especially since with the change, the
code doesn't work on 32-bit CPU / 64-bit IOMMU platforms.
> Also, its really the sunrpc
>rdma transport layer. It deals specifically with rdma. It _should_
>know about rdma interfaces and types.
IOW, this is an issue for the OFA API to address? I agree with Trond that
there are some hardware-specific warts in there. It might be a good idea
for us to go through both the client and server code overall - you up for
doing that? I'm in, if so.
Tom.
>
>But I'll whip up get/put type accessor methods for this field if that's
>what you require. In the meantime, customers will just crash I guess.
>
>
>Steve.
>
>
>
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-05-12 0:13 ` Steve Wise
2009-05-12 0:23 ` Tom Talpey
@ 2009-05-12 0:44 ` Trond Myklebust
[not found] ` <1242089066.1743.19.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
1 sibling, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2009-05-12 0:44 UTC (permalink / raw)
To: Steve Wise; +Cc: Tom Talpey, tom, linux-nfs, vuhuong
On Mon, 2009-05-11 at 19:13 -0500, Steve Wise wrote:
> Trond Myklebust wrote:
> > On Mon, 2009-05-11 at 17:25 -0500, Steve Wise wrote:
> >
> >> Hey Trond,
> >>
> >> Will this bug fix make 2.6.30?
> >>
> >> Thanks,
> >>
> >> Steve.
> >>
> >
> > Not in the form it is in now. As I've said earlier, I'm not happy about
> > the sunrpc layer having to circumvent ordinary type checking on
> > non-sunrpc structures.
> >
> > Cheers
> > Trond
> How is it circumventing? It's currently incorrectly casting a pointer
> into a u64. That seems just broken to me. Also, its really the sunrpc
> rdma transport layer. It deals specifically with rdma. It _should_
> know about rdma interfaces and types.
The fact is that I'm simply not interested enough in rdma to tolerate
hacks. If it isn't done cleanly, in a manner that I can maintain, then
the whole transport layer comes out...
Trond
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <4a08c1b5.151e640a.0a99.fffff868-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
@ 2009-05-12 0:44 ` Steve Wise
0 siblings, 0 replies; 29+ messages in thread
From: Steve Wise @ 2009-05-12 0:44 UTC (permalink / raw)
To: Tom Talpey; +Cc: Trond Myklebust, tom, linux-nfs, vuhuong
Tom Talpey wrote:
> At 08:13 PM 5/11/2009, Steve Wise wrote:
>
>> Trond Myklebust wrote:
>>
>>> On Mon, 2009-05-11 at 17:25 -0500, Steve Wise wrote:
>>>
>>>
>>>> Hey Trond,
>>>>
>>>> Will this bug fix make 2.6.30?
>>>>
>>>> Thanks,
>>>>
>>>> Steve.
>>>>
>>>>
>>> Not in the form it is in now. As I've said earlier, I'm not happy about
>>> the sunrpc layer having to circumvent ordinary type checking on
>>> non-sunrpc structures.
>>>
>>> Cheers
>>> Trond
>>>
>> How is it circumventing? It's currently incorrectly casting a pointer
>> into a u64. That seems just broken to me.
>>
>
> The cast is definitely broken, and since your patch removes it I agree
> with the change, for the record. Especially since with the change, the
> code doesn't work on 32-bit CPU / 64-bit IOMMU platforms.
>
>
That was my short-term concern. Its just broke.
>> Also, its really the sunrpc
>> rdma transport layer. It deals specifically with rdma. It _should_
>> know about rdma interfaces and types.
>>
>
> IOW, this is an issue for the OFA API to address? I agree with Trond that
> there are some hardware-specific warts in there. It might be a good idea
> for us to go through both the client and server code overall - you up for
> doing that? I'm in, if so.
>
>
Ok.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <1242089066.1743.19.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-05-12 1:14 ` Tom Talpey
[not found] ` <4a08cd7b.48c3f10a.6bb1.fffff6d3-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Tom Talpey @ 2009-05-12 1:14 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Steve Wise, tom, linux-nfs, vuhuong
At 08:44 PM 5/11/2009, Trond Myklebust wrote:
>On Mon, 2009-05-11 at 19:13 -0500, Steve Wise wrote:
>> Trond Myklebust wrote:
>> > On Mon, 2009-05-11 at 17:25 -0500, Steve Wise wrote:
>> >
>> >> Hey Trond,
>> >>
>> >> Will this bug fix make 2.6.30?
>> >>
>> >> Thanks,
>> >>
>> >> Steve.
>> >>
>> >
>> > Not in the form it is in now. As I've said earlier, I'm not happy about
>> > the sunrpc layer having to circumvent ordinary type checking on
>> > non-sunrpc structures.
>> >
>> > Cheers
>> > Trond
>> How is it circumventing? It's currently incorrectly casting a pointer
>> into a u64. That seems just broken to me. Also, its really the sunrpc
>> rdma transport layer. It deals specifically with rdma. It _should_
>> know about rdma interfaces and types.
>
>The fact is that I'm simply not interested enough in rdma to tolerate
>hacks. If it isn't done cleanly, in a manner that I can maintain, then
>the whole transport layer comes out...
I know exactly what you want - it's not what the code does now and
it's not an accessor function to set the hardware's u64 field. What's
needed is a new function to manage the entire RDMA triplet, and the
memory registration behind it, in the OFA code side. Put the hardware
goop below the line, IOW. I'll dust up Steve on this.
Tom.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <4a08cd7b.48c3f10a.6bb1.fffff6d3-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
@ 2009-05-12 1:35 ` Trond Myklebust
[not found] ` <1242092150.16618.15.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2009-05-12 1:35 UTC (permalink / raw)
To: Tom Talpey; +Cc: Steve Wise, tom, linux-nfs, vuhuong
On Mon, 2009-05-11 at 21:14 -0400, Tom Talpey wrote:
> At 08:44 PM 5/11/2009, Trond Myklebust wrote:
> >On Mon, 2009-05-11 at 19:13 -0500, Steve Wise wrote:
> >> Trond Myklebust wrote:
> >> > On Mon, 2009-05-11 at 17:25 -0500, Steve Wise wrote:
> >> >
> >> >> Hey Trond,
> >> >>
> >> >> Will this bug fix make 2.6.30?
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Steve.
> >> >>
> >> >
> >> > Not in the form it is in now. As I've said earlier, I'm not happy about
> >> > the sunrpc layer having to circumvent ordinary type checking on
> >> > non-sunrpc structures.
> >> >
> >> > Cheers
> >> > Trond
> >> How is it circumventing? It's currently incorrectly casting a pointer
> >> into a u64. That seems just broken to me. Also, its really the sunrpc
> >> rdma transport layer. It deals specifically with rdma. It _should_
> >> know about rdma interfaces and types.
> >
> >The fact is that I'm simply not interested enough in rdma to tolerate
> >hacks. If it isn't done cleanly, in a manner that I can maintain, then
> >the whole transport layer comes out...
>
> I know exactly what you want - it's not what the code does now and
> it's not an accessor function to set the hardware's u64 field. What's
> needed is a new function to manage the entire RDMA triplet, and the
> memory registration behind it, in the OFA code side. Put the hardware
> goop below the line, IOW. I'll dust up Steve on this.
This does indeed sound like what I'd looking for.
There is a huge difference between having code that depends on well
defined rdma interfaces, and code that depends on rdma hacks. A piece of
code that requires casts from a non-local opaque type into another
protocol-dependent non-local type will definitely fall in the latter
category. I really don't care what the current code does, but a fix for
that code is something that does it _correctly_; it is not yet another
hack, whether or not it fixes a bug in the short term.
Trond
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <1242092150.16618.15.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-05-12 3:06 ` Steve Wise
2009-05-12 16:11 ` [ofa-general] " Steve Wise
0 siblings, 1 reply; 29+ messages in thread
From: Steve Wise @ 2009-05-12 3:06 UTC (permalink / raw)
To: Trond Myklebust
Cc: Tom Talpey, tom, linux-nfs, vuhuong, Roland Dreier,
general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
Trond Myklebust wrote:
> On Mon, 2009-05-11 at 21:14 -0400, Tom Talpey wrote:
>
>> At 08:44 PM 5/11/2009, Trond Myklebust wrote:
>>
>>> On Mon, 2009-05-11 at 19:13 -0500, Steve Wise wrote:
>>>
>>>> Trond Myklebust wrote:
>>>>
>>>>> On Mon, 2009-05-11 at 17:25 -0500, Steve Wise wrote:
>>>>>
>>>>>
>>>>>> Hey Trond,
>>>>>>
>>>>>> Will this bug fix make 2.6.30?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Steve.
>>>>>>
>>>>>>
>>>>> Not in the form it is in now. As I've said earlier, I'm not happy about
>>>>> the sunrpc layer having to circumvent ordinary type checking on
>>>>> non-sunrpc structures.
>>>>>
>>>>> Cheers
>>>>> Trond
>>>>>
>>>> How is it circumventing? It's currently incorrectly casting a pointer
>>>> into a u64. That seems just broken to me. Also, its really the sunrpc
>>>> rdma transport layer. It deals specifically with rdma. It _should_
>>>> know about rdma interfaces and types.
>>>>
>>> The fact is that I'm simply not interested enough in rdma to tolerate
>>> hacks. If it isn't done cleanly, in a manner that I can maintain, then
>>> the whole transport layer comes out...
>>>
>> I know exactly what you want - it's not what the code does now and
>> it's not an accessor function to set the hardware's u64 field. What's
>> needed is a new function to manage the entire RDMA triplet, and the
>> memory registration behind it, in the OFA code side. Put the hardware
>> goop below the line, IOW. I'll dust up Steve on this.
>>
>
> This does indeed sound like what I'd looking for.
>
> There is a huge difference between having code that depends on well
> defined rdma interfaces, and code that depends on rdma hacks. A piece of
> code that requires casts from a non-local opaque type into another
> protocol-dependent non-local type will definitely fall in the latter
> category. I really don't care what the current code does, but a fix for
> that code is something that does it _correctly_; it is not yet another
> hack, whether or not it fixes a bug in the short term.
>
> Trond
>
>
Trond, I get your point, and we can certainly work on improving this
with the rdma developer community. But removing the one-line-broken
cast will resolve a current crash situation for 2.6.30. Can't we get
this fix in 2.6.30 and work on the API improvements for 2.6.31? I've
CCed Roland and the ofa general list to get everyone involved in this
thread so we can get this API design change going.
I agree we can clean this up moving forward, but lets fix the broken
2.6.30 code.
Will this work?
Steve.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [ofa-general] Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-05-12 3:06 ` Steve Wise
@ 2009-05-12 16:11 ` Steve Wise
2009-05-12 16:23 ` Steve Wise
0 siblings, 1 reply; 29+ messages in thread
From: Steve Wise @ 2009-05-12 16:11 UTC (permalink / raw)
To: Trond Myklebust
Cc: linux-nfs, Roland Dreier,
general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org,
Talpey, Thomas
Steve Wise wrote:
>Trond Myklebust wrote (earlier in this thread):
>
> All I should need to know is that I can advertise either dma handles or
> kernel VAs, and know that I can choose between two functions, say,
> ib_send_wr_fastreg_dma_init() and ib_send_wr_fastreg_kva_init() to
> initialise the ib_send_wr structure correctly.
To align more with the rest of the fast_reg API in ib_verbs.h, I propose:
static inline void ib_init_fast_reg_iova_start_dma(struct ib_send_wr
*send_wr, dma_addr_t dma);
static inline void ib_init_fast_reg_iova_start_kva(struct ib_send_wr
*send_wr, void *kva);
Thoughts?
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [ofa-general] Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-05-12 16:11 ` [ofa-general] " Steve Wise
@ 2009-05-12 16:23 ` Steve Wise
2009-05-13 21:35 ` Roland Dreier
0 siblings, 1 reply; 29+ messages in thread
From: Steve Wise @ 2009-05-12 16:23 UTC (permalink / raw)
To: Trond Myklebust
Cc: linux-nfs, Talpey, Thomas, Roland Dreier,
general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
Steve Wise wrote:
> Steve Wise wrote:
>
> >Trond Myklebust wrote (earlier in this thread):
> >
> > All I should need to know is that I can advertise either dma handles or
> > kernel VAs, and know that I can choose between two functions, say,
> > ib_send_wr_fastreg_dma_init() and ib_send_wr_fastreg_kva_init() to
> > initialise the ib_send_wr structure correctly.
>
>
> To align more with the rest of the fast_reg API in ib_verbs.h, I propose:
>
> static inline void ib_init_fast_reg_iova_start_dma(struct ib_send_wr
> *send_wr, dma_addr_t dma);
> static inline void ib_init_fast_reg_iova_start_kva(struct ib_send_wr
> *send_wr, void *kva);
>
> Thoughts?
>
>
uncompiled patch:
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index c179318..fb56930 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1940,6 +1940,30 @@ static inline void ib_update_fast_reg_key(struct
ib_mr *mr, u8 newkey)
}
/**
+ * ib_init_fast_reg_iova_start_dma - initializes the iova_start field
+ * based on a dma address supplied by the user.
+ * @wr - struct ib_send_wr pointer to be initialized
+ * @addr - dma_addr_t value to be used as the iova_start
+ */
+static inline void ib_init_fast_reg_iova_start_dma(struct ib_send_wr *wr,
+ dma_addr_t addr)
+{
+ wr->wr.fast_reg.iova_start = addr;
+}
+
+/**
+ * ib_init_fast_reg_iova_start_kva - initializes the iova_start field
+ * based on a kernel virtual address supplied by the user.
+ * @wr - struct ib_send_wr pointer to be initialized
+ * @addr - void * address to be used as the iova_start
+ */
+static inline void ib_init_fast_reg_iova_start_kva(struct ib_send_wr *wr,
+ void *addr)
+{
+ wr->wr.fast_reg.iova_start = (unsigned long)addr;
+}
+
+/**
* ib_alloc_mw - Allocates a memory window.
* @pd: The protection domain associated with the memory window.
*/
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [ofa-general] Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-05-12 16:23 ` Steve Wise
@ 2009-05-13 21:35 ` Roland Dreier
[not found] ` <adak54kr8iz.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Roland Dreier @ 2009-05-13 21:35 UTC (permalink / raw)
To: Steve Wise
Cc: Trond Myklebust, linux-nfs, Talpey, Thomas,
general@lists.openfabrics.org
> Trond Myklebust wrote (earlier in this thread):
> >
> > All I should need to know is that I can advertise either dma handles or
> > kernel VAs, and know that I can choose between two functions, say,
> > ib_send_wr_fastreg_dma_init() and ib_send_wr_fastreg_kva_init() to
> > initialise the ib_send_wr structure correctly.
I skimmed the earlier thread, and I have to say that I don't quite see
what the problem with assigning things to a u64 directly is. You can
use any address you want, and I don't quite understand why using the
correct cast to avoid sign extension or truncation problems is such a
big maintenance burden?
The code below really just looks like obfuscation to me -- are we going
to want to add something like
/**
* ib_init_fast_reg_iova_start_u64 - initializes the iova_start field
* based on a 64-bit address supplied by the user.
* @wr - struct ib_send_wr pointer to be initialized
* @addr - void * address to be used as the iova_start
*/
static inline void ib_init_fast_reg_iova_start_kva(struct ib_send_wr *wr,
u64 addr)
{
wr->wr.fast_reg.iova_start = addr;
}
next, to make sure we don't get confused about assigning a u64 to a u64?
It all looks a bit overcomplicated to me.
- R.
> /**
> + * ib_init_fast_reg_iova_start_dma - initializes the iova_start field
> + * based on a dma address supplied by the user.
> + * @wr - struct ib_send_wr pointer to be initialized
> + * @addr - dma_addr_t value to be used as the iova_start
> + */
> +static inline void ib_init_fast_reg_iova_start_dma(struct ib_send_wr *wr,
> + dma_addr_t addr)
> +{
> + wr->wr.fast_reg.iova_start = addr;
> +}
> +
> +/**
> + * ib_init_fast_reg_iova_start_kva - initializes the iova_start field
> + * based on a kernel virtual address supplied by the user.
> + * @wr - struct ib_send_wr pointer to be initialized
> + * @addr - void * address to be used as the iova_start
> + */
> +static inline void ib_init_fast_reg_iova_start_kva(struct ib_send_wr *wr,
> + void *addr)
> +{
> + wr->wr.fast_reg.iova_start = (unsigned long)addr;
> +}
> +
> +/**
> * ib_alloc_mw - Allocates a memory window.
> * @pd: The protection domain associated with the memory window.
> */
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [ofa-general] Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <adak54kr8iz.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2009-05-14 7:22 ` Or Gerlitz
[not found] ` <4A0BC6A6.1070002-smomgflXvOZWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 29+ messages in thread
From: Or Gerlitz @ 2009-05-14 7:22 UTC (permalink / raw)
To: Roland Dreier, Trond Myklebust
Cc: Steve Wise, linux-nfs, Talpey, Thomas,
general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> Trond Myklebust wrote
>> All I should need to know is that I can advertise either dma handles or kernel VAs
Maybe its obvious to some people here, but may I ask why there's a need
to post either dma address or kernel virtual address? is it application
need? hardware (e.g IB vs iWARP vs vendor implementation) specific? or
something else?
Or.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [ofa-general] Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
[not found] ` <4A0BC6A6.1070002-smomgflXvOZWk0Htik3J/w@public.gmane.org>
@ 2009-05-14 13:41 ` Steve Wise
2009-05-14 13:45 ` Or Gerlitz
0 siblings, 1 reply; 29+ messages in thread
From: Steve Wise @ 2009-05-14 13:41 UTC (permalink / raw)
To: Or Gerlitz
Cc: Roland Dreier, Trond Myklebust, linux-nfs, Talpey, Thomas,
general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
Or Gerlitz wrote:
>> Trond Myklebust wrote
>>> All I should need to know is that I can advertise either dma handles
>>> or kernel VAs
>
> Maybe its obvious to some people here, but may I ask why there's a
> need to post either dma address or kernel virtual address? is it
> application need? hardware (e.g IB vs iWARP vs vendor implementation)
> specific? or something else?
>
> Or.
>
>
The NFSRDMA transport uses Fast Register Memory Regions. In this
particular section of code, the NFSRDMA client is building a fastreg
work request to bind a page list to a fastreg mr. You can read about
this in the IBTA spec on memory management extensions, or in the RDMA
Verbs draft.
Steve.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [ofa-general] Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
2009-05-14 13:41 ` Steve Wise
@ 2009-05-14 13:45 ` Or Gerlitz
0 siblings, 0 replies; 29+ messages in thread
From: Or Gerlitz @ 2009-05-14 13:45 UTC (permalink / raw)
To: Steve Wise
Cc: Roland Dreier, Trond Myklebust, linux-nfs, Talpey, Thomas,
general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
Steve Wise wrote:
> The NFSRDMA transport uses Fast Register Memory Regions. In this
> particular section of code, the NFSRDMA client is building a fastreg
> work request to bind a page list to a fastreg mr. You can read about
> this in the IBTA spec on memory management extensions, or in the RDMA Verbs draft.
Hi Steve,
I was aware for the context being fastreg work request. I was thinking
that the spec mandates either dma addr or kva on the iova but from your reply
I assume to be wrong, thanks.
Or.
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2009-05-14 13:45 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-24 19:05 [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client Steve Wise
[not found] ` <20090424190510.3134.90405.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
2009-04-25 14:11 ` Steve Wise
2009-04-26 18:57 ` Steve Wise
2009-04-27 2:17 ` Tom Talpey
[not found] ` <49f515a5.1d1e640a.1c82.6677-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-04-27 17:37 ` Steve Wise
2009-04-27 18:05 ` Trond Myklebust
[not found] ` <1240855510.8818.9.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-04-27 18:23 ` Trond Myklebust
[not found] ` <1240856613.8818.16.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-04-27 19:32 ` Steve Wise
2009-04-27 19:42 ` Tom Talpey
[not found] ` <49f60ac4.1c1d640a.2d0a.61a7-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-04-27 19:50 ` Steve Wise
2009-04-27 20:06 ` Tom Talpey
[not found] ` <49f61067.181e640a.3cb9.0e6c-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-04-27 20:20 ` Steve Wise
2009-04-27 20:46 ` Trond Myklebust
[not found] ` <1240865214.8818.73.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-04-27 20:49 ` Steve Wise
2009-05-11 22:25 ` Steve Wise
2009-05-11 22:50 ` Trond Myklebust
[not found] ` <1242082203.1743.11.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-05-12 0:13 ` Steve Wise
2009-05-12 0:23 ` Tom Talpey
[not found] ` <4a08c1b5.151e640a.0a99.fffff868-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-05-12 0:44 ` Steve Wise
2009-05-12 0:44 ` Trond Myklebust
[not found] ` <1242089066.1743.19.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-05-12 1:14 ` Tom Talpey
[not found] ` <4a08cd7b.48c3f10a.6bb1.fffff6d3-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-05-12 1:35 ` Trond Myklebust
[not found] ` <1242092150.16618.15.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-05-12 3:06 ` Steve Wise
2009-05-12 16:11 ` [ofa-general] " Steve Wise
2009-05-12 16:23 ` Steve Wise
2009-05-13 21:35 ` Roland Dreier
[not found] ` <adak54kr8iz.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2009-05-14 7:22 ` Or Gerlitz
[not found] ` <4A0BC6A6.1070002-smomgflXvOZWk0Htik3J/w@public.gmane.org>
2009-05-14 13:41 ` Steve Wise
2009-05-14 13:45 ` Or Gerlitz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox