* Re: Problems using krping
[not found] ` <201001211807011710722-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-21 14:40 ` Steve Wise
0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-21 14:40 UTC (permalink / raw)
To: lihaidong; +Cc: linux-rdma
It appears the MLX4 driver does not support kernel mode memory regions.
You'll have to use dma mrs or fast_reg mrs with that device.
Steve.
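
For reference, the -38 that krping_setup_buffers reports in the quoted dmesg is a negative errno. On Linux (x86 and most other architectures) 38 is ENOSYS, "Function not implemented", which fits a driver that does not provide this registration verb. A minimal userspace sketch for decoding such return codes follows; the helper name kernel_ret_to_text is hypothetical and not part of krping.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not from krping): turn a kernel-style return code
 * into readable text. Kernel functions conventionally return 0 or a
 * positive value on success and -errno on failure.
 */
static const char *kernel_ret_to_text(int ret)
{
    if (ret >= 0)
        return "success";
    /* Negate to recover the positive errno value, e.g. -38 -> ENOSYS. */
    return strerror(-ret);
}
```

Calling kernel_ret_to_text(-38) on a typical Linux/glibc system yields the ENOSYS message.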
lihaidong wrote:
> Mr. Wise:
>
> When using mr mode as the memory registration method, krping fails to
> get a memory region using ib_reg_phys_mr(). Could you help me, please?
>
> dmesg:
> krping_init
> krping: proc write |client,addr=10.10.10.15,mem_mode=mr,port=9999,count=1,verbose|
> client
> ipaddr (10.10.10.15)
> port 9999
> count 1
> verbose
> created cm_id ffff88013c74c800
> cma_event type 0 cma_id ffff88013c74c800 (parent)
> cma_event type 2 cma_id ffff88013c74c800 (parent)
> rdma_resolve_addr - rdma_resolve_route successful
> created pd ffff880133845280
> created cq ffff88013c1ff400
> created qp ffff88013c1ffe00
> krping: krping_setup_buffers called on cb ffff88013c59f800
> krping: recv buf dma_addr 13c59f968 size 16
> krping: recv_buf reg_mr failed
> krping: krping_setup_buffers failed: -38
> destroy cm_id ffff88013c74c800
> krping: proc write |client,addr=10.10.10.15,mem_mode=mr,port=9999,count=1,verbose|
> client
> ipaddr (10.10.10.15)
> port 9999
> count 1
> verbose
> created cm_id ffff88013c59f800
> cma_event type 0 cma_id ffff88013c59f800 (parent)
> cma_event type 2 cma_id ffff88013c59f800 (parent)
> rdma_resolve_addr - rdma_resolve_route successful
> created pd ffff88012f5964a0
> created cq ffff88012faee400
> created qp ffff88012faeec00
> krping: krping_setup_buffers called on cb ffff88013c71d400
> krping: recv buf dma_addr 13c71d568 size 16
> krping: recv_buf reg_mr failed
> krping: krping_setup_buffers failed: -38
> destroy cm_id ffff88013c59f800
>
>
> echo "client,addr=10.10.10.15,mem_mode=mr,port=9999,count=1" > /proc/krping
>
> echo "server,addr=10.10.10.15,mem_mode=mr,port=9999" > /proc/krping
>
> Using OFED-1.5 ofa_kernel-1.5
> Hardware: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
>
> I put the krping source files into drivers/infiniband/hw/mlx4
>
> 2010-01-21
> ------------------------------------------------------------------------
> lihaidong
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Problems using krping
[not found] ` <201001221533186875550-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-22 15:16 ` Steve Wise
0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-22 15:16 UTC (permalink / raw)
To: lihaidong; +Cc: linux-rdma
lihaidong wrote:
> Mr. Wise:
> Sorry to bother you with another problem.
> fastreg (which must be used with local_dma_lkey to avoid
> ib_reg_phys_mr in my case) succeeds with read_inv, yet fails with
> server_inv.
>
> What does
> 'krping: cq completion failed with wr_id 0 status 6 opcode -1 vender_err 78'
> in the client dmesg output mean?
Maybe the mlx4 experts can comment on status 6 vender_err 78?
> Why did the data transfer happen before the server woke up from waiting
> for CONNECTED?
This can happen. There is a race between getting the CONNECTED event
and the first incoming data completion (like a recv completion).
> Client dmesg:
> krping: proc write |client,addr=10.10.10.15,port=8888,count=1,verbose,local_dma_lkey,mem_mode=fastreg,server_inv|
> client
> ipaddr (10.10.10.15)
> port 8888
> count 1
> verbose
> using local dma lkey
> created cm_id ffff88013cd11000
> cma_event type 0 cma_id ffff88013cd11000 (parent)
> cma_event type 2 cma_id ffff88013cd11000 (parent)
> Fastreg supported - device_cap_flags 0x7c9c76
> rdma_resolve_addr - rdma_resolve_route successful
> created pd ffff88012a9fff80
> created cq ffff8801305d5400
> created qp ffff8801305d5800
> krping: krping_setup_buffers called on cb ffff88013cd11800
> krping: fastreg rkey 0x88001a00 page_list ffff88012b3cee80 page_list_len 1
> krping: allocated & registered buffers...
> cma_event type 9 cma_id ffff88013cd11000 (parent)
> ESTABLISHED
> rdma_connect successful
> krping: page_list[0] 0x12a40c000
> krping: post_inv = 0, fastreg new rkey 0x88001a01 shift 12 len 64 iova_start 12a40c180 page_list_len 1
> RDMA addr 12a40c180 rkey 88001a01 len 64
> krping: cq completion failed with wr_id 0 status 6 opcode -1 vender_err 78
> krping: cq completion in ERROR state
> krping: krping_format_send failed
> krping_free_buffers called on cb ffff88013cd11800
> destroy cm_id ffff88013cd11000
>
> Server dmesg:
> krping: proc write |server,addr=10.10.10.15,port=8888,count=1,verbose,local_dma_lkey,mem_mode=fastreg,server_inv|
> server
> ipaddr (10.10.10.15)
> port 8888
> count 1
> verbose
> using local dma lkey
> created cm_id ffff88007e3c5c00
> rdma_bind_addr successful
> rdma_listen
> cma_event type 4 cma_id ffff88003e0dac00 (child)
> child cma ffff88003e0dac00
> Fastreg supported - device_cap_flags 0x7c9c76
> created pd ffff880035d17d20
> created cq ffff88001e0f9e00
> created qp ffff88001e0f9c00
> krping: krping_setup_buffers called on cb ffff88007e2b0000
> krping: fastreg rkey 0x68001d00 page_list ffff8800027ab580 page_list_len 1
> krping: allocated & registered buffers...
> accepting client connection request
> cma_event type 9 cma_id ffff88003e0dac00 (child)
> ESTABLISHED
> cma_event type 10 cma_id ffff88003e0dac00 (child)
> krping: DISCONNECT EVENT...
> krping: wait for CONNECTED state 10
> krping: connect error -1
> krping_free_buffers called on cb ffff88007e2b0000
> destroy cm_id ffff88007e3c5c00
> 2010-01-22
> ------------------------------------------------------------------------
> lihaidong
> ------------------------------------------------------------------------
> *From:* Steve Wise
> *Sent:* 2010-01-21 06:43:30
> *To:* lihaidong
> *Cc:* linux-rdma
> *Subject:* Re: Problems using krping
* Re: Problems using krping
[not found] ` <201001242242553436345-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-24 21:44 ` Steve Wise
0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-24 21:44 UTC (permalink / raw)
To: lihaidong; +Cc: linux-rdma
lihaidong wrote:
> Mr. Wise,
> I'm rewriting your program in order to get familiar with the
> Verbs+CMA API. To make the procedure clearer, I put nearly everything
> into two long functions, one for the server and one for the client;
> the cq/cma event handlers are the only exceptions.
> I am getting the program to run step by step. I used DMA mode first.
> As the mlx4 driver doesn't support MR/MW mode, I turned to FMR.
> Before adding the FMR code, I want to make the local_dma_lkey option
> work, i.e. mem_mode=dma,local_dma_lkey
> I ran into a strange problem here.
> I changed the sgl's lkey to local_dma_lkey when preparing the recv,
> send, and rdma write wrs.
> The problem is this: the server posts the RDMA Read wr, gets the
> completion, and prints the data read from the client. This is all
> normal. But after it posts a send wr to tell the client to go ahead,
> instead of receiving an IB_WC_SEND wc, the cq event handler gets an
> event whose status is not 0, so it prints the following:
> cq completion failed with wr_id 0 opcode 2 status 4 vendor_err 52<3>
>
> The opcode is 2, so it is an RDMA read event. Isn't that weird? Why
> does it come again, and with an error status?
> Status 4 means IB_WC_LOC_PROT_ERR. Is it a base/bounds violation?
> How could this happen? The remote_len reported by the client is equal
> to cb->size.
Maybe the opcode is not valid for error CQEs with mlx4? I seem to
remember that was the case for mthca. You could make the wr_id's in the
WRs unique, then correlate the wr_id in the CQE to verify this.
LOC_PROT_ERR usually means the MR doesn't have the appropriate access
rights.
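
A rough sketch of the wr_id suggestion, in plain C so it can be tried outside the kernel. The tag values, the 8/56-bit split, and the helper names are all hypothetical; nothing here is krping code. The status-name table is copied by hand from my recollection of the order of enum ib_wc_status in ib_verbs.h, so please check it against your own tree before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags, one per kind of work request the test posts. */
enum wr_kind {
    WR_KIND_RECV = 1,
    WR_KIND_SEND,
    WR_KIND_RDMA_READ,
    WR_KIND_RDMA_WRITE,
};

#define WR_SEQ_BITS 56
#define WR_SEQ_MASK ((UINT64_C(1) << WR_SEQ_BITS) - 1)

/* Pack the kind into the top 8 bits and a sequence number below it. */
static uint64_t make_wr_id(enum wr_kind kind, uint64_t seq)
{
    return ((uint64_t)kind << WR_SEQ_BITS) | (seq & WR_SEQ_MASK);
}

/* Recover the kind from the wr_id reported in a completion. */
static enum wr_kind wr_id_kind(uint64_t wr_id)
{
    return (enum wr_kind)(wr_id >> WR_SEQ_BITS);
}

/* Recover the sequence number from the wr_id reported in a completion. */
static uint64_t wr_id_seq(uint64_t wr_id)
{
    return wr_id & WR_SEQ_MASK;
}

/* First few status names; assumed order, verify against ib_verbs.h. */
static const char *wc_status_name(int status)
{
    static const char *const names[] = {
        "IB_WC_SUCCESS",
        "IB_WC_LOC_LEN_ERR",
        "IB_WC_LOC_QP_OP_ERR",
        "IB_WC_LOC_EEC_OP_ERR",
        "IB_WC_LOC_PROT_ERR",
        "IB_WC_WR_FLUSH_ERR",
        "IB_WC_MW_BIND_ERR",
    };

    if (status < 0 || (size_t)status >= sizeof(names) / sizeof(names[0]))
        return "unknown";
    return names[status];
}
```

With something like this, each WR gets wr_id = make_wr_id(kind, seq) before posting, and the error path in the cq handler can print wr_id_kind(wc.wr_id) instead of trusting wc.opcode.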
>
>
> PS: Why do recv_sgl and send_sgl use dma_mr->lkey while rdma_sgl uses
> dma_mr->rkey?
> Could recv_sgl use dma_mr->rkey, or rdma_sgl use dma_mr->lkey? Why?
For iWARP, the target (sink) of a read must have remote write access. So
you must use the rkey if you want the code to run on both IB and iWARP...
Steve.
* Re: Problems using krping
[not found] ` <201001251036131874884-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-25 3:28 ` Steve Wise
0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-25 3:28 UTC (permalink / raw)
To: lihaidong; +Cc: linux-rdma
>
> >> Why did the data transfer happen before the server woke up from
> >> waiting for CONNECTED?
> >This can happen. There is a race between getting the CONNECTED event
> >and the first incoming data completion (like a recv completion).
> Could you give me some tips to fix this? It sometimes upsets the
> normal state machine, which a stable program should not allow.
> --
What normal state machine? It's the nature of the beast. Make your
state machine handle it. :)
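
One possible way to "handle it", sketched as a small userspace model rather than real CM/CQ code: a recv completion that arrives before the ESTABLISHED event is counted and deferred, then processed once the event shows up. All names below are hypothetical, this is not how krping itself is written, and a real kernel version would also need locking between the cq and cma handlers.

```c
enum conn_state { ST_CONNECT_REQUEST, ST_CONNECTED, ST_ERROR };

struct conn {
    enum conn_state state;
    int deferred_recvs;  /* recv completions seen before ESTABLISHED */
    int recvs_handled;   /* recv completions actually processed */
};

/* Stand-in for the real work done on a received message. */
static void process_recv(struct conn *c)
{
    c->recvs_handled++;
}

/* Called from the cq handler for each successful recv completion. */
static void on_recv_completion(struct conn *c)
{
    switch (c->state) {
    case ST_CONNECT_REQUEST:
        /* Data won the race against the CM event: defer it. */
        c->deferred_recvs++;
        break;
    case ST_CONNECTED:
        process_recv(c);
        break;
    case ST_ERROR:
        /* Connection already failed: drop the completion. */
        break;
    }
}

/* Called from the cma handler on the ESTABLISHED event. */
static void on_established(struct conn *c)
{
    if (c->state != ST_CONNECT_REQUEST)
        return;
    c->state = ST_CONNECTED;
    /* Replay anything that arrived early, in arrival order. */
    while (c->deferred_recvs > 0) {
        c->deferred_recvs--;
        process_recv(c);
    }
}
```

Either ordering of the two callbacks ends in the same state. An alternative design is to treat the first recv completion itself as proof that the connection is up.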
* Re: Problems using krping
[not found] ` <201001261339589687690-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-26 15:01 ` Steve Wise
0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-26 15:01 UTC (permalink / raw)
To: lihaidong; +Cc: linux-rdma
lihaidong wrote:
> Mr. Wise,
> 	} else {
>
> 		cb->rdma_sq_wr.opcode = IB_WR_RDMA_READ;
> 		if (cb->mem == FASTREG) {
> 			/*
> 			 * Immediately follow the read with a
> 			 * fenced LOCAL_INV.
> 			 */
> 			cb->rdma_sq_wr.next = &inv;
> 			memset(&inv, 0, sizeof inv);
> 			inv.opcode = IB_WR_LOCAL_INV;
> 			inv.ex.invalidate_rkey = cb->fastreg_mr->rkey;
> 			inv.send_flags = IB_SEND_FENCE;
> 		}
> 	}
>
> 	ret = ib_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr);
> 	if (ret) {
> 		printk(KERN_ERR PFX "post send error %d\n", ret);
> 		break;
> 	}
> 	cb->rdma_sq_wr.next = NULL;
> Is the last line safe? There's an invalidate wr chained after
> rdma_sq_wr. This line will not disturb the invalidate wr being
> processed at the lower level, right?
>
>
Correct. The work requests are copied by the verbs layer. So once you
return from ib_post_send, the send work request structure itself can be
reused/freed/whatever.
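
To illustrate the copy semantics, here is a toy userspace model, not the real verbs API: a mock post function walks the WR chain and copies what it needs into its own queue, so the caller is free to unlink or overwrite the WR structures afterwards. Every name below (mock_wr, mock_qp, mock_post_send) is hypothetical.

```c
#include <errno.h>
#include <stddef.h>

#define MOCK_SQ_DEPTH 8

/* Toy work request: just an opcode and a chain pointer. */
struct mock_wr {
    int opcode;
    struct mock_wr *next;
};

/* Toy send queue that owns its own copy of each posted opcode. */
struct mock_qp {
    int sq_opcodes[MOCK_SQ_DEPTH];
    size_t sq_len;
};

/*
 * Walk the chain and copy each request into the queue. After this
 * returns, the queue no longer refers to the caller's structures.
 */
static int mock_post_send(struct mock_qp *qp, const struct mock_wr *wr,
                          const struct mock_wr **bad_wr)
{
    for (; wr != NULL; wr = wr->next) {
        if (qp->sq_len == MOCK_SQ_DEPTH) {
            *bad_wr = wr;  /* first request that did not fit */
            return -ENOMEM;
        }
        qp->sq_opcodes[qp->sq_len++] = wr->opcode;
    }
    return 0;
}
```

In this model, resetting the first WR's next pointer after the post, as the quoted krping code does, has no effect on the queued invalidate.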
* Re: Problems using krping
[not found] ` <201001262046537030587-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-26 15:02 ` Steve Wise
0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-26 15:02 UTC (permalink / raw)
To: lihaidong; +Cc: linux-rdma
lihaidong wrote:
> Mr. Wise
You need to start calling me Steve. You're making me feel older than I
am. ;-)
> When krping fails to set up buffers, ib_free_fast_reg_page_list is
> called; when krping frees buffers, it is not called.
> Though krping still works well, could you comment on that?
Sounds like a bug.
Steve.
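
One common way to avoid this kind of asymmetry is to route both the setup error path and the normal free path through a single release helper. The sketch below is a userspace model using malloc/free in place of the real page list and MR calls; the names and the fail_mr test knob are hypothetical, and this is not a patch against krping.

```c
#include <errno.h>
#include <stdlib.h>

/* Toy control block standing in for the fastreg resources. */
struct mock_cb {
    void *page_list;
    void *mr;
};

/* Single release path; safe to call on a partially set up cb. */
static void mock_release_fastreg(struct mock_cb *cb)
{
    free(cb->mr);          /* free(NULL) is a no-op */
    cb->mr = NULL;
    free(cb->page_list);
    cb->page_list = NULL;
}

/* fail_mr is a test knob that simulates the MR allocation failing. */
static int mock_setup_buffers(struct mock_cb *cb, int fail_mr)
{
    cb->mr = NULL;
    cb->page_list = malloc(64);
    if (cb->page_list == NULL)
        return -ENOMEM;

    cb->mr = fail_mr ? NULL : malloc(64);
    if (cb->mr == NULL) {
        /* Error path uses the same helper as normal teardown. */
        mock_release_fastreg(cb);
        return -ENOMEM;
    }
    return 0;
}

/* Normal teardown: nothing can be forgotten here. */
static void mock_free_buffers(struct mock_cb *cb)
{
    mock_release_fastreg(cb);
}
```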