* Re: Problems using krping
       [not found] ` <201001211807011710722-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-21 14:40   ` Steve Wise
  0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-21 14:40 UTC
To: lihaidong; +Cc: linux-rdma

It appears the MLX4 driver does not support kernel mode memory regions.
You'll have to use dma mrs or fast_reg mrs with that device.

Steve.

lihaidong wrote:
> Mr.Wise:
>
> When using mr mode as the memory registration method, krping failed to
> get a memory region using ib_reg_phys_mr(). Could you help me, please?
>
> dmesg:
> krping_init
> krping: proc write |client,addr=10.10.10.15,mem_mode=mr,port=9999,count=1,verbose|
> client
> ipaddr (10.10.10.15)
> port 9999
> count 1
> verbose
> created cm_id ffff88013c74c800
> cma_event type 0 cma_id ffff88013c74c800 (parent)
> cma_event type 2 cma_id ffff88013c74c800 (parent)
> rdma_resolve_addr - rdma_resolve_route successful
> created pd ffff880133845280
> created cq ffff88013c1ff400
> created qp ffff88013c1ffe00
> krping: krping_setup_buffers called on cb ffff88013c59f800
> krping: recv buf dma_addr 13c59f968 size 16
> krping: recv_buf reg_mr failed
> krping: krping_setup_buffers failed: -38
> destroy cm_id ffff88013c74c800
> krping: proc write |client,addr=10.10.10.15,mem_mode=mr,port=9999,count=1,verbose|
> client
> ipaddr (10.10.10.15)
> port 9999
> count 1
> verbose
> created cm_id ffff88013c59f800
> cma_event type 0 cma_id ffff88013c59f800 (parent)
> cma_event type 2 cma_id ffff88013c59f800 (parent)
> rdma_resolve_addr - rdma_resolve_route successful
> created pd ffff88012f5964a0
> created cq ffff88012faee400
> created qp ffff88012faeec00
> krping: krping_setup_buffers called on cb ffff88013c71d400
> krping: recv buf dma_addr 13c71d568 size 16
> krping: recv_buf reg_mr failed
> krping: krping_setup_buffers failed: -38
> destroy cm_id ffff88013c59f800
>
>
> echo "client,addr=10.10.10.15,mem_mode=mr,port=9999,count=1" > /proc/krping
>
> echo "server,addr=10.10.10.15,mem_mode=mr,port=9999" > /proc/krping
>
> Using OFED-1.5 ofa_kernel-1.5
> Hardware: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
>
> put krping source files into drivers/infiniband/hw/mlx4
>
> 2010-01-21
> ------------------------------------------------------------------------
> lihaidong

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[parent not found: <201001221533186875550@inspur.com>]
[parent not found: <201001221533186875550-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>]
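A note on the first report in this thread: the `-38` in `krping_setup_buffers failed: -38` is a negated errno value. On x86 Linux, errno 38 is `ENOSYS` ("Function not implemented"), which fits the explanation that the driver does not provide that registration path. Below is a minimal user-space sketch for decoding such return codes; the helper name `describe_kernel_rc` is hypothetical and not part of krping.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from krping): kernel functions usually
 * return 0 on success and a negated errno on failure, so flip the
 * sign before asking libc for the message text.
 */
static const char *describe_kernel_rc(int rc)
{
    return rc < 0 ? strerror(-rc) : "success";
}

/* Print a return code the way it might be annotated next to a dmesg line. */
static void print_kernel_rc(int rc)
{
    printf("rc=%d (%s)\n", rc, describe_kernel_rc(rc));
}
```

Calling `print_kernel_rc(-38)` on a typical Linux system reports the `ENOSYS` message; errno numbering differs on some architectures, so treat the number-to-name mapping as platform dependent.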
* Re: Problems using krping
       [not found] ` <201001221533186875550-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-22 15:16   ` Steve Wise
  0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-22 15:16 UTC
To: lihaidong; +Cc: linux-rdma

lihaidong wrote:
> Mr.Wise:
> Sorry to bother you with another problem.
> fastreg (which must be used with local_dma_lkey to avoid using
> ib_reg_phys_mr in my case) succeeds with read_inv, yet fails with
> server_inv.
>
> What does
> 'krping: cq completion failed with wr_id 0 status 6 opcode -1 vender_err 78'
> in the client dmesg info mean?

Maybe the mlx4 experts can comment on status 6 vender_err 78?

> Why did data transfer happen before the server woke up from waiting for
> CONNECTED?

This can happen. There is a race between getting the CONNECTED event
and the first incoming data completion (like a recv completion).

> Client dmesg:
> krping: proc write |client,addr=10.10.10.15,port=8888,count=1,verbose,local_dma_lkey,mem_mode=fastreg,server_inv|
> client
> ipaddr (10.10.10.15)
> port 8888
> count 1
> verbose
> using local dma lkey
> created cm_id ffff88013cd11000
> cma_event type 0 cma_id ffff88013cd11000 (parent)
> cma_event type 2 cma_id ffff88013cd11000 (parent)
> Fastreg supported - device_cap_flags 0x7c9c76
> rdma_resolve_addr - rdma_resolve_route successful
> created pd ffff88012a9fff80
> created cq ffff8801305d5400
> created qp ffff8801305d5800
> krping: krping_setup_buffers called on cb ffff88013cd11800
> krping: fastreg rkey 0x88001a00 page_list ffff88012b3cee80 page_list_len 1
> krping: allocated & registered buffers...
> cma_event type 9 cma_id ffff88013cd11000 (parent)
> ESTABLISHED
> rdma_connect successful
> krping: page_list[0] 0x12a40c000
> krping: post_inv = 0, fastreg new rkey 0x88001a01 shift 12 len 64 iova_start 12a40c180 page_list_len 1
> RDMA addr 12a40c180 rkey 88001a01 len 64
> krping: cq completion failed with wr_id 0 status 6 opcode -1 vender_err 78
> krping: cq completion in ERROR state
> krping: krping_format_send failed
> krping_free_buffers called on cb ffff88013cd11800
> destroy cm_id ffff88013cd11000
>
> Server dmesg:
> krping: proc write |server,addr=10.10.10.15,port=8888,count=1,verbose,local_dma_lkey,mem_mode=fastreg,server_inv|
> server
> ipaddr (10.10.10.15)
> port 8888
> count 1
> verbose
> using local dma lkey
> created cm_id ffff88007e3c5c00
> rdma_bind_addr successful
> rdma_listen
> cma_event type 4 cma_id ffff88003e0dac00 (child)
> child cma ffff88003e0dac00
> Fastreg supported - device_cap_flags 0x7c9c76
> created pd ffff880035d17d20
> created cq ffff88001e0f9e00
> created qp ffff88001e0f9c00
> krping: krping_setup_buffers called on cb ffff88007e2b0000
> krping: fastreg rkey 0x68001d00 page_list ffff8800027ab580 page_list_len 1
> krping: allocated & registered buffers...
> accepting client connection request
> cma_event type 9 cma_id ffff88003e0dac00 (child)
> ESTABLISHED
> cma_event type 10 cma_id ffff88003e0dac00 (child)
> krping: DISCONNECT EVENT...
> krping: wait for CONNECTED state 10
> krping: connect error -1
> krping_free_buffers called on cb ffff88007e2b0000
> destroy cm_id ffff88007e3c5c00
> 2010-01-22
> ------------------------------------------------------------------------
> lihaidong
> ------------------------------------------------------------------------
> From: Steve Wise
> Sent: 2010-01-21 06:43:30
> To: lihaidong
> Cc: linux-rdma
> Subject: Re: Problems using krping
> It appears the MLX4 driver does not support kernel mode memory regions.
> You'll have to use dma mrs or fast_reg mrs with that device.
> Steve.
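The numeric `cma_event type` and `status` values in the logs above are easier to follow with names attached. The following is a small lookup sketch, assuming the enum ordering of `enum rdma_cm_event_type` and `enum ib_wc_status` as I recall it from the kernel headers; the logs themselves confirm only that type 9 is ESTABLISHED and type 10 is a disconnect, so check the rest against `include/rdma/rdma_cm.h` and `include/rdma/ib_verbs.h` for the kernel in use. The function names are hypothetical.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed ordering of enum rdma_cm_event_type (first 11 values only). */
static const char *const cma_event_names[] = {
    "ADDR_RESOLVED", "ADDR_ERROR", "ROUTE_RESOLVED", "ROUTE_ERROR",
    "CONNECT_REQUEST", "CONNECT_RESPONSE", "CONNECT_ERROR", "UNREACHABLE",
    "REJECTED", "ESTABLISHED", "DISCONNECTED",
};

/* Assumed ordering of enum ib_wc_status (first 9 values only). */
static const char *const wc_status_names[] = {
    "SUCCESS", "LOC_LEN_ERR", "LOC_QP_OP_ERR", "LOC_EEC_OP_ERR",
    "LOC_PROT_ERR", "WR_FLUSH_ERR", "MW_BIND_ERR", "BAD_RESP_ERR",
    "LOC_ACCESS_ERR",
};

/* Hypothetical helper: name for a "cma_event type N" log value. */
static const char *cma_event_name(int type)
{
    if (type < 0 || (size_t)type >= ARRAY_LEN(cma_event_names))
        return "UNKNOWN";
    return cma_event_names[type];
}

/* Hypothetical helper: name for a "status N" value in a failed CQE. */
static const char *wc_status_name(int status)
{
    if (status < 0 || (size_t)status >= ARRAY_LEN(wc_status_names))
        return "UNKNOWN";
    return wc_status_names[status];
}
```

Under that assumed ordering, the client's `status 6` would read as `MW_BIND_ERR`, and the server log's events 4, 9, 10 as CONNECT_REQUEST, ESTABLISHED, DISCONNECTED. The `vender_err` value is device specific and is not decoded here.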
[parent not found: <201001242242553436345@inspur.com>]
[parent not found: <201001242242553436345-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>]
* Re: Problems using krping
       [not found] ` <201001242242553436345-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-24 21:44   ` Steve Wise
  0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-24 21:44 UTC
To: lihaidong; +Cc: linux-rdma

lihaidong wrote:
> Mr.Wise
> I'm actually rewriting your program in order to get familiar with the
> Verbs+CMA API. To make the procedure clearer, I put nearly all the
> stuff into two long functions, one for the server, the other for the
> client; the cq/cma event handlers are the only exception.
> I get the program running step by step. I used DMA mode first. As the
> mlx4 driver doesn't support MR/MW mode, I turned to FMR.
> Before adding the FMR code, I want to make the local_dma_lkey option
> work, i.e. mem_mode=dma,local_dma_lkey
> I ran into a strange problem here.
> I changed the sgl's lkey into local_dma_lkey when preparing the recv,
> send, and rdma write wrs.
> The problem is: the server posts the RDMA Read wr, gets the completion,
> and prints the data read from the client. These are all normal. But
> after it posts a send wr to tell the client to go ahead, instead of
> receiving an IB_WC_SEND wc, the cq event handler gets an event whose
> status is not 0, so it prints something as follows:
> cq completion failed with wr_id 0 opcode 2 status 4 vendor_err 52<3>
>
> The opcode is 2, so it is an RDMA read event; isn't that weird? Why
> does it come again, and with an error status?
> Status 4 means IB_WC_LOC_PROT_ERR; is it a base/bounds violation?
> How could this happen? The remote_len told by the client is equal to
> cb->size.

Maybe the opcode is not valid for error CQEs with mlx4? I seem to
remember that was the case for mthca. You could make the wr_id's in
the WRs unique, then correlate the wr_id in the CQE to verify this.

LOC_PROT_ERR usually means the MR doesn't have the appropriate access
rights.

>
>
> ps: Why do recv_sgl and send_sgl use dma_mr->lkey while rdma_sgl uses
> dma_mr->rkey?
> Could recv_sgl use dma_mr->rkey, or rdma_sgl use dma_mr->lkey? Why?

For iWARP, the target or sink of a read must have remote write. So you
must use the rkey if you want the code to run on both IB and iWARP...

Steve.
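The suggestion above to make the wr_id's unique can be sketched as follows. This is only one possible encoding, assuming the 64-bit `wr_id` field of a work request is free for the application to use: the top 8 bits carry a tag for the kind of request and the low 56 bits carry a sequence number. The enum and helper names are hypothetical and do not come from krping.

```c
#include <stdint.h>

/* Hypothetical tags for the kind of work request that was posted. */
enum wr_kind {
    WR_KIND_RECV = 1,
    WR_KIND_SEND = 2,
    WR_KIND_RDMA_READ = 3,
    WR_KIND_RDMA_WRITE = 4,
};

#define WR_SEQ_BITS 56
#define WR_SEQ_MASK ((UINT64_C(1) << WR_SEQ_BITS) - 1)

/* Pack the kind into the top 8 bits and a sequence number below it. */
static uint64_t make_wr_id(enum wr_kind kind, uint64_t seq)
{
    return ((uint64_t)kind << WR_SEQ_BITS) | (seq & WR_SEQ_MASK);
}

/* Recover the kind from a wr_id reported in a completion. */
static enum wr_kind wr_id_kind(uint64_t wr_id)
{
    return (enum wr_kind)(wr_id >> WR_SEQ_BITS);
}

/* Recover the sequence number from a wr_id reported in a completion. */
static uint64_t wr_id_seq(uint64_t wr_id)
{
    return wr_id & WR_SEQ_MASK;
}
```

With something like this in place, a failed completion can be attributed to the send or to the read from its `wr_id` alone, without trusting the opcode field of an error CQE.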
[parent not found: <201001251036131874884@inspur.com>]
[parent not found: <201001251036131874884-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>]
* Re: Problems using krping
       [not found] ` <201001251036131874884-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-25  3:28   ` Steve Wise
  0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-25 3:28 UTC
To: lihaidong; +Cc: linux-rdma

> >> Why did data transfer happen before the server woke up from waiting for
> >> CONNECTED?
> >This can happen. There is a race between getting the CONNECTED event
> >and the first incoming data completion (like a recv completion).
> Could you give me some tips to fix this? It sometimes upsets the
> normal state machine, which a stable program should not allow.
> --

What normal state machine? It's the nature of the beast. Make your
state machine handle it. :)
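One way to make a state machine tolerate that race is to let states only move forward and to have waiters test for "at least CONNECTED" rather than "exactly CONNECTED". The sketch below is a user-space illustration under that assumption; the state names, struct, and handlers are hypothetical and simplified, not krping's actual definitions.

```c
/* Hypothetical, simplified connection states in increasing order. */
enum conn_state {
    ST_IDLE,
    ST_CONNECT_REQUEST,
    ST_CONNECTED,
    ST_RDMA_READ_ADV,   /* first recv completion has been processed */
    ST_ERROR,
};

struct conn {
    enum conn_state state;
};

/* Only move forward, so a late event cannot undo earlier progress. */
static void advance(struct conn *c, enum conn_state next)
{
    if (next > c->state)
        c->state = next;
}

/* Called from the CM event handler on the ESTABLISHED event. */
static void on_established(struct conn *c)
{
    advance(c, ST_CONNECTED);
}

/* Called from the CQ handler on the first recv completion. */
static void on_recv_completion(struct conn *c)
{
    advance(c, ST_RDMA_READ_ADV);
}

/* A waiter should test this, not "state == ST_CONNECTED". */
static int is_connected(const struct conn *c)
{
    return c->state >= ST_CONNECTED && c->state != ST_ERROR;
}
```

If the recv completion is handled before the ESTABLISHED event, the state stays at `ST_RDMA_READ_ADV` and `is_connected()` is still true, so the connection setup path does not stall or regress. A real implementation would also need locking or a wait queue around the state, which is omitted here.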
[parent not found: <201001261339589687690@inspur.com>]
[parent not found: <201001261339589687690-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>]
* Re: Problems using krping
       [not found] ` <201001261339589687690-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-26 15:01   ` Steve Wise
  0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-26 15:01 UTC
To: lihaidong; +Cc: linux-rdma

lihaidong wrote:
> Mr.Wise
>     else {
>
>         cb->rdma_sq_wr.opcode = IB_WR_RDMA_READ;
>         if (cb->mem == FASTREG) {
>             /*
>              * Immediately follow the read with a
>              * fenced LOCAL_INV.
>              */
>             cb->rdma_sq_wr.next = &inv;
>             memset(&inv, 0, sizeof inv);
>             inv.opcode = IB_WR_LOCAL_INV;
>             inv.ex.invalidate_rkey = cb->fastreg_mr->rkey;
>             inv.send_flags = IB_SEND_FENCE;
>         }
>     }
>
>     ret = ib_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr);
>     if (ret) {
>         printk(KERN_ERR PFX "post send error %d\n", ret);
>         break;
>     }
>     cb->rdma_sq_wr.next = NULL;
> the last line, is that safe? There's an invalidate wr following
> rdma_sq_wr. This line will not disturb invalidate wr being performed in
> lower level, right?
>

Correct. The work requests are copied by the verbs layer. So once you
return from ib_post_send, the send work request structure itself can be
reused/freed/whatever.
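The copy-at-post behaviour described above can be illustrated with a toy model. This is not the verbs API: `toy_wr`, `toy_sq`, and `toy_post_send` are hypothetical stand-ins that only show why, once the post call has copied the chain, clearing `next` in the caller's structure cannot affect what was queued.

```c
#include <stddef.h>

/* Toy stand-in for a chained work request (not struct ib_send_wr). */
struct toy_wr {
    int opcode;
    struct toy_wr *next;
};

#define TOY_SQ_DEPTH 8

/* Toy send queue that stores its own copy of each posted opcode. */
struct toy_sq {
    int opcodes[TOY_SQ_DEPTH];
    size_t count;
};

/*
 * Walk the chain and copy every request into the queue.
 * Returns 0 on success, -1 if the queue fills up part way through.
 */
static int toy_post_send(struct toy_sq *sq, const struct toy_wr *wr)
{
    for (; wr != NULL; wr = wr->next) {
        if (sq->count == TOY_SQ_DEPTH)
            return -1;
        sq->opcodes[sq->count++] = wr->opcode;
    }
    return 0;
}
```

In this model, posting a read chained to an invalidate leaves two entries in the queue, and setting the read's `next` pointer to `NULL` afterwards changes nothing in `sq`. The buffers that the scatter/gather entries point to are a different matter and must stay valid until the completion arrives.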
[parent not found: <201001262046537030587@inspur.com>]
[parent not found: <201001262046537030587-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>]
* Re: Problems using krping
       [not found] ` <201001262046537030587-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
@ 2010-01-26 15:02   ` Steve Wise
  0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2010-01-26 15:02 UTC
To: lihaidong; +Cc: linux-rdma

lihaidong wrote:
> Mr.Wise

You need to start calling me Steve. You're making me feel older than I
am. ;-)

> When krping setup buffers failed, ib_free_fast_reg_page_list is used;
> when krping frees buffers, ib_free_fast_reg_page_list is not used.
> Though krping still works well, could you comment on that?

Sounds like a bug.

Steve.
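A common way to avoid this kind of mismatch between the error path and the normal teardown path is to route both through a single free function that tolerates partially initialised state. The following is a generic user-space sketch of that pattern using `malloc`/`free`; it is not the krping fix, and the struct and function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the resources set up per connection. */
struct bufs {
    void *page_list;
    void *mr;
};

/* Single teardown path: safe to call on partially set-up state. */
static void bufs_free(struct bufs *b)
{
    free(b->page_list);   /* free(NULL) is a no-op */
    b->page_list = NULL;
    free(b->mr);
    b->mr = NULL;
}

/*
 * Allocate both resources. On any failure, undo everything through the
 * same bufs_free() that normal teardown uses, so the two paths cannot
 * drift apart. fail_second simulates a failure of the second step.
 */
static int bufs_setup(struct bufs *b, int fail_second)
{
    b->page_list = malloc(64);
    b->mr = fail_second ? NULL : malloc(64);
    if (b->page_list == NULL || b->mr == NULL) {
        bufs_free(b);
        return -1;
    }
    return 0;
}
```

Because every resource is released in one place, adding a new allocation to setup only requires one matching line in `bufs_free()`.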