From mboxrd@z Thu Jan 1 00:00:00 1970
From: jthumshirn@suse.de (Johannes Thumshirn)
Date: Tue, 21 Mar 2017 09:12:56 -0400
Subject: Need some pointers to debug a KASAN splat in NVMe over Fabrics with rdma-rxe
In-Reply-To:
References: <5b16edcf-c39a-bc4b-0e32-8ccfb5d75efb@suse.de> <38a12b22-7bad-c069-4c96-39fc93d20d79@suse.de>
Message-ID: <20170321131255.GA24294@linux-x5ow.site>

On Thu, Mar 09, 2017 at 12:59:31PM +0200, Moni Shoua wrote:
> >> 2. You can take a look at librxe implementation of init_send_wqe() (it
> >> looks slightly different from kernel's implementation) and see what
> >> happens if you change implementation accordingly.
> >
> > OK I'll have a look and hopefully come back with a (RFC) patch (fingers
> > crossed).
>
> Thanks. Waiting to see what you found and in the meantime I'll try to
> reproduce in my setup

Hi Moni,

I got the NVMf with RXE connection problems (minus the KASAN splats) traced
down to the check_rkey() function. It bails out on this code in check_rkey():

	if (pkt->mask & RXE_WRITE_MASK) {
		if (resid > mtu) {
			if (pktlen != mtu || bth_pad(pkt)) {
				state = RESPST_ERR_LENGTH;
				goto err;
			}
			qp->resp.resid = mtu;
		} else {
			if (pktlen != resid) {

This even happens if I set the mtu to 9000.

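To make the failing condition easier to poke at outside the kernel, here is a
minimal userspace sketch of the length check quoted above. It is not the driver
code: check_write_len(), LEN_OK and LEN_ERR are names made up for illustration,
and since the else branch is cut off in the quote, treating a mismatch there as
a length error is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; LEN_ERR stands in for RESPST_ERR_LENGTH. */
enum len_check_result {
	LEN_OK,
	LEN_ERR,
};

/*
 * Userspace model of the quoted RDMA WRITE length check.
 *
 * resid:     bytes still expected for the current request
 * mtu:       path MTU in bytes
 * pktlen:    payload length of the packet being checked
 * pad:       pad count, i.e. what bth_pad(pkt) would return
 * new_resid: on LEN_OK, the value the quoted code leaves in qp->resp.resid
 */
static enum len_check_result
check_write_len(uint32_t resid, uint32_t mtu, uint32_t pktlen,
		uint8_t pad, uint32_t *new_resid)
{
	assert(mtu > 0);
	assert(new_resid != NULL);

	if (resid > mtu) {
		/* More data to come: payload must be exactly one MTU, unpadded. */
		if (pktlen != mtu || pad)
			return LEN_ERR;
		*new_resid = mtu;	/* mirrors "qp->resp.resid = mtu" */
	} else {
		/*
		 * ASSUMPTION: the quote is truncated here; this sketch treats
		 * a payload/residual mismatch as a length error as well.
		 */
		if (pktlen != resid)
			return LEN_ERR;
		*new_resid = resid;
	}

	return LEN_OK;
}
```

With mtu = 4096, a packet with pktlen = 4096 passes while resid is 12288, but
the very same packet is rejected once resid has dropped to 0, because the else
branch then requires pktlen == resid.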
I instrumented the driver to get some more debug information out of it and
here's the last output I see before nvme enters an error state and reconnects:

rdma_rxe: write_data_in: data_len: 4096, qp->resp.resid: 4096
rdma_rxe: check_rkey: mtu: 4096, resid: 12288, pktlen: 4096
rdma_rxe: write_data_in: data_len: 4096, qp->resp.resid: 4096
rdma_rxe: check_rkey: mtu: 4096, resid: 0, pktlen: 4096
rdma_rxe: qp#19 moved to error state
nvme nvme0: RECV for CQE 0xffff88001f0ef240 failed with status WR flushed (5)

qp->resp.resid comes from this hunk in check_rkey():

	if (pkt->mask & (RXE_READ_MASK | RXE_WRITE_MASK)) {
		if (pkt->mask & RXE_RETH_MASK) {
			qp->resp.va = reth_va(pkt);
			qp->resp.rkey = reth_rkey(pkt);
			qp->resp.resid = reth_len(pkt);

So I suppose reth_len() has some hiccups here.

I'll continue debugging and report back any new findings.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850