From mboxrd@z Thu Jan 1 00:00:00 1970
From: Venkat Venkatsubra
Subject: Re: rds cq event handler issue
Date: Tue, 06 Mar 2012 12:58:05 -0600
Message-ID: <4F565E3D.8030206@oracle.com>
References: <4F5651E5.1020005@opengridcomputing.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-rdma, Netdev, Vipul Pandya
To: Steve Wise
Return-path:
In-Reply-To: <4F5651E5.1020005-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: netdev.vger.kernel.org

On 3/6/2012 12:05 PM, Steve Wise wrote:
> Hey Venkat,
>
> I think I see a bug in the RDS RDMA module where RDS is not adhering
> to the RDMA locking context. From the kernel tree
> Documentation/infiniband/core_locking.txt:
>
> ---
> The context in which completion event and asynchronous event
> callbacks run is not defined. Depending on the low-level driver, it
> may be process context, softirq context, or interrupt context.
> Upper level protocol consumers may not sleep in a callback.
> ---
>
> So RDMA ULPs cannot assume any certain context for their callback
> functions. Yet I get a BUG_ON() when running RDS with iw_cxgb3 where
> RDS is bugging in rds_rdma_free_op():
>
> ---
>         /* Mark page dirty if it was possibly modified, which
>          * is the case for a RDMA_READ which copies from remote
>          * to local memory */
>         if (!ro->op_write) {
>                 BUG_ON(irqs_disabled());
>                 set_page_dirty(page);
>         }
> ---
>
> And rds_rdma_free_op() can be called in the cq callback path. Here's
> a stack trace when it bugged:
>
> ---
> Call Trace:
> [] :rds:rds_message_purge+0x54/0x79
> [] :rds:rds_message_put+0x41/0x4c
> [] :rds_rdma:rds_iw_send_unmap_rm+0xe2/0xf2
> [] :rds_rdma:rds_iw_send_cq_comp_handler+0x193/0x2e5
> [] :iw_cxgb3:iwch_ev_dispatch+0x1df/0x2b1
> [] :iw_cxgb3:cxio_hal_ev_handler+0x6b/0xb4
> [] :cxgb3:process_rx+0x3d/0xa0
> [] :cxgb3:process_responses+0x120c/0x1350
> ---
>
> iwch_ev_dispatch() explicitly disables irqs to ensure proper
> serialization:
>
> ---
>         spin_lock_irqsave(&chp->comp_handler_lock, flag);
>         (*chp->ibcq.comp_handler)(&chp->ibcq, chp->ibcq.cq_context);
>         spin_unlock_irqrestore(&chp->comp_handler_lock, flag);
> ---
>
> I'm not sure if that BUG_ON() in rds_rdma_free_op() is valid or not.
> If it is valid, then RDS needs to run this logic in a safe context,
> not in the context of the CQ callback. If the BUG_ON() is not valid,
> we can remove it :).
>
> Can you comment?
>
> Thanks,
>
> Steve.
>

Hi Steve,

Our latest internal code has a WARN_ON_ONCE() instead:

---------------------
        /* Mark page dirty if it was possibly modified, which
         * is the case for a RDMA_READ which copies from remote
         * to local memory */
        if (!ro->op_write) {
                WARN_ON_ONCE(page_mapping(page) && irqs_disabled());
                set_page_dirty(page);
        }
---------------------

Venkat
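
For reference, the other option raised in the quoted mail (running the page-dirtying logic in a safe context rather than in the CQ callback) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the RDS code: every `sim_` name below is hypothetical, the `sim_irqs_disabled` flag merely models `irqs_disabled()`, and the singly linked list stands in for whatever deferral mechanism a real fix would use (in the kernel that would presumably be something like a workqueue).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct sim_page {
	bool dirty;
	struct sim_page *next_deferred;
};

static bool sim_irqs_disabled;          /* models irqs_disabled() */
static struct sim_page *deferred_head;  /* pages waiting to be dirtied */

/* Models set_page_dirty(): in this sketch it is only legal with "irqs" on. */
static void sim_set_page_dirty(struct sim_page *page)
{
	assert(!sim_irqs_disabled);
	page->dirty = true;
}

/* Runs inside the simulated CQ callback: it only queues, never dirties. */
static void sim_free_op_from_callback(struct sim_page *page, bool op_write)
{
	if (op_write)
		return; /* RDMA write: the local page was not modified */
	page->next_deferred = deferred_head;
	deferred_head = page;
}

/* Models a low-level driver invoking the handler with irqs disabled. */
static void sim_dispatch_completion(struct sim_page *page, bool op_write)
{
	sim_irqs_disabled = true;
	sim_free_op_from_callback(page, op_write);
	sim_irqs_disabled = false;
}

/* Models a worker in process context; returns how many pages it dirtied. */
static int sim_drain_deferred(void)
{
	int n = 0;

	while (deferred_head) {
		struct sim_page *page = deferred_head;

		deferred_head = page->next_deferred;
		page->next_deferred = NULL;
		sim_set_page_dirty(page);
		n++;
	}
	return n;
}
```

The point of the split is that the callback path never reaches the dirtying step while "irqs" are off; a page that was the target of an RDMA read stays queued until `sim_drain_deferred()` runs in the safe context.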