From: Jason Gunthorpe
Subject: Re: Kernel fast memory registration API proposal [RFC]
Date: Fri, 17 Jul 2015 14:36:02 -0600
Message-ID: <20150717203602.GA21949@obsidianresearch.com>
References: <20150715171926.GB23588@obsidianresearch.com>
 <20150715224928.GA941@obsidianresearch.com>
 <20150716174046.GB3680@obsidianresearch.com>
 <20150716204932.GA10638@obsidianresearch.com>
 <62F9F5B8-0A18-4DF8-B47E-7408BFFE9904@oracle.com>
 <20150717172141.GA15808@obsidianresearch.com>
 <9A70883F-9963-42D0-9F5C-EF49F822A037@oracle.com>
To: Chuck Lever
Cc: Sagi Grimberg, Christoph Hellwig,
 "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org", Steve Wise,
 Or Gerlitz, Oren Duer, Bart Van Assche, Liran Liss, "Hefty, Sean",
 Doug Ledford, Tom Talpey
List-Id: linux-rdma@vger.kernel.org

On Fri, Jul 17, 2015 at 03:26:04PM -0400, Chuck Lever wrote:
> > I'd say the above is broadly typical for what I'd consider correct use
> > of a RDMA QP.. The three flow control loops of #0 should be fairly obvious
> > and explicit in the code.
>
> Jason, thanks for your comments and your time.

No problem, I hope you can work something out and keep participating
in the various new API discussions!

> Some send queue accounting is already in place (see DECR_CQCOUNT).
> I’m sure that can be enhanced. What may be missing is a check for
> available send queue resources before dispatching the next RPC.

Just some more clarity and colour:

I talked about tracking SQEs, this is explicitly monitoring the SQ and
preventing overflow, but I'm assuming that there is a 1:1 mapping of
SQ to CQ -> ie the CQ is not shared.
In this case, the SQE limit is the smaller of the two queue depths and
tracking the SQEs also tracks the CQ space.

If the CQ is shared, then the CQ itself should also be tracked, and
nobody can post to a related Q without CQ space. This forms a fourth
flow control loop.

So language wise, talk about tracking SQE (send queue entries), and if
you have shared CQs then add a CQ count.

Implementation wise, I often use wrapping 64 bit counters to keep
track of this stuff. Every SQE post increments the head and every SCQ
reap increments the tail, (head-tail) < limit is the main math. This
lets the counter be used as a record, and aids debugging, see below

> However, if we start signaling more aggressively when the send
> queue is full, that means intentionally multiplying the completion
> and interrupt rate when the workload is heaviest. That could have
> performance scalability consequences.

Consider, it is also possible that the SQ is full because we are not
signaling enough: There are many unreaped entries.

There are many different schemes that are possible here.. What I
described was something simple and easy to understand, while still
thinking about various deadlock situations.

Something like this is a more complete example:

  uint64_t head_sqe;
  uint64_t tail_sqe;
  uint64_t signaled_sqe;

  if (need_signal ||
      (head_sqe - signaled_sqe) >= sqe_limit/2 ||
      ((head_sqe - tail_sqe) >= (sqe_limit - N) &&
       (head_sqe - signaled_sqe) >= sqe_limit/4) &&
       ring64_gt(signaled_sqe,tail_sqe)) {
     wr[0].send_flags |= IB_SEND_SIGNALED;
     signaled_sqe = head_sqe;
  }
  ib_post(..,1);
  head_sqe += 1;
  assert(head_sqe - tail_sqe < sqe_limit);

 - Every SQE that crosses a 1/2 marker gets a signal at the marker.
 - Upon going full we start signaling, unless we signaled recently,
   and the last signal has not been reaped.
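
To make the counter arithmetic concrete, here is a minimal standalone
model of the scheme above in plain C, with no verbs calls. It is only a
sketch under assumptions: struct sq_track, sq_has_room(), sq_post() and
sq_reap_through() are hypothetical names, near_full stands in for "N",
ring64_gt() is assumed to be the usual wrap-safe signed-difference
compare, and the CQ is assumed not to be shared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one SQ with its own (unshared) CQ. */
struct sq_track {
    uint64_t head_sqe;     /* total SQEs ever posted */
    uint64_t tail_sqe;     /* total SQEs ever reaped */
    uint64_t signaled_sqe; /* head_sqe value of the last signaled post */
    uint64_t sqe_limit;    /* usable send queue depth */
    uint64_t near_full;    /* the "N" slack from the example above */
};

/* Assumed definition: wrap-safe "a is after b" for free-running counters. */
static bool ring64_gt(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) > 0;
}

/* True if one more post keeps (head - tail) < limit after the increment. */
static bool sq_has_room(const struct sq_track *sq)
{
    return sq->head_sqe - sq->tail_sqe + 1 < sq->sqe_limit;
}

/* Account for one post; returns true if that WR should be signaled. */
static bool sq_post(struct sq_track *sq, bool need_signal)
{
    uint64_t since_signal = sq->head_sqe - sq->signaled_sqe;
    uint64_t outstanding = sq->head_sqe - sq->tail_sqe;
    bool signal = need_signal ||
                  since_signal >= sq->sqe_limit / 2 ||
                  (outstanding >= sq->sqe_limit - sq->near_full &&
                   since_signal >= sq->sqe_limit / 4 &&
                   ring64_gt(sq->signaled_sqe, sq->tail_sqe));

    if (signal)
        sq->signaled_sqe = sq->head_sqe;
    /* the actual post to the send queue would happen here */
    sq->head_sqe += 1;
    assert(sq->head_sqe - sq->tail_sqe < sq->sqe_limit);
    return signal;
}

/* A signaled completion for index idx reaps every SQE up to and including it. */
static void sq_reap_through(struct sq_track *sq, uint64_t idx)
{
    if (ring64_gt(idx + 1, sq->tail_sqe))
        sq->tail_sqe = idx + 1;
}
```

A caller would check sq_has_room() before dispatching, set the signaled
flag on the WR when sq_post() returns true, and call sq_reap_through()
from the send completion handler. Because the counters never reset,
head_sqe and tail_sqe double as a running record of how many posts and
reaps have happened, which is the debugging aid mentioned above.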
Jason