From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Michael S. Tsirkin" Subject: Re: [RFC PATCH v2 03/14] xsk: add umem fill queue support and mmap Date: Thu, 12 Apr 2018 05:15:42 +0300 Message-ID: <20180412050542-mutt-send-email-mst@kernel.org> References: <20180327165919.17933-1-bjorn.topel@gmail.com> <20180327165919.17933-4-bjorn.topel@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Cc: magnus.karlsson@intel.com, alexander.h.duyck@intel.com, alexander.duyck@gmail.com, john.fastabend@gmail.com, ast@fb.com, brouer@redhat.com, willemdebruijn.kernel@gmail.com, daniel@iogearbox.net, netdev@vger.kernel.org, michael.lundkvist@ericsson.com, jesse.brandeburg@intel.com, anjali.singhai@intel.com, qi.z.zhang@intel.com, ravineet.singh@ericsson.com To: =?iso-8859-1?Q?Bj=F6rn_T=F6pel?= Return-path: Received: from mx3-rdu2.redhat.com ([66.187.233.73]:45320 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751633AbeDLCPq (ORCPT ); Wed, 11 Apr 2018 22:15:46 -0400 Content-Disposition: inline In-Reply-To: <20180327165919.17933-4-bjorn.topel@gmail.com> Sender: netdev-owner@vger.kernel.org List-ID: On Tue, Mar 27, 2018 at 06:59:08PM +0200, Björn Töpel wrote: > @@ -30,4 +31,18 @@ struct xdp_umem_reg { > __u32 frame_headroom; /* Frame head room */ > }; > > +/* Pgoff for mmaping the rings */ > +#define XDP_UMEM_PGOFF_FILL_QUEUE 0x100000000 > + > +struct xdp_queue { > + __u32 head_idx __attribute__((aligned(64))); > + __u32 tail_idx __attribute__((aligned(64))); > +}; > + > +/* Used for the fill and completion queues for buffers */ > +struct xdp_umem_queue { > + struct xdp_queue ptrs; > + __u32 desc[0] __attribute__((aligned(64))); > +}; > + > #endif /* _LINUX_IF_XDP_H */ So IIUC it's a head/tail ring of 32 bit descriptors. In my experience (from implementing ptr_ring) this implies that head/tail cache lines bounce a lot between CPUs. Caching will help some. You are also forced to use barriers to check validity which is slow on some architectures. If instead you can use a special descriptor value (e.g. 0) as a valid signal, things work much better: - you read descriptor atomically, if it's not 0 it's fine - same with write - write 0 to pass it to the other side - there is a data dependency so no need for barriers (except on dec alpha) - no need for power of 2 limitations, you can make it any size you like - easy to resize too architecture (if not implementation) would be shared with ptr_ring so some of the optimization ideas like batched updates could be lifted from there. When I was building ptr_ring, any head/tail design underperformed storing valid flag with data itself. YMMV. -- MST