From mboxrd@z Thu Jan 1 00:00:00 1970
From: Matt Wilson
Subject: Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
Date: Mon, 17 Dec 2012 12:09:52 -0800
Message-ID: <20121217200950.GA29382@u109add4315675089e695.ant.amazon.com>
References: <7D7C26B1462EB14CB0E7246697A18C1312C3D2@INHYMS111A.ca.com> <1346314031.27277.20.camel@zakaz.uk.xensource.com> <20121204232305.GA5301@u109add4315675089e695.ant.amazon.com> <7D7C26B1462EB14CB0E7246697A18C13143D77@INHYMS111A.ca.com> <20121206053521.GA3482@u109add4315675089e695.ant.amazon.com> <7D7C26B1462EB14CB0E7246697A18C13145668@INHYMS111A.ca.com> <20121211213437.GA29869@u109add4315675089e695.ant.amazon.com> <7D7C26B1462EB14CB0E7246697A18C1314611A@INHYMS111A.ca.com> <20121214185304.GA9236@u109add4315675089e695.ant.amazon.com> <1355743598.14620.43.camel@zakaz.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1355743598.14620.43.camel@zakaz.uk.xensource.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell
Cc: "xen-devel@lists.xen.org" , "Palagummi, Siva" , Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On Mon, Dec 17, 2012 at 11:26:38AM +0000, Ian Campbell wrote:
> On Fri, 2012-12-14 at 18:53 +0000, Matt Wilson wrote:
> > On Thu, Dec 13, 2012 at 11:12:50PM +0000, Palagummi, Siva wrote:
> > > > -----Original Message-----
> > > > From: Matt Wilson [mailto:msw@amazon.com]
> > > > Sent: Wednesday, December 12, 2012 3:05 AM
> > > >
> > > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote:
> > > > >
> > > > > You can clearly see below that copy_off is input to
> > > > > start_new_rx_buffer while copying frags.
> > > >
> > > > Yes, but that's the right thing to do.
> > > > copy_off should be set to the
> > > > destination offset after copying the last byte of linear data, which
> > > > means "skb_headlen(skb) % PAGE_SIZE" is correct.
> > > >
> > >
> > > No. It is not correct for two reasons. For example what if
> > > skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would
> > > be set to ZERO. And now if there exists some data in frags, ZERO
> > > will be passed in as copy_off value and start_new_rx_buffer will
> > > return FALSE. And second reason is the obvious case from the current
> > > code where "offset_in_page(skb->data)" size hole will be left in the
> > > first buffer after first pass in case remaining data that need to be
> > > copied is going to overflow the first buffer.
> >
> > Right, and I'm arguing that having the code leave a hole is less
> > desirable than potentially increasing the number of copy operations.
> > I'd like to hear from Ian and others whether using the buffers
> > efficiently is more important than reducing copy operations.
> > Intuitively, I think it's more important to use the ring efficiently.
>
> Do you mean the ring or the actual buffers?

Sorry, the actual buffers.

> The current code tries to coalesce multiple small frags/heads because it
> is usually trivial but doesn't try too hard with multiple larger frags,
> since they take up most of a page by themselves anyway. I suppose this
> does waste a bit of buffer space and therefore could take more ring
> slots, but it's not clear to me how much this matters in practice (it
> might be tricky to measure this with any realistic workload).

In the case where we're consistently handling large heads (like when
using an MTU value of 9000 for streaming traffic), we're wasting 1/3 of
the available buffers.
> The cost of splitting a copy into two should be low though, the copies
> are already batched into a single hypercall and I'd expect things to be
> mostly dominated by the data copy itself rather than the setup of each
> individual op, which would argue for splitting a copy in two if that
> helps fill the buffers.

That was my thought as well. We're testing a patch that does just this
now.

> The flip side is that once you get past the headers etc the paged frags
> likely tend to either bits and bobs (fine) or mostly whole pages. In the
> whole pages case trying to fill the buffers will result in every copy
> getting split. My gut tells me that the whole pages case probably
> dominates, but I'm not sure what the real world impact of splitting all
> the copies would be.

Right, I'm less concerned about the paged frags. It might make sense to
skip some space so that the copying can be page aligned. I suppose it
depends on how many different pages are in the list, and what the total
size is.

In practice I'd think it would be rare to see a paged SKB for ingress
traffic to domUs unless there is significant intra-host communication
(dom0->domU, domU->domU). When domU ingress traffic is originating from
an Ethernet device it shouldn't be paged. Paged SKBs would come into
play when an SKB is formed for transmit on an egress device that is
SG-capable. Or am I misunderstanding how paged SKBs are used these days?

Matt