From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ian Campbell
Subject: [PATCH 0/4] skb paged fragment destructors
Date: Wed, 9 Nov 2011 15:01:35 +0000
Message-ID: <1320850895.955.172.camel@zakaz.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Jesse Brandeburg ,
To: David Miller
Return-path:
Received: from smtp.ctxuk.citrix.com ([62.200.22.115]:25179 "EHLO SMTP.EU.CITRIX.COM" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754146Ab1KIPBh (ORCPT ); Wed, 9 Nov 2011 10:01:37 -0500
Sender: netdev-owner@vger.kernel.org
List-ID:

The following series makes use of the skb fragment API (which is in 3.2)
to add a per-paged-fragment destructor callback. This can be used by
creators of skbs who are interested in the lifecycle of the pages
included in that skb after they have handed it off to the network stack.
I think these have all been posted before, but have been backed up
behind the skb fragment API.

The mail at [0] contains some more background and rationale, but
basically the completed series will allow entities which inject pages
into the networking stack to receive a notification when the stack has
really finished with those pages (i.e. including retransmissions,
clones, pull-ups etc.) and not just when the original skb is finished
with. This is beneficial to the many subsystems which wish to inject
pages into the network stack without giving up full ownership of those
pages' lifecycle. It implements something broadly along the lines of
what was described in [1].

I have also included a patch to the RPC subsystem which uses this API to
fix the bug which I describe at [2].

I presented this work at LPC in September and there was a
question/concern raised (by Jesse Brandeburg IIRC) regarding the
overhead of adding this extra field per fragment.
If I understand correctly, there have been performance regressions in
the past when an allocation outgrew one allocation size bucket and
therefore started using the next.

The change in data structure size resulting from this series is:

                                          BEFORE  AFTER
AMD64:  sizeof(struct skb_frag_struct)  = 16      24
        sizeof(struct skb_shared_info)  = 344     488
        sizeof(struct sk_buff)          = 240     240

i386:   sizeof(struct skb_frag_struct)  = 8       12
        sizeof(struct skb_shared_info)  = 188     260
        sizeof(struct sk_buff)          = 192     192

(I think these are representative of 32 and 64 bit arches generally)

On amd64 this doesn't in itself push the shared_info over a slab
boundary, but since the linear data is part of the same allocation, the
size of the linear data which will push us into the next size is reduced
from 168 to 24 bytes, which is effectively the same thing as pushing
directly into the next size. On i386 we go straight to the next bucket
(although the 68 bytes of slack available for the linear area becomes
252 in that larger size).

I'm not sure if this is a showstopper, or whether the particular issue
with slab still exists (or maybe it was only slab/slub/slob specific?).
I need to find some benchmark which might demonstrate the issue
(presumably something where frames are commonly 24