From: Neil Horman
Subject: [RFC] Idea about increasing efficiency of skb allocation in network devices
Date: Sun, 26 Jul 2009 20:36:09 -0400
Message-ID: <20090727003609.GA30438@localhost.localdomain>
To: netdev@vger.kernel.org
Cc: nhorman@tuxdriver.com

Hey all-
	I've been thinking about an idea lately, and I'm starting to tinker
with an implementation, so before I go too far down any one path I thought
I'd solicit comments on it, just to avoid early design errors and the like.
Please find my proposal below. Feel free to openly ridicule it if you think
it's completely off base or pointless. Any and all criticism welcome. Thanks!

Problem statement:
Currently the networking stack receive path consists of a set of producers
(the network drivers, which allocate skbs to receive on-the-wire data into)
and a set of consumers (user space applications and other network devices,
which free those skbs when they are finished with them). These consumers
and producers are dynamic: additional consumers and producers can be added
almost at will within the system.

There is a potential inefficiency in this receive path on NUMA systems.
Since skb data buffers are allocated with only minimal regard to the NUMA
node on which the producer lives (following the standard vm policy of
trying the local node first), it is entirely possible that the consumer of
a given frame lives on a different NUMA node than the one the frame was
allocated on. This disparity makes it slower for an application to copy the
data out of the kernel, since the copy has to cross a greater number of
memory bridges.

Proposed solution:
Since network devices DMA received frames into whatever buffer they are
given (which can usually be at an arbitrary location, as the device must
potentially cross several PCI buses to reach any memory location anyway),
I'm postulating that we could increase receive path efficiency by giving
the driver layer a hint about which node to allocate an skb data buffer on.
The hint would come from a feedback mechanism: provide a callback function
via the skb that accepts the skb and the originating net_device, and have
that callback track statistics on which NUMA nodes consume (read: copy data
from) the skbs produced by each net device. Then, when that net device next
allocates an skb (perhaps via netdev_alloc_skb), we can use that statistical
profile to decide whether the data buffer should go on the local node or on
a remote node instead. Rough sketches of both halves follow below.

Ideally, this 'consumer based allocation bias' would reduce the time it
takes to transfer received buffers to user space and make the overall
receive path more efficient. I see lots of opportunity here to develop
tools to measure whatever speedup this provides (perhaps via ftrace
plugins), as well as various algorithms to better predict which node to
allocate skbs on.
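To make the feedback half concrete, here is a rough sketch of what I have
in mind. None of these names exist in the tree today: the netdev_node_stats
structure, the node_stats field on net_device, and the
netdev_note_skb_consumer() hook are all made up for illustration, and the
hook would need to be wired into the copy-to-user path (somewhere near
skb_copy_datagram_iovec(), say):

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/nodemask.h>
#include <linux/topology.h>

/* Per-device counters of which NUMA node skb data gets consumed on. */
struct netdev_node_stats {
	atomic_t consumed[MAX_NUMNODES];	/* copies observed per node */
};

/*
 * Record the node that is copying this skb's data to user space.
 * 'node_stats' is the hypothetical new net_device member mentioned above.
 */
static inline void netdev_note_skb_consumer(struct sk_buff *skb)
{
	struct net_device *dev = skb->dev;

	if (dev && dev->node_stats)
		atomic_inc(&dev->node_stats->consumed[numa_node_id()]);
}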
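The allocation half might then look something like the sketch below.
__alloc_skb() already takes a node argument, so most of the plumbing is in
place; netdev_preferred_node() and its 'node with the most copies wins'
policy are again placeholders for whatever smarter predictor falls out of
the measurements:

/* Pick the node that has historically consumed the most skbs from dev. */
static int netdev_preferred_node(struct net_device *dev)
{
	int node, best = numa_node_id();
	unsigned int max = 0;

	if (!dev->node_stats)
		return best;

	for_each_online_node(node) {
		unsigned int n = atomic_read(&dev->node_stats->consumed[node]);

		if (n > max) {
			max = n;
			best = node;
		}
	}
	return best;
}

/* Like netdev_alloc_skb(), but biased toward the likely consumer's node. */
static struct sk_buff *netdev_alloc_skb_biased(struct net_device *dev,
					       unsigned int length)
{
	struct sk_buff *skb;

	skb = __alloc_skb(length + NET_SKB_PAD, GFP_ATOMIC, 0,
			  netdev_preferred_node(dev));
	if (skb) {
		skb_reserve(skb, NET_SKB_PAD);
		skb->dev = dev;
	}
	return skb;
}

A driver would call this (or netdev_alloc_skb() itself would grow the bias)
when refilling its rx ring.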
Obviously, the code is going to do the talking here, but I wanted to get
the idea out there so that anyone who wanted to could point out anything
obvious that would lead to the conclusion that I'm nuts. Feel free to tear
it all apart, or, on the off chance that this has legs, to suggest
improvements or features you'd like to see. Thanks!

Neil