From: Eric Dumazet <eric.dumazet@gmail.com>
To: Brice Goglin <Brice.Goglin@inria.fr>
Cc: David Miller <davem@davemloft.net>,
nhorman@tuxdriver.com, netdev@vger.kernel.org
Subject: Re: [RFC] Idea about increasing efficency of skb allocation in network devices
Date: Mon, 27 Jul 2009 09:58:22 +0200
Message-ID: <4A6D5E1E.3090907@gmail.com>
In-Reply-To: <4A6D52FF.2030008@inria.fr>
Brice Goglin wrote:
> David Miller wrote:
>> From: Neil Horman <nhorman@tuxdriver.com>
>> Date: Sun, 26 Jul 2009 20:36:09 -0400
>>
>>
>>> Since Network devices dma their memory into a provided DMA
>>> buffer (which can usually be at an arbitrary location, as they must
>>> cross potentially several pci busses to reach any memory location),
>>> I'm postulating that it would increase our receive path efficiency
>>> to provide a hint to the driver layer as to which node to allocate
>>> an skb data buffer on. This hint would be determined by a feedback
>>> mechanism. I was thinking that we could provide a callback function
>>> via the skb, that accepted the skb and the originating net_device.
>>> This callback can track statistics on which numa nodes consume
>>> (read: copy data from) skbs that were produced by specific net
>>> devices. Then, when in the future that netdevice allocates a new
>>> skb (perhaps via netdev_alloc_skb), we can use that statistical
>>> profile to determine if the data buffer should be allocated on the
>>> local node, or on a remote node instead.
>>>
>> No matter what, you will do an inter-node memory operation.
>>
>> Unless, the consumer NUMA node is the same as the one the
>> device is on.
>>
>> Because since the device is on a NUMA node, if you DMA remotely
>> you've eaten the NUMA cost already.
>>
>> If you always DMA to the device's NUMA node (what we try to do now) at
>> least there is the possibility of eliminating cross-NUMA traffic.
>>
>> Better to move the application or stack processing towards the NUMA
>> node the network device is on, I think.
>>
>
> Is there an easy way to get this NUMA node from the application socket
> descriptor?
That's not easy: this information can change with every packet (think of
bonding setups that aggregate devices sitting on different NUMA nodes).

We could add a getsockopt() call to peek at this information for the next
data to be read from the socket. It would return the id of the node where the
skb data is sitting, hoping that the NIC driver did not copybreak it (i.e.
allocate a small skb and copy the device-provided data into it before
feeding the packet to the network stack).
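
For the simpler case of one physical NIC, the node the device itself sits on
can be read from sysfs. A minimal sketch in Python, assuming the usual
`/sys/class/net/<iface>/device/numa_node` and
`/sys/class/net/<bond>/bonding/slaves` layout. The function names are made up
for illustration, and this reports where the device sits, not where the data
of a given skb landed:

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")  # assumed standard sysfs mount point


def parse_numa_node(text):
    """Convert the contents of a numa_node file to a node id, or None.

    The kernel reports -1 when it has no NUMA information for the device.
    """
    node = int(text.strip())
    return node if node >= 0 else None


def device_numa_node(iface):
    """Return the NUMA node of a physical interface, or None if unknown."""
    path = SYSFS_NET / iface / "device" / "numa_node"
    try:
        return parse_numa_node(path.read_text())
    except (OSError, ValueError):
        # Virtual devices (bonds, bridges, lo) have no device/ directory.
        return None


def bond_member_nodes(bond):
    """Map each member of a bonding device to its NUMA node.

    More than one distinct node in the result is the case where the
    answer can change from one packet to the next.
    """
    members = (SYSFS_NET / bond / "bonding" / "slaves").read_text().split()
    return {name: device_numa_node(name) for name in members}
```

Something like `bond_member_nodes("bond0")` would then show whether a bond
spans several nodes; the interface name is only an example.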
> Also, one question that was raised at the Linux Symposium is: how do you
> know which processors run the receive queue for a specific connection ?
> It would be nice to have a way to retrieve such information in the
> application to avoid inter-node and inter-core/cache traffic.
All this depends on whether you have multiqueue devices, and on whether
traffic is spread across all queues.

Assuming you have a single-queue device, the only current way to handle
this is to think in reverse: bind the NIC interrupts to the appropriate set
of CPUs, and possibly bind the user application threads that deal with
network traffic to the same set.

Only background or CPU-hungry threads should be allowed to run
on foreign nodes.
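
The application side of that binding can be sketched as follows (Python,
Linux only). It assumes the per-node `cpulist` files under
`/sys/devices/system/node/`, and the helper names are hypothetical. The IRQ
side needs root, so it is shown only as the mask string one would write to
`/proc/irq/<irq>/smp_affinity`:

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Parse a kernel cpulist string such as "0-3,8-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single CPU ("5") has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def node_cpus(node):
    """Return the CPUs belonging to a NUMA node (assumed sysfs layout)."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text())


def pin_current_process(cpus):
    """Restrict the caller to the given CPUs (pid 0 means the caller)."""
    os.sched_setaffinity(0, cpus)


def cpus_to_irq_mask(cpus):
    """Format CPUs as a hex bitmask in comma-separated 32-bit groups."""
    digits = f"{sum(1 << cpu for cpu in set(cpus)):x}"
    groups = []
    while digits:
        groups.append(digits[-8:])  # peel off 32 bits (8 hex digits) at a time
        digits = digits[:-8]
    return ",".join(reversed(groups))
```

Calling `pin_current_process(node_cpus(0))` early, before worker threads are
created, lets those threads inherit the same CPU set; node 0 is only an
example and should be the node the NIC is attached to.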