From: John Fastabend <john.fastabend@gmail.com>
To: Tom Herbert <tom@herbertland.com>,
Jesper Dangaard Brouer <brouer@redhat.com>
Cc: David Miller <davem@davemloft.net>,
Eric Dumazet <eric.dumazet@gmail.com>,
Or Gerlitz <gerlitz.or@gmail.com>,
Eric Dumazet <edumazet@google.com>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
Alexander Duyck <alexander.duyck@gmail.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Daniel Borkmann <borkmann@iogearbox.net>,
Marek Majkowski <marek@cloudflare.com>,
Hannes Frederic Sowa <hannes@stressinduktion.org>,
Florian Westphal <fw@strlen.de>, Paolo Abeni <pabeni@redhat.com>,
John Fastabend <john.r.fastabend@intel.com>,
Amir Vadai <amirva@gmail.com>,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: Optimizing instruction-cache, more packets at each stage
Date: Sun, 24 Jan 2016 13:41:13 -0800
Message-ID: <56A544F9.9000501@gmail.com>
In-Reply-To: <CALx6S36Tq+4wwC5Z=6cuBt6k3beb_d7gy_ofdWf86KCxLjSaew@mail.gmail.com>

On 16-01-24 12:09 PM, Tom Herbert wrote:
> On Sun, Jan 24, 2016 at 6:28 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
>> On Thu, 21 Jan 2016 10:54:01 -0800 (PST)
>> David Miller <davem@davemloft.net> wrote:
>>
>>> From: Jesper Dangaard Brouer <brouer@redhat.com>
>>> Date: Thu, 21 Jan 2016 12:27:30 +0100
>>>
>>>> eth_type_trans() does two things:
>>>>
>>>> 1) determine skb->protocol
>>>> 2) setup skb->pkt_type = PACKET_{BROADCAST,MULTICAST,OTHERHOST}
>>>>
>>>> Could the HW descriptor deliver the "proto", or perhaps just some bits
>>>> on the most common proto's?
>>>>
>>>> The skb->pkt_type don't need many bits. And I bet the HW already have
>>>> the information. The BROADCAST and MULTICAST indication are easy. The
>>>> PACKET_OTHERHOST, can be turned around, by instead set a PACKET_HOST
>>>> indication, if the eth->h_dest match the devices dev->dev_addr (else a
>>>> SW compare is required).
>>>>
>>>> Is that doable in hardware?
>>>
>>> I feel like we've had this discussion before several years ago.
>>>
>>> I think having just the protocol value would be enough.
>>>
>>> skb->pkt_type we could deal with by using always an accessor and
>>> evaluating it lazily. Nothing needs it until we hit ip_rcv() or
>>> similar.
>>
>> First I thought, I liked the idea delaying the eval of skb->pkt_type.
>>
>> BUT then I realized, what if we take this even further. What if we
>> actually use this information, for something useful, at this very
>> early RX stage.
>>
>> The information I'm interested in, from the HW descriptor, is if this
>> packet is NOT for local delivery. If so, we can send the packet on a
>> "fast-forward" code path.
>>
>> Think about bridging packets to a guest OS. Because we know very
>> early at RX (from packet HW descriptor) we might even avoid allocating
>> a SKB. We could just "forward" the packet-page to the guest OS.
>>
>> Taking Eric's idea, of remote CPUs, we could even send these
>> packet-pages to a remote CPU (e.g. where the guest OS is running),
>> without having touched a single cache-line in the packet-data. I
>> would still bundle them up first, to amortize the (100-133ns) cost of
>> transferring something to another CPU.
>>
> You mean like RPS/RFS/aRFS/flow_director already does (except for the
> zero-touch part)?
>

You could also look at ATR in the ixgbe/i40e drivers, which on xmit
uses a tuple to try to force the hardware to receive on the same queue
pair as the sending side. The idea is that you can bind tx/rx queue
pairs to a core and send/recv on the same core, which tends to be an
OK strategy, although not always: it is sometimes better to tx and rx
on separate cores.

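To make the ATR idea concrete, here is a minimal user-space sketch of
the bookkeeping, not the ixgbe/i40e implementation. The real drivers
program hardware filters from transmitted packets; here a small software
table stands in for the hardware. All names (atr_table, atr_note_tx,
atr_pick_rxq, flow_hash) are hypothetical, and the hash is an arbitrary
symmetric mix chosen so that both directions of a flow land in the same
slot.

```c
#include <stdint.h>

#define ATR_TABLE_SIZE 256
#define ATR_NO_QUEUE   (-1)

/* Hypothetical flow key: IPv4 addresses and L4 ports. */
struct flow_tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Software stand-in for the NIC's filter table (assumption). */
struct atr_table {
	int queue[ATR_TABLE_SIZE];
};

void atr_init(struct atr_table *t)
{
	for (int i = 0; i < ATR_TABLE_SIZE; i++)
		t->queue[i] = ATR_NO_QUEUE;
}

/* Symmetric hash: swapping source and destination gives the same
 * slot, so the reply direction finds the entry installed on xmit. */
uint32_t flow_hash(const struct flow_tuple *f)
{
	uint32_t addrs = f->saddr ^ f->daddr;
	uint32_t ports = (uint32_t)f->sport ^ (uint32_t)f->dport;
	uint32_t h = addrs ^ (ports * 0x9e3779b1u);

	h ^= h >> 16;
	return h % ATR_TABLE_SIZE;
}

/* On xmit: remember which queue pair sent this flow.
 * A colliding flow simply overwrites the slot. */
void atr_note_tx(struct atr_table *t, const struct flow_tuple *f, int txq)
{
	t->queue[flow_hash(f)] = txq;
}

/* On recv: steer to the remembered queue pair, else a default. */
int atr_pick_rxq(const struct atr_table *t, const struct flow_tuple *f,
		 int default_q)
{
	int q = t->queue[flow_hash(f)];

	return q == ATR_NO_QUEUE ? default_q : q;
}
```

With each queue pair bound to one core, a flow sent from queue 3 has its
replies steered back to queue 3, so tx and rx run on the same core.
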
>> The data-cache trick, would be to instruct prefetcher only to start
>> prefetching to L3 or L2, when these packet are destined for a remote
>> CPU. At-least Intel CPUs have prefetch operations that specify only
>> L2/L3 cache.
>>
>>
>> Maybe, we need a combined solution. Lazy eval skb->pkt_type, for
>> local delivery, but set the information if avail from HW desc. And
>> fast page-forward don't even need a SKB.
>>
>> --
>> Best regards,
>> Jesper Dangaard Brouer
>> MSc.CS, Principal Kernel Engineer at Red Hat
>> Author of http://www.iptv-analyzer.org
>> LinkedIn: http://www.linkedin.com/in/brouer
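
For reference, the pkt_type classification and the lazy-evaluation idea
quoted above can be sketched in user-space C. This is only a sketch
under stated assumptions, not the kernel's eth_type_trans(): struct
rx_pkt, classify_dest() and rx_pkt_type() are hypothetical names, and
the enum values simply mirror the kernel's PACKET_* constants. The
destination MAC is only read the first time something asks for the
type, which matches the suggestion that nothing needs it until ip_rcv()
or similar.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Same numeric values as PACKET_HOST..PACKET_OTHERHOST in the kernel. */
enum pkt_type {
	PKT_HOST      = 0,
	PKT_BROADCAST = 1,
	PKT_MULTICAST = 2,
	PKT_OTHERHOST = 3,
};

/* Hypothetical per-packet state: a pointer to the destination MAC
 * plus a cached classification that starts out invalid. */
struct rx_pkt {
	const uint8_t *dest;
	enum pkt_type type;
	bool type_valid;
};

/* Software classification of the destination MAC address. */
enum pkt_type classify_dest(const uint8_t *dest, const uint8_t *dev_addr)
{
	static const uint8_t bcast[ETH_ALEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	/* The low bit of the first octet marks a group address. */
	if (dest[0] & 0x01) {
		if (memcmp(dest, bcast, ETH_ALEN) == 0)
			return PKT_BROADCAST;
		return PKT_MULTICAST;
	}
	if (memcmp(dest, dev_addr, ETH_ALEN) == 0)
		return PKT_HOST;
	return PKT_OTHERHOST;
}

/* Lazy accessor: touch the packet data only on first use. A driver
 * with a usable HW descriptor hint could instead fill in type and set
 * type_valid at RX time, skipping the compare entirely. */
enum pkt_type rx_pkt_type(struct rx_pkt *p, const uint8_t *dev_addr)
{
	if (!p->type_valid) {
		p->type = classify_dest(p->dest, dev_addr);
		p->type_valid = true;
	}
	return p->type;
}
```
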