* [PATCH net-next] vrf: Add ethernet header for pass through VRF device
@ 2015-08-23 18:41 David Ahern
2015-08-25 21:02 ` David Miller
0 siblings, 1 reply; 6+ messages in thread
From: David Ahern @ 2015-08-23 18:41 UTC
To: netdev; +Cc: shm, David Ahern
The change to use a custom dst broke tcpdump captures on the VRF device:
$ tcpdump -n -i vrf10
...
05:32:29.009362 IP 10.2.1.254 > 10.2.1.2: ICMP echo request, id 21989, seq 1, length 64
05:32:29.009855 00:00:40:01:8d:36 > 45:00:00:54:d6:6f, ethertype Unknown (0x0a02), length 84:
0x0000: 0102 0a02 01fe 0000 9181 55e5 0001 bd11 ..........U.....
0x0010: da55 0000 0000 bb5d 0700 0000 0000 1011 .U.....]........
0x0020: 1213 1415 1617 1819 1a1b 1c1d 1e1f 2021 ...............!
0x0030: 2223 2425 2627 2829 2a2b 2c2d 2e2f 3031 "#$%&'()*+,-./01
0x0040: 3233 3435 3637 234567
Local packets going through the VRF device are missing an ethernet header.
Fix by adding one and then stripping it off before pushing back to the IP
stack. With this patch you get the expected dumps:
...
05:36:15.713944 IP 10.2.1.254 > 10.2.1.2: ICMP echo request, id 23795, seq 1, length 64
05:36:15.714160 IP 10.2.1.2 > 10.2.1.254: ICMP echo reply, id 23795, seq 1, length 64
...
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
drivers/net/vrf.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
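A note on reading the broken capture in the commit message: the odd MAC addresses and the "Unknown (0x0a02)" ethertype look like the first 14 bytes of the reply's IPv4 header being decoded as an Ethernet header. The sketch below is a user-space illustration only. The byte array is reassembled from the capture output, and the helper names (fmt_mac, misread_ethertype, fmt_ipv4) are hypothetical, not part of the patch or of tcpdump.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* IPv4 header of the echo reply, reassembled from the broken capture:
 * the 12 bytes printed as MAC addresses, the 2 bytes printed as the
 * ethertype, and the first 6 bytes of the hex dump that follows. */
static const uint8_t ip_hdr[20] = {
	0x45, 0x00, 0x00, 0x54, 0xd6, 0x6f,	/* printed as destination MAC */
	0x00, 0x00, 0x40, 0x01, 0x8d, 0x36,	/* printed as source MAC */
	0x0a, 0x02,				/* printed as ethertype 0x0a02 */
	0x01, 0x02, 0x0a, 0x02, 0x01, 0xfe,	/* first 6 bytes of the hex dump */
};

/* Hypothetical helper: format 6 bytes the way a sniffer prints a MAC. */
static void fmt_mac(const uint8_t *p, char out[18])
{
	assert(p != NULL && out != NULL);
	snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
		 p[0], p[1], p[2], p[3], p[4], p[5]);
}

/* Hypothetical helper: the 16-bit value a sniffer reads at offset 12
 * when it assumes the buffer starts with an Ethernet header. */
static unsigned int misread_ethertype(const uint8_t *frame)
{
	assert(frame != NULL);
	return (unsigned int)frame[12] << 8 | frame[13];
}

/* Hypothetical helper: dotted-quad form of 4 address bytes. */
static void fmt_ipv4(const uint8_t *p, char out[16])
{
	assert(p != NULL && out != NULL);
	snprintf(out, 16, "%u.%u.%u.%u", p[0], p[1], p[2], p[3]);
}
```

Decoded as IPv4 instead, bytes 12-15 and 16-19 of the same array are the source and destination addresses of the echo reply that appears in the fixed capture (10.2.1.2 and 10.2.1.254), and bytes 2-3 (0x0054) match the reported length of 84.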
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index dbeffe789185..e5c792e4c224 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -219,6 +219,9 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 
 static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
 {
+	/* strip the ethernet header added for pass through VRF device */
+	__skb_pull(skb, skb_network_offset(skb));
+
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
 		return vrf_process_v4_outbound(skb, dev);
@@ -250,6 +253,17 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static netdev_tx_t vrf_finish(struct sock *sk, struct sk_buff *skb)
 {
+	int err;
+
+	__skb_pull(skb, skb_network_offset(skb));
+	err = dev_hard_header(skb, skb->dev, ntohs(skb->protocol),
+			      NULL, NULL, skb->len);
+
+	if (err < 0) {
+		vrf_tx_error(skb->dev, skb);
+		return -EINVAL;
+	}
+
 	return dev_queue_xmit(skb);
 }
 
--
2.3.2 (Apple Git-55)
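The push-then-strip sequence in the patch can be pictured with a small user-space model. This is a minimal sketch, assuming a flat buffer with headroom stands in for the skb; struct toy_skb and the toy_* functions are hypothetical analogues of skb_network_offset(), __skb_pull() and skb_push(), not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_HLEN 14

/* Toy stand-in for struct sk_buff: one flat buffer with headroom. */
struct toy_skb {
	uint8_t buf[128];
	uint8_t *data;		/* current start of packet data */
	uint8_t *network;	/* start of the IP header */
	size_t len;		/* bytes from data to the end of the packet */
};

/* Place an L3 packet in the buffer, leaving room for an L2 header. */
static void toy_init(struct toy_skb *skb, const uint8_t *ip, size_t ip_len)
{
	assert(ip_len <= sizeof(skb->buf) - TOY_ETH_HLEN);
	skb->data = skb->buf + TOY_ETH_HLEN;
	skb->network = skb->data;
	skb->len = ip_len;
	memcpy(skb->data, ip, ip_len);
}

/* Analogue of skb_network_offset(): distance from data to the IP header. */
static ptrdiff_t toy_network_offset(const struct toy_skb *skb)
{
	return skb->network - skb->data;
}

/* Analogue of __skb_pull(): drop n bytes from the front. */
static void toy_pull(struct toy_skb *skb, size_t n)
{
	assert(n <= skb->len);
	skb->data += n;
	skb->len -= n;
}

/* Analogue of skb_push() plus a minimal Ethernet header fill. */
static void toy_push_eth(struct toy_skb *skb, const uint8_t src[6],
			 uint16_t proto)
{
	assert(skb->data - skb->buf >= TOY_ETH_HLEN);	/* headroom check */
	skb->data -= TOY_ETH_HLEN;
	skb->len += TOY_ETH_HLEN;
	memset(skb->data, 0, 6);		/* no destination: all zeros */
	memcpy(skb->data + 6, src, 6);		/* source address */
	skb->data[12] = (uint8_t)(proto >> 8);	/* ethertype, network order */
	skb->data[13] = (uint8_t)(proto & 0xff);
}
```

In this model, pulling by toy_network_offset() is a no-op when data already points at the IP header and removes exactly the pushed header afterwards, which is why the same pull expression can appear in both hunks of the patch.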
* Re: [PATCH net-next] vrf: Add ethernet header for pass through VRF device
2015-08-23 18:41 [PATCH net-next] vrf: Add ethernet header for pass through VRF device David Ahern
@ 2015-08-25 21:02 ` David Miller
2015-08-25 22:37 ` David Ahern
0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2015-08-25 21:02 UTC
To: dsa; +Cc: netdev, shm
From: David Ahern <dsa@cumulusnetworks.com>
Date: Sun, 23 Aug 2015 12:41:00 -0600
> @@ -250,6 +253,17 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev)
>
> static netdev_tx_t vrf_finish(struct sock *sk, struct sk_buff *skb)
> {
> + int err;
> +
> + __skb_pull(skb, skb_network_offset(skb));
> + err = dev_hard_header(skb, skb->dev, ntohs(skb->protocol),
> + NULL, NULL, skb->len);
> +
> + if (err < 0) {
> + vrf_tx_error(skb->dev, skb);
> + return -EINVAL;
> + }
> +
> return dev_queue_xmit(skb);
This is expensive and ridiculous to do for every TX frame.
You'll need to find another way.
* Re: [PATCH net-next] vrf: Add ethernet header for pass through VRF device
2015-08-25 21:02 ` David Miller
@ 2015-08-25 22:37 ` David Ahern
2015-08-25 22:51 ` David Miller
0 siblings, 1 reply; 6+ messages in thread
From: David Ahern @ 2015-08-25 22:37 UTC
To: David Miller; +Cc: netdev, shm
On 8/25/15 2:02 PM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Sun, 23 Aug 2015 12:41:00 -0600
>
>> @@ -250,6 +253,17 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev)
>>
>> static netdev_tx_t vrf_finish(struct sock *sk, struct sk_buff *skb)
>> {
>> + int err;
>> +
>> + __skb_pull(skb, skb_network_offset(skb));
>> + err = dev_hard_header(skb, skb->dev, ntohs(skb->protocol),
>> + NULL, NULL, skb->len);
>> +
>> + if (err < 0) {
>> + vrf_tx_error(skb->dev, skb);
>> + return -EINVAL;
>> + }
>> +
>> return dev_queue_xmit(skb);
>
> This is expensive and ridiculous to do for every TX frame.
>
> You'll need to find another way.
>
The packet is directed here from the IP layer via the custom dst, so
there is no L2 header on the skb. So while the push and pop of the
header seems silly, it is part and parcel of the feature to run tcpdump
on the VRF device. I don't see how it could be done any other way.
David
* Re: [PATCH net-next] vrf: Add ethernet header for pass through VRF device
2015-08-25 22:37 ` David Ahern
@ 2015-08-25 22:51 ` David Miller
2015-08-26 19:36 ` David Ahern
0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2015-08-25 22:51 UTC
To: dsa; +Cc: netdev, shm
From: David Ahern <dsa@cumulusnetworks.com>
Date: Tue, 25 Aug 2015 15:37:55 -0700
> On 8/25/15 2:02 PM, David Miller wrote:
>> From: David Ahern <dsa@cumulusnetworks.com>
>> Date: Sun, 23 Aug 2015 12:41:00 -0600
>>
>>> @@ -250,6 +253,17 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb,
>>> struct net_device *dev)
>>>
>>> static netdev_tx_t vrf_finish(struct sock *sk, struct sk_buff *skb)
>>> {
>>> + int err;
>>> +
>>> + __skb_pull(skb, skb_network_offset(skb));
>>> + err = dev_hard_header(skb, skb->dev, ntohs(skb->protocol),
>>> + NULL, NULL, skb->len);
>>> +
>>> + if (err < 0) {
>>> + vrf_tx_error(skb->dev, skb);
>>> + return -EINVAL;
>>> + }
>>> +
>>> return dev_queue_xmit(skb);
>>
>> This is expensive and ridiculous to do for every TX frame.
>>
>> You'll need to find another way.
>>
>
> The packet is directed here from the IP layer via the custom dst, so
> there is no L2 header on the skb. So while the push and pop of the
> header seems silly it is part and parcel of the feature to run tcpdump
> on the VRF device. I don't see how it could be done any other way.
You're losing a significant optimization on the transmit path by not
using the neighbour table entry hard header cache.
That's what I want you to fix.
See dst_neigh_output() and in particular neigh_hh_output().
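As a rough illustration of the hard header cache idea referred to here: the header is assembled once per neighbour, and each packet then gets it with a single copy into its headroom. The sketch below is a user-space toy under that assumption. struct toy_hh_cache, toy_hh_fill() and toy_hh_output() are hypothetical names, and the real neigh_hh_output() deals with details such as locking and alignment that are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_HLEN 14

/* Toy analogue of a neighbour entry's cached hard header. */
struct toy_hh_cache {
	uint8_t hdr[TOY_ETH_HLEN];
	int valid;
};

/* Slow path, run once per neighbour: assemble the header field by field. */
static void toy_hh_fill(struct toy_hh_cache *hh, const uint8_t dst[6],
			const uint8_t src[6], uint16_t proto)
{
	memcpy(hh->hdr, dst, 6);
	memcpy(hh->hdr + 6, src, 6);
	hh->hdr[12] = (uint8_t)(proto >> 8);
	hh->hdr[13] = (uint8_t)(proto & 0xff);
	hh->valid = 1;
}

/* Fast path, run per packet: one copy into the 14 bytes of headroom
 * directly in front of the L3 data. Returns the new start of the frame. */
static uint8_t *toy_hh_output(const struct toy_hh_cache *hh, uint8_t *l3_data)
{
	assert(hh->valid);
	return memcpy(l3_data - TOY_ETH_HLEN, hh->hdr, TOY_ETH_HLEN);
}
```

The point of the split is that all branching happens in toy_hh_fill(), while the per-packet function has no decisions to make about addresses or protocol.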
* Re: [PATCH net-next] vrf: Add ethernet header for pass through VRF device
2015-08-25 22:51 ` David Miller
@ 2015-08-26 19:36 ` David Ahern
2015-08-27 0:30 ` David Miller
0 siblings, 1 reply; 6+ messages in thread
From: David Ahern @ 2015-08-26 19:36 UTC
To: David Miller; +Cc: netdev, shm
On 8/25/15 3:51 PM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Tue, 25 Aug 2015 15:37:55 -0700
>
>> On 8/25/15 2:02 PM, David Miller wrote:
>>> From: David Ahern <dsa@cumulusnetworks.com>
>>> Date: Sun, 23 Aug 2015 12:41:00 -0600
>>>
>>>> @@ -250,6 +253,17 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb,
>>>> struct net_device *dev)
>>>>
>>>> static netdev_tx_t vrf_finish(struct sock *sk, struct sk_buff *skb)
>>>> {
>>>> + int err;
>>>> +
>>>> + __skb_pull(skb, skb_network_offset(skb));
>>>> + err = dev_hard_header(skb, skb->dev, ntohs(skb->protocol),
>>>> + NULL, NULL, skb->len);
>>>> +
>>>> + if (err < 0) {
>>>> + vrf_tx_error(skb->dev, skb);
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> return dev_queue_xmit(skb);
>>>
>>> This is expensive and ridiculous to do for every TX frame.
>>>
>>> You'll need to find another way.
>>>
>>
>> The packet is directed here from the IP layer via the custom dst, so
>> there is no L2 header on the skb. So while the push and pop of the
>> header seems silly it is part and parcel of the feature to run tcpdump
>> on the VRF device. I don't see how it could be done any other way.
>
> You're losing a significant optimization on the transmit path by not
> using the neighbour table entry hard header cache.
>
> That's what I want you to fix.
>
> See dst_neigh_output() and in particular neigh_hh_output().
>
I'm sure you'll correct me if I am wrong ...
For the VRF device we don't need dst_neigh_output(), neigh_hh_output(), or a
neighbor cache. The packet never hits a wire with the VRF device header;
it just hits tcpdump and then recirculates in the stack, i.e., the VRF
device xmit hides the eth header via the skb_pull and sends
the packet back into the stack with the dst pointing to the real device.
That's just the game for tc, netfilter, and tcpdump to work with the VRF device.
As such, all we need is to push an eth header onto the front of the skb for
one loop through the stack, and eth_header() via dev_hard_header() with a NULL
daddr is the simplest path to accomplish that. Any other path is just
extra overhead.
David
* Re: [PATCH net-next] vrf: Add ethernet header for pass through VRF device
2015-08-26 19:36 ` David Ahern
@ 2015-08-27 0:30 ` David Miller
0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2015-08-27 0:30 UTC
To: dsa; +Cc: netdev, shm
From: David Ahern <dsa@cumulusnetworks.com>
Date: Wed, 26 Aug 2015 12:36:15 -0700
> As such all we need is to push an eth header to the front of the skb
> for 1 loop through the stack and eth_header via dev_hard_header with
> NULL daddr is the simplest path to accomplish that. Any other path is
> just extra overhead.
And dev_hard_header() is full of conditional code and partial stores,
whereas the hard header cache is a _SINGLE UNCONDITIONAL MEMCPY_.
You're making this data path more expensive than it needs to be just
to placate features which in no way are default situations.
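To make "conditional code and partial stores" concrete, here is a toy header builder loosely modelled on the general shape of an Ethernet header-create callback. It is a sketch under stated assumptions, not the kernel's eth_header(): struct toy_dev, TOY_IFF_NOARP and toy_eth_header() are hypothetical, and whether a given device takes the zero-destination branch depends on its flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_HLEN	14
#define TOY_IFF_NOARP	0x1	/* hypothetical flag value for this sketch */

struct toy_dev {
	uint8_t addr[6];	/* the device's own hardware address */
	unsigned int flags;
};

/* Builds the header with separate stores and per-call branches.
 * Returns TOY_ETH_HLEN on success, or -TOY_ETH_HLEN when no destination
 * address is given and the device cannot do without one. */
static int toy_eth_header(uint8_t *hdr, const struct toy_dev *dev,
			  uint16_t proto, const uint8_t *daddr,
			  const uint8_t *saddr)
{
	assert(hdr != NULL && dev != NULL);

	hdr[12] = (uint8_t)(proto >> 8);	/* store 1: ethertype */
	hdr[13] = (uint8_t)(proto & 0xff);

	if (!saddr)				/* branch: default source */
		saddr = dev->addr;
	memcpy(hdr + 6, saddr, 6);		/* store 2: source address */

	if (daddr) {				/* branch: known destination */
		memcpy(hdr, daddr, 6);		/* store 3: destination */
		return TOY_ETH_HLEN;
	}

	if (dev->flags & TOY_IFF_NOARP) {	/* branch: no resolution needed */
		memset(hdr, 0, 6);
		return TOY_ETH_HLEN;
	}

	return -TOY_ETH_HLEN;			/* header is incomplete */
}
```

With NULL source and destination, as in the patch, a builder of this shape still walks every branch on each packet, and the negative return is the case the patch's err < 0 check guards against. A cached header replaces all of this with one fixed-size copy.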