From: ebiederm@xmission.com (Eric W. Biederman)
To: Paul Pearce <pearce@cs.berkeley.edu>
Cc: Michael Richardson <mcr@sandelman.ca>,
Eric Dumazet <eric.dumazet@gmail.com>,
Ani Sinha <ani@aristanetworks.com>,
Jiri Pirko <jpirko@redhat.com>,
netdev@vger.kernel.org, edumazet <edumazet@google.com>,
tcpdump-workers <tcpdump-workers@lists.tcpdump.org>,
dborkman <dborkman@redhat.com>
Subject: Re: [tcpdump-workers] [PATCH net 1/2] net: dev_queue_xmit_nit: fix skb->vlan_tci field value
Date: Fri, 15 Feb 2013 00:17:33 -0800
Message-ID: <87ehgi3s9u.fsf@xmission.com>
In-Reply-To: <CAOUgPvSpMmsLpiwmnxwFx9p+B=KOG5VqjNvu_+54hLcBPNK4Cg@mail.gmail.com> (Paul Pearce's message of "Thu, 10 Jan 2013 18:37:19 -0800")
Paul Pearce <pearce@cs.berkeley.edu> writes:
>>> My opinion as a kernel developer is that the network tap is here to have
>>> a copy of the exact frame given to the _device_.
>
>> Good: as someone who spends lots of time with tcpdump doing both network
>> and protocol diagnostics, it's really important to see exactly what is there.
>> If that means turning off some hardware offload in order to get the
>> intact 1p header, then that may be fine for many situations.
>> (At 10G, on a live router... well...)
>
> I agree as well.
>
> But I think Ani's point was that for RX packets, as of commit
> bcc6d47903612c3861201cc3a866fb604f26b8b2, the filters are not
> getting exactly what's "on the wire." Independent of hardware
> acceleration the vlan headers are being stripped off and skb->vlan_tci
> is being set. That was the origin of this whole mess.
The mess goes back much further than that. That commit just flushed
a lot of the mess out into the open, and made it apparent that the kernel
had insufficient facilities for dealing with packets whose vlan
tags had been stripped, and that libpcap had not been handling stripped
vlan tags.
> The msg from that commit reads in part:
>> Vlan untagging happens early in __netif_receive_skb so the rest of
>> code (ptype_all handlers, rx_handlers) see the skb like it was
>> untagged by hw.
>
> His confusion (which I share) is why it's acceptable to have this
> behavior of removing headers and setting skb->vlan_tci (regardless of
> hardware acceleration) on the RX path but not also set skb->vlan_tci
> on the TX path.
On all paths the kernel will now set a flag, VLAN_TAG_PRESENT, if the
vlan tag is stripped off and vlan_tci is used. So there is no pressing
need for a kernel change. recvmsg and BPF filters have all of the
information they need to figure out what is going on. So at this point
this is a libpcap problem, not a kernel problem.
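
A minimal sketch of how a filter could test that flag, assuming the classic
BPF ancillary offset SKF_AD_VLAN_TAG_PRESENT and the opcode values from the
Linux uapi headers. The helper names insn() and vlan_present_filter() are
hypothetical, and the sketch only packs the instructions; attaching them to
a socket is not shown.

```python
import struct

# Classic BPF opcode pieces (values as in linux/bpf_common.h).
BPF_LD, BPF_JMP, BPF_RET = 0x00, 0x05, 0x06
BPF_W, BPF_ABS = 0x00, 0x20
BPF_JEQ, BPF_K = 0x10, 0x00

# Ancillary-data offsets (values as in linux/filter.h).
SKF_AD_OFF = -0x1000
SKF_AD_VLAN_TAG_PRESENT = 48


def insn(code, jt, jf, k):
    """Pack one struct sock_filter: u16 code, u8 jt, u8 jf, u32 k."""
    return struct.pack("=HBBI", code, jt, jf, k & 0xFFFFFFFF)


def vlan_present_filter():
    """Return a program that accepts only packets whose vlan tag was stripped."""
    return [
        # A = nonzero if a stripped vlan tag is carried out of band, else 0
        insn(BPF_LD | BPF_W | BPF_ABS, 0, 0,
             SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT),
        # if A == 0, jump over the accept instruction
        insn(BPF_JMP | BPF_JEQ | BPF_K, 1, 0, 0),
        insn(BPF_RET | BPF_K, 0, 0, 0xFFFFFFFF),  # accept the whole packet
        insn(BPF_RET | BPF_K, 0, 0, 0),           # drop the packet
    ]
```

Each instruction packs to 8 bytes, so the returned list can be joined into
the buffer that a struct sock_fprog would point at.
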
On the RX path, always stripping the header allowed the vlan processing
code to be simplified and some bugs to be fixed.
Just reading through the code a bit more, it looks like stripping the
vlan headers on TX when the network device does not support vlan header
acceleration is a performance loss. There are other cases besides
AF_PACKET, in particular vlan_dev_hard_header, that will insert the vlan
header on a packet before the packet is transmitted.
> Independent of proposed userspace or PACKET_AUXDATA solutions,
> clarification on the RX skb->vlan_tci behavior would be appreciated.
There are two variables now available in AUXDATA and in the BPF filters
for packets: VLAN_TAG_PRESENT and VLAN_TAG.
Packets that have had their vlan tags stripped have VLAN_TAG_PRESENT set,
and the tag is available in VLAN_TAG.
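
A rough sketch of how userspace could use those two values to rebuild the
frame as it appeared on the wire, assuming a standard 802.1Q tag (TPID
0x8100 followed by the 16-bit TCI) inserted after the two MAC addresses.
The function names and the tag_present / vlan_tci parameters are
hypothetical stand-ins for whatever the capture code reads out of the
auxiliary data.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID of an 802.1Q tag


def reinsert_vlan_tag(frame, tag_present, vlan_tci):
    """Rebuild the tagged frame from a stripped frame plus aux data."""
    if not tag_present:
        return frame  # nothing was stripped, the frame is already intact
    if len(frame) < 12:
        raise ValueError("frame shorter than two MAC addresses")
    # The tag sits between the source MAC and the original EtherType.
    tag = struct.pack("!HH", ETH_P_8021Q, vlan_tci & 0xFFFF)
    return frame[:12] + tag + frame[12:]


def split_tci(vlan_tci):
    """Split a TCI into (priority, drop-eligible bit, vlan id)."""
    return (vlan_tci >> 13) & 0x7, (vlan_tci >> 12) & 0x1, vlan_tci & 0x0FFF
```

For example, a TCI of 0x6064 decodes to priority 3 on vlan 100, and the
rebuilt frame carries the bytes 81 00 60 64 at offset 12.
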
> My knowledge of this code is quite limited so it's entirely possible
> I'm off base here. If so please tell me.
Eric