netdev.vger.kernel.org archive mirror
* Re: Linux, tcpdump and vlan
@ 2007-07-19 18:20 andrei radulescu-banu
  2007-07-19 19:28 ` Stephen Hemminger
  0 siblings, 1 reply; 28+ messages in thread
From: andrei radulescu-banu @ 2007-07-19 18:20 UTC (permalink / raw)
  To: Ben Greear
  Cc: Patrick McHardy, Stephen Hemminger, Krzysztof Halasa,
	linux-kernel, Linux Netdev List

> [Ben] If tcpdump and/or bridging needs to disable the hw-accel, then it can 
explicitly do so by some API.  That is better than overloading
the promisc flag in my opinion.  

I guess I could be persuaded in the end. But let me still play devil's advocate. The semantics of 'promiscuous', in my opinion, are 'receive everything', including vlan frames.

> [Ben] This is especially true since promisc 
is not easily readable by user-space and things like tcpdump
cannot have full control of promisc (if a mac-vlan has the NIC in 
promisc mode, for instance, then tcpdump can never disable it.)

I agree with all the above. For example, when you run 'ifconfig' while 'tcpdump' is running, the interface does not show the promiscuous flag!

This confused me for a while, until I realized that tcpdump's packet socket uses an obscure API, packet_dev_mc() in af_packet.c, to put the interface into promiscuous mode. The reason is that packet_mc_add() implements a reference-counted mechanism for promiscuous mode, so that:
- starting tcpdump instance 1 sets promiscuous mode
- starting tcpdump instance 2 bumps the ref count in packet_mc_add()
- killing tcpdump instance 1 bumps down the ref count, but the interface stays promiscuous
- killing tcpdump instance 2 truly clears promiscuous mode.

The trick here is that when you kill tcpdump, the kernel tears down the packet socket and, in the process, bumps down the ref count. Had tcpdump set and cleared the promisc flag manually, the interface would have stayed promiscuous after tcpdump was killed.
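
To make the sequence above concrete, here is a minimal toy model in Python. It is not kernel code: the classes NetDevice and PacketSocket and all their methods are hypothetical names made up for this sketch, and the only thing it tries to capture is that promiscuous mode behaves like a counter that is decremented automatically when a socket goes away.

```python
class NetDevice:
    """Toy interface whose promiscuous state is a reference count (assumption)."""

    def __init__(self, name):
        self.name = name
        self.promiscuity = 0  # number of current holders of promiscuous mode

    @property
    def promisc(self):
        # The interface is promiscuous while at least one holder remains.
        return self.promiscuity > 0

    def set_promiscuity(self, inc):
        self.promiscuity += inc


class PacketSocket:
    """Toy packet socket that remembers which devices it made promiscuous."""

    def __init__(self):
        self.memberships = []

    def add_promisc_membership(self, dev):
        self.memberships.append(dev)
        dev.set_promiscuity(+1)

    def close(self):
        # Runs on any teardown, including when the owning process is killed,
        # so the count cannot be leaked by a crashed capture tool.
        while self.memberships:
            self.memberships.pop().set_promiscuity(-1)


def two_tcpdumps():
    """Replay the four steps from the list above and record the promisc state."""
    eth0 = NetDevice("eth0")
    first, second = PacketSocket(), PacketSocket()
    states = []
    first.add_promisc_membership(eth0)
    states.append(eth0.promisc)
    second.add_promisc_membership(eth0)
    states.append(eth0.promisc)
    first.close()
    states.append(eth0.promisc)
    second.close()
    states.append(eth0.promisc)
    return states
```

In this model the interface stays promiscuous after the first socket closes and only leaves promiscuous mode when the second one does, which is the behavior described above. A tool that toggled a plain boolean flag instead would have no equivalent of close() to undo its change when killed.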

(The mac-vlan driver must have this corner case as well. If a mac-vlan interface is disabled while tcpdump runs, it may yank promiscuous mode out from under tcpdump.)

So if you want to create an ethtool API to set vlan-promiscuous mode, one problem to grapple with is that we need a mechanism similar to the above, so that you can run two concurrent tcpdumps (or tcpdump while bridging vlans) and vlan-promiscuous mode gets set correctly each time. For tcpdump at least, the new ethtool API needs to be called from packet_mc_add().


^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Linux, tcpdump and vlan
@ 2007-07-19 21:38 andrei radulescu-banu
  2007-07-19 23:38 ` Ben Greear
  0 siblings, 1 reply; 28+ messages in thread
From: andrei radulescu-banu @ 2007-07-19 21:38 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Stephen Hemminger, Krzysztof Halasa, linux-kernel,
	Linux Netdev List

During debugging, I noticed that dev_queue_xmit() is called twice for tx vlan frames. This results in a frame being passed twice to a packet socket bound to 'any' interface. If the packet socket is bound to a specific interface, though, it will get only one copy of the tx frame, which is good.

In more detail: suppose we're tx'ing a frame, and the route table lookup yields a vlan outgoing device eth0.2. dev_queue_xmit() is called, which calls dev_queue_xmit_nit() for dev = eth0.2 then dev->hard_start_xmit() for dev = eth0.2. 

The latter call gets into the vlan layer, which attaches the vlan id 2 (accelerated or not... in my e1000 case accelerated) then calls dev_queue_xmit() again. This time around dev_queue_xmit_nit() is called for dev = eth0, and dev->hard_start_xmit() actually calls the ethernet driver.

The net result is that dev_queue_xmit_nit() is called twice, once for dev=eth0.2 then for dev=eth0.
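
The double delivery described above can be reproduced with a small toy model in Python. This is only a sketch under assumptions: Tap, Device, queue_xmit() and send_one_vlan_frame() are hypothetical names, with queue_xmit() standing in for the pair dev_queue_xmit_nit() plus dev->hard_start_xmit(), and a frame is just a dict.

```python
class Tap:
    """Toy packet socket; bound_to=None models a socket bound to 'any' interface."""

    def __init__(self, bound_to=None):
        self.bound_to = bound_to
        self.seen = []

    def deliver(self, dev_name, frame):
        # A bound socket only accepts frames tapped on its own device.
        if self.bound_to is None or self.bound_to == dev_name:
            self.seen.append((dev_name, frame))


class Device:
    """Toy device; a vlan device has a 'lower' real device it transmits through."""

    def __init__(self, name, taps, lower=None, vlan_id=None):
        self.name = name
        self.taps = taps
        self.lower = lower
        self.vlan_id = vlan_id
        self.wire = []  # frames that actually left through the driver

    def queue_xmit(self, frame):
        # First, every tap is offered the frame for this device.
        for tap in self.taps:
            tap.deliver(self.name, frame)
        # Then the device's transmit routine runs.
        if self.lower is not None:
            # vlan layer: attach the vlan id and queue again on the real device
            self.lower.queue_xmit({**frame, "vlan": self.vlan_id})
        else:
            self.wire.append(frame)


def send_one_vlan_frame():
    """Send one frame via eth0.2 and count what each tap and the wire saw."""
    any_tap, eth0_tap, vlan_tap = Tap(), Tap("eth0"), Tap("eth0.2")
    taps = [any_tap, eth0_tap, vlan_tap]
    eth0 = Device("eth0", taps)
    eth0_2 = Device("eth0.2", taps, lower=eth0, vlan_id=2)
    eth0_2.queue_xmit({"payload": "ping"})
    return len(any_tap.seen), len(eth0_tap.seen), len(vlan_tap.seen), len(eth0.wire)
```

In the model, the socket bound to 'any' records two copies (one per device in the stack), each bound socket records one, and a single frame reaches the wire.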



^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Linux, tcpdump and vlan
@ 2007-07-19 17:46 andrei radulescu-banu
  0 siblings, 0 replies; 28+ messages in thread
From: andrei radulescu-banu @ 2007-07-19 17:46 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Stephen Hemminger, Krzysztof Halasa, linux-kernel,
	Linux Netdev List


>> [Andrei] VLAN_TX_SKB_CB() is perfect for that.
> [Patrick, Stephen] No it's not. It's only legal to use while something has ownership
of the skb. Between VLAN devices and real devices qdiscs are
free to use it.

All right, using VLAN_TX_SKB_CB() is a bad idea. In that case, we need to amend the skb struct; I don't see another way.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Linux, tcpdump and vlan
@ 2007-07-19 16:02 andrei radulescu-banu
  2007-07-20 19:58 ` Krzysztof Halasa
  0 siblings, 1 reply; 28+ messages in thread
From: andrei radulescu-banu @ 2007-07-19 16:02 UTC (permalink / raw)
  To: Stephen Hemminger, Krzysztof Halasa
  Cc: Patrick McHardy, linux-kernel, Linux Netdev List

One additional thought: with the changes proposed in my previous message, the driver can be set to hw vlan accelerated mode even if no vlan interfaces are configured. We would no longer have to switch hw vlan accelerated mode on or off when vlan interfaces are created or destroyed.


^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: Linux, tcpdump and vlan
@ 2007-07-19 15:47 andrei radulescu-banu
  2007-07-19 16:21 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: andrei radulescu-banu @ 2007-07-19 15:47 UTC (permalink / raw)
  To: Patrick McHardy, Stephen Hemminger
  Cc: Krzysztof Halasa, linux-kernel, Linux Netdev List

The consensus seems to be that skb's need to carry vlan accelerated tags in their cb's, on rx as well as tx. VLAN_TX_SKB_CB() is perfect for that.

> [Patrick] On the TX path, it could simply use the CB, but this is actually
also wrong (for both macvlan and real devices) since qdiscs have
ownership of the skb in between, and at least netem *does* modify
the CB, breaking VLAN.

Thanks for pointing that out... It appears to me that qdisc/netem already breaks the vlan implementation, in the path 

vlan_dev_hwaccel_hard_start_xmit(): sets accelerated vlan tag in skb->cb, calls
dev_queue_xmit(): may pass skb to qdisc/netem, which may mangle skb->cb before calling
dev->hard_start_xmit(), resulting in a tx frame without its vlan tag.

So netem needs to look for hw accelerated vlan metadata and insert it in the skb... I don't see any other way around this.
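
The failure path above can be illustrated with a toy model in Python. Everything here is an assumption made for the sketch: Skb, the cb dict, the VLAN_MAGIC value, and the functions vlan_hwaccel_xmit(), netem_like_qdisc(), passthrough_qdisc(), driver_xmit() and transmit() are hypothetical names, and the fields written into cb are invented. The only point is that a scratch area shared between layers can be overwritten by whichever layer holds the skb in between.

```python
VLAN_MAGIC = 0x564C414E  # arbitrary marker value, invented for this sketch


class Skb:
    """Toy packet buffer with a scratch area shared by successive owners."""

    def __init__(self, payload):
        self.payload = payload
        self.cb = {}  # scratch space; each owning layer may overwrite it


def vlan_hwaccel_xmit(skb, vlan_id):
    # vlan layer: stash the tag in the scratch area for the driver to pick up
    skb.cb = {"magic": VLAN_MAGIC, "vlan_tci": vlan_id}
    return skb


def netem_like_qdisc(skb):
    # A qdisc that keeps its own bookkeeping in the scratch area,
    # clobbering whatever the previous owner left there.
    skb.cb = {"time_to_send": 0}
    return skb


def passthrough_qdisc(skb):
    # A qdisc that never touches the scratch area.
    return skb


def driver_xmit(skb):
    # Driver: only emits a tagged frame if the vlan marker survived.
    if skb.cb.get("magic") == VLAN_MAGIC:
        return ("tagged", skb.cb["vlan_tci"], skb.payload)
    return ("untagged", None, skb.payload)


def transmit(payload, vlan_id, qdisc):
    """vlan layer -> qdisc -> driver, as in the path described above."""
    return driver_xmit(qdisc(vlan_hwaccel_xmit(Skb(payload), vlan_id)))
```

With passthrough_qdisc the driver sees the tag; with netem_like_qdisc the same frame goes out untagged, which is the breakage Patrick pointed out.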

> [Patrick] Your suggestion of disabling VLAN acceleration in promiscuous
mode sounds like a reasonable solution until then ..

I was rather thinking of keeping hw vlan acceleration in promiscuous mode. Upon becoming promisc, the driver will be changed to disable vlan filters - it will reenable them when leaving promisc mode.

My 2 cents on vlan hw acceleration: it does not save much in computing cycles, if software is written carefully. It is vlan filtering that saves computing time.

> [Ben] I think a better method would be to allow disabling VLAN HW accel for a NIC with ethtool.

This requires changes to ethtool and the e1000 driver, plus other drivers. It is a handy thing to have, but I don't view it as a solution to the tcpdump problem or to the vlan bridging problem. One concern: if we're switching hw accel mode on the fly, we need to carefully protect tx frames that are just about to go out and have already been set up for the opposite mode.

Any comments on what is the expected behavior of 'tcpdump -i eth0.2' vs. 'tcpdump -i eth0'?

Andrei Radulescu-Banu
Brix Networks

^ permalink raw reply	[flat|nested] 28+ messages in thread
[parent not found: <878246.51044.qm@web56608.mail.re3.yahoo.com>]

end of thread, other threads:[~2007-07-21 21:15 UTC | newest]

Thread overview: 28+ messages
2007-07-19 18:20 Linux, tcpdump and vlan andrei radulescu-banu
2007-07-19 19:28 ` Stephen Hemminger
  -- strict thread matches above, loose matches on Subject: below --
2007-07-19 21:38 andrei radulescu-banu
2007-07-19 23:38 ` Ben Greear
2007-07-20 20:19   ` Krzysztof Halasa
2007-07-19 17:46 andrei radulescu-banu
2007-07-19 16:02 andrei radulescu-banu
2007-07-20 19:58 ` Krzysztof Halasa
2007-07-20 20:34   ` Ben Greear
2007-07-21 11:32     ` Krzysztof Halasa
2007-07-21 17:57       ` Ben Greear
2007-07-21 21:15         ` Krzysztof Halasa
2007-07-19 15:47 andrei radulescu-banu
2007-07-19 16:21 ` Stephen Hemminger
2007-07-19 16:33 ` Patrick McHardy
2007-07-19 16:47 ` Ben Greear
     [not found] <878246.51044.qm@web56608.mail.re3.yahoo.com>
2007-07-18 22:57 ` Patrick McHardy
2007-07-18 23:22   ` Ben Greear
2007-07-18 23:34     ` Patrick McHardy
2007-07-19  0:01       ` Ben Greear
2007-07-19  0:19         ` Patrick McHardy
2007-07-19 13:28   ` Krzysztof Halasa
2007-07-19 13:41     ` Stephen Hemminger
2007-07-19 14:00       ` Patrick McHardy
2007-07-19 14:23       ` Krzysztof Halasa
2007-07-19 15:00         ` Stephen Hemminger
2007-07-19 15:45           ` Krzysztof Halasa
2007-07-19 15:20         ` Stephen Hemminger
