* [PATCH] bridge: avoid ptype_all packet handling @ 2007-03-01 1:18 Stephen Hemminger 2007-03-01 1:28 ` Ben Greear 2007-03-02 21:26 ` David Miller 0 siblings, 2 replies; 25+ messages in thread From: Stephen Hemminger @ 2007-03-01 1:18 UTC (permalink / raw) To: David Miller; +Cc: bridge, netdev I was measuring bridging/routing performance and noticed this. The current code runs the "all packet" type handlers before calling the bridge hook. If an application (like some DHCP clients) is using AF_PACKET, this means that each received packet gets run through the Berkeley Packet Filter code in sk_run_filter (slow). By moving the bridging hook to run first, the packets flowing through the bridge get filtered out there. This results in a 14% improvement in performance, but it does mean that some snooping applications would miss packets if being used on a bridge. The correct way to see all packets on a bridge is to set the bridge pseudo-device to promiscuous mode. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> --- net/core/dev.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index cf71614..dc2cda6 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1792,6 +1792,10 @@ int netif_receive_skb(struct sk_buff *skb) rcu_read_lock(); + if (handle_bridge(&skb, &pt_prev, &ret, orig_dev)) + goto out; + + #ifdef CONFIG_NET_CLS_ACT if (skb->tc_verd & TC_NCLS) { skb->tc_verd = CLR_TC_NCLS(skb->tc_verd); @@ -1826,9 +1830,6 @@ int netif_receive_skb(struct sk_buff *skb) ncls: #endif - if (handle_bridge(&skb, &pt_prev, &ret, orig_dev)) - goto out; - type = skb->protocol; list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) { if (ptype->type == type && -- 1.4.4.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-01 1:18 [PATCH] bridge: avoid ptype_all packet handling Stephen Hemminger @ 2007-03-01 1:28 ` Ben Greear 2007-03-01 3:56 ` Stephen Hemminger 2007-03-02 21:26 ` David Miller 1 sibling, 1 reply; 25+ messages in thread From: Ben Greear @ 2007-03-01 1:28 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, bridge, netdev Stephen Hemminger wrote: > I was measuring bridging/routing performance and noticed this. > > The current code runs the "all packet" type handlers before calling the > bridge hook. If an application (like some DHCP clients) is using AF_PACKET, > this means that each received packet gets run through the Berkeley Packet Filter > code in sk_run_filter (slow). > > By moving the bridging hook to run first, the packets flowing through > the bridge get filtered out there. This results in a 14% > improvement in performance, but it does mean that some snooping applications > would miss packets if being used on a bridge. The correct way to see all > packets on a bridge is to set the bridge pseudo-device to promiscuous > mode. Seems it would be better to fix these clients to be more selective as to where they bind. This breaks the case where you want to see packets on a particular interface, not just the entire bridge, right? Thanks, Ben > > Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> > --- > net/core/dev.c | 7 ++++--- > 1 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index cf71614..dc2cda6 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -1792,6 +1792,10 @@ int netif_receive_skb(struct sk_buff *skb) > > rcu_read_lock(); > > + if (handle_bridge(&skb, &pt_prev, &ret, orig_dev)) > + goto out; > + > + > #ifdef CONFIG_NET_CLS_ACT > if (skb->tc_verd & TC_NCLS) { > skb->tc_verd = CLR_TC_NCLS(skb->tc_verd); > @@ -1826,9 +1830,6 @@ int netif_receive_skb(struct sk_buff *skb) > ncls: > #endif > > - if (handle_bridge(&skb, &pt_prev, &ret, orig_dev)) > - goto out; > - > type = skb->protocol; > list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) { > if (ptype->type == type && -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-01 1:28 ` Ben Greear @ 2007-03-01 3:56 ` Stephen Hemminger 2007-03-01 4:05 ` Ben Greear 0 siblings, 1 reply; 25+ messages in thread From: Stephen Hemminger @ 2007-03-01 3:56 UTC (permalink / raw) To: Ben Greear; +Cc: David Miller, bridge, netdev On Wed, 28 Feb 2007 17:28:09 -0800 Ben Greear <greearb@candelatech.com> wrote: > Stephen Hemminger wrote: > > I was measuring bridging/routing performance and noticed this. > > > > The current code runs the "all packet" type handlers before calling > > the bridge hook. If an application (like some DHCP clients) is > > using AF_PACKET, this means that each received packet gets run > > through the Berkeley Packet Filter code in sk_run_filter (slow). > > > > By moving the bridging hook to run first, the packets flowing > > through the bridge get filtered out there. This results in a 14% > > improvement in performance, but it does mean that some snooping > > applications would miss packets if being used on a bridge. The > > correct way to see all packets on a bridge is to set the bridge > > pseudo-device to promiscuous mode. > > Seems it would be better to fix these clients to be more selective as > to where they bind. The problem is any use of BPF is a lose, if it has to be done to all traffic. > This breaks the case where you want to see packets on a particular > interface, not just the entire bridge, right? It might be possible to use promisc counter to handle this. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-01 3:56 ` Stephen Hemminger @ 2007-03-01 4:05 ` Ben Greear 2007-03-01 7:04 ` Stephen Hemminger 0 siblings, 1 reply; 25+ messages in thread From: Ben Greear @ 2007-03-01 4:05 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, bridge, netdev Stephen Hemminger wrote: > On Wed, 28 Feb 2007 17:28:09 -0800 > Ben Greear <greearb@candelatech.com> wrote: > > >> Stephen Hemminger wrote: >> >>> I was measuring bridging/routing performance and noticed this. >>> >>> The current code runs the "all packet" type handlers before calling >>> the bridge hook. If an application (like some DHCP clients) is >>> using AF_PACKET, this means that each received packet gets run >>> through the Berkeley Packet Filter code in sk_run_filter (slow). >>> >>> By moving the bridging hook to run first, the packets flowing >>> through the bridge get filtered out there. This results in a 14% >>> improvement in performance, but it does mean that some snooping >>> applications would miss packets if being used on a bridge. The >>> correct way to see all packets on a bridge is to set the bridge >>> pseudo-device to promiscuous mode. >>> >> Seems it would be better to fix these clients to be more selective as >> to where they bind. >> > > The problem is any use of BPF is a lose, if it has to be done to all > traffic. > Right, but couldn't you have the dhcp client bind to eth0, eth7, and br0 (ie, skipping the eth1-6 that comprise the bridge group?) The only difficulty I see is having the client know when new devices come and go, but there are probably ways to know that without keeping a whole lot of state or probing the /proc/net/dev (like my own bloated app does :)) I envision the client args to be something like --skip-devices "eth1 eth2 eth3 ..." I know you can bind raw packet sockets to individual devices, though I don't know much about BPF, so it's possible I'm wrong... >> This breaks the case where you want to see packets on a particular >> interface, not just the entire bridge, right? >> > > It might be possible to use promisc counter to handle this. > Not really, it's perfectly valid to sniff a port in non-promiscuous mode... Ben > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-01 4:05 ` Ben Greear @ 2007-03-01 7:04 ` Stephen Hemminger 2007-03-01 7:22 ` David Miller 0 siblings, 1 reply; 25+ messages in thread From: Stephen Hemminger @ 2007-03-01 7:04 UTC (permalink / raw) To: Ben Greear; +Cc: David Miller, bridge, netdev Ben Greear wrote: > Stephen Hemminger wrote: >> On Wed, 28 Feb 2007 17:28:09 -0800 >> Ben Greear <greearb@candelatech.com> wrote: >> >> >>> Stephen Hemminger wrote: >>> >>>> I was measuring bridging/routing performance and noticed this. >>>> >>>> The current code runs the "all packet" type handlers before calling >>>> the bridge hook. If an application (like some DHCP clients) is >>>> using AF_PACKET, this means that each received packet gets run >>>> through the Berkeley Packet Filter code in sk_run_filter (slow). >>>> >>>> By moving the bridging hook to run first, the packets flowing >>>> through the bridge get filtered out there. This results in a 14% >>>> improvement in performance, but it does mean that some snooping >>>> applications would miss packets if being used on a bridge. The >>>> correct way to see all packets on a bridge is to set the bridge >>>> pseudo-device to promiscuous mode. >>>> >>> Seems it would be better to fix these clients to be more selective as >>> to where they bind. >>> >> >> The problem is any use of BPF is a lose, if it has to be done to all >> traffic. >> > Right, but couldn't you have the dhcp client bind to eth0, eth7, and > br0 (ie, skipping the eth1-6 that comprise the bridge group?) > > The only difficulty I see is having the client know when new devices > come and go, but there are probably > ways to know that without keeping a whole lot of state or probing the > /proc/net/dev (like my own bloated app does :)) > > I envision the client args to be something like --skip-devices "eth1 > eth2 eth3 ..." > > I know you can bind raw packet sockets to individual devices, though I > don't know much about BPF, so it's > possible I'm wrong... > The kernel has to deal with busted applications all the time. And each damn distro and configuration seems to invent it's own new way of doing network configuration. If an normal application has to use something like raw packet filtering, it seems there is a missing API. >>> This breaks the case where you want to see packets on a particular >>> interface, not just the entire bridge, right? >>> >> >> It might be possible to use promisc counter to handle this. >> > Not really, it's perfectly valid to sniff a port in non-promiscuous > mode... > The non-promiscuous mode packets still make it in through the normal receive path. The only packets that don't make up the stack are those that are being bridged. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-01 7:04 ` Stephen Hemminger @ 2007-03-01 7:22 ` David Miller 2007-03-01 7:26 ` Stephen Hemminger 0 siblings, 1 reply; 25+ messages in thread From: David Miller @ 2007-03-01 7:22 UTC (permalink / raw) To: shemminger; +Cc: greearb, bridge, netdev From: Stephen Hemminger <shemminger@linux-foundation.org> Date: Wed, 28 Feb 2007 23:04:36 -0800 > If an normal application has to use something like raw packet > filtering, it seems there is a missing API. I'm loosely following this discussion, but Ben mentions DHCP and I remember learning the other month that DHCP uses AF_PACKET and filtering instead of IP RAW sockets because it needs to get the MAC address and RAW sockets don't provide that. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-01 7:22 ` David Miller @ 2007-03-01 7:26 ` Stephen Hemminger 2007-03-01 7:30 ` David Miller 0 siblings, 1 reply; 25+ messages in thread From: Stephen Hemminger @ 2007-03-01 7:26 UTC (permalink / raw) To: David Miller; +Cc: greearb, bridge, netdev David Miller wrote: > From: Stephen Hemminger <shemminger@linux-foundation.org> > Date: Wed, 28 Feb 2007 23:04:36 -0800 > > >> If an normal application has to use something like raw packet >> filtering, it seems there is a missing API. >> > > I'm loosely following this discussion, but Ben mentions DHCP > and I remember learning the other month that DHCP uses AF_PACKET > and filtering instead of IP RAW sockets because it needs to get > the MAC address and RAW sockets don't provide that. > sounds like a socket option would help, the data is already there. Then the normal UDP receive path would work. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-01 7:26 ` Stephen Hemminger @ 2007-03-01 7:30 ` David Miller 2007-03-01 11:47 ` jamal 2007-03-03 2:14 ` Andi Kleen 0 siblings, 2 replies; 25+ messages in thread From: David Miller @ 2007-03-01 7:30 UTC (permalink / raw) To: shemminger; +Cc: greearb, bridge, netdev From: Stephen Hemminger <shemminger@linux-foundation.org> Date: Wed, 28 Feb 2007 23:26:36 -0800 > sounds like a socket option would help, the data is already there. Then > the normal > UDP receive path would work. That would be perfect for new applications. But we have to support all the old ones, so we're stuck providing correctly functioning AF_PACKET handling on all devices, sorry. And in fact that effectively makes the new socket option pointless, since it doesn't buy us anything since we have to support the old stuff fully anyways. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-01 7:30 ` David Miller @ 2007-03-01 11:47 ` jamal 2007-03-03 2:14 ` Andi Kleen 1 sibling, 0 replies; 25+ messages in thread From: jamal @ 2007-03-01 11:47 UTC (permalink / raw) To: David Miller; +Cc: shemminger, greearb, bridge, netdev On Wed, 2007-28-02 at 23:30 -0800, David Miller wrote: > That would be perfect for new applications. > But we have to support all the old ones, so we're stuck > providing correctly functioning AF_PACKET handling on > all devices, sorry. > It also breaks all the ingress tc code by making that change. The two scenarios i see in respect to performance are: - run a sniffer and you will see a substantial performance degradation as the pps goes up. There is no rocket science to that. This has nothing to do with bridging and will happen still even if that patch went in. - dont run any tap with the current codepath and you still see the _substantial_ performance drop then we have an opportunity to optimize. Not _by avoiding the code_ as in Stephens patch but by tunning that tap code path. [For example, you should still see a _tiny_ performance degradation if you turned on TC actions on ingress at compile time but had zero policies at run time - but that code path has been reduce to a single compare]. > And in fact that effectively makes the new socket option > pointless, since it doesn't buy us anything since we have > to support the old stuff fully anyways. agreed. I have been tied up elsewhere so havent been following netdev closely: There seem to be complaints of bridging performance going down in recent kernels - or is that someone misconfiguring their drivers? cheers, jamal ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-01 7:30 ` David Miller 2007-03-01 11:47 ` jamal @ 2007-03-03 2:14 ` Andi Kleen 2007-03-03 4:22 ` David Miller 1 sibling, 1 reply; 25+ messages in thread From: Andi Kleen @ 2007-03-03 2:14 UTC (permalink / raw) To: David Miller; +Cc: shemminger, greearb, bridge, netdev David Miller <davem@davemloft.net> writes: > > And in fact that effectively makes the new socket option > pointless, since it doesn't buy us anything since we have > to support the old stuff fully anyways. I don't think it's pointless because it would still allow newer DHCP clients to have less impact on other packets when they are active. This can matter when you have a system with multiple interfaces where DHCP doesn't get a address on one. That's pretty common with many x86 server boards because they come with two NICs by default but must people only plug the cable into one. However the distro installers run DHCP on all. When this happens all packets are always forced through ptype_all chains before being rejected by AF_PACKETs device bind, which adds some overhead to them. -Andi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-03 2:14 ` Andi Kleen @ 2007-03-03 4:22 ` David Miller 2007-03-03 7:09 ` Stephen Hemminger 2007-03-03 12:30 ` Andi Kleen 0 siblings, 2 replies; 25+ messages in thread From: David Miller @ 2007-03-03 4:22 UTC (permalink / raw) To: andi; +Cc: shemminger, greearb, bridge, netdev From: Andi Kleen <andi@firstfloor.org> Date: 03 Mar 2007 03:14:29 +0100 > That's pretty common with many x86 server boards because > they come with two NICs by default but must people only > plug the cable into one. However the distro installers > run DHCP on all. Nope, that's not what I've seen them do, instead they run dhcp on interfaces that report a link being present. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-03 4:22 ` David Miller @ 2007-03-03 7:09 ` Stephen Hemminger 2007-03-03 12:30 ` Andi Kleen 1 sibling, 0 replies; 25+ messages in thread From: Stephen Hemminger @ 2007-03-03 7:09 UTC (permalink / raw) To: David Miller; +Cc: andi, greearb, bridge, netdev David Miller wrote: > From: Andi Kleen <andi@firstfloor.org> > Date: 03 Mar 2007 03:14:29 +0100 > > >> That's pretty common with many x86 server boards because >> they come with two NICs by default but must people only >> plug the cable into one. However the distro installers >> run DHCP on all. >> > > Nope, that's not what I've seen them do, instead they run dhcp on > interfaces that report a link being present. > Actually, It may be even simpler... I start bridge with a script and there was still a dhclient left over running on the original interface. It was an interesting exercise, and I have new tools to help, but still no magic bullet to get up to full line rate. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-03 4:22 ` David Miller 2007-03-03 7:09 ` Stephen Hemminger @ 2007-03-03 12:30 ` Andi Kleen 1 sibling, 0 replies; 25+ messages in thread From: Andi Kleen @ 2007-03-03 12:30 UTC (permalink / raw) To: David Miller; +Cc: andi, shemminger, greearb, bridge, netdev On Fri, Mar 02, 2007 at 08:22:11PM -0800, David Miller wrote: > From: Andi Kleen <andi@firstfloor.org> > Date: 03 Mar 2007 03:14:29 +0100 > > > That's pretty common with many x86 server boards because > > they come with two NICs by default but must people only > > plug the cable into one. However the distro installers > > run DHCP on all. > > Nope, that's not what I've seen them do, instead they run dhcp on > interfaces that report a link being present. I've seen otherwise. And that's also bad. It means that when the user moves the machine and happens to plug the Ethernet into the other port network will be a notwork until the configuration is manually changed. Similar when the cable is not plugged in yet at install time. All not good. Allowing low overhead DHCP is useful imho. The main problem with running it always is that it will use more power because it will IFF_UP the interface. But longer term that can be only properly solved by real idle network power management I think. -Andi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-01 1:18 [PATCH] bridge: avoid ptype_all packet handling Stephen Hemminger 2007-03-01 1:28 ` Ben Greear @ 2007-03-02 21:26 ` David Miller 2007-03-02 22:09 ` [RFC 1/2] " Stephen Hemminger ` (2 more replies) 1 sibling, 3 replies; 25+ messages in thread From: David Miller @ 2007-03-02 21:26 UTC (permalink / raw) To: shemminger; +Cc: bridge, netdev From: Stephen Hemminger <shemminger@linux-foundation.org> Date: Wed, 28 Feb 2007 17:18:46 -0800 > I was measuring bridging/routing performance and noticed this. > > The current code runs the "all packet" type handlers before calling the > bridge hook. If an application (like some DHCP clients) is using AF_PACKET, > this means that each received packet gets run through the Berkeley Packet Filter > code in sk_run_filter (slow). I know we closed this out by saying that even though performance sucks, we can't really apply this without breaking things. What would be broken is if the DHCP client isn't specifying a device ifindex when it binds the AF_PACKET socket. That would be an easy way to fix this performance problem at the application level. The DHCP client should only care about a particular interface's traffic, the one it wants to listen on. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [RFC 1/2] bridge: avoid ptype_all packet handling 2007-03-02 21:26 ` David Miller @ 2007-03-02 22:09 ` Stephen Hemminger 2007-03-02 22:14 ` [RFC 2/2] bridge: per device promiscious taps Stephen Hemminger 2007-03-02 22:48 ` [RFC 1/2] bridge: avoid ptype_all packet handling David Miller 2007-03-02 22:15 ` Stephen Hemminger 2007-03-03 12:04 ` [PATCH] " Stefan Rompf 2 siblings, 2 replies; 25+ messages in thread From: Stephen Hemminger @ 2007-03-02 22:09 UTC (permalink / raw) To: David Miller; +Cc: bridge, netdev On Fri, 02 Mar 2007 13:26:38 -0800 (PST) David Miller <davem@davemloft.net> wrote: > From: Stephen Hemminger <shemminger@linux-foundation.org> > Date: Wed, 28 Feb 2007 17:18:46 -0800 > > > I was measuring bridging/routing performance and noticed this. > > > > The current code runs the "all packet" type handlers before calling the > > bridge hook. If an application (like some DHCP clients) is using AF_PACKET, > > this means that each received packet gets run through the Berkeley Packet Filter > > code in sk_run_filter (slow). > > I know we closed this out by saying that even though performance > sucks, we can't really apply this without breaking things. wrong. > What would be broken is if the DHCP client isn't specifying > a device ifindex when it binds the AF_PACKET socket. That > would be an easy way to fix this performance problem at the > application level. > > The DHCP client should only care about a particular interface's > traffic, the one it wants to listen on. My assumption is that when bridging, the normal stack path only has to receive those packets that it would receive if it was not doing bridging. A better version of the patch is: ============== The current code runs the "all packet" type handlers before calling the bridge hook. If an application (like some DHCP clients) is using AF_PACKET, this means that each received packet gets run through the Berkeley Packet Filter code in sk_run_filter. This is significant overhead. By moving the bridging hook to run first, the packets flowing through the bridge get filtered out there first. This results in a 14% improvement in performance. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> --- net/core/dev.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) --- netem.orig/net/core/dev.c +++ netem/net/core/dev.c @@ -1702,9 +1702,12 @@ struct net_bridge_fdb_entry *(*br_fdb_ge unsigned char *addr); void (*br_fdb_put_hook)(struct net_bridge_fdb_entry *ent); -static __inline__ int handle_bridge(struct sk_buff **pskb, - struct packet_type **pt_prev, int *ret, - struct net_device *orig_dev) +/* + * If bridge module is loaded call bridging hook. + * when it returns 1, this is a non-local packet + */ +int (*br_handle_frame_hook)(struct net_bridge_port *p, struct sk_buff **pskb) __read_mostly; +static int handle_bridge(struct sk_buff **pskb) { struct net_bridge_port *port; @@ -1712,15 +1715,10 @@ static __inline__ int handle_bridge(stru (port = rcu_dereference((*pskb)->dev->br_port)) == NULL) return 0; - if (*pt_prev) { - *ret = deliver_skb(*pskb, *pt_prev, orig_dev); - *pt_prev = NULL; - } - return br_handle_frame_hook(port, pskb); } #else -#define handle_bridge(skb, pt_prev, ret, orig_dev) (0) +#define handle_bridge(pskb) 0 #endif #ifdef CONFIG_NET_CLS_ACT @@ -1799,6 +1797,9 @@ int netif_receive_skb(struct sk_buff *sk } #endif + if (handle_bridge(&skb)) + goto out; + list_for_each_entry_rcu(ptype, &ptype_all, list) { if (!ptype->dev || ptype->dev == skb->dev) { if (pt_prev) @@ -1826,9 +1827,6 @@ int netif_receive_skb(struct sk_buff *sk ncls: #endif - if (handle_bridge(&skb, &pt_prev, &ret, orig_dev)) - goto out; - type = skb->protocol; list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) { if (ptype->type == type && -- Stephen Hemminger <shemminger@linux-foundation.org> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [RFC 2/2] bridge: per device promiscious taps 2007-03-02 22:09 ` [RFC 1/2] " Stephen Hemminger @ 2007-03-02 22:14 ` Stephen Hemminger 2007-03-02 22:48 ` [RFC 1/2] bridge: avoid ptype_all packet handling David Miller 1 sibling, 0 replies; 25+ messages in thread From: Stephen Hemminger @ 2007-03-02 22:14 UTC (permalink / raw) To: David Miller; +Cc: bridge, netdev Part of the next set of bridge patches includes this. It allows packet capture by interface on a bridge: tcpdump -i eth0 will work as expected. @@ -128,34 +125,45 @@ static inline int is_link_local(const un int br_handle_frame(struct net_bridge_port *p, struct sk_buff **pskb) { struct sk_buff *skb = *pskb; + struct sk_buff *skb2 = NULL; const unsigned char *dest = eth_hdr(skb)->h_dest; if (!is_valid_ether_addr(eth_hdr(skb)->h_source)) goto err; if (unlikely(is_link_local(dest))) { skb->pkt_type = PACKET_HOST; return NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev, NULL, br_handle_local_finish) != 0; } + + if (unlikely(p->dev->promiscuity > 1)) + skb2 = skb_clone(skb, GFP_ATOMIC); - if (p->state == BR_STATE_FORWARDING || p->state == BR_STATE_LEARNING) { + switch (p->state) { + case BR_STATE_FORWARDING: if (br_should_route_hook) { - if (br_should_route_hook(pskb)) + if (br_should_route_hook(pskb)) { + kfree_skb(skb2); return 0; + } skb = *pskb; dest = eth_hdr(skb)->h_dest; } if (!compare_ether_addr(p->br->dev->dev_addr, dest)) skb->pkt_type = PACKET_HOST; + /* fall thru */ + case BR_STATE_LEARNING: NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL, br_handle_frame_finish); - return 1; + break; + + default: + kfree_skb(skb); } -err: - kfree_skb(skb); - return 1; + if (likely(!skb2)) + return 1; + + *pskb = skb2; + return 0; } ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 1/2] bridge: avoid ptype_all packet handling 2007-03-02 22:09 ` [RFC 1/2] " Stephen Hemminger 2007-03-02 22:14 ` [RFC 2/2] bridge: per device promiscious taps Stephen Hemminger @ 2007-03-02 22:48 ` David Miller 2007-03-02 23:18 ` David Miller 1 sibling, 1 reply; 25+ messages in thread From: David Miller @ 2007-03-02 22:48 UTC (permalink / raw) To: shemminger; +Cc: bridge, netdev From: Stephen Hemminger <shemminger@linux-foundation.org> Date: Fri, 2 Mar 2007 14:09:29 -0800 > On Fri, 02 Mar 2007 13:26:38 -0800 (PST) > David Miller <davem@davemloft.net> wrote: > > > From: Stephen Hemminger <shemminger@linux-foundation.org> > > Date: Wed, 28 Feb 2007 17:18:46 -0800 > > > > > I was measuring bridging/routing performance and noticed this. > > > > > > The current code runs the "all packet" type handlers before calling the > > > bridge hook. If an application (like some DHCP clients) is using AF_PACKET, > > > this means that each received packet gets run through the Berkeley Packet Filter > > > code in sk_run_filter (slow). > > > > I know we closed this out by saying that even though performance > > sucks, we can't really apply this without breaking things. > > wrong. I disagee, and your patch is still broken because as Jamal pointed out (which you didn't address in any way) this breaks traffic classification of bridged traffic as well. If someone wants their network tap to hear all traffic, they do mean all traffic, and this includes potentially seeing it multiple times when things like bridging and virtual devices decap incoming frames. We can't apply this. Back to a workable solution, why doesn't DHCP specify a specific device? It would fix this performance problem completely, at the application level. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 1/2] bridge: avoid ptype_all packet handling 2007-03-02 22:48 ` [RFC 1/2] bridge: avoid ptype_all packet handling David Miller @ 2007-03-02 23:18 ` David Miller 2007-03-02 23:34 ` Stephen Hemminger 2007-03-03 5:38 ` Herbert Xu 0 siblings, 2 replies; 25+ messages in thread From: David Miller @ 2007-03-02 23:18 UTC (permalink / raw) To: shemminger; +Cc: bridge, netdev From: David Miller <davem@davemloft.net> Date: Fri, 02 Mar 2007 14:48:18 -0800 (PST) > Back to a workable solution, why doesn't DHCP specify a specific > device? It would fix this performance problem completely, at the > application level. Since nobody seems to be able to be bothered to actually look at what DHCP clients are doing, I actually did and it's no surprise that broken stuff is happening here. Here is how dhcp3-3.0.3 binds AF_PACKET sockets, in common/lpf.c: struct sockaddr sa; ... /* Bind to the interface name */ memset (&sa, 0, sizeof sa); sa.sa_family = AF_PACKET; strncpy (sa.sa_data, (const char *)info -> ifp, sizeof sa.sa_data); if (bind (sock, &sa, sizeof sa)) { if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT || errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT || errno == EAFNOSUPPORT || errno == EINVAL) { log_error ("socket: %m - make sure"); log_error ("CONFIG_PACKET (Packet socket) %s", "and CONFIG_FILTER"); log_error ("(Socket Filtering) are enabled %s", "in your kernel"); log_fatal ("configuration!"); } log_fatal ("Bind socket to interface: %m"); } So it puts a string into the sockaddr data, and there is no mention of sockaddr_ll, which is what is supposed to be provided as the socket address here, in the entire DHCP tree. I'm tempted to say I must be missing something here, since I can't see how this could possible work at all. The string passed in should be interpreted as the ifindex value, and thus trigger a -ENODEV return from AF_PACKET's bind() implementation. My suspicions are confirmed by the patch here: http://kernel.org/pub/linux/kernel/people/chuyee/patches/dhcp-3.0/dhcp-3.0-linux_cooked_packet.patch Really, this bogus bind() explains everything. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 1/2] bridge: avoid ptype_all packet handling 2007-03-02 23:18 ` David Miller @ 2007-03-02 23:34 ` Stephen Hemminger 2007-03-02 23:41 ` David Miller 2007-03-03 5:38 ` Herbert Xu 1 sibling, 1 reply; 25+ messages in thread From: Stephen Hemminger @ 2007-03-02 23:34 UTC (permalink / raw) To: David Miller; +Cc: bridge, netdev On Fri, 02 Mar 2007 15:18:03 -0800 (PST) David Miller <davem@davemloft.net> wrote: > From: David Miller <davem@davemloft.net> > Date: Fri, 02 Mar 2007 14:48:18 -0800 (PST) > > > Back to a workable solution, why doesn't DHCP specify a specific > > device? It would fix this performance problem completely, at the > > application level. > > Since nobody seems to be able to be bothered to actually look > at what DHCP clients are doing, I actually did and it's no > surprise that broken stuff is happening here. I was in middle of checking that.. > Here is how dhcp3-3.0.3 binds AF_PACKET sockets, in common/lpf.c: > > struct sockaddr sa; > ... > /* Bind to the interface name */ > memset (&sa, 0, sizeof sa); > sa.sa_family = AF_PACKET; > strncpy (sa.sa_data, (const char *)info -> ifp, sizeof sa.sa_data); > if (bind (sock, &sa, sizeof sa)) { > if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT || > errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT || > errno == EAFNOSUPPORT || errno == EINVAL) { > log_error ("socket: %m - make sure"); > log_error ("CONFIG_PACKET (Packet socket) %s", > "and CONFIG_FILTER"); > log_error ("(Socket Filtering) are enabled %s", > "in your kernel"); > log_fatal ("configuration!"); > } > log_fatal ("Bind socket to interface: %m"); > } > > So it puts a string into the sockaddr data, and there > is no mention of sockaddr_ll, which is what is supposed to be > provided as the socket address here, in the entire DHCP tree. > > I'm tempted to say I must be missing something here, since I can't see > how this could possible work at all. The string passed in should > be interpreted as the ifindex value, and thus trigger a -ENODEV > return from AF_PACKET's bind() implementation. > > My suspicions are confirmed by the patch here: > > http://kernel.org/pub/linux/kernel/people/chuyee/patches/dhcp-3.0/dhcp-3.0-linux_cooked_packet.patch Can you get FC fixed? > Really, this bogus bind() explains everything. Should we add a warning to kernel log, to make distro's fix it? It might make sense to add a per-device ptype_dev list in network device? -- Stephen Hemminger <shemminger@linux-foundation.org> ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 1/2] bridge: avoid ptype_all packet handling 2007-03-02 23:34 ` Stephen Hemminger @ 2007-03-02 23:41 ` David Miller 0 siblings, 0 replies; 25+ messages in thread From: David Miller @ 2007-03-02 23:41 UTC (permalink / raw) To: shemminger; +Cc: bridge, netdev From: Stephen Hemminger <shemminger@linux-foundation.org> Date: Fri, 2 Mar 2007 15:34:14 -0800 > Can you get FC fixed? I am not the DHCP package maintainer. :-) I'm up to my earfulls already dealing with people trying to slug broken patches into the kernel networking that paper around application bugs. ;) > Should we add a warning to kernel log, to make distro's fix it? Unfortunately it looks like a properly formed sockaddr_ll, the ifindex is in fact zero, so there is nothing we can do to warn about this case. The sockaddr_ll sits after the first sockaddr string in the ifreq, and the rest remains initialized to zeros, thus the bind() succeeds. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 1/2] bridge: avoid ptype_all packet handling 2007-03-02 23:18 ` David Miller 2007-03-02 23:34 ` Stephen Hemminger @ 2007-03-03 5:38 ` Herbert Xu 2007-03-03 5:59 ` David Miller 1 sibling, 1 reply; 25+ messages in thread From: Herbert Xu @ 2007-03-03 5:38 UTC (permalink / raw) To: David Miller; +Cc: shemminger, bridge, netdev David Miller <davem@davemloft.net> wrote: > > I'm tempted to say I must be missing something here, since I can't see > how this could possible work at all. The string passed in should > be interpreted as the ifindex value, and thus trigger a -ENODEV > return from AF_PACKET's bind() implementation. This is using packet_bind_spkt which uses a name instead of ifindex. As you may recall, I've made a patch to convert it to use the new (actually it's not-so-new anymore) AF_PACKET interface. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 1/2] bridge: avoid ptype_all packet handling 2007-03-03 5:38 ` Herbert Xu @ 2007-03-03 5:59 ` David Miller 2007-03-03 6:42 ` Herbert Xu 0 siblings, 1 reply; 25+ messages in thread From: David Miller @ 2007-03-03 5:59 UTC (permalink / raw) To: herbert; +Cc: shemminger, bridge, netdev From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sat, 03 Mar 2007 16:38:45 +1100 > This is using packet_bind_spkt which uses a name instead of ifindex. So it should be just fine, it should be binding to a specific device (by name instead of ifindex) and therefore it should only trigger the pt_all hook when the packet arrives on that specific device. > As you may recall, I've made a patch to convert it to use the new > (actually it's not-so-new anymore) AF_PACKET interface. That's right. So it's still a mystery why dhcp is causing bridge devices to trigger the network tap paths on Stephen's machine. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 1/2] bridge: avoid ptype_all packet handling 2007-03-03 5:59 ` David Miller @ 2007-03-03 6:42 ` Herbert Xu 0 siblings, 0 replies; 25+ messages in thread From: Herbert Xu @ 2007-03-03 6:42 UTC (permalink / raw) To: David Miller; +Cc: shemminger, bridge, netdev On Fri, Mar 02, 2007 at 09:59:05PM -0800, David Miller wrote: > > So it's still a mystery why dhcp is causing bridge devices > to trigger the network tap paths on Stephen's machine. If this is the ISC DHCP daemon then perhaps it's because Stephen didn't specify an interface for it to listen on? By default it'll enumerate all broadcast interfaces and listen to each one of them. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 25+ messages in thread
* [RFC 1/2] bridge: avoid ptype_all packet handling 2007-03-02 21:26 ` David Miller 2007-03-02 22:09 ` [RFC 1/2] " Stephen Hemminger @ 2007-03-02 22:15 ` Stephen Hemminger 2007-03-03 12:04 ` [PATCH] " Stefan Rompf 2 siblings, 0 replies; 25+ messages in thread From: Stephen Hemminger @ 2007-03-02 22:15 UTC (permalink / raw) To: David Miller; +Cc: bridge, netdev On Fri, 02 Mar 2007 13:26:38 -0800 (PST) David Miller <davem@davemloft.net> wrote: > From: Stephen Hemminger <shemminger@linux-foundation.org> > Date: Wed, 28 Feb 2007 17:18:46 -0800 > > > I was measuring bridging/routing performance and noticed this. > > > > The current code runs the "all packet" type handlers before calling the > > bridge hook. If an application (like some DHCP clients) is using AF_PACKET, > > this means that each received packet gets run through the Berkeley Packet Filter > > code in sk_run_filter (slow). > > I know we closed this out by saying that even though performance > sucks, we can't really apply this without breaking things. wrong. > What would be broken is if the DHCP client isn't specifying > a device ifindex when it binds the AF_PACKET socket. That > would be an easy way to fix this performance problem at the > application level. > > The DHCP client should only care about a particular interface's > traffic, the one it wants to listen on. My assumption is that when bridging, the normal stack path only has to receive those packets that it would receive if it was not doing bridging. A better version of the patch is: ============== The current code runs the "all packet" type handlers before calling the bridge hook. If an application (like some DHCP clients) is using AF_PACKET, this means that each received packet gets run through the Berkeley Packet Filter code in sk_run_filter. This is significant overhead. By moving the bridging hook to run first, the packets flowing through the bridge get filtered out there first. This results in a 14% improvement in performance. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> --- net/core/dev.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) --- netem.orig/net/core/dev.c +++ netem/net/core/dev.c @@ -1702,9 +1702,12 @@ struct net_bridge_fdb_entry *(*br_fdb_ge unsigned char *addr); void (*br_fdb_put_hook)(struct net_bridge_fdb_entry *ent); -static __inline__ int handle_bridge(struct sk_buff **pskb, - struct packet_type **pt_prev, int *ret, - struct net_device *orig_dev) +/* + * If bridge module is loaded call bridging hook. + * when it returns 1, this is a non-local packet + */ +int (*br_handle_frame_hook)(struct net_bridge_port *p, struct sk_buff **pskb) __read_mostly; +static int handle_bridge(struct sk_buff **pskb) { struct net_bridge_port *port; @@ -1712,15 +1715,10 @@ static __inline__ int handle_bridge(stru (port = rcu_dereference((*pskb)->dev->br_port)) == NULL) return 0; - if (*pt_prev) { - *ret = deliver_skb(*pskb, *pt_prev, orig_dev); - *pt_prev = NULL; - } - return br_handle_frame_hook(port, pskb); } #else -#define handle_bridge(skb, pt_prev, ret, orig_dev) (0) +#define handle_bridge(pskb) 0 #endif #ifdef CONFIG_NET_CLS_ACT @@ -1799,6 +1797,9 @@ int netif_receive_skb(struct sk_buff *sk } #endif + if (handle_bridge(&skb)) + goto out; + list_for_each_entry_rcu(ptype, &ptype_all, list) { if (!ptype->dev || ptype->dev == skb->dev) { if (pt_prev) @@ -1826,9 +1827,6 @@ int netif_receive_skb(struct sk_buff *sk ncls: #endif - if (handle_bridge(&skb, &pt_prev, &ret, orig_dev)) - goto out; - type = skb->protocol; list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) { if (ptype->type == type && -- Stephen Hemminger <shemminger@linux-foundation.org> ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] bridge: avoid ptype_all packet handling 2007-03-02 21:26 ` David Miller 2007-03-02 22:09 ` [RFC 1/2] " Stephen Hemminger 2007-03-02 22:15 ` Stephen Hemminger @ 2007-03-03 12:04 ` Stefan Rompf 2 siblings, 0 replies; 25+ messages in thread From: Stefan Rompf @ 2007-03-03 12:04 UTC (permalink / raw) To: David Miller; +Cc: shemminger, bridge, netdev Am Freitag, 2. März 2007 22:26 schrieb David Miller: > The DHCP client should only care about a particular interface's > traffic, the one it wants to listen on. Also, a DHCP client should close the socket between address acquisition and renewal. The only interesting events in that period are operstate changes. Stefan ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2007-03-03 12:31 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-03-01 1:18 [PATCH] bridge: avoid ptype_all packet handling Stephen Hemminger 2007-03-01 1:28 ` Ben Greear 2007-03-01 3:56 ` Stephen Hemminger 2007-03-01 4:05 ` Ben Greear 2007-03-01 7:04 ` Stephen Hemminger 2007-03-01 7:22 ` David Miller 2007-03-01 7:26 ` Stephen Hemminger 2007-03-01 7:30 ` David Miller 2007-03-01 11:47 ` jamal 2007-03-03 2:14 ` Andi Kleen 2007-03-03 4:22 ` David Miller 2007-03-03 7:09 ` Stephen Hemminger 2007-03-03 12:30 ` Andi Kleen 2007-03-02 21:26 ` David Miller 2007-03-02 22:09 ` [RFC 1/2] " Stephen Hemminger 2007-03-02 22:14 ` [RFC 2/2] bridge: per device promiscious taps Stephen Hemminger 2007-03-02 22:48 ` [RFC 1/2] bridge: avoid ptype_all packet handling David Miller 2007-03-02 23:18 ` David Miller 2007-03-02 23:34 ` Stephen Hemminger 2007-03-02 23:41 ` David Miller 2007-03-03 5:38 ` Herbert Xu 2007-03-03 5:59 ` David Miller 2007-03-03 6:42 ` Herbert Xu 2007-03-02 22:15 ` Stephen Hemminger 2007-03-03 12:04 ` [PATCH] " Stefan Rompf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).