* Re: ipvs oops in 3.0-rc7
From: Randy Dunlap @ 2011-07-21 17:26 UTC (permalink / raw)
To: Huajun Li
Cc: Simon Horman, netdev, lvs-devel, Wensong Zhang, Julian Anastasov
In-Reply-To: <CA+v9cxYJtrpUSHWY5Z2R14kZjX1c5nd2yYrvTx2URLJz26d_Tw@mail.gmail.com>
On Thu, 21 Jul 2011 16:42:17 +0800 Huajun Li wrote:
> Hi Randy and Simon,
> I happened to hit this issue too: loading and unloading the ip_vs
> module, then loading it again, causes an Oops. The root cause may be
> that ip_vs_dst_notifier is not unregistered. Please try the following
> patch; it works for me.
>
>
> Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
> ---
> net/netfilter/ipvs/ip_vs_ctl.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 699c79a..a178cb3 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -3771,6 +3771,7 @@ err_sock:
> void ip_vs_control_cleanup(void)
> {
> EnterFunction(2);
> + unregister_netdevice_notifier(&ip_vs_dst_notifier);
> ip_vs_genl_unregister();
> nf_unregister_sockopt(&ip_vs_sockopts);
> LeaveFunction(2);
> --
Yes, this patch or the one here: http://www.spinics.net/lists/lvs-devel/msg02051.html
works. Thanks.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
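The failure mode described above can be illustrated with a small
userspace sketch. It is not the kernel's notifier API: struct notifier,
chain_register() and chain_unregister() are hypothetical stand-ins,
assuming only that a notifier chain is a linked list of callback blocks
owned by whoever registered them. If module exit skips the unregister
step, the chain keeps an entry for a module that is gone; in the kernel
that entry would point into freed module memory.

```c
#include <stddef.h>

/* Hypothetical stand-in for a notifier block: one node in a linked chain. */
struct notifier {
    int (*call)(int event);
    struct notifier *next;
};

static struct notifier *chain_head;

/* Push a block onto the front of the chain. */
void chain_register(struct notifier *nb)
{
    nb->next = chain_head;
    chain_head = nb;
}

/* Unlink a block from the chain; a no-op if it is not registered. */
void chain_unregister(struct notifier *nb)
{
    struct notifier **pp;

    for (pp = &chain_head; *pp; pp = &(*pp)->next) {
        if (*pp == nb) {
            *pp = nb->next;
            nb->next = NULL;
            return;
        }
    }
}

/* Count the blocks currently on the chain. */
int chain_length(void)
{
    const struct notifier *nb;
    int n = 0;

    for (nb = chain_head; nb; nb = nb->next)
        n++;
    return n;
}

static int demo_event(int event)
{
    return event;
}

static struct notifier demo_nb = { .call = demo_event };

/* "Module init" registers the block ... */
void demo_init(void)
{
    chain_register(&demo_nb);
}

/* ... and "module exit" has to undo it, mirroring init. */
void demo_exit(void)
{
    chain_unregister(&demo_nb);
}
```

Calling demo_init(), demo_exit(), demo_init() leaves exactly one entry
on the chain. Dropping the chain_unregister() call from demo_exit()
would instead leave the old entry behind across the "unload".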
> 2011/7/21 Simon Horman <horms@verge.net.au>:
> > On Wed, Jul 20, 2011 at 08:50:19PM -0700, Randy Dunlap wrote:
> >> I'm seeing the following Oops in 3.0-rc7 on x86_64, just loading and unloading
> >> modules. Any chance this is already fixed? I can test current git, but I
> >> wanted to ask first.
> >>
> >> Looks like it is on the second module load of ip_vs (i.e.,
> >> modprobe ip_vs; rmmod ip_vs; modprobe ip_vs).
> >
> > Hi Randy,
> >
> > I don't believe that this problem has been resolved (or observed before).
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
^ permalink raw reply
* Re: IPv6: autoconfiguration and suspend/resume or link down/up
From: Dan Williams @ 2011-07-21 16:35 UTC (permalink / raw)
To: Jiri Bohac; +Cc: netdev, Herbert Xu, David S. Miller, stephen hemminger
In-Reply-To: <1311226254.3140.52.camel@dcbw.foobar.com>
On Thu, 2011-07-21 at 00:30 -0500, Dan Williams wrote:
> On Wed, 2011-07-20 at 18:36 +0200, Jiri Bohac wrote:
> > On Wed, Jul 20, 2011 at 11:21:43AM -0500, Dan Williams wrote:
> > > ... and in the resume handler use that value to age anything
> > > that needs to know about time spent in suspend, and then do what needs
> > > to be done with that. So something like that may work for IPv6
> > > addrconf; on suspend save current time, and on resume check the current
> > > time, subtract the time you saved on suspend, and magically add that to
> > > the lifetime counts and then run any expiry stuff.
> >
> > IPv6 (by specification) does not send any RS when an IP address
> > or route expires. So only subtracting the suspend time from the
> > lifetimes and possibly expiring the routes/IP addresses won't fix
> > the problem.
>
> Well, the prefix option of the RA includes the Valid Lifetime (in
> seconds, no less) so I'd assume the kernel starts a timer when it
> receives the RA and updates any addresses configured as a result of
> receiving that RA+prefix, such that when the timer expires, the
> autoconfigured address is deleted. That timer can be used as a base for
> the expiry mechanism that I've noted above, no? This fixes problem #1
> from your first mail.
>
> For problem #2, shouldn't a new RS be sent whenever the interface
> changes its IFF_LOWER_UP bit? IFF_LOWER_UP indicates a carrier on/off
> event and thus indicates possible disconnect/reconnect to a new network.
> I don't specifically know how it works now, but if RS isn't triggered
> from IFF_LOWER_UP, I'd imagine that either (a) something didn't get
> updated when IFF_LOWER_UP became how carrier was indicated in 2.6.17
> (commit b00055aacdb172c05067612278ba27265fcd05ce) or (b) there's a
> reason IFF_LOWER_UP isn't used as the trigger for sending an RS and I'm
> qualified to say why.
Should be "I'm not qualified to say why".
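The "save the time on suspend, age on resume" idea quoted in this
thread can be sketched in userspace C. This is only an illustration
under stated assumptions, not addrconf code: struct addr_entry,
on_suspend() and on_resume() are hypothetical names, timestamps are
plain seconds supplied by the caller, and an all-ones lifetime is
treated as infinite. As Jiri points out, expiring entries this way does
not by itself cause a new RS to be sent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LFT_INFINITY 0xffffffffu  /* all-ones lifetime: never expires */

/* Hypothetical per-address state: remaining valid lifetime in seconds. */
struct addr_entry {
    uint32_t valid_lft;
    bool expired;
};

static uint64_t suspend_stamp;

/* Remember when the system went to sleep. */
void on_suspend(uint64_t now_sec)
{
    suspend_stamp = now_sec;
}

/* Charge the time spent asleep against every finite lifetime. */
void on_resume(uint64_t now_sec, struct addr_entry *addrs, size_t n)
{
    uint64_t slept = now_sec - suspend_stamp;
    size_t i;

    for (i = 0; i < n; i++) {
        if (addrs[i].valid_lft == LFT_INFINITY)
            continue;
        if (slept >= addrs[i].valid_lft) {
            addrs[i].valid_lft = 0;
            addrs[i].expired = true;
        } else {
            addrs[i].valid_lft -= (uint32_t)slept;
        }
    }
}
```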
> Dan
>
> > When I move to a new network, I need to restart the
> > autoconfiguration. This does not currently happen - neither for
> > an alive system where the ethernet link goes down/up, nor for a
> > system that gets suspended, moved and then resumed.
> >
>
>
^ permalink raw reply
* [patch net-next-2.6 18/47 V3] igbvf: do vlan cleanup
From: Jiri Pirko @ 2011-07-21 16:30 UTC (permalink / raw)
To: netdev
Cc: jesse, e1000-devel, bruce.w.allan, jesse.brandeburg, mirqus,
john.ronciak, shemminger, davem
In-Reply-To: <20110721132229.GC2107@minipsycho>
- unify vlan and nonvlan rx path
- kill adapter->vlgrp and igbvf_vlan_rx_register
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/igbvf/igbvf.h | 4 ++--
drivers/net/igbvf/netdev.c | 44 ++++++++++++++------------------------------
2 files changed, 16 insertions(+), 32 deletions(-)
diff --git a/drivers/net/igbvf/igbvf.h b/drivers/net/igbvf/igbvf.h
index d5dad5d..fd4a7b7 100644
--- a/drivers/net/igbvf/igbvf.h
+++ b/drivers/net/igbvf/igbvf.h
@@ -34,7 +34,7 @@
#include <linux/timer.h>
#include <linux/io.h>
#include <linux/netdevice.h>
-
+#include <linux/if_vlan.h>
#include "vf.h"
@@ -173,7 +173,7 @@ struct igbvf_adapter {
const struct igbvf_info *ei;
- struct vlan_group *vlgrp;
+ unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
u32 bd_number;
u32 rx_buffer_len;
u32 polling_interval;
diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
index 64b47bf..1330c8e 100644
--- a/drivers/net/igbvf/netdev.c
+++ b/drivers/net/igbvf/netdev.c
@@ -100,12 +100,12 @@ static void igbvf_receive_skb(struct igbvf_adapter *adapter,
struct sk_buff *skb,
u32 status, u16 vlan)
{
- if (adapter->vlgrp && (status & E1000_RXD_STAT_VP))
- vlan_hwaccel_receive_skb(skb, adapter->vlgrp,
- le16_to_cpu(vlan) &
- E1000_RXD_SPC_VLAN_MASK);
- else
- netif_receive_skb(skb);
+ if (status & E1000_RXD_STAT_VP) {
+ u16 vid = le16_to_cpu(vlan) & E1000_RXD_SPC_VLAN_MASK;
+
+ __vlan_hwaccel_put_tag(skb, vid);
+ }
+ netif_receive_skb(skb);
}
static inline void igbvf_rx_checksum_adv(struct igbvf_adapter *adapter,
@@ -1167,12 +1167,10 @@ static int igbvf_poll(struct napi_struct *napi, int budget)
*/
static void igbvf_set_rlpml(struct igbvf_adapter *adapter)
{
- int max_frame_size = adapter->max_frame_size;
+ int max_frame_size;
struct e1000_hw *hw = &adapter->hw;
- if (adapter->vlgrp)
- max_frame_size += VLAN_TAG_SIZE;
-
+ max_frame_size = adapter->max_frame_size + VLAN_TAG_SIZE;
e1000_rlpml_set_vf(hw, max_frame_size);
}
@@ -1183,6 +1181,8 @@ static void igbvf_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
if (hw->mac.ops.set_vfta(hw, vid, true))
dev_err(&adapter->pdev->dev, "Failed to add vlan id %d\n", vid);
+ else
+ set_bit(vid, adapter->active_vlans);
}
static void igbvf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
@@ -1191,7 +1191,6 @@ static void igbvf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
struct e1000_hw *hw = &adapter->hw;
igbvf_irq_disable(adapter);
- vlan_group_set_device(adapter->vlgrp, vid, NULL);
if (!test_bit(__IGBVF_DOWN, &adapter->state))
igbvf_irq_enable(adapter);
@@ -1199,30 +1198,16 @@ static void igbvf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
if (hw->mac.ops.set_vfta(hw, vid, false))
dev_err(&adapter->pdev->dev,
"Failed to remove vlan id %d\n", vid);
-}
-
-static void igbvf_vlan_rx_register(struct net_device *netdev,
- struct vlan_group *grp)
-{
- struct igbvf_adapter *adapter = netdev_priv(netdev);
-
- adapter->vlgrp = grp;
+ else
+ clear_bit(vid, adapter->active_vlans);
}
static void igbvf_restore_vlan(struct igbvf_adapter *adapter)
{
u16 vid;
- if (!adapter->vlgrp)
- return;
-
- for (vid = 0; vid < VLAN_N_VID; vid++) {
- if (!vlan_group_get_device(adapter->vlgrp, vid))
- continue;
+ for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID)
igbvf_vlan_rx_add_vid(adapter->netdev, vid);
- }
-
- igbvf_set_rlpml(adapter);
}
/**
@@ -2203,7 +2188,7 @@ static netdev_tx_t igbvf_xmit_frame_ring_adv(struct sk_buff *skb,
return NETDEV_TX_BUSY;
}
- if (adapter->vlgrp && vlan_tx_tag_present(skb)) {
+ if (vlan_tx_tag_present(skb)) {
tx_flags |= IGBVF_TX_FLAGS_VLAN;
tx_flags |= (vlan_tx_tag_get(skb) << IGBVF_TX_FLAGS_VLAN_SHIFT);
}
@@ -2556,7 +2541,6 @@ static const struct net_device_ops igbvf_netdev_ops = {
.ndo_change_mtu = igbvf_change_mtu,
.ndo_do_ioctl = igbvf_ioctl,
.ndo_tx_timeout = igbvf_tx_timeout,
- .ndo_vlan_rx_register = igbvf_vlan_rx_register,
.ndo_vlan_rx_add_vid = igbvf_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = igbvf_vlan_rx_kill_vid,
#ifdef CONFIG_NET_POLL_CONTROLLER
--
1.7.6
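The bookkeeping this patch switches to (a bitmap with one bit per VLAN
ID instead of a vlan_group pointer) can be tried out in userspace. The
sketch below is an assumption-laden analogue, not driver code:
vlan_mark(), vlan_unmark(), vlan_is_active() and vlan_restore() are
hypothetical helpers that mimic what set_bit(), clear_bit() and
for_each_set_bit() do in the patch, without the kernel's atomicity.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define N_VID 4096  /* number of possible 12-bit VLAN IDs */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_FOR(bits) (((bits) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical adapter state: one bit per VLAN ID. */
struct vlan_state {
    unsigned long active[WORDS_FOR(N_VID)];
};

/* Record that a VLAN ID was programmed successfully. */
void vlan_mark(struct vlan_state *s, unsigned int vid)
{
    s->active[vid / BITS_PER_WORD] |= 1UL << (vid % BITS_PER_WORD);
}

/* Forget a VLAN ID after it was removed. */
void vlan_unmark(struct vlan_state *s, unsigned int vid)
{
    s->active[vid / BITS_PER_WORD] &= ~(1UL << (vid % BITS_PER_WORD));
}

bool vlan_is_active(const struct vlan_state *s, unsigned int vid)
{
    return (s->active[vid / BITS_PER_WORD] >> (vid % BITS_PER_WORD)) & 1UL;
}

/*
 * Replay every recorded ID through an optional callback, as a restore
 * path would after a reset. Returns the number of IDs replayed.
 */
unsigned int vlan_restore(const struct vlan_state *s,
                          void (*add)(unsigned int vid))
{
    unsigned int vid, count = 0;

    for (vid = 0; vid < N_VID; vid++) {
        if (!vlan_is_active(s, vid))
            continue;
        if (add)
            add(vid);
        count++;
    }
    return count;
}
```

Because the bitmap lives in the adapter structure itself, the restore
loop needs no NULL check of the kind the old vlgrp pointer required.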
^ permalink raw reply related
* Re: [patch net-next-2.6 18/47 V2] igbvf: do vlan cleanup
From: Jiri Pirko @ 2011-07-21 16:23 UTC (permalink / raw)
To: Rose, Gregory V
Cc: netdev@vger.kernel.org, davem@davemloft.net,
shemminger@linux-foundation.org, eric.dumazet@gmail.com,
greearb@candelatech.com, mirqus@gmail.com, Kirsher, Jeffrey T,
Brandeburg, Jesse, Waskiewicz Jr, Peter P, Allan, Bruce W,
Wyborny, Carolyn, Skidmore, Donald C, Duyck, Alexander H,
Ronciak, John, e1000-devel@lists.sourceforge.net,
jesse@nicira.com
In-Reply-To: <43F901BD926A4E43B106BF17856F0755019404FB53@orsmsx508.amr.corp.intel.com>
Thu, Jul 21, 2011 at 05:57:08PM CEST, gregory.v.rose@intel.com wrote:
>> -----Original Message-----
>> From: Jiri Pirko [mailto:jpirko@redhat.com]
>> Sent: Thursday, July 21, 2011 6:23 AM
>> To: netdev@vger.kernel.org
>> Cc: davem@davemloft.net; shemminger@linux-foundation.org;
>> eric.dumazet@gmail.com; greearb@candelatech.com; mirqus@gmail.com;
>> Kirsher, Jeffrey T; Brandeburg, Jesse; Waskiewicz Jr, Peter P; Allan,
>> Bruce W; Wyborny, Carolyn; Skidmore, Donald C; Rose, Gregory V; Duyck,
>> Alexander H; Ronciak, John; e1000-devel@lists.sourceforge.net;
>> jesse@nicira.com
>> Subject: [patch net-next-2.6 18/47 V2] igbvf: do vlan cleanup
>>
>> - unify vlan and nonvlan rx path
>> - kill adapter->vlgrp and igbvf_vlan_rx_register
>>
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>> ---
>> drivers/net/igbvf/igbvf.h | 4 +-
>> drivers/net/igbvf/netdev.c | 55 +++++++++++++++++++--------------------
>> ----
>> 2 files changed, 26 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/net/igbvf/igbvf.h b/drivers/net/igbvf/igbvf.h
>> index d5dad5d..fd4a7b7 100644
>> --- a/drivers/net/igbvf/igbvf.h
>> +++ b/drivers/net/igbvf/igbvf.h
>> @@ -34,7 +34,7 @@
>> #include <linux/timer.h>
>> #include <linux/io.h>
>> #include <linux/netdevice.h>
>> -
>> +#include <linux/if_vlan.h>
>>
>> #include "vf.h"
>>
>> @@ -173,7 +173,7 @@ struct igbvf_adapter {
>>
>> const struct igbvf_info *ei;
>>
>> - struct vlan_group *vlgrp;
>> + unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
>> u32 bd_number;
>> u32 rx_buffer_len;
>> u32 polling_interval;
>> diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
>> index 64b47bf..d924b09 100644
>> --- a/drivers/net/igbvf/netdev.c
>> +++ b/drivers/net/igbvf/netdev.c
>> @@ -100,12 +100,12 @@ static void igbvf_receive_skb(struct igbvf_adapter
>> *adapter,
>> struct sk_buff *skb,
>> u32 status, u16 vlan)
>> {
>> - if (adapter->vlgrp && (status & E1000_RXD_STAT_VP))
>> - vlan_hwaccel_receive_skb(skb, adapter->vlgrp,
>> - le16_to_cpu(vlan) &
>> - E1000_RXD_SPC_VLAN_MASK);
>> - else
>> - netif_receive_skb(skb);
>> + if (status & E1000_RXD_STAT_VP) {
>> + u16 vid = le16_to_cpu(vlan) & E1000_RXD_SPC_VLAN_MASK;
>> +
>> + __vlan_hwaccel_put_tag(skb, vid);
>> + }
>> + netif_receive_skb(skb);
>> }
>>
>> static inline void igbvf_rx_checksum_adv(struct igbvf_adapter *adapter,
>> @@ -1167,22 +1167,29 @@ static int igbvf_poll(struct napi_struct *napi,
>> int budget)
>> */
>> static void igbvf_set_rlpml(struct igbvf_adapter *adapter)
>> {
>> - int max_frame_size = adapter->max_frame_size;
>> + int max_frame_size;
>> struct e1000_hw *hw = &adapter->hw;
>>
>> - if (adapter->vlgrp)
>> - max_frame_size += VLAN_TAG_SIZE;
>> -
>> + max_frame_size = adapter->max_frame_size + VLAN_TAG_SIZE;
>> e1000_rlpml_set_vf(hw, max_frame_size);
>> }
>>
>> -static void igbvf_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
>> +static bool __igbvf_vlan_rx_add_vid(struct igbvf_adapter *adapter, u16
>> vid)
>> {
>> - struct igbvf_adapter *adapter = netdev_priv(netdev);
>> struct e1000_hw *hw = &adapter->hw;
>>
>> if (hw->mac.ops.set_vfta(hw, vid, true))
>> dev_err(&adapter->pdev->dev, "Failed to add vlan id %d\n",
>> vid);
>> + return false;
>> + return true;
>> +}
>
>I'm pretty sure you intended to put a curly brace after the if statement here.
Right, missed that. Thanks.
Reposting, this time also removing the igbvf_set_rlpml call from
igbvf_restore_vlan, since it is already called from igbvf_configure_rx.
Thanks
Jirka
>
>Other than that it seems fine.
>
>- Greg
>
>
^ permalink raw reply
* Re: [PATCH V2] vhost: fix check for # of outstanding buffers
From: Shirley Ma @ 2011-07-21 16:12 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: David Miller, netdev, jasowang
In-Reply-To: <20110721080617.GA20360@redhat.com>
On Thu, 2011-07-21 at 11:06 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 20, 2011 at 10:23:12AM -0700, Shirley Ma wrote:
> > Fix the check for number of outstanding buffers returns incorrect
> > results due to vq->pend_idx wrap around;
> >
> > Signed-off-by: Shirley Ma <xma@us.ibm.com>
>
> OK, the logic's right now, and it's not worse
> than what we had, so I applied this after
> fixing up the comment (it's upend_idx and English
> sentences don't need to end with a semicolon ;)
>
> However, I would like to see the effect of the bug
> noted in the log in the future.
>
> And the reason I mention this here, is that
> I think that the whole VHOST_MAX_PEND thing
> does not work as advertised: this logic only
> triggers when the ring is empty, so we will happily push
> more than VHOST_MAX_PEND packets if the guest manages
> to give them to us.
>
> I'm not sure why we have the limit, either: the wmem
> limit in the socket still applies and seems more
> effective to prevent denial of service by a malicious guest.
Vhost can push more than VHOST_MAX_PEND if the guest manages to give
more. That's managed by the wmem limit.
MAX_PEND is the maximum number of outstanding used buffers that the
lower-level device has not DMAed in time. The socket destructor remains
unchanged, so this can't be managed by wmem.
Since vhost handle_tx always calls vhost_zerocopy_signal_used(), this
condition is unlikely to be hit unless the lower device can't DMA
MAX_PEND TX packets.
Thanks
Shirley
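The wrap-around problem behind this fix can be shown with a short
userspace sketch. It is not the vhost code: RING_SIZE, pending_count()
and pending_count_naive() are hypothetical, assuming only that both
indices advance through a fixed-size index space and wrap back to zero.

```c
#define RING_SIZE 1024  /* hypothetical size of the index space */

/*
 * Number of entries between done_idx (oldest outstanding) and
 * upend_idx (next slot to fill). Both must lie in [0, RING_SIZE).
 */
unsigned int pending_count(unsigned int upend_idx, unsigned int done_idx)
{
    if (upend_idx >= done_idx)
        return upend_idx - done_idx;
    /* upend_idx has wrapped past the end while done_idx has not */
    return upend_idx + RING_SIZE - done_idx;
}

/* Plain subtraction: goes negative once upend_idx has wrapped. */
int pending_count_naive(int upend_idx, int done_idx)
{
    return upend_idx - done_idx;
}
```

With upend_idx = 2 and done_idx = 1020 there are 6 entries outstanding,
but the naive difference is -1018, so a "greater than the limit"
comparison on it can never be true.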
^ permalink raw reply
* RE: [patch net-next-2.6 18/47 V2] igbvf: do vlan cleanup
From: Rose, Gregory V @ 2011-07-21 15:57 UTC (permalink / raw)
To: Jiri Pirko, netdev@vger.kernel.org
Cc: davem@davemloft.net, shemminger@linux-foundation.org,
eric.dumazet@gmail.com, greearb@candelatech.com, mirqus@gmail.com,
Kirsher, Jeffrey T, Brandeburg, Jesse, Waskiewicz Jr, Peter P,
Allan, Bruce W, Wyborny, Carolyn, Skidmore, Donald C,
Duyck, Alexander H, Ronciak, John,
e1000-devel@lists.sourceforge.net, jesse@nicira.com
In-Reply-To: <20110721132229.GC2107@minipsycho>
> -----Original Message-----
> From: Jiri Pirko [mailto:jpirko@redhat.com]
> Sent: Thursday, July 21, 2011 6:23 AM
> To: netdev@vger.kernel.org
> Cc: davem@davemloft.net; shemminger@linux-foundation.org;
> eric.dumazet@gmail.com; greearb@candelatech.com; mirqus@gmail.com;
> Kirsher, Jeffrey T; Brandeburg, Jesse; Waskiewicz Jr, Peter P; Allan,
> Bruce W; Wyborny, Carolyn; Skidmore, Donald C; Rose, Gregory V; Duyck,
> Alexander H; Ronciak, John; e1000-devel@lists.sourceforge.net;
> jesse@nicira.com
> Subject: [patch net-next-2.6 18/47 V2] igbvf: do vlan cleanup
>
> - unify vlan and nonvlan rx path
> - kill adapter->vlgrp and igbvf_vlan_rx_register
>
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> ---
> drivers/net/igbvf/igbvf.h | 4 +-
> drivers/net/igbvf/netdev.c | 55 +++++++++++++++++++--------------------
> ----
> 2 files changed, 26 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/net/igbvf/igbvf.h b/drivers/net/igbvf/igbvf.h
> index d5dad5d..fd4a7b7 100644
> --- a/drivers/net/igbvf/igbvf.h
> +++ b/drivers/net/igbvf/igbvf.h
> @@ -34,7 +34,7 @@
> #include <linux/timer.h>
> #include <linux/io.h>
> #include <linux/netdevice.h>
> -
> +#include <linux/if_vlan.h>
>
> #include "vf.h"
>
> @@ -173,7 +173,7 @@ struct igbvf_adapter {
>
> const struct igbvf_info *ei;
>
> - struct vlan_group *vlgrp;
> + unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
> u32 bd_number;
> u32 rx_buffer_len;
> u32 polling_interval;
> diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
> index 64b47bf..d924b09 100644
> --- a/drivers/net/igbvf/netdev.c
> +++ b/drivers/net/igbvf/netdev.c
> @@ -100,12 +100,12 @@ static void igbvf_receive_skb(struct igbvf_adapter
> *adapter,
> struct sk_buff *skb,
> u32 status, u16 vlan)
> {
> - if (adapter->vlgrp && (status & E1000_RXD_STAT_VP))
> - vlan_hwaccel_receive_skb(skb, adapter->vlgrp,
> - le16_to_cpu(vlan) &
> - E1000_RXD_SPC_VLAN_MASK);
> - else
> - netif_receive_skb(skb);
> + if (status & E1000_RXD_STAT_VP) {
> + u16 vid = le16_to_cpu(vlan) & E1000_RXD_SPC_VLAN_MASK;
> +
> + __vlan_hwaccel_put_tag(skb, vid);
> + }
> + netif_receive_skb(skb);
> }
>
> static inline void igbvf_rx_checksum_adv(struct igbvf_adapter *adapter,
> @@ -1167,22 +1167,29 @@ static int igbvf_poll(struct napi_struct *napi,
> int budget)
> */
> static void igbvf_set_rlpml(struct igbvf_adapter *adapter)
> {
> - int max_frame_size = adapter->max_frame_size;
> + int max_frame_size;
> struct e1000_hw *hw = &adapter->hw;
>
> - if (adapter->vlgrp)
> - max_frame_size += VLAN_TAG_SIZE;
> -
> + max_frame_size = adapter->max_frame_size + VLAN_TAG_SIZE;
> e1000_rlpml_set_vf(hw, max_frame_size);
> }
>
> -static void igbvf_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
> +static bool __igbvf_vlan_rx_add_vid(struct igbvf_adapter *adapter, u16
> vid)
> {
> - struct igbvf_adapter *adapter = netdev_priv(netdev);
> struct e1000_hw *hw = &adapter->hw;
>
> if (hw->mac.ops.set_vfta(hw, vid, true))
> dev_err(&adapter->pdev->dev, "Failed to add vlan id %d\n",
> vid);
> + return false;
> + return true;
> +}
I'm pretty sure you intended to put a curly brace after the if statement here.
Other than that it seems fine.
- Greg
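The effect of the missing braces is easy to reproduce outside the
driver. In this sketch the function names and the failures_logged
counter are hypothetical; the counter merely stands in for the
dev_err() call. The first function is laid out the way the compiler
parses the V2 hunk, the second shows the intended behaviour.

```c
#include <stdbool.h>

static int failures_logged;

/* As parsed by the compiler: only the first statement belongs to the if. */
bool add_vid_unbraced(int hw_error)
{
    if (hw_error)
        failures_logged++;
    return false;   /* not part of the if: always executed */
    return true;    /* unreachable */
}

/* With braces, failure is reported only when the hardware call fails. */
bool add_vid_braced(int hw_error)
{
    if (hw_error) {
        failures_logged++;
        return false;
    }
    return true;
}
```

The unbraced variant returns false even when hw_error is 0, so a caller
that only records the VLAN ID on success would never record anything.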
^ permalink raw reply
* Re: v3.0-rc* intermittent network failure: how to debug?
From: Richard Kennedy @ 2011-07-21 15:18 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev
In-Reply-To: <20110721143218.GA10595@electric-eye.fr.zoreil.com>
On Thu, 2011-07-21 at 16:32 +0200, Francois Romieu wrote:
> Richard Kennedy <richard@rsk.demon.co.uk> :
> > I keep seeing a total network failure on v3.0.0-rc* , it is highly
> > intermittent, anything from 1 hour to 12+, and I don't have a reliable
> > test case.
> > When it fails I lose all network comms, but there are no errors in the
> > system log, no hung tasks reported, nothing. But after it fails the
> > machine hangs during shutdown, it just never turns off. So I guess
> > something is getting stuck but I can't find it.
>
> Assuming the kernel hangs late enough, you can try the "reboot=" kernel
> parameter and see if a value in arch/x86/include/asm/emergency-restart.h
> makes a difference.
>
> > Can you suggest how to find out what's going on?
>
> Switch into text mode before starting the reboot sequence then send a
> magic sysrq T or W ?
>
> > I'm going to add a serial console and see if that helps.
>
> It will help, especially with the kilometer long output of sysrq.
>
> > this is on a x86_64, via_velocity currently running 3.0.0-rc7 latest.
> >
> > all suggestions gratefully received
>
> Last via-velocity change in mainline dates back to may 25 (see
> d10358de8d70aaeb965a974d56e9b72f6c6dbb3a). Were you previously fine
> with a recent enough kernel to rule it out ?
>
Thanks Francois,
I'll try the reboot= tomorrow.
I don't really know when my last known good was, it could be that
via-velocity change, but the problem is so intermittent it's difficult
to be sure. I've been trying to stress the network to make the problem
happen sooner but I've had no luck yet.
regards
Richard
^ permalink raw reply
* Kernel 2.6.38.3 panic and hangs on network traffic
From: Rustam Afanasyev @ 2011-07-21 15:20 UTC (permalink / raw)
To: netdev
[-- Attachment #1: Type: text/plain, Size: 617 bytes --]
Hi!
I am setting up a new server for testing. Its task is to terminate
L2TP and PPTP users. When I opened the server (NAS) to users, it
crashed quickly :(
What could it be?
The hardware is an Intel Q6600 with 4GB and Intel NICs (latest igb
driver from Intel, RSS enabled, 4 queues each for RX and TX on both
NICs). The software is accel-ppp and openl2tpd; ppp redirects to ifb
(no IP is up on it, just shaping), plus ipset and iptables.
If more info is needed, I can provide it. The strange thing is that
sometimes (mostly) I see a panic, but sometimes it just hangs. Only a
reset helps...
All interfaces are IPv4. What is on the users' side should be IPv4
too, but I think not exclusively...
[-- Attachment #2: trap.txt --]
[-- Type: text/plain, Size: 6900 bytes --]
[ 6089.358502] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 6089.362482] IP: [<ffffffffa0264e24>] __nf_conntrack_confirm+0x2b4/0x480 [nf_conntrack]
[ 6089.362482] PGD 117822067 PUD 117823067 PMD 0
[ 6089.362482] Oops: 0002 [#1] SMP
[ 6089.362482] last sysfs file: /sys/devices/virtual/net/ppp6/uevent
[ 6089.362482] CPU 1
[ 6089.362482] Modules linked in: act_mirred act_skbedit cls_u32 sch_ingress arc4 ecb ppp_mppe l2tp_ppp l2tp_netlink l2tpd
[ 6089.362482]
[ 6089.362482] Pid: 0, comm: kworker/0:0 Not tainted 2.6.39-std-def-alt3 #1 ASUS RS100-E4/PI2/P5M2-M/RS100-E4
[ 6089.362482] RIP: 0010:[<ffffffffa0264e24>] [<ffffffffa0264e24>] __nf_conntrack_confirm+0x2b4/0x480 [nf_conntrack]
[ 6089.362482] RSP: 0018:ffff88011fc83a80 EFLAGS: 00010202
[ 6089.362482] RAX: 00000000000047cd RBX: ffff88011fc94620 RCX: 0000000000000000
[ 6089.362482] RDX: 000000000000fa10 RSI: 00000000d00dd719 RDI: ffffffffa026f680
[ 6089.362482] RBP: ffff88011fc83ac0 R08: 00000000e753847a R09: ffff880117f00000
[ 6089.362482] R10: 0000000000004000 R11: 0000000000000001 R12: 0000000000000000
[ 6089.362482] R13: ffff880116a95080 R14: ffffffff81a4e1c0 R15: 00000000000114d8
[ 6089.362482] FS: 0000000000000000(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
[ 6089.362482] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6089.362482] CR2: 0000000000000000 CR3: 0000000117821000 CR4: 00000000000006e0
[ 6089.362482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6089.362482] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6089.362482] Process kworker/0:0 (pid: 0, threadinfo ffff880118ef6000, task ffff880118ef42c0)
[ 6089.362482] Stack:
[ 6089.362482] 000000007d09da80 0000400016bd4400 0000000116ea2000 ffff880116a95080
[ 6089.362482] 0000000000000001 ffff88011fc94620 0000000000000002 ffff880116ea2000
[ 6089.362482] ffff88011fc83b30 ffffffffa02b1dd8 ffff880116ea2000 0000000000000000
[ 6089.362482] Call Trace:
[ 6089.362482] <IRQ>
[ 6089.362482] [<ffffffffa02b1dd8>] ipv4_confirm+0x188/0x1c0 [nf_conntrack_ipv4]
[ 6089.362482] [<ffffffff81382004>] nf_iterate+0x84/0xa0
[ 6089.362482] [<ffffffff81389390>] ? ip_rcv_finish+0x390/0x390
[ 6089.362482] [<ffffffff81382096>] nf_hook_slow+0x76/0x130
[ 6089.362482] [<ffffffff81389390>] ? ip_rcv_finish+0x390/0x390
[ 6089.362482] [<ffffffff813897d7>] ip_local_deliver+0x67/0x90
[ 6089.362482] [<ffffffff81389135>] ip_rcv_finish+0x135/0x390
[ 6089.362482] [<ffffffff81389a1c>] ip_rcv+0x21c/0x2e0
[ 6089.362482] [<ffffffff81356f5a>] __netif_receive_skb+0x52a/0x690
[ 6089.362482] [<ffffffff813572d0>] netif_receive_skb+0x60/0x90
[ 6089.362482] [<ffffffff8124de9c>] ? is_swiotlb_buffer+0x3c/0x50
[ 6089.362482] [<ffffffff81357440>] napi_skb_finish+0x50/0x70
[ 6089.362482] [<ffffffff813579bd>] napi_gro_receive+0xbd/0xd0
[ 6089.362482] [<ffffffffa012129b>] igb_poll+0x6fb/0xae0 [igb]
[ 6089.362482] [<ffffffff8107fb61>] ? enqueue_hrtimer+0x31/0x80
[ 6090.302391] [<ffffffff81357be5>] net_rx_action+0x135/0x270
[ 6090.302391] [<ffffffff81062705>] __do_softirq+0xa5/0x1d0
[ 6090.302391] [<ffffffff8141301c>] call_softirq+0x1c/0x30
[ 6090.302391] [<ffffffff8100d355>] do_softirq+0x65/0xa0
[ 6090.302391] [<ffffffff81062a96>] irq_exit+0x86/0xa0
[ 6090.302391] [<ffffffff8100cf01>] do_IRQ+0x61/0xe0
[ 6090.302391] [<ffffffff8140a593>] common_interrupt+0x13/0x13
[ 6090.302391] <EOI>
[ 6090.302391] [<ffffffff81012deb>] ? mwait_idle+0x9b/0x1d0
[ 6090.302391] [<ffffffff8140de35>] ? atomic_notifier_call_chain+0x15/0x20
[ 6090.302391] [<ffffffff8100a1e6>] cpu_idle+0x56/0xa0
[ 6090.302391] [<ffffffff81403729>] start_secondary+0x197/0x19c
[ 6090.302391] Code: ff 74 09 40 0f b6 ff 41 0f b7 0c 38 66 41 39 cc 0f 84 f1 fe ff ff 48 8b 00 a8 01 0f 84 51 ff ff ff 4
[ 6090.302391] 89 01 75 04 48 89 48 08 48 8b 05 4c 4a 60 e1 48 01 83 98 00
[ 6090.302391] RIP [<ffffffffa0264e24>] __nf_conntrack_confirm+0x2b4/0x480 [nf_conntrack]
[ 6090.302391] RSP <ffff88011fc83a80>
[ 6090.302391] CR2: 0000000000000000
[ 6090.611999] ---[ end trace ded67c8afb62f164 ]---
[ 6090.625910] Kernel panic - not syncing: Fatal exception in interrupt
[ 6090.645022] Pid: 0, comm: kworker/0:0 Tainted: G D 2.6.39-std-def-alt3 #1
[ 6090.667822] Call Trace:
[ 6090.675232] <IRQ> [<ffffffff81407284>] panic+0x8c/0x197
[ 6090.691563] [<ffffffff8140b4d2>] oops_end+0xe2/0xf0
[ 6090.706514] [<ffffffff81039ad0>] no_context+0xf0/0x260
[ 6090.722242] [<ffffffff81382096>] ? nf_hook_slow+0x76/0x130
[ 6090.739013] [<ffffffff8138e0f0>] ? ip_fragment+0x960/0x960
[ 6090.755784] [<ffffffff81039d65>] __bad_area_nosemaphore+0x125/0x1e0
[ 6090.774894] [<ffffffff81039e2e>] bad_area_nosemaphore+0xe/0x10
[ 6090.792703] [<ffffffff8140dbf6>] do_page_fault+0x306/0x4b0
[ 6090.809473] [<ffffffff8106b48e>] ? mod_timer+0x15e/0x2c0
[ 6090.825725] [<ffffffff81345b47>] ? sk_reset_timer+0x17/0x30
[ 6090.842754] [<ffffffff81393eeb>] ? inet_csk_reset_keepalive_timer+0x1b/0x20
[ 6090.863945] [<ffffffff81393fe7>] ? inet_csk_reqsk_queue_hash_add+0xf7/0x110
[ 6090.885135] [<ffffffff8140a855>] page_fault+0x25/0x30
[ 6090.900608] [<ffffffffa0264e24>] ? __nf_conntrack_confirm+0x2b4/0x480 [nf_conntrack]
[ 6090.924189] [<ffffffffa02b1dd8>] ipv4_confirm+0x188/0x1c0 [nf_conntrack_ipv4]
[ 6090.945950] [<ffffffff81382004>] nf_iterate+0x84/0xa0
[ 6090.961420] [<ffffffff81389390>] ? ip_rcv_finish+0x390/0x390
[ 6090.978710] [<ffffffff81382096>] nf_hook_slow+0x76/0x130
[ 6090.994959] [<ffffffff81389390>] ? ip_rcv_finish+0x390/0x390
[ 6091.012250] [<ffffffff813897d7>] ip_local_deliver+0x67/0x90
[ 6091.029281] [<ffffffff81389135>] ip_rcv_finish+0x135/0x390
[ 6091.046052] [<ffffffff81389a1c>] ip_rcv+0x21c/0x2e0
[ 6091.061000] [<ffffffff81356f5a>] __netif_receive_skb+0x52a/0x690
[ 6091.079332] [<ffffffff813572d0>] netif_receive_skb+0x60/0x90
[ 6091.096622] [<ffffffff8124de9c>] ? is_swiotlb_buffer+0x3c/0x50
[ 6091.114431] [<ffffffff81357440>] napi_skb_finish+0x50/0x70
[ 6091.131201] [<ffffffff813579bd>] napi_gro_receive+0xbd/0xd0
[ 6091.148234] [<ffffffffa012129b>] igb_poll+0x6fb/0xae0 [igb]
[ 6091.165262] [<ffffffff8107fb61>] ? enqueue_hrtimer+0x31/0x80
[ 6091.182553] [<ffffffff81357be5>] net_rx_action+0x135/0x270
[ 6091.199323] [<ffffffff81062705>] __do_softirq+0xa5/0x1d0
[ 6091.215573] [<ffffffff8141301c>] call_softirq+0x1c/0x30
[ 6091.231562] [<ffffffff8100d355>] do_softirq+0x65/0xa0
[ 6091.247035] [<ffffffff81062a96>] irq_exit+0x86/0xa0
[ 6091.261983] [<ffffffff8100cf01>] do_IRQ+0x61/0xe0
[ 6091.276413] [<ffffffff8140a593>] common_interrupt+0x13/0x13
[ 6091.293442] <EOI> [<ffffffff81012deb>] ? mwait_idle+0x9b/0x1d0
[ 6091.311593] [<ffffffff8140de35>] ? atomic_notifier_call_chain+0x15/0x20
[ 6091.331742] [<ffffffff8100a1e6>] cpu_idle+0x56/0xa0
[ 6091.346692] [<ffffffff81403729>] start_secondary+0x197/0x19c
[ 6091.363986] Rebooting in 30 seconds..
^ permalink raw reply
* RE: Bridging behavior apparently changed around the Fedora 14 time
From: Greg Scott @ 2011-07-21 15:01 UTC (permalink / raw)
To: netdev; +Cc: Lynn Hanson, Joe Whalen, Graham Parenteau, David Lamparter
In-Reply-To: <925A849792280C4E80C5461017A4B8A2A04134@mail733.InfraSupportEtc.com>
Aw nuts, nothing is ever straightforward.
When I do:
ip link set br0 promisc on
My internal users can see the internally hosted websites using the
public IP Addresses. The router on a stick rules I put in work just
fine. (In on br0/eth1, DNATed in PREROUTING, MASQUERADEd in
POSTROUTING, back out br0/eth1 to the correct internal host.)
However, I just learned last night, this breaks both inbound and
outbound PPTP VPNs. And when I do:
ip link set br0 promisc off
now my PPTP VPNs work, but this breaks my above router on a stick rules.
My PPTP VPN stuff uses the GRE iptables conntrack modules,
ip_conntrack_pptp and ip_nat_pptp, and some PREROUTING and POSTROUTING
rules to DNAT TCP 1723 and all GRE packets to an internal Windows RRAS
server. But when I turn promisc on for br0, I see a storm of packets
looping over and over again, until the remote client finally times out
after what seems like an eternity.
I'll bet that bridge forwards my packets out the wrong physical ethnn
interface when it's in promisc mode and that's why my NATed PPTP VPN
breaks. Which makes me wonder if putting br0 in promisc mode breaks any
of my other NATed services.
So back to square one I guess.
- Greg Scott
-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
On Behalf Of Greg Scott
Sent: Tuesday, July 12, 2011 11:29 AM
To: David Lamparter
Cc: netdev@vger.kernel.org; Lynn Hanson; Joe Whalen
Subject: RE: Bridging behavior apparently changed around the Fedora 14
time
> P.S.: you blissfully ignored my "ip neigh add proxy 1.2.3.4" note :)
Sorry - didn't ignore it, just didn't reply back to it. I'll look into
it. What I've read about this before has all been kind of vague. Does
this mean I proxy ARP only for IP Address 1.2.3.4? So somebody sends an
ARP whois 1.2.3.4, I'll answer with 1.2.3.4 is at {My MAC Address}? If
so, then I agree, not nearly as evil as just setting proxy_arp.
> Whoa. And here I was almost ashamed of running 2.6.38. I'm sorry, but I
> think you need to go bug RedHat.
Yeah, maybe. OK, probably. This was such a bizarre problem - I started
with Netfilter and those guys suggested I try here. At least now I
understand the problem lots better than before. And it's not like I can
just go and update dozens of kernels at dozens of sites all the time
when a new kernel comes out.
> You totally misunderstood me. I'm suggesting the separate VLAN for your
> servers which have private IPs but which have services exposed to the
> internet (and your clients) on public IPs through NAT.
Ahh - OK. The challenge with many small sites is, economic reality.
That same server that hosts the public ftp and websites also hosts all
the internal Windows file/print services. It's the only server at this
site, so it has several roles. I would love to build a real DMZ network
and put all the public facing stuff in there, but I don't have money for
multiple servers. This will become even more difficult to separate when
we go to virtual servers and clustered hosts.
> Your H323 stuff is totally unrelated.
Agreed. Wholeheartedly.
> Yes. Your problem seems to be between the private-IP clients in your
> network and your private-IP servers if I understand correctly.
Yes. Dead-bang, right on target.
> Yes. And because it is a router, it has an IP from the private subnet
> your clients are in. My question was: what device is that IP on?
Ahh - eth1 is the private LAN side, 192.168.10.1. All the NATed LAN
stuff and all the workstations are in the 192.168.10.0/24 subnet and
connected to eth1. Eth0 is the Internet side. The Internet side has
the firewall NIC, a cable, and the Internet router. That's it.
Everything is connected to the LAN side.
> No. You're jumping to conclusions. You're affecting the "top" bridge
> device's promiscuity. I would say that the effect you're seeing is in
> the IP stack above it, caused by it now promiscuously handling packets
> that are dropped otherwise.
Well they were sure dropped before I set it to PROMISC mode, that's for
sure. And it all worked with the earlier version. That's why this feels
like a layer 2 issue. If it was an IP issue, why didn't it break
several years ago when I first set it up?
Does bridging make everything a little more complex and delicate to set
up? Well, yeah. And some of the netfilter stuff has been a moving
target over the years.
I don't see how ICMP redirects matter. Comparing
/proc/sys/net/ipv4/conf/*/accept_redirects with this version and an
older one at another site - all identical. ../all/accept_redirects is
0, the rest are all 1. Shared media and ARP settings -
/proc/sys/net/ipv4/conf/*/shared_media - all 1 for all interfaces.
There are a zillion arp settings. Looking at
/proc/sys/net/ipv4/conf/*/*arp* - all are 0 in both the other older site
and this newer site.
Curiously - at one of my other older sites, apparently br0 is not in
promisc mode. But I don't think these guys do any of the stick routing
stuff. I wonder if these guys have the problem but we don't see it
because they never try it?
[root@NSSSS-fw1 ~]# more /sys/class/net/br0/flags
0x1003
[root@NSSSS-fw1 ~]#
[root@NSSSS-fw1 ~]# more /proc/version
Linux version 2.6.32.11-99.fc12.i686.PAE
(mockbuild@x86-05.phx2.fedoraproject.org) (gcc version 4.4.3 20100127
(Red Hat 4.4.3-4) (GCC) )
#1 SMP Mon Apr 5 16:15:03 EDT 2010
[root@NSSSS-fw1 ~]#
[root@NSSSS-fw1 ~]# uname -a
Linux NSSSS-fw1 2.6.32.11-99.fc12.i686.PAE #1 SMP Mon Apr 5 16:15:03 EDT
2010 i686 i686 i386 GNU/Linux
[root@NSSSS-fw1 ~]#
Here is a much older bridged site based on Fedora 9 and I'm sure these
guys use my stick routing stuff. Look at the difference in ..br0/flags.
[root@lme-fw2 ~]# more /sys/class/net/br0/flags
0x1103
[root@lme-fw2 ~]#
[root@lme-fw2 ~]# more /proc/version
Linux version 2.6.25-14.fc9.i686 (mockbuild@) (gcc version 4.3.0
20080428 (Red Hat 4.3.0-8) (GCC) ) #1 SMP Thu May 1 06:28:41 EDT 2008
[root@lme-fw2 ~]#
[root@lme-fw2 ~]# uname -a
Linux lme-fw2 2.6.25-14.fc9.i686 #1 SMP Thu May 1 06:28:41 EDT 2008 i686
i686 i386 GNU/Linux
I can still get my hands on the old box at the site in question. I
guess it couldn't hurt to fire it up and look at its br0 flags.
- Greg
^ permalink raw reply
* Re: v3.0-rc* intermittent network failure: how to debug?
From: Francois Romieu @ 2011-07-21 14:32 UTC (permalink / raw)
To: Richard Kennedy; +Cc: netdev
In-Reply-To: <1311256194.2980.18.camel@castor.rsk>
Richard Kennedy <richard@rsk.demon.co.uk> :
> I keep seeing a total network failure on v3.0.0-rc* , it is highly
> intermittent, anything from 1 hour to 12+, and I don't have a reliable
> test case.
> When it fails I lose all network comms, but there are no errors in the
> system log, no hung tasks reported, nothing. But after it fails the
> machine hangs during shutdown, it just never turns off. So I guess
> something is getting stuck but I can't find it.
Assuming the kernel hangs late enough, you can try the "reboot=" kernel
parameter and see if a value in arch/x86/include/asm/emergency-restart.h
makes a difference.
> Can you suggest how to find out what's going on?
Switch into text mode before starting the reboot sequence, then send a
magic sysrq T or W?
> I'm going to add a serial console and see if that helps.
It will help, especially with the kilometer long output of sysrq.
> this is on a x86_64, via_velocity currently running 3.0.0-rc7 latest.
>
> all suggestions gratefully received
Last via-velocity change in mainline dates back to May 25 (see
d10358de8d70aaeb965a974d56e9b72f6c6dbb3a). Were you previously fine
with a recent enough kernel to rule it out?
--
Ueimor
^ permalink raw reply
* v3.0-rc* intermittent network failure: how to debug?
From: Richard Kennedy @ 2011-07-21 13:49 UTC (permalink / raw)
To: netdev
I keep seeing a total network failure on v3.0.0-rc* , it is highly
intermittent, anything from 1 hour to 12+, and I don't have a reliable
test case.
When it fails I lose all network comms, but there are no errors in the
system log, no hung tasks reported, nothing. But after it fails the
machine hangs during shutdown, it just never turns off. So I guess
something is getting stuck but I can't find it.
Can you suggest how to find out what's going on?
Or how to collect more information as to what's failing?
I'm going to add a serial console and see if that helps.
this is on a x86_64, via_velocity currently running 3.0.0-rc7 latest.
all suggestions gratefully received
regards
Richard
^ permalink raw reply
* Re: ipvs oops in 3.0-rc7
From: Huajun Li @ 2011-07-21 13:37 UTC (permalink / raw)
To: Julian Anastasov
Cc: Randy Dunlap, netdev, lvs-devel, Simon Horman, Wensong Zhang
In-Reply-To: <alpine.LFD.2.00.1107211202380.1660@ja.ssi.bg>
2011/7/21 Julian Anastasov <ja@ssi.bg>:
>
> Hello,
>
> On Wed, 20 Jul 2011, Randy Dunlap wrote:
>
>> I'm seeing the following Oops in 3.0-rc7 on x86_64, just loading and unloading
>> modules. Any chance this is already fixed? I can test current git, but I
>> wanted to ask first.
>>
>> Looks like it is on the second module load of ip_vs (i.e.,
>> modprobe ip_vs; rmmod ip_vs; modprobe ip_vs).
>
> I think, this problem was fixed by this patch:
>
> http://www.spinics.net/lists/lvs-devel/msg02051.html
>
> But it seems it was lost somewhere ...
>
That's great. Could somebody help apply it again? Thanks.
^ permalink raw reply
* [patch net-next-2.6 37/47 V2] igb: do vlan cleanup
From: Jiri Pirko @ 2011-07-21 13:27 UTC (permalink / raw)
To: netdev
Cc: davem, shemminger, eric.dumazet, greearb, mirqus,
jeffrey.t.kirsher, jesse.brandeburg, peter.p.waskiewicz.jr,
bruce.w.allan, carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
alexander.h.duyck, john.ronciak, e1000-devel, jesse
In-Reply-To: <1311173689-17419-38-git-send-email-jpirko@redhat.com>
- unify vlan and nonvlan rx path
- kill adapter->vlgrp and igb_vlan_rx_register
- allow to turn on/off rx/tx vlan accel via ethtool (set_features)
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/igb/igb.h | 4 +-
drivers/net/igb/igb_main.c | 90 +++++++++++++++++++++----------------------
2 files changed, 47 insertions(+), 47 deletions(-)
diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index 0389ff6..265e151 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -37,6 +37,8 @@
#include <linux/clocksource.h>
#include <linux/timecompare.h>
#include <linux/net_tstamp.h>
+#include <linux/bitops.h>
+#include <linux/if_vlan.h>
struct igb_adapter;
@@ -252,7 +254,7 @@ static inline int igb_desc_unused(struct igb_ring *ring)
struct igb_adapter {
struct timer_list watchdog_timer;
struct timer_list phy_info_timer;
- struct vlan_group *vlgrp;
+ unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
u16 mng_vlan_id;
u32 bd_number;
u32 wol;
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index f4d82b2..cb8c6bb 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -28,6 +28,7 @@
#include <linux/module.h>
#include <linux/types.h>
#include <linux/init.h>
+#include <linux/bitops.h>
#include <linux/vmalloc.h>
#include <linux/pagemap.h>
#include <linux/netdevice.h>
@@ -46,6 +47,7 @@
#include <linux/if_ether.h>
#include <linux/aer.h>
#include <linux/prefetch.h>
+#include <linux/if_vlan.h>
#ifdef CONFIG_IGB_DCA
#include <linux/dca.h>
#endif
@@ -140,7 +142,7 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *, int *, int);
static int igb_ioctl(struct net_device *, struct ifreq *, int cmd);
static void igb_tx_timeout(struct net_device *);
static void igb_reset_task(struct work_struct *);
-static void igb_vlan_rx_register(struct net_device *, struct vlan_group *);
+static void igb_vlan_mode(struct net_device *netdev, u32 features);
static void igb_vlan_rx_add_vid(struct net_device *, u16);
static void igb_vlan_rx_kill_vid(struct net_device *, u16);
static void igb_restore_vlan(struct igb_adapter *);
@@ -1362,7 +1364,7 @@ static void igb_update_mng_vlan(struct igb_adapter *adapter)
if ((old_vid != (u16)IGB_MNG_VLAN_NONE) &&
(vid != old_vid) &&
- !vlan_group_get_device(adapter->vlgrp, old_vid)) {
+ !test_bit(old_vid, adapter->active_vlans)) {
/* remove VID from filter table */
igb_vfta_set(hw, old_vid, false);
}
@@ -1748,10 +1750,25 @@ void igb_reset(struct igb_adapter *adapter)
igb_get_phy_info(hw);
}
+static u32 igb_fix_features(struct net_device *netdev, u32 features)
+{
+ /*
+ * Since there is no support for separate rx/tx vlan accel
+ * enable/disable make sure tx flag is always in same state as rx.
+ */
+ if (features & NETIF_F_HW_VLAN_RX)
+ features |= NETIF_F_HW_VLAN_TX;
+ else
+ features &= ~NETIF_F_HW_VLAN_TX;
+
+ return features;
+}
+
static int igb_set_features(struct net_device *netdev, u32 features)
{
struct igb_adapter *adapter = netdev_priv(netdev);
int i;
+ u32 changed = netdev->features ^ features;
for (i = 0; i < adapter->num_rx_queues; i++) {
if (features & NETIF_F_RXCSUM)
@@ -1760,6 +1777,9 @@ static int igb_set_features(struct net_device *netdev, u32 features)
adapter->rx_ring[i]->flags &= ~IGB_RING_FLAG_RX_CSUM;
}
+ if (changed & NETIF_F_HW_VLAN_RX)
+ igb_vlan_mode(netdev, features);
+
return 0;
}
@@ -1775,7 +1795,6 @@ static const struct net_device_ops igb_netdev_ops = {
.ndo_do_ioctl = igb_ioctl,
.ndo_tx_timeout = igb_tx_timeout,
.ndo_validate_addr = eth_validate_addr,
- .ndo_vlan_rx_register = igb_vlan_rx_register,
.ndo_vlan_rx_add_vid = igb_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = igb_vlan_rx_kill_vid,
.ndo_set_vf_mac = igb_ndo_set_vf_mac,
@@ -1785,7 +1804,8 @@ static const struct net_device_ops igb_netdev_ops = {
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = igb_netpoll,
#endif
- .ndo_set_features = igb_set_features,
+ .ndo_fix_features = igb_fix_features,
+ .ndo_set_features = igb_set_features,
};
/**
@@ -1930,11 +1950,11 @@ static int __devinit igb_probe(struct pci_dev *pdev,
NETIF_F_IPV6_CSUM |
NETIF_F_TSO |
NETIF_F_TSO6 |
- NETIF_F_RXCSUM;
+ NETIF_F_RXCSUM |
+ NETIF_F_HW_VLAN_RX;
netdev->features = netdev->hw_features |
NETIF_F_HW_VLAN_TX |
- NETIF_F_HW_VLAN_RX |
NETIF_F_HW_VLAN_FILTER;
netdev->vlan_features |= NETIF_F_TSO;
@@ -2057,6 +2077,8 @@ static int __devinit igb_probe(struct pci_dev *pdev,
if (err)
goto err_register;
+ igb_vlan_mode(netdev, netdev->features);
+
/* carrier off reporting is important to ethtool even BEFORE open */
netif_carrier_off(netdev);
@@ -2939,12 +2961,11 @@ static inline int igb_set_vf_rlpml(struct igb_adapter *adapter, int size,
**/
static void igb_rlpml_set(struct igb_adapter *adapter)
{
- u32 max_frame_size = adapter->max_frame_size;
+ u32 max_frame_size;
struct e1000_hw *hw = &adapter->hw;
u16 pf_id = adapter->vfs_allocated_count;
- if (adapter->vlgrp)
- max_frame_size += VLAN_TAG_SIZE;
+ max_frame_size = adapter->max_frame_size + VLAN_TAG_SIZE;
/* if vfs are enabled we set RLPML to the largest possible request
* size and set the VMOLR RLPML to the size we need */
@@ -5693,25 +5714,6 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
return count < tx_ring->count;
}
-/**
- * igb_receive_skb - helper function to handle rx indications
- * @q_vector: structure containing interrupt and ring information
- * @skb: packet to send up
- * @vlan_tag: vlan tag for packet
- **/
-static void igb_receive_skb(struct igb_q_vector *q_vector,
- struct sk_buff *skb,
- u16 vlan_tag)
-{
- struct igb_adapter *adapter = q_vector->adapter;
-
- if (vlan_tag && adapter->vlgrp)
- vlan_gro_receive(&q_vector->napi, adapter->vlgrp,
- vlan_tag, skb);
- else
- napi_gro_receive(&q_vector->napi, skb);
-}
-
static inline void igb_rx_checksum_adv(struct igb_ring *ring,
u32 status_err, struct sk_buff *skb)
{
@@ -5809,7 +5811,6 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
unsigned int i;
u32 staterr;
u16 length;
- u16 vlan_tag;
i = rx_ring->next_to_clean;
buffer_info = &rx_ring->buffer_info[i];
@@ -5894,10 +5895,12 @@ send_up:
skb->protocol = eth_type_trans(skb, netdev);
skb_record_rx_queue(skb, rx_ring->queue_index);
- vlan_tag = ((staterr & E1000_RXD_STAT_VP) ?
- le16_to_cpu(rx_desc->wb.upper.vlan) : 0);
+ if (staterr & E1000_RXD_STAT_VP) {
+ u16 vid = le16_to_cpu(rx_desc->wb.upper.vlan);
- igb_receive_skb(q_vector, skb, vlan_tag);
+ __vlan_hwaccel_put_tag(skb, vid);
+ }
+ napi_gro_receive(&q_vector->napi, skb);
next_desc:
rx_desc->wb.upper.status_error = 0;
@@ -6290,17 +6293,15 @@ s32 igb_write_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value)
return 0;
}
-static void igb_vlan_rx_register(struct net_device *netdev,
- struct vlan_group *grp)
+static void igb_vlan_mode(struct net_device *netdev, u32 features)
{
struct igb_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
u32 ctrl, rctl;
igb_irq_disable(adapter);
- adapter->vlgrp = grp;
- if (grp) {
+ if (features & NETIF_F_HW_VLAN_RX) {
/* enable VLAN tag insert/strip */
ctrl = rd32(E1000_CTRL);
ctrl |= E1000_CTRL_VME;
@@ -6334,6 +6335,8 @@ static void igb_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
/* add the filter since PF can receive vlans w/o entry in vlvf */
igb_vfta_set(hw, vid, true);
+
+ set_bit(vid, adapter->active_vlans);
}
static void igb_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
@@ -6344,7 +6347,6 @@ static void igb_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
s32 err;
igb_irq_disable(adapter);
- vlan_group_set_device(adapter->vlgrp, vid, NULL);
if (!test_bit(__IGB_DOWN, &adapter->state))
igb_irq_enable(adapter);
@@ -6355,20 +6357,16 @@ static void igb_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
/* if vid was not present in VLVF just remove it from table */
if (err)
igb_vfta_set(hw, vid, false);
+
+ clear_bit(vid, adapter->active_vlans);
}
static void igb_restore_vlan(struct igb_adapter *adapter)
{
- igb_vlan_rx_register(adapter->netdev, adapter->vlgrp);
+ u16 vid;
- if (adapter->vlgrp) {
- u16 vid;
- for (vid = 0; vid < VLAN_N_VID; vid++) {
- if (!vlan_group_get_device(adapter->vlgrp, vid))
- continue;
- igb_vlan_rx_add_vid(adapter->netdev, vid);
- }
- }
+ for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID)
+ igb_vlan_rx_add_vid(adapter->netdev, vid);
}
int igb_set_spd_dplx(struct igb_adapter *adapter, u32 spd, u8 dplx)
--
1.7.6
^ permalink raw reply related
* [patch net-next-2.6 35/47 V2] e1000: do vlan cleanup
From: Jiri Pirko @ 2011-07-21 13:26 UTC (permalink / raw)
To: netdev
Cc: davem, shemminger, eric.dumazet, greearb, mirqus,
jeffrey.t.kirsher, jesse.brandeburg, peter.p.waskiewicz.jr,
bruce.w.allan, carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
alexander.h.duyck, john.ronciak, e1000-devel, jesse
In-Reply-To: <1311173689-17419-36-git-send-email-jpirko@redhat.com>
- unify vlan and nonvlan rx path
- kill adapter->vlgrp and e1000_vlan_rx_register
- allow to turn on/off rx/tx vlan accel via ethtool (set_features)
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/e1000/e1000.h | 2 +-
drivers/net/e1000/e1000_main.c | 168 +++++++++++++++++++++++++---------------
2 files changed, 108 insertions(+), 62 deletions(-)
diff --git a/drivers/net/e1000/e1000.h b/drivers/net/e1000/e1000.h
index 8676899..24f41da 100644
--- a/drivers/net/e1000/e1000.h
+++ b/drivers/net/e1000/e1000.h
@@ -215,7 +215,7 @@ struct e1000_adapter {
struct timer_list tx_fifo_stall_timer;
struct timer_list watchdog_timer;
struct timer_list phy_info_timer;
- struct vlan_group *vlgrp;
+ unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
u16 mng_vlan_id;
u32 bd_number;
u32 rx_buffer_len;
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 188d99a..acaebec 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -30,6 +30,8 @@
#include <net/ip6_checksum.h>
#include <linux/io.h>
#include <linux/prefetch.h>
+#include <linux/bitops.h>
+#include <linux/if_vlan.h>
/* Intel Media SOC GbE MDIO physical base address */
static unsigned long ce4100_gbe_mdio_base_phy;
@@ -166,7 +168,8 @@ static void e1000_smartspeed(struct e1000_adapter *adapter);
static int e1000_82547_fifo_workaround(struct e1000_adapter *adapter,
struct sk_buff *skb);
-static void e1000_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp);
+static bool e1000_vlan_used(struct e1000_adapter *adapter);
+static void e1000_vlan_mode(struct net_device *netdev, u32 features);
static void e1000_vlan_rx_add_vid(struct net_device *netdev, u16 vid);
static void e1000_vlan_rx_kill_vid(struct net_device *netdev, u16 vid);
static void e1000_restore_vlan(struct e1000_adapter *adapter);
@@ -330,21 +333,24 @@ static void e1000_update_mng_vlan(struct e1000_adapter *adapter)
struct net_device *netdev = adapter->netdev;
u16 vid = hw->mng_cookie.vlan_id;
u16 old_vid = adapter->mng_vlan_id;
- if (adapter->vlgrp) {
- if (!vlan_group_get_device(adapter->vlgrp, vid)) {
- if (hw->mng_cookie.status &
- E1000_MNG_DHCP_COOKIE_STATUS_VLAN_SUPPORT) {
- e1000_vlan_rx_add_vid(netdev, vid);
- adapter->mng_vlan_id = vid;
- } else
- adapter->mng_vlan_id = E1000_MNG_VLAN_NONE;
- if ((old_vid != (u16)E1000_MNG_VLAN_NONE) &&
- (vid != old_vid) &&
- !vlan_group_get_device(adapter->vlgrp, old_vid))
- e1000_vlan_rx_kill_vid(netdev, old_vid);
- } else
+ if (!e1000_vlan_used(adapter))
+ return;
+
+ if (!test_bit(vid, adapter->active_vlans)) {
+ if (hw->mng_cookie.status &
+ E1000_MNG_DHCP_COOKIE_STATUS_VLAN_SUPPORT) {
+ e1000_vlan_rx_add_vid(netdev, vid);
adapter->mng_vlan_id = vid;
+ } else {
+ adapter->mng_vlan_id = E1000_MNG_VLAN_NONE;
+ }
+ if ((old_vid != (u16)E1000_MNG_VLAN_NONE) &&
+ (vid != old_vid) &&
+ !test_bit(old_vid, adapter->active_vlans))
+ e1000_vlan_rx_kill_vid(netdev, old_vid);
+ } else {
+ adapter->mng_vlan_id = vid;
}
}
@@ -797,11 +803,28 @@ static int e1000_is_need_ioport(struct pci_dev *pdev)
}
}
+static u32 e1000_fix_features(struct net_device *netdev, u32 features)
+{
+ /*
+ * Since there is no support for separate rx/tx vlan accel
+ * enable/disable make sure tx flag is always in same state as rx.
+ */
+ if (features & NETIF_F_HW_VLAN_RX)
+ features |= NETIF_F_HW_VLAN_TX;
+ else
+ features &= ~NETIF_F_HW_VLAN_TX;
+
+ return features;
+}
+
static int e1000_set_features(struct net_device *netdev, u32 features)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
u32 changed = features ^ netdev->features;
+ if (changed & NETIF_F_HW_VLAN_RX)
+ e1000_vlan_mode(netdev, features);
+
if (!(changed & NETIF_F_RXCSUM))
return 0;
@@ -822,18 +845,17 @@ static const struct net_device_ops e1000_netdev_ops = {
.ndo_get_stats = e1000_get_stats,
.ndo_set_rx_mode = e1000_set_rx_mode,
.ndo_set_mac_address = e1000_set_mac,
- .ndo_tx_timeout = e1000_tx_timeout,
+ .ndo_tx_timeout = e1000_tx_timeout,
.ndo_change_mtu = e1000_change_mtu,
.ndo_do_ioctl = e1000_ioctl,
.ndo_validate_addr = eth_validate_addr,
-
- .ndo_vlan_rx_register = e1000_vlan_rx_register,
.ndo_vlan_rx_add_vid = e1000_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = e1000_vlan_rx_kill_vid,
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = e1000_netpoll,
#endif
- .ndo_set_features = e1000_set_features,
+ .ndo_fix_features = e1000_fix_features,
+ .ndo_set_features = e1000_set_features,
};
/**
@@ -1036,9 +1058,9 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
if (hw->mac_type >= e1000_82543) {
netdev->hw_features = NETIF_F_SG |
- NETIF_F_HW_CSUM;
+ NETIF_F_HW_CSUM |
+ NETIF_F_HW_VLAN_RX;
netdev->features = NETIF_F_HW_VLAN_TX |
- NETIF_F_HW_VLAN_RX |
NETIF_F_HW_VLAN_FILTER;
}
@@ -1197,6 +1219,8 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
if (err)
goto err_register;
+ e1000_vlan_mode(netdev, netdev->features);
+
/* print bus type/speed/width info */
e_info(probe, "(PCI%s:%dMHz:%d-bit) %pM\n",
((hw->bus_type == e1000_bus_type_pcix) ? "-X" : ""),
@@ -1441,8 +1465,7 @@ static int e1000_close(struct net_device *netdev)
* the same ID is registered on the host OS (let 8021q kill it) */
if ((hw->mng_cookie.status &
E1000_MNG_DHCP_COOKIE_STATUS_VLAN_SUPPORT) &&
- !(adapter->vlgrp &&
- vlan_group_get_device(adapter->vlgrp, adapter->mng_vlan_id))) {
+ !test_bit(adapter->mng_vlan_id, adapter->active_vlans)) {
e1000_vlan_rx_kill_vid(netdev, adapter->mng_vlan_id);
}
@@ -2233,7 +2256,7 @@ static void e1000_set_rx_mode(struct net_device *netdev)
else
rctl &= ~E1000_RCTL_MPE;
/* Enable VLAN filter if there is a VLAN */
- if (adapter->vlgrp)
+ if (e1000_vlan_used(adapter))
rctl |= E1000_RCTL_VFE;
}
@@ -3180,7 +3203,7 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
}
}
- if (unlikely(vlan_tx_tag_present(skb))) {
+ if (vlan_tx_tag_present(skb)) {
tx_flags |= E1000_TX_FLAGS_VLAN;
tx_flags |= (vlan_tx_tag_get(skb) << E1000_TX_FLAGS_VLAN_SHIFT);
}
@@ -3735,12 +3758,12 @@ static void e1000_receive_skb(struct e1000_adapter *adapter, u8 status,
{
skb->protocol = eth_type_trans(skb, adapter->netdev);
- if ((unlikely(adapter->vlgrp && (status & E1000_RXD_STAT_VP))))
- vlan_gro_receive(&adapter->napi, adapter->vlgrp,
- le16_to_cpu(vlan) & E1000_RXD_SPC_VLAN_MASK,
- skb);
- else
- napi_gro_receive(&adapter->napi, skb);
+ if (status & E1000_RXD_STAT_VP) {
+ u16 vid = le16_to_cpu(vlan) & E1000_RXD_SPC_VLAN_MASK;
+
+ __vlan_hwaccel_put_tag(skb, vid);
+ }
+ napi_gro_receive(&adapter->napi, skb);
}
/**
@@ -4523,46 +4546,61 @@ void e1000_io_write(struct e1000_hw *hw, unsigned long port, u32 value)
outl(value, port);
}
-static void e1000_vlan_rx_register(struct net_device *netdev,
- struct vlan_group *grp)
+static bool e1000_vlan_used(struct e1000_adapter *adapter)
+{
+ u16 vid;
+
+ for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID)
+ return true;
+ return false;
+}
+
+static void e1000_vlan_filter_on_off(struct e1000_adapter *adapter,
+ bool filter_on)
{
- struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
- u32 ctrl, rctl;
+ u32 rctl;
if (!test_bit(__E1000_DOWN, &adapter->flags))
e1000_irq_disable(adapter);
- adapter->vlgrp = grp;
-
- if (grp) {
- /* enable VLAN tag insert/strip */
- ctrl = er32(CTRL);
- ctrl |= E1000_CTRL_VME;
- ew32(CTRL, ctrl);
+ if (filter_on) {
/* enable VLAN receive filtering */
rctl = er32(RCTL);
rctl &= ~E1000_RCTL_CFIEN;
- if (!(netdev->flags & IFF_PROMISC))
+ if (!(adapter->netdev->flags & IFF_PROMISC))
rctl |= E1000_RCTL_VFE;
ew32(RCTL, rctl);
e1000_update_mng_vlan(adapter);
} else {
- /* disable VLAN tag insert/strip */
- ctrl = er32(CTRL);
- ctrl &= ~E1000_CTRL_VME;
- ew32(CTRL, ctrl);
-
/* disable VLAN receive filtering */
rctl = er32(RCTL);
rctl &= ~E1000_RCTL_VFE;
ew32(RCTL, rctl);
+ }
- if (adapter->mng_vlan_id != (u16)E1000_MNG_VLAN_NONE) {
- e1000_vlan_rx_kill_vid(netdev, adapter->mng_vlan_id);
- adapter->mng_vlan_id = E1000_MNG_VLAN_NONE;
- }
+ if (!test_bit(__E1000_DOWN, &adapter->flags))
+ e1000_irq_enable(adapter);
+}
+
+static void e1000_vlan_mode(struct net_device *netdev, u32 features)
+{
+ struct e1000_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+ u32 ctrl;
+
+ if (!test_bit(__E1000_DOWN, &adapter->flags))
+ e1000_irq_disable(adapter);
+
+ ctrl = er32(CTRL);
+ if (features & NETIF_F_HW_VLAN_RX) {
+ /* enable VLAN tag insert/strip */
+ ctrl |= E1000_CTRL_VME;
+ } else {
+ /* disable VLAN tag insert/strip */
+ ctrl &= ~E1000_CTRL_VME;
}
+ ew32(CTRL, ctrl);
if (!test_bit(__E1000_DOWN, &adapter->flags))
e1000_irq_enable(adapter);
@@ -4578,11 +4616,17 @@ static void e1000_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
E1000_MNG_DHCP_COOKIE_STATUS_VLAN_SUPPORT) &&
(vid == adapter->mng_vlan_id))
return;
+
+ if (!e1000_vlan_used(adapter))
+ e1000_vlan_filter_on_off(adapter, true);
+
/* add VID to filter table */
index = (vid >> 5) & 0x7F;
vfta = E1000_READ_REG_ARRAY(hw, VFTA, index);
vfta |= (1 << (vid & 0x1F));
e1000_write_vfta(hw, index, vfta);
+
+ set_bit(vid, adapter->active_vlans);
}
static void e1000_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
@@ -4593,7 +4637,6 @@ static void e1000_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
if (!test_bit(__E1000_DOWN, &adapter->flags))
e1000_irq_disable(adapter);
- vlan_group_set_device(adapter->vlgrp, vid, NULL);
if (!test_bit(__E1000_DOWN, &adapter->flags))
e1000_irq_enable(adapter);
@@ -4602,20 +4645,23 @@ static void e1000_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
vfta = E1000_READ_REG_ARRAY(hw, VFTA, index);
vfta &= ~(1 << (vid & 0x1F));
e1000_write_vfta(hw, index, vfta);
+
+ clear_bit(vid, adapter->active_vlans);
+
+ if (!e1000_vlan_used(adapter))
+ e1000_vlan_filter_on_off(adapter, false);
}
static void e1000_restore_vlan(struct e1000_adapter *adapter)
{
- e1000_vlan_rx_register(adapter->netdev, adapter->vlgrp);
+ u16 vid;
- if (adapter->vlgrp) {
- u16 vid;
- for (vid = 0; vid < VLAN_N_VID; vid++) {
- if (!vlan_group_get_device(adapter->vlgrp, vid))
- continue;
- e1000_vlan_rx_add_vid(adapter->netdev, vid);
- }
- }
+ if (!e1000_vlan_used(adapter))
+ return;
+
+ e1000_vlan_filter_on_off(adapter, true);
+ for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID)
+ e1000_vlan_rx_add_vid(adapter->netdev, vid);
}
int e1000_set_spd_dplx(struct e1000_adapter *adapter, u32 spd, u8 dplx)
--
1.7.6
^ permalink raw reply related
* [patch net-next-2.6 26/47 V2] ixgbevf: do vlan cleanup
From: Jiri Pirko @ 2011-07-21 13:25 UTC (permalink / raw)
To: netdev
Cc: davem, shemminger, eric.dumazet, greearb, mirqus,
jeffrey.t.kirsher, jesse.brandeburg, peter.p.waskiewicz.jr,
bruce.w.allan, carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
alexander.h.duyck, john.ronciak, e1000-devel, jesse
In-Reply-To: <1311173689-17419-27-git-send-email-jpirko@redhat.com>
- unify vlan and nonvlan rx path
- kill adapter->vlgrp and ixgbevf_vlan_rx_register
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/ixgbevf/ixgbevf.h | 6 ++--
drivers/net/ixgbevf/ixgbevf_main.c | 64 ++++++++---------------------------
2 files changed, 18 insertions(+), 52 deletions(-)
diff --git a/drivers/net/ixgbevf/ixgbevf.h b/drivers/net/ixgbevf/ixgbevf.h
index a2bbbb3..8857df4 100644
--- a/drivers/net/ixgbevf/ixgbevf.h
+++ b/drivers/net/ixgbevf/ixgbevf.h
@@ -29,9 +29,11 @@
#define _IXGBEVF_H_
#include <linux/types.h>
+#include <linux/bitops.h>
#include <linux/timer.h>
#include <linux/io.h>
#include <linux/netdevice.h>
+#include <linux/if_vlan.h>
#include "vf.h"
@@ -185,9 +187,7 @@ struct ixgbevf_q_vector {
/* board specific private data structure */
struct ixgbevf_adapter {
struct timer_list watchdog_timer;
-#ifdef NETIF_F_HW_VLAN_TX
- struct vlan_group *vlgrp;
-#endif
+ unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
u16 bd_number;
struct work_struct reset_task;
struct ixgbevf_q_vector *q_vector[MAX_MSIX_Q_VECTORS];
diff --git a/drivers/net/ixgbevf/ixgbevf_main.c b/drivers/net/ixgbevf/ixgbevf_main.c
index fec36bd..3b880a2 100644
--- a/drivers/net/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ixgbevf/ixgbevf_main.c
@@ -30,6 +30,7 @@
Copyright (c)2006 - 2007 Myricom, Inc. for some LRO specific code
******************************************************************************/
#include <linux/types.h>
+#include <linux/bitops.h>
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/netdevice.h>
@@ -288,21 +289,17 @@ static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
{
struct ixgbevf_adapter *adapter = q_vector->adapter;
bool is_vlan = (status & IXGBE_RXD_STAT_VP);
- u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
- if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL)) {
- if (adapter->vlgrp && is_vlan)
- vlan_gro_receive(&q_vector->napi,
- adapter->vlgrp,
- tag, skb);
- else
+ if (is_vlan) {
+ u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
+
+ __vlan_hwaccel_put_tag(skb, tag);
+ }
+
+ if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
napi_gro_receive(&q_vector->napi, skb);
- } else {
- if (adapter->vlgrp && is_vlan)
- vlan_hwaccel_rx(skb, adapter->vlgrp, tag);
- else
+ else
netif_rx(skb);
- }
}
/**
@@ -1401,24 +1398,6 @@ static void ixgbevf_configure_rx(struct ixgbevf_adapter *adapter)
}
}
-static void ixgbevf_vlan_rx_register(struct net_device *netdev,
- struct vlan_group *grp)
-{
- struct ixgbevf_adapter *adapter = netdev_priv(netdev);
- struct ixgbe_hw *hw = &adapter->hw;
- int i, j;
- u32 ctrl;
-
- adapter->vlgrp = grp;
-
- for (i = 0; i < adapter->num_rx_queues; i++) {
- j = adapter->rx_ring[i].reg_idx;
- ctrl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(j));
- ctrl |= IXGBE_RXDCTL_VME;
- IXGBE_WRITE_REG(hw, IXGBE_VFRXDCTL(j), ctrl);
- }
-}
-
static void ixgbevf_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
{
struct ixgbevf_adapter *adapter = netdev_priv(netdev);
@@ -1427,6 +1406,7 @@ static void ixgbevf_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
/* add VID to filter table */
if (hw->mac.ops.set_vfta)
hw->mac.ops.set_vfta(hw, vid, 0, true);
+ set_bit(vid, adapter->active_vlans);
}
static void ixgbevf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
@@ -1434,31 +1414,18 @@ static void ixgbevf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
struct ixgbevf_adapter *adapter = netdev_priv(netdev);
struct ixgbe_hw *hw = &adapter->hw;
- if (!test_bit(__IXGBEVF_DOWN, &adapter->state))
- ixgbevf_irq_disable(adapter);
-
- vlan_group_set_device(adapter->vlgrp, vid, NULL);
-
- if (!test_bit(__IXGBEVF_DOWN, &adapter->state))
- ixgbevf_irq_enable(adapter, true, true);
-
/* remove VID from filter table */
if (hw->mac.ops.set_vfta)
hw->mac.ops.set_vfta(hw, vid, 0, false);
+ clear_bit(vid, adapter->active_vlans);
}
static void ixgbevf_restore_vlan(struct ixgbevf_adapter *adapter)
{
- ixgbevf_vlan_rx_register(adapter->netdev, adapter->vlgrp);
+ u16 vid;
- if (adapter->vlgrp) {
- u16 vid;
- for (vid = 0; vid < VLAN_N_VID; vid++) {
- if (!vlan_group_get_device(adapter->vlgrp, vid))
- continue;
- ixgbevf_vlan_rx_add_vid(adapter->netdev, vid);
- }
- }
+ for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID)
+ ixgbevf_vlan_rx_add_vid(adapter->netdev, vid);
}
static int ixgbevf_write_uc_addr_list(struct net_device *netdev)
@@ -1648,7 +1615,7 @@ static int ixgbevf_up_complete(struct ixgbevf_adapter *adapter)
for (i = 0; i < num_rx_rings; i++) {
j = adapter->rx_ring[i].reg_idx;
rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(j));
- rxdctl |= IXGBE_RXDCTL_ENABLE;
+ rxdctl |= IXGBE_RXDCTL_ENABLE | IXGBE_RXDCTL_VME;
if (hw->mac.type == ixgbe_mac_X540_vf) {
rxdctl &= ~IXGBE_RXDCTL_RLPMLMASK;
rxdctl |= ((netdev->mtu + ETH_HLEN + ETH_FCS_LEN) |
@@ -3258,7 +3225,6 @@ static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_set_mac_address = ixgbevf_set_mac,
.ndo_change_mtu = ixgbevf_change_mtu,
.ndo_tx_timeout = ixgbevf_tx_timeout,
- .ndo_vlan_rx_register = ixgbevf_vlan_rx_register,
.ndo_vlan_rx_add_vid = ixgbevf_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = ixgbevf_vlan_rx_kill_vid,
};
--
1.7.6
^ permalink raw reply related
* [patch net-next-2.6 21/47 V2] qlge: do vlan cleanup
From: Jiri Pirko @ 2011-07-21 13:24 UTC (permalink / raw)
To: netdev
Cc: davem, shemminger, eric.dumazet, greearb, mirqus,
jitendra.kalsaria, ron.mercer, linux-driver, jesse
In-Reply-To: <1311173689-17419-22-git-send-email-jpirko@redhat.com>
- unify vlan and nonvlan path
- kill qdev->vlgrp and qlge_vlan_rx_register
- allow to turn on/off rx/tx vlan accel via ethtool (set_features)
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
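Note for readers: the rx/tx coupling used by the feature hooks in this
patch can be tried out in userspace. The following is only a minimal
sketch, not driver code. The F_HW_VLAN_* values and the function names are
placeholders made up for the example; the real NETIF_F_HW_VLAN_RX/TX bits
are defined by the kernel headers.

```c
#include <stdint.h>

/* Placeholder bit values (assumption); the kernel defines the real
 * NETIF_F_HW_VLAN_TX / NETIF_F_HW_VLAN_RX flags. */
#define F_HW_VLAN_TX (1u << 7)
#define F_HW_VLAN_RX (1u << 8)

/* Same idea as a fix_features hook: the hardware cannot toggle rx and tx
 * vlan acceleration separately, so the tx flag is forced to follow rx. */
uint32_t fix_features(uint32_t features)
{
	if (features & F_HW_VLAN_RX)
		features |= F_HW_VLAN_TX;
	else
		features &= ~F_HW_VLAN_TX;
	return features;
}

/* Same idea as a set_features hook: only a toggled rx bit means the
 * receive configuration would have to be reprogrammed. */
int vlan_rx_changed(uint32_t cur, uint32_t wanted)
{
	return ((cur ^ wanted) & F_HW_VLAN_RX) != 0;
}
```

With these helpers a request for tx acceleration alone is reduced to "both
off", and a request for rx alone is widened to "both on".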
drivers/net/qlge/qlge.h | 3 +-
drivers/net/qlge/qlge_main.c | 163 ++++++++++++++++++++++++------------------
2 files changed, 94 insertions(+), 72 deletions(-)
diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index 794252c..8731f79 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -11,6 +11,7 @@
#include <linux/pci.h>
#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
+#include <linux/if_vlan.h>
/*
* General definitions...
@@ -2052,7 +2053,7 @@ struct ql_adapter {
struct nic_stats nic_stats;
- struct vlan_group *vlgrp;
+ unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
/* PCI Configuration information for this device */
struct pci_dev *pdev;
diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index 68fbfac..743e3ec 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -7,6 +7,7 @@
*/
#include <linux/kernel.h>
#include <linux/init.h>
+#include <linux/bitops.h>
#include <linux/types.h>
#include <linux/module.h>
#include <linux/list.h>
@@ -33,6 +34,7 @@
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/ethtool.h>
+#include <linux/if_vlan.h>
#include <linux/skbuff.h>
#include <linux/if_vlan.h>
#include <linux/delay.h>
@@ -415,7 +417,7 @@ static int ql_set_mac_addr_reg(struct ql_adapter *qdev, u8 *addr, u32 type,
(qdev->
func << CAM_OUT_FUNC_SHIFT) |
(0 << CAM_OUT_CQ_ID_SHIFT));
- if (qdev->vlgrp)
+ if (qdev->ndev->features & NETIF_F_HW_VLAN_RX)
cam_output |= CAM_OUT_RV;
/* route to NIC core */
ql_write32(qdev, MAC_ADDR_DATA, cam_output);
@@ -1507,10 +1509,9 @@ static void ql_process_mac_rx_gro_page(struct ql_adapter *qdev,
rx_ring->rx_bytes += length;
skb->ip_summed = CHECKSUM_UNNECESSARY;
skb_record_rx_queue(skb, rx_ring->cq_id);
- if (qdev->vlgrp && (vlan_id != 0xffff))
- vlan_gro_frags(&rx_ring->napi, qdev->vlgrp, vlan_id);
- else
- napi_gro_frags(napi);
+ if (vlan_id != 0xffff)
+ __vlan_hwaccel_put_tag(skb, vlan_id);
+ napi_gro_frags(napi);
}
/* Process an inbound completion from an rx ring. */
@@ -1594,17 +1595,12 @@ static void ql_process_mac_rx_page(struct ql_adapter *qdev,
}
skb_record_rx_queue(skb, rx_ring->cq_id);
- if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
- if (qdev->vlgrp && (vlan_id != 0xffff))
- vlan_gro_receive(napi, qdev->vlgrp, vlan_id, skb);
- else
- napi_gro_receive(napi, skb);
- } else {
- if (qdev->vlgrp && (vlan_id != 0xffff))
- vlan_hwaccel_receive_skb(skb, qdev->vlgrp, vlan_id);
- else
- netif_receive_skb(skb);
- }
+ if (vlan_id != 0xffff)
+ __vlan_hwaccel_put_tag(skb, vlan_id);
+ if (skb->ip_summed == CHECKSUM_UNNECESSARY)
+ napi_gro_receive(napi, skb);
+ else
+ netif_receive_skb(skb);
return;
err_out:
dev_kfree_skb_any(skb);
@@ -1707,18 +1703,12 @@ static void ql_process_mac_rx_skb(struct ql_adapter *qdev,
}
skb_record_rx_queue(skb, rx_ring->cq_id);
- if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
- if (qdev->vlgrp && (vlan_id != 0xffff))
- vlan_gro_receive(&rx_ring->napi, qdev->vlgrp,
- vlan_id, skb);
- else
- napi_gro_receive(&rx_ring->napi, skb);
- } else {
- if (qdev->vlgrp && (vlan_id != 0xffff))
- vlan_hwaccel_receive_skb(skb, qdev->vlgrp, vlan_id);
- else
- netif_receive_skb(skb);
- }
+ if (vlan_id != 0xffff)
+ __vlan_hwaccel_put_tag(skb, vlan_id);
+ if (skb->ip_summed == CHECKSUM_UNNECESSARY)
+ napi_gro_receive(&rx_ring->napi, skb);
+ else
+ netif_receive_skb(skb);
}
static void ql_realign_skb(struct sk_buff *skb, int len)
@@ -2028,22 +2018,12 @@ static void ql_process_mac_split_rx_intr(struct ql_adapter *qdev,
rx_ring->rx_packets++;
rx_ring->rx_bytes += skb->len;
skb_record_rx_queue(skb, rx_ring->cq_id);
- if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
- if (qdev->vlgrp &&
- (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_V) &&
- (vlan_id != 0))
- vlan_gro_receive(&rx_ring->napi, qdev->vlgrp,
- vlan_id, skb);
- else
- napi_gro_receive(&rx_ring->napi, skb);
- } else {
- if (qdev->vlgrp &&
- (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_V) &&
- (vlan_id != 0))
- vlan_hwaccel_receive_skb(skb, qdev->vlgrp, vlan_id);
- else
- netif_receive_skb(skb);
- }
+ if ((ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_V) && (vlan_id != 0))
+ __vlan_hwaccel_put_tag(skb, vlan_id);
+ if (skb->ip_summed == CHECKSUM_UNNECESSARY)
+ napi_gro_receive(&rx_ring->napi, skb);
+ else
+ netif_receive_skb(skb);
}
/* Process an inbound completion from an rx ring. */
@@ -2334,71 +2314,111 @@ static int ql_napi_poll_msix(struct napi_struct *napi, int budget)
return work_done;
}
-static void qlge_vlan_rx_register(struct net_device *ndev, struct vlan_group *grp)
+static void qlge_vlan_mode(struct net_device *ndev, u32 features)
{
struct ql_adapter *qdev = netdev_priv(ndev);
- qdev->vlgrp = grp;
- if (grp) {
- netif_printk(qdev, ifup, KERN_DEBUG, qdev->ndev,
+ if (features & NETIF_F_HW_VLAN_RX) {
+ netif_printk(qdev, ifup, KERN_DEBUG, ndev,
"Turning on VLAN in NIC_RCV_CFG.\n");
ql_write32(qdev, NIC_RCV_CFG, NIC_RCV_CFG_VLAN_MASK |
- NIC_RCV_CFG_VLAN_MATCH_AND_NON);
+ NIC_RCV_CFG_VLAN_MATCH_AND_NON);
} else {
- netif_printk(qdev, ifup, KERN_DEBUG, qdev->ndev,
+ netif_printk(qdev, ifup, KERN_DEBUG, ndev,
"Turning off VLAN in NIC_RCV_CFG.\n");
ql_write32(qdev, NIC_RCV_CFG, NIC_RCV_CFG_VLAN_MASK);
}
}
-static void qlge_vlan_rx_add_vid(struct net_device *ndev, u16 vid)
+static u32 qlge_fix_features(struct net_device *ndev, u32 features)
+{
+ /*
+ * Since there is no support for separate rx/tx vlan accel
+ * enable/disable make sure tx flag is always in same state as rx.
+ */
+ if (features & NETIF_F_HW_VLAN_RX)
+ features |= NETIF_F_HW_VLAN_TX;
+ else
+ features &= ~NETIF_F_HW_VLAN_TX;
+
+ return features;
+}
+
+static int qlge_set_features(struct net_device *ndev, u32 features)
+{
+ u32 changed = ndev->features ^ features;
+
+ if (changed & NETIF_F_HW_VLAN_RX)
+ qlge_vlan_mode(ndev, features);
+
+ return 0;
+}
+
+static void __qlge_vlan_rx_add_vid(struct ql_adapter *qdev, u16 vid)
{
- struct ql_adapter *qdev = netdev_priv(ndev);
u32 enable_bit = MAC_ADDR_E;
- int status;
- status = ql_sem_spinlock(qdev, SEM_MAC_ADDR_MASK);
- if (status)
- return;
if (ql_set_mac_addr_reg
(qdev, (u8 *) &enable_bit, MAC_ADDR_TYPE_VLAN, vid)) {
netif_err(qdev, ifup, qdev->ndev,
"Failed to init vlan address.\n");
}
- ql_sem_unlock(qdev, SEM_MAC_ADDR_MASK);
}
-static void qlge_vlan_rx_kill_vid(struct net_device *ndev, u16 vid)
+static void qlge_vlan_rx_add_vid(struct net_device *ndev, u16 vid)
{
struct ql_adapter *qdev = netdev_priv(ndev);
- u32 enable_bit = 0;
int status;
status = ql_sem_spinlock(qdev, SEM_MAC_ADDR_MASK);
if (status)
return;
+ __qlge_vlan_rx_add_vid(qdev, vid);
+ set_bit(vid, qdev->active_vlans);
+
+ ql_sem_unlock(qdev, SEM_MAC_ADDR_MASK);
+}
+
+static void __qlge_vlan_rx_kill_vid(struct ql_adapter *qdev, u16 vid)
+{
+ u32 enable_bit = 0;
+
if (ql_set_mac_addr_reg
(qdev, (u8 *) &enable_bit, MAC_ADDR_TYPE_VLAN, vid)) {
netif_err(qdev, ifup, qdev->ndev,
"Failed to clear vlan address.\n");
}
- ql_sem_unlock(qdev, SEM_MAC_ADDR_MASK);
+}
+
+static void qlge_vlan_rx_kill_vid(struct net_device *ndev, u16 vid)
+{
+ struct ql_adapter *qdev = netdev_priv(ndev);
+ int status;
+
+ status = ql_sem_spinlock(qdev, SEM_MAC_ADDR_MASK);
+ if (status)
+ return;
+
+ __qlge_vlan_rx_kill_vid(qdev, vid);
+ clear_bit(vid, qdev->active_vlans);
+ ql_sem_unlock(qdev, SEM_MAC_ADDR_MASK);
}
static void qlge_restore_vlan(struct ql_adapter *qdev)
{
- qlge_vlan_rx_register(qdev->ndev, qdev->vlgrp);
+ int status;
+ u16 vid;
- if (qdev->vlgrp) {
- u16 vid;
- for (vid = 0; vid < VLAN_N_VID; vid++) {
- if (!vlan_group_get_device(qdev->vlgrp, vid))
- continue;
- qlge_vlan_rx_add_vid(qdev->ndev, vid);
- }
- }
+ status = ql_sem_spinlock(qdev, SEM_MAC_ADDR_MASK);
+ if (status)
+ return;
+
+ for_each_set_bit(vid, qdev->active_vlans, VLAN_N_VID)
+ __qlge_vlan_rx_add_vid(qdev, vid);
+
+ ql_sem_unlock(qdev, SEM_MAC_ADDR_MASK);
}
/* MSI-X Multiple Vector Interrupt Handler for inbound completions. */
@@ -4661,7 +4681,8 @@ static const struct net_device_ops qlge_netdev_ops = {
.ndo_set_mac_address = qlge_set_mac_address,
.ndo_validate_addr = eth_validate_addr,
.ndo_tx_timeout = qlge_tx_timeout,
- .ndo_vlan_rx_register = qlge_vlan_rx_register,
+ .ndo_fix_features = qlge_fix_features,
+ .ndo_set_features = qlge_set_features,
.ndo_vlan_rx_add_vid = qlge_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = qlge_vlan_rx_kill_vid,
};
--
1.7.6
^ permalink raw reply related
* [patch net-next-2.6 18/47 V2] igbvf: do vlan cleanup
From: Jiri Pirko @ 2011-07-21 13:22 UTC (permalink / raw)
To: netdev
Cc: jesse, e1000-devel, bruce.w.allan, jesse.brandeburg, mirqus,
john.ronciak, shemminger, davem
In-Reply-To: <1311173689-17419-19-git-send-email-jpirko@redhat.com>
- unify vlan and nonvlan rx path
- kill adapter->vlgrp and igbvf_vlan_rx_register
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
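Note for readers: the active_vlans bitmap that replaces vlgrp in this
patch can be modelled in plain C. This is a rough userspace sketch under
stated assumptions: the struct and function names are invented for the
example, and the bit operations are not atomic, unlike the kernel's
set_bit()/clear_bit().

```c
#include <limits.h>
#include <stddef.h>

#define VLAN_N_VID	4096
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for the adapter's private data. */
struct vlan_state {
	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
};

/* Record a VID as active (non-atomic analogue of set_bit()). */
void vlan_add(struct vlan_state *s, unsigned int vid)
{
	s->active_vlans[vid / BITS_PER_LONG] |= 1UL << (vid % BITS_PER_LONG);
}

/* Forget a VID (non-atomic analogue of clear_bit()). */
void vlan_kill(struct vlan_state *s, unsigned int vid)
{
	s->active_vlans[vid / BITS_PER_LONG] &= ~(1UL << (vid % BITS_PER_LONG));
}

int vlan_test(const struct vlan_state *s, unsigned int vid)
{
	return (int)((s->active_vlans[vid / BITS_PER_LONG] >>
		      (vid % BITS_PER_LONG)) & 1UL);
}

/* Replay every recorded VID through a callback, the way a restore path
 * would reprogram the filter table. Returns the number of VIDs seen. */
unsigned int vlan_restore(const struct vlan_state *s,
			  void (*program)(unsigned int vid))
{
	unsigned int vid, n = 0;

	for (vid = 0; vid < VLAN_N_VID; vid++) {
		if (!vlan_test(s, vid))
			continue;
		if (program)
			program(vid);
		n++;
	}
	return n;
}
```

The point of the design is that the driver keeps its own record of
configured VIDs, so restoring them no longer needs a vlan_group pointer.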
drivers/net/igbvf/igbvf.h | 4 +-
drivers/net/igbvf/netdev.c | 55 +++++++++++++++++++------------------------
2 files changed, 26 insertions(+), 33 deletions(-)
diff --git a/drivers/net/igbvf/igbvf.h b/drivers/net/igbvf/igbvf.h
index d5dad5d..fd4a7b7 100644
--- a/drivers/net/igbvf/igbvf.h
+++ b/drivers/net/igbvf/igbvf.h
@@ -34,7 +34,7 @@
#include <linux/timer.h>
#include <linux/io.h>
#include <linux/netdevice.h>
-
+#include <linux/if_vlan.h>
#include "vf.h"
@@ -173,7 +173,7 @@ struct igbvf_adapter {
const struct igbvf_info *ei;
- struct vlan_group *vlgrp;
+ unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
u32 bd_number;
u32 rx_buffer_len;
u32 polling_interval;
diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
index 64b47bf..d924b09 100644
--- a/drivers/net/igbvf/netdev.c
+++ b/drivers/net/igbvf/netdev.c
@@ -100,12 +100,12 @@ static void igbvf_receive_skb(struct igbvf_adapter *adapter,
struct sk_buff *skb,
u32 status, u16 vlan)
{
- if (adapter->vlgrp && (status & E1000_RXD_STAT_VP))
- vlan_hwaccel_receive_skb(skb, adapter->vlgrp,
- le16_to_cpu(vlan) &
- E1000_RXD_SPC_VLAN_MASK);
- else
- netif_receive_skb(skb);
+ if (status & E1000_RXD_STAT_VP) {
+ u16 vid = le16_to_cpu(vlan) & E1000_RXD_SPC_VLAN_MASK;
+
+ __vlan_hwaccel_put_tag(skb, vid);
+ }
+ netif_receive_skb(skb);
}
static inline void igbvf_rx_checksum_adv(struct igbvf_adapter *adapter,
@@ -1167,22 +1167,29 @@ static int igbvf_poll(struct napi_struct *napi, int budget)
*/
static void igbvf_set_rlpml(struct igbvf_adapter *adapter)
{
- int max_frame_size = adapter->max_frame_size;
+ int max_frame_size;
struct e1000_hw *hw = &adapter->hw;
- if (adapter->vlgrp)
- max_frame_size += VLAN_TAG_SIZE;
-
+ max_frame_size = adapter->max_frame_size + VLAN_TAG_SIZE;
e1000_rlpml_set_vf(hw, max_frame_size);
}
-static void igbvf_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
+static bool __igbvf_vlan_rx_add_vid(struct igbvf_adapter *adapter, u16 vid)
{
- struct igbvf_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
if (hw->mac.ops.set_vfta(hw, vid, true))
dev_err(&adapter->pdev->dev, "Failed to add vlan id %d\n", vid);
+ return false;
+ return true;
+}
+
+static void igbvf_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ if (__igbvf_vlan_rx_add_vid(adapter, vid))
+ set_bit(vid, adapter->active_vlans);
}
static void igbvf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
@@ -1191,7 +1198,6 @@ static void igbvf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
struct e1000_hw *hw = &adapter->hw;
igbvf_irq_disable(adapter);
- vlan_group_set_device(adapter->vlgrp, vid, NULL);
if (!test_bit(__IGBVF_DOWN, &adapter->state))
igbvf_irq_enable(adapter);
@@ -1199,28 +1205,16 @@ static void igbvf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
if (hw->mac.ops.set_vfta(hw, vid, false))
dev_err(&adapter->pdev->dev,
"Failed to remove vlan id %d\n", vid);
-}
-
-static void igbvf_vlan_rx_register(struct net_device *netdev,
- struct vlan_group *grp)
-{
- struct igbvf_adapter *adapter = netdev_priv(netdev);
-
- adapter->vlgrp = grp;
+ else
+ clear_bit(vid, adapter->active_vlans);
}
static void igbvf_restore_vlan(struct igbvf_adapter *adapter)
{
u16 vid;
- if (!adapter->vlgrp)
- return;
-
- for (vid = 0; vid < VLAN_N_VID; vid++) {
- if (!vlan_group_get_device(adapter->vlgrp, vid))
- continue;
- igbvf_vlan_rx_add_vid(adapter->netdev, vid);
- }
+ for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID)
+ __igbvf_vlan_rx_add_vid(adapter, vid);
igbvf_set_rlpml(adapter);
}
@@ -2203,7 +2197,7 @@ static netdev_tx_t igbvf_xmit_frame_ring_adv(struct sk_buff *skb,
return NETDEV_TX_BUSY;
}
- if (adapter->vlgrp && vlan_tx_tag_present(skb)) {
+ if (vlan_tx_tag_present(skb)) {
tx_flags |= IGBVF_TX_FLAGS_VLAN;
tx_flags |= (vlan_tx_tag_get(skb) << IGBVF_TX_FLAGS_VLAN_SHIFT);
}
@@ -2556,7 +2550,6 @@ static const struct net_device_ops igbvf_netdev_ops = {
.ndo_change_mtu = igbvf_change_mtu,
.ndo_do_ioctl = igbvf_ioctl,
.ndo_tx_timeout = igbvf_tx_timeout,
- .ndo_vlan_rx_register = igbvf_vlan_rx_register,
.ndo_vlan_rx_add_vid = igbvf_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = igbvf_vlan_rx_kill_vid,
#ifdef CONFIG_NET_POLL_CONTROLLER
--
1.7.6
^ permalink raw reply related
* Re: [PATCH 09/10] nfs: use sk fragment destructors to delay I/O completion until page is released by network stack.
From: Ian Campbell @ 2011-07-21 13:18 UTC (permalink / raw)
To: Trond Myklebust
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1310738489.4381.20.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
On Fri, 2011-07-15 at 15:01 +0100, Trond Myklebust wrote:
> On Fri, 2011-07-15 at 12:07 +0100, Ian Campbell wrote:
> > This prevents an issue where an ACK is delayed, a retransmit is queued (either
> > at the RPC or TCP level) and the ACK arrives before the retransmission hits the
> > wire. If this happens then the write() system call and the userspace process
> > can continue potentially modifying the data before the retransmission occurs.
> >
> > NB: this only covers the O_DIRECT write() case. I expect other cases to need
> > handling as well.
>
> That is why this belongs entirely in the RPC layer, and really should
> not touch the NFS layer.
> If you move your callback to the RPC layer and have it notify the
> rpc_task when the pages have been sent, then it should be possible to
> achieve the same thing.
>
> IOW: Add an extra state machine step after call_decode() which checks if
> all the page data has been transmitted and if not, puts the rpc_task on
> a wait queue, and has it wait for the fragment destructor callback
> before calling rpc_exit_task().
>
> Cheers
Is this the sort of thing? I wasn't sure where best to put the
destructor data structure to get the right lifecycle and ended up
putting it in the struct rpc_rqst and initialising it at
xprt_request_init time.
I changed everywhere which currently transitions to rpc_exit_task to
transition to a new "call_complete" task, blocking on the pending wait
queue. The SKB destructor wakes that queue and call_complete then
transitions to rpc_exit_task.
Several of the locations already block on that wait queue so I simply
remove the wake up in those cases (since it will happen in the SKB frag
destructor). Since we call unref at these points (to drop the initial
refcount) in the common case we will be woken from the pending wait
queue before we even sleep on it.
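To illustrate the refcounting described above, here is a rough userspace
sketch. It is not the kernel code: the struct and helper names are made up
for the example, the counter is a plain int rather than an atomic_t, and
mark_complete() merely stands in for waking the task on the pending queue.

```c
#include <stddef.h>

/* Hypothetical model of a fragment destructor: a reference count plus a
 * callback that runs once the last reference is dropped. */
struct frag_destructor {
	int ref;
	int (*destroy)(void *data);
	void *data;
};

/* The owner starts with one reference of its own. */
void frag_destructor_init(struct frag_destructor *d,
			  int (*destroy)(void *data), void *data)
{
	d->ref = 1;
	d->destroy = destroy;
	d->data = data;
}

/* Taken once for every page handed to the network stack. */
void frag_destructor_ref(struct frag_destructor *d)
{
	d->ref++;
}

/* Dropped by the stack per page and once by the owner. Returns 1 if this
 * call released the last reference and ran the callback. */
int frag_destructor_unref(struct frag_destructor *d)
{
	if (--d->ref == 0) {
		d->destroy(d->data);
		return 1;
	}
	return 0;
}

/* Stand-in for the wake-up: just flags completion. */
int mark_complete(void *data)
{
	*(int *)data = 1;
	return 0;
}
```

If no page is still held by the stack when the owner drops its initial
reference, the callback fires immediately, which corresponds to the common
case mentioned above of being woken before ever sleeping.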
Thanks,
Ian.
From 49d7d53d065bf0963fd4bb70405f4f1972f618c4 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
Date: Mon, 11 Jul 2011 14:43:24 +0100
Subject: [PATCH] sunrpc: use SKB fragment destructors to delay completion until page is released by network stack.
This prevents an issue where an ACK is delayed, a retransmit is queued (either
at the RPC or TCP level) and the ACK arrives before the retransmission hits the
wire. If this happens to an NFS WRITE RPC then the write() system call
completes and the userspace process can continue, potentially modifying data
referenced by the retransmission before the retransmission occurs.
Signed-off-by: Ian Campbell <ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
---
include/linux/sunrpc/xdr.h | 2 ++
include/linux/sunrpc/xprt.h | 5 ++++-
net/sunrpc/clnt.c | 28 +++++++++++++++++++++++-----
net/sunrpc/svcsock.c | 2 +-
net/sunrpc/xprt.c | 13 +++++++++++++
net/sunrpc/xprtsock.c | 2 +-
6 files changed, 44 insertions(+), 8 deletions(-)
diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index a20970e..172f81e 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -16,6 +16,7 @@
#include <asm/byteorder.h>
#include <asm/unaligned.h>
#include <linux/scatterlist.h>
+#include <linux/skbuff.h>
/*
* Buffer adjustment
@@ -57,6 +58,7 @@ struct xdr_buf {
tail[1]; /* Appended after page data */
struct page ** pages; /* Array of contiguous pages */
+ struct skb_frag_destructor *destructor;
unsigned int page_base, /* Start of page data */
page_len, /* Length of page data */
flags; /* Flags for data disposition */
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 81cce3b..0de6bc3 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -91,7 +91,10 @@ struct rpc_rqst {
/* A cookie used to track the
state of the transport
connection */
-
+ struct skb_frag_destructor destructor; /* SKB paged fragment
+ * destructor for
+ * transmitted pages*/
+
/*
* Partial send handling
*/
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 8c91415..0c85acb 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -61,6 +61,7 @@ static void call_reserve(struct rpc_task *task);
static void call_reserveresult(struct rpc_task *task);
static void call_allocate(struct rpc_task *task);
static void call_decode(struct rpc_task *task);
+static void call_complete(struct rpc_task *task);
static void call_bind(struct rpc_task *task);
static void call_bind_status(struct rpc_task *task);
static void call_transmit(struct rpc_task *task);
@@ -1114,6 +1115,8 @@ rpc_xdr_encode(struct rpc_task *task)
(char *)req->rq_buffer + req->rq_callsize,
req->rq_rcvsize);
+ req->rq_snd_buf.destructor = &req->destructor;
+
p = rpc_encode_header(task);
if (p == NULL) {
printk(KERN_INFO "RPC: couldn't encode RPC header, exit EIO\n");
@@ -1277,6 +1280,7 @@ call_connect_status(struct rpc_task *task)
static void
call_transmit(struct rpc_task *task)
{
+ struct rpc_rqst *req = task->tk_rqstp;
dprint_status(task);
task->tk_action = call_status;
@@ -1310,8 +1314,8 @@ call_transmit(struct rpc_task *task)
call_transmit_status(task);
if (rpc_reply_expected(task))
return;
- task->tk_action = rpc_exit_task;
- rpc_wake_up_queued_task(&task->tk_xprt->pending, task);
+ task->tk_action = call_complete;
+ skb_frag_destructor_unref(&req->destructor);
}
/*
@@ -1384,7 +1388,8 @@ call_bc_transmit(struct rpc_task *task)
return;
}
- task->tk_action = rpc_exit_task;
+ task->tk_action = call_complete;
+ skb_frag_destructor_unref(&req->destructor);
if (task->tk_status < 0) {
printk(KERN_NOTICE "RPC: Could not send backchannel reply "
"error: %d\n", task->tk_status);
@@ -1424,7 +1429,6 @@ call_bc_transmit(struct rpc_task *task)
"error: %d\n", task->tk_status);
break;
}
- rpc_wake_up_queued_task(&req->rq_xprt->pending, task);
}
#endif /* CONFIG_NFS_V4_1 */
@@ -1591,12 +1595,14 @@ call_decode(struct rpc_task *task)
return;
}
- task->tk_action = rpc_exit_task;
+ task->tk_action = call_complete;
if (decode) {
task->tk_status = rpcauth_unwrap_resp(task, decode, req, p,
task->tk_msg.rpc_resp);
}
+ rpc_sleep_on(&req->rq_xprt->pending, task, NULL);
+ skb_frag_destructor_unref(&req->destructor);
dprintk("RPC: %5u call_decode result %d\n", task->tk_pid,
task->tk_status);
return;
@@ -1611,6 +1617,18 @@ out_retry:
}
}
+/*
+ * 8. Wait for pages to be released by the network stack.
+ */
+static void
+call_complete(struct rpc_task *task)
+{
+ struct rpc_rqst *req = task->tk_rqstp;
+ dprintk("RPC: %5u call_complete result %d\n", task->tk_pid, task->tk_status);
+ task->tk_action = rpc_exit_task;
+ rpc_wake_up_queued_task(&req->rq_xprt->pending, task);
+}
+
static __be32 *
rpc_encode_header(struct rpc_task *task)
{
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index a80b1d3..40c2420 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -194,7 +194,7 @@ int svc_send_common(struct socket *sock, struct xdr_buf *xdr,
while (pglen > 0) {
if (slen == size)
flags = 0;
- result = kernel_sendpage(sock, *ppage, NULL, base, size, flags);
+ result = kernel_sendpage(sock, *ppage, xdr->destructor, base, size, flags);
if (result > 0)
len += result;
if (result != size)
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index ce5eb68..62f52a3 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1017,6 +1017,16 @@ static inline void xprt_init_xid(struct rpc_xprt *xprt)
xprt->xid = net_random();
}
+static int xprt_complete_skb_pages(void *calldata)
+{
+ struct rpc_task *task = calldata;
+ struct rpc_rqst *req = task->tk_rqstp;
+
+ dprintk("RPC: %5u completing skb pages\n", task->tk_pid);
+ rpc_wake_up_queued_task(&req->rq_xprt->pending, task);
+ return 0;
+}
+
static void xprt_request_init(struct rpc_task *task, struct rpc_xprt *xprt)
{
struct rpc_rqst *req = task->tk_rqstp;
@@ -1028,6 +1038,9 @@ static void xprt_request_init(struct rpc_task *task, struct rpc_xprt *xprt)
req->rq_xid = xprt_alloc_xid(xprt);
req->rq_release_snd_buf = NULL;
xprt_reset_majortimeo(req);
+ atomic_set(&req->destructor.ref, 1);
+ req->destructor.destroy = &xprt_complete_skb_pages;
+ req->destructor.data = task;
dprintk("RPC: %5u reserved req %p xid %08x\n", task->tk_pid,
req, ntohl(req->rq_xid));
}
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index d027621..ca1643b 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -397,7 +397,7 @@ static int xs_send_pagedata(struct socket *sock, struct xdr_buf *xdr, unsigned i
remainder -= len;
if (remainder != 0 || more)
flags |= MSG_MORE;
- err = sock->ops->sendpage(sock, *ppage, NULL, base, len, flags);
+ err = sock->ops->sendpage(sock, *ppage, xdr->destructor, base, len, flags);
if (remainder == 0 || err != len)
break;
sent += err;
--
1.7.2.5
^ permalink raw reply related
* Re: [PATCH][TRIVIAL] net, netfilter: Remove redundant goto in ebt_ulog_packet
From: Jiri Kosina @ 2011-07-21 12:02 UTC (permalink / raw)
To: Jesper Juhl
Cc: linux-kernel, netdev, bridge, coreteam, netfilter,
netfilter-devel, David S. Miller, Stephen Hemminger,
Patrick McHardy, Bart De Schuymer
In-Reply-To: <alpine.LNX.2.00.1107171953500.32359@swampdragon.chaosbits.net>
On Sun, 17 Jul 2011, Jesper Juhl wrote:
> In net/bridge/netfilter/ebt_ulog.c:ebt_ulog_packet() the 'goto unlock'
> before the 'alloc_failure' label is completely redundant. This patch
> removes it.
>
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>
> ---
> net/bridge/netfilter/ebt_ulog.c | 1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c
> index 26377e9..bf2a333 100644
> --- a/net/bridge/netfilter/ebt_ulog.c
> +++ b/net/bridge/netfilter/ebt_ulog.c
> @@ -216,7 +216,6 @@ unlock:
> nlmsg_failure:
> pr_debug("error during NLMSG_PUT. This should "
> "not happen, please report to author.\n");
> - goto unlock;
> alloc_failure:
> goto unlock;
> }
> --
> 1.7.6
>
>
> PS. Please CC me on replies since I'm not subscribed to all the lists
> copied on this mail.
Doesn't seem to be present in linux-next as of today. I have picked it up.
--
Jiri Kosina
SUSE Labs
^ permalink raw reply
* Re: NIC driver r8168 with r8169 for RTL8111/8168B and DGE-528T together
From: Francois Romieu @ 2011-07-21 10:22 UTC (permalink / raw)
To: Danie Wessels; +Cc: Stephen Hemminger, netdev
In-Reply-To: <4E2752A2.3030208@telkomsa.net>
Danie Wessels <dawessels@telkomsa.net> :
[...]
> .> Remove the 8168 PCI IDs from the r8169 driver and you should be set.
> gr8. I can give that a try...8^0 (some more hints....?)
Search for rtl8169_pci_tbl and remove the 0x8168 line : the kernel r8169
driver will stop being used on the 8168s.
>
> .>It ought to be supported by the kernel r8169 driver.
>
> See bugs listed on Ubuntu:
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/141343
Executive summary :
a. started in 2007
b. includes "kernel r8169 driver does not work, tried Realtek's r8168,
happy now"
c. includes b. + "...wait a few days, Oops"
d. covers different devices (8168b, 8168c, plain 8169).
e. people still using old kernels (see post 2.6.26
77332894c21165404496c56763d7df6c15c4bb09 in #38 then mention of 2.6.22,
2.6.24)
f. usual "this is the same bug" (#141343 is the same bug as #141343)
g. one (1) dmesg attached to the whole thread (74 messages). Tons of lspci,
not a single XID.
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/221499
Trend (?) : recent kernels help. XID and up-to-date reports would be
welcome if they can be streamlined to a standard (for me :o) ) kernel.
See below.
> https://bugs.launchpad.net/linux/+bug/347711
I wish I could filter out comments by author. Flat PR get really, really
messy with time.
> And on Bug #347711
> linux-kernel-bugs #12411
> Duplicates of this bug
> Bug #76489
> Bug #347670
Parts of reports look the same but it is more a bag of mixed bugs than
a duplicated one. On the bright side I only have to pick one to fix
something. :o/
> .> Which problem(s) do you have with it ?
> No communication to outside devices. I can not ping my router
> through it but can ping its IP.
So it is receiving packets either in promiscuous mode or (and)
as long as its peer knows its MAC address ?
No Tx at all (it may help if you can capture traffic on the remote
end) or just a few packets before it stops (and spits a NETDEV
WATCHDOG message) ?
> What can I do now or where should I report it or how can I help?
> RTFM 4 rules.d = where?
- Where (general)
Here and/or kernel.org bugzilla. Cc: good. Private email: bad. You may
Cc: Hayes as well.
If you go for bugzilla, fill Product/Component as "Drivers/Network"
- What (general)
If it is not a regression - i.e. it does not fall in the "stopped
working" bucket - always try Linus's latest -rc or David Miller's "next"
branch. I may have a bit more pending sauce but the "next" branch usually
is a good starting point (especially as fixing regressions in it asap is
a rather high priority task). "next" is waiting for you at :
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git
Comparing Realtek's driver and the kernel r8169 one is welcome. BIOS lan
options and reboot from a different OS sometimes make a difference (it's
always nice to be sure that the damn thing _can_ work).
Include:
- dmesg output (unabbreviated, explicit). The XID line is a mess but it
allows me to triage the bugs.
- ethtool -d ethX. Be it a single line or a (nowadays partial) registers
dump.
- lspci -v/-tv
- brand/motherboard identification
(wrt your report, I already have most of the needed material)
Ping me when you want a status update or feel things are sidetracked.
- What (specific)
One of Ubuntu's reports suggests that you are experiencing a regression.
If so it would be nice to bisect it.
In the short run I can not do more as I must push something out for a mac
address change problem with the 8168evl.
--
Ueimor
^ permalink raw reply
* [PATCH 5/8] netfilter: nfnetlink_queue: batch verdict support
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1311243476-18236-1-git-send-email-kaber@trash.net>
From: Florian Westphal <fw@strlen.de>
Introduces a new nfnetlink type that applies a given
verdict to all queued packets with an id <= the id in the verdict
message.
If a mark is provided it is applied to all matched packets.
This reduces the number of verdicts that have to be sent.
Applications that make use of this feature need to maintain
a timeout to send a batchverdict periodically to avoid starvation.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
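Note for readers: the "id <= the id in the verdict message" rule has to
survive wraparound of the packet id counter, which is what the signed
difference in nfq_id_after() in this patch is for. The following is a
minimal userspace sketch of that comparison and of the "stop at the first
id past the maximum" walk. The function names are invented for the
example, and it assumes a 32-bit unsigned int with the usual
two's-complement conversion to int.

```c
#include <stddef.h>

/* Nonzero if id comes after max in sequence order. The subtraction is
 * done in unsigned arithmetic and then interpreted as signed, so the
 * answer stays right when the counter wraps past 0xffffffff. */
int id_after(unsigned int id, unsigned int max)
{
	return (int)(id - max) > 0;
}

/* Given ids in queue order, return how many leading entries a batch
 * verdict with the given maxid would cover. Like the list walk in the
 * patch, it stops at the first id that lies after maxid. */
size_t batch_count(const unsigned int *ids, size_t n, unsigned int maxid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (id_after(ids[i], maxid))
			break;
	return i;
}
```

For example, with queued ids 0xfffffffe, 0xffffffff, 0, 1, 2 and a batch
verdict for id 0, the first three entries are covered even though two of
them are numerically larger than 0.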
include/linux/netfilter/nfnetlink_queue.h | 1 +
net/netfilter/nfnetlink_queue.c | 115 ++++++++++++++++++++++++++---
2 files changed, 104 insertions(+), 12 deletions(-)
diff --git a/include/linux/netfilter/nfnetlink_queue.h b/include/linux/netfilter/nfnetlink_queue.h
index af94e00..24b32e6 100644
--- a/include/linux/netfilter/nfnetlink_queue.h
+++ b/include/linux/netfilter/nfnetlink_queue.h
@@ -8,6 +8,7 @@ enum nfqnl_msg_types {
NFQNL_MSG_PACKET, /* packet from kernel to userspace */
NFQNL_MSG_VERDICT, /* verdict from userspace to kernel */
NFQNL_MSG_CONFIG, /* connect to a particular queue */
+ NFQNL_MSG_VERDICT_BATCH, /* batchv from userspace to kernel */
NFQNL_MSG_MAX
};
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 3b2af8c..fbfcd83 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -171,6 +171,13 @@ __enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
queue->queue_total++;
}
+static void
+__dequeue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
+{
+ list_del(&entry->list);
+ queue->queue_total--;
+}
+
static struct nf_queue_entry *
find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
{
@@ -185,10 +192,8 @@ find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
}
}
- if (entry) {
- list_del(&entry->list);
- queue->queue_total--;
- }
+ if (entry)
+ __dequeue_entry(queue, entry);
spin_unlock_bh(&queue->lock);
@@ -611,6 +616,92 @@ static const struct nla_policy nfqa_verdict_policy[NFQA_MAX+1] = {
[NFQA_PAYLOAD] = { .type = NLA_UNSPEC },
};
+static const struct nla_policy nfqa_verdict_batch_policy[NFQA_MAX+1] = {
+ [NFQA_VERDICT_HDR] = { .len = sizeof(struct nfqnl_msg_verdict_hdr) },
+ [NFQA_MARK] = { .type = NLA_U32 },
+};
+
+static struct nfqnl_instance *verdict_instance_lookup(u16 queue_num, int nlpid)
+{
+ struct nfqnl_instance *queue;
+
+ queue = instance_lookup(queue_num);
+ if (!queue)
+ return ERR_PTR(-ENODEV);
+
+ if (queue->peer_pid != nlpid)
+ return ERR_PTR(-EPERM);
+
+ return queue;
+}
+
+static struct nfqnl_msg_verdict_hdr*
+verdicthdr_get(const struct nlattr * const nfqa[])
+{
+ struct nfqnl_msg_verdict_hdr *vhdr;
+ unsigned int verdict;
+
+ if (!nfqa[NFQA_VERDICT_HDR])
+ return NULL;
+
+ vhdr = nla_data(nfqa[NFQA_VERDICT_HDR]);
+ verdict = ntohl(vhdr->verdict);
+ if ((verdict & NF_VERDICT_MASK) > NF_MAX_VERDICT)
+ return NULL;
+ return vhdr;
+}
+
+static int nfq_id_after(unsigned int id, unsigned int max)
+{
+ return (int)(id - max) > 0;
+}
+
+static int
+nfqnl_recv_verdict_batch(struct sock *ctnl, struct sk_buff *skb,
+ const struct nlmsghdr *nlh,
+ const struct nlattr * const nfqa[])
+{
+ struct nfgenmsg *nfmsg = NLMSG_DATA(nlh);
+ struct nf_queue_entry *entry, *tmp;
+ unsigned int verdict, maxid;
+ struct nfqnl_msg_verdict_hdr *vhdr;
+ struct nfqnl_instance *queue;
+ LIST_HEAD(batch_list);
+ u16 queue_num = ntohs(nfmsg->res_id);
+
+ queue = verdict_instance_lookup(queue_num, NETLINK_CB(skb).pid);
+ if (IS_ERR(queue))
+ return PTR_ERR(queue);
+
+ vhdr = verdicthdr_get(nfqa);
+ if (!vhdr)
+ return -EINVAL;
+
+ verdict = ntohl(vhdr->verdict);
+ maxid = ntohl(vhdr->id);
+
+ spin_lock_bh(&queue->lock);
+
+ list_for_each_entry_safe(entry, tmp, &queue->queue_list, list) {
+ if (nfq_id_after(entry->id, maxid))
+ break;
+ __dequeue_entry(queue, entry);
+ list_add_tail(&entry->list, &batch_list);
+ }
+
+ spin_unlock_bh(&queue->lock);
+
+ if (list_empty(&batch_list))
+ return -ENOENT;
+
+ list_for_each_entry_safe(entry, tmp, &batch_list, list) {
+ if (nfqa[NFQA_MARK])
+ entry->skb->mark = ntohl(nla_get_be32(nfqa[NFQA_MARK]));
+ nf_reinject(entry, verdict);
+ }
+ return 0;
+}
+
static int
nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
const struct nlmsghdr *nlh,
@@ -626,20 +717,17 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
queue = instance_lookup(queue_num);
if (!queue)
- return -ENODEV;
- if (queue->peer_pid != NETLINK_CB(skb).pid)
- return -EPERM;
+ queue = verdict_instance_lookup(queue_num, NETLINK_CB(skb).pid);
+ if (IS_ERR(queue))
+ return PTR_ERR(queue);
- if (!nfqa[NFQA_VERDICT_HDR])
+ vhdr = verdicthdr_get(nfqa);
+ if (!vhdr)
return -EINVAL;
- vhdr = nla_data(nfqa[NFQA_VERDICT_HDR]);
verdict = ntohl(vhdr->verdict);
- if ((verdict & NF_VERDICT_MASK) > NF_MAX_VERDICT)
- return -EINVAL;
-
entry = find_dequeue_entry(queue, ntohl(vhdr->id));
if (entry == NULL)
return -ENOENT;
@@ -775,6 +863,9 @@ static const struct nfnl_callback nfqnl_cb[NFQNL_MSG_MAX] = {
[NFQNL_MSG_CONFIG] = { .call = nfqnl_recv_config,
.attr_count = NFQA_CFG_MAX,
.policy = nfqa_cfg_policy },
+ [NFQNL_MSG_VERDICT_BATCH]={ .call_rcu = nfqnl_recv_verdict_batch,
+ .attr_count = NFQA_MAX,
+ .policy = nfqa_verdict_batch_policy },
};
static const struct nfnetlink_subsystem nfqnl_subsys = {
--
1.7.2.3
^ permalink raw reply related
* [PATCH 6/8] netfilter: ipset: make possible to hash some part of the data element only
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1311243476-18236-1-git-send-email-kaber@trash.net>
From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
include/linux/netfilter/ipset/ip_set_ahash.h | 14 ++++++++++----
1 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/include/linux/netfilter/ipset/ip_set_ahash.h b/include/linux/netfilter/ipset/ip_set_ahash.h
index c5b06aa..42b7d25 100644
--- a/include/linux/netfilter/ipset/ip_set_ahash.h
+++ b/include/linux/netfilter/ipset/ip_set_ahash.h
@@ -211,12 +211,16 @@ ip_set_hash_destroy(struct ip_set *set)
set->data = NULL;
}
-#define HKEY(data, initval, htable_bits) \
-(jhash2((u32 *)(data), sizeof(struct type_pf_elem)/sizeof(u32), initval) \
- & jhash_mask(htable_bits))
-
#endif /* _IP_SET_AHASH_H */
+#ifndef HKEY_DATALEN
+#define HKEY_DATALEN sizeof(struct type_pf_elem)
+#endif
+
+#define HKEY(data, initval, htable_bits) \
+(jhash2((u32 *)(data), HKEY_DATALEN/sizeof(u32), initval) \
+ & jhash_mask(htable_bits))
+
#define CONCAT(a, b, c) a##b##c
#define TOKEN(a, b, c) CONCAT(a, b, c)
@@ -1054,6 +1058,8 @@ type_pf_gc_init(struct ip_set *set)
IPSET_GC_PERIOD(h->timeout));
}
+#undef HKEY_DATALEN
+#undef HKEY
#undef type_pf_data_equal
#undef type_pf_data_isnull
#undef type_pf_data_copy
--
1.7.2.3
^ permalink raw reply related
* [PATCH 7/8] netfilter: ipset: hash:net,iface fixed to handle overlapping nets behind different interfaces
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1311243476-18236-1-git-send-email-kaber@trash.net>
From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
If overlapping networks with different interfaces were added to
the set, the type did not handle them properly. Example:
ipset create test hash:net,iface
ipset add test 192.168.0.0/16,eth0
ipset add test 192.168.0.0/24,eth1
Now, if a packet was sent from 192.168.0.0/24,eth0, the type returned
a match.
In the patch the algorithm is fixed in order to correctly handle
overlapping networks.
Limitation: the same network cannot be stored with more than 64 different
interfaces in a single set.
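The intended lookup behaviour can be sketched in user space as follows. This is
only an illustration, not the kernel code: struct net_iface, cidr_mask() and
set_test() are hypothetical names, the set is a plain array instead of a hash,
and it assumes prefix lengths are tried from most to least specific. The point
is that once some prefix length has an entry covering the address, that prefix
decides the result and shorter prefixes are no longer consulted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified element: network address in host byte order. */
struct net_iface {
	uint32_t net;
	uint8_t cidr;
	const char *iface;
};

/* Netmask for a prefix length between 0 and 32. */
static uint32_t cidr_mask(uint8_t cidr)
{
	return cidr ? ~(uint32_t)0 << (32 - cidr) : 0;
}

/*
 * Try prefix lengths from most to least specific. "multi" counts the
 * entries whose network covers the address at the current prefix length,
 * whatever their interface. If any were seen, that prefix decides.
 */
static bool set_test(const struct net_iface *set, size_t n,
		     uint32_t addr, const char *iface)
{
	for (int cidr = 32; cidr >= 1; cidr--) {
		unsigned int multi = 0;

		for (size_t i = 0; i < n; i++) {
			if (set[i].cidr != cidr ||
			    set[i].net != (addr & cidr_mask((uint8_t)cidr)))
				continue;
			multi++;
			if (strcmp(set[i].iface, iface) == 0)
				return true;
		}
		if (multi)
			return false; /* covered, but behind another interface */
	}
	return false;
}
```

With the two entries from the example above, an address inside 192.168.0.0/24
arriving on eth0 is rejected by this sketch, while the same address on eth1
and any other 192.168.0.0/16 address on eth0 are accepted.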
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
include/linux/netfilter/ipset/ip_set_ahash.h | 92 +++++++++++++++++---------
net/netfilter/ipset/ip_set_hash_ip.c | 6 +-
net/netfilter/ipset/ip_set_hash_ipport.c | 6 +-
net/netfilter/ipset/ip_set_hash_ipportip.c | 6 +-
net/netfilter/ipset/ip_set_hash_ipportnet.c | 6 +-
net/netfilter/ipset/ip_set_hash_net.c | 6 +-
net/netfilter/ipset/ip_set_hash_netiface.c | 40 +++++++++--
net/netfilter/ipset/ip_set_hash_netport.c | 6 +-
8 files changed, 117 insertions(+), 51 deletions(-)
diff --git a/include/linux/netfilter/ipset/ip_set_ahash.h b/include/linux/netfilter/ipset/ip_set_ahash.h
index 42b7d25..1e7f759 100644
--- a/include/linux/netfilter/ipset/ip_set_ahash.h
+++ b/include/linux/netfilter/ipset/ip_set_ahash.h
@@ -28,7 +28,32 @@
/* Number of elements to store in an initial array block */
#define AHASH_INIT_SIZE 4
/* Max number of elements to store in an array block */
-#define AHASH_MAX_SIZE (3*4)
+#define AHASH_MAX_SIZE (3*AHASH_INIT_SIZE)
+
+/* Max number of elements can be tuned */
+#ifdef IP_SET_HASH_WITH_MULTI
+#define AHASH_MAX(h) ((h)->ahash_max)
+
+static inline u8
+tune_ahash_max(u8 curr, u32 multi)
+{
+ u32 n;
+
+ if (multi < curr)
+ return curr;
+
+ n = curr + AHASH_INIT_SIZE;
+ /* Currently, at listing one hash bucket must fit into a message.
+ * Therefore we have a hard limit here.
+ */
+ return n > curr && n <= 64 ? n : curr;
+}
+#define TUNE_AHASH_MAX(h, multi) \
+ ((h)->ahash_max = tune_ahash_max((h)->ahash_max, multi))
+#else
+#define AHASH_MAX(h) AHASH_MAX_SIZE
+#define TUNE_AHASH_MAX(h, multi)
+#endif
/* A hash bucket */
struct hbucket {
@@ -60,6 +85,9 @@ struct ip_set_hash {
u32 timeout; /* timeout value, if enabled */
struct timer_list gc; /* garbage collection when timeout enabled */
struct type_pf_next next; /* temporary storage for uadd */
+#ifdef IP_SET_HASH_WITH_MULTI
+ u8 ahash_max; /* max elements in an array block */
+#endif
#ifdef IP_SET_HASH_WITH_NETMASK
u8 netmask; /* netmask value for subnets to store */
#endif
@@ -279,12 +307,13 @@ ip_set_hash_destroy(struct ip_set *set)
/* Add an element to the hash table when resizing the set:
* we spare the maintenance of the internal counters. */
static int
-type_pf_elem_add(struct hbucket *n, const struct type_pf_elem *value)
+type_pf_elem_add(struct hbucket *n, const struct type_pf_elem *value,
+ u8 ahash_max)
{
if (n->pos >= n->size) {
void *tmp;
- if (n->size >= AHASH_MAX_SIZE)
+ if (n->size >= ahash_max)
/* Trigger rehashing */
return -EAGAIN;
@@ -339,7 +368,7 @@ retry:
for (j = 0; j < n->pos; j++) {
data = ahash_data(n, j);
m = hbucket(t, HKEY(data, h->initval, htable_bits));
- ret = type_pf_elem_add(m, data);
+ ret = type_pf_elem_add(m, data, AHASH_MAX(h));
if (ret < 0) {
read_unlock_bh(&set->lock);
ahash_destroy(t);
@@ -376,7 +405,7 @@ type_pf_add(struct ip_set *set, void *value, u32 timeout, u32 flags)
const struct type_pf_elem *d = value;
struct hbucket *n;
int i, ret = 0;
- u32 key;
+ u32 key, multi = 0;
if (h->elements >= h->maxelem)
return -IPSET_ERR_HASH_FULL;
@@ -386,12 +415,12 @@ type_pf_add(struct ip_set *set, void *value, u32 timeout, u32 flags)
key = HKEY(value, h->initval, t->htable_bits);
n = hbucket(t, key);
for (i = 0; i < n->pos; i++)
- if (type_pf_data_equal(ahash_data(n, i), d)) {
+ if (type_pf_data_equal(ahash_data(n, i), d, &multi)) {
ret = -IPSET_ERR_EXIST;
goto out;
}
-
- ret = type_pf_elem_add(n, value);
+ TUNE_AHASH_MAX(h, multi);
+ ret = type_pf_elem_add(n, value, AHASH_MAX(h));
if (ret != 0) {
if (ret == -EAGAIN)
type_pf_data_next(h, d);
@@ -419,13 +448,13 @@ type_pf_del(struct ip_set *set, void *value, u32 timeout, u32 flags)
struct hbucket *n;
int i;
struct type_pf_elem *data;
- u32 key;
+ u32 key, multi = 0;
key = HKEY(value, h->initval, t->htable_bits);
n = hbucket(t, key);
for (i = 0; i < n->pos; i++) {
data = ahash_data(n, i);
- if (!type_pf_data_equal(data, d))
+ if (!type_pf_data_equal(data, d, &multi))
continue;
if (i != n->pos - 1)
/* Not last one */
@@ -466,17 +495,17 @@ type_pf_test_cidrs(struct ip_set *set, struct type_pf_elem *d, u32 timeout)
struct hbucket *n;
const struct type_pf_elem *data;
int i, j = 0;
- u32 key;
+ u32 key, multi = 0;
u8 host_mask = SET_HOST_MASK(set->family);
pr_debug("test by nets\n");
- for (; j < host_mask && h->nets[j].cidr; j++) {
+ for (; j < host_mask && h->nets[j].cidr && !multi; j++) {
type_pf_data_netmask(d, h->nets[j].cidr);
key = HKEY(d, h->initval, t->htable_bits);
n = hbucket(t, key);
for (i = 0; i < n->pos; i++) {
data = ahash_data(n, i);
- if (type_pf_data_equal(data, d))
+ if (type_pf_data_equal(data, d, &multi))
return 1;
}
}
@@ -494,7 +523,7 @@ type_pf_test(struct ip_set *set, void *value, u32 timeout, u32 flags)
struct hbucket *n;
const struct type_pf_elem *data;
int i;
- u32 key;
+ u32 key, multi = 0;
#ifdef IP_SET_HASH_WITH_NETS
/* If we test an IP address and not a network address,
@@ -507,7 +536,7 @@ type_pf_test(struct ip_set *set, void *value, u32 timeout, u32 flags)
n = hbucket(t, key);
for (i = 0; i < n->pos; i++) {
data = ahash_data(n, i);
- if (type_pf_data_equal(data, d))
+ if (type_pf_data_equal(data, d, &multi))
return 1;
}
return 0;
@@ -664,14 +693,14 @@ type_pf_data_timeout_set(struct type_pf_elem *data, u32 timeout)
static int
type_pf_elem_tadd(struct hbucket *n, const struct type_pf_elem *value,
- u32 timeout)
+ u8 ahash_max, u32 timeout)
{
struct type_pf_elem *data;
if (n->pos >= n->size) {
void *tmp;
- if (n->size >= AHASH_MAX_SIZE)
+ if (n->size >= ahash_max)
/* Trigger rehashing */
return -EAGAIN;
@@ -776,7 +805,7 @@ retry:
for (j = 0; j < n->pos; j++) {
data = ahash_tdata(n, j);
m = hbucket(t, HKEY(data, h->initval, htable_bits));
- ret = type_pf_elem_tadd(m, data,
+ ret = type_pf_elem_tadd(m, data, AHASH_MAX(h),
type_pf_data_timeout(data));
if (ret < 0) {
read_unlock_bh(&set->lock);
@@ -807,9 +836,9 @@ type_pf_tadd(struct ip_set *set, void *value, u32 timeout, u32 flags)
const struct type_pf_elem *d = value;
struct hbucket *n;
struct type_pf_elem *data;
- int ret = 0, i, j = AHASH_MAX_SIZE + 1;
+ int ret = 0, i, j = AHASH_MAX(h) + 1;
bool flag_exist = flags & IPSET_FLAG_EXIST;
- u32 key;
+ u32 key, multi = 0;
if (h->elements >= h->maxelem)
/* FIXME: when set is full, we slow down here */
@@ -823,18 +852,18 @@ type_pf_tadd(struct ip_set *set, void *value, u32 timeout, u32 flags)
n = hbucket(t, key);
for (i = 0; i < n->pos; i++) {
data = ahash_tdata(n, i);
- if (type_pf_data_equal(data, d)) {
+ if (type_pf_data_equal(data, d, &multi)) {
if (type_pf_data_expired(data) || flag_exist)
j = i;
else {
ret = -IPSET_ERR_EXIST;
goto out;
}
- } else if (j == AHASH_MAX_SIZE + 1 &&
+ } else if (j == AHASH_MAX(h) + 1 &&
type_pf_data_expired(data))
j = i;
}
- if (j != AHASH_MAX_SIZE + 1) {
+ if (j != AHASH_MAX(h) + 1) {
data = ahash_tdata(n, j);
#ifdef IP_SET_HASH_WITH_NETS
del_cidr(h, data->cidr, HOST_MASK);
@@ -844,7 +873,8 @@ type_pf_tadd(struct ip_set *set, void *value, u32 timeout, u32 flags)
type_pf_data_timeout_set(data, timeout);
goto out;
}
- ret = type_pf_elem_tadd(n, d, timeout);
+ TUNE_AHASH_MAX(h, multi);
+ ret = type_pf_elem_tadd(n, d, AHASH_MAX(h), timeout);
if (ret != 0) {
if (ret == -EAGAIN)
type_pf_data_next(h, d);
@@ -869,13 +899,13 @@ type_pf_tdel(struct ip_set *set, void *value, u32 timeout, u32 flags)
struct hbucket *n;
int i;
struct type_pf_elem *data;
- u32 key;
+ u32 key, multi = 0;
key = HKEY(value, h->initval, t->htable_bits);
n = hbucket(t, key);
for (i = 0; i < n->pos; i++) {
data = ahash_tdata(n, i);
- if (!type_pf_data_equal(data, d))
+ if (!type_pf_data_equal(data, d, &multi))
continue;
if (type_pf_data_expired(data))
return -IPSET_ERR_EXIST;
@@ -915,16 +945,16 @@ type_pf_ttest_cidrs(struct ip_set *set, struct type_pf_elem *d, u32 timeout)
struct type_pf_elem *data;
struct hbucket *n;
int i, j = 0;
- u32 key;
+ u32 key, multi = 0;
u8 host_mask = SET_HOST_MASK(set->family);
- for (; j < host_mask && h->nets[j].cidr; j++) {
+ for (; j < host_mask && h->nets[j].cidr && !multi; j++) {
type_pf_data_netmask(d, h->nets[j].cidr);
key = HKEY(d, h->initval, t->htable_bits);
n = hbucket(t, key);
for (i = 0; i < n->pos; i++) {
data = ahash_tdata(n, i);
- if (type_pf_data_equal(data, d))
+ if (type_pf_data_equal(data, d, &multi))
return !type_pf_data_expired(data);
}
}
@@ -940,7 +970,7 @@ type_pf_ttest(struct ip_set *set, void *value, u32 timeout, u32 flags)
struct type_pf_elem *data, *d = value;
struct hbucket *n;
int i;
- u32 key;
+ u32 key, multi = 0;
#ifdef IP_SET_HASH_WITH_NETS
if (d->cidr == SET_HOST_MASK(set->family))
@@ -950,7 +980,7 @@ type_pf_ttest(struct ip_set *set, void *value, u32 timeout, u32 flags)
n = hbucket(t, key);
for (i = 0; i < n->pos; i++) {
data = ahash_tdata(n, i);
- if (type_pf_data_equal(data, d))
+ if (type_pf_data_equal(data, d, &multi))
return !type_pf_data_expired(data);
}
return 0;
diff --git a/net/netfilter/ipset/ip_set_hash_ip.c b/net/netfilter/ipset/ip_set_hash_ip.c
index fa80bb9..f2d576e 100644
--- a/net/netfilter/ipset/ip_set_hash_ip.c
+++ b/net/netfilter/ipset/ip_set_hash_ip.c
@@ -53,7 +53,8 @@ struct hash_ip4_telem {
static inline bool
hash_ip4_data_equal(const struct hash_ip4_elem *ip1,
- const struct hash_ip4_elem *ip2)
+ const struct hash_ip4_elem *ip2,
+ u32 *multi)
{
return ip1->ip == ip2->ip;
}
@@ -225,7 +226,8 @@ struct hash_ip6_telem {
static inline bool
hash_ip6_data_equal(const struct hash_ip6_elem *ip1,
- const struct hash_ip6_elem *ip2)
+ const struct hash_ip6_elem *ip2,
+ u32 *multi)
{
return ipv6_addr_cmp(&ip1->ip.in6, &ip2->ip.in6) == 0;
}
diff --git a/net/netfilter/ipset/ip_set_hash_ipport.c b/net/netfilter/ipset/ip_set_hash_ipport.c
index bbf51b6..6ee10f5 100644
--- a/net/netfilter/ipset/ip_set_hash_ipport.c
+++ b/net/netfilter/ipset/ip_set_hash_ipport.c
@@ -60,7 +60,8 @@ struct hash_ipport4_telem {
static inline bool
hash_ipport4_data_equal(const struct hash_ipport4_elem *ip1,
- const struct hash_ipport4_elem *ip2)
+ const struct hash_ipport4_elem *ip2,
+ u32 *multi)
{
return ip1->ip == ip2->ip &&
ip1->port == ip2->port &&
@@ -276,7 +277,8 @@ struct hash_ipport6_telem {
static inline bool
hash_ipport6_data_equal(const struct hash_ipport6_elem *ip1,
- const struct hash_ipport6_elem *ip2)
+ const struct hash_ipport6_elem *ip2,
+ u32 *multi)
{
return ipv6_addr_cmp(&ip1->ip.in6, &ip2->ip.in6) == 0 &&
ip1->port == ip2->port &&
diff --git a/net/netfilter/ipset/ip_set_hash_ipportip.c b/net/netfilter/ipset/ip_set_hash_ipportip.c
index 96525f5..fb90e34 100644
--- a/net/netfilter/ipset/ip_set_hash_ipportip.c
+++ b/net/netfilter/ipset/ip_set_hash_ipportip.c
@@ -62,7 +62,8 @@ struct hash_ipportip4_telem {
static inline bool
hash_ipportip4_data_equal(const struct hash_ipportip4_elem *ip1,
- const struct hash_ipportip4_elem *ip2)
+ const struct hash_ipportip4_elem *ip2,
+ u32 *multi)
{
return ip1->ip == ip2->ip &&
ip1->ip2 == ip2->ip2 &&
@@ -286,7 +287,8 @@ struct hash_ipportip6_telem {
static inline bool
hash_ipportip6_data_equal(const struct hash_ipportip6_elem *ip1,
- const struct hash_ipportip6_elem *ip2)
+ const struct hash_ipportip6_elem *ip2,
+ u32 *multi)
{
return ipv6_addr_cmp(&ip1->ip.in6, &ip2->ip.in6) == 0 &&
ipv6_addr_cmp(&ip1->ip2.in6, &ip2->ip2.in6) == 0 &&
diff --git a/net/netfilter/ipset/ip_set_hash_ipportnet.c b/net/netfilter/ipset/ip_set_hash_ipportnet.c
index d2d6ab8..deb3e3d 100644
--- a/net/netfilter/ipset/ip_set_hash_ipportnet.c
+++ b/net/netfilter/ipset/ip_set_hash_ipportnet.c
@@ -62,7 +62,8 @@ struct hash_ipportnet4_telem {
static inline bool
hash_ipportnet4_data_equal(const struct hash_ipportnet4_elem *ip1,
- const struct hash_ipportnet4_elem *ip2)
+ const struct hash_ipportnet4_elem *ip2,
+ u32 *multi)
{
return ip1->ip == ip2->ip &&
ip1->ip2 == ip2->ip2 &&
@@ -335,7 +336,8 @@ struct hash_ipportnet6_telem {
static inline bool
hash_ipportnet6_data_equal(const struct hash_ipportnet6_elem *ip1,
- const struct hash_ipportnet6_elem *ip2)
+ const struct hash_ipportnet6_elem *ip2,
+ u32 *multi)
{
return ipv6_addr_cmp(&ip1->ip.in6, &ip2->ip.in6) == 0 &&
ipv6_addr_cmp(&ip1->ip2.in6, &ip2->ip2.in6) == 0 &&
diff --git a/net/netfilter/ipset/ip_set_hash_net.c b/net/netfilter/ipset/ip_set_hash_net.c
index 2d4b1f4..60d0165 100644
--- a/net/netfilter/ipset/ip_set_hash_net.c
+++ b/net/netfilter/ipset/ip_set_hash_net.c
@@ -58,7 +58,8 @@ struct hash_net4_telem {
static inline bool
hash_net4_data_equal(const struct hash_net4_elem *ip1,
- const struct hash_net4_elem *ip2)
+ const struct hash_net4_elem *ip2,
+ u32 *multi)
{
return ip1->ip == ip2->ip && ip1->cidr == ip2->cidr;
}
@@ -249,7 +250,8 @@ struct hash_net6_telem {
static inline bool
hash_net6_data_equal(const struct hash_net6_elem *ip1,
- const struct hash_net6_elem *ip2)
+ const struct hash_net6_elem *ip2,
+ u32 *multi)
{
return ipv6_addr_cmp(&ip1->ip.in6, &ip2->ip.in6) == 0 &&
ip1->cidr == ip2->cidr;
diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c
index 3d6c53b..e13095d 100644
--- a/net/netfilter/ipset/ip_set_hash_netiface.c
+++ b/net/netfilter/ipset/ip_set_hash_netiface.c
@@ -99,7 +99,7 @@ iface_test(struct rb_root *root, const char **iface)
while (n) {
const char *d = iface_data(n);
- int res = ifname_compare(*iface, d);
+ long res = ifname_compare(*iface, d);
if (res < 0)
n = n->rb_left;
@@ -121,7 +121,7 @@ iface_add(struct rb_root *root, const char **iface)
while (*n) {
char *ifname = iface_data(*n);
- int res = ifname_compare(*iface, ifname);
+ long res = ifname_compare(*iface, ifname);
p = *n;
if (res < 0)
@@ -159,31 +159,42 @@ hash_netiface_same_set(const struct ip_set *a, const struct ip_set *b);
/* The type variant functions: IPv4 */
+struct hash_netiface4_elem_hashed {
+ __be32 ip;
+ u8 physdev;
+ u8 cidr;
+ u16 padding;
+};
+
+#define HKEY_DATALEN sizeof(struct hash_netiface4_elem_hashed)
+
/* Member elements without timeout */
struct hash_netiface4_elem {
__be32 ip;
- const char *iface;
u8 physdev;
u8 cidr;
u16 padding;
+ const char *iface;
};
/* Member elements with timeout support */
struct hash_netiface4_telem {
__be32 ip;
- const char *iface;
u8 physdev;
u8 cidr;
u16 padding;
+ const char *iface;
unsigned long timeout;
};
static inline bool
hash_netiface4_data_equal(const struct hash_netiface4_elem *ip1,
- const struct hash_netiface4_elem *ip2)
+ const struct hash_netiface4_elem *ip2,
+ u32 *multi)
{
return ip1->ip == ip2->ip &&
ip1->cidr == ip2->cidr &&
+ (++*multi) &&
ip1->physdev == ip2->physdev &&
ip1->iface == ip2->iface;
}
@@ -257,6 +268,7 @@ nla_put_failure:
#define IP_SET_HASH_WITH_NETS
#define IP_SET_HASH_WITH_RBTREE
+#define IP_SET_HASH_WITH_MULTI
#define PF 4
#define HOST_MASK 32
@@ -424,29 +436,40 @@ hash_netiface_same_set(const struct ip_set *a, const struct ip_set *b)
/* The type variant functions: IPv6 */
+struct hash_netiface6_elem_hashed {
+ union nf_inet_addr ip;
+ u8 physdev;
+ u8 cidr;
+ u16 padding;
+};
+
+#define HKEY_DATALEN sizeof(struct hash_netiface6_elem_hashed)
+
struct hash_netiface6_elem {
union nf_inet_addr ip;
- const char *iface;
u8 physdev;
u8 cidr;
u16 padding;
+ const char *iface;
};
struct hash_netiface6_telem {
union nf_inet_addr ip;
- const char *iface;
u8 physdev;
u8 cidr;
u16 padding;
+ const char *iface;
unsigned long timeout;
};
static inline bool
hash_netiface6_data_equal(const struct hash_netiface6_elem *ip1,
- const struct hash_netiface6_elem *ip2)
+ const struct hash_netiface6_elem *ip2,
+ u32 *multi)
{
return ipv6_addr_cmp(&ip1->ip.in6, &ip2->ip.in6) == 0 &&
ip1->cidr == ip2->cidr &&
+ (++*multi) &&
ip1->physdev == ip2->physdev &&
ip1->iface == ip2->iface;
}
@@ -681,6 +704,7 @@ hash_netiface_create(struct ip_set *set, struct nlattr *tb[], u32 flags)
h->maxelem = maxelem;
get_random_bytes(&h->initval, sizeof(h->initval));
h->timeout = IPSET_NO_TIMEOUT;
+ h->ahash_max = AHASH_MAX_SIZE;
hbits = htable_bits(hashsize);
h->table = ip_set_alloc(
diff --git a/net/netfilter/ipset/ip_set_hash_netport.c b/net/netfilter/ipset/ip_set_hash_netport.c
index fe203d1..8f9de72 100644
--- a/net/netfilter/ipset/ip_set_hash_netport.c
+++ b/net/netfilter/ipset/ip_set_hash_netport.c
@@ -59,7 +59,8 @@ struct hash_netport4_telem {
static inline bool
hash_netport4_data_equal(const struct hash_netport4_elem *ip1,
- const struct hash_netport4_elem *ip2)
+ const struct hash_netport4_elem *ip2,
+ u32 *multi)
{
return ip1->ip == ip2->ip &&
ip1->port == ip2->port &&
@@ -300,7 +301,8 @@ struct hash_netport6_telem {
static inline bool
hash_netport6_data_equal(const struct hash_netport6_elem *ip1,
- const struct hash_netport6_elem *ip2)
+ const struct hash_netport6_elem *ip2,
+ u32 *multi)
{
return ipv6_addr_cmp(&ip1->ip.in6, &ip2->ip.in6) == 0 &&
ip1->port == ip2->port &&
--
1.7.2.3
^ permalink raw reply related
* [PATCH 3/8] netfilter: nfnetlink_queue: provide rcu enabled callbacks
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1311243476-18236-1-git-send-email-kaber@trash.net>
From: Eric Dumazet <eric.dumazet@gmail.com>
nfnetlink_queue operations on SMP are not efficient if several queues are
used, because of nfnl_mutex contention when applications give packet
verdicts.
Use the new call_rcu field in struct nfnl_callback to advertise a callback
that is called under rcu_read_lock instead of nfnl_mutex.
On my 2x4x2 machine, I was able to reach 2.000.000 pps going through
user land returning NF_ACCEPT verdicts without losses, instead of less
than 500.000 pps before patch.
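The dispatch idea can be sketched in user space as below. This is a simplified
model and not the nfnetlink code: struct msg, struct callback, dispatch() and
the counters standing in for nfnl_mutex and rcu_read_lock are all hypothetical.
It only shows that a handler registered through call_rcu runs without the
global lock, while a handler registered through call is still serialised.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical message and callback table, loosely modelled on nfnl_callback. */
struct msg {
	unsigned int type;
};

struct callback {
	int (*call)(const struct msg *m);     /* needs the subsystem mutex */
	int (*call_rcu)(const struct msg *m); /* safe under a read-side section */
};

/* Counters standing in for nfnl_mutex and rcu_read_lock() in this sketch. */
static int mutex_held;
static int rcu_readers;
static int mutex_acquisitions;

static int dispatch(const struct callback *cb, size_t n, const struct msg *m)
{
	const struct callback *c;
	int ret;

	if (m->type >= n)
		return -EINVAL;
	c = &cb[m->type];

	if (c->call_rcu) {
		/* Fast path: no global lock, so several queues can proceed in parallel. */
		rcu_readers++;
		ret = c->call_rcu(m);
		rcu_readers--;
		return ret;
	}
	if (c->call) {
		/* Slow path: serialised by the subsystem mutex. */
		mutex_held = 1;
		mutex_acquisitions++;
		ret = c->call(m);
		mutex_held = 0;
		return ret;
	}
	return -EINVAL;
}

/* Example handlers: each reports an error if run in the wrong context. */
static int recv_verdict(const struct msg *m)
{
	(void)m;
	return (rcu_readers > 0 && !mutex_held) ? 0 : -EPERM;
}

static int recv_config(const struct msg *m)
{
	(void)m;
	return mutex_held ? 0 : -EPERM;
}
```

In this model a verdict message never touches the mutex counter, which is the
property the patch relies on for throughput. Configuration messages keep using
the locked path.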
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
net/netfilter/nfnetlink_queue.c | 41 +++++++++++---------------------------
1 files changed, 12 insertions(+), 29 deletions(-)
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index b83123f..c645b87 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -619,39 +619,26 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
struct nfqnl_instance *queue;
unsigned int verdict;
struct nf_queue_entry *entry;
- int err;
- rcu_read_lock();
queue = instance_lookup(queue_num);
- if (!queue) {
- err = -ENODEV;
- goto err_out_unlock;
- }
+ if (!queue)
+ return -ENODEV;
- if (queue->peer_pid != NETLINK_CB(skb).pid) {
- err = -EPERM;
- goto err_out_unlock;
- }
+ if (queue->peer_pid != NETLINK_CB(skb).pid)
+ return -EPERM;
- if (!nfqa[NFQA_VERDICT_HDR]) {
- err = -EINVAL;
- goto err_out_unlock;
- }
+ if (!nfqa[NFQA_VERDICT_HDR])
+ return -EINVAL;
vhdr = nla_data(nfqa[NFQA_VERDICT_HDR]);
verdict = ntohl(vhdr->verdict);
- if ((verdict & NF_VERDICT_MASK) > NF_MAX_VERDICT) {
- err = -EINVAL;
- goto err_out_unlock;
- }
+ if ((verdict & NF_VERDICT_MASK) > NF_MAX_VERDICT)
+ return -EINVAL;
entry = find_dequeue_entry(queue, ntohl(vhdr->id));
- if (entry == NULL) {
- err = -ENOENT;
- goto err_out_unlock;
- }
- rcu_read_unlock();
+ if (entry == NULL)
+ return -ENOENT;
if (nfqa[NFQA_PAYLOAD]) {
if (nfqnl_mangle(nla_data(nfqa[NFQA_PAYLOAD]),
@@ -664,10 +651,6 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
nf_reinject(entry, verdict);
return 0;
-
-err_out_unlock:
- rcu_read_unlock();
- return err;
}
static int
@@ -780,9 +763,9 @@ err_out_unlock:
}
static const struct nfnl_callback nfqnl_cb[NFQNL_MSG_MAX] = {
- [NFQNL_MSG_PACKET] = { .call = nfqnl_recv_unsupp,
+ [NFQNL_MSG_PACKET] = { .call_rcu = nfqnl_recv_unsupp,
.attr_count = NFQA_MAX, },
- [NFQNL_MSG_VERDICT] = { .call = nfqnl_recv_verdict,
+ [NFQNL_MSG_VERDICT] = { .call_rcu = nfqnl_recv_verdict,
.attr_count = NFQA_MAX,
.policy = nfqa_verdict_policy },
[NFQNL_MSG_CONFIG] = { .call = nfqnl_recv_config,
--
1.7.2.3
^ permalink raw reply related
* [PATCH 1/8] netfilter: add SELinux context support to AUDIT target
From: kaber @ 2011-07-21 10:17 UTC (permalink / raw)
To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1311243476-18236-1-git-send-email-kaber@trash.net>
From: Mr Dash Four <mr.dash.four@googlemail.com>
In this revision the conversion of secid to SELinux context and adding it
to the audit log are moved from xt_AUDIT.c to audit.c with the aid of a
separate helper function - audit_log_secctx - which does both the conversion
and logging of SELinux context, thus also preventing the internal secid number
from being leaked to userspace. If conversion is not successful an error is raised.
With the introduction of this helper function the work done in xt_AUDIT.c is
much simpler. It also opens the possibility of this helper function
being used by other modules (including auditd itself), if desired. With this
addition, typical (raw auditd) output after applying the patch would be:
type=NETFILTER_PKT msg=audit(1305852240.082:31012): action=0 hook=1 len=52 inif=? outif=eth0 saddr=10.1.1.7 daddr=10.1.2.1 ipid=16312 proto=6 sport=56150 dport=22 obj=system_u:object_r:ssh_client_packet_t:s0
type=NETFILTER_PKT msg=audit(1306772064.079:56): action=0 hook=3 len=48 inif=eth0 outif=? smac=00:05:5d:7c:27:0b dmac=00:02:b3:0a:7f:81 macproto=0x0800 saddr=10.1.2.1 daddr=10.1.1.7 ipid=462 proto=6 sport=22 dport=3561 obj=system_u:object_r:ssh_server_packet_t:s0
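The shape of the helper can be illustrated with a user-space sketch. Nothing
here is the kernel API: secid_to_secctx() is a hypothetical stand-in for
security_secid_to_secctx(), the secid values 1 and 2 are invented for the
example, and log_secctx() appends to a plain character buffer instead of an
audit_buffer. It shows the two properties described above: only the converted
context string reaches the log, and a failed conversion is reported instead of
logging the raw number.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for security_secid_to_secctx(): 0 on success. */
static int secid_to_secctx(uint32_t secid, const char **secctx)
{
	/* Invented mapping, for illustration only. */
	static const char *const table[] = {
		NULL,
		"system_u:object_r:ssh_client_packet_t:s0",
		"system_u:object_r:ssh_server_packet_t:s0",
	};

	if (secid == 0 || secid >= sizeof(table) / sizeof(table[0]))
		return -1;
	*secctx = table[secid];
	return 0;
}

/*
 * Append " obj=<context>" to the NUL-terminated record in buf.
 * Returns 0 on success, -1 if the secid cannot be converted or the
 * result does not fit. The numeric secid itself is never written.
 */
static int log_secctx(char *buf, size_t size, uint32_t secid)
{
	const char *secctx;
	size_t used = strlen(buf);
	int n;

	if (secid_to_secctx(secid, &secctx))
		return -1;

	n = snprintf(buf + used, size - used, " obj=%s", secctx);
	return (n < 0 || (size_t)n >= size - used) ? -1 : 0;
}
```

A caller would build the rest of the record first and call log_secctx() last,
which matches where the obj= field appears in the sample output above.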
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Mr Dash Four <mr.dash.four@googlemail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
include/linux/audit.h | 7 +++++++
kernel/audit.c | 29 +++++++++++++++++++++++++++++
net/netfilter/xt_AUDIT.c | 5 +++++
3 files changed, 41 insertions(+), 0 deletions(-)
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 9d339eb..0c80061 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -613,6 +613,12 @@ extern void audit_log_d_path(struct audit_buffer *ab,
extern void audit_log_key(struct audit_buffer *ab,
char *key);
extern void audit_log_lost(const char *message);
+#ifdef CONFIG_SECURITY
+extern void audit_log_secctx(struct audit_buffer *ab, u32 secid);
+#else
+#define audit_log_secctx(b,s) do { ; } while (0)
+#endif
+
extern int audit_update_lsm_rules(void);
/* Private API (for audit.c only) */
@@ -635,6 +641,7 @@ extern int audit_enabled;
#define audit_log_untrustedstring(a,s) do { ; } while (0)
#define audit_log_d_path(b, p, d) do { ; } while (0)
#define audit_log_key(b, k) do { ; } while (0)
+#define audit_log_secctx(b,s) do { ; } while (0)
#define audit_enabled 0
#endif
#endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 9395003..52501b5 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -55,6 +55,9 @@
#include <net/sock.h>
#include <net/netlink.h>
#include <linux/skbuff.h>
+#ifdef CONFIG_SECURITY
+#include <linux/security.h>
+#endif
#include <linux/netlink.h>
#include <linux/freezer.h>
#include <linux/tty.h>
@@ -1502,6 +1505,32 @@ void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
}
}
+#ifdef CONFIG_SECURITY
+/**
+ * audit_log_secctx - Converts and logs SELinux context
+ * @ab: audit_buffer
+ * @secid: security number
+ *
+ * This is a helper function that calls security_secid_to_secctx to convert
+ * secid to secctx and then adds the (converted) SELinux context to the audit
+ * log by calling audit_log_format, thus also preventing leak of internal secid
+ * to userspace. If secid cannot be converted audit_panic is called.
+ */
+void audit_log_secctx(struct audit_buffer *ab, u32 secid)
+{
+ u32 len;
+ char *secctx;
+
+ if (security_secid_to_secctx(secid, &secctx, &len)) {
+ audit_panic("Cannot convert secid to context");
+ } else {
+ audit_log_format(ab, " obj=%s", secctx);
+ security_release_secctx(secctx, len);
+ }
+}
+EXPORT_SYMBOL(audit_log_secctx);
+#endif
+
EXPORT_SYMBOL(audit_log_start);
EXPORT_SYMBOL(audit_log_end);
EXPORT_SYMBOL(audit_log_format);
diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
index 363a99e..4bca15a 100644
--- a/net/netfilter/xt_AUDIT.c
+++ b/net/netfilter/xt_AUDIT.c
@@ -163,6 +163,11 @@ audit_tg(struct sk_buff *skb, const struct xt_action_param *par)
break;
}
+#ifdef CONFIG_NETWORK_SECMARK
+ if (skb->secmark)
+ audit_log_secctx(ab, skb->secmark);
+#endif
+
audit_log_end(ab);
errout:
--
1.7.2.3
^ permalink raw reply related