Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 00/13] net: Add and use ether_addr_equal
From: Joe Perches @ 2012-05-09  4:56 UTC (permalink / raw)
  To: David S. Miller, netdev, bridge, netfilter-devel, netfilter,
	coreteam, linux-wireless
  Cc: linux-bluetooth, linux-kernel

Add a boolean function to test 2 ethernet addresses for equality
Convert compare_ether_addr uses to ether_addr_equal

Joe Perches (13):
  etherdevice.h: Add ether_addr_equal
  802: Convert compare_ether_addr to ether_addr_equal
  8021q: Convert compare_ether_addr to ether_addr_equal
  bridge: netfilter: Convert compare_ether_addr to ether_addr_equal
  bridge: Convert compare_ether_addr to ether_addr_equal
  atm: Convert compare_ether_addr to ether_addr_equal
  bluetooth: Convert compare_ether_addr to ether_addr_equal
  mac80211: Convert compare_ether_addr to ether_addr_equal
  mac80211: Convert compare_ether_addr to ether_addr_equal by hand
  netfilter: Convert compare_ether_addr to ether_addr_equal
  wireless: Convert compare_ether_addr to ether_addr_equal
  wireless: Convert compare_ether_addr to ether_addr_equal by hand
  dsa: Convert compare_ether_addr to ether_addr_equal

 include/linux/etherdevice.h               |   12 +++++++++
 net/802/stp.c                             |    2 +-
 net/8021q/vlan.c                          |   10 +++---
 net/8021q/vlan_core.c                     |    3 +-
 net/8021q/vlan_dev.c                      |   10 +++---
 net/atm/lec.c                             |    6 ++--
 net/atm/mpc.c                             |    3 +-
 net/bluetooth/bnep/core.c                 |    6 ++--
 net/bridge/br_device.c                    |    2 +-
 net/bridge/br_fdb.c                       |   14 +++++-----
 net/bridge/br_input.c                     |    2 +-
 net/bridge/br_stp_bpdu.c                  |    2 +-
 net/bridge/br_stp_if.c                    |   11 +++----
 net/bridge/netfilter/ebt_stp.c            |    4 +-
 net/dsa/slave.c                           |   10 +++---
 net/mac80211/cfg.c                        |    2 +-
 net/mac80211/ibss.c                       |   12 ++++----
 net/mac80211/ieee80211_i.h                |    2 +-
 net/mac80211/iface.c                      |    2 +-
 net/mac80211/mesh.c                       |    4 +-
 net/mac80211/mesh_hwmp.c                  |   14 +++++-----
 net/mac80211/mesh_pathtbl.c               |   12 ++++----
 net/mac80211/mlme.c                       |   29 +++++++++------------
 net/mac80211/rx.c                         |   39 +++++++++++++---------------
 net/mac80211/scan.c                       |    2 +-
 net/mac80211/sta_info.c                   |    8 +++---
 net/mac80211/sta_info.h                   |    2 +-
 net/mac80211/status.c                     |    2 +-
 net/mac80211/tx.c                         |   11 +++----
 net/netfilter/ipset/ip_set_bitmap_ipmac.c |    4 +-
 net/netfilter/xt_mac.c                    |    2 +-
 net/wireless/ibss.c                       |    2 +-
 net/wireless/mlme.c                       |   31 +++++++++++------------
 net/wireless/scan.c                       |    2 +-
 net/wireless/util.c                       |   11 +++----
 net/wireless/wext-sme.c                   |    2 +-
 net/wireless/wext-spy.c                   |    2 +-
 37 files changed, 147 insertions(+), 147 deletions(-)

-- 
1.7.8.111.gad25c.dirty

^ permalink raw reply

* Re: [PATCH] r8169: fix problem with TSO (TX_BUFFS_AVAIL negative value)
From: Alex Villacís Lasso @ 2012-05-09  4:12 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Thomas Pilarski, Julien Ducourthial,
	Realtek linux nic maintainers, netdev, linux-kernel
In-Reply-To: <20120507234205.GA2230@electric-eye.fr.zoreil.com>

El 07/05/12 18:42, Francois Romieu escribió:
> Julien Ducourthial<jducourt@free.fr>  :
>> The r8169 may get stuck or show bad behaviour after activating TSO :
>> the net_device is not stopped when it has no more TX descriptors.
>> This problem comes from TX_BUFS_AVAIL which may reach -1 when all
>> transmit descriptors are in use. The patch simply tries to keep positive
>> values.
> It seems more than good.
>
> Alex, Thomas, can you check if Julien's patch below fixes your broken
> kernels as well ?
>
No luck. The backtrace still appears after using the patched driver.

^ permalink raw reply

* Re: [PATCH] net/bluetooth/bnep/core.c: use constant for ethertype
From: Gustavo Padovan @ 2012-05-09  4:09 UTC (permalink / raw)
  To: Eldad Zack
  Cc: Marcel Holtmann, Johan Hedberg, David S. Miller, linux-bluetooth,
	netdev, linux-kernel
In-Reply-To: <1336428575-28996-1-git-send-email-eldad@fogrefinery.com>

Hi Eldad,

* Eldad Zack <eldad@fogrefinery.com> [2012-05-08 00:09:35 +0200]:

> The dot1q ethertype number (0x8100) is embedded in the code, although
> it is already defined in included headers.
> 
> Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
> ---
>  net/bluetooth/bnep/core.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Patch has been applied to bluetooth-next. Thanks.

	Gustavo

^ permalink raw reply

* bizarre tcp rst out of nowhere...
From: Simon Chen @ 2012-05-09  2:04 UTC (permalink / raw)
  To: netdev

I am running a ubuntu 11.04 server, with kernel flavor 2.6.38-13-server.

The server is running nova-network, the openstack network controller.
Essentially, it is doing a static NATting function. When a packet
comes from a VM, it SNAT the VM's private src IP to a public floating
IP and send the packet out. When a packet is sent to the floating IP,
it DNAT the packet to the VM's private IP in reverse.

So, the server in question should be a strictly pass-through device,
while doing address translation. It doesn't have any additional
filtering rules.

I have a remote client trying to "curl https://floater_ip". The
bizarre thing is that this sometimes works, sometimes doesn't. When it
works, the packets go through my NAT server as normal. When it doesn't
work, it seems that my NAT server decides to send back a tcp-rst
immediately after the tcp-syn is received - it is supposed to pass it
on, not process it locally...

Any idea why this happens?

I've been watching the server closely, the floater IP is always on the
public nic, the iptables rules are always there, the interface is
always up, the backend VM is always reachable...

-Simon

^ permalink raw reply

* Re: [PATCH 1/1] r8169: fix unsigned int wraparound with TSO
From: David Miller @ 2012-05-08 23:35 UTC (permalink / raw)
  To: romieu; +Cc: avillaci, thomas.pi, jducourt, hayeswang, netdev
In-Reply-To: <20120508220006.GA22733@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 9 May 2012 00:00:06 +0200

> From: Julien Ducourthial <jducourt@free.fr>
> 
> The r8169 may get stuck or show bad behaviour after activating TSO :
> the net_device is not stopped when it has no more TX descriptors.
> This problem comes from TX_BUFS_AVAIL which may reach -1 when all
> transmit descriptors are in use. The patch simply tries to keep positive
> values.
> 
> Tested with 8111d(onboard) on a D510MO, and with 8111e(onboard) on a
> Zotac 890GXITX.
> 
> Signed-off-by: Julien Ducourthial <jducourt@free.fr>
> Acked-by: Francois Romieu <romieu@fr.zoreil.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [GIT net] Open vSwitch
From: David Miller @ 2012-05-08 23:32 UTC (permalink / raw)
  To: jesse-l0M0P4e3n4LQT0dZR+AlfA
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1336441885-11085-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Date: Mon,  7 May 2012 18:51:22 -0700

> A few patches for net/3.4.
> 
> The following changes since commit dd775ae2549217d3ae09363e3edb305d0fa19928:
> 
>   Linux 3.4-rc1 (2012-03-31 16:24:09 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git fixes

Pulled, thanks Jesse.

^ permalink raw reply

* [PATCH 1/1] r8169: fix unsigned int wraparound with TSO
From: Francois Romieu @ 2012-05-08 22:00 UTC (permalink / raw)
  To: David Miller
  Cc: avillaci, Thomas Pilarski, Julien Ducourthial, Hayes Wang, netdev

From: Julien Ducourthial <jducourt@free.fr>

The r8169 may get stuck or show bad behaviour after activating TSO :
the net_device is not stopped when it has no more TX descriptors.
This problem comes from TX_BUFS_AVAIL which may reach -1 when all
transmit descriptors are in use. The patch simply tries to keep positive
values.

Tested with 8111d(onboard) on a D510MO, and with 8111e(onboard) on a
Zotac 890GXITX.

Signed-off-by: Julien Ducourthial <jducourt@free.fr>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
---

 The patch applies to -stable as well.

 drivers/net/ethernet/realtek/r8169.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index f545093..d1e3c51 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -61,8 +61,12 @@
 #define R8169_MSG_DEFAULT \
 	(NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN)
 
-#define TX_BUFFS_AVAIL(tp) \
-	(tp->dirty_tx + NUM_TX_DESC - tp->cur_tx - 1)
+#define TX_SLOTS_AVAIL(tp) \
+	(tp->dirty_tx + NUM_TX_DESC - tp->cur_tx)
+
+/* A skbuff with nr_frags needs nr_frags+1 entries in the tx queue */
+#define TX_FRAGS_READY_FOR(tp,nr_frags) \
+	(TX_SLOTS_AVAIL(tp) >= (nr_frags + 1))
 
 /* Maximum number of multicast addresses to filter (vs. Rx-all-multicast).
    The RTL chips use a 64 element hash table based on the Ethernet CRC. */
@@ -5115,7 +5119,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	u32 opts[2];
 	int frags;
 
-	if (unlikely(TX_BUFFS_AVAIL(tp) < skb_shinfo(skb)->nr_frags)) {
+	if (unlikely(!TX_FRAGS_READY_FOR(tp, skb_shinfo(skb)->nr_frags))) {
 		netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
 		goto err_stop_0;
 	}
@@ -5169,7 +5173,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	mmiowb();
 
-	if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) {
+	if (!TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS)) {
 		/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
 		 * not miss a ring update when it notices a stopped queue.
 		 */
@@ -5183,7 +5187,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 		 * can't.
 		 */
 		smp_mb();
-		if (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)
+		if (TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS))
 			netif_wake_queue(dev);
 	}
 
@@ -5306,7 +5310,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 		 */
 		smp_mb();
 		if (netif_queue_stopped(dev) &&
-		    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
+		    TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS)) {
 			netif_wake_queue(dev);
 		}
 		/*
-- 
1.7.7.6

^ permalink raw reply related

* Re: SO_TIMESTAMP on tcp sockets?
From: Andy Lutomirski @ 2012-05-08 21:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development
In-Reply-To: <1336451831.3752.2373.camel@edumazet-glaptop>

On Mon, May 7, 2012 at 9:37 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2012-05-07 at 18:39 -0700, Andy Lutomirski wrote:
>> I've been using SO_TIMESTAMPNS to good effect on udp sockets.  I'd
>> like to do the same thing for tcp.  I realize that this is
>> semantically strange [1], but I don't think there's a real issue for
>> my use case.  We have very thin streams -- we are likely to process
>> each incoming segment as it is received, and I want the most precise
>> timestamp possible on each segment.
>>
>> A simple approach (I think) would be for a recvmsg on a tcp socket
>> with SO_TIMESTAMP(NS) to return at most one skb worth of data along
>> with the timestamp associated with that skb.  This could be a little
>> strange if multiple segments overlap or if lro is involved, but
>> neither of those cases seems like a major problem.
>>
>> Is there any interest in something like this?
>>
>
> LRO/GRO is not really a problem, buffers are merged because they are
> received in a very short time period. If you want nanosec timestamping
> on TCP, just cancel the whole idea.
>
> TCP can 'collapse' several buffers onto single ones (to reduce memory
> overhead). Which timestamp would be chosen at collapse time ?
>
> net-next also has tcp coalescing, wich also merge buffers as soon as
> they enter receive or ofo queue.

Hmm.  Here are two possibilities:

1. When timestamping is on, turn off all coalescing on that socket.
Throughput starts to suck, but at least for my use case this is
irrelevant.

2. Instead of timestamping when a given piece of data arrived,
timestamp when the socket last became readable in the POLLIN sense.
Return the answer as ancillary data on the first recvmsg after the
socket becomes readable.  This would be enough for my purposes.
(Basically, I want to be able to correlate my receives with pcap data,
at least in the common case, and I also want to be able to estimate
latency between the network interrupt and my app handling the data.
The phy timestamp would be even better, but that's not supported on my
hardware.)

>
> Another problem with SO_TIMESTAMPNS is it globally enables time stamping
> on all skbs on the host, adding some latencies. (ktime_get() can be
> slowed down when time keeping triggers and hold xtime seqlock)
>
>

This doesn't bother me too much -- I'm already paying that cost.  In
any case, it should be mostly fixable by taking the xtime lock for
write a lot less often than we do now.  Getting the time (via vdso,
which is probably much better optimized than ktime_get) takes about
15ns on my machine.

--Andy

^ permalink raw reply

* Re: [PATCH] net: orphan queued skbs if device tx can stall
From: Michael S. Tsirkin @ 2012-05-08 19:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu
In-Reply-To: <1334065929.5300.40.camel@edumazet-glaptop>

On Tue, Apr 10, 2012 at 03:52:09PM +0200, Eric Dumazet wrote:
> With following patch, no more qdisc on top of tun device,

I'm not sure killing qdisc is a right direction.
I think tun queue is currently oversized but I
also think we would benefit from a smart queue
management. As it is if tun user gets delayed,
stale packets accumulate in the queue and there's
no way to get rid of them without ploughing through
the backlog.

In a sense we don't really need the queue in tun at all,
the only reason we have it at all, is to make code simpler.
We could thinkably move skbs from qdisc directly to userspace.

-- 
MST

^ permalink raw reply

* Re: [PATCH] net: orphan queued skbs if device tx can stall
From: Michael S. Tsirkin @ 2012-05-08 19:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Jamal Hadi Salim,
	Stephen Hemminger, Jason Wang, Neil Horman, Jiri Pirko,
	Jeff Kirsher, Michał Mirosław, Ben Hutchings,
	Herbert Xu
In-Reply-To: <1334065929.5300.40.camel@edumazet-glaptop>

On Tue, Apr 10, 2012 at 03:52:09PM +0200, Eric Dumazet wrote:
> By the way, skb orphaning should already be done in skb_orphan_try(),
> not sure why its done again in tun_net_xmit().

Because we need to force it for skbs which have tx_flags set.

> Note we perform orphaning
> right before giving skb to device on premise it'll be sent (and freed)
> in a reasonable amount of time.

It is usually the case. I think this assumption holds for
tun normally.

^ permalink raw reply

* Re: [PATCH v4 2/2] macvtap: restore vlan header on user read
From: Michael S. Tsirkin @ 2012-05-08 19:28 UTC (permalink / raw)
  To: Basil Gor; +Cc: Eric W. Biederman, David S. Miller, netdev
In-Reply-To: <1336121724-31902-2-git-send-email-basil.gor@gmail.com>

On Fri, May 04, 2012 at 12:55:24PM +0400, Basil Gor wrote:
> Ethernet vlan header is not on the packet and kept in the skb->vlan_tci
> when it comes from lower dev. This patch inserts vlan header in user
> buffer during skb copy on user read.
> 
> Signed-off-by: Basil Gor <basil.gor@gmail.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

Eric, ack?

> ---
>  drivers/net/macvtap.c |   43 ++++++++++++++++++++++++++++++++++++++-----
>  1 files changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 0427c65..cb8fd50 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -1,5 +1,6 @@
>  #include <linux/etherdevice.h>
>  #include <linux/if_macvlan.h>
> +#include <linux/if_vlan.h>
>  #include <linux/interrupt.h>
>  #include <linux/nsproxy.h>
>  #include <linux/compat.h>
> @@ -759,6 +760,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
>  	struct macvlan_dev *vlan;
>  	int ret;
>  	int vnet_hdr_len = 0;
> +	int vlan_offset = 0;
> +	int copied;
>  
>  	if (q->flags & IFF_VNET_HDR) {
>  		struct virtio_net_hdr vnet_hdr;
> @@ -773,18 +776,48 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
>  		if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
>  			return -EFAULT;
>  	}
> +	copied = vnet_hdr_len;
> +
> +	if (!vlan_tx_tag_present(skb))
> +		len = min_t(int, skb->len, len);
> +	else {
> +		int copy;
> +		struct {
> +			__be16 h_vlan_proto;
> +			__be16 h_vlan_TCI;
> +		} veth;
> +		veth.h_vlan_proto = htons(ETH_P_8021Q);
> +		veth.h_vlan_TCI = htons(vlan_tx_tag_get(skb));
> +
> +		vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
> +		len = min_t(int, skb->len + VLAN_HLEN, len);
> +
> +		copy = min_t(int, vlan_offset, len);
> +		ret = skb_copy_datagram_const_iovec(skb, 0, iv, copied, copy);
> +		len -= copy;
> +		copied += copy;
> +		if (ret || !len)
> +			goto done;
> +
> +		copy = min_t(int, sizeof(veth), len);
> +		ret = memcpy_toiovecend(iv, (void *)&veth, copied, copy);
> +		len -= copy;
> +		copied += copy;
> +		if (ret || !len)
> +			goto done;
> +	}
>  
> -	len = min_t(int, skb->len, len);
> -
> -	ret = skb_copy_datagram_const_iovec(skb, 0, iv, vnet_hdr_len, len);
> +	ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, copied, len);
> +	copied += len;
>  
> +done:
>  	rcu_read_lock_bh();
>  	vlan = rcu_dereference_bh(q->vlan);
>  	if (vlan)
> -		macvlan_count_rx(vlan, len, ret == 0, 0);
> +		macvlan_count_rx(vlan, copied - vnet_hdr_len, ret == 0, 0);
>  	rcu_read_unlock_bh();
>  
> -	return ret ? ret : (len + vnet_hdr_len);
> +	return ret ? ret : copied;
>  }
>  
>  static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
> -- 
> 1.7.6.5

^ permalink raw reply

* Re: [PATCH] pch_gbe: Adding read memory barriers
From: Erwan Velu @ 2012-05-08 19:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, tshimizu818
In-Reply-To: <20120508.143951.1433269599287007774.davem@davemloft.net>

Le 08/05/2012 20:39, David Miller a écrit :
> You never need to ask questions like this. Your patch is queued up to 
> be reviewed in patchwork: 
> http://patchwork.ozlabs.org/project/netdev/list/ Therefore you only 
> make more work for maintainers and irritate them by asking this, and 
> therefore it will take even longer for them to get to your patch. 

It wasn't my aim to irritate anyone. I'm just brand new into committing 
something here and surely lack of a good understanding on the complete 
process.

I do understand you are very solicited and newbies like me can sometimes 
irritate by asking questions & doing thing not properly.

Anyway, thanks for your help & patience.

Erwan

^ permalink raw reply

* Re: [PATCH v4 1/2] vhost-net: fix handle_rx buffer size
From: Michael S. Tsirkin @ 2012-05-08 19:27 UTC (permalink / raw)
  To: Basil Gor; +Cc: Eric W. Biederman, David S. Miller, netdev
In-Reply-To: <1336121724-31902-1-git-send-email-basil.gor@gmail.com>

On Fri, May 04, 2012 at 12:55:23PM +0400, Basil Gor wrote:
> Take vlan header length into account, when vlan id is stored as
> vlan_tci. Otherwise tagged packets comming from macvtap will be
> truncated.
> 
> Signed-off-by: Basil Gor <basil.gor@gmail.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

This doesn't fix packet socket backends but that can be
handled separately later.

Eric, ack?

> ---
>  drivers/vhost/net.c |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 1f21d2a..5c17010 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -24,6 +24,7 @@
>  #include <linux/if_arp.h>
>  #include <linux/if_tun.h>
>  #include <linux/if_macvlan.h>
> +#include <linux/if_vlan.h>
>  
>  #include <net/sock.h>
>  
> @@ -283,8 +284,12 @@ static int peek_head_len(struct sock *sk)
>  
>  	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
>  	head = skb_peek(&sk->sk_receive_queue);
> -	if (likely(head))
> +	if (likely(head)) {
>  		len = head->len;
> +		if (vlan_tx_tag_present(head))
> +			len += VLAN_HLEN;
> +	}
> +
>  	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
>  	return len;
>  }
> -- 
> 1.7.6.5

^ permalink raw reply

* Re: [PATCH 00/25] [v3] netfilter updates for net-next (upcoming 3.5)
From: David Miller @ 2012-05-08 18:51 UTC (permalink / raw)
  To: pablo; +Cc: netdev, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: pablo@netfilter.org
Date: Tue,  8 May 2012 20:37:58 +0200

> git://1984.lsi.us.es/net-next master

Pulled, thanks.

^ permalink raw reply

* [PATCH 25/25] netfilter: remove ip_queue support
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.

This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.

Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 Documentation/ABI/removed/ip_queue      |    9 +
 include/linux/netfilter_ipv4/Kbuild     |    1 -
 include/linux/netfilter_ipv4/ip_queue.h |   72 ----
 include/linux/netlink.h                 |    2 +-
 net/ipv4/netfilter/Makefile             |    3 -
 net/ipv4/netfilter/ip_queue.c           |  639 ------------------------------
 net/ipv6/netfilter/Kconfig              |   22 --
 net/ipv6/netfilter/Makefile             |    1 -
 net/ipv6/netfilter/ip6_queue.c          |  641 -------------------------------
 security/selinux/nlmsgtab.c             |   13 -
 10 files changed, 10 insertions(+), 1393 deletions(-)
 create mode 100644 Documentation/ABI/removed/ip_queue
 delete mode 100644 include/linux/netfilter_ipv4/ip_queue.h
 delete mode 100644 net/ipv4/netfilter/ip_queue.c
 delete mode 100644 net/ipv6/netfilter/ip6_queue.c

diff --git a/Documentation/ABI/removed/ip_queue b/Documentation/ABI/removed/ip_queue
new file mode 100644
index 0000000..3243613
--- /dev/null
+++ b/Documentation/ABI/removed/ip_queue
@@ -0,0 +1,9 @@
+What:		ip_queue
+Date:		finally removed in kernel v3.5.0
+Contact:	Pablo Neira Ayuso <pablo@netfilter.org>
+Description:
+	ip_queue has been replaced by nfnetlink_queue which provides
+	more advanced queueing mechanism to user-space. The ip_queue
+	module was already announced to become obsolete years ago.
+
+Users:
diff --git a/include/linux/netfilter_ipv4/Kbuild b/include/linux/netfilter_ipv4/Kbuild
index 31f8bec..c61b8fb 100644
--- a/include/linux/netfilter_ipv4/Kbuild
+++ b/include/linux/netfilter_ipv4/Kbuild
@@ -1,4 +1,3 @@
-header-y += ip_queue.h
 header-y += ip_tables.h
 header-y += ipt_CLUSTERIP.h
 header-y += ipt_ECN.h
diff --git a/include/linux/netfilter_ipv4/ip_queue.h b/include/linux/netfilter_ipv4/ip_queue.h
deleted file mode 100644
index a03507f..0000000
--- a/include/linux/netfilter_ipv4/ip_queue.h
+++ /dev/null
@@ -1,72 +0,0 @@
-/*
- * This is a module which is used for queueing IPv4 packets and
- * communicating with userspace via netlink.
- *
- * (C) 2000 James Morris, this code is GPL.
- */
-#ifndef _IP_QUEUE_H
-#define _IP_QUEUE_H
-
-#ifdef __KERNEL__
-#ifdef DEBUG_IPQ
-#define QDEBUG(x...) printk(KERN_DEBUG ## x)
-#else
-#define QDEBUG(x...)
-#endif  /* DEBUG_IPQ */
-#else
-#include <net/if.h>
-#endif	/* ! __KERNEL__ */
-
-/* Messages sent from kernel */
-typedef struct ipq_packet_msg {
-	unsigned long packet_id;	/* ID of queued packet */
-	unsigned long mark;		/* Netfilter mark value */
-	long timestamp_sec;		/* Packet arrival time (seconds) */
-	long timestamp_usec;		/* Packet arrvial time (+useconds) */
-	unsigned int hook;		/* Netfilter hook we rode in on */
-	char indev_name[IFNAMSIZ];	/* Name of incoming interface */
-	char outdev_name[IFNAMSIZ];	/* Name of outgoing interface */
-	__be16 hw_protocol;		/* Hardware protocol (network order) */
-	unsigned short hw_type;		/* Hardware type */
-	unsigned char hw_addrlen;	/* Hardware address length */
-	unsigned char hw_addr[8];	/* Hardware address */
-	size_t data_len;		/* Length of packet data */
-	unsigned char payload[0];	/* Optional packet data */
-} ipq_packet_msg_t;
-
-/* Messages sent from userspace */
-typedef struct ipq_mode_msg {
-	unsigned char value;		/* Requested mode */
-	size_t range;			/* Optional range of packet requested */
-} ipq_mode_msg_t;
-
-typedef struct ipq_verdict_msg {
-	unsigned int value;		/* Verdict to hand to netfilter */
-	unsigned long id;		/* Packet ID for this verdict */
-	size_t data_len;		/* Length of replacement data */
-	unsigned char payload[0];	/* Optional replacement packet */
-} ipq_verdict_msg_t;
-
-typedef struct ipq_peer_msg {
-	union {
-		ipq_verdict_msg_t verdict;
-		ipq_mode_msg_t mode;
-	} msg;
-} ipq_peer_msg_t;
-
-/* Packet delivery modes */
-enum {
-	IPQ_COPY_NONE,		/* Initial mode, packets are dropped */
-	IPQ_COPY_META,		/* Copy metadata */
-	IPQ_COPY_PACKET		/* Copy metadata + packet (range) */
-};
-#define IPQ_COPY_MAX IPQ_COPY_PACKET
-
-/* Types of messages */
-#define IPQM_BASE	0x10	/* standard netlink messages below this */
-#define IPQM_MODE	(IPQM_BASE + 1)		/* Mode request from peer */
-#define IPQM_VERDICT	(IPQM_BASE + 2)		/* Verdict from peer */ 
-#define IPQM_PACKET	(IPQM_BASE + 3)		/* Packet from kernel */
-#define IPQM_MAX	(IPQM_BASE + 4)
-
-#endif /*_IP_QUEUE_H*/
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index a2092f5..0f628ff 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -7,7 +7,7 @@
 #define NETLINK_ROUTE		0	/* Routing/device hook				*/
 #define NETLINK_UNUSED		1	/* Unused number				*/
 #define NETLINK_USERSOCK	2	/* Reserved for user mode socket protocols 	*/
-#define NETLINK_FIREWALL	3	/* Firewalling hook				*/
+#define NETLINK_FIREWALL	3	/* Unused number, formerly ip_queue		*/
 #define NETLINK_SOCK_DIAG	4	/* socket monitoring				*/
 #define NETLINK_NFLOG		5	/* netfilter/iptables ULOG */
 #define NETLINK_XFRM		6	/* ipsec */
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 240b684..c20674d 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -66,6 +66,3 @@ obj-$(CONFIG_IP_NF_ARP_MANGLE) += arpt_mangle.o
 
 # just filtering instance of ARP tables for now
 obj-$(CONFIG_IP_NF_ARPFILTER) += arptable_filter.o
-
-obj-$(CONFIG_IP_NF_QUEUE) += ip_queue.o
-
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
deleted file mode 100644
index 09775a1..0000000
--- a/net/ipv4/netfilter/ip_queue.c
+++ /dev/null
@@ -1,639 +0,0 @@
-/*
- * This is a module which is used for queueing IPv4 packets and
- * communicating with userspace via netlink.
- *
- * (C) 2000-2002 James Morris <jmorris@intercode.com.au>
- * (C) 2003-2005 Netfilter Core Team <coreteam@netfilter.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/module.h>
-#include <linux/skbuff.h>
-#include <linux/init.h>
-#include <linux/ip.h>
-#include <linux/notifier.h>
-#include <linux/netdevice.h>
-#include <linux/netfilter.h>
-#include <linux/netfilter_ipv4/ip_queue.h>
-#include <linux/netfilter_ipv4/ip_tables.h>
-#include <linux/netlink.h>
-#include <linux/spinlock.h>
-#include <linux/sysctl.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/security.h>
-#include <linux/net.h>
-#include <linux/mutex.h>
-#include <linux/slab.h>
-#include <net/net_namespace.h>
-#include <net/sock.h>
-#include <net/route.h>
-#include <net/netfilter/nf_queue.h>
-#include <net/ip.h>
-
-#define IPQ_QMAX_DEFAULT 1024
-#define IPQ_PROC_FS_NAME "ip_queue"
-#define NET_IPQ_QMAX 2088
-#define NET_IPQ_QMAX_NAME "ip_queue_maxlen"
-
-typedef int (*ipq_cmpfn)(struct nf_queue_entry *, unsigned long);
-
-static unsigned char copy_mode __read_mostly = IPQ_COPY_NONE;
-static unsigned int queue_maxlen __read_mostly = IPQ_QMAX_DEFAULT;
-static DEFINE_SPINLOCK(queue_lock);
-static int peer_pid __read_mostly;
-static unsigned int copy_range __read_mostly;
-static unsigned int queue_total;
-static unsigned int queue_dropped = 0;
-static unsigned int queue_user_dropped = 0;
-static struct sock *ipqnl __read_mostly;
-static LIST_HEAD(queue_list);
-static DEFINE_MUTEX(ipqnl_mutex);
-
-static inline void
-__ipq_enqueue_entry(struct nf_queue_entry *entry)
-{
-       list_add_tail(&entry->list, &queue_list);
-       queue_total++;
-}
-
-static inline int
-__ipq_set_mode(unsigned char mode, unsigned int range)
-{
-	int status = 0;
-
-	switch(mode) {
-	case IPQ_COPY_NONE:
-	case IPQ_COPY_META:
-		copy_mode = mode;
-		copy_range = 0;
-		break;
-
-	case IPQ_COPY_PACKET:
-		if (range > 0xFFFF)
-			range = 0xFFFF;
-		copy_range = range;
-		copy_mode = mode;
-		break;
-
-	default:
-		status = -EINVAL;
-
-	}
-	return status;
-}
-
-static void __ipq_flush(ipq_cmpfn cmpfn, unsigned long data);
-
-static inline void
-__ipq_reset(void)
-{
-	peer_pid = 0;
-	net_disable_timestamp();
-	__ipq_set_mode(IPQ_COPY_NONE, 0);
-	__ipq_flush(NULL, 0);
-}
-
-static struct nf_queue_entry *
-ipq_find_dequeue_entry(unsigned long id)
-{
-	struct nf_queue_entry *entry = NULL, *i;
-
-	spin_lock_bh(&queue_lock);
-
-	list_for_each_entry(i, &queue_list, list) {
-		if ((unsigned long)i == id) {
-			entry = i;
-			break;
-		}
-	}
-
-	if (entry) {
-		list_del(&entry->list);
-		queue_total--;
-	}
-
-	spin_unlock_bh(&queue_lock);
-	return entry;
-}
-
-static void
-__ipq_flush(ipq_cmpfn cmpfn, unsigned long data)
-{
-	struct nf_queue_entry *entry, *next;
-
-	list_for_each_entry_safe(entry, next, &queue_list, list) {
-		if (!cmpfn || cmpfn(entry, data)) {
-			list_del(&entry->list);
-			queue_total--;
-			nf_reinject(entry, NF_DROP);
-		}
-	}
-}
-
-static void
-ipq_flush(ipq_cmpfn cmpfn, unsigned long data)
-{
-	spin_lock_bh(&queue_lock);
-	__ipq_flush(cmpfn, data);
-	spin_unlock_bh(&queue_lock);
-}
-
-static struct sk_buff *
-ipq_build_packet_message(struct nf_queue_entry *entry, int *errp)
-{
-	sk_buff_data_t old_tail;
-	size_t size = 0;
-	size_t data_len = 0;
-	struct sk_buff *skb;
-	struct ipq_packet_msg *pmsg;
-	struct nlmsghdr *nlh;
-	struct timeval tv;
-
-	switch (ACCESS_ONCE(copy_mode)) {
-	case IPQ_COPY_META:
-	case IPQ_COPY_NONE:
-		size = NLMSG_SPACE(sizeof(*pmsg));
-		break;
-
-	case IPQ_COPY_PACKET:
-		if (entry->skb->ip_summed == CHECKSUM_PARTIAL &&
-		    (*errp = skb_checksum_help(entry->skb)))
-			return NULL;
-
-		data_len = ACCESS_ONCE(copy_range);
-		if (data_len == 0 || data_len > entry->skb->len)
-			data_len = entry->skb->len;
-
-		size = NLMSG_SPACE(sizeof(*pmsg) + data_len);
-		break;
-
-	default:
-		*errp = -EINVAL;
-		return NULL;
-	}
-
-	skb = alloc_skb(size, GFP_ATOMIC);
-	if (!skb)
-		goto nlmsg_failure;
-
-	old_tail = skb->tail;
-	nlh = NLMSG_PUT(skb, 0, 0, IPQM_PACKET, size - sizeof(*nlh));
-	pmsg = NLMSG_DATA(nlh);
-	memset(pmsg, 0, sizeof(*pmsg));
-
-	pmsg->packet_id       = (unsigned long )entry;
-	pmsg->data_len        = data_len;
-	tv = ktime_to_timeval(entry->skb->tstamp);
-	pmsg->timestamp_sec   = tv.tv_sec;
-	pmsg->timestamp_usec  = tv.tv_usec;
-	pmsg->mark            = entry->skb->mark;
-	pmsg->hook            = entry->hook;
-	pmsg->hw_protocol     = entry->skb->protocol;
-
-	if (entry->indev)
-		strcpy(pmsg->indev_name, entry->indev->name);
-	else
-		pmsg->indev_name[0] = '\0';
-
-	if (entry->outdev)
-		strcpy(pmsg->outdev_name, entry->outdev->name);
-	else
-		pmsg->outdev_name[0] = '\0';
-
-	if (entry->indev && entry->skb->dev &&
-	    entry->skb->mac_header != entry->skb->network_header) {
-		pmsg->hw_type = entry->skb->dev->type;
-		pmsg->hw_addrlen = dev_parse_header(entry->skb,
-						    pmsg->hw_addr);
-	}
-
-	if (data_len)
-		if (skb_copy_bits(entry->skb, 0, pmsg->payload, data_len))
-			BUG();
-
-	nlh->nlmsg_len = skb->tail - old_tail;
-	return skb;
-
-nlmsg_failure:
-	kfree_skb(skb);
-	*errp = -EINVAL;
-	printk(KERN_ERR "ip_queue: error creating packet message\n");
-	return NULL;
-}
-
-static int
-ipq_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
-{
-	int status = -EINVAL;
-	struct sk_buff *nskb;
-
-	if (copy_mode == IPQ_COPY_NONE)
-		return -EAGAIN;
-
-	nskb = ipq_build_packet_message(entry, &status);
-	if (nskb == NULL)
-		return status;
-
-	spin_lock_bh(&queue_lock);
-
-	if (!peer_pid)
-		goto err_out_free_nskb;
-
-	if (queue_total >= queue_maxlen) {
-		queue_dropped++;
-		status = -ENOSPC;
-		if (net_ratelimit())
-			  printk (KERN_WARNING "ip_queue: full at %d entries, "
-				  "dropping packets(s). Dropped: %d\n", queue_total,
-				  queue_dropped);
-		goto err_out_free_nskb;
-	}
-
-	/* netlink_unicast will either free the nskb or attach it to a socket */
-	status = netlink_unicast(ipqnl, nskb, peer_pid, MSG_DONTWAIT);
-	if (status < 0) {
-		queue_user_dropped++;
-		goto err_out_unlock;
-	}
-
-	__ipq_enqueue_entry(entry);
-
-	spin_unlock_bh(&queue_lock);
-	return status;
-
-err_out_free_nskb:
-	kfree_skb(nskb);
-
-err_out_unlock:
-	spin_unlock_bh(&queue_lock);
-	return status;
-}
-
-static int
-ipq_mangle_ipv4(ipq_verdict_msg_t *v, struct nf_queue_entry *e)
-{
-	int diff;
-	struct iphdr *user_iph = (struct iphdr *)v->payload;
-	struct sk_buff *nskb;
-
-	if (v->data_len < sizeof(*user_iph))
-		return 0;
-	diff = v->data_len - e->skb->len;
-	if (diff < 0) {
-		if (pskb_trim(e->skb, v->data_len))
-			return -ENOMEM;
-	} else if (diff > 0) {
-		if (v->data_len > 0xFFFF)
-			return -EINVAL;
-		if (diff > skb_tailroom(e->skb)) {
-			nskb = skb_copy_expand(e->skb, skb_headroom(e->skb),
-					       diff, GFP_ATOMIC);
-			if (!nskb) {
-				printk(KERN_WARNING "ip_queue: error "
-				      "in mangle, dropping packet\n");
-				return -ENOMEM;
-			}
-			kfree_skb(e->skb);
-			e->skb = nskb;
-		}
-		skb_put(e->skb, diff);
-	}
-	if (!skb_make_writable(e->skb, v->data_len))
-		return -ENOMEM;
-	skb_copy_to_linear_data(e->skb, v->payload, v->data_len);
-	e->skb->ip_summed = CHECKSUM_NONE;
-
-	return 0;
-}
-
-static int
-ipq_set_verdict(struct ipq_verdict_msg *vmsg, unsigned int len)
-{
-	struct nf_queue_entry *entry;
-
-	if (vmsg->value > NF_MAX_VERDICT || vmsg->value == NF_STOLEN)
-		return -EINVAL;
-
-	entry = ipq_find_dequeue_entry(vmsg->id);
-	if (entry == NULL)
-		return -ENOENT;
-	else {
-		int verdict = vmsg->value;
-
-		if (vmsg->data_len && vmsg->data_len == len)
-			if (ipq_mangle_ipv4(vmsg, entry) < 0)
-				verdict = NF_DROP;
-
-		nf_reinject(entry, verdict);
-		return 0;
-	}
-}
-
-static int
-ipq_set_mode(unsigned char mode, unsigned int range)
-{
-	int status;
-
-	spin_lock_bh(&queue_lock);
-	status = __ipq_set_mode(mode, range);
-	spin_unlock_bh(&queue_lock);
-	return status;
-}
-
-static int
-ipq_receive_peer(struct ipq_peer_msg *pmsg,
-		 unsigned char type, unsigned int len)
-{
-	int status = 0;
-
-	if (len < sizeof(*pmsg))
-		return -EINVAL;
-
-	switch (type) {
-	case IPQM_MODE:
-		status = ipq_set_mode(pmsg->msg.mode.value,
-				      pmsg->msg.mode.range);
-		break;
-
-	case IPQM_VERDICT:
-		status = ipq_set_verdict(&pmsg->msg.verdict,
-					 len - sizeof(*pmsg));
-		break;
-	default:
-		status = -EINVAL;
-	}
-	return status;
-}
-
-static int
-dev_cmp(struct nf_queue_entry *entry, unsigned long ifindex)
-{
-	if (entry->indev)
-		if (entry->indev->ifindex == ifindex)
-			return 1;
-	if (entry->outdev)
-		if (entry->outdev->ifindex == ifindex)
-			return 1;
-#ifdef CONFIG_BRIDGE_NETFILTER
-	if (entry->skb->nf_bridge) {
-		if (entry->skb->nf_bridge->physindev &&
-		    entry->skb->nf_bridge->physindev->ifindex == ifindex)
-			return 1;
-		if (entry->skb->nf_bridge->physoutdev &&
-		    entry->skb->nf_bridge->physoutdev->ifindex == ifindex)
-			return 1;
-	}
-#endif
-	return 0;
-}
-
-static void
-ipq_dev_drop(int ifindex)
-{
-	ipq_flush(dev_cmp, ifindex);
-}
-
-#define RCV_SKB_FAIL(err) do { netlink_ack(skb, nlh, (err)); return; } while (0)
-
-static inline void
-__ipq_rcv_skb(struct sk_buff *skb)
-{
-	int status, type, pid, flags;
-	unsigned int nlmsglen, skblen;
-	struct nlmsghdr *nlh;
-	bool enable_timestamp = false;
-
-	skblen = skb->len;
-	if (skblen < sizeof(*nlh))
-		return;
-
-	nlh = nlmsg_hdr(skb);
-	nlmsglen = nlh->nlmsg_len;
-	if (nlmsglen < sizeof(*nlh) || skblen < nlmsglen)
-		return;
-
-	pid = nlh->nlmsg_pid;
-	flags = nlh->nlmsg_flags;
-
-	if(pid <= 0 || !(flags & NLM_F_REQUEST) || flags & NLM_F_MULTI)
-		RCV_SKB_FAIL(-EINVAL);
-
-	if (flags & MSG_TRUNC)
-		RCV_SKB_FAIL(-ECOMM);
-
-	type = nlh->nlmsg_type;
-	if (type < NLMSG_NOOP || type >= IPQM_MAX)
-		RCV_SKB_FAIL(-EINVAL);
-
-	if (type <= IPQM_BASE)
-		return;
-
-	if (!capable(CAP_NET_ADMIN))
-		RCV_SKB_FAIL(-EPERM);
-
-	spin_lock_bh(&queue_lock);
-
-	if (peer_pid) {
-		if (peer_pid != pid) {
-			spin_unlock_bh(&queue_lock);
-			RCV_SKB_FAIL(-EBUSY);
-		}
-	} else {
-		enable_timestamp = true;
-		peer_pid = pid;
-	}
-
-	spin_unlock_bh(&queue_lock);
-	if (enable_timestamp)
-		net_enable_timestamp();
-	status = ipq_receive_peer(NLMSG_DATA(nlh), type,
-				  nlmsglen - NLMSG_LENGTH(0));
-	if (status < 0)
-		RCV_SKB_FAIL(status);
-
-	if (flags & NLM_F_ACK)
-		netlink_ack(skb, nlh, 0);
-}
-
-static void
-ipq_rcv_skb(struct sk_buff *skb)
-{
-	mutex_lock(&ipqnl_mutex);
-	__ipq_rcv_skb(skb);
-	mutex_unlock(&ipqnl_mutex);
-}
-
-static int
-ipq_rcv_dev_event(struct notifier_block *this,
-		  unsigned long event, void *ptr)
-{
-	struct net_device *dev = ptr;
-
-	if (!net_eq(dev_net(dev), &init_net))
-		return NOTIFY_DONE;
-
-	/* Drop any packets associated with the downed device */
-	if (event == NETDEV_DOWN)
-		ipq_dev_drop(dev->ifindex);
-	return NOTIFY_DONE;
-}
-
-static struct notifier_block ipq_dev_notifier = {
-	.notifier_call	= ipq_rcv_dev_event,
-};
-
-static int
-ipq_rcv_nl_event(struct notifier_block *this,
-		 unsigned long event, void *ptr)
-{
-	struct netlink_notify *n = ptr;
-
-	if (event == NETLINK_URELEASE && n->protocol == NETLINK_FIREWALL) {
-		spin_lock_bh(&queue_lock);
-		if ((net_eq(n->net, &init_net)) && (n->pid == peer_pid))
-			__ipq_reset();
-		spin_unlock_bh(&queue_lock);
-	}
-	return NOTIFY_DONE;
-}
-
-static struct notifier_block ipq_nl_notifier = {
-	.notifier_call	= ipq_rcv_nl_event,
-};
-
-#ifdef CONFIG_SYSCTL
-static struct ctl_table_header *ipq_sysctl_header;
-
-static ctl_table ipq_table[] = {
-	{
-		.procname	= NET_IPQ_QMAX_NAME,
-		.data		= &queue_maxlen,
-		.maxlen		= sizeof(queue_maxlen),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec
-	},
-	{ }
-};
-#endif
-
-#ifdef CONFIG_PROC_FS
-static int ip_queue_show(struct seq_file *m, void *v)
-{
-	spin_lock_bh(&queue_lock);
-
-	seq_printf(m,
-		      "Peer PID          : %d\n"
-		      "Copy mode         : %hu\n"
-		      "Copy range        : %u\n"
-		      "Queue length      : %u\n"
-		      "Queue max. length : %u\n"
-		      "Queue dropped     : %u\n"
-		      "Netlink dropped   : %u\n",
-		      peer_pid,
-		      copy_mode,
-		      copy_range,
-		      queue_total,
-		      queue_maxlen,
-		      queue_dropped,
-		      queue_user_dropped);
-
-	spin_unlock_bh(&queue_lock);
-	return 0;
-}
-
-static int ip_queue_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, ip_queue_show, NULL);
-}
-
-static const struct file_operations ip_queue_proc_fops = {
-	.open		= ip_queue_open,
-	.read		= seq_read,
-	.llseek		= seq_lseek,
-	.release	= single_release,
-	.owner		= THIS_MODULE,
-};
-#endif
-
-static const struct nf_queue_handler nfqh = {
-	.name	= "ip_queue",
-	.outfn	= &ipq_enqueue_packet,
-};
-
-static int __init ip_queue_init(void)
-{
-	int status = -ENOMEM;
-	struct proc_dir_entry *proc __maybe_unused;
-
-	netlink_register_notifier(&ipq_nl_notifier);
-	ipqnl = netlink_kernel_create(&init_net, NETLINK_FIREWALL, 0,
-				      ipq_rcv_skb, NULL, THIS_MODULE);
-	if (ipqnl == NULL) {
-		printk(KERN_ERR "ip_queue: failed to create netlink socket\n");
-		goto cleanup_netlink_notifier;
-	}
-
-#ifdef CONFIG_PROC_FS
-	proc = proc_create(IPQ_PROC_FS_NAME, 0, init_net.proc_net,
-			   &ip_queue_proc_fops);
-	if (!proc) {
-		printk(KERN_ERR "ip_queue: failed to create proc entry\n");
-		goto cleanup_ipqnl;
-	}
-#endif
-	register_netdevice_notifier(&ipq_dev_notifier);
-#ifdef CONFIG_SYSCTL
-	ipq_sysctl_header = register_net_sysctl(&init_net, "net/ipv4", ipq_table);
-#endif
-	status = nf_register_queue_handler(NFPROTO_IPV4, &nfqh);
-	if (status < 0) {
-		printk(KERN_ERR "ip_queue: failed to register queue handler\n");
-		goto cleanup_sysctl;
-	}
-	return status;
-
-cleanup_sysctl:
-#ifdef CONFIG_SYSCTL
-	unregister_net_sysctl_table(ipq_sysctl_header);
-#endif
-	unregister_netdevice_notifier(&ipq_dev_notifier);
-	proc_net_remove(&init_net, IPQ_PROC_FS_NAME);
-cleanup_ipqnl: __maybe_unused
-	netlink_kernel_release(ipqnl);
-	mutex_lock(&ipqnl_mutex);
-	mutex_unlock(&ipqnl_mutex);
-
-cleanup_netlink_notifier:
-	netlink_unregister_notifier(&ipq_nl_notifier);
-	return status;
-}
-
-static void __exit ip_queue_fini(void)
-{
-	nf_unregister_queue_handlers(&nfqh);
-
-	ipq_flush(NULL, 0);
-
-#ifdef CONFIG_SYSCTL
-	unregister_net_sysctl_table(ipq_sysctl_header);
-#endif
-	unregister_netdevice_notifier(&ipq_dev_notifier);
-	proc_net_remove(&init_net, IPQ_PROC_FS_NAME);
-
-	netlink_kernel_release(ipqnl);
-	mutex_lock(&ipqnl_mutex);
-	mutex_unlock(&ipqnl_mutex);
-
-	netlink_unregister_notifier(&ipq_nl_notifier);
-}
-
-MODULE_DESCRIPTION("IPv4 packet queue handler");
-MODULE_AUTHOR("James Morris <jmorris@intercode.com.au>");
-MODULE_LICENSE("GPL");
-MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_FIREWALL);
-
-module_init(ip_queue_init);
-module_exit(ip_queue_fini);
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index d33cddd..1013534 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -25,28 +25,6 @@ config NF_CONNTRACK_IPV6
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
-config IP6_NF_QUEUE
-	tristate "IP6 Userspace queueing via NETLINK (OBSOLETE)"
-	depends on INET && IPV6 && NETFILTER
-	depends on NETFILTER_ADVANCED
-	---help---
-
-	  This option adds a queue handler to the kernel for IPv6
-	  packets which enables users to receive the filtered packets
-	  with QUEUE target using libipq.
-
-	  This option enables the old IPv6-only "ip6_queue" implementation
-	  which has been obsoleted by the new "nfnetlink_queue" code (see
-	  CONFIG_NETFILTER_NETLINK_QUEUE).
-
-	  (C) Fernando Anton 2001
-	  IPv64 Project - Work based in IPv64 draft by Arturo Azcorra.
-	  Universidad Carlos III de Madrid
-	  Universidad Politecnica de Alcala de Henares
-	  email: <fanton@it.uc3m.es>.
-
-	  To compile it as a module, choose M here.  If unsure, say N.
-
 config IP6_NF_IPTABLES
 	tristate "IP6 tables support (required for filtering)"
 	depends on INET && IPV6
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index d4dfd0a..534d3f2 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -6,7 +6,6 @@
 obj-$(CONFIG_IP6_NF_IPTABLES) += ip6_tables.o
 obj-$(CONFIG_IP6_NF_FILTER) += ip6table_filter.o
 obj-$(CONFIG_IP6_NF_MANGLE) += ip6table_mangle.o
-obj-$(CONFIG_IP6_NF_QUEUE) += ip6_queue.o
 obj-$(CONFIG_IP6_NF_RAW) += ip6table_raw.o
 obj-$(CONFIG_IP6_NF_SECURITY) += ip6table_security.o
 
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
deleted file mode 100644
index 3ca9303..0000000
--- a/net/ipv6/netfilter/ip6_queue.c
+++ /dev/null
@@ -1,641 +0,0 @@
-/*
- * This is a module which is used for queueing IPv6 packets and
- * communicating with userspace via netlink.
- *
- * (C) 2001 Fernando Anton, this code is GPL.
- *     IPv64 Project - Work based in IPv64 draft by Arturo Azcorra.
- *     Universidad Carlos III de Madrid - Leganes (Madrid) - Spain
- *     Universidad Politecnica de Alcala de Henares - Alcala de H. (Madrid) - Spain
- *     email: fanton@it.uc3m.es
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/module.h>
-#include <linux/skbuff.h>
-#include <linux/init.h>
-#include <linux/ipv6.h>
-#include <linux/notifier.h>
-#include <linux/netdevice.h>
-#include <linux/netfilter.h>
-#include <linux/netlink.h>
-#include <linux/spinlock.h>
-#include <linux/sysctl.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/mutex.h>
-#include <linux/slab.h>
-#include <net/net_namespace.h>
-#include <net/sock.h>
-#include <net/ipv6.h>
-#include <net/ip6_route.h>
-#include <net/netfilter/nf_queue.h>
-#include <linux/netfilter_ipv4/ip_queue.h>
-#include <linux/netfilter_ipv4/ip_tables.h>
-#include <linux/netfilter_ipv6/ip6_tables.h>
-
-#define IPQ_QMAX_DEFAULT 1024
-#define IPQ_PROC_FS_NAME "ip6_queue"
-#define NET_IPQ_QMAX_NAME "ip6_queue_maxlen"
-
-typedef int (*ipq_cmpfn)(struct nf_queue_entry *, unsigned long);
-
-static unsigned char copy_mode __read_mostly = IPQ_COPY_NONE;
-static unsigned int queue_maxlen __read_mostly = IPQ_QMAX_DEFAULT;
-static DEFINE_SPINLOCK(queue_lock);
-static int peer_pid __read_mostly;
-static unsigned int copy_range __read_mostly;
-static unsigned int queue_total;
-static unsigned int queue_dropped = 0;
-static unsigned int queue_user_dropped = 0;
-static struct sock *ipqnl __read_mostly;
-static LIST_HEAD(queue_list);
-static DEFINE_MUTEX(ipqnl_mutex);
-
-static inline void
-__ipq_enqueue_entry(struct nf_queue_entry *entry)
-{
-       list_add_tail(&entry->list, &queue_list);
-       queue_total++;
-}
-
-static inline int
-__ipq_set_mode(unsigned char mode, unsigned int range)
-{
-	int status = 0;
-
-	switch(mode) {
-	case IPQ_COPY_NONE:
-	case IPQ_COPY_META:
-		copy_mode = mode;
-		copy_range = 0;
-		break;
-
-	case IPQ_COPY_PACKET:
-		if (range > 0xFFFF)
-			range = 0xFFFF;
-		copy_range = range;
-		copy_mode = mode;
-		break;
-
-	default:
-		status = -EINVAL;
-
-	}
-	return status;
-}
-
-static void __ipq_flush(ipq_cmpfn cmpfn, unsigned long data);
-
-static inline void
-__ipq_reset(void)
-{
-	peer_pid = 0;
-	net_disable_timestamp();
-	__ipq_set_mode(IPQ_COPY_NONE, 0);
-	__ipq_flush(NULL, 0);
-}
-
-static struct nf_queue_entry *
-ipq_find_dequeue_entry(unsigned long id)
-{
-	struct nf_queue_entry *entry = NULL, *i;
-
-	spin_lock_bh(&queue_lock);
-
-	list_for_each_entry(i, &queue_list, list) {
-		if ((unsigned long)i == id) {
-			entry = i;
-			break;
-		}
-	}
-
-	if (entry) {
-		list_del(&entry->list);
-		queue_total--;
-	}
-
-	spin_unlock_bh(&queue_lock);
-	return entry;
-}
-
-static void
-__ipq_flush(ipq_cmpfn cmpfn, unsigned long data)
-{
-	struct nf_queue_entry *entry, *next;
-
-	list_for_each_entry_safe(entry, next, &queue_list, list) {
-		if (!cmpfn || cmpfn(entry, data)) {
-			list_del(&entry->list);
-			queue_total--;
-			nf_reinject(entry, NF_DROP);
-		}
-	}
-}
-
-static void
-ipq_flush(ipq_cmpfn cmpfn, unsigned long data)
-{
-	spin_lock_bh(&queue_lock);
-	__ipq_flush(cmpfn, data);
-	spin_unlock_bh(&queue_lock);
-}
-
-static struct sk_buff *
-ipq_build_packet_message(struct nf_queue_entry *entry, int *errp)
-{
-	sk_buff_data_t old_tail;
-	size_t size = 0;
-	size_t data_len = 0;
-	struct sk_buff *skb;
-	struct ipq_packet_msg *pmsg;
-	struct nlmsghdr *nlh;
-	struct timeval tv;
-
-	switch (ACCESS_ONCE(copy_mode)) {
-	case IPQ_COPY_META:
-	case IPQ_COPY_NONE:
-		size = NLMSG_SPACE(sizeof(*pmsg));
-		break;
-
-	case IPQ_COPY_PACKET:
-		if (entry->skb->ip_summed == CHECKSUM_PARTIAL &&
-		    (*errp = skb_checksum_help(entry->skb)))
-			return NULL;
-
-		data_len = ACCESS_ONCE(copy_range);
-		if (data_len == 0 || data_len > entry->skb->len)
-			data_len = entry->skb->len;
-
-		size = NLMSG_SPACE(sizeof(*pmsg) + data_len);
-		break;
-
-	default:
-		*errp = -EINVAL;
-		return NULL;
-	}
-
-	skb = alloc_skb(size, GFP_ATOMIC);
-	if (!skb)
-		goto nlmsg_failure;
-
-	old_tail = skb->tail;
-	nlh = NLMSG_PUT(skb, 0, 0, IPQM_PACKET, size - sizeof(*nlh));
-	pmsg = NLMSG_DATA(nlh);
-	memset(pmsg, 0, sizeof(*pmsg));
-
-	pmsg->packet_id       = (unsigned long )entry;
-	pmsg->data_len        = data_len;
-	tv = ktime_to_timeval(entry->skb->tstamp);
-	pmsg->timestamp_sec   = tv.tv_sec;
-	pmsg->timestamp_usec  = tv.tv_usec;
-	pmsg->mark            = entry->skb->mark;
-	pmsg->hook            = entry->hook;
-	pmsg->hw_protocol     = entry->skb->protocol;
-
-	if (entry->indev)
-		strcpy(pmsg->indev_name, entry->indev->name);
-	else
-		pmsg->indev_name[0] = '\0';
-
-	if (entry->outdev)
-		strcpy(pmsg->outdev_name, entry->outdev->name);
-	else
-		pmsg->outdev_name[0] = '\0';
-
-	if (entry->indev && entry->skb->dev &&
-	    entry->skb->mac_header != entry->skb->network_header) {
-		pmsg->hw_type = entry->skb->dev->type;
-		pmsg->hw_addrlen = dev_parse_header(entry->skb, pmsg->hw_addr);
-	}
-
-	if (data_len)
-		if (skb_copy_bits(entry->skb, 0, pmsg->payload, data_len))
-			BUG();
-
-	nlh->nlmsg_len = skb->tail - old_tail;
-	return skb;
-
-nlmsg_failure:
-	kfree_skb(skb);
-	*errp = -EINVAL;
-	printk(KERN_ERR "ip6_queue: error creating packet message\n");
-	return NULL;
-}
-
-static int
-ipq_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
-{
-	int status = -EINVAL;
-	struct sk_buff *nskb;
-
-	if (copy_mode == IPQ_COPY_NONE)
-		return -EAGAIN;
-
-	nskb = ipq_build_packet_message(entry, &status);
-	if (nskb == NULL)
-		return status;
-
-	spin_lock_bh(&queue_lock);
-
-	if (!peer_pid)
-		goto err_out_free_nskb;
-
-	if (queue_total >= queue_maxlen) {
-		queue_dropped++;
-		status = -ENOSPC;
-		if (net_ratelimit())
-			printk (KERN_WARNING "ip6_queue: fill at %d entries, "
-				"dropping packet(s).  Dropped: %d\n", queue_total,
-				queue_dropped);
-		goto err_out_free_nskb;
-	}
-
-	/* netlink_unicast will either free the nskb or attach it to a socket */
-	status = netlink_unicast(ipqnl, nskb, peer_pid, MSG_DONTWAIT);
-	if (status < 0) {
-		queue_user_dropped++;
-		goto err_out_unlock;
-	}
-
-	__ipq_enqueue_entry(entry);
-
-	spin_unlock_bh(&queue_lock);
-	return status;
-
-err_out_free_nskb:
-	kfree_skb(nskb);
-
-err_out_unlock:
-	spin_unlock_bh(&queue_lock);
-	return status;
-}
-
-static int
-ipq_mangle_ipv6(ipq_verdict_msg_t *v, struct nf_queue_entry *e)
-{
-	int diff;
-	struct ipv6hdr *user_iph = (struct ipv6hdr *)v->payload;
-	struct sk_buff *nskb;
-
-	if (v->data_len < sizeof(*user_iph))
-		return 0;
-	diff = v->data_len - e->skb->len;
-	if (diff < 0) {
-		if (pskb_trim(e->skb, v->data_len))
-			return -ENOMEM;
-	} else if (diff > 0) {
-		if (v->data_len > 0xFFFF)
-			return -EINVAL;
-		if (diff > skb_tailroom(e->skb)) {
-			nskb = skb_copy_expand(e->skb, skb_headroom(e->skb),
-					       diff, GFP_ATOMIC);
-			if (!nskb) {
-				printk(KERN_WARNING "ip6_queue: OOM "
-				      "in mangle, dropping packet\n");
-				return -ENOMEM;
-			}
-			kfree_skb(e->skb);
-			e->skb = nskb;
-		}
-		skb_put(e->skb, diff);
-	}
-	if (!skb_make_writable(e->skb, v->data_len))
-		return -ENOMEM;
-	skb_copy_to_linear_data(e->skb, v->payload, v->data_len);
-	e->skb->ip_summed = CHECKSUM_NONE;
-
-	return 0;
-}
-
-static int
-ipq_set_verdict(struct ipq_verdict_msg *vmsg, unsigned int len)
-{
-	struct nf_queue_entry *entry;
-
-	if (vmsg->value > NF_MAX_VERDICT || vmsg->value == NF_STOLEN)
-		return -EINVAL;
-
-	entry = ipq_find_dequeue_entry(vmsg->id);
-	if (entry == NULL)
-		return -ENOENT;
-	else {
-		int verdict = vmsg->value;
-
-		if (vmsg->data_len && vmsg->data_len == len)
-			if (ipq_mangle_ipv6(vmsg, entry) < 0)
-				verdict = NF_DROP;
-
-		nf_reinject(entry, verdict);
-		return 0;
-	}
-}
-
-static int
-ipq_set_mode(unsigned char mode, unsigned int range)
-{
-	int status;
-
-	spin_lock_bh(&queue_lock);
-	status = __ipq_set_mode(mode, range);
-	spin_unlock_bh(&queue_lock);
-	return status;
-}
-
-static int
-ipq_receive_peer(struct ipq_peer_msg *pmsg,
-		 unsigned char type, unsigned int len)
-{
-	int status = 0;
-
-	if (len < sizeof(*pmsg))
-		return -EINVAL;
-
-	switch (type) {
-	case IPQM_MODE:
-		status = ipq_set_mode(pmsg->msg.mode.value,
-				      pmsg->msg.mode.range);
-		break;
-
-	case IPQM_VERDICT:
-		status = ipq_set_verdict(&pmsg->msg.verdict,
-					 len - sizeof(*pmsg));
-		break;
-	default:
-		status = -EINVAL;
-	}
-	return status;
-}
-
-static int
-dev_cmp(struct nf_queue_entry *entry, unsigned long ifindex)
-{
-	if (entry->indev)
-		if (entry->indev->ifindex == ifindex)
-			return 1;
-
-	if (entry->outdev)
-		if (entry->outdev->ifindex == ifindex)
-			return 1;
-#ifdef CONFIG_BRIDGE_NETFILTER
-	if (entry->skb->nf_bridge) {
-		if (entry->skb->nf_bridge->physindev &&
-		    entry->skb->nf_bridge->physindev->ifindex == ifindex)
-			return 1;
-		if (entry->skb->nf_bridge->physoutdev &&
-		    entry->skb->nf_bridge->physoutdev->ifindex == ifindex)
-			return 1;
-	}
-#endif
-	return 0;
-}
-
-static void
-ipq_dev_drop(int ifindex)
-{
-	ipq_flush(dev_cmp, ifindex);
-}
-
-#define RCV_SKB_FAIL(err) do { netlink_ack(skb, nlh, (err)); return; } while (0)
-
-static inline void
-__ipq_rcv_skb(struct sk_buff *skb)
-{
-	int status, type, pid, flags;
-	unsigned int nlmsglen, skblen;
-	struct nlmsghdr *nlh;
-	bool enable_timestamp = false;
-
-	skblen = skb->len;
-	if (skblen < sizeof(*nlh))
-		return;
-
-	nlh = nlmsg_hdr(skb);
-	nlmsglen = nlh->nlmsg_len;
-	if (nlmsglen < sizeof(*nlh) || skblen < nlmsglen)
-		return;
-
-	pid = nlh->nlmsg_pid;
-	flags = nlh->nlmsg_flags;
-
-	if(pid <= 0 || !(flags & NLM_F_REQUEST) || flags & NLM_F_MULTI)
-		RCV_SKB_FAIL(-EINVAL);
-
-	if (flags & MSG_TRUNC)
-		RCV_SKB_FAIL(-ECOMM);
-
-	type = nlh->nlmsg_type;
-	if (type < NLMSG_NOOP || type >= IPQM_MAX)
-		RCV_SKB_FAIL(-EINVAL);
-
-	if (type <= IPQM_BASE)
-		return;
-
-	if (!capable(CAP_NET_ADMIN))
-		RCV_SKB_FAIL(-EPERM);
-
-	spin_lock_bh(&queue_lock);
-
-	if (peer_pid) {
-		if (peer_pid != pid) {
-			spin_unlock_bh(&queue_lock);
-			RCV_SKB_FAIL(-EBUSY);
-		}
-	} else {
-		enable_timestamp = true;
-		peer_pid = pid;
-	}
-
-	spin_unlock_bh(&queue_lock);
-	if (enable_timestamp)
-		net_enable_timestamp();
-
-	status = ipq_receive_peer(NLMSG_DATA(nlh), type,
-				  nlmsglen - NLMSG_LENGTH(0));
-	if (status < 0)
-		RCV_SKB_FAIL(status);
-
-	if (flags & NLM_F_ACK)
-		netlink_ack(skb, nlh, 0);
-}
-
-static void
-ipq_rcv_skb(struct sk_buff *skb)
-{
-	mutex_lock(&ipqnl_mutex);
-	__ipq_rcv_skb(skb);
-	mutex_unlock(&ipqnl_mutex);
-}
-
-static int
-ipq_rcv_dev_event(struct notifier_block *this,
-		  unsigned long event, void *ptr)
-{
-	struct net_device *dev = ptr;
-
-	if (!net_eq(dev_net(dev), &init_net))
-		return NOTIFY_DONE;
-
-	/* Drop any packets associated with the downed device */
-	if (event == NETDEV_DOWN)
-		ipq_dev_drop(dev->ifindex);
-	return NOTIFY_DONE;
-}
-
-static struct notifier_block ipq_dev_notifier = {
-	.notifier_call	= ipq_rcv_dev_event,
-};
-
-static int
-ipq_rcv_nl_event(struct notifier_block *this,
-		 unsigned long event, void *ptr)
-{
-	struct netlink_notify *n = ptr;
-
-	if (event == NETLINK_URELEASE && n->protocol == NETLINK_IP6_FW) {
-		spin_lock_bh(&queue_lock);
-		if ((net_eq(n->net, &init_net)) && (n->pid == peer_pid))
-			__ipq_reset();
-		spin_unlock_bh(&queue_lock);
-	}
-	return NOTIFY_DONE;
-}
-
-static struct notifier_block ipq_nl_notifier = {
-	.notifier_call	= ipq_rcv_nl_event,
-};
-
-#ifdef CONFIG_SYSCTL
-static struct ctl_table_header *ipq_sysctl_header;
-
-static ctl_table ipq_table[] = {
-	{
-		.procname	= NET_IPQ_QMAX_NAME,
-		.data		= &queue_maxlen,
-		.maxlen		= sizeof(queue_maxlen),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec
-	},
-	{ }
-};
-#endif
-
-#ifdef CONFIG_PROC_FS
-static int ip6_queue_show(struct seq_file *m, void *v)
-{
-	spin_lock_bh(&queue_lock);
-
-	seq_printf(m,
-		      "Peer PID          : %d\n"
-		      "Copy mode         : %hu\n"
-		      "Copy range        : %u\n"
-		      "Queue length      : %u\n"
-		      "Queue max. length : %u\n"
-		      "Queue dropped     : %u\n"
-		      "Netfilter dropped : %u\n",
-		      peer_pid,
-		      copy_mode,
-		      copy_range,
-		      queue_total,
-		      queue_maxlen,
-		      queue_dropped,
-		      queue_user_dropped);
-
-	spin_unlock_bh(&queue_lock);
-	return 0;
-}
-
-static int ip6_queue_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, ip6_queue_show, NULL);
-}
-
-static const struct file_operations ip6_queue_proc_fops = {
-	.open		= ip6_queue_open,
-	.read		= seq_read,
-	.llseek		= seq_lseek,
-	.release	= single_release,
-	.owner		= THIS_MODULE,
-};
-#endif
-
-static const struct nf_queue_handler nfqh = {
-	.name	= "ip6_queue",
-	.outfn	= &ipq_enqueue_packet,
-};
-
-static int __init ip6_queue_init(void)
-{
-	int status = -ENOMEM;
-	struct proc_dir_entry *proc __maybe_unused;
-
-	netlink_register_notifier(&ipq_nl_notifier);
-	ipqnl = netlink_kernel_create(&init_net, NETLINK_IP6_FW, 0,
-			              ipq_rcv_skb, NULL, THIS_MODULE);
-	if (ipqnl == NULL) {
-		printk(KERN_ERR "ip6_queue: failed to create netlink socket\n");
-		goto cleanup_netlink_notifier;
-	}
-
-#ifdef CONFIG_PROC_FS
-	proc = proc_create(IPQ_PROC_FS_NAME, 0, init_net.proc_net,
-			   &ip6_queue_proc_fops);
-	if (!proc) {
-		printk(KERN_ERR "ip6_queue: failed to create proc entry\n");
-		goto cleanup_ipqnl;
-	}
-#endif
-	register_netdevice_notifier(&ipq_dev_notifier);
-#ifdef CONFIG_SYSCTL
-	ipq_sysctl_header = register_net_sysctl(&init_net, "net/ipv6", ipq_table);
-#endif
-	status = nf_register_queue_handler(NFPROTO_IPV6, &nfqh);
-	if (status < 0) {
-		printk(KERN_ERR "ip6_queue: failed to register queue handler\n");
-		goto cleanup_sysctl;
-	}
-	return status;
-
-cleanup_sysctl:
-#ifdef CONFIG_SYSCTL
-	unregister_net_sysctl_table(ipq_sysctl_header);
-#endif
-	unregister_netdevice_notifier(&ipq_dev_notifier);
-	proc_net_remove(&init_net, IPQ_PROC_FS_NAME);
-
-cleanup_ipqnl: __maybe_unused
-	netlink_kernel_release(ipqnl);
-	mutex_lock(&ipqnl_mutex);
-	mutex_unlock(&ipqnl_mutex);
-
-cleanup_netlink_notifier:
-	netlink_unregister_notifier(&ipq_nl_notifier);
-	return status;
-}
-
-static void __exit ip6_queue_fini(void)
-{
-	nf_unregister_queue_handlers(&nfqh);
-
-	ipq_flush(NULL, 0);
-
-#ifdef CONFIG_SYSCTL
-	unregister_net_sysctl_table(ipq_sysctl_header);
-#endif
-	unregister_netdevice_notifier(&ipq_dev_notifier);
-	proc_net_remove(&init_net, IPQ_PROC_FS_NAME);
-
-	netlink_kernel_release(ipqnl);
-	mutex_lock(&ipqnl_mutex);
-	mutex_unlock(&ipqnl_mutex);
-
-	netlink_unregister_notifier(&ipq_nl_notifier);
-}
-
-MODULE_DESCRIPTION("IPv6 packet queue handler");
-MODULE_LICENSE("GPL");
-MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_IP6_FW);
-
-module_init(ip6_queue_init);
-module_exit(ip6_queue_fini);
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 0920ea3..d309e7f 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -14,7 +14,6 @@
 #include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 #include <linux/if.h>
-#include <linux/netfilter_ipv4/ip_queue.h>
 #include <linux/inet_diag.h>
 #include <linux/xfrm.h>
 #include <linux/audit.h>
@@ -70,12 +69,6 @@ static struct nlmsg_perm nlmsg_route_perms[] =
 	{ RTM_SETDCB,		NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 };
 
-static struct nlmsg_perm nlmsg_firewall_perms[] =
-{
-	{ IPQM_MODE,		NETLINK_FIREWALL_SOCKET__NLMSG_WRITE },
-	{ IPQM_VERDICT,		NETLINK_FIREWALL_SOCKET__NLMSG_WRITE },
-};
-
 static struct nlmsg_perm nlmsg_tcpdiag_perms[] =
 {
 	{ TCPDIAG_GETSOCK,	NETLINK_TCPDIAG_SOCKET__NLMSG_READ },
@@ -145,12 +138,6 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 				 sizeof(nlmsg_route_perms));
 		break;
 
-	case SECCLASS_NETLINK_FIREWALL_SOCKET:
-	case SECCLASS_NETLINK_IP6FW_SOCKET:
-		err = nlmsg_perm(nlmsg_type, perm, nlmsg_firewall_perms,
-				 sizeof(nlmsg_firewall_perms));
-		break;
-
 	case SECCLASS_NETLINK_TCPDIAG_SOCKET:
 		err = nlmsg_perm(nlmsg_type, perm, nlmsg_tcpdiag_perms,
 				 sizeof(nlmsg_tcpdiag_perms));
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 22/25] net: export sysctl_[r|w]mem_max symbols needed by ip_vs_sync
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

To build ip_vs as a module sysctl_rmem_max and sysctl_wmem_max
needs to be exported.

The dependency was added by "ipvs: wakeup master thread" patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/core/sock.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index b8c818e..26ed27f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -259,7 +259,9 @@ static struct lock_class_key af_callback_keys[AF_MAX];
 
 /* Run time adjustable parameters. */
 __u32 sysctl_wmem_max __read_mostly = SK_WMEM_MAX;
+EXPORT_SYMBOL(sysctl_wmem_max);
 __u32 sysctl_rmem_max __read_mostly = SK_RMEM_MAX;
+EXPORT_SYMBOL(sysctl_rmem_max);
 __u32 sysctl_wmem_default __read_mostly = SK_WMEM_MAX;
 __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
 
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 04/25] netfilter: bridge: optionally set indev to vlan
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

if net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge
netfilter removes the vlan header temporarily and then feeds the packet
to ip(6)tables.

When the new "bridge-nf-pass-vlan-input-device" sysctl is on
(default off), then bridge netfilter will also set the
in-interface to the vlan interface; if such an interface exists.

This is needed to make iptables REDIRECT target work with
"vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to
match the vlan device name.

Also update Documentation with current brnf default settings.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 Documentation/networking/ip-sysctl.txt |   13 +++++++++++--
 net/bridge/br_netfilter.c              |   26 ++++++++++++++++++++++++--
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 90b0c4f..6f896b9 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1301,13 +1301,22 @@ bridge-nf-call-ip6tables - BOOLEAN
 bridge-nf-filter-vlan-tagged - BOOLEAN
 	1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables.
 	0 : disable this.
-	Default: 1
+	Default: 0
 
 bridge-nf-filter-pppoe-tagged - BOOLEAN
 	1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables.
 	0 : disable this.
-	Default: 1
+	Default: 0
 
+bridge-nf-pass-vlan-input-dev - BOOLEAN
+	1: if bridge-nf-filter-vlan-tagged is enabled, try to find a vlan
+	interface on the bridge and set the netfilter input device to the vlan.
+	This allows use of e.g. "iptables -i br0.1" and makes the REDIRECT
+	target work with vlan-on-top-of-bridge interfaces.  When no matching
+	vlan interface is found, or this switch is off, the input device is
+	set to the bridge interface.
+	0: disable bridge netfilter vlan interface lookup.
+	Default: 0
 
 proc/sys/net/sctp/* Variables:
 
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 53f0836..dce55d4 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -54,12 +54,14 @@ static int brnf_call_ip6tables __read_mostly = 1;
 static int brnf_call_arptables __read_mostly = 1;
 static int brnf_filter_vlan_tagged __read_mostly = 0;
 static int brnf_filter_pppoe_tagged __read_mostly = 0;
+static int brnf_pass_vlan_indev __read_mostly = 0;
 #else
 #define brnf_call_iptables 1
 #define brnf_call_ip6tables 1
 #define brnf_call_arptables 1
 #define brnf_filter_vlan_tagged 0
 #define brnf_filter_pppoe_tagged 0
+#define brnf_pass_vlan_indev 0
 #endif
 
 #define IS_IP(skb) \
@@ -503,6 +505,19 @@ bridged_dnat:
 	return 0;
 }
 
+static struct net_device *brnf_get_logical_dev(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct net_device *vlan, *br;
+
+	br = bridge_parent(dev);
+	if (brnf_pass_vlan_indev == 0 || !vlan_tx_tag_present(skb))
+		return br;
+
+	vlan = __vlan_find_dev_deep(br, vlan_tx_tag_get(skb) & VLAN_VID_MASK);
+
+	return vlan ? vlan : br;
+}
+
 /* Some common code for IPv4/IPv6 */
 static struct net_device *setup_pre_routing(struct sk_buff *skb)
 {
@@ -515,7 +530,7 @@ static struct net_device *setup_pre_routing(struct sk_buff *skb)
 
 	nf_bridge->mask |= BRNF_NF_BRIDGE_PREROUTING;
 	nf_bridge->physindev = skb->dev;
-	skb->dev = bridge_parent(skb->dev);
+	skb->dev = brnf_get_logical_dev(skb, skb->dev);
 	if (skb->protocol == htons(ETH_P_8021Q))
 		nf_bridge->mask |= BRNF_8021Q;
 	else if (skb->protocol == htons(ETH_P_PPP_SES))
@@ -774,7 +789,7 @@ static unsigned int br_nf_forward_ip(unsigned int hook, struct sk_buff *skb,
 	else
 		skb->protocol = htons(ETH_P_IPV6);
 
-	NF_HOOK(pf, NF_INET_FORWARD, skb, bridge_parent(in), parent,
+	NF_HOOK(pf, NF_INET_FORWARD, skb, brnf_get_logical_dev(skb, in), parent,
 		br_nf_forward_finish);
 
 	return NF_STOLEN;
@@ -1002,6 +1017,13 @@ static ctl_table brnf_table[] = {
 		.mode		= 0644,
 		.proc_handler	= brnf_sysctl_call_tables,
 	},
+	{
+		.procname	= "bridge-nf-pass-vlan-input-dev",
+		.data		= &brnf_pass_vlan_indev,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= brnf_sysctl_call_tables,
+	},
 	{ }
 };
 #endif
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 05/25] ipvs: timeout tables do not need GFP_ATOMIC allocation
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: Julian Anastasov <ja@ssi.bg>

	They are called only on initialization.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_proto.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index fdc82ad..ca16476 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -199,7 +199,7 @@ void ip_vs_protocol_timeout_change(struct netns_ipvs *ipvs, int flags)
 int *
 ip_vs_create_timeout_table(int *table, int size)
 {
-	return kmemdup(table, size, GFP_ATOMIC);
+	return kmemdup(table, size, GFP_KERNEL);
 }
 
 
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 20/25] ipvs: ip_vs_ftp: local functions should not be exposed globally
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: H Hartley Sweeten <hartleys@visionengravers.com>

Functions not referenced outside of a source file should be marked
static to prevent it from being exposed globally.

This quiets the sparse warnings:

warning: symbol 'ip_vs_ftp_init' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_ftp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index 510f2b5..b20b29c 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -485,7 +485,7 @@ static struct pernet_operations ip_vs_ftp_ops = {
 	.exit = __ip_vs_ftp_exit,
 };
 
-int __init ip_vs_ftp_init(void)
+static int __init ip_vs_ftp_init(void)
 {
 	int rv;
 
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 21/25] ipvs: ip_vs_proto: local functions should not be exposed globally
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: H Hartley Sweeten <hartleys@visionengravers.com>

Functions not referenced outside of a source file should be marked
static to prevent it from being exposed globally.

This quiets the sparse warnings:

warning: symbol '__ipvs_proto_data_get' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_proto.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index e91c898..50d82186 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -156,7 +156,7 @@ EXPORT_SYMBOL(ip_vs_proto_get);
 /*
  *	get ip_vs_protocol object data by netns and proto
  */
-struct ip_vs_proto_data *
+static struct ip_vs_proto_data *
 __ipvs_proto_data_get(struct netns_ipvs *ipvs, unsigned short proto)
 {
 	struct ip_vs_proto_data *pd;
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 17/25] ipvs: reduce sync rate with time thresholds
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: Julian Anastasov <ja@ssi.bg>

	Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.

sync_refresh_period: in seconds, difference in reported connection
	timer that triggers new sync message. It can be used to
	avoid sync messages for the specified period (or half of
	the connection timeout if it is lower) if connection state
	is not changed from last sync.

sync_retries: integer, 0..3, defines sync retries with period of
	sync_refresh_period/8. Useful to protect against loss of
	sync messages.

	Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.

	Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.

	As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.

	Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h             |   30 +++++++++-
 net/netfilter/ipvs/ip_vs_conn.c |    7 ++-
 net/netfilter/ipvs/ip_vs_core.c |   30 +---------
 net/netfilter/ipvs/ip_vs_ctl.c  |   25 +++++++-
 net/netfilter/ipvs/ip_vs_sync.c |  121 +++++++++++++++++++++++++++++++++------
 5 files changed, 165 insertions(+), 48 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 30e43c8..d3a4b93 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -504,6 +504,7 @@ struct ip_vs_conn {
 						 * state transition triggerd
 						 * synchronization
 						 */
+	unsigned long		sync_endtime;	/* jiffies + sent_retries */
 
 	/* Control members */
 	struct ip_vs_conn       *control;       /* Master control connection */
@@ -875,6 +876,8 @@ struct netns_ipvs {
 	int			sysctl_expire_nodest_conn;
 	int			sysctl_expire_quiescent_template;
 	int			sysctl_sync_threshold[2];
+	unsigned int		sysctl_sync_refresh_period;
+	int			sysctl_sync_retries;
 	int			sysctl_nat_icmp_send;
 
 	/* ip_vs_lblc */
@@ -916,10 +919,13 @@ struct netns_ipvs {
 #define DEFAULT_SYNC_THRESHOLD	3
 #define DEFAULT_SYNC_PERIOD	50
 #define DEFAULT_SYNC_VER	1
+#define DEFAULT_SYNC_REFRESH_PERIOD	(0U * HZ)
+#define DEFAULT_SYNC_RETRIES		0
 #define IPVS_SYNC_WAKEUP_RATE	8
 #define IPVS_SYNC_QLEN_MAX	(IPVS_SYNC_WAKEUP_RATE * 4)
 #define IPVS_SYNC_SEND_DELAY	(HZ / 50)
 #define IPVS_SYNC_CHECK_PERIOD	HZ
+#define IPVS_SYNC_FLUSH_TIME	(HZ * 2)
 
 #ifdef CONFIG_SYSCTL
 
@@ -930,7 +936,17 @@ static inline int sysctl_sync_threshold(struct netns_ipvs *ipvs)
 
 static inline int sysctl_sync_period(struct netns_ipvs *ipvs)
 {
-	return ipvs->sysctl_sync_threshold[1];
+	return ACCESS_ONCE(ipvs->sysctl_sync_threshold[1]);
+}
+
+static inline unsigned int sysctl_sync_refresh_period(struct netns_ipvs *ipvs)
+{
+	return ACCESS_ONCE(ipvs->sysctl_sync_refresh_period);
+}
+
+static inline int sysctl_sync_retries(struct netns_ipvs *ipvs)
+{
+	return ipvs->sysctl_sync_retries;
 }
 
 static inline int sysctl_sync_ver(struct netns_ipvs *ipvs)
@@ -960,6 +976,16 @@ static inline int sysctl_sync_period(struct netns_ipvs *ipvs)
 	return DEFAULT_SYNC_PERIOD;
 }
 
+static inline unsigned int sysctl_sync_refresh_period(struct netns_ipvs *ipvs)
+{
+	return DEFAULT_SYNC_REFRESH_PERIOD;
+}
+
+static inline int sysctl_sync_retries(struct netns_ipvs *ipvs)
+{
+	return DEFAULT_SYNC_RETRIES & 3;
+}
+
 static inline int sysctl_sync_ver(struct netns_ipvs *ipvs)
 {
 	return DEFAULT_SYNC_VER;
@@ -1248,7 +1274,7 @@ extern struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp);
 extern int start_sync_thread(struct net *net, int state, char *mcast_ifn,
 			     __u8 syncid);
 extern int stop_sync_thread(struct net *net, int state);
-extern void ip_vs_sync_conn(struct net *net, struct ip_vs_conn *cp);
+extern void ip_vs_sync_conn(struct net *net, struct ip_vs_conn *cp, int pkts);
 
 
 /*
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index fd74f88..4f3205d 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -762,7 +762,8 @@ int ip_vs_check_template(struct ip_vs_conn *ct)
 static void ip_vs_conn_expire(unsigned long data)
 {
 	struct ip_vs_conn *cp = (struct ip_vs_conn *)data;
-	struct netns_ipvs *ipvs = net_ipvs(ip_vs_conn_net(cp));
+	struct net *net = ip_vs_conn_net(cp);
+	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	cp->timeout = 60*HZ;
 
@@ -827,6 +828,9 @@ static void ip_vs_conn_expire(unsigned long data)
 		  atomic_read(&cp->refcnt)-1,
 		  atomic_read(&cp->n_control));
 
+	if (ipvs->sync_state & IP_VS_STATE_MASTER)
+		ip_vs_sync_conn(net, cp, sysctl_sync_threshold(ipvs));
+
 	ip_vs_conn_put(cp);
 }
 
@@ -900,6 +904,7 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
 	/* Set its state and timeout */
 	cp->state = 0;
 	cp->timeout = 3*HZ;
+	cp->sync_endtime = jiffies & ~3UL;
 
 	/* Bind its packet transmitter */
 #ifdef CONFIG_IP_VS_IPV6
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index c8f36b9..a54b018c 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1613,34 +1613,8 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	else
 		pkts = atomic_add_return(1, &cp->in_pkts);
 
-	if ((ipvs->sync_state & IP_VS_STATE_MASTER) &&
-	    cp->protocol == IPPROTO_SCTP) {
-		if ((cp->state == IP_VS_SCTP_S_ESTABLISHED &&
-			(pkts % sysctl_sync_period(ipvs)
-			 == sysctl_sync_threshold(ipvs))) ||
-				(cp->old_state != cp->state &&
-				 ((cp->state == IP_VS_SCTP_S_CLOSED) ||
-				  (cp->state == IP_VS_SCTP_S_SHUT_ACK_CLI) ||
-				  (cp->state == IP_VS_SCTP_S_SHUT_ACK_SER)))) {
-			ip_vs_sync_conn(net, cp);
-			goto out;
-		}
-	}
-
-	/* Keep this block last: TCP and others with pp->num_states <= 1 */
-	else if ((ipvs->sync_state & IP_VS_STATE_MASTER) &&
-	    (((cp->protocol != IPPROTO_TCP ||
-	       cp->state == IP_VS_TCP_S_ESTABLISHED) &&
-	      (pkts % sysctl_sync_period(ipvs)
-	       == sysctl_sync_threshold(ipvs))) ||
-	     ((cp->protocol == IPPROTO_TCP) && (cp->old_state != cp->state) &&
-	      ((cp->state == IP_VS_TCP_S_FIN_WAIT) ||
-	       (cp->state == IP_VS_TCP_S_CLOSE) ||
-	       (cp->state == IP_VS_TCP_S_CLOSE_WAIT) ||
-	       (cp->state == IP_VS_TCP_S_TIME_WAIT)))))
-		ip_vs_sync_conn(net, cp);
-out:
-	cp->old_state = cp->state;
+	if (ipvs->sync_state & IP_VS_STATE_MASTER)
+		ip_vs_sync_conn(net, cp, pkts);
 
 	ip_vs_conn_put(cp);
 	return ret;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index bd3827e..a77b9bd 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1599,6 +1599,10 @@ static int ip_vs_zero_all(struct net *net)
 }
 
 #ifdef CONFIG_SYSCTL
+
+static int zero;
+static int three = 3;
+
 static int
 proc_do_defense_mode(ctl_table *table, int write,
 		     void __user *buffer, size_t *lenp, loff_t *ppos)
@@ -1632,7 +1636,8 @@ proc_do_sync_threshold(ctl_table *table, int write,
 	memcpy(val, valp, sizeof(val));
 
 	rc = proc_dointvec(table, write, buffer, lenp, ppos);
-	if (write && (valp[0] < 0 || valp[1] < 0 || valp[0] >= valp[1])) {
+	if (write && (valp[0] < 0 || valp[1] < 0 ||
+	    (valp[0] >= valp[1] && valp[1]))) {
 		/* Restore the correct value */
 		memcpy(valp, val, sizeof(val));
 	}
@@ -1755,6 +1760,20 @@ static struct ctl_table vs_vars[] = {
 		.proc_handler	= proc_do_sync_threshold,
 	},
 	{
+		.procname	= "sync_refresh_period",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_jiffies,
+	},
+	{
+		.procname	= "sync_retries",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &three,
+	},
+	{
 		.procname	= "nat_icmp_send",
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
@@ -3678,6 +3697,10 @@ int __net_init ip_vs_control_net_init_sysctl(struct net *net)
 	ipvs->sysctl_sync_threshold[1] = DEFAULT_SYNC_PERIOD;
 	tbl[idx].data = &ipvs->sysctl_sync_threshold;
 	tbl[idx++].maxlen = sizeof(ipvs->sysctl_sync_threshold);
+	ipvs->sysctl_sync_refresh_period = DEFAULT_SYNC_REFRESH_PERIOD;
+	tbl[idx++].data = &ipvs->sysctl_sync_refresh_period;
+	ipvs->sysctl_sync_retries = clamp_t(int, DEFAULT_SYNC_RETRIES, 0, 3);
+	tbl[idx++].data = &ipvs->sysctl_sync_retries;
 	tbl[idx++].data = &ipvs->sysctl_nat_icmp_send;
 
 
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index b3235b2..8d6a421 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -451,11 +451,94 @@ ip_vs_sync_buff_create_v0(struct netns_ipvs *ipvs)
 	return sb;
 }
 
+/* Check if conn should be synced.
+ * pkts: conn packets, use sysctl_sync_threshold to avoid packet check
+ * - (1) sync_refresh_period: reduce sync rate. Additionally, retry
+ *	sync_retries times with period of sync_refresh_period/8
+ * - (2) if both sync_refresh_period and sync_period are 0 send sync only
+ *	for state changes or only once when pkts matches sync_threshold
+ * - (3) templates: rate can be reduced only with sync_refresh_period or
+ *	with (2)
+ */
+static int ip_vs_sync_conn_needed(struct netns_ipvs *ipvs,
+				  struct ip_vs_conn *cp, int pkts)
+{
+	unsigned long orig = ACCESS_ONCE(cp->sync_endtime);
+	unsigned long now = jiffies;
+	unsigned long n = (now + cp->timeout) & ~3UL;
+	unsigned int sync_refresh_period;
+	int sync_period;
+	int force;
+
+	/* Check if we sync in current state */
+	if (unlikely(cp->flags & IP_VS_CONN_F_TEMPLATE))
+		force = 0;
+	else if (likely(cp->protocol == IPPROTO_TCP)) {
+		if (!((1 << cp->state) &
+		      ((1 << IP_VS_TCP_S_ESTABLISHED) |
+		       (1 << IP_VS_TCP_S_FIN_WAIT) |
+		       (1 << IP_VS_TCP_S_CLOSE) |
+		       (1 << IP_VS_TCP_S_CLOSE_WAIT) |
+		       (1 << IP_VS_TCP_S_TIME_WAIT))))
+			return 0;
+		force = cp->state != cp->old_state;
+		if (force && cp->state != IP_VS_TCP_S_ESTABLISHED)
+			goto set;
+	} else if (unlikely(cp->protocol == IPPROTO_SCTP)) {
+		if (!((1 << cp->state) &
+		      ((1 << IP_VS_SCTP_S_ESTABLISHED) |
+		       (1 << IP_VS_SCTP_S_CLOSED) |
+		       (1 << IP_VS_SCTP_S_SHUT_ACK_CLI) |
+		       (1 << IP_VS_SCTP_S_SHUT_ACK_SER))))
+			return 0;
+		force = cp->state != cp->old_state;
+		if (force && cp->state != IP_VS_SCTP_S_ESTABLISHED)
+			goto set;
+	} else {
+		/* UDP or another protocol with single state */
+		force = 0;
+	}
+
+	sync_refresh_period = sysctl_sync_refresh_period(ipvs);
+	if (sync_refresh_period > 0) {
+		long diff = n - orig;
+		long min_diff = max(cp->timeout >> 1, 10UL * HZ);
+
+		/* Avoid sync if difference is below sync_refresh_period
+		 * and below the half timeout.
+		 */
+		if (abs(diff) < min_t(long, sync_refresh_period, min_diff)) {
+			int retries = orig & 3;
+
+			if (retries >= sysctl_sync_retries(ipvs))
+				return 0;
+			if (time_before(now, orig - cp->timeout +
+					(sync_refresh_period >> 3)))
+				return 0;
+			n |= retries + 1;
+		}
+	}
+	sync_period = sysctl_sync_period(ipvs);
+	if (sync_period > 0) {
+		if (!(cp->flags & IP_VS_CONN_F_TEMPLATE) &&
+		    pkts % sync_period != sysctl_sync_threshold(ipvs))
+			return 0;
+	} else if (sync_refresh_period <= 0 &&
+		   pkts != sysctl_sync_threshold(ipvs))
+		return 0;
+
+set:
+	cp->old_state = cp->state;
+	n = cmpxchg(&cp->sync_endtime, orig, n);
+	return n == orig || force;
+}
+
 /*
  *      Version 0 , could be switched in by sys_ctl.
  *      Add an ip_vs_conn information into the current sync_buff.
  */
-void ip_vs_sync_conn_v0(struct net *net, struct ip_vs_conn *cp)
+static void ip_vs_sync_conn_v0(struct net *net, struct ip_vs_conn *cp,
+			       int pkts)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 	struct ip_vs_sync_mesg_v0 *m;
@@ -468,6 +551,9 @@ void ip_vs_sync_conn_v0(struct net *net, struct ip_vs_conn *cp)
 	if (cp->flags & IP_VS_CONN_F_ONE_PACKET)
 		return;
 
+	if (!ip_vs_sync_conn_needed(ipvs, cp, pkts))
+		return;
+
 	spin_lock(&ipvs->sync_buff_lock);
 	if (!ipvs->sync_buff) {
 		ipvs->sync_buff =
@@ -513,8 +599,14 @@ void ip_vs_sync_conn_v0(struct net *net, struct ip_vs_conn *cp)
 	spin_unlock(&ipvs->sync_buff_lock);
 
 	/* synchronize its controller if it has */
-	if (cp->control)
-		ip_vs_sync_conn(net, cp->control);
+	cp = cp->control;
+	if (cp) {
+		if (cp->flags & IP_VS_CONN_F_TEMPLATE)
+			pkts = atomic_add_return(1, &cp->in_pkts);
+		else
+			pkts = sysctl_sync_threshold(ipvs);
+		ip_vs_sync_conn(net, cp->control, pkts);
+	}
 }
 
 /*
@@ -522,7 +614,7 @@ void ip_vs_sync_conn_v0(struct net *net, struct ip_vs_conn *cp)
  *      Called by ip_vs_in.
  *      Sending Version 1 messages
  */
-void ip_vs_sync_conn(struct net *net, struct ip_vs_conn *cp)
+void ip_vs_sync_conn(struct net *net, struct ip_vs_conn *cp, int pkts)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 	struct ip_vs_sync_mesg *m;
@@ -532,13 +624,16 @@ void ip_vs_sync_conn(struct net *net, struct ip_vs_conn *cp)
 
 	/* Handle old version of the protocol */
 	if (sysctl_sync_ver(ipvs) == 0) {
-		ip_vs_sync_conn_v0(net, cp);
+		ip_vs_sync_conn_v0(net, cp, pkts);
 		return;
 	}
 	/* Do not sync ONE PACKET */
 	if (cp->flags & IP_VS_CONN_F_ONE_PACKET)
 		goto control;
 sloop:
+	if (!ip_vs_sync_conn_needed(ipvs, cp, pkts))
+		goto control;
+
 	/* Sanity checks */
 	pe_name_len = 0;
 	if (cp->pe_data_len) {
@@ -653,16 +748,10 @@ control:
 	cp = cp->control;
 	if (!cp)
 		return;
-	/*
-	 * Reduce sync rate for templates
-	 * i.e only increment in_pkts for Templates.
-	 */
-	if (cp->flags & IP_VS_CONN_F_TEMPLATE) {
-		int pkts = atomic_add_return(1, &cp->in_pkts);
-
-		if (pkts % sysctl_sync_period(ipvs) != 1)
-			return;
-	}
+	if (cp->flags & IP_VS_CONN_F_TEMPLATE)
+		pkts = atomic_add_return(1, &cp->in_pkts);
+	else
+		pkts = sysctl_sync_threshold(ipvs);
 	goto sloop;
 }
 
@@ -1494,7 +1583,7 @@ next_sync_buff(struct netns_ipvs *ipvs)
 	if (sb)
 		return sb;
 	/* Do not delay entries in buffer for more than 2 seconds */
-	return get_curr_sync_buff(ipvs, 2 * HZ);
+	return get_curr_sync_buff(ipvs, IPVS_SYNC_FLUSH_TIME);
 }
 
 static int sync_thread_master(void *data)
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 24/25] netfilter: nf_conntrack: fix explicit helper attachment and NAT
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

Explicit helper attachment via the CT target is broken with NAT
if non-standard ports are used. This problem was hidden behind
the automatic helper assignment routine. Thus, it becomes more
noticeable now that we can disable the automatic helper assignment
with Eric Leblond's:

9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment

Basically, nf_conntrack_alter_reply asks for looking up the helper
up if NAT is enabled. Unfortunately, we don't have the conntrack
template at that point anymore.

Since we don't want to rely on the automatic helper assignment,
we can skip the second look-up and stick to the helper that was
attached by iptables. With the CT target, the user is in full
control of helper attachment, thus, the policy is to trust what
the user explicitly configures via iptables (no automatic magic
anymore).

Interestingly, this bug was hidden by the automatic helper look-up
code. But it can be easily trigger if you attach the helper in
a non-standard port, eg.

iptables -I PREROUTING -t raw -p tcp --dport 8888 \
	-j CT --helper ftp

And you disabled the automatic helper assignment.

I added the IPS_HELPER_BIT that allows us to differenciate between
a helper that has been explicitly attached and those that have been
automatically assigned. I didn't come up with a better solution
(having backward compatibility in mind).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter/nf_conntrack_common.h |    4 ++++
 net/netfilter/nf_conntrack_helper.c           |   13 ++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/netfilter/nf_conntrack_common.h b/include/linux/netfilter/nf_conntrack_common.h
index 0d3dd66..d146872 100644
--- a/include/linux/netfilter/nf_conntrack_common.h
+++ b/include/linux/netfilter/nf_conntrack_common.h
@@ -83,6 +83,10 @@ enum ip_conntrack_status {
 	/* Conntrack is a fake untracked entry */
 	IPS_UNTRACKED_BIT = 12,
 	IPS_UNTRACKED = (1 << IPS_UNTRACKED_BIT),
+
+	/* Conntrack got a helper explicitly attached via CT target. */
+	IPS_HELPER_BIT = 13,
+	IPS_HELPER = (1 << IPS_HELPER_BIT),
 };

 /* Connection tracking event types */
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 317f6e4..4fa2ff9 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -182,10 +182,21 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 	struct net *net = nf_ct_net(ct);
 	int ret = 0;

+	/* We already got a helper explicitly attached. The function
+	 * nf_conntrack_alter_reply - in case NAT is in use - asks for looking
+	 * the helper up again. Since now the user is in full control of
+	 * making consistent helper configurations, skip this automatic
+	 * re-lookup, otherwise we'll lose the helper.
+	 */
+	if (test_bit(IPS_HELPER_BIT, &ct->status))
+		return 0;
+
 	if (tmpl != NULL) {
 		help = nfct_help(tmpl);
-		if (help != NULL)
+		if (help != NULL) {
 			helper = help->helper;
+			set_bit(IPS_HELPER_BIT, &ct->status);
+		}
 	}

 	help = nfct_help(ct);
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 02/25] netfilter: nf_ct_helper: allow to disable automatic helper assignment
From: pablo @ 2012-05-08 18:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: Eric Leblond <eric@regit.org>

This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

[ Note: flows that already got a helper will keep using it even
  if automatic helper assignment has been disabled ]

Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.

There are good reasons to stop supporting automatic helper
assignment, for further information, please read:

http://www.netfilter.org/news.html#2012-04-03

This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_conntrack_helper.h |    4 +-
 include/net/netns/conntrack.h               |    3 +
 net/netfilter/nf_conntrack_core.c           |   15 ++--
 net/netfilter/nf_conntrack_helper.c         |  109 ++++++++++++++++++++++++---
 4 files changed, 110 insertions(+), 21 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h
index 5767dc2..1d18894 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -60,8 +60,8 @@ static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct)
 	return nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
 }
 
-extern int nf_conntrack_helper_init(void);
-extern void nf_conntrack_helper_fini(void);
+extern int nf_conntrack_helper_init(struct net *net);
+extern void nf_conntrack_helper_fini(struct net *net);
 
 extern int nf_conntrack_broadcast_help(struct sk_buff *skb,
 				       unsigned int protoff,
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 7a911ec..a053a19 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -26,11 +26,14 @@ struct netns_ct {
 	int			sysctl_tstamp;
 	int			sysctl_checksum;
 	unsigned int		sysctl_log_invalid; /* Log invalid packets */
+	int			sysctl_auto_assign_helper;
+	bool			auto_assign_helper_warned;
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header	*sysctl_header;
 	struct ctl_table_header	*acct_sysctl_header;
 	struct ctl_table_header	*tstamp_sysctl_header;
 	struct ctl_table_header	*event_sysctl_header;
+	struct ctl_table_header	*helper_sysctl_header;
 #endif
 	char			*slabname;
 };
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index cf0747c..32c5909 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1336,7 +1336,6 @@ static void nf_conntrack_cleanup_init_net(void)
 	while (untrack_refs() > 0)
 		schedule();
 
-	nf_conntrack_helper_fini();
 	nf_conntrack_proto_fini();
 #ifdef CONFIG_NF_CONNTRACK_ZONES
 	nf_ct_extend_unregister(&nf_ct_zone_extend);
@@ -1354,6 +1353,7 @@ static void nf_conntrack_cleanup_net(struct net *net)
 	}
 
 	nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
+	nf_conntrack_helper_fini(net);
 	nf_conntrack_timeout_fini(net);
 	nf_conntrack_ecache_fini(net);
 	nf_conntrack_tstamp_fini(net);
@@ -1504,10 +1504,6 @@ static int nf_conntrack_init_init_net(void)
 	if (ret < 0)
 		goto err_proto;
 
-	ret = nf_conntrack_helper_init();
-	if (ret < 0)
-		goto err_helper;
-
 #ifdef CONFIG_NF_CONNTRACK_ZONES
 	ret = nf_ct_extend_register(&nf_ct_zone_extend);
 	if (ret < 0)
@@ -1525,10 +1521,8 @@ static int nf_conntrack_init_init_net(void)
 
 #ifdef CONFIG_NF_CONNTRACK_ZONES
 err_extend:
-	nf_conntrack_helper_fini();
-#endif
-err_helper:
 	nf_conntrack_proto_fini();
+#endif
 err_proto:
 	return ret;
 }
@@ -1589,9 +1583,14 @@ static int nf_conntrack_init_net(struct net *net)
 	ret = nf_conntrack_timeout_init(net);
 	if (ret < 0)
 		goto err_timeout;
+	ret = nf_conntrack_helper_init(net);
+	if (ret < 0)
+		goto err_helper;
 
 	return 0;
 
+err_helper:
+	nf_conntrack_timeout_fini(net);
 err_timeout:
 	nf_conntrack_ecache_fini(net);
 err_ecache:
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 436b7cb..317f6e4 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -34,6 +34,67 @@ static struct hlist_head *nf_ct_helper_hash __read_mostly;
 static unsigned int nf_ct_helper_hsize __read_mostly;
 static unsigned int nf_ct_helper_count __read_mostly;
 
+static bool nf_ct_auto_assign_helper __read_mostly = true;
+module_param_named(nf_conntrack_helper, nf_ct_auto_assign_helper, bool, 0644);
+MODULE_PARM_DESC(nf_conntrack_helper,
+		 "Enable automatic conntrack helper assignment (default 1)");
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table helper_sysctl_table[] = {
+	{
+		.procname	= "nf_conntrack_helper",
+		.data		= &init_net.ct.sysctl_auto_assign_helper,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{}
+};
+
+static int nf_conntrack_helper_init_sysctl(struct net *net)
+{
+	struct ctl_table *table;
+
+	table = kmemdup(helper_sysctl_table, sizeof(helper_sysctl_table),
+			GFP_KERNEL);
+	if (!table)
+		goto out;
+
+	table[0].data = &net->ct.sysctl_auto_assign_helper;
+
+	net->ct.helper_sysctl_header =
+		register_net_sysctl(net, "net/netfilter", table);
+
+	if (!net->ct.helper_sysctl_header) {
+		pr_err("nf_conntrack_helper: can't register to sysctl.\n");
+		goto out_register;
+	}
+	return 0;
+
+out_register:
+	kfree(table);
+out:
+	return -ENOMEM;
+}
+
+static void nf_conntrack_helper_fini_sysctl(struct net *net)
+{
+	struct ctl_table *table;
+
+	table = net->ct.helper_sysctl_header->ctl_table_arg;
+	unregister_net_sysctl_table(net->ct.helper_sysctl_header);
+	kfree(table);
+}
+#else
+static int nf_conntrack_helper_init_sysctl(struct net *net)
+{
+	return 0;
+}
+
+static void nf_conntrack_helper_fini_sysctl(struct net *net)
+{
+}
+#endif /* CONFIG_SYSCTL */
 
 /* Stupid hash, but collision free for the default registrations of the
  * helpers currently in the kernel. */
@@ -118,6 +179,7 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 {
 	struct nf_conntrack_helper *helper = NULL;
 	struct nf_conn_help *help;
+	struct net *net = nf_ct_net(ct);
 	int ret = 0;
 
 	if (tmpl != NULL) {
@@ -127,8 +189,17 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 	}
 
 	help = nfct_help(ct);
-	if (helper == NULL)
+	if (net->ct.sysctl_auto_assign_helper && helper == NULL) {
 		helper = __nf_ct_helper_find(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+		if (unlikely(!net->ct.auto_assign_helper_warned && helper)) {
+			pr_info("nf_conntrack: automatic helper "
+				"assignment is deprecated and it will "
+				"be removed soon. Use the iptables CT target "
+				"to attach helpers instead.\n");
+			net->ct.auto_assign_helper_warned = true;
+		}
+	}
+
 	if (helper == NULL) {
 		if (help)
 			RCU_INIT_POINTER(help->helper, NULL);
@@ -315,28 +386,44 @@ static struct nf_ct_ext_type helper_extend __read_mostly = {
 	.id	= NF_CT_EXT_HELPER,
 };
 
-int nf_conntrack_helper_init(void)
+int nf_conntrack_helper_init(struct net *net)
 {
 	int err;
 
-	nf_ct_helper_hsize = 1; /* gets rounded up to use one page */
-	nf_ct_helper_hash = nf_ct_alloc_hashtable(&nf_ct_helper_hsize, 0);
-	if (!nf_ct_helper_hash)
-		return -ENOMEM;
+	net->ct.auto_assign_helper_warned = false;
+	net->ct.sysctl_auto_assign_helper = nf_ct_auto_assign_helper;
 
-	err = nf_ct_extend_register(&helper_extend);
+	if (net_eq(net, &init_net)) {
+		nf_ct_helper_hsize = 1; /* gets rounded up to use one page */
+		nf_ct_helper_hash =
+			nf_ct_alloc_hashtable(&nf_ct_helper_hsize, 0);
+		if (!nf_ct_helper_hash)
+			return -ENOMEM;
+
+		err = nf_ct_extend_register(&helper_extend);
+		if (err < 0)
+			goto err1;
+	}
+
+	err = nf_conntrack_helper_init_sysctl(net);
 	if (err < 0)
-		goto err1;
+		goto out_sysctl;
 
 	return 0;
 
+out_sysctl:
+	if (net_eq(net, &init_net))
+		nf_ct_extend_unregister(&helper_extend);
 err1:
 	nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
 	return err;
 }
 
-void nf_conntrack_helper_fini(void)
+void nf_conntrack_helper_fini(struct net *net)
 {
-	nf_ct_extend_unregister(&helper_extend);
-	nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
+	nf_conntrack_helper_fini_sysctl(net);
+	if (net_eq(net, &init_net)) {
+		nf_ct_extend_unregister(&helper_extend);
+		nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
+	}
 }
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 01/25] netfilter: nf_ct_ecache: refactor notifier registration
From: pablo @ 2012-05-08 18:37 UTC (permalink / raw)
  To: netdev; +Cc: davem, openbsc
In-Reply-To: <1336502303-1722-1-git-send-email-pablo@netfilter.org>

From: Tony Zelenoff <antonz@parallels.com>

* ret variable initialization removed as useless
* similar code strings concatenated and functions code
  flow became more plain

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_ecache.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index b924f3a..e7be79e 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -84,7 +84,7 @@ EXPORT_SYMBOL_GPL(nf_ct_deliver_cached_events);
 int nf_conntrack_register_notifier(struct net *net,
 				   struct nf_ct_event_notifier *new)
 {
-	int ret = 0;
+	int ret;
 	struct nf_ct_event_notifier *notify;
 
 	mutex_lock(&nf_ct_ecache_mutex);
@@ -95,8 +95,7 @@ int nf_conntrack_register_notifier(struct net *net,
 		goto out_unlock;
 	}
 	rcu_assign_pointer(net->ct.nf_conntrack_event_cb, new);
-	mutex_unlock(&nf_ct_ecache_mutex);
-	return ret;
+	ret = 0;
 
 out_unlock:
 	mutex_unlock(&nf_ct_ecache_mutex);
@@ -121,7 +120,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_unregister_notifier);
 int nf_ct_expect_register_notifier(struct net *net,
 				   struct nf_exp_event_notifier *new)
 {
-	int ret = 0;
+	int ret;
 	struct nf_exp_event_notifier *notify;
 
 	mutex_lock(&nf_ct_ecache_mutex);
@@ -132,8 +131,7 @@ int nf_ct_expect_register_notifier(struct net *net,
 		goto out_unlock;
 	}
 	rcu_assign_pointer(net->ct.nf_expect_event_cb, new);
-	mutex_unlock(&nf_ct_ecache_mutex);
-	return ret;
+	ret = 0;
 
 out_unlock:
 	mutex_unlock(&nf_ct_ecache_mutex);
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] pch_gbe: Adding read memory barriers
From: David Miller @ 2012-05-08 18:39 UTC (permalink / raw)
  To: erwanaliasr1; +Cc: netdev, linux-kernel, tshimizu818
In-Reply-To: <4FA96770.9070009@gmail.com>

From: Erwan Velu <erwanaliasr1@gmail.com>
Date: Tue, 08 May 2012 20:35:28 +0200

> Le 07/05/2012 21:30, Erwan Velu a écrit :
>> From bb1e271db0fa1a29df19bede69faf8004389132d Mon Sep 17 00:00:00 2001
>> From: Erwan Velu <erwan.velu@zodiacaerospace.com>
>> Date: Mon, 7 May 2012 19:15:29 +0000
>> Subject: [PATCH 1/1] pch_gbe: Adding read memory barriers
> 
> Does my patch can be considered as acceptable or shall I rework
> something on it ?

You never need to ask questions like this.

Your patch is queued up to be reviewed in patchwork:

http://patchwork.ozlabs.org/project/netdev/list/

Therefore you only make more work for maintainers and irritate them by
asking this, and therefore it will take even longer for them to get to
your patch.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox