Netdev List
* [PATCH 0/1] net/hyperv: Use wait_event on outstanding sends during device removal
From: Haiyang Zhang @ 2012-06-04 14:35 UTC (permalink / raw)
  To: davem, netdev; +Cc: devel, haiyangz, olaf, linux-kernel

This patch is targeting net-next tree (when it's available for check in).

Haiyang Zhang (1):
  net/hyperv: Use wait_event on outstanding sends during device removal

 drivers/net/hyperv/hyperv_net.h |    1 +
 drivers/net/hyperv/netvsc.c     |   12 ++++++------
 2 files changed, 7 insertions(+), 6 deletions(-)

-- 
1.7.4.1

^ permalink raw reply

* Re: [PATCH net-next] net: netdev_alloc_skb() use build_skb()
From: Michael S. Tsirkin @ 2012-06-04 14:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Willy Tarreau, David Miller, netdev
In-Reply-To: <1338818501.2760.1821.camel@edumazet-glaptop>

On Mon, Jun 04, 2012 at 04:01:41PM +0200, Eric Dumazet wrote:
> On Mon, 2012-06-04 at 16:41 +0300, Michael S. Tsirkin wrote:
> 
> > This is generally what virtio does, take a look:
> > page_to_skb fills the first fragment and receive_mergeable fills the
> > rest (other modes are for legacy hardware).
> > 
> > The way hypervisor now works is this (we call it mergeable buffers):
> > 
> > - pages are passed to hardware
> > - hypervisor puts virtio specific stuff in first 12 bytes
> >   on first page
> > - following this, the rest of the first page and all following
> >   pages have data
> > 
> > The driver gets the 1st page, allocates the skb, copies out the 12 byte
> > header and copies the first 128 bytes of data into skb.
> > The rest if any is populated by the pages.
> > 
> > So I guess I'm asking for advice, would it make sense to switch to build_skb
> > and how best to handle the data copying above? Maybe it would help
> > if we changed the hypervisor to write the 12 bytes separately?
> >   
> 
> Thanks for these details.
> 
> Not sure 12 bytes of headroom would be enough (instead of the
> NET_SKB_PAD reserved in netdev_alloc_skb_ip_align()), but what could be
> done indeed is to use the first page as the skb->head, so using
> build_skb() indeed, removing one fragment, one (small) copy and one
> {put|get}_page() pair.
> 

bnx2 and tg3 both do skb_reserve of at least NET_SKB_PAD
after build_skb. You are saying it's not a must?

Hmm so maybe we should teach the hypervisor to write data
out at an offset. Interesting.

Another question is about truesize for very small packets.
build_skb sets truesize to frag_size, but isn't
this too small? We keep the whole page around, no?

^ permalink raw reply

* Re: [PATCH net-next] net: netdev_alloc_skb() use build_skb()
From: Eric Dumazet @ 2012-06-04 14:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Willy Tarreau, David Miller, netdev
In-Reply-To: <1338818501.2760.1821.camel@edumazet-glaptop>

On Mon, 2012-06-04 at 16:01 +0200, Eric Dumazet wrote:

> Not sure 12 bytes of headroom would be enough (instead of the
> NET_SKB_PAD reserved in netdev_alloc_skb_ip_align()), but what could be
> done indeed is to use the first page as the skb->head, so using
> build_skb() indeed, removing one fragment, one (small) copy and one
> {put|get}_page() pair.

It would also avoid 'pulling' the TCP payload into the linear part.

page_to_skb() does:

copy = len;
if (copy > skb_tailroom(skb))
	copy = skb_tailroom(skb);
memcpy(skb_put(skb, copy), p, copy);

This means GRO or TCP coalescing (or splice()) has to handle two
segments to fetch data.

^ permalink raw reply

* Re: [PATCH RFC] c_can_pci: generic module for c_can on PCI
From: Marc Kleine-Budde @ 2012-06-04 14:04 UTC (permalink / raw)
  To: Federico Vaga
  Cc: Wolfgang Grandegger, Giancarlo Asnaghi, Alan Cox,
	Alessandro Rubini, linux-can, netdev, linux-kernel
In-Reply-To: <1338816766-7089-2-git-send-email-federico.vaga@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 9563 bytes --]

On 06/04/2012 03:32 PM, Federico Vaga wrote:
> Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
> Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
> Cc: Alan Cox <alan@linux.intel.com>

Please port your driver to the recent c_can changes. Use the c_can branch
of the linux-can-next repo[1] as the base for your work. You have to rework
the register access functions. Please check whether there are devm_
variants for registering/mapping the PCI and clock resources.

[1] https://gitorious.org/linux-can/linux-can-next

More comments inline. Marc

> ---
>  drivers/net/can/c_can/Kconfig     |   11 +-
>  drivers/net/can/c_can/Makefile    |    1 +
>  drivers/net/can/c_can/c_can_pci.c |  221 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 230 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/net/can/c_can/c_can_pci.c
> 
> diff --git a/drivers/net/can/c_can/Kconfig b/drivers/net/can/c_can/Kconfig
> index ffb9773..74ef97d 100644
> --- a/drivers/net/can/c_can/Kconfig
> +++ b/drivers/net/can/c_can/Kconfig
> @@ -2,14 +2,19 @@ menuconfig CAN_C_CAN
>  	tristate "Bosch C_CAN devices"
>  	depends on CAN_DEV && HAS_IOMEM
>  
> -if CAN_C_CAN

please keep the if CAN_C_CAN...

> -
>  config CAN_C_CAN_PLATFORM
>  	tristate "Generic Platform Bus based C_CAN driver"
> +	depends on CAN_C_CAN

...then you don't have to add the depends on here.

>  	---help---
>  	  This driver adds support for the C_CAN chips connected to
>  	  the "platform bus" (Linux abstraction for directly to the
>  	  processor attached devices) which can be found on various
>  	  boards from ST Microelectronics (http://www.st.com)
>  	  like the SPEAr1310 and SPEAr320 evaluation boards.
> -endif

... Just move your PCI driver inside the if...endif block...
> +
> +config CAN_C_CAN_PCI
> +	tristate "Generic PCI Bus based C_CAN driver"
> +	depends on CAN_C_CAN

...and remove the depends on CAN_C_CAN. You probably have to add a
depends on PCI.

> +	---help---
> +	  This driver adds support for the C_CAN chips connected to
> +	  the PCI bus.
> diff --git a/drivers/net/can/c_can/Makefile b/drivers/net/can/c_can/Makefile
> index 9273f6d..ad1cc84 100644
> --- a/drivers/net/can/c_can/Makefile
> +++ b/drivers/net/can/c_can/Makefile
> @@ -4,5 +4,6 @@
>  
>  obj-$(CONFIG_CAN_C_CAN) += c_can.o
>  obj-$(CONFIG_CAN_C_CAN_PLATFORM) += c_can_platform.o
> +obj-$(CONFIG_CAN_C_CAN_PCI) += c_can_pci.o
>  
>  ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
> diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c
> new file mode 100644
> index 0000000..b635375
> --- /dev/null
> +++ b/drivers/net/can/c_can/c_can_pci.c
> @@ -0,0 +1,221 @@
> +/*
> + * Platform CAN bus driver for Bosch C_CAN controller
> + *
> + * Copyright (C) 2012 Federico Vaga <federico.vaga@gmail.com>
> +  *
   ^^^ double space :)

> + * This file is licensed under the terms of the GNU General Public
> + * License version 2. This program is licensed "as is" without any
> + * warranty of any kind, whether express or implied.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/netdevice.h>
> +#include <linux/clk.h>
> +#include <linux/pci.h>
> +#include <linux/can/dev.h>
> +
> +#include "c_can.h"
> +
> +enum c_can_pci_reg_align {
> +	C_CAN_REG_ALIGN_16,
> +	C_CAN_REG_ALIGN_32,
> +};
> +
> +struct c_can_pci_data {
> +	unsigned int reg_align;	/* Set the register alignment in the memory */
        ^^^^^^^^^^^^
use the enum you defined above.

> +	unsigned int freq;	/* Set the frequency if clk is not usable */
> +};
> +
> +/*
> + * 16-bit c_can registers can be arranged differently in the memory
> + * architecture of different implementations. For example: 16-bit
> + * registers can be aligned to a 16-bit boundary or 32-bit boundary etc.
> + * Handle the same by providing a common read/write interface.
> + */
> +static u16 c_can_pci_read_reg_aligned_to_16bit(struct c_can_priv *priv,
> +						void *reg)
> +{
> +	return readw(reg);
> +}
> +
> +static void c_can_pci_write_reg_aligned_to_16bit(struct c_can_priv *priv,
> +						void *reg, u16 val)
> +{
> +	writew(val, reg);
> +}
> +
> +static u16 c_can_pci_read_reg_aligned_to_32bit(struct c_can_priv *priv,
> +						void *reg)
> +{
> +	return readw(reg + (long)reg - (long)priv->regs);
> +}
> +
> +static void c_can_pci_write_reg_aligned_to_32bit(struct c_can_priv *priv,
> +						void *reg, u16 val)
> +{
> +	writew(val, reg + (long)reg - (long)priv->regs);
> +}
> +
> +static int __devinit c_can_pci_probe(struct pci_dev *pdev,
> +				     const struct pci_device_id *ent)
> +{
> +	struct c_can_pci_data *c_can_pci_data = (void *)ent->driver_data;
> +	struct c_can_priv *priv;
> +	struct net_device *dev;
> +	void __iomem *addr;
> +	struct clk *clk;
> +	int ret;
> +
> +	ret = pci_enable_device(pdev);
> +	if (ret) {
> +		dev_err(&pdev->dev, "pci_enable_device FAILED\n");
> +		goto out;
> +	}
> +
> +	ret = pci_request_regions(pdev, KBUILD_MODNAME);
> +	if (ret) {
> +		dev_err(&pdev->dev, "pci_request_regions FAILED\n");
> +		goto out_disable_device;
> +	}
> +
> +	pci_set_master(pdev);
> +	pci_enable_msi(pdev);
> +
> +	addr = pci_iomap(pdev, 0, pci_resource_len(pdev, 0));
> +	if (!addr) {
> +		dev_err(&pdev->dev,
> +			"device has no PCI memory resources, "
> +			"failing adapter\n");
> +		ret = -ENOMEM;
> +		goto out_release_regions;
> +	}
> +
> +	/* allocate the c_can device */
> +	dev = alloc_c_can_dev();
> +	if (!dev) {
> +		ret = -ENOMEM;
> +		goto out_iounmap;
> +	}
> +
> +	priv = netdev_priv(dev);
> +	pci_set_drvdata(pdev, dev);
> +	SET_NETDEV_DEV(dev, &pdev->dev);
> +
> +	dev->irq = pdev->irq;
> +	priv->regs = addr;
> +
> +	if (!c_can_pci_data->freq) {
> +		/* get the appropriate clk */
> +		clk = clk_get(&pdev->dev, NULL);
> +		if (IS_ERR(clk)) {
> +			dev_err(&pdev->dev, "no clock defined\n");
> +			ret = -ENODEV;
> +			goto out_free_c_can;
> +		}
> +		priv->can.clock.freq = clk_get_rate(clk);
> +		priv->priv = clk;
> +	} else {
> +		priv->can.clock.freq = c_can_pci_data->freq;
> +		priv->priv = NULL;
> +	}
> +
> +	switch (c_can_pci_data->reg_align) {
> +	case C_CAN_REG_ALIGN_32:
> +		priv->read_reg = c_can_pci_read_reg_aligned_to_32bit;
> +		priv->write_reg = c_can_pci_write_reg_aligned_to_32bit;
> +		break;
> +	case C_CAN_REG_ALIGN_16:
> +	default:
> +		priv->read_reg = c_can_pci_read_reg_aligned_to_16bit;
> +		priv->write_reg = c_can_pci_write_reg_aligned_to_16bit;
> +		break;
> +	}
> +
> +	ret = register_c_can_dev(dev);
> +	if (ret) {
> +		dev_err(&pdev->dev, "registering %s failed (err=%d)\n",
> +			KBUILD_MODNAME, ret);
> +		goto out_free_clock;
> +	}
> +
> +	dev_info(&pdev->dev, "%s device registered (regs=%p, irq=%d)\n",
> +		 KBUILD_MODNAME, priv->regs, dev->irq);
> +
> +	return 0;
> +
> +out_free_clock:
> +	if (!priv->priv)
           ^^^

looks fishy

> +		clk_put(priv->priv);
> +out_free_c_can:
> +	pci_set_drvdata(pdev, NULL);
> +	free_c_can_dev(dev);
> +out_iounmap:
> +	pci_iounmap(pdev, addr);
> +out_release_regions:
> +	pci_disable_msi(pdev);
> +	pci_clear_master(pdev);
> +	pci_release_regions(pdev);
> +out_disable_device:
> +	/*
> +	 * do not call pci_disable_device on sta2x11 because it
> +	 * break all other Bus masters on this EP
> +	 */
> +	if(pdev->vendor == PCI_VENDOR_ID_STMICRO &&
> +	   pdev->device == PCI_DEVICE_ID_STMICRO_CAN)
> +		goto out;
> +	pci_disable_device(pdev);
> +out:
> +	return ret;
> +}
> +
> +static void __devexit c_can_pci_remove(struct pci_dev *pdev)
> +{
> +	struct net_device *dev = pci_get_drvdata(pdev);
> +	struct c_can_priv *priv = netdev_priv(dev);
> +
> +	pci_set_drvdata(pdev, NULL);
> +	free_c_can_dev(dev);
> +	if (!priv->priv)
ditto
> +		clk_put(priv->priv);
> +	pci_iounmap(pdev, priv->regs);
> +	pci_disable_msi(pdev);
> +	pci_clear_master(pdev);
> +	pci_release_regions(pdev);
> +	/*
> +	 * do not call pci_disable_device on sta2x11 because it
> +	 * break all other Bus masters on this EP
> +	 */
> +	if(pdev->vendor == PCI_VENDOR_ID_STMICRO &&
> +	   pdev->device == PCI_DEVICE_ID_STMICRO_CAN)
> +		return;
> +	pci_disable_device(pdev);
> +}
> +
> +static struct c_can_pci_data c_can_sta2x11= {
> +	.reg_align = C_CAN_REG_ALIGN_32,
> +	.freq = 52000000, /* 52 Mhz */
> +};
> +
> +#define C_CAN_ID(_vend, _dev, _driverdata) {		\
> +	PCI_DEVICE(_vend, _dev),			\
> +	.driver_data = (unsigned long)&_driverdata,	\
> +}
> +DEFINE_PCI_DEVICE_TABLE(c_can_pci_tbl) = {
^^^^

static?

> +	C_CAN_ID(PCI_VENDOR_ID_STMICRO, PCI_DEVICE_ID_STMICRO_CAN,
> +		 c_can_sta2x11),
> +	{},
> +};
> +static struct pci_driver sta2x11_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = c_can_pci_tbl,
> +	.probe = c_can_pci_probe,
> +	.remove = __devexit_p(c_can_pci_remove),
> +};
> +
> +module_pci_driver(sta2x11_pci_driver);
> +
> +MODULE_AUTHOR("Federico Vaga <federico.vaga@gmail.com>");
> +MODULE_LICENSE("GPL V2");

IIRC, the correct case is "GPL v2"

> +MODULE_DESCRIPTION("PCI CAN bus driver for Bosch C_CAN controller");
> +MODULE_DEVICE_TABLE(pci, c_can_pci_tbl);
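
To illustrate the "looks fishy" remarks on the clk_put() checks: a minimal
userspace sketch, not the driver code. struct clk_demo, clk_put_demo() and
struct priv_demo are hypothetical stand-ins for the kernel types. The point
is only that the release has to run when a clock pointer was stored, which
is the opposite of the posted "if (!priv->priv)" test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct clk and clk_put(); not the kernel API. */
struct clk_demo {
	int refs;
};

struct priv_demo {
	struct clk_demo *priv;	/* NULL when a fixed frequency is used */
};

static void clk_put_demo(struct clk_demo *clk)
{
	assert(clk != NULL);	/* the posted check would pass NULL here */
	clk->refs--;
}

/* Release the clock only if one was stored; returns 1 if it was released. */
static int release_clock(struct priv_demo *p)
{
	if (p->priv) {		/* the posted code tests !p->priv instead */
		clk_put_demo(p->priv);
		p->priv = NULL;
		return 1;
	}
	return 0;
}
```

With a stored clock, release_clock() drops the reference once and then
becomes a no-op; in the fixed-frequency case it never touches the clock.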


-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: [PATCH 7/7] netfilter: add user-space connection tracking helper infrastructure
From: Jan Engelhardt @ 2012-06-04 14:04 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1338812485-4232-8-git-send-email-pablo@netfilter.org>


On Monday 2012-06-04 14:21, pablo@netfilter.org wrote:
>+static int
>+nfnl_cthelper_from_nlattr(struct nlattr *attr, struct nf_conn *ct)
>+{
>+	const struct nf_conn_help *help = nfct_help(ct);
>+
>+	if (help->helper->data_len == 0)
>+		return -EINVAL;
>+
>+	memcpy(&help->data, nla_data(attr), help->helper->data_len);

memcpy(help->data, ...)

>+static int
>+nfnl_cthelper_to_nlattr(struct sk_buff *skb, const struct nf_conn *ct)
>+{
>+	const struct nf_conn_help *help = nfct_help(ct);
>+
>+	if (help->helper->data_len &&
>+	    nla_put(skb, CTA_HELP_INFO, help->helper->data_len, &help->data))
>+		goto nla_put_failure;

help->data

^ permalink raw reply

* Re: [PATCH net-next] net: netdev_alloc_skb() use build_skb()
From: Eric Dumazet @ 2012-06-04 14:01 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Willy Tarreau, David Miller, netdev
In-Reply-To: <20120604134138.GA29814@redhat.com>

On Mon, 2012-06-04 at 16:41 +0300, Michael S. Tsirkin wrote:

> This is generally what virtio does, take a look:
> page_to_skb fills the first fragment and receive_mergeable fills the
> rest (other modes are for legacy hardware).
> 
> The way hypervisor now works is this (we call it mergeable buffers):
> 
> - pages are passed to hardware
> - hypervisor puts virtio specific stuff in first 12 bytes
>   on first page
> - following this, the rest of the first page and all following
>   pages have data
> 
> The driver gets the 1st page, allocates the skb, copies out the 12 byte
> header and copies the first 128 bytes of data into skb.
> The rest if any is populated by the pages.
> 
> So I guess I'm asking for advice, would it make sense to switch to build_skb
> and how best to handle the data copying above? Maybe it would help
> if we changed the hypervisor to write the 12 bytes separately?
>   

Thanks for these details.

Not sure 12 bytes of headroom would be enough (instead of the
NET_SKB_PAD reserved in netdev_alloc_skb_ip_align()), but what could be
done indeed is to use the first page as the skb->head, so using
build_skb() indeed, removing one fragment, one (small) copy and one
{put|get}_page() pair.

^ permalink raw reply

* [PATCH net-next] sock_diag: add SK_MEMINFO_BACKLOG
From: Eric Dumazet @ 2012-06-04 13:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

Adding the socket backlog length to INET_DIAG_SKMEMINFO is really useful for
diagnosing various TCP problems.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/sock_diag.h |    1 +
 net/core/sock_diag.c      |    1 +
 2 files changed, 2 insertions(+)

diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h
index db4bae7..6793fac 100644
--- a/include/linux/sock_diag.h
+++ b/include/linux/sock_diag.h
@@ -18,6 +18,7 @@ enum {
 	SK_MEMINFO_FWD_ALLOC,
 	SK_MEMINFO_WMEM_QUEUED,
 	SK_MEMINFO_OPTMEM,
+	SK_MEMINFO_BACKLOG,
 
 	SK_MEMINFO_VARS,
 };
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 5fd1467..0d934ce 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -46,6 +46,7 @@ int sock_diag_put_meminfo(struct sock *sk, struct sk_buff *skb, int attrtype)
 	mem[SK_MEMINFO_FWD_ALLOC] = sk->sk_forward_alloc;
 	mem[SK_MEMINFO_WMEM_QUEUED] = sk->sk_wmem_queued;
 	mem[SK_MEMINFO_OPTMEM] = atomic_read(&sk->sk_omem_alloc);
+	mem[SK_MEMINFO_BACKLOG] = sk->sk_backlog.len;
 
 	return 0;
 

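For readers unfamiliar with the idiom the patch relies on, here is a small
userspace sketch (an illustration, not kernel code). The DEMO_ names,
struct sock_demo and fill_meminfo() are made up for this example. Because
the new entry is added just before the trailing sentinel, every array sized
by the sentinel grows by one slot automatically.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Demo enum mirroring the shape of the one in the diff (names are stand-ins). */
enum {
	DEMO_MEMINFO_FWD_ALLOC,
	DEMO_MEMINFO_WMEM_QUEUED,
	DEMO_MEMINFO_OPTMEM,
	DEMO_MEMINFO_BACKLOG,	/* new entry goes before the sentinel */

	DEMO_MEMINFO_VARS,	/* sentinel: number of entries above */
};

/* Hypothetical socket state, reduced to the fields used here. */
struct sock_demo {
	uint32_t forward_alloc;
	uint32_t wmem_queued;
	uint32_t omem_alloc;
	uint32_t backlog_len;
};

/* Fill one slot per enum entry; the array is sized by the sentinel. */
static void fill_meminfo(const struct sock_demo *sk,
			 uint32_t mem[DEMO_MEMINFO_VARS])
{
	assert(sk != NULL);
	mem[DEMO_MEMINFO_FWD_ALLOC] = sk->forward_alloc;
	mem[DEMO_MEMINFO_WMEM_QUEUED] = sk->wmem_queued;
	mem[DEMO_MEMINFO_OPTMEM] = sk->omem_alloc;
	mem[DEMO_MEMINFO_BACKLOG] = sk->backlog_len;
}
```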
^ permalink raw reply related

* Re: [PATCH net-next] net: netdev_alloc_skb() use build_skb()
From: Michael S. Tsirkin @ 2012-06-04 13:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Willy Tarreau, David Miller, netdev
In-Reply-To: <1338815213.2760.1806.camel@edumazet-glaptop>

On Mon, Jun 04, 2012 at 03:06:53PM +0200, Eric Dumazet wrote:
> On Mon, 2012-06-04 at 15:37 +0300, Michael S. Tsirkin wrote:
> > On Thu, May 17, 2012 at 07:34:16PM +0200, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > > 
> > > Please note I havent tested yet this patch, lacking hardware for this.
> > > 
> > > (tg3/bnx2/bnx2x use build_skb, r8169 does a copy of incoming frames,
> > > ixgbe uses fragments...)
> > 
> > virtio-net uses netdev_alloc_skb but maybe it should call
> > build_skb instead?
> > 
> > Also, it's not uncommon for drivers to copy short packets out to be able
> > to reuse pages.  virtio does this but I am guessing the logic is not
> > really virtio specific.
> > 
> > We could do
> > 	if (len < GOOD_COPY_LEN)
> > 		netdev_alloc_skb
> > 		memmov
> > 	else
> > 		build_skb
> > 
> > but maybe it makes sense to put this logic in build_skb?
> > 
> > 
> 
> I am not sure I understand the question.
> 
> If virtio-net uses netdev_alloc_skb(), all is good, you have nothing to
> change.
> 
> build_skb() is for drivers that allocate the memory to hold frame, and
> wait for NIC completion before allocating/populating the skb itself.
> 


This is generally what virtio does, take a look:
page_to_skb fills the first fragment and receive_mergeable fills the
rest (other modes are for legacy hardware).

The way hypervisor now works is this (we call it mergeable buffers):

- pages are passed to hardware
- hypervisor puts virtio specific stuff in first 12 bytes
  on first page
- following this, the rest of the first page and all following
  pages have data

The driver gets the 1st page, allocates the skb, copies out the 12 byte
header and copies the first 128 bytes of data into skb.
The rest if any is populated by the pages.

So I guess I'm asking for advice, would it make sense to switch to build_skb
and how best to handle the data copying above? Maybe it would help
if we changed the hypervisor to write the 12 bytes separately?
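
The receive steps described above can be sketched in plain userspace C.
This is only an illustration of the layout, not the virtio-net code:
split_first_page() and struct rx_split are hypothetical names, and the
constants are just the 12-byte header and 128-byte copy length mentioned
in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VNET_HDR_LEN	12	/* header size quoted in the mail */
#define GOOD_COPY_LEN	128	/* copy length quoted in the mail */

/* Hypothetical result type: where each part of the first page ends up. */
struct rx_split {
	size_t hdr_len;		/* bytes copied out as the virtio header */
	size_t linear_len;	/* bytes copied into the skb linear area */
	size_t frag_off;	/* page offset where the fragment starts */
	size_t frag_len;	/* bytes left in the page as a fragment */
};

/* Split the first page: header, small linear copy, rest stays in the page. */
static struct rx_split split_first_page(const unsigned char *page, size_t len,
					unsigned char *hdr,
					unsigned char *linear)
{
	struct rx_split s = { 0, 0, 0, 0 };
	size_t data_len;

	assert(len >= VNET_HDR_LEN);

	memcpy(hdr, page, VNET_HDR_LEN);
	s.hdr_len = VNET_HDR_LEN;

	data_len = len - VNET_HDR_LEN;
	s.linear_len = data_len < GOOD_COPY_LEN ? data_len : GOOD_COPY_LEN;
	memcpy(linear, page + VNET_HDR_LEN, s.linear_len);

	s.frag_off = VNET_HDR_LEN + s.linear_len;
	s.frag_len = data_len - s.linear_len;
	return s;
}
```

For a 1500-byte frame this leaves 128 bytes in the linear area and 1372
bytes in the page, so the payload is split across two places, which is the
cost being discussed. A frame of 128 bytes or less is copied entirely and
the page can be reused.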
  


-- 
MST

^ permalink raw reply

* Re: [PATCH 4/7] netfilter: add glue code to integrate nfnetlink_queue and ctnetlink
From: Jan Engelhardt @ 2012-06-04 13:38 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1338812485-4232-5-git-send-email-pablo@netfilter.org>


On Monday 2012-06-04 14:21, pablo@netfilter.org wrote:
>+static int
>+ctnetlink_nfqueue_parse(const struct nlattr *attr, struct nf_conn *ct)
>+{
>+	const struct nlattr * const cda[CTA_MAX+1];

I suppose you wrote that because the same appears in function
headers/signatures

	void foo(const struct nlattr *const tb[]) { ... }

But there, it is actually equal to

	void foo(const struct nlattr *const *tb) { ... }

In either case, tb is writable. IMHO, [] should be avoided in
signatures to avoid self-confusion, as it seemed to occur in your
case, where cda is - unlike tb - really marked const.
You likely wanted

	const struct nlattr *cda[CTA_MAX+1];
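
A small userspace sketch of the difference, under stated assumptions:
struct attr_demo, parse_demo() and first_after_skip() are hypothetical
stand-ins, and parse_demo() does not reproduce the real nla_parse_nested()
signature. It only shows that an array of pointers-to-const can be filled
without a cast, and that a "[]" parameter is an ordinary, writable pointer.

```c
#include <assert.h>
#include <stddef.h>

#define ATTR_MAX 3	/* stand-in for CTA_MAX */

struct attr_demo {
	int type;
};

/* Stand-in parser: writes the slots, never modifies the attributes. */
static int parse_demo(const struct attr_demo *tb[], int maxtype,
		      const struct attr_demo *attrs, size_t n)
{
	int stored = 0;
	size_t i;

	assert(tb != NULL);
	for (i = 0; i <= (size_t)maxtype; i++)
		tb[i] = NULL;
	for (i = 0; i < n; i++) {
		if (attrs[i].type >= 0 && attrs[i].type <= maxtype) {
			tb[attrs[i].type] = &attrs[i];
			stored++;
		}
	}
	return stored;
}

/* In a signature, "tb[]" is a pointer, so tb itself can be reassigned. */
static const struct attr_demo *first_after_skip(const struct attr_demo *const tb[])
{
	tb++;	/* legal: same as "const struct attr_demo *const *tb" */
	return tb[0];
}
```

Declaring the local array as "const struct attr_demo *const cda[...]"
instead would make every slot read-only, and parse_demo() could then only
be called with a cast that discards the qualifier.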

>+       nla_parse_nested((struct nlattr **)cda, CTA_MAX, attr, ct_nla_policy);



^ permalink raw reply

* generic module for c-can on pci
From: Federico Vaga @ 2012-06-04 13:32 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde
  Cc: Federico Vaga, Giancarlo Asnaghi, Alan Cox, Alessandro Rubini,
	linux-can, netdev, linux-kernel
In-Reply-To: <4FC135C6.5030206@grandegger.com>


As suggested, I developed a generic module for C_CAN
on PCI. I will probably make some changes for our
specific board, but I think the module is generic
enough.

^ permalink raw reply

* [PATCH RFC] c_can_pci: generic module for c_can on PCI
From: Federico Vaga @ 2012-06-04 13:32 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde
  Cc: Federico Vaga, Giancarlo Asnaghi, Alan Cox, Alessandro Rubini,
	linux-can, netdev, linux-kernel
In-Reply-To: <1338816766-7089-1-git-send-email-federico.vaga@gmail.com>

Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
Cc: Alan Cox <alan@linux.intel.com>
---
 drivers/net/can/c_can/Kconfig     |   11 +-
 drivers/net/can/c_can/Makefile    |    1 +
 drivers/net/can/c_can/c_can_pci.c |  221 +++++++++++++++++++++++++++++++++++++
 3 files changed, 230 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/can/c_can/c_can_pci.c

diff --git a/drivers/net/can/c_can/Kconfig b/drivers/net/can/c_can/Kconfig
index ffb9773..74ef97d 100644
--- a/drivers/net/can/c_can/Kconfig
+++ b/drivers/net/can/c_can/Kconfig
@@ -2,14 +2,19 @@ menuconfig CAN_C_CAN
 	tristate "Bosch C_CAN devices"
 	depends on CAN_DEV && HAS_IOMEM
 
-if CAN_C_CAN
-
 config CAN_C_CAN_PLATFORM
 	tristate "Generic Platform Bus based C_CAN driver"
+	depends on CAN_C_CAN
 	---help---
 	  This driver adds support for the C_CAN chips connected to
 	  the "platform bus" (Linux abstraction for directly to the
 	  processor attached devices) which can be found on various
 	  boards from ST Microelectronics (http://www.st.com)
 	  like the SPEAr1310 and SPEAr320 evaluation boards.
-endif
+
+config CAN_C_CAN_PCI
+	tristate "Generic PCI Bus based C_CAN driver"
+	depends on CAN_C_CAN
+	---help---
+	  This driver adds support for the C_CAN chips connected to
+	  the PCI bus.
diff --git a/drivers/net/can/c_can/Makefile b/drivers/net/can/c_can/Makefile
index 9273f6d..ad1cc84 100644
--- a/drivers/net/can/c_can/Makefile
+++ b/drivers/net/can/c_can/Makefile
@@ -4,5 +4,6 @@
 
 obj-$(CONFIG_CAN_C_CAN) += c_can.o
 obj-$(CONFIG_CAN_C_CAN_PLATFORM) += c_can_platform.o
+obj-$(CONFIG_CAN_C_CAN_PCI) += c_can_pci.o
 
 ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c
new file mode 100644
index 0000000..b635375
--- /dev/null
+++ b/drivers/net/can/c_can/c_can_pci.c
@@ -0,0 +1,221 @@
+/*
+ * Platform CAN bus driver for Bosch C_CAN controller
+ *
+ * Copyright (C) 2012 Federico Vaga <federico.vaga@gmail.com>
+  *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/clk.h>
+#include <linux/pci.h>
+#include <linux/can/dev.h>
+
+#include "c_can.h"
+
+enum c_can_pci_reg_align {
+	C_CAN_REG_ALIGN_16,
+	C_CAN_REG_ALIGN_32,
+};
+
+struct c_can_pci_data {
+	unsigned int reg_align;	/* Set the register alignment in the memory */
+	unsigned int freq;	/* Set the frequency if clk is not usable */
+};
+
+/*
+ * 16-bit c_can registers can be arranged differently in the memory
+ * architecture of different implementations. For example: 16-bit
+ * registers can be aligned to a 16-bit boundary or 32-bit boundary etc.
+ * Handle the same by providing a common read/write interface.
+ */
+static u16 c_can_pci_read_reg_aligned_to_16bit(struct c_can_priv *priv,
+						void *reg)
+{
+	return readw(reg);
+}
+
+static void c_can_pci_write_reg_aligned_to_16bit(struct c_can_priv *priv,
+						void *reg, u16 val)
+{
+	writew(val, reg);
+}
+
+static u16 c_can_pci_read_reg_aligned_to_32bit(struct c_can_priv *priv,
+						void *reg)
+{
+	return readw(reg + (long)reg - (long)priv->regs);
+}
+
+static void c_can_pci_write_reg_aligned_to_32bit(struct c_can_priv *priv,
+						void *reg, u16 val)
+{
+	writew(val, reg + (long)reg - (long)priv->regs);
+}
+
+static int __devinit c_can_pci_probe(struct pci_dev *pdev,
+				     const struct pci_device_id *ent)
+{
+	struct c_can_pci_data *c_can_pci_data = (void *)ent->driver_data;
+	struct c_can_priv *priv;
+	struct net_device *dev;
+	void __iomem *addr;
+	struct clk *clk;
+	int ret;
+
+	ret = pci_enable_device(pdev);
+	if (ret) {
+		dev_err(&pdev->dev, "pci_enable_device FAILED\n");
+		goto out;
+	}
+
+	ret = pci_request_regions(pdev, KBUILD_MODNAME);
+	if (ret) {
+		dev_err(&pdev->dev, "pci_request_regions FAILED\n");
+		goto out_disable_device;
+	}
+
+	pci_set_master(pdev);
+	pci_enable_msi(pdev);
+
+	addr = pci_iomap(pdev, 0, pci_resource_len(pdev, 0));
+	if (!addr) {
+		dev_err(&pdev->dev,
+			"device has no PCI memory resources, "
+			"failing adapter\n");
+		ret = -ENOMEM;
+		goto out_release_regions;
+	}
+
+	/* allocate the c_can device */
+	dev = alloc_c_can_dev();
+	if (!dev) {
+		ret = -ENOMEM;
+		goto out_iounmap;
+	}
+
+	priv = netdev_priv(dev);
+	pci_set_drvdata(pdev, dev);
+	SET_NETDEV_DEV(dev, &pdev->dev);
+
+	dev->irq = pdev->irq;
+	priv->regs = addr;
+
+	if (!c_can_pci_data->freq) {
+		/* get the appropriate clk */
+		clk = clk_get(&pdev->dev, NULL);
+		if (IS_ERR(clk)) {
+			dev_err(&pdev->dev, "no clock defined\n");
+			ret = -ENODEV;
+			goto out_free_c_can;
+		}
+		priv->can.clock.freq = clk_get_rate(clk);
+		priv->priv = clk;
+	} else {
+		priv->can.clock.freq = c_can_pci_data->freq;
+		priv->priv = NULL;
+	}
+
+	switch (c_can_pci_data->reg_align) {
+	case C_CAN_REG_ALIGN_32:
+		priv->read_reg = c_can_pci_read_reg_aligned_to_32bit;
+		priv->write_reg = c_can_pci_write_reg_aligned_to_32bit;
+		break;
+	case C_CAN_REG_ALIGN_16:
+	default:
+		priv->read_reg = c_can_pci_read_reg_aligned_to_16bit;
+		priv->write_reg = c_can_pci_write_reg_aligned_to_16bit;
+		break;
+	}
+
+	ret = register_c_can_dev(dev);
+	if (ret) {
+		dev_err(&pdev->dev, "registering %s failed (err=%d)\n",
+			KBUILD_MODNAME, ret);
+		goto out_free_clock;
+	}
+
+	dev_info(&pdev->dev, "%s device registered (regs=%p, irq=%d)\n",
+		 KBUILD_MODNAME, priv->regs, dev->irq);
+
+	return 0;
+
+out_free_clock:
+	if (!priv->priv)
+		clk_put(priv->priv);
+out_free_c_can:
+	pci_set_drvdata(pdev, NULL);
+	free_c_can_dev(dev);
+out_iounmap:
+	pci_iounmap(pdev, addr);
+out_release_regions:
+	pci_disable_msi(pdev);
+	pci_clear_master(pdev);
+	pci_release_regions(pdev);
+out_disable_device:
+	/*
+	 * do not call pci_disable_device on sta2x11 because it
+	 * break all other Bus masters on this EP
+	 */
+	if(pdev->vendor == PCI_VENDOR_ID_STMICRO &&
+	   pdev->device == PCI_DEVICE_ID_STMICRO_CAN)
+		goto out;
+	pci_disable_device(pdev);
+out:
+	return ret;
+}
+
+static void __devexit c_can_pci_remove(struct pci_dev *pdev)
+{
+	struct net_device *dev = pci_get_drvdata(pdev);
+	struct c_can_priv *priv = netdev_priv(dev);
+
+	pci_set_drvdata(pdev, NULL);
+	free_c_can_dev(dev);
+	if (!priv->priv)
+		clk_put(priv->priv);
+	pci_iounmap(pdev, priv->regs);
+	pci_disable_msi(pdev);
+	pci_clear_master(pdev);
+	pci_release_regions(pdev);
+	/*
+	 * do not call pci_disable_device on sta2x11 because it
+	 * break all other Bus masters on this EP
+	 */
+	if(pdev->vendor == PCI_VENDOR_ID_STMICRO &&
+	   pdev->device == PCI_DEVICE_ID_STMICRO_CAN)
+		return;
+	pci_disable_device(pdev);
+}
+
+static struct c_can_pci_data c_can_sta2x11= {
+	.reg_align = C_CAN_REG_ALIGN_32,
+	.freq = 52000000, /* 52 Mhz */
+};
+
+#define C_CAN_ID(_vend, _dev, _driverdata) {		\
+	PCI_DEVICE(_vend, _dev),			\
+	.driver_data = (unsigned long)&_driverdata,	\
+}
+DEFINE_PCI_DEVICE_TABLE(c_can_pci_tbl) = {
+	C_CAN_ID(PCI_VENDOR_ID_STMICRO, PCI_DEVICE_ID_STMICRO_CAN,
+		 c_can_sta2x11),
+	{},
+};
+static struct pci_driver sta2x11_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = c_can_pci_tbl,
+	.probe = c_can_pci_probe,
+	.remove = __devexit_p(c_can_pci_remove),
+};
+
+module_pci_driver(sta2x11_pci_driver);
+
+MODULE_AUTHOR("Federico Vaga <federico.vaga@gmail.com>");
+MODULE_LICENSE("GPL V2");
+MODULE_DESCRIPTION("PCI CAN bus driver for Bosch C_CAN controller");
+MODULE_DEVICE_TABLE(pci, c_can_pci_tbl);
-- 
1.7.10.2

^ permalink raw reply related

* Re: [PATCH 3/7] netfilter: nf_ct_helper: implement variable length helper private data
From: Jan Engelhardt @ 2012-06-04 13:16 UTC (permalink / raw)
  To: Joe Perches; +Cc: pablo, netfilter-devel, netdev
In-Reply-To: <1338815399.8574.10.camel@joe2Laptop>

On Monday 2012-06-04 15:09, Joe Perches wrote:

>On Mon, 2012-06-04 at 15:06 +0200, Jan Engelhardt wrote:
>> On Monday 2012-06-04 14:21, pablo@netfilter.org wrote:
>
>> >@@ -218,13 +221,13 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
>> > 	}
>> > 
>> > 	if (help == NULL) {
>> >-		help = nf_ct_helper_ext_add(ct, flags);
>> >+		help = nf_ct_helper_ext_add(ct, helper, flags);
>> > 		if (help == NULL) {
>> > 			ret = -ENOMEM;
>> > 			goto out;
>> > 		}
>> > 	} else {
>> >-		memset(&help->help, 0, sizeof(help->help));
>> >+		memset(&help->data, 0, sizeof(helper->data_len));
>> > 	}
>> 
>> memset(help->data, 0, sizeof(helper->data_len));
>
>	memset(help->data, 0, helper->data_len);

I knew this looked suspect. With so many "sizeof"s, this spot was 
starting to look like a "mine is bigger" competition.

^ permalink raw reply

* Re: [PATCH 3/7] netfilter: nf_ct_helper: implement variable length helper private data
From: Joe Perches @ 2012-06-04 13:09 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: pablo, netfilter-devel, netdev
In-Reply-To: <alpine.LNX.2.01.1206041456480.16684@frira.zrqbmnf.qr>

On Mon, 2012-06-04 at 15:06 +0200, Jan Engelhardt wrote:
> On Monday 2012-06-04 14:21, pablo@netfilter.org wrote:

> >@@ -218,13 +221,13 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
> > 	}
> > 
> > 	if (help == NULL) {
> >-		help = nf_ct_helper_ext_add(ct, flags);
> >+		help = nf_ct_helper_ext_add(ct, helper, flags);
> > 		if (help == NULL) {
> > 			ret = -ENOMEM;
> > 			goto out;
> > 		}
> > 	} else {
> >-		memset(&help->help, 0, sizeof(help->help));
> >+		memset(&help->data, 0, sizeof(helper->data_len));
> > 	}
> 
> memset(help->data, 0, sizeof(helper->data_len));

	memset(help->data, 0, helper->data_len);
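
To make the difference concrete, a minimal userspace sketch. It is an
illustration only: struct helper_demo is hypothetical, and the size_t type
of data_len is an assumption, not taken from the patch. sizeof() applied to
the field yields the size of its type, not the length stored in it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the helper; the field type is an assumption. */
struct helper_demo {
	size_t data_len;	/* runtime length of the private area */
};

/* As in the patch: clears sizeof(size_t) bytes, whatever data_len says. */
static size_t clear_with_sizeof(char *data, const struct helper_demo *helper)
{
	size_t n = sizeof(helper->data_len);

	assert(data != NULL);
	memset(data, 0, n);
	return n;
}

/* Corrected: clears exactly data_len bytes. */
static size_t clear_with_len(char *data, const struct helper_demo *helper)
{
	size_t n = helper->data_len;

	assert(data != NULL);
	memset(data, 0, n);
	return n;
}
```

With a 32-byte private area, the first variant leaves the tail of the
buffer untouched, so stale data from the old helper would survive.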



^ permalink raw reply

* Re: [PATCH net-next] net: netdev_alloc_skb() use build_skb()
From: Eric Dumazet @ 2012-06-04 13:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Willy Tarreau, David Miller, netdev
In-Reply-To: <20120604123738.GA28992@redhat.com>

On Mon, 2012-06-04 at 15:37 +0300, Michael S. Tsirkin wrote:
> On Thu, May 17, 2012 at 07:34:16PM +0200, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Please note I havent tested yet this patch, lacking hardware for this.
> > 
> > (tg3/bnx2/bnx2x use build_skb, r8169 does a copy of incoming frames,
> > ixgbe uses fragments...)
> 
> virtio-net uses netdev_alloc_skb but maybe it should call
> build_skb instead?
> 
> Also, it's not uncommon for drivers to copy short packets out to be able
> to reuse pages.  virtio does this but I am guessing the logic is not
> really virtio specific.
> 
> We could do
> 	if (len < GOOD_COPY_LEN)
> 		netdev_alloc_skb
> 		memmov
> 	else
> 		build_skb
> 
> but maybe it makes sense to put this logic in build_skb?
> 
> 

I am not sure I understand the question.

If virtio-net uses netdev_alloc_skb(), all is good, you have nothing to
change.

build_skb() is for drivers that allocate the memory to hold frame, and
wait for NIC completion before allocating/populating the skb itself.

^ permalink raw reply

* Re: [PATCH 3/7] netfilter: nf_ct_helper: implement variable length helper private data
From: Jan Engelhardt @ 2012-06-04 13:06 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1338812485-4232-4-git-send-email-pablo@netfilter.org>

On Monday 2012-06-04 14:21, pablo@netfilter.org wrote:

>+static inline void *nfct_help_data(const struct nf_conn *ct)
>+{
>+	struct nf_conn_help *help;
>+
>+	help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
>+
>+	return (void *)&help->data;
>+}

I think you wanted

	return help->data;

here. Remember that help->data is of type char[0] which is
convertible to char* - which is what you want,
while adding an extra & would turn that into the undesired (char[0])*.
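
A small user-space sketch of this point, assuming a simplified stand-in
structure (help_model and the function names are hypothetical). Both
expressions yield the same address; what differs is the pointer type, which
the (void *) cast in the patch hides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct help_model {
	int expecting;
	char data[];	/* flexible array member holding private bytes */
};

/* help->data decays to char *, pointing at the first private byte. */
static char *data_by_decay(struct help_model *help)
{
	assert(help != NULL);
	return help->data;
}

/* &help->data is a pointer to an array; only the cast hides that. */
static void *data_by_address(struct help_model *help)
{
	assert(help != NULL);
	return (void *)&help->data;
}

/* Allocate the model with data_len trailing bytes, zero-initialized. */
static struct help_model *help_alloc(size_t data_len)
{
	return calloc(1, sizeof(struct help_model) + data_len);
}
```

Since the result is the same address either way, the plain help->data form
is the simpler one and needs no cast when returned as void *.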



>@@ -89,12 +59,13 @@ struct nf_conn_help {
> 	/* Helper. if any */
> 	struct nf_conntrack_helper __rcu *helper;
> 
>-	union nf_conntrack_help help;
>-
> 	struct hlist_head expectations;
> 
> 	/* Current number of expected connections */
> 	u8 expecting[NF_CT_MAX_EXPECT_CLASSES];
>+
>+	/* private helper information. */
>+	char data[0];

There is a now-standardized notation:

	char data[];



>@@ -218,13 +221,13 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
> 	}
> 
> 	if (help == NULL) {
>-		help = nf_ct_helper_ext_add(ct, flags);
>+		help = nf_ct_helper_ext_add(ct, helper, flags);
> 		if (help == NULL) {
> 			ret = -ENOMEM;
> 			goto out;
> 		}
> 	} else {
>-		memset(&help->help, 0, sizeof(help->help));
>+		memset(&help->data, 0, sizeof(helper->data_len));
> 	}

memset(help->data, 0, sizeof(helper->data_len));

>index 6f4b00a..30f5e12 100644
>--- a/net/netfilter/nf_conntrack_netlink.c
>+++ b/net/netfilter/nf_conntrack_netlink.c
>@@ -1218,7 +1218,7 @@ ctnetlink_change_helper(struct nf_conn *ct, const struct nlattr * const cda[])
> 		if (help->helper)
> 			return -EBUSY;
> 		/* need to zero data of old helper */
>-		memset(&help->help, 0, sizeof(help->help));
>+		memset(&help->data, 0, help->helper->data_len);

Here too.. memset(help->data,...

^ permalink raw reply

* Re: [PATCH net-next] net: netdev_alloc_skb() use build_skb()
From: Willy Tarreau @ 2012-06-04 12:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Eric Dumazet, David Miller, netdev
In-Reply-To: <20120604123912.GB28992@redhat.com>

On Mon, Jun 04, 2012 at 03:39:12PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 17, 2012 at 07:45:51PM +0200, Willy Tarreau wrote:
> > Impressed !
> > 
> > For the first time I could proxy HTTP traffic at gigabit speed on this
> > little box powered by USB ! I've long believed that proper splicing
> > would make this possible and now I'm seeing it is. Congrats Eric !
> 
> which userspace do you use?

It's haproxy-1.5-dev with splicing enabled.

> anything I can try?

Yes, feel free to download -dev11, build it for kernels >= 2.6.28 and
make a small config to relay TCP/HTTP to another host. Of course, you
need a gigabit-capable client and server.

Willy

^ permalink raw reply

* Re: [PATCH net-next] net: netdev_alloc_skb() use build_skb()
From: Michael S. Tsirkin @ 2012-06-04 12:39 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Eric Dumazet, David Miller, netdev
In-Reply-To: <20120517174551.GN14498@1wt.eu>

On Thu, May 17, 2012 at 07:45:51PM +0200, Willy Tarreau wrote:
> Impressed !
> 
> For the first time I could proxy HTTP traffic at gigabit speed on this
> little box powered by USB ! I've long believed that proper splicing
> would make this possible and now I'm seeing it is. Congrats Eric !

which userspace do you use?
anything I can try?

^ permalink raw reply

* Re: [PATCH net-next] net: netdev_alloc_skb() use build_skb()
From: Michael S. Tsirkin @ 2012-06-04 12:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Willy Tarreau, David Miller, netdev
In-Reply-To: <1337276056.3403.37.camel@edumazet-glaptop>

On Thu, May 17, 2012 at 07:34:16PM +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Please note I havent tested yet this patch, lacking hardware for this.
> 
> (tg3/bnx2/bnx2x use build_skb, r8169 does a copy of incoming frames,
> ixgbe uses fragments...)

virtio-net uses netdev_alloc_skb but maybe it should call
build_skb instead?

Also, it's not uncommon for drivers to copy short packets out to be able
to reuse pages.  virtio does this but I am guessing the logic is not
really virtio specific.

We could do
	if (len < GOOD_COPY_LEN)
		netdev_alloc_skb
		memmov
	else
		build_skb

but maybe it makes sense to put this logic in build_skb?
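
The copy-break idea above can be modelled in user-space as follows. This is
only a sketch under stated assumptions: rx_result and rx_copybreak are
hypothetical names, malloc() stands in for netdev_alloc_skb(), handing over
the page stands in for build_skb(), and the threshold value of 128 is
assumed for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GOOD_COPY_LEN 128	/* threshold; value assumed for illustration */

struct rx_result {
	unsigned char *buf;	/* buffer handed up the stack */
	size_t len;
	int page_reusable;	/* 1 if the receive page can be reposted */
};

/*
 * Short frames are copied into a right-sized buffer so the large receive
 * page can be reused; long frames take over the page itself (zero copy).
 */
static int rx_copybreak(unsigned char *page, size_t len, struct rx_result *res)
{
	assert(page != NULL && res != NULL);

	if (len < GOOD_COPY_LEN) {
		unsigned char *small = malloc(len ? len : 1);

		if (!small)
			return -1;
		memcpy(small, page, len);
		res->buf = small;
		res->page_reusable = 1;
	} else {
		res->buf = page;	/* ownership moves to the result */
		res->page_reusable = 0;
	}
	res->len = len;
	return 0;
}
```

The caller reposts the page to the device only when page_reusable is set;
otherwise the page now belongs to the packet.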


-- 
MST

^ permalink raw reply

* [BUG] vanilla 32bit 3.4.0, lockdep, l2tp_xmit_skb + sch_direct_xmit warning
From: Denys Fedoryshchenko @ 2012-06-04 12:37 UTC (permalink / raw)
  To: netdev, linux-kernel

CBSS_PPPoE ~ # ip l2tp show tunnel
Tunnel 2, encap IP
   From 194.146.153.XX to 194.146.153.YY
   Peer tunnel 2

CBSS_PPPoE ~ # ip l2tp show session
   Session 1 in tunnel 2
   Peer session 1, tunnel 2
   interface name: tun0
   offset 0, peer offset 0

CBSS_PPPoE ~ # ip link show dev tun0
303: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 1000
     link/ether 6e:25:18:ce:8e:3b brd ff:ff:ff:ff:ff:ff

CBSS_PPPoE ~ # ip addr show dev tun0
303: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 1000
     link/ether 6e:25:18:ce:8e:3b brd ff:ff:ff:ff:ff:ff
     inet 10.0.6.2/24 scope global tun0

command was executed: ping 10.0.6.1

         [  135.292915]
         [  135.293008] ======================================================
         [  135.293115] [ INFO: possible circular locking dependency detected ]
         [  135.293221] 3.4.0-build-0061 #12 Not tainted
         [  135.293316] -------------------------------------------------------
         [  135.293420] ping/6404 is trying to acquire lock:
         [  135.293517]  (slock-AF_INET){+.-...}, at: [<f88c83ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
         [  135.293780]
         [  135.293780] but task is already holding lock:
         [  135.293970]  (_xmit_ETHER#2){+.-...}, at: [<c02f09b9>] sch_direct_xmit+0x36/0x119
         [  135.294251]
         [  135.294252] which lock already depends on the new lock.
         [  135.294252]
         [  135.294532]
         [  135.294533] the existing dependency chain (in reverse order) is:
         [  135.294727]
         [  135.294728] -> #1 (_xmit_ETHER#2){+.-...}:
         [  135.295018]        [<c015a6d1>] lock_acquire+0x71/0x85
         [  135.295140]        [<c034ddad>] _raw_spin_lock+0x33/0x40
         [  135.295262]        [<c02e79f2>] neigh_update+0x1d9/0x385
         [  135.295384]        [<c031fff7>] arp_process+0x477/0x491
         [  135.295507]        [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
         [  135.295628]        [<c031fb2c>] arp_rcv+0xb1/0xc3
         [  135.295748]        [<c02deca7>] __netif_receive_skb+0x329/0x378
         [  135.295873]        [<c02dee74>] netif_receive_skb+0x4e/0x7d
         [  135.295997]        [<c02def60>] napi_skb_finish+0x1e/0x34
         [  135.296121]        [<c02df389>] napi_gro_receive+0x20/0x24
         [  135.296245]        [<f8527213>] rtl8169_poll+0x2e6/0x52c [r8169]
         [  135.296374]        [<c02df48f>] net_rx_action+0x90/0x15d
         [  135.296496]        [<c012b42d>] __do_softirq+0x7b/0x118
         [  135.296620]
         [  135.296620] -> #0 (slock-AF_INET){+.-...}:
         [  135.296889]        [<c015a08b>] __lock_acquire+0x9a3/0xc27
         [  135.297010]        [<c015a6d1>] lock_acquire+0x71/0x85
         [  135.297130]        [<c034ddad>] _raw_spin_lock+0x33/0x40
         [  135.297251]        [<f88c83ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
         [  135.297376]        [<f86b11fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
         [  135.297500]        [<c02e0573>] dev_hard_start_xmit+0x333/0x3f2
         [  135.297623]        [<c02f09d8>] sch_direct_xmit+0x55/0x119
         [  135.297745]        [<c02e08b4>] dev_queue_xmit+0x282/0x418
         [  135.297868]        [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
         [  135.297992]        [<c031f8b0>] arp_xmit+0x22/0x24
         [  135.298113]        [<c031f8f3>] arp_send+0x41/0x48
         [  135.300267]        [<c031fa65>] arp_solicit+0x16b/0x181
         [  135.300388]        [<c02e6852>] neigh_probe+0x3c/0x52
         [  135.300509]        [<c02e6e46>] __neigh_event_send+0x1a3/0x1bc
         [  135.300630]        [<c02e8221>] neigh_resolve_output+0x59/0x149
         [  135.300750]        [<c03039e0>] ip_finish_output2+0x1e1/0x21c
         [  135.300871]        [<c0303a50>] ip_finish_output+0x35/0x39
         [  135.300989]        [<c03048c7>] ip_output+0x87/0x8c
         [  135.301110]        [<c03030c6>] dst_output+0x15/0x18
         [  135.301232]        [<c03042d7>] ip_local_out+0x17/0x1a
         [  135.301355]        [<c0304f59>] ip_send_skb+0x12/0x5c
         [  135.301478]        [<c0304fcd>] ip_push_pending_frames+0x2a/0x2e
         [  135.301603]        [<c031b98d>] raw_sendmsg+0x67a/0x749
         [  135.301726]        [<c032445f>] inet_sendmsg+0x53/0x5a
         [  135.301850]        [<c02d0162>] sock_sendmsg+0xaa/0xc2
         [  135.301974]        [<c02d0387>] __sys_sendmsg+0x182/0x20c
         [  135.302098]        [<c02d1518>] sys_sendmsg+0x36/0x4d
         [  135.302219]        [<c02d1a66>] sys_socketcall+0x214/0x27e
         [  135.302344]        [<c034e511>] syscall_call+0x7/0xb
         [  135.302467]
         [  135.302468] other info that might help us debug this:
         [  135.302468]
         [  135.302739]  Possible unsafe locking scenario:
         [  135.302739]
         [  135.302928]        CPU0                    CPU1
         [  135.303022]        ----                    ----
         [  135.303118]   lock(_xmit_ETHER#2);
         [  135.303274]                                lock(slock-AF_INET);
         [  135.303417]                                lock(_xmit_ETHER#2);
         [  135.303582]   lock(slock-AF_INET);
         [  135.303719]
         [  135.303719]  *** DEADLOCK ***
         [  135.303719]
         [  135.303990] 4 locks held by ping/6404:
         [  135.304087]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c031b928>] raw_sendmsg+0x615/0x749
         [  135.304361]  #1:  (rcu_read_lock){.+.+..}, at: [<c0302fad>] rcu_read_lock+0x0/0x35
         [  135.304637]  #2:  (rcu_read_lock_bh){.+....}, at: [<c02dbf9c>] rcu_lock_acquire+0x0/0x30
         [  135.304913]  #3:  (_xmit_ETHER#2){+.-...}, at: [<c02f09b9>] sch_direct_xmit+0x36/0x119
         [  135.305209]
         [  135.305209] stack backtrace:
         [  135.305391] Pid: 6404, comm: ping Not tainted 3.4.0-build-0061 #12
         [  135.305492] Call Trace:
         [  135.305589]  [<c034c156>] ? printk+0x18/0x1a
         [  135.305689]  [<c0158a74>] print_circular_bug+0x1ac/0x1b6
         [  135.305790]  [<c015a08b>] __lock_acquire+0x9a3/0xc27
         [  135.305890]  [<c0159500>] ? valid_state+0x1d4/0x201
         [  135.305989]  [<c019d0d6>] ? __slab_alloc.clone.59.clone.64+0xc4/0x2de
         [  135.306097]  [<c015a6d1>] lock_acquire+0x71/0x85
         [  135.306197]  [<f88c83ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
         [  135.306301]  [<c034ddad>] _raw_spin_lock+0x33/0x40
         [  135.306401]  [<f88c83ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
         [  135.306506]  [<f88c83ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
         [  135.306609]  [<c014f946>] ? timekeeping_get_ns+0xf/0x46
         [  135.306710]  [<f86b11fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
         [  135.306815]  [<c02e0573>] dev_hard_start_xmit+0x333/0x3f2
         [  135.306919]  [<c02f09d8>] sch_direct_xmit+0x55/0x119
         [  135.307016]  [<c02e08b4>] dev_queue_xmit+0x282/0x418
         [  135.307112]  [<c02e0632>] ? dev_hard_start_xmit+0x3f2/0x3f2
         [  135.307220]  [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
         [  135.307322]  [<c031f8b0>] arp_xmit+0x22/0x24
         [  135.307428]  [<c02e0632>] ? dev_hard_start_xmit+0x3f2/0x3f2
         [  135.307529]  [<c031f8f3>] arp_send+0x41/0x48
         [  135.307625]  [<c031fa65>] arp_solicit+0x16b/0x181
         [  135.307721]  [<c02e6852>] neigh_probe+0x3c/0x52
         [  135.307821]  [<c02e6e46>] __neigh_event_send+0x1a3/0x1bc
         [  135.307923]  [<c02e8221>] neigh_resolve_output+0x59/0x149
         [  135.308025]  [<c0302fe0>] ? rcu_read_lock+0x33/0x35
         [  135.308125]  [<c03039e0>] ip_finish_output2+0x1e1/0x21c
         [  135.308225]  [<c02fcce6>] ? ipv4_mtu+0x36/0x65
         [  135.308326]  [<c0303a50>] ip_finish_output+0x35/0x39
         [  135.308426]  [<c03048c7>] ip_output+0x87/0x8c
         [  135.308523]  [<c0303a1b>] ? ip_finish_output2+0x21c/0x21c
         [  135.308624]  [<c03030c6>] dst_output+0x15/0x18
         [  135.308721]  [<c03042d7>] ip_local_out+0x17/0x1a
         [  135.308821]  [<c0304f59>] ip_send_skb+0x12/0x5c
         [  135.308922]  [<c0304fcd>] ip_push_pending_frames+0x2a/0x2e
         [  135.309022]  [<c031b98d>] raw_sendmsg+0x67a/0x749
         [  135.309119]  [<c032445f>] inet_sendmsg+0x53/0x5a
         [  135.309219]  [<c02d0162>] sock_sendmsg+0xaa/0xc2
         [  135.309318]  [<c015a397>] ? lock_release_non_nested+0x88/0x20b
         [  135.309420]  [<c01898a0>] ? might_fault+0x2d/0x79
         [  135.309520]  [<c01898e6>] ? might_fault+0x73/0x79
         [  135.309620]  [<c02d8400>] ? copy_from_user+0x8/0xa
         [  135.309717]  [<c02d8729>] ? verify_iovec+0x3e/0x75
         [  135.309817]  [<c02d0387>] __sys_sendmsg+0x182/0x20c
         [  135.309919]  [<c0159553>] ? mark_lock+0x26/0x1bb
         [  135.310019]  [<c0159bb6>] ? __lock_acquire+0x4ce/0xc27
         [  135.310120]  [<c018b700>] ? handle_pte_fault+0x284/0x93d
         [  135.310221]  [<c015a397>] ? lock_release_non_nested+0x88/0x20b
         [  135.310328]  [<c0188dee>] ? page_address+0x8a/0x9f
         [  135.310430]  [<c01898a0>] ? might_fault+0x2d/0x79
         [  135.310531]  [<c01a42e2>] ? fget_light+0x2b/0x7c
         [  135.310631]  [<c02d1518>] sys_sendmsg+0x36/0x4d
         [  135.310725]  [<c02d1a66>] sys_socketcall+0x214/0x27e
         [  135.310827]  [<c034e544>] ? restore_all+0xf/0xf
         [  135.310929]  [<c011e817>] ? vmalloc_sync_all+0x5/0x5
         [  135.311032]  [<c022d2ec>] ? trace_hardirqs_on_thunk+0xc/0x10
         [  135.311135]  [<c034e511>] syscall_call+0x7/0xb
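
For readers unfamiliar with this kind of report, the check behind it can be
sketched as a toy lock-order graph. This is not lockdep's implementation,
only a minimal model under assumed names (dep, reachable, lock_order_add
and the enum are hypothetical): every "B taken while A is held" pair is
recorded as an edge A -> B, and a new edge that closes a cycle is flagged.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCKS 8

/* Lock classes from the report, used as graph node indices. */
enum { SLOCK_AF_INET, XMIT_ETHER };

/* dep[a][b] != 0 means "b was taken while a was held" (edge a -> b). */
static unsigned char dep[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static int reachable(int from, int to, unsigned char *seen)
{
	if (from == to)
		return 1;
	seen[from] = 1;
	for (int next = 0; next < MAX_LOCKS; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return 1;
	return 0;
}

/*
 * Record that 'taking' is acquired while 'held' is held.
 * Returns 1 if this would close a cycle (possible deadlock), 0 otherwise.
 */
static int lock_order_add(int held, int taking)
{
	unsigned char seen[MAX_LOCKS];

	assert(held >= 0 && held < MAX_LOCKS);
	assert(taking >= 0 && taking < MAX_LOCKS);

	memset(seen, 0, sizeof(seen));
	if (reachable(taking, held, seen))
		return 1;	/* taking -> ... -> held already exists */
	dep[held][taking] = 1;
	return 0;
}
```

In terms of the scenario table above, recording slock-AF_INET ->
_xmit_ETHER#2 first and then attempting _xmit_ETHER#2 -> slock-AF_INET
makes the second call report a cycle.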
---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

^ permalink raw reply

* [PATCH 4/7] netfilter: add glue code to integrate nfnetlink_queue and ctnetlink
From: pablo @ 2012-06-04 12:21 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev
In-Reply-To: <1338812485-4232-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

This patch allows you to include the conntrack information together
with the packet that is sent to user-space via NFQUEUE.

Previously, there was no integration between ctnetlink and
nfnetlink_queue. If you wanted to access conntrack information
from your libnetfilter_queue program, you had to query
ctnetlink from user-space to obtain it, delaying packet
processing even more.

Including the conntrack information is optional; you can enable it
via the NFQNL_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute.

This change provides the required features to use nfnetlink_queue
as the user-space queueing infrastructure for the follow-up patch
that introduces user-space conntrack helpers.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
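
The flag's wire format can be sketched in user-space as below. This is a
minimal model, not libnetfilter_queue code: the enum values are copied from
this patch, while cfg_flags_encode and cfg_flags_decode are hypothetical
helper names. It assumes the attribute payload is a 32-bit value in network
byte order, which is what the ntohl() in nfqnl_recv_config() implies.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the patch, not taken from a released header. */
enum nfqnl_flags_model {
	NFQNL_F_NONE		= 0,
	NFQNL_F_CONNTRACK	= (1 << 0),
};

/* Sender side: store the flags as a big-endian 32-bit payload. */
static void cfg_flags_encode(uint32_t flags, unsigned char payload[4])
{
	uint32_t be = htonl(flags);

	assert(payload != NULL);
	memcpy(payload, &be, sizeof(be));
}

/* Receiver side: mirrors queue->flags = ntohl(*flags) in the patch. */
static uint32_t cfg_flags_decode(const unsigned char payload[4])
{
	uint32_t be;

	assert(payload != NULL);
	memcpy(&be, payload, sizeof(be));
	return ntohl(be);
}
```

A sender would place the four encoded bytes in an NFQA_CFG_FLAGS attribute
of its configuration message; how that message is built depends on the
netlink library in use.
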
 include/linux/netfilter.h                 |   10 ++
 include/linux/netfilter/nfnetlink_queue.h |    7 ++
 net/netfilter/core.c                      |    4 +
 net/netfilter/nf_conntrack_netlink.c      |  158 ++++++++++++++++++++++++++++-
 net/netfilter/nfnetlink_queue.c           |   62 +++++++++++
 5 files changed, 240 insertions(+), 1 deletion(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index ff9c84c..a08dcb6 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -383,6 +383,16 @@ nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family)
 extern void (*ip_ct_attach)(struct sk_buff *, struct sk_buff *) __rcu;
 extern void nf_ct_attach(struct sk_buff *, struct sk_buff *);
 extern void (*nf_ct_destroy)(struct nf_conntrack *) __rcu;
+
+struct nf_conn;
+struct nlattr;
+
+struct nfq_ct_hook {
+	size_t (*build_size)(const struct nf_conn *ct);
+	int (*build)(struct sk_buff *skb, struct nf_conn *ct);
+	int (*parse)(const struct nlattr *attr, struct nf_conn *ct);
+};
+extern struct nfq_ct_hook *nfq_ct_hook;
 #else
 static inline void nf_ct_attach(struct sk_buff *new, struct sk_buff *skb) {}
 #endif
diff --git a/include/linux/netfilter/nfnetlink_queue.h b/include/linux/netfilter/nfnetlink_queue.h
index 24b32e6..da44b33 100644
--- a/include/linux/netfilter/nfnetlink_queue.h
+++ b/include/linux/netfilter/nfnetlink_queue.h
@@ -42,6 +42,8 @@ enum nfqnl_attr_type {
 	NFQA_IFINDEX_PHYSOUTDEV,	/* __u32 ifindex */
 	NFQA_HWADDR,			/* nfqnl_msg_packet_hw */
 	NFQA_PAYLOAD,			/* opaque data payload */
+	NFQA_CT,			/* nf_conntrack_netlink.h */
+	NFQA_CT_INFO,			/* enum ip_conntrack_info */
 
 	__NFQA_MAX
 };
@@ -78,12 +80,17 @@ struct nfqnl_msg_config_params {
 	__u8	copy_mode;	/* enum nfqnl_config_mode */
 } __attribute__ ((packed));
 
+enum nfqnl_flags {
+	NFQNL_F_NONE		= 0,
+	NFQNL_F_CONNTRACK	= (1 << 0),
+};
 
 enum nfqnl_attr_config {
 	NFQA_CFG_UNSPEC,
 	NFQA_CFG_CMD,			/* nfqnl_msg_config_cmd */
 	NFQA_CFG_PARAMS,		/* nfqnl_msg_config_params */
 	NFQA_CFG_QUEUE_MAXLEN,		/* __u32 */
+	NFQA_CFG_FLAGS,			/* __u32 */
 	__NFQA_CFG_MAX
 };
 #define NFQA_CFG_MAX (__NFQA_CFG_MAX-1)
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index e19f365..7eef845 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -264,6 +264,10 @@ void nf_conntrack_destroy(struct nf_conntrack *nfct)
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(nf_conntrack_destroy);
+
+struct nfq_ct_hook *nfq_ct_hook;
+EXPORT_SYMBOL_GPL(nfq_ct_hook);
+
 #endif /* CONFIG_NF_CONNTRACK */
 
 #ifdef CONFIG_PROC_FS
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 30f5e12..28ac04c 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1620,6 +1620,152 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
 	return err;
 }
 
+#if defined(CONFIG_NETFILTER_NETLINK_QUEUE) ||	\
+    defined(CONFIG_NETFILTER_NETLINK_QUEUE_MODULE)
+static size_t
+ctnetlink_nfqueue_build_size(const struct nf_conn *ct)
+{
+	return 3 * nla_total_size(0) /* CTA_TUPLE_ORIG|REPL|MASTER */
+	       + 3 * nla_total_size(0) /* CTA_TUPLE_IP */
+	       + 3 * nla_total_size(0) /* CTA_TUPLE_PROTO */
+	       + 3 * nla_total_size(sizeof(u_int8_t)) /* CTA_PROTO_NUM */
+	       + nla_total_size(sizeof(u_int32_t)) /* CTA_ID */
+	       + nla_total_size(sizeof(u_int32_t)) /* CTA_STATUS */
+	       + nla_total_size(sizeof(u_int32_t)) /* CTA_TIMEOUT */
+	       + nla_total_size(0) /* CTA_PROTOINFO */
+	       + nla_total_size(0) /* CTA_HELP */
+	       + nla_total_size(NF_CT_HELPER_NAME_LEN) /* CTA_HELP_NAME */
+	       + ctnetlink_secctx_size(ct)
+#ifdef CONFIG_NF_NAT_NEEDED
+	       + 2 * nla_total_size(0) /* CTA_NAT_SEQ_ADJ_ORIG|REPL */
+	       + 6 * nla_total_size(sizeof(u_int32_t)) /* CTA_NAT_SEQ_OFFSET */
+#endif
+#ifdef CONFIG_NF_CONNTRACK_MARK
+	       + nla_total_size(sizeof(u_int32_t)) /* CTA_MARK */
+#endif
+	       + ctnetlink_proto_size(ct)
+	       ;
+}
+
+static int
+ctnetlink_nfqueue_build(struct sk_buff *skb, struct nf_conn *ct)
+{
+	struct nlattr *nest_parms;
+
+	rcu_read_lock();
+	nest_parms = nla_nest_start(skb, CTA_TUPLE_ORIG | NLA_F_NESTED);
+	if (!nest_parms)
+		goto nla_put_failure;
+	if (ctnetlink_dump_tuples(skb, nf_ct_tuple(ct, IP_CT_DIR_ORIGINAL)) < 0)
+		goto nla_put_failure;
+	nla_nest_end(skb, nest_parms);
+
+	nest_parms = nla_nest_start(skb, CTA_TUPLE_REPLY | NLA_F_NESTED);
+	if (!nest_parms)
+		goto nla_put_failure;
+	if (ctnetlink_dump_tuples(skb, nf_ct_tuple(ct, IP_CT_DIR_REPLY)) < 0)
+		goto nla_put_failure;
+	nla_nest_end(skb, nest_parms);
+
+	if (nf_ct_zone(ct)) {
+		if (nla_put_be16(skb, CTA_ZONE, htons(nf_ct_zone(ct))))
+			goto nla_put_failure;
+	}
+
+	if (ctnetlink_dump_id(skb, ct) < 0)
+		goto nla_put_failure;
+
+	if (ctnetlink_dump_status(skb, ct) < 0)
+		goto nla_put_failure;
+
+	if (ctnetlink_dump_timeout(skb, ct) < 0)
+		goto nla_put_failure;
+
+	if (ctnetlink_dump_protoinfo(skb, ct) < 0)
+		goto nla_put_failure;
+
+	if (ctnetlink_dump_helpinfo(skb, ct) < 0)
+		goto nla_put_failure;
+
+#ifdef CONFIG_NF_CONNTRACK_SECMARK
+	if (ct->secmark && ctnetlink_dump_secctx(skb, ct) < 0)
+		goto nla_put_failure;
+#endif
+	if (ct->master && ctnetlink_dump_master(skb, ct) < 0)
+		goto nla_put_failure;
+
+	if ((ct->status & IPS_SEQ_ADJUST) &&
+	    ctnetlink_dump_nat_seq_adj(skb, ct) < 0)
+		goto nla_put_failure;
+
+#ifdef CONFIG_NF_CONNTRACK_MARK
+	if (ct->mark && ctnetlink_dump_mark(skb, ct) < 0)
+		goto nla_put_failure;
+#endif
+	rcu_read_unlock();
+	return 0;
+
+nla_put_failure:
+	rcu_read_unlock();
+	return -ENOSPC;
+}
+
+static int
+ctnetlink_nfqueue_parse(const struct nlattr *attr, struct nf_conn *ct)
+{
+	const struct nlattr * const cda[CTA_MAX+1];
+	struct nf_conntrack_tuple otuple, rtuple;
+	u16 u3 = nf_ct_l3num(ct);
+	int err;
+
+	nla_parse_nested((struct nlattr **)cda, CTA_MAX, attr, ct_nla_policy);
+
+	if (cda[CTA_TUPLE_ORIG]) {
+		err = ctnetlink_parse_tuple(cda, &otuple, CTA_TUPLE_ORIG, u3);
+		if (err < 0)
+			return err;
+	}
+	if (cda[CTA_TUPLE_REPLY]) {
+		err = ctnetlink_parse_tuple(cda, &rtuple, CTA_TUPLE_REPLY, u3);
+		if (err < 0)
+			return err;
+	}
+	if (cda[CTA_TIMEOUT]) {
+		err = ctnetlink_change_timeout(ct, cda);
+		if (err < 0)
+			return err;
+	}
+	if (cda[CTA_STATUS]) {
+		err = ctnetlink_change_status(ct, cda);
+		if (err < 0)
+			return err;
+	}
+	if (cda[CTA_PROTOINFO]) {
+		err = ctnetlink_change_protoinfo(ct, cda);
+		if (err < 0)
+			return err;
+	}
+#if defined(CONFIG_NF_CONNTRACK_MARK)
+	if (cda[CTA_MARK])
+		ct->mark = ntohl(nla_get_be32(cda[CTA_MARK]));
+#endif
+#ifdef CONFIG_NF_NAT_NEEDED
+	if (cda[CTA_NAT_SEQ_ADJ_ORIG] || cda[CTA_NAT_SEQ_ADJ_REPLY]) {
+		err = ctnetlink_change_nat_seq_adj(ct, cda);
+		if (err < 0)
+			return err;
+	}
+#endif
+	return 0;
+}
+
+static struct nfq_ct_hook ctnetlink_nfqueue_hook = {
+	.build_size	= ctnetlink_nfqueue_build_size,
+	.build		= ctnetlink_nfqueue_build,
+	.parse		= ctnetlink_nfqueue_parse,
+};
+#endif /* CONFIG_NETFILTER_NETLINK_QUEUE */
+
 /***********************************************************************
  * EXPECT
  ***********************************************************************/
@@ -2424,7 +2570,12 @@ static int __init ctnetlink_init(void)
 		pr_err("ctnetlink_init: cannot register pernet operations\n");
 		goto err_unreg_exp_subsys;
 	}
-
+#if defined(CONFIG_NETFILTER_NETLINK_QUEUE) ||	\
+    defined(CONFIG_NETFILTER_NETLINK_QUEUE_MODULE)
+	/* setup interaction between nf_queue and nf_conntrack_netlink. */
+	RCU_INIT_POINTER(nfq_ct_hook, &ctnetlink_nfqueue_hook);
+	printk("registering nf_queue and ctnetlink interaction\n");
+#endif
 	return 0;
 
 err_unreg_exp_subsys:
@@ -2442,6 +2593,11 @@ static void __exit ctnetlink_exit(void)
 	unregister_pernet_subsys(&ctnetlink_net_ops);
 	nfnetlink_subsys_unregister(&ctnl_exp_subsys);
 	nfnetlink_subsys_unregister(&ctnl_subsys);
+#if defined(CONFIG_NETFILTER_NETLINK_QUEUE) ||	\
+    defined(CONFIG_NETFILTER_NETLINK_QUEUE_MODULE)
+	RCU_INIT_POINTER(nfq_ct_hook, NULL);
+	printk("unregistering nf_queue and ctnetlink interaction\n");
+#endif
 }
 
 module_init(ctnetlink_init);
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 8d6bcf3..b007779 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -30,6 +30,7 @@
 #include <linux/list.h>
 #include <net/sock.h>
 #include <net/netfilter/nf_queue.h>
+#include <net/netfilter/nf_conntrack.h>
 
 #include <linux/atomic.h>
 
@@ -44,6 +45,7 @@ struct nfqnl_instance {
 	struct rcu_head rcu;
 
 	int peer_pid;
+	unsigned int flags;
 	unsigned int queue_maxlen;
 	unsigned int copy_range;
 	unsigned int queue_dropped;
@@ -232,6 +234,9 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 	struct sk_buff *entskb = entry->skb;
 	struct net_device *indev;
 	struct net_device *outdev;
+	struct nfq_ct_hook *nfq_ct;
+	struct nf_conn *ct = NULL;
+	enum ip_conntrack_info ctinfo = 0; /* make gcc happy. */
 
 	size =    NLMSG_SPACE(sizeof(struct nfgenmsg))
 		+ nla_total_size(sizeof(struct nfqnl_msg_packet_hdr))
@@ -265,6 +270,17 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 		break;
 	}
 
+	/* rcu_read_lock()ed by __nf_queue already. */
+	nfq_ct = rcu_dereference(nfq_ct_hook);
+	if (nfq_ct != NULL && (queue->flags & NFQNL_F_CONNTRACK)) {
+		ct = nf_ct_get(entskb, &ctinfo);
+		if (ct) {
+			if (!nf_ct_is_untracked(ct))
+				size += nfq_ct->build_size(ct);
+			else
+				ct = NULL;
+		}
+	}
 
 	skb = alloc_skb(size, GFP_ATOMIC);
 	if (!skb)
@@ -388,6 +404,24 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 			BUG();
 	}
 
+	if (nfq_ct != NULL && (queue->flags & NFQNL_F_CONNTRACK) && ct) {
+		struct nlattr *nest_parms;
+		u_int32_t tmp;
+
+		nest_parms = nla_nest_start(skb, NFQA_CT | NLA_F_NESTED);
+		if (!nest_parms)
+			goto nla_put_failure;
+
+		if (nfq_ct->build(skb, ct) < 0)
+			goto nla_put_failure;
+
+		nla_nest_end(skb, nest_parms);
+
+		tmp = ctinfo;
+		if (nla_put_u32(skb, NFQA_CT_INFO, htonl(ctinfo)))
+			goto nla_put_failure;
+	}
+
 	nlh->nlmsg_len = skb->tail - old_tail;
 	return skb;
 
@@ -726,6 +760,7 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
 	struct nfqnl_instance *queue;
 	unsigned int verdict;
 	struct nf_queue_entry *entry;
+	struct nfq_ct_hook *nfq_ct;
 
 	queue = instance_lookup(queue_num);
 	if (!queue)
@@ -753,6 +788,19 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
 	if (nfqa[NFQA_MARK])
 		entry->skb->mark = ntohl(nla_get_be32(nfqa[NFQA_MARK]));
 
+	rcu_read_lock();
+	nfq_ct = rcu_dereference(nfq_ct_hook);
+	if (nfq_ct != NULL &&
+	    (queue->flags & NFQNL_F_CONNTRACK) && nfqa[NFQA_CT]) {
+		enum ip_conntrack_info ctinfo;
+		struct nf_conn *ct;
+
+		ct = nf_ct_get(entry->skb, &ctinfo);
+		if (ct && !nf_ct_is_untracked(ct))
+			nfq_ct->parse(nfqa[NFQA_CT], ct);
+	}
+	rcu_read_unlock();
+
 	nf_reinject(entry, verdict);
 	return 0;
 }
@@ -768,6 +816,7 @@ nfqnl_recv_unsupp(struct sock *ctnl, struct sk_buff *skb,
 static const struct nla_policy nfqa_cfg_policy[NFQA_CFG_MAX+1] = {
 	[NFQA_CFG_CMD]		= { .len = sizeof(struct nfqnl_msg_config_cmd) },
 	[NFQA_CFG_PARAMS]	= { .len = sizeof(struct nfqnl_msg_config_params) },
+	[NFQA_CFG_FLAGS]	= { .type = NLA_U32 }
 };
 
 static const struct nf_queue_handler nfqh = {
@@ -861,6 +910,19 @@ nfqnl_recv_config(struct sock *ctnl, struct sk_buff *skb,
 		spin_unlock_bh(&queue->lock);
 	}
 
+	if (nfqa[NFQA_CFG_FLAGS]) {
+		__be32 *flags;
+
+		if (!queue) {
+			ret = -ENODEV;
+			goto err_out_unlock;
+		}
+		flags = nla_data(nfqa[NFQA_CFG_FLAGS]);
+		spin_lock_bh(&queue->lock);
+		queue->flags = ntohl(*flags);
+		spin_unlock_bh(&queue->lock);
+	}
+
 err_out_unlock:
 	rcu_read_unlock();
 	return ret;
-- 
1.7.10

^ permalink raw reply related

* [PATCH 0/7] [RFC] new user-space connection tracking helper infrastructure
From: pablo @ 2012-06-04 12:21 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

From: Pablo Neira Ayuso <pablo@netfilter.org>

Hi!

This is a new attempt to provide a full user-space connection tracking helper
infrastructure. Those of you who follow my tree already know that I've been
working on this for some time.

Previous approaches had important limitations, and the integration with iptables
was not exactly nice.

The initial patches prepare the ground for the introduction of the
cthelper infrastructure:

1) allocate a fixed area for the helper name; as a side effect, the
   initialization code of the kernel-space helpers looks better IMO.

2) allow variable length conntrack extensions.

3) add support for variable length helper extensions.

4) improve integration between nfnetlink_queue and ctnetlink. Now, you don't
   have to open two handles, one to listen to packets via nfqueue and one to
   receive events via ctnetlink. Instead, you can enable one flag to get the
   conntrack data together with the packet via nfqueue.

5) improve integration of packet mangling and nf_conntrack. This has been
   a long standing issue. If you mangle one TCP packet in user-space and
   connection tracking is enabled, nf_ct_tcp reports sequence tracking errors.
   This patch aims to resolve this issue.

6) add the CTA_HELP_INFO attribute. This is used to store the private helper
   data. Thus, we don't need to keep a redundant cache of conntrack entries
   in user-space; the private helper information is stored with the
   conntrack entry in the kernel.

7) finally, the netlink cthelper infrastructure.
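
To illustrate items 1) to 3), here is a minimal user-space model of a
helper extension whose private area is sized per helper. It is a sketch
under stated assumptions, not the code from the patches: helper_model,
help_ext_model, help_ext_add and help_ext_data are hypothetical names, and
calloc() stands in for the kernel's extension allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; field names loosely follow the series. */
struct helper_model {
	char name[16];		/* fixed-size name area, as in patch 1 */
	size_t data_len;	/* private bytes this helper needs */
};

struct help_ext_model {
	const struct helper_model *helper;
	char data[];		/* sized per helper at allocation time */
};

/* Allocate the extension with exactly helper->data_len trailing bytes. */
static struct help_ext_model *help_ext_add(const struct helper_model *helper)
{
	struct help_ext_model *ext;

	assert(helper != NULL);
	ext = calloc(1, sizeof(*ext) + helper->data_len);
	if (ext)
		ext->helper = helper;
	return ext;
}

/* Untyped access to the private area, in the spirit of nfct_help_data(). */
static void *help_ext_data(struct help_ext_model *ext)
{
	assert(ext != NULL);
	return ext->data;
}
```

Each helper then casts the returned pointer to its own private structure,
so the extension no longer needs a union that knows every helper.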

Of course, this patchset makes no sense without the user-space changes, which are:

* updates in the conntrack-tools (see cthelper11 branch):
http://git.netfilter.org/cgi-bin/gitweb.cgi?p=conntrack-tools.git;a=shortlog;h=refs/heads/cthelper11

It includes the FTP user-space helper, one RPC helper (for NFSv3) and one TNS
helper (for Oracle).

* libnetfilter_cthelper
http://git.netfilter.org/cgi-bin/gitweb.cgi?p=libnetfilter_cthelper.git;a=summary

* libnetfilter_conntrack (new libmnl API)
http://git.netfilter.org/cgi-bin/gitweb.cgi?p=libnetfilter_conntrack.git;a=summary

* libnetfilter_queue
http://git.netfilter.org/cgi-bin/gitweb.cgi?p=libnetfilter_queue.git;a=shortlog;h=refs/heads/cthelper2

WARNING: Changes may occur in the user-space side until all those cthelper
branches are merged into master. Mind that this is work-in-progress.

Pablo Neira Ayuso (7):
  netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names
  netfilter: nf_ct_ext: support variable length extensions
  netfilter: nf_ct_helper: implement variable length helper private data
  netfilter: add glue code to integrate nfnetlink_queue and ctnetlink
  netfilter: nfnl_queue: support NAT TCP sequence adjustment if packet mangled
  netfilter: ctnetlink: add CTA_HELP_INFO attribute
  netfilter: add user-space connection tracking helper infrastructure

 include/linux/netfilter.h                      |   10 +
 include/linux/netfilter/Kbuild                 |    1 +
 include/linux/netfilter/nf_conntrack_sip.h     |    1 +
 include/linux/netfilter/nfnetlink.h            |    3 +-
 include/linux/netfilter/nfnetlink_conntrack.h  |    1 +
 include/linux/netfilter/nfnetlink_cthelper.h   |   55 ++
 include/linux/netfilter/nfnetlink_queue.h      |    7 +
 include/linux/netfilter_ipv4.h                 |    1 +
 include/linux/netfilter_ipv6.h                 |    1 +
 include/net/netfilter/nf_conntrack.h           |   35 +-
 include/net/netfilter/nf_conntrack_expect.h    |    4 +-
 include/net/netfilter/nf_conntrack_extend.h    |    7 +-
 include/net/netfilter/nf_conntrack_helper.h    |   29 +-
 include/net/netfilter/nf_nat_helper.h          |    7 +
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |   56 +-
 net/ipv4/netfilter/nf_nat_amanda.c             |    4 +-
 net/ipv4/netfilter/nf_nat_h323.c               |    8 +-
 net/ipv4/netfilter/nf_nat_helper.c             |   13 +
 net/ipv4/netfilter/nf_nat_pptp.c               |    6 +-
 net/ipv4/netfilter/nf_nat_sip.c                |   14 +-
 net/ipv4/netfilter/nf_nat_tftp.c               |    4 +-
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   56 +-
 net/netfilter/Kconfig                          |    8 +
 net/netfilter/Makefile                         |    1 +
 net/netfilter/core.c                           |    4 +
 net/netfilter/nf_conntrack_core.c              |    3 +-
 net/netfilter/nf_conntrack_extend.c            |   16 +-
 net/netfilter/nf_conntrack_ftp.c               |   11 +-
 net/netfilter/nf_conntrack_h323_main.c         |   16 +-
 net/netfilter/nf_conntrack_helper.c            |   35 +-
 net/netfilter/nf_conntrack_irc.c               |    8 +-
 net/netfilter/nf_conntrack_netlink.c           |  190 ++++++-
 net/netfilter/nf_conntrack_pptp.c              |   17 +-
 net/netfilter/nf_conntrack_proto_gre.c         |   16 +-
 net/netfilter/nf_conntrack_sane.c              |   12 +-
 net/netfilter/nf_conntrack_sip.c               |   36 +-
 net/netfilter/nf_conntrack_tftp.c              |    8 +-
 net/netfilter/nfnetlink_cthelper.c             |  668 ++++++++++++++++++++++++
 net/netfilter/nfnetlink_queue.c                |   84 ++-
 net/netfilter/xt_CT.c                          |   44 +-
 40 files changed, 1309 insertions(+), 191 deletions(-)
 create mode 100644 include/linux/netfilter/nfnetlink_cthelper.h
 create mode 100644 net/netfilter/nfnetlink_cthelper.c

-- 
1.7.10

^ permalink raw reply

* [PATCH 7/7] netfilter: add user-space connection tracking helper infrastructure
From: pablo @ 2012-06-04 12:21 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev
In-Reply-To: <1338812485-4232-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

There are good reasons to support helpers in user-space instead:

* Rapid connection tracking helper development, as developing code
  in user-space is usually faster.

* Reliability: A buggy helper does not crash the kernel. Moreover,
  we can monitor the helper process and restart it in case of problems.

* Security: Avoid complex string matching and mangling in kernel-space
  running in privileged mode. Going further, we can even think about
  running user-space helpers as a non-root process.

* It allows the development of very specific helpers (most likely
  non-standard proprietary protocols) that are very likely to be rejected
  for mainline inclusion in the form of kernel-space connection tracking
  helpers.

This patch adds the infrastructure to allow the implementation of
user-space conntrack helpers by means of the new nfnetlink subsystem
`nfnetlink_cthelper' and the existing queueing infrastructure
(nfnetlink_queue).
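
To illustrate how the two pieces fit together, here is a minimal sketch
of the verdict that a user-space helper hands back to the hook, and of
the test that keeps a queued packet from being logged as dropped. It is
not the kernel code: the DEMO_ names and struct demo_helper are
hypothetical, and the constant values mirror linux/netfilter.h as far as
I can tell.

```c
#include <assert.h>

/* Assumed to mirror linux/netfilter.h; the DEMO_ names are hypothetical. */
#define DEMO_NF_DROP			0u
#define DEMO_NF_ACCEPT			1u
#define DEMO_NF_QUEUE			3u
#define DEMO_NF_VERDICT_MASK		0x000000ffu
#define DEMO_NF_VERDICT_FLAG_QUEUE_BYPASS 0x00008000u
#define DEMO_NF_VERDICT_QMASK		0xffff0000u
#define DEMO_NF_QUEUE_NR(x) \
	((((unsigned int)(x) << 16) & DEMO_NF_VERDICT_QMASK) | DEMO_NF_QUEUE)

/* Same bit layout as enum nf_ct_helper_flags in this patch. */
#define DEMO_F_USERSPACE	(1u << 0)
#define DEMO_F_CONFIGURED	(1u << 1)

/* Hypothetical stand-in for the two fields added to nf_conntrack_helper. */
struct demo_helper {
	unsigned int flags;
	unsigned int queue_num;
};

/* Verdict for a packet that belongs to a user-space helper. */
static unsigned int demo_helper_verdict(const struct demo_helper *h)
{
	/* Registered from user-space but not enabled yet: let it pass. */
	if ((h->flags & (DEMO_F_USERSPACE | DEMO_F_CONFIGURED)) ==
	    DEMO_F_USERSPACE)
		return DEMO_NF_ACCEPT;

	/* Queue to user-space; bypass if nobody listens on that queue. */
	return DEMO_NF_QUEUE_NR(h->queue_num) |
	       DEMO_NF_VERDICT_FLAG_QUEUE_BYPASS;
}

/* Only verdicts that are neither ACCEPT nor QUEUE are logged as drops. */
static int demo_should_log_drop(unsigned int verdict)
{
	return verdict != DEMO_NF_ACCEPT &&
	       (verdict & DEMO_NF_VERDICT_MASK) != DEMO_NF_QUEUE;
}

/* The queue number lives in the upper 16 bits of the verdict. */
static unsigned int demo_queue_of(unsigned int verdict)
{
	return verdict >> 16;
}
```

The low byte carries the verdict proper and the upper 16 bits carry the
queue number, which is why the hook has to mask the return value before
comparing it against NF_QUEUE.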

I had to add the new hook priorities NF_IP_PRI_CONNTRACK_HELPER and
NF_IP6_PRI_CONNTRACK_HELPER to register ipv[4|6]_helper, which results
from splitting ipv[4|6]_confirm into two pieces. This change is required
so as not to break NAT sequence adjustment and conntrack confirmation
for traffic that is enqueued to our user-space conntrack helpers.
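
The ordering argument can be sketched as follows. Netfilter runs the
hooks registered at one hook point in ascending priority order, so a
helper hook at priority 300 runs after source NAT (100) and before
confirmation (INT_MAX). The priority values are the ones from the IPv4
enum touched by this patch; struct demo_hook and the demo_ functions are
hypothetical and only model the ordering, not the real nf_hook_ops
machinery.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Priority values as they appear in enum nf_ip_hook_priorities. */
enum {
	DEMO_PRI_NAT_SRC		= 100,
	DEMO_PRI_SELINUX_LAST		= 225,
	DEMO_PRI_CONNTRACK_HELPER	= 300,
	DEMO_PRI_CONNTRACK_CONFIRM	= INT_MAX,
};

/* Hypothetical, much reduced stand-in for struct nf_hook_ops. */
struct demo_hook {
	const char *name;
	int priority;
};

/* Compare without subtraction so that INT_MAX cannot overflow. */
static int demo_hook_cmp(const void *a, const void *b)
{
	const struct demo_hook *x = a, *y = b;

	return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Put the hooks into the order in which they would be called. */
static void demo_sort_hooks(struct demo_hook *hooks, size_t n)
{
	qsort(hooks, n, sizeof(*hooks), demo_hook_cmp);
}
```

With the helper call split out of the confirm function, a packet that is
queued at priority 300 re-enters the hook chain after the helper hook
once user-space issues its verdict, and is only then confirmed.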

Basic operation, in a few steps:

1) Register user-space helper by means of `nfct':

 nfct helper add ftp inet

 [ It must be a valid existing helper supported by conntrack-tools. ]

2) Add rules to enable the FTP user-space helper which is
   used to track traffic going to TCP port 21.

For locally generated packets:

 iptables -I OUTPUT -t raw -p tcp --dport 21 -j CT --helper ftp

For non-locally generated packets:

 iptables -I PREROUTING -t raw -p tcp --dport 21 -j CT --helper ftp

3) Run the test conntrackd in helper mode (see the example file
   doc/helper/conntrackd.conf):

 conntrackd

4) Generate FTP traffic. If everything is OK, conntrackd should
   create expectations (you can check that with `conntrack'):

 conntrack -E expect

    [NEW] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
[DESTROY] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp

This confirms that our test helper is receiving packets including the
conntrack information, and adding expectations in kernel-space.

The user-space helper can also store its private tracking information
in the conntrack structure in the kernel via the CTA_HELP_INFO
attribute. The kernel will consider this a binary blob whose layout is
unknown. This information will be included in the information that is
transferred to user-space via glue code that integrates nfnetlink_queue
and ctnetlink.
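
A minimal sketch of the blob handling, under stated assumptions: the
kernel side only knows the length registered through
NFCTH_PRIV_DATA_LEN and copies that many bytes in and out without
interpreting them. struct demo_conn_help, DEMO_HELP_DATA_MAX and the
demo_ functions are hypothetical, and the bounds checks are an addition
of this sketch rather than something the patch itself performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_HELP_DATA_MAX 32	/* hypothetical size of the private area */

/* Hypothetical stand-in for the per-conntrack helper extension. */
struct demo_conn_help {
	size_t data_len;	/* length registered for this helper */
	unsigned char data[DEMO_HELP_DATA_MAX];
};

/* Store an opaque blob received from user-space. */
static int demo_store_priv(struct demo_conn_help *help,
			   const void *blob, size_t blob_len)
{
	if (help->data_len == 0)
		return -EINVAL;
	/* Extra checks: never read or write past either buffer. */
	if (help->data_len > sizeof(help->data) || blob_len < help->data_len)
		return -EINVAL;

	memcpy(help->data, blob, help->data_len);
	return 0;
}

/* Hand the blob back unchanged, e.g. next to a queued packet. */
static int demo_load_priv(const struct demo_conn_help *help,
			  void *out, size_t out_len)
{
	if (help->data_len == 0 || help->data_len > sizeof(help->data) ||
	    out_len < help->data_len)
		return -EINVAL;

	memcpy(out, help->data, help->data_len);
	return 0;
}
```

Only the user-space helper knows which structure the bytes represent, so
it is free to change the layout without any kernel change as long as the
registered length still matches.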

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter/Kbuild                 |    1 +
 include/linux/netfilter/nfnetlink.h            |    3 +-
 include/linux/netfilter/nfnetlink_cthelper.h   |   55 ++
 include/linux/netfilter_ipv4.h                 |    1 +
 include/linux/netfilter_ipv6.h                 |    1 +
 include/net/netfilter/nf_conntrack_helper.h    |   11 +
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |   56 +-
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   56 +-
 net/netfilter/Kconfig                          |    8 +
 net/netfilter/Makefile                         |    1 +
 net/netfilter/nf_conntrack_helper.c            |   24 +-
 net/netfilter/nfnetlink_cthelper.c             |  668 ++++++++++++++++++++++++
 12 files changed, 858 insertions(+), 27 deletions(-)
 create mode 100644 include/linux/netfilter/nfnetlink_cthelper.h
 create mode 100644 net/netfilter/nfnetlink_cthelper.c

diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index 1697036..874ae8f 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -10,6 +10,7 @@ header-y += nfnetlink.h
 header-y += nfnetlink_acct.h
 header-y += nfnetlink_compat.h
 header-y += nfnetlink_conntrack.h
+header-y += nfnetlink_cthelper.h
 header-y += nfnetlink_cttimeout.h
 header-y += nfnetlink_log.h
 header-y += nfnetlink_queue.h
diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h
index a1048c1..18341cd 100644
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -50,7 +50,8 @@ struct nfgenmsg {
 #define NFNL_SUBSYS_IPSET		6
 #define NFNL_SUBSYS_ACCT		7
 #define NFNL_SUBSYS_CTNETLINK_TIMEOUT	8
-#define NFNL_SUBSYS_COUNT		9
+#define NFNL_SUBSYS_CTHELPER		9
+#define NFNL_SUBSYS_COUNT		10
 
 #ifdef __KERNEL__
 
diff --git a/include/linux/netfilter/nfnetlink_cthelper.h b/include/linux/netfilter/nfnetlink_cthelper.h
new file mode 100644
index 0000000..33659f6
--- /dev/null
+++ b/include/linux/netfilter/nfnetlink_cthelper.h
@@ -0,0 +1,55 @@
+#ifndef _NFNL_CTHELPER_H_
+#define _NFNL_CTHELPER_H_
+
+#define NFCT_HELPER_STATUS_DISABLED	0
+#define NFCT_HELPER_STATUS_ENABLED	1
+
+enum nfnl_acct_msg_types {
+	NFNL_MSG_CTHELPER_NEW,
+	NFNL_MSG_CTHELPER_GET,
+	NFNL_MSG_CTHELPER_DEL,
+	NFNL_MSG_CTHELPER_MAX
+};
+
+enum nfnl_cthelper_type {
+	NFCTH_UNSPEC,
+	NFCTH_NAME,
+	NFCTH_TUPLE,
+	NFCTH_QUEUE_NUM,
+	NFCTH_POLICY,
+	NFCTH_PRIV_DATA_LEN,
+	NFCTH_STATUS,
+	__NFCTH_MAX
+};
+#define NFCTH_MAX (__NFCTH_MAX - 1)
+
+enum nfnl_cthelper_policy_type {
+	NFCTH_POLICY_SET_UNSPEC,
+	NFCTH_POLICY_SET_NUM,
+	NFCTH_POLICY_SET,
+	NFCTH_POLICY_SET1	= NFCTH_POLICY_SET,
+	NFCTH_POLICY_SET2,
+	NFCTH_POLICY_SET3,
+	NFCTH_POLICY_SET4,
+	__NFCTH_POLICY_SET_MAX
+};
+#define NFCTH_POLICY_SET_MAX (__NFCTH_POLICY_SET_MAX - 1)
+
+enum nfnl_cthelper_pol_type {
+	NFCTH_POLICY_UNSPEC,
+	NFCTH_POLICY_NAME,
+	NFCTH_POLICY_EXPECT_MAX,
+	NFCTH_POLICY_EXPECT_TIMEOUT,
+	__NFCTH_POLICY_MAX
+};
+#define NFCTH_POLICY_MAX (__NFCTH_POLICY_MAX - 1)
+
+enum nfnl_cthelper_tuple_type {
+	NFCTH_TUPLE_UNSPEC,
+	NFCTH_TUPLE_L3PROTONUM,
+	NFCTH_TUPLE_L4PROTONUM,
+	__NFCTH_TUPLE_MAX,
+};
+#define NFCTH_TUPLE_MAX (__NFCTH_TUPLE_MAX - 1)
+
+#endif /* _NFNL_CTHELPER_H */
diff --git a/include/linux/netfilter_ipv4.h b/include/linux/netfilter_ipv4.h
index fa0946c..e2b1280 100644
--- a/include/linux/netfilter_ipv4.h
+++ b/include/linux/netfilter_ipv4.h
@@ -66,6 +66,7 @@ enum nf_ip_hook_priorities {
 	NF_IP_PRI_SECURITY = 50,
 	NF_IP_PRI_NAT_SRC = 100,
 	NF_IP_PRI_SELINUX_LAST = 225,
+	NF_IP_PRI_CONNTRACK_HELPER = 300,
 	NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX,
 	NF_IP_PRI_LAST = INT_MAX,
 };
diff --git a/include/linux/netfilter_ipv6.h b/include/linux/netfilter_ipv6.h
index 57c0251..7c8a513 100644
--- a/include/linux/netfilter_ipv6.h
+++ b/include/linux/netfilter_ipv6.h
@@ -71,6 +71,7 @@ enum nf_ip6_hook_priorities {
 	NF_IP6_PRI_SECURITY = 50,
 	NF_IP6_PRI_NAT_SRC = 100,
 	NF_IP6_PRI_SELINUX_LAST = 225,
+	NF_IP6_PRI_CONNTRACK_HELPER = 300,
 	NF_IP6_PRI_LAST = INT_MAX,
 };
 
diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h
index e5091a9..f499aa5 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -15,6 +15,11 @@
 
 struct module;
 
+enum nf_ct_helper_flags {
+	NF_CT_HELPER_F_USERSPACE	= (1 << 0),
+	NF_CT_HELPER_F_CONFIGURED	= (1 << 1),
+};
+
 #define NF_CT_HELPER_NAME_LEN	16
 
 struct nf_conntrack_helper {
@@ -42,6 +47,9 @@ struct nf_conntrack_helper {
 	int (*from_nlattr)(struct nlattr *attr, struct nf_conn *ct);
 	int (*to_nlattr)(struct sk_buff *skb, const struct nf_conn *ct);
 	unsigned int expect_class_max;
+
+	unsigned int flags;
+	unsigned int queue_num;		/* For user-space helpers. */
 };
 
 extern struct nf_conntrack_helper *
@@ -96,4 +104,7 @@ nf_ct_helper_expectfn_find_by_name(const char *name);
 struct nf_ct_helper_expectfn *
 nf_ct_helper_expectfn_find_by_symbol(const void *symbol);
 
+extern struct hlist_head *nf_ct_helper_hash;
+extern unsigned int nf_ct_helper_hsize;
+
 #endif /*_NF_CONNTRACK_HELPER_H*/
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 91747d4..d3cb34d 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -95,11 +95,11 @@ static int ipv4_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
 	return NF_ACCEPT;
 }
 
-static unsigned int ipv4_confirm(unsigned int hooknum,
-				 struct sk_buff *skb,
-				 const struct net_device *in,
-				 const struct net_device *out,
-				 int (*okfn)(struct sk_buff *))
+static unsigned int ipv4_helper(unsigned int hooknum,
+				struct sk_buff *skb,
+				const struct net_device *in,
+				const struct net_device *out,
+				int (*okfn)(struct sk_buff *))
 {
 	struct nf_conn *ct;
 	enum ip_conntrack_info ctinfo;
@@ -110,24 +110,45 @@ static unsigned int ipv4_confirm(unsigned int hooknum,
 	/* This is where we call the helper: as the packet goes out. */
 	ct = nf_ct_get(skb, &ctinfo);
 	if (!ct || ctinfo == IP_CT_RELATED_REPLY)
-		goto out;
+		return NF_ACCEPT;
 
 	help = nfct_help(ct);
 	if (!help)
-		goto out;
+		return NF_ACCEPT;
 
 	/* rcu_read_lock()ed by nf_hook_slow */
 	helper = rcu_dereference(help->helper);
 	if (!helper)
-		goto out;
+		return NF_ACCEPT;
+
+	/* This is an user-space helper not yet configured, skip. */
+	if ((helper->flags &
+		(NF_CT_HELPER_F_USERSPACE | NF_CT_HELPER_F_CONFIGURED)) ==
+		 NF_CT_HELPER_F_USERSPACE) {
+		return NF_ACCEPT;
+	}
 
 	ret = helper->help(skb, skb_network_offset(skb) + ip_hdrlen(skb),
 			   ct, ctinfo);
-	if (ret != NF_ACCEPT) {
+	if (ret != NF_ACCEPT && (ret & NF_VERDICT_MASK) != NF_QUEUE) {
 		nf_log_packet(NFPROTO_IPV4, hooknum, skb, in, out, NULL,
 			      "nf_ct_%s: dropping packet", helper->name);
-		return ret;
 	}
+	return ret;
+}
+
+static unsigned int ipv4_confirm(unsigned int hooknum,
+				 struct sk_buff *skb,
+				 const struct net_device *in,
+				 const struct net_device *out,
+				 int (*okfn)(struct sk_buff *))
+{
+	struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct || ctinfo == IP_CT_RELATED_REPLY)
+		return NF_ACCEPT;
 
 	/* adjust seqs for loopback traffic only in outgoing direction */
 	if (test_bit(IPS_SEQ_ADJUST_BIT, &ct->status) &&
@@ -140,7 +161,6 @@ static unsigned int ipv4_confirm(unsigned int hooknum,
 			return NF_DROP;
 		}
 	}
-out:
 	/* We've seen it coming out the other side: confirm it */
 	return nf_conntrack_confirm(skb);
 }
@@ -185,6 +205,13 @@ static struct nf_hook_ops ipv4_conntrack_ops[] __read_mostly = {
 		.priority	= NF_IP_PRI_CONNTRACK,
 	},
 	{
+		.hook		= ipv4_helper,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV4,
+		.hooknum	= NF_INET_POST_ROUTING,
+		.priority	= NF_IP_PRI_CONNTRACK_HELPER,
+	},
+	{
 		.hook		= ipv4_confirm,
 		.owner		= THIS_MODULE,
 		.pf		= NFPROTO_IPV4,
@@ -192,6 +219,13 @@ static struct nf_hook_ops ipv4_conntrack_ops[] __read_mostly = {
 		.priority	= NF_IP_PRI_CONNTRACK_CONFIRM,
 	},
 	{
+		.hook		= ipv4_helper,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV4,
+		.hooknum	= NF_INET_LOCAL_IN,
+		.priority	= NF_IP_PRI_CONNTRACK_HELPER,
+	},
+	{
 		.hook		= ipv4_confirm,
 		.owner		= THIS_MODULE,
 		.pf		= NFPROTO_IPV4,
diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index fe925e4..f9b3693 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -143,11 +143,11 @@ static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
 	return NF_ACCEPT;
 }
 
-static unsigned int ipv6_confirm(unsigned int hooknum,
-				 struct sk_buff *skb,
-				 const struct net_device *in,
-				 const struct net_device *out,
-				 int (*okfn)(struct sk_buff *))
+static unsigned int ipv6_helper(unsigned int hooknum,
+				struct sk_buff *skb,
+				const struct net_device *in,
+				const struct net_device *out,
+				int (*okfn)(struct sk_buff *))
 {
 	struct nf_conn *ct;
 	const struct nf_conn_help *help;
@@ -161,15 +161,15 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
 	/* This is where we call the helper: as the packet goes out. */
 	ct = nf_ct_get(skb, &ctinfo);
 	if (!ct || ctinfo == IP_CT_RELATED_REPLY)
-		goto out;
+		return NF_ACCEPT;
 
 	help = nfct_help(ct);
 	if (!help)
-		goto out;
+		return NF_ACCEPT;
 	/* rcu_read_lock()ed by nf_hook_slow */
 	helper = rcu_dereference(help->helper);
 	if (!helper)
-		goto out;
+		return NF_ACCEPT;
 
 	protoff = nf_ct_ipv6_skip_exthdr(skb, extoff, &pnum,
 					 skb->len - extoff);
@@ -178,13 +178,35 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
 		return NF_ACCEPT;
 	}
 
+	/* This is an user-space helper not yet configured, skip. */
+	if ((helper->flags &
+		(NF_CT_HELPER_F_USERSPACE | NF_CT_HELPER_F_CONFIGURED)) ==
+		 NF_CT_HELPER_F_USERSPACE) {
+		return NF_ACCEPT;
+	}
+
 	ret = helper->help(skb, protoff, ct, ctinfo);
-	if (ret != NF_ACCEPT) {
+	if (ret != NF_ACCEPT && (ret & NF_VERDICT_MASK) != NF_QUEUE) {
 		nf_log_packet(NFPROTO_IPV6, hooknum, skb, in, out, NULL,
 			      "nf_ct_%s: dropping packet", helper->name);
 		return ret;
 	}
-out:
+	return ret;
+}
+
+static unsigned int ipv6_confirm(unsigned int hooknum,
+				 struct sk_buff *skb,
+				 const struct net_device *in,
+				 const struct net_device *out,
+				 int (*okfn)(struct sk_buff *))
+{
+	struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct || ctinfo == IP_CT_RELATED_REPLY)
+		return NF_ACCEPT;
+
 	/* We've seen it coming out the other side: confirm it */
 	return nf_conntrack_confirm(skb);
 }
@@ -255,6 +277,13 @@ static struct nf_hook_ops ipv6_conntrack_ops[] __read_mostly = {
 		.priority	= NF_IP6_PRI_CONNTRACK,
 	},
 	{
+		.hook		= ipv6_helper,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV6,
+		.hooknum	= NF_INET_POST_ROUTING,
+		.priority	= NF_IP6_PRI_CONNTRACK_HELPER,
+	},
+	{
 		.hook		= ipv6_confirm,
 		.owner		= THIS_MODULE,
 		.pf		= NFPROTO_IPV6,
@@ -262,6 +291,13 @@ static struct nf_hook_ops ipv6_conntrack_ops[] __read_mostly = {
 		.priority	= NF_IP6_PRI_LAST,
 	},
 	{
+		.hook		= ipv6_helper,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV6,
+		.hooknum	= NF_INET_LOCAL_IN,
+		.priority	= NF_IP6_PRI_CONNTRACK_HELPER,
+	},
+	{
 		.hook		= ipv6_confirm,
 		.owner		= THIS_MODULE,
 		.pf		= NFPROTO_IPV6,
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 209c1ed..cd5668e 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -12,6 +12,14 @@ tristate "Netfilter NFACCT over NFNETLINK interface"
 	  If this option is enabled, the kernel will include support
 	  for extended accounting via NFNETLINK.
 
+config NETFILTER_NETLINK_CTHELPER
+tristate "Netfilter NFCT_HELPER over NFNETLINK interface"
+	depends on NETFILTER_ADVANCED
+	select NETFILTER_NETLINK
+	help
+	  If this option is enabled, the kernel will include support
+	  for user-space connection tracking helpers via NFNETLINK.
+
 config NETFILTER_NETLINK_QUEUE
 	tristate "Netfilter NFQUEUE over NFNETLINK interface"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 4e7960c..2f3bc0f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_NETFILTER) = netfilter.o
 
 obj-$(CONFIG_NETFILTER_NETLINK) += nfnetlink.o
 obj-$(CONFIG_NETFILTER_NETLINK_ACCT) += nfnetlink_acct.o
+obj-$(CONFIG_NETFILTER_NETLINK_CTHELPER) += nfnetlink_cthelper.o
 obj-$(CONFIG_NETFILTER_NETLINK_QUEUE) += nfnetlink_queue.o
 obj-$(CONFIG_NETFILTER_NETLINK_LOG) += nfnetlink_log.o
 
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index e0d3f4b..3d91d11 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -30,8 +30,10 @@
 #include <net/netfilter/nf_conntrack_extend.h>
 
 static DEFINE_MUTEX(nf_ct_helper_mutex);
-static struct hlist_head *nf_ct_helper_hash __read_mostly;
-static unsigned int nf_ct_helper_hsize __read_mostly;
+struct hlist_head *nf_ct_helper_hash __read_mostly;
+EXPORT_SYMBOL_GPL(nf_ct_helper_hash);
+unsigned int nf_ct_helper_hsize __read_mostly;
+EXPORT_SYMBOL_GPL(nf_ct_helper_hsize);
 static unsigned int nf_ct_helper_count __read_mostly;
 
 static bool nf_ct_auto_assign_helper __read_mostly = true;
@@ -322,18 +324,30 @@ EXPORT_SYMBOL_GPL(nf_ct_helper_expectfn_find_by_symbol);
 
 int nf_conntrack_helper_register(struct nf_conntrack_helper *me)
 {
+	int ret = 0;
+	struct nf_conntrack_helper *cur;
+	struct hlist_node *n;
 	unsigned int h = helper_hash(&me->tuple);
 
-	BUG_ON(me->expect_policy == NULL);
+	BUG_ON(me->expect_policy == NULL &&
+	       !(me->flags & NF_CT_HELPER_F_USERSPACE));
 	BUG_ON(me->expect_class_max >= NF_CT_MAX_EXPECT_CLASSES);
 	BUG_ON(strlen(me->name) > NF_CT_HELPER_NAME_LEN - 1);
 
 	mutex_lock(&nf_ct_helper_mutex);
+	hlist_for_each_entry(cur, n, &nf_ct_helper_hash[h], hnode) {
+		if (strncmp(cur->name, me->name, NF_CT_HELPER_NAME_LEN) == 0 &&
+		    cur->tuple.src.l3num == me->tuple.src.l3num &&
+		    cur->tuple.dst.protonum == me->tuple.dst.protonum) {
+			ret = -EEXIST;
+			goto out;
+		}
+	}
 	hlist_add_head_rcu(&me->hnode, &nf_ct_helper_hash[h]);
 	nf_ct_helper_count++;
+out:
 	mutex_unlock(&nf_ct_helper_mutex);
-
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_helper_register);
 
diff --git a/net/netfilter/nfnetlink_cthelper.c b/net/netfilter/nfnetlink_cthelper.c
new file mode 100644
index 0000000..8c80a34
--- /dev/null
+++ b/net/netfilter/nfnetlink_cthelper.c
@@ -0,0 +1,668 @@
+/*
+ * (C) 2012 Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation (or any later at your option).
+ *
+ * This software has been sponsored by Vyatta Inc. <http://www.vyatta.com>
+ */
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/netlink.h>
+#include <linux/rculist.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/errno.h>
+#include <net/netlink.h>
+#include <net/sock.h>
+
+#include <net/netfilter/nf_conntrack_helper.h>
+#include <net/netfilter/nf_conntrack_expect.h>
+#include <net/netfilter/nf_conntrack_ecache.h>
+
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nfnetlink_conntrack.h>
+#include <linux/netfilter/nfnetlink_cthelper.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>");
+MODULE_DESCRIPTION("nfnl_cthelper: User-space connection tracking helpers");
+
+static int
+nfnl_userspace_cthelper(struct sk_buff *skb, unsigned int protoff,
+			struct nf_conn *ct, enum ip_conntrack_info ctinfo)
+{
+	const struct nf_conn_help *help;
+	struct nf_conntrack_helper *helper;
+
+	help = nfct_help(ct);
+	if (help == NULL)
+		return NF_DROP;
+
+	/* rcu_read_lock()ed by nf_hook_slow */
+	helper = rcu_dereference(help->helper);
+	if (helper == NULL)
+		return NF_DROP;
+
+	/* If the user-space helper is not available, don't block traffic. */
+	return NF_QUEUE_NR(helper->queue_num) | NF_VERDICT_FLAG_QUEUE_BYPASS;
+}
+
+static const struct nla_policy nfnl_cthelper_tuple_pol[NFCTH_TUPLE_MAX+1] = {
+	[NFCTH_TUPLE_L3PROTONUM] = { .type = NLA_U16, },
+	[NFCTH_TUPLE_L4PROTONUM] = { .type = NLA_U8, },
+};
+
+static int
+nfnl_cthelper_parse_tuple(struct nf_conntrack_tuple *tuple,
+			  const struct nlattr *attr)
+{
+	struct nlattr *tb[NFCTH_TUPLE_MAX+1];
+
+	nla_parse_nested(tb, NFCTH_TUPLE_MAX, attr, nfnl_cthelper_tuple_pol);
+
+	if (!tb[NFCTH_TUPLE_L3PROTONUM] || !tb[NFCTH_TUPLE_L4PROTONUM])
+		return -EINVAL;
+
+	tuple->src.l3num = ntohs(nla_get_u16(tb[NFCTH_TUPLE_L3PROTONUM]));
+	tuple->dst.protonum = nla_get_u8(tb[NFCTH_TUPLE_L4PROTONUM]);
+
+	return 0;
+}
+
+static int
+nfnl_cthelper_from_nlattr(struct nlattr *attr, struct nf_conn *ct)
+{
+	const struct nf_conn_help *help = nfct_help(ct);
+
+	if (help->helper->data_len == 0)
+		return -EINVAL;
+
+	memcpy(&help->data, nla_data(attr), help->helper->data_len);
+	return 0;
+}
+
+static int
+nfnl_cthelper_to_nlattr(struct sk_buff *skb, const struct nf_conn *ct)
+{
+	const struct nf_conn_help *help = nfct_help(ct);
+
+	if (help->helper->data_len &&
+	    nla_put(skb, CTA_HELP_INFO, help->helper->data_len, &help->data))
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -ENOSPC;
+}
+
+static const struct nla_policy nfnl_cthelper_expect_pol[NFCTH_POLICY_MAX+1] = {
+	[NFCTH_POLICY_NAME] = { .type = NLA_NUL_STRING,
+				.len = NF_CT_HELPER_NAME_LEN-1 },
+	[NFCTH_POLICY_EXPECT_MAX] = { .type = NLA_U32, },
+	[NFCTH_POLICY_EXPECT_TIMEOUT] = { .type = NLA_U32, },
+};
+
+static int
+nfnl_cthelper_expect_policy(struct nf_conntrack_expect_policy *expect_policy,
+			    const struct nlattr *attr)
+{
+	struct nlattr *tb[NFCTH_POLICY_MAX+1];
+
+	nla_parse_nested(tb, NFCTH_POLICY_MAX, attr, nfnl_cthelper_expect_pol);
+
+	if (!tb[NFCTH_POLICY_NAME] ||
+	    !tb[NFCTH_POLICY_EXPECT_MAX] ||
+	    !tb[NFCTH_POLICY_EXPECT_TIMEOUT])
+		return -EINVAL;
+
+	strncpy(expect_policy->name,
+		nla_data(tb[NFCTH_POLICY_NAME]), NF_CT_HELPER_NAME_LEN);
+	expect_policy->max_expected =
+		ntohl(nla_get_be32(tb[NFCTH_POLICY_EXPECT_MAX]));
+	expect_policy->timeout =
+		ntohl(nla_get_be32(tb[NFCTH_POLICY_EXPECT_TIMEOUT]));
+
+	return 0;
+}
+
+static const struct nla_policy
+nfnl_cthelper_expect_policy_set[NFCTH_POLICY_SET_MAX+1] = {
+	[NFCTH_POLICY_SET_NUM] = { .type = NLA_U32, },
+};
+
+static int
+nfnl_cthelper_parse_expect_policy(struct nf_conntrack_helper *helper,
+				  const struct nlattr *attr)
+{
+	int i, ret;
+	struct nf_conntrack_expect_policy *expect_policy;
+	struct nlattr *tb[NFCTH_POLICY_SET_MAX+1];
+
+	nla_parse_nested(tb, NFCTH_POLICY_SET_MAX, attr,
+					nfnl_cthelper_expect_policy_set);
+
+	if (!tb[NFCTH_POLICY_SET_NUM])
+		return -EINVAL;
+
+	helper->expect_class_max =
+		ntohl(nla_get_be32(tb[NFCTH_POLICY_SET_NUM]));
+
+	if (helper->expect_class_max != 0 &&
+	    helper->expect_class_max > NF_CT_MAX_EXPECT_CLASSES)
+		return -EOVERFLOW;
+
+	expect_policy = kzalloc(sizeof(struct nf_conntrack_expect_policy) *
+				helper->expect_class_max, GFP_KERNEL);
+	if (expect_policy == NULL)
+		return -ENOMEM;
+
+	for (i=0; i<helper->expect_class_max; i++) {
+		if (!tb[NFCTH_POLICY_SET+i])
+			goto err;
+
+		ret = nfnl_cthelper_expect_policy(&expect_policy[i],
+						  tb[NFCTH_POLICY_SET+i]);
+		if (ret < 0)
+			goto err;
+	}
+	helper->expect_policy = expect_policy;
+	return 0;
+err:
+	kfree(expect_policy);
+	return -EINVAL;
+}
+
+static int
+nfnl_cthelper_create(const struct nlattr * const tb[],
+		     struct nf_conntrack_tuple *tuple)
+{
+	struct nf_conntrack_helper *helper;
+	int ret;
+
+	if (!tb[NFCTH_TUPLE] || !tb[NFCTH_POLICY] || !tb[NFCTH_PRIV_DATA_LEN])
+		return -EINVAL;
+
+	helper = kzalloc(sizeof(struct nf_conntrack_helper), GFP_KERNEL);
+	if (helper == NULL)
+		return -ENOMEM;
+
+	ret = nfnl_cthelper_parse_expect_policy(helper, tb[NFCTH_POLICY]);
+	if (ret < 0)
+		goto err;
+
+	strncpy(helper->name, nla_data(tb[NFCTH_NAME]), NF_CT_HELPER_NAME_LEN);
+	helper->data_len = ntohl(nla_get_be32(tb[NFCTH_PRIV_DATA_LEN]));
+	helper->flags |= NF_CT_HELPER_F_USERSPACE;
+	memcpy(&helper->tuple, tuple, sizeof(struct nf_conntrack_tuple));
+
+	helper->me = THIS_MODULE;
+	helper->help = nfnl_userspace_cthelper;
+	helper->from_nlattr = nfnl_cthelper_from_nlattr;
+	helper->to_nlattr = nfnl_cthelper_to_nlattr;
+
+	/* Default to queue number zero, this can be updated at any time. */
+	if (tb[NFCTH_QUEUE_NUM])
+		helper->queue_num = ntohl(nla_get_be32(tb[NFCTH_QUEUE_NUM]));
+
+	if (tb[NFCTH_STATUS]) {
+		int status = ntohl(nla_get_be32(tb[NFCTH_STATUS]));
+
+		switch(status) {
+		case NFCT_HELPER_STATUS_ENABLED:
+			helper->flags |= NF_CT_HELPER_F_CONFIGURED;
+			break;
+		case NFCT_HELPER_STATUS_DISABLED:
+			helper->flags &= ~NF_CT_HELPER_F_CONFIGURED;
+			break;
+		}
+	}
+
+	ret = nf_conntrack_helper_register(helper);
+	if (ret < 0)
+		goto err;
+
+	return 0;
+err:
+	kfree(helper);
+	return ret;
+}
+
+static int
+nfnl_cthelper_update(const struct nlattr * const tb[],
+		     struct nf_conntrack_helper *helper)
+{
+	int ret;
+
+	if (tb[NFCTH_PRIV_DATA_LEN])
+		return -EBUSY;
+
+	if (tb[NFCTH_POLICY]) {
+		ret = nfnl_cthelper_parse_expect_policy(helper,
+							tb[NFCTH_POLICY]);
+		if (ret < 0)
+			return ret;
+	}
+	if (tb[NFCTH_QUEUE_NUM])
+		helper->queue_num = ntohl(nla_get_be32(tb[NFCTH_QUEUE_NUM]));
+
+	if (tb[NFCTH_STATUS]) {
+		int status = ntohl(nla_get_be32(tb[NFCTH_STATUS]));
+
+		switch(status) {
+		case NFCT_HELPER_STATUS_ENABLED:
+			helper->flags |= NF_CT_HELPER_F_CONFIGURED;
+			break;
+		case NFCT_HELPER_STATUS_DISABLED:
+			helper->flags &= ~NF_CT_HELPER_F_CONFIGURED;
+			break;
+		}
+	}
+	return 0;
+}
+
+static int
+nfnl_cthelper_new(struct sock *nfnl, struct sk_buff *skb,
+		  const struct nlmsghdr *nlh, const struct nlattr * const tb[])
+{
+	const char *helper_name;
+	struct nf_conntrack_helper *cur, *helper = NULL;
+	struct nf_conntrack_tuple tuple;
+	struct hlist_node *n;
+	int ret = 0, i;
+
+	if (!tb[NFCTH_NAME] || !tb[NFCTH_TUPLE])
+		return -EINVAL;
+
+	helper_name = nla_data(tb[NFCTH_NAME]);
+
+	ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]);
+	if (ret < 0)
+		return ret;
+
+	rcu_read_lock();
+	for (i = 0; i < nf_ct_helper_hsize && !helper; i++) {
+		hlist_for_each_entry_rcu(cur, n, &nf_ct_helper_hash[i], hnode) {
+
+			/* skip non-userspace conntrack helpers. */
+			if (!(cur->flags & NF_CT_HELPER_F_USERSPACE))
+				continue;
+
+			if (strncmp(cur->name, helper_name,
+					NF_CT_HELPER_NAME_LEN) != 0)
+				continue;
+
+			if ((tuple.src.l3num != cur->tuple.src.l3num ||
+			     tuple.dst.protonum != cur->tuple.dst.protonum))
+				continue;
+
+			if (nlh->nlmsg_flags & NLM_F_EXCL) {
+				ret = -EEXIST;
+				goto err;
+			}
+			helper = cur;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	if (helper == NULL)
+		ret = nfnl_cthelper_create(tb, &tuple);
+	else
+		ret = nfnl_cthelper_update(tb, helper);
+
+	return ret;
+err:
+	rcu_read_unlock();
+	return ret;
+}
+
+static int
+nfnl_cthelper_dump_tuple(struct sk_buff *skb,
+			 struct nf_conntrack_helper *helper)
+{
+	struct nlattr *nest_parms;
+
+	nest_parms = nla_nest_start(skb, NFCTH_TUPLE | NLA_F_NESTED);
+	if (nest_parms == NULL)
+		goto nla_put_failure;
+
+	if (nla_put_u16(skb, NFCTH_TUPLE_L3PROTONUM,
+			htons(helper->tuple.src.l3num)))
+		goto nla_put_failure;
+
+	if (nla_put_u8(skb, NFCTH_TUPLE_L4PROTONUM, helper->tuple.dst.protonum))
+		goto nla_put_failure;
+
+	nla_nest_end(skb, nest_parms);
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+static int
+nfnl_cthelper_dump_policy(struct sk_buff *skb,
+			struct nf_conntrack_helper *helper)
+{
+	int i;
+	struct nlattr *nest_parms1, *nest_parms2;
+
+	nest_parms1 = nla_nest_start(skb, NFCTH_POLICY | NLA_F_NESTED);
+	if (nest_parms1 == NULL)
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, NFCTH_POLICY_SET_NUM,
+			htonl(helper->expect_class_max)))
+		goto nla_put_failure;
+
+	for (i=0; i<helper->expect_class_max; i++) {
+		nest_parms2 = nla_nest_start(skb,
+				(NFCTH_POLICY_SET+i) | NLA_F_NESTED);
+		if (nest_parms2 == NULL)
+			goto nla_put_failure;
+
+		if (nla_put_string(skb, NFCTH_POLICY_NAME,
+				   helper->expect_policy[i].name))
+			goto nla_put_failure;
+
+		if (nla_put_u32(skb, NFCTH_POLICY_EXPECT_MAX,
+				htonl(helper->expect_policy[i].max_expected)))
+			goto nla_put_failure;
+
+		if (nla_put_u32(skb, NFCTH_POLICY_EXPECT_TIMEOUT,
+				htonl(helper->expect_policy[i].timeout)))
+			goto nla_put_failure;
+
+		nla_nest_end(skb, nest_parms2);
+	}
+	nla_nest_end(skb, nest_parms1);
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+static int
+nfnl_cthelper_fill_info(struct sk_buff *skb, u32 pid, u32 seq, u32 type,
+			int event, struct nf_conntrack_helper *helper)
+{
+	struct nlmsghdr *nlh;
+	struct nfgenmsg *nfmsg;
+	unsigned int flags = pid ? NLM_F_MULTI : 0;
+	int status;
+
+	event |= NFNL_SUBSYS_CTHELPER << 8;
+	nlh = nlmsg_put(skb, pid, seq, event, sizeof(*nfmsg), flags);
+	if (nlh == NULL)
+		goto nlmsg_failure;
+
+	nfmsg = nlmsg_data(nlh);
+	nfmsg->nfgen_family = AF_UNSPEC;
+	nfmsg->version = NFNETLINK_V0;
+	nfmsg->res_id = 0;
+
+	if (nla_put_string(skb, NFCTH_NAME, helper->name))
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, NFCTH_QUEUE_NUM, htonl(helper->queue_num)))
+		goto nla_put_failure;
+
+	if (nfnl_cthelper_dump_tuple(skb, helper) < 0)
+		goto nla_put_failure;
+
+	if (nfnl_cthelper_dump_policy(skb, helper) < 0)
+		goto nla_put_failure;
+
+	if (nla_put_be32(skb, NFCTH_PRIV_DATA_LEN, htonl(helper->data_len)))
+		goto nla_put_failure;
+
+	if (helper->flags & NF_CT_HELPER_F_CONFIGURED)
+		status = NFCT_HELPER_STATUS_ENABLED;
+	else
+		status = NFCT_HELPER_STATUS_DISABLED;
+
+	if (nla_put_be32(skb, NFCTH_STATUS, htonl(status)))
+		goto nla_put_failure;
+
+	nlmsg_end(skb, nlh);
+	return skb->len;
+
+nlmsg_failure:
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -1;
+}
+
+static int
+nfnl_cthelper_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct nf_conntrack_helper *cur, *last;
+	struct hlist_node *n;
+
+	rcu_read_lock();
+	last = (struct nf_conntrack_helper *)cb->args[1];
+	for (; cb->args[0] < nf_ct_helper_hsize; cb->args[0]++) {
+restart:
+		hlist_for_each_entry_rcu(cur, n,
+				&nf_ct_helper_hash[cb->args[0]], hnode) {
+
+			/* skip non-userspace conntrack helpers. */
+			if (!(cur->flags & NF_CT_HELPER_F_USERSPACE))
+				continue;
+
+			if (cb->args[1]) {
+				if (cur != last)
+					continue;
+				cb->args[1] = 0;
+			}
+			if (nfnl_cthelper_fill_info(skb,
+					    NETLINK_CB(cb->skb).pid,
+					    cb->nlh->nlmsg_seq,
+					    NFNL_MSG_TYPE(cb->nlh->nlmsg_type),
+					    NFNL_MSG_CTHELPER_NEW, cur) < 0) {
+				cb->args[1] = (unsigned long)cur;
+				goto out;
+			}
+		}
+	}
+	if (cb->args[1]) {
+		cb->args[1] = 0;
+		goto restart;
+	}
+out:
+	rcu_read_unlock();
+	return skb->len;
+}
+
+static int
+nfnl_cthelper_get(struct sock *nfnl, struct sk_buff *skb,
+		  const struct nlmsghdr *nlh, const struct nlattr * const tb[])
+{
+	int ret = -ENOENT, i;
+	struct nf_conntrack_helper *cur;
+	struct hlist_node *n;
+	struct sk_buff *skb2;
+	char *helper_name = NULL;
+	struct nf_conntrack_tuple tuple;
+	bool tuple_set = false;
+
+	if (nlh->nlmsg_flags & NLM_F_DUMP) {
+		struct netlink_dump_control c = {
+			.dump = nfnl_cthelper_dump_table,
+		};
+		return netlink_dump_start(nfnl, skb, nlh, &c);
+	}
+
+	if (tb[NFCTH_NAME])
+		helper_name = nla_data(tb[NFCTH_NAME]);
+
+	if (tb[NFCTH_TUPLE]) {
+		ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]);
+		if (ret < 0)
+			return ret;
+
+		tuple_set = true;
+	}
+
+	for (i = 0; i < nf_ct_helper_hsize; i++) {
+		hlist_for_each_entry_rcu(cur, n, &nf_ct_helper_hash[i], hnode) {
+
+			/* skip non-userspace conntrack helpers. */
+			if (!(cur->flags & NF_CT_HELPER_F_USERSPACE))
+				continue;
+
+			if (helper_name && strncmp(cur->name, helper_name,
+						NF_CT_HELPER_NAME_LEN) != 0) {
+				continue;
+			}
+			if (tuple_set &&
+			    (tuple.src.l3num != cur->tuple.src.l3num ||
+			     tuple.dst.protonum != cur->tuple.dst.protonum))
+				continue;
+
+			skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+			if (skb2 == NULL) {
+				ret = -ENOMEM;
+				break;
+			}
+
+			ret = nfnl_cthelper_fill_info(skb2, NETLINK_CB(skb).pid,
+						nlh->nlmsg_seq,
+						NFNL_MSG_TYPE(nlh->nlmsg_type),
+						NFNL_MSG_CTHELPER_NEW, cur);
+			if (ret <= 0) {
+				kfree_skb(skb2);
+				break;
+			}
+
+			ret = netlink_unicast(nfnl, skb2, NETLINK_CB(skb).pid,
+						MSG_DONTWAIT);
+			if (ret > 0)
+				ret = 0;
+
+			/* this avoids a loop in nfnetlink. */
+			return ret == -EAGAIN ? -ENOBUFS : ret;
+		}
+	}
+	return ret;
+}
+
+static int
+nfnl_cthelper_del(struct sock *nfnl, struct sk_buff *skb,
+	     const struct nlmsghdr *nlh, const struct nlattr * const tb[])
+{
+	char *helper_name = NULL;
+	struct nf_conntrack_helper *cur;
+	struct hlist_node *n, *tmp;
+	struct nf_conntrack_tuple tuple;
+	bool tuple_set = false, found = false;
+	int i, j = 0, ret;
+
+	if (tb[NFCTH_NAME])
+		helper_name = nla_data(tb[NFCTH_NAME]);
+
+	if (tb[NFCTH_TUPLE]) {
+		ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]);
+		if (ret < 0)
+			return ret;
+
+		tuple_set = true;
+	}
+
+	for (i = 0; i < nf_ct_helper_hsize; i++) {
+		hlist_for_each_entry_safe(cur, n, tmp, &nf_ct_helper_hash[i],
+								hnode) {
+			/* skip non-userspace conntrack helpers. */
+			if (!(cur->flags & NF_CT_HELPER_F_USERSPACE))
+				continue;
+
+			j++;
+
+			if (helper_name && strncmp(cur->name, helper_name,
+						NF_CT_HELPER_NAME_LEN) != 0) {
+				continue;
+			}
+			if (tuple_set &&
+			    (tuple.src.l3num != cur->tuple.src.l3num ||
+			     tuple.dst.protonum != cur->tuple.dst.protonum))
+				continue;
+
+			found = true;
+			nf_conntrack_helper_unregister(cur);
+		}
+	}
+	/* Make sure we return success if we flush and there are no helpers */
+	return (found || j == 0) ? 0 : -ENOENT;
+}
+
+static const struct nla_policy nfnl_cthelper_policy[NFCTH_MAX+1] = {
+	[NFCTH_NAME] = { .type = NLA_NUL_STRING,
+			 .len = NF_CT_HELPER_NAME_LEN-1 },
+	[NFCTH_QUEUE_NUM] = { .type = NLA_U32, },
+};
+
+static const struct nfnl_callback nfnl_cthelper_cb[NFNL_MSG_CTHELPER_MAX] = {
+	[NFNL_MSG_CTHELPER_NEW]		= { .call = nfnl_cthelper_new,
+					    .attr_count = NFCTH_MAX,
+					    .policy = nfnl_cthelper_policy },
+	[NFNL_MSG_CTHELPER_GET]		= { .call = nfnl_cthelper_get,
+					    .attr_count = NFCTH_MAX,
+					    .policy = nfnl_cthelper_policy },
+	[NFNL_MSG_CTHELPER_DEL]		= { .call = nfnl_cthelper_del,
+					    .attr_count = NFCTH_MAX,
+					    .policy = nfnl_cthelper_policy },
+};
+
+static const struct nfnetlink_subsystem nfnl_cthelper_subsys = {
+	.name				= "cthelper",
+	.subsys_id			= NFNL_SUBSYS_CTHELPER,
+	.cb_count			= NFNL_MSG_CTHELPER_MAX,
+	.cb				= nfnl_cthelper_cb,
+};
+
+MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_CTHELPER);
+
+static int __init nfnl_cthelper_init(void)
+{
+	int ret;
+
+	pr_info("nfnl_cthelper: registering with nfnetlink.\n");
+	ret = nfnetlink_subsys_register(&nfnl_cthelper_subsys);
+	if (ret < 0) {
+		pr_err("nfnl_cthelper: cannot register with nfnetlink.\n");
+		goto err_out;
+	}
+	return 0;
+err_out:
+	return ret;
+}
+
+static void __exit nfnl_cthelper_exit(void)
+{
+	struct nf_conntrack_helper *cur;
+	struct hlist_node *n, *tmp;
+	int i;
+
+	pr_info("nfnl_cthelper: unregistering from nfnetlink.\n");
+	nfnetlink_subsys_unregister(&nfnl_cthelper_subsys);
+
+	for (i=0; i<nf_ct_helper_hsize; i++) {
+		hlist_for_each_entry_safe(cur, n, tmp, &nf_ct_helper_hash[i],
+									hnode) {
+			/* skip non-userspace conntrack helpers. */
+			if (!(cur->flags & NF_CT_HELPER_F_USERSPACE))
+				continue;
+
+			nf_conntrack_helper_unregister(cur);
+		}
+	}
+}
+
+module_init(nfnl_cthelper_init);
+module_exit(nfnl_cthelper_exit);
-- 
1.7.10


^ permalink raw reply related

* [PATCH 6/7] netfilter: ctnetlink: add CTA_HELP_INFO attribute
From: pablo @ 2012-06-04 12:21 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev
In-Reply-To: <1338812485-4232-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

This attribute can be used to modify and to dump the internal
protocol information. This is required by the follow-up patch
that adds the user-space cthelper infrastructure.
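
As a rough illustration of the intended dispatch (a user-space mock, not
kernel code; every mock_* name below is hypothetical), the private data is
only touched when the helper provides a from_nlattr-style callback and the
info attribute is present. Unlike the patch, which ignores the callback's
return value, this sketch propagates it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a netlink attribute carrying helper info. */
struct mock_attr {
	size_t len;
	const void *payload;
};

/* Hypothetical stand-in for a conntrack entry with private helper data. */
struct mock_conn {
	unsigned char priv[16];
};

/* Hypothetical stand-in for a helper; from_attr is optional (may be NULL). */
struct mock_helper {
	int (*from_attr)(const struct mock_attr *attr, struct mock_conn *ct);
};

/* Example callback: copy the attribute payload into the private area. */
int mock_from_attr(const struct mock_attr *attr, struct mock_conn *ct)
{
	if (attr->len > sizeof(ct->priv))
		return -1;	/* payload does not fit */
	memcpy(ct->priv, attr->payload, attr->len);
	return 0;
}

/* Update private data only if the helper allows it and info was sent. */
int mock_update_help_info(const struct mock_helper *helper,
			  const struct mock_attr *info, struct mock_conn *ct)
{
	assert(helper != NULL && ct != NULL);
	if (helper->from_attr && info)
		return helper->from_attr(info, ct);
	return 0;	/* nothing to do is not an error */
}
```

A helper without the callback, or a request without the info attribute,
leaves the private data untouched and still reports success.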

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter/nfnetlink_conntrack.h |    1 +
 include/net/netfilter/nf_conntrack_helper.h   |    1 +
 net/netfilter/nf_conntrack_netlink.c          |   28 ++++++++++++++++++++-----
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/netfilter/nfnetlink_conntrack.h b/include/linux/netfilter/nfnetlink_conntrack.h
index e58e4b9..7688833 100644
--- a/include/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/linux/netfilter/nfnetlink_conntrack.h
@@ -191,6 +191,7 @@ enum ctattr_expect_nat {
 enum ctattr_help {
 	CTA_HELP_UNSPEC,
 	CTA_HELP_NAME,
+	CTA_HELP_INFO,
 	__CTA_HELP_MAX
 };
 #define CTA_HELP_MAX (__CTA_HELP_MAX - 1)
diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h
index 5afdcf2..e5091a9 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -39,6 +39,7 @@ struct nf_conntrack_helper {
 
 	void (*destroy)(struct nf_conn *ct);
 
+	int (*from_nlattr)(struct nlattr *attr, struct nf_conn *ct);
 	int (*to_nlattr)(struct sk_buff *skb, const struct nf_conn *ct);
 	unsigned int expect_class_max;
 };
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 28ac04c..27b6a75 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -901,7 +901,8 @@ static const struct nla_policy help_nla_policy[CTA_HELP_MAX+1] = {
 };
 
 static inline int
-ctnetlink_parse_help(const struct nlattr *attr, char **helper_name)
+ctnetlink_parse_help(const struct nlattr *attr, char **helper_name,
+		     struct nlattr **helpinfo)
 {
 	struct nlattr *tb[CTA_HELP_MAX+1];
 
@@ -912,6 +913,9 @@ ctnetlink_parse_help(const struct nlattr *attr, char **helper_name)
 
 	*helper_name = nla_data(tb[CTA_HELP_NAME]);
 
+	if (tb[CTA_HELP_INFO])
+		*helpinfo = tb[CTA_HELP_INFO];
+
 	return 0;
 }
 
@@ -1172,13 +1176,14 @@ ctnetlink_change_helper(struct nf_conn *ct, const struct nlattr * const cda[])
 	struct nf_conntrack_helper *helper;
 	struct nf_conn_help *help = nfct_help(ct);
 	char *helpname = NULL;
+	struct nlattr *helpinfo = NULL;
 	int err;
 
 	/* don't change helper of sibling connections */
 	if (ct->master)
 		return -EBUSY;
 
-	err = ctnetlink_parse_help(cda[CTA_HELP], &helpname);
+	err = ctnetlink_parse_help(cda[CTA_HELP], &helpname, &helpinfo);
 	if (err < 0)
 		return err;
 
@@ -1213,8 +1218,12 @@ ctnetlink_change_helper(struct nf_conn *ct, const struct nlattr * const cda[])
 	}
 
 	if (help) {
-		if (help->helper == helper)
+		if (help->helper == helper) {
+			/* update private helper data if allowed. */
+			if (helper->from_nlattr && helpinfo)
+				helper->from_nlattr(helpinfo, ct);
 			return 0;
+		}
 		if (help->helper)
 			return -EBUSY;
 		/* need to zero data of old helper */
@@ -1410,8 +1419,9 @@ ctnetlink_create_conntrack(struct net *net, u16 zone,
 	rcu_read_lock();
  	if (cda[CTA_HELP]) {
 		char *helpname = NULL;
- 
- 		err = ctnetlink_parse_help(cda[CTA_HELP], &helpname);
+		struct nlattr *helpinfo = NULL;
+
+		err = ctnetlink_parse_help(cda[CTA_HELP], &helpname, &helpinfo);
  		if (err < 0)
 			goto err2;
 
@@ -1445,6 +1455,9 @@ ctnetlink_create_conntrack(struct net *net, u16 zone,
 				err = -ENOMEM;
 				goto err2;
 			}
+			/* set private helper data if allowed. */
+			if (helper->from_nlattr && helpinfo)
+				helper->from_nlattr(helpinfo, ct);
 
 			/* not in hash table yet so not strictly necessary */
 			RCU_INIT_POINTER(help->helper, helper);
@@ -1745,6 +1758,11 @@ ctnetlink_nfqueue_parse(const struct nlattr *attr, struct nf_conn *ct)
 		if (err < 0)
 			return err;
 	}
+	if (cda[CTA_HELP]) {
+		err = ctnetlink_change_helper(ct, cda);
+		if (err < 0)
+			return err;
+	}
 #if defined(CONFIG_NF_CONNTRACK_MARK)
 	if (cda[CTA_MARK])
 		ct->mark = ntohl(nla_get_be32(cda[CTA_MARK]));
-- 
1.7.10


^ permalink raw reply related

* [PATCH 5/7] netfilter: nfnl_queue: support NAT TCP sequence adjustment if packet mangled
From: pablo @ 2012-06-04 12:21 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev
In-Reply-To: <1338812485-4232-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

User-space programs that receive traffic via NFQUEUE may mangle packets.
If NAT is enabled and the mangling changes the packet length, this
usually confuses TCP sequence tracking, leading to traffic disruptions.

With this patch, nfnl_queue will make the corresponding NAT TCP sequence
adjustment if:

1) The packet has been mangled,
2) the NFQNL_F_CONNTRACK flag has been set, and
3) NAT is detected.
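
The three conditions above can be sketched in user space roughly as follows.
This is only an illustration: the mock_* names and bit values are
hypothetical, and "mangled" is taken to mean "the length changed", which is
how the patch decides via its diff variable:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag and status bits; the real values live in kernel headers. */
#define MOCK_QUEUE_F_CONNTRACK	0x1u
#define MOCK_CT_STATUS_NAT	0x2u

struct mock_verdict {
	size_t old_len;		/* packet length before user-space saw it */
	size_t new_len;		/* length of the payload sent back */
	unsigned int queue_flags;
	bool tracked;		/* packet has a usable conntrack entry */
	unsigned int ct_status;
};

/* Return the offset to feed into sequence adjustment, or 0 if none. */
long mock_seq_adjust_offset(const struct mock_verdict *v)
{
	long diff;

	assert(v != NULL);
	diff = (long)v->new_len - (long)v->old_len;

	if (diff == 0)		/* 1) length unchanged, nothing to adjust */
		return 0;
	if (!(v->queue_flags & MOCK_QUEUE_F_CONNTRACK))
		return 0;	/* 2) conntrack flag not set on the queue */
	if (!v->tracked || !(v->ct_status & MOCK_CT_STATUS_NAT))
		return 0;	/* 3) no NAT on this connection */
	return diff;
}
```

A negative result means the packet shrank; a positive one means it grew.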

This is required by the new user-space cthelper infrastructure.
For now, only TCP is supported, since we have no helpers for DCCP
or SCTP. Support for those can be added if we ever need it.

You can still use this without the cthelper infrastructure. There
are several posts on the Internet complaining about this (mostly
from people using NFQUEUE for IDS).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_nat_helper.h |    7 ++++++
 net/ipv4/netfilter/nf_nat_helper.c    |   13 ++++++++++
 net/netfilter/nfnetlink_queue.c       |   42 ++++++++++++++++++++++-----------
 3 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/include/net/netfilter/nf_nat_helper.h b/include/net/netfilter/nf_nat_helper.h
index 02bb6c2..ee92a0c 100644
--- a/include/net/netfilter/nf_nat_helper.h
+++ b/include/net/netfilter/nf_nat_helper.h
@@ -46,6 +46,13 @@ extern int (*nf_nat_seq_adjust_hook)(struct sk_buff *skb,
 				     struct nf_conn *ct,
 				     enum ip_conntrack_info ctinfo);
 
+extern void nf_nat_tcp_seq_adjust(struct sk_buff *skb,
+				  struct nf_conn *ct,
+				  enum ip_conntrack_info ctinfo,
+				  int off);
+
+struct nf_conntrack_expect;
+
 /* Setup NAT on this expected conntrack so it follows master, but goes
  * to port ct->master->saved_proto. */
 extern void nf_nat_follow_master(struct nf_conn *ct,
diff --git a/net/ipv4/netfilter/nf_nat_helper.c b/net/ipv4/netfilter/nf_nat_helper.c
index af65958..7c7b5b8 100644
--- a/net/ipv4/netfilter/nf_nat_helper.c
+++ b/net/ipv4/netfilter/nf_nat_helper.c
@@ -153,6 +153,19 @@ void nf_nat_set_seq_adjust(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
 }
 EXPORT_SYMBOL_GPL(nf_nat_set_seq_adjust);
 
+void nf_nat_tcp_seq_adjust(struct sk_buff *skb, struct nf_conn *ct,
+			   enum ip_conntrack_info ctinfo, int off)
+{
+	const struct tcphdr *th;
+
+	if (nf_ct_protonum(ct) != IPPROTO_TCP)
+		return;
+
+	th = (struct tcphdr *)(skb_network_header(skb)+ ip_hdrlen(skb));
+	nf_nat_set_seq_adjust(ct, ctinfo, th->seq, off);
+}
+EXPORT_SYMBOL_GPL(nf_nat_tcp_seq_adjust);
+
 static void nf_nat_csum(struct sk_buff *skb, const struct iphdr *iph, void *data,
 			int datalen, __sum16 *check, int oldlen)
 {
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index b007779..b18a367 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -38,6 +38,10 @@
 #include "../bridge/br_private.h"
 #endif
 
+#ifdef CONFIG_NF_NAT_NEEDED
+#include <net/netfilter/nf_nat_helper.h>
+#endif
+
 #define NFQNL_QMAX_DEFAULT 1024
 
 struct nfqnl_instance {
@@ -497,12 +501,10 @@ err_out:
 }
 
 static int
-nfqnl_mangle(void *data, int data_len, struct nf_queue_entry *e)
+nfqnl_mangle(void *data, int data_len, struct nf_queue_entry *e, int diff)
 {
 	struct sk_buff *nskb;
-	int diff;
 
-	diff = data_len - e->skb->len;
 	if (diff < 0) {
 		if (pskb_trim(e->skb, data_len))
 			return -ENOMEM;
@@ -761,6 +763,8 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
 	unsigned int verdict;
 	struct nf_queue_entry *entry;
 	struct nfq_ct_hook *nfq_ct;
+	enum ip_conntrack_info uninitialized_var(ctinfo);
+	struct nf_conn *ct = NULL;
 
 	queue = instance_lookup(queue_num);
 	if (!queue)
@@ -779,26 +783,36 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
 	if (entry == NULL)
 		return -ENOENT;
 
+	rcu_read_lock();
+	nfq_ct = rcu_dereference(nfq_ct_hook);
+	if (nfq_ct != NULL && (queue->flags & NFQNL_F_CONNTRACK) &&
+	    nfqa[NFQA_CT]) {
+		ct = nf_ct_get(entry->skb, &ctinfo);
+		if (ct && nf_ct_is_untracked(ct))
+			ct = NULL;
+	}
+
 	if (nfqa[NFQA_PAYLOAD]) {
+		u16 payload_len = nla_len(nfqa[NFQA_PAYLOAD]);
+		int diff = payload_len - entry->skb->len;
+
 		if (nfqnl_mangle(nla_data(nfqa[NFQA_PAYLOAD]),
-				 nla_len(nfqa[NFQA_PAYLOAD]), entry) < 0)
+				 payload_len, entry, diff) < 0)
 			verdict = NF_DROP;
+
+#ifdef CONFIG_NF_NAT_NEEDED
+		/* Adjust sequence numbers to avoid puzzling conntrack */
+		if (ct && (ct->status & IPS_NAT_MASK) && diff)
+			nf_nat_tcp_seq_adjust(skb, ct, ctinfo, diff);
+#endif
 	}
 
 	if (nfqa[NFQA_MARK])
 		entry->skb->mark = ntohl(nla_get_be32(nfqa[NFQA_MARK]));
 
-	rcu_read_lock();
-	nfq_ct = rcu_dereference(nfq_ct_hook);
-	if (nfq_ct != NULL &&
-	    (queue->flags & NFQNL_F_CONNTRACK) && nfqa[NFQA_CT]) {
-		enum ip_conntrack_info ctinfo;
-		struct nf_conn *ct;
+	if (ct)
+		nfq_ct->parse(nfqa[NFQA_CT], ct);
 
-		ct = nf_ct_get(entry->skb, &ctinfo);
-		if (ct && !nf_ct_is_untracked(ct))
-			nfq_ct->parse(nfqa[NFQA_CT], ct);
-	}
 	rcu_read_unlock();
 
 	nf_reinject(entry, verdict);
-- 
1.7.10


^ permalink raw reply related

* [PATCH 3/7] netfilter: nf_ct_helper: implement variable length helper private data
From: pablo @ 2012-06-04 12:21 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev
In-Reply-To: <1338812485-4232-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

This patch uses the new variable length conntrack extensions.

Instead of using union nf_conntrack_help, which contains the private
data of every helper, we allocate a variable-length area to store the
private data of the helper that is actually in use.
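
The idea can be sketched in user space as below. This is an illustration
only, not the kernel code: all mock_* names are hypothetical, the sketch
uses a C99 flexible array member where the patch uses char data[0], and it
stores data_len next to the data for simplicity, whereas the patch keeps it
in struct nf_conntrack_helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-helper private state, standing in for nf_ct_*_master. */
struct mock_ftp_master {
	unsigned int seq_after_nl;
	int flags;
};

/* Hypothetical extension header followed by a variable-length area. */
struct mock_help {
	int expecting;
	size_t data_len;	/* size of the private area below */
	unsigned char data[];	/* private helper information */
};

/* Allocate the header plus exactly as much private data as requested. */
struct mock_help *mock_help_alloc(size_t data_len)
{
	struct mock_help *help = calloc(1, sizeof(*help) + data_len);

	if (help)
		help->data_len = data_len;
	return help;
}

/* Accessor: callers cast the result to their own private struct. */
void *mock_help_data(struct mock_help *help)
{
	assert(help != NULL);
	return help->data;
}

/* Zero the private area using the stored length, not sizeof(a field). */
void mock_help_reset(struct mock_help *help)
{
	assert(help != NULL);
	memset(help->data, 0, help->data_len);
}
```

Each helper then pays only for its own private struct instead of the size
of the largest member of a union.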

This patch includes the modification of all existing helpers.
It also adds a couple of header includes to avoid compilation
warnings.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter/nf_conntrack_sip.h  |    1 +
 include/net/netfilter/nf_conntrack.h        |   35 ++-------------------
 include/net/netfilter/nf_conntrack_helper.h |   15 ++++++++-
 net/ipv4/netfilter/nf_nat_amanda.c          |    4 +--
 net/ipv4/netfilter/nf_nat_h323.c            |    8 ++---
 net/ipv4/netfilter/nf_nat_pptp.c            |    6 ++--
 net/ipv4/netfilter/nf_nat_sip.c             |   14 ++++-----
 net/ipv4/netfilter/nf_nat_tftp.c            |    4 +--
 net/netfilter/nf_conntrack_core.c           |    3 +-
 net/netfilter/nf_conntrack_ftp.c            |    3 +-
 net/netfilter/nf_conntrack_h323_main.c      |   16 ++++++----
 net/netfilter/nf_conntrack_helper.c         |   11 ++++---
 net/netfilter/nf_conntrack_netlink.c        |    4 +--
 net/netfilter/nf_conntrack_pptp.c           |   17 ++++++-----
 net/netfilter/nf_conntrack_proto_gre.c      |   16 +++++-----
 net/netfilter/nf_conntrack_sane.c           |    4 +--
 net/netfilter/nf_conntrack_sip.c            |   29 +++++++++---------
 net/netfilter/xt_CT.c                       |   44 ++++++++++++++++-----------
 18 files changed, 119 insertions(+), 115 deletions(-)

diff --git a/include/linux/netfilter/nf_conntrack_sip.h b/include/linux/netfilter/nf_conntrack_sip.h
index feda699..34e19f8 100644
--- a/include/linux/netfilter/nf_conntrack_sip.h
+++ b/include/linux/netfilter/nf_conntrack_sip.h
@@ -3,6 +3,7 @@
 #ifdef __KERNEL__
 
 #include <linux/types.h>
+#include <net/netfilter/nf_conntrack_expect.h>
 
 #define SIP_PORT	5060
 #define SIP_TIMEOUT	3600
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index cce7f6a..7449bacd 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -39,36 +39,6 @@ union nf_conntrack_expect_proto {
 	/* insert expect proto private data here */
 };
 
-/* Add protocol helper include file here */
-#include <linux/netfilter/nf_conntrack_ftp.h>
-#include <linux/netfilter/nf_conntrack_pptp.h>
-#include <linux/netfilter/nf_conntrack_h323.h>
-#include <linux/netfilter/nf_conntrack_sane.h>
-#include <linux/netfilter/nf_conntrack_sip.h>
-
-/* per conntrack: application helper private data */
-union nf_conntrack_help {
-	/* insert conntrack helper private data (master) here */
-#if defined(CONFIG_NF_CONNTRACK_FTP) || defined(CONFIG_NF_CONNTRACK_FTP_MODULE)
-	struct nf_ct_ftp_master ct_ftp_info;
-#endif
-#if defined(CONFIG_NF_CONNTRACK_PPTP) || \
-    defined(CONFIG_NF_CONNTRACK_PPTP_MODULE)
-	struct nf_ct_pptp_master ct_pptp_info;
-#endif
-#if defined(CONFIG_NF_CONNTRACK_H323) || \
-    defined(CONFIG_NF_CONNTRACK_H323_MODULE)
-	struct nf_ct_h323_master ct_h323_info;
-#endif
-#if defined(CONFIG_NF_CONNTRACK_SANE) || \
-    defined(CONFIG_NF_CONNTRACK_SANE_MODULE)
-	struct nf_ct_sane_master ct_sane_info;
-#endif
-#if defined(CONFIG_NF_CONNTRACK_SIP) || defined(CONFIG_NF_CONNTRACK_SIP_MODULE)
-	struct nf_ct_sip_master ct_sip_info;
-#endif
-};
-
 #include <linux/types.h>
 #include <linux/skbuff.h>
 #include <linux/timer.h>
@@ -89,12 +59,13 @@ struct nf_conn_help {
 	/* Helper. if any */
 	struct nf_conntrack_helper __rcu *helper;
 
-	union nf_conntrack_help help;
-
 	struct hlist_head expectations;
 
 	/* Current number of expected connections */
 	u8 expecting[NF_CT_MAX_EXPECT_CLASSES];
+
+	/* private helper information. */
+	char data[0];
 };
 
 #include <net/netfilter/ipv4/nf_conntrack_ipv4.h>
diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h
index 5f5a4d9..5afdcf2 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -11,6 +11,7 @@
 #define _NF_CONNTRACK_HELPER_H
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_conntrack_extend.h>
+#include <net/netfilter/nf_conntrack_expect.h>
 
 struct module;
 
@@ -23,6 +24,9 @@ struct nf_conntrack_helper {
 	struct module *me;		/* pointer to self */
 	const struct nf_conntrack_expect_policy *expect_policy;
 
+	/* length of internal data, ie. sizeof(struct nf_ct_*_master) */
+	size_t data_len;
+
 	/* Tuple of things we will help (compared against server response) */
 	struct nf_conntrack_tuple tuple;
 
@@ -48,7 +52,7 @@ nf_conntrack_helper_try_module_get(const char *name, u16 l3num, u8 protonum);
 extern int nf_conntrack_helper_register(struct nf_conntrack_helper *);
 extern void nf_conntrack_helper_unregister(struct nf_conntrack_helper *);
 
-extern struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn *ct, gfp_t gfp);
+extern struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn *ct, struct nf_conntrack_helper *helper, gfp_t gfp);
 
 extern int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 				     gfp_t flags);
@@ -60,6 +64,15 @@ static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct)
 	return nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
 }
 
+static inline void *nfct_help_data(const struct nf_conn *ct)
+{
+	struct nf_conn_help *help;
+
+	help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
+
+	return (void *)&help->data;
+}
+
 extern int nf_conntrack_helper_init(struct net *net);
 extern void nf_conntrack_helper_fini(struct net *net);
 
diff --git a/net/ipv4/netfilter/nf_nat_amanda.c b/net/ipv4/netfilter/nf_nat_amanda.c
index 7b22382..8973d6a 100644
--- a/net/ipv4/netfilter/nf_nat_amanda.c
+++ b/net/ipv4/netfilter/nf_nat_amanda.c
@@ -13,11 +13,11 @@
 #include <linux/skbuff.h>
 #include <linux/udp.h>
 
-#include <net/netfilter/nf_nat_helper.h>
-#include <net/netfilter/nf_nat_rule.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <linux/netfilter/nf_conntrack_amanda.h>
+#include <net/netfilter/nf_nat_helper.h>
+#include <net/netfilter/nf_nat_rule.h>
 
 MODULE_AUTHOR("Brian J. Murrell <netfilter@interlinx.bc.ca>");
 MODULE_DESCRIPTION("Amanda NAT helper");
diff --git a/net/ipv4/netfilter/nf_nat_h323.c b/net/ipv4/netfilter/nf_nat_h323.c
index 8253670..a83865a 100644
--- a/net/ipv4/netfilter/nf_nat_h323.c
+++ b/net/ipv4/netfilter/nf_nat_h323.c
@@ -99,7 +99,7 @@ static int set_sig_addr(struct sk_buff *skb, struct nf_conn *ct,
 			unsigned char **data,
 			TransportAddress *taddr, int count)
 {
-	const struct nf_ct_h323_master *info = &nfct_help(ct)->help.ct_h323_info;
+	const struct nf_ct_h323_master *info = nfct_help_data(ct);
 	int dir = CTINFO2DIR(ctinfo);
 	int i;
 	__be16 port;
@@ -182,7 +182,7 @@ static int nat_rtp_rtcp(struct sk_buff *skb, struct nf_conn *ct,
 			struct nf_conntrack_expect *rtp_exp,
 			struct nf_conntrack_expect *rtcp_exp)
 {
-	struct nf_ct_h323_master *info = &nfct_help(ct)->help.ct_h323_info;
+	struct nf_ct_h323_master *info = nfct_help_data(ct);
 	int dir = CTINFO2DIR(ctinfo);
 	int i;
 	u_int16_t nated_port;
@@ -337,7 +337,7 @@ static int nat_h245(struct sk_buff *skb, struct nf_conn *ct,
 		    TransportAddress *taddr, __be16 port,
 		    struct nf_conntrack_expect *exp)
 {
-	struct nf_ct_h323_master *info = &nfct_help(ct)->help.ct_h323_info;
+	struct nf_ct_h323_master *info = nfct_help_data(ct);
 	int dir = CTINFO2DIR(ctinfo);
 	u_int16_t nated_port = ntohs(port);
 
@@ -427,7 +427,7 @@ static int nat_q931(struct sk_buff *skb, struct nf_conn *ct,
 		    unsigned char **data, TransportAddress *taddr, int idx,
 		    __be16 port, struct nf_conntrack_expect *exp)
 {
-	struct nf_ct_h323_master *info = &nfct_help(ct)->help.ct_h323_info;
+	struct nf_ct_h323_master *info = nfct_help_data(ct);
 	int dir = CTINFO2DIR(ctinfo);
 	u_int16_t nated_port = ntohs(port);
 	union nf_inet_addr addr;
diff --git a/net/ipv4/netfilter/nf_nat_pptp.c b/net/ipv4/netfilter/nf_nat_pptp.c
index c273d58..3881408 100644
--- a/net/ipv4/netfilter/nf_nat_pptp.c
+++ b/net/ipv4/netfilter/nf_nat_pptp.c
@@ -49,7 +49,7 @@ static void pptp_nat_expected(struct nf_conn *ct,
 	const struct nf_nat_pptp *nat_pptp_info;
 	struct nf_nat_ipv4_range range;
 
-	ct_pptp_info = &nfct_help(master)->help.ct_pptp_info;
+	ct_pptp_info = nfct_help_data(master);
 	nat_pptp_info = &nfct_nat(master)->help.nat_pptp_info;
 
 	/* And here goes the grand finale of corrosion... */
@@ -123,7 +123,7 @@ pptp_outbound_pkt(struct sk_buff *skb,
 	__be16 new_callid;
 	unsigned int cid_off;
 
-	ct_pptp_info  = &nfct_help(ct)->help.ct_pptp_info;
+	ct_pptp_info = nfct_help_data(ct);
 	nat_pptp_info = &nfct_nat(ct)->help.nat_pptp_info;
 
 	new_callid = ct_pptp_info->pns_call_id;
@@ -192,7 +192,7 @@ pptp_exp_gre(struct nf_conntrack_expect *expect_orig,
 	struct nf_ct_pptp_master *ct_pptp_info;
 	struct nf_nat_pptp *nat_pptp_info;
 
-	ct_pptp_info  = &nfct_help(ct)->help.ct_pptp_info;
+	ct_pptp_info = nfct_help_data(ct);
 	nat_pptp_info = &nfct_nat(ct)->help.nat_pptp_info;
 
 	/* save original PAC call ID in nat_info */
diff --git a/net/ipv4/netfilter/nf_nat_sip.c b/net/ipv4/netfilter/nf_nat_sip.c
index aafc4b6..f49222e 100644
--- a/net/ipv4/netfilter/nf_nat_sip.c
+++ b/net/ipv4/netfilter/nf_nat_sip.c
@@ -73,7 +73,7 @@ static int map_addr(struct sk_buff *skb, unsigned int dataoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 	char buffer[sizeof("nnn.nnn.nnn.nnn:nnnnn")];
 	unsigned int buflen;
 	__be32 newaddr;
@@ -86,7 +86,7 @@ static int map_addr(struct sk_buff *skb, unsigned int dataoff,
 	} else if (ct->tuplehash[dir].tuple.dst.u3.ip == addr->ip &&
 		   ct->tuplehash[dir].tuple.dst.u.udp.port == port) {
 		newaddr = ct->tuplehash[!dir].tuple.src.u3.ip;
-		newport = help->help.ct_sip_info.forced_dport ? :
+		newport = ct_sip_info->forced_dport ? :
 			  ct->tuplehash[!dir].tuple.src.u.udp.port;
 	} else
 		return 1;
@@ -123,7 +123,7 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 	unsigned int coff, matchoff, matchlen;
 	enum sip_header_types hdr;
 	union nf_inet_addr addr;
@@ -233,14 +233,14 @@ next:
 		return NF_DROP;
 
 	/* Mangle destination port for Cisco phones, then fix up checksums */
-	if (dir == IP_CT_DIR_REPLY && help->help.ct_sip_info.forced_dport) {
+	if (dir == IP_CT_DIR_REPLY && ct_sip_info->forced_dport) {
 		struct udphdr *uh;
 
 		if (!skb_make_writable(skb, skb->len))
 			return NF_DROP;
 
 		uh = (struct udphdr *)(skb->data + ip_hdrlen(skb));
-		uh->dest = help->help.ct_sip_info.forced_dport;
+		uh->dest = ct_sip_info->forced_dport;
 
 		if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo, 0, 0, NULL, 0))
 			return NF_DROP;
@@ -297,7 +297,7 @@ static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int dataoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 	__be32 newip;
 	u_int16_t port;
 	__be16 srcport;
@@ -313,7 +313,7 @@ static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int dataoff,
 	/* If the signalling port matches the connection's source port in the
 	 * original direction, try to use the destination port in the opposite
 	 * direction. */
-	srcport = help->help.ct_sip_info.forced_dport ? :
+	srcport = ct_sip_info->forced_dport ? :
 		  ct->tuplehash[dir].tuple.src.u.udp.port;
 	if (exp->tuple.dst.u.udp.port == srcport)
 		port = ntohs(ct->tuplehash[!dir].tuple.dst.u.udp.port);
diff --git a/net/ipv4/netfilter/nf_nat_tftp.c b/net/ipv4/netfilter/nf_nat_tftp.c
index a2901bf..4a1c270 100644
--- a/net/ipv4/netfilter/nf_nat_tftp.c
+++ b/net/ipv4/netfilter/nf_nat_tftp.c
@@ -8,11 +8,11 @@
 #include <linux/module.h>
 #include <linux/udp.h>
 
-#include <net/netfilter/nf_nat_helper.h>
-#include <net/netfilter/nf_nat_rule.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <linux/netfilter/nf_conntrack_tftp.h>
+#include <net/netfilter/nf_nat_helper.h>
+#include <net/netfilter/nf_nat_rule.h>
 
 MODULE_AUTHOR("Magnus Boden <mb@ozaba.mine.nu>");
 MODULE_DESCRIPTION("TFTP NAT helper");
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 32c5909..afa6939 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -822,7 +822,8 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 		__set_bit(IPS_EXPECTED_BIT, &ct->status);
 		ct->master = exp->master;
 		if (exp->helper) {
-			help = nf_ct_helper_ext_add(ct, GFP_ATOMIC);
+			help = nf_ct_helper_ext_add(ct, exp->helper,
+						    GFP_ATOMIC);
 			if (help)
 				rcu_assign_pointer(help->helper, exp->helper);
 		}
diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index 44e47c9..4bb771d 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -358,7 +358,7 @@ static int help(struct sk_buff *skb,
 	u32 seq;
 	int dir = CTINFO2DIR(ctinfo);
 	unsigned int uninitialized_var(matchlen), uninitialized_var(matchoff);
-	struct nf_ct_ftp_master *ct_ftp_info = &nfct_help(ct)->help.ct_ftp_info;
+	struct nf_ct_ftp_master *ct_ftp_info = nfct_help_data(ct);
 	struct nf_conntrack_expect *exp;
 	union nf_inet_addr *daddr;
 	struct nf_conntrack_man cmd = {};
@@ -554,6 +554,7 @@ static int __init nf_conntrack_ftp_init(void)
 		ftp[i][0].tuple.src.l3num = PF_INET;
 		ftp[i][1].tuple.src.l3num = PF_INET6;
 		for (j = 0; j < 2; j++) {
+			ftp[i][j].data_len = sizeof(struct nf_ct_ftp_master);
 			ftp[i][j].tuple.src.u.tcp.port = htons(ports[i]);
 			ftp[i][j].tuple.dst.protonum = IPPROTO_TCP;
 			ftp[i][j].expect_policy = &ftp_exp_policy;
diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c
index 471b054..988e9c3 100644
--- a/net/netfilter/nf_conntrack_h323_main.c
+++ b/net/netfilter/nf_conntrack_h323_main.c
@@ -114,7 +114,7 @@ static int get_tpkt_data(struct sk_buff *skb, unsigned int protoff,
 			 struct nf_conn *ct, enum ip_conntrack_info ctinfo,
 			 unsigned char **data, int *datalen, int *dataoff)
 {
-	struct nf_ct_h323_master *info = &nfct_help(ct)->help.ct_h323_info;
+	struct nf_ct_h323_master *info = nfct_help_data(ct);
 	int dir = CTINFO2DIR(ctinfo);
 	const struct tcphdr *th;
 	struct tcphdr _tcph;
@@ -619,6 +619,7 @@ static const struct nf_conntrack_expect_policy h245_exp_policy = {
 static struct nf_conntrack_helper nf_conntrack_helper_h245 __read_mostly = {
 	.name			= "H.245",
 	.me			= THIS_MODULE,
+	.data_len		= sizeof(struct nf_ct_h323_master),
 	.tuple.src.l3num	= AF_UNSPEC,
 	.tuple.dst.protonum	= IPPROTO_UDP,
 	.help			= h245_help,
@@ -1172,6 +1173,7 @@ static struct nf_conntrack_helper nf_conntrack_helper_q931[] __read_mostly = {
 	{
 		.name			= "Q.931",
 		.me			= THIS_MODULE,
+		.data_len		= sizeof(struct nf_ct_h323_master),
 		.tuple.src.l3num	= AF_INET,
 		.tuple.src.u.tcp.port	= cpu_to_be16(Q931_PORT),
 		.tuple.dst.protonum	= IPPROTO_TCP,
@@ -1247,7 +1249,7 @@ static int expect_q931(struct sk_buff *skb, struct nf_conn *ct,
 		       unsigned char **data,
 		       TransportAddress *taddr, int count)
 {
-	struct nf_ct_h323_master *info = &nfct_help(ct)->help.ct_h323_info;
+	struct nf_ct_h323_master *info = nfct_help_data(ct);
 	int dir = CTINFO2DIR(ctinfo);
 	int ret = 0;
 	int i;
@@ -1362,7 +1364,7 @@ static int process_rrq(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
 		       unsigned char **data, RegistrationRequest *rrq)
 {
-	struct nf_ct_h323_master *info = &nfct_help(ct)->help.ct_h323_info;
+	struct nf_ct_h323_master *info = nfct_help_data(ct);
 	int ret;
 	typeof(set_ras_addr_hook) set_ras_addr;
 
@@ -1397,7 +1399,7 @@ static int process_rcf(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
 		       unsigned char **data, RegistrationConfirm *rcf)
 {
-	struct nf_ct_h323_master *info = &nfct_help(ct)->help.ct_h323_info;
+	struct nf_ct_h323_master *info = nfct_help_data(ct);
 	int dir = CTINFO2DIR(ctinfo);
 	int ret;
 	struct nf_conntrack_expect *exp;
@@ -1446,7 +1448,7 @@ static int process_urq(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
 		       unsigned char **data, UnregistrationRequest *urq)
 {
-	struct nf_ct_h323_master *info = &nfct_help(ct)->help.ct_h323_info;
+	struct nf_ct_h323_master *info = nfct_help_data(ct);
 	int dir = CTINFO2DIR(ctinfo);
 	int ret;
 	typeof(set_sig_addr_hook) set_sig_addr;
@@ -1478,7 +1480,7 @@ static int process_arq(struct sk_buff *skb, struct nf_conn *ct,
 		       enum ip_conntrack_info ctinfo,
 		       unsigned char **data, AdmissionRequest *arq)
 {
-	const struct nf_ct_h323_master *info = &nfct_help(ct)->help.ct_h323_info;
+	const struct nf_ct_h323_master *info = nfct_help_data(ct);
 	int dir = CTINFO2DIR(ctinfo);
 	__be16 port;
 	union nf_inet_addr addr;
@@ -1746,6 +1748,7 @@ static struct nf_conntrack_helper nf_conntrack_helper_ras[] __read_mostly = {
 	{
 		.name			= "RAS",
 		.me			= THIS_MODULE,
+		.data_len		= sizeof(struct nf_ct_h323_master),
 		.tuple.src.l3num	= AF_INET,
 		.tuple.src.u.udp.port	= cpu_to_be16(RAS_PORT),
 		.tuple.dst.protonum	= IPPROTO_UDP,
@@ -1755,6 +1758,7 @@ static struct nf_conntrack_helper nf_conntrack_helper_ras[] __read_mostly = {
 	{
 		.name			= "RAS",
 		.me			= THIS_MODULE,
+		.data_len		= sizeof(struct nf_ct_h323_master),
 		.tuple.src.l3num	= AF_INET6,
 		.tuple.src.u.udp.port	= cpu_to_be16(RAS_PORT),
 		.tuple.dst.protonum	= IPPROTO_UDP,
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 4fa2ff9..e0d3f4b 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -161,11 +161,14 @@ nf_conntrack_helper_try_module_get(const char *name, u16 l3num, u8 protonum)
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_helper_try_module_get);
 
-struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn *ct, gfp_t gfp)
+struct nf_conn_help *
+nf_ct_helper_ext_add(struct nf_conn *ct,
+		     struct nf_conntrack_helper *helper, gfp_t gfp)
 {
 	struct nf_conn_help *help;
 
-	help = nf_ct_ext_add(ct, NF_CT_EXT_HELPER, gfp);
+	help = nf_ct_ext_add_length(ct, NF_CT_EXT_HELPER,
+				    helper->data_len, gfp);
 	if (help)
 		INIT_HLIST_HEAD(&help->expectations);
 	else
@@ -218,13 +221,13 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 	}
 
 	if (help == NULL) {
-		help = nf_ct_helper_ext_add(ct, flags);
+		help = nf_ct_helper_ext_add(ct, helper, flags);
 		if (help == NULL) {
 			ret = -ENOMEM;
 			goto out;
 		}
 	} else {
-		memset(&help->help, 0, sizeof(help->help));
+		memset(&help->data, 0, helper->data_len);
 	}
 
 	rcu_assign_pointer(help->helper, helper);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 6f4b00a..30f5e12 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1218,7 +1218,7 @@ ctnetlink_change_helper(struct nf_conn *ct, const struct nlattr * const cda[])
 		if (help->helper)
 			return -EBUSY;
 		/* need to zero data of old helper */
-		memset(&help->help, 0, sizeof(help->help));
+		memset(&help->data, 0, help->helper->data_len);
 	} else {
 		/* we cannot set a helper for an existing conntrack */
 		return -EOPNOTSUPP;
@@ -1440,7 +1440,7 @@ ctnetlink_create_conntrack(struct net *net, u16 zone,
 		} else {
 			struct nf_conn_help *help;
 
-			help = nf_ct_helper_ext_add(ct, GFP_ATOMIC);
+			help = nf_ct_helper_ext_add(ct, helper, GFP_ATOMIC);
 			if (help == NULL) {
 				err = -ENOMEM;
 				goto err2;
diff --git a/net/netfilter/nf_conntrack_pptp.c b/net/netfilter/nf_conntrack_pptp.c
index 31d56b2..6fed9ec 100644
--- a/net/netfilter/nf_conntrack_pptp.c
+++ b/net/netfilter/nf_conntrack_pptp.c
@@ -174,7 +174,7 @@ static int destroy_sibling_or_exp(struct net *net, struct nf_conn *ct,
 static void pptp_destroy_siblings(struct nf_conn *ct)
 {
 	struct net *net = nf_ct_net(ct);
-	const struct nf_conn_help *help = nfct_help(ct);
+	const struct nf_ct_pptp_master *ct_pptp_info = nfct_help_data(ct);
 	struct nf_conntrack_tuple t;
 
 	nf_ct_gre_keymap_destroy(ct);
@@ -182,16 +182,16 @@ static void pptp_destroy_siblings(struct nf_conn *ct)
 	/* try original (pns->pac) tuple */
 	memcpy(&t, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, sizeof(t));
 	t.dst.protonum = IPPROTO_GRE;
-	t.src.u.gre.key = help->help.ct_pptp_info.pns_call_id;
-	t.dst.u.gre.key = help->help.ct_pptp_info.pac_call_id;
+	t.src.u.gre.key = ct_pptp_info->pns_call_id;
+	t.dst.u.gre.key = ct_pptp_info->pac_call_id;
 	if (!destroy_sibling_or_exp(net, ct, &t))
 		pr_debug("failed to timeout original pns->pac ct/exp\n");
 
 	/* try reply (pac->pns) tuple */
 	memcpy(&t, &ct->tuplehash[IP_CT_DIR_REPLY].tuple, sizeof(t));
 	t.dst.protonum = IPPROTO_GRE;
-	t.src.u.gre.key = help->help.ct_pptp_info.pac_call_id;
-	t.dst.u.gre.key = help->help.ct_pptp_info.pns_call_id;
+	t.src.u.gre.key = ct_pptp_info->pac_call_id;
+	t.dst.u.gre.key = ct_pptp_info->pns_call_id;
 	if (!destroy_sibling_or_exp(net, ct, &t))
 		pr_debug("failed to timeout reply pac->pns ct/exp\n");
 }
@@ -269,7 +269,7 @@ pptp_inbound_pkt(struct sk_buff *skb,
 		 struct nf_conn *ct,
 		 enum ip_conntrack_info ctinfo)
 {
-	struct nf_ct_pptp_master *info = &nfct_help(ct)->help.ct_pptp_info;
+	struct nf_ct_pptp_master *info = nfct_help_data(ct);
 	u_int16_t msg;
 	__be16 cid = 0, pcid = 0;
 	typeof(nf_nat_pptp_hook_inbound) nf_nat_pptp_inbound;
@@ -396,7 +396,7 @@ pptp_outbound_pkt(struct sk_buff *skb,
 		  struct nf_conn *ct,
 		  enum ip_conntrack_info ctinfo)
 {
-	struct nf_ct_pptp_master *info = &nfct_help(ct)->help.ct_pptp_info;
+	struct nf_ct_pptp_master *info = nfct_help_data(ct);
 	u_int16_t msg;
 	__be16 cid = 0, pcid = 0;
 	typeof(nf_nat_pptp_hook_outbound) nf_nat_pptp_outbound;
@@ -506,7 +506,7 @@ conntrack_pptp_help(struct sk_buff *skb, unsigned int protoff,
 
 {
 	int dir = CTINFO2DIR(ctinfo);
-	const struct nf_ct_pptp_master *info = &nfct_help(ct)->help.ct_pptp_info;
+	const struct nf_ct_pptp_master *info = nfct_help_data(ct);
 	const struct tcphdr *tcph;
 	struct tcphdr _tcph;
 	const struct pptp_pkt_hdr *pptph;
@@ -592,6 +592,7 @@ static const struct nf_conntrack_expect_policy pptp_exp_policy = {
 static struct nf_conntrack_helper pptp __read_mostly = {
 	.name			= "pptp",
 	.me			= THIS_MODULE,
+	.data_len		= sizeof(struct nf_ct_pptp_master),
 	.tuple.src.l3num	= AF_INET,
 	.tuple.src.u.tcp.port	= cpu_to_be16(PPTP_CONTROL_PORT),
 	.tuple.dst.protonum	= IPPROTO_TCP,
diff --git a/net/netfilter/nf_conntrack_proto_gre.c b/net/netfilter/nf_conntrack_proto_gre.c
index 4bf6b4e..c04a363 100644
--- a/net/netfilter/nf_conntrack_proto_gre.c
+++ b/net/netfilter/nf_conntrack_proto_gre.c
@@ -110,10 +110,10 @@ int nf_ct_gre_keymap_add(struct nf_conn *ct, enum ip_conntrack_dir dir,
 {
 	struct net *net = nf_ct_net(ct);
 	struct netns_proto_gre *net_gre = net_generic(net, proto_gre_net_id);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_pptp_master *ct_pptp_info = nfct_help_data(ct);
 	struct nf_ct_gre_keymap **kmp, *km;
 
-	kmp = &help->help.ct_pptp_info.keymap[dir];
+	kmp = &ct_pptp_info->keymap[dir];
 	if (*kmp) {
 		/* check whether it's a retransmission */
 		read_lock_bh(&net_gre->keymap_lock);
@@ -151,19 +151,19 @@ void nf_ct_gre_keymap_destroy(struct nf_conn *ct)
 {
 	struct net *net = nf_ct_net(ct);
 	struct netns_proto_gre *net_gre = net_generic(net, proto_gre_net_id);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_pptp_master *ct_pptp_info = nfct_help_data(ct);
 	enum ip_conntrack_dir dir;
 
 	pr_debug("entering for ct %p\n", ct);
 
 	write_lock_bh(&net_gre->keymap_lock);
 	for (dir = IP_CT_DIR_ORIGINAL; dir < IP_CT_DIR_MAX; dir++) {
-		if (help->help.ct_pptp_info.keymap[dir]) {
+		if (ct_pptp_info->keymap[dir]) {
 			pr_debug("removing %p from list\n",
-				 help->help.ct_pptp_info.keymap[dir]);
-			list_del(&help->help.ct_pptp_info.keymap[dir]->list);
-			kfree(help->help.ct_pptp_info.keymap[dir]);
-			help->help.ct_pptp_info.keymap[dir] = NULL;
+				 ct_pptp_info->keymap[dir]);
+			list_del(&ct_pptp_info->keymap[dir]->list);
+			kfree(ct_pptp_info->keymap[dir]);
+			ct_pptp_info->keymap[dir] = NULL;
 		}
 	}
 	write_unlock_bh(&net_gre->keymap_lock);
diff --git a/net/netfilter/nf_conntrack_sane.c b/net/netfilter/nf_conntrack_sane.c
index ec3fc18..295429f 100644
--- a/net/netfilter/nf_conntrack_sane.c
+++ b/net/netfilter/nf_conntrack_sane.c
@@ -69,13 +69,12 @@ static int help(struct sk_buff *skb,
 	void *sb_ptr;
 	int ret = NF_ACCEPT;
 	int dir = CTINFO2DIR(ctinfo);
-	struct nf_ct_sane_master *ct_sane_info;
+	struct nf_ct_sane_master *ct_sane_info = nfct_help_data(ct);
 	struct nf_conntrack_expect *exp;
 	struct nf_conntrack_tuple *tuple;
 	struct sane_request *req;
 	struct sane_reply_net_start *reply;
 
-	ct_sane_info = &nfct_help(ct)->help.ct_sane_info;
 	/* Until there's been traffic both ways, don't look in packets. */
 	if (ctinfo != IP_CT_ESTABLISHED &&
 	    ctinfo != IP_CT_ESTABLISHED_REPLY)
@@ -203,6 +202,7 @@ static int __init nf_conntrack_sane_init(void)
 		sane[i][0].tuple.src.l3num = PF_INET;
 		sane[i][1].tuple.src.l3num = PF_INET6;
 		for (j = 0; j < 2; j++) {
+			sane[i][j].data_len = sizeof(struct nf_ct_sane_master);
 			sane[i][j].tuple.src.u.tcp.port = htons(ports[i]);
 			sane[i][j].tuple.dst.protonum = IPPROTO_TCP;
 			sane[i][j].expect_policy = &sane_exp_policy;
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index c2daabe..a627863 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -1075,12 +1075,12 @@ static int process_invite_response(struct sk_buff *skb, unsigned int dataoff,
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 
 	if ((code >= 100 && code <= 199) ||
 	    (code >= 200 && code <= 299))
 		return process_sdp(skb, dataoff, dptr, datalen, cseq);
-	else if (help->help.ct_sip_info.invite_cseq == cseq)
+	else if (ct_sip_info->invite_cseq == cseq)
 		flush_expectations(ct, true);
 	return NF_ACCEPT;
 }
@@ -1091,12 +1091,12 @@ static int process_update_response(struct sk_buff *skb, unsigned int dataoff,
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 
 	if ((code >= 100 && code <= 199) ||
 	    (code >= 200 && code <= 299))
 		return process_sdp(skb, dataoff, dptr, datalen, cseq);
-	else if (help->help.ct_sip_info.invite_cseq == cseq)
+	else if (ct_sip_info->invite_cseq == cseq)
 		flush_expectations(ct, true);
 	return NF_ACCEPT;
 }
@@ -1107,12 +1107,12 @@ static int process_prack_response(struct sk_buff *skb, unsigned int dataoff,
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 
 	if ((code >= 100 && code <= 199) ||
 	    (code >= 200 && code <= 299))
 		return process_sdp(skb, dataoff, dptr, datalen, cseq);
-	else if (help->help.ct_sip_info.invite_cseq == cseq)
+	else if (ct_sip_info->invite_cseq == cseq)
 		flush_expectations(ct, true);
 	return NF_ACCEPT;
 }
@@ -1123,13 +1123,13 @@ static int process_invite_request(struct sk_buff *skb, unsigned int dataoff,
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 	unsigned int ret;
 
 	flush_expectations(ct, true);
 	ret = process_sdp(skb, dataoff, dptr, datalen, cseq);
 	if (ret == NF_ACCEPT)
-		help->help.ct_sip_info.invite_cseq = cseq;
+		ct_sip_info->invite_cseq = cseq;
 	return ret;
 }
 
@@ -1154,7 +1154,7 @@ static int process_register_request(struct sk_buff *skb, unsigned int dataoff,
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
 	unsigned int matchoff, matchlen;
 	struct nf_conntrack_expect *exp;
@@ -1235,7 +1235,7 @@ static int process_register_request(struct sk_buff *skb, unsigned int dataoff,
 
 store_cseq:
 	if (ret == NF_ACCEPT)
-		help->help.ct_sip_info.register_cseq = cseq;
+		ct_sip_info->register_cseq = cseq;
 	return ret;
 }
 
@@ -1245,7 +1245,7 @@ static int process_register_response(struct sk_buff *skb, unsigned int dataoff,
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
 	union nf_inet_addr addr;
 	__be16 port;
@@ -1262,7 +1262,7 @@ static int process_register_response(struct sk_buff *skb, unsigned int dataoff,
 	 * responses, so we store the sequence number of the last valid
 	 * request and compare it here.
 	 */
-	if (help->help.ct_sip_info.register_cseq != cseq)
+	if (ct_sip_info->register_cseq != cseq)
 		return NF_ACCEPT;
 
 	if (code >= 100 && code <= 199)
@@ -1363,7 +1363,7 @@ static int process_sip_request(struct sk_buff *skb, unsigned int dataoff,
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
-	struct nf_conn_help *help = nfct_help(ct);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
 	unsigned int matchoff, matchlen;
 	unsigned int cseq, i;
@@ -1381,7 +1381,7 @@ static int process_sip_request(struct sk_buff *skb, unsigned int dataoff,
 				    &matchlen, &addr, &port) > 0 &&
 	    port != ct->tuplehash[dir].tuple.src.u.udp.port &&
 	    nf_inet_addr_cmp(&addr, &ct->tuplehash[dir].tuple.src.u3))
-		help->help.ct_sip_info.forced_dport = port;
+		ct_sip_info->forced_dport = port;
 
 	for (i = 0; i < ARRAY_SIZE(sip_handlers); i++) {
 		const struct sip_handler *handler;
@@ -1595,6 +1595,7 @@ static int __init nf_conntrack_sip_init(void)
 		sip[i][3].help = sip_help_tcp;
 
 		for (j = 0; j < ARRAY_SIZE(sip[i]); j++) {
+			sip[i][j].data_len = sizeof(struct nf_ct_sip_master);
 			sip[i][j].tuple.src.u.udp.port = htons(ports[i]);
 			sip[i][j].expect_policy = sip_exp_policy;
 			sip[i][j].expect_class_max = SIP_EXPECT_MAX;
diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index 3746d8b..3641e1b 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -113,6 +113,8 @@ static int xt_ct_tg_check_v0(const struct xt_tgchk_param *par)
 		goto err3;
 
 	if (info->helper[0]) {
+		struct nf_conntrack_helper *helper;
+
 		ret = -ENOENT;
 		proto = xt_ct_find_proto(par);
 		if (!proto) {
@@ -121,19 +123,21 @@ static int xt_ct_tg_check_v0(const struct xt_tgchk_param *par)
 			goto err3;
 		}
 
-		ret = -ENOMEM;
-		help = nf_ct_helper_ext_add(ct, GFP_KERNEL);
-		if (help == NULL)
-			goto err3;
-
 		ret = -ENOENT;
-		help->helper = nf_conntrack_helper_try_module_get(info->helper,
-								  par->family,
-								  proto);
-		if (help->helper == NULL) {
+		helper = nf_conntrack_helper_try_module_get(info->helper,
+							    par->family,
+							    proto);
+		if (helper == NULL) {
 			pr_info("No such helper \"%s\"\n", info->helper);
 			goto err3;
 		}
+
+		ret = -ENOMEM;
+		help = nf_ct_helper_ext_add(ct, helper, GFP_KERNEL);
+		if (help == NULL)
+			goto err3;
+
+		help->helper = helper;
 	}
 
 	__set_bit(IPS_TEMPLATE_BIT, &ct->status);
@@ -203,6 +207,8 @@ static int xt_ct_tg_check_v1(const struct xt_tgchk_param *par)
 		goto err3;
 
 	if (info->helper[0]) {
+		struct nf_conntrack_helper *helper;
+
 		ret = -ENOENT;
 		proto = xt_ct_find_proto(par);
 		if (!proto) {
@@ -211,19 +217,21 @@ static int xt_ct_tg_check_v1(const struct xt_tgchk_param *par)
 			goto err3;
 		}
 
-		ret = -ENOMEM;
-		help = nf_ct_helper_ext_add(ct, GFP_KERNEL);
-		if (help == NULL)
-			goto err3;
-
 		ret = -ENOENT;
-		help->helper = nf_conntrack_helper_try_module_get(info->helper,
-								  par->family,
-								  proto);
-		if (help->helper == NULL) {
+		helper = nf_conntrack_helper_try_module_get(info->helper,
+							    par->family,
+							    proto);
+		if (helper == NULL) {
 			pr_info("No such helper \"%s\"\n", info->helper);
 			goto err3;
 		}
+
+		ret = -ENOMEM;
+		help = nf_ct_helper_ext_add(ct, helper, GFP_KERNEL);
+		if (help == NULL)
+			goto err3;
+
+		help->helper = helper;
 	}
 
 #ifdef CONFIG_NF_CONNTRACK_TIMEOUT
-- 
1.7.10


^ permalink raw reply related