* vxlan in Linux kernel 3.7
From: Qin, Xiaohong @ 2012-12-27 18:42 UTC (permalink / raw)
To: netdev@vger.kernel.org
Hi All,
I have installed kernel 3.7 on my Linux box, see the following uname -a output,
uname -a
Linux c210-m2-sib-3 3.7.0-030700-generic #201212102335 SMP Tue Dec 11 04:36:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Does that mean the VXLAN module is already loaded, or do I still need to go through some extra steps to enable or configure it? Do you have a VXLAN setup or configuration document by chance?
Thanks.
Dennis Qin
P.S. If this is not the right place to ask this kind of question, please let me know which mailing list I should use.
* Re: vxlan in Linux kernel 3.7
From: Stephen Hemminger @ 2012-12-27 18:49 UTC (permalink / raw)
To: Qin, Xiaohong; +Cc: netdev@vger.kernel.org
In-Reply-To: <A3CA455BB4F1DA4E92CB43AAF0E4BB1D0667E12B@MX01A.corp.emc.com>
On Thu, 27 Dec 2012 13:42:51 -0500
"Qin, Xiaohong" <Xiaohong.Qin@emc.com> wrote:
> Hi All,
>
> I have installed kernel 3.7 on my Linux box, see the following uname -a output,
>
> uname -a
> Linux c210-m2-sib-3 3.7.0-030700-generic #201212102335 SMP Tue Dec 11 04:36:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Does that mean I've got VXLAN module loaded or I still need to go through some extra steps to enable or configure it? Do you have any VXLAN setup or configuration document by chance?
>
> Thanks.
>
> Dennis Qin
>
> P.S. If this is not the right place to ask this kind of questions, please let me know which mailing list I should use.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
VXLAN driver is part of the kernel config.
If using a vendor supplied kernel, most likely it is available as a module.
Try:
/sbin/modinfo vxlan
If you see 'ERROR: Module vxlan not found' then vxlan was not configured.
You will also need to have current iproute2 utilities.
$ ip -V
ip utility, iproute2-ss121211
* RE: vxlan in Linux kernel 3.7
From: Qin, Xiaohong @ 2012-12-27 18:56 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20121227104916.4fb2af04@nehalam.linuxnetplumber.net>
Hi Stephen,
Thanks very much for the tip. Here is the output on my box,
/sbin/modinfo vxlan
filename: /lib/modules/3.7.0-030700-generic/kernel/drivers/net/vxlan.ko
alias: rtnl-link-vxlan
author: Stephen Hemminger <shemminger@vyatta.com>
version: 0.1
license: GPL
srcversion: D5253D8FFAF3FEF6A3A3026
depends:
intree: Y
vermagic: 3.7.0-030700-generic SMP mod_unload modversions
parm: udp_port:Destination UDP port (uint)
parm: log_ecn_error:Log packets received with corrupted ECN (bool)
# ip -V
ip utility, iproute2-ss111117
So I think I'm all set to give it a try?
Thanks.
Dennis Qin
-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Stephen Hemminger
Sent: Thursday, December 27, 2012 10:49 AM
To: Qin, Xiaohong
Cc: netdev@vger.kernel.org
Subject: Re: vxlan in Linux kernel 3.7
On Thu, 27 Dec 2012 13:42:51 -0500
"Qin, Xiaohong" <Xiaohong.Qin@emc.com> wrote:
> Hi All,
>
> I have installed kernel 3.7 on my Linux box, see the following uname
> -a output,
>
> uname -a
> Linux c210-m2-sib-3 3.7.0-030700-generic #201212102335 SMP Tue Dec 11
> 04:36:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Does that mean I've got VXLAN module loaded or I still need to go through some extra steps to enable or configure it? Do you have any VXLAN setup or configuration document by chance?
>
> Thanks.
>
> Dennis Qin
>
> P.S. If this is not the right place to ask this kind of questions, please let me know which mailing list I should use.
VXLAN driver is part of the kernel config.
If using a vendor supplied kernel, most likely it is available as a module.
Try:
/sbin/modinfo vxlan
If you see 'ERROR: Module vxlan not found' then vxlan was not configured.
You will also need to have current iproute2 utilities.
$ ip -V
ip utility, iproute2-ss121211
* Re: [PATCH 4/4 v2] net/smsc911x: Provide common clock functionality
From: Lee Jones @ 2012-12-27 19:31 UTC (permalink / raw)
To: Linus Walleij
Cc: Steve Glendinning, Robert Marklund, linus.walleij, arnd, netdev,
linux-kernel, Russell King - ARM Linux, linux-arm-kernel
In-Reply-To: <CACRpkdZGC=f0X1wLt1QLMB=x+ba6VRDAbRcW6wR_2gvNLuGw1g@mail.gmail.com>
No, you're right, I'm a moron.
Will fix up and resend when I'm back to work.
Sent from my mobile Linux device.
On Dec 26, 2012 12:51 AM, "Linus Walleij" <linus.walleij@linaro.org> wrote:
> On Fri, Dec 21, 2012 at 12:41 PM, Lee Jones <lee.jones@linaro.org> wrote:
>
> > + if (IS_ERR(pdata->clk)) {
> > + ret = clk_prepare_enable(pdata->clk);
> > + if (ret < 0)
> > + netdev_err(ndev, "failed to enable clock %d\n",
> ret);
> > + }
>
> I think you got all of these backwards now, shouldn't it be if
> (!IS_ERR(pdata->clk)) { } ...?
>
> It's late here but enlighten me if I don't get it.
>
> > + if (IS_ERR(pdata->clk))
> > + clk_disable_unprepare(pdata->clk);
>
> Ditto.
>
> > + /* Request clock */
> > + pdata->clk = clk_get(&pdev->dev, NULL);
> > + if (IS_ERR(pdata->clk))
> > + netdev_warn(ndev, "couldn't get clock %li\n",
> PTR_ERR(pdata->clk));
>
> This one seems correct though.
>
> > + /* Free clock */
> > + if (IS_ERR(pdata->clk)) {
> > + clk_put(pdata->clk);
> > + pdata->clk = NULL;
> > + }
>
> Should be !IS_ERR()
>
> Yours,
> Linus Walleij
>
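The inverted check discussed above can be illustrated with a small user-space sketch. The ERR_PTR/PTR_ERR/IS_ERR helpers below are stand-ins modelled on the kernel's include/linux/err.h, and struct clk, clk_prepare_enable() and enable_clock_if_present() are hypothetical mocks written for this illustration; none of this is the smsc911x driver code. The point is only that the clock is touched when IS_ERR() is false, not when it is true.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins modelled on the kernel's include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses encode negative errno values. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical mock clock; the real struct clk is opaque to drivers. */
struct clk {
	int enable_count;
};

/* Mock of clk_prepare_enable(): treats NULL as a no-op, else succeeds. */
static int clk_prepare_enable(struct clk *clk)
{
	if (!clk)
		return 0;
	clk->enable_count++;
	return 0;
}

/*
 * Enable the clock only when the lookup returned a usable handle,
 * i.e. when IS_ERR() is false. Returns true if the clock was enabled.
 */
static bool enable_clock_if_present(struct clk *clk)
{
	if (!IS_ERR(clk))
		return clk_prepare_enable(clk) == 0;

	return false;
}
```

With the condition written as IS_ERR(clk) instead, the enable call would only ever run on an error-encoded pointer, which is the mistake pointed out in the review.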
* [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked
From: Larry Finger @ 2012-12-27 19:42 UTC (permalink / raw)
To: linville-2XuSBdqkA4R54TAoqtyWWQ, davem-fT/PcQaiUtIeIZ0/mPfg9Q
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA, Larry Finger,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
With 3.8-rc1, the first call of pci_map_single() that is not checked
with a corresponding pci_dma_mapping_error() call results in a warning
with a splat as follows:
WARNING: at lib/dma-debug.c:933 check_unmap+0x480/0x950()
Hardware name: HP Pavilion dv2700 Notebook PC
forcedeth 0000:00:0a.0: DMA-API: device driver failed to check
map error[device address=0x00000000b176e002] [size=90 bytes] [mapped as single]
Signed-off-by: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
---
drivers/net/ethernet/nvidia/forcedeth.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 653487d..de39cf2 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -1821,6 +1821,11 @@ static int nv_alloc_rx(struct net_device *dev)
skb->data,
skb_tailroom(skb),
PCI_DMA_FROMDEVICE);
+ if (pci_dma_mapping_error(np->pci_dev,
+ np->put_rx_ctx->dma)) {
+ dev_kfree_skb_any(skb);
+ goto packet_dropped;
+ }
np->put_rx_ctx->dma_len = skb_tailroom(skb);
np->put_rx.orig->buf = cpu_to_le32(np->put_rx_ctx->dma);
wmb();
@@ -1830,6 +1835,7 @@ static int nv_alloc_rx(struct net_device *dev)
if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
np->put_rx_ctx = np->first_rx_ctx;
} else {
+packet_dropped:
u64_stats_update_begin(&np->swstats_rx_syncp);
np->stat_rx_dropped++;
u64_stats_update_end(&np->swstats_rx_syncp);
@@ -1856,6 +1862,11 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
skb->data,
skb_tailroom(skb),
PCI_DMA_FROMDEVICE);
+ if (pci_dma_mapping_error(np->pci_dev,
+ np->put_rx_ctx->dma)) {
+ dev_kfree_skb_any(skb);
+ goto packet_dropped;
+ }
np->put_rx_ctx->dma_len = skb_tailroom(skb);
np->put_rx.ex->bufhigh = cpu_to_le32(dma_high(np->put_rx_ctx->dma));
np->put_rx.ex->buflow = cpu_to_le32(dma_low(np->put_rx_ctx->dma));
@@ -1866,6 +1877,7 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
np->put_rx_ctx = np->first_rx_ctx;
} else {
+packet_dropped:
u64_stats_update_begin(&np->swstats_rx_syncp);
np->stat_rx_dropped++;
u64_stats_update_end(&np->swstats_rx_syncp);
@@ -2217,6 +2229,9 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
bcnt = (size > NV_TX2_TSO_MAX_SIZE) ? NV_TX2_TSO_MAX_SIZE : size;
np->put_tx_ctx->dma = pci_map_single(np->pci_dev, skb->data + offset, bcnt,
PCI_DMA_TODEVICE);
+ if (pci_dma_mapping_error(np->pci_dev,
+ np->put_tx_ctx->dma))
+ return NETDEV_TX_BUSY;
np->put_tx_ctx->dma_len = bcnt;
np->put_tx_ctx->dma_single = 1;
put_tx->buf = cpu_to_le32(np->put_tx_ctx->dma);
@@ -2337,6 +2352,9 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
bcnt = (size > NV_TX2_TSO_MAX_SIZE) ? NV_TX2_TSO_MAX_SIZE : size;
np->put_tx_ctx->dma = pci_map_single(np->pci_dev, skb->data + offset, bcnt,
PCI_DMA_TODEVICE);
+ if (pci_dma_mapping_error(np->pci_dev,
+ np->put_tx_ctx->dma))
+ return NETDEV_TX_BUSY;
np->put_tx_ctx->dma_len = bcnt;
np->put_tx_ctx->dma_single = 1;
put_tx->bufhigh = cpu_to_le32(dma_high(np->put_tx_ctx->dma));
@@ -5003,6 +5021,11 @@ static int nv_loopback_test(struct net_device *dev)
test_dma_addr = pci_map_single(np->pci_dev, tx_skb->data,
skb_tailroom(tx_skb),
PCI_DMA_FROMDEVICE);
+ if (pci_dma_mapping_error(np->pci_dev,
+ test_dma_addr)) {
+ dev_kfree_skb_any(tx_skb);
+ goto out;
+ }
pkt_data = skb_put(tx_skb, pkt_len);
for (i = 0; i < pkt_len; i++)
pkt_data[i] = (u8)(i & 0xff);
--
1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: vxlan in Linux kernel 3.7
From: Stephen Hemminger @ 2012-12-27 19:43 UTC (permalink / raw)
To: Qin, Xiaohong; +Cc: netdev@vger.kernel.org
In-Reply-To: <A3CA455BB4F1DA4E92CB43AAF0E4BB1D0667E12C@MX01A.corp.emc.com>
On Thu, 27 Dec 2012 13:56:51 -0500
"Qin, Xiaohong" <Xiaohong.Qin@emc.com> wrote:
> # ip -V
> ip utility, iproute2-ss111117
>
> So I think I'm all set to give it a try?
That version is way too old to know about vxlan configuration.
The last digits of the iproute version are the date it was
released. That date is November 17 2011 which is the 3.1 version.
You need 3.7 which was released on December 11 2012, or you can
use latest from iproute2 git repository.
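The date rule described above can be sketched as a small C helper. This is an illustration only, not part of iproute2: the function and struct names are hypothetical, it assumes the "iproute2-ssYYMMDD" form shown in this thread, and it assumes two-digit years belong to the 2000s. The 2012-12-11 threshold is the 3.7 release date quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Calendar date decoded from an iproute2 snapshot string. */
struct snapshot_date {
	int year;
	int month;
	int day;
};

/*
 * Parse the "iproute2-ssYYMMDD" token printed by `ip -V`.
 * Returns 0 on success, -1 if the token is missing or malformed.
 */
static int parse_iproute2_snapshot(const char *version,
				   struct snapshot_date *out)
{
	static const char prefix[] = "iproute2-ss";
	const char *p = strstr(version, prefix);
	int yy, mm, dd;

	if (!p)
		return -1;
	p += strlen(prefix);

	if (sscanf(p, "%2d%2d%2d", &yy, &mm, &dd) != 3)
		return -1;
	if (yy < 0 || mm < 1 || mm > 12 || dd < 1 || dd > 31)
		return -1;

	out->year = 2000 + yy;	/* assumption: snapshots are from 20xx */
	out->month = mm;
	out->day = dd;
	return 0;
}

/* True if the snapshot is at least the 3.7 release (2012-12-11). */
static int snapshot_at_least_3_7(const struct snapshot_date *d)
{
	long stamp = d->year * 10000L + d->month * 100L + d->day;

	return stamp >= 20121211L;
}
```

Fed the two strings from this thread, the helper decodes ss111117 as 2011-11-17 (too old) and ss121211 as 2012-12-11 (new enough). Version strings without the "ss" prefix are rejected rather than guessed at.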
* Re: [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked
From: Eric Dumazet @ 2012-12-27 20:05 UTC (permalink / raw)
To: Larry Finger; +Cc: linville, davem, linux-wireless, netdev, linux-kernel
In-Reply-To: <1356637327-4884-1-git-send-email-Larry.Finger@lwfinger.net>
On Thu, 2012-12-27 at 13:42 -0600, Larry Finger wrote:
> With 3.8-rc1, the first call of pci_map_single() that is not checked
> with a corresponding pci_dma_mapping_error() call results in a warning
> with a splat as follows:
>
> WARNING: at lib/dma-debug.c:933 check_unmap+0x480/0x950()
> Hardware name: HP Pavilion dv2700 Notebook PC
> forcedeth 0000:00:0a.0: DMA-API: device driver failed to check
> map error[device address=0x00000000b176e002] [size=90 bytes] [mapped as single]
>
> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
> ---
> drivers/net/ethernet/nvidia/forcedeth.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
> index 653487d..de39cf2 100644
> --- a/drivers/net/ethernet/nvidia/forcedeth.c
> +++ b/drivers/net/ethernet/nvidia/forcedeth.c
> @@ -1821,6 +1821,11 @@ static int nv_alloc_rx(struct net_device *dev)
> skb->data,
> skb_tailroom(skb),
> PCI_DMA_FROMDEVICE);
> + if (pci_dma_mapping_error(np->pci_dev,
> + np->put_rx_ctx->dma)) {
> + dev_kfree_skb_any(skb);
skb has no destructor yet, kfree_skb(skb) should be fine
> + goto packet_dropped;
> + }
> np->put_rx_ctx->dma_len = skb_tailroom(skb);
> np->put_rx.orig->buf = cpu_to_le32(np->put_rx_ctx->dma);
> wmb();
> @@ -1830,6 +1835,7 @@ static int nv_alloc_rx(struct net_device *dev)
> if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
> np->put_rx_ctx = np->first_rx_ctx;
> } else {
> +packet_dropped:
> u64_stats_update_begin(&np->swstats_rx_syncp);
> np->stat_rx_dropped++;
> u64_stats_update_end(&np->swstats_rx_syncp);
> @@ -1856,6 +1862,11 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
> skb->data,
> skb_tailroom(skb),
> PCI_DMA_FROMDEVICE);
> + if (pci_dma_mapping_error(np->pci_dev,
> + np->put_rx_ctx->dma)) {
> + dev_kfree_skb_any(skb);
> + goto packet_dropped;
> + }
> np->put_rx_ctx->dma_len = skb_tailroom(skb);
> np->put_rx.ex->bufhigh = cpu_to_le32(dma_high(np->put_rx_ctx->dma));
> np->put_rx.ex->buflow = cpu_to_le32(dma_low(np->put_rx_ctx->dma));
> @@ -1866,6 +1877,7 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
> if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
> np->put_rx_ctx = np->first_rx_ctx;
> } else {
> +packet_dropped:
> u64_stats_update_begin(&np->swstats_rx_syncp);
> np->stat_rx_dropped++;
> u64_stats_update_end(&np->swstats_rx_syncp);
> @@ -2217,6 +2229,9 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
> bcnt = (size > NV_TX2_TSO_MAX_SIZE) ? NV_TX2_TSO_MAX_SIZE : size;
> np->put_tx_ctx->dma = pci_map_single(np->pci_dev, skb->data + offset, bcnt,
> PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(np->pci_dev,
> + np->put_tx_ctx->dma))
> + return NETDEV_TX_BUSY;
Really this is not going to work very well : caller will call this in a
loop.
> np->put_tx_ctx->dma_len = bcnt;
> np->put_tx_ctx->dma_single = 1;
> put_tx->buf = cpu_to_le32(np->put_tx_ctx->dma);
> @@ -2337,6 +2352,9 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
> bcnt = (size > NV_TX2_TSO_MAX_SIZE) ? NV_TX2_TSO_MAX_SIZE : size;
> np->put_tx_ctx->dma = pci_map_single(np->pci_dev, skb->data + offset, bcnt,
> PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(np->pci_dev,
> + np->put_tx_ctx->dma))
> + return NETDEV_TX_BUSY;
same problem here.
> np->put_tx_ctx->dma_len = bcnt;
> np->put_tx_ctx->dma_single = 1;
> put_tx->bufhigh = cpu_to_le32(dma_high(np->put_tx_ctx->dma));
> @@ -5003,6 +5021,11 @@ static int nv_loopback_test(struct net_device *dev)
> test_dma_addr = pci_map_single(np->pci_dev, tx_skb->data,
> skb_tailroom(tx_skb),
> PCI_DMA_FROMDEVICE);
> + if (pci_dma_mapping_error(np->pci_dev,
> + test_dma_addr)) {
> + dev_kfree_skb_any(tx_skb);
kfree_skb(tx_skb);
> + goto out;
> + }
> pkt_data = skb_put(tx_skb, pkt_len);
> for (i = 0; i < pkt_len; i++)
> pkt_data[i] = (u8)(i & 0xff);
* Re: [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked
From: Larry Finger @ 2012-12-27 20:38 UTC (permalink / raw)
To: Eric Dumazet
Cc: linville-2XuSBdqkA4R54TAoqtyWWQ, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1356638715.30414.1349.camel@edumazet-glaptop>
On 12/27/2012 02:05 PM, Eric Dumazet wrote:
> On Thu, 2012-12-27 at 13:42 -0600, Larry Finger wrote:
>> With 3.8-rc1, the first call of pci_map_single() that is not checked
>> with a corresponding pci_dma_mapping_error() call results in a warning
>> with a splat as follows:
>>
>> WARNING: at lib/dma-debug.c:933 check_unmap+0x480/0x950()
>> Hardware name: HP Pavilion dv2700 Notebook PC
>> forcedeth 0000:00:0a.0: DMA-API: device driver failed to check
>> map error[device address=0x00000000b176e002] [size=90 bytes] [mapped as single]
>>
>> Signed-off-by: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
>> ---
>> drivers/net/ethernet/nvidia/forcedeth.c | 23 +++++++++++++++++++++++
>> 1 file changed, 23 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
>> index 653487d..de39cf2 100644
>> --- a/drivers/net/ethernet/nvidia/forcedeth.c
>> +++ b/drivers/net/ethernet/nvidia/forcedeth.c
>> @@ -1821,6 +1821,11 @@ static int nv_alloc_rx(struct net_device *dev)
>> skb->data,
>> skb_tailroom(skb),
>> PCI_DMA_FROMDEVICE);
>> + if (pci_dma_mapping_error(np->pci_dev,
>> + np->put_rx_ctx->dma)) {
>> + dev_kfree_skb_any(skb);
>
> skb has no destructor yet, kfree_skb(skb) should be fine
OK.
>
>> + goto packet_dropped;
>> + }
>> np->put_rx_ctx->dma_len = skb_tailroom(skb);
>> np->put_rx.orig->buf = cpu_to_le32(np->put_rx_ctx->dma);
>> wmb();
>> @@ -1830,6 +1835,7 @@ static int nv_alloc_rx(struct net_device *dev)
>> if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
>> np->put_rx_ctx = np->first_rx_ctx;
>> } else {
>> +packet_dropped:
>> u64_stats_update_begin(&np->swstats_rx_syncp);
>> np->stat_rx_dropped++;
>> u64_stats_update_end(&np->swstats_rx_syncp);
>> @@ -1856,6 +1862,11 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
>> skb->data,
>> skb_tailroom(skb),
>> PCI_DMA_FROMDEVICE);
>> + if (pci_dma_mapping_error(np->pci_dev,
>> + np->put_rx_ctx->dma)) {
>> + dev_kfree_skb_any(skb);
>> + goto packet_dropped;
>> + }
>> np->put_rx_ctx->dma_len = skb_tailroom(skb);
>> np->put_rx.ex->bufhigh = cpu_to_le32(dma_high(np->put_rx_ctx->dma));
>> np->put_rx.ex->buflow = cpu_to_le32(dma_low(np->put_rx_ctx->dma));
>> @@ -1866,6 +1877,7 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
>> if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
>> np->put_rx_ctx = np->first_rx_ctx;
>> } else {
>> +packet_dropped:
>> u64_stats_update_begin(&np->swstats_rx_syncp);
>> np->stat_rx_dropped++;
>> u64_stats_update_end(&np->swstats_rx_syncp);
>> @@ -2217,6 +2229,9 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
>> bcnt = (size > NV_TX2_TSO_MAX_SIZE) ? NV_TX2_TSO_MAX_SIZE : size;
>> np->put_tx_ctx->dma = pci_map_single(np->pci_dev, skb->data + offset, bcnt,
>> PCI_DMA_TODEVICE);
>> + if (pci_dma_mapping_error(np->pci_dev,
>> + np->put_tx_ctx->dma))
>> + return NETDEV_TX_BUSY;
>
> Really this is not going to work very well : caller will call this in a
> loop.
Any suggestions on what value should be returned, or does the caller need to be
modified?
Thanks for the review,
Larry
* Re: [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked
From: Eric Dumazet @ 2012-12-27 21:03 UTC (permalink / raw)
To: Larry Finger; +Cc: linville, davem, linux-wireless, netdev, linux-kernel
In-Reply-To: <50DCB1D9.50906@lwfinger.net>
On Thu, 2012-12-27 at 14:38 -0600, Larry Finger wrote:
> On 12/27/2012 02:05 PM, Eric Dumazet wrote:
> > On Thu, 2012-12-27 at 13:42 -0600, Larry Finger wrote:
> >> + if (pci_dma_mapping_error(np->pci_dev,
> >> + np->put_tx_ctx->dma))
> >> + return NETDEV_TX_BUSY;
> >
> > Really this is not going to work very well : caller will call this in a
> > loop.
>
> Any suggestions on what value should be returned, or does the caller need to be
> modified?
NETDEV_TX_BUSY is really obsolete; see
Documentation/networking/driver.txt
In case of mapping error, I would drop the packet.
(kfree_skb() it, increment a device tx_dropped counter, and return
NETDEV_TX_OK)
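The drop-on-error policy suggested above can be modelled in user space as follows. This is a sketch under stated assumptions, not forcedeth code: fake_skb, fake_dev, fake_start_xmit() and the FAKE_TX_* codes are hypothetical stand-ins for struct sk_buff, the driver state, the ndo_start_xmit handler and NETDEV_TX_OK / NETDEV_TX_BUSY, and free() plays the role of kfree_skb().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for the kernel's NETDEV_TX_OK / NETDEV_TX_BUSY codes. */
enum fake_tx_status {
	FAKE_TX_OK,
	FAKE_TX_BUSY,
};

/* Hypothetical packet buffer standing in for struct sk_buff. */
struct fake_skb {
	size_t len;
};

/* Hypothetical device state with the two counters of interest. */
struct fake_dev {
	bool mapping_fails;	/* simulates a DMA mapping error */
	unsigned long tx_packets;
	unsigned long tx_dropped;
};

/*
 * Transmit handler model: on a mapping error the packet is freed,
 * counted as dropped, and reported as consumed, so the caller has
 * nothing to retry.
 */
static enum fake_tx_status fake_start_xmit(struct fake_skb *skb,
					   struct fake_dev *dev)
{
	if (dev->mapping_fails) {
		free(skb);		/* plays the role of kfree_skb() */
		dev->tx_dropped++;
		return FAKE_TX_OK;
	}

	dev->tx_packets++;
	free(skb);	/* a real driver frees the skb on TX completion */
	return FAKE_TX_OK;
}
```

Returning the "busy" code instead would leave ownership of the packet with the caller, which is why a persistent mapping failure would turn into the retry loop mentioned earlier in the review.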
* Re: [PATCH] forcedeth: Fix WARNINGS that result when DMA mapping is not checked
From: David Miller @ 2012-12-27 21:32 UTC (permalink / raw)
To: eric.dumazet; +Cc: Larry.Finger, linville, linux-wireless, netdev, linux-kernel
In-Reply-To: <1356642209.30414.1411.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 27 Dec 2012 13:03:29 -0800
> On Thu, 2012-12-27 at 14:38 -0600, Larry Finger wrote:
>> On 12/27/2012 02:05 PM, Eric Dumazet wrote:
>> > On Thu, 2012-12-27 at 13:42 -0600, Larry Finger wrote:
>
>> >> + if (pci_dma_mapping_error(np->pci_dev,
>> >> + np->put_tx_ctx->dma))
>> >> + return NETDEV_TX_BUSY;
>> >
>> > Really this is not going to work very well : caller will call this in a
>> > loop.
>>
>> Any suggestions on what value should be returned, or does the caller need to be
>> modified?
>
> NETDEV_TX_BUSY is really obsolete
>
> Documentation/networking/driver.txt
>
> In case of mapping error, I would drop the packet.
Agreed.
* Re: Is keepalive behaving as expected in 3.7.0+/net-next?
From: Eric Dumazet @ 2012-12-27 21:54 UTC (permalink / raw)
To: Rick Jones; +Cc: netdev, Jamie Gloudon
In-Reply-To: <50D4DD28.30903@hp.com>
On Fri, 2012-12-21 at 14:05 -0800, Rick Jones wrote:
> I was looking to do a bit more documentation clean-up and thought I
> would work on the descriptions of the "keepalive" sysctls, but first I
> wanted to see if they behaved as the existing descriptions suggested:
>
> > tcp_keepalive_time - INTEGER
> > How often TCP sends out keepalive messages when keepalive is enabled.
> > Default: 2hours.
> >
> > tcp_keepalive_probes - INTEGER
> > How many keepalive probes TCP sends out, until it decides that the
> > connection is broken. Default value: 9.
> >
> > tcp_keepalive_intvl - INTEGER
> > How frequently the probes are send out. Multiplied by
> > tcp_keepalive_probes it is time to kill not responding connection,
> > after probes started. Default value: 75sec i.e. connection
> > will be aborted after ~11 minutes of retries.
>
> I interpreted all that that as: When a connection is idle, TCP will
> send a keepalive probe every tcp_keepalive_time seconds. If a response
> to a keepalive probe is not received, TCP will resend (retransmit) it
> every tcp_keepalive_intvl seconds.
>
> However, what I see is that on a connection where the remote is indeed
> still there, only the first keepalive probe is sent after
> tcp_keepalive_time, and thereafter it is sent every tcp_keepalive_intvl
> seconds.
>
> Now, some of this may relate to my being impatient - rather than wait
> two hours for the first probe, I set tcp_keepalive_time to 3 seconds,
> and tcp_keepalive_intvl to 7 seconds. I then kicked-off a ./configure
> --intervals-enable netperf TCP_RR test with a burst of one and a wait
> time of 90 seconds and got the following (trimmed) trace:
>
> 13:43:46.879133 IP netnextraj.43054 > netnextraj2.srvr: Flags [S], seq
> 807869796, win 14600, options [mss 1460,sackOK,TS val 133470 ecr
> 0,nop,wscale 7], length 0
> 13:43:46.880091 IP netnextraj2.srvr > netnextraj.43054: Flags [S.], seq
> 1522345902, ack 807869797, win 14480, options [mss 1460,sackOK,TS val
> 136186 ecr 133470,nop,wscale 4], length 0
> 13:43:46.880114 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 1, win 115, options [nop,nop,TS val 133470 ecr 136186], length 0
> 13:43:46.880306 IP netnextraj.43054 > netnextraj2.srvr: Flags [P.], seq
> 1:11, ack 1, win 115, options [nop,nop,TS val 133470 ecr 136186], length 10
> 13:43:46.880948 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 11, win 905, options [nop,nop,TS val 136187 ecr 133470], length 0
> 13:43:46.880964 IP netnextraj2.srvr > netnextraj.43054: Flags [P.], seq
> 1:11, ack 11, win 905, options [nop,nop,TS val 136187 ecr 133470], length 10
> 13:43:46.881161 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 11, win 115, options [nop,nop,TS val 133470 ecr 136187], length 0
>
> The first probe above comes after 3 seconds - tcp_keepalive_time - at
> 13:43:49
>
> 13:43:49.886752 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 11, win 115, options [nop,nop,TS val 134222 ecr 136187], length 0
>
> And it does seem to elicit a response:
>
> 13:43:49.887530 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 11, win 905, options [nop,nop,TS val 136938 ecr 133470], length 0
>
> Now it starts sending probes every 7 seconds (tcp_keepalive_intvl):
>
> 13:43:56.903576 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 11, win 115, options [nop,nop,TS val 135976 ecr 136938], length 0
> 13:43:56.904480 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 11, win 905, options [nop,nop,TS val 138693 ecr 133470], length 0
> 13:44:03.910744 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 11, win 115, options [nop,nop,TS val 137728 ecr 138693], length 0
> 13:44:03.911623 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 11, win 905, options [nop,nop,TS val 140444 ecr 133470], length 0
>
> I've deleted the next 9 or so probes... It continues, and doesn't
> terminate the connection, so I assume it was happy with the responses to
> the probes.
>
> 13:45:13.990746 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 11, win 115, options [nop,nop,TS val 155248 ecr 156213], length 0
> 13:45:13.991578 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 11, win 905, options [nop,nop,TS val 157965 ecr 133470], length 0
>
> Now the next netperf transaction happens:
>
> 13:45:16.879222 IP netnextraj.43054 > netnextraj2.srvr: Flags [P.], seq
> 11:21, ack 11, win 115, options [nop,nop,TS val 155970 ecr 157965],
> length 10
> 13:45:16.880033 IP netnextraj2.srvr > netnextraj.43054: Flags [P.], seq
> 11:21, ack 21, win 905, options [nop,nop,TS val 158687 ecr 155970],
> length 10
> 13:45:16.880220 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 21, win 115, options [nop,nop,TS val 155970 ecr 158687], length 0
>
> But the next keepalive probe is tcp_keepalive_intvl seconds after the
> last one, rather than that many, or tcp_keepalive_time seconds after the
> connection was last "active."
>
> 13:45:20.998739 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 21, win 115, options [nop,nop,TS val 157000 ecr 158687], length 0
> 13:45:20.999754 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 21, win 905, options [nop,nop,TS val 159717 ecr 155970], length 0
> 13:45:28.006747 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack
> 21, win 115, options [nop,nop,TS val 158752 ecr 159717], length 0
> 13:45:28.007624 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack
> 21, win 905, options [nop,nop,TS val 161469 ecr 155970], length 0
>
> Is this the expected behaviour? If I reverse the values - make
> tcp_keepalive_time 7 and tcp_keepalive_intvl 3, it seems that all the
> probes are after 7 seconds.
>
> rick jones
Not sure if it makes sense to have
tcp_keepalive_intvl > tcp_keepalive_time
time should be an order of magnitude bigger than intvl.
The keepalive timer is not reset each time we receive a valid frame; that
would be very expensive.
It's a long-period timer.
The first interval is tcp_keepalive_time, and subsequent intervals are
tcp_keepalive_intvl.
Each time the timer fires (once every 7200 seconds by default), we re-arm it
with the observed elapsed time (keepalive_time_elapsed).
Fixing this would require adding a timestamp to the inet socket, to remember
the time of the next/last probe, and firing the timer using
min(keepalive_time_when(tp), keepalive_intvl_when(tp))
Probably not worth it.
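The timer behaviour described above can be approximated with a toy model. This is a simplified sketch, not the kernel's keepalive code: simulate_keepalive() is a hypothetical helper that works in whole seconds, assumes the peer answers every probe immediately, assumes no other traffic on the connection, and assumes the timer is re-armed for the remaining idle time when it fires early.

```c
#include <stddef.h>

/*
 * Toy model of a single long-period keepalive timer.
 * Records the times (in seconds since the connection went idle) at
 * which probes would be sent, up to `horizon`. Returns the number of
 * probes written to `probes`.
 */
static size_t simulate_keepalive(int ka_time, int ka_intvl, int horizon,
				 int *probes, size_t max_probes)
{
	int last_rx = 0;		/* last time the peer was heard from */
	int next_fire = ka_time;	/* first timer uses tcp_keepalive_time */
	size_t n = 0;

	while (next_fire <= horizon && n < max_probes) {
		int now = next_fire;
		int idle = now - last_rx;

		if (idle >= ka_time) {
			probes[n++] = now;	/* idle long enough: probe */
			last_rx = now;		/* peer ACKs the probe */
			next_fire = now + ka_intvl;
		} else {
			/* Fired too early: wait out the remaining idle time. */
			next_fire = now + (ka_time - idle);
		}
	}

	return n;
}
```

Under these assumptions, time=3 and intvl=7 yield probes at 3, 10, 17 and 24 seconds, matching the 3-then-7 spacing in the trace, while time=7 and intvl=3 yield probes at 7, 14, 21 and 28 seconds, matching the observation that all probes come 7 seconds apart with the values reversed.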
* Re: [PATCH] bgmac: driver for GBit MAC core on BCMA bus
From: Francois Romieu @ 2012-12-27 22:03 UTC (permalink / raw)
To: Rafał Miłecki; +Cc: netdev, David S. Miller
In-Reply-To: <CACna6rydhYcBZnRpz2PD5S=UfjQJ5xqkY=_+Y9BeOt-ENQqc3w@mail.gmail.com>
Rafał Miłecki <zajec5@gmail.com> :
[...]
> I'm not 100% sure if I understand... Do you mean I should declare variables
> inside "while" loop when possible ?
Yes. Please consider the deepest inner block.
While at it:
> diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
[...]
> +static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
[...]
> + /* Check for poison and drop or pass the packet */
> + if (len == 0xdead && flags == 0xbeef) {
> + bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n", ring->start);
> + } else {
> + new_skb = netdev_alloc_skb(bgmac->net_dev, len);
> + if (new_skb) {
> + skb_put(new_skb, len);
> + skb_copy_from_linear_data_offset(skb, BGMAC_RX_FRAME_OFFSET,
> + new_skb->data, len);
> + new_skb->protocol =
> + eth_type_trans(new_skb, bgmac->net_dev);
> + netif_receive_skb(new_skb);
> + handled++;
> + } else {
> + bgmac_err(bgmac, "Allocation of skb for copying packet failed!\n");
> + }
You should increase bgmac->net_dev->stats.rx_dropped
[...]
> +static int bgmac_dma_alloc(struct bgmac *bgmac)
> +{
> + struct device *dma_dev = bgmac->core->dma_dev;
> + struct bgmac_dma_ring *ring;
> + u16 ring_base[] = { BGMAC_DMA_BASE0, BGMAC_DMA_BASE1, BGMAC_DMA_BASE2,
> + BGMAC_DMA_BASE3, };
ring_base could be 'static const'.
> + int size; /* ring size: different for Tx and Rx */
> + int err;
> + int i;
> +
> + BUILD_BUG_ON(BGMAC_MAX_TX_RINGS > ARRAY_SIZE(ring_base));
> + BUILD_BUG_ON(BGMAC_MAX_RX_RINGS > ARRAY_SIZE(ring_base));
> +
> + if (!(bcma_aread32(bgmac->core, BCMA_IOST) & BCMA_IOST_DMA64)) {
> + bgmac_err(bgmac, "Core does not report 64-bit DMA\n");
> + return -ENOTSUPP;
> + }
> +
> + for (i = 0; i < BGMAC_MAX_TX_RINGS; i++) {
> + ring = &bgmac->tx_ring[i];
> + ring->tx = true;
> + ring->num_slots = BGMAC_TX_RING_SLOTS;
> + ring->mmio_base = ring_base[i];
> + if (bgmac_dma_unaligned(bgmac, ring))
> + bgmac_warn(bgmac, "TX on ring 0x%X supports unaligned addressing but this feature is not implemented\n", ring->mmio_base);
ring->tx is used only once, in bgmac_dma_unaligned.
Give bgmac_dma_unaligned an extra parameter for the BGMAC_DMA_[RT]X_RINGLO
offset and ring->[rt]x can go away.
> +
> + /* Alloc ring of descriptors */
> + size = ring->num_slots * sizeof(struct bgmac_dma_desc);
> + ring->cpu_base = dma_zalloc_coherent(dma_dev, size,
> + &(ring->dma_base),
Please remove the useless parentheses.
[...]
> +static void bgmac_chip_stats_update(struct bgmac *bgmac)
> +{
> + int i;
> +
> + if (bgmac->core->id.id != BCMA_CORE_4706_MAC_GBIT) {
> + for (i = 0; i < BGMAC_NUM_MIB_TX_REGS; i++)
> + bgmac->mib_tx_regs[i] =
> + bgmac_read(bgmac,
> + BGMAC_TX_GOOD_OCTETS + (i * 4));
> + for (i = 0; i < BGMAC_NUM_MIB_RX_REGS; i++)
> + bgmac->mib_rx_regs[i] =
> + bgmac_read(bgmac,
> + BGMAC_RX_GOOD_OCTETS + (i * 4));
> + }
Neither of mib_[rt]x_regs is ever read.
[...]
> +static void bgmac_chip_reset(struct bgmac *bgmac)
> +{
[...]
> + if (ci->id == BCMA_CHIP_ID_BCM5357 || ci->id == BCMA_CHIP_ID_BCM4749 ||
> + ci->id == BCMA_CHIP_ID_BCM53572) {
> + struct bcma_drv_cc *cc = &bgmac->core->bus->drv_cc;
> + u8 et_swtype = 0;
> + u8 sw_type = BGMAC_CHIPCTL_1_SW_TYPE_EPHY |
> + BGMAC_CHIPCTL_1_IF_TYPE_RMII;
> + char buf[2];
> + if (nvram_getenv("et_swtype", buf, 1) > 0) {
Please keep an empty line between variables declaration and body.
[...]
> +static irqreturn_t bgmac_interrupt(int irq, void *dev_id)
> +{
> + struct bgmac *bgmac = netdev_priv(dev_id);
> +
> + u32 int_status = bgmac_read(bgmac, BGMAC_INT_STATUS);
> + int_status &= bgmac->int_mask;
...
[...]
> +/**************************************************
> + * net_device_ops
> + **************************************************/
> +
> +static int bgmac_open(struct net_device *net_dev)
> +{
> + struct bgmac *bgmac = netdev_priv(net_dev);
> +
> + /* Reset */
> + bgmac_chip_reset(bgmac);
Please remove the useless comment(s).
> + /* Specs say about reclaiming rings here, but we do that in DMA init */
> + bgmac_chip_init(bgmac, true);
> +
> + /* Enable IRQs */
> + if (request_irq(bgmac->core->irq, bgmac_interrupt, IRQF_SHARED,
> + KBUILD_MODNAME, net_dev) < 0)
> + bgmac_err(bgmac, "IRQ request error!\n");
> + napi_enable(&bgmac->napi);
> +
> + return 0;
bgmac_open should not succeed when request_irq fails. Please propagate its
error status code.
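
A minimal userspace sketch of the control flow being asked for. The names
fake_request_irq(), fake_napi_enable() and model_open() are hypothetical
stand-ins so that the pattern compiles outside the kernel; this is not the
driver's code, only an illustration of returning the error instead of
logging it and carrying on.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in state: set once "NAPI" has been enabled. */
bool napi_enabled;

/* Hypothetical stand-in for request_irq(): negative IRQs are refused. */
static int fake_request_irq(int irq)
{
	return irq < 0 ? -EBUSY : 0;
}

/* Hypothetical stand-in for napi_enable(). */
static void fake_napi_enable(void)
{
	napi_enabled = true;
}

/* Model of an open handler that propagates the IRQ request error. */
int model_open(int irq)
{
	int err;

	err = fake_request_irq(irq);
	if (err < 0)
		return err;	/* stop here: NAPI stays disabled */

	fake_napi_enable();
	return 0;
}
```

With this shape, a failed IRQ request reaches the caller as a negative errno
value and nothing that depends on the interrupt is started.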
[...]
> +static int bgmac_get_settings(struct net_device *net_dev,
> + struct ethtool_cmd *cmd)
> +{
> + struct bgmac *bgmac = netdev_priv(net_dev);
> +
> + cmd->supported = SUPPORTED_10baseT_Half |
> + SUPPORTED_10baseT_Full |
> + SUPPORTED_100baseT_Half |
> + SUPPORTED_100baseT_Full |
> + SUPPORTED_1000baseT_Half |
> + SUPPORTED_1000baseT_Full |
> + SUPPORTED_Autoneg;
> +
> + if (bgmac->autoneg) {
> + WARN_ON(cmd->advertising);
> + if (bgmac->full_duplex) {
> + if (bgmac->speed & BGMAC_SPEED_10)
> + cmd->advertising |= ADVERTISED_10baseT_Full;
> + if (bgmac->speed & BGMAC_SPEED_100)
> + cmd->advertising |= ADVERTISED_100baseT_Full;
> + if (bgmac->speed & BGMAC_SPEED_1000)
> + cmd->advertising |= ADVERTISED_1000baseT_Full;
> + } else {
> + if (bgmac->speed & BGMAC_SPEED_10)
> + cmd->advertising |= ADVERTISED_10baseT_Half;
> + if (bgmac->speed & BGMAC_SPEED_100)
> + cmd->advertising |= ADVERTISED_100baseT_Half;
> + if (bgmac->speed & BGMAC_SPEED_1000)
> + cmd->advertising |= ADVERTISED_1000baseT_Half;
> + }
> + } else {
> + switch (bgmac->speed) {
> + case BGMAC_SPEED_10:
> + ethtool_cmd_speed_set(cmd, SPEED_10);
> + break;
> + case BGMAC_SPEED_100:
> + ethtool_cmd_speed_set(cmd, SPEED_100);
> + break;
> + case BGMAC_SPEED_1000:
> + ethtool_cmd_speed_set(cmd, SPEED_1000);
> + break;
> + }
> + }
It should not take long for users to complain that the driver does not
report the current link speed. A struct mii_if_info will ease your life.
Btw the driver does not manage the link (netif_carrier_{on/off}).
[...]
> diff --git a/drivers/net/ethernet/broadcom/bgmac.h b/drivers/net/ethernet/broadcom/bgmac.h
[...]
> +struct bgmac_dma_ring {
> + bool tx;
> + u16 num_slots;
> + u16 start;
> + u16 end;
> +
> + u16 mmio_base;
> + void *cpu_base;
struct bgmac_dma_desc *cpu_base; ?
--
Ueimor
* BUG: skb_under_panic in eth_header (3.0.34)
From: Stephen Hemminger @ 2012-12-27 22:59 UTC (permalink / raw)
To: Michael Chan; +Cc: netdev
Bug report from sysop@prisjakt.nu
It is a skb_under_panic in eth_header where it is trying to make space for
the Ethernet header?
https://bugzilla.kernel.org/show_bug.cgi?id=42764
I got it one more time. Ethernet card is:
0e:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit
Ethernet (rev 20)
ethtool -i eth0
driver: bnx2
version: 2.1.6
firmware-version: bc 4.4.14 IPMI 0.1.16
bus-info: 0000:04:00.0
ethtool -i eth2
driver: bnx2
version: 2.1.6
firmware-version: bc 4.4.10
bus-info: 0000:0e:00.0
Could this be related to an old firmware?
This machine does a lot of network traffic, because it loads data from a
database every night.
Kernel version is 3.0.34
Sorry for not reporting hardware, I thought it was a protocol bug not related
to a specific card.
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:147!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: af_packet ipmi_devintf ipmi_si ipmi_msghandler ipv6 fuse
loop sr_mod cdrom mptsas mptscsih mptbase scsi_transport_sas tpm_tis tpm bnx2
tpm_bios sg i2c_piix4 i2c_core shpchp pci_hotplug button serio_raw linear
scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin sd_mod
crc_t10dif qla2xxx scsi_transport_fc scsi_tgt dm_snapshot dm_multipath scsi_dh
scsi_mod edd dm_mod ext3 mbcache jbd fan thermal processor thermal_sys hwmon
[last unloaded: usbcore]
Pid: 21566, comm: nscd Not tainted 3.0.34-inps #5 IBM BladeCenter LS42
-[7902CQG]-/Server Blade
RIP: 0010:[<ffffffff8123fd12>] [<ffffffff8123fd12>] skb_push+0x75/0x7e
RSP: 0018:ffff880b956c39d8 EFLAGS: 00010292
RAX: 0000000000000083 RBX: 0000000000000800 RCX: 0000000000023382
RDX: 0000000000007878 RSI: 0000000000000046 RDI: ffffffff8152ee9c
RBP: ffff880b956c39f8 R08: 0000000000000000 R09: 0720072007200720
R10: ffff880b956c37c8 R11: 0720072007200720 R12: 0000000000000000
R13: ffff8810231fb718 R14: 0000000000000055 R15: ffff880c25d24000
FS: 00007fd9a9222950(0000) GS:ffff880c3fc00000(0000) knlGS:00000000e8dafb90
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fd9b403a000 CR3: 0000000c265e8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nscd (pid: 21566, threadinfo ffff880b956c2000, task ffff8803e50e4790)
Stack:
0000000000000057 0000000000000080 ffff880c25d24000 ffff880c71806440
ffff880b956c3a38 ffffffff8125e0c5 ffff880b956c3a28 0000000000000036
ffff8810231fb680 ffff8810231fb718 ffff880b2cc49380 ffff8810231fb710
Call Trace:
[<ffffffff8125e0c5>] eth_header+0x29/0xa8
[<ffffffff81251a59>] neigh_resolve_output+0x284/0x2ed
[<ffffffff8127575e>] ip_finish_output+0x24c/0x293
[<ffffffff81097cac>] ? zone_watermark_ok+0x1a/0x1c
[<ffffffff81275846>] ip_output+0xa1/0xa5
[<ffffffff81274b92>] ip_local_out+0x24/0x29
[<ffffffff81274ba0>] ip_send_skb+0x9/0x4c
[<ffffffff8129178b>] udp_send_skb+0x28b/0x2e8
[<ffffffff812935cb>] udp_sendmsg+0x501/0x708
[<ffffffff810e42dd>] ? poll_freewait+0x8d/0x8d
[<ffffffff812740ec>] ? ip_append_page+0x4f4/0x4f4
[<ffffffff8109b8f2>] ? __alloc_pages_nodemask+0x731/0x77c
[<ffffffff810541ad>] ? sched_clock_local+0x1c/0x80
[<ffffffff81299607>] inet_sendmsg+0x83/0x90
[<ffffffff81238e93>] sock_sendmsg+0xe3/0x106
[<ffffffff810c7c8a>] ? ____cache_alloc_node+0x4c/0x132
[<ffffffff810c7b4f>] ? fallback_alloc+0xe1/0x1d0
[<ffffffff8126e390>] ? __ip_route_output_key+0x139/0x816
[<ffffffff812d24e3>] ? _raw_spin_unlock_bh+0xf/0x11
[<ffffffff8123ba9e>] ? release_sock+0xfd/0x106
[<ffffffff81239463>] sys_sendto+0xfa/0x123
[<ffffffff81239c3d>] ? sys_connect+0x88/0x9c
[<ffffffff812d78bb>] system_call_fastpath+0x16/0x1b
Code: 8b 57 68 48 89 44 24 10 8b 87 d0 00 00 00 48 89 44 24 08 8b bf cc 00 00
00 31 c0 48 89 3c 24 48 c7 c7 5a 79 3a 81 e8 df 03 09 00 <0f> 0b eb fe 48 89 c8
c9 c3 55 89 f1 48 89 e5 48 83 ec 20 83 7f
RIP [<ffffffff8123fd12>] skb_push+0x75/0x7e
RSP <ffff880b956c39d8>
---[ end trace 7e8c514a1c33a3f0 ]---
Output of lspci -vvv:
0e:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit
Ethernet (rev 20)
Subsystem: IBM Device 039f
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 26
Region 0: Memory at ce000000 (64-bit, non-prefetchable) [size=32M]
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data <?>
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Count=1/8
Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [a0] MSI-X: Enable+ Mask- TabSize=9
Vector table: BAR=0 offset=0000c000
PBA: BAR=0 offset=0000e000
Capabilities: [ac] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1
<64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
MaxPayload 256 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0
<4us, L1 <4us
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-,
Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB
Kernel driver in use: bnx2
Kernel modules: bnx2
0e:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit
Ethernet (rev 20)
Subsystem: IBM Device 039f
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 26
Region 0: Memory at d0000000 (64-bit, non-prefetchable) [size=32M]
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data <?>
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Count=1/8
Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [a0] MSI-X: Enable- Mask- TabSize=8
Vector table: BAR=0 offset=0000c000
PBA: BAR=0 offset=0000e000
Capabilities: [ac] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1
<64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
MaxPayload 256 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0
<4us, L1 <4us
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-,
Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB
Kernel driver in use: bnx2
Kernel modules: bnx2
* Re: BUG: skb_under_panic in eth_header (3.0.34)
From: Eric Dumazet @ 2012-12-27 23:16 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Michael Chan, netdev
In-Reply-To: <20121227145914.7b318d10@nehalam.linuxnetplumber.net>
On Thu, 2012-12-27 at 14:59 -0800, Stephen Hemminger wrote:
> Bug report from sysop@prisjakt.nu
>
> It is a skb_under_panic in eth_header where it trying to make space for
> the Ethernet header?
>
> https://bugzilla.kernel.org/show_bug.cgi?id=42764
>
>
> I got it one more time. Ethernet card is:
> 0e:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit
> Ethernet (rev 20)
>
> ethtool -i eth0
> driver: bnx2
> version: 2.1.6
> firmware-version: bc 4.4.14 IPMI 0.1.16
> bus-info: 0000:04:00.0
>
> ethtool -i eth2
> driver: bnx2
> version: 2.1.6
> firmware-version: bc 4.4.10
> bus-info: 0000:0e:00.0
>
> Could this be related to an old firmware?
>
> This machine does a lot of network traffic, because it load data from a
> database every night.
>
> Kernel version is 3.0.34
>
> Sorry for not reporting hardware, I thought it was a protocol bug not related
> to a specific card.
>
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:147!
> invalid opcode: 0000 [#1] SMP
> CPU 2
> Modules linked in: af_packet ipmi_devintf ipmi_si ipmi_msghandler ipv6 fuse
> loop sr_mod cdrom mptsas mptscsih mptbase scsi_transport_sas tpm_tis tpm bnx2
> tpm_bios sg i2c_piix4 i2c_core shpchp pci_hotplug button serio_raw linear
> scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin sd_mod
> crc_t10dif qla2xxx scsi_transport_fc scsi_tgt dm_snapshot dm_multipath scsi_dh
> scsi_mod edd dm_mod ext3 mbcache jbd fan thermal processor thermal_sys hwmon
> [last unloaded: usbcore]
>
> Pid: 21566, comm: nscd Not tainted 3.0.34-inps #5 IBM BladeCenter LS42
> -[7902CQG]-/Server Blade
> RIP: 0010:[<ffffffff8123fd12>] [<ffffffff8123fd12>] skb_push+0x75/0x7e
> RSP: 0018:ffff880b956c39d8 EFLAGS: 00010292
> RAX: 0000000000000083 RBX: 0000000000000800 RCX: 0000000000023382
> RDX: 0000000000007878 RSI: 0000000000000046 RDI: ffffffff8152ee9c
> RBP: ffff880b956c39f8 R08: 0000000000000000 R09: 0720072007200720
> R10: ffff880b956c37c8 R11: 0720072007200720 R12: 0000000000000000
> R13: ffff8810231fb718 R14: 0000000000000055 R15: ffff880c25d24000
> FS: 00007fd9a9222950(0000) GS:ffff880c3fc00000(0000) knlGS:00000000e8dafb90
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007fd9b403a000 CR3: 0000000c265e8000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process nscd (pid: 21566, threadinfo ffff880b956c2000, task ffff8803e50e4790)
> Stack:
> 0000000000000057 0000000000000080 ffff880c25d24000 ffff880c71806440
> ffff880b956c3a38 ffffffff8125e0c5 ffff880b956c3a28 0000000000000036
> ffff8810231fb680 ffff8810231fb718 ffff880b2cc49380 ffff8810231fb710
> Call Trace:
> [<ffffffff8125e0c5>] eth_header+0x29/0xa8
> [<ffffffff81251a59>] neigh_resolve_output+0x284/0x2ed
> [<ffffffff8127575e>] ip_finish_output+0x24c/0x293
> [<ffffffff81097cac>] ? zone_watermark_ok+0x1a/0x1c
> [<ffffffff81275846>] ip_output+0xa1/0xa5
> [<ffffffff81274b92>] ip_local_out+0x24/0x29
> [<ffffffff81274ba0>] ip_send_skb+0x9/0x4c
> [<ffffffff8129178b>] udp_send_skb+0x28b/0x2e8
> [<ffffffff812935cb>] udp_sendmsg+0x501/0x708
> [<ffffffff810e42dd>] ? poll_freewait+0x8d/0x8d
> [<ffffffff812740ec>] ? ip_append_page+0x4f4/0x4f4
> [<ffffffff8109b8f2>] ? __alloc_pages_nodemask+0x731/0x77c
> [<ffffffff810541ad>] ? sched_clock_local+0x1c/0x80
> [<ffffffff81299607>] inet_sendmsg+0x83/0x90
> [<ffffffff81238e93>] sock_sendmsg+0xe3/0x106
> [<ffffffff810c7c8a>] ? ____cache_alloc_node+0x4c/0x132
> [<ffffffff810c7b4f>] ? fallback_alloc+0xe1/0x1d0
> [<ffffffff8126e390>] ? __ip_route_output_key+0x139/0x816
> [<ffffffff812d24e3>] ? _raw_spin_unlock_bh+0xf/0x11
> [<ffffffff8123ba9e>] ? release_sock+0xfd/0x106
> [<ffffffff81239463>] sys_sendto+0xfa/0x123
> [<ffffffff81239c3d>] ? sys_connect+0x88/0x9c
> [<ffffffff812d78bb>] system_call_fastpath+0x16/0x1b
> Code: 8b 57 68 48 89 44 24 10 8b 87 d0 00 00 00 48 89 44 24 08 8b bf cc 00 00
> 00 31 c0 48 89 3c 24 48 c7 c7 5a 79 3a 81 e8 df 03 09 00 <0f> 0b eb fe 48 89 c8
> c9 c3 55 89 f1 48 89 e5 48 83 ec 20 83 7f
> RIP [<ffffffff8123fd12>] skb_push+0x75/0x7e
> RSP <ffff880b956c39d8>
> ---[ end trace 7e8c514a1c33a3f0 ]---
>
> Output of lspci -vvv:
Probably fixed by:
commit e1f165032c8bade3a6bdf546f8faf61fda4dd01c
Author: ramesh.nagappa@gmail.com <ramesh.nagappa@gmail.com>
Date: Fri Oct 5 19:10:15 2012 +0000
net: Fix skb_under_panic oops in neigh_resolve_output
The retry loop in neigh_resolve_output() and neigh_connected_output()
call dev_hard_header() with out reseting the skb to network_header.
This causes the retry to fail with skb_under_panic. The fix is to
reset the network_header within the retry loop.
Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com>
Reviewed-by: Shawn Lu <shawn.lu@ericsson.com>
Reviewed-by: Robert Coulson <robert.coulson@ericsson.com>
Reviewed-by: Billie Alsup <billie.alsup@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
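
To illustrate why the missing reset matters, here is a toy userspace model
of the failure mode described in that commit message. struct toy_skb,
toy_push() and toy_build_header() are hypothetical names, and the buffer is
reduced to two offsets; this is a sketch of the mechanism, not the kernel
code or the actual patch.

```c
#include <stddef.h>

/* Toy packet buffer: offsets into a fixed area, loosely like an skb. */
struct toy_skb {
	size_t data;		/* current start of packet data */
	size_t network;		/* offset of the network header */
};

/* Like skb_push(): prepend len bytes, fail when the headroom runs out. */
static int toy_push(struct toy_skb *skb, size_t len)
{
	if (skb->data < len)
		return -1;	/* the kernel would hit skb_under_panic here */
	skb->data -= len;
	return 0;
}

/*
 * Build a link-layer header 'tries' times, as a retry loop would.
 * With reset != 0, data is moved back to the network header before
 * each attempt, so the headroom is not consumed twice.
 */
int toy_build_header(struct toy_skb *skb, size_t hdr_len, int tries, int reset)
{
	int i;

	for (i = 0; i < tries; i++) {
		if (reset)
			skb->data = skb->network;
		if (toy_push(skb, hdr_len) < 0)
			return -1;
	}
	return 0;
}
```

With 16 bytes of headroom and a 14 byte header, a second attempt without the
reset runs out of room, while the variant with the reset succeeds on every
attempt.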
* Re: TUN problems (regression?)
From: Stephen Hemminger @ 2012-12-28 0:41 UTC (permalink / raw)
To: Jason Wang; +Cc: Eric Dumazet, Paul Moore, netdev
In-Reply-To: <50D3E510.6020008@redhat.com>
On Fri, 21 Dec 2012 12:26:56 +0800
Jason Wang <jasowang@redhat.com> wrote:
> On 12/21/2012 11:39 AM, Eric Dumazet wrote:
> > On Fri, 2012-12-21 at 11:32 +0800, Jason Wang wrote:
> >> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
> >>> On Thu, 20 Dec 2012 15:38:17 -0800
> >>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>>
> >>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> >>>>> [CC'ing netdev in case this is a known problem I just missed ...]
> >>>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> I started doing some more testing with the multiqueue TUN changes and I ran
> >>>>> into a problem when running tunctl: running it once w/o arguments works as
> >>>>> expected, but running it a second time results in failure and a
> >>>>> kmem_cache_sanity_check() failure. The problem appears to be very repeatable
> >>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
> >>>>>
> >>>>> Have you seen this before?
> >>>>>
> >>>> Obviously code in tun_flow_init() is wrong...
> >>>>
> >>>> static int tun_flow_init(struct tun_struct *tun)
> >>>> {
> >>>> int i;
> >>>>
> >>>> tun->flow_cache = kmem_cache_create("tun_flow_cache",
> >>>> sizeof(struct tun_flow_entry), 0, 0,
> >>>> NULL);
> >>>> if (!tun->flow_cache)
> >>>> return -ENOMEM;
> >>>> ...
> >>>> }
> >>>>
> >>>>
> >>>> I have no idea why we would need a kmem_cache per tun_struct,
> >>>> and why we even need a kmem_cache.
> >>> Normally flow malloc/free should be good enough.
> >>> It might make sense to use private kmem_cache if doing hlist_nulls.
> >>>
> >>>
> >>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> >> Should be at least a global cache, I thought I can get some speed-up by
> >> using kmem_cache.
> >>
> >> Acked-by: Jason Wang <jasowang@redhat.com>
> > Was it with SLUB or SLAB ?
> >
> > Using generic kmalloc-64 is better than a dedicated kmem_cache of 48
> > bytes per object, as we guarantee each object is on a single cache line.
> >
> >
>
> Right, thanks for the explanation.
>
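
For the cache line point quoted above, a small arithmetic sketch. It assumes
64 byte cache lines, a slab that starts on a cache line boundary, and objects
packed back to back with no padding or per-object metadata; real SLAB/SLUB
layouts depend on the allocator and its flags. The function names are
hypothetical.

```c
#include <stddef.h>

#define CACHE_LINE 64	/* assumed cache line size in bytes */

/* Return 1 if an object of 'size' bytes at 'offset' spans two cache lines. */
int straddles_line(size_t offset, size_t size)
{
	return offset / CACHE_LINE != (offset + size - 1) / CACHE_LINE;
}

/* Count straddling objects among the first n, placed 'stride' bytes apart. */
int count_straddlers(size_t obj_size, size_t stride, int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++)
		count += straddles_line((size_t)i * stride, obj_size);
	return count;
}
```

Under these assumptions, 48 byte objects at a 48 byte stride regularly cross
a line boundary, while the same objects in 64 byte slots never do.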
I wonder if TUN would be better if it used an array to translate
receive hash to receive queue. This is how real hardware works with the
indirection table, and it would allow RFS acceleration. The current flow
cache stuff is prone to DoS attack and scaling problems with lots of
short lived flows.
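
A rough sketch of what such a table could look like, written as plain
userspace C. The names (struct toy_indir, toy_indir_init(),
toy_indir_lookup()) and the table size of 128 are assumptions for
illustration only; nothing here is taken from the TUN driver.

```c
#include <stdint.h>

#define TOY_INDIR_SIZE 128	/* assumed table size, a power of two */

struct toy_indir {
	uint16_t queue[TOY_INDIR_SIZE];	/* hash bucket -> receive queue */
};

/* Spread buckets over nqueues in round-robin order (nqueues must be > 0). */
void toy_indir_init(struct toy_indir *t, unsigned int nqueues)
{
	unsigned int i;

	for (i = 0; i < TOY_INDIR_SIZE; i++)
		t->queue[i] = (uint16_t)(i % nqueues);
}

/* Constant-time lookup with no per-flow state to create or expire. */
unsigned int toy_indir_lookup(const struct toy_indir *t, uint32_t rxhash)
{
	return t->queue[rxhash & (TOY_INDIR_SIZE - 1)];
}
```

The table has a fixed size regardless of how many flows arrive, which is the
property that avoids the growth problem with many short-lived flows.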
* Re: vxlan in Linux kernel 3.7
From: Naoto MATSUMOTO @ 2012-12-28 0:46 UTC (permalink / raw)
To: Qin, Xiaohong; +Cc: netdev@vger.kernel.org
In-Reply-To: <A3CA455BB4F1DA4E92CB43AAF0E4BB1D0667E12B@MX01A.corp.emc.com>
Hi Qin
FYI(For Your Information)
A First Look At VXLAN over Infiniband Network On Linux 3.7-rc7 & iproute2
http://slidesha.re/TsCKWc
On Thu, 27 Dec 2012 13:42:51 -0500
"Qin, Xiaohong" <Xiaohong.Qin@emc.com> wrote:
> Hi All,
>
> I have installed kernel 3.7 on my Linux box, see the following uname -a output,
>
> uname -a
> Linux c210-m2-sib-3 3.7.0-030700-generic #201212102335 SMP Tue Dec 11 04:36:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Does that mean I've got VXLAN module loaded or I still need to go through some extra steps to enable or configure it? Do you have any VXLAN setup or configuration document by chance?
>
> Thanks.
>
> Dennis Qin
>
> P.S. If this is not the right place to ask this kind of questions, please let me know which mailing list I should use.
--
SAKURA Internet Inc. / Senior Researcher
Naoto MATSUMOTO <n-matsumoto@sakura.ad.jp>
SAKURA Research Center <http://research.sakura.ad.jp/>
* ppoll() stuck on POLLIN while TCP peer is sending
From: Eric Wong @ 2012-12-28 1:45 UTC (permalink / raw)
To: netdev, linux-kernel
Cc: Andreas Voellmy, viro, linux-fsdevel, Junchang(Jason) Wang
I'm finding ppoll() unexpectedly stuck when waiting for POLLIN on a
local TCP socket. The isolated code below can reproduce the issue
after many minutes (<1 hour). It might be easier to reproduce on
a busy system while disk I/O is happening.
This may also be related to an epoll-related issue reported
by Andreas Voellmy:
http://thread.gmane.org/gmane.linux.kernel/1408782/
My example involves a 3 thread data flow between two pairs
of (4) sockets:
send_loop -> recv_loop(recv_send) -> recv_loop(recv_only)
pair_a[1] -> (pair_a[0] -> pair_b[1]) -> pair_b[0]
At least 3.7 and 3.7.1 are affected.
I have tcp_low_latency=1 set; I will try 0 later.
The last progress message I got was after receiving 2942052597760
bytes on fd=7 (out of 64-bit ULONG_MAX / 2)
strace:
3644 sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 0, NULL, 0 <unfinished ...>
3643 sendto(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 0, NULL, 0 <unfinished ...>
3642 ppoll([{fd=7, events=POLLIN}], 1, NULL, NULL, 8 <unfinished ...>
3641 futex(0x7f23ed8129d0, FUTEX_WAIT, 3644, NULL <unfinished ...>
The first and last lines of the strace are expected:
+ 3644 sendto(4) is blocked because 3643 is blocked on sendto(fd=6)
and not able to call recv().
+ 3641 is the main thread calling pthread_join
What is unexpected is the tid=3643 and tid=3642 interaction. As confirmed
by lsof below, fd=6 is sending to wake up fd=7, but ppoll(fd=7) seems
to not be waking up.
lsof:
toosleepy 3641 ew 4u IPv4 12405 0t0 TCP localhost:55904->localhost:33249 (ESTABLISHED)
toosleepy 3641 ew 5u IPv4 12406 0t0 TCP localhost:33249->localhost:55904 (ESTABLISHED)
toosleepy 3641 ew 6u IPv4 12408 0t0 TCP localhost:48777->localhost:33348 (ESTABLISHED)
toosleepy 3641 ew 7u IPv4 12409 0t0 TCP localhost:33348->localhost:48777 (ESTABLISHED)
System info: Linux 3.7.1 x86_64 SMP PREEMPT
AMD Phenom(tm) II X4 945 Processor (4 cores)
Nothing interesting in dmesg, iptables rules are empty.
I have not yet been able to reproduce the issue using UNIX sockets,
only TCP, but you can run:
./toosleepy unix
...to test with UNIX sockets instead of TCP.
The following code is also available via git://bogomips.org/toosleepy
gcc -o toosleepy -O2 -Wall -lpthread toosleepy.c
-------------------------------- 8< ------------------------------------
#define _GNU_SOURCE
#include <poll.h>
#include <sys/ioctl.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <assert.h>
#include <limits.h>
struct receiver {
int rfd;
int sfd;
};
/* blocking sender */
static void * send_loop(void *fdp)
{
int fd = *(int *)fdp;
char buf[16384];
ssize_t s;
size_t sent = 0;
size_t max = (size_t)ULONG_MAX / 2;
while (sent < max) {
s = send(fd, buf, sizeof(buf), 0);
if (s > 0)
sent += s;
if (s == -1)
assert(errno == EINTR);
}
dprintf(2, "%d done sending: %zu\n", fd, sent);
close(fd);
return NULL;
}
/* non-blocking receiver, using ppoll */
static void * recv_loop(void *p)
{
const struct receiver *rcvr = p;
char buf[16384];
nfds_t nfds = 1;
struct pollfd fds;
int rc;
ssize_t r, s;
size_t received = 0;
size_t sent = 0;
for (;;) {
r = recv(rcvr->rfd, buf, sizeof(buf), 0);
if (r == 0) {
break;
} else if (r == -1) {
assert(errno == EAGAIN);
fds.fd = rcvr->rfd;
fds.events = POLLIN;
errno = 0;
rc = ppoll(&fds, nfds, NULL, NULL);
assert(rc == 1);
} else {
assert(r > 0);
received += r;
if (rcvr->sfd >= 0) {
s = send(rcvr->sfd, buf, sizeof(buf), 0);
if (s > 0)
sent += s;
if (s == -1)
assert(errno == EINTR);
} else {
/* just burn some cycles */
write(-1, buf, sizeof(buf));
}
}
if ((received % (sizeof(buf) * sizeof(buf) * 16) == 0))
dprintf(2, " %d progress: %zu\n",
rcvr->rfd, received);
}
dprintf(2, "%d got: %zu\n", rcvr->rfd, received);
if (rcvr->sfd >= 0) {
dprintf(2, "%d sent: %zu\n", rcvr->sfd, sent);
close(rcvr->sfd);
}
return NULL;
}
static void tcp_socketpair(int sv[2], int accept_flags)
{
struct sockaddr_in addr;
socklen_t addrlen = sizeof(addr);
int l = socket(PF_INET, SOCK_STREAM, 0);
int c = socket(PF_INET, SOCK_STREAM, 0);
int a;
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_port = 0;
assert(0 == bind(l, (struct sockaddr*)&addr, addrlen));
assert(0 == listen(l, 1024));
assert(0 == getsockname(l, (struct sockaddr *)&addr, &addrlen));
assert(0 == connect(c, (struct sockaddr *)&addr, addrlen));
a = accept4(l, NULL, NULL, accept_flags);
assert(a >= 0);
close(l);
sv[0] = a;
sv[1] = c;
}
int main(int argc, char *argv[])
{
int pair_a[2];
int pair_b[2];
pthread_t s, rs, r;
struct receiver recv_only;
struct receiver recv_send;
if (argc == 2 && strcmp(argv[1], "unix") == 0) {
int val;
assert(0 == socketpair(AF_UNIX, SOCK_STREAM, 0, pair_a));
assert(0 == socketpair(AF_UNIX, SOCK_STREAM, 0, pair_b));
/* only make the receiver non-blocking */
val = 1;
assert(0 == ioctl(pair_a[0], FIONBIO, &val));
val = 1;
assert(0 == ioctl(pair_b[0], FIONBIO, &val));
} else {
tcp_socketpair(pair_a, SOCK_NONBLOCK);
tcp_socketpair(pair_b, SOCK_NONBLOCK);
}
recv_send.rfd = pair_a[0];
recv_send.sfd = pair_b[1];
recv_only.rfd = pair_b[0];
recv_only.sfd = -1;
/*
* data flow:
* send_loop -> recv_loop(recv_send) -> recv_loop(recv_only)
* pair_a[1] -> (pair_a[0] -> pair_b[1]) -> pair_b[0]
*/
assert(0 == pthread_create(&r, NULL, recv_loop, &recv_only));
assert(0 == pthread_create(&rs, NULL, recv_loop, &recv_send));
assert(0 == pthread_create(&s, NULL, send_loop, &pair_a[1]));
assert(0 == pthread_join(s, NULL));
assert(0 == pthread_join(rs, NULL));
assert(0 == pthread_join(r, NULL));
return 0;
}
-------------------------------- 8< ------------------------------------
Any help/suggestions/test patches would be greatly appreciated.
Thanks for reading!
--
Eric Wong
* [PATCH 12/19] netfilter: l4proto: prepare reworking l4proto support for netns
From: Gao feng @ 2012-12-28 2:36 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, canqunzhang, kaber, pablo, ebiederm, Gao feng
In-Reply-To: <1356662206-2260-1-git-send-email-gaofeng@cn.fujitsu.com>
Prepare to move the code that registers/unregisters l4proto
to the module_init/exit context.
This patch deletes the code that registers/unregisters l4proto;
that code will be added in the next patches.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
include/net/netfilter/nf_conntrack_l4proto.h | 18 ++++++++++----
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 24 +++++++++----------
net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 24 +++++++++----------
net/netfilter/nf_conntrack_proto.c | 33 ++++++++++----------------
net/netfilter/nf_conntrack_proto_dccp.c | 20 ++++++++--------
net/netfilter/nf_conntrack_proto_gre.c | 6 +++--
net/netfilter/nf_conntrack_proto_sctp.c | 14 +++++------
net/netfilter/nf_conntrack_proto_udplite.c | 17 +++++++------
8 files changed, 80 insertions(+), 76 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
index c3be4ae..c001ef7 100644
--- a/include/net/netfilter/nf_conntrack_l4proto.h
+++ b/include/net/netfilter/nf_conntrack_l4proto.h
@@ -121,11 +121,19 @@ extern struct nf_conntrack_l4proto *
nf_ct_l4proto_find_get(u_int16_t l3proto, u_int8_t l4proto);
extern void nf_ct_l4proto_put(struct nf_conntrack_l4proto *p);
-/* Protocol registration. */
-extern int nf_conntrack_l4proto_register(struct net *net,
- struct nf_conntrack_l4proto *proto);
-extern void nf_conntrack_l4proto_unregister(struct net *net,
- struct nf_conntrack_l4proto *proto);
+/* Protocol pernet registration. */
+extern int
+nf_conntrack_l4proto_pernet_register(struct net *net,
+ struct nf_conntrack_l4proto *proto);
+extern void
+nf_conntrack_l4proto_pernet_unregister(struct net *net,
+ struct nf_conntrack_l4proto *proto);
+
+/* Protocol global registration. */
+extern int
+nf_conntrack_l4proto_register(struct nf_conntrack_l4proto *proto);
+extern void
+nf_conntrack_l4proto_unregister(struct nf_conntrack_l4proto *proto);
static inline void nf_ct_kfree_compat_sysctl_table(struct nf_proto_net *pn)
{
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index a942add..933da838 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -420,20 +420,20 @@ static int ipv4_net_init(struct net *net)
{
int ret = 0;
- ret = nf_conntrack_l4proto_register(net,
- &nf_conntrack_l4proto_tcp4);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_tcp4);
if (ret < 0) {
pr_err("nf_conntrack_l4proto_tcp4 :protocol register failed\n");
goto out_tcp;
}
- ret = nf_conntrack_l4proto_register(net,
- &nf_conntrack_l4proto_udp4);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_udp4);
if (ret < 0) {
pr_err("nf_conntrack_l4proto_udp4 :protocol register failed\n");
goto out_udp;
}
- ret = nf_conntrack_l4proto_register(net,
- &nf_conntrack_l4proto_icmp);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_icmp);
if (ret < 0) {
pr_err("nf_conntrack_l4proto_icmp4 :protocol register failed\n");
goto out_icmp;
@@ -446,13 +446,13 @@ static int ipv4_net_init(struct net *net)
}
return 0;
out_ipv4:
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_icmp);
out_icmp:
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_udp4);
out_udp:
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_tcp4);
out_tcp:
return ret;
@@ -462,11 +462,11 @@ static void ipv4_net_exit(struct net *net)
{
nf_conntrack_l3proto_pernet_unregister(net,
&nf_conntrack_l3proto_ipv4);
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_icmp);
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_udp4);
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_tcp4);
}
diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 07ec50b..8db8182 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -421,20 +421,20 @@ static int ipv6_net_init(struct net *net)
{
int ret = 0;
- ret = nf_conntrack_l4proto_register(net,
- &nf_conntrack_l4proto_tcp6);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_tcp6);
if (ret < 0) {
printk(KERN_ERR "nf_conntrack_l4proto_tcp6: protocol register failed\n");
goto out;
}
- ret = nf_conntrack_l4proto_register(net,
- &nf_conntrack_l4proto_udp6);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_udp6);
if (ret < 0) {
printk(KERN_ERR "nf_conntrack_l4proto_udp6: protocol register failed\n");
goto cleanup_tcp6;
}
- ret = nf_conntrack_l4proto_register(net,
- &nf_conntrack_l4proto_icmpv6);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_icmpv6);
if (ret < 0) {
printk(KERN_ERR "nf_conntrack_l4proto_icmp6: protocol register failed\n");
goto cleanup_udp6;
@@ -447,13 +447,13 @@ static int ipv6_net_init(struct net *net)
}
return 0;
cleanup_icmpv6:
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_icmpv6);
cleanup_udp6:
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_udp6);
cleanup_tcp6:
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_tcp6);
out:
return ret;
@@ -463,11 +463,11 @@ static void ipv6_net_exit(struct net *net)
{
nf_conntrack_l3proto_pernet_unregister(net,
&nf_conntrack_l3proto_ipv6);
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_icmpv6);
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_udp6);
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_tcp6);
}
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index 5a625a6..ff38923 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -369,8 +369,8 @@ void nf_ct_l4proto_unregister_sysctl(struct net *net,
/* FIXME: Allow NULL functions and sub in pointers to generic for
them. --RR */
-static int
-nf_conntrack_l4proto_register_net(struct nf_conntrack_l4proto *l4proto)
+int
+nf_conntrack_l4proto_register(struct nf_conntrack_l4proto *l4proto)
{
int ret = 0;
@@ -424,9 +424,10 @@ out_unlock:
mutex_unlock(&nf_ct_proto_mutex);
return ret;
}
+EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_register);
-int nf_conntrack_l4proto_register(struct net *net,
- struct nf_conntrack_l4proto *l4proto)
+int nf_conntrack_l4proto_pernet_register(struct net *net,
+ struct nf_conntrack_l4proto *l4proto)
{
int ret = 0;
struct nf_proto_net *pn = NULL;
@@ -445,22 +446,14 @@ int nf_conntrack_l4proto_register(struct net *net,
if (ret < 0)
goto out;
- if (net == &init_net) {
- ret = nf_conntrack_l4proto_register_net(l4proto);
- if (ret < 0) {
- nf_ct_l4proto_unregister_sysctl(net, pn, l4proto);
- goto out;
- }
- }
-
pn->users++;
out:
return ret;
}
-EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_register);
+EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_pernet_register);
-static void
-nf_conntrack_l4proto_unregister_net(struct nf_conntrack_l4proto *l4proto)
+void
+nf_conntrack_l4proto_unregister(struct nf_conntrack_l4proto *l4proto)
{
BUG_ON(l4proto->l3proto >= PF_MAX);
@@ -475,15 +468,13 @@ nf_conntrack_l4proto_unregister_net(struct nf_conntrack_l4proto *l4proto)
synchronize_rcu();
}
+EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_unregister);
-void nf_conntrack_l4proto_unregister(struct net *net,
- struct nf_conntrack_l4proto *l4proto)
+void nf_conntrack_l4proto_pernet_unregister(struct net *net,
+ struct nf_conntrack_l4proto *l4proto)
{
struct nf_proto_net *pn = NULL;
- if (net == &init_net)
- nf_conntrack_l4proto_unregister_net(l4proto);
-
pn = nf_ct_l4proto_net(net, l4proto);
if (pn == NULL)
return;
@@ -494,7 +485,7 @@ void nf_conntrack_l4proto_unregister(struct net *net,
/* Remove all contrack entries for this protocol */
nf_ct_iterate_cleanup(net, kill_l4proto, l4proto);
}
-EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_unregister);
+EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_pernet_unregister);
int nf_conntrack_proto_pernet_init(struct net *net)
{
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index a8ae287..3850d68 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -935,32 +935,32 @@ static struct nf_conntrack_l4proto dccp_proto6 __read_mostly = {
static __net_init int dccp_net_init(struct net *net)
{
int ret = 0;
- ret = nf_conntrack_l4proto_register(net,
- &dccp_proto4);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &dccp_proto4);
if (ret < 0) {
pr_err("nf_conntrack_l4proto_dccp4 :protocol register failed.\n");
goto out;
}
- ret = nf_conntrack_l4proto_register(net,
- &dccp_proto6);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &dccp_proto6);
if (ret < 0) {
pr_err("nf_conntrack_l4proto_dccp6 :protocol register failed.\n");
goto cleanup_dccp4;
}
return 0;
cleanup_dccp4:
- nf_conntrack_l4proto_unregister(net,
- &dccp_proto4);
+ nf_conntrack_l4proto_pernet_unregister(net,
+ &dccp_proto4);
out:
return ret;
}
static __net_exit void dccp_net_exit(struct net *net)
{
- nf_conntrack_l4proto_unregister(net,
- &dccp_proto6);
- nf_conntrack_l4proto_unregister(net,
- &dccp_proto4);
+ nf_conntrack_l4proto_pernet_unregister(net,
+ &dccp_proto6);
+ nf_conntrack_l4proto_pernet_unregister(net,
+ &dccp_proto4);
}
static struct pernet_operations dccp_net_ops = {
diff --git a/net/netfilter/nf_conntrack_proto_gre.c b/net/netfilter/nf_conntrack_proto_gre.c
index b09b7af..f5f14c2 100644
--- a/net/netfilter/nf_conntrack_proto_gre.c
+++ b/net/netfilter/nf_conntrack_proto_gre.c
@@ -397,7 +397,8 @@ static struct nf_conntrack_l4proto nf_conntrack_l4proto_gre4 __read_mostly = {
static int proto_gre_net_init(struct net *net)
{
int ret = 0;
- ret = nf_conntrack_l4proto_register(net, &nf_conntrack_l4proto_gre4);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_gre4);
if (ret < 0)
pr_err("nf_conntrack_l4proto_gre4 :protocol register failed.\n");
return ret;
@@ -405,7 +406,8 @@ static int proto_gre_net_init(struct net *net)
static void proto_gre_net_exit(struct net *net)
{
- nf_conntrack_l4proto_unregister(net, &nf_conntrack_l4proto_gre4);
+ nf_conntrack_l4proto_pernet_unregister(net,
+ &nf_conntrack_l4proto_gre4);
nf_ct_gre_keymap_flush(net);
}
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index c746d61..0aa91dd 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -853,14 +853,14 @@ static int sctp_net_init(struct net *net)
{
int ret = 0;
- ret = nf_conntrack_l4proto_register(net,
- &nf_conntrack_l4proto_sctp4);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_sctp4);
if (ret < 0) {
pr_err("nf_conntrack_l4proto_sctp4 :protocol register failed.\n");
goto out;
}
- ret = nf_conntrack_l4proto_register(net,
- &nf_conntrack_l4proto_sctp6);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_sctp6);
if (ret < 0) {
pr_err("nf_conntrack_l4proto_sctp6 :protocol register failed.\n");
goto cleanup_sctp4;
@@ -868,7 +868,7 @@ static int sctp_net_init(struct net *net)
return 0;
cleanup_sctp4:
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_sctp4);
out:
return ret;
@@ -876,9 +876,9 @@ out:
static void sctp_net_exit(struct net *net)
{
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_sctp6);
- nf_conntrack_l4proto_unregister(net,
+ nf_conntrack_l4proto_pernet_unregister(net,
&nf_conntrack_l4proto_sctp4);
}
diff --git a/net/netfilter/nf_conntrack_proto_udplite.c b/net/netfilter/nf_conntrack_proto_udplite.c
index 4b66df2..56e53c0 100644
--- a/net/netfilter/nf_conntrack_proto_udplite.c
+++ b/net/netfilter/nf_conntrack_proto_udplite.c
@@ -336,14 +336,14 @@ static int udplite_net_init(struct net *net)
{
int ret = 0;
- ret = nf_conntrack_l4proto_register(net,
- &nf_conntrack_l4proto_udplite4);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_udplite4);
if (ret < 0) {
pr_err("nf_conntrack_l4proto_udplite4 :protocol register failed.\n");
goto out;
}
- ret = nf_conntrack_l4proto_register(net,
- &nf_conntrack_l4proto_udplite6);
+ ret = nf_conntrack_l4proto_pernet_register(net,
+ &nf_conntrack_l4proto_udplite6);
if (ret < 0) {
pr_err("nf_conntrack_l4proto_udplite4 :protocol register failed.\n");
goto cleanup_udplite4;
@@ -351,15 +351,18 @@ static int udplite_net_init(struct net *net)
return 0;
cleanup_udplite4:
- nf_conntrack_l4proto_unregister(net, &nf_conntrack_l4proto_udplite4);
+ nf_conntrack_l4proto_pernet_unregister(net,
+ &nf_conntrack_l4proto_udplite4);
out:
return ret;
}
static void udplite_net_exit(struct net *net)
{
- nf_conntrack_l4proto_unregister(net, &nf_conntrack_l4proto_udplite6);
- nf_conntrack_l4proto_unregister(net, &nf_conntrack_l4proto_udplite4);
+ nf_conntrack_l4proto_pernet_unregister(net,
+ &nf_conntrack_l4proto_udplite6);
+ nf_conntrack_l4proto_pernet_unregister(net,
+ &nf_conntrack_l4proto_udplite4);
}
static struct pernet_operations udplite_net_ops = {
--
1.7.11.7
^ permalink raw reply related
* [PATCH 13/19] netfilter: ipv4: move registration codes out of pernet_operations
From: Gao feng @ 2012-12-28 2:36 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, canqunzhang, kaber, pablo, ebiederm, Gao feng
In-Reply-To: <1356662206-2260-1-git-send-email-gaofeng@cn.fujitsu.com>
Move the proto (un)registration code to the module_init/exit context.
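The register-then-unwind shape used in the module init path can be sketched in plain userspace C. This is only an illustrative model, not the kernel code: `proto_register`, `proto_unregister`, `module_init_model`, `registered_count` and the `fail_at` fault-injection knob are hypothetical names introduced here.

```c
#include <assert.h>

#define NR_PROTOS 3

/* Hypothetical registry: registered[i] is 1 while proto i is registered. */
static int registered[NR_PROTOS];
static int fail_at = -1;        /* hypothetical fault injection: id that fails */

static int proto_register(int id)
{
    if (id == fail_at)
        return -1;
    registered[id] = 1;
    return 0;
}

static void proto_unregister(int id)
{
    assert(registered[id]);     /* never unwind something that was not set up */
    registered[id] = 0;
}

/* Register protos 0..2 in order; on failure, undo only what succeeded. */
int module_init_model(int inject_failure_at)
{
    int ret;

    fail_at = inject_failure_at;

    ret = proto_register(0);
    if (ret < 0)
        goto out;
    ret = proto_register(1);
    if (ret < 0)
        goto cleanup_0;
    ret = proto_register(2);
    if (ret < 0)
        goto cleanup_1;
    return 0;

cleanup_1:
    proto_unregister(1);
cleanup_0:
    proto_unregister(0);
out:
    return ret;
}

/* Number of protos currently registered in the model. */
int registered_count(void)
{
    int i, n = 0;

    for (i = 0; i < NR_PROTOS; i++)
        n += registered[i];
    return n;
}
```

Each label undoes exactly the steps that completed before the failing one, so a failed init leaves nothing registered.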
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 933da838..39521ca 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -501,6 +501,18 @@ static int __init nf_conntrack_l3proto_ipv4_init(void)
goto cleanup_pernet;
}
+ ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_tcp4);
+ if (ret < 0)
+ goto cleanup_tcp4;
+
+ ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_udp4);
+ if (ret < 0)
+ goto cleanup_udp4;
+
+ ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_icmp);
+ if (ret < 0)
+ goto cleanup_icmp;
+
ret = nf_conntrack_l3proto_register(&nf_conntrack_l3proto_ipv4);
if (ret < 0)
goto cleanup_proto;
@@ -515,6 +527,12 @@ static int __init nf_conntrack_l3proto_ipv4_init(void)
nf_conntrack_l3proto_unregister(&nf_conntrack_l3proto_ipv4);
#endif
cleanup_proto:
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_icmp);
+ cleanup_icmp:
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udp4);
+ cleanup_udp4:
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_tcp4);
+ cleanup_tcp4:
nf_unregister_hooks(ipv4_conntrack_ops, ARRAY_SIZE(ipv4_conntrack_ops));
cleanup_pernet:
unregister_pernet_subsys(&ipv4_net_ops);
@@ -529,7 +547,11 @@ static void __exit nf_conntrack_l3proto_ipv4_fini(void)
#if defined(CONFIG_PROC_FS) && defined(CONFIG_NF_CONNTRACK_PROC_COMPAT)
nf_conntrack_ipv4_compat_fini();
#endif
+
nf_conntrack_l3proto_unregister(&nf_conntrack_l3proto_ipv4);
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_icmp);
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udp4);
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_tcp4);
nf_unregister_hooks(ipv4_conntrack_ops, ARRAY_SIZE(ipv4_conntrack_ops));
unregister_pernet_subsys(&ipv4_net_ops);
nf_unregister_sockopt(&so_getorigdst);
--
1.7.11.7
^ permalink raw reply related
* [PATCH 14/19] netfilter: ipv6: move registration codes out of pernet_operations
From: Gao feng @ 2012-12-28 2:36 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, canqunzhang, kaber, pablo, ebiederm, Gao feng
In-Reply-To: <1356662206-2260-1-git-send-email-gaofeng@cn.fujitsu.com>
Move the proto (un)registration code to the module_init/exit context.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 8db8182..6787ab1 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -499,12 +499,31 @@ static int __init nf_conntrack_l3proto_ipv6_init(void)
"hook.\n");
goto cleanup_ipv6;
}
+
+ ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_tcp6);
+ if (ret < 0)
+ goto cleanup_tcp6;
+
+ ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_udp6);
+ if (ret < 0)
+ goto cleanup_udp6;
+
+ ret = nf_conntrack_l4proto_register(&nf_conntrack_l4proto_icmpv6);
+ if (ret < 0)
+ goto cleanup_icmpv6;
+
ret = nf_conntrack_l3proto_register(&nf_conntrack_l3proto_ipv6);
if (ret < 0)
goto cleanup_proto;
return ret;
cleanup_proto:
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_icmpv6);
+ cleanup_icmpv6:
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udp6);
+ cleanup_udp6:
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_tcp6);
+ cleanup_tcp6:
nf_unregister_hooks(ipv6_conntrack_ops, ARRAY_SIZE(ipv6_conntrack_ops));
cleanup_ipv6:
unregister_pernet_subsys(&ipv6_net_ops);
@@ -516,6 +535,9 @@ static int __init nf_conntrack_l3proto_ipv6_init(void)
static void __exit nf_conntrack_l3proto_ipv6_fini(void)
{
synchronize_net();
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_icmpv6);
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_udp6);
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_tcp6);
nf_conntrack_l3proto_unregister(&nf_conntrack_l3proto_ipv6);
nf_unregister_hooks(ipv6_conntrack_ops, ARRAY_SIZE(ipv6_conntrack_ops));
unregister_pernet_subsys(&ipv6_net_ops);
--
1.7.11.7
^ permalink raw reply related
* [PATCH 17/19] netfilter: dccp: move registration codes out of pernet_operations
From: Gao feng @ 2012-12-28 2:36 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, canqunzhang, kaber, pablo, ebiederm, Gao feng
In-Reply-To: <1356662206-2260-1-git-send-email-gaofeng@cn.fujitsu.com>
Move the proto (un)registration code to the module_init/exit context.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
net/netfilter/nf_conntrack_proto_dccp.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index 3850d68..5f84e2b 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -972,11 +972,32 @@ static struct pernet_operations dccp_net_ops = {
static int __init nf_conntrack_proto_dccp_init(void)
{
- return register_pernet_subsys(&dccp_net_ops);
+ int ret;
+ ret = nf_conntrack_l4proto_register(&dccp_proto4);
+ if (ret < 0)
+ goto out_dccp4;
+
+ ret = nf_conntrack_l4proto_register(&dccp_proto6);
+ if (ret < 0)
+ goto out_dccp6;
+
+ ret = register_pernet_subsys(&dccp_net_ops);
+ if (ret < 0)
+ goto out_pernet;
+
+ return 0;
+out_pernet:
+ nf_conntrack_l4proto_unregister(&dccp_proto6);
+out_dccp6:
+ nf_conntrack_l4proto_unregister(&dccp_proto4);
+out_dccp4:
+ return ret;
}
static void __exit nf_conntrack_proto_dccp_fini(void)
{
+ nf_conntrack_l4proto_unregister(&dccp_proto6);
+ nf_conntrack_l4proto_unregister(&dccp_proto4);
unregister_pernet_subsys(&dccp_net_ops);
}
--
1.7.11.7
^ permalink raw reply related
* [PATCH 19/19] netfilter: gre: fix resource leak when unregister gre proto
From: Gao feng @ 2012-12-28 2:36 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, canqunzhang, kaber, pablo, ebiederm, Gao feng
In-Reply-To: <1356662206-2260-1-git-send-email-gaofeng@cn.fujitsu.com>
Currently we unregister the proto before all conntrack entries of
this proto have been destroyed, so destroy_conntrack cannot find
the proper l4proto on which to call l4proto->destroy.
This causes a resource leak.
Only nf_conntrack_l4proto_gre4 has its own destroy pointer, so
fix this problem by marking the gre4 proto disabled, so that
normal lookups no longer find it. After all conntrack entries
of this proto have been destroyed, we can unregister the proto
safely.
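The disable, drain, unregister ordering described above can be modelled in userspace C. This is a simplified sketch under stated assumptions, not the kernel implementation: `struct proto`, `table`, `find`, `find_all`, `destroy_entry` and `teardown_demo` are hypothetical stand-ins, and an empty table slot stands in for the generic proto.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nf_conntrack_l4proto. */
struct proto {
    const char *name;
    bool disabled;                  /* set while unregistration is in progress */
    void (*destroy)(void *priv);    /* per-entry private-data destructor */
};

static struct proto generic_proto = { "generic", false, NULL };
static struct proto *table[256];    /* indexed by L4 protocol number */
static int destroyed;               /* counts destroy callbacks that ran */

/* Models __nf_ct_l4proto_find_all(): returns the proto even if disabled. */
static struct proto *find_all(unsigned char num)
{
    return table[num] ? table[num] : &generic_proto;
}

/* Models __nf_ct_l4proto_find(): a disabled proto is hidden from new users. */
static struct proto *find(unsigned char num)
{
    struct proto *p = find_all(num);

    return p->disabled ? &generic_proto : p;
}

static void gre_destroy(void *priv)
{
    (void)priv;
    destroyed++;
}

/* Models destroy_conntrack(): must still reach a disabled proto's destroy. */
static void destroy_entry(unsigned char num)
{
    struct proto *p = find_all(num);

    if (p->destroy)
        p->destroy(NULL);
}

/* Runs the three-step teardown; returns how many entries were destroyed. */
int teardown_demo(void)
{
    static struct proto gre = { "gre", false, gre_destroy };

    table[47] = &gre;               /* 47 is the IP protocol number of GRE */
    destroyed = 0;

    gre.disabled = true;            /* step 1: hide from normal lookups */
    assert(find(47) == &generic_proto);

    destroy_entry(47);              /* step 2: drain entries; destroy still runs */
    destroy_entry(47);

    table[47] = NULL;               /* step 3: only now drop the proto */
    assert(find_all(47) == &generic_proto);
    return destroyed;
}
```

The point of the two lookup functions is that new connections stop seeing the proto as soon as it is disabled, while entries that already exist can still reach its destroy callback until the final unregister.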
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
include/net/netfilter/nf_conntrack.h | 2 +-
include/net/netfilter/nf_conntrack_l4proto.h | 9 ++++++
net/netfilter/nf_conntrack_core.c | 9 ++++--
net/netfilter/nf_conntrack_proto.c | 43 +++++++++++++++++++++++++---
net/netfilter/nf_conntrack_proto_gre.c | 3 +-
5 files changed, 57 insertions(+), 9 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index caca0c4..9ba61fe 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -248,7 +248,7 @@ static inline struct nf_conn *nf_ct_untracked_get(void)
extern void nf_ct_untracked_status_or(unsigned long bits);
/* Iterate over all conntracks: if iter returns true, it's deleted. */
-extern void
+extern bool
nf_ct_iterate_cleanup(struct net *net, int (*iter)(struct nf_conn *i, void *data), void *data);
extern void nf_conntrack_free(struct nf_conn *ct);
extern struct nf_conn *
diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
index c001ef7..967ae91 100644
--- a/include/net/netfilter/nf_conntrack_l4proto.h
+++ b/include/net/netfilter/nf_conntrack_l4proto.h
@@ -23,6 +23,10 @@ struct nf_conntrack_l4proto {
/* L4 Protocol number. */
u_int8_t l4proto;
+ /* under unregistration, proto is unavailable, only being set
+ * by gre4 proto */
+ bool disabled;
+
/* Try to fill in the third arg: dataoff is offset past network protocol
hdr. Return true if possible. */
bool (*pkt_to_tuple)(const struct sk_buff *skb, unsigned int dataoff,
@@ -115,6 +119,9 @@ extern struct nf_conntrack_l4proto nf_conntrack_l4proto_generic;
#define MAX_NF_CT_PROTO 256
extern struct nf_conntrack_l4proto *
+__nf_ct_l4proto_find_all(u_int16_t l3proto, u_int8_t l4proto);
+
+extern struct nf_conntrack_l4proto *
__nf_ct_l4proto_find(u_int16_t l3proto, u_int8_t l4proto);
extern struct nf_conntrack_l4proto *
@@ -134,6 +141,8 @@ extern int
nf_conntrack_l4proto_register(struct nf_conntrack_l4proto *proto);
extern void
nf_conntrack_l4proto_unregister(struct nf_conntrack_l4proto *proto);
+extern void
+nf_conntrack_l4proto_disable(struct nf_conntrack_l4proto *proto);
static inline void nf_ct_kfree_compat_sysctl_table(struct nf_proto_net *pn)
{
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index fc0805e..788e6fe 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -208,7 +208,7 @@ destroy_conntrack(struct nf_conntrack *nfct)
* destroy_conntrack() MUST NOT be called with a write lock
* to nf_conntrack_lock!!! -HW */
rcu_read_lock();
- l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
+ l4proto = __nf_ct_l4proto_find_all(nf_ct_l3num(ct), nf_ct_protonum(ct));
if (l4proto && l4proto->destroy)
l4proto->destroy(ct);
@@ -1237,21 +1237,24 @@ found:
return ct;
}
-void nf_ct_iterate_cleanup(struct net *net,
+bool nf_ct_iterate_cleanup(struct net *net,
int (*iter)(struct nf_conn *i, void *data),
void *data)
{
struct nf_conn *ct;
unsigned int bucket = 0;
+ bool ret = true;
while ((ct = get_next_corpse(net, iter, data, &bucket)) != NULL) {
/* Time to push up daises... */
if (del_timer(&ct->timeout))
death_by_timeout((unsigned long)ct);
/* ... else the timer will get him soon. */
-
+ if (atomic_read(&ct->ct_general.use) > 1)
+ ret = false;
nf_ct_put(ct);
}
+ return ret;
}
EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup);
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index ff38923..0f6251e 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -65,13 +65,26 @@ nf_ct_unregister_sysctl(struct ctl_table_header **header,
#endif
struct nf_conntrack_l4proto *
-__nf_ct_l4proto_find(u_int16_t l3proto, u_int8_t l4proto)
+__nf_ct_l4proto_find_all(u_int16_t l3proto, u_int8_t l4proto)
{
if (unlikely(l3proto >= AF_MAX || nf_ct_protos[l3proto] == NULL))
return &nf_conntrack_l4proto_generic;
return rcu_dereference(nf_ct_protos[l3proto][l4proto]);
}
+EXPORT_SYMBOL_GPL(__nf_ct_l4proto_find_all);
+
+struct nf_conntrack_l4proto *
+__nf_ct_l4proto_find(u_int16_t l3proto, u_int8_t l4proto)
+{
+ struct nf_conntrack_l4proto *proto;
+ proto = __nf_ct_l4proto_find_all(l3proto, l4proto);
+
+ if (proto->disabled)
+ return &nf_conntrack_l4proto_generic;
+
+ return proto;
+}
EXPORT_SYMBOL_GPL(__nf_ct_l4proto_find);
/* this is guaranteed to always return a valid protocol helper, since
@@ -457,6 +470,10 @@ nf_conntrack_l4proto_unregister(struct nf_conntrack_l4proto *l4proto)
{
BUG_ON(l4proto->l3proto >= PF_MAX);
+ if (l4proto->destroy && !l4proto->disabled)
+ pr_warn("unregister l4proto %s: disable proto first\n",
+ l4proto->name);
+
mutex_lock(&nf_ct_proto_mutex);
BUG_ON(rcu_dereference_protected(
nf_ct_protos[l4proto->l3proto][l4proto->l4proto],
@@ -481,12 +498,30 @@ void nf_conntrack_l4proto_pernet_unregister(struct net *net,
pn->users--;
nf_ct_l4proto_unregister_sysctl(net, pn, l4proto);
-
- /* Remove all contrack entries for this protocol */
- nf_ct_iterate_cleanup(net, kill_l4proto, l4proto);
+retry:
+ /* Remove all conntrack entries for this protocol.
+ * l4proto is needed in nf_ct_destroy to destroy the
+ * conntrack's proto private data, so make sure all
+ * conntrack entries for this protocol are destroyed.
+ * Then the l4proto can be unregistered safely.
+ */
+ if (!nf_ct_iterate_cleanup(net, kill_l4proto, l4proto)) {
+ schedule();
+ goto retry;
+ }
}
EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_pernet_unregister);
+void nf_conntrack_l4proto_disable(struct nf_conntrack_l4proto *proto)
+{
+ mutex_lock(&nf_ct_proto_mutex);
+ proto->disabled = true;
+ mutex_unlock(&nf_ct_proto_mutex);
+
+ synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_disable);
+
int nf_conntrack_proto_pernet_init(struct net *net)
{
int err;
diff --git a/net/netfilter/nf_conntrack_proto_gre.c b/net/netfilter/nf_conntrack_proto_gre.c
index ea1f651..2730f0d 100644
--- a/net/netfilter/nf_conntrack_proto_gre.c
+++ b/net/netfilter/nf_conntrack_proto_gre.c
@@ -438,8 +438,9 @@ out_gre4:
static void __exit nf_ct_proto_gre_fini(void)
{
- nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_gre4);
+ nf_conntrack_l4proto_disable(&nf_conntrack_l4proto_gre4);
unregister_pernet_subsys(&proto_gre_net_ops);
+ nf_conntrack_l4proto_unregister(&nf_conntrack_l4proto_gre4);
}
module_init(nf_ct_proto_gre_init);
--
1.7.11.7
^ permalink raw reply related
* [PATCH 01/19] netfilter: move nf_conntrack initialize out of pernet operations
From: Gao feng @ 2012-12-28 2:36 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, canqunzhang, kaber, pablo, ebiederm, Gao feng
canqun zhang reported a bug: the kernel may panic when
unloading the nf_conntrack module.
This happens because we reset nf_ct_destroy to NULL when we
clean up init_net, which is too early. Some packets belonging
to other netns may still refer to a conntrack. When these
packets are freed, kfree_skb calls nf_ct_destroy, which is
NULL.
Fix this bug by moving the nf_conntrack initialization and
cleanup code out of the pernet operations; this work should be
done in module_init/exit. We can't use init_net to identify
whether it's the right time.
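The ordering problem can be illustrated with a small userspace model. It is a sketch under stated assumptions rather than the kernel code: `destroy_hook` stands in for nf_ct_destroy, net 0 plays the role of init_net, and all function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NETS 3                       /* net 0 plays the role of init_net */

static void (*destroy_hook)(int *ref);  /* models the global nf_ct_destroy */
static int pending[NR_NETS];            /* 1 while a net still holds a conntrack */

static void destroy_model(int *ref)
{
    *ref = 0;
}

/* Module-wide steps: publish the hook last on init, clear it last on exit. */
static void init_end_model(void)
{
    destroy_hook = destroy_model;
}

static void cleanup_end_model(void)
{
    destroy_hook = NULL;
}

/*
 * Per-net cleanup. Returns -1 where the kernel would call a NULL hook.
 * With old_ordering set, the hook is cleared as part of net 0's cleanup,
 * which is the behaviour the patch removes.
 */
static int cleanup_net_model(int net, int old_ordering)
{
    int ret = 0;

    if (destroy_hook == NULL)
        ret = -1;
    else
        destroy_hook(&pending[net]);

    if (old_ordering && net == 0)
        cleanup_end_model();
    return ret;
}

/* Returns the number of namespaces that would have hit the NULL hook. */
int lifecycle_demo(int old_ordering)
{
    int net, crashes = 0;

    for (net = 0; net < NR_NETS; net++)
        pending[net] = 1;
    init_end_model();

    for (net = 0; net < NR_NETS; net++)
        if (cleanup_net_model(net, old_ordering) < 0)
            crashes++;

    cleanup_end_model();                /* new ordering: after every net is done */
    assert(destroy_hook == NULL);
    return crashes;
}
```

In this model, clearing the hook while net 0 is cleaned up leaves the remaining namespaces without a destructor, whereas clearing it only after every per-net cleanup has run avoids the NULL call.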
Reported-by: canqun zhang <canqunzhang@gmail.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
include/net/netfilter/nf_conntrack_core.h | 10 +++-
net/netfilter/nf_conntrack_core.c | 99 ++++++++++++-------------------
net/netfilter/nf_conntrack_standalone.c | 29 ++++++---
3 files changed, 67 insertions(+), 71 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index d8f5b9f..ec51a3c 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -25,8 +25,14 @@ extern unsigned int nf_conntrack_in(struct net *net,
unsigned int hooknum,
struct sk_buff *skb);
-extern int nf_conntrack_init(struct net *net);
-extern void nf_conntrack_cleanup(struct net *net);
+extern int nf_conntrack_init_net(struct net *net);
+extern void nf_conntrack_cleanup_net(struct net *net);
+
+extern int nf_conntrack_init_start(void);
+extern void nf_conntrack_cleanup_start(void);
+
+extern void nf_conntrack_init_end(void);
+extern void nf_conntrack_cleanup_end(void);
extern int nf_conntrack_proto_init(struct net *net);
extern void nf_conntrack_proto_fini(struct net *net);
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 08cdc71..ffb2463 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1331,18 +1331,23 @@ static int untrack_refs(void)
return cnt;
}
-static void nf_conntrack_cleanup_init_net(void)
+void nf_conntrack_cleanup_start(void)
{
- while (untrack_refs() > 0)
- schedule();
-
-#ifdef CONFIG_NF_CONNTRACK_ZONES
- nf_ct_extend_unregister(&nf_ct_zone_extend);
-#endif
+ RCU_INIT_POINTER(ip_ct_attach, NULL);
}
-static void nf_conntrack_cleanup_net(struct net *net)
+/*
+ * Mishearing the voices in his head, our hero wonders how he's
+ * supposed to kill the mall.
+ */
+void nf_conntrack_cleanup_net(struct net *net)
{
+ /*
+ * This makes sure all current packets have passed through
+ * netfilter framework. Roll on, two-stage module
+ * delete...
+ */
+ synchronize_net();
i_see_dead_people:
nf_ct_iterate_cleanup(net, kill_all, NULL);
nf_ct_release_dying_list(net);
@@ -1352,6 +1357,7 @@ static void nf_conntrack_cleanup_net(struct net *net)
}
nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
+ nf_conntrack_proto_fini(net);
nf_conntrack_helper_fini(net);
nf_conntrack_timeout_fini(net);
nf_conntrack_ecache_fini(net);
@@ -1363,24 +1369,15 @@ static void nf_conntrack_cleanup_net(struct net *net)
free_percpu(net->ct.stat);
}
-/* Mishearing the voices in his head, our hero wonders how he's
- supposed to kill the mall. */
-void nf_conntrack_cleanup(struct net *net)
+void nf_conntrack_cleanup_end(void)
{
- if (net_eq(net, &init_net))
- RCU_INIT_POINTER(ip_ct_attach, NULL);
-
- /* This makes sure all current packets have passed through
- netfilter framework. Roll on, two-stage module
- delete... */
- synchronize_net();
- nf_conntrack_proto_fini(net);
- nf_conntrack_cleanup_net(net);
+ RCU_INIT_POINTER(nf_ct_destroy, NULL);
+ while (untrack_refs() > 0)
+ schedule();
- if (net_eq(net, &init_net)) {
- RCU_INIT_POINTER(nf_ct_destroy, NULL);
- nf_conntrack_cleanup_init_net();
- }
+#ifdef CONFIG_NF_CONNTRACK_ZONES
+ nf_ct_extend_unregister(&nf_ct_zone_extend);
+#endif
}
void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls)
@@ -1473,7 +1470,7 @@ void nf_ct_untracked_status_or(unsigned long bits)
}
EXPORT_SYMBOL_GPL(nf_ct_untracked_status_or);
-static int nf_conntrack_init_init_net(void)
+int nf_conntrack_init_start(void)
{
int max_factor = 8;
int ret, cpu;
@@ -1527,7 +1524,7 @@ err_extend:
#define UNCONFIRMED_NULLS_VAL ((1<<30)+0)
#define DYING_NULLS_VAL ((1<<30)+1)
-static int nf_conntrack_init_net(struct net *net)
+int nf_conntrack_init_net(struct net *net)
{
int ret;
@@ -1580,7 +1577,12 @@ static int nf_conntrack_init_net(struct net *net)
ret = nf_conntrack_helper_init(net);
if (ret < 0)
goto err_helper;
+ ret = nf_conntrack_proto_init(net);
+ if (ret < 0)
+ goto out_proto;
return 0;
+out_proto:
+ nf_conntrack_helper_fini(net);
err_helper:
nf_conntrack_timeout_fini(net);
err_timeout:
@@ -1603,42 +1605,17 @@ err_stat:
return ret;
}
+void nf_conntrack_init_end(void)
+{
+ /* For use by REJECT target */
+ RCU_INIT_POINTER(ip_ct_attach, nf_conntrack_attach);
+ RCU_INIT_POINTER(nf_ct_destroy, destroy_conntrack);
+
+ /* Howto get NAT offsets */
+ RCU_INIT_POINTER(nf_ct_nat_offset, NULL);
+}
+
s16 (*nf_ct_nat_offset)(const struct nf_conn *ct,
enum ip_conntrack_dir dir,
u32 seq);
EXPORT_SYMBOL_GPL(nf_ct_nat_offset);
-
-int nf_conntrack_init(struct net *net)
-{
- int ret;
-
- if (net_eq(net, &init_net)) {
- ret = nf_conntrack_init_init_net();
- if (ret < 0)
- goto out_init_net;
- }
- ret = nf_conntrack_proto_init(net);
- if (ret < 0)
- goto out_proto;
- ret = nf_conntrack_init_net(net);
- if (ret < 0)
- goto out_net;
-
- if (net_eq(net, &init_net)) {
- /* For use by REJECT target */
- RCU_INIT_POINTER(ip_ct_attach, nf_conntrack_attach);
- RCU_INIT_POINTER(nf_ct_destroy, destroy_conntrack);
-
- /* Howto get NAT offsets */
- RCU_INIT_POINTER(nf_ct_nat_offset, NULL);
- }
- return 0;
-
-out_net:
- nf_conntrack_proto_fini(net);
-out_proto:
- if (net_eq(net, &init_net))
- nf_conntrack_cleanup_init_net();
-out_init_net:
- return ret;
-}
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 363285d..00bf93c 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -530,11 +530,11 @@ static void nf_conntrack_standalone_fini_sysctl(struct net *net)
}
#endif /* CONFIG_SYSCTL */
-static int nf_conntrack_net_init(struct net *net)
+static int nf_conntrack_pernet_init(struct net *net)
{
int ret;
- ret = nf_conntrack_init(net);
+ ret = nf_conntrack_init_net(net);
if (ret < 0)
goto out_init;
ret = nf_conntrack_standalone_init_proc(net);
@@ -550,31 +550,44 @@ static int nf_conntrack_net_init(struct net *net)
out_sysctl:
nf_conntrack_standalone_fini_proc(net);
out_proc:
- nf_conntrack_cleanup(net);
+ nf_conntrack_cleanup_net(net);
out_init:
return ret;
}
-static void nf_conntrack_net_exit(struct net *net)
+static void nf_conntrack_pernet_exit(struct net *net)
{
nf_conntrack_standalone_fini_sysctl(net);
nf_conntrack_standalone_fini_proc(net);
- nf_conntrack_cleanup(net);
+ nf_conntrack_cleanup_net(net);
}
static struct pernet_operations nf_conntrack_net_ops = {
- .init = nf_conntrack_net_init,
- .exit = nf_conntrack_net_exit,
+ .init = nf_conntrack_pernet_init,
+ .exit = nf_conntrack_pernet_exit,
};
static int __init nf_conntrack_standalone_init(void)
{
- return register_pernet_subsys(&nf_conntrack_net_ops);
+ int ret = nf_conntrack_init_start();
+ if (ret < 0)
+ goto out_start;
+ ret = register_pernet_subsys(&nf_conntrack_net_ops);
+ if (ret < 0)
+ goto out_pernet;
+ nf_conntrack_init_end();
+ return 0;
+out_pernet:
+ nf_conntrack_cleanup_end();
+out_start:
+ return ret;
}
static void __exit nf_conntrack_standalone_fini(void)
{
+ nf_conntrack_cleanup_start();
unregister_pernet_subsys(&nf_conntrack_net_ops);
+ nf_conntrack_cleanup_end();
}
module_init(nf_conntrack_standalone_init);
--
1.7.11.7
^ permalink raw reply related
* [PATCH 04/19] netfilter: tstamp: move initial codes out of pernet_operations
From: Gao feng @ 2012-12-28 2:36 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, canqunzhang, kaber, pablo, ebiederm, Gao feng
In-Reply-To: <1356662206-2260-1-git-send-email-gaofeng@cn.fujitsu.com>
Move the global initialization code to the module_init/exit context.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
include/net/netfilter/nf_conntrack_timestamp.h | 21 ++++++++++++---
net/netfilter/nf_conntrack_core.c | 15 ++++++++---
net/netfilter/nf_conntrack_timestamp.c | 37 +++++++++-----------------
3 files changed, 41 insertions(+), 32 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack_timestamp.h b/include/net/netfilter/nf_conntrack_timestamp.h
index fc9c82b..b004614 100644
--- a/include/net/netfilter/nf_conntrack_timestamp.h
+++ b/include/net/netfilter/nf_conntrack_timestamp.h
@@ -48,15 +48,28 @@ static inline void nf_ct_set_tstamp(struct net *net, bool enable)
}
#ifdef CONFIG_NF_CONNTRACK_TIMESTAMP
-extern int nf_conntrack_tstamp_init(struct net *net);
-extern void nf_conntrack_tstamp_fini(struct net *net);
+extern int nf_conntrack_tstamp_pernet_init(struct net *net);
+extern void nf_conntrack_tstamp_pernet_fini(struct net *net);
+
+extern int nf_conntrack_tstamp_init(void);
+extern void nf_conntrack_tstamp_fini(void);
#else
-static inline int nf_conntrack_tstamp_init(struct net *net)
+static inline int nf_conntrack_tstamp_pernet_init(struct net *net)
+{
+ return 0;
+}
+
+static inline void nf_conntrack_tstamp_pernet_fini(struct net *net)
+{
+ return;
+}
+
+static inline int nf_conntrack_tstamp_init(void)
{
return 0;
}
-static inline void nf_conntrack_tstamp_fini(struct net *net)
+static inline void nf_conntrack_tstamp_fini(void)
{
return;
}
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index feb08d0..b8fb65a 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1361,7 +1361,7 @@ void nf_conntrack_cleanup_net(struct net *net)
nf_conntrack_helper_fini(net);
nf_conntrack_timeout_fini(net);
nf_conntrack_ecache_fini(net);
- nf_conntrack_tstamp_fini(net);
+ nf_conntrack_tstamp_pernet_fini(net);
nf_conntrack_acct_pernet_fini(net);
nf_conntrack_expect_pernet_fini(net);
kmem_cache_destroy(net->ct.nf_conntrack_cachep);
@@ -1378,6 +1378,7 @@ void nf_conntrack_cleanup_end(void)
#ifdef CONFIG_NF_CONNTRACK_ZONES
nf_ct_extend_unregister(&nf_ct_zone_extend);
#endif
+ nf_conntrack_tstamp_fini();
nf_conntrack_acct_fini();
nf_conntrack_expect_fini();
}
@@ -1508,6 +1509,10 @@ int nf_conntrack_init_start(void)
if (ret < 0)
goto err_acct;
+ ret = nf_conntrack_tstamp_init();
+ if (ret < 0)
+ goto err_tstamp;
+
#ifdef CONFIG_NF_CONNTRACK_ZONES
ret = nf_ct_extend_register(&nf_ct_zone_extend);
if (ret < 0)
@@ -1525,8 +1530,10 @@ int nf_conntrack_init_start(void)
#ifdef CONFIG_NF_CONNTRACK_ZONES
err_extend:
- nf_conntrack_acct_fini();
+ nf_conntrack_tstamp_fini();
#endif
+err_tstamp:
+ nf_conntrack_acct_fini();
err_acct:
nf_conntrack_expect_fini();
err_expect:
@@ -1580,7 +1587,7 @@ int nf_conntrack_init_net(struct net *net)
ret = nf_conntrack_acct_pernet_init(net);
if (ret < 0)
goto err_acct;
- ret = nf_conntrack_tstamp_init(net);
+ ret = nf_conntrack_tstamp_pernet_init(net);
if (ret < 0)
goto err_tstamp;
ret = nf_conntrack_ecache_init(net);
@@ -1603,7 +1610,7 @@ err_helper:
err_timeout:
nf_conntrack_ecache_fini(net);
err_ecache:
- nf_conntrack_tstamp_fini(net);
+ nf_conntrack_tstamp_pernet_fini(net);
err_tstamp:
nf_conntrack_acct_pernet_fini(net);
err_acct:
diff --git a/net/netfilter/nf_conntrack_timestamp.c b/net/netfilter/nf_conntrack_timestamp.c
index 7ea8026..2df0135 100644
--- a/net/netfilter/nf_conntrack_timestamp.c
+++ b/net/netfilter/nf_conntrack_timestamp.c
@@ -88,37 +88,26 @@ static void nf_conntrack_tstamp_fini_sysctl(struct net *net)
}
#endif
-int nf_conntrack_tstamp_init(struct net *net)
+int nf_conntrack_tstamp_pernet_init(struct net *net)
{
- int ret;
-
net->ct.sysctl_tstamp = nf_ct_tstamp;
+ return nf_conntrack_tstamp_init_sysctl(net);
+}
- if (net_eq(net, &init_net)) {
- ret = nf_ct_extend_register(&tstamp_extend);
- if (ret < 0) {
- printk(KERN_ERR "nf_ct_tstamp: Unable to register "
- "extension\n");
- goto out_extend_register;
- }
- }
+void nf_conntrack_tstamp_pernet_fini(struct net *net)
+{
+ nf_conntrack_tstamp_fini_sysctl(net);
+}
- ret = nf_conntrack_tstamp_init_sysctl(net);
+int nf_conntrack_tstamp_init(void)
+{
+ int ret = nf_ct_extend_register(&tstamp_extend);
if (ret < 0)
- goto out_sysctl;
-
- return 0;
-
-out_sysctl:
- if (net_eq(net, &init_net))
- nf_ct_extend_unregister(&tstamp_extend);
-out_extend_register:
+ pr_err("nf_ct_tstamp: Unable to register extension\n");
return ret;
}
-void nf_conntrack_tstamp_fini(struct net *net)
+void nf_conntrack_tstamp_fini(void)
{
- nf_conntrack_tstamp_fini_sysctl(net);
- if (net_eq(net, &init_net))
- nf_ct_extend_unregister(&tstamp_extend);
+ nf_ct_extend_unregister(&tstamp_extend);
}
--
1.7.11.7
* [PATCH 05/19] netfilter: ecache: move initialization code out of pernet_operations
From: Gao feng @ 2012-12-28 2:36 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, canqunzhang, kaber, pablo, ebiederm, Gao feng
In-Reply-To: <1356662206-2260-1-git-send-email-gaofeng@cn.fujitsu.com>
Move the global initialization code (registering the event extension) out of the per-namespace path and into the module_init/exit context.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
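The split described in the commit message can be sketched in userspace as follows. This is only an illustration, not the kernel code: `struct net` here is a hypothetical stand-in with a single field, the `demo_*` functions and the `extend_registrations` counter are invented names, and the counter merely models the state that nf_ct_extend_register() would change. The point it tries to show is that the global half runs once and takes no namespace argument, while the per-namespace half runs for every namespace and never touches global state, so the net_eq(net, &init_net) checks become unnecessary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are real kernel symbols. */
struct net {
	int sysctl_ready;	/* models the per-namespace sysctl table */
};

static int extend_registrations;	/* models the global extension registration */

/* Global half: called once from module init, takes no struct net. */
static int demo_ecache_init(void)
{
	extend_registrations++;
	return 0;
}

static void demo_ecache_fini(void)
{
	assert(extend_registrations > 0);
	extend_registrations--;
}

/* Per-namespace half: called for every namespace, touches no global state. */
static int demo_ecache_pernet_init(struct net *net)
{
	net->sysctl_ready = 1;
	return 0;
}

static void demo_ecache_pernet_fini(struct net *net)
{
	net->sysctl_ready = 0;
}

/*
 * Run the global init once, then the per-namespace init for n namespaces.
 * Returns the number of global registrations, or a negative value on failure.
 */
static int demo_bring_up(struct net *nets, size_t n)
{
	size_t i;
	int ret;

	ret = demo_ecache_init();
	if (ret < 0)
		return ret;

	for (i = 0; i < n; i++) {
		ret = demo_ecache_pernet_init(&nets[i]);
		if (ret < 0)
			goto err_pernet;
	}
	return extend_registrations;

err_pernet:
	/* Unwind only the namespaces already set up, then the global part. */
	while (i-- > 0)
		demo_ecache_pernet_fini(&nets[i]);
	demo_ecache_fini();
	return ret;
}

/* Tear down in reverse order: namespaces first, global state last. */
static void demo_tear_down(struct net *nets, size_t n)
{
	while (n-- > 0)
		demo_ecache_pernet_fini(&nets[n]);
	demo_ecache_fini();
}
```

Calling demo_bring_up() on an array of three namespaces leaves exactly one global registration however many namespaces exist, and demo_tear_down() releases it only after every namespace has been cleaned up.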
include/net/netfilter/nf_conntrack_ecache.h | 19 +++++++++++----
net/netfilter/nf_conntrack_core.c | 14 +++++++----
net/netfilter/nf_conntrack_ecache.c | 37 ++++++++++-------------------
3 files changed, 38 insertions(+), 32 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack_ecache.h b/include/net/netfilter/nf_conntrack_ecache.h
index 5654d29..092dc65 100644
--- a/include/net/netfilter/nf_conntrack_ecache.h
+++ b/include/net/netfilter/nf_conntrack_ecache.h
@@ -207,9 +207,11 @@ nf_ct_expect_event(enum ip_conntrack_expect_events event,
nf_ct_expect_event_report(event, exp, 0, 0);
}
-extern int nf_conntrack_ecache_init(struct net *net);
-extern void nf_conntrack_ecache_fini(struct net *net);
+extern int nf_conntrack_ecache_pernet_init(struct net *net);
+extern void nf_conntrack_ecache_pernet_fini(struct net *net);
+extern int nf_conntrack_ecache_init(void);
+extern void nf_conntrack_ecache_fini(void);
#else /* CONFIG_NF_CONNTRACK_EVENTS */
static inline void nf_conntrack_event_cache(enum ip_conntrack_events event,
@@ -232,12 +234,21 @@ static inline void nf_ct_expect_event_report(enum ip_conntrack_expect_events e,
u32 portid,
int report) {}
-static inline int nf_conntrack_ecache_init(struct net *net)
+static inline int nf_conntrack_ecache_pernet_init(struct net *net)
{
return 0;
}
-static inline void nf_conntrack_ecache_fini(struct net *net)
+static inline void nf_conntrack_ecache_pernet_fini(struct net *net)
+{
+}
+
+static inline int nf_conntrack_ecache_init(void)
+{
+ return 0;
+}
+
+static inline void nf_conntrack_ecache_fini(void)
{
}
#endif /* CONFIG_NF_CONNTRACK_EVENTS */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index b8fb65a..b34bc9a 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1360,7 +1360,7 @@ void nf_conntrack_cleanup_net(struct net *net)
nf_conntrack_proto_fini(net);
nf_conntrack_helper_fini(net);
nf_conntrack_timeout_fini(net);
- nf_conntrack_ecache_fini(net);
+ nf_conntrack_ecache_pernet_fini(net);
nf_conntrack_tstamp_pernet_fini(net);
nf_conntrack_acct_pernet_fini(net);
nf_conntrack_expect_pernet_fini(net);
@@ -1378,6 +1378,7 @@ void nf_conntrack_cleanup_end(void)
#ifdef CONFIG_NF_CONNTRACK_ZONES
nf_ct_extend_unregister(&nf_ct_zone_extend);
#endif
+ nf_conntrack_ecache_fini();
nf_conntrack_tstamp_fini();
nf_conntrack_acct_fini();
nf_conntrack_expect_fini();
@@ -1512,6 +1513,9 @@ int nf_conntrack_init_start(void)
ret = nf_conntrack_tstamp_init();
if (ret < 0)
goto err_tstamp;
+ ret = nf_conntrack_ecache_init();
+ if (ret < 0)
+ goto err_ecache;
#ifdef CONFIG_NF_CONNTRACK_ZONES
ret = nf_ct_extend_register(&nf_ct_zone_extend);
@@ -1530,8 +1534,10 @@ int nf_conntrack_init_start(void)
#ifdef CONFIG_NF_CONNTRACK_ZONES
err_extend:
- nf_conntrack_tstamp_fini();
+ nf_conntrack_ecache_fini();
#endif
+err_ecache:
+ nf_conntrack_tstamp_fini();
err_tstamp:
nf_conntrack_acct_fini();
err_acct:
@@ -1590,7 +1596,7 @@ int nf_conntrack_init_net(struct net *net)
ret = nf_conntrack_tstamp_pernet_init(net);
if (ret < 0)
goto err_tstamp;
- ret = nf_conntrack_ecache_init(net);
+ ret = nf_conntrack_ecache_pernet_init(net);
if (ret < 0)
goto err_ecache;
ret = nf_conntrack_timeout_init(net);
@@ -1608,7 +1614,7 @@ out_proto:
err_helper:
nf_conntrack_timeout_fini(net);
err_timeout:
- nf_conntrack_ecache_fini(net);
+ nf_conntrack_ecache_pernet_fini(net);
err_ecache:
nf_conntrack_tstamp_pernet_fini(net);
err_tstamp:
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index faa978f..b5d2eb8 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -233,38 +233,27 @@ static void nf_conntrack_event_fini_sysctl(struct net *net)
}
#endif /* CONFIG_SYSCTL */
-int nf_conntrack_ecache_init(struct net *net)
+int nf_conntrack_ecache_pernet_init(struct net *net)
{
- int ret;
-
net->ct.sysctl_events = nf_ct_events;
net->ct.sysctl_events_retry_timeout = nf_ct_events_retry_timeout;
+ return nf_conntrack_event_init_sysctl(net);
+}
- if (net_eq(net, &init_net)) {
- ret = nf_ct_extend_register(&event_extend);
- if (ret < 0) {
- printk(KERN_ERR "nf_ct_event: Unable to register "
- "event extension.\n");
- goto out_extend_register;
- }
- }
+void nf_conntrack_ecache_pernet_fini(struct net *net)
+{
+ nf_conntrack_event_fini_sysctl(net);
+}
- ret = nf_conntrack_event_init_sysctl(net);
+int nf_conntrack_ecache_init(void)
+{
+ int ret = nf_ct_extend_register(&event_extend);
if (ret < 0)
- goto out_sysctl;
-
- return 0;
-
-out_sysctl:
- if (net_eq(net, &init_net))
- nf_ct_extend_unregister(&event_extend);
-out_extend_register:
+ pr_err("nf_ct_event: Unable to register event extension.\n");
return ret;
}
-void nf_conntrack_ecache_fini(struct net *net)
+void nf_conntrack_ecache_fini(void)
{
- nf_conntrack_event_fini_sysctl(net);
- if (net_eq(net, &init_net))
- nf_ct_extend_unregister(&event_extend);
+ nf_ct_extend_unregister(&event_extend);
}
--
1.7.11.7