Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next 1/4] flow_keys: include thoff into flow_keys for later usage
From: Daniel Borkmann @ 2013-03-19 13:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, jasowang
In-Reply-To: <cover.1363699285.git.dborkman@redhat.com>

In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
doing the work here anyway, also store thoff for a later usage, e.g. in
the BPF filter. We need to reorder choke_skb_cb a bit, though. Also, by
having thoff 16 Bit, we do not need to pack flow_keys.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/net/flow_keys.h   | 1 +
 net/core/flow_dissector.c | 5 ++++-
 net/sched/sch_choke.c     | 4 ++--
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
index 80461c1..bb8271d 100644
--- a/include/net/flow_keys.h
+++ b/include/net/flow_keys.h
@@ -9,6 +9,7 @@ struct flow_keys {
 		__be32 ports;
 		__be16 port16[2];
 	};
+	u16 thoff;
 	u8 ip_proto;
 };
 
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index f8d9e03..eb9dde1 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -23,7 +23,8 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
 
 bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow)
 {
-	int poff, nhoff = skb_network_offset(skb);
+	int poff;
+	u16 nhoff = skb_network_offset(skb);
 	u8 ip_proto;
 	__be16 proto = skb->protocol;
 
@@ -151,6 +152,8 @@ ipv6:
 			flow->ports = *ports;
 	}
 
+	flow->thoff = nhoff;
+
 	return true;
 }
 EXPORT_SYMBOL(skb_flow_dissect);
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index cc37dd5..9854c2b 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -141,9 +141,9 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
 }
 
 struct choke_skb_cb {
-	u16			classid;
-	u8			keys_valid;
 	struct flow_keys	keys;
+	u8			keys_valid;
+	u16			classid;
 };
 
 static inline struct choke_skb_cb *choke_skb_cb(const struct sk_buff *skb)
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH net-next 0/4] net: filter: BPF updates
From: Daniel Borkmann @ 2013-03-19 13:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, jasowang

This set adds i) an ancillary operation to the BPF engine and ii) a
BPF JIT image disassembler in order to verify or debug the BPF JIT
compilers under arch/*/net/.

Daniel Borkmann (4):
  flow_keys: include thoff into flow_keys for later usage
  net: flow_dissector: add __skb_get_poff to get a start offset to payload
  filter: add ANC_PAY_OFFSET instruction for loading payload start offset
  filter: add minimal BPF JIT emitted image disassembler

 include/linux/filter.h      |   1 +
 include/linux/skbuff.h      |   2 +
 include/net/flow_keys.h     |   1 +
 include/uapi/linux/filter.h |   3 +-
 net/core/filter.c           |   5 +
 net/core/flow_dissector.c   |  62 ++++++++++++-
 net/sched/sch_choke.c       |   4 +-
 scripts/bpf_jit_disasm.c    | 216 ++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 290 insertions(+), 4 deletions(-)
 create mode 100644 scripts/bpf_jit_disasm.c

-- 
1.7.11.7

^ permalink raw reply

* Re: [PATCH] use xfrm direction when lookup policy
From: David Miller @ 2013-03-19 13:27 UTC (permalink / raw)
  To: baker.kernel; +Cc: netdev, linux-kernel, baker.zhang
In-Reply-To: <1363696786-11128-1-git-send-email-baker.zhang@gmail.com>

From: Baker Zhang <baker.kernel@gmail.com>
Date: Tue, 19 Mar 2013 20:39:46 +0800

> +static inline int flow_to_policy_dir(int dir)

Please do not use inline in foo.c files, just let the compiler decide
whether inlining is a good idea or not.

Thanks.

^ permalink raw reply

* Re: [PATCH v6] net: sh_eth: Add support of device tree probe
From: Sergei Shtylyov @ 2013-03-19 13:22 UTC (permalink / raw)
  To: Nobuhiro Iwamatsu
  Cc: netdev, devicetree-discuss, magnus.damm, kda, horms+renesas,
	mark.rutland, grant.likely
In-Reply-To: <1363675196-8940-1-git-send-email-nobuhiro.iwamatsu.yj@renesas.com>

On 19-03-2013 10:39, Nobuhiro Iwamatsu wrote:

> This adds support of device tree probe for Renesas sh-ether driver.

> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>

> V6: - Add renesas,sh-eth-gigabit, renesas,sh-eth-sh4 and
>        renesas,sh-eth-sh3-sh2 to compatible string.
> 	- Remove sh-eth,register-type. This is supplemented by the
>        compatible string.
>      - Use the of_property_read_bool instead of of_find_property.
>      - Add sanity chheck for of_property_read_u32.
>      - Update document.
> V5: - Rewrite sh_eth_parse_dt().
>        Remove of_device_is_available() and CONFIG_OF from support OF
>        checking function and re-add empty sh_eth_parse_dt().
>      - Add CONFIG_PM to definition of dev_pm_ops.
> 	- Add CONFIG_OF to definition of of_device_id.
> V4: - Remove empty sh_eth_parse_dt().
> V3: - Remove empty sh_eth_parse_dt().
> V3: - Removed sentnece of "needs-init" from document.
> V2: - Removed ether_setup().
>      - Fixed typo from "sh-etn" to "sh-eth".
> 	- Removed "needs-init" and sh-eth,endian from definition of DT.
> 	- Changed "sh-eth,edmac-endian" instead of "sh-eth,edmac-big-endain"
> 	  in definition of DT.
> ---
>   Documentation/devicetree/bindings/net/sh_ether.txt |   48 +++++++
>   drivers/net/ethernet/renesas/sh_eth.c              |  145 ++++++++++++++++----
>   2 files changed, 168 insertions(+), 25 deletions(-)
>   create mode 100644 Documentation/devicetree/bindings/net/sh_ether.txt

> diff --git a/Documentation/devicetree/bindings/net/sh_ether.txt b/Documentation/devicetree/bindings/net/sh_ether.txt
> new file mode 100644
> index 0000000..d1f961c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/sh_ether.txt
> @@ -0,0 +1,48 @@
> +* Renesas Electronics SuperH EMAC
> +
> +This file provides information, what the device node
> +for the sh_eth interface contains.
> +
> +Required properties:
> +- compatible:                   "renesas,sh-eth-gigabit": If a device has
> +                                            a function of gigabit, you should
> +											set this.
> +                                "renesas,sh-eth-sh4": If this is provided by
> +                                            SH4, you should set this.
> +                                "renesas,sh-eth-sh3-sh2": If this is provided
> +                                            by SH3 or SH2, you should set this.
> +
> +- interrupt-parent:             The phandle for the interrupt controller that
> +                                services interrupts for this device.
> +
> +- reg:                          Offset and length of the register set for the
> +                                device. If device has TSU registers, you need
> +                                to set up two register information here.
> +
> +- interrupts:                   Interrupt mapping for the sh_eth interrupt
> +                                sources (vector id). You can set one value.
> +
> +- phy-mode:                     String, operation mode of the PHY interface
> +                                (a string that of_get_phy_mode() can understand).
> +
> +- sh-eth,phy-id:                PHY id.
> +
> +Optional properties:
> +- local-mac-address:            6 bytes, mac address
> +- sh-eth,no-ether-link:         Set link control by software. When device does

    I got the impression that usually the vendor name, not driver name is used 
as a property prefix.

> diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
> index 7a6471d..d9df68e 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -1,7 +1,7 @@
>   /*
>    *  SuperH Ethernet device driver
>    *
> - *  Copyright (C) 2006-2012 Nobuhiro Iwamatsu
> + *  Copyright (C) 2006-2013 Nobuhiro Iwamatsu
>    *  Copyright (C) 2008-2012 Renesas Solutions Corp.

    Don't you want to extend the copyright of Renesas also -- you seem to be 
still working there. :-)

> @@ -2391,12 +2451,33 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
>   		goto out_release;
>   	}
>
> +	if (np) {
> +		pd = sh_eth_parse_dt(&pdev->dev, ndev, np);
> +		if (pdev->dev.platform_data && pd) {
> +			struct sh_eth_plat_data *tmp =
> +				pdev->dev.platform_data;
> +			pd->set_mdio_gate = tmp->set_mdio_gate;
> +			pd->needs_init = tmp->needs_init;

    OK, so we can't fully convert this driver to the device tree due to 
procedural platform data. I then would advice just using OF_DEV_AUXDATA() in 
the platform data instead of trying to convert the driver to device tree.

[...]
> @@ -2422,20 +2505,16 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
[...]
>   		mdp->tsu_addr = ioremap(rtsu->start,
> -					resource_size(rtsu));
> +				resource_size(rtsu));

    Why? It was indented perfectly before. Anyway, you shouldn't be doing 
collateral whitespace changes.

>   		mdp->port = devno % 2;
>   		ndev->features = NETIF_F_HW_VLAN_FILTER;
>   	}
> @@ -2502,6 +2581,7 @@ static int sh_eth_drv_remove(struct platform_device *pdev)
>   	return 0;
>   }
>
> +#ifdef CONFIG_PM

    Unrelated change.

>   static int sh_eth_runtime_nop(struct device *dev)
>   {
>   	/*
> @@ -2515,17 +2595,32 @@ static int sh_eth_runtime_nop(struct device *dev)
>   	return 0;
>   }
>
> -static struct dev_pm_ops sh_eth_dev_pm_ops = {
> +static const struct dev_pm_ops sh_eth_dev_pm_ops = {
>   	.runtime_suspend = sh_eth_runtime_nop,
>   	.runtime_resume = sh_eth_runtime_nop,
>   };
> +#define SH_ETH_PM_OPS (&sh_eth_dev_pm_ops)

    () not needed.

> +#else
> +#define SH_ETH_PM_OPS NULL
> +#endif

    Unrelated change. Should be in a different patch.

> +
> +#ifdef CONFIG_OF
> +static const struct of_device_id sh_eth_match[] = {
> +	{ .compatible = "renesas,sh-eth-gigabit", },
> +	{ .compatible = "renesas,sh-eth-sh4", },
> +	{ .compatible = "renesas,sh-eth-sh3-sh2", },

    Biut this is not really enough: the driver supports much more variations 
of the SH and ARM SoCs all of which have difference not only in register 
layout but also in the registers bits or even presence of the whole register 
blocks. All this IMO should be reflected in the different values of the 
compatible "property". BTW, it seems another register layout and instance 
needs to be added for the R-Car SoCs (instead of the current ugly hack).

> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, sh_eth_match);
> +#endif
>
>   static struct platform_driver sh_eth_driver = {
>   	.probe = sh_eth_drv_probe,
>   	.remove = sh_eth_drv_remove,
>   	.driver = {
>   		   .name = CARDNAME,
> -		   .pm = &sh_eth_dev_pm_ops,
> +		   .pm = SH_ETH_PM_OPS,

    Unrelated as well.

WBR, Sergei

^ permalink raw reply

* Re: [PATCH v5] net: sh_eth: Add support of device tree probe
From: Simon Horman @ 2013-03-19 13:20 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Nobuhiro Iwamatsu, netdev, grant.likely, kuninori.morimoto.gx,
	devicetree-discuss, magnus.damm, kda, Mark Rutland
In-Reply-To: <514861EF.7050407@cogentembedded.com>

On Tue, Mar 19, 2013 at 05:02:39PM +0400, Sergei Shtylyov wrote:
> Hello.
> 
> On 19-03-2013 8:22, Simon Horman wrote:
> 
> >>[ Cc: Mark Rutland ]
> 
> >>On Tue, Mar 05, 2013 at 06:52:20AM +0900, Nobuhiro Iwamatsu wrote:
> >>>This adds support of device tree probe for Renesas sh-ether driver.
> 
> >>>Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
> 
> >>Hi Iwamatsu-san,
> 
> >>could you please address the review by
> >>Mark Rutland of the bindings in this patch.
> >>The review was made for v2 but I believe it
> >>is still applicable to v5.
> 
> >>http://www.spinics.net/lists/netdev/msg226454.html
> 
> >Ping.
> 
>    I don't think converting this driver to device tree can be
> achieved without unravelling the #ifdef mess first (which I'm
> currently doing).
> Or at least the driver will look even uglier than today. So I see
> little use in pinging Iwamatsu-san.

Thanks, I wasn't aware of that.

^ permalink raw reply

* [BUG][mvebu] mvneta: cannot request irq 25 on openblocks-ax3
From: Masami Hiramatsu @ 2013-03-19 13:12 UTC (permalink / raw)
  To: linux-arm-kernel, thomas.petazzoni, Jason Cooper
  Cc: linux-kernel, netdev, yrl.pp-manager.tt@hitachi.com

Hi,

Here I've hit a bug on the recent kernel. As far as I know, this bug
exists on 3.9-rc1 too.

When I tried the latest mvebu for-next tree
(git://git.infradead.org/users/jcooper/linux.git mvebu/for-next),
I got below warning at bootup time and mvneta didn't work (link was never up).
I ensured that "ifconfig ethX up" always caused that.

Does anyone succeed to boot openblocks-ax3 recently or hit same
trouble?

------------[ cut here ]------------
WARNING: at /ssd/ksrc/linux-3/kernel/irq/manage.c:1417
request_threaded_irq+0x64/0x124()
Modules linked in:
Backtrace:
[<c00119a4>] (dump_backtrace+0x0/0x110) from [<c04b2bac>] (dump_stack+0x18/0x1c)
 r6:c05ad11a r5:00000589 r4:00000000 r3:ed80e000
[<c04b2b94>] (dump_stack+0x0/0x1c) from [<c0021484>]
(warn_slowpath_common+0x54/0x70)
[<c0021430>] (warn_slowpath_common+0x0/0x70) from [<c00214c4>]
(warn_slowpath_null+0x24/0x2c)
 r8:00000000 r7:ef0a4600 r6:ee9c4d40 r5:00000000 r4:c037378c
r3:00000009
[<c00214a0>] (warn_slowpath_null+0x0/0x2c) from [<c0084bb8>]
(request_threaded_irq+0x64/0x124)
[<c0084b54>] (request_threaded_irq+0x0/0x124) from [<c037418c>]
(mvneta_open+0x80/0x15c)
[<c037410c>] (mvneta_open+0x0/0x15c) from [<c03c9e3c>] (__dev_open+0xa0/0xf4)
 r6:00001002 r5:c050bb24 r4:ee9c4800
[<c03c9d9c>] (__dev_open+0x0/0xf4) from [<c03ca070>] (__dev_change_flags+0x94/0x118)
 r7:00000001 r6:00001002 r5:00001003 r4:ee9c4800
[<c03c9fdc>] (__dev_change_flags+0x0/0x118) from [<c03ca170>]
(dev_change_flags+0x18/0x4c)
 r7:00000000 r6:edd0d380 r5:00001002 r4:ee9c4800
[<c03ca158>] (dev_change_flags+0x0/0x4c) from [<c03d6168>] (do_setlink+0x2a4/0x7a0)
 r6:edd0d380 r5:ed80fb4c r4:ee9c4800 r3:00001002
[<c03d5ec4>] (do_setlink+0x0/0x7a0) from [<c03d72d8>] (rtnl_newlink+0x248/0x45c)
[<c03d7090>] (rtnl_newlink+0x0/0x45c) from [<c03d7060>]
(rtnetlink_rcv_msg+0x248/0x278)
[<c03d6e18>] (rtnetlink_rcv_msg+0x0/0x278) from [<c03ead78>]
(netlink_rcv_skb+0x58/0xb4)
[<c03ead20>] (netlink_rcv_skb+0x0/0xb4) from [<c03d5d28>] (rtnetlink_rcv+0x20/0x2c)
 r6:eea75c00 r5:eea87240 r4:eea87240 r3:c03d5d08
[<c03d5d08>] (rtnetlink_rcv+0x0/0x2c) from [<c03ea780>]
(netlink_unicast+0x15c/0x214)
 r4:ef0a3000 r3:c03d5d08
[<c03ea624>] (netlink_unicast+0x0/0x214) from [<c03eab94>]
(netlink_sendmsg+0x2b8/0x348)
[<c03ea8dc>] (netlink_sendmsg+0x0/0x348) from [<c03b2dbc>] (sock_sendmsg+0x88/0xa4)
[<c03b2d34>] (sock_sendmsg+0x0/0xa4) from [<c03b3330>] (__sys_sendmsg+0x1c8/0x25c)
 r7:00000000 r6:00000020 r5:eefccc80 r4:ed80ff64
[<c03b3168>] (__sys_sendmsg+0x0/0x25c) from [<c03b4f30>] (sys_sendmsg+0x44/0x68)
[<c03b4eec>] (sys_sendmsg+0x0/0x68) from [<c000daa0>] (ret_fast_syscall+0x0/0x30)
 r6:be9f367c r5:00000000 r4:5146f675
---[ end trace a8ceb54233e388b5 ]---
mvneta d0070000.ethernet eth0: cannot request irq 25

Thank you,
-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply

* Re: [PATCH v5] net: sh_eth: Add support of device tree probe
From: Sergei Shtylyov @ 2013-03-19 13:02 UTC (permalink / raw)
  To: Simon Horman
  Cc: Nobuhiro Iwamatsu, netdev, grant.likely, kuninori.morimoto.gx,
	devicetree-discuss, magnus.damm, kda, Mark Rutland
In-Reply-To: <20130319042229.GG21444@verge.net.au>

Hello.

On 19-03-2013 8:22, Simon Horman wrote:

>> [ Cc: Mark Rutland ]

>> On Tue, Mar 05, 2013 at 06:52:20AM +0900, Nobuhiro Iwamatsu wrote:
>>> This adds support of device tree probe for Renesas sh-ether driver.

>>> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>

>> Hi Iwamatsu-san,

>> could you please address the review by
>> Mark Rutland of the bindings in this patch.
>> The review was made for v2 but I believe it
>> is still applicable to v5.

>> http://www.spinics.net/lists/netdev/msg226454.html

> Ping.

    I don't think converting this driver to device tree can be achieved 
without unravelling the #ifdef mess first (which I'm currently doing).
Or at least the driver will look even uglier than today. So I see little use 
in pinging Iwamatsu-san.

WBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next 2/2] net: reset transport header if it was not set before transmission
From: Eric Dumazet @ 2013-03-19 12:59 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jason Wang, David Miller, netdev, linux-kernel, mst, edumazet
In-Reply-To: <514860DC.9020404@redhat.com>

On Tue, 2013-03-19 at 13:58 +0100, Daniel Borkmann wrote:

> Yes, will post them in a couple of minutes.
> 

Please target net tree for the first patch (adding thoff into struct
flow_keys), so that Jason or me can fix DODGY  providers.

Thanks

^ permalink raw reply

* Re: [PATCH net-next 1/2] net_sched: don't do precise pkt_len computation for untrusted packets
From: Eric Dumazet @ 2013-03-19 12:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: David Miller, netdev, linux-kernel, mst, edumazet,
	Daniel Borkmann
In-Reply-To: <1363695040.21184.32.camel@edumazet-glaptop>

On Tue, 2013-03-19 at 05:10 -0700, Eric Dumazet wrote:
> On Tue, 2013-03-19 at 17:25 +0800, Jason Wang wrote:
> 
> > I believe before doing header check for untrusted packets, the only
> > thing we can trust is skb->len and that's we've used before
> > 1def9238d4aa2. But after that, we're trying to use unchecked or
> > meaningless value (e.g gso_segs were reset to zero in
> > tun/macvtap/packet), and guest then can utilize this to result a very
> > huge (-1U) pkt_len by filling evil value in the header. Can all kinds of
> > packet scheduler survive this kinds of possible DOS?
> 
> I would use the flow dissector to fix the transport header from all
> DODGY providers.
> 
> Daniel Borkmann is working on a patch serie adding nhoff into flow_keys,
> and adding __skb_get_poff(const struct sk_buff *skb), for a BPF
> extension we talked about in Copenhagen / Netfilter Workshop.
> 
> You could then set the transport header offset to the right value.
> 
> (and drop evil packets before they go further in the stack)
> 
> if (gso_packet(skb)) {
> 	u32 poff = __skb_get_poff(skb);
> 
> 	if (!poff) {
> 		drop_evil_packet(skb);
> 	} else {
> 		skb_set_transport_header(skb, poff);
> 		...
> 	}
> }


Oh well, no need to use __skb_get_poff() but plain skb_flow_dissect()
(once patched to include thoff in struct flow_keys)

struct flow_keys keys;

if (!skb_flow_dissect(skb, &keys))
	goto drop;

if ((gso_type & (SKB_GSO_TCPV4|SKB_GSO_TCPV6)) &&
    keys.ip_proto != IP_PROTO_TCP)
	goto drop;

skb_set_transport_header(skb, keys.thoff);

^ permalink raw reply

* Re: [PATCH net-next 2/2] net: reset transport header if it was not set before transmission
From: Daniel Borkmann @ 2013-03-19 12:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jason Wang, David Miller, netdev, linux-kernel, mst, edumazet
In-Reply-To: <1363695234.21184.36.camel@edumazet-glaptop>

On 03/19/2013 01:13 PM, Eric Dumazet wrote:
> On Tue, 2013-03-19 at 17:26 +0800, Jason Wang wrote:
>> On 03/18/2013 12:13 AM, David Miller wrote:
>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>> Date: Fri, 15 Mar 2013 19:10:51 -0700
>>>
>>>> Any way we can avoid adding this to fast path, for people not using
>>>> macvtap and ixgbe ?
>>> Likewise I'd rather see macvtap be responsible for fixing this up by
>>> setting the transport header properly, and therfore sending well
>>> formed packets to the rest of the stack.
>>
>> Ok, haven't checked all other possibility but looks like packet needs to
>> be fixed also.
>
> Daniel, could you post your patches if ready ?

Yes, will post them in a couple of minutes.

> Jason, I believe you could reuse existing flow dissector once Daniel
> patches are in.

^ permalink raw reply

* [PATCH 1/2] ipvs: fix hashing in ip_vs_svc_hashkey
From: Simon Horman @ 2013-03-19 12:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman
In-Reply-To: <1363697028-15314-1-git-send-email-horms@verge.net.au>

From: Julian Anastasov <ja@ssi.bg>

net is a pointer in host order, mix it properly
with other keys in network order. Fixes sparse warning.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_ctl.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index c68198b..a528178 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -271,16 +271,18 @@ ip_vs_svc_hashkey(struct net *net, int af, unsigned int proto,
 {
 	register unsigned int porth = ntohs(port);
 	__be32 addr_fold = addr->ip;
+	__u32 ahash;
 
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6)
 		addr_fold = addr->ip6[0]^addr->ip6[1]^
 			    addr->ip6[2]^addr->ip6[3];
 #endif
-	addr_fold ^= ((size_t)net>>8);
+	ahash = ntohl(addr_fold);
+	ahash ^= ((size_t) net >> 8);
 
-	return (proto^ntohl(addr_fold)^(porth>>IP_VS_SVC_TAB_BITS)^porth)
-		& IP_VS_SVC_TAB_MASK;
+	return (proto ^ ahash ^ (porth >> IP_VS_SVC_TAB_BITS) ^ porth) &
+	       IP_VS_SVC_TAB_MASK;
 }
 
 /*
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/2] ipvs: fix some sparse warnings
From: Simon Horman @ 2013-03-19 12:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman
In-Reply-To: <1363697028-15314-1-git-send-email-horms@verge.net.au>

From: Julian Anastasov <ja@ssi.bg>

Add missing __percpu annotations and make ip_vs_net_id static.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h             |    2 +-
 net/netfilter/ipvs/ip_vs_core.c |    8 +-------
 net/netfilter/ipvs/ip_vs_est.c  |    2 +-
 3 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 68c69d5..29bc055 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -459,7 +459,7 @@ struct ip_vs_estimator {
 struct ip_vs_stats {
 	struct ip_vs_stats_user	ustats;		/* statistics */
 	struct ip_vs_estimator	est;		/* estimator */
-	struct ip_vs_cpu_stats	*cpustats;	/* per cpu counters */
+	struct ip_vs_cpu_stats __percpu	*cpustats;	/* per cpu counters */
 	spinlock_t		lock;		/* spin lock */
 	struct ip_vs_stats_user	ustats0;	/* reset values */
 };
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 47edf5a..3e5e80b 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -69,10 +69,7 @@ EXPORT_SYMBOL(ip_vs_conn_put);
 EXPORT_SYMBOL(ip_vs_get_debug_level);
 #endif
 
-int ip_vs_net_id __read_mostly;
-#ifdef IP_VS_GENERIC_NETNS
-EXPORT_SYMBOL(ip_vs_net_id);
-#endif
+static int ip_vs_net_id __read_mostly;
 /* netns cnt used for uniqueness */
 static atomic_t ipvs_netns_cnt = ATOMIC_INIT(0);
 
@@ -1181,9 +1178,6 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 						iph.len)))) {
 #ifdef CONFIG_IP_VS_IPV6
 				if (af == AF_INET6) {
-					struct net *net =
-						dev_net(skb_dst(skb)->dev);
-
 					if (!skb->dev)
 						skb->dev = net->loopback_dev;
 					icmpv6_send(skb,
diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c
index 0fac601..6bee6d0 100644
--- a/net/netfilter/ipvs/ip_vs_est.c
+++ b/net/netfilter/ipvs/ip_vs_est.c
@@ -56,7 +56,7 @@
  * Make a summary from each cpu
  */
 static void ip_vs_read_cpu_stats(struct ip_vs_stats_user *sum,
-				 struct ip_vs_cpu_stats *stats)
+				 struct ip_vs_cpu_stats __percpu *stats)
 {
 	int i;
 
-- 
1.7.10.4

^ permalink raw reply related

* [GIT PULL nf-next] IPVS changes for v3.10
From: Simon Horman @ 2013-03-19 12:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov

Hi Pablo,

The following changes since commit 1cdb09056b27b2a06b06dc7187d2c33d57082d20:

  netfilter: nfnetlink_queue: use xor hash function to distribute instances (2013-03-15 12:38:40 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git tags/ipvs-for-v3.10

for you to fetch changes up to b962abdc6531c8de837504ebc98139587162f223:

  ipvs: fix some sparse warnings (2013-03-19 21:18:38 +0900)

----------------------------------------------------------------
IPVS changes for v3.10 from Julian Anastasov

----------------------------------------------------------------
Julian Anastasov (2):
      ipvs: fix hashing in ip_vs_svc_hashkey
      ipvs: fix some sparse warnings

 include/net/ip_vs.h             |    2 +-
 net/netfilter/ipvs/ip_vs_core.c |    8 +-------
 net/netfilter/ipvs/ip_vs_ctl.c  |    8 +++++---
 net/netfilter/ipvs/ip_vs_est.c  |    2 +-
 4 files changed, 8 insertions(+), 12 deletions(-)

^ permalink raw reply

* [PATCH 3/3] ipvs: remove extra rcu lock
From: Simon Horman @ 2013-03-19 12:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman
In-Reply-To: <1363696850-14766-1-git-send-email-horms@verge.net.au>

From: Julian Anastasov <ja@ssi.bg>

In 3.7 we added code that uses ipv4_update_pmtu but after commit
c5ae7d4192 (ipv4: must use rcu protection while calling fib_lookup)
the RCU lock is not needed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_core.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 18b4bc5..61f49d2 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1394,10 +1394,8 @@ ip_vs_in_icmp(struct sk_buff *skb, int *related, unsigned int hooknum)
 			skb_reset_network_header(skb);
 			IP_VS_DBG(12, "ICMP for IPIP %pI4->%pI4: mtu=%u\n",
 				&ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr, mtu);
-			rcu_read_lock();
 			ipv4_update_pmtu(skb, dev_net(skb->dev),
 					 mtu, 0, 0, 0, 0);
-			rcu_read_unlock();
 			/* Client uses PMTUD? */
 			if (!(cih->frag_off & htons(IP_DF)))
 				goto ignore_ipip;
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/3] ipvs: add backup_only flag to avoid loops
From: Simon Horman @ 2013-03-19 12:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman
In-Reply-To: <1363696850-14766-1-git-send-email-horms@verge.net.au>

From: Julian Anastasov <ja@ssi.bg>

Dmitry Akindinov is reporting for a problem where SYNs are looping
between the master and backup server when the backup server is used as
real server in DR mode and has IPVS rules to function as director.

Even when the backup function is enabled we continue to forward
traffic and schedule new connections when the current master is using
the backup server as real server. While this is not a problem for NAT,
for DR and TUN method the backup server can not determine if a request
comes from client or from director.

To avoid such loops add new sysctl flag backup_only. It can be needed
for DR/TUN setups that do not need backup and director function at the
same time. When the backup function is enabled we stop any forwarding
and pass the traffic to the local stack (real server mode). The flag
disables the director function when the backup function is enabled.

For setups that enable backup function for some virtual services and
director function for other virtual services there should be another
more complex solution to support DR/TUN mode, may be to assign
per-virtual service syncid value, so that we can differentiate the
requests.

Reported-by: Dmitry Akindinov <dimak@stalker.com>
Tested-by: German Myzovsky <lawyer@sipnet.ru>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 Documentation/networking/ipvs-sysctl.txt |    7 +++++++
 include/net/ip_vs.h                      |   12 ++++++++++++
 net/netfilter/ipvs/ip_vs_core.c          |   12 ++++++++----
 net/netfilter/ipvs/ip_vs_ctl.c           |    7 +++++++
 4 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt
index f2a2488..9573d0c 100644
--- a/Documentation/networking/ipvs-sysctl.txt
+++ b/Documentation/networking/ipvs-sysctl.txt
@@ -15,6 +15,13 @@ amemthresh - INTEGER
         enabled and the variable is automatically set to 2, otherwise
         the strategy is disabled and the variable is  set  to 1.
 
+backup_only - BOOLEAN
+	0 - disabled (default)
+	not 0 - enabled
+
+	If set, disable the director function while the server is
+	in backup mode to avoid packet loops for DR/TUN methods.
+
 conntrack - BOOLEAN
 	0 - disabled (default)
 	not 0 - enabled
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 68c69d5..fce8e6b 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -976,6 +976,7 @@ struct netns_ipvs {
 	int			sysctl_sync_retries;
 	int			sysctl_nat_icmp_send;
 	int			sysctl_pmtu_disc;
+	int			sysctl_backup_only;
 
 	/* ip_vs_lblc */
 	int			sysctl_lblc_expiration;
@@ -1067,6 +1068,12 @@ static inline int sysctl_pmtu_disc(struct netns_ipvs *ipvs)
 	return ipvs->sysctl_pmtu_disc;
 }
 
+static inline int sysctl_backup_only(struct netns_ipvs *ipvs)
+{
+	return ipvs->sync_state & IP_VS_STATE_BACKUP &&
+	       ipvs->sysctl_backup_only;
+}
+
 #else
 
 static inline int sysctl_sync_threshold(struct netns_ipvs *ipvs)
@@ -1114,6 +1121,11 @@ static inline int sysctl_pmtu_disc(struct netns_ipvs *ipvs)
 	return 1;
 }
 
+static inline int sysctl_backup_only(struct netns_ipvs *ipvs)
+{
+	return 0;
+}
+
 #endif
 
 /*
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 47edf5a..18b4bc5 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1577,7 +1577,8 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	}
 	/* ipvs enabled in this netns ? */
 	net = skb_net(skb);
-	if (!net_ipvs(net)->enable)
+	ipvs = net_ipvs(net);
+	if (unlikely(sysctl_backup_only(ipvs) || !ipvs->enable))
 		return NF_ACCEPT;
 
 	ip_vs_fill_iph_skb(af, skb, &iph);
@@ -1654,7 +1655,6 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	}
 
 	IP_VS_DBG_PKT(11, af, pp, skb, 0, "Incoming packet");
-	ipvs = net_ipvs(net);
 	/* Check the server status */
 	if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
 		/* the destination server is not available */
@@ -1815,13 +1815,15 @@ ip_vs_forward_icmp(unsigned int hooknum, struct sk_buff *skb,
 {
 	int r;
 	struct net *net;
+	struct netns_ipvs *ipvs;
 
 	if (ip_hdr(skb)->protocol != IPPROTO_ICMP)
 		return NF_ACCEPT;
 
 	/* ipvs enabled in this netns ? */
 	net = skb_net(skb);
-	if (!net_ipvs(net)->enable)
+	ipvs = net_ipvs(net);
+	if (unlikely(sysctl_backup_only(ipvs) || !ipvs->enable))
 		return NF_ACCEPT;
 
 	return ip_vs_in_icmp(skb, &r, hooknum);
@@ -1835,6 +1837,7 @@ ip_vs_forward_icmp_v6(unsigned int hooknum, struct sk_buff *skb,
 {
 	int r;
 	struct net *net;
+	struct netns_ipvs *ipvs;
 	struct ip_vs_iphdr iphdr;
 
 	ip_vs_fill_iph_skb(AF_INET6, skb, &iphdr);
@@ -1843,7 +1846,8 @@ ip_vs_forward_icmp_v6(unsigned int hooknum, struct sk_buff *skb,
 
 	/* ipvs enabled in this netns ? */
 	net = skb_net(skb);
-	if (!net_ipvs(net)->enable)
+	ipvs = net_ipvs(net);
+	if (unlikely(sysctl_backup_only(ipvs) || !ipvs->enable))
 		return NF_ACCEPT;
 
 	return ip_vs_in_icmp_v6(skb, &r, hooknum, &iphdr);
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index c68198b..9e2d1cc 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1808,6 +1808,12 @@ static struct ctl_table vs_vars[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "backup_only",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 #ifdef CONFIG_IP_VS_DEBUG
 	{
 		.procname	= "debug_level",
@@ -3741,6 +3747,7 @@ static int __net_init ip_vs_control_net_init_sysctl(struct net *net)
 	tbl[idx++].data = &ipvs->sysctl_nat_icmp_send;
 	ipvs->sysctl_pmtu_disc = 1;
 	tbl[idx++].data = &ipvs->sysctl_pmtu_disc;
+	tbl[idx++].data = &ipvs->sysctl_backup_only;
 
 
 	ipvs->sysctl_hdr = register_net_sysctl(net, "net/ipv4/vs", tbl);
-- 
1.7.10.4


^ permalink raw reply related

* [PATCH 1/3] ipvs: fix sctp chunk length order
From: Simon Horman @ 2013-03-19 12:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman
In-Reply-To: <1363696850-14766-1-git-send-email-horms@verge.net.au>

From: Julian Anastasov <ja@ssi.bg>

Fix wrong but non-fatal access to chunk length.
sch->length should be in network order, next chunk should
be aligned to 4 bytes. Problem noticed in sparse output.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_proto_sctp.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index ae8ec6f..cd1d729 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -906,7 +906,7 @@ set_sctp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp,
 	sctp_chunkhdr_t _sctpch, *sch;
 	unsigned char chunk_type;
 	int event, next_state;
-	int ihl;
+	int ihl, cofs;
 
 #ifdef CONFIG_IP_VS_IPV6
 	ihl = cp->af == AF_INET ? ip_hdrlen(skb) : sizeof(struct ipv6hdr);
@@ -914,8 +914,8 @@ set_sctp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp,
 	ihl = ip_hdrlen(skb);
 #endif
 
-	sch = skb_header_pointer(skb, ihl + sizeof(sctp_sctphdr_t),
-				sizeof(_sctpch), &_sctpch);
+	cofs = ihl + sizeof(sctp_sctphdr_t);
+	sch = skb_header_pointer(skb, cofs, sizeof(_sctpch), &_sctpch);
 	if (sch == NULL)
 		return;
 
@@ -933,10 +933,12 @@ set_sctp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp,
 	 */
 	if ((sch->type == SCTP_CID_COOKIE_ECHO) ||
 	    (sch->type == SCTP_CID_COOKIE_ACK)) {
-		sch = skb_header_pointer(skb, (ihl + sizeof(sctp_sctphdr_t) +
-				sch->length), sizeof(_sctpch), &_sctpch);
-		if (sch) {
-			if (sch->type == SCTP_CID_ABORT)
+		int clen = ntohs(sch->length);
+
+		if (clen >= sizeof(sctp_chunkhdr_t)) {
+			sch = skb_header_pointer(skb, cofs + ALIGN(clen, 4),
+						 sizeof(_sctpch), &_sctpch);
+			if (sch && sch->type == SCTP_CID_ABORT)
 				chunk_type = sch->type;
 		}
 	}
-- 
1.7.10.4


^ permalink raw reply related

* [GIT PULL nf v2] IPVS fixes for v3.9
From: Simon Horman @ 2013-03-19 12:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov

Hi Pablo,

The following changes since commit a82783c91d5dce680dbd290ebf301a520b0e72a5:

  netfilter: ip6t_NPT: restrict to mangle table (2013-03-15 12:58:21 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git tags/ipvs-fixes-for-v3.9

for you to fetch changes up to bf93ad72cd8cfabe66a7b3d66236a1266d357189:

  ipvs: remove extra rcu lock (2013-03-19 21:21:52 +0900)

----------------------------------------------------------------
IPVS fixes for v3.9 from Julian Anastasov

----------------------------------------------------------------
Julian Anastasov (3):
      ipvs: fix sctp chunk length order
      ipvs: add backup_only flag to avoid loops
      ipvs: remove extra rcu lock

 Documentation/networking/ipvs-sysctl.txt |    7 +++++++
 include/net/ip_vs.h                      |   12 ++++++++++++
 net/netfilter/ipvs/ip_vs_core.c          |   14 ++++++++------
 net/netfilter/ipvs/ip_vs_ctl.c           |    7 +++++++
 net/netfilter/ipvs/ip_vs_proto_sctp.c    |   16 +++++++++-------
 5 files changed, 43 insertions(+), 13 deletions(-)

^ permalink raw reply

* [PATCH] use xfrm direction when lookup policy
From: Baker Zhang @ 2013-03-19 12:39 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel, Baker Zhang

because xfrm policy direction has same value with corresponding
flow direction, so this problem is covered.

In xfrm_lookup and __xfrm_policy_check, flow_cache_lookup is used to
accelerate the lookup.

Flow direction is given to flow_cache_lookup by policy_to_flow_dir.

When the flow cache is mismatched, callback 'resolver' is called.

'resolver' requires xfrm direction,
so convert direction back to xfrm direction.

Signed-off-by: Baker Zhang <baker.zhang@gmail.com>
---
 net/xfrm/xfrm_policy.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 167c67d..3b8d5ba 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1037,6 +1037,24 @@ __xfrm_policy_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir
 	return xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_MAIN, fl, family, dir);
 }
 
+static inline int flow_to_policy_dir(int dir)
+{
+	if (XFRM_POLICY_IN == FLOW_DIR_IN &&
+	    XFRM_POLICY_OUT == FLOW_DIR_OUT &&
+	    XFRM_POLICY_FWD == FLOW_DIR_FWD)
+		return dir;
+
+	switch (dir) {
+	default:
+	case FLOW_DIR_IN:
+		return XFRM_POLICY_IN;
+	case FLOW_DIR_OUT:
+		return XFRM_POLICY_OUT;
+	case FLOW_DIR_FWD:
+		return XFRM_POLICY_FWD;
+	}
+}
+
 static struct flow_cache_object *
 xfrm_policy_lookup(struct net *net, const struct flowi *fl, u16 family,
 		   u8 dir, struct flow_cache_object *old_obj, void *ctx)
@@ -1046,7 +1064,7 @@ xfrm_policy_lookup(struct net *net, const struct flowi *fl, u16 family,
 	if (old_obj)
 		xfrm_pol_put(container_of(old_obj, struct xfrm_policy, flo));
 
-	pol = __xfrm_policy_lookup(net, fl, family, dir);
+	pol = __xfrm_policy_lookup(net, fl, family, flow_to_policy_dir(dir));
 	if (IS_ERR_OR_NULL(pol))
 		return ERR_CAST(pol);
 
@@ -1932,7 +1950,8 @@ xfrm_bundle_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir,
 	 * previous cache entry */
 	if (xdst == NULL) {
 		num_pols = 1;
-		pols[0] = __xfrm_policy_lookup(net, fl, family, dir);
+		pols[0] = __xfrm_policy_lookup(net, fl, family,
+					       flow_to_policy_dir(dir));
 		err = xfrm_expand_policies(fl, family, pols,
 					   &num_pols, &num_xfrms);
 		if (err < 0)
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic
From: Eric Dumazet @ 2013-03-19 12:25 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Dmitry Kravkov, davem@davemloft.net, netdev@vger.kernel.org,
	Eilon Greenstein, Tom Herbert
In-Reply-To: <CANP3RGehbKbh1ambT-RQQqfOkovzySB4_CWg+LPdJV7yDkYBCg@mail.gmail.com>

Please Maciej do not top post on lkml or netdev mailing lists.

On Tue, 2013-03-19 at 02:18 -0700, Maciej Żenczykowski wrote:
> Can the HW calculate and return a 1s complement sum of the entire
> packet (or a large portion there-of)?
> Fixing that up to be only of the outer IPv4, inner IPv4 and inner TCP
> relevant portions should still be simpler (well faster) than
> calculating the TCP checksum.
> I'm pretty sure that some relationship between 1s complement sum of
> all bytes, outer IPv4 checksum, inner IPv4 checksum and TCP checksum
> could be pulled out of a hat with some deeper thought.  (similarly for
> IPv4/GRE/IPv6/TCP and other combinations)
> 
> What portions of the packet can the HW/FW [partially] checksum - and
> return the value to the driver for further processing?
> Can it return 1s complement sum of data portion of outer IPv4 (ie. in
> IPv4/GRE/IPv4/TCP return a 1s complement sum of GRE/IPv4/TCP bytes)
> 

I assume Dmitry was speaking of this possibility, and our stack should
handle this just fine.

NIC providing these kind of checksums set :

skb->ip_summed = CHECKSUM_COMPLETE;
skb->csum = csum;

before feeding the packet to the stack.

When we pull some header, we have to call skb_postpull_rcsum()
to adjust the skb->csum so that final check can be done.

About 20 drivers currently provide these kind of checksumming.

^ permalink raw reply

* RE: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic
From: Dmitry Kravkov @ 2013-03-19 12:21 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Eric Dumazet, davem@davemloft.net, netdev@vger.kernel.org,
	Eilon Greenstein, Tom Herbert
In-Reply-To: <CANP3RGehbKbh1ambT-RQQqfOkovzySB4_CWg+LPdJV7yDkYBCg@mail.gmail.com>


> -----Original Message-----
> From: Maciej Żenczykowski [mailto:maze@google.com]
> Sent: Tuesday, March 19, 2013 11:18 AM
> To: Dmitry Kravkov
> Cc: Eric Dumazet; davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert
> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic
> 
> Can the HW calculate and return a 1s complement sum of the entire
> packet (or a large portion there-of)?
No, it can't :(

> Fixing that up to be only of the outer IPv4, inner IPv4 and inner TCP
> relevant portions should still be simpler (well faster) than
> calculating the TCP checksum.
> I'm pretty sure that some relationship between 1s complement sum of
> all bytes, outer IPv4 checksum, inner IPv4 checksum and TCP checksum
> could be pulled out of a hat with some deeper thought.  (similarly for
> IPv4/GRE/IPv6/TCP and other combinations)
> 
> What portions of the packet can the HW/FW [partially] checksum - and
> return the value to the driver for further processing?
> Can it return 1s complement sum of data portion of outer IPv4 (ie. in
> IPv4/GRE/IPv4/TCP return a 1s complement sum of GRE/IPv4/TCP bytes)
> 
> Maciej Żenczykowski, Kernel Networking Developer @ Google
> 
> On Mon, Mar 18, 2013 at 11:30 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
> >> -----Original Message-----
> >> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet
> >> Sent: Tuesday, March 19, 2013 2:08 AM
> >> To: Dmitry Kravkov
> >> Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski
> >> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic
> >>
> >> On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote:
> >> > The patch drives FW to perform RSS for GRE traffic,
> >> > based on inner headers.
> >> >
> >> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> >> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> >> > ---
> >> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 +++
> >> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c  |   23 ++++++++++++-----------
> >> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h  |    9 +++++++++
> >> >  3 files changed, 24 insertions(+), 11 deletions(-)
> >>
> >> This works very well.
> >>
> >> Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()
> >>
> >> (this was a patch from Tom Herbert, commit
> >> 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping
> >> when rx'ing over tunnel )
> >>
> >> Meaning we hit a single cpu for the GRO stuff in ip_gre.
> >>
> >> I have to think about it.
> >>
> >>
> >> Another question is :
> >>
> >> Can bnx2x check the tcp checksum if GRE encapsulated ?
> >>
> > Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM
> validation for huge packets. Is it worth?
> >


^ permalink raw reply

* Re: [GIT PULL nf] IPVS fixes for v3.9
From: Simon Horman @ 2013-03-19 12:15 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov
In-Reply-To: <20130319094555.GB3602@localhost>

On Tue, Mar 19, 2013 at 10:45:55AM +0100, Pablo Neira Ayuso wrote:
> Hi Simon,
> 
> I proposed this:
> 
> > I think that these three fixes:
> >
> > ipvs: add backup_only flag to avoid loops
> > ipvs: remove extra rcu lock
> > ipvs: fix sctp chunk length order
> >
> > should find their path to the net tree.
> >
> > The remaining two sparse fixes should go to net-next.
> 
> But you sent me these for net:
> 
> ipvs: fix sctp chunk length order
> ipvs: fix hashing in ip_vs_svc_hashkey
> ipvs: fix some sparse warnings
> 
> that doesn't match.
> 
> Can you resolve this? Thanks.

Oops, will do.

^ permalink raw reply

* Re: [PATCH net-next 2/2] net: reset transport header if it was not set before transmission
From: Eric Dumazet @ 2013-03-19 12:13 UTC (permalink / raw)
  To: Jason Wang, Daniel Borkmann
  Cc: David Miller, netdev, linux-kernel, mst, edumazet
In-Reply-To: <51482F55.8020900@redhat.com>

On Tue, 2013-03-19 at 17:26 +0800, Jason Wang wrote:
> On 03/18/2013 12:13 AM, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Fri, 15 Mar 2013 19:10:51 -0700
> >
> >> Any way we can avoid adding this to fast path, for people not using
> >> macvtap and ixgbe ?
> > Likewise I'd rather see macvtap be responsible for fixing this up by
> > setting the transport header properly, and therfore sending well
> > formed packets to the rest of the stack.
> 
> Ok, haven't checked all other possibility but looks like packet needs to
> be fixed also.

Daniel, could you post your patches if ready ?

Jason, I believe you could reuse existing flow dissector once Daniel
patches are in.

^ permalink raw reply

* Re: [PATCH net-next 1/2] net_sched: don't do precise pkt_len computation for untrusted packets
From: Eric Dumazet @ 2013-03-19 12:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: David Miller, netdev, linux-kernel, mst, edumazet,
	Daniel Borkmann
In-Reply-To: <51482F14.9030407@redhat.com>

On Tue, 2013-03-19 at 17:25 +0800, Jason Wang wrote:

> I believe before doing header check for untrusted packets, the only
> thing we can trust is skb->len and that's we've used before
> 1def9238d4aa2. But after that, we're trying to use unchecked or
> meaningless value (e.g gso_segs were reset to zero in
> tun/macvtap/packet), and guest then can utilize this to result a very
> huge (-1U) pkt_len by filling evil value in the header. Can all kinds of
> packet scheduler survive this kinds of possible DOS?

I would use the flow dissector to fix the transport header from all
DODGY providers.

Daniel Borkmann is working on a patch serie adding nhoff into flow_keys,
and adding __skb_get_poff(const struct sk_buff *skb), for a BPF
extension we talked about in Copenhagen / Netfilter Workshop.

You could then set the transport header offset to the right value.

(and drop evil packets before they go further in the stack)

if (gso_packet(skb)) {
	u32 poff = __skb_get_poff(skb);

	if (!poff) {
		drop_evil_packet(skb);
	} else {
		skb_set_transport_header(skb, poff);
		...
	}
}

^ permalink raw reply

* Re: [PATCH v2 net-next 1/5] GRE: Refactor GRE tunneling code.
From: Bjørn Mork @ 2013-03-19 10:31 UTC (permalink / raw)
  To: David Miller; +Cc: stephen, pshelar, netdev, jesse
In-Reply-To: <20130318.153319.206719819832403477.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:
> From: Stephen Hemminger <stephen@networkplumber.org>
>> On Mon, 18 Mar 2013 11:13:25 -0700
>> Pravin B Shelar <pshelar@nicira.com> wrote:
>> 
>>> diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
>>> index 7944df7..2073226 100644
>>> --- a/net/ipv4/Kconfig
>>> +++ b/net/ipv4/Kconfig
>>> @@ -186,9 +186,14 @@ config NET_IPGRE_DEMUX
>>>  	 This is helper module to demultiplex GRE packets on GRE version field criteria.
>>>  	 Required by ip_gre and pptp modules.
>>>  
>>> +config NET_IP_TUNNEL
>>> +	tristate
>>> +	default n
>>> +
>> 
>> Won't this break existing kernel config's, shouldn't this default y?
>
> Or "m".  But indeed, it should be made to automatically work for
> existing configs.

Won't the "select NET_IP_TUNNEL" statements do just that?

Changing the default to m will enable this module even if all of
NET_IPGRE, VXLAN and NET_IPIP are disabled. That's pointless, isn't it?



Bjørn

^ permalink raw reply

* Re: [PATCH 2/2] man/send(2): document a long standing bug that can cause spurious EPERM errors
From: Pablo Neira Ayuso @ 2013-03-19 10:12 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Michael Kerrisk, linux-man-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA, Patrick McHardy,
	Hirotaka Sasaki
In-Reply-To: <1363675960.4767.10.camel@nexus>

On Tue, Mar 19, 2013 at 03:52:40PM +0900, Fernando Luis Vázquez Cao wrote:
> Subject: [PATCH 2/2] man/send(2): document a long standing bug that can cause spurious EPERM errors
> 
> This bug has been known since early 2009 (the latest) and  discussed in
> netdev before:
> 
> http://marc.info/?l=linux-netdev&w=2&r=1&s=Possible+race+condition+in+conntracking+&q=b
> 
> It seems that a proper fix would be non trivial, so document the bug
> in the meantime.
>
> Reported-by: Hirotaka Sasaki <sasaki.hirotaka-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando-gVGce1chcLdL9jVzuh4AOg@public.gmane.org>
> ---
> 
> diff -urNp man-pages-3.50-orig/man2/send.2 man-pages-3.50/man2/send.2
> --- man-pages-3.50-orig/man2/send.2	2013-03-19 15:18:03.784306647 +0900
> +++ man-pages-3.50/man2/send.2	2013-03-19 15:30:40.788060426 +0900
> @@ -420,6 +420,11 @@ Linux may return
>  .B EPIPE
>  instead of
>  .BR ENOTCONN .
> +
> +Linux may return spurious
> +.B EPERM
> +errors when netfilter's conntrack module is loaded and two or more
> +UDP packets belonging to the same connection are processed in parallel.

The Connection tracking system may drop packets for different reasons
under rare circunstances, not only in this case.

I'd prefer if you only apply patch 1/2.
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox