Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] can: mcp251x: avoid repeated frame bug
From: Marc Kleine-Budde @ 2012-09-04 20:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-can, Benoît Locher, stable, Marc Kleine-Budde
In-Reply-To: <1346792142-17609-1-git-send-email-mkl@pengutronix.de>

From: Benoît Locher <Benoit.Locher@skf.com>

The MCP2515 has a silicon bug causing repeated frame transmission, see section
5 of MCP2515 Rev. B Silicon Errata Revision G (March 2007).

Basically, setting TXBnCTRL.TXREQ in either SPI mode (00 or 11) will eventually
cause the bug. The workaround proposed by Microchip is to use mode 00 and send
a RTS command on the SPI bus to initiate the transmission.

Cc: <stable@vger.kernel.org>
Signed-off-by: Benoît Locher <Benoit.Locher@skf.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/mcp251x.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index a580db2..26e7129 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -83,6 +83,11 @@
 #define INSTRUCTION_LOAD_TXB(n)	(0x40 + 2 * (n))
 #define INSTRUCTION_READ_RXB(n)	(((n) == 0) ? 0x90 : 0x94)
 #define INSTRUCTION_RESET	0xC0
+#define RTS_TXB0		0x01
+#define RTS_TXB1		0x02
+#define RTS_TXB2		0x04
+#define INSTRUCTION_RTS(n)	(0x80 | ((n) & 0x07))
+
 
 /* MPC251x registers */
 #define CANSTAT	      0x0e
@@ -397,6 +402,7 @@ static void mcp251x_hw_tx_frame(struct spi_device *spi, u8 *buf,
 static void mcp251x_hw_tx(struct spi_device *spi, struct can_frame *frame,
 			  int tx_buf_idx)
 {
+	struct mcp251x_priv *priv = dev_get_drvdata(&spi->dev);
 	u32 sid, eid, exide, rtr;
 	u8 buf[SPI_TRANSFER_BUF_LEN];
 
@@ -418,7 +424,10 @@ static void mcp251x_hw_tx(struct spi_device *spi, struct can_frame *frame,
 	buf[TXBDLC_OFF] = (rtr << DLC_RTR_SHIFT) | frame->can_dlc;
 	memcpy(buf + TXBDAT_OFF, frame->data, frame->can_dlc);
 	mcp251x_hw_tx_frame(spi, buf, frame->can_dlc, tx_buf_idx);
-	mcp251x_write_reg(spi, TXBCTRL(tx_buf_idx), TXBCTRL_TXREQ);
+
+	/* use INSTRUCTION_RTS, to avoid "repeated frame problem" */
+	priv->spi_tx_buf[0] = INSTRUCTION_RTS(1 << tx_buf_idx);
+	mcp251x_spi_trans(priv->spi, 1);
 }
 
 static void mcp251x_hw_rx_frame(struct spi_device *spi, u8 *buf,
-- 
1.7.10


^ permalink raw reply related

* pull-request: can 2012-09-04
From: Marc Kleine-Budde @ 2012-09-04 20:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-can

Hello David,

this patch is for the v3.6 release cycle. Benoît Locher fixed a repeated frame
bug in the mcp251x driver. He implemented the workaround suggested by the
errata sheet.

regards, Marc

--

The following changes since commit 5002200599429e83fc13e0d9a2d4788b79515b0c:

  net: qmi_wwan: add several new Gobi devices (2012-09-01 22:49:34 -0400)

are available in the git repository at:

  git://gitorious.org/linux-can/linux-can.git fixes-for-3.6

for you to fetch changes up to cab32f39dcc5b35db96497dc0a026b5dea76e4e7:

  can: mcp251x: avoid repeated frame bug (2012-09-03 20:12:06 +0200)

----------------------------------------------------------------
Benoît Locher (1):
      can: mcp251x: avoid repeated frame bug

 drivers/net/can/mcp251x.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)



^ permalink raw reply

* Re: sctp_close/sk_free: kernel BUG at arch/x86/mm/physaddr.c:18!
From: Marc Kleine-Budde @ 2012-09-04 20:42 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Fengguang Wu, networking, linux-can
In-Reply-To: <87mx15zfze.fsf@xmission.com>

[-- Attachment #1: Type: text/plain, Size: 4675 bytes --]

On 09/04/2012 10:32 PM, Eric W. Biederman wrote:
>>> FYI, another kconfig triggering a slightly different oops on tree
>>>
>>>         git://gitorious.org/linux-can/linux-can-next led-trigger
>>
>> This in turn means the problem doesn't come from the CAN patches, as
>> both trees have different CAN patches. I'm adding Eric W. Biederman on
>> Cc as he contributed some sctp patches between v3.6 and net-next/master.
> 
> Anything is possible, but this seems unlikely as I don't think I touched
> anything close to that part of the code.
> 
> This most definitely looks like a memory stomp somewhere.
> 
> sk->inet_sk->inet_opt has a bad value.
> 
> I am puzzled though what are we doing with both ipv4 and ipv6 release
> state doing on the same socket path?    Is this some crazy ipv6 socket
> doing sctp with only ipv4 addresses?

It's Wu's testcase, can you show us the code?
Eric, in case you haven't seen, this is another oops, from a slightly
different tree (a handfull of different CAN patches).

> [  233.046014] kfree_debugcheck: out of range ptr ea6000000bb8h.
> [  233.047399] ------------[ cut here ]------------
> [  233.048393] kernel BUG at /c/kernel-tests/src/stable/mm/slab.c:3074!
> [  233.048393] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> [  233.048393] Modules linked in:
> [  233.048393] CPU 0 
> [  233.048393] Pid: 3929, comm: trinity-watchdo Not tainted 3.6.0-rc3+ #4192 Bochs Bochs
> [  233.048393] RIP: 0010:[<ffffffff81169653>]  [<ffffffff81169653>] kfree_debugcheck+0x27/0x2d
> [  233.048393] RSP: 0018:ffff88000facbca8  EFLAGS: 00010092
> [  233.048393] RAX: 0000000000000031 RBX: 0000ea6000000bb8 RCX: 00000000a189a188
> [  233.048393] RDX: 000000000000a189 RSI: ffffffff8108ad32 RDI: ffffffff810d30f9
> [  233.048393] RBP: ffff88000facbcb8 R08: 0000000000000002 R09: ffffffff843846f0
> [  233.048393] R10: ffffffff810ae37c R11: 0000000000000908 R12: 0000000000000202
> [  233.048393] R13: ffffffff823dbd5a R14: ffff88000ec5bea8 R15: ffffffff8363c780
> [  233.048393] FS:  00007faa6899c700(0000) GS:ffff88001f200000(0000) knlGS:0000000000000000
> [  233.048393] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  233.048393] CR2: 00007faa6841019c CR3: 0000000012c82000 CR4: 00000000000006f0
> [  233.048393] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  233.048393] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  233.048393] Process trinity-watchdo (pid: 3929, threadinfo ffff88000faca000, task ffff88000faec600)
> [  233.048393] Stack:
> [  233.048393]  0000000000000000 0000ea6000000bb8 ffff88000facbce8 ffffffff8116ad81
> [  233.048393]  ffff88000ff588a0 ffff88000ff58850 ffff88000ff588a0 0000000000000000
> [  233.048393]  ffff88000facbd08 ffffffff823dbd5a ffffffff823dbcb0 ffff88000ff58850
> [  233.048393] Call Trace:
> [  233.048393]  [<ffffffff8116ad81>] kfree+0x5f/0xca
> [  233.048393]  [<ffffffff823dbd5a>] inet_sock_destruct+0xaa/0x13c
> [  233.048393]  [<ffffffff823dbcb0>] ? inet_sk_rebuild_header+0x319/0x319
> [  233.048393]  [<ffffffff8231c307>] __sk_free+0x21/0x14b
> [  233.048393]  [<ffffffff8231c4bd>] sk_free+0x26/0x2a
> [  233.048393]  [<ffffffff825372db>] sctp_close+0x215/0x224
> [  233.048393]  [<ffffffff810d6835>] ? lock_release+0x16f/0x1b9
> [  233.048393]  [<ffffffff823daf12>] inet_release+0x7e/0x85
> [  233.048393]  [<ffffffff82317d15>] sock_release+0x1f/0x77
> [  233.048393]  [<ffffffff82317d94>] sock_close+0x27/0x2b
> [  233.048393]  [<ffffffff81173bbe>] __fput+0x101/0x20a
> [  233.048393]  [<ffffffff81173cd5>] ____fput+0xe/0x10
> [  233.048393]  [<ffffffff810a3794>] task_work_run+0x5d/0x75
> [  233.048393]  [<ffffffff8108da70>] do_exit+0x290/0x7f5
> [  233.048393]  [<ffffffff82707415>] ? retint_swapgs+0x13/0x1b
> [  233.048393]  [<ffffffff8108e23f>] do_group_exit+0x7b/0xba
> [  233.048393]  [<ffffffff8108e295>] sys_exit_group+0x17/0x17
> [  233.048393]  [<ffffffff8270de10>] tracesys+0xdd/0xe2
> [  233.048393] Code: 59 01 5d c3 55 48 89 e5 53 41 50 0f 1f 44 00 00 48 89 fb e8 d4 b0 f0 ff 84 c0 75 11 48 89 de 48 c7 c7 fc fa f7 82 e8 0d 0f 57 01 <0f> 0b 5f 5b 5d c3 55 48 89 e5 0f 1f 44 00 00 48 63 87 d8 00 00 
> [  233.048393] RIP  [<ffffffff81169653>] kfree_debugcheck+0x27/0x2d
> [  233.048393]  RSP <ffff88000facbca8>

Wu is running a bisect, let's hope that gives us a result.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply

* Re: sctp_close/sk_free: kernel BUG at arch/x86/mm/physaddr.c:18!
From: Eric W. Biederman @ 2012-09-04 20:32 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Fengguang Wu, networking, linux-can
In-Reply-To: <5046361C.5070602@pengutronix.de>

Marc Kleine-Budde <mkl@pengutronix.de> writes:

> On 09/04/2012 04:04 PM, Fengguang Wu wrote:
>> FYI, another kconfig triggering a slightly different oops on tree
>> 
>>         git://gitorious.org/linux-can/linux-can-next led-trigger
>
> This in turn means the problem doesn't come from the CAN patches, as
> both trees have different CAN patches. I'm adding Eric W. Biederman on
> Cc as he contributed some sctp patches between v3.6 and net-next/master.

Anything is possible, but this seems unlikely as I don't think I touched
anything close to that part of the code.

This most definitely looks like a memory stomp somewhere.

sk->inet_sk->inet_opt has a bad value.

I am puzzled though what are we doing with both ipv4 and ipv6 release
state doing on the same socket path?    Is this some crazy ipv6 socket
doing sctp with only ipv4 addresses?

Eric

> Marc
>> [   96.267311] ------------[ cut here ]------------
>> [   96.268294] kernel BUG at /c/kernel-tests/src/stable/arch/x86/mm/physaddr.c:18!
>> [   96.269988] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
>> [   96.270636] Modules linked in:
>> [   96.270636] CPU 0 
>> [   96.270636] Pid: 2116, comm: trinity Not tainted 3.6.0-rc3+ #2679 Bochs Bochs
>> [   96.270636] RIP: 0010:[<ffffffff8102b22b>]  [<ffffffff8102b22b>] __phys_addr+0x46/0x6b
>> [   96.270636] RSP: 0018:ffff880019585c98  EFLAGS: 00010213
>> [   96.270636] RAX: ffff87ffffffffff RBX: 0000ea6000000bb8 RCX: 0000000000000000
>> [   96.270636] RDX: 0000000000000000 RSI: 0000000000000296 RDI: 0000ea6000000bb8
>> [   96.270636] RBP: ffff880019585c98 R08: 0000000000000058 R09: 0000000000000008
>> [   96.270636] R10: 000000000000000a R11: 0000000000000058 R12: ffff8800195f7718
>> [   96.270636] R13: ffffffff816521cf R14: ffffea0000000000 R15: 0000000000000000
>> [   96.270636] FS:  00007fa19b534700(0000) GS:ffff88001f200000(0000) knlGS:0000000000000000
>> [   96.270636] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   96.270636] CR2: 00007fa19b03eba0 CR3: 000000001957b000 CR4: 00000000000006f0
>> [   96.270636] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [   96.270636] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [   96.270636] Process trinity (pid: 2116, threadinfo ffff880019584000, task ffff88001af2c680)
>> [   96.270636] Stack:
>> [   96.270636]  ffff880019585cd8 ffffffff811091d7 0000000000000000 ffff88001b1ef200
>> [   96.270636]  ffff88001b1ef4d0 0000000000000000 ffff88001b1617b0 0000000000000000
>> [   96.270636]  ffff880019585cf8 ffffffff816521cf ffff88001b1ef200 ffff88001b1ef248
>> [   96.270636] Call Trace:
>> [   96.270636]  [<ffffffff811091d7>] kfree+0x63/0x162
>> [   96.270636]  [<ffffffff816521cf>] inet_sock_destruct+0x112/0x1ca
>> [   96.270636]  [<ffffffff815f6fa4>] __sk_free+0x1d/0x114
>> [   96.270636]  [<ffffffff815f710b>] sk_free+0x1c/0x1e
>> [   96.270636]  [<ffffffff816d59d5>] sctp_close+0x21a/0x229
>> [   96.270636]  [<ffffffff810810f6>] ? lock_release_holdtime.part.6+0xb2/0xb7
>> [   96.270636]  [<ffffffff81651b3e>] ? inet_release+0x65/0xc3
>> [   96.270636]  [<ffffffff81651b93>] inet_release+0xba/0xc3
>> [   96.270636]  [<ffffffff81651af9>] ? inet_release+0x20/0xc3
>> [   96.270636]  [<ffffffff81674134>] inet6_release+0x30/0x3c
>> [   96.270636]  [<ffffffff815f2317>] sock_release+0x1f/0x77
>> [   96.270636]  [<ffffffff815f2396>] sock_close+0x27/0x2b
>> [   96.270636]  [<ffffffff8110ec22>] __fput+0xf0/0x24b
>> [   96.270636]  [<ffffffff8110ed8b>] ____fput+0xe/0x10
>> [   96.270636]  [<ffffffff8104f370>] task_work_run+0x5d/0x75
>> [   96.270636]  [<ffffffff81038a66>] do_exit+0x26b/0x7d7
>> [   96.270636]  [<ffffffff81725a95>] ? retint_swapgs+0x13/0x1b
>> [   96.270636]  [<ffffffff8103925b>] do_group_exit+0x7b/0xba
>> [   96.270636]  [<ffffffff810392b1>] sys_exit_group+0x17/0x17
>> [   96.270636]  [<ffffffff8172c78e>] tracesys+0xd0/0xd5
>> [   96.270636] Code: 00 80 48 01 c7 48 81 ff ff ff ff 1f 76 02 0f 0b 48 89 f8 48 03 05 f6 bd ae 00 eb 32 48 b8 ff ff ff ff ff 87 ff ff 48 39 c7 77 02 <0f> 0b 0f b6 0d 55 57 ba 00 48 b8 00 00 00 00 00 78 00 00 48 01 
>> [   96.270636] RIP  [<ffffffff8102b22b>] __phys_addr+0x46/0x6b
>> [   96.270636]  RSP <ffff880019585c98>


^ permalink raw reply

* Re: [PATCH 2/2] RDMA/cxgb4: Update RDMA/cxgb4 due to macro definition removal in cxgb4 driver
From: David Miller @ 2012-09-04 19:59 UTC (permalink / raw)
  To: vipul-ut6Up61K2wZBDgjK7y7TUQ
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	roland-BHEL68pLQRGGvPXPguhicg, divy-ut6Up61K2wZBDgjK7y7TUQ,
	dm-ut6Up61K2wZBDgjK7y7TUQ, kumaras-ut6Up61K2wZBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	santosh-ut6Up61K2wZBDgjK7y7TUQ, sivasu-ut6Up61K2wZBDgjK7y7TUQ
In-Reply-To: <1346405072-24561-3-git-send-email-vipul-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

From: Vipul Pandya <vipul-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Date: Fri, 31 Aug 2012 14:54:32 +0530

> cxgb4 driver removed the duplicate definitions of registers which requires
> update in RDMA/cxgb4 driver.
> 
> Signed-off-by: Santosh Rastapur <santosh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> Signed-off-by: Vipul Pandya <vipul-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> Reviewed-by: Sivakumar Subramani <sivasu-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

If you do this in a seperate change, the build is broken between the
two changes.

Never do this, the tree must be fully bisectable and build and work
at each and every step along the way.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC PATCH] ipv6: fix handling of blackhole and prohibit routes
From: David Miller @ 2012-09-04 19:58 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <503F78C8.3070807@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 30 Aug 2012 16:29:28 +0200

> Comments are welcome.

I don't see why we have to create new flags for this.

Handle it like ipv4, where the RTN_* type dictates whether the
route is blackhole, prohibit, or other type of route.

^ permalink raw reply

* Re: [Bug 47021] New: kernel panic with l2tpv3 & mtu > 1500
From: David Miller @ 2012-09-04 19:55 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, netdev, a1bert
In-Reply-To: <1346788334.13121.82.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 04 Sep 2012 21:52:14 +0200

> [PATCH] l2tp: fix a typo in l2tp_eth_dev_recv()
> 
> While investigating l2tp bug, I hit a bug in eth_type_trans(),
> because not enough bytes were pulled in skb head.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

LoL, applied and queued up for -stable.

^ permalink raw reply

* Re: [Bug 47021] New: kernel panic with l2tpv3 & mtu > 1500
From: Stephen Hemminger @ 2012-09-04 19:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, a1bert
In-Reply-To: <1346788334.13121.82.camel@edumazet-glaptop>

On Tue, 04 Sep 2012 21:52:14 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> On Tue, 2012-09-04 at 09:07 -0700, Stephen Hemminger wrote:
> > 
> > Begin forwarded message:
> > 
> > Date: Tue,  4 Sep 2012 16:06:06 +0000 (UTC)
> > From: bugzilla-daemon@bugzilla.kernel.org
> > To: shemminger@linux-foundation.org
> > Subject: [Bug 47021] New: kernel panic with l2tpv3 & mtu > 1500
> > 
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=47021
> > 
> >            Summary: kernel panic with l2tpv3 & mtu > 1500
> >            Product: Networking
> >            Version: 2.5
> >     Kernel Version: 3.2.28
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >         AssignedTo: shemminger@linux-foundation.org
> >         ReportedBy: a1bert@atlas.cz
> >         Regression: No
> > 
> > 
> > first l2tpv3 packet that enters physical device with MTU > 1500 causes kernel
> > panic
> > 
> > tunel created using:
> > 
> > l2tpv3tun add tunnel tunnel_id 1 peer_tunnel_id 1 encap udp udp_sport 5001
> > udp_dport 5001 local 192.168.1.1 remote 192.168.1.2
> > 
> > l2tpv3tun add session tunnel_id 1 session_id 1 peer_session_id 1 dev l2tpeth1
> > 
> > 
> > reproducible: always
> > 
> 
> Seems following patch is needed, not sure if it helps
> 
> [PATCH] l2tp: fix a typo in l2tp_eth_dev_recv()
> 
> While investigating l2tp bug, I hit a bug in eth_type_trans(),
> because not enough bytes were pulled in skb head.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/l2tp/l2tp_eth.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
> index f9ee74d..3bfb34a 100644
> --- a/net/l2tp/l2tp_eth.c
> +++ b/net/l2tp/l2tp_eth.c
> @@ -153,7 +153,7 @@ static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb,
>  		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, skb->data, length);
>  	}
>  
> -	if (!pskb_may_pull(skb, sizeof(ETH_HLEN)))
> +	if (!pskb_may_pull(skb, ETH_HLEN))
>  		goto error;

I guess nobody ever looked inside this code. That seems like an obvious bug.

^ permalink raw reply

* Re: [PATCH] net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries
From: David Miller @ 2012-09-04 19:52 UTC (permalink / raw)
  To: yamato; +Cc: netdev, linux-kernel
In-Reply-To: <1346273069-5390-1-git-send-email-yamato@redhat.com>

From: Masatake YAMATO <yamato@redhat.com>
Date: Thu, 30 Aug 2012 05:44:29 +0900

> lsof reports some of socket descriptors as "can't identify protocol" like:
> 
>     [yamato@localhost]/tmp% sudo lsof | grep dbus | grep iden
>     dbus-daem   652          dbus    6u     sock ... 17812 can't identify protocol
>     dbus-daem   652          dbus   34u     sock ... 24689 can't identify protocol
>     dbus-daem   652          dbus   42u     sock ... 24739 can't identify protocol
>     dbus-daem   652          dbus   48u     sock ... 22329 can't identify protocol
>     ...
> 
> lsof cannot resolve the protocol used in a socket because procfs
> doesn't provide the map between inode number on sockfs and protocol
> type of the socket.
> 
> For improving the situation this patch adds an extended attribute named
> 'system.sockprotoname' in which the protocol name for
> /proc/PID/fd/SOCKET is stored. So lsof can know the protocol for a
> given /proc/PID/fd/SOCKET with getxattr system call.
> 
> A few weeks ago I submitted a patch for the same purpose. The patch
> was introduced /proc/net/sockfs which enumerates inodes and protocols
> of all sockets alive on a system. However, it was rejected because (1)
> a global lock was needed, and (2) the layout of struct socket was
> changed with the patch.
> 
> This patch doesn't use any global lock; and doesn't change the layout
> of any structs.
> 
> In this patch, a protocol name is stored to dentry->d_name of sockfs
> when new socket is associated with a file descriptor. Before this
> patch dentry->d_name was not used; it was just filled with empty
> string. lsof may use an extended attribute named
> 'system.sockprotoname' to retrieve the value of dentry->d_name.
> 
> It is nice if we can see the protocol name with ls -l
> /proc/PID/fd. However, "socket:[#INODE]", the name format returned
> from sockfs_dname() was already defined. To keep the compatibility
> between kernel and user land, the extended attribute is used to
> prepare the value of dentry->d_name.
> 
> Signed-off-by: Masatake YAMATO <yamato@redhat.com>

This looks a lot more reasonable than your previous attempt.

Applied to net-next, thanks a lot.

^ permalink raw reply

* Re: Fw: [Bug 47021] New: kernel panic with l2tpv3 & mtu > 1500
From: Eric Dumazet @ 2012-09-04 19:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, a1bert
In-Reply-To: <20120904090756.71b34feb@nehalam.linuxnetplumber.net>

From: Eric Dumazet <edumazet@google.com>

On Tue, 2012-09-04 at 09:07 -0700, Stephen Hemminger wrote:
> 
> Begin forwarded message:
> 
> Date: Tue,  4 Sep 2012 16:06:06 +0000 (UTC)
> From: bugzilla-daemon@bugzilla.kernel.org
> To: shemminger@linux-foundation.org
> Subject: [Bug 47021] New: kernel panic with l2tpv3 & mtu > 1500
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=47021
> 
>            Summary: kernel panic with l2tpv3 & mtu > 1500
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 3.2.28
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: a1bert@atlas.cz
>         Regression: No
> 
> 
> first l2tpv3 packet that enters physical device with MTU > 1500 causes kernel
> panic
> 
> tunel created using:
> 
> l2tpv3tun add tunnel tunnel_id 1 peer_tunnel_id 1 encap udp udp_sport 5001
> udp_dport 5001 local 192.168.1.1 remote 192.168.1.2
> 
> l2tpv3tun add session tunnel_id 1 session_id 1 peer_session_id 1 dev l2tpeth1
> 
> 
> reproducible: always
> 

Seems following patch is needed, not sure if it helps

[PATCH] l2tp: fix a typo in l2tp_eth_dev_recv()

While investigating l2tp bug, I hit a bug in eth_type_trans(),
because not enough bytes were pulled in skb head.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/l2tp/l2tp_eth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index f9ee74d..3bfb34a 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -153,7 +153,7 @@ static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb,
 		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, skb->data, length);
 	}
 
-	if (!pskb_may_pull(skb, sizeof(ETH_HLEN)))
+	if (!pskb_may_pull(skb, ETH_HLEN))
 		goto error;
 
 	secpath_reset(skb);

^ permalink raw reply related

* Re: [PATCH V2 09/12] net/eipoib: Add main driver functionality
From: Or Gerlitz @ 2012-09-04 19:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael S. Tsirkin, Or Gerlitz, davem, roland, netdev, sean.hefty,
	Erez Shitrit, Ali Ayoub, Doug Ledford
In-Reply-To: <87a9x537r0.fsf@xmission.com>

On Tue, Sep 4, 2012 at 10:31 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Or Gerlitz <or.gerlitz@gmail.com> writes:
>> On Tue, Sep 4, 2012 at 12:22 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> Documentation we will fix,
>>> And just to stress the point, document the limitations as well.
>> sure, not that I see concrete limitations for the **user** at this point, but
>> if there are such, will put them clearly written.

> All ethernet protocols not working except IPv4 is a huge concrete limitation.

Oh, sure, that WILL be documented, currently eIPoIB can deliver only what
IPoIB can which is IPv4, ARP/RARP, IPv6+ND, for IPv6 see next

> So far you are still playing with a design that is strongly NOT
> ethernet.  So calling it eIPoIB will continue to be a LIE.

> You are still playing with an implementation that doesn't even dream
> of supporting IPv6 which makes it so far from ethernet I can't imagine
> anyone taking your code seriously.

This design can and will support IPv6, the IPv6 ND handling will follow the path
we are talking now for the IPv4 ARPs, e.g not within the driver, etc.
Could you be
more specific?

> Any implementation that breaks a naive ARP implementation also breaks
> IPv6.  Not to mention everything else that runs over ethernet.

not sure to follow on the naive impl. comment,  eIPoIB solution will
include kernel driver along with supporting user-space portion, this
is to follow a comment made by the community on mangling ARPs in
network driver.

> If you are clever you can use the current IPoIB hardware accelleration
> but you need to do something different so that you can either encode
> or imply the MAC address so you won't have to munge ethernet protocols.

also here not sure to follow, we have a new design under which the
original VM MAC is preserved on the RX side, we will not generate MAC
for Ethernet frames sent by VMs which we reconstruct on the RX side
any more.

> Just for fun you might want to consider what it takes to support 2 VMs
> in the same VLAN that share the same IP address (but different MAC
> addresses) for failover purposes.

will look on that

Or.

^ permalink raw reply

* Re: [PATCH v2 1/2] tcp: add generic netlink support for tcp_metrics
From: David Miller @ 2012-09-04 19:47 UTC (permalink / raw)
  To: ja; +Cc: eric.dumazet, netdev, shemminger, paulmck
In-Reply-To: <alpine.LFD.2.00.1209031041340.1663@ja.ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Mon, 3 Sep 2012 11:22:15 +0300 (EEST)

> 	BTW, is it appropriate to use kmem_cache for
> metrics and as result call_rcu for freeing?

I think it would work as things are implemented currently in
slab/slub/slob, however I would not rely upon it.

If you do move to a SLAB cache for the tcp metrics objects,
you might consider SLAB_DESTROY_BY_RCU.  It's a very delicate
facility (read the huge comment in linux/slab.h) but I think
it provides the semantics we need for TCP metrics blobs.

Looking forward to v3 :-)

^ permalink raw reply

* Re: Commit "ipconfig wait for carrier" makes boot hang for 2 mins if no  carrier
From: Micha Nelissen @ 2012-09-04 19:33 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev
In-Reply-To: <OF163B52E3.E2F5ACB3-ONC1257A6F.00635561-C1257A6F.00635564@transmode.se>

Joakim Tjernlund wrote:
>>  Why not set the IP address then in your rootfs yourself?  Micha 
> 
> I could ask you the same question, why do you need to have nfs in kernel?

Because that's where my root filesystem is? The IP autoconfiguration
code exists for this purpose.

> The answer is probably the same, it is much easier to
> manage our IP config in one place for our embedded system.

You retrieve the kernel via TFTP or so when booting?

> I don't understand why you need 2 minutes timeout for carrier either?

Just a safe value.

> The wait should be conditional on NFS root or not so that non NFS roots
> can skip this stage altogether.

Feel free to submit a patch :-)

Micha

^ permalink raw reply

* Re: [PATCH V2 09/12] net/eipoib: Add main driver functionality
From: Eric W. Biederman @ 2012-09-04 19:31 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Michael S. Tsirkin, Or Gerlitz, davem, roland, netdev, sean.hefty,
	Erez Shitrit, Ali Ayoub, Doug Ledford
In-Reply-To: <CAJZOPZJdmDY8rqHJ+jeuG2rLMj9CnwnemkBG=nxD=z9JBFQCRQ@mail.gmail.com>

Or Gerlitz <or.gerlitz@gmail.com> writes:

> On Tue, Sep 4, 2012 at 12:22 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Mon, Sep 03, 2012 at 11:53:56PM +0300, Or Gerlitz wrote:
>
>>> So we are remained with #3 - the ARPs -- thinking on this a little
>>> further, FWIW there --are-- components in the kernel which
>>> mangle/generate ARPs and are exposing netdevice, such as openvswitch, anyway:
>
>>> does it make sense to forward ARPs received into / sent over the
>>> eIPoIB netdevice (e.g using some sort of rule) to some outer entity
>>> such as user-space daemon  for interception and later re-injection into eIPoIB?
>
>> Well if this is all you want to do, you can bind a packet socket to the
>> interface, and drop them at the nic.  It is harder to do for incoming
>> ARP requests though.
>
>> I would do something else: send ARPs out to some defined IB address.
>> This could be local host or queries from some SA property.  Said remote
>> side could send you the responses in ethernet format so you do not need
>> to mangle responses at all.  Similarly for incoming ARP requests.
>
>> The rule to do this can also just redirect non IP packets - this is IPoIB after all.
>
> Thanks for the heads up on the possible implementation route, will
> look into that.
>
>>> Documentation we will fix,
>
>> And just to stress the point, document the limitations as well.
>
> sure, not that I see concrete limitations for the **user** at this point, but
> if there are such, will put them clearly written.

So far you are still playing with a design that is strongly NOT
ethernet.  So calling it eIPoIB will continue to be a LIE.

You are still playing with an implementation that doesn't even dream
of supporting IPv6 which makes it so far from ethernet I can't imagine
anyone taking your code seriously.

All ethernet protocols not working except IPv4 is a huge concrete
limitation.

Any implementation that breaks a naive ARP implementation also breaks
IPv6.  Not to mention everything else that runs over ethernet.

If you are clever you can use the current IPoIB hardware accelleration
but you need to do something different so that you can either encode
or imply the MAC address so you won't have to munge ethernet protocols.

Just for fun you might want to consider what it takes to support 2 VMs
in the same VLAN that share the same IP address (but different MAC
addresses) for failover purposes.

Eric

^ permalink raw reply

* Re: [GIT net-next] Open vSwitch
From: David Miller @ 2012-09-04 19:26 UTC (permalink / raw)
  To: jesse-l0M0P4e3n4LQT0dZR+AlfA
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1346786049-3100-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Date: Tue,  4 Sep 2012 12:14:07 -0700

> Two feature additions to Open vSwitch for net-next/3.7.
> 
> The following changes since commit 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee:
> 
>   Linux 3.6-rc1 (2012-08-02 16:38:10 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git master
> 
> for you to fetch changes up to 15eac2a74277bc7de68a7c2a64a7c91b4b6f5961:
> 
>   openvswitch: Increase maximum number of datapath ports. (2012-09-03 19:20:49 -0700)

Pulled, thanks Jesse.

^ permalink raw reply

* Re: [GIT net] Open vSwitch
From: David Miller @ 2012-09-04 19:18 UTC (permalink / raw)
  To: jesse-l0M0P4e3n4LQT0dZR+AlfA
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1346785688-2910-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Date: Tue,  4 Sep 2012 12:08:05 -0700

> A few bug fixes intended for net/3.6.
> 
> The following changes since commit 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee:
> 
>   Linux 3.6-rc1 (2012-08-02 16:38:10 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git fixes
> 
> for you to fetch changes up to c303aa94cdae483a7577230e61720e126e600a52:
> 
>   openvswitch: Fix FLOW_BUFSIZE definition. (2012-09-03 19:06:27 -0700)

Pulled, thanks Jesse.

^ permalink raw reply

* [PATCH net-next 2/2] openvswitch: Increase maximum number of datapath ports.
From: Jesse Gross @ 2012-09-04 19:14 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1346786049-3100-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Pravin B Shelar <pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

Use hash table to store ports of datapath. Allow 64K ports per switch.

Signed-off-by: Pravin B Shelar <pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/actions.c  |    2 +-
 net/openvswitch/datapath.c |  110 +++++++++++++++++++++++++++++++-------------
 net/openvswitch/datapath.h |   33 ++++++++++---
 net/openvswitch/flow.c     |   11 ++---
 net/openvswitch/flow.h     |    3 +-
 net/openvswitch/vport.c    |    1 +
 net/openvswitch/vport.h    |    4 +-
 7 files changed, 113 insertions(+), 51 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index f3f96ba..0da6877 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -266,7 +266,7 @@ static int do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
 	if (unlikely(!skb))
 		return -ENOMEM;
 
-	vport = rcu_dereference(dp->ports[out_port]);
+	vport = ovs_vport_rcu(dp, out_port);
 	if (unlikely(!vport)) {
 		kfree_skb(skb);
 		return -ENODEV;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index cad39fc..105a0b5 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -116,7 +116,7 @@ static struct datapath *get_dp(struct net *net, int dp_ifindex)
 /* Must be called with rcu_read_lock or RTNL lock. */
 const char *ovs_dp_name(const struct datapath *dp)
 {
-	struct vport *vport = rcu_dereference_rtnl(dp->ports[OVSP_LOCAL]);
+	struct vport *vport = ovs_vport_rtnl_rcu(dp, OVSP_LOCAL);
 	return vport->ops->get_name(vport);
 }
 
@@ -127,7 +127,7 @@ static int get_dpifindex(struct datapath *dp)
 
 	rcu_read_lock();
 
-	local = rcu_dereference(dp->ports[OVSP_LOCAL]);
+	local = ovs_vport_rcu(dp, OVSP_LOCAL);
 	if (local)
 		ifindex = local->ops->get_ifindex(local);
 	else
@@ -145,9 +145,30 @@ static void destroy_dp_rcu(struct rcu_head *rcu)
 	ovs_flow_tbl_destroy((__force struct flow_table *)dp->table);
 	free_percpu(dp->stats_percpu);
 	release_net(ovs_dp_get_net(dp));
+	kfree(dp->ports);
 	kfree(dp);
 }
 
+static struct hlist_head *vport_hash_bucket(const struct datapath *dp,
+					    u16 port_no)
+{
+	return &dp->ports[port_no & (DP_VPORT_HASH_BUCKETS - 1)];
+}
+
+struct vport *ovs_lookup_vport(const struct datapath *dp, u16 port_no)
+{
+	struct vport *vport;
+	struct hlist_node *n;
+	struct hlist_head *head;
+
+	head = vport_hash_bucket(dp, port_no);
+	hlist_for_each_entry_rcu(vport, n, head, dp_hash_node) {
+		if (vport->port_no == port_no)
+			return vport;
+	}
+	return NULL;
+}
+
 /* Called with RTNL lock and genl_lock. */
 static struct vport *new_vport(const struct vport_parms *parms)
 {
@@ -156,9 +177,9 @@ static struct vport *new_vport(const struct vport_parms *parms)
 	vport = ovs_vport_add(parms);
 	if (!IS_ERR(vport)) {
 		struct datapath *dp = parms->dp;
+		struct hlist_head *head = vport_hash_bucket(dp, vport->port_no);
 
-		rcu_assign_pointer(dp->ports[parms->port_no], vport);
-		list_add(&vport->node, &dp->port_list);
+		hlist_add_head_rcu(&vport->dp_hash_node, head);
 	}
 
 	return vport;
@@ -170,8 +191,7 @@ void ovs_dp_detach_port(struct vport *p)
 	ASSERT_RTNL();
 
 	/* First drop references to device. */
-	list_del(&p->node);
-	rcu_assign_pointer(p->dp->ports[p->port_no], NULL);
+	hlist_del_rcu(&p->dp_hash_node);
 
 	/* Then destroy it. */
 	ovs_vport_del(p);
@@ -1248,7 +1268,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	struct datapath *dp;
 	struct vport *vport;
 	struct ovs_net *ovs_net;
-	int err;
+	int err, i;
 
 	err = -EINVAL;
 	if (!a[OVS_DP_ATTR_NAME] || !a[OVS_DP_ATTR_UPCALL_PID])
@@ -1261,7 +1281,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	if (dp == NULL)
 		goto err_unlock_rtnl;
 
-	INIT_LIST_HEAD(&dp->port_list);
 	ovs_dp_set_net(dp, hold_net(sock_net(skb->sk)));
 
 	/* Allocate table. */
@@ -1276,6 +1295,16 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto err_destroy_table;
 	}
 
+	dp->ports = kmalloc(DP_VPORT_HASH_BUCKETS * sizeof(struct hlist_head),
+			GFP_KERNEL);
+	if (!dp->ports) {
+		err = -ENOMEM;
+		goto err_destroy_percpu;
+	}
+
+	for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++)
+		INIT_HLIST_HEAD(&dp->ports[i]);
+
 	/* Set up our datapath device. */
 	parms.name = nla_data(a[OVS_DP_ATTR_NAME]);
 	parms.type = OVS_VPORT_TYPE_INTERNAL;
@@ -1290,7 +1319,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		if (err == -EBUSY)
 			err = -EEXIST;
 
-		goto err_destroy_percpu;
+		goto err_destroy_ports_array;
 	}
 
 	reply = ovs_dp_cmd_build_info(dp, info->snd_pid,
@@ -1309,7 +1338,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	return 0;
 
 err_destroy_local_port:
-	ovs_dp_detach_port(rtnl_dereference(dp->ports[OVSP_LOCAL]));
+	ovs_dp_detach_port(ovs_vport_rtnl(dp, OVSP_LOCAL));
+err_destroy_ports_array:
+	kfree(dp->ports);
 err_destroy_percpu:
 	free_percpu(dp->stats_percpu);
 err_destroy_table:
@@ -1326,15 +1357,21 @@ err:
 /* Called with genl_mutex. */
 static void __dp_destroy(struct datapath *dp)
 {
-	struct vport *vport, *next_vport;
+	int i;
 
 	rtnl_lock();
-	list_for_each_entry_safe(vport, next_vport, &dp->port_list, node)
-		if (vport->port_no != OVSP_LOCAL)
-			ovs_dp_detach_port(vport);
+
+	for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++) {
+		struct vport *vport;
+		struct hlist_node *node, *n;
+
+		hlist_for_each_entry_safe(vport, node, n, &dp->ports[i], dp_hash_node)
+			if (vport->port_no != OVSP_LOCAL)
+				ovs_dp_detach_port(vport);
+	}
 
 	list_del(&dp->list_node);
-	ovs_dp_detach_port(rtnl_dereference(dp->ports[OVSP_LOCAL]));
+	ovs_dp_detach_port(ovs_vport_rtnl(dp, OVSP_LOCAL));
 
 	/* rtnl_unlock() will wait until all the references to devices that
 	 * are pending unregistration have been dropped.  We do it here to
@@ -1566,7 +1603,7 @@ static struct vport *lookup_vport(struct net *net,
 		if (!dp)
 			return ERR_PTR(-ENODEV);
 
-		vport = rcu_dereference_rtnl(dp->ports[port_no]);
+		vport = ovs_vport_rtnl_rcu(dp, port_no);
 		if (!vport)
 			return ERR_PTR(-ENOENT);
 		return vport;
@@ -1603,7 +1640,7 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		if (port_no >= DP_MAX_PORTS)
 			goto exit_unlock;
 
-		vport = rtnl_dereference(dp->ports[port_no]);
+		vport = ovs_vport_rtnl_rcu(dp, port_no);
 		err = -EBUSY;
 		if (vport)
 			goto exit_unlock;
@@ -1613,7 +1650,7 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
 				err = -EFBIG;
 				goto exit_unlock;
 			}
-			vport = rtnl_dereference(dp->ports[port_no]);
+			vport = ovs_vport_rtnl(dp, port_no);
 			if (!vport)
 				break;
 		}
@@ -1755,32 +1792,39 @@ static int ovs_vport_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct ovs_header *ovs_header = genlmsg_data(nlmsg_data(cb->nlh));
 	struct datapath *dp;
-	u32 port_no;
-	int retval;
+	int bucket = cb->args[0], skip = cb->args[1];
+	int i, j = 0;
 
 	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
 	if (!dp)
 		return -ENODEV;
 
 	rcu_read_lock();
-	for (port_no = cb->args[0]; port_no < DP_MAX_PORTS; port_no++) {
+	for (i = bucket; i < DP_VPORT_HASH_BUCKETS; i++) {
 		struct vport *vport;
-
-		vport = rcu_dereference(dp->ports[port_no]);
-		if (!vport)
-			continue;
-
-		if (ovs_vport_cmd_fill_info(vport, skb, NETLINK_CB(cb->skb).pid,
-					    cb->nlh->nlmsg_seq, NLM_F_MULTI,
-					    OVS_VPORT_CMD_NEW) < 0)
-			break;
+		struct hlist_node *n;
+
+		j = 0;
+		hlist_for_each_entry_rcu(vport, n, &dp->ports[i], dp_hash_node) {
+			if (j >= skip &&
+			    ovs_vport_cmd_fill_info(vport, skb,
+						    NETLINK_CB(cb->skb).pid,
+						    cb->nlh->nlmsg_seq,
+						    NLM_F_MULTI,
+						    OVS_VPORT_CMD_NEW) < 0)
+				goto out;
+
+			j++;
+		}
+		skip = 0;
 	}
+out:
 	rcu_read_unlock();
 
-	cb->args[0] = port_no;
-	retval = skb->len;
+	cb->args[0] = i;
+	cb->args[1] = j;
 
-	return retval;
+	return skb->len;
 }
 
 static struct genl_ops dp_vport_genl_ops[] = {
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 771c11e..129ec54 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -29,7 +29,9 @@
 #include "flow.h"
 #include "vport.h"
 
-#define DP_MAX_PORTS 1024
+#define DP_MAX_PORTS           USHRT_MAX
+#define DP_VPORT_HASH_BUCKETS  1024
+
 #define SAMPLE_ACTION_DEPTH 3
 
 /**
@@ -57,10 +59,8 @@ struct dp_stats_percpu {
  * @list_node: Element in global 'dps' list.
  * @n_flows: Number of flows currently in flow table.
  * @table: Current flow table.  Protected by genl_lock and RCU.
- * @ports: Map from port number to &struct vport.  %OVSP_LOCAL port
- * always exists, other ports may be %NULL.  Protected by RTNL and RCU.
- * @port_list: List of all ports in @ports in arbitrary order.  RTNL required
- * to iterate or modify.
+ * @ports: Hash table for ports.  %OVSP_LOCAL port always exists.  Protected by
+ * RTNL and RCU.
  * @stats_percpu: Per-CPU datapath statistics.
  * @net: Reference to net namespace.
  *
@@ -75,8 +75,7 @@ struct datapath {
 	struct flow_table __rcu *table;
 
 	/* Switch ports. */
-	struct vport __rcu *ports[DP_MAX_PORTS];
-	struct list_head port_list;
+	struct hlist_head *ports;
 
 	/* Stats. */
 	struct dp_stats_percpu __percpu *stats_percpu;
@@ -87,6 +86,26 @@ struct datapath {
 #endif
 };
 
+struct vport *ovs_lookup_vport(const struct datapath *dp, u16 port_no);
+
+static inline struct vport *ovs_vport_rcu(const struct datapath *dp, int port_no)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return ovs_lookup_vport(dp, port_no);
+}
+
+static inline struct vport *ovs_vport_rtnl_rcu(const struct datapath *dp, int port_no)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rtnl_is_locked());
+	return ovs_lookup_vport(dp, port_no);
+}
+
+static inline struct vport *ovs_vport_rtnl(const struct datapath *dp, int port_no)
+{
+	ASSERT_RTNL();
+	return ovs_lookup_vport(dp, port_no);
+}
+
 /**
  * struct ovs_skb_cb - OVS data in skb CB
  * @flow: The flow associated with this packet.  May be %NULL if no flow.
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index b7f38b1..f9f211d 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -203,10 +203,7 @@ struct sw_flow_actions *ovs_flow_actions_alloc(const struct nlattr *actions)
 	int actions_len = nla_len(actions);
 	struct sw_flow_actions *sfa;
 
-	/* At least DP_MAX_PORTS actions are required to be able to flood a
-	 * packet to every port.  Factor of 2 allows for setting VLAN tags,
-	 * etc. */
-	if (actions_len > 2 * DP_MAX_PORTS * nla_total_size(4))
+	if (actions_len > MAX_ACTIONS_BUFSIZE)
 		return ERR_PTR(-EINVAL);
 
 	sfa = kmalloc(sizeof(*sfa) + actions_len, GFP_KERNEL);
@@ -1000,7 +997,7 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
 		swkey->phy.in_port = in_port;
 		attrs &= ~(1 << OVS_KEY_ATTR_IN_PORT);
 	} else {
-		swkey->phy.in_port = USHRT_MAX;
+		swkey->phy.in_port = DP_MAX_PORTS;
 	}
 
 	/* Data attributes. */
@@ -1143,7 +1140,7 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
 	const struct nlattr *nla;
 	int rem;
 
-	*in_port = USHRT_MAX;
+	*in_port = DP_MAX_PORTS;
 	*priority = 0;
 
 	nla_for_each_nested(nla, attr, rem) {
@@ -1180,7 +1177,7 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
 	    nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
 		goto nla_put_failure;
 
-	if (swkey->phy.in_port != USHRT_MAX &&
+	if (swkey->phy.in_port != DP_MAX_PORTS &&
 	    nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, swkey->phy.in_port))
 		goto nla_put_failure;
 
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 9b75617..d92e22a 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -43,7 +43,7 @@ struct sw_flow_actions {
 struct sw_flow_key {
 	struct {
 		u32	priority;	/* Packet QoS priority. */
-		u16	in_port;	/* Input switch port (or USHRT_MAX). */
+		u16	in_port;	/* Input switch port (or DP_MAX_PORTS). */
 	} phy;
 	struct {
 		u8     src[ETH_ALEN];	/* Ethernet source address. */
@@ -161,6 +161,7 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
 int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
 			       const struct nlattr *);
 
+#define MAX_ACTIONS_BUFSIZE    (16 * 1024)
 #define TBL_MIN_BUCKETS		1024
 
 struct flow_table {
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 9873ace..1abd960 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -127,6 +127,7 @@ struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *ops,
 	vport->port_no = parms->port_no;
 	vport->upcall_pid = parms->upcall_pid;
 	vport->ops = ops;
+	INIT_HLIST_NODE(&vport->dp_hash_node);
 
 	vport->percpu_stats = alloc_percpu(struct vport_percpu_stats);
 	if (!vport->percpu_stats) {
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 97cef08..c56e483 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -70,10 +70,10 @@ struct vport_err_stats {
  * @rcu: RCU callback head for deferred destruction.
  * @port_no: Index into @dp's @ports array.
  * @dp: Datapath to which this port belongs.
- * @node: Element in @dp's @port_list.
  * @upcall_pid: The Netlink port to use for packets received on this port that
  * miss the flow table.
  * @hash_node: Element in @dev_table hash table in vport.c.
+ * @dp_hash_node: Element in @datapath->ports hash table in datapath.c.
  * @ops: Class structure.
  * @percpu_stats: Points to per-CPU statistics used and maintained by vport
  * @stats_lock: Protects @err_stats;
@@ -83,10 +83,10 @@ struct vport {
 	struct rcu_head rcu;
 	u16 port_no;
 	struct datapath	*dp;
-	struct list_head node;
 	u32 upcall_pid;
 
 	struct hlist_node hash_node;
+	struct hlist_node dp_hash_node;
 	const struct vport_ops *ops;
 
 	struct vport_percpu_stats __percpu *percpu_stats;
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH net-next 1/2] openvswitch: Add support for network namespaces.
From: Jesse Gross @ 2012-09-04 19:14 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1346786049-3100-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Pravin B Shelar <pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

Following patch adds support for network namespace to openvswitch.
Since it must release devices when namespaces are destroyed, a
side effect of this patch is that the module no longer keeps a
refcount but instead cleans up any state when it is unloaded.

Signed-off-by: Pravin B Shelar <pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c           |  269 ++++++++++++++++++++--------------
 net/openvswitch/datapath.h           |   19 ++-
 net/openvswitch/dp_notify.c          |    8 +-
 net/openvswitch/vport-internal_dev.c |    7 +-
 net/openvswitch/vport-netdev.c       |    2 +-
 net/openvswitch/vport.c              |   22 ++-
 net/openvswitch/vport.h              |    3 +-
 7 files changed, 207 insertions(+), 123 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index d8277d2..cad39fc 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -49,12 +49,29 @@
 #include <linux/dmi.h>
 #include <linux/workqueue.h>
 #include <net/genetlink.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
 
 #include "datapath.h"
 #include "flow.h"
 #include "vport-internal_dev.h"
 
 /**
+ * struct ovs_net - Per net-namespace data for ovs.
+ * @dps: List of datapaths to enable dumping them all out.
+ * Protected by genl_mutex.
+ */
+struct ovs_net {
+	struct list_head dps;
+};
+
+static int ovs_net_id __read_mostly;
+
+#define REHASH_FLOW_INTERVAL (10 * 60 * HZ)
+static void rehash_flow_table(struct work_struct *work);
+static DECLARE_DELAYED_WORK(rehash_flow_wq, rehash_flow_table);
+
+/**
  * DOC: Locking:
  *
  * Writes to device state (add/remove datapath, port, set operations on vports,
@@ -71,29 +88,21 @@
  * each other.
  */
 
-/* Global list of datapaths to enable dumping them all out.
- * Protected by genl_mutex.
- */
-static LIST_HEAD(dps);
-
-#define REHASH_FLOW_INTERVAL (10 * 60 * HZ)
-static void rehash_flow_table(struct work_struct *work);
-static DECLARE_DELAYED_WORK(rehash_flow_wq, rehash_flow_table);
-
 static struct vport *new_vport(const struct vport_parms *);
-static int queue_gso_packets(int dp_ifindex, struct sk_buff *,
+static int queue_gso_packets(struct net *, int dp_ifindex, struct sk_buff *,
 			     const struct dp_upcall_info *);
-static int queue_userspace_packet(int dp_ifindex, struct sk_buff *,
+static int queue_userspace_packet(struct net *, int dp_ifindex,
+				  struct sk_buff *,
 				  const struct dp_upcall_info *);
 
 /* Must be called with rcu_read_lock, genl_mutex, or RTNL lock. */
-static struct datapath *get_dp(int dp_ifindex)
+static struct datapath *get_dp(struct net *net, int dp_ifindex)
 {
 	struct datapath *dp = NULL;
 	struct net_device *dev;
 
 	rcu_read_lock();
-	dev = dev_get_by_index_rcu(&init_net, dp_ifindex);
+	dev = dev_get_by_index_rcu(net, dp_ifindex);
 	if (dev) {
 		struct vport *vport = ovs_internal_dev_get_vport(dev);
 		if (vport)
@@ -135,6 +144,7 @@ static void destroy_dp_rcu(struct rcu_head *rcu)
 
 	ovs_flow_tbl_destroy((__force struct flow_table *)dp->table);
 	free_percpu(dp->stats_percpu);
+	release_net(ovs_dp_get_net(dp));
 	kfree(dp);
 }
 
@@ -220,11 +230,12 @@ static struct genl_family dp_packet_genl_family = {
 	.hdrsize = sizeof(struct ovs_header),
 	.name = OVS_PACKET_FAMILY,
 	.version = OVS_PACKET_VERSION,
-	.maxattr = OVS_PACKET_ATTR_MAX
+	.maxattr = OVS_PACKET_ATTR_MAX,
+	.netnsok = true
 };
 
 int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
-	      const struct dp_upcall_info *upcall_info)
+		  const struct dp_upcall_info *upcall_info)
 {
 	struct dp_stats_percpu *stats;
 	int dp_ifindex;
@@ -242,9 +253,9 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
 	}
 
 	if (!skb_is_gso(skb))
-		err = queue_userspace_packet(dp_ifindex, skb, upcall_info);
+		err = queue_userspace_packet(ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
 	else
-		err = queue_gso_packets(dp_ifindex, skb, upcall_info);
+		err = queue_gso_packets(ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
 	if (err)
 		goto err;
 
@@ -260,7 +271,8 @@ err:
 	return err;
 }
 
-static int queue_gso_packets(int dp_ifindex, struct sk_buff *skb,
+static int queue_gso_packets(struct net *net, int dp_ifindex,
+			     struct sk_buff *skb,
 			     const struct dp_upcall_info *upcall_info)
 {
 	unsigned short gso_type = skb_shinfo(skb)->gso_type;
@@ -276,7 +288,7 @@ static int queue_gso_packets(int dp_ifindex, struct sk_buff *skb,
 	/* Queue all of the segments. */
 	skb = segs;
 	do {
-		err = queue_userspace_packet(dp_ifindex, skb, upcall_info);
+		err = queue_userspace_packet(net, dp_ifindex, skb, upcall_info);
 		if (err)
 			break;
 
@@ -306,7 +318,8 @@ static int queue_gso_packets(int dp_ifindex, struct sk_buff *skb,
 	return err;
 }
 
-static int queue_userspace_packet(int dp_ifindex, struct sk_buff *skb,
+static int queue_userspace_packet(struct net *net, int dp_ifindex,
+				  struct sk_buff *skb,
 				  const struct dp_upcall_info *upcall_info)
 {
 	struct ovs_header *upcall;
@@ -362,7 +375,7 @@ static int queue_userspace_packet(int dp_ifindex, struct sk_buff *skb,
 
 	skb_copy_and_csum_dev(skb, nla_data(nla));
 
-	err = genlmsg_unicast(&init_net, user_skb, upcall_info->pid);
+	err = genlmsg_unicast(net, user_skb, upcall_info->pid);
 
 out:
 	kfree_skb(nskb);
@@ -370,15 +383,10 @@ out:
 }
 
 /* Called with genl_mutex. */
-static int flush_flows(int dp_ifindex)
+static int flush_flows(struct datapath *dp)
 {
 	struct flow_table *old_table;
 	struct flow_table *new_table;
-	struct datapath *dp;
-
-	dp = get_dp(dp_ifindex);
-	if (!dp)
-		return -ENODEV;
 
 	old_table = genl_dereference(dp->table);
 	new_table = ovs_flow_tbl_alloc(TBL_MIN_BUCKETS);
@@ -668,7 +676,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	packet->priority = flow->key.phy.priority;
 
 	rcu_read_lock();
-	dp = get_dp(ovs_header->dp_ifindex);
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
 	err = -ENODEV;
 	if (!dp)
 		goto err_unlock;
@@ -742,7 +750,8 @@ static struct genl_family dp_flow_genl_family = {
 	.hdrsize = sizeof(struct ovs_header),
 	.name = OVS_FLOW_FAMILY,
 	.version = OVS_FLOW_VERSION,
-	.maxattr = OVS_FLOW_ATTR_MAX
+	.maxattr = OVS_FLOW_ATTR_MAX,
+	.netnsok = true
 };
 
 static struct genl_multicast_group ovs_dp_flow_multicast_group = {
@@ -894,7 +903,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 		goto error;
 	}
 
-	dp = get_dp(ovs_header->dp_ifindex);
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
 	error = -ENODEV;
 	if (!dp)
 		goto error;
@@ -995,7 +1004,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 			   ovs_dp_flow_multicast_group.id, info->nlhdr,
 			   GFP_KERNEL);
 	else
-		netlink_set_err(init_net.genl_sock, 0,
+		netlink_set_err(sock_net(skb->sk)->genl_sock, 0,
 				ovs_dp_flow_multicast_group.id, PTR_ERR(reply));
 	return 0;
 
@@ -1023,7 +1032,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
 	if (err)
 		return err;
 
-	dp = get_dp(ovs_header->dp_ifindex);
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
 	if (!dp)
 		return -ENODEV;
 
@@ -1052,16 +1061,17 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	int err;
 	int key_len;
 
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
+	if (!dp)
+		return -ENODEV;
+
 	if (!a[OVS_FLOW_ATTR_KEY])
-		return flush_flows(ovs_header->dp_ifindex);
+		return flush_flows(dp);
+
 	err = ovs_flow_from_nlattrs(&key, &key_len, a[OVS_FLOW_ATTR_KEY]);
 	if (err)
 		return err;
 
-	dp = get_dp(ovs_header->dp_ifindex);
-	if (!dp)
-		return -ENODEV;
-
 	table = genl_dereference(dp->table);
 	flow = ovs_flow_tbl_lookup(table, &key, key_len);
 	if (!flow)
@@ -1090,7 +1100,7 @@ static int ovs_flow_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	struct datapath *dp;
 	struct flow_table *table;
 
-	dp = get_dp(ovs_header->dp_ifindex);
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
 	if (!dp)
 		return -ENODEV;
 
@@ -1152,7 +1162,8 @@ static struct genl_family dp_datapath_genl_family = {
 	.hdrsize = sizeof(struct ovs_header),
 	.name = OVS_DATAPATH_FAMILY,
 	.version = OVS_DATAPATH_VERSION,
-	.maxattr = OVS_DP_ATTR_MAX
+	.maxattr = OVS_DP_ATTR_MAX,
+	.netnsok = true
 };
 
 static struct genl_multicast_group ovs_dp_datapath_multicast_group = {
@@ -1210,18 +1221,19 @@ static struct sk_buff *ovs_dp_cmd_build_info(struct datapath *dp, u32 pid,
 }
 
 /* Called with genl_mutex and optionally with RTNL lock also. */
-static struct datapath *lookup_datapath(struct ovs_header *ovs_header,
+static struct datapath *lookup_datapath(struct net *net,
+					struct ovs_header *ovs_header,
 					struct nlattr *a[OVS_DP_ATTR_MAX + 1])
 {
 	struct datapath *dp;
 
 	if (!a[OVS_DP_ATTR_NAME])
-		dp = get_dp(ovs_header->dp_ifindex);
+		dp = get_dp(net, ovs_header->dp_ifindex);
 	else {
 		struct vport *vport;
 
 		rcu_read_lock();
-		vport = ovs_vport_locate(nla_data(a[OVS_DP_ATTR_NAME]));
+		vport = ovs_vport_locate(net, nla_data(a[OVS_DP_ATTR_NAME]));
 		dp = vport && vport->port_no == OVSP_LOCAL ? vport->dp : NULL;
 		rcu_read_unlock();
 	}
@@ -1235,6 +1247,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	struct sk_buff *reply;
 	struct datapath *dp;
 	struct vport *vport;
+	struct ovs_net *ovs_net;
 	int err;
 
 	err = -EINVAL;
@@ -1242,15 +1255,14 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto err;
 
 	rtnl_lock();
-	err = -ENODEV;
-	if (!try_module_get(THIS_MODULE))
-		goto err_unlock_rtnl;
 
 	err = -ENOMEM;
 	dp = kzalloc(sizeof(*dp), GFP_KERNEL);
 	if (dp == NULL)
-		goto err_put_module;
+		goto err_unlock_rtnl;
+
 	INIT_LIST_HEAD(&dp->port_list);
+	ovs_dp_set_net(dp, hold_net(sock_net(skb->sk)));
 
 	/* Allocate table. */
 	err = -ENOMEM;
@@ -1287,7 +1299,8 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(reply))
 		goto err_destroy_local_port;
 
-	list_add_tail(&dp->list_node, &dps);
+	ovs_net = net_generic(ovs_dp_get_net(dp), ovs_net_id);
+	list_add_tail(&dp->list_node, &ovs_net->dps);
 	rtnl_unlock();
 
 	genl_notify(reply, genl_info_net(info), info->snd_pid,
@@ -1302,34 +1315,20 @@ err_destroy_percpu:
 err_destroy_table:
 	ovs_flow_tbl_destroy(genl_dereference(dp->table));
 err_free_dp:
+	release_net(ovs_dp_get_net(dp));
 	kfree(dp);
-err_put_module:
-	module_put(THIS_MODULE);
 err_unlock_rtnl:
 	rtnl_unlock();
 err:
 	return err;
 }
 
-static int ovs_dp_cmd_del(struct sk_buff *skb, struct genl_info *info)
+/* Called with genl_mutex. */
+static void __dp_destroy(struct datapath *dp)
 {
 	struct vport *vport, *next_vport;
-	struct sk_buff *reply;
-	struct datapath *dp;
-	int err;
 
 	rtnl_lock();
-	dp = lookup_datapath(info->userhdr, info->attrs);
-	err = PTR_ERR(dp);
-	if (IS_ERR(dp))
-		goto exit_unlock;
-
-	reply = ovs_dp_cmd_build_info(dp, info->snd_pid,
-				      info->snd_seq, OVS_DP_CMD_DEL);
-	err = PTR_ERR(reply);
-	if (IS_ERR(reply))
-		goto exit_unlock;
-
 	list_for_each_entry_safe(vport, next_vport, &dp->port_list, node)
 		if (vport->port_no != OVSP_LOCAL)
 			ovs_dp_detach_port(vport);
@@ -1345,17 +1344,32 @@ static int ovs_dp_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	rtnl_unlock();
 
 	call_rcu(&dp->rcu, destroy_dp_rcu);
-	module_put(THIS_MODULE);
+}
+
+static int ovs_dp_cmd_del(struct sk_buff *skb, struct genl_info *info)
+{
+	struct sk_buff *reply;
+	struct datapath *dp;
+	int err;
+
+	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
+	err = PTR_ERR(dp);
+	if (IS_ERR(dp))
+		return err;
+
+	reply = ovs_dp_cmd_build_info(dp, info->snd_pid,
+				      info->snd_seq, OVS_DP_CMD_DEL);
+	err = PTR_ERR(reply);
+	if (IS_ERR(reply))
+		return err;
+
+	__dp_destroy(dp);
 
 	genl_notify(reply, genl_info_net(info), info->snd_pid,
 		    ovs_dp_datapath_multicast_group.id, info->nlhdr,
 		    GFP_KERNEL);
 
 	return 0;
-
-exit_unlock:
-	rtnl_unlock();
-	return err;
 }
 
 static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
@@ -1364,7 +1378,7 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	struct datapath *dp;
 	int err;
 
-	dp = lookup_datapath(info->userhdr, info->attrs);
+	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
 	if (IS_ERR(dp))
 		return PTR_ERR(dp);
 
@@ -1372,7 +1386,7 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 				      info->snd_seq, OVS_DP_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
-		netlink_set_err(init_net.genl_sock, 0,
+		netlink_set_err(sock_net(skb->sk)->genl_sock, 0,
 				ovs_dp_datapath_multicast_group.id, err);
 		return 0;
 	}
@@ -1389,7 +1403,7 @@ static int ovs_dp_cmd_get(struct sk_buff *skb, struct genl_info *info)
 	struct sk_buff *reply;
 	struct datapath *dp;
 
-	dp = lookup_datapath(info->userhdr, info->attrs);
+	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
 	if (IS_ERR(dp))
 		return PTR_ERR(dp);
 
@@ -1403,11 +1417,12 @@ static int ovs_dp_cmd_get(struct sk_buff *skb, struct genl_info *info)
 
 static int ovs_dp_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id);
 	struct datapath *dp;
 	int skip = cb->args[0];
 	int i = 0;
 
-	list_for_each_entry(dp, &dps, list_node) {
+	list_for_each_entry(dp, &ovs_net->dps, list_node) {
 		if (i >= skip &&
 		    ovs_dp_cmd_fill_info(dp, skb, NETLINK_CB(cb->skb).pid,
 					 cb->nlh->nlmsg_seq, NLM_F_MULTI,
@@ -1459,7 +1474,8 @@ static struct genl_family dp_vport_genl_family = {
 	.hdrsize = sizeof(struct ovs_header),
 	.name = OVS_VPORT_FAMILY,
 	.version = OVS_VPORT_VERSION,
-	.maxattr = OVS_VPORT_ATTR_MAX
+	.maxattr = OVS_VPORT_ATTR_MAX,
+	.netnsok = true
 };
 
 struct genl_multicast_group ovs_dp_vport_multicast_group = {
@@ -1525,14 +1541,15 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *vport, u32 pid,
 }
 
 /* Called with RTNL lock or RCU read lock. */
-static struct vport *lookup_vport(struct ovs_header *ovs_header,
+static struct vport *lookup_vport(struct net *net,
+				  struct ovs_header *ovs_header,
 				  struct nlattr *a[OVS_VPORT_ATTR_MAX + 1])
 {
 	struct datapath *dp;
 	struct vport *vport;
 
 	if (a[OVS_VPORT_ATTR_NAME]) {
-		vport = ovs_vport_locate(nla_data(a[OVS_VPORT_ATTR_NAME]));
+		vport = ovs_vport_locate(net, nla_data(a[OVS_VPORT_ATTR_NAME]));
 		if (!vport)
 			return ERR_PTR(-ENODEV);
 		if (ovs_header->dp_ifindex &&
@@ -1545,7 +1562,7 @@ static struct vport *lookup_vport(struct ovs_header *ovs_header,
 		if (port_no >= DP_MAX_PORTS)
 			return ERR_PTR(-EFBIG);
 
-		dp = get_dp(ovs_header->dp_ifindex);
+		dp = get_dp(net, ovs_header->dp_ifindex);
 		if (!dp)
 			return ERR_PTR(-ENODEV);
 
@@ -1574,7 +1591,7 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto exit;
 
 	rtnl_lock();
-	dp = get_dp(ovs_header->dp_ifindex);
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
 	err = -ENODEV;
 	if (!dp)
 		goto exit_unlock;
@@ -1638,7 +1655,7 @@ static int ovs_vport_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	int err;
 
 	rtnl_lock();
-	vport = lookup_vport(info->userhdr, a);
+	vport = lookup_vport(sock_net(skb->sk), info->userhdr, a);
 	err = PTR_ERR(vport);
 	if (IS_ERR(vport))
 		goto exit_unlock;
@@ -1658,7 +1675,7 @@ static int ovs_vport_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	reply = ovs_vport_cmd_build_info(vport, info->snd_pid, info->snd_seq,
 					 OVS_VPORT_CMD_NEW);
 	if (IS_ERR(reply)) {
-		netlink_set_err(init_net.genl_sock, 0,
+		netlink_set_err(sock_net(skb->sk)->genl_sock, 0,
 				ovs_dp_vport_multicast_group.id, PTR_ERR(reply));
 		goto exit_unlock;
 	}
@@ -1679,7 +1696,7 @@ static int ovs_vport_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	int err;
 
 	rtnl_lock();
-	vport = lookup_vport(info->userhdr, a);
+	vport = lookup_vport(sock_net(skb->sk), info->userhdr, a);
 	err = PTR_ERR(vport);
 	if (IS_ERR(vport))
 		goto exit_unlock;
@@ -1714,7 +1731,7 @@ static int ovs_vport_cmd_get(struct sk_buff *skb, struct genl_info *info)
 	int err;
 
 	rcu_read_lock();
-	vport = lookup_vport(ovs_header, a);
+	vport = lookup_vport(sock_net(skb->sk), ovs_header, a);
 	err = PTR_ERR(vport);
 	if (IS_ERR(vport))
 		goto exit_unlock;
@@ -1741,7 +1758,7 @@ static int ovs_vport_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	u32 port_no;
 	int retval;
 
-	dp = get_dp(ovs_header->dp_ifindex);
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
 	if (!dp)
 		return -ENODEV;
 
@@ -1766,28 +1783,6 @@ static int ovs_vport_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	return retval;
 }
 
-static void rehash_flow_table(struct work_struct *work)
-{
-	struct datapath *dp;
-
-	genl_lock();
-
-	list_for_each_entry(dp, &dps, list_node) {
-		struct flow_table *old_table = genl_dereference(dp->table);
-		struct flow_table *new_table;
-
-		new_table = ovs_flow_tbl_rehash(old_table);
-		if (!IS_ERR(new_table)) {
-			rcu_assign_pointer(dp->table, new_table);
-			ovs_flow_tbl_deferred_destroy(old_table);
-		}
-	}
-
-	genl_unlock();
-
-	schedule_delayed_work(&rehash_flow_wq, REHASH_FLOW_INTERVAL);
-}
-
 static struct genl_ops dp_vport_genl_ops[] = {
 	{ .cmd = OVS_VPORT_CMD_NEW,
 	  .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
@@ -1872,6 +1867,59 @@ error:
 	return err;
 }
 
+static void rehash_flow_table(struct work_struct *work)
+{
+	struct datapath *dp;
+	struct net *net;
+
+	genl_lock();
+	rtnl_lock();
+	for_each_net(net) {
+		struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
+
+		list_for_each_entry(dp, &ovs_net->dps, list_node) {
+			struct flow_table *old_table = genl_dereference(dp->table);
+			struct flow_table *new_table;
+
+			new_table = ovs_flow_tbl_rehash(old_table);
+			if (!IS_ERR(new_table)) {
+				rcu_assign_pointer(dp->table, new_table);
+				ovs_flow_tbl_deferred_destroy(old_table);
+			}
+		}
+	}
+	rtnl_unlock();
+	genl_unlock();
+
+	schedule_delayed_work(&rehash_flow_wq, REHASH_FLOW_INTERVAL);
+}
+
+static int __net_init ovs_init_net(struct net *net)
+{
+	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
+
+	INIT_LIST_HEAD(&ovs_net->dps);
+	return 0;
+}
+
+static void __net_exit ovs_exit_net(struct net *net)
+{
+	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
+	struct datapath *dp, *dp_next;
+
+	genl_lock();
+	list_for_each_entry_safe(dp, dp_next, &ovs_net->dps, list_node)
+		__dp_destroy(dp);
+	genl_unlock();
+}
+
+static struct pernet_operations ovs_net_ops = {
+	.init = ovs_init_net,
+	.exit = ovs_exit_net,
+	.id   = &ovs_net_id,
+	.size = sizeof(struct ovs_net),
+};
+
 static int __init dp_init(void)
 {
 	struct sk_buff *dummy_skb;
@@ -1889,10 +1937,14 @@ static int __init dp_init(void)
 	if (err)
 		goto error_flow_exit;
 
-	err = register_netdevice_notifier(&ovs_dp_device_notifier);
+	err = register_pernet_device(&ovs_net_ops);
 	if (err)
 		goto error_vport_exit;
 
+	err = register_netdevice_notifier(&ovs_dp_device_notifier);
+	if (err)
+		goto error_netns_exit;
+
 	err = dp_register_genl();
 	if (err < 0)
 		goto error_unreg_notifier;
@@ -1903,6 +1955,8 @@ static int __init dp_init(void)
 
 error_unreg_notifier:
 	unregister_netdevice_notifier(&ovs_dp_device_notifier);
+error_netns_exit:
+	unregister_pernet_device(&ovs_net_ops);
 error_vport_exit:
 	ovs_vport_exit();
 error_flow_exit:
@@ -1914,9 +1968,10 @@ error:
 static void dp_cleanup(void)
 {
 	cancel_delayed_work_sync(&rehash_flow_wq);
-	rcu_barrier();
 	dp_unregister_genl(ARRAY_SIZE(dp_genl_families));
 	unregister_netdevice_notifier(&ovs_dp_device_notifier);
+	unregister_pernet_device(&ovs_net_ops);
+	rcu_barrier();
 	ovs_vport_exit();
 	ovs_flow_exit();
 }
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index c1105c1..771c11e 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -27,8 +27,7 @@
 #include <linux/u64_stats_sync.h>
 
 #include "flow.h"
-
-struct vport;
+#include "vport.h"
 
 #define DP_MAX_PORTS 1024
 #define SAMPLE_ACTION_DEPTH 3
@@ -63,6 +62,7 @@ struct dp_stats_percpu {
  * @port_list: List of all ports in @ports in arbitrary order.  RTNL required
  * to iterate or modify.
  * @stats_percpu: Per-CPU datapath statistics.
+ * @net: Reference to net namespace.
  *
  * Context: See the comment on locking at the top of datapath.c for additional
  * locking information.
@@ -80,6 +80,11 @@ struct datapath {
 
 	/* Stats. */
 	struct dp_stats_percpu __percpu *stats_percpu;
+
+#ifdef CONFIG_NET_NS
+	/* Network namespace ref. */
+	struct net *net;
+#endif
 };
 
 /**
@@ -108,6 +113,16 @@ struct dp_upcall_info {
 	u32 pid;
 };
 
+static inline struct net *ovs_dp_get_net(struct datapath *dp)
+{
+	return read_pnet(&dp->net);
+}
+
+static inline void ovs_dp_set_net(struct datapath *dp, struct net *net)
+{
+	write_pnet(&dp->net, net);
+}
+
 extern struct notifier_block ovs_dp_device_notifier;
 extern struct genl_multicast_group ovs_dp_vport_multicast_group;
 
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 36dcee8..5558350 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -41,19 +41,21 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
 	case NETDEV_UNREGISTER:
 		if (!ovs_is_internal_dev(dev)) {
 			struct sk_buff *notify;
+			struct datapath *dp = vport->dp;
 
 			notify = ovs_vport_cmd_build_info(vport, 0, 0,
 							  OVS_VPORT_CMD_DEL);
 			ovs_dp_detach_port(vport);
 			if (IS_ERR(notify)) {
-				netlink_set_err(init_net.genl_sock, 0,
+				netlink_set_err(ovs_dp_get_net(dp)->genl_sock, 0,
 						ovs_dp_vport_multicast_group.id,
 						PTR_ERR(notify));
 				break;
 			}
 
-			genlmsg_multicast(notify, 0, ovs_dp_vport_multicast_group.id,
-					  GFP_KERNEL);
+			genlmsg_multicast_netns(ovs_dp_get_net(dp), notify, 0,
+						ovs_dp_vport_multicast_group.id,
+						GFP_KERNEL);
 		}
 		break;
 	}
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 4061b9e..5d460c3 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -144,7 +144,7 @@ static void do_setup(struct net_device *netdev)
 	netdev->tx_queue_len = 0;
 
 	netdev->features = NETIF_F_LLTX | NETIF_F_SG | NETIF_F_FRAGLIST |
-				NETIF_F_HIGHDMA | NETIF_F_HW_CSUM | NETIF_F_TSO;
+			   NETIF_F_HIGHDMA | NETIF_F_HW_CSUM | NETIF_F_TSO;
 
 	netdev->vlan_features = netdev->features;
 	netdev->features |= NETIF_F_HW_VLAN_TX;
@@ -175,9 +175,14 @@ static struct vport *internal_dev_create(const struct vport_parms *parms)
 		goto error_free_vport;
 	}
 
+	dev_net_set(netdev_vport->dev, ovs_dp_get_net(vport->dp));
 	internal_dev = internal_dev_priv(netdev_vport->dev);
 	internal_dev->vport = vport;
 
+	/* Restrict bridge port to current netns. */
+	if (vport->port_no == OVSP_LOCAL)
+		netdev_vport->dev->features |= NETIF_F_NETNS_LOCAL;
+
 	err = register_netdevice(netdev_vport->dev);
 	if (err)
 		goto error_free_netdev;
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 6ea3551..3c1e58b 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -83,7 +83,7 @@ static struct vport *netdev_create(const struct vport_parms *parms)
 
 	netdev_vport = netdev_vport_priv(vport);
 
-	netdev_vport->dev = dev_get_by_name(&init_net, parms->name);
+	netdev_vport->dev = dev_get_by_name(ovs_dp_get_net(vport->dp), parms->name);
 	if (!netdev_vport->dev) {
 		err = -ENODEV;
 		goto error_free_vport;
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 6140336..9873ace 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -16,10 +16,10 @@
  * 02110-1301, USA
  */
 
-#include <linux/dcache.h>
 #include <linux/etherdevice.h>
 #include <linux/if.h>
 #include <linux/if_vlan.h>
+#include <linux/jhash.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/mutex.h>
@@ -27,7 +27,9 @@
 #include <linux/rcupdate.h>
 #include <linux/rtnetlink.h>
 #include <linux/compat.h>
+#include <net/net_namespace.h>
 
+#include "datapath.h"
 #include "vport.h"
 #include "vport-internal_dev.h"
 
@@ -67,9 +69,9 @@ void ovs_vport_exit(void)
 	kfree(dev_table);
 }
 
-static struct hlist_head *hash_bucket(const char *name)
+static struct hlist_head *hash_bucket(struct net *net, const char *name)
 {
-	unsigned int hash = full_name_hash(name, strlen(name));
+	unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
 	return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
 }
 
@@ -80,14 +82,15 @@ static struct hlist_head *hash_bucket(const char *name)
  *
  * Must be called with RTNL or RCU read lock.
  */
-struct vport *ovs_vport_locate(const char *name)
+struct vport *ovs_vport_locate(struct net *net, const char *name)
 {
-	struct hlist_head *bucket = hash_bucket(name);
+	struct hlist_head *bucket = hash_bucket(net, name);
 	struct vport *vport;
 	struct hlist_node *node;
 
 	hlist_for_each_entry_rcu(vport, node, bucket, hash_node)
-		if (!strcmp(name, vport->ops->get_name(vport)))
+		if (!strcmp(name, vport->ops->get_name(vport)) &&
+		    net_eq(ovs_dp_get_net(vport->dp), net))
 			return vport;
 
 	return NULL;
@@ -170,14 +173,17 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
 
 	for (i = 0; i < ARRAY_SIZE(vport_ops_list); i++) {
 		if (vport_ops_list[i]->type == parms->type) {
+			struct hlist_head *bucket;
+
 			vport = vport_ops_list[i]->create(parms);
 			if (IS_ERR(vport)) {
 				err = PTR_ERR(vport);
 				goto out;
 			}
 
-			hlist_add_head_rcu(&vport->hash_node,
-					   hash_bucket(vport->ops->get_name(vport)));
+			bucket = hash_bucket(ovs_dp_get_net(vport->dp),
+					     vport->ops->get_name(vport));
+			hlist_add_head_rcu(&vport->hash_node, bucket);
 			return vport;
 		}
 	}
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index aac680c..97cef08 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -20,6 +20,7 @@
 #define VPORT_H 1
 
 #include <linux/list.h>
+#include <linux/netlink.h>
 #include <linux/openvswitch.h>
 #include <linux/skbuff.h>
 #include <linux/spinlock.h>
@@ -38,7 +39,7 @@ void ovs_vport_exit(void);
 struct vport *ovs_vport_add(const struct vport_parms *);
 void ovs_vport_del(struct vport *);
 
-struct vport *ovs_vport_locate(const char *name);
+struct vport *ovs_vport_locate(struct net *net, const char *name);
 
 void ovs_vport_get_stats(struct vport *, struct ovs_vport_stats *);
 
-- 
1.7.9.5

^ permalink raw reply related

* [GIT net-next] Open vSwitch
From: Jesse Gross @ 2012-09-04 19:14 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

Two feature additions to Open vSwitch for net-next/3.7.

The following changes since commit 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee:

  Linux 3.6-rc1 (2012-08-02 16:38:10 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git master

for you to fetch changes up to 15eac2a74277bc7de68a7c2a64a7c91b4b6f5961:

  openvswitch: Increase maximum number of datapath ports. (2012-09-03 19:20:49 -0700)

----------------------------------------------------------------
Pravin B Shelar (2):
      openvswitch: Add support for network namespaces.
      openvswitch: Increase maximum number of datapath ports.

 net/openvswitch/actions.c            |    2 +-
 net/openvswitch/datapath.c           |  375 +++++++++++++++++++++-------------
 net/openvswitch/datapath.h           |   50 ++++-
 net/openvswitch/dp_notify.c          |    8 +-
 net/openvswitch/flow.c               |   11 +-
 net/openvswitch/flow.h               |    3 +-
 net/openvswitch/vport-internal_dev.c |    7 +-
 net/openvswitch/vport-netdev.c       |    2 +-
 net/openvswitch/vport.c              |   23 ++-
 net/openvswitch/vport.h              |    7 +-
 10 files changed, 317 insertions(+), 171 deletions(-)

^ permalink raw reply

* Re: [PATCH] i825xx: fix paging fault on znet_probe()
From: David Miller @ 2012-09-04 19:13 UTC (permalink / raw)
  To: fengguang.wu; +Cc: jeffrey.t.kirsher, netdev, linux-kernel
In-Reply-To: <20120902072546.GA20290@localhost>

From: Fengguang Wu <fengguang.wu@intel.com>
Date: Sun, 2 Sep 2012 15:25:46 +0800

> In znet_probe(), strncmp() may access beyond 0x100000 and
> trigger the below oops in kvm.  Fix it by limiting the loop
> under 0x100000-8. I suspect the limit could be further decreased
> to 0x100000-sizeof(struct netidblk), however no datasheet at hand..
 ...
> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>

This also makes the code actually match the description in the comment
above the loop :-)

Applied, thanks.

^ permalink raw reply

* [GIT net] Open vSwitch
From: Jesse Gross @ 2012-09-04 19:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, dev

A few bug fixes intended for net/3.6.

The following changes since commit 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee:

  Linux 3.6-rc1 (2012-08-02 16:38:10 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git fixes

for you to fetch changes up to c303aa94cdae483a7577230e61720e126e600a52:

  openvswitch: Fix FLOW_BUFSIZE definition. (2012-09-03 19:06:27 -0700)

----------------------------------------------------------------
Jesse Gross (2):
      openvswitch: Relax set header validation.
      openvswitch: Fix FLOW_BUFSIZE definition.

Joe Stringer (1):
      openvswitch: Fix typo

 net/openvswitch/actions.c  |    2 +-
 net/openvswitch/datapath.c |    6 +++---
 net/openvswitch/flow.h     |    8 +++++---
 3 files changed, 9 insertions(+), 7 deletions(-)

^ permalink raw reply

* [PATCH net 3/3] openvswitch: Fix FLOW_BUFSIZE definition.
From: Jesse Gross @ 2012-09-04 19:08 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1346785688-2910-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

The vlan encapsulation fields in the maximum flow defintion were
never updated when the representation changed before upstreaming.
In theory this could cause a kernel panic when a maximum length
flow is used.  In practice this has never happened (to my knowledge)
because skb allocations are padded out to a cache line so you would
need the right combination of flow and packet being sent to userspace.

Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/flow.h |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 9b75617..c30df1a 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -145,15 +145,17 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
  *  OVS_KEY_ATTR_IN_PORT       4    --     4      8
  *  OVS_KEY_ATTR_ETHERNET     12    --     4     16
+ *  OVS_KEY_ATTR_ETHERTYPE     2     2     4      8  (outer VLAN ethertype)
  *  OVS_KEY_ATTR_8021Q         4    --     4      8
- *  OVS_KEY_ATTR_ETHERTYPE     2     2     4      8
+ *  OVS_KEY_ATTR_ENCAP         0    --     4      4  (VLAN encapsulation)
+ *  OVS_KEY_ATTR_ETHERTYPE     2     2     4      8  (inner VLAN ethertype)
  *  OVS_KEY_ATTR_IPV6         40    --     4     44
  *  OVS_KEY_ATTR_ICMPV6        2     2     4      8
  *  OVS_KEY_ATTR_ND           28    --     4     32
  *  -------------------------------------------------
- *  total                                       132
+ *  total                                       144
  */
-#define FLOW_BUFSIZE 132
+#define FLOW_BUFSIZE 144

 int ovs_flow_to_nlattrs(const struct sw_flow_key *, struct sk_buff *);
 int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH net 2/3] openvswitch: Fix typo
From: Jesse Gross @ 2012-09-04 19:08 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1346785688-2910-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Joe Stringer <joe-Q1GJJQv1iO6lP80pJB477g@public.gmane.org>

Signed-off-by: Joe Stringer <joe-Q1GJJQv1iO6lP80pJB477g@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/actions.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index f3f96ba..954405c 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -45,7 +45,7 @@ static int make_writable(struct sk_buff *skb, int write_len)
 	return pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
 }
 
-/* remove VLAN header from packet and update csum accrodingly. */
+/* remove VLAN header from packet and update csum accordingly. */
 static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
 {
 	struct vlan_hdr *vhdr;
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH net 1/3] openvswitch: Relax set header validation.
From: Jesse Gross @ 2012-09-04 19:08 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1346785688-2910-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

When installing a flow with an action to set a particular field we
need to validate that the packets that are part of the flow actually
contain that header.  With IP we use zeroed addresses and with TCP/UDP
the check is for zeroed ports.  This check is overly broad and can catch
packets like DHCP requests that have a zero source address in a
legitimate header.  This changes the check to look for a zeroed protocol
number for IP or for both ports be zero for TCP/UDP before considering
the header to not exist.

Reported-by: Ethan Jackson <ethan-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index d8277d2..cf58ced 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -425,10 +425,10 @@ static int validate_sample(const struct nlattr *attr,
 static int validate_tp_port(const struct sw_flow_key *flow_key)
 {
 	if (flow_key->eth.type == htons(ETH_P_IP)) {
-		if (flow_key->ipv4.tp.src && flow_key->ipv4.tp.dst)
+		if (flow_key->ipv4.tp.src || flow_key->ipv4.tp.dst)
 			return 0;
 	} else if (flow_key->eth.type == htons(ETH_P_IPV6)) {
-		if (flow_key->ipv6.tp.src && flow_key->ipv6.tp.dst)
+		if (flow_key->ipv6.tp.src || flow_key->ipv6.tp.dst)
 			return 0;
 	}

@@ -460,7 +460,7 @@ static int validate_set(const struct nlattr *a,
 		if (flow_key->eth.type != htons(ETH_P_IP))
 			return -EINVAL;

-		if (!flow_key->ipv4.addr.src || !flow_key->ipv4.addr.dst)
+		if (!flow_key->ip.proto)
 			return -EINVAL;

 		ipv4_key = nla_data(ovs_key);
-- 
1.7.9.5

^ permalink raw reply related

* Re: [RFC PATCH 0/5] net: socket bind to file descriptor introduced
From: J. Bruce Fields @ 2012-09-04 19:00 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: Eric W. Biederman, tglx@linutronix.de, mingo@redhat.com,
	davem@davemloft.net, hpa@zytor.com,
	thierry.reding@avionic-design.de, bfields@redhat.com,
	eric.dumazet@gmail.com, Pavel Emelianov, neilb@suse.de,
	netdev@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, paul.gortmaker@windriver.com,
	viro@zeniv.linux.org.uk, gorcunov@openvz.org,
	akpm@linux-foundation.org, tim.c.chen@linux.intel.com,
	"devel@ope
In-Reply-To: <50320EE5.10307@parallels.com>

On Mon, Aug 20, 2012 at 02:18:13PM +0400, Stanislav Kinsbursky wrote:
> 16.08.2012 07:03, Eric W. Biederman пишет:
> >Stanislav Kinsbursky <skinsbursky@parallels.com> writes:
> >
> >>This patch set introduces new socket operation and new system call:
> >>sys_fbind(), which allows to bind socket to opened file.
> >>File to bind to can be created by sys_mknod(S_IFSOCK) and opened by
> >>open(O_PATH).
> >>
> >>This system call is especially required for UNIX sockets, which has name
> >>lenght limitation.
> >>
> >>The following series implements...
> >
> >Hmm.  I just realized this patchset is even sillier than I thought.
> >
> >Stanislav is the problem you are ultimately trying to solve nfs clients
> >in a container connecting to the wrong user space rpciod?
> >
> 
> Hi, Eric.
> The problem you mentioned was the reason why I started to think about this.
> But currently I believe, that limitations in unix sockets connect or
> bind should be removed, because it will be useful it least for CRIU
> project.
> 
> >Aka net/sunrpc/xprtsock.c:xs_setup_local only taking an absolute path
> >and then creating a delayed work item to actually open the unix domain
> >socket?
> >
> >The straight correct and straight forward thing to do appears to be:
> >- Capture the root from current->fs in xs_setup_local.
> >- In xs_local_finish_connect change current->fs.root to the captured
> >   version of root before kernel_connect, and restore current->fs.root
> >   after kernel_connect.
> >
> >It might not be a bad idea to implement open on unix domain sockets in
> >a filesystem as create(AF_LOCAL)+connect() which would allow you to
> >replace __sock_create + kernel_connect with a simple file_open_root.
> >
> 
> I like the idea of introducing new family (AF_LOCAL_AT for example)
> and new sockaddr for connecting or binding from specified root. The
> only thing I'm worrying is passing file descriptor to unix bind or
> connect routine. Because this approach doesn't provide easy way to
> use such family and sockaddr in kernel (like in NFS example).
> 
> >But I think the simple scheme of:
> >struct path old_root;
> >old_root = current->fs.root;
> >kernel_connect(...);
> >current->fs.root = old_root;
> >
> >Is more than sufficient and will remove the need for anything
> >except a purely local change to get nfs clients to connect from
> >containers.
> >
> 
> That was my first idea.

So is this what you're planning on doing now?

> And probably it would be worth to change all
> fs_struct to support sockets with relative path.
> What do you think about it?

I didn't understand the question.  Are you suggesting that changes to
fs_struct would be required to make this work?  I don't see why.

--b.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox