netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] packet: handle too big packets for PACKET_V3
@ 2014-08-15  0:09 Eric Dumazet
  2014-08-15  0:36 ` Neil Horman
  2014-08-15  0:43 ` Hannes Frederic Sowa
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Dumazet @ 2014-08-15  0:09 UTC (permalink / raw)
  To: Daniel Borkmann, Neil Horman, Hannes Frederic Sowa,
	Jesper Dangaard Brouer, David Miller
  Cc: netdev

It looks like PACKET_V3 has no check that a packet can always fit in a
block.

Its trivial with GRO to break the assumption and write into kernel
memory.

[  840.989881] BUG: unable to handle kernel paging req0
[  840.997667] IP: [<ffffffff9ed17466>] memcpy+0x6/0x110
[  841.003314] PGD 2068b067 PUD 20371d3067 PMD 20370e7067 PTE 80000001dd6b2060
[  841.011120] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[  841.016199] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 xt_NFLOG nfnetlink_log nfo
[  841.060739] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.16
[  841.077824] task: ffff880119ad2580 ti: ffff880119ad8000 task.ti: ffff880119ad8000
[  841.086173] RIP: 0010:[<ffffffff9ed17466>]  [<ffffffff9ed17466>] memcpy+0x6/0x110
[  841.094531] RSP: 0018:ffff880fffac3b48  EFLAGS: 00010286
[  841.100461] RAX: ffff8801dd6b1b9c RBX: 0000000000000a74 RCX: 00000000000000d6
[  841.108412] RDX: 000000000000053a RSI: ffff880132a769ce RDI: ffff8801dd6b2000
[  841.116373] RBP: ffff880fffac3bb0 R08: 0000000000000ff0 R09: ffff8801dd6b1b9c
[  841.124332] R10: ffff8801bb529f00 R11: 000000000000053a R12: 0000000000000ab6
[  841.132292] R13: ffff880119ad8000 R14: 0000000000000001 R15: ffff8801bb529f00
[  841.140252] FS:  0000000000000000(0000) GS:ffff880fffac0000(0000) knlGS:0000000000000000
[  841.149279] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  841.155689] CR2: ffff8801dd6b2000 CR3: 000000001f40e000 CR4: 00000000001427e0
[  841.163649] Stack:
[  841.165888]  ffffffff9eed1a15 ffff8801e1d06800 ffff88010000053a 000000000000053a
[  841.174176]  ffff8801dd6b1b9c ffff880100000ff0 ffff880119adbfd8 ffff880132a755b6
[  841.182466]  ffff8801bb529f00 ffff8801e1d06800 000000000000151c ffff880132a755b6
[  841.190764] Call Trace:
[  841.193490]  <IRQ>
[  841.195632]  [<ffffffff9eed1a15>] ? skb_copy_bits+0x145/0x290
[  841.202272]  [<ffffffff9ef95899>] tpacket_rcv+0x249/0x820
[  841.208295]  [<ffffffff9eee3358>] __netif_receive_skb_core+0x388/0x950
[  841.215578]  [<ffffffff9eee30fb>] ? __netif_receive_skb_core+0x12b/0x950
[  841.223054]  [<ffffffff9eee3941>] __netif_receive_skb+0x21/0x70
[  841.229658]  [<ffffffff9eee3b35>] netif_receive_skb+0x25/0x120
[  841.236166]  [<ffffffff9eee3b60>] ? netif_receive_skb+0x50/0x120
[  841.242869]  [<ffffffff9ef67b5e>] ? inet_gro_receive+0x6e/0x290
[  841.249474]  [<ffffffff9eee3d6e>] napi_gro_complete.isra.77+0x13e/0x160
[  841.256853]  [<ffffffff9eee3c60>] ? napi_gro_complete.isra.77+0x30/0x160
[  841.264331]  [<ffffffffc03dc0bd>] ? gro_cell_poll+0x8d/0xd0 [ip_tunnel]
[  841.271711]  [<ffffffff9eee4178>] napi_gro_flush+0x78/0xa0
[  841.277830]  [<ffffffff9eee41d3>] napi_complete+0x33/0x80
[  841.283852]  [<ffffffffc03dc0d4>] gro_cell_poll+0xa4/0xd0 [ip_tunnel]
[  841.291038]  [<ffffffff9eee435a>] net_rx_action+0x13a/0x240
[  841.297254]  [<ffffffff9ea575c0>] __do_softirq+0x100/0x290
[  841.303374]  [<ffffffff9efb1b0c>] call_softirq+0x1c/0x30
[  841.309300]  [<ffffffff9ea0bdbd>] do_softirq+0x8d/0xc0
[  841.315032]  [<ffffffff9ea5791d>] irq_exit+0xcd/0xd0
[  841.320571]  [<ffffffff9ea05db3>] do_IRQ+0x63/0xe0
[  841.325915]  [<ffffffff9efa762f>] common_interrupt+0x6f/0x6f

Not sure how to fix this.

This patch only shows where the problem is, but should we :

1) drop the too long packet
2) clamp size to maximal admissible size
3) other solution ? (PACKET_V2 can queue a clone of skb in
receive_queue, but PACKET_V3 has no such capability)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8d9f8042705a..6ee072b746fb 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1073,6 +1073,11 @@ static void *__packet_lookup_frame_in_block(struct packet_sock *po,
 		return (void *)curr;
 	}
 
+	/* If frame do not fit on a single block, bail out */
+	if (BLK_PLUS_PRIV(pkc->blk_sizeof_priv) +
+	    TOTAL_PKT_LEN_INCL_ALIGN(len) >= pkc->kblk_size)
+		return NULL;
+
 	/* Ok, close the current block */
 	prb_retire_current_block(pkc, po, 0);
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  0:09 [RFC] packet: handle too big packets for PACKET_V3 Eric Dumazet
@ 2014-08-15  0:36 ` Neil Horman
  2014-08-15  0:43 ` Hannes Frederic Sowa
  1 sibling, 0 replies; 18+ messages in thread
From: Neil Horman @ 2014-08-15  0:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Borkmann, Hannes Frederic Sowa, Jesper Dangaard Brouer,
	David Miller, netdev

On Thu, Aug 14, 2014 at 05:09:54PM -0700, Eric Dumazet wrote:
> It looks like PACKET_V3 has no check that a packet can always fit in a
> block.
> 
> Its trivial with GRO to break the assumption and write into kernel
> memory.
> 
> [  840.989881] BUG: unable to handle kernel paging req0
> [  840.997667] IP: [<ffffffff9ed17466>] memcpy+0x6/0x110
> [  841.003314] PGD 2068b067 PUD 20371d3067 PMD 20370e7067 PTE 80000001dd6b2060
> [  841.011120] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
> [  841.016199] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 xt_NFLOG nfnetlink_log nfo
> [  841.060739] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.16
> [  841.077824] task: ffff880119ad2580 ti: ffff880119ad8000 task.ti: ffff880119ad8000
> [  841.086173] RIP: 0010:[<ffffffff9ed17466>]  [<ffffffff9ed17466>] memcpy+0x6/0x110
> [  841.094531] RSP: 0018:ffff880fffac3b48  EFLAGS: 00010286
> [  841.100461] RAX: ffff8801dd6b1b9c RBX: 0000000000000a74 RCX: 00000000000000d6
> [  841.108412] RDX: 000000000000053a RSI: ffff880132a769ce RDI: ffff8801dd6b2000
> [  841.116373] RBP: ffff880fffac3bb0 R08: 0000000000000ff0 R09: ffff8801dd6b1b9c
> [  841.124332] R10: ffff8801bb529f00 R11: 000000000000053a R12: 0000000000000ab6
> [  841.132292] R13: ffff880119ad8000 R14: 0000000000000001 R15: ffff8801bb529f00
> [  841.140252] FS:  0000000000000000(0000) GS:ffff880fffac0000(0000) knlGS:0000000000000000
> [  841.149279] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  841.155689] CR2: ffff8801dd6b2000 CR3: 000000001f40e000 CR4: 00000000001427e0
> [  841.163649] Stack:
> [  841.165888]  ffffffff9eed1a15 ffff8801e1d06800 ffff88010000053a 000000000000053a
> [  841.174176]  ffff8801dd6b1b9c ffff880100000ff0 ffff880119adbfd8 ffff880132a755b6
> [  841.182466]  ffff8801bb529f00 ffff8801e1d06800 000000000000151c ffff880132a755b6
> [  841.190764] Call Trace:
> [  841.193490]  <IRQ>
> [  841.195632]  [<ffffffff9eed1a15>] ? skb_copy_bits+0x145/0x290
> [  841.202272]  [<ffffffff9ef95899>] tpacket_rcv+0x249/0x820
> [  841.208295]  [<ffffffff9eee3358>] __netif_receive_skb_core+0x388/0x950
> [  841.215578]  [<ffffffff9eee30fb>] ? __netif_receive_skb_core+0x12b/0x950
> [  841.223054]  [<ffffffff9eee3941>] __netif_receive_skb+0x21/0x70
> [  841.229658]  [<ffffffff9eee3b35>] netif_receive_skb+0x25/0x120
> [  841.236166]  [<ffffffff9eee3b60>] ? netif_receive_skb+0x50/0x120
> [  841.242869]  [<ffffffff9ef67b5e>] ? inet_gro_receive+0x6e/0x290
> [  841.249474]  [<ffffffff9eee3d6e>] napi_gro_complete.isra.77+0x13e/0x160
> [  841.256853]  [<ffffffff9eee3c60>] ? napi_gro_complete.isra.77+0x30/0x160
> [  841.264331]  [<ffffffffc03dc0bd>] ? gro_cell_poll+0x8d/0xd0 [ip_tunnel]
> [  841.271711]  [<ffffffff9eee4178>] napi_gro_flush+0x78/0xa0
> [  841.277830]  [<ffffffff9eee41d3>] napi_complete+0x33/0x80
> [  841.283852]  [<ffffffffc03dc0d4>] gro_cell_poll+0xa4/0xd0 [ip_tunnel]
> [  841.291038]  [<ffffffff9eee435a>] net_rx_action+0x13a/0x240
> [  841.297254]  [<ffffffff9ea575c0>] __do_softirq+0x100/0x290
> [  841.303374]  [<ffffffff9efb1b0c>] call_softirq+0x1c/0x30
> [  841.309300]  [<ffffffff9ea0bdbd>] do_softirq+0x8d/0xc0
> [  841.315032]  [<ffffffff9ea5791d>] irq_exit+0xcd/0xd0
> [  841.320571]  [<ffffffff9ea05db3>] do_IRQ+0x63/0xe0
> [  841.325915]  [<ffffffff9efa762f>] common_interrupt+0x6f/0x6f
> 
> Not sure how to fix this.
> 
> This patch only shows where the problem is, but should we :
> 
> 1) drop the too long packet
> 2) clamp size to maximal admissible size
> 3) other solution ? (PACKET_V2 can queue a clone of skb in
> receive_queue, but PACKET_V3 has no such capability)
> 

Can we separate a gro packet into its constituent pieces and write them
into/across blocks accordingly?  Or alternatively can we add a bit to the
tp_status field to indicate that a packet may span a block?  That probably
breaks applications I would imagine, but maybe we can create a sockopt to allow
applications to opt in to that behavior to give them the choice between that,
and a simple packet drop.

Neil


> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 8d9f8042705a..6ee072b746fb 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1073,6 +1073,11 @@ static void *__packet_lookup_frame_in_block(struct packet_sock *po,
>  		return (void *)curr;
>  	}
>  
> +	/* If frame do not fit on a single block, bail out */
> +	if (BLK_PLUS_PRIV(pkc->blk_sizeof_priv) +
> +	    TOTAL_PKT_LEN_INCL_ALIGN(len) >= pkc->kblk_size)
> +		return NULL;
> +
>  	/* Ok, close the current block */
>  	prb_retire_current_block(pkc, po, 0);
>  
> 
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  0:09 [RFC] packet: handle too big packets for PACKET_V3 Eric Dumazet
  2014-08-15  0:36 ` Neil Horman
@ 2014-08-15  0:43 ` Hannes Frederic Sowa
  2014-08-15  0:50   ` Hannes Frederic Sowa
  2014-08-15  0:54   ` Eric Dumazet
  1 sibling, 2 replies; 18+ messages in thread
From: Hannes Frederic Sowa @ 2014-08-15  0:43 UTC (permalink / raw)
  To: Eric Dumazet, Daniel Borkmann, Neil Horman,
	Jesper Dangaard Brouer, David Miller
  Cc: netdev

Hi Eric,

On Fri, Aug 15, 2014, at 02:09, Eric Dumazet wrote:
> It looks like PACKET_V3 has no check that a packet can always fit in a
> block.
> 
> Its trivial with GRO to break the assumption and write into kernel
> memory.
> 
> [...]
>
> Not sure how to fix this.
> 
> This patch only shows where the problem is, but should we :
> 
> 1) drop the too long packet

Someone could use GRO to create packet trains to hide from intrustion
detection systems, which maybe are the main user of TPACKET_V3. I don't
think this is a good idea.

> 2) clamp size to maximal admissible size

Maybe.

> 3) other solution ? (PACKET_V2 can queue a clone of skb in
> receive_queue, but PACKET_V3 has no such capability)

4) Can we still try to skb_gso_segment the packet again? Not nice, but I
guess this will work. Maybe depending on a tunable (default to on)?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  0:43 ` Hannes Frederic Sowa
@ 2014-08-15  0:50   ` Hannes Frederic Sowa
  2014-08-15  0:56     ` Eric Dumazet
  2014-08-15  0:54   ` Eric Dumazet
  1 sibling, 1 reply; 18+ messages in thread
From: Hannes Frederic Sowa @ 2014-08-15  0:50 UTC (permalink / raw)
  To: Eric Dumazet, Daniel Borkmann, Neil Horman,
	Jesper Dangaard Brouer, David Miller
  Cc: netdev

Another thought...

On Fri, Aug 15, 2014, at 02:43, Hannes Frederic Sowa wrote:
> On Fri, Aug 15, 2014, at 02:09, Eric Dumazet wrote:
> > It looks like PACKET_V3 has no check that a packet can always fit in a
> > block.
> > 
> > Its trivial with GRO to break the assumption and write into kernel
> > memory.
>
> [...]
>
> 4) Can we still try to skb_gso_segment the packet again? Not nice, but I
> guess this will work. Maybe depending on a tunable (default to on)?

5) Or let a setsockopt control the NETIF_F_GRO bit of an interface if it
is bound to it? This will likely result in an implementation where we
must count current TPACKET_V3 sockets disabling GRO on an interface,
much like promisc counter.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  0:43 ` Hannes Frederic Sowa
  2014-08-15  0:50   ` Hannes Frederic Sowa
@ 2014-08-15  0:54   ` Eric Dumazet
  2014-08-15  1:04     ` Hannes Frederic Sowa
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2014-08-15  0:54 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Neil Horman, Jesper Dangaard Brouer,
	David Miller, netdev

On Fri, 2014-08-15 at 02:43 +0200, Hannes Frederic Sowa wrote:

> Someone could use GRO to create packet trains to hide from intrustion
> detection systems, which maybe are the main user of TPACKET_V3. I don't
> think this is a good idea.

Presumably these tools already use a large enough bloc_size, and not a
4KB one ;)

Even without GRO, a jumbo frame (9K) can trigger the bug.

I do not think we need to skb_gso_segment() for the cases user setup a
really small bloc_size. This looks like a lot of consumed cycles (we
even might have to recompute the TCP checksums)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  0:50   ` Hannes Frederic Sowa
@ 2014-08-15  0:56     ` Eric Dumazet
  2014-08-15 10:19       ` Neil Horman
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2014-08-15  0:56 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Neil Horman, Jesper Dangaard Brouer,
	David Miller, netdev

On Fri, 2014-08-15 at 02:50 +0200, Hannes Frederic Sowa wrote:

> 5) Or let a setsockopt control the NETIF_F_GRO bit of an interface if it
> is bound to it? This will likely result in an implementation where we
> must count current TPACKET_V3 sockets disabling GRO on an interface,
> much like promisc counter.

loopback interface has MTU = 64K, it looks like the problem is not tied
to GRO.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  0:54   ` Eric Dumazet
@ 2014-08-15  1:04     ` Hannes Frederic Sowa
  2014-08-15  2:01       ` Eric Dumazet
  2014-08-15  4:54       ` [RFC] " Guy Harris
  0 siblings, 2 replies; 18+ messages in thread
From: Hannes Frederic Sowa @ 2014-08-15  1:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Borkmann, Neil Horman, Jesper Dangaard Brouer,
	David Miller, netdev

Hi,

On Fri, Aug 15, 2014, at 02:54, Eric Dumazet wrote:
> On Fri, 2014-08-15 at 02:43 +0200, Hannes Frederic Sowa wrote:
> 
> > Someone could use GRO to create packet trains to hide from intrustion
> > detection systems, which maybe are the main user of TPACKET_V3. I don't
> > think this is a good idea.
> 
> Presumably these tools already use a large enough bloc_size, and not a
> 4KB one ;)
> 
> Even without GRO, a jumbo frame (9K) can trigger the bug.

Sure, but if I would have written such a tool without knowledge of GRO I
would have queried at least the MTU. ;)
If an IDS allocates block_sizes below the MTU there is not much we can
do. But up to the MTU we should let GRO behave transparently and here we
violate this. There are also interfaces which extremely large MTUs but
at least they report the MTU size correctly to user space.

> I do not think we need to skb_gso_segment() for the cases user setup a
> really small bloc_size. This looks like a lot of consumed cycles (we
> even might have to recompute the TCP checksums)

Yes, I feared that, too. Attacker could specifically target a slow path.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  1:04     ` Hannes Frederic Sowa
@ 2014-08-15  2:01       ` Eric Dumazet
  2014-08-15  2:08         ` Hannes Frederic Sowa
  2014-08-15  5:02         ` Guy Harris
  2014-08-15  4:54       ` [RFC] " Guy Harris
  1 sibling, 2 replies; 18+ messages in thread
From: Eric Dumazet @ 2014-08-15  2:01 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Neil Horman, Jesper Dangaard Brouer,
	David Miller, netdev

On Fri, 2014-08-15 at 03:04 +0200, Hannes Frederic Sowa wrote:

> Sure, but if I would have written such a tool without knowledge of GRO I
> would have queried at least the MTU. ;)

Would all existing tools react to a mtu change properly ?

(It would have to reinit its af_packet ring)

I do not think its a GRO issue, really, but an optimistic PACKET_V3
implementation. GRO was already there when PACKET_V3 was added.

Even a non GRO packet might not fit and we need to avoid the
crash/corruption.

I believe I am going to implement 2)  (clamp the snaplen)

An application should catch that tp_len might be bigger than tp_snaplen
and eventually do something sensible about it.

BTW, tp_sizeof_priv doesnt look to be checked at all, user input is
accepted as is.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  2:01       ` Eric Dumazet
@ 2014-08-15  2:08         ` Hannes Frederic Sowa
  2014-08-15  5:02         ` Guy Harris
  1 sibling, 0 replies; 18+ messages in thread
From: Hannes Frederic Sowa @ 2014-08-15  2:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Borkmann, Neil Horman, Jesper Dangaard Brouer,
	David Miller, netdev

On Fri, Aug 15, 2014, at 04:01, Eric Dumazet wrote:
> On Fri, 2014-08-15 at 03:04 +0200, Hannes Frederic Sowa wrote:
> 
> > Sure, but if I would have written such a tool without knowledge of GRO I
> > would have queried at least the MTU. ;)
> 
> Would all existing tools react to a mtu change properly ?
> 
> (It would have to reinit its af_packet ring)
> 
> I do not think its a GRO issue, really, but an optimistic PACKET_V3
> implementation. GRO was already there when PACKET_V3 was added.
> 
> Even a non GRO packet might not fit and we need to avoid the
> crash/corruption.
> 
> I believe I am going to implement 2)  (clamp the snaplen)

Ok, cool, I think this is the best way forward for now.

Do you think a a printk_once if the packet gets clamped because of GRO
would be sensible with a hint that GRO could be disabled in such cases?

> An application should catch that tp_len might be bigger than tp_snaplen
> and eventually do something sensible about it.
> 
> BTW, tp_sizeof_priv doesnt look to be checked at all, user input is
> accepted as is.

Hmm...

Bye,
Hannes

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  1:04     ` Hannes Frederic Sowa
  2014-08-15  2:01       ` Eric Dumazet
@ 2014-08-15  4:54       ` Guy Harris
  2014-08-15 11:37         ` Hannes Frederic Sowa
  1 sibling, 1 reply; 18+ messages in thread
From: Guy Harris @ 2014-08-15  4:54 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Eric Dumazet, Daniel Borkmann, Neil Horman,
	Jesper Dangaard Brouer, David Miller, netdev


On Aug 14, 2014, at 6:04 PM, Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:

> On Fri, Aug 15, 2014, at 02:54, Eric Dumazet wrote:
>> On Fri, 2014-08-15 at 02:43 +0200, Hannes Frederic Sowa wrote:
>> 
>>> Someone could use GRO to create packet trains to hide from intrustion
>>> detection systems, which maybe are the main user of TPACKET_V3. I don't
>>> think this is a good idea.
>> 
>> Presumably these tools already use a large enough bloc_size, and not a
>> 4KB one ;)
>> 
>> Even without GRO, a jumbo frame (9K) can trigger the bug.
> 
> Sure, but if I would have written such a tool without knowledge of GRO I
> would have queried at least the MTU. ;)

...and then queried the maximum size of the headers that precede the link-layer payload.

Except that you *can't* do that, and it can be variable-length without an obvious maximum (think 802.11 in monitor mode, where you have radiotap headers).  This causes much pain for libpcap when using TPACKET_V1 and TPACKET_V2, forcing it to allocate huge blocks when smaller ones might be sufficient.

> If an IDS allocates block_sizes below the MTU there is not much we can
> do.

If an IDS uses libpcap, it will get libpcap's behavior, which, for TPACKET_V3 is, roughly

	req.tp_frame_size = MAXIMUM_SNAPLEN;
	req.tp_frame_nr = handle->opt.buffer_size/req.tp_frame_size;

	/* compute the minumum block size that will handle this frame. 
	 * The block has to be page size aligned. 
	 * The max block size allowed by the kernel is arch-dependent and 
	 * it's not explicitly checked here. */
	req.tp_block_size = getpagesize();
	while (req.tp_block_size < req.tp_frame_size) 
		req.tp_block_size <<= 1;

	frames_per_block = req.tp_block_size/req.tp_frame_size;

	req.tp_block_nr = req.tp_frame_nr / frames_per_block;

	/* req.tp_frame_nr is requested to match frames_per_block*req.tp_block_nr */
	req.tp_frame_nr = req.tp_block_nr * frames_per_block;

(the last two C statements are actually part of a loop, where it'll reduce req.tp_frame_nr if it gets told "I don't have enough room for that big a ring").

MAXIMUM_SNAPLEN is 65535 in older versions of libpcap and 262144 in newer versions; it's the maximum frame size.

handle->opt.buffer_size is the buffer size requested by the application; it defaults to 2 MiB.

That calculation, with the default values for the latest version of libpcap, ends up with:

	req.tp_frame_size = 262144;
	req.tp_frame_nr = /* 2097152/262144 */ 8;

	/* compute the minumum block size that will handle this frame. 
	 * The block has to be page size aligned. 
	 * The max block size allowed by the kernel is arch-dependent and 
	 * it's not explicitly checked here. */
	req.tp_block_size = 4096;	/* IA-32 and x86-64, and probably many others */
	while (req.tp_block_size < req.tp_frame_size) 
		req.tp_block_size <<= 1;
	/* ends up with req.tp_block_size = 262144 */

	frames_per_block = /* 262144/262144 */ 1;

	req.tp_block_nr = /* 8 / 1 */ 8;

	/* req.tp_frame_nr is requested to match frames_per_block*req.tp_block_nr */
	req.tp_frame_nr = /* 8 * 1 / 8;

which I think means "8 256 KiB blocks".

> But up to the MTU we should let GRO behave transparently and here we
> violate this. There are also interfaces which extremely large MTUs but
> at least they report the MTU size correctly to user space.

Reporting the MTU to user space is insufficient for libpcap; it needs to know the *maximum packet size*, which includes link-layer headers (which I guess it could compute based on the ARPHRD_ for the interface, although for 802.11 that may be subject to change, e.g. the maximum link-layer header length grew when the QoS stuff was added) *and* metadata headers (such as radiotap, for which there really is no generic maximum length other than the 65535 byte limit imposed by the header length field being 16 bits, but a given driver can presumably return its maximum).

It also doesn't help with segmentation/reassembly offloading, where the MTU is decoupled from the maximum packet size.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  2:01       ` Eric Dumazet
  2014-08-15  2:08         ` Hannes Frederic Sowa
@ 2014-08-15  5:02         ` Guy Harris
  2014-08-15 16:00           ` Eric Dumazet
  2014-08-15 16:16           ` [PATCH v1 net] " Eric Dumazet
  1 sibling, 2 replies; 18+ messages in thread
From: Guy Harris @ 2014-08-15  5:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Neil Horman,
	Jesper Dangaard Brouer, David Miller, netdev


On Aug 14, 2014, at 7:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> I believe I am going to implement 2)  (clamp the snaplen)

If you mean "a packet that's too big to fit into a single block will be cut off at the size of the block, even if the user-requested snaplen is bigger than that", I, at least, have no problem whatsoever with that from libpcap's point of view.  We, the libpcap developers, could just boost the maximum snaplen if necessary.  (Disclaimer: I don't speak for all developers or users of libpcap.)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  0:56     ` Eric Dumazet
@ 2014-08-15 10:19       ` Neil Horman
  0 siblings, 0 replies; 18+ messages in thread
From: Neil Horman @ 2014-08-15 10:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Jesper Dangaard Brouer,
	David Miller, netdev

On Thu, Aug 14, 2014 at 05:56:20PM -0700, Eric Dumazet wrote:
> On Fri, 2014-08-15 at 02:50 +0200, Hannes Frederic Sowa wrote:
> 
> > 5) Or let a setsockopt control the NETIF_F_GRO bit of an interface if it
> > is bound to it? This will likely result in an implementation where we
> > must count current TPACKET_V3 sockets disabling GRO on an interface,
> > much like promisc counter.
> 
> loopback interface has MTU = 64K, it looks like the problem is not tied
> to GRO.
> 
Can we add a listener to the af packet code to resize the block length of the
socket based on the MTU?  

I realize that overloads user configuration, but if we warn them...

Neil
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  4:54       ` [RFC] " Guy Harris
@ 2014-08-15 11:37         ` Hannes Frederic Sowa
  0 siblings, 0 replies; 18+ messages in thread
From: Hannes Frederic Sowa @ 2014-08-15 11:37 UTC (permalink / raw)
  To: Guy Harris
  Cc: Eric Dumazet, Daniel Borkmann, Neil Horman,
	Jesper Dangaard Brouer, David Miller, netdev

Hi,

On Do, 2014-08-14 at 21:54 -0700, Guy Harris wrote:
> On Aug 14, 2014, at 6:04 PM, Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> 
> > On Fri, Aug 15, 2014, at 02:54, Eric Dumazet wrote:
> >> On Fri, 2014-08-15 at 02:43 +0200, Hannes Frederic Sowa wrote:
> >> 
> >>> Someone could use GRO to create packet trains to hide from intrustion
> >>> detection systems, which maybe are the main user of TPACKET_V3. I don't
> >>> think this is a good idea.
> >> 
> >> Presumably these tools already use a large enough bloc_size, and not a
> >> 4KB one ;)
> >> 
> >> Even without GRO, a jumbo frame (9K) can trigger the bug.
> > 
> > Sure, but if I would have written such a tool without knowledge of GRO I
> > would have queried at least the MTU. ;)
> 
> ...and then queried the maximum size of the headers that precede the link-layer payload.

But those are mostly constant, no?

> Except that you *can't* do that, and it can be variable-length without an obvious maximum (think 802.11 in monitor mode, where you have radiotap headers).  This causes much pain for libpcap when using TPACKET_V1 and TPACKET_V2, forcing it to allocate huge blocks when smaller ones might be sufficient.
> > If an IDS allocates block_sizes below the MTU there is not much we can
> > do.
> 
> If an IDS uses libpcap, it will get libpcap's behavior, which, for TPACKET_V3 is, roughly
> 
> 	req.tp_frame_size = MAXIMUM_SNAPLEN;
> 	req.tp_frame_nr = handle->opt.buffer_size/req.tp_frame_size;
> 
> 	/* compute the minumum block size that will handle this frame. 
> 	 * The block has to be page size aligned. 
> 	 * The max block size allowed by the kernel is arch-dependent and 
> 	 * it's not explicitly checked here. */
> 	req.tp_block_size = getpagesize();
> 	while (req.tp_block_size < req.tp_frame_size) 
> 		req.tp_block_size <<= 1;
> 
> 	frames_per_block = req.tp_block_size/req.tp_frame_size;
> 
> 	req.tp_block_nr = req.tp_frame_nr / frames_per_block;
> 
> 	/* req.tp_frame_nr is requested to match frames_per_block*req.tp_block_nr */
> 	req.tp_frame_nr = req.tp_block_nr * frames_per_block;
> 
> (the last two C statements are actually part of a loop, where it'll reduce req.tp_frame_nr if it gets told "I don't have enough room for that big a ring").
> 
> MAXIMUM_SNAPLEN is 65535 in older versions of libpcap and 262144 in newer versions; it's the maximum frame size.
> 
> handle->opt.buffer_size is the buffer size requested by the application; it defaults to 2 MiB.
> 
> That calculation, with the default values for the latest version of libpcap, ends up with:
> 
> 	req.tp_frame_size = 262144;
> 	req.tp_frame_nr = /* 2097152/262144 */ 8;
> 
> 	/* compute the minumum block size that will handle this frame. 
> 	 * The block has to be page size aligned. 
> 	 * The max block size allowed by the kernel is arch-dependent and 
> 	 * it's not explicitly checked here. */
> 	req.tp_block_size = 4096;	/* IA-32 and x86-64, and probably many others */
> 	while (req.tp_block_size < req.tp_frame_size) 
> 		req.tp_block_size <<= 1;
> 	/* ends up with req.tp_block_size = 262144 */
> 
> 	frames_per_block = /* 262144/262144 */ 1;
> 
> 	req.tp_block_nr = /* 8 / 1 */ 8;
> 
> 	/* req.tp_frame_nr is requested to match frames_per_block*req.tp_block_nr */
> 	req.tp_frame_nr = /* 8 * 1 / 8;
> 
> which I think means "8 256 KiB blocks".
> 
> > But up to the MTU we should let GRO behave transparently and here we
> > violate this. There are also interfaces which extremely large MTUs but
> > at least they report the MTU size correctly to user space.
> 
> Reporting the MTU to user space is insufficient for libpcap; it needs to know the *maximum packet size*, which includes link-layer headers (which I guess it could compute based on the ARPHRD_ for the interface, although for 802.11 that may be subject to change, e.g. the maximum link-layer header length grew when the QoS stuff was added) *and* metadata headers (such as radiotap, for which there really is no generic maximum length other than the 65535 byte limit imposed by the header length field being 16 bits, but a given driver can presumably return its maximum).
> 
> It also doesn't help with segmentation/reassembly offloading, where the MTU is decoupled from the maximum packet size.

Thanks, that answered the question above. Still, I think that the most
commonly use-cases would get covered by querying the MTU. libpcap needs
to be much more general and so must pay more attention.

Also, IMHO GRO is something special and does not fit that model very
well. But I am also fine with clamping. If people complain, we can
revisit that topic.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] packet: handle too big packets for PACKET_V3
  2014-08-15  5:02         ` Guy Harris
@ 2014-08-15 16:00           ` Eric Dumazet
  2014-08-15 16:16           ` [PATCH v1 net] " Eric Dumazet
  1 sibling, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2014-08-15 16:00 UTC (permalink / raw)
  To: Guy Harris
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Neil Horman,
	Jesper Dangaard Brouer, David Miller, netdev

On Thu, 2014-08-14 at 22:02 -0700, Guy Harris wrote:
> On Aug 14, 2014, at 7:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > I believe I am going to implement 2)  (clamp the snaplen)
> 
> If you mean "a packet that's too big to fit into a single block will be cut off at the size of the block, even if the user-requested snaplen is bigger than that", I, at least, have no problem whatsoever with that from libpcap's point of view.  We, the libpcap developers, could just boost the maximum snaplen if necessary.  (Disclaimer: I don't speak for all developers or users of libpcap.)--


Note that libpcap has no problem as far as I know.

Following patch can demonstrate the problem using the
tools/testing/selftests/net/psock_tpacket included in linux kernel
sources.

Running it even on a non debug enabled kernel gave me :

# ./psock_tpacket 
test: TPACKET_V3 with PACKET_RX_RING ............
prev_block_seq_num:116, expected seq:117 != actual seq:7016996765293437281
# [78853.093626] general protection fault: 0000 [#1] SMP 
[78853.098634] Modules linked in: ...
[78853.111390] CPU: 8 PID: 1867 Comm: kworker/8:1 Not tainted 3.16.0-smp-DEV #722
[78853.118615] Hardware name: ...
[78853.125511] Workqueue: events cache_reap
[78853.129448] task: ffff8810565a6a70 ti: ffff881055c30000 task.ti: ffff881055c30000
[78853.136936] RIP: 0010:[<ffffffff811b9b3e>]  [<ffffffff811b9b3e>] cache_reap+0x10e/0x240
[78853.144980] RSP: 0018:ffff881055c33d98  EFLAGS: 00010202
[78853.150303] RAX: ffff881056a2f140 RBX: ffff8810569ee740 RCX: 0000000000000000
[78853.157434] RDX: 0000000000000000 RSI: ffff881057975b40 RDI: ffff880856afc040
[78853.164567] RBP: ffff881055c33de8 R08: 0000000000000001 R09: 0000000000000000
[78853.171715] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[78853.178846] R13: 0000000000000008 R14: 6161616161616161 R15: ffff881056e5d580
[78853.185979] FS:  0000000000000000(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000
[78853.194073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[78853.199819] CR2: 000000000aeff000 CR3: 0000000001a14000 CR4: 00000000000407e0
[78853.206944] Stack:
[78853.208970]  0000000000000000 ffff88107fc0f120 ffff88107fc12080 ffff88107fc0f040
[78853.216423]  ffff881055c33dd8 ffff881057a53d00 ffff88107fc12080 ffff88107fc0f120
[78853.223875]  0000000000000000 0000000000000200 ffff881055c33e38 ffffffff810b2c3c
[78853.231334] Call Trace:
[78853.233805]  [<ffffffff810b2c3c>] process_one_work+0x18c/0x3f0
[78853.239626]  [<ffffffff810b32b3>] worker_thread+0x123/0x440
[78853.245198]  [<ffffffff810b3190>] ? rescuer_thread+0x2b0/0x2b0
[78853.251047]  [<ffffffff810b89d9>] kthread+0xc9/0xe0
[78853.255926]  [<ffffffff810b8910>] ? kthread_worker_fn+0x150/0x150
[78853.262036]  [<ffffffff8159d06c>] ret_from_fork+0x7c/0xb0
[78853.267433]  [<ffffffff810b8910>] ? kthread_worker_fn+0x150/0x150
[78853.273524] Code: 00 65 48 8b 14 25 a0 f1 00 00 48 63 ca 4a 8b 1c 28 48 8b 43 50 48 85 c0 0f 84 5f ff ff ff 4c 8b 34 c8 4d 85 f6 0f 84 52 ff ff ff <45> 8b 5e 08 45 85 db 0f 84 45 ff ff ff fa 4
c 89 f7 48 89 55 b0 
[78853.293487] RIP  [<ffffffff811b9b3e>] cache_reap+0x10e/0x240
[78853.299153]  RSP <ffff881055c33d98>



 tools/testing/selftests/net/psock_lib.h     |   11 ++---------
 tools/testing/selftests/net/psock_tpacket.c |   10 ++--------
 2 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/net/psock_lib.h b/tools/testing/selftests/net/psock_lib.h
index 37da54ac85a9..40197779868f 100644
--- a/tools/testing/selftests/net/psock_lib.h
+++ b/tools/testing/selftests/net/psock_lib.h
@@ -28,7 +28,7 @@
 #include <arpa/inet.h>
 #include <unistd.h>
 
-#define DATA_LEN			100
+#define DATA_LEN			5000
 #define DATA_CHAR			'a'
 
 #define PORT_BASE			8000
@@ -40,14 +40,7 @@
 static __maybe_unused void pair_udp_setfilter(int fd)
 {
 	struct sock_filter bpf_filter[] = {
-		{ 0x80, 0, 0, 0x00000000 },  /* LD  pktlen		      */
-		{ 0x35, 0, 5, DATA_LEN   },  /* JGE DATA_LEN  [f goto nomatch]*/
-		{ 0x30, 0, 0, 0x00000050 },  /* LD  ip[80]		      */
-		{ 0x15, 0, 3, DATA_CHAR  },  /* JEQ DATA_CHAR [f goto nomatch]*/
-		{ 0x30, 0, 0, 0x00000051 },  /* LD  ip[81]		      */
-		{ 0x15, 0, 1, DATA_CHAR  },  /* JEQ DATA_CHAR [f goto nomatch]*/
-		{ 0x06, 0, 0, 0x00000060 },  /* RET match	              */
-		{ 0x06, 0, 0, 0x00000000 },  /* RET no match		      */
+		{ 0x06, 0, 0, 0x0000ffff },  /* RET match	              */
 	};
 	struct sock_fprog bpf_prog;
 
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 24adf709bd9d..f87f86f574f7 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -354,7 +354,7 @@ static void walk_v1_v2_tx(int sock, struct ring *ring)
 	int rcv_sock, ret;
 	size_t packet_len;
 	union frame_map ppd;
-	char packet[1024];
+	char packet[65536];
 	unsigned int frame_num = 0, got = 0;
 	struct sockaddr_ll ll = {
 		.sll_family = PF_PACKET,
@@ -608,7 +608,7 @@ static void __v3_fill(struct ring *ring, unsigned int blocks)
 	ring->req3.tp_sizeof_priv = 0;
 	ring->req3.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
 
-	ring->req3.tp_block_size = getpagesize() << 2;
+	ring->req3.tp_block_size = getpagesize();
 	ring->req3.tp_frame_size = TPACKET_ALIGNMENT << 7;
 	ring->req3.tp_block_nr = blocks;
 
@@ -789,12 +789,6 @@ int main(void)
 {
 	int ret = 0;
 
-	ret |= test_tpacket(TPACKET_V1, PACKET_RX_RING);
-	ret |= test_tpacket(TPACKET_V1, PACKET_TX_RING);
-
-	ret |= test_tpacket(TPACKET_V2, PACKET_RX_RING);
-	ret |= test_tpacket(TPACKET_V2, PACKET_TX_RING);
-
 	ret |= test_tpacket(TPACKET_V3, PACKET_RX_RING);
 
 	if (ret)

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v1 net] packet: handle too big packets for PACKET_V3
  2014-08-15  5:02         ` Guy Harris
  2014-08-15 16:00           ` Eric Dumazet
@ 2014-08-15 16:16           ` Eric Dumazet
  2014-08-18 15:39             ` Daniel Borkmann
                               ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Eric Dumazet @ 2014-08-15 16:16 UTC (permalink / raw)
  To: David Miller
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Neil Horman,
	Jesper Dangaard Brouer, netdev, Guy Harris

From: Eric Dumazet <edumazet@google.com>

af_packet can currently overwrite kernel memory by out of bound
accesses, because it assumed a [new] block can always hold one frame.

This is not generally the case, even if most existing tools do it right.

This patch clamps too long frames as API permits, and issue a one time
error on syslog.

[  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82

In this example, packet header tp_snaplen was set to 3966,
and tp_len was set to 5042 (skb->len)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
---
 net/packet/af_packet.c |   17 +++++++++++++++++
 net/packet/internal.h  |    1 +
 2 files changed, 18 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8d9f8042705a..93896d2092f6 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -632,6 +632,7 @@ static void init_prb_bdqc(struct packet_sock *po,
 	p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
 	p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
 
+	p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
 	prb_init_ft_ops(p1, req_u);
 	prb_setup_retire_blk_timer(po, tx_ring);
 	prb_open_block(p1, pbd);
@@ -1942,6 +1943,18 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 			if ((int)snaplen < 0)
 				snaplen = 0;
 		}
+	} else if (unlikely(macoff + snaplen >
+			    GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len)) {
+		u32 nval;
+
+		nval = GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len - macoff;
+		pr_err_once("tpacket_rcv: packet too big, clamped from %u to %u. macoff=%u\n",
+			    snaplen, nval, macoff);
+		snaplen = nval;
+		if (unlikely((int)snaplen < 0)) {
+			snaplen = 0;
+			macoff = GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len;
+		}
 	}
 	spin_lock(&sk->sk_receive_queue.lock);
 	h.raw = packet_current_rx_frame(po, skb,
@@ -3783,6 +3796,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 			goto out;
 		if (unlikely(req->tp_block_size & (PAGE_SIZE - 1)))
 			goto out;
+		if (po->tp_version >= TPACKET_V3 &&
+		    (int)(req->tp_block_size -
+			  BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0)
+			goto out;
 		if (unlikely(req->tp_frame_size < po->tp_hdrlen +
 					po->tp_reserve))
 			goto out;
diff --git a/net/packet/internal.h b/net/packet/internal.h
index eb9580a6b25f..cdddf6a30399 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -29,6 +29,7 @@ struct tpacket_kbdq_core {
 	char		*pkblk_start;
 	char		*pkblk_end;
 	int		kblk_size;
+	unsigned int	max_frame_len;
 	unsigned int	knum_blocks;
 	uint64_t	knxt_seq_num;
 	char		*prev;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 net] packet: handle too big packets for PACKET_V3
  2014-08-15 16:16           ` [PATCH v1 net] " Eric Dumazet
@ 2014-08-18 15:39             ` Daniel Borkmann
  2014-08-18 17:08             ` Neil Horman
  2014-08-21 23:45             ` David Miller
  2 siblings, 0 replies; 18+ messages in thread
From: Daniel Borkmann @ 2014-08-18 15:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Hannes Frederic Sowa, Neil Horman,
	Jesper Dangaard Brouer, netdev, Guy Harris

On 08/15/2014 06:16 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> af_packet can currently overwrite kernel memory by out of bound
> accesses, because it assumed a [new] block can always hold one frame.
>
> This is not generally the case, even if most existing tools do it right.
>
> This patch clamps too long frames as API permits, and issue a one time
> error on syslog.
>
> [  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
>
> In this example, packet header tp_snaplen was set to 3966,
> and tp_len was set to 5042 (skb->len)
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")

Acked-by: Daniel Borkmann <dborkman@redhat.com>

This looks good to me, thanks Eric!

[ Truly dislike the TPACKET_V3 code ... :/ ]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 net] packet: handle too big packets for PACKET_V3
  2014-08-15 16:16           ` [PATCH v1 net] " Eric Dumazet
  2014-08-18 15:39             ` Daniel Borkmann
@ 2014-08-18 17:08             ` Neil Horman
  2014-08-21 23:45             ` David Miller
  2 siblings, 0 replies; 18+ messages in thread
From: Neil Horman @ 2014-08-18 17:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Hannes Frederic Sowa, Daniel Borkmann,
	Jesper Dangaard Brouer, netdev, Guy Harris

On Fri, Aug 15, 2014 at 09:16:04AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> af_packet can currently overwrite kernel memory by out of bound
> accesses, because it assumed a [new] block can always hold one frame.
> 
> This is not generally the case, even if most existing tools do it right.
> 
> This patch clamps too long frames as API permits, and issue a one time
> error on syslog.
> 
> [  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
> 
> In this example, packet header tp_snaplen was set to 3966,
> and tp_len was set to 5042 (skb->len)
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
> ---
>  net/packet/af_packet.c |   17 +++++++++++++++++
>  net/packet/internal.h  |    1 +
>  2 files changed, 18 insertions(+)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 8d9f8042705a..93896d2092f6 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -632,6 +632,7 @@ static void init_prb_bdqc(struct packet_sock *po,
>  	p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
>  	p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
>  
> +	p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
>  	prb_init_ft_ops(p1, req_u);
>  	prb_setup_retire_blk_timer(po, tx_ring);
>  	prb_open_block(p1, pbd);
> @@ -1942,6 +1943,18 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
>  			if ((int)snaplen < 0)
>  				snaplen = 0;
>  		}
> +	} else if (unlikely(macoff + snaplen >
> +			    GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len)) {
> +		u32 nval;
> +
> +		nval = GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len - macoff;
> +		pr_err_once("tpacket_rcv: packet too big, clamped from %u to %u. macoff=%u\n",
> +			    snaplen, nval, macoff);
> +		snaplen = nval;
> +		if (unlikely((int)snaplen < 0)) {
> +			snaplen = 0;
> +			macoff = GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len;
> +		}
>  	}
>  	spin_lock(&sk->sk_receive_queue.lock);
>  	h.raw = packet_current_rx_frame(po, skb,
> @@ -3783,6 +3796,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
>  			goto out;
>  		if (unlikely(req->tp_block_size & (PAGE_SIZE - 1)))
>  			goto out;
> +		if (po->tp_version >= TPACKET_V3 &&
> +		    (int)(req->tp_block_size -
> +			  BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0)
> +			goto out;
>  		if (unlikely(req->tp_frame_size < po->tp_hdrlen +
>  					po->tp_reserve))
>  			goto out;
> diff --git a/net/packet/internal.h b/net/packet/internal.h
> index eb9580a6b25f..cdddf6a30399 100644
> --- a/net/packet/internal.h
> +++ b/net/packet/internal.h
> @@ -29,6 +29,7 @@ struct tpacket_kbdq_core {
>  	char		*pkblk_start;
>  	char		*pkblk_end;
>  	int		kblk_size;
> +	unsigned int	max_frame_len;
>  	unsigned int	knum_blocks;
>  	uint64_t	knxt_seq_num;
>  	char		*prev;
> 
> 
> 

Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 net] packet: handle too big packets for PACKET_V3
  2014-08-15 16:16           ` [PATCH v1 net] " Eric Dumazet
  2014-08-18 15:39             ` Daniel Borkmann
  2014-08-18 17:08             ` Neil Horman
@ 2014-08-21 23:45             ` David Miller
  2 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2014-08-21 23:45 UTC (permalink / raw)
  To: eric.dumazet; +Cc: hannes, dborkman, nhorman, brouer, netdev, guy

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 15 Aug 2014 09:16:04 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> af_packet can currently overwrite kernel memory by out of bound
> accesses, because it assumed a [new] block can always hold one frame.
> 
> This is not generally the case, even if most existing tools do it right.
> 
> This patch clamps too long frames as API permits, and issue a one time
> error on syslog.
> 
> [  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
> 
> In this example, packet header tp_snaplen was set to 3966,
> and tp_len was set to 5042 (skb->len)
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")

Since both the skb->len and the snaplen are provided to the user in
the ring entry descriptor, it is correct to fix this problem by simply
truncating.

Applied and queued up for -stable, thanks a lot Eric.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-08-21 23:45 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-15  0:09 [RFC] packet: handle too big packets for PACKET_V3 Eric Dumazet
2014-08-15  0:36 ` Neil Horman
2014-08-15  0:43 ` Hannes Frederic Sowa
2014-08-15  0:50   ` Hannes Frederic Sowa
2014-08-15  0:56     ` Eric Dumazet
2014-08-15 10:19       ` Neil Horman
2014-08-15  0:54   ` Eric Dumazet
2014-08-15  1:04     ` Hannes Frederic Sowa
2014-08-15  2:01       ` Eric Dumazet
2014-08-15  2:08         ` Hannes Frederic Sowa
2014-08-15  5:02         ` Guy Harris
2014-08-15 16:00           ` Eric Dumazet
2014-08-15 16:16           ` [PATCH v1 net] " Eric Dumazet
2014-08-18 15:39             ` Daniel Borkmann
2014-08-18 17:08             ` Neil Horman
2014-08-21 23:45             ` David Miller
2014-08-15  4:54       ` [RFC] " Guy Harris
2014-08-15 11:37         ` Hannes Frederic Sowa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).