Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next v3 0/6] net: qualcomm: rmnet: Configuration options
From: David Miller @ 2017-12-13 19:01 UTC (permalink / raw)
  To: subashab; +Cc: netdev
In-Reply-To: <1513038615-16407-1-git-send-email-subashab@codeaurora.org>

From: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Date: Mon, 11 Dec 2017 17:30:09 -0700

> This series adds support for configuring features on rmnet devices.
> The rmnet specific features to be configured here are aggregation and
> control commands.
> 
> Patch 1 is a cleanup of return codes in the transmit path.
> Patch 2 removes some redundant ingress and egress macros.
> Patch 3 restricts the creation of rmnet dev to one dev per mux id for a
> given real dev.
> Patch 4 adds ethernet data path support.
> Patches 5-6 add support for configuring features on new and existing
> rmnet devices.
> 
> v1->v2:
> The memory leak fixed as part of patch 1 is merged seperately as
> a896d94abd2c ("net: qualcomm: rmnet: Fix leak on transmit failure"). 
> Fix a use after free in patch 4 if a packet with headroom lesser than ethernet
> header length is received.
> 
> v2->v3: 
> Fix formatting problem in patch 5 in the return statement.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] tcp: allow TLP in ECN CWR
From: David Miller @ 2017-12-13 18:59 UTC (permalink / raw)
  To: ycheng; +Cc: netdev, ncardwell, edumazet, nanditad, sibanez
In-Reply-To: <20171211234253.102924-1-ycheng@google.com>

From: Yuchung Cheng <ycheng@google.com>
Date: Mon, 11 Dec 2017 15:42:53 -0800

> From: Neal Cardwell <ncardwell@google.com>
> 
> This patch enables tail loss probe in cwnd reduction (CWR) state
> to detect potential losses. Prior to this patch, since the sender
> uses PRR to determine the cwnd in CWR state, the combination of
> CWR+PRR plus tcp_tso_should_defer() could cause unnecessary stalls
> upon losses: PRR makes cwnd so gentle that tcp_tso_should_defer()
> defers sending wait for more ACKs. The ACKs may not come due to
> packet losses.
> 
> Disallowing TLP when there is unused cwnd had the primary effect
> of disallowing TLP when there is TSO deferral, Nagle deferral,
> or we hit the rwin limit. Because basically every application
> write() or incoming ACK will cause us to run tcp_write_xmit()
> to see if we can send more, and then if we sent something we call
> tcp_schedule_loss_probe() to see if we should schedule a TLP. At
> that point, there are a few common reasons why some cwnd budget
> could still be unused:
> 
> (a) rwin limit
> (b) nagle check
> (c) TSO deferral
> (d) TSQ
> 
> For (d), after the next packet tx completion the TSQ mechanism
> will allow us to send more packets, so we don't really need a
> TLP (in practice it shouldn't matter whether we schedule one
> or not). But for (a), (b), (c) the sender won't send any more
> packets until it gets another ACK. But if the whole flight was
> lost, or all the ACKs were lost, then we won't get any more ACKs,
> and ideally we should schedule and send a TLP to get more feedback.
> In particular for a long time we have wanted some kind of timer for
> TSO deferral, and at least this would give us some kind of timer
> 
> Reported-by: Steve Ibanez <sibanez@stanford.edu>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Reviewed-by: Nandita Dukkipati <nanditad@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Applied, thanks.

^ permalink raw reply

* Re: [Patch net-next] net_sched: switch to exit_batch for action pernet ops
From: David Miller @ 2017-12-13 18:58 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, jhs, jiri
In-Reply-To: <20171211233503.28043-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 11 Dec 2017 15:35:03 -0800

> Since we now hold RTNL lock in tc_action_net_exit(), it is good to
> batch them to speedup tc action dismantle.
> 
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v3] net: ethernet: arc: fix error handling in emac_rockchip_probe
From: David Miller @ 2017-12-13 18:58 UTC (permalink / raw)
  To: branislav; +Cc: heiko, netdev, linux-arm-kernel, linux-rockchip, linux-kernel
In-Reply-To: <20171211231338.5207-1-branislav@radocaj.org>

From: Branislav Radocaj <branislav@radocaj.org>
Date: Tue, 12 Dec 2017 00:13:38 +0100

> If clk_set_rate() fails, we should disable clk before return.
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Changes since v2 [1]:
> * Merged with latest code changes
> 
> Changes since v1:
> Update made thanks to David's review, much appreciated David.
> * Improved inconsistent failure handling of clock rate setting
> * For completeness of usecase, added arc_emac_probe error handling
> 
> Signed-off-by: Branislav Radocaj <branislav@radocaj.org>

Applied.

^ permalink raw reply

* Re: [PATCH v9 0/5] Add the ability to do BPF directed error injection
From: Josef Bacik @ 2017-12-13 18:57 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Josef Bacik, Masami Hiramatsu, rostedt, mingo, davem, netdev,
	linux-kernel, ast, kernel-team, daniel, linux-btrfs
In-Reply-To: <20171213180732.GE13436@magnolia>

On Wed, Dec 13, 2017 at 10:07:32AM -0800, Darrick J. Wong wrote:
> On Wed, Dec 13, 2017 at 01:03:57PM -0500, Josef Bacik wrote:
> > On Tue, Dec 12, 2017 at 03:11:50PM -0800, Darrick J. Wong wrote:
> > > On Mon, Dec 11, 2017 at 11:36:45AM -0500, Josef Bacik wrote:
> > > > This is the same as v8, just rebased onto the bpf tree.
> > > > 
> > > > v8->v9:
> > > > - rebased onto the bpf tree.
> > > > 
> > > > v7->v8:
> > > > - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.
> > > > 
> > > > v6->v7:
> > > > - moved the opt-in macro to bpf.h out of kprobes.h.
> > > > 
> > > > v5->v6:
> > > > - add BPF_ALLOW_ERROR_INJECTION() tagging for functions that will support this
> > > >   feature.  This way only functions that opt-in will be allowed to be
> > > >   overridden.
> > > > - added a btrfs patch to allow error injection for open_ctree() so that the bpf
> > > >   sample actually works.
> > > > 
> > > > v4->v5:
> > > > - disallow kprobe_override programs from being put in the prog map array so we
> > > >   don't tail call into something we didn't check.  This allows us to make the
> > > >   normal path still fast without a bunch of percpu operations.
> > > > 
> > > > v3->v4:
> > > > - fix a build error found by kbuild test bot (I didn't wait long enough
> > > >   apparently.)
> > > > - Added a warning message as per Daniels suggestion.
> > > > 
> > > > v2->v3:
> > > > - added a ->kprobe_override flag to bpf_prog.
> > > > - added some sanity checks to disallow attaching bpf progs that have
> > > >   ->kprobe_override set that aren't for ftrace kprobes.
> > > > - added the trace_kprobe_ftrace helper to check if the trace_event_call is a
> > > >   ftrace kprobe.
> > > > - renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this
> > > >   value in the kprobe path, and thus only write to it if we're overriding or
> > > >   clearing the override.
> > > > 
> > > > v1->v2:
> > > > - moved things around to make sure that bpf_override_return could really only be
> > > >   used for an ftrace kprobe.
> > > > - killed the special return values from trace_call_bpf.
> > > > - renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
> > > >   it was being called from an ftrace kprobe context.
> > > > - reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
> > > > - updated the test as per Alexei's review.
> > > > 
> > > > - Original message -
> > > > 
> > > > A lot of our error paths are not well tested because we have no good way of
> > > > injecting errors generically.  Some subystems (block, memory) have ways to
> > > > inject errors, but they are random so it's hard to get reproduceable results.
> > > > 
> > > > With BPF we can add determinism to our error injection.  We can use kprobes and
> > > > other things to verify we are injecting errors at the exact case we are trying
> > > > to test.  This patch gives us the tool to actual do the error injection part.
> > > > It is very simple, we just set the return value of the pt_regs we're given to
> > > > whatever we provide, and then override the PC with a dummy function that simply
> > > > returns.
> > > 
> > > Heh, this looks cool.  I decided to try it to see what happens, and saw
> > > a bunch of dmesg pasted in below.  Is that supposed to happen?  Or am I
> > > the only fs developer still running with lockdep enabled? :)
> > > 
> > > It looks like bpf_override_return has some sort of side effect such that
> > > we get the splat, since commenting it out makes the symptom go away.
> > > 
> > > <shrug>
> > > 
> > > --D
> > > 
> > > [ 1847.769183] BTRFS error (device (null)): open_ctree failed
> > > [ 1847.770130] BUG: sleeping function called from invalid context at /storage/home/djwong/cdev/work/linux-xfs/kernel/locking/rwsem.c:69
> > > [ 1847.771976] in_atomic(): 1, irqs_disabled(): 0, pid: 1524, name: mount
> > > [ 1847.773016] 1 lock held by mount/1524:
> > > [ 1847.773530]  #0:  (&type->s_umount_key#34/1){+.+.}, at: [<00000000653a9bb4>] sget_userns+0x302/0x4f0
> > > [ 1847.774731] Preemption disabled at:
> > > [ 1847.774735] [<          (null)>]           (null)
> > > [ 1847.777009] CPU: 2 PID: 1524 Comm: mount Tainted: G        W        4.15.0-rc3-xfsx #3
> > > [ 1847.778800] Call Trace:
> > > [ 1847.779047]  dump_stack+0x7c/0xbe
> > > [ 1847.779361]  ___might_sleep+0x1f7/0x260
> > > [ 1847.779720]  down_write+0x29/0xb0
> > > [ 1847.780046]  unregister_shrinker+0x15/0x70
> > > [ 1847.780427]  deactivate_locked_super+0x2e/0x60
> > > [ 1847.780935]  btrfs_mount+0xbb6/0x1000 [btrfs]
> > > [ 1847.781353]  ? __lockdep_init_map+0x5c/0x1d0
> > > [ 1847.781750]  ? mount_fs+0xf/0x80
> > > [ 1847.782065]  ? alloc_vfsmnt+0x1a1/0x230
> > > [ 1847.782429]  mount_fs+0xf/0x80
> > > [ 1847.782733]  vfs_kern_mount+0x62/0x160
> > > [ 1847.783128]  btrfs_mount+0x3d3/0x1000 [btrfs]
> > > [ 1847.783493]  ? __lockdep_init_map+0x5c/0x1d0
> > > [ 1847.783849]  ? __lockdep_init_map+0x5c/0x1d0
> > > [ 1847.784207]  ? mount_fs+0xf/0x80
> > > [ 1847.784502]  mount_fs+0xf/0x80
> > > [ 1847.784835]  vfs_kern_mount+0x62/0x160
> > > [ 1847.785235]  do_mount+0x1b1/0xd50
> > > [ 1847.785594]  ? _copy_from_user+0x5b/0x90
> > > [ 1847.786028]  ? memdup_user+0x4b/0x70
> > > [ 1847.786501]  SyS_mount+0x85/0xd0
> > > [ 1847.786835]  entry_SYSCALL_64_fastpath+0x1f/0x96
> > > [ 1847.787311] RIP: 0033:0x7f6ebecc1b5a
> > > [ 1847.787691] RSP: 002b:00007ffc7bd1c958 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
> > > [ 1847.788383] RAX: ffffffffffffffda RBX: 00007f6ebefba63a RCX: 00007f6ebecc1b5a
> > > [ 1847.789106] RDX: 0000000000bfd010 RSI: 0000000000bfa230 RDI: 0000000000bfa210
> > > [ 1847.789807] RBP: 0000000000bfa0f0 R08: 0000000000000000 R09: 0000000000000014
> > > [ 1847.790511] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 00007f6ebf1ca83c
> > > [ 1847.791211] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
> > > [ 1847.792029] BUG: scheduling while atomic: mount/1524/0x00000002
> > > [ 1847.792680] 1 lock held by mount/1524:
> > > [ 1847.793087]  #0:  (rcu_preempt_state.exp_mutex){+.+.}, at: [<00000000a6c536a9>] _synchronize_rcu_expedited+0x1ce/0x400
> > > [ 1847.794129] Modules linked in: xfs libcrc32c btrfs xor zstd_decompress zstd_compress xxhash lzo_compress lzo_decompress zlib_deflate raid6_pq dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: xfs]
> > > [ 1847.795949] Preemption disabled at:
> > > [ 1847.795951] [<          (null)>]           (null)
> > > [ 1847.796844] CPU: 2 PID: 1524 Comm: mount Tainted: G        W        4.15.0-rc3-xfsx #3
> > > [ 1847.797621] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > [ 1847.798510] Call Trace:
> > > [ 1847.798786]  dump_stack+0x7c/0xbe
> > > [ 1847.799134]  __schedule_bug+0x88/0xe0
> > > [ 1847.799517]  __schedule+0x78c/0xb20
> > > [ 1847.799890]  ? trace_hardirqs_on_caller+0x119/0x180
> > > [ 1847.800391]  schedule+0x40/0x90
> > > [ 1847.800729]  _synchronize_rcu_expedited+0x36b/0x400
> > > [ 1847.801218]  ? rcu_preempt_qs+0xa0/0xa0
> > > [ 1847.801616]  ? remove_wait_queue+0x60/0x60
> > > [ 1847.802040]  ? rcu_preempt_qs+0xa0/0xa0
> > > [ 1847.802433]  ? rcu_exp_wait_wake+0x630/0x630
> > > [ 1847.802872]  ? __lock_acquire+0xfb9/0x1120
> > > [ 1847.803302]  ? __lock_acquire+0x534/0x1120
> > > [ 1847.803725]  ? bdi_unregister+0x57/0x1a0
> > > [ 1847.804135]  bdi_unregister+0x5c/0x1a0
> > > [ 1847.804519]  bdi_put+0xcb/0xe0
> > > [ 1847.804746]  generic_shutdown_super+0xe2/0x110
> > > [ 1847.805066]  kill_anon_super+0xe/0x20
> > > [ 1847.805344]  btrfs_kill_super+0x12/0xa0 [btrfs]
> > > [ 1847.805664]  deactivate_locked_super+0x34/0x60
> > > [ 1847.806111]  btrfs_mount+0xbb6/0x1000 [btrfs]
> > > [ 1847.806476]  ? __lockdep_init_map+0x5c/0x1d0
> > > [ 1847.806824]  ? mount_fs+0xf/0x80
> > > [ 1847.807104]  ? alloc_vfsmnt+0x1a1/0x230
> > > [ 1847.807416]  mount_fs+0xf/0x80
> > > [ 1847.807712]  vfs_kern_mount+0x62/0x160
> > > [ 1847.808112]  btrfs_mount+0x3d3/0x1000 [btrfs]
> > > [ 1847.808565]  ? __lockdep_init_map+0x5c/0x1d0
> > > [ 1847.809005]  ? __lockdep_init_map+0x5c/0x1d0
> > > [ 1847.809425]  ? mount_fs+0xf/0x80
> > > [ 1847.809731]  mount_fs+0xf/0x80
> > > [ 1847.810070]  vfs_kern_mount+0x62/0x160
> > > [ 1847.810469]  do_mount+0x1b1/0xd50
> > > [ 1847.810821]  ? _copy_from_user+0x5b/0x90
> > > [ 1847.811237]  ? memdup_user+0x4b/0x70
> > > [ 1847.811622]  SyS_mount+0x85/0xd0
> > > [ 1847.811996]  entry_SYSCALL_64_fastpath+0x1f/0x96
> > > [ 1847.812465] RIP: 0033:0x7f6ebecc1b5a
> > > [ 1847.812840] RSP: 002b:00007ffc7bd1c958 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
> > > [ 1847.813615] RAX: ffffffffffffffda RBX: 00007f6ebefba63a RCX: 00007f6ebecc1b5a
> > > [ 1847.814302] RDX: 0000000000bfd010 RSI: 0000000000bfa230 RDI: 0000000000bfa210
> > > [ 1847.814770] RBP: 0000000000bfa0f0 R08: 0000000000000000 R09: 0000000000000014
> > > [ 1847.815246] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 00007f6ebf1ca83c
> > > [ 1847.815720] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
> > i> 
> > 
> > Looks like this is new, Masami this is happening because of your change here
> > 
> > 5bb4fc2d8641 ("kprobes/x86: Disable preemption in ftrace-based jprobes")
> > 
> > which makes it not do the preempt_enable() if the handler returns 1.  Why is
> > that?  Should I be doing preempt_enable_no_resched() from the handler before
> > returning 1?  Or is this just an oversight on your part?  Thanks,
> 
> FWIW I shut up the preemption imbalance warnings with the attached
> coarse bandaid.  No idea if that's the correct fix...
> 
> --D
> 
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 5db8498..fd948e3 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -1215,8 +1215,10 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
>  		if (__this_cpu_read(bpf_kprobe_override)) {
>  			__this_cpu_write(bpf_kprobe_override, 0);
>  			reset_current_kprobe();
> +			preempt_enable();
>  			return 1;
>  		}
> +		preempt_enable();
>  		if (!ret)
>  			return 0;
>  	}

Yeah I'd like to avoid doing this and know why exactly we leave a unpaired
preempt_disable() in kprobe_ftrace_handler() so we don't do something like this
only to have the handler change again in the future and break us again.  Thanks,

Josef

^ permalink raw reply

* Re: [PATCH net,stable] net: qmi_wwan: add Sierra EM7565 1199:9091
From: David Miller @ 2017-12-13 18:57 UTC (permalink / raw)
  To: ssjoholm; +Cc: netdev, linux-usb, linux-kernel, bjorn, rspmn
In-Reply-To: <20171211205114.85097-1-ssjoholm@mac.com>

From: ssjoholm@mac.com
Date: Mon, 11 Dec 2017 21:51:14 +0100

> From: Sebastian Sjoholm <ssjoholm@mac.com>
> 
> Sierra Wireless EM7565 is an Qualcomm MDM9x50 based M.2 modem.
> The USB id is added to qmi_wwan.c to allow QMI communication 
> with the EM7565.
> 
> Signed-off-by: Sebastian Sjoholm <ssjoholm@mac.com>
> Acked-by: Bjørn Mork <bjorn@mork.no>

Applied and queued up for -stable.

^ permalink raw reply

* Re: net: phy: consolidate PHY reset in phy_init_hw()
From: Daniel Walker @ 2017-12-13 18:46 UTC (permalink / raw)
  To: Florian Fainelli, David S. Miller, netdev@vger.kernel.org

Hi,

https://sjc-apl-grt32.cisco.com:8081/gitweb?p=xe-linux/kernel.git;a=commit;h=87aa9f9c61ad56d505641681812e92ad976f8608

I've got a machine with an Marvell 88EE1112 , and it seems this patch 
causes a problem. When the reset happens under this code this phy always 
returns -ETIMEDOUT from inside phy_poll_reset() .. It would seem this 
phy doesn't fit whatever standard this code is expecting. I tried to 
increase the wait time, but it doesn't seems to change anything.

Any thoughts?

Daniel

^ permalink raw reply

* Re: [PATCH] net: igmp: Use correct source address on IGMPv3 reports
From: David Miller @ 2017-12-13 18:52 UTC (permalink / raw)
  To: cernekee; +Cc: kuznet, yoshfuji, netdev, andrew, linux-kernel
In-Reply-To: <20171211191345.104136-1-cernekee@chromium.org>

From: Kevin Cernekee <cernekee@chromium.org>
Date: Mon, 11 Dec 2017 11:13:45 -0800

> Closing a multicast socket after the final IPv4 address is deleted
> from an interface can generate a membership report that uses the
> source IP from a different interface.  The following test script, run
> from an isolated netns, reproduces the issue:
> 
>     #!/bin/bash
> 
>     ip link add dummy0 type dummy
>     ip link add dummy1 type dummy
>     ip link set dummy0 up
>     ip link set dummy1 up
>     ip addr add 10.1.1.1/24 dev dummy0
>     ip addr add 192.168.99.99/24 dev dummy1
> 
>     tcpdump -U -i dummy0 &
>     socat EXEC:"sleep 2" \
>         UDP4-DATAGRAM:239.101.1.68:8889,ip-add-membership=239.0.1.68:10.1.1.1 &
> 
>     sleep 1
>     ip addr del 10.1.1.1/24 dev dummy0
>     sleep 5
>     kill %tcpdump
> 
> RFC 3376 specifies that the report must be sent with a valid IP source
> address from the destination subnet, or from address 0.0.0.0.  Add an
> extra check to make sure this is the case.
> 
> Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
...
> +	return htonl(INADDR_ANY);
> +}
> +
>  static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu)
>  {
>  	struct sk_buff *skb;
> @@ -368,7 +386,7 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu)
>  	pip->frag_off = htons(IP_DF);
>  	pip->ttl      = 1;
>  	pip->daddr    = fl4.daddr;
> -	pip->saddr    = fl4.saddr;
> +	pip->saddr    = igmpv3_get_srcaddr(dev, &fl4);
>  	pip->protocol = IPPROTO_IGMP;
>  	pip->tot_len  = 0;	/* filled in later */
>  	ip_select_ident(net, skb, NULL);

Ok, and I checked that MC source address validation on the receiver
will pass because:

	if (ipv4_is_zeronet(saddr)) {
		if (!ipv4_is_local_multicast(daddr))
			return -EINVAL;

that check in ip_mc_validate_source() won't trigger.

^ permalink raw reply

* Re: [PATCH] net/tls: Fix inverted error codes to avoid endless loop
From: David Miller @ 2017-12-13 18:48 UTC (permalink / raw)
  To: r.hering; +Cc: netdev
In-Reply-To: <OF6F0DE70F.1C59BE55-ONC12581F3.0066F1BE-C12581F3.0068093A@avm.de>

From: r.hering@avm.de
Date: Mon, 11 Dec 2017 19:56:21 +0100

> sendfile() calls can hang endless with using Kernel TLS if a socket error 
> occurs.
> Socket error codes must be inverted by Kernel TLS before returning because
> they are stored with positive sign. If returned non-inverted they are
> interpreted as number of bytes sent, causing endless looping of the
> splice mechanic behind sendfile().
> 
> Signed-off-by: Robert Hering <r.hering@avm.de>

Your patch is corrupted by your email client, it turned TAB characters
into spaces.

Please read Documentation/email-clients.txt to fix this.

Send a test patch to yourself, and make sure you can actually apply
the patch contained in that test email.

Do not repost the patch here until you can get such a test patch
to work properly.

Thank you.

^ permalink raw reply

* [PATCH 11/12] netfilter: exthdr: add missign attributes to policy
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

Add missing netlink attribute policy.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_exthdr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c
index a0a93d987a3b..47ec1046ad11 100644
--- a/net/netfilter/nft_exthdr.c
+++ b/net/netfilter/nft_exthdr.c
@@ -214,6 +214,8 @@ static const struct nla_policy nft_exthdr_policy[NFTA_EXTHDR_MAX + 1] = {
 	[NFTA_EXTHDR_OFFSET]		= { .type = NLA_U32 },
 	[NFTA_EXTHDR_LEN]		= { .type = NLA_U32 },
 	[NFTA_EXTHDR_FLAGS]		= { .type = NLA_U32 },
+	[NFTA_EXTHDR_OP]		= { .type = NLA_U32 },
+	[NFTA_EXTHDR_SREG]		= { .type = NLA_U32 },
 };
 
 static int nft_exthdr_init(const struct nft_ctx *ctx,
-- 
2.11.0

^ permalink raw reply related

* [PATCH 09/12] netfilter: xt_osf: Add missing permission checks
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Kevin Cernekee <cernekee@chromium.org>

The capability check in nfnetlink_rcv() verifies that the caller
has CAP_NET_ADMIN in the namespace that "owns" the netlink socket.
However, xt_osf_fingers is shared by all net namespaces on the
system.  An unprivileged user can create user and net namespaces
in which he holds CAP_NET_ADMIN to bypass the netlink_net_capable()
check:

    vpnns -- nfnl_osf -f /tmp/pf.os

    vpnns -- nfnl_osf -f /tmp/pf.os -d

These non-root operations successfully modify the systemwide OS
fingerprint list.  Add new capable() checks so that they can't.

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_osf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/netfilter/xt_osf.c b/net/netfilter/xt_osf.c
index 36e14b1f061d..a34f314a8c23 100644
--- a/net/netfilter/xt_osf.c
+++ b/net/netfilter/xt_osf.c
@@ -19,6 +19,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 
+#include <linux/capability.h>
 #include <linux/if.h>
 #include <linux/inetdevice.h>
 #include <linux/ip.h>
@@ -70,6 +71,9 @@ static int xt_osf_add_callback(struct net *net, struct sock *ctnl,
 	struct xt_osf_finger *kf = NULL, *sf;
 	int err = 0;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!osf_attrs[OSF_ATTR_FINGER])
 		return -EINVAL;
 
@@ -115,6 +119,9 @@ static int xt_osf_remove_callback(struct net *net, struct sock *ctnl,
 	struct xt_osf_finger *sf;
 	int err = -ENOENT;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!osf_attrs[OSF_ATTR_FINGER])
 		return -EINVAL;
 
-- 
2.11.0

^ permalink raw reply related

* [PATCH 12/12] netfilter: ip6t_MASQUERADE: add dependency on conntrack module
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

After commit 4d3a57f23dec ("netfilter: conntrack: do not enable connection
tracking unless needed") conntrack is disabled by default unless some
module explicitly declares dependency in particular network namespace.

Fixes: a357b3f80bc8 ("netfilter: nat: add dependencies on conntrack module")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv6/netfilter/ip6t_MASQUERADE.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/ip6t_MASQUERADE.c b/net/ipv6/netfilter/ip6t_MASQUERADE.c
index 2b1a15846f9a..92c0047e7e33 100644
--- a/net/ipv6/netfilter/ip6t_MASQUERADE.c
+++ b/net/ipv6/netfilter/ip6t_MASQUERADE.c
@@ -33,13 +33,19 @@ static int masquerade_tg6_checkentry(const struct xt_tgchk_param *par)
 
 	if (range->flags & NF_NAT_RANGE_MAP_IPS)
 		return -EINVAL;
-	return 0;
+	return nf_ct_netns_get(par->net, par->family);
+}
+
+static void masquerade_tg6_destroy(const struct xt_tgdtor_param *par)
+{
+	nf_ct_netns_put(par->net, par->family);
 }
 
 static struct xt_target masquerade_tg6_reg __read_mostly = {
 	.name		= "MASQUERADE",
 	.family		= NFPROTO_IPV6,
 	.checkentry	= masquerade_tg6_checkentry,
+	.destroy	= masquerade_tg6_destroy,
 	.target		= masquerade_tg6,
 	.targetsize	= sizeof(struct nf_nat_range),
 	.table		= "nat",
-- 
2.11.0

^ permalink raw reply related

* [PATCH 10/12] netfilter: ipt_CLUSTERIP: fix clusterip_net_exit build regression
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Arnd Bergmann <arnd@arndb.de>

The added check produces a build error when CONFIG_PROC_FS is
disabled:

net/ipv4/netfilter/ipt_CLUSTERIP.c: In function 'clusterip_net_exit':
net/ipv4/netfilter/ipt_CLUSTERIP.c:822:28: error: 'cn' undeclared (first use in this function)

This moves the variable declaration out of the #ifdef to make it
available to the WARN_ON_ONCE().

Fixes: 613d0776d3fe ("netfilter: exit_net cleanup check added")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index e35b8d074f06..69060e3abe85 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -813,8 +813,8 @@ static int clusterip_net_init(struct net *net)
 
 static void clusterip_net_exit(struct net *net)
 {
-#ifdef CONFIG_PROC_FS
 	struct clusterip_net *cn = net_generic(net, clusterip_net_id);
+#ifdef CONFIG_PROC_FS
 	proc_remove(cn->procdir);
 	cn->procdir = NULL;
 #endif
-- 
2.11.0

^ permalink raw reply related

* [PATCH 06/12] netfilter: conntrack: clamp timeouts to INT_MAX
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Jay Elliott <jelliott@arista.com>

When the conntracking code multiplies a timeout by HZ, it can overflow
from positive to negative; this causes it to instantly expire.  To
protect against this the multiplication is done in 64-bit so we can
prevent it from exceeding INT_MAX.

Signed-off-by: Jay Elliott <jelliott@arista.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 59c08997bfdf..66d72a8fa87f 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1566,9 +1566,11 @@ static int ctnetlink_change_helper(struct nf_conn *ct,
 static int ctnetlink_change_timeout(struct nf_conn *ct,
 				    const struct nlattr * const cda[])
 {
-	u_int32_t timeout = ntohl(nla_get_be32(cda[CTA_TIMEOUT]));
+	u64 timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
 
-	ct->timeout = nfct_time_stamp + timeout * HZ;
+	if (timeout > INT_MAX)
+		timeout = INT_MAX;
+	ct->timeout = nfct_time_stamp + (u32)timeout;
 
 	if (test_bit(IPS_DYING_BIT, &ct->status))
 		return -ETIME;
@@ -1768,6 +1770,7 @@ ctnetlink_create_conntrack(struct net *net,
 	int err = -EINVAL;
 	struct nf_conntrack_helper *helper;
 	struct nf_conn_tstamp *tstamp;
+	u64 timeout;
 
 	ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_ATOMIC);
 	if (IS_ERR(ct))
@@ -1776,7 +1779,10 @@ ctnetlink_create_conntrack(struct net *net,
 	if (!cda[CTA_TIMEOUT])
 		goto err1;
 
-	ct->timeout = nfct_time_stamp + ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
+	timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
+	if (timeout > INT_MAX)
+		timeout = INT_MAX;
+	ct->timeout = (u32)timeout + nfct_time_stamp;
 
 	rcu_read_lock();
  	if (cda[CTA_HELP]) {
-- 
2.11.0

^ permalink raw reply related

* [PATCH 05/12] netfilter: conntrack: lower timeout to RETRANS seconds if window is 0
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

When zero window is announced we can get into a situation where
connection stays around forever:

1. One side announces zero window.
2. Other side closes.

In this case, no FIN is sent (stuck in send queue).

Unless other side opens the window up again conntrack
stays in ESTABLISHED state for a very long time.

Lets alleviate this by lowering the timeout to RETRANS (5 minutes),
the other end should be sending zero window probes to keep the
connection established as long as a socket still exists.

Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index b12fc07111d0..37ef35b861f2 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -1039,6 +1039,9 @@ static int tcp_packet(struct nf_conn *ct,
 		 IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED &&
 		 timeouts[new_state] > timeouts[TCP_CONNTRACK_UNACK])
 		timeout = timeouts[TCP_CONNTRACK_UNACK];
+	else if (ct->proto.tcp.last_win == 0 &&
+		 timeouts[new_state] > timeouts[TCP_CONNTRACK_RETRANS])
+		timeout = timeouts[TCP_CONNTRACK_RETRANS];
 	else
 		timeout = timeouts[new_state];
 	spin_unlock_bh(&ct->lock);
-- 
2.11.0

^ permalink raw reply related

* [PATCH 03/12] netfilter: nf_ct_h323: Convert CHECK_BOUND macro to function
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>

It is bad practive to return in a macro, this patch
moves the check into a function.

Signed-off-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_h323_asn1.c | 94 +++++++++++++++++++++++-----------
 1 file changed, 65 insertions(+), 29 deletions(-)

diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c
index cf1bf2605c10..3d9a009ac147 100644
--- a/net/netfilter/nf_conntrack_h323_asn1.c
+++ b/net/netfilter/nf_conntrack_h323_asn1.c
@@ -103,7 +103,6 @@ struct bitstr {
 #define INC_BIT(bs) if((++(bs)->bit)>7){(bs)->cur++;(bs)->bit=0;}
 #define INC_BITS(bs,b) if(((bs)->bit+=(b))>7){(bs)->cur+=(bs)->bit>>3;(bs)->bit&=7;}
 #define BYTE_ALIGN(bs) if((bs)->bit){(bs)->cur++;(bs)->bit=0;}
-#define CHECK_BOUND(bs,n) if((bs)->cur+(n)>(bs)->end)return(H323_ERROR_BOUND)
 static unsigned int get_len(struct bitstr *bs);
 static unsigned int get_bit(struct bitstr *bs);
 static unsigned int get_bits(struct bitstr *bs, unsigned int b);
@@ -165,6 +164,14 @@ static unsigned int get_len(struct bitstr *bs)
 	return v;
 }
 
+static int nf_h323_error_boundary(struct bitstr *bs, size_t bytes)
+{
+	if (*bs->cur + bytes > *bs->end)
+		return 1;
+
+	return 0;
+}
+
 /****************************************************************************/
 static unsigned int get_bit(struct bitstr *bs)
 {
@@ -280,7 +287,8 @@ static int decode_bool(struct bitstr *bs, const struct field_t *f,
 
 	INC_BIT(bs);
 
-	CHECK_BOUND(bs, 0);
+	if (nf_h323_error_boundary(bs, 0))
+		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
 
@@ -293,11 +301,14 @@ static int decode_oid(struct bitstr *bs, const struct field_t *f,
 	PRINT("%*.s%s\n", level * TAB_SIZE, " ", f->name);
 
 	BYTE_ALIGN(bs);
-	CHECK_BOUND(bs, 1);
+	if (nf_h323_error_boundary(bs, 1))
+		return H323_ERROR_BOUND;
+
 	len = *bs->cur++;
 	bs->cur += len;
+	if (nf_h323_error_boundary(bs, 0))
+		return H323_ERROR_BOUND;
 
-	CHECK_BOUND(bs, 0);
 	return H323_ERROR_NONE;
 }
 
@@ -330,7 +341,8 @@ static int decode_int(struct bitstr *bs, const struct field_t *f,
 		break;
 	case UNCO:
 		BYTE_ALIGN(bs);
-		CHECK_BOUND(bs, 2);
+		if (nf_h323_error_boundary(bs, 2))
+			return H323_ERROR_BOUND;
 		len = get_len(bs);
 		bs->cur += len;
 		break;
@@ -341,7 +353,8 @@ static int decode_int(struct bitstr *bs, const struct field_t *f,
 
 	PRINT("\n");
 
-	CHECK_BOUND(bs, 0);
+	if (nf_h323_error_boundary(bs, 0))
+		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
 
@@ -357,7 +370,8 @@ static int decode_enum(struct bitstr *bs, const struct field_t *f,
 		INC_BITS(bs, f->sz);
 	}
 
-	CHECK_BOUND(bs, 0);
+	if (nf_h323_error_boundary(bs, 0))
+		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
 
@@ -375,12 +389,14 @@ static int decode_bitstr(struct bitstr *bs, const struct field_t *f,
 		len = f->lb;
 		break;
 	case WORD:		/* 2-byte length */
-		CHECK_BOUND(bs, 2);
+		if (nf_h323_error_boundary(bs, 2))
+			return H323_ERROR_BOUND;
 		len = (*bs->cur++) << 8;
 		len += (*bs->cur++) + f->lb;
 		break;
 	case SEMI:
-		CHECK_BOUND(bs, 2);
+		if (nf_h323_error_boundary(bs, 2))
+			return H323_ERROR_BOUND;
 		len = get_len(bs);
 		break;
 	default:
@@ -391,7 +407,8 @@ static int decode_bitstr(struct bitstr *bs, const struct field_t *f,
 	bs->cur += len >> 3;
 	bs->bit = len & 7;
 
-	CHECK_BOUND(bs, 0);
+	if (nf_h323_error_boundary(bs, 0))
+		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
 
@@ -409,7 +426,8 @@ static int decode_numstr(struct bitstr *bs, const struct field_t *f,
 	BYTE_ALIGN(bs);
 	INC_BITS(bs, (len << 2));
 
-	CHECK_BOUND(bs, 0);
+	if (nf_h323_error_boundary(bs, 0))
+		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
 
@@ -440,12 +458,14 @@ static int decode_octstr(struct bitstr *bs, const struct field_t *f,
 		break;
 	case BYTE:		/* Range == 256 */
 		BYTE_ALIGN(bs);
-		CHECK_BOUND(bs, 1);
+		if (nf_h323_error_boundary(bs, 1))
+			return H323_ERROR_BOUND;
 		len = (*bs->cur++) + f->lb;
 		break;
 	case SEMI:
 		BYTE_ALIGN(bs);
-		CHECK_BOUND(bs, 2);
+		if (nf_h323_error_boundary(bs, 2))
+			return H323_ERROR_BOUND;
 		len = get_len(bs) + f->lb;
 		break;
 	default:		/* 2 <= Range <= 255 */
@@ -458,7 +478,8 @@ static int decode_octstr(struct bitstr *bs, const struct field_t *f,
 
 	PRINT("\n");
 
-	CHECK_BOUND(bs, 0);
+	if (nf_h323_error_boundary(bs, 0))
+		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
 
@@ -473,7 +494,8 @@ static int decode_bmpstr(struct bitstr *bs, const struct field_t *f,
 	switch (f->sz) {
 	case BYTE:		/* Range == 256 */
 		BYTE_ALIGN(bs);
-		CHECK_BOUND(bs, 1);
+		if (nf_h323_error_boundary(bs, 1))
+			return H323_ERROR_BOUND;
 		len = (*bs->cur++) + f->lb;
 		break;
 	default:		/* 2 <= Range <= 255 */
@@ -484,7 +506,8 @@ static int decode_bmpstr(struct bitstr *bs, const struct field_t *f,
 
 	bs->cur += len << 1;
 
-	CHECK_BOUND(bs, 0);
+	if (nf_h323_error_boundary(bs, 0))
+		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
 
@@ -525,9 +548,11 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
 
 		/* Decode */
 		if (son->attr & OPEN) {	/* Open field */
-			CHECK_BOUND(bs, 2);
+			if (nf_h323_error_boundary(bs, 2))
+				return H323_ERROR_BOUND;
 			len = get_len(bs);
-			CHECK_BOUND(bs, len);
+			if (nf_h323_error_boundary(bs, len))
+				return H323_ERROR_BOUND;
 			if (!base || !(son->attr & DECODE)) {
 				PRINT("%*.s%s\n", (level + 1) * TAB_SIZE,
 				      " ", son->name);
@@ -556,7 +581,8 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
 
 	/* Get the extension bitmap */
 	bmp2_len = get_bits(bs, 7) + 1;
-	CHECK_BOUND(bs, (bmp2_len + 7) >> 3);
+	if (nf_h323_error_boundary(bs, (bmp2_len + 7) >> 3))
+		return H323_ERROR_BOUND;
 	bmp2 = get_bitmap(bs, bmp2_len);
 	bmp |= bmp2 >> f->sz;
 	if (base)
@@ -567,9 +593,11 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
 	for (opt = 0; opt < bmp2_len; opt++, i++, son++) {
 		/* Check Range */
 		if (i >= f->ub) {	/* Newer Version? */
-			CHECK_BOUND(bs, 2);
+			if (nf_h323_error_boundary(bs, 2))
+				return H323_ERROR_BOUND;
 			len = get_len(bs);
-			CHECK_BOUND(bs, len);
+			if (nf_h323_error_boundary(bs, len))
+				return H323_ERROR_BOUND;
 			bs->cur += len;
 			continue;
 		}
@@ -583,9 +611,11 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
 		if (!((0x80000000 >> opt) & bmp2))	/* Not present */
 			continue;
 
-		CHECK_BOUND(bs, 2);
+		if (nf_h323_error_boundary(bs, 2))
+			return H323_ERROR_BOUND;
 		len = get_len(bs);
-		CHECK_BOUND(bs, len);
+		if (nf_h323_error_boundary(bs, len))
+			return H323_ERROR_BOUND;
 		if (!base || !(son->attr & DECODE)) {
 			PRINT("%*.s%s\n", (level + 1) * TAB_SIZE, " ",
 			      son->name);
@@ -623,19 +653,22 @@ static int decode_seqof(struct bitstr *bs, const struct field_t *f,
 	switch (f->sz) {
 	case BYTE:
 		BYTE_ALIGN(bs);
-		CHECK_BOUND(bs, 1);
+		if (nf_h323_error_boundary(bs, 1))
+			return H323_ERROR_BOUND;
 		count = *bs->cur++;
 		break;
 	case WORD:
 		BYTE_ALIGN(bs);
-		CHECK_BOUND(bs, 2);
+		if (nf_h323_error_boundary(bs, 2))
+			return H323_ERROR_BOUND;
 		count = *bs->cur++;
 		count <<= 8;
 		count += *bs->cur++;
 		break;
 	case SEMI:
 		BYTE_ALIGN(bs);
-		CHECK_BOUND(bs, 2);
+		if (nf_h323_error_boundary(bs, 2))
+			return H323_ERROR_BOUND;
 		count = get_len(bs);
 		break;
 	default:
@@ -659,7 +692,8 @@ static int decode_seqof(struct bitstr *bs, const struct field_t *f,
 		if (son->attr & OPEN) {
 			BYTE_ALIGN(bs);
 			len = get_len(bs);
-			CHECK_BOUND(bs, len);
+			if (nf_h323_error_boundary(bs, len))
+				return H323_ERROR_BOUND;
 			if (!base || !(son->attr & DECODE)) {
 				PRINT("%*.s%s\n", (level + 1) * TAB_SIZE,
 				      " ", son->name);
@@ -728,7 +762,8 @@ static int decode_choice(struct bitstr *bs, const struct field_t *f,
 	if (type >= f->ub) {	/* Newer version? */
 		BYTE_ALIGN(bs);
 		len = get_len(bs);
-		CHECK_BOUND(bs, len);
+		if (nf_h323_error_boundary(bs, len))
+			return H323_ERROR_BOUND;
 		bs->cur += len;
 		return H323_ERROR_NONE;
 	}
@@ -743,7 +778,8 @@ static int decode_choice(struct bitstr *bs, const struct field_t *f,
 	if (ext || (son->attr & OPEN)) {
 		BYTE_ALIGN(bs);
 		len = get_len(bs);
-		CHECK_BOUND(bs, len);
+		if (nf_h323_error_boundary(bs, len))
+			return H323_ERROR_BOUND;
 		if (!base || !(son->attr & DECODE)) {
 			PRINT("%*.s%s\n", (level + 1) * TAB_SIZE, " ",
 			      son->name);
-- 
2.11.0

^ permalink raw reply related

* [PATCH 04/12] netfilter: nf_ct_h323: Extend nf_h323_error_boundary to work on bits as well
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>

This patch fixes several out of bounds memory reads by extending
the nf_h323_error_boundary() function to work on bits as well
an check the affected parts.

Signed-off-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_h323_asn1.c | 92 +++++++++++++++++++++++-----------
 1 file changed, 62 insertions(+), 30 deletions(-)

diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c
index 3d9a009ac147..dc6347342e34 100644
--- a/net/netfilter/nf_conntrack_h323_asn1.c
+++ b/net/netfilter/nf_conntrack_h323_asn1.c
@@ -164,8 +164,13 @@ static unsigned int get_len(struct bitstr *bs)
 	return v;
 }
 
-static int nf_h323_error_boundary(struct bitstr *bs, size_t bytes)
+static int nf_h323_error_boundary(struct bitstr *bs, size_t bytes, size_t bits)
 {
+	bits += bs->bit;
+	bytes += bits / BITS_PER_BYTE;
+	if (bits % BITS_PER_BYTE > 0)
+		bytes++;
+
 	if (*bs->cur + bytes > *bs->end)
 		return 1;
 
@@ -286,8 +291,7 @@ static int decode_bool(struct bitstr *bs, const struct field_t *f,
 	PRINT("%*.s%s\n", level * TAB_SIZE, " ", f->name);
 
 	INC_BIT(bs);
-
-	if (nf_h323_error_boundary(bs, 0))
+	if (nf_h323_error_boundary(bs, 0, 0))
 		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
@@ -301,12 +305,12 @@ static int decode_oid(struct bitstr *bs, const struct field_t *f,
 	PRINT("%*.s%s\n", level * TAB_SIZE, " ", f->name);
 
 	BYTE_ALIGN(bs);
-	if (nf_h323_error_boundary(bs, 1))
+	if (nf_h323_error_boundary(bs, 1, 0))
 		return H323_ERROR_BOUND;
 
 	len = *bs->cur++;
 	bs->cur += len;
-	if (nf_h323_error_boundary(bs, 0))
+	if (nf_h323_error_boundary(bs, 0, 0))
 		return H323_ERROR_BOUND;
 
 	return H323_ERROR_NONE;
@@ -330,6 +334,8 @@ static int decode_int(struct bitstr *bs, const struct field_t *f,
 		bs->cur += 2;
 		break;
 	case CONS:		/* 64K < Range < 4G */
+		if (nf_h323_error_boundary(bs, 0, 2))
+			return H323_ERROR_BOUND;
 		len = get_bits(bs, 2) + 1;
 		BYTE_ALIGN(bs);
 		if (base && (f->attr & DECODE)) {	/* timeToLive */
@@ -341,7 +347,7 @@ static int decode_int(struct bitstr *bs, const struct field_t *f,
 		break;
 	case UNCO:
 		BYTE_ALIGN(bs);
-		if (nf_h323_error_boundary(bs, 2))
+		if (nf_h323_error_boundary(bs, 2, 0))
 			return H323_ERROR_BOUND;
 		len = get_len(bs);
 		bs->cur += len;
@@ -353,7 +359,7 @@ static int decode_int(struct bitstr *bs, const struct field_t *f,
 
 	PRINT("\n");
 
-	if (nf_h323_error_boundary(bs, 0))
+	if (nf_h323_error_boundary(bs, 0, 0))
 		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
@@ -370,7 +376,7 @@ static int decode_enum(struct bitstr *bs, const struct field_t *f,
 		INC_BITS(bs, f->sz);
 	}
 
-	if (nf_h323_error_boundary(bs, 0))
+	if (nf_h323_error_boundary(bs, 0, 0))
 		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
@@ -389,13 +395,13 @@ static int decode_bitstr(struct bitstr *bs, const struct field_t *f,
 		len = f->lb;
 		break;
 	case WORD:		/* 2-byte length */
-		if (nf_h323_error_boundary(bs, 2))
+		if (nf_h323_error_boundary(bs, 2, 0))
 			return H323_ERROR_BOUND;
 		len = (*bs->cur++) << 8;
 		len += (*bs->cur++) + f->lb;
 		break;
 	case SEMI:
-		if (nf_h323_error_boundary(bs, 2))
+		if (nf_h323_error_boundary(bs, 2, 0))
 			return H323_ERROR_BOUND;
 		len = get_len(bs);
 		break;
@@ -407,7 +413,7 @@ static int decode_bitstr(struct bitstr *bs, const struct field_t *f,
 	bs->cur += len >> 3;
 	bs->bit = len & 7;
 
-	if (nf_h323_error_boundary(bs, 0))
+	if (nf_h323_error_boundary(bs, 0, 0))
 		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
@@ -421,12 +427,14 @@ static int decode_numstr(struct bitstr *bs, const struct field_t *f,
 	PRINT("%*.s%s\n", level * TAB_SIZE, " ", f->name);
 
 	/* 2 <= Range <= 255 */
+	if (nf_h323_error_boundary(bs, 0, f->sz))
+		return H323_ERROR_BOUND;
 	len = get_bits(bs, f->sz) + f->lb;
 
 	BYTE_ALIGN(bs);
 	INC_BITS(bs, (len << 2));
 
-	if (nf_h323_error_boundary(bs, 0))
+	if (nf_h323_error_boundary(bs, 0, 0))
 		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
@@ -458,17 +466,19 @@ static int decode_octstr(struct bitstr *bs, const struct field_t *f,
 		break;
 	case BYTE:		/* Range == 256 */
 		BYTE_ALIGN(bs);
-		if (nf_h323_error_boundary(bs, 1))
+		if (nf_h323_error_boundary(bs, 1, 0))
 			return H323_ERROR_BOUND;
 		len = (*bs->cur++) + f->lb;
 		break;
 	case SEMI:
 		BYTE_ALIGN(bs);
-		if (nf_h323_error_boundary(bs, 2))
+		if (nf_h323_error_boundary(bs, 2, 0))
 			return H323_ERROR_BOUND;
 		len = get_len(bs) + f->lb;
 		break;
 	default:		/* 2 <= Range <= 255 */
+		if (nf_h323_error_boundary(bs, 0, f->sz))
+			return H323_ERROR_BOUND;
 		len = get_bits(bs, f->sz) + f->lb;
 		BYTE_ALIGN(bs);
 		break;
@@ -478,7 +488,7 @@ static int decode_octstr(struct bitstr *bs, const struct field_t *f,
 
 	PRINT("\n");
 
-	if (nf_h323_error_boundary(bs, 0))
+	if (nf_h323_error_boundary(bs, 0, 0))
 		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
@@ -494,11 +504,13 @@ static int decode_bmpstr(struct bitstr *bs, const struct field_t *f,
 	switch (f->sz) {
 	case BYTE:		/* Range == 256 */
 		BYTE_ALIGN(bs);
-		if (nf_h323_error_boundary(bs, 1))
+		if (nf_h323_error_boundary(bs, 1, 0))
 			return H323_ERROR_BOUND;
 		len = (*bs->cur++) + f->lb;
 		break;
 	default:		/* 2 <= Range <= 255 */
+		if (nf_h323_error_boundary(bs, 0, f->sz))
+			return H323_ERROR_BOUND;
 		len = get_bits(bs, f->sz) + f->lb;
 		BYTE_ALIGN(bs);
 		break;
@@ -506,7 +518,7 @@ static int decode_bmpstr(struct bitstr *bs, const struct field_t *f,
 
 	bs->cur += len << 1;
 
-	if (nf_h323_error_boundary(bs, 0))
+	if (nf_h323_error_boundary(bs, 0, 0))
 		return H323_ERROR_BOUND;
 	return H323_ERROR_NONE;
 }
@@ -526,9 +538,13 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
 	base = (base && (f->attr & DECODE)) ? base + f->offset : NULL;
 
 	/* Extensible? */
+	if (nf_h323_error_boundary(bs, 0, 1))
+		return H323_ERROR_BOUND;
 	ext = (f->attr & EXT) ? get_bit(bs) : 0;
 
 	/* Get fields bitmap */
+	if (nf_h323_error_boundary(bs, 0, f->sz))
+		return H323_ERROR_BOUND;
 	bmp = get_bitmap(bs, f->sz);
 	if (base)
 		*(unsigned int *)base = bmp;
@@ -548,10 +564,10 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
 
 		/* Decode */
 		if (son->attr & OPEN) {	/* Open field */
-			if (nf_h323_error_boundary(bs, 2))
+			if (nf_h323_error_boundary(bs, 2, 0))
 				return H323_ERROR_BOUND;
 			len = get_len(bs);
-			if (nf_h323_error_boundary(bs, len))
+			if (nf_h323_error_boundary(bs, len, 0))
 				return H323_ERROR_BOUND;
 			if (!base || !(son->attr & DECODE)) {
 				PRINT("%*.s%s\n", (level + 1) * TAB_SIZE,
@@ -580,8 +596,10 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
 		return H323_ERROR_NONE;
 
 	/* Get the extension bitmap */
+	if (nf_h323_error_boundary(bs, 0, 7))
+		return H323_ERROR_BOUND;
 	bmp2_len = get_bits(bs, 7) + 1;
-	if (nf_h323_error_boundary(bs, (bmp2_len + 7) >> 3))
+	if (nf_h323_error_boundary(bs, 0, bmp2_len))
 		return H323_ERROR_BOUND;
 	bmp2 = get_bitmap(bs, bmp2_len);
 	bmp |= bmp2 >> f->sz;
@@ -593,10 +611,10 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
 	for (opt = 0; opt < bmp2_len; opt++, i++, son++) {
 		/* Check Range */
 		if (i >= f->ub) {	/* Newer Version? */
-			if (nf_h323_error_boundary(bs, 2))
+			if (nf_h323_error_boundary(bs, 2, 0))
 				return H323_ERROR_BOUND;
 			len = get_len(bs);
-			if (nf_h323_error_boundary(bs, len))
+			if (nf_h323_error_boundary(bs, len, 0))
 				return H323_ERROR_BOUND;
 			bs->cur += len;
 			continue;
@@ -611,10 +629,10 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
 		if (!((0x80000000 >> opt) & bmp2))	/* Not present */
 			continue;
 
-		if (nf_h323_error_boundary(bs, 2))
+		if (nf_h323_error_boundary(bs, 2, 0))
 			return H323_ERROR_BOUND;
 		len = get_len(bs);
-		if (nf_h323_error_boundary(bs, len))
+		if (nf_h323_error_boundary(bs, len, 0))
 			return H323_ERROR_BOUND;
 		if (!base || !(son->attr & DECODE)) {
 			PRINT("%*.s%s\n", (level + 1) * TAB_SIZE, " ",
@@ -653,13 +671,13 @@ static int decode_seqof(struct bitstr *bs, const struct field_t *f,
 	switch (f->sz) {
 	case BYTE:
 		BYTE_ALIGN(bs);
-		if (nf_h323_error_boundary(bs, 1))
+		if (nf_h323_error_boundary(bs, 1, 0))
 			return H323_ERROR_BOUND;
 		count = *bs->cur++;
 		break;
 	case WORD:
 		BYTE_ALIGN(bs);
-		if (nf_h323_error_boundary(bs, 2))
+		if (nf_h323_error_boundary(bs, 2, 0))
 			return H323_ERROR_BOUND;
 		count = *bs->cur++;
 		count <<= 8;
@@ -667,11 +685,13 @@ static int decode_seqof(struct bitstr *bs, const struct field_t *f,
 		break;
 	case SEMI:
 		BYTE_ALIGN(bs);
-		if (nf_h323_error_boundary(bs, 2))
+		if (nf_h323_error_boundary(bs, 2, 0))
 			return H323_ERROR_BOUND;
 		count = get_len(bs);
 		break;
 	default:
+		if (nf_h323_error_boundary(bs, 0, f->sz))
+			return H323_ERROR_BOUND;
 		count = get_bits(bs, f->sz);
 		break;
 	}
@@ -691,8 +711,10 @@ static int decode_seqof(struct bitstr *bs, const struct field_t *f,
 	for (i = 0; i < count; i++) {
 		if (son->attr & OPEN) {
 			BYTE_ALIGN(bs);
+			if (nf_h323_error_boundary(bs, 2, 0))
+				return H323_ERROR_BOUND;
 			len = get_len(bs);
-			if (nf_h323_error_boundary(bs, len))
+			if (nf_h323_error_boundary(bs, len, 0))
 				return H323_ERROR_BOUND;
 			if (!base || !(son->attr & DECODE)) {
 				PRINT("%*.s%s\n", (level + 1) * TAB_SIZE,
@@ -744,11 +766,17 @@ static int decode_choice(struct bitstr *bs, const struct field_t *f,
 	base = (base && (f->attr & DECODE)) ? base + f->offset : NULL;
 
 	/* Decode the choice index number */
+	if (nf_h323_error_boundary(bs, 0, 1))
+		return H323_ERROR_BOUND;
 	if ((f->attr & EXT) && get_bit(bs)) {
 		ext = 1;
+		if (nf_h323_error_boundary(bs, 0, 7))
+			return H323_ERROR_BOUND;
 		type = get_bits(bs, 7) + f->lb;
 	} else {
 		ext = 0;
+		if (nf_h323_error_boundary(bs, 0, f->sz))
+			return H323_ERROR_BOUND;
 		type = get_bits(bs, f->sz);
 		if (type >= f->lb)
 			return H323_ERROR_RANGE;
@@ -761,8 +789,10 @@ static int decode_choice(struct bitstr *bs, const struct field_t *f,
 	/* Check Range */
 	if (type >= f->ub) {	/* Newer version? */
 		BYTE_ALIGN(bs);
+		if (nf_h323_error_boundary(bs, 2, 0))
+			return H323_ERROR_BOUND;
 		len = get_len(bs);
-		if (nf_h323_error_boundary(bs, len))
+		if (nf_h323_error_boundary(bs, len, 0))
 			return H323_ERROR_BOUND;
 		bs->cur += len;
 		return H323_ERROR_NONE;
@@ -777,8 +807,10 @@ static int decode_choice(struct bitstr *bs, const struct field_t *f,
 
 	if (ext || (son->attr & OPEN)) {
 		BYTE_ALIGN(bs);
+		if (nf_h323_error_boundary(bs, len, 0))
+			return H323_ERROR_BOUND;
 		len = get_len(bs);
-		if (nf_h323_error_boundary(bs, len))
+		if (nf_h323_error_boundary(bs, len, 0))
 			return H323_ERROR_BOUND;
 		if (!base || !(son->attr & DECODE)) {
 			PRINT("%*.s%s\n", (level + 1) * TAB_SIZE, " ",
-- 
2.11.0

^ permalink raw reply related

* [PATCH 02/12] netfilter: exit_net cleanup check added
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Vasily Averin <vvs@virtuozzo.com>

Be sure that lists initialized in net_init hook was return to initial
state.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 1 +
 net/netfilter/nf_tables_api.c      | 7 +++++++
 net/netfilter/nfnetlink_log.c      | 5 +++++
 net/netfilter/nfnetlink_queue.c    | 5 +++++
 net/netfilter/x_tables.c           | 9 +++++++++
 5 files changed, 27 insertions(+)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 17b4ca562944..e35b8d074f06 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -819,6 +819,7 @@ static void clusterip_net_exit(struct net *net)
 	cn->procdir = NULL;
 #endif
 	nf_unregister_net_hook(net, &cip_arp_ops);
+	WARN_ON_ONCE(!list_empty(&cn->configs));
 }
 
 static struct pernet_operations clusterip_net_ops = {
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d8327b43e4dc..10798b357481 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5847,6 +5847,12 @@ static int __net_init nf_tables_init_net(struct net *net)
 	return 0;
 }
 
+static void __net_exit nf_tables_exit_net(struct net *net)
+{
+	WARN_ON_ONCE(!list_empty(&net->nft.af_info));
+	WARN_ON_ONCE(!list_empty(&net->nft.commit_list));
+}
+
 int __nft_release_basechain(struct nft_ctx *ctx)
 {
 	struct nft_rule *rule, *nr;
@@ -5917,6 +5923,7 @@ static void __nft_release_afinfo(struct net *net, struct nft_af_info *afi)
 
 static struct pernet_operations nf_tables_net_ops = {
 	.init	= nf_tables_init_net,
+	.exit	= nf_tables_exit_net,
 };
 
 static int __init nf_tables_module_init(void)
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index cad6498f10b0..1f511ed0fea3 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -1093,10 +1093,15 @@ static int __net_init nfnl_log_net_init(struct net *net)
 
 static void __net_exit nfnl_log_net_exit(struct net *net)
 {
+	struct nfnl_log_net *log = nfnl_log_pernet(net);
+	unsigned int i;
+
 #ifdef CONFIG_PROC_FS
 	remove_proc_entry("nfnetlink_log", net->nf.proc_netfilter);
 #endif
 	nf_log_unset(net, &nfulnl_logger);
+	for (i = 0; i < INSTANCE_BUCKETS; i++)
+		WARN_ON_ONCE(!hlist_empty(&log->instance_table[i]));
 }
 
 static struct pernet_operations nfnl_log_net_ops = {
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index a16356cacec3..c09b36755ed7 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -1512,10 +1512,15 @@ static int __net_init nfnl_queue_net_init(struct net *net)
 
 static void __net_exit nfnl_queue_net_exit(struct net *net)
 {
+	struct nfnl_queue_net *q = nfnl_queue_pernet(net);
+	unsigned int i;
+
 	nf_unregister_queue_handler(net);
 #ifdef CONFIG_PROC_FS
 	remove_proc_entry("nfnetlink_queue", net->nf.proc_netfilter);
 #endif
+	for (i = 0; i < INSTANCE_BUCKETS; i++)
+		WARN_ON_ONCE(!hlist_empty(&q->instance_table[i]));
 }
 
 static void nfnl_queue_net_exit_batch(struct list_head *net_exit_list)
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index a77dd514297c..55802e97f906 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1729,8 +1729,17 @@ static int __net_init xt_net_init(struct net *net)
 	return 0;
 }
 
+static void __net_exit xt_net_exit(struct net *net)
+{
+	int i;
+
+	for (i = 0; i < NFPROTO_NUMPROTO; i++)
+		WARN_ON_ONCE(!list_empty(&net->xt.tables[i]));
+}
+
 static struct pernet_operations xt_net_ops = {
 	.init = xt_net_init,
+	.exit = xt_net_exit,
 };
 
 static int __init xt_init(void)
-- 
2.11.0

^ permalink raw reply related

* [PATCH 01/12] netfilter: remove redundant assignment to e
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Colin Ian King <colin.king@canonical.com>

The assignment to variable e is redundant since the same assignment
occurs just a few lines later, hence it can be removed.  Cleans up
clang warning for arp_tables, ip_tables and ip6_tables:

warning: Value stored to 'e' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/arp_tables.c | 1 -
 net/ipv4/netfilter/ip_tables.c  | 1 -
 net/ipv6/netfilter/ip6_tables.c | 1 -
 3 files changed, 3 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index f88221aebc9d..0c3c944a7b72 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -373,7 +373,6 @@ static int mark_source_chains(const struct xt_table_info *newinfo,
 					if (!xt_find_jump_offset(offsets, newpos,
 								 newinfo->number))
 						return 0;
-					e = entry0 + newpos;
 				} else {
 					/* ... this is a fallthru */
 					newpos = pos + e->next_offset;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 4cbe5e80f3bf..2e0d339028bb 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -439,7 +439,6 @@ mark_source_chains(const struct xt_table_info *newinfo,
 					if (!xt_find_jump_offset(offsets, newpos,
 								 newinfo->number))
 						return 0;
-					e = entry0 + newpos;
 				} else {
 					/* ... this is a fallthru */
 					newpos = pos + e->next_offset;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index f06e25065a34..1d7ae9366335 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -458,7 +458,6 @@ mark_source_chains(const struct xt_table_info *newinfo,
 					if (!xt_find_jump_offset(offsets, newpos,
 								 newinfo->number))
 						return 0;
-					e = entry0 + newpos;
 				} else {
 					/* ... this is a fallthru */
 					newpos = pos + e->next_offset;
-- 
2.11.0

^ permalink raw reply related

* [PATCH 00/12] Netfilter fixes for net
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The follow patchset contains Netfilter fixes for your net tree,
they are:

1) Fix compilation warning in x_tables with clang due to useless
   redundant reassignment, from Colin Ian King.

2) Add bugtrap to net_exit to catch uninitialized lists, patch
   from Vasily Averin.

3) Fix out of bounds memory reads in H323 conntrack helper, this
   comes with an initial patch to remove replace the obscure
   CHECK_BOUND macro as a dependency. From Eric Sesterhenn.

4) Reduce retransmission timeout when window is 0 in TCP conntrack,
   from Florian Westphal.

6) ctnetlink clamp timeout to INT_MAX if timeout is too large,
   otherwise timeout wraps around and it results in killing the
   entry that is being added immediately.

7) Missing CAP_NET_ADMIN checks in cthelper and xt_osf, due to
   no netns support. From Kevin Cernekee.

8) Missing maximum number of instructions checks in xt_bpf, patch
   from Jann Horn.

9) With no CONFIG_PROC_FS ipt_CLUSTERIP compilation breaks,
   patch from Arnd Bergmann.

10) Missing netlink attribute policy in nftables exthdr, from
    Florian Westphal.

11) Enable conntrack with IPv6 MASQUERADE rules, as a357b3f80bc8
    should have done in first place, from Konstantin Khlebnikov.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit 32a72bbd5da2411eab591bf9bc2e39349106193a:

  net: vxge: Fix some indentation issues (2017-11-20 11:36:30 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 23715275e4fb6f64358a499d20928a9e93819f2f:

  netfilter: ip6t_MASQUERADE: add dependency on conntrack module (2017-12-11 17:04:50 +0100)

----------------------------------------------------------------
Arnd Bergmann (1):
      netfilter: ipt_CLUSTERIP: fix clusterip_net_exit build regression

Colin Ian King (1):
      netfilter: remove redundant assignment to e

Eric Sesterhenn (2):
      netfilter: nf_ct_h323: Convert CHECK_BOUND macro to function
      netfilter: nf_ct_h323: Extend nf_h323_error_boundary to work on bits as well

Florian Westphal (2):
      netfilter: conntrack: lower timeout to RETRANS seconds if window is 0
      netfilter: exthdr: add missign attributes to policy

Jann Horn (1):
      netfilter: xt_bpf: add overflow checks

Jay Elliott (1):
      netfilter: conntrack: clamp timeouts to INT_MAX

Kevin Cernekee (2):
      netfilter: nfnetlink_cthelper: Add missing permission checks
      netfilter: xt_osf: Add missing permission checks

Konstantin Khlebnikov (1):
      netfilter: ip6t_MASQUERADE: add dependency on conntrack module

Vasily Averin (1):
      netfilter: exit_net cleanup check added

 net/ipv4/netfilter/arp_tables.c        |   1 -
 net/ipv4/netfilter/ip_tables.c         |   1 -
 net/ipv4/netfilter/ipt_CLUSTERIP.c     |   3 +-
 net/ipv6/netfilter/ip6_tables.c        |   1 -
 net/ipv6/netfilter/ip6t_MASQUERADE.c   |   8 ++-
 net/netfilter/nf_conntrack_h323_asn1.c | 128 +++++++++++++++++++++++++--------
 net/netfilter/nf_conntrack_netlink.c   |  12 +++-
 net/netfilter/nf_conntrack_proto_tcp.c |   3 +
 net/netfilter/nf_tables_api.c          |   7 ++
 net/netfilter/nfnetlink_cthelper.c     |  10 +++
 net/netfilter/nfnetlink_log.c          |   5 ++
 net/netfilter/nfnetlink_queue.c        |   5 ++
 net/netfilter/nft_exthdr.c             |   2 +
 net/netfilter/x_tables.c               |   9 +++
 net/netfilter/xt_bpf.c                 |   6 ++
 net/netfilter/xt_osf.c                 |   7 ++
 16 files changed, 170 insertions(+), 38 deletions(-)

^ permalink raw reply

* [PATCH 08/12] netfilter: xt_bpf: add overflow checks
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Jann Horn <jannh@google.com>

Check whether inputs from userspace are too long (explicit length field too
big or string not null-terminated) to avoid out-of-bounds reads.

As far as I can tell, this can at worst lead to very limited kernel heap
memory disclosure or oopses.

This bug can be triggered by an unprivileged user even if the xt_bpf module
is not loaded: iptables is available in network namespaces, and the xt_bpf
module can be autoloaded.

Triggering the bug with a classic BPF filter with fake length 0x1000 causes
the following KASAN report:

==================================================================
BUG: KASAN: slab-out-of-bounds in bpf_prog_create+0x84/0xf0
Read of size 32768 at addr ffff8801eff2c494 by task test/4627

CPU: 0 PID: 4627 Comm: test Not tainted 4.15.0-rc1+ #1
[...]
Call Trace:
 dump_stack+0x5c/0x85
 print_address_description+0x6a/0x260
 kasan_report+0x254/0x370
 ? bpf_prog_create+0x84/0xf0
 memcpy+0x1f/0x50
 bpf_prog_create+0x84/0xf0
 bpf_mt_check+0x90/0xd6 [xt_bpf]
[...]
Allocated by task 4627:
 kasan_kmalloc+0xa0/0xd0
 __kmalloc_node+0x47/0x60
 xt_alloc_table_info+0x41/0x70 [x_tables]
[...]
The buggy address belongs to the object at ffff8801eff2c3c0
                which belongs to the cache kmalloc-2048 of size 2048
The buggy address is located 212 bytes inside of
                2048-byte region [ffff8801eff2c3c0, ffff8801eff2cbc0)
[...]
==================================================================

Fixes: e6f30c731718 ("netfilter: x_tables: add xt_bpf match")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_bpf.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
index 041da0d9c06f..1f7fbd3c7e5a 100644
--- a/net/netfilter/xt_bpf.c
+++ b/net/netfilter/xt_bpf.c
@@ -27,6 +27,9 @@ static int __bpf_mt_check_bytecode(struct sock_filter *insns, __u16 len,
 {
 	struct sock_fprog_kern program;
 
+	if (len > XT_BPF_MAX_NUM_INSTR)
+		return -EINVAL;
+
 	program.len = len;
 	program.filter = insns;
 
@@ -55,6 +58,9 @@ static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret)
 	mm_segment_t oldfs = get_fs();
 	int retval, fd;
 
+	if (strnlen(path, XT_BPF_PATH_MAX) == XT_BPF_PATH_MAX)
+		return -EINVAL;
+
 	set_fs(KERNEL_DS);
 	fd = bpf_obj_get_user(path, 0);
 	set_fs(oldfs);
-- 
2.11.0


^ permalink raw reply related

* [PATCH 07/12] netfilter: nfnetlink_cthelper: Add missing permission checks
From: Pablo Neira Ayuso @ 2017-12-13 18:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20171213184520.8193-1-pablo@netfilter.org>

From: Kevin Cernekee <cernekee@chromium.org>

The capability check in nfnetlink_rcv() verifies that the caller
has CAP_NET_ADMIN in the namespace that "owns" the netlink socket.
However, nfnl_cthelper_list is shared by all net namespaces on the
system.  An unprivileged user can create user and net namespaces
in which he holds CAP_NET_ADMIN to bypass the netlink_net_capable()
check:

    $ nfct helper list
    nfct v1.4.4: netlink error: Operation not permitted
    $ vpnns -- nfct helper list
    {
            .name = ftp,
            .queuenum = 0,
            .l3protonum = 2,
            .l4protonum = 6,
            .priv_data_len = 24,
            .status = enabled,
    };

Add capable() checks in nfnetlink_cthelper, as this is cleaner than
trying to generalize the solution.

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nfnetlink_cthelper.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/netfilter/nfnetlink_cthelper.c b/net/netfilter/nfnetlink_cthelper.c
index 41628b393673..d33ce6d5ebce 100644
--- a/net/netfilter/nfnetlink_cthelper.c
+++ b/net/netfilter/nfnetlink_cthelper.c
@@ -17,6 +17,7 @@
 #include <linux/types.h>
 #include <linux/list.h>
 #include <linux/errno.h>
+#include <linux/capability.h>
 #include <net/netlink.h>
 #include <net/sock.h>
 
@@ -407,6 +408,9 @@ static int nfnl_cthelper_new(struct net *net, struct sock *nfnl,
 	struct nfnl_cthelper *nlcth;
 	int ret = 0;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!tb[NFCTH_NAME] || !tb[NFCTH_TUPLE])
 		return -EINVAL;
 
@@ -611,6 +615,9 @@ static int nfnl_cthelper_get(struct net *net, struct sock *nfnl,
 	struct nfnl_cthelper *nlcth;
 	bool tuple_set = false;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = nfnl_cthelper_dump_table,
@@ -678,6 +685,9 @@ static int nfnl_cthelper_del(struct net *net, struct sock *nfnl,
 	struct nfnl_cthelper *nlcth, *n;
 	int j = 0, ret;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (tb[NFCTH_NAME])
 		helper_name = nla_data(tb[NFCTH_NAME]);
 
-- 
2.11.0


^ permalink raw reply related

* Re: [net 1/1] tipc: eliminate potential memory leak
From: David Miller @ 2017-12-13 18:45 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen,
	hoang.h.le, canh.d.luu, ying.xue, tipc-discussion
In-Reply-To: <1513015915-22187-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Mon, 11 Dec 2017 19:11:55 +0100

> In the function tipc_sk_mcast_rcv() we call refcount_dec(&skb->users)
> on received sk_buffers. Since the reference counter might hit zero at
> this point, we have a potential memory leak.
> 
> We fix this by replacing refcount_dec() with kfree_skb().
> 
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [patch net-next v3 00/10] net: sched: allow qdiscs to share filter block instances
From: Jiri Pirko @ 2017-12-13 18:42 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel
In-Reply-To: <3d3f57ae-23b7-6389-7276-3019f57dce01@gmail.com>

Wed, Dec 13, 2017 at 07:28:59PM CET, dsahern@gmail.com wrote:
>On 12/13/17 10:39 AM, Jiri Pirko wrote:
>> Wed, Dec 13, 2017 at 06:18:04PM CET, dsahern@gmail.com wrote:
>>> On 12/13/17 10:07 AM, Jiri Pirko wrote:
>>>> Wed, Dec 13, 2017 at 05:54:35PM CET, dsahern@gmail.com wrote:
>>>>> On 12/13/17 8:10 AM, Jiri Pirko wrote:
>>>>>> So back to the example. First, we create 2 qdiscs. Both will share
>>>>>> block number 22. "22" is just an identification. If we don't pass any
>>>>>> block number, a new one will be generated by kernel:
>>>>>>
>>>>>> $ tc qdisc add dev ens7 ingress block 22
>>>>>>                                 ^^^^^^^^
>>>>>> $ tc qdisc add dev ens8 ingress block 22
>>>>>>                                 ^^^^^^^^
>>>>>>
>>>>>> Now if we list the qdiscs, we will see the block index in the output:
>>>>>>
>>>>>> $ tc qdisc
>>>>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>>>>>> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>>>>>>
>>>>>> To make is more visual, the situation looks like this:
>>>>>>
>>>>>>    ens7 ingress qdisc                 ens7 ingress qdisc
>>>>>>           |                                  |
>>>>>>           |                                  |
>>>>>>           +---------->  block 22  <----------+
>>>>>>
>>>>>> Unlimited number of qdiscs may share the same block.
>>>>>>
>>>>>> Now we can add filter to any of qdiscs sharing the same block:
>>>>>>
>>>>>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>>>
>>>>> I still say this is very odd user semantic - making changes to device M
>>>>> and the changes magically affect device N. Operating on the shared block
>>>>> as a separate object makes it is much more direct and clear.
>>>>
>>>> I plan to do it as a follow-up patch. But this is how things are done
>>>> now and have to continue to work.
>>>
>>> Why is that? You are introducing the notion of a shared block with this
>>> patch set. What is the legacy "how things are done now" you are
>>> referring to?
>> 
>> Well, the filter add/del should just work no matter if the block behind is
>> shared or not.
>
>My argument is that modifying a shared block instance via a dev should
>not be allowed. Those changes should only be allowed via the shared
>block. So if a user puts adds a shared block to the device and then
>attempts to add a filter via the device it should not be allowed.

I don't see why. The handle is the qdisc here.

^ permalink raw reply

* [PATCH net-next v2] bpf/tracing: fix kernel/events/core.c compilation error
From: Yonghong Song @ 2017-12-13 18:35 UTC (permalink / raw)
  To: ast, daniel, sfr, netdev; +Cc: kernel-team

Commit f371b304f12e ("bpf/tracing: allow user space to
query prog array on the same tp") introduced a perf
ioctl command to query prog array attached to the
same perf tracepoint. The commit introduced a
compilation error under certain config conditions, e.g.,
  (1). CONFIG_BPF_SYSCALL is not defined, or
  (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS
       nor CONFIG_KPROBE_EVENTS is defined.

Error message:
  kernel/events/core.o: In function `perf_ioctl':
  core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array'

This patch fixed this error by guarding the real definition under
CONFIG_BPF_EVENTS and provided static inline dummy function
if CONFIG_BPF_EVENTS was not defined.
It renamed the function from bpf_event_query_prog_array to
perf_event_query_prog_array and moved the definition from linux/bpf.h
to linux/trace_events.h so the definition is in proximity to
other prog_array related functions.

Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h          | 1 -
 include/linux/trace_events.h | 6 ++++++
 kernel/events/core.c         | 2 +-
 kernel/trace/bpf_trace.c     | 2 +-
 4 files changed, 8 insertions(+), 3 deletions(-)

Changelog:
 v1 -> v2:
   . Updated the commit message
   . Using CONFIG_BPF_EVENTS to guard the real definition
   . Renamed the function and move the definition from linux/bpf.h
     to linux/trace_events.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 93e15b9..54dc7ca 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -254,7 +254,6 @@ typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src,
 
 u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 		     void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy);
-int bpf_event_query_prog_array(struct perf_event *event, void __user *info);
 
 int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 5fea451..8a1442c 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -467,6 +467,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
 unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
 int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
 void perf_event_detach_bpf_prog(struct perf_event *event);
+int perf_event_query_prog_array(struct perf_event *event, void __user *info);
 #else
 static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
@@ -481,6 +482,11 @@ perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 
 static inline void perf_event_detach_bpf_prog(struct perf_event *event) { }
 
+static inline int
+perf_event_query_prog_array(struct perf_event *event, void __user *info)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 enum {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5857c500..34fda6a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4725,7 +4725,7 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 	}
 
 	case PERF_EVENT_IOC_QUERY_BPF:
-		return bpf_event_query_prog_array(event, (void __user *)arg);
+		return perf_event_query_prog_array(event, (void __user *)arg);
 	default:
 		return -ENOTTY;
 	}
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e009b7e..c5dd60c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -856,7 +856,7 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
 	mutex_unlock(&bpf_event_mutex);
 }
 
-int bpf_event_query_prog_array(struct perf_event *event, void __user *info)
+int perf_event_query_prog_array(struct perf_event *event, void __user *info)
 {
 	struct perf_event_query_bpf __user *uquery = info;
 	struct perf_event_query_bpf query = {};
-- 
2.9.5

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox