Netdev List
* Re: [PATCH net] bnxt_en: Fix memory fault in bnxt_ethtool_init()
From: David Miller @ 2018-04-19 20:35 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, kernel-team
In-Reply-To: <1524122176-13511-1-git-send-email-michael.chan@broadcom.com>

From: Michael Chan <michael.chan@broadcom.com>
Date: Thu, 19 Apr 2018 03:16:16 -0400

> From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
> 
> In some firmware images, the length of BNX_DIR_TYPE_PKG_LOG nvram type
> could be greater than the fixed buffer length of 4096 bytes allocated by
> the driver.  This was causing HWRM_NVM_READ to copy more data to the buffer
> than the allocated size, causing general protection fault.
> 
> Fix the issue by allocating the exact buffer length returned by
> HWRM_NVM_FIND_DIR_ENTRY, instead of 4096.  Move the kzalloc() call
> into the bnxt_get_pkgver() function.
> 
> Fixes: 3ebf6f0a09a2 ("bnxt_en: Add installed-package firmware version reporting via Ethtool GDRVINFO")
> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>

Applied, thanks Michael.
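
For readers following along, the idea of the fix (size the buffer from the
length the lookup reports, not from a fixed 4096) can be sketched in plain
userspace C. This is only an illustration: struct dir_entry and
read_pkg_log() are hypothetical stand-ins, not the driver's code or the
HWRM interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for what a directory lookup reports. */
struct dir_entry {
	const unsigned char *data;	/* payload held by the "firmware" */
	size_t len;			/* payload length, may exceed 4096 */
};

/*
 * Allocate exactly entry->len bytes (plus a terminating NUL) and copy
 * the payload, so the copy can never run past the allocation.
 * The caller frees the returned buffer.
 */
static char *read_pkg_log(const struct dir_entry *entry, size_t *out_len)
{
	char *buf;

	assert(entry != NULL && out_len != NULL);

	buf = calloc(1, entry->len + 1);
	if (!buf)
		return NULL;

	memcpy(buf, entry->data, entry->len);
	*out_len = entry->len;
	return buf;
}
```

With a fixed 4096-byte buffer, the same memcpy() of a 5000-byte payload
would write past the end of the allocation.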

^ permalink raw reply

* Re: [PATCH] net: hns: Avoid action name truncation
From: David Miller @ 2018-04-19 20:30 UTC (permalink / raw)
  To: dann.frazier; +Cc: yisen.zhuang, salil.mehta, netdev, linux-kernel, linyunsheng
In-Reply-To: <20180419035541.6318-1-dann.frazier@canonical.com>

From: dann frazier <dann.frazier@canonical.com>
Date: Wed, 18 Apr 2018 21:55:41 -0600

> When longer interface names are used, the action names exposed in
> /proc/interrupts and /proc/irq/* may be truncated. For example, when
> using the predictable name algorithm in systemd on a HiSilicon D05,
> I see:
> 
>   ubuntu@d05-3:~$  grep enahisic2i0-tx /proc/interrupts | sed 's/.* //'
>   enahisic2i0-tx0
>   enahisic2i0-tx1
>   [...]
>   enahisic2i0-tx8
>   enahisic2i0-tx9
>   enahisic2i0-tx1
>   enahisic2i0-tx1
>   enahisic2i0-tx1
>   enahisic2i0-tx1
>   enahisic2i0-tx1
>   enahisic2i0-tx1
> 
> Increase the max ring name length to allow for an interface name
> of IFNAMSIZ. After this change, I now see:
> 
>   $ grep enahisic2i0-tx /proc/interrupts | sed 's/.* //'
>   enahisic2i0-tx0
>   enahisic2i0-tx1
>   enahisic2i0-tx2
>   [...]
>   enahisic2i0-tx8
>   enahisic2i0-tx9
>   enahisic2i0-tx10
>   enahisic2i0-tx11
>   enahisic2i0-tx12
>   enahisic2i0-tx13
>   enahisic2i0-tx14
>   enahisic2i0-tx15
> 
> Signed-off-by: dann frazier <dann.frazier@canonical.com>

Applied, thank you.
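
The truncation in the report is what snprintf() does when the destination
is one byte too short. A minimal userspace sketch, assuming an old limit
of 16 bytes and a new limit of 20 bytes; both sizes, and the helper
format_ring_name(), are illustrative assumptions and not taken from the
driver:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_OLD_RING_NAME_LEN	16	/* assumed old limit */
#define DEMO_NEW_RING_NAME_LEN	20	/* assumed new limit */

/* Hypothetical helper: build "<ifname>-<dir><idx>" into buf. */
static void format_ring_name(char *buf, size_t len, const char *ifname,
			     const char *dir, int idx)
{
	/* snprintf() always NUL-terminates and silently drops the rest. */
	snprintf(buf, len, "%s-%s%d", ifname, dir, idx);
}
```

"enahisic2i0-tx10" is 16 characters, so it needs 17 bytes with the NUL.
A 16-byte buffer keeps only "enahisic2i0-tx1", which is why queues 10 to
15 all showed up under the same name.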

^ permalink raw reply

* Re: [PATCH net-next 00/11] Modernize mdio-gpio
From: Andrew Lunn @ 2018-04-19 20:20 UTC (permalink / raw)
  To: Linus Walleij; +Cc: David Miller, netdev, Florian Fainelli
In-Reply-To: <CACRpkdaXuJHMBS5Vodrj4CzgB7MwiJp_gXJuQrjEOLF4ARPydQ@mail.gmail.com>

On Thu, Apr 19, 2018 at 09:52:09PM +0200, Linus Walleij wrote:
> On Thu, Apr 19, 2018 at 1:02 AM, Andrew Lunn <andrew@lunn.ch> wrote:
> 
> > This patchset is inspired by a previous version by Linus Walleij
> >
> > It reworks the mdio-gpio code to make use of gpio descriptors instead
> > of gpio numbers. However compared to the previous version, it retains
> > support for platform devices. It does however remove the platform_data
> > header file. The needed GPIOs are now passed by making use of a gpiod
> > lookup table. e.g:
> 
> Looks good to me, but wasn't this what Florian was NACKing?

Hi Linus

At the time, i don't think either Florian or i knew about gpiod lookup
tables. It was only when i got deep into this patchset i found them.

> I thought he was going to add some x86 MDIO using platform data,
> and then I suppose he wanted to use something more than some
> GPIO descriptors, maybe IRQ etc (who knows)?

I now have said x86 MDIO device, connecting to an Ethernet switch. It
works :-)

      Andrew

^ permalink raw reply

* Re: [PATCH v4 00/10] New network driver for Amiga X-Surf 100 (m68k)
From: David Miller @ 2018-04-19 20:11 UTC (permalink / raw)
  To: schmitzmic
  Cc: netdev, andrew, fthain, geert, f.fainelli, linux-m68k,
	Michael.Karcher
In-Reply-To: <1524103526-12240-1-git-send-email-schmitzmic@gmail.com>

From: Michael Schmitz <schmitzmic@gmail.com>
Date: Thu, 19 Apr 2018 14:05:17 +1200

> This patch series adds support for the Individual Computers X-Surf 100
> network card for m68k Amiga, a network adapter based on the AX88796 chip set.

Series applied, thank you.

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen-netfront: Fix hang on device removal
From: Simon Gaiser @ 2018-04-19 20:09 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: netdev, xen-devel, Eduardo Otubo, Juergen Gross, Boris Ostrovsky,
	open list
In-Reply-To: <CAKf6xpuusCJ0DMJ_G3hG5pd9vd8rUBH=VPa4TCvaoptGto-=Zw@mail.gmail.com>



Jason Andryuk:
> On Thu, Apr 19, 2018 at 2:10 PM, Simon Gaiser
> <simon@invisiblethingslab.com> wrote:
>> Jason Andryuk:
>>> A toolstack may delete the vif frontend and backend xenstore entries
>>> while xen-netfront is in the removal code path.  In that case, the
>>> checks for xenbus_read_driver_state would return XenbusStateUnknown, and
>>> xennet_remove would hang indefinitely.  This hang prevents system
>>> shutdown.
>>>
>>> xennet_remove must be able to handle XenbusStateUnknown, and
>>> netback_changed must also wake up the wake_queue for that state as well.
>>>
>>> Fixes: 5b5971df3bc2 ("xen-netfront: remove warning when unloading module")
>>
>> I think this should go into stable since AFAIK the hanging network
>> device can only be fixed by rebooting the guest. AFAICS this affects all
>> 4.* branches since 5b5971df3bc2 got backported to them.
>>
>> Upstream commit c2d2e6738a209f0f9dffa2dc8e7292fc45360d61.
> 
> Simon,
> 
> Yes, I agree.  I actually submitted the request to stable earlier
> today, so hopefully it gets added soon.

Ok, great. (I checked the stable patch queue, but didn't check the
mailing list archive).

> Have you experienced this hang?

Yes, it's affecting the kernel shipped by Qubes OS (see [1]).

Thanks, Simon.

[1]: https://github.com/QubesOS/qubes-issues/issues/3657



^ permalink raw reply

* Re: [PATCH net-next 01/11] net: phy: mdio-gpio: Fixup , which should be ;
From: David Miller @ 2018-04-19 20:00 UTC (permalink / raw)
  To: andrew; +Cc: netdev, f.fainelli, linus.walleij
In-Reply-To: <1524092579-15625-2-git-send-email-andrew@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu, 19 Apr 2018 01:02:49 +0200

> @@ -161,7 +161,7 @@ static struct mii_bus *mdio_gpio_bus_init(struct device *dev,
>  	if (!new_bus)
>  		goto out;
>  
> -	new_bus->name = "GPIO Bitbanged MDIO",
> +	new_bus->name = "GPIO Bitbanged MDIO";

Would be so great to find a way to automatically detect these somehow.

Yes, they are useful when controlling evaluation in weird ways in
macros etc.  However most of the time if a ',' is used in a context
where a ';' also works, it's unintentional.
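
To illustrate why a stray ',' is worth detecting: in straight-line code
the comma operator just evaluates both sides in order, so the typo is
harmless, but under an unbraced if it pulls the next statement into the
conditional. The two functions below are made-up examples, not code from
the patch.

```c
#include <assert.h>

/* The ',' joins both assignments into ONE expression statement. */
static int comma_version(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1,	/* ',' typed where ';' was intended */
		b = 1;	/* now part of the if body */

	return a + b;
}

/* With ';' the second assignment always runs. */
static int semicolon_version(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1;
	b = 1;

	return a + b;
}
```

Both return 2 when cond is true, but with cond false the comma version
returns 0 and the semicolon version returns 1.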

^ permalink raw reply

* Re: [PATCH net-next 00/11] Modernize mdio-gpio
From: David Miller @ 2018-04-19 19:59 UTC (permalink / raw)
  To: andrew; +Cc: netdev, f.fainelli, linus.walleij
In-Reply-To: <1524092579-15625-1-git-send-email-andrew@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu, 19 Apr 2018 01:02:48 +0200

> This patchset is inspired by a previous version by Linus Walleij
> 
> It reworks the mdio-gpio code to make use of gpio descriptors instead
> of gpio numbers. However compared to the previous version, it retains
> support for platform devices. It does however remove the platform_data
> header file. The needed GPIOs are now passed by making use of a gpiod
> lookup table. e.g:
> 
> static struct gpiod_lookup_table zii_scu_mdio_gpiod_table = {
> 	.dev_id = "mdio-gpio.0",
> 	.table = {
> 		GPIO_LOOKUP_IDX("gpio_ich", 17, NULL, MDIO_GPIO_MDC,
> 				GPIO_ACTIVE_HIGH),
> 		GPIO_LOOKUP_IDX("gpio_ich", 2, NULL, MDIO_GPIO_MDIO,
> 				GPIO_ACTIVE_HIGH),
> 		GPIO_LOOKUP_IDX("gpio_ich", 21, NULL, MDIO_GPIO_MDO,
> 				GPIO_ACTIVE_LOW),
> 	},
> };

Nice set of simplifications, applied.

^ permalink raw reply

* Re: [PATCH net-next 00/11] Modernize mdio-gpio
From: Linus Walleij @ 2018-04-19 19:52 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: David Miller, netdev, Florian Fainelli
In-Reply-To: <1524092579-15625-1-git-send-email-andrew@lunn.ch>

On Thu, Apr 19, 2018 at 1:02 AM, Andrew Lunn <andrew@lunn.ch> wrote:

> This patchset is inspired by a previous version by Linus Walleij
>
> It reworks the mdio-gpio code to make use of gpio descriptors instead
> of gpio numbers. However compared to the previous version, it retains
> support for platform devices. It does however remove the platform_data
> header file. The needed GPIOs are now passed by making use of a gpiod
> lookup table. e.g:

Looks good to me, but wasn't this what Florian was NACKing?

I thought he was going to add some x86 MDIO using platform data,
and then I suppose he wanted to use something more than some
GPIO descriptors, maybe IRQ etc (who knows)?

If he only needs to put in GPIO descriptors then this is fine of
course.

Anyway the series has a solid:
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Thanks,
Linus Walleij

^ permalink raw reply

* [PATCH v2] net: ethernet: ti: cpsw: fix tx vlan priority mapping
From: Ivan Khoronzhuk @ 2018-04-19 19:49 UTC (permalink / raw)
  To: grygorii.strashko
  Cc: davem, linux-omap, netdev, linux-kernel, Ivan Khoronzhuk

The CPDMA_TX_PRIORITY_MAP is in reality the VLAN PCP field priority
mapping register and basically replaces the VLAN PCP field for tagged
packets. So set it to a 1:1 mapping. Otherwise, it causes unexpected
changes in egress VLAN-tagged packets, like prio 2 -> prio 5.

Fixes: e05107e6b747 ("net: ethernet: ti: cpsw: add multi queue support")
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
Based on net/master

 drivers/net/ethernet/ti/cpsw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 3037127..74f8284 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -129,7 +129,7 @@ do {								\
 
 #define RX_PRIORITY_MAPPING	0x76543210
 #define TX_PRIORITY_MAPPING	0x33221100
-#define CPDMA_TX_PRIORITY_MAP	0x01234567
+#define CPDMA_TX_PRIORITY_MAP	0x76543210
 
 #define CPSW_VLAN_AWARE		BIT(1)
 #define CPSW_RX_VLAN_ENCAP	BIT(2)
-- 
2.7.4
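
The effect of the two constants can be checked with a small userspace
sketch. It assumes each priority occupies one 4-bit field of the 32-bit
map, with priority 0 in the lowest nibble; that layout is an assumption,
chosen because it reproduces the "prio 2 -> prio 5" example above.
map_prio() is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/* Look up the output priority for an input priority 0..7. */
static unsigned int map_prio(uint32_t map, unsigned int prio)
{
	assert(prio < 8);
	return (map >> (4 * prio)) & 0xf;
}
```

Under that assumption, 0x01234567 maps priority p to 7 - p (so 2 becomes
5), while 0x76543210 maps every priority to itself.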

^ permalink raw reply related

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Andrew Morton @ 2018-04-19 19:47 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: David Miller, linux-mm, eric.dumazet, edumazet, bhutchings,
	netdev, linux-kernel, mst, jasowang, virtualization, dm-devel,
	Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804191207380.31175@file01.intranet.prod.int.rdu2.redhat.com>

On Thu, 19 Apr 2018 12:12:38 -0400 (EDT) Mikulas Patocka <mpatocka@redhat.com> wrote:

> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
> 
> Unfortunately, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
> 
> These bugs are hard to reproduce because kvmalloc falls back to vmalloc
> only if memory is fragmented.

Yes, that's nasty.

> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> 
> ...
>
> --- linux-2.6.orig/mm/util.c	2018-04-18 15:46:23.000000000 +0200
> +++ linux-2.6/mm/util.c	2018-04-18 16:00:43.000000000 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> +#ifndef CONFIG_DEBUG_VM
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
>  
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
>  	 */
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
> +#endif
>  
>  	return __vmalloc_node_flags_caller(size, node, flags,
>  			__builtin_return_address(0));

Well, it doesn't have to be done at compile-time, does it?  We could
add a knob (in debugfs, presumably) which enables this at runtime. 
That's far more user-friendly.

^ permalink raw reply

* Re: [PATCH net-next 0/8] net/ipv6: followup to fib6_info change
From: David Miller @ 2018-04-19 19:42 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, idosch, roopa, eric.dumazet, weiwan, kafai, yoshfuji
In-Reply-To: <20180418223906.16650-1-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Wed, 18 Apr 2018 15:38:58 -0700

> Followup to fib change for IPv6.
> 
> First 2 patches rename fib6_info struct elements to match its name,
> and rename addrconf_dst_alloc to match what it returns.
> 
> Patches 3-7 refactor the code to remove the need for fib6_idev reducing
> fib6_info by another 8 bytes to 200 bytes.
> 
> Patch 8 fixes the gfp flags argument to addrconf_prefix_route in a
> couple of places.

Series applied.

I'm really happy with how these changes are making the code look.

The disconnect between dst.dev and rt6i_idev always bugged me.

^ permalink raw reply

* Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format
From: Arnaldo Carvalho de Melo @ 2018-04-19 19:40 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20180418225606.2771620-1-kafai@fb.com>

Em Wed, Apr 18, 2018 at 03:55:56PM -0700, Martin KaFai Lau escreveu:
> This patch introduces BPF Type Format (BTF).
> 
> BTF (BPF Type Format) is the metadata format which describes
> the data types of BPF programs/maps.  Hence, it basically focuses
> on the C programming language, which modern BPF primarily
> uses.  The first use case is to provide a generic pretty print
> capability for a BPF map.
> 
> A modified pahole that can convert dwarf to BTF is here:
> https://github.com/iamkafai/pahole/tree/btf
> (Arnaldo, there are some BTF_KIND numbering changes on
>  Apr 18th, d61426c1571)

Thanks for letting me know, I'm starting to look at this,

- Arnaldo
 
> Please see individual patch for details.
> 
> v5:
> - Remove BTF_KIND_FLOAT and BTF_KIND_FUNC which are not
>   currently used.  They can be added in the future.
>   Some bpf_df_xxx() are removed together.
> - Add comment in patch 7 to clarify that the new bpffs_map_fops
>   should not be extended further.
> 
> v4:
> - Fix warning (remove unneeded semicolon)
> - Remove a redundant variable (nr_bytes) from btf_int_check_meta() in
>   patch 1.  Caught by W=1.
> 
> v3:
> - Rebase to bpf-next
> - Fix sparse warning (by adding static)
> - Add BTF header logging: btf_verifier_log_hdr()
> - Fix the alignment test on btf->type_off
> - Add tests for the BTF header
> - Lower the max BTF size to 16MB.  It should be enough
>   for some time.  We could raise it later if it would
>   be needed.
> 
> v2:
> - Use kvfree where needed in patch 1 and 2
> - Also consider BTF_INT_OFFSET() in the btf_int_check_meta()
>   in patch 1
> - Fix an incorrect goto target in map_create() during
>   the btf-error-path in patch 7
> - re-org some local vars to keep the rev xmas tree in btf.c
> 
> Martin KaFai Lau (10):
>   bpf: btf: Introduce BPF Type Format (BTF)
>   bpf: btf: Validate type reference
>   bpf: btf: Check members of struct/union
>   bpf: btf: Add pretty print capability for data with BTF type info
>   bpf: btf: Add BPF_BTF_LOAD command
>   bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd
>   bpf: btf: Add pretty print support to the basic arraymap
>   bpf: btf: Sync bpf.h and btf.h to tools/
>   bpf: btf: Add BTF support to libbpf
>   bpf: btf: Add BTF tests
> 
>  include/linux/bpf.h                          |   20 +-
>  include/linux/btf.h                          |   48 +
>  include/uapi/linux/bpf.h                     |   12 +
>  include/uapi/linux/btf.h                     |  130 ++
>  kernel/bpf/Makefile                          |    1 +
>  kernel/bpf/arraymap.c                        |   50 +
>  kernel/bpf/btf.c                             | 2064 ++++++++++++++++++++++++++
>  kernel/bpf/inode.c                           |  156 +-
>  kernel/bpf/syscall.c                         |   51 +-
>  tools/include/uapi/linux/bpf.h               |   12 +
>  tools/include/uapi/linux/btf.h               |  130 ++
>  tools/lib/bpf/Build                          |    2 +-
>  tools/lib/bpf/bpf.c                          |   92 +-
>  tools/lib/bpf/bpf.h                          |   16 +
>  tools/lib/bpf/btf.c                          |  374 +++++
>  tools/lib/bpf/btf.h                          |   22 +
>  tools/lib/bpf/libbpf.c                       |  148 +-
>  tools/lib/bpf/libbpf.h                       |    3 +
>  tools/testing/selftests/bpf/Makefile         |   26 +-
>  tools/testing/selftests/bpf/test_btf.c       | 1669 +++++++++++++++++++++
>  tools/testing/selftests/bpf/test_btf_haskv.c |   48 +
>  tools/testing/selftests/bpf/test_btf_nokv.c  |   43 +
>  22 files changed, 5076 insertions(+), 41 deletions(-)
>  create mode 100644 include/linux/btf.h
>  create mode 100644 include/uapi/linux/btf.h
>  create mode 100644 kernel/bpf/btf.c
>  create mode 100644 tools/include/uapi/linux/btf.h
>  create mode 100644 tools/lib/bpf/btf.c
>  create mode 100644 tools/lib/bpf/btf.h
>  create mode 100644 tools/testing/selftests/bpf/test_btf.c
>  create mode 100644 tools/testing/selftests/bpf/test_btf_haskv.c
>  create mode 100644 tools/testing/selftests/bpf/test_btf_nokv.c
> 
> -- 
> 2.9.5

^ permalink raw reply

* Re: [PATCH net 2/3] net: sched: ife: handle malformed tlv length
From: David Miller @ 2018-04-19 19:30 UTC (permalink / raw)
  To: aring; +Cc: yotam.gi, jhs, xiyou.wangcong, jiri, yuvalm, netdev, kernel
In-Reply-To: <20180418213534.6215-3-aring@mojatatu.com>

From: Alexander Aring <aring@mojatatu.com>
Date: Wed, 18 Apr 2018 17:35:33 -0400

> @@ -92,12 +92,43 @@ struct meta_tlvhdr {
>  	__be16 len;
>  };
>  
> +static inline bool __ife_tlv_meta_valid(const unsigned char *skbdata,
> +					const unsigned char *ifehdr_end)
> +{

Please do not use inline in foo.c files, let the compiler decide.

^ permalink raw reply

* Re: [PATCH net-next] lan78xx: Add support to dump lan78xx registers
From: David Miller @ 2018-04-19 19:26 UTC (permalink / raw)
  To: raghuramchary.jallipalli; +Cc: netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180418155735.28070-1-raghuramchary.jallipalli@microchip.com>

From: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Date: Wed, 18 Apr 2018 21:27:35 +0530

> +	/* Read Device/MAC registers */
> +	for (i = 0, j = 0; i < (sizeof(lan78xx_regs) / sizeof(u32)); i++, j++)
> +		lan78xx_read_reg(dev, lan78xx_regs[i], &data[j]);

There is no need for two loop variables, both i and j increment over the
same numbers.
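
A sketch of the single-index form, written as standalone C so it can be
compiled outside the kernel. The register offsets and demo_read_reg() are
hypothetical stand-ins for lan78xx_regs[] and lan78xx_read_reg(), and
ARRAY_SIZE is redefined locally because the kernel macro is not available
in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical register offsets, for illustration only. */
static const uint32_t demo_regs[] = { 0x00, 0x08, 0x0c, 0x10 };

/* Hypothetical register read: returns a value derived from the offset. */
static int demo_read_reg(uint32_t reg, uint32_t *val)
{
	assert(val != NULL);
	*val = reg ^ 0xa5a5a5a5u;
	return 0;
}

/* One index serves both the offset table and the output buffer. */
static void demo_dump_regs(uint32_t *data)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(demo_regs); i++)
		demo_read_reg(demo_regs[i], &data[i]);
}
```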

^ permalink raw reply

* [Patch net] llc: delete timers synchronously in llc_sk_free()
From: Cong Wang @ 2018-04-19 19:25 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang

The connection timers of an llc sock could still be running
after we delete them in llc_sk_free(), and possibly even
after we free the sock. Wait for them synchronously
there to avoid trouble.

Note, I leave other call paths as they are, since they may
not have to wait; we can change them to wait synchronously
when needed.

Also, move the code to net/llc/llc_conn.c, which is apparently
a better place.

Reported-by: <syzbot+f922284c18ea23a8e457@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/llc_conn.h |  1 +
 net/llc/llc_c_ac.c     |  9 +--------
 net/llc/llc_conn.c     | 22 +++++++++++++++++++++-
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/net/llc_conn.h b/include/net/llc_conn.h
index 5c40f118c0fa..df528a623548 100644
--- a/include/net/llc_conn.h
+++ b/include/net/llc_conn.h
@@ -97,6 +97,7 @@ static __inline__ char llc_backlog_type(struct sk_buff *skb)
 
 struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority,
 			  struct proto *prot, int kern);
+void llc_sk_stop_all_timers(struct sock *sk, bool sync);
 void llc_sk_free(struct sock *sk);
 
 void llc_sk_reset(struct sock *sk);
diff --git a/net/llc/llc_c_ac.c b/net/llc/llc_c_ac.c
index 163121192aca..4d78375f9872 100644
--- a/net/llc/llc_c_ac.c
+++ b/net/llc/llc_c_ac.c
@@ -1099,14 +1099,7 @@ int llc_conn_ac_inc_tx_win_size(struct sock *sk, struct sk_buff *skb)
 
 int llc_conn_ac_stop_all_timers(struct sock *sk, struct sk_buff *skb)
 {
-	struct llc_sock *llc = llc_sk(sk);
-
-	del_timer(&llc->pf_cycle_timer.timer);
-	del_timer(&llc->ack_timer.timer);
-	del_timer(&llc->rej_sent_timer.timer);
-	del_timer(&llc->busy_state_timer.timer);
-	llc->ack_must_be_send = 0;
-	llc->ack_pf = 0;
+	llc_sk_stop_all_timers(sk, false);
 	return 0;
 }
 
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index 110e32bcb399..c0ac522b48a1 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -961,6 +961,26 @@ struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority, struct pr
 	return sk;
 }
 
+void llc_sk_stop_all_timers(struct sock *sk, bool sync)
+{
+	struct llc_sock *llc = llc_sk(sk);
+
+	if (sync) {
+		del_timer_sync(&llc->pf_cycle_timer.timer);
+		del_timer_sync(&llc->ack_timer.timer);
+		del_timer_sync(&llc->rej_sent_timer.timer);
+		del_timer_sync(&llc->busy_state_timer.timer);
+	} else {
+		del_timer(&llc->pf_cycle_timer.timer);
+		del_timer(&llc->ack_timer.timer);
+		del_timer(&llc->rej_sent_timer.timer);
+		del_timer(&llc->busy_state_timer.timer);
+	}
+
+	llc->ack_must_be_send = 0;
+	llc->ack_pf = 0;
+}
+
 /**
  *	llc_sk_free - Frees a LLC socket
  *	@sk - socket to free
@@ -973,7 +993,7 @@ void llc_sk_free(struct sock *sk)
 
 	llc->state = LLC_CONN_OUT_OF_SVC;
 	/* Stop all (possibly) running timers */
-	llc_conn_ac_stop_all_timers(sk, NULL);
+	llc_sk_stop_all_timers(sk, true);
 #ifdef DEBUG_LLC_CONN_ALLOC
 	printk(KERN_INFO "%s: unackq=%d, txq=%d\n", __func__,
 		skb_queue_len(&llc->pdu_unack_q),
-- 
2.13.0

^ permalink raw reply related

* Re: [PATCH] docs: ip-sysctl.txt: fix name of some ipv6 variables
From: David Miller @ 2018-04-19 19:22 UTC (permalink / raw)
  To: olivier.gayot; +Cc: netdev
In-Reply-To: <1524081786-30392-1-git-send-email-olivier.gayot@sigexec.com>

From: Olivier Gayot <olivier.gayot@sigexec.com>
Date: Wed, 18 Apr 2018 22:03:06 +0200

> The name of the following proc/sysctl entries were incorrectly
> documented:
> 
>     /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_number
>     /proc/sys/net/ipv6/conf/<interface>/max_hbt_opts_number
>     /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_length
>     /proc/sys/net/ipv6/conf/<interface>/max_hbt_length
> 
> Their name was set to the name of the symbol in the .data field of the
> control table instead of their .proc name.
> 
> Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>

Good catch, applied, thank you.

^ permalink raw reply

* Re: simplify procfs code for seq_file instances
From: Alexey Dobriyan @ 2018-04-19 18:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Alexander Viro, Greg Kroah-Hartman, Jiri Slaby,
	Corey Minyard, Alessandro Zummo, Alexandre Belloni, linux-acpi,
	drbd-dev, linux-ide, netdev, linux-rtc, megaraidlinux.pdl,
	linux-scsi, devel, linux-afs, linux-ext4, jfs-discussion,
	netfilter-devel, linux-kernel
In-Reply-To: <20180419124140.9309-1-hch@lst.de>

>     git://git.infradead.org/users/hch/misc.git proc_create


I want to ask if it is time to start using poor man's function overloading
with _b_c_e(). There are millions of allocation functions for example,
all slightly different, and people will add more. Seeing /proc interfaces
doubled like this is painful.

^ permalink raw reply

* Re: WARNING in refcount_dec
From: Willem de Bruijn @ 2018-04-19 18:55 UTC (permalink / raw)
  To: DaeRyong Jeong
  Cc: Cong Wang, Byoungyoung Lee, LKML, Kyungtae Kim,
	Linux Kernel Network Developers, Willem de Bruijn
In-Reply-To: <CACsK=jcgPLMydKfRkKKXXUVqAWXwszbvkr=5jYXXPAmigTtszQ@mail.gmail.com>

On Thu, Apr 19, 2018 at 2:32 AM, DaeRyong Jeong <threeearcat@gmail.com> wrote:
> Hello.
> We have analyzed the cause of the crash in v4.16-rc3, WARNING in refcount_dec,
> which is found by RaceFuzzer (a modified version of Syzkaller).
>
> Since struct packet_sock's member variables, running, has_vnet_hdr, origdev
> and auxdata are declared as bitfields, accessing these variables can race if
> there is no synchronization mechanism.

Great catch.

These fields po->{running, auxdata, origdev, has_vnet_hdr} are
accessed without a uniform locking strategy.

po->running is always accessed with po->bind_lock held (with the
exception of reading in packet_seq_show, but that is best effort).

That is the only field written to outside setsockopt. If it is moved to
a separate word, it will no longer interfere with the others.

The other fields are read lockless in the various recv and send
functions, but only set in setsockopt. We've had enough
locking bugs around setsockopt that I suggest we wrap all of
those in lock_sock, like the example I gave before for
has_vnet_hdr.
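
To make the interference concrete: adjacent bitfields share one storage
unit, so setting one of them is a read-modify-write of the whole word and
an unsynchronized writer can overwrite a neighbour's update. The structs
below are a userspace sketch that only borrows the field names from this
thread; they are not the real struct packet_sock layout, and raw_word()
is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* All four flags packed into the same unsigned int. */
struct flags_shared {
	unsigned int running:1,
		     auxdata:1,
		     origdev:1,
		     has_vnet_hdr:1;
};

/* 'running' moved to its own word, away from the other flags. */
struct flags_split {
	unsigned int auxdata:1,
		     origdev:1,
		     has_vnet_hdr:1;
	unsigned int running;
};

/* Copy out the storage unit that holds the packed flags. */
static unsigned int raw_word(const struct flags_shared *f)
{
	unsigned int w;

	memcpy(&w, f, sizeof(w));
	return w;
}
```

Setting running and then auxdata changes the same word both times, which
is what lets concurrent writers clobber each other. In the split layout a
store to running no longer touches the word holding the other flags.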

^ permalink raw reply

* Re: [PATCH 03/39] proc: introduce proc_create_seq_private
From: Alexey Dobriyan @ 2018-04-19 18:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rtc, Alessandro Zummo, Alexandre Belloni, devel, linux-scsi,
	Corey Minyard, linux-ide, Greg Kroah-Hartman, jfs-discussion,
	linux-kernel, linux-acpi, netdev, netfilter-devel, Alexander Viro,
	Jiri Slaby, Andrew Morton, linux-ext4, linux-afs,
	megaraidlinux.pdl, drbd-dev
In-Reply-To: <20180419124140.9309-4-hch@lst.de>

On Thu, Apr 19, 2018 at 02:41:04PM +0200, Christoph Hellwig wrote:
> Variant of proc_create_data that directly takes a struct seq_operations

> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -45,6 +45,7 @@ struct proc_dir_entry {
>  	const struct inode_operations *proc_iops;
>  	const struct file_operations *proc_fops;
>  	const struct seq_operations *seq_ops;
> +	size_t state_size;

"unsigned int" please.

Where have you seen 4GB priv states?

^ permalink raw reply

* Re: [PATCH 14/39] proc: introduce proc_create_net_single
From: Alexey Dobriyan @ 2018-04-19 18:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rtc, Alessandro Zummo, Alexandre Belloni, devel, linux-scsi,
	Corey Minyard, linux-ide, Greg Kroah-Hartman, jfs-discussion,
	linux-kernel, linux-acpi, netdev, netfilter-devel, Alexander Viro,
	Jiri Slaby, Andrew Morton, linux-ext4, linux-afs,
	megaraidlinux.pdl, drbd-dev
In-Reply-To: <20180419124140.9309-15-hch@lst.de>

On Thu, Apr 19, 2018 at 02:41:15PM +0200, Christoph Hellwig wrote:
> Variant of proc_create_data that directly takes a seq_file show

> +struct proc_dir_entry *proc_create_net_single(const char *name, umode_t mode,
> +		struct proc_dir_entry *parent,
> +		int (*show)(struct seq_file *, void *), void *data)
> +{
> +	struct proc_dir_entry *p;
> +
> +	p = proc_create_data(name, mode, parent, &proc_net_single_fops, data);
> +	if (p)
> +		p->single_show = show;
> +	return p;
> +}

Ditto, should be oopsable.

^ permalink raw reply

* Re: [PATCH 02/39] proc: introduce proc_create_seq{,_data}
From: Alexey Dobriyan @ 2018-04-19 18:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rtc, Alessandro Zummo, Alexandre Belloni, devel, linux-scsi,
	Corey Minyard, linux-ide, Greg Kroah-Hartman, jfs-discussion,
	linux-kernel, linux-acpi, netdev, netfilter-devel, Alexander Viro,
	Jiri Slaby, Andrew Morton, linux-ext4, linux-afs,
	megaraidlinux.pdl, drbd-dev
In-Reply-To: <20180419124140.9309-3-hch@lst.de>

On Thu, Apr 19, 2018 at 02:41:03PM +0200, Christoph Hellwig wrote:
> Variants of proc_create{,_data} that directly take a struct seq_operations
> argument and drastically reduce the boilerplate code in the callers.

> +static int proc_seq_open(struct inode *inode, struct file *file)
> +{
> +	struct proc_dir_entry *de = PDE(inode);
> +
> +	return seq_open(file, de->seq_ops);
> +}
> +
> +static const struct file_operations proc_seq_fops = {
> +	.open		= proc_seq_open,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= seq_release,
> +};
> +
> +struct proc_dir_entry *proc_create_seq_data(const char *name, umode_t mode,
> +		struct proc_dir_entry *parent, const struct seq_operations *ops,
> +		void *data)
> +{
> +	struct proc_dir_entry *p;
> +
> +	p = proc_create_data(name, mode, parent, &proc_seq_fops, data);
> +	if (p)
> +		p->seq_ops = ops;
> +	return p;
> +}

Should be oopsable.
Once proc_create_data() returns, entry is live, ->open can be called.

> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -44,6 +44,7 @@ struct proc_dir_entry {
>  	struct completion *pde_unload_completion;
>  	const struct inode_operations *proc_iops;
>  	const struct file_operations *proc_fops;
> +	const struct seq_operations *seq_ops;
>  	void *data;
>  	unsigned int low_ino;
>  	nlink_t nlink;

struct proc_dir_entry is 192/128 bytes now.
If someone knows how to pad an array to a certain size without a union,
please tell.

^ permalink raw reply

* Re: [RFC] vhost: introduce mdev based hardware vhost backend
From: Michael S. Tsirkin @ 2018-04-19 18:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tiwei Bie, alex.williamson, ddutile, alexander.h.duyck,
	virtio-dev, linux-kernel, kvm, virtualization, netdev, dan.daly,
	cunming.liang, zhihong.wang, jianfeng.tan, xiao.w.wang
In-Reply-To: <30a63fff-7599-640a-361f-a27e5783012a@redhat.com>

On Tue, Apr 10, 2018 at 03:25:45PM +0800, Jason Wang wrote:
> > > > One problem is that, different virtio ring compatible devices
> > > > may have different device interfaces. That is to say, we will
> > > > need different drivers in QEMU. It could be troublesome. And
> > > > that's what this patch trying to fix. The idea behind this
> > > > patch is very simple: mdev is a standard way to emulate device
> > > > in kernel.
> > > So you just move the abstraction layer from qemu to kernel, and you still
> > > need different drivers in kernel for different device interfaces of
> > > accelerators. This looks even more complex than leaving it in qemu. As you
> > > said, another idea is to implement userspace vhost backend for accelerators
> > > which seems easier and could co-work with other parts of qemu without
> > > inventing new type of messages.
> > I'm not quite sure. Do you think it's acceptable to
> > add various vendor specific hardware drivers in QEMU?
> > 
> 
> I don't object but we need to figure out the advantages of doing it in qemu
> too.
> 
> Thanks

To be frank kernel is exactly where device drivers belong.  DPDK did
move them to userspace but that's merely a requirement for data path.
*If* you can have them in kernel that is best:
- update kernel and there's no need to rebuild userspace
- apps can be written in any language no need to maintain multiple
  libraries or add wrappers
- security concerns are much smaller (ok people are trying to
  raise the bar with IOMMUs and such, but it's already pretty
  good even without)

The biggest issue is that you let userspace poke at the
device which is also allowed by the IOMMU to poke at
kernel memory (needed for kernel driver to work).

Yes, maybe if device is not buggy it's all fine, but
it's better if we do not have to trust the device
otherwise the security picture becomes more murky.

I suggested attaching a PASID to (some) queues - see my old post "using
PASIDs to enable a safe variant of direct ring access".

Then using IOMMU with VFIO to limit access through queue to correct
ranges of memory.


-- 
MST

^ permalink raw reply

* Re: [bisected] Stack overflow after fs: "switch the IO-triggering parts of umount to fs_pin" (was net namespaces kernel stack overflow)
From: Alexander Aring @ 2018-04-19 18:37 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: Al Viro, linux-kernel, netdev, Jamal Hadi Salim
In-Reply-To: <188a05bc-de07-c048-6a8a-63dc899cce6d@virtuozzo.com>

Hi,

On Thu, Apr 19, 2018 at 12:56 PM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> On 19.04.2018 19:44, Al Viro wrote:
>> On Thu, Apr 19, 2018 at 04:34:48PM +0100, Al Viro wrote:
>>
>>> IOW, we only get there if our vfsmount was an MNT_INTERNAL one.
>>> So we have mnt->mnt_umount of some MNT_INTERNAL mount found in
>>> ->mnt_pins of some other mount.  Which, AFAICS, means that
>>> it used to be mounted on that other mount.  How the hell can
>>> that happen?
>>>
>>> It looks like you somehow get a long chain of MNT_INTERNAL mounts
>>> stacked on top of each other, which ought to be prevented by
>>>         mnt_flags &= ~MNT_INTERNAL_FLAGS;
>>> in do_add_mount().  Nuts...
>>
>> Arrrrrgh...  Nuts is right - clone_mnt() preserves the sodding
>> MNT_INTERNAL, with obvious results.
>>
>> netns is related to the problem, by exposing MNT_INTERNAL mounts
>> (in /proc/*/ns/*) for mount --bind to copy and attach to the
>> tree.  AFAICS, the minimal reproducer is
>>
>> touch /tmp/a
>> unshare -m sh -c 'for i in `seq 10000`; do mount --bind /proc/1/ns/net /tmp/a; done'
>>
>> (and it can be anything in /proc/*/ns/*, really)
>>
>> I think the fix should be along the lines of the following:
>>
>> Don't leak MNT_INTERNAL away from internal mounts
>>
>> We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
>> their copies.
>>
>> Cc: stable@kernel.org
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> Flawless victory! Thanks.
>

Thanks to all.

Also thanks to Kirill for helping me here and doing the main part by
bisecting this issue.

Finally, my testing stuff which produced this bug also works well now.

Tested-by: Alexander Aring <aring@mojatatu.com>

- Alex
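
For readers following along, the flag leak Al describes can be sketched in
a few lines of user-space C. This is only a model under stated assumptions:
struct model_mount, both clone helpers and the flag values are hypothetical
names made up for illustration, not the kernel's clone_mnt() and not the
patch that was applied.

```c
#include <stdint.h>

/* Illustrative flag values for this model only. */
#define MODEL_MNT_READONLY 0x40u
#define MODEL_MNT_INTERNAL 0x4000u

/* Hypothetical stand-in for a mount; only the flags matter here. */
struct model_mount {
    uint32_t flags;
};

/* Behaviour described in the thread: the copy inherits every flag,
 * so a bind mount of an internal mount is itself marked internal. */
static struct model_mount clone_keep_all(const struct model_mount *old)
{
    struct model_mount m = { .flags = old->flags };
    return m;
}

/* Direction of the fix: the copy keeps the other flags but never
 * carries the internal marker. */
static struct model_mount clone_drop_internal(const struct model_mount *old)
{
    struct model_mount m = { .flags = old->flags & ~MODEL_MNT_INTERNAL };
    return m;
}
```

With the second helper, repeated mount --bind copies of a /proc/*/ns/*
mount would no longer look internal in this model, which is the property
the commit message above asks for.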

^ permalink raw reply

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Vlastimil Babka @ 2018-04-19 18:28 UTC (permalink / raw)
  To: Mikulas Patocka, David Miller, Andrew Morton, linux-mm
  Cc: eric.dumazet, edumazet, bhutchings, netdev, linux-kernel, mst,
	jasowang, virtualization, dm-devel, Laura Abbott
In-Reply-To: <alpine.LRH.2.02.1804191207380.31175@file01.intranet.prod.int.rdu2.redhat.com>

On 04/19/2018 06:12 PM, Mikulas Patocka wrote:
> From: Mikulas Patocka <mpatocka@redhat.com>
> Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
> 
> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
> 
> Unfortunately, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
> 
> These bugs are hard to reproduce because kvmalloc falls back to vmalloc
> only if memory is fragmented.
> 
> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

Hmm AFAIK Fedora uses CONFIG_DEBUG_VM in their kernels. Sure you want to
impose this on all users? Seems too much for DEBUG_VM to me. Maybe it
should be hidden under some error injection config?

> ---
>  mm/util.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6/mm/util.c
> ===================================================================
> --- linux-2.6.orig/mm/util.c	2018-04-18 15:46:23.000000000 +0200
> +++ linux-2.6/mm/util.c	2018-04-18 16:00:43.000000000 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> +#ifndef CONFIG_DEBUG_VM
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
>  
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
>  	 */
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
> +#endif

Did you verify that vmalloc does the right thing for sub-page sizes?
Shouldn't those be exempted?

>  	return __vmalloc_node_flags_caller(size, node, flags,
>  			__builtin_return_address(0));
> 
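
To make the two review questions concrete, here is a small user-space
model of the allocation policy as it reads in the quoted hunks. It is a
sketch, not kernel code: kv_choose(), enum backend and MODEL_PAGE_SIZE are
hypothetical names, and the kmalloc_ok flag simply stands in for whether
the kmalloc attempt would have succeeded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Which allocator ends up serving the request in this model. */
enum backend { BACKEND_NONE, BACKEND_KMALLOC, BACKEND_VMALLOC };

/* Assumed page size for the model; real kernels use PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096u

/*
 * size:                 requested allocation size in bytes
 * kmalloc_ok:           would the kmalloc attempt succeed?
 * debug_always_vmalloc: models the proposed CONFIG_DEBUG_VM behaviour,
 *                       where the whole kmalloc path is compiled out
 */
static enum backend kv_choose(size_t size, bool kmalloc_ok,
                              bool debug_always_vmalloc)
{
    if (!debug_always_vmalloc) {
        if (kmalloc_ok)
            return BACKEND_KMALLOC;
        /* Mirrors "if (ret || size <= PAGE_SIZE) return ret;":
         * a failed sub-page request is not retried with vmalloc. */
        if (size <= MODEL_PAGE_SIZE)
            return BACKEND_NONE;
    }
    return BACKEND_VMALLOC;
}
```

In this model a 64-byte request with the debug switch on lands in
BACKEND_VMALLOC, which is exactly the sub-page case asked about above;
with the switch off it is either served by kmalloc or fails.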

^ permalink raw reply

* Re: [PATCH net-next] net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
From: Alexander Duyck @ 2018-04-19 18:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <20180418184315.48704-1-edumazet@google.com>

On Wed, Apr 18, 2018 at 11:43 AM, Eric Dumazet <edumazet@google.com> wrote:
> After working on IP defragmentation lately, I found that some large
> packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> zero paddings on the last (small) fragment.
>
> While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> fragments had CHECKSUM_COMPLETE set.
>
> We can instead compute the checksum of the part we are trimming,
> usually smaller than the part we keep.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/linux/skbuff.h |  5 ++---
>  net/core/skbuff.c      | 14 ++++++++++++++
>  2 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 9065477ed255a48f7e01b8a28ea6321cce9127f5..d274059529eb5216d041dfdcad4a564a623c8ea0 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3131,6 +3131,7 @@ static inline void *skb_push_rcsum(struct sk_buff *skb, unsigned int len)
>         return skb->data;
>  }
>
> +int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len);
>  /**
>   *     pskb_trim_rcsum - trim received skb and update checksum
>   *     @skb: buffer to trim
> @@ -3144,9 +3145,7 @@ static inline int pskb_trim_rcsum(struct sk_buff *skb, unsigned int len)
>  {
>         if (likely(len >= skb->len))
>                 return 0;
> -       if (skb->ip_summed == CHECKSUM_COMPLETE)
> -               skb->ip_summed = CHECKSUM_NONE;
> -       return __pskb_trim(skb, len);
> +       return pskb_trim_rcsum_slow(skb, len);
>  }
>

I'm wondering if in the past padding was somehow screwing up the
CHECKSUM_COMPLETE value being provided. I wonder if it wouldn't be in
our interest to just consider manually computing the checksum of the
fragment after stripping the padding instead of just subtracting the
offset of the padding.

I guess if we start seeing checksum errors popping up on some devices
we can reevaluate this if necessary.

Thanks.

- Alex
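
The arithmetic behind "subtract the checksum of the trimmed part" can be
checked outside the kernel. Below is a minimal user-space sketch using the
16-bit ones'-complement sum; csum_at(), csum_sub16() and csum_trim() are
hypothetical helpers written for this illustration and are not the
kernel's skb_checksum() or csum_sub(). The offset argument is an assumption
of this model so that a tail starting at an odd offset is paired the same
way as in the full packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones'-complement sum of buf[0..len). 'offset' is the position of
 * buf[0] inside the full packet: bytes at even packet offsets are the
 * high byte of a 16-bit word, bytes at odd offsets the low byte. */
static uint16_t csum_at(const uint8_t *buf, size_t len, size_t offset)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        uint32_t b = buf[i];

        sum += ((offset + i) & 1) ? b : b << 8;
        sum = (sum & 0xffff) + (sum >> 16); /* keep the accumulator small */
    }
    return fold16(sum);
}

/* Ones'-complement subtraction: a - b is a + ~b. */
static uint16_t csum_sub16(uint16_t a, uint16_t b)
{
    return fold16((uint32_t)a + (uint16_t)~b);
}

/* Checksum of the first 'keep' bytes, derived from the checksum of the
 * whole packet by summing only the trimmed tail. */
static uint16_t csum_trim(const uint8_t *pkt, size_t total, size_t keep,
                          uint16_t full_csum)
{
    return csum_sub16(full_csum, csum_at(pkt + keep, total - keep, keep));
}
```

For any packet whose kept part contains a non-zero byte, csum_trim()
agrees with summing the kept bytes directly, while touching only the tail.
An all-zero head is the one corner case: ones'-complement has two
encodings of zero, so the subtraction yields 0xffff where the direct sum
yields 0.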

^ permalink raw reply

