Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 1/2] bnxt_en: do not call napi_hash_add()
From: Michael Chan @ 2016-11-08 21:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1478632013.17367.23.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, Nov 8, 2016 at 11:06 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> This is automatically done from netif_napi_add(), and we want to not
> export napi_hash_add() anymore in the following patch.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Michael Chan <michael.chan@broadcom.com>

Acked-by: Michael Chan <michael.chan@broadcom.com>

^ permalink raw reply

* Re: [PATCH] [RFC] net: phy: phy drivers should not set SUPPORTED_Pause or SUPPORTED_Asym_Pause
From: Timur Tabi @ 2016-11-08 20:43 UTC (permalink / raw)
  To: Florian Fainelli, David Miller, netdev
In-Reply-To: <5820AC1D.4040408@codeaurora.org>

On 11/07/2016 10:30 AM, Timur Tabi wrote:
>
> I'm still don't understand 100% how these flags really work, because I
> just can't shake the feeling that they should not be set for every phy.
>   If these flags are supposed to be turned on universally, then why are
> they even an option?

So I've been giving this more thought.  Can you tell me if the following 
is correct:

1) PHY drivers and/or phylib sets the SUPPORTED_Pause | 
SUPPORTED_AsymPause bits in phydev->supported.  This indicates that the 
PHY supports pause frames.

2) The MAC driver checks phydev->supported before it calls phy_start(). 
  If (SUPPORTED_Pause | SUPPORTED_AsymPause) is set, then it sets those 
bits in phydev->advertising if it wants to enable pause frame support.

3) When the link state changes, the MAC driver checks 
phydev->advertising, and if the bits are set, then it enables those 
features in the MAC.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* Re: [PATCH v2] irqchip/renesas-irqc: Postpone driver initialization
From: Geert Uytterhoeven @ 2016-11-08 19:50 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Geert Uytterhoeven, Thomas Gleixner, Jason Cooper, Marc Zyngier,
	Simon Horman, Magnus Damm, Linux-Renesas,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <216fe100-556b-7f83-39b0-4cf54c1fa35c@gmail.com>

Hi Florian,

On Tue, Nov 8, 2016 at 8:42 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 11/08/2016 11:35 AM, Geert Uytterhoeven wrote:
>> Currently the renesas-irqc driver uses postcore_initcall().
>>
>> However, the new CPG/MSSR driver uses subsys_initcall(). Hence the
>> IRQC's probe will be deferred, which causes the Micrel Ethernet PHY to
>> not find its interrupt on R-Car Gen2 and RZ/G, as the of_mdio subsystem
>> does not support deferred probe yet.
>
> Is not that the more correct fix to implement though?

Sure it is. But nothing has happened since this was reported ca. 1 year ago.
Cfr. "of_mdiobus_register_phy() and deferred probe"
https://lkml.org/lkml/2015/10/22/377

My MDIO foo is not that strong...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: net/l2tp: use-after-free write in l2tp_ip6_close
From: Andrey Konovalov @ 2016-11-08 19:45 UTC (permalink / raw)
  To: syzkaller
  Cc: David S. Miller, Eric Dumazet, Willem de Bruijn,
	Hannes Frederic Sowa, Soheil Hassas Yeganeh, Shmulik Ladkani,
	Wei Wang, Haishuang Yan, netdev, LKML, Dmitry Vyukov,
	Kostya Serebryany, Alexander Potapenko
In-Reply-To: <CAM_iQpVkKyEsPDse5wnM=WGbbXwSWE5fBSHwonhS3a3i4A-Twg@mail.gmail.com>

Hi Cong,

Tried with your patch, still seeing the reports.

Thanks!

On Tue, Nov 8, 2016 at 12:02 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Nov 7, 2016 at 2:35 PM, Andrey Konovalov <andreyknvl@google.com> wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> ==================================================================
>> BUG: KASAN: use-after-free in l2tp_ip6_close+0x239/0x2a0 at addr
>> ffff8800677276d8
>> Write of size 8 by task a.out/8668
>> CPU: 0 PID: 8668 Comm: a.out Not tainted 4.9.0-rc4+ #354
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>  ffff8800694d7b00 ffffffff81b46a64 ffff88006adb5780 ffff8800677276c0
>>  ffff880067727c68 ffff8800677276c0 ffff8800694d7b28 ffffffff8150a86c
>>  ffff8800694d7bb8 ffff88006adb5780 ffff8800e77276d8 ffff8800694d7ba8
>> Call Trace:
>>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>>  [<ffffffff81b46a64>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
>>  [<ffffffff8150a86c>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
>>  [<     inline     >] print_address_description mm/kasan/report.c:194
>>  [<ffffffff8150ab07>] kasan_report_error+0x1f7/0x4d0 mm/kasan/report.c:283
>>  [<     inline     >] kasan_report mm/kasan/report.c:303
>>  [<ffffffff8150b01e>] __asan_report_store8_noabort+0x3e/0x40
>> mm/kasan/report.c:329
>>  [<     inline     >] __write_once_size ./include/linux/compiler.h:272
>>  [<     inline     >] __hlist_del ./include/linux/list.h:622
>>  [<     inline     >] hlist_del_init ./include/linux/list.h:637
>>  [<ffffffff83825f49>] l2tp_ip6_close+0x239/0x2a0 net/l2tp/l2tp_ip6.c:239
>>  [<ffffffff8316b31f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
>>  [<ffffffff832cd4d0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
>>  [<ffffffff82b6d89e>] sock_release+0x8e/0x1d0 net/socket.c:570
>>  [<ffffffff82b6d9f6>] sock_close+0x16/0x20 net/socket.c:1017
>>  [<ffffffff81524bdd>] __fput+0x29d/0x720 fs/file_table.c:208
>>  [<ffffffff815250e5>] ____fput+0x15/0x20 fs/file_table.c:244
>>  [<ffffffff81172928>] task_work_run+0xf8/0x170 kernel/task_work.c:116
>>  [<     inline     >] exit_task_work ./include/linux/task_work.h:21
>>  [<ffffffff8111bda3>] do_exit+0x883/0x2ac0 kernel/exit.c:828
>>  [<ffffffff8112234e>] do_group_exit+0x10e/0x340 kernel/exit.c:931
>>  [<     inline     >] SYSC_exit_group kernel/exit.c:942
>>  [<ffffffff8112259d>] SyS_exit_group+0x1d/0x20 kernel/exit.c:940
>>  [<ffffffff83fc1501>] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> arch/x86/entry/entry_64.S:209
>
> I guess we need to lock the sock for l2tp_ip6_disconnect() too.
>
> diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
> index ad3468c..ea2ae66 100644
> --- a/net/l2tp/l2tp_ip6.c
> +++ b/net/l2tp/l2tp_ip6.c
> @@ -410,7 +410,7 @@ static int l2tp_ip6_disconnect(struct sock *sk, int flags)
>         if (sock_flag(sk, SOCK_ZAPPED))
>                 return 0;
>
> -       return __udp_disconnect(sk, flags);
> +       return udp_disconnect(sk, flags);
>  }
>
>  static int l2tp_ip6_getname(struct socket *sock, struct sockaddr *uaddr,
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: [PATCH v2] irqchip/renesas-irqc: Postpone driver initialization
From: Florian Fainelli @ 2016-11-08 19:42 UTC (permalink / raw)
  To: Geert Uytterhoeven, Thomas Gleixner, Jason Cooper, Marc Zyngier
  Cc: Simon Horman, Magnus Damm, linux-renesas-soc, linux-kernel,
	netdev
In-Reply-To: <1478633747-26878-1-git-send-email-geert+renesas@glider.be>

On 11/08/2016 11:35 AM, Geert Uytterhoeven wrote:
> Currently the renesas-irqc driver uses postcore_initcall().
> 
> However, the new CPG/MSSR driver uses subsys_initcall(). Hence the
> IRQC's probe will be deferred, which causes the Micrel Ethernet PHY to
> not find its interrupt on R-Car Gen2 and RZ/G, as the of_mdio subsystem
> does not support deferred probe yet.

Is not that the more correct fix to implement though?
-- 
Florian

^ permalink raw reply

* [PATCH v2] irqchip/renesas-irqc: Postpone driver initialization
From: Geert Uytterhoeven @ 2016-11-08 19:35 UTC (permalink / raw)
  To: Thomas Gleixner, Jason Cooper, Marc Zyngier
  Cc: Florian Fainelli, Simon Horman, Magnus Damm, linux-renesas-soc,
	linux-kernel, netdev, Geert Uytterhoeven

Currently the renesas-irqc driver uses postcore_initcall().

However, the new CPG/MSSR driver uses subsys_initcall(). Hence the
IRQC's probe will be deferred, which causes the Micrel Ethernet PHY to
not find its interrupt on R-Car Gen2 and RZ/G, as the of_mdio subsystem
does not support deferred probe yet.

Replace postcore_initcall() by device_initcall() to work around this.

Note that on R-Mobile APE6, where the PFC/GPIO combo uses the IRQC as
its parent interrupt controller, this does cause a few additional probe
deferrals (for SCIFA0, SD0, SD1, and MMC). But the affected drivers
handle that fine.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
---
v2:
  - Drop RFC state,
  - Add Tested-by,
  - Improved description.
---
 drivers/irqchip/irq-renesas-irqc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-renesas-irqc.c b/drivers/irqchip/irq-renesas-irqc.c
index 52304b139aa46a60..992849e54d00ea77 100644
--- a/drivers/irqchip/irq-renesas-irqc.c
+++ b/drivers/irqchip/irq-renesas-irqc.c
@@ -295,7 +295,7 @@ static int __init irqc_init(void)
 {
 	return platform_driver_register(&irqc_device_driver);
 }
-postcore_initcall(irqc_init);
+device_initcall(irqc_init);

 static void __exit irqc_exit(void)
 {
-- 
1.9.1

^ permalink raw reply related

* Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
From: Alexander Duyck @ 2016-11-08 19:33 UTC (permalink / raw)
  To: Hisashi T Fujinaka, Cao jin, Netdev, intel-wired-lan,
	linux-kernel@vger.kernel.org, Izumi, Taku/泉 拓
In-Reply-To: <20161108183739.GA3744@calimero.vinschen.de>

On Tue, Nov 8, 2016 at 10:37 AM, Corinna Vinschen <vinschen@redhat.com> wrote:
> On Nov  8 09:16, Hisashi T Fujinaka wrote:
>> On Tue, 8 Nov 2016, Corinna Vinschen wrote:
>> > On Nov  8 15:06, Cao jin wrote:
>> > > When running as guest, under certain condition, it will oops as following.
>> > > writel() in igb_configure_tx_ring() results in oops, because hw->hw_addr
>> > > is NULL. While other register access won't oops kernel because they use
>> > > wr32/rd32 which have a defense against NULL pointer.
>> > > [...]
>> >
>> > Incidentally we're just looking for a solution to that problem too.
>> > Do three patches to fix the same problem at rougly the same time already
>> > qualify as freak accident?
>> >
>> > FTR, I attached my current patch, which I was planning to submit after
>> > some external testing.
>> >
>> > However, all three patches have one thing in common:  They workaround
>> > a somewhat dubious resetting of the hardware address to NULL in case
>> > reading from a register failed.
>> >
>> > That makes me wonder if setting the hardware address to NULL in
>> > rd32/igb_rd32 is really such a good idea.  It's performed in a function
>> > which return value is *never* tested for validity in the calling
>> > functions and leads to subsequent crashes since no tests for hw_addr ==
>> > NULL are performed.
>> >
>> > Maybe commit 22a8b2915 should be reconsidered?  Isn't there some more
>> > graceful way to handle the "surprise removal"?
>>
>> Answering this from my home account because, well, work is Outlook.
>>
>> "Reconsidering" would be great. In fact, revert if if you'd like. I'm
>> uncertain that the surprise removal code actually works the way I
>> thought previously and I think I took a lot of it out of my local code.
>>
>> Unfortuantely I don't have any equipment that I can use to reproduce
>> surprise removal any longer so that means I wouldn't be able to test
>> anything. I have to defer to you or Cao Jin.
>
> I'm not too keen to rip out a PCIe NIC under power from my locale
> desktop machine, but I think an actual surprise removal is not the
> problem.
>
> As described in my git log entry, the error condition in igb_rd32 can be
> triggered during a suspend.  The HW has been put into a sleep state but
> some register read requests are apparently not guarded against that
> situation.  Reading a register in this state returns -1, thus a suspend
> is erroneously triggering the "surprise removal" sequence.

The question I would have is what is reading the device when it is in
this state.  The watchdog and any other functions that would read the
device should be disabled.

One possibility could be a race between a call to igb_close and the
igb_suspend function.  We have seen some of those pop up recently on
ixgbe and it looks like igb has the same bug.  We should probably be
using the rtnl_lock to guarantee that netif_device_detach and the call
to __igb_close are completed before igb_close could possibly be called
by the network stack.

> Here's a raw idea:
>
> - Note that device is suspended in e1000_hw struct.  Don't trigger
>   error sequence in igb_rd32 if so (...and return a 0 value???)

The thing is that a suspended device should not be accessed at all.
If we are accessing it while it is suspended then that is a bug.  If
you could throw a WARN_ON call in igb_rd32 to capture where this is
being triggered that might be useful.

> - Otherwise assume it's actually a surprise removal.  In theory that
>   should somehow trigger a device removal sequence, kind of like
>   calling igb_remove, no?

Well a read of the MMIO region while suspended is more of a surprise
read since there shouldn't be anything going on.  We need to isolate
where that read is coming from and fix it.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH net-next v4] cadence: Add LSO support.
From: Florian Fainelli @ 2016-11-08 19:09 UTC (permalink / raw)
  To: Rafal Ozieblo, nicolas.ferre, netdev, linux-kernel
In-Reply-To: <1478612463-15076-1-git-send-email-rafalo@cadence.com>

On 11/08/2016 05:41 AM, Rafal Ozieblo wrote:
> New Cadence GEM hardware support Large Segment Offload (LSO):
> TCP segmentation offload (TSO) as well as UDP fragmentation
> offload (UFO). Support for those features was added to the driver.
> 
> Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
> ---

> -#define MACB_MAX_TX_LEN		((unsigned int)((1 << MACB_TX_FRMLEN_SIZE) - 1))
> -#define GEM_MAX_TX_LEN		((unsigned int)((1 << GEM_TX_FRMLEN_SIZE) - 1))
> +/* Max length of transmit frame must be a multiple of 8 bytes */
> +#define MACB_TX_LEN_ALIGN	8
> +#define MACB_MAX_TX_LEN		((unsigned int)((1 << MACB_TX_FRMLEN_SIZE) - 1) & ~((unsigned int)(MACB_TX_LEN_ALIGN - 1)))
> +#define GEM_MAX_TX_LEN		((unsigned int)((1 << GEM_TX_FRMLEN_SIZE) - 1) & ~((unsigned int)(MACB_TX_LEN_ALIGN - 1)))
>  
>  #define GEM_MTU_MIN_SIZE	ETH_MIN_MTU
> +#define MACB_NETIF_LSO		(NETIF_F_TSO | NETIF_F_UFO)

Not a huge fan of this definition, since it is always used in conjuction
with netdev_features_t, having it expanded all the time is kind of nicer
for the reader, but this is just personal preference here.

>  
>  #define MACB_WOL_HAS_MAGIC_PACKET	(0x1 << 0)
>  #define MACB_WOL_ENABLED		(0x1 << 1)
> @@ -1223,7 +1228,8 @@ static void macb_poll_controller(struct net_device *dev)
>  
>  static unsigned int macb_tx_map(struct macb *bp,
>  				struct macb_queue *queue,
> -				struct sk_buff *skb)
> +				struct sk_buff *skb,
> +				unsigned int hdrlen)
>  {
>  	dma_addr_t mapping;
>  	unsigned int len, entry, i, tx_head = queue->tx_head;
> @@ -1231,14 +1237,27 @@ static unsigned int macb_tx_map(struct macb *bp,
>  	struct macb_dma_desc *desc;
>  	unsigned int offset, size, count = 0;
>  	unsigned int f, nr_frags = skb_shinfo(skb)->nr_frags;
> -	unsigned int eof = 1;
> -	u32 ctrl;
> +	unsigned int eof = 1, mss_mfs = 0;
> +	u32 ctrl, lso_ctrl = 0, seq_ctrl = 0;
> +
> +	/* LSO */
> +	if (skb_shinfo(skb)->gso_size != 0) {
> +		if (IPPROTO_UDP == (ip_hdr(skb)->protocol))

Most checks are usually done the other way with the left and right
member swapped.

> +			/* UDP - UFO */
> +			lso_ctrl = MACB_LSO_UFO_ENABLE;
> +		else
> +			/* TCP - TSO */
> +			lso_ctrl = MACB_LSO_TSO_ENABLE;
> +	}

>  
>  	/* Then, map paged data from fragments */
> @@ -1311,6 +1332,20 @@ static unsigned int macb_tx_map(struct macb *bp,
>  	desc = &queue->tx_ring[entry];
>  	desc->ctrl = ctrl;
>  
> +	if (lso_ctrl) {
> +		if (lso_ctrl == MACB_LSO_UFO_ENABLE)
> +			/* include header and FCS in value given to h/w */
> +			mss_mfs = skb_shinfo(skb)->gso_size +
> +					skb_transport_offset(skb) + 4;

ETH_FCS_LEN instead of 4?


> +static netdev_features_t macb_features_check(struct sk_buff *skb,
> +					     struct net_device *dev,
> +					     netdev_features_t features)
> +{
> +	unsigned int nr_frags, f;
> +	unsigned int hdrlen;
> +
> +	/* Validate LSO compatibility */
> +
> +	/* there is only one buffer */
> +	if (!skb_is_nonlinear(skb))
> +		return features;
> +
> +	/* length of header */
> +	hdrlen = skb_transport_offset(skb);
> +	if (IPPROTO_TCP == (ip_hdr(skb)->protocol))
> +		hdrlen += tcp_hdrlen(skb);

Same here, please reverse the left and right members, no need for
parenthesis aground ip_hdr(skb)->protocol.

> +
> +	/* For LSO:
> +	 * When software supplies two or more payload buffers all payload buffers
> +	 * apart from the last must be a multiple of 8 bytes in size.
> +	 */
> +	if (!IS_ALIGNED(skb_headlen(skb) - hdrlen, MACB_TX_LEN_ALIGN))
> +		return features & ~MACB_NETIF_LSO;
> +
> +	nr_frags = skb_shinfo(skb)->nr_frags;
> +	/* No need to check last fragment */
> +	nr_frags--;
> +	for (f = 0; f < nr_frags; f++) {
> +		const skb_frag_t *frag = &skb_shinfo(skb)->frags[f];
> +
> +		if (!IS_ALIGNED(skb_frag_size(frag), MACB_TX_LEN_ALIGN))
> +			return features & ~MACB_NETIF_LSO;
> +	}
> +	return features;
> +}
> +
>  static inline int macb_clear_csum(struct sk_buff *skb)
>  {
>  	/* no change for packets without checksum offloading */
> @@ -1374,7 +1456,27 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct macb *bp = netdev_priv(dev);
>  	struct macb_queue *queue = &bp->queues[queue_index];
>  	unsigned long flags;
> -	unsigned int count, nr_frags, frag_size, f;
> +	unsigned int desc_cnt, nr_frags, frag_size, f;
> +	unsigned int is_lso = 0, is_udp, hdrlen;
> +
> +	is_lso = (skb_shinfo(skb)->gso_size != 0);
> +
> +	if (is_lso) {
> +		is_udp = (IPPROTO_UDP == (ip_hdr(skb)->protocol));

Same here, and you may want to declare is_udp as boolean and do this:

		is_udp = !!(ip_hdr(skb)->protocl == IPPROTO_UDP);

> +
> +		/* length of headers */
> +		if (is_udp)
> +			/* only queue eth + ip headers separately for UDP */
> +			hdrlen = skb_transport_offset(skb);
> +		else
> +			hdrlen = skb_transport_offset(skb) + tcp_hdrlen(skb);
> +		if (skb_headlen(skb) < hdrlen) {
> +			netdev_err(bp->dev, "Error - LSO headers fragmented!!!\n");
> +			/* if this is required, would need to copy to single buffer */
> +			return NETDEV_TX_BUSY;
> +		}

>  
> +	if (is_lso) {
> +		if (is_udp)
> +			/* zero UDP checksum, not calculated by h/w for UFO */
> +			udp_hdr(skb)->check = 0;

is_udp is only set when (is_lso) is checked, so the two conditions are
redundant, just checking is_udp should be enough?
-- 
Florian

^ permalink raw reply

* [PATCH net-next 2/2] net: napi_hash_add() is no longer exported
From: Eric Dumazet @ 2016-11-08 19:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Michael Chan

From: Eric Dumazet <edumazet@google.com>

There are no more users except from net/core/dev.c
napi_hash_add() can now be static.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michael Chan <michael.chan@broadcom.com>
---
 include/linux/netdevice.h |   11 -----------
 net/core/dev.c            |    3 +--
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 66fd61c681d90d4a7ecc3bf7bae44c2b3b1fe10c..d64135a0ab718b9e119646b74f92901e8fe4b356 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -468,17 +468,6 @@ static inline void napi_complete(struct napi_struct *n)
 }
 
 /**
- *	napi_hash_add - add a NAPI to global hashtable
- *	@napi: NAPI context
- *
- * Generate a new napi_id and store a @napi under it in napi_hash.
- * Used for busy polling (CONFIG_NET_RX_BUSY_POLL).
- * Note: This is normally automatically done from netif_napi_add(),
- * so might disappear in a future Linux version.
- */
-void napi_hash_add(struct napi_struct *napi);
-
-/**
  *	napi_hash_del - remove a NAPI from global table
  *	@napi: NAPI context
  *
diff --git a/net/core/dev.c b/net/core/dev.c
index 0260ad314506c621215a0e3449392bc01aad55ca..e2148d9845868d7ca5d2c6b853cb0fbe18b32163 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5017,7 +5017,7 @@ EXPORT_SYMBOL(sk_busy_loop);
 
 #endif /* CONFIG_NET_RX_BUSY_POLL */
 
-void napi_hash_add(struct napi_struct *napi)
+static void napi_hash_add(struct napi_struct *napi)
 {
 	if (test_bit(NAPI_STATE_NO_BUSY_POLL, &napi->state) ||
 	    test_and_set_bit(NAPI_STATE_HASHED, &napi->state))
@@ -5037,7 +5037,6 @@ void napi_hash_add(struct napi_struct *napi)
 
 	spin_unlock(&napi_hash_lock);
 }
-EXPORT_SYMBOL_GPL(napi_hash_add);
 
 /* Warning : caller is responsible to make sure rcu grace period
  * is respected before freeing memory containing @napi

^ permalink raw reply related

* [PATCH net-next 1/2] bnxt_en: do not call napi_hash_add()
From: Eric Dumazet @ 2016-11-08 19:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Michael Chan

From: Eric Dumazet <edumazet@google.com>

This is automatically done from netif_napi_add(), and we want to not
export napi_hash_add() anymore in the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index a042da1ff4b90e9aae4f76db71c99c2c4da321d3..d313b02485a10b2b7995076578ce3632865b475f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4954,7 +4954,6 @@ static void bnxt_init_napi(struct bnxt *bp)
 			bnapi = bp->bnapi[cp_nr_rings];
 			netif_napi_add(bp->dev, &bnapi->napi,
 				       bnxt_poll_nitroa0, 64);
-			napi_hash_add(&bnapi->napi);
 		}
 	} else {
 		bnapi = bp->bnapi[0];

^ permalink raw reply related

* Re: [PATCH 13/17] batman-adv: Consume skb in receive handlers
From: Sven Eckelmann @ 2016-11-08 19:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Simon Wunderlich, davem, netdev, b.a.t.m.a.n
In-Reply-To: <1478626981.17367.16.camel@edumazet-glaptop3.roam.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 302 bytes --]

On Dienstag, 8. November 2016 09:43:01 CET Eric Dumazet wrote:
[...]
> Sure, but your patch 13/17 should address this right away.
[...]

Fair enough. I've asked Simon to resubmit the patches with the
"consume_skb -> conditional kfree_skb/consume_skb" patch squashed
into patch 13.

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH v3 4/4] posix-timers: make it configurable
From: John Stultz @ 2016-11-08 18:48 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Michal Marek, Richard Cochran, Paul Bolle, Thomas Gleixner,
	Josh Triplett, Edward Cree, netdev, linux-kbuild, lkml
In-Reply-To: <alpine.LFD.2.20.1611081311420.11818@knanqh.ubzr>

On Tue, Nov 8, 2016 at 10:19 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 8 Nov 2016, John Stultz wrote:
>
>> One spot of concern is that the
>> tools/testing/selftests/timers/posix_timers.c test hangs testing
>> virtual itimers. Looking through the code I'm not seeing where an
>> error case is missed.
>>
>> The strace looks like:
>> ...
>> write(1, "Testing posix timers. False nega"..., 66Testing posix
>> timers. False negative may happen on CPU execution
>> ) = 66
>> write(1, "based timers if other threads ru"..., 48based timers if
>> other threads run on the CPU...
>> ) = 48
>> write(1, "Check itimer virtual... ", 24Check itimer virtual... ) = 24
>> rt_sigaction(SIGVTALRM, {0x400a80, [VTALRM], SA_RESTORER|SA_RESTART,
>> 0x7fb73306ccb0}, {SIG_DFL, [], 0}, 8) = 0
>> gettimeofday({1478710402, 937476}, NULL) = 0
>> setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={2, 0}}, NULL) = 0
>> <Hang>
>>
>>
>> Where as with posix timers enabled:
>> ...
>> write(1, "Testing posix timers. False nega"..., 138Testing posix
>> timers. False negative may happen on CPU execution
>> based timers if other threads run on the CPU...
>> Check itimer virtual... ) = 138
>> rt_sigaction(SIGVTALRM, {0x400a80, [VTALRM], SA_RESTORER|SA_RESTART,
>> 0x7f231ba8ccb0}, {SIG_DFL, [], 0}, 8) = 0
>> gettimeofday({1478626751, 904856}, NULL) = 0
>> setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={2, 0}}, NULL) = 0
>> --- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_KERNEL} ---
>> rt_sigreturn()                          = 0
>
> I'll have a look.
>
>> So I suspect you were a little too aggressive with the #ifdefs around
>> the itimers/signal code, or we need to make sure we return an error on
>> the setitimer ITIMER_VIRTUAL case as well.
>
> Well, it seemed to me that with POSIX_TIMERS=n, all the code that would
> set up that signal is gone, so there was no point keeping the code to
> deliver it.
>
> Now... would it make more sense to remove itimer support as well when
> POSIX_TIMERS=n?  The same reasoning would apply.

Yes, returning an error with itimers seems needed if the signal bits
are missing.

Though I do worry that since getitimer/setitimer are older obsolete
interfaces which the posix timers api is supposed to replace, folks
might be surprised to see it removed when setting POSIX_TIMERS=n. So
some additional notes in the kconfig description may be needed.

thanks
-john

^ permalink raw reply

* Re: net/sctp: null-ptr-deref in sctp_inet_listen
From: Andrey Konovalov @ 2016-11-08 18:46 UTC (permalink / raw)
  To: Xin Long
  Cc: Vlad Yasevich, Neil Horman, David S. Miller, linux-sctp, netdev,
	LKML, Dmitry Vyukov, Alexander Potapenko, Kostya Serebryany,
	Eric Dumazet, syzkaller, Marcelo Ricardo Leitner
In-Reply-To: <CADvbK_eyG3pLMOws5wAOpvDUPUB2SknOCV5z6TLfJoKfj+Q27A@mail.gmail.com>

Hi Xin,

Your patch seems to be fixing the issue.

Tested-by: Andrey Konovalov <andreyknvl@google.com>

Thanks!

On Tue, Nov 8, 2016 at 11:06 AM, Xin Long <lucien.xin@gmail.com> wrote:
> On Tue, Nov 8, 2016 at 5:44 AM, Andrey Konovalov <andreyknvl@google.com> wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> kasan: CONFIG_KASAN_INLINE enabled
>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> general protection fault: 0000 [#1] SMP KASAN
>> Modules linked in:
>> CPU: 1 PID: 3851 Comm: a.out Not tainted 4.9.0-rc4+ #354
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: ffff880065f1d800 task.stack: ffff880063840000
>> RIP: 0010:[<ffffffff8394151b>]  [<ffffffff8394151b>]
>> sctp_inet_listen+0x29b/0x790 net/sctp/socket.c:6870
>> RSP: 0018:ffff880063847dd0  EFLAGS: 00010202
>> RAX: dffffc0000000000 RBX: 1ffff1000c708fbd RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
>> RBP: ffff880063847e70 R08: dffffc0000000000 R09: dffffc0000000000
>> R10: 0000000000000002 R11: 0000000000000002 R12: ffff88006b350800
>> R13: 0000000000000000 R14: 1ffff1000d66a1a5 R15: 0000000000000000
>> FS:  00007fd1f0f3d7c0(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000020000000 CR3: 0000000064af9000 CR4: 00000000000006e0
>> Stack:
>>  ffff880063847de0 ffff880066165900 ffff88006b350d20 0000000041b58ab3
>>  ffffffff847ff589 ffffffff83941280 dffffc0000000000 0000000000000000
>>  ffff880069b9f740 0000000000000000 ffff880063847e38 ffffffff819f04ef
>> Call Trace:
>>  [<     inline     >] SYSC_listen net/socket.c:1396
>>  [<ffffffff82b73cf6>] SyS_listen+0x206/0x250 net/socket.c:1382
>>  [<ffffffff83fc1501>] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> arch/x86/entry/entry_64.S:209
>> Code: 00 0f 85 f4 04 00 00 4d 8b ac 24 28 05 00 00 49 b8 00 00 00 00
>> 00 fc ff df 49 8d 7d 02 48 89 fe 49 89 fa 48 c1 ee 03 41 83 e2 07 <46>
>> 0f b6 0c 06 41 83 c2 01 45 38 ca 7c 09 45 84 c9 0f 85 87 04
>> RIP  [<ffffffff8394151b>] sctp_inet_listen+0x29b/0x790 net/sctp/socket.c:6870
>>  RSP <ffff880063847dd0>
>> ---[ end trace f2b501fc22999b37 ]---
>>
>> A reproducer is attached.
>>
>> On commit bc33b0ca11e3df467777a4fa7639ba488c9d4911 (Nov 5).
>>
> This is a shutdown injection issue.
> sctp_shutdown need a sk->state check, just like tcp_shutdown:
>
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -4287,7 +4287,8 @@ static void sctp_shutdown(struct sock *sk, int how)
>         if (!sctp_style(sk, TCP))
>                 return;
>
> -       if (how & SEND_SHUTDOWN) {
> +       if (how & SEND_SHUTDOWN &&
> +           (1 << sk->sk_state) & (SCTP_SS_ESTABLISHED | SCTP_SS_CLOSING)) {
>                 sk->sk_state = SCTP_SS_CLOSING;
>                 ep = sctp_sk(sk)->ep;
>                 if (!list_empty(&ep->asocs)) {

^ permalink raw reply

* Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function
From: Alan Stern @ 2016-11-08 18:44 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: Kai-Heng Feng, Oliver Neukum, linux-kernel, linux-usb, netdev
In-Reply-To: <87eg2lkjce.fsf@miraculix.mork.no>

On Tue, 8 Nov 2016, Bjørn Mork wrote:

> Alan Stern <stern@rowland.harvard.edu> writes:
> 
> > On Tue, 8 Nov 2016, Kai-Heng Feng wrote:
> >
> >> Hi,
> >> 
> >> On Mon, Nov 7, 2016 at 7:02 PM, Oliver Neukum <oneukum@suse.com> wrote:
> >> > On Fri, 2016-11-04 at 17:57 +0800, Kai-Heng Feng wrote:
> >> >> Sometimes cdc_mbim failed to probe if runtime pm is enabled:
> >> >> [    9.305626] cdc_mbim: probe of 2-2:1.12 failed with error -22
> >> >>
> >> >> This can be solved by increase its pm usage counter.
> >> >>
> >> >> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >> >
> >> > For the record:
> >> >
> >> > NAK. This fixes a symptom. If this patch helps something is broken in
> >> > device core. We need to find that.
> >> >
> >> 
> >> Please check attached dmesg with usbcore.dyndbg="+p".
> >
> > The log shows that the device went into suspend _before_ the cdc_mbim 
> > driver was probed, not during the probe.  Then just before the probe 
> > was started, the USB core tried to resume the device and the resume 
> > failed.
> >
> > The log shows a bunch of other problems with this device:
> >
> > [    3.862253] usb 2-2: config 1 has an invalid interface number: 12 but max is 1
> > [    3.862254] usb 2-2: config 1 has an invalid interface number: 13 but max is 1
> > [    3.862254] usb 2-2: config 1 has an invalid interface number: 13 but max is 1
> > [    3.862255] usb 2-2: config 1 has no interface number 0
> > [    3.862256] usb 2-2: config 1 has no interface number 1
> 
> These messages are completely harmless and normal for Sierra Wireless
> devices.  They use the interface number to identify the type of
> function, causing this mismatch between the number of interfaces and the
> inteface numbers. Boy, that looks weird in writing :)
> 
> Ref this discussion we had a few years ago:
> http://www.spinics.net/lists/linux-usb/msg77499.html
> 
> No, I didn't expect you to remember that :)

You're right; I didn't remember it.  But seeing those messages again in
the mailing list archives, they do look a little familiar.

> > [    8.295180] usb 2-2: Disable of device-initiated U1 failed.
> > [    8.295322] usb 2-2: Disable of device-initiated U2 failed.
> >
> > I get the impression that the device won't work properly with runtime 
> > PM at all.
> 
> I suspect the device is an EM7455?  If so, then it does work fine with
> runtime PM, as long as we're talking USB2.  Not sure about USB3 runtime
> PM though.  Cannot test it. The Lenovo laptop I got with one of these
> modems has disabled the USB3 link on the m.2 modem slot for some reason.

These problems could very well be caused by running at SuperSpeed
(USB-3) instead of high speed (USB-2).

Is there any way to test what happens when the device is attached to 
the computer by a USB-2 cable?  That would prevent it from operating at 
SuperSpeed.

The main point, however, is that the proposed patch doesn't seem to
address the true problem, which is that the device gets suspended
between probes.  The patch only tries to prevent it from being
suspended during a probe -- which is already prevented by the USB core.

Alan Stern

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
From: Corinna Vinschen @ 2016-11-08 18:37 UTC (permalink / raw)
  To: Hisashi T Fujinaka
  Cc: Cao jin, netdev, intel-wired-lan, linux-kernel, izumi.taku
In-Reply-To: <alpine.NEB.2.20.17.1611080910530.6177@chris.i8u.org>

[-- Attachment #1: Type: text/plain, Size: 2724 bytes --]

On Nov  8 09:16, Hisashi T Fujinaka wrote:
> On Tue, 8 Nov 2016, Corinna Vinschen wrote:
> > On Nov  8 15:06, Cao jin wrote:
> > > When running as guest, under certain condition, it will oops as following.
> > > writel() in igb_configure_tx_ring() results in oops, because hw->hw_addr
> > > is NULL. While other register access won't oops kernel because they use
> > > wr32/rd32 which have a defense against NULL pointer.
> > > [...]
> > 
> > Incidentally we're just looking for a solution to that problem too.
> > Do three patches to fix the same problem at rougly the same time already
> > qualify as freak accident?
> > 
> > FTR, I attached my current patch, which I was planning to submit after
> > some external testing.
> > 
> > However, all three patches have one thing in common:  They workaround
> > a somewhat dubious resetting of the hardware address to NULL in case
> > reading from a register failed.
> > 
> > That makes me wonder if setting the hardware address to NULL in
> > rd32/igb_rd32 is really such a good idea.  It's performed in a function
> > which return value is *never* tested for validity in the calling
> > functions and leads to subsequent crashes since no tests for hw_addr ==
> > NULL are performed.
> > 
> > Maybe commit 22a8b2915 should be reconsidered?  Isn't there some more
> > graceful way to handle the "surprise removal"?
> 
> Answering this from my home account because, well, work is Outlook.
> 
> "Reconsidering" would be great. In fact, revert if if you'd like. I'm
> uncertain that the surprise removal code actually works the way I
> thought previously and I think I took a lot of it out of my local code.
> 
> Unfortuantely I don't have any equipment that I can use to reproduce
> surprise removal any longer so that means I wouldn't be able to test
> anything. I have to defer to you or Cao Jin.

I'm not too keen to rip out a PCIe NIC under power from my locale
desktop machine, but I think an actual surprise removal is not the
problem.

As described in my git log entry, the error condition in igb_rd32 can be
triggered during a suspend.  The HW has been put into a sleep state but
some register read requests are apparently not guarded against that
situation.  Reading a register in this state returns -1, thus a suspend
is erroneously triggering the "surprise removal" sequence.

Here's a raw idea:

- Note that device is suspended in e1000_hw struct.  Don't trigger
  error sequence in igb_rd32 if so (...and return a 0 value???)

- Otherwise assume it's actually a surprise removal.  In theory that
  should somehow trigger a device removal sequence, kind of like
  calling igb_remove, no?


Thanks,
Corinna

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
From: Hisashi T Fujinaka @ 2016-11-08 18:32 UTC (permalink / raw)
  To: Corinna Vinschen
  Cc: Cao jin, netdev, intel-wired-lan, linux-kernel, izumi.taku
In-Reply-To: <alpine.NEB.2.20.17.1611080910530.6177@chris.i8u.org>

On Tue, 8 Nov 2016, Hisashi T Fujinaka wrote:

>> Incidentally we're just looking for a solution to that problem too.
>> Do three patches to fix the same problem at rougly the same time already
>> qualify as freak accident?
>> 
>> FTR, I attached my current patch, which I was planning to submit after
>> some external testing.
>> 
>> However, all three patches have one thing in common:  They workaround
>> a somewhat dubious resetting of the hardware address to NULL in case
>> reading from a register failed.
>> 
>> That makes me wonder if setting the hardware address to NULL in
>> rd32/igb_rd32 is really such a good idea.  It's performed in a function
>> which return value is *never* tested for validity in the calling
>> functions and leads to subsequent crashes since no tests for hw_addr ==
>> NULL are performed.
>> 
>> Maybe commit 22a8b2915 should be reconsidered?  Isn't there some more
>> graceful way to handle the "surprise removal"?
>
> Answering this from my home account because, well, work is Outlook.
>
> "Reconsidering" would be great. In fact, revert if if you'd like. I'm
> uncertain that the surprise removal code actually works the way I
> thought previously and I think I took a lot of it out of my local code.
>
> Unfortuantely I don't have any equipment that I can use to reproduce
> surprise removal any longer so that means I wouldn't be able to test
> anything. I have to defer to you or Cao Jin.

Whoops. Never mind. I was just told that I had a bug that Alex Duyck and
Cao Jin just fixed. I'd stick to listening to Alex.

-- 
Hisashi T Fujinaka - htodd@twofifty.com

^ permalink raw reply

* Re: linux-next: Tree for Nov 8 (netdev, netfilter)
From: Randy Dunlap @ 2016-11-08 18:22 UTC (permalink / raw)
  To: Stephen Rothwell, linux-next
  Cc: linux-kernel, netdev@vger.kernel.org, netfilter-devel
In-Reply-To: <20161108183855.5f8c209a@canb.auug.org.au>

On 11/07/16 23:38, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20161028:


on i386 or x86_64:

net/built-in.o: In function `nf_sk_lookup_slow_v4':
(.text+0x97414): undefined reference to `udp4_lib_lookup'

when these are not enabled:
#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_SOCKET) || \
    IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY)

and
CONFIG_NF_SOCKET_IPV4=y

See net/ipv4/netfilter/nf_socket_ipv4.c.


Reported-by: Randy Dunlap <rdunlap@infradead.org>
-- 
~Randy

^ permalink raw reply

* Re: [PATCH v3 4/4] posix-timers: make it configurable
From: Nicolas Pitre @ 2016-11-08 18:19 UTC (permalink / raw)
  To: John Stultz
  Cc: Michal Marek, Richard Cochran, Paul Bolle, Thomas Gleixner,
	Josh Triplett, Edward Cree, netdev, linux-kbuild, lkml
In-Reply-To: <CALAqxLVrGRW3ODzZk=4-8FSLe5wBwHJSOEAW=+kMFjTRoe1HzQ@mail.gmail.com>

On Tue, 8 Nov 2016, John Stultz wrote:

> One spot of concern is that the
> tools/testing/selftests/timers/posix_timers.c test hangs testing
> virtual itimers. Looking through the code I'm not seeing where an
> error case is missed.
> 
> The strace looks like:
> ...
> write(1, "Testing posix timers. False nega"..., 66Testing posix
> timers. False negative may happen on CPU execution
> ) = 66
> write(1, "based timers if other threads ru"..., 48based timers if
> other threads run on the CPU...
> ) = 48
> write(1, "Check itimer virtual... ", 24Check itimer virtual... ) = 24
> rt_sigaction(SIGVTALRM, {0x400a80, [VTALRM], SA_RESTORER|SA_RESTART,
> 0x7fb73306ccb0}, {SIG_DFL, [], 0}, 8) = 0
> gettimeofday({1478710402, 937476}, NULL) = 0
> setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={2, 0}}, NULL) = 0
> <Hang>
> 
> 
> Where as with posix timers enabled:
> ...
> write(1, "Testing posix timers. False nega"..., 138Testing posix
> timers. False negative may happen on CPU execution
> based timers if other threads run on the CPU...
> Check itimer virtual... ) = 138
> rt_sigaction(SIGVTALRM, {0x400a80, [VTALRM], SA_RESTORER|SA_RESTART,
> 0x7f231ba8ccb0}, {SIG_DFL, [], 0}, 8) = 0
> gettimeofday({1478626751, 904856}, NULL) = 0
> setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={2, 0}}, NULL) = 0
> --- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_KERNEL} ---
> rt_sigreturn()                          = 0

I'll have a look.

> So I suspect you were a little too aggressive with the #ifdefs around
> the itimers/signal code, or we need to make sure we return an error on
> the setitimer ITIMER_VIRTUAL case as well.

Well, it seemed to me that with POSIX_TIMERS=n, all the code that would 
set up that signal is gone, so there was no point keeping the code to 
deliver it.

Now... would it make more sense to remove itimer support as well when 
POSIX_TIMERS=n?  The same reasoning would apply.


Nicolas

^ permalink raw reply

* RE: Is there a maximum bytes in flight limitation in the tcp stack? -->limit in scp
From: De Schepper, Koen (Nokia - BE) @ 2016-11-08 18:07 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: netdev@vger.kernel.org

Seems to be a limitation in the application. We used scp, and it (still) seems to limit the bytes in flight. Using our own application, we didn't see a limit indeed. Thanks for your response, and sorry for the noise...

Koen.

> -----Original Message-----
> From: Yuchung Cheng [mailto:ycheng@google.com]
> Sent: dinsdag 8 november 2016 5:51
> To: De Schepper, Koen (Nokia - BE) <koen.de_schepper@nokia-bell-
> labs.com>
> Cc: netdev@vger.kernel.org
> Subject: Re: Is there a maximum bytes in flight limitation in the tcp stack?
> 
> On Thu, Nov 3, 2016 at 9:37 AM, De Schepper, Koen (Nokia - BE)
> <koen.de_schepper@nokia-bell-labs.com> wrote:
> >
> > Hi,
> >
> > We experience some limit on the maximum packets in flight which seem
> not to be related with the receive or write buffers. Does somebody know if
> there is an issue with a maximum of around 1MByte (or sometimes 2Mbyte)
> of data in flight per TCP flow?
> 
> does not ring a bell. I've definitely see cubic reaching >2MB cwnd (inflight)
> some packet trace will help.
> 
> btw, tcp_rmem is the maximum receive buffer including all header and
> control overhead. the receive window announced is (very roughly) half
> of your rcvbuf.
> 
> >
> > It seems to be a strict and stable limit independent from the CC (tested
> with Cubic, Reno and DCTCP). On a link of 200Mbps and 200ms RTT our link is
> only 20% (sometimes 40%, see conditions below) utilized for a single TCP
> flow with no drop experienced at all (no bottleneck in the AQM or RTT
> emulation, as it supports more throughput if multiple flows are active).
> >
> > Some configuration changes we already tried on both client and server
> (kernel 3.18.9):
> >
> > net.ipv4.tcp_no_metrics_save = 1
> > net.ipv4.tcp_rmem = 4096 87380 6291456
> > net.ipv4.tcp_wmem = 4096 16384 4194304
> >
> > SERVER# ss -i
> > tcp    ESTAB      0      1049728  10.187.255.211:46642     10.187.16.194:ssh
> >          dctcp wscale:7,7 rto:408 rtt:204.333/0.741 ato:40 mss:1448 cwnd:1466
> send 83.1Mbps unacked:728 rcv_rtt:212 rcv_space:29200
> > CLIENT# ss -i
> > tcp    ESTAB      0      288      10.187.16.194:ssh      10.187.255.211:46642
> >          dctcp wscale:7,7 rto:404 rtt:203.389/0.213 ato:40 mss:1448 cwnd:78
> send 4.4Mbps unacked:8 rcv_rtt:204 rcv_space:1074844
> >
> > When increasing the write and receive mem further (they were already
> way above 1 or 2 MB) it steps to double (40%; 2Mbytes in flight):
> > net.ipv4.tcp_no_metrics_save = 1
> > net.ipv4.tcp_rmem = 4096 8000000 16291456
> > net.ipv4.tcp_wmem = 4096 8000000 16291456
> >
> > SERVER # ss -i
> > tcp    ESTAB      0      2068976  10.187.255.212:54637     10.187.16.112:ssh
> >          cubic wscale:8,8 rto:404 rtt:202.622/0.061 ato:40 mss:1448 cwnd:1849
> ssthresh:1140 send 105.7Mbps unacked:1457 rcv_rtt:217.5 rcv_space:29200
> > CLIENT# ss -i
> > tcp    ESTAB      0      648      10.187.16.112:ssh      10.187.255.212:54637
> >          cubic wscale:8,8 rto:404 rtt:201.956/0.038 ato:40 mss:1448 cwnd:132
> send 7.6Mbps unacked:18 rcv_rtt:204 rcv_space:2093044
> >
> > Further increasing (x10) does not help anymore...
> > net.ipv4.tcp_no_metrics_save = 1
> > net.ipv4.tcp_rmem = 4096 80000000 162914560
> > net.ipv4.tcp_wmem = 4096 80000000 162914560
> >
> > As all these parameters autotune, it is hard to find out which one is
> limiting... In the examples, above unacked does not want to go higher, while
> congestion window in the server is big enough... rcv_space could be limiting,
> but it tunes up if I change the server with the higher buffers (switching to
> 2MByte in flight).
> >
> > We also tried tcp_limit_output_bytes, setting it bigger (x10) and
> smaller(/10), without effect. We've put it in /etc/sysctl.conf and rebooted, to
> make sure that it is effective.
> >
> > Some more detailed tests that had an effect on the 1 or 2MByte:
> > - It seems that with TSO off, if we configure a bigger wmem buffer, an
> ongoing flow suddenly is able to immediately double its bytes in flight limit.
> We configured further up to more than 10x the buffer, but no further
> increase helps, and the limits we saw are only 1MByte and 2Mbyte (no
> intermediate values depending on any parameter). When setting tcp_wmem
> smaller again, the 2MByte limit stays on the ongoing flow. We have to restart
> the flow to make the buffer reduction to 1MByte effective.
> > - With TSO on, only the 2MByte limit is effective, independent from the
> wmem buffer. We have to restart the flow to make a tso change effective.
> >
> > Koen.
> >

^ permalink raw reply

* Re: net/sunrpc/clnt.c:2773 suspicious rcu_dereference_check() usage!
From: Anna Schumaker @ 2016-11-08 17:56 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jeff Layton, Trond Myklebust, J. Bruce Fields, David S. Miller,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andy Adamson
In-Reply-To: <20161108174346.GA17545-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>

On 11/08/2016 12:43 PM, Ross Zwisler wrote:
> On Tue, Nov 08, 2016 at 07:42:59AM -0500, Anna Schumaker wrote:
>> On 11/08/2016 07:09 AM, Jeff Layton wrote:
>>> On Tue, 2016-11-08 at 06:53 -0500, Jeff Layton wrote:
>>>> On Mon, 2016-11-07 at 22:42 -0700, Ross Zwisler wrote:
>>>>>
>>>>> I've got a virtual machine that has some NFS mounts, and with a newly compiled
>>>>> kernel based on v4.9-rc3 I see the following warning/info message:
>>>>>
>>>>> [   42.750181] ===============================
>>>>> [   42.750192] [ INFO: suspicious RCU usage. ]
>>>>> [   42.750203] 4.9.0-rc3-00002-g7b6e7de #3 Not tainted
>>>>> [   42.750213] -------------------------------
>>>>> [   42.750225] net/sunrpc/clnt.c:2773 suspicious rcu_dereference_check() usage!
>>>>> [   42.750235] 
>>>>> [   42.750235] other info that might help us debug this:
>>>>> [   42.750235] 
>>>>> [   42.750246] 
>>>>> [   42.750246] rcu_scheduler_active = 1, debug_locks = 0
>>>>> [   42.750257] 1 lock held by mount.nfs4/6440:
>>>>> [   42.750278]  #0: 
>>>>> [   42.750299]  (
>>>>> [   42.750319] &(&nn->nfs_client_lock)->rlock
>>>>> [   42.750340] ){+.+...}
>>>>> [   42.750362] , at: 
>>>>> [   42.750372] [<ffffffff813012b5>] nfs_get_client+0x105/0x5e0
>>>>> [   42.750383] 
>>>>> [   42.750383] stack backtrace:
>>>>> [   42.750394] CPU: 0 PID: 6440 Comm: mount.nfs4 Not tainted 4.9.0-rc3-00002-g7b6e7de #3
>>>>> [   42.750406] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.MBH.0096.D23.1608240105 08/24/2016
>>>>> [   42.750429]  ffffc9000092fa68 ffffffff8150730f ffff88014ec8da40 0000000000000001
>>>>> [   42.750452]  ffffc9000092fa98 ffffffff810bc3f7 ffff880150b0b228 ffff88015068dbb0
>>>>> [   42.750475]  ffffc9000092fb38 ffff88014fc99180 ffffc9000092fac0 ffffffff81b243e5
>>>>> [   42.750486] Call Trace:
>>>>> [   42.750498]  [<ffffffff8150730f>] dump_stack+0x67/0x98
>>>>> [   42.750511]  [<ffffffff810bc3f7>] lockdep_rcu_suspicious+0xe7/0x120
>>>>> [   42.750524]  [<ffffffff81b243e5>] rpc_clnt_xprt_switch_has_addr+0x115/0x150
>>>>> [   42.750536]  [<ffffffff813013f4>] nfs_get_client+0x244/0x5e0
>>>>> [   42.750549]  [<ffffffff813012ac>] ? nfs_get_client+0xfc/0x5e0
>>>>> [   42.750561]  [<ffffffff813568f8>] nfs4_set_client+0x98/0x130
>>>>> [   42.750574]  [<ffffffff8135872e>] nfs4_create_server+0x13e/0x390
>>>>> [   42.750588]  [<ffffffff8134cd0e>] nfs4_remote_mount+0x2e/0x60
>>>>> [   42.750600]  [<ffffffff811f3a29>] mount_fs+0x39/0x170
>>>>> [   42.750614]  [<ffffffff81214a0b>] vfs_kern_mount+0x6b/0x150
>>>>> [   42.750626]  [<ffffffff8134cbec>] ? nfs_do_root_mount+0x3c/0xc0
>>>>> [   42.750639]  [<ffffffff8134cc36>] nfs_do_root_mount+0x86/0xc0
>>>>> [   42.750652]  [<ffffffff8134d014>] nfs4_try_mount+0x44/0xc0
>>>>> [   42.750664]  [<ffffffff81302097>] ? get_nfs_version+0x27/0x90
>>>>> [   42.750677]  [<ffffffff81310f8c>] nfs_fs_mount+0x4ac/0xd80
>>>>> [   42.750689]  [<ffffffff810bb938>] ? lockdep_init_map+0x88/0x1f0
>>>>> [   42.750701]  [<ffffffff81311ac0>] ? nfs_clone_super+0x130/0x130
>>>>> [   42.750713]  [<ffffffff8130f300>] ? param_set_portnr+0x70/0x70
>>>>> [   42.750726]  [<ffffffff811f3a29>] mount_fs+0x39/0x170
>>>>> [   42.750740]  [<ffffffff81214a0b>] vfs_kern_mount+0x6b/0x150
>>>>> [   42.750752]  [<ffffffff812176f1>] do_mount+0x1f1/0xd10
>>>>> [   42.750765]  [<ffffffff81217441>] ? copy_mount_options+0xa1/0x140
>>>>> [   42.750777]  [<ffffffff81218543>] SyS_mount+0x83/0xd0
>>>>> [   42.750790]  [<ffffffff81002abc>] do_syscall_64+0x5c/0x130
>>>>> [   42.750802]  [<ffffffff81c479a4>] entry_SYSCALL64_slow_path+0x25/0x25
>>>>>
>>>>> This rcu_dereference_check() was introduced by the following commit:
>>>>>
>>>>> commit 39e5d2df959dd4aea81fa33d765d2a5cc67a0512
>>>>> Author: Andy Adamson <andros-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
>>>>> Date:   Fri Sep 9 09:22:25 2016 -0400
>>>>>
>>>>>     SUNRPC search xprt switch for sockaddr
>>>>>     
>>>>>     Signed-off-by: Andy Adamson <andros-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
>>>>>     Signed-off-by: Anna Schumaker <Anna.Schumaker-ZwjVKphTwtPQT0dZR+AlfA@public.gmane.org>
>>>>>
>>>>> Thanks,
>>>>> - Ross
>>>>
>>>> Thanks Ross,
>>
>> Hi Ross,
>>
>> Can you try this patch and let me know if it helps:
>>
>> http://git.linux-nfs.org/?p=anna/linux-nfs.git;a=commitdiff;h=bb29dd84333a96f309c6d0f88b285b5b78927058
>>
>> I'm planning on sending it to Linus soon, so it should be in rc5.
> 
> Hi Anna,
> 
> Yep, this patch makes the warning go away in my setup.

Great!  Thanks for testing!

Anna

> 
> Thanks,
> - Ross
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: net/sunrpc/clnt.c:2773 suspicious rcu_dereference_check() usage!
From: Ross Zwisler @ 2016-11-08 17:43 UTC (permalink / raw)
  To: Anna Schumaker
  Cc: Jeff Layton, Ross Zwisler, Trond Myklebust, J. Bruce Fields,
	David S. Miller, linux-nfs, netdev, linux-kernel, Andy Adamson
In-Reply-To: <60a0f29b-7a0a-c5e2-0e98-fa9a923dd339@Netapp.com>

On Tue, Nov 08, 2016 at 07:42:59AM -0500, Anna Schumaker wrote:
> On 11/08/2016 07:09 AM, Jeff Layton wrote:
> > On Tue, 2016-11-08 at 06:53 -0500, Jeff Layton wrote:
> >> On Mon, 2016-11-07 at 22:42 -0700, Ross Zwisler wrote:
> >>>
> >>> I've got a virtual machine that has some NFS mounts, and with a newly compiled
> >>> kernel based on v4.9-rc3 I see the following warning/info message:
> >>>
> >>> [   42.750181] ===============================
> >>> [   42.750192] [ INFO: suspicious RCU usage. ]
> >>> [   42.750203] 4.9.0-rc3-00002-g7b6e7de #3 Not tainted
> >>> [   42.750213] -------------------------------
> >>> [   42.750225] net/sunrpc/clnt.c:2773 suspicious rcu_dereference_check() usage!
> >>> [   42.750235] 
> >>> [   42.750235] other info that might help us debug this:
> >>> [   42.750235] 
> >>> [   42.750246] 
> >>> [   42.750246] rcu_scheduler_active = 1, debug_locks = 0
> >>> [   42.750257] 1 lock held by mount.nfs4/6440:
> >>> [   42.750278]  #0: 
> >>> [   42.750299]  (
> >>> [   42.750319] &(&nn->nfs_client_lock)->rlock
> >>> [   42.750340] ){+.+...}
> >>> [   42.750362] , at: 
> >>> [   42.750372] [<ffffffff813012b5>] nfs_get_client+0x105/0x5e0
> >>> [   42.750383] 
> >>> [   42.750383] stack backtrace:
> >>> [   42.750394] CPU: 0 PID: 6440 Comm: mount.nfs4 Not tainted 4.9.0-rc3-00002-g7b6e7de #3
> >>> [   42.750406] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.MBH.0096.D23.1608240105 08/24/2016
> >>> [   42.750429]  ffffc9000092fa68 ffffffff8150730f ffff88014ec8da40 0000000000000001
> >>> [   42.750452]  ffffc9000092fa98 ffffffff810bc3f7 ffff880150b0b228 ffff88015068dbb0
> >>> [   42.750475]  ffffc9000092fb38 ffff88014fc99180 ffffc9000092fac0 ffffffff81b243e5
> >>> [   42.750486] Call Trace:
> >>> [   42.750498]  [<ffffffff8150730f>] dump_stack+0x67/0x98
> >>> [   42.750511]  [<ffffffff810bc3f7>] lockdep_rcu_suspicious+0xe7/0x120
> >>> [   42.750524]  [<ffffffff81b243e5>] rpc_clnt_xprt_switch_has_addr+0x115/0x150
> >>> [   42.750536]  [<ffffffff813013f4>] nfs_get_client+0x244/0x5e0
> >>> [   42.750549]  [<ffffffff813012ac>] ? nfs_get_client+0xfc/0x5e0
> >>> [   42.750561]  [<ffffffff813568f8>] nfs4_set_client+0x98/0x130
> >>> [   42.750574]  [<ffffffff8135872e>] nfs4_create_server+0x13e/0x390
> >>> [   42.750588]  [<ffffffff8134cd0e>] nfs4_remote_mount+0x2e/0x60
> >>> [   42.750600]  [<ffffffff811f3a29>] mount_fs+0x39/0x170
> >>> [   42.750614]  [<ffffffff81214a0b>] vfs_kern_mount+0x6b/0x150
> >>> [   42.750626]  [<ffffffff8134cbec>] ? nfs_do_root_mount+0x3c/0xc0
> >>> [   42.750639]  [<ffffffff8134cc36>] nfs_do_root_mount+0x86/0xc0
> >>> [   42.750652]  [<ffffffff8134d014>] nfs4_try_mount+0x44/0xc0
> >>> [   42.750664]  [<ffffffff81302097>] ? get_nfs_version+0x27/0x90
> >>> [   42.750677]  [<ffffffff81310f8c>] nfs_fs_mount+0x4ac/0xd80
> >>> [   42.750689]  [<ffffffff810bb938>] ? lockdep_init_map+0x88/0x1f0
> >>> [   42.750701]  [<ffffffff81311ac0>] ? nfs_clone_super+0x130/0x130
> >>> [   42.750713]  [<ffffffff8130f300>] ? param_set_portnr+0x70/0x70
> >>> [   42.750726]  [<ffffffff811f3a29>] mount_fs+0x39/0x170
> >>> [   42.750740]  [<ffffffff81214a0b>] vfs_kern_mount+0x6b/0x150
> >>> [   42.750752]  [<ffffffff812176f1>] do_mount+0x1f1/0xd10
> >>> [   42.750765]  [<ffffffff81217441>] ? copy_mount_options+0xa1/0x140
> >>> [   42.750777]  [<ffffffff81218543>] SyS_mount+0x83/0xd0
> >>> [   42.750790]  [<ffffffff81002abc>] do_syscall_64+0x5c/0x130
> >>> [   42.750802]  [<ffffffff81c479a4>] entry_SYSCALL64_slow_path+0x25/0x25
> >>>
> >>> This rcu_dereference_check() was introduced by the following commit:
> >>>
> >>> commit 39e5d2df959dd4aea81fa33d765d2a5cc67a0512
> >>> Author: Andy Adamson <andros@netapp.com>
> >>> Date:   Fri Sep 9 09:22:25 2016 -0400
> >>>
> >>>     SUNRPC search xprt switch for sockaddr
> >>>     
> >>>     Signed-off-by: Andy Adamson <andros@netapp.com>
> >>>     Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >>>
> >>> Thanks,
> >>> - Ross
> >>
> >> Thanks Ross,
> 
> Hi Ross,
> 
> Can you try this patch and let me know if it helps:
> 
> http://git.linux-nfs.org/?p=anna/linux-nfs.git;a=commitdiff;h=bb29dd84333a96f309c6d0f88b285b5b78927058
> 
> I'm planning on sending it to Linus soon, so it should be in rc5.

Hi Anna,

Yep, this patch makes the warning go away in my setup.

Thanks,
- Ross

^ permalink raw reply

* Re: [PATCH 13/17] batman-adv: Consume skb in receive handlers
From: Eric Dumazet @ 2016-11-08 17:43 UTC (permalink / raw)
  To: Sven Eckelmann
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <2044854.g075mifYzn@bentobox>

On Tue, 2016-11-08 at 18:28 +0100, Sven Eckelmann wrote:
> On Dienstag, 8. November 2016 08:59:49 CET Eric Dumazet wrote:
> [...]
> > > +free_skb:
> > >  	consume_skb(skb);
> > > -	return NET_RX_SUCCESS;
> > > +
> > > +	return ret;
> > >  }
> > 
> > 
> > Okay, but we do have kfree_skb() and consume_skb() and they should be
> > used appropriately.
> 
> Yes, this patch is one part of reaching this goal. Some other parts are also
> in this patchset. But other changes like the one you've mention here (change
> some consume_skb partially back to kfree_skb) have still to be done. But
> first we have to clean up the main portion of the mess :)

Sure, but your patch 13/17 should address this right away.

You must not call consume_skb() if you are dropping a packet.

Prior to this patch, kfree_skb() was properly called, and after this
patch, consume_skb() is called instead.


-       ret = (*batadv_rx_handler[idx])(skb, hard_iface);
-
-       if (ret == NET_RX_DROP)
-               kfree_skb(skb);
+       (*batadv_rx_handler[idx])(skb, hard_iface);
 
You can not claim working on these issues and at the same time add them
back.

^ permalink raw reply

* Re: [PATCH v3 4/4] posix-timers: make it configurable
From: John Stultz @ 2016-11-08 17:42 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Michal Marek, Richard Cochran, Paul Bolle, Thomas Gleixner,
	Josh Triplett, Edward Cree, netdev, linux-kbuild, lkml
In-Reply-To: <1478556899-2951-5-git-send-email-nicolas.pitre@linaro.org>

On Mon, Nov 7, 2016 at 2:14 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> Some embedded systems have no use for them.  This removes about
> 22KB from the kernel binary size when configured out.
>
> Corresponding syscalls are routed to a stub logging the attempt to
> use those syscalls which should be enough of a clue if they were
> disabled without proper consideration. They are: timer_create,
> timer_gettime: timer_getoverrun, timer_settime, timer_delete,
> clock_adjtime.
>
> The clock_settime, clock_gettime, clock_getres and clock_nanosleep
> syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
> CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
> majority of use cases with very little code.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> Acked-by: Richard Cochran <richardcochran@gmail.com>

So I have no design objections to the patch overall.

I ran this through my timekeeping tests last night and it passed a
fair number of the tests, considering.

I of course see a lot of failures around timer_creates failing
(set-timer-lat), and cases where clockids aren't supported.
So I'll need to see about updating the tests to fail more gracefully
with this change.

One spot of concern is that the
tools/testing/selftests/timers/posix_timers.c test hangs testing
virtual itimers. Looking through the code I'm not seeing where an
error case is missed.

The strace looks like:
...
write(1, "Testing posix timers. False nega"..., 66Testing posix
timers. False negative may happen on CPU execution
) = 66
write(1, "based timers if other threads ru"..., 48based timers if
other threads run on the CPU...
) = 48
write(1, "Check itimer virtual... ", 24Check itimer virtual... ) = 24
rt_sigaction(SIGVTALRM, {0x400a80, [VTALRM], SA_RESTORER|SA_RESTART,
0x7fb73306ccb0}, {SIG_DFL, [], 0}, 8) = 0
gettimeofday({1478710402, 937476}, NULL) = 0
setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={2, 0}}, NULL) = 0
<Hang>


Where as with posix timers enabled:
...
write(1, "Testing posix timers. False nega"..., 138Testing posix
timers. False negative may happen on CPU execution
based timers if other threads run on the CPU...
Check itimer virtual... ) = 138
rt_sigaction(SIGVTALRM, {0x400a80, [VTALRM], SA_RESTORER|SA_RESTART,
0x7f231ba8ccb0}, {SIG_DFL, [], 0}, 8) = 0
gettimeofday({1478626751, 904856}, NULL) = 0
setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={2, 0}}, NULL) = 0
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_KERNEL} ---
rt_sigreturn()                          = 0
gettimeofday({1478626753, 906137}, NULL) = 0
write(1, "[OK]\nCheck itimer prof... ", 26[OK]
...

So I suspect you were a little too aggressive with the #ifdefs around
the itimers/signal code, or we need to make sure we return an error on
the setitimer ITIMER_VIRTUAL case as well.

thanks
-john

^ permalink raw reply

* Re: [PATCH 13/17] batman-adv: Consume skb in receive handlers
From: Sven Eckelmann @ 2016-11-08 17:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Simon Wunderlich, davem, netdev, b.a.t.m.a.n
In-Reply-To: <1478624389.17367.12.camel@edumazet-glaptop3.roam.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 574 bytes --]

On Dienstag, 8. November 2016 08:59:49 CET Eric Dumazet wrote:
[...]
> > +free_skb:
> >  	consume_skb(skb);
> > -	return NET_RX_SUCCESS;
> > +
> > +	return ret;
> >  }
> 
> 
> Okay, but we do have kfree_skb() and consume_skb() and they should be
> used appropriately.

Yes, this patch is one part of reaching this goal. Some other parts are also
in this patchset. But other changes like the one you've mention here (change
some consume_skb partially back to kfree_skb) have still to be done. But
first we have to clean up the main portion of the mess :)

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
From: Hisashi T Fujinaka @ 2016-11-08 17:16 UTC (permalink / raw)
  To: Corinna Vinschen
  Cc: Cao jin, netdev, intel-wired-lan, linux-kernel, izumi.taku
In-Reply-To: <20161108164214.GF31855@calimero.vinschen.de>

On Tue, 8 Nov 2016, Corinna Vinschen wrote:

> On Nov  8 15:06, Cao jin wrote:
>> When running as guest, under certain condition, it will oops as following.
>> writel() in igb_configure_tx_ring() results in oops, because hw->hw_addr
>> is NULL. While other register access won't oops kernel because they use
>> wr32/rd32 which have a defense against NULL pointer.
>>
>>     [  141.225449] pcieport 0000:00:1c.0: AER: Multiple Uncorrected (Fatal)
>>     error received: id=0101
>>     [  141.225523] igb 0000:01:00.1: PCIe Bus Error:
>>     severity=Uncorrected (Fatal), type=Unaccessible,
>>     id=0101(Unregistered Agent ID)
>>     [  141.299442] igb 0000:01:00.1: broadcast error_detected message
>>     [  141.300539] igb 0000:01:00.0 enp1s0f0: PCIe link lost, device now
>>     detached
>>     [  141.351019] igb 0000:01:00.1 enp1s0f1: PCIe link lost, device now
>>     detached
>>     [  143.465904] pcieport 0000:00:1c.0: Root Port link has been reset
>>     [  143.465994] igb 0000:01:00.1: broadcast slot_reset message
>>     [  143.466039] igb 0000:01:00.0: enabling device (0000 -> 0002)
>>     [  144.389078] igb 0000:01:00.1: enabling device (0000 -> 0002)
>>     [  145.312078] igb 0000:01:00.1: broadcast resume message
>>     [  145.322211] BUG: unable to handle kernel paging request at
>>     0000000000003818
>>     [  145.361275] IP: [<ffffffffa02fd38d>]
>>     igb_configure_tx_ring+0x14d/0x280 [igb]
>>     [  145.400048] PGD 0
>>     [  145.438007] Oops: 0002 [#1] SMP
>>
>> A similiar issue & solution could be found at:
>>     http://patchwork.ozlabs.org/patch/689592/
>>
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
>>  drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>> index edc9a6a..3f240ac 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -3390,7 +3390,7 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
>>  	     tdba & 0x00000000ffffffffULL);
>>  	wr32(E1000_TDBAH(reg_idx), tdba >> 32);
>>
>> -	ring->tail = hw->hw_addr + E1000_TDT(reg_idx);
>> +	ring->tail = adapter->io_addr + E1000_TDT(reg_idx);
>>  	wr32(E1000_TDH(reg_idx), 0);
>>  	writel(0, ring->tail);
>>
>> @@ -3729,7 +3729,7 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
>>  	     ring->count * sizeof(union e1000_adv_rx_desc));
>>
>>  	/* initialize head and tail */
>> -	ring->tail = hw->hw_addr + E1000_RDT(reg_idx);
>> +	ring->tail = adapter->io_addr + E1000_RDT(reg_idx);
>>  	wr32(E1000_RDH(reg_idx), 0);
>>  	writel(0, ring->tail);
>>
>> --
>> 2.1.0
>
> Incidentally we're just looking for a solution to that problem too.
> Do three patches to fix the same problem at rougly the same time already
> qualify as freak accident?
>
> FTR, I attached my current patch, which I was planning to submit after
> some external testing.
>
> However, all three patches have one thing in common:  They workaround
> a somewhat dubious resetting of the hardware address to NULL in case
> reading from a register failed.
>
> That makes me wonder if setting the hardware address to NULL in
> rd32/igb_rd32 is really such a good idea.  It's performed in a function
> which return value is *never* tested for validity in the calling
> functions and leads to subsequent crashes since no tests for hw_addr ==
> NULL are performed.
>
> Maybe commit 22a8b2915 should be reconsidered?  Isn't there some more
> graceful way to handle the "surprise removal"?

Answering this from my home account because, well, work is Outlook.

"Reconsidering" would be great. In fact, revert if if you'd like. I'm
uncertain that the surprise removal code actually works the way I
thought previously and I think I took a lot of it out of my local code.

Unfortuantely I don't have any equipment that I can use to reproduce
surprise removal any longer so that means I wouldn't be able to test
anything. I have to defer to you or Cao Jin.

-- 
Hisashi T Fujinaka - htodd@twofifty.com (todd.fujinaka@intel.com)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox