Netdev List
* Re: [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers
From: Sergei Shtylyov @ 2018-05-26 17:31 UTC
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc
In-Reply-To: <22b370a7-0990-f1f5-196e-6e353a4e34ac@mentor.com>

On 05/25/2018 09:25 AM, Vladimir Zapolskiy wrote:

>>>> For ages trivial changes to RAVB and SuperH ethernet links by means of
>>>> standard 'ethtool' trigger a 'sleeping function called from invalid
>>>> context' bug, to visualize it on r8a7795 ULCB:
>>>>
>>>>   % ethtool -r eth0
>>>>   BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
>>>>   in_atomic(): 1, irqs_disabled(): 128, pid: 554, name: ethtool
>>>>   INFO: lockdep is turned off.
>>>>   irq event stamp: 0
>>>>   hardirqs last  enabled at (0): [<0000000000000000>]           (null)
>>>>   hardirqs last disabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>>>>   softirqs last  enabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
>>>>   softirqs last disabled at (0): [<0000000000000000>]           (null)
>>>>   CPU: 5 PID: 554 Comm: ethtool Not tainted 4.17.0-rc4-arm64-renesas+ #33
>>>>   Hardware name: Renesas H3ULCB board based on r8a7795 ES2.0+ (DT)
>>>>   Call trace:
>>>>    dump_backtrace+0x0/0x198
>>>>    show_stack+0x24/0x30
>>>>    dump_stack+0xb8/0xf4
>>>>    ___might_sleep+0x1c8/0x1f8
>>>>    __might_sleep+0x58/0x90
>>>>    __mutex_lock+0x50/0x890
>>>>    mutex_lock_nested+0x3c/0x50
>>>>    phy_start_aneg_priv+0x38/0x180
>>>>    phy_start_aneg+0x24/0x30
>>>>    ravb_nway_reset+0x3c/0x68
>>>>    dev_ethtool+0x3dc/0x2338
>>>>    dev_ioctl+0x19c/0x490
>>>>    sock_do_ioctl+0xe0/0x238
>>>>    sock_ioctl+0x254/0x460
>>>>    do_vfs_ioctl+0xb0/0x918
>>>>    ksys_ioctl+0x50/0x80
>>>>    sys_ioctl+0x34/0x48
>>>>    __sys_trace_return+0x0/0x4
>>>>
>>>> The root cause is that the original attempt to modify the ECMR and GECMR
>>>> registers only while the RX/TX function is disabled was overcomplicated,
>>>> and processing of the optional Link Change interrupt added even more
>>>> complexity; as a result the implementation was error prone.
>>>>
>>>> The new locking scheme is confirmed to be correct by dumping driver
>>>> specific and generic PHY framework function calls with aid of ftrace
>>>> while running more or less advanced tests.
>>>>
>>>> Please note that sh_eth patches from the series were built-tested only.
>>>>
>>>> On purpose I do not add Fixes tags, the reused PHY handlers were added
>>>> way later than the fixed problems were firstly found in the drivers.
>>>
>>>    I think you went one step too far with these fixes. At first glance,
>>> the real fixes are to remove grabbing/releasing the spinlock for the duration
>>> of the phylib calls. Am I right? If so, making use of the new phylib APIs
>>> would be a further enhancement, it's not needed for fixing the splats per se...
>>
>>    Note that I hadn't looked at the patches #3/#6 at the time of writing this;
>> those seem to be more complicated than the rest.
> 
> Right, the simplistic approach of just removing the held spinlock does
> not fit well into the overall lame locking model found in the driver.

   Yet you only try fixing it in the patches #3 and #6. I was talking about
the patches #1 and #4 mostly (#2 and #5 turned out to be non-fixes).

> The thing is that I would prefer to exhibit 'remove custom callbacks'
> side of the changes as it is done now, and fixing severe 'invalid context'
> bugs is left as a valuable side effect. I may attempt to find enough
> free time to follow your instructions, but frankly speaking I don't
> see it beneficial to split a single good all-sufficient change into
> three or more: removal of spinlocks, replacement of phy_start_aneg(),
> then a non-functional clean-up.

   Yes, I would prefer these step-by-step changes.

> Bikeshedding isn't my preference,

   This is not about bikeshedding. What you are trying to do clearly
violates two basic principles of kernel development: "don't mix
fixes and enhancements" and "do one thing per patch".

> but a report about technical flaws related to the published changes
> is appreciated, otherwise let me ask you to accept the changes as is,
> secondary optimizations can be done on top of them.

   No, I'll certainly have to NAK patches #1/#4 in their current form.
I'm yet to review patches #3/#6... anyway, if you lack the time to do things
properly, I'll have to take this burden on my shoulders (giving you credit).
Yet I'm basically in the same situation as you: I have to spend my copious
free time on large patch sets (like yours), and I still have some sh_eth
cleanups cooking here (which I'll most probably have to defer)...

> --
> With best wishes,
> Vladimir

MBR, Sergei

* Re: [PATCH 2/6] ravb: remove custom .get_link_ksettings from ethtool ops
From: Sergei Shtylyov @ 2018-05-26 17:07 UTC
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc
In-Reply-To: <1527160318-10958-3-git-send-email-vladimir_zapolskiy@mentor.com>

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change replaces a custom implementation of .get_link_ksettings
> callback with a shared phy_ethtool_get_link_ksettings(), note that
> &priv->lock wrapping is not needed, because the lock does not
> serialize access to phydev fields.

   No BUG() here, AFAICT. But then this is not a fix but an enhancement.
And I would have done that in two steps: first removing the spinlock code
and then removing the custom method implementation.

> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
[...]

MBR, Sergei

* Re: [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
From: Sergei Shtylyov @ 2018-05-26 16:56 UTC
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc
In-Reply-To: <1527160318-10958-2-git-send-email-vladimir_zapolskiy@mentor.com>

Hello.

   A formal patch review this time...

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.

   OK so far...

> Another note is that the change implicitly replaces phy_start_aneg()
> with a newer phy_restart_aneg().

   Why? Is this necessary to fix the BUG()?

> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
> ---
>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>  1 file changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 68f122140966..4a043eb0e2aa 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>  	return error;
>  }
>  
> -static int ravb_nway_reset(struct net_device *ndev)
> -{
> -	struct ravb_private *priv = netdev_priv(ndev);
> -	int error = -ENODEV;
> -	unsigned long flags;
> -
> -	if (ndev->phydev) {
> -		spin_lock_irqsave(&priv->lock, flags);

   OK, removing spin_lock_irqsave() fixes the BUG()...
   Not sure what we're protecting against here anyway, MAC interrupts?

> -		error = phy_start_aneg(ndev->phydev);
> -		spin_unlock_irqrestore(&priv->lock, flags);
> -	}
> -
> -	return error;
> -}
> -
>  static u32 ravb_get_msglevel(struct net_device *ndev)
>  {
>  	struct ravb_private *priv = netdev_priv(ndev);
> @@ -1377,7 +1362,7 @@ static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
>  }
>  
>  static const struct ethtool_ops ravb_ethtool_ops = {
> -	.nway_reset		= ravb_nway_reset,
> +	.nway_reset		= phy_ethtool_nway_reset,

   What does this fix?

>  	.get_msglevel		= ravb_get_msglevel,
>  	.set_msglevel		= ravb_set_msglevel,
>  	.get_link		= ethtool_op_get_link,

MBR, Sergei
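
   For readers following the thread, the splat and the minimal fix discussed
above can be illustrated with a small userspace model. This is only a sketch
under stated assumptions: every model_* helper and both nway_reset_* functions
are hypothetical stand-ins written for this illustration, not the kernel's
primitives and not the driver's code. The model treats a held spinlock as
"atomic context" and counts each sleeping call made in that state.

```c
/* Userspace model only: none of these are the kernel's real primitives. */
static int atomic_depth;     /* > 0 while a modelled spinlock is held */
static int sleep_violations; /* sleeping calls made in atomic context */

static void model_spin_lock(void)   { atomic_depth++; }
static void model_spin_unlock(void) { atomic_depth--; }

/* Stands in for might_sleep(): records a violation instead of printing a BUG. */
static void model_might_sleep(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

/* Stands in for phy_start_aneg(), which takes a mutex and so may sleep. */
static int model_phy_start_aneg(void)
{
	model_might_sleep();
	return 0;
}

/* Shape of the removed ravb_nway_reset(): phylib call under the spinlock. */
static int nway_reset_locked(void)
{
	int error;

	model_spin_lock();
	error = model_phy_start_aneg();
	model_spin_unlock();
	return error;
}

/* Minimal fix raised in the review: same phylib call, spinlock dropped. */
static int nway_reset_unlocked(void)
{
	return model_phy_start_aneg();
}
```

   In this model nway_reset_locked() bumps sleep_violations once, while
nway_reset_unlocked() leaves it untouched. Whether the call is then switched
to phy_ethtool_nway_reset() is a separate step, which is the split the review
asks for.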

* Re: [PATCH] crypto: chtls: generic handling of data and hdr
From: Herbert Xu @ 2018-05-26 16:25 UTC
  To: Atul Gupta; +Cc: linux-crypto, netdev, davem, Harsh Jain
In-Reply-To: <1526296298-30183-1-git-send-email-atul.gupta@chelsio.com>

On Mon, May 14, 2018 at 04:41:38PM +0530, Atul Gupta wrote:
> removed redundant check and made TLS PDU and header recv
> handling common as received from HW.
> Ensure that only tls header is read in cpl_rx_tls_cmp
> read-ahead and skb is freed when entire data is processed.
> 
> Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
> Signed-off-by: Harsh Jain <harsh@chelsio.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs
From: Dmitry Vyukov @ 2018-05-26 15:50 UTC
  To: Michael Tuexen
  Cc: Neil Horman, Xin Long, network dev, linux-sctp, David Miller,
	David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller
In-Reply-To: <71FD541D-25FE-4100-980B-C3A0CEAF6CD4@lurchi.franken.de>

On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen
<michael.tuexen@lurchi.franken.de> wrote:
>> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote:
>>
>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote:
>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused
>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this
>>> value, hb_timer will get stuck there, as in its timer handler it starts
>>> this timer again with this value, then goes to the timer handler again.
>>>
>>> This problem is there since very beginning, and thanks to Eric for the
>>> reproducer shared from a syzbot mail.
>>>
>>> This patch fixes it by not allowing to set rto_min with a value below
>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl.
>>>
>>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com
>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>>> ---
>>> include/net/sctp/constants.h |  1 +
>>> net/sctp/socket.c            | 10 +++++++---
>>> net/sctp/sysctl.c            |  3 ++-
>>> 3 files changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
>>> index 20ff237..2ee7a7b 100644
>>> --- a/include/net/sctp/constants.h
>>> +++ b/include/net/sctp/constants.h
>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 };
>>> #define SCTP_RTO_INITIAL     (3 * 1000)
>>> #define SCTP_RTO_MIN         (1 * 1000)
>>> #define SCTP_RTO_MAX         (60 * 1000)
>>> +#define SCTP_RTO_HARD_MIN   200
>>>
>>> #define SCTP_RTO_ALPHA          3   /* 1/8 when converted to right shifts. */
>>> #define SCTP_RTO_BETA           2   /* 1/4 when converted to right shifts. */
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index ae7e7c6..6ef12c7 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval,
>>>  * be changed.
>>>  *
>>>  */
>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen)
>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval,
>>> +                               unsigned int optlen)
>>> {
>>>      struct sctp_rtoinfo rtoinfo;
>>>      struct sctp_association *asoc;
>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne
>>>      else
>>>              rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max;
>>>
>>> -    if (rto_min)
>>> +    if (rto_min) {
>>> +            if (rto_min < SCTP_RTO_HARD_MIN)
>>> +                    return -EINVAL;
>>>              rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min;
>>> -    else
>>> +    } else {
>>>              rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min;
>>> +    }
>>>
>>>      if (rto_min > rto_max)
>>>              return -EINVAL;
>>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
>>> index 33ca5b7..7ec854a 100644
>>> --- a/net/sctp/sysctl.c
>>> +++ b/net/sctp/sysctl.c
>>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0;
>>> static int rto_beta_min = 0;
>>> static int rto_alpha_max = 1000;
>>> static int rto_beta_max = 1000;
>>> +static int rto_hard_min = SCTP_RTO_HARD_MIN;
>>>
>>> static unsigned long max_autoclose_min = 0;
>>> static unsigned long max_autoclose_max =
>>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = {
>>>              .maxlen         = sizeof(unsigned int),
>>>              .mode           = 0644,
>>>              .proc_handler   = proc_sctp_do_rto_min,
>>> -            .extra1         = &one,
>>> +            .extra1         = &rto_hard_min,
>>>              .extra2         = &init_net.sctp.rto_max
>>>      },
>>>      {
>>> --
>>> 2.1.0
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as
>> well
>>
> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms.
> So could this be reduced?

Hi Michael,

What value do they use?

Xin, Neil, is there a more principled way of ensuring that a timer won't
cause a hard CPU stall? There are slow machines and there are slow
kernels (in particular, the syzbot kernel has tons of debug configs
enabled). 200ms _should_ not cause problems because we did not see
them with TCP. But it's hard to say what the lower limit is, as we are
trying to put a hard upper bound on the execution time of a complex
section of code. Is there something like cond_resched for timers?
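
To make the concern concrete, here is a toy userspace model of a timer whose
handler re-arms itself, as the patch description says hb_timer does. It is a
sketch only: handler_load_percent() and its parameters are hypothetical names
introduced for this illustration, the handler cost is an assumed input rather
than a measurement, and real timer behaviour (jiffies granularity, softirq
batching) is ignored.

```c
/*
 * Toy model, not kernel code: percentage of a time window spent inside a
 * timer handler that re-arms itself interval_us after it starts and then
 * does handler_cost_us worth of work. All values are in microseconds.
 */
static unsigned int handler_load_percent(unsigned int interval_us,
					 unsigned int handler_cost_us,
					 unsigned int window_us)
{
	unsigned long long now = 0, busy = 0;

	if (handler_cost_us == 0 || window_us == 0)
		return 0;

	while (now < window_us) {
		unsigned long long start = now;
		unsigned long long end = start + handler_cost_us;

		/* Count only the part of this run that falls inside the window. */
		busy += (end > window_us ? window_us : end) - start;

		/*
		 * If the timer has already expired, the handler runs again at
		 * once; otherwise the CPU is free until the next expiry.
		 */
		now = end;
		if (now < start + interval_us)
			now = start + interval_us;
	}

	return (unsigned int)(busy * 100 / window_us);
}
```

With an assumed 5 ms handler over a 1 s window, a 1 ms interval gives 100%
load in this model and a 200 ms interval gives about 2%. The model also shows
why a fixed floor is hard to justify in general: the safe interval depends on
the handler cost, which varies with the machine and the kernel configuration.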

* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs
From: Michael Tuexen @ 2018-05-26 15:42 UTC
  To: Neil Horman
  Cc: Xin Long, network dev, linux-sctp, davem, David Ahern,
	Eric Dumazet, Marcelo Ricardo Leitner, syzkaller
In-Reply-To: <20180525191319.GB392@hmswarspite.think-freely.org>

> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote:
>> syzbot reported a rcu_sched self-detected stall on CPU which is caused
>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this
>> value, hb_timer will get stuck there, as in its timer handler it starts
>> this timer again with this value, then goes to the timer handler again.
>> 
>> This problem is there since very beginning, and thanks to Eric for the
>> reproducer shared from a syzbot mail.
>> 
>> This patch fixes it by not allowing to set rto_min with a value below
>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl.
>> 
>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com
>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> ---
>> include/net/sctp/constants.h |  1 +
>> net/sctp/socket.c            | 10 +++++++---
>> net/sctp/sysctl.c            |  3 ++-
>> 3 files changed, 10 insertions(+), 4 deletions(-)
>> 
>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
>> index 20ff237..2ee7a7b 100644
>> --- a/include/net/sctp/constants.h
>> +++ b/include/net/sctp/constants.h
>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 };
>> #define SCTP_RTO_INITIAL	(3 * 1000)
>> #define SCTP_RTO_MIN		(1 * 1000)
>> #define SCTP_RTO_MAX		(60 * 1000)
>> +#define SCTP_RTO_HARD_MIN	200
>> 
>> #define SCTP_RTO_ALPHA          3   /* 1/8 when converted to right shifts. */
>> #define SCTP_RTO_BETA           2   /* 1/4 when converted to right shifts. */
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index ae7e7c6..6ef12c7 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval,
>>  * be changed.
>>  *
>>  */
>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen)
>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval,
>> +				   unsigned int optlen)
>> {
>> 	struct sctp_rtoinfo rtoinfo;
>> 	struct sctp_association *asoc;
>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne
>> 	else
>> 		rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max;
>> 
>> -	if (rto_min)
>> +	if (rto_min) {
>> +		if (rto_min < SCTP_RTO_HARD_MIN)
>> +			return -EINVAL;
>> 		rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min;
>> -	else
>> +	} else {
>> 		rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min;
>> +	}
>> 
>> 	if (rto_min > rto_max)
>> 		return -EINVAL;
>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
>> index 33ca5b7..7ec854a 100644
>> --- a/net/sctp/sysctl.c
>> +++ b/net/sctp/sysctl.c
>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0;
>> static int rto_beta_min = 0;
>> static int rto_alpha_max = 1000;
>> static int rto_beta_max = 1000;
>> +static int rto_hard_min = SCTP_RTO_HARD_MIN;
>> 
>> static unsigned long max_autoclose_min = 0;
>> static unsigned long max_autoclose_max =
>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = {
>> 		.maxlen		= sizeof(unsigned int),
>> 		.mode		= 0644,
>> 		.proc_handler	= proc_sctp_do_rto_min,
>> -		.extra1         = &one,
>> +		.extra1         = &rto_hard_min,
>> 		.extra2         = &init_net.sctp.rto_max
>> 	},
>> 	{
>> -- 
>> 2.1.0
>> 
> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as
> well
> 
I'm aware of some signalling networks that use RTO.min values smaller than 200 ms.
So could this be reduced?

Best regards
Michael
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
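
For reference, the check being discussed can be modelled in userspace as
below. This is a sketch under assumptions, not the patch itself:
check_rto_min() and its parameter names are hypothetical, the floor is passed
in as hard_min_ms so that values other than the patch's 200 can be tried, and
the jiffies conversion done for an existing association is left out.

```c
#include <errno.h>

/*
 * Userspace model of the rto_min validation in the quoted patch.
 * requested_ms == 0 means "keep the current value", as in the sockopt.
 * On success the effective minimum is stored in *effective_ms.
 */
static int check_rto_min(unsigned int requested_ms, unsigned int current_ms,
			 unsigned int rto_max_ms, unsigned int hard_min_ms,
			 unsigned int *effective_ms)
{
	unsigned int rto_min;

	if (requested_ms) {
		/* New in the patch: reject values below the hard floor. */
		if (requested_ms < hard_min_ms)
			return -EINVAL;
		rto_min = requested_ms;
	} else {
		rto_min = current_ms;
	}

	/* Existing check: the minimum may not exceed the maximum. */
	if (rto_min > rto_max_ms)
		return -EINVAL;

	*effective_ms = rto_min;
	return 0;
}
```

With hard_min_ms set to 200, a request for 100 ms is rejected with -EINVAL;
with a lower floor such as 50 the same request is accepted. That is the
trade-off raised above for networks that run with a smaller RTO.min.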

* Re: INFO: rcu detected stall in skb_free_head
From: Dmitry Vyukov @ 2018-05-26 15:38 UTC
  To: syzbot
  Cc: Andrei Vagin, David Miller, Kirill Tkhai, LKML, netdev,
	syzkaller-bugs
In-Reply-To: <000000000000799e6c056aff4a6f@google.com>

On Sun, Apr 29, 2018 at 6:33 PM, syzbot
<syzbot+cac7c17ec0aca89d3c45@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> a27fc14219f2e3c4a46ba9177b04d9b52c875532 (Mon Apr 16 21:07:39 2018 +0000)
> Merge branch 'parisc-4.17-3' of
> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=cac7c17ec0aca89d3c45
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=6517400396627968
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5914490758943236750
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+cac7c17ec0aca89d3c45@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> INFO: rcu_sched self-detected stall on CPU
>         1-...!: (117917 ticks this GP) idle=036/1/4611686018427387906
> softirq=114416/114416 fqs=32
>          (t=125000 jiffies g=60712 c=60711 q=345938)
> rcu_sched kthread starved for 124847 jiffies! g60712 c60711 f0x2
> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
> RCU grace-period kthread stack dump:
> rcu_sched       R  running task    23592     9      2 0x80000000
> Call Trace:
>  context_switch kernel/sched/core.c:2848 [inline]
>  __schedule+0x801/0x1e30 kernel/sched/core.c:3490
>  schedule+0xef/0x430 kernel/sched/core.c:3549
>  schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
>  rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> NMI backtrace for cpu 1
> CPU: 1 PID: 24 Comm: kworker/1:1 Not tainted 4.17.0-rc1+ #6
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: events rht_deferred_worker
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
>  __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
>  hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
> [inline]
> RIP: 0010:kfree+0x124/0x260 mm/slab.c:3814
> RSP: 0018:ffff8801db105450 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000007 RBX: ffff88006c118040 RCX: 1ffff1003b3059e7
> RDX: 0000000000000000 RSI: ffff8801d982cf90 RDI: 0000000000000286
> RBP: ffff8801db105470 R08: ffff8801d982ce78 R09: 0000000000000002
> R10: ffff8801d982c640 R11: 0000000000000000 R12: 0000000000000286
> R13: ffff8801dac00ac0 R14: ffffffff85bd7b69 R15: ffff88006c0f8180
>  skb_free_head+0x99/0xc0 net/core/skbuff.c:550
>  skb_release_data+0x690/0x860 net/core/skbuff.c:570
>  skb_release_all+0x4a/0x60 net/core/skbuff.c:627
>  __kfree_skb net/core/skbuff.c:641 [inline]
>  kfree_skb+0x195/0x560 net/core/skbuff.c:659
>  enqueue_to_backlog+0x2fc/0xc90 net/core/dev.c:3968
>  netif_rx_internal+0x14d/0xae0 net/core/dev.c:4181
>  netif_rx+0xba/0x400 net/core/dev.c:4206
>  loopback_xmit+0x283/0x741 drivers/net/loopback.c:91
>  __netdev_start_xmit include/linux/netdevice.h:4087 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4096 [inline]
>  xmit_one net/core/dev.c:3053 [inline]
>  dev_hard_start_xmit+0x264/0xc10 net/core/dev.c:3069
>  __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584
>  dev_queue_xmit+0x17/0x20 net/core/dev.c:3617
>  neigh_hh_output include/net/neighbour.h:472 [inline]
>  neigh_output include/net/neighbour.h:480 [inline]
>  ip_finish_output2+0x1046/0x1840 net/ipv4/ip_output.c:229
>  ip_finish_output+0x828/0xf80 net/ipv4/ip_output.c:317
>  NF_HOOK_COND include/linux/netfilter.h:277 [inline]
>  ip_output+0x21b/0x850 net/ipv4/ip_output.c:405
>  dst_output include/net/dst.h:444 [inline]
>  ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
>  ip_queue_xmit+0x9d7/0x1f70 net/ipv4/ip_output.c:504
>  sctp_v4_xmit+0x108/0x140 net/sctp/protocol.c:983
>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:650
>  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406

#syz fix: sctp: not allow to set rto_min with a value below 200 msecs


>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>  do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1046
>  </IRQ>
>  do_softirq.part.17+0x14d/0x190 kernel/softirq.c:329
>  do_softirq arch/x86/include/asm/preempt.h:23 [inline]
>  __local_bh_enable_ip+0x1ec/0x230 kernel/softirq.c:182
>  __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:176 [inline]
>  _raw_spin_unlock_bh+0x30/0x40 kernel/locking/spinlock.c:200
>  spin_unlock_bh include/linux/spinlock.h:355 [inline]
>  rhashtable_rehash_chain lib/rhashtable.c:292 [inline]
>  rhashtable_rehash_table lib/rhashtable.c:333 [inline]
>  rht_deferred_worker+0x1058/0x1fb0 lib/rhashtable.c:432
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 126s!
> BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 126s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=8/256
>     in-flight: 24:rht_deferred_worker
>     pending: rht_deferred_worker, defense_work_handler,
> defense_work_handler, perf_sched_delayed, defense_work_handler,
> switchdev_deferred_process_work, cache_reap
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=11/256
>     pending: jump_label_update_timeout, defense_work_handler,
> defense_work_handler, defense_work_handler, defense_work_handler,
> defense_work_handler, defense_work_handler, check_corruption,
> key_garbage_collector, vmstat_shepherd, cache_reap
> workqueue events_long: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=3/256
>     pending: br_fdb_cleanup, br_fdb_cleanup, br_fdb_cleanup
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=5/256
>     pending: br_fdb_cleanup, br_fdb_cleanup, br_fdb_cleanup, br_fdb_cleanup,
> br_fdb_cleanup
> workqueue events_unbound: flags=0x2
>   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512
>     in-flight: 10434:fsnotify_connector_destroy_workfn
> workqueue events_power_efficient: flags=0x80
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
>     pending: neigh_periodic_work
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256
>     pending: check_lifetime, gc_worker, do_cache_clean, neigh_periodic_work
> workqueue rcu_gp: flags=0x8
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
>     pending: process_srcu
> workqueue mm_percpu_wq: flags=0x8
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
>     pending: vmstat_update
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
>     pending: vmstat_update
> workqueue writeback: flags=0x4e
>   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256
>     in-flight: 13940:wb_workfn
> workqueue kblockd: flags=0x18
>   pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=2/256
>     pending: blk_mq_timeout_work, blk_mq_timeout_work
> workqueue dm_bufio_cache: flags=0x8
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
>     pending: work_fn
> workqueue ipv6_addrconf: flags=0x40008
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
>     pending: addrconf_verify_work
> workqueue krdsd: flags=0xe000a
>   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/1
>     in-flight: 301:rds_connect_worker
> pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=127s workers=3 idle: 4831 4860
> pool 4: cpus=0-1 flags=0x4 nice=0 hung=0s workers=8 idle: 301 22 6 7437 6882
> 50
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/000000000000799e6c056aff4a6f%40google.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: INFO: rcu detected stall in kmem_cache_alloc_node_trace
From: Dmitry Vyukov @ 2018-05-26 15:38 UTC (permalink / raw)
  To: syzbot
  Cc: David Miller, LKML, linux-sctp, netdev, Neil Horman,
	syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <00000000000060115b056b14b1fb@google.com>

On Mon, Apr 30, 2018 at 8:05 PM, syzbot
<syzbot+deec965c578bb9b81613@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    17dec0a94915 Merge branch 'userns-linus' of
> git://git.kerne...
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?id=6093051722203136
> kernel config:
> https://syzkaller.appspot.com/x/.config?id=-2735707888269579554
> dashboard link: https://syzkaller.appspot.com/bug?extid=deec965c578bb9b81613
> compiler:       gcc (GCC) 8.0.1 20180301 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+deec965c578bb9b81613@syzkaller.appspotmail.com
>
> sctp: [Deprecated]: syz-executor3 (pid 10218) Use of int in max_burst socket
> option.
> Use struct sctp_assoc_value instead
> sctp: [Deprecated]: syz-executor3 (pid 10218) Use of int in max_burst socket
> option.
> Use struct sctp_assoc_value instead
> random: crng init done
> INFO: rcu_sched self-detected stall on CPU
>         0-....: (120712 ticks this GP) idle=ac6/1/4611686018427387908
> softirq=31693/31693 fqs=31173
>          (t=125001 jiffies g=17039 c=17038 q=303419)
> NMI backtrace for cpu 0
> CPU: 0 PID: 10218 Comm: syz-executor3 Not tainted 4.16.0+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x1b9/0x29f lib/dump_stack.c:53
>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>  tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:162
>  tick_sched_timer+0x42/0x130 kernel/time/tick-sched.c:1170
>  __run_hrtimer kernel/time/hrtimer.c:1349 [inline]
>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1411
>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1469
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
> [inline]
> RIP: 0010:lock_is_held_type+0x18b/0x210 kernel/locking/lockdep.c:3960
> RSP: 0018:ffff8801db006400 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff12
> RAX: dffffc0000000000 RBX: 0000000000000282 RCX: 0000000000000000
> RDX: 1ffffffff1162e55 RSI: ffffffff88b90c60 RDI: 0000000000000282
> RBP: ffff8801db006420 R08: ffffed003b6046c3 R09: ffffed003b6046c2
> R10: ffffed003b6046c2 R11: ffff8801db023613 R12: ffff8801b2f623c0
> R13: 0000000000000000 R14: ffff88009932bb00 R15: 00000000ffffffff
>  lock_is_held include/linux/lockdep.h:344 [inline]
>  rcu_read_lock_sched_held+0x108/0x120 kernel/rcu/update.c:117
>  trace_kmalloc_node include/trace/events/kmem.h:100 [inline]
>  kmem_cache_alloc_node_trace+0x34e/0x770 mm/slab.c:3652
>  __do_kmalloc_node mm/slab.c:3669 [inline]
>  __kmalloc_node_track_caller+0x33/0x70 mm/slab.c:3684
>  __kmalloc_reserve.isra.38+0x3a/0xe0 net/core/skbuff.c:137
>  __alloc_skb+0x14d/0x780 net/core/skbuff.c:205
>  alloc_skb include/linux/skbuff.h:987 [inline]
>  sctp_packet_transmit+0x45e/0x3ba0 net/sctp/output.c:585
>  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406

#syz fix: sctp: not allow to set rto_min with a value below 200 msecs


>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>  </IRQ>
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
> [inline]
> RIP: 0010:console_unlock+0xcdf/0x1100 kernel/printk/printk.c:2403
> RSP: 0018:ffff8801946eec00 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff12
> RAX: 0000000000040000 RBX: 0000000000000200 RCX: ffffc90002ee8000
> RDX: 0000000000004461 RSI: ffffffff815f3446 RDI: 0000000000000212
> RBP: ffff8801946eed68 R08: ffff8801b2f62c38 R09: 0000000000000006
> R10: ffff8801b2f623c0 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffffff84b84430 R14: 0000000000000001 R15: dffffc0000000000
>  vprintk_emit+0x6ad/0xdd0 kernel/printk/printk.c:1907
>  vprintk_default+0x28/0x30 kernel/printk/printk.c:1947
>  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:379
>  printk+0x9e/0xba kernel/printk/printk.c:1980
>  sctp_getsockopt_maxburst net/sctp/socket.c:6265 [inline]
>  sctp_getsockopt.cold.34+0x11d/0x14c net/sctp/socket.c:7240
>  sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2998
>  __sys_getsockopt+0x1a5/0x370 net/socket.c:1940
>  SYSC_getsockopt net/socket.c:1951 [inline]
>  SyS_getsockopt+0x34/0x50 net/socket.c:1948
>  do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x455279
> RSP: 002b:00007f5c0c0f2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
> RAX: ffffffffffffffda RBX: 00007f5c0c0f36d4 RCX: 0000000000455279
> RDX: 0000000000000014 RSI: 0000000000000084 RDI: 0000000000000014
> RBP: 000000000072bea0 R08: 0000000020000240 R09: 0000000000000000
> R10: 0000000020000140 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 000000000000012b R14: 00000000006f4ca8 R15: 0000000000000000
> INFO: rcu_sched detected stalls on CPUs/tasks:
>         0-....: (120712 ticks this GP) idle=ac6/1/4611686018427387908
> softirq=31693/31693 fqs=31173
>         (detected by 1, t=125002 jiffies, g=17039, c=17038, q=303419)
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 10218 Comm: syz-executor3 Not tainted 4.16.0+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:__lock_acquire+0xa78/0x5130 kernel/locking/lockdep.c:3434
> RSP: 0018:ffff8801db005b40 EFLAGS: 00000046
> RAX: dffffc0000000000 RBX: 00000000858813a6 RCX: 0000000000000001
> RDX: 1ffff100365ec585 RSI: ffff8801b2f62c38 RDI: ffff8801b2f62cf9
> RBP: ffff8801db005ed0 R08: 0000000000000008 R09: 0000000000000004
> R10: ffff8801b2f62cd8 R11: ffff8801b2f623c0 R12: 00000000be6c7baf
> R13: 0000000000000000 R14: 2ca977e0cccfd81f R15: 0000000000000000
> FS:  00007f5c0c0f3700(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffe57df8fa8 CR3: 00000001d7124000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <IRQ>
>  lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
>  rcu_lock_acquire include/linux/rcupdate.h:246 [inline]
>  rcu_read_lock include/linux/rcupdate.h:632 [inline]
>  is_bpf_text_address+0x3b/0x170 kernel/bpf/core.c:478
>  kernel_text_address+0x79/0xf0 kernel/extable.c:152
>  __kernel_text_address+0xd/0x40 kernel/extable.c:107
>  unwind_get_return_address+0x61/0xa0 arch/x86/kernel/unwind_frame.c:18
>  __save_stack_trace+0x7e/0xd0 arch/x86/kernel/stacktrace.c:45
>  save_stack_trace+0x1a/0x20 arch/x86/kernel/stacktrace.c:60
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520
>  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527
>  __cache_free mm/slab.c:3486 [inline]
>  kmem_cache_free+0x86/0x2d0 mm/slab.c:3744
>  kfree_skbmem+0x13c/0x210 net/core/skbuff.c:582
>  __kfree_skb net/core/skbuff.c:642 [inline]
>  consume_skb+0x193/0x550 net/core/skbuff.c:701
>  sctp_chunk_destroy net/sctp/sm_make_chunk.c:1477 [inline]
>  sctp_chunk_put+0x2c0/0x440 net/sctp/sm_make_chunk.c:1504
>  sctp_chunk_free+0x53/0x60 net/sctp/sm_make_chunk.c:1491
>  sctp_packet_pack net/sctp/output.c:489 [inline]
>  sctp_packet_transmit+0x142e/0x3ba0 net/sctp/output.c:610
>  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>  </IRQ>
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
> [inline]
> RIP: 0010:console_unlock+0xcdf/0x1100 kernel/printk/printk.c:2403
> RSP: 0018:ffff8801946eec00 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff12
> RAX: 0000000000040000 RBX: 0000000000000200 RCX: ffffc90002ee8000
> RDX: 0000000000004461 RSI: ffffffff815f3446 RDI: 0000000000000212
> RBP: ffff8801946eed68 R08: ffff8801b2f62c38 R09: 0000000000000006
> R10: ffff8801b2f623c0 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffffff84b84430 R14: 0000000000000001 R15: dffffc0000000000
>  vprintk_emit+0x6ad/0xdd0 kernel/printk/printk.c:1907
>  vprintk_default+0x28/0x30 kernel/printk/printk.c:1947
>  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:379
>  printk+0x9e/0xba kernel/printk/printk.c:1980
>  sctp_getsockopt_maxburst net/sctp/socket.c:6265 [inline]
>  sctp_getsockopt.cold.34+0x11d/0x14c net/sctp/socket.c:7240
>  sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2998
>  __sys_getsockopt+0x1a5/0x370 net/socket.c:1940
>  SYSC_getsockopt net/socket.c:1951 [inline]
>  SyS_getsockopt+0x34/0x50 net/socket.c:1948
>  do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x455279
> RSP: 002b:00007f5c0c0f2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
> RAX: ffffffffffffffda RBX: 00007f5c0c0f36d4 RCX: 0000000000455279
> RDX: 0000000000000014 RSI: 0000000000000084 RDI: 0000000000000014
> RBP: 000000000072bea0 R08: 0000000020000240 R09: 0000000000000000
> R10: 0000000020000140 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 000000000000012b R14: 00000000006f4ca8 R15: 0000000000000000
> Code: 0f 85 dc 32 00 00 8b 0d b7 9e 8b 07 85 c9 0f 84 62 f7 ff ff 48 b8 00
> 00 00 00 00 fc ff df 48 8b 54 24 78 48 c1 ea 03 80 3c 02 00 <0f> 85 0b 31 00
> 00 48 8b 94 24 88 00 00 00 4d 89 b3 68 08 00 00
> INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 3.007
> msecs
> INFO: task kworker/u4:4:688 blocked for more than 120 seconds.
>       Not tainted 4.16.0+ #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u4:4    D19560   688      2 0x80000000
> Workqueue: events_unbound fsnotify_mark_destroy_workfn
> Call Trace:
>  context_switch kernel/sched/core.c:2848 [inline]
>  __schedule+0x807/0x1e40 kernel/sched/core.c:3490
>  schedule+0xef/0x430 kernel/sched/core.c:3549
>  schedule_timeout+0x1b5/0x240 kernel/time/timer.c:1777
>  do_wait_for_common kernel/sched/completion.c:83 [inline]
>  __wait_for_common kernel/sched/completion.c:104 [inline]
>  wait_for_common kernel/sched/completion.c:115 [inline]
>  wait_for_completion+0x3e7/0x870 kernel/sched/completion.c:136
>  __synchronize_srcu+0x189/0x240 kernel/rcu/srcutree.c:924
>  synchronize_srcu+0x408/0x54f kernel/rcu/srcutree.c:1002
>  fsnotify_mark_destroy_workfn+0x1aa/0x530 fs/notify/mark.c:759
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
>
> Showing all locks held in the system:
> 2 locks held by kworker/u4:4/688:
>  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:
> __write_once_size include/linux/compiler.h:215 [inline]
>  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:
> arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:
> atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline]
>  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:
> atomic_long_set include/asm-generic/atomic-long.h:57 [inline]
>  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:
> set_work_data kernel/workqueue.c:617 [inline]
>  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:
> set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
>  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:
> process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
>  #1: 000000004c7e11cf ((reaper_work).work){+.+.}, at:
> process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120
> 2 locks held by khungtaskd/878:
>  #0: 000000001cc267e2 (rcu_read_lock){....}, at:
> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
>  #0: 000000001cc267e2 (rcu_read_lock){....}, at: watchdog+0x1ff/0xf60
> kernel/hung_task.c:249
>  #1: 000000002f71223f (tasklist_lock){.+.+}, at:
> debug_show_all_locks+0xde/0x34a kernel/locking/lockdep.c:4470
> 2 locks held by kworker/1:2/1980:
>  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at: __write_once_size
> include/linux/compiler.h:215 [inline]
>  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at: arch_atomic64_set
> arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at: atomic64_set
> include/asm-generic/atomic-instrumented.h:40 [inline]
>  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at: atomic_long_set
> include/asm-generic/atomic-long.h:57 [inline]
>  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at: set_work_data
> kernel/workqueue.c:617 [inline]
>  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at:
> set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
>  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at:
> process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
>  #1: 000000008e69c2e7 (xfrm_state_gc_work){+.+.}, at:
> process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120
> 1 lock held by rsyslogd/4365:
>  #0: 00000000ef321630 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x1a9/0x1e0
> fs/file.c:766
> 2 locks held by getty/4456:
>  #0: 0000000044e43f49 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
> drivers/tty/tty_ldsem.c:365
>  #1: 0000000093b079e0 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4457:
>  #0: 000000005b25b55f (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
> drivers/tty/tty_ldsem.c:365
>  #1: 0000000099955ee5 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4458:
>  #0: 0000000050f5738d (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
> drivers/tty/tty_ldsem.c:365
>  #1: 00000000cccc0402 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4459:
>  #0: 00000000ddecfb5c (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
> drivers/tty/tty_ldsem.c:365
>  #1: 000000003c071f3a (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4460:
>  #0: 000000003bec706e (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
> drivers/tty/tty_ldsem.c:365
>  #1: 00000000dd28453c (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4461:
>  #0: 00000000313fc55a (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
> drivers/tty/tty_ldsem.c:365
>  #1: 00000000e9766156 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4462:
>  #0: 000000003212bb13 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
> drivers/tty/tty_ldsem.c:365
>  #1: 0000000065fa2f73 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
>
> =============================================
>
> NMI backtrace for cpu 1
> CPU: 1 PID: 878 Comm: khungtaskd Not tainted 4.16.0+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x1b9/0x29f lib/dump_stack.c:53
>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
>  check_hung_task kernel/hung_task.c:132 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
>  watchdog+0xc10/0xf60 kernel/hung_task.c:249
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 10218 Comm: syz-executor3 Not tainted 4.16.0+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:debug_lockdep_rcu_enabled.part.1+0xb/0x60 kernel/rcu/update.c:298
> RSP: 0018:ffff8801db0062e8 EFLAGS: 00000202
> RAX: dffffc0000000000 RBX: 1ffff1003b600c61 RCX: ffffffff86583d15
> RDX: 0000000000000004 RSI: ffffffff86583d65 RDI: ffffffff88e6cc40
> RBP: ffff8801db0062f8 R08: ffff8801b2f623c0 R09: ffffed003b6046c2
> R10: ffffed003b6046c2 R11: ffff8801db023613 R12: ffff8801c3d740c0
> R13: 0000000000000001 R14: 0000000000010000 R15: ffff88019cc0a900
> FS:  00007f5c0c0f3700(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffe57df8fa8 CR3: 00000001d7124000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <IRQ>
>  rcu_read_unlock include/linux/rcupdate.h:684 [inline]
>  ip6_mtu+0x36a/0x590 net/ipv6/route.c:2420
>  dst_mtu include/net/dst.h:210 [inline]
>  ip6_xmit+0xb42/0x23f0 net/ipv6/ip6_output.c:262
>  sctp_v6_xmit+0x4a5/0x6b0 net/sctp/ipv6.c:225
>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:642
>  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>  </IRQ>
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
> [inline]
> RIP: 0010:console_unlock+0xcdf/0x1100 kernel/printk/printk.c:2403
> RSP: 0018:ffff8801946eec00 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff12
> RAX: 0000000000040000 RBX: 0000000000000200 RCX: ffffc90002ee8000
> RDX: 0000000000004461 RSI: ffffffff815f3446 RDI: 0000000000000212
> RBP: ffff8801946eed68 R08: ffff8801b2f62c38 R09: 0000000000000006
> R10: ffff8801b2f623c0 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffffff84b84430 R14: 0000000000000001 R15: dffffc0000000000
>  vprintk_emit+0x6ad/0xdd0 kernel/printk/printk.c:1907
>  vprintk_default+0x28/0x30 kernel/printk/printk.c:1947
>  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:379
>  printk+0x9e/0xba kernel/printk/printk.c:1980
>  sctp_getsockopt_maxburst net/sctp/socket.c:6265 [inline]
>  sctp_getsockopt.cold.34+0x11d/0x14c net/sctp/socket.c:7240
>  sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2998
>  __sys_getsockopt+0x1a5/0x370 net/socket.c:1940
>  SYSC_getsockopt net/socket.c:1951 [inline]
>  SyS_getsockopt+0x34/0x50 net/socket.c:1948
>  do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x455279
> RSP: 002b:00007f5c0c0f2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
> RAX: ffffffffffffffda RBX: 00007f5c0c0f36d4 RCX: 0000000000455279
> RDX: 0000000000000014 RSI: 0000000000000084 RDI: 0000000000000014
> RBP: 000000000072bea0 R08: 0000000020000240 R09: 0000000000000000
> R10: 0000000020000140 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 000000000000012b R14: 00000000006f4ca8 R15: 0000000000000000
> Code: e9 c3 fd ff ff 48 8b 7d c8 e8 52 6c 50 00 e9 9c fd ff ff 0f 1f 00 66
> 2e 0f 1f 84 00 00 00 00 00 48 b8 00 00 00 00 00 fc ff df 55 <48> 89 e5 53 65
> 48 8b 1c 25 c0 ed 01 00 48 8d bb 74 08 00 00 48
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* Re: INFO: rcu detected stall in sctp_generate_heartbeat_event
From: Dmitry Vyukov @ 2018-05-26 15:36 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: syzbot, David Miller, LKML, linux-sctp, netdev, Neil Horman,
	syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <20180508120643.GM4977@localhost.localdomain>

On Tue, May 8, 2018 at 2:06 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Tue, May 08, 2018 at 12:35:02AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    90278871d4b0 Merge git://git.kernel.org/pub/scm/linux/kern..
>> git tree:       net-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=119a7237800000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=aea320d3af5ef99d
>> dashboard link: https://syzkaller.appspot.com/bug?extid=e4a5bbd54260c93014f9
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>
> A reproducer will be welcomed. With just these traces, I don't think
> we have enough information.


#syz fix: sctp: not allow to set rto_min with a value below 200 msecs
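
The quoted syzbot footer notes that every command must start at the beginning
of a line in the email body, which is why the "#syz fix:" line above is left
unquoted and unindented. A minimal sketch of that rule, assuming only the three
commands named in the footer (fix, dup, invalid); the helper name and the regex
are hypothetical and are not syzbot's actual parser:

```python
import re

# Assumption: only the commands listed in the quoted footer are recognised.
# "^" with re.MULTILINE anchors at the start of each line, so quoted
# ("> #syz ...") or indented commands are ignored.
SYZ_COMMAND = re.compile(r"^#syz (fix|dup|invalid)(?::[ \t]*(.*))?$", re.MULTILINE)


def extract_syz_commands(body: str) -> list[tuple[str, str]]:
    """Return (command, argument) pairs for lines that begin with '#syz'."""
    commands = []
    for match in SYZ_COMMAND.finditer(body):
        name = match.group(1)
        arg = (match.group(2) or "").strip()
        commands.append((name, arg))
    return commands
```

With this sketch, a quoted line such as "> #syz fix: exact-commit-title" yields
nothing, while the unquoted command in this reply yields one ("fix", title) pair.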

^ permalink raw reply

* Re: INFO: rcu detected stall in kfree_skbmem
From: Dmitry Vyukov @ 2018-05-26 15:34 UTC (permalink / raw)
  To: Xin Long
  Cc: Neil Horman, Marcelo Ricardo Leitner, syzbot, Vladislav Yasevich,
	linux-sctp, Andrei Vagin, David Miller, Kirill Tkhai, LKML,
	netdev, syzkaller-bugs
In-Reply-To: <CADvbK_crGBk-q_910r-xdh2p=xnxiV=1EExLSX-ecddFwMag6w@mail.gmail.com>

On Mon, May 14, 2018 at 8:04 PM, Xin Long <lucien.xin@gmail.com> wrote:
> On Mon, May 14, 2018 at 9:34 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>> On Fri, May 11, 2018 at 12:00:38PM +0200, Dmitry Vyukov wrote:
>>> On Mon, Apr 30, 2018 at 8:09 PM, syzbot
>>> <syzbot+fc78715ba3b3257caf6a@syzkaller.appspotmail.com> wrote:
>>> > Hello,
>>> >
>>> > syzbot found the following crash on:
>>> >
>>> > HEAD commit:    5d1365940a68 Merge
>>> > git://git.kernel.org/pub/scm/linux/kerne...
>>> > git tree:       net-next
>>> > console output: https://syzkaller.appspot.com/x/log.txt?id=5667997129637888
>>> > kernel config:
>>> > https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
>>> > dashboard link: https://syzkaller.appspot.com/bug?extid=fc78715ba3b3257caf6a
>>> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>> >
>>> > Unfortunately, I don't have any reproducer for this crash yet.
>>>
>>> This looks sctp-related, +sctp maintainers.
>>>
>> Looking at the entire trace, it appears that we are getting caught in the
>> kfree_skb that is getting triggered in enqueue_to_backlog which occurs when our
>> rx backlog list grows over netdev_max_backlog packets.  That suggests to me that
> It might be a long skb->frag_list that made kfree_skb slow when packing
> lots of small chunks to go through lo device?
>
>> whatever test(s) is/are causing this trace are queuing up a large number of
>> frames to be sent over the loopback interface, and are never/rarely getting
>> received.  Looking up higher in the stack, in the sctp_generate_heartbeat_event
>> function, we (in addition to the rcu_read_lock in sctp_v6_xmit) also hold the
>> socket lock during the entirety of the xmit operation.  Is it possible that we
>> are just enqueuing so many frames for xmit that we are blocking progress of
>> other threads using the same socket that we cross the RCU self detected stall
>> boundary?  While it's not a fix per se, it might be a worthwhile test to limit
>> the number of frames we flush in a single pass.
>>
>> Neil
>>
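
The test Neil suggests, limiting how many frames are flushed in a single pass,
can be illustrated outside the kernel. This is only a sketch of the idea in
Python, not the SCTP code: flush_with_budget, the queue, and the transmit
callback are hypothetical names, and the budget value is arbitrary.

```python
from collections import deque


def flush_with_budget(queue: deque, transmit, budget: int) -> int:
    """Send at most `budget` frames from `queue`; return how many were sent."""
    sent = 0
    while queue and sent < budget:
        transmit(queue.popleft())
        sent += 1
    return sent


# Usage: each pass does a bounded amount of work, and the caller decides
# when to run the next pass instead of draining everything at once.
pending = deque(range(10))
delivered = []
while pending:
    flush_with_budget(pending, delivered.append, budget=4)
```

Bounding each pass would not remove the backlog, but it caps how long one
caller holds the lock, which is the property the proposed test is probing.
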
>>> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> > Reported-by: syzbot+fc78715ba3b3257caf6a@syzkaller.appspotmail.com
>>> >
>>> > INFO: rcu_sched self-detected stall on CPU
>>> >         1-...!: (1 GPs behind) idle=a3e/1/4611686018427387908
>>> > softirq=71980/71983 fqs=33
>>> >          (t=125000 jiffies g=39438 c=39437 q=958)
>>> > rcu_sched kthread starved for 124829 jiffies! g39438 c39437 f0x0
>>> > RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
>>> > RCU grace-period kthread stack dump:
>>> > rcu_sched       R  running task    23768     9      2 0x80000000
>>> > Call Trace:
>>> >  context_switch kernel/sched/core.c:2848 [inline]
>>> >  __schedule+0x801/0x1e30 kernel/sched/core.c:3490
>>> >  schedule+0xef/0x430 kernel/sched/core.c:3549
>>> >  schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
>>> >  rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
>>> >  kthread+0x345/0x410 kernel/kthread.c:238
>>> >  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
>>> > NMI backtrace for cpu 1
>>> > CPU: 1 PID: 20560 Comm: syz-executor4 Not tainted 4.16.0+ #1
>>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>>> > Google 01/01/2011
>>> > Call Trace:
>>> >  <IRQ>
>>> >  __dump_stack lib/dump_stack.c:77 [inline]
>>> >  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>> >  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>>> >  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>>> >  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>> >  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>>> >  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>>> >  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>>> >  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>>> >  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>>> >  rcu_pending kernel/rcu/tree.c:3401 [inline]
>>> >  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>>> >  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>>> >  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
>>> >  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
>>> >  __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
>>> >  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
>>> >  hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
>>> >  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>>> >  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>>> >  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>>> > RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
>>> > [inline]
>>> > RIP: 0010:kmem_cache_free+0xb3/0x2d0 mm/slab.c:3757
>>> > RSP: 0018:ffff8801db105228 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
>>> > RAX: 0000000000000007 RBX: ffff8800b055c940 RCX: 1ffff1003b2345a5
>>> > RDX: 0000000000000000 RSI: ffff8801d91a2d80 RDI: 0000000000000282
>>> > RBP: ffff8801db105248 R08: ffff8801d91a2cb8 R09: 0000000000000002
>>> > R10: ffff8801d91a2480 R11: 0000000000000000 R12: ffff8801d9848e40
>>> > R13: 0000000000000282 R14: ffffffff85b7f27c R15: 0000000000000000
>>> >  kfree_skbmem+0x13c/0x210 net/core/skbuff.c:582
>>> >  __kfree_skb net/core/skbuff.c:642 [inline]
>>> >  kfree_skb+0x19d/0x560 net/core/skbuff.c:659
>>> >  enqueue_to_backlog+0x2fc/0xc90 net/core/dev.c:3968
>>> >  netif_rx_internal+0x14d/0xae0 net/core/dev.c:4181
>>> >  netif_rx+0xba/0x400 net/core/dev.c:4206
>>> >  loopback_xmit+0x283/0x741 drivers/net/loopback.c:91
>>> >  __netdev_start_xmit include/linux/netdevice.h:4087 [inline]
>>> >  netdev_start_xmit include/linux/netdevice.h:4096 [inline]
>>> >  xmit_one net/core/dev.c:3053 [inline]
>>> >  dev_hard_start_xmit+0x264/0xc10 net/core/dev.c:3069
>>> >  __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584
>>> >  dev_queue_xmit+0x17/0x20 net/core/dev.c:3617
>>> >  neigh_hh_output include/net/neighbour.h:472 [inline]
>>> >  neigh_output include/net/neighbour.h:480 [inline]
>>> >  ip6_finish_output2+0x134e/0x2810 net/ipv6/ip6_output.c:120
>>> >  ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154
>>> >  NF_HOOK_COND include/linux/netfilter.h:277 [inline]
>>> >  ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171
>>> >  dst_output include/net/dst.h:444 [inline]
>>> >  NF_HOOK include/linux/netfilter.h:288 [inline]
>>> >  ip6_xmit+0xf51/0x23f0 net/ipv6/ip6_output.c:277
>>> >  sctp_v6_xmit+0x4a5/0x6b0 net/sctp/ipv6.c:225
>>> >  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:650
>>> >  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
>>> >  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>>> >  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>>> >  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>>> >  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>>> >  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406


#syz fix: sctp: not allow to set rto_min with a value below 200 msecs


>>> >  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>>> >  expire_timers kernel/time/timer.c:1363 [inline]
>>> >  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>>> >  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>>> >  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>>> >  invoke_softirq kernel/softirq.c:365 [inline]
>>> >  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>>> >  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>>> >  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>>> >  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>>> >  </IRQ>
>>> > RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
>>> > [inline]
>>> > RIP: 0010:lock_release+0x4d4/0xa10 kernel/locking/lockdep.c:3942
>>> > RSP: 0018:ffff8801971ce7b0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
>>> > RAX: dffffc0000000000 RBX: 1ffff10032e39cfb RCX: 1ffff1003b234595
>>> > RDX: 1ffffffff11630ed RSI: 0000000000000002 RDI: 0000000000000282
>>> > RBP: ffff8801971ce8e0 R08: 1ffff10032e39cff R09: ffffed003b6246c2
>>> > R10: 0000000000000003 R11: 0000000000000001 R12: ffff8801d91a2480
>>> > R13: ffffffff88b8df60 R14: ffff8801d91a2480 R15: ffff8801971ce7f8
>>> >  rcu_lock_release include/linux/rcupdate.h:251 [inline]
>>> >  rcu_read_unlock include/linux/rcupdate.h:688 [inline]
>>> >  __unlock_page_memcg+0x72/0x100 mm/memcontrol.c:1654
>>> >  unlock_page_memcg+0x2c/0x40 mm/memcontrol.c:1663
>>> >  page_remove_file_rmap mm/rmap.c:1248 [inline]
>>> >  page_remove_rmap+0x6f2/0x1250 mm/rmap.c:1299
>>> >  zap_pte_range mm/memory.c:1337 [inline]
>>> >  zap_pmd_range mm/memory.c:1441 [inline]
>>> >  zap_pud_range mm/memory.c:1470 [inline]
>>> >  zap_p4d_range mm/memory.c:1491 [inline]
>>> >  unmap_page_range+0xeb4/0x2200 mm/memory.c:1512
>>> >  unmap_single_vma+0x1a0/0x310 mm/memory.c:1557
>>> >  unmap_vmas+0x120/0x1f0 mm/memory.c:1587
>>> >  exit_mmap+0x265/0x570 mm/mmap.c:3038
>>> >  __mmput kernel/fork.c:962 [inline]
>>> >  mmput+0x251/0x610 kernel/fork.c:983
>>> >  exit_mm kernel/exit.c:544 [inline]
>>> >  do_exit+0xe98/0x2730 kernel/exit.c:852
>>> >  do_group_exit+0x16f/0x430 kernel/exit.c:968
>>> >  get_signal+0x886/0x1960 kernel/signal.c:2469
>>> >  do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810
>>> >  exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162
>>> >  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>>> >  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>>> >  do_syscall_64+0x792/0x9d0 arch/x86/entry/common.c:292
>>> >  entry_SYSCALL_64_after_hwframe+0x42/0xb7
>>> > RIP: 0033:0x455319
>>> > RSP: 002b:00007fa346e81ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
>>> > RAX: fffffffffffffe00 RBX: 000000000072bf80 RCX: 0000000000455319
>>> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bf80
>>> > RBP: 000000000072bf80 R08: 0000000000000000 R09: 000000000072bf58
>>> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>> > R13: 0000000000a3e81f R14: 00007fa346e829c0 R15: 0000000000000001
>>> >
>>> >
>>> > ---
>>> > This bug is generated by a bot. It may contain errors.
>>> > See https://goo.gl/tpsmEJ for more information about syzbot.
>>> > syzbot engineers can be reached at syzkaller@googlegroups.com.
>>> >
>>> > syzbot will keep track of this bug report.
>>> > If you forgot to add the Reported-by tag, once the fix for this bug is
>>> > merged
>>> > into any tree, please reply to this email with:
>>> > #syz fix: exact-commit-title
>>> > To mark this as a duplicate of another syzbot report, please reply with:
>>> > #syz dup: exact-subject-of-another-report
>>> > If it's a one-off invalid bug report, please reply with:
>>> > #syz invalid
>>> > Note: if the crash happens again, it will cause creation of a new bug
>>> > report.
>>> > Note: all commands must start from beginning of the line in the email body.
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google Groups
>>> > "syzkaller-bugs" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send an
>>> > email to syzkaller-bugs+unsubscribe@googlegroups.com.
>>> > To view this discussion on the web visit
>>> > https://groups.google.com/d/msgid/syzkaller-bugs/000000000000a9b0e3056b14bfb2%40google.com.
>>> > For more options, visit https://groups.google.com/d/optout.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>

^ permalink raw reply

* Re: INFO: rcu detected stall in sctp_packet_transmit
From: Dmitry Vyukov @ 2018-05-26 15:34 UTC (permalink / raw)
  To: Xin Long
  Cc: syzbot, davem, LKML, linux-sctp, Marcelo Ricardo Leitner,
	network dev, Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <CACT4Y+aG0CdZOe8+Y5+OWsghV2o36UpgAXFE4cCm-Mmv6Cq0oA@mail.gmail.com>

On Wed, May 16, 2018 at 2:12 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Wed, May 16, 2018 at 1:02 PM, Xin Long <lucien.xin@gmail.com> wrote:
>>>> <syzbot+ff0b569fb5111dcd1a36@syzkaller.appspotmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> syzbot found the following crash on:
>>>>>
>>>>> HEAD commit:    961423f9fcbc Merge branch 'sctp-Introduce-sctp_flush_ctx'
>>>>> git tree:       net-next
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1366aea7800000
>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=51fb0a6913f757db
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=ff0b569fb5111dcd1a36
>>>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>>>>
>>>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>>>
>>>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>>>> Reported-by: syzbot+ff0b569fb5111dcd1a36@syzkaller.appspotmail.com
>>>>>
>>>>> INFO: rcu_sched self-detected stall on CPU
>>>>>         0-....: (1 GPs behind) idle=dae/1/4611686018427387908
>>>>> softirq=93090/93091 fqs=30902
>>>>>          (t=125000 jiffies g=51107 c=51106 q=972)
>>>>> NMI backtrace for cpu 0
>>>>> CPU: 0 PID: 24668 Comm: syz-executor6 Not tainted 4.17.0-rc4+ #44
>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>>>>> Google 01/01/2011
>>>>> Call Trace:
>>>>>  <IRQ>
>>>>>  __dump_stack lib/dump_stack.c:77 [inline]
>>>>>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>>>>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>>>>>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>>>>>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>>>>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>>>>>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>>>>>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>>>>>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>>>>>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>>>>>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>>>>>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>>>>>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>>>>>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>>>>>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>>>>>  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>>>>>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
>>>>>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>>>>>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>>>>>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>>>>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>>>>> RIP: 0010:sctp_v6_xmit+0x259/0x6b0 net/sctp/ipv6.c:219
>>>>> RSP: 0018:ffff8801dae068e8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>>>>> RAX: 0000000000000007 RBX: ffff8801bb7ec800 RCX: ffffffff86f1b345
>>>>> RDX: 0000000000000000 RSI: ffffffff86f1b381 RDI: ffff8801b73d97c4
>>>>> RBP: ffff8801dae06988 R08: ffff88019505c300 R09: ffffed003b5c46c2
>>>>> R10: ffffed003b5c46c2 R11: ffff8801dae23613 R12: ffff88011fd57300
>>>>> R13: ffff8801bb7ecec8 R14: 0000000000000029 R15: 0000000000000002
>>>>>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:642
>>>>>  sctp_outq_flush_transports net/sctp/outqueue.c:1164 [inline]
>>>>>  sctp_outq_flush+0x5f5/0x3430 net/sctp/outqueue.c:1212
>>>>>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>>>>>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>>>>>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>>>>>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>>>>>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
>>>> Shocks, this timer event again. Can we try to minimize the repo.syz and
>>>> get a short script? It does not necessarily have to reproduce the issue
>>>> 100%; we need to know what it was doing when this happened.
>>>>
>>>> Thanks.
>>>
>>> It's possible to replay the whole log from console output following
>>> these instructions:
>>> https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md
>> Thanks, it's running now.
>> Usually how long will it take to finish running this 5000-line log?
>
> If you run with -repeat=0 then it will run infinitely repeating the
> log again and again. If you see:
>
> parsed 1000 programs
> ...
> executed 5000 programs
>
> then it looped 5 times already. You can run with -repeat=10.
>
> syzbot has tried replaying the log, but for some reason it wasn't able
> to reproduce the crash (maybe accumulated state, or maybe it crashed
> in a different way). You can also try logs from other sctp hangs.


#syz fix: sctp: not allow to set rto_min with a value below 200 msecs

^ permalink raw reply

* Re: INFO: rcu detected stall in ip_route_output_key_hash
From: Dmitry Vyukov @ 2018-05-26 15:32 UTC (permalink / raw)
  To: syzbot
  Cc: David Miller, Alexey Kuznetsov, LKML, netdev, syzkaller-bugs,
	Hideaki YOSHIFUJI, linux-sctp
In-Reply-To: <000000000000e2e064056c5460e3@google.com>

On Wed, May 16, 2018 at 5:29 PM, syzbot
<syzbot+769a7ccbbb4b5074f125@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    0b7d9978406f Merge branch 'Microsemi-Ocelot-Ethernet-switc..
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1138c477800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b632d8e2c2ab2c1
> dashboard link: https://syzkaller.appspot.com/bug?extid=769a7ccbbb4b5074f125
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+769a7ccbbb4b5074f125@syzkaller.appspotmail.com
>
> netlink: 4 bytes leftover after parsing attributes in process
> `syz-executor2'.
> random: crng init done
> INFO: rcu_sched self-detected stall on CPU
>         1-...!: (121515 ticks this GP) idle=e7e/1/4611686018427387908
> softirq=31362/31362 fqs=7
>          (t=125000 jiffies g=16439 c=16438 q=668508)
> rcu_sched kthread starved for 124958 jiffies! g16439 c16438 f0x2
> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
> RCU grace-period kthread stack dump:
> rcu_sched       R  running task    23768     9      2 0x80000000
> Call Trace:
>  context_switch kernel/sched/core.c:2848 [inline]
>  __schedule+0x801/0x1e30 kernel/sched/core.c:3490
>  schedule+0xef/0x430 kernel/sched/core.c:3549
>  schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
>  rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> NMI backtrace for cpu 1
> CPU: 1 PID: 4488 Comm: syz-fuzzer Not tainted 4.17.0-rc4+ #45
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
> RIP: 0010:rcu_is_watching+0x6/0x140 kernel/rcu/tree.c:1071
> RSP: 0000:ffff8801daf06620 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
> RAX: ffff8801ad526240 RBX: 0000000000000000 RCX: ffffffff86444456
> RDX: 0000000000000100 RSI: ffffffff864444b8 RDI: 0000000000000001
> RBP: ffff8801daf06628 R08: ffff8801ad526240 R09: 0000000000000002
> R10: ffff8801ad526240 R11: 0000000000000000 R12: 1ffff1003b5e0cca
> R13: ffff88008ff1a100 R14: 0000000000000000 R15: ffff8801daf066d0
>  rcu_read_unlock include/linux/rcupdate.h:684 [inline]
>  ip_route_output_key_hash+0x2cd/0x390 net/ipv4/route.c:2303
>  __ip_route_output_key include/net/route.h:124 [inline]
>  ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2557
>  ip_route_output_key include/net/route.h:134 [inline]
>  sctp_v4_get_dst+0x50e/0x17a0 net/sctp/protocol.c:447
>  sctp_transport_route+0x132/0x360 net/sctp/transport.c:303
>  sctp_packet_config+0x926/0xdd0 net/sctp/output.c:118
>  sctp_outq_select_transport+0x2bb/0x9c0 net/sctp/outqueue.c:877
>  sctp_outq_flush_ctrl.constprop.12+0x2ad/0xe60 net/sctp/outqueue.c:911
>  sctp_outq_flush+0x2ef/0x3430 net/sctp/outqueue.c:1203
>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406

#syz fix: sctp: not allow to set rto_min with a value below 200 msecs

>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>  </IRQ>
> RIP: 0033:0x40b55d
> RSP: 002b:000000c424bedca8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
> RAX: 000000c4244e5470 RBX: 000000004d36768c RCX: 0000000000000000
> RDX: 000000c4244e5470 RSI: 000000000000ffff RDI: 0000000000000000
> RBP: 000000c424bedcc0 R08: 0000000000000000 R09: 0000000000000000
> R10: 00000000009466f2 R11: 0000000000000004 R12: 0000000000000000
> R13: 0000000000000020 R14: 0000000000000013 R15: 0000000000000034
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 125s!
> BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 125s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=3/256
>     in-flight: 2104:do_numa_crng_init
>     pending: drbg_async_seed, cache_reap
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=9/256
>     pending: defense_work_handler, defense_work_handler,
> defense_work_handler, defense_work_handler, defense_work_handler,
> defense_work_handler, check_corruption, vmstat_shepherd, cache_reap
> workqueue events_power_efficient: flags=0x80
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
>     pending: do_cache_clean
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
>     pending: check_lifetime, gc_worker
> workqueue writeback: flags=0x4e
>   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256
>     in-flight: 6:wb_workfn
> workqueue ipv6_addrconf: flags=0x40008
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
>     pending: addrconf_verify_work
> workqueue krdsd: flags=0xe000a
>   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/1
>     in-flight: 43:rds_connect_worker
> pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 24 1980 18
> pool 4: cpus=0-1 flags=0x4 nice=0 hung=0s workers=7 idle: 22 8018 287 6751
> 89
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
> syzbot.
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/000000000000e2e064056c5460e3%40google.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: INFO: rcu detected stall in corrupted
From: Dmitry Vyukov @ 2018-05-26 15:28 UTC (permalink / raw)
  To: Xin Long
  Cc: Marcelo Ricardo Leitner, Eric Dumazet, David Miller,
	syzbot+f116bc1994efe725d51b, kuznet, LKML, network dev,
	syzkaller-bugs, yoshfuji, dsahern, Roopa Prabhu, linux-sctp
In-Reply-To: <CADvbK_dMq0c9wBHoNbLgXj3ee-Ua1EQiqPgvYxX6pumCpO=ygw@mail.gmail.com>

On Thu, May 24, 2018 at 11:02 AM, Xin Long <lucien.xin@gmail.com> wrote:
> On Thu, May 24, 2018 at 7:13 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
>> On Mon, May 21, 2018 at 11:13:46AM -0700, Eric Dumazet wrote:
>>>
>>>
>>> On 05/21/2018 11:09 AM, David Miller wrote:
>>> > From: syzbot <syzbot+f116bc1994efe725d51b@syzkaller.appspotmail.com>
>>> > Date: Mon, 21 May 2018 11:05:02 -0700
>>> >
>>> >>  find_match+0x244/0x13a0 net/ipv6/route.c:691
>>> >>  find_rr_leaf net/ipv6/route.c:729 [inline]
>>> >>  rt6_select net/ipv6/route.c:779 [inline]
>>> >
>>> > Hmmm, endless loop in find_rr_leaf or similar?
>>> >
>>>
>>>
>>> I do not think so, this really looks SCTP-specific;
>>> we now have dozens of traces all sharing:
>>>
>>>  sctp_transport_route+0xad/0x450 net/sctp/transport.c:293
>>>  sctp_packet_config+0xb89/0xfd0 net/sctp/output.c:123
>>>  sctp_outq_flush+0x79c/0x4370 net/sctp/outqueue.c:894
>>>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>>>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>>>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>>>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>>>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
>>>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>>>
>>>
>>> Some kind of infinite loop.
>>>
>>> When the hrtimer fires, it can point to any code that sits below but does not necessarily have a bug.
>>
>> Agreed. Xin Long identified the root cause. syzkaller is setting too
>> aggressive parameters to SCTP RTO, leading to issues with the
>> heartbeat timer.
> Right, I will prepare a fix soon with your suggestion rto_min value "HZ/5"
> Thanks.

#syz fix: sctp: not allow to set rto_min with a value below 200 msecs
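
The fix named above refuses an rto_min below 200 msecs, which is what the suggested "HZ/5" jiffies corresponds to. The following is only a minimal user-space sketch of that check, not the actual kernel patch: the names check_rto_min, hz_fifth_in_msecs and RTO_HARD_MIN_MSECS are hypothetical, and treating 0 as "keep the current value" is an assumption made for this sketch.

```c
#include <errno.h>

/* Hypothetical floor taken from the fix title: 200 msecs (HZ/5 jiffies). */
#define RTO_HARD_MIN_MSECS 200U

/*
 * Validate a user-supplied rto_min given in milliseconds.
 * Returns 0 if the value is accepted and -EINVAL if it is below the floor.
 * Assumption for this sketch: 0 means "leave the current setting alone".
 */
int check_rto_min(unsigned int rto_min_msecs)
{
	if (rto_min_msecs && rto_min_msecs < RTO_HARD_MIN_MSECS)
		return -EINVAL;
	return 0;
}

/*
 * Convert HZ/5 jiffies back to milliseconds for a given tick rate, to show
 * that the floor is the same 200 msecs for common HZ values.
 */
unsigned int hz_fifth_in_msecs(unsigned int hz)
{
	return (hz / 5) * 1000U / hz;
}
```

With a floor like this, a fuzzer-chosen rto_min of 1 msec would be rejected rather than re-arming the heartbeat timer almost immediately, which is the behaviour the stall traces in this thread point at.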

^ permalink raw reply

* [PATCH] rsi: fix spelling mistake "Uknown" -> "Unknown"
From: Colin King @ 2018-05-26 15:00 UTC (permalink / raw)
  To: Kalle Valo, David S . Miller, linux-wireless, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in rsi_dbg message text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/rsi/rsi_91x_mac80211.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/rsi/rsi_91x_mac80211.c b/drivers/net/wireless/rsi/rsi_91x_mac80211.c
index bfa7569c85bb..2ca7464b7fa3 100644
--- a/drivers/net/wireless/rsi/rsi_91x_mac80211.c
+++ b/drivers/net/wireless/rsi/rsi_91x_mac80211.c
@@ -1103,7 +1103,7 @@ static int rsi_mac80211_ampdu_action(struct ieee80211_hw *hw,
 		break;
 
 	default:
-		rsi_dbg(ERR_ZONE, "%s: Uknown AMPDU action\n", __func__);
+		rsi_dbg(ERR_ZONE, "%s: Unknown AMPDU action\n", __func__);
 		break;
 	}
 
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v2] netfilter: properly initialize xt_table_info structure
From: Greg Kroah-Hartman @ 2018-05-26 14:54 UTC (permalink / raw)
  To: Florian Westphal, Peter Pi
  Cc: Jan Engelhardt, Eric Dumazet, Greg Hackmann, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Michal Kubecek, netfilter-devel, coreteam,
	netdev
In-Reply-To: <20180518092756.odlyvxcpgbuistqq@breakpoint.cc>

On Fri, May 18, 2018 at 11:27:56AM +0200, Florian Westphal wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > On Thu, May 17, 2018 at 12:42:00PM +0200, Jan Engelhardt wrote:
> > > 
> > > On Thursday 2018-05-17 12:09, Greg Kroah-Hartman wrote:
> > > >> > --- a/net/netfilter/x_tables.c
> > > >> > +++ b/net/netfilter/x_tables.c
> > > >> > @@ -1183,11 +1183,10 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
> > > >> >  	 * than shoot all processes down before realizing there is nothing
> > > >> >  	 * more to reclaim.
> > > >> >  	 */
> > > >> > -	info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> > > >> > +	info = kvzalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> > > >> >  	if (!info)
> > > >> >  		return NULL;
> > > >>
> > > >> I am curious, what particular path does not later overwrite the whole zone ?
> > > >
> > > >In do_ipt_get_ctl, the IPT_SO_GET_ENTRIES: option uses a len value that
> > > >can be larger than the size of the structure itself.
> > > >
> > > >Then the data is copied to userspace in copy_entries_to_user() for ipv4
> > > >and v6, and that's where the "bad data"
> > > 
> > > If the kernel incorrectly copies more bytes than it should, isn't that
> > > > a sign that it may be going past the end of the info buffer?
> > > (And thus, zeroing won't truly fix the issue)
> > 
> > No, the buffer size is correct, we just aren't filling up the whole
> > buffer as the data requested is smaller than the buffer size.
> 
> I have no objections to the patch but I'd like to understand what
> problem it's fixing.
> 
> Normal pattern is:
> newinfo = xt_alloc_table_info(tmp.size);
> copy_from_user(newinfo->entries, user + sizeof(tmp), tmp.size);
> 
> So the initial value of the rule blob area should not matter.
> 
> Furthermore, when copying the rule blob back to userspace,
> the kernel is not supposed to copy any padding back to userspace either,
> since commit f32815d21d4d8287336fb9cef4d2d9e0866214c2 only the
> user-relevant parts should be copied (some matches and targets allocate
> kernel-private data such as pointers, and we did use to leak such pointer
> values back to userspace).

Adding Peter to this thread, as he originally reported this issue to
Google back in February.

Peter, I know you reported this against the 4.4 kernel tree, but since
then, commit f32815d21d4d ("xtables: add xt_match, xt_target and data
copy_to_user functions") has been added to the kernel in release 4.11.
In digging through this crazy code path, I think the issue is still
there, but can not verify it for sure.

Is there any way you can run your tests on the 4.14 or newer kernel tree
to see if this issue really is fixed or not?

thanks,

greg k-h
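
The concern discussed above is a buffer that is sized correctly but only partly filled, and then copied out in full. A rough user-space analogy follows, with calloc() standing in for kvzalloc(); the helper names alloc_partially_filled and tail_is_zero are hypothetical and nothing here is taken from x_tables.c.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Allocate buf_size bytes and fill only the first data_len bytes from src.
 * calloc() plays the role of kvzalloc() here: the tail that is never
 * written reads back as zero instead of stale heap contents.
 * Returns NULL if data_len exceeds buf_size or the allocation fails.
 */
unsigned char *alloc_partially_filled(const unsigned char *src,
				      size_t data_len, size_t buf_size)
{
	unsigned char *buf;

	if (data_len > buf_size)
		return NULL;
	buf = calloc(1, buf_size);
	if (!buf)
		return NULL;
	memcpy(buf, src, data_len);
	return buf;
}

/* Return 1 if every byte in [from, to) is zero, 0 otherwise. */
int tail_is_zero(const unsigned char *buf, size_t from, size_t to)
{
	size_t i;

	for (i = from; i < to; i++)
		if (buf[i])
			return 0;
	return 1;
}
```

Had the allocation used malloc() instead, the bytes past data_len would be indeterminate, and copying the whole buffer to another party would expose whatever was left there.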

^ permalink raw reply

* Re: [PATCH net-next 6/7] net: bridge: Notify about bridge VLANs
From: Vivien Didelot @ 2018-05-26 14:54 UTC (permalink / raw)
  To: Petr Machata
  Cc: netdev, devel, bridge, jiri, idosch, davem, razvan.stefanescu,
	gregkh, stephen, andrew, f.fainelli, nikolay
In-Reply-To: <wihwovrfz62.fsf@dev-r-vrt-156.mtr.labs.mlnx>

Hi Petr,

Petr Machata <petrm@mellanox.com> writes:

> Vivien Didelot <vivien.didelot@savoirfairelinux.com> writes:
>
>>> +	} else {
>>> +		err = br_switchdev_port_obj_add(dev, v->vid, flags);
>>> +		if (err && err != -EOPNOTSUPP)
>>> +			goto out;
>>>  	}
>>
>> Except that br_switchdev_port_obj_add taking vid and flags arguments
>> seems confusing to me, the change looks good:
>
> I'm not sure what you're aiming at. Both VID and flags are sent with the
> notification, so they need to be passed on to the function somehow. Do
> you have a counterproposal for the API?

I'm only questioning the code organization here, not the functional
aspect which I do agree with. What I'm saying is that you name a new
switchdev helper br_switchdev_port_OBJ_add, which takes VLAN arguments
(vid and flags). What would you name a later helper taking MDB
arguments, br_switchdev_port_OBJ_add again? Something like
br_switchdev_port_VLAN_add would be more intuitive.

At the same time there's an effort to centralize all switchdev helpers
of the bridge layer (i.e. the software -> hardware bridge calls) into
net/bridge/br_switchdev.c, so that file would be more adequate.

You may discard my comments but I think it'd be beneficial to us all to
finally keep a bit of consistency in that bridge layer code.


Thanks,

        Vivien

^ permalink raw reply

* RE: [PATCH, net-next] net/mlx5e: fix TLS dependency
From: Boris Pismenny @ 2018-05-26 14:17 UTC (permalink / raw)
  To: Saeed Mahameed, davem@davemloft.net, arnd@arndb.de,
	leon@kernel.org
  Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	Or Gerlitz, Feras Daoud, netdev@vger.kernel.org
In-Reply-To: <ffcbcc97b72bd9957a7eafb2553edd1cd7e68dab.camel@mellanox.com>

Acked-by: Boris Pismenny <borisp@mellanox.com>

Thank you.

> -----Original Message-----
> From: Saeed Mahameed
> Sent: Saturday, May 26, 2018 2:19 AM
> To: davem@davemloft.net; arnd@arndb.de; leon@kernel.org
> Cc: linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org; Boris
> Pismenny <borisp@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>;
> Feras Daoud <ferasda@mellanox.com>; Ilan Tayari <ilant@mellanox.com>;
> netdev@vger.kernel.org; Ilya Lesokhin <ilyal@mellanox.com>
> Subject: Re: [PATCH, net-next] net/mlx5e: fix TLS dependency
> 
> On Fri, 2018-05-25 at 23:36 +0200, Arnd Bergmann wrote:
> > With CONFIG_TLS=m and MLX5_CORE_EN=y, we get a link failure:
> >
> > drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.o: In
> > function `mlx5e_tls_handle_ooo':
> > tls_rxtx.c:(.text+0x24c): undefined reference to `tls_get_record'
> > drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.o: In
> > function `mlx5e_tls_handle_tx_skb':
> > tls_rxtx.c:(.text+0x9a8): undefined reference to
> > `tls_device_sk_destruct'
> >
> > This narrows down the dependency to only allow the configurations that
> > will actually work. The existing dependency on TLS_DEVICE is not
> > sufficient here since MLX5_EN_TLS is a 'bool' symbol.
> >
> > Fixes: c83294b9efa5 ("net/mlx5e: TLS, Add Innova TLS TX support")
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > ---
> 
> LGTM
> 
> Acked-by: Saeed Mahameed <saeedm@mellanox.com>
> 
> Thank you Arnd!
> 
> 
> >  drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> > b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> > index ee6684779d11..2545296a0c08 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> > @@ -91,6 +91,7 @@ config MLX5_EN_TLS
> >  	bool "TLS cryptography-offload accelaration"
> >  	depends on MLX5_CORE_EN
> >  	depends on TLS_DEVICE
> > +	depends on TLS=y || MLX5_CORE=m
> >  	depends on MLX5_ACCEL
> >  	default n
> >  	---help---
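
The added dependency line can be read with Kconfig's tristate rules: n < m < y, "A=y" evaluates to y or n, and "||" takes the larger value. The sketch below models only that one line, with hypothetical names (tri_eq, tri_or, mlx5_en_tls_dep_ok); the other "depends on" lines of MLX5_EN_TLS are deliberately ignored.

```c
/* Kconfig tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "A=y" style comparison: evaluates to y when equal, n otherwise. */
static enum tristate tri_eq(enum tristate a, enum tristate b)
{
	return a == b ? TRI_Y : TRI_N;
}

/* "A || B" evaluates to the larger of the two values. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/*
 * Nonzero when "depends on TLS=y || MLX5_CORE=m" allows the bool symbol
 * to be enabled, i.e. when the expression is not n.
 */
int mlx5_en_tls_dep_ok(enum tristate tls, enum tristate mlx5_core)
{
	return tri_or(tri_eq(tls, TRI_Y), tri_eq(mlx5_core, TRI_M)) != TRI_N;
}
```

The failing configuration from the report, TLS=m with a built-in mlx5 core, evaluates to n here, so the option can no longer be selected in that combination.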

^ permalink raw reply

* [PATCH bpf-next v2] selftests/bpf: missing headers test_lwt_seg6local
From: Mathieu Xhonneux @ 2018-05-26 14:44 UTC (permalink / raw)
  To: netdev; +Cc: alexei.starovoitov, daniel, ys114321

Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
some UAPI headers in tools/.

clang -I. -I./include/uapi -I../../../include/uapi -idirafter
/usr/local/include -idirafter
/data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types \
         -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - |      \
llc -march=bpf -mcpu=generic  -filetype=obj -o
[...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
         ^~~~~~~~~~~~~~~~~~~~
1 error generated.
make: Leaving directory
`/data/users/yhs/work/net-next/tools/testing/selftests/bpf'

v2: moving the headers to tools/include/uapi/.

Reported-by: Y Song <ys114321@gmail.com>
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
---
 tools/include/uapi/linux/seg6.h       | 55 ++++++++++++++++++++++++
 tools/include/uapi/linux/seg6_local.h | 80 +++++++++++++++++++++++++++++++++++
 2 files changed, 135 insertions(+)
 create mode 100644 tools/include/uapi/linux/seg6.h
 create mode 100644 tools/include/uapi/linux/seg6_local.h

diff --git a/tools/include/uapi/linux/seg6.h b/tools/include/uapi/linux/seg6.h
new file mode 100644
index 000000000000..286e8d6a8e98
--- /dev/null
+++ b/tools/include/uapi/linux/seg6.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ *  SR-IPv6 implementation
+ *
+ *  Author:
+ *  David Lebrun <david.lebrun@uclouvain.be>
+ *
+ *
+ *  This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_LINUX_SEG6_H
+#define _UAPI_LINUX_SEG6_H
+
+#include <linux/types.h>
+#include <linux/in6.h>		/* For struct in6_addr. */
+
+/*
+ * SRH
+ */
+struct ipv6_sr_hdr {
+	__u8	nexthdr;
+	__u8	hdrlen;
+	__u8	type;
+	__u8	segments_left;
+	__u8	first_segment; /* Represents the last_entry field of SRH */
+	__u8	flags;
+	__u16	tag;
+
+	struct in6_addr segments[0];
+};
+
+#define SR6_FLAG1_PROTECTED	(1 << 6)
+#define SR6_FLAG1_OAM		(1 << 5)
+#define SR6_FLAG1_ALERT		(1 << 4)
+#define SR6_FLAG1_HMAC		(1 << 3)
+
+#define SR6_TLV_INGRESS		1
+#define SR6_TLV_EGRESS		2
+#define SR6_TLV_OPAQUE		3
+#define SR6_TLV_PADDING		4
+#define SR6_TLV_HMAC		5
+
+#define sr_has_hmac(srh) ((srh)->flags & SR6_FLAG1_HMAC)
+
+struct sr6_tlv {
+	__u8 type;
+	__u8 len;
+	__u8 data[0];
+};
+
+#endif
diff --git a/tools/include/uapi/linux/seg6_local.h b/tools/include/uapi/linux/seg6_local.h
new file mode 100644
index 000000000000..edc138bdc56d
--- /dev/null
+++ b/tools/include/uapi/linux/seg6_local.h
@@ -0,0 +1,80 @@
+/*
+ *  SR-IPv6 implementation
+ *
+ *  Author:
+ *  David Lebrun <david.lebrun@uclouvain.be>
+ *
+ *
+ *  This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_LINUX_SEG6_LOCAL_H
+#define _UAPI_LINUX_SEG6_LOCAL_H
+
+#include <linux/seg6.h>
+
+enum {
+	SEG6_LOCAL_UNSPEC,
+	SEG6_LOCAL_ACTION,
+	SEG6_LOCAL_SRH,
+	SEG6_LOCAL_TABLE,
+	SEG6_LOCAL_NH4,
+	SEG6_LOCAL_NH6,
+	SEG6_LOCAL_IIF,
+	SEG6_LOCAL_OIF,
+	SEG6_LOCAL_BPF,
+	__SEG6_LOCAL_MAX,
+};
+#define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
+
+enum {
+	SEG6_LOCAL_ACTION_UNSPEC	= 0,
+	/* node segment */
+	SEG6_LOCAL_ACTION_END		= 1,
+	/* adjacency segment (IPv6 cross-connect) */
+	SEG6_LOCAL_ACTION_END_X		= 2,
+	/* lookup of next seg NH in table */
+	SEG6_LOCAL_ACTION_END_T		= 3,
+	/* decap and L2 cross-connect */
+	SEG6_LOCAL_ACTION_END_DX2	= 4,
+	/* decap and IPv6 cross-connect */
+	SEG6_LOCAL_ACTION_END_DX6	= 5,
+	/* decap and IPv4 cross-connect */
+	SEG6_LOCAL_ACTION_END_DX4	= 6,
+	/* decap and lookup of DA in v6 table */
+	SEG6_LOCAL_ACTION_END_DT6	= 7,
+	/* decap and lookup of DA in v4 table */
+	SEG6_LOCAL_ACTION_END_DT4	= 8,
+	/* binding segment with insertion */
+	SEG6_LOCAL_ACTION_END_B6	= 9,
+	/* binding segment with encapsulation */
+	SEG6_LOCAL_ACTION_END_B6_ENCAP	= 10,
+	/* binding segment with MPLS encap */
+	SEG6_LOCAL_ACTION_END_BM	= 11,
+	/* lookup last seg in table */
+	SEG6_LOCAL_ACTION_END_S		= 12,
+	/* forward to SR-unaware VNF with static proxy */
+	SEG6_LOCAL_ACTION_END_AS	= 13,
+	/* forward to SR-unaware VNF with masquerading */
+	SEG6_LOCAL_ACTION_END_AM	= 14,
+	/* custom BPF action */
+	SEG6_LOCAL_ACTION_END_BPF	= 15,
+
+	__SEG6_LOCAL_ACTION_MAX,
+};
+
+#define SEG6_LOCAL_ACTION_MAX (__SEG6_LOCAL_ACTION_MAX - 1)
+
+enum {
+	SEG6_LOCAL_BPF_PROG_UNSPEC,
+	SEG6_LOCAL_BPF_PROG,
+	SEG6_LOCAL_BPF_PROG_NAME,
+	__SEG6_LOCAL_BPF_PROG_MAX,
+};
+
+#define SEG6_LOCAL_BPF_PROG_MAX (__SEG6_LOCAL_BPF_PROG_MAX - 1)
+
+#endif
-- 
2.16.1

^ permalink raw reply related

* Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local
From: Mathieu Xhonneux @ 2018-05-26 12:26 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, netdev, Y Song
In-Reply-To: <a80ed942-c1ef-a630-fffa-5e16f20d8356@iogearbox.net>

2018-05-25 18:39 GMT+02:00 Daniel Borkmann <daniel@iogearbox.net>:
> Yes, should definitely go there to tools include infrastructure.

What is the point of tools/testing/selftests/bpf/include/uapi/ then?
Is it there because of incompatibility issues that prevent linux/types.h
from being included in non-bpf testing executables? My initial
understanding was that all headers related only to bpf should go into
this directory. Sending a v2.

^ permalink raw reply

* Re: [PATCH v4 2/3] media: rc: introduce BPF_PROG_LIRC_MODE2
From: Sean Young @ 2018-05-26 12:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller, Y Song, Quentin Monnet
In-Reply-To: <20180525204509.7jsnnk2qzws3bmyd@ast-mbp>

On Fri, May 25, 2018 at 01:45:11PM -0700, Alexei Starovoitov wrote:
> On Fri, May 18, 2018 at 03:07:29PM +0100, Sean Young wrote:
> > Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call
> > rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> > that the last key should be repeated.
> > 
> > The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
> > the target_fd must be the /dev/lircN device.
> > 
> > Signed-off-by: Sean Young <sean@mess.org>
> ...
> >  enum bpf_attach_type {
> > @@ -158,6 +159,7 @@ enum bpf_attach_type {
> >  	BPF_CGROUP_INET6_CONNECT,
> >  	BPF_CGROUP_INET4_POST_BIND,
> >  	BPF_CGROUP_INET6_POST_BIND,
> > +	BPF_LIRC_MODE2,
> >  	__MAX_BPF_ATTACH_TYPE
> >  };
> >  
> > @@ -1902,6 +1904,53 @@ union bpf_attr {
> >   *		egress otherwise). This is the only flag supported for now.
> >   *	Return
> >   *		**SK_PASS** on success, or **SK_DROP** on error.
> > + *
> > + * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
> > + *	Description
> > + *		This helper is used in programs implementing IR decoding, to
> > + *		report a successfully decoded key press with *scancode*,
> > + *		*toggle* value in the given *protocol*. The scancode will be
> > + *		translated to a keycode using the rc keymap, and reported as
> > + *		an input key down event. After a period a key up event is
> > + *		generated. This period can be extended by calling either
> > + *		**bpf_rc_keydown** () with the same values, or calling
> > + *		**bpf_rc_repeat** ().
> > + *
> > + *		Some protocols include a toggle bit, in case the button
> > + *		was released and pressed again between consecutive scancodes
> > + *
> > + *		The *ctx* should point to the lirc sample as passed into
> > + *		the program.
> > + *
> > + *		The *protocol* is the decoded protocol number (see
> > + *		**enum rc_proto** for some predefined values).
> > + *
> > + *		This helper is only available is the kernel was compiled with
> > + *		the **CONFIG_BPF_LIRC_MODE2** configuration option set to
> > + *		"**y**".
> > + *
> > + *	Return
> > + *		0
> > + *
> > + * int bpf_rc_repeat(void *ctx)
> > + *	Description
> > + *		This helper is used in programs implementing IR decoding, to
> > + *		report a successfully decoded repeat key message. This delays
> > + *		the generation of a key up event for previously generated
> > + *		key down event.
> > + *
> > + *		Some IR protocols like NEC have a special IR message for
> > + *		repeating last button, for when a button is held down.
> > + *
> > + *		The *ctx* should point to the lirc sample as passed into
> > + *		the program.
> > + *
> > + *		This helper is only available is the kernel was compiled with
> > + *		the **CONFIG_BPF_LIRC_MODE2** configuration option set to
> > + *		"**y**".
> 
> Hi Sean,
> 
> thank you for working on this. The patch set looks good to me.
> I'd only ask to change above two helper names to something more specific.
> Since BPF_PROG_TYPE_LIRC_MODE2 is the name of new prog type and kconfig.
> May be bpf_lirc2_keydown() and bpf_lirc2_repeat() ?

A little history might help here.

lirc and rc-core have non-obvious meanings. So, lirc was the original project
that dealt with IR. That project was rejected from mainline because it did
not send translated keycodes to input devices (it exposed its own interface
for keypresses).

Then rc-core was written which maps IR scancodes to keycodes (using rc
keymaps) and sends them to the input layer. The original lirc userspace ABI
for receiving and sending raw IR pulses and spaces was retained (mode2 as
it was called in lirc).

Reusing parts of the lirc ABI for BPF decoding of raw IR makes sense. However,
dispatching decoded scancodes was never part of lirc, only rc-core. In fact,
rc-core is reused in hdmi-cec for cec commands, which does not use lirc
at all. So for example, if we wanted to process cec messages in bpf, it would
want to call rc_keydown().

I don't think this lirc/rc-core duality is particularly great, but I'm
not sure what the right answer to that is.
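
For readers unfamiliar with the key down / key up behaviour described in
the helper documentation quoted above, here is a rough userspace sketch.
It is only a toy model, not the rc-core implementation: the toy_* names,
the struct layout and the 200 ms timeout are all made up for illustration.
It shows a key down arming a key up deadline, and a repeat pushing that
deadline back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timeout; the real value depends on the protocol and driver. */
#define TOY_KEYUP_TIMEOUT_MS 200u

/* Toy stand-in for the per-device key state (not the kernel's struct rc_dev). */
struct toy_rc_dev {
	bool key_pressed;
	uint64_t last_scancode;
	uint64_t keyup_deadline_ms;
};

/* Model of "key down": report the key and arm the key up deadline. */
static void toy_keydown(struct toy_rc_dev *dev, uint64_t scancode,
			uint64_t now_ms)
{
	assert(dev != NULL);
	dev->key_pressed = true;
	dev->last_scancode = scancode;
	dev->keyup_deadline_ms = now_ms + TOY_KEYUP_TIMEOUT_MS;
}

/* Model of "repeat": only delays key up for a key that is still down. */
static void toy_repeat(struct toy_rc_dev *dev, uint64_t now_ms)
{
	assert(dev != NULL);
	if (dev->key_pressed)
		dev->keyup_deadline_ms = now_ms + TOY_KEYUP_TIMEOUT_MS;
}

/* Poll the model; returns true when a key up event fires at now_ms. */
static bool toy_tick(struct toy_rc_dev *dev, uint64_t now_ms)
{
	assert(dev != NULL);
	if (dev->key_pressed && now_ms >= dev->keyup_deadline_ms) {
		dev->key_pressed = false;
		return true;
	}
	return false;
}
```

In this model, a key down at t=0 followed by a repeat at t=150 means the
key up fires at t=350 rather than t=200. The toggle bit is deliberately
left out to keep the sketch short.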

> > @@ -1576,6 +1577,8 @@ static int bpf_prog_attach(const union bpf_attr *attr)
> >  	case BPF_SK_SKB_STREAM_PARSER:
> >  	case BPF_SK_SKB_STREAM_VERDICT:
> >  		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true);
> > +	case BPF_LIRC_MODE2:
> > +		return rc_dev_prog_attach(attr);
> ...
> > +	case BPF_LIRC_MODE2:
> > +		return rc_dev_prog_detach(attr);
> 
> and similar rename for internal function names that go into bpf core.

I agree with this.

> Please add accumulated acks when you respin.

Good point, will do.

Thanks,

Sean

^ permalink raw reply

* [PATCH net-next] net: remove unnecessary genlmsg_cancel() calls
From: YueHaibing @ 2018-05-26 11:15 UTC (permalink / raw)
  To: davem, jiri
  Cc: netdev, linux-kernel, johannes, kvalo, kuznet, yoshfuji, sameo,
	linux-wireless, YueHaibing

The message is freed immediately afterwards, so there is no need to
trim it back to its previous size first.

Inspired by commit 7a9b3ec1e19f ("nl80211: remove unnecessary genlmsg_cancel() calls")
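
As a rough illustration of the reasoning (a userspace sketch only, not
kernel code; struct toy_msg and the toy_* helpers are hypothetical
stand-ins for the skb and the netlink helpers), trimming a buffer and
then freeing it leaves the caller with exactly the same result as
freeing it directly:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a message buffer: heap storage plus bytes used. */
struct toy_msg {
	unsigned char *data;
	size_t len;	/* bytes currently used */
	size_t cap;	/* bytes allocated */
};

static struct toy_msg *toy_msg_new(size_t cap)
{
	struct toy_msg *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->data = malloc(cap);
	if (!m->data) {
		free(m);
		return NULL;
	}
	m->len = 0;
	m->cap = cap;
	return m;
}

/* Append n bytes; returns -1 when there is no room (like a failed put). */
static int toy_msg_put(struct toy_msg *m, const void *src, size_t n)
{
	if (n > m->cap - m->len)
		return -1;
	memcpy(m->data + m->len, src, n);
	m->len += n;
	return 0;
}

/* Models the "cancel" step: trim the message back to a remembered length. */
static void toy_msg_cancel(struct toy_msg *m, size_t mark)
{
	assert(mark <= m->len);
	m->len = mark;
}

static void toy_msg_free(struct toy_msg *m)
{
	free(m->data);
	free(m);
}

/* Build a message; on failure optionally trim before freeing. */
static int toy_build(size_t cap, int use_cancel)
{
	static const char payload[] = "0123456789";
	struct toy_msg *m = toy_msg_new(cap);
	size_t mark;

	if (!m)
		return -2;
	mark = m->len;
	if (toy_msg_put(m, payload, sizeof(payload)) < 0) {
		if (use_cancel)
			toy_msg_cancel(m, mark); /* result is discarded below */
		toy_msg_free(m);
		return -1;
	}
	toy_msg_free(m);
	return 0;
}
```

Calling toy_build() with a too-small capacity returns the same error
whether use_cancel is set or not, since the trimmed length is never
looked at again once the buffer is gone.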

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/team/team.c               |  2 --
 drivers/net/wireless/mac80211_hwsim.c |  1 -
 net/core/devlink.c                    |  4 ----
 net/ipv6/seg6.c                       |  1 -
 net/ncsi/ncsi-netlink.c               |  1 -
 net/nfc/netlink.c                     | 17 -----------------
 6 files changed, 26 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index e6730a0..267dcc9 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2426,7 +2426,6 @@ static int team_nl_send_options_get(struct team *team, u32 portid, u32 seq,
 nla_put_failure:
 	err = -EMSGSIZE;
 errout:
-	genlmsg_cancel(skb, hdr);
 	nlmsg_free(skb);
 	return err;
 }
@@ -2720,7 +2719,6 @@ static int team_nl_send_port_list_get(struct team *team, u32 portid, u32 seq,
 nla_put_failure:
 	err = -EMSGSIZE;
 errout:
-	genlmsg_cancel(skb, hdr);
 	nlmsg_free(skb);
 	return err;
 }
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index c26469b..38e1135 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -2514,7 +2514,6 @@ static void hwsim_mcast_new_radio(int id, struct genl_info *info,
 	return;
 
 out_err:
-	genlmsg_cancel(mcast_skb, data);
 	nlmsg_free(mcast_skb);
 }
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 475246b..f75ee02 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1826,7 +1826,6 @@ static int devlink_dpipe_tables_fill(struct genl_info *info,
 nla_put_failure:
 	err = -EMSGSIZE;
 err_table_put:
-	genlmsg_cancel(skb, hdr);
 	nlmsg_free(skb);
 	return err;
 }
@@ -2032,7 +2031,6 @@ int devlink_dpipe_entry_ctx_prepare(struct devlink_dpipe_dump_ctx *dump_ctx)
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(dump_ctx->skb, dump_ctx->hdr);
 	nlmsg_free(dump_ctx->skb);
 	return -EMSGSIZE;
 }
@@ -2249,7 +2247,6 @@ static int devlink_dpipe_headers_fill(struct genl_info *info,
 nla_put_failure:
 	err = -EMSGSIZE;
 err_table_put:
-	genlmsg_cancel(skb, hdr);
 	nlmsg_free(skb);
 	return err;
 }
@@ -2551,7 +2548,6 @@ static int devlink_resource_fill(struct genl_info *info,
 	err = -EMSGSIZE;
 err_resource_put:
 err_skb_send_alloc:
-	genlmsg_cancel(skb, hdr);
 	nlmsg_free(skb);
 	return err;
 }
diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c
index 7f5621d..0fdf2a5 100644
--- a/net/ipv6/seg6.c
+++ b/net/ipv6/seg6.c
@@ -226,7 +226,6 @@ static int seg6_genl_get_tunsrc(struct sk_buff *skb, struct genl_info *info)
 
 nla_put_failure:
 	rcu_read_unlock();
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -ENOMEM;
diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index b09ef77..99f4c22 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -201,7 +201,6 @@ static int ncsi_pkg_info_nl(struct sk_buff *msg, struct genl_info *info)
 	return genlmsg_reply(skb, info);
 
 err:
-	genlmsg_cancel(skb, hdr);
 	kfree_skb(skb);
 	return rc;
 }
diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c
index f018eaf..376181c 100644
--- a/net/nfc/netlink.c
+++ b/net/nfc/netlink.c
@@ -206,7 +206,6 @@ int nfc_genl_targets_found(struct nfc_dev *dev)
 	return genlmsg_multicast(&nfc_genl_family, msg, 0, 0, GFP_ATOMIC);
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -237,7 +236,6 @@ int nfc_genl_target_lost(struct nfc_dev *dev, u32 target_idx)
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -269,7 +267,6 @@ int nfc_genl_tm_activated(struct nfc_dev *dev, u32 protocol)
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -299,7 +296,6 @@ int nfc_genl_tm_deactivated(struct nfc_dev *dev)
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -340,7 +336,6 @@ int nfc_genl_device_added(struct nfc_dev *dev)
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -370,7 +365,6 @@ int nfc_genl_device_removed(struct nfc_dev *dev)
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -434,8 +428,6 @@ int nfc_genl_llc_send_sdres(struct nfc_dev *dev, struct hlist_head *sdres_list)
 	return genlmsg_multicast(&nfc_genl_family, msg, 0, 0, GFP_ATOMIC);
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
-
 free_msg:
 	nlmsg_free(msg);
 
@@ -470,7 +462,6 @@ int nfc_genl_se_added(struct nfc_dev *dev, u32 se_idx, u16 type)
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -501,7 +492,6 @@ int nfc_genl_se_removed(struct nfc_dev *dev, u32 se_idx)
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -546,7 +536,6 @@ int nfc_genl_se_transaction(struct nfc_dev *dev, u8 se_idx,
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	/* evt_transaction is no more used */
 	devm_kfree(&dev->dev, evt_transaction);
@@ -585,7 +574,6 @@ int nfc_genl_se_connectivity(struct nfc_dev *dev, u8 se_idx)
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -703,7 +691,6 @@ int nfc_genl_dep_link_up_event(struct nfc_dev *dev, u32 target_idx,
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -735,7 +722,6 @@ int nfc_genl_dep_link_down_event(struct nfc_dev *dev)
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -1030,7 +1016,6 @@ static int nfc_genl_send_params(struct sk_buff *msg,
 	return 0;
 
 nla_put_failure:
-
 	genlmsg_cancel(msg, hdr);
 	return -EMSGSIZE;
 }
@@ -1290,7 +1275,6 @@ int nfc_genl_fw_download_done(struct nfc_dev *dev, const char *firmware_name,
 	return 0;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	return -EMSGSIZE;
@@ -1507,7 +1491,6 @@ static void se_io_cb(void *context, u8 *apdu, size_t apdu_len, int err)
 	return;
 
 nla_put_failure:
-	genlmsg_cancel(msg, hdr);
 free_msg:
 	nlmsg_free(msg);
 	kfree(ctx);
-- 
2.7.0

^ permalink raw reply related

* Re: System hung for reg_check_changs_work()-> rtnl_lock()->mutex_lock()
From: Dmitry Vyukov @ 2018-05-26 10:19 UTC (permalink / raw)
  To: Shawn Lin; +Cc: netdev, David S. Miller, syzkaller-bugs
In-Reply-To: <aa65282a-b7f9-757f-8690-64c27df44e90@rock-chips.com>

On Mon, May 21, 2018 at 5:47 AM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
> Hi,
>
> I found this hung for several times these days, and seems syzbot already
> reported a similar problem. Is there any patch(es) for that?
>
> Successfully initialized wpa_supplicant
> [  240.091941] INFO: task kworker/u8:1:39 blocked for more than 120 seconds.
> [  240.092004]       Not tainted 4.4.126 #1
> [  240.092026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  240.092047] kworker/u8:1    D ffffff8008084dfc     0    39      2
> 0x00000000
> [  240.092116] Workqueue: events_power_efficient reg_check_chans_work
> [  240.092153] Call trace:
> [  240.092191] [<ffffff8008084dfc>] __switch_to+0x84/0xa0
> [  240.092228] [<ffffff80086299f0>] __schedule+0x428/0x45c
> [  240.092260] [<ffffff8008629a98>] schedule+0x74/0x94
> [  240.092295] [<ffffff8008629e44>] schedule_preempt_disabled+0x20/0x38
> [  240.092332] [<ffffff800862b13c>] __mutex_lock_slowpath+0xc0/0x138
> [  240.092364] [<ffffff800862b1e0>] mutex_lock+0x2c/0x40
> [  240.092399] [<ffffff80084eb820>] rtnl_lock+0x14/0x1c
> [  240.092428] [<ffffff80085ce2a0>] reg_check_chans_work+0x2c/0x1f0
> [  240.092463] [<ffffff80080abbac>] process_one_work+0x1b0/0x294
> [  240.092494] [<ffffff80080ac904>] worker_thread+0x2d8/0x398
> [  240.092524] [<ffffff80080b0f04>] kthread+0xc8/0xd8
> [  240.092567] [<ffffff8008082e80>] ret_from_fork+0x10/0x50
> [  240.092594] Kernel panic - not syncing: hung_task: blocked tasks
> [  240.101163] CPU: 0 PID: 30 Comm: khungtaskd Not tainted 4.4.126 #1
> [  240.101729] Hardware name: Rockchip RK3308 evb analog mic board (DT)
> [  240.102302] Call trace:
> [  240.102546] [<ffffff800808731c>] dump_backtrace+0x0/0x1c4
> [  240.103044] [<ffffff80080874f4>] show_stack+0x14/0x1c
> [  240.103521] [<ffffff800823f4b4>] dump_stack+0x94/0xbc
> [  240.104000] [<ffffff800810a80c>] panic+0xd8/0x228
> [  240.104446] [<ffffff80080fb228>] proc_dohung_task_timeout_secs+0x0/0x40
> [  240.105050] [<ffffff80080b0f04>] kthread+0xc8/0xd8
> [  240.105500] [<ffffff8008082e80>] ret_from_fork+0x10/0x50
> [  240.106065] CPU1: stopping
> [  240.106348] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.126 #1


Syzbot has reported a whole bunch of hangs on the rtnl lock, but there is no
resolution:
https://syzkaller.appspot.com/bug?id=2503c576cabb08d41812e732b390141f01a59545

I suspect this can be related to hangs in unregister_netdevice:
https://syzkaller.appspot.com/bug?id=1a97a5bd119fd97995f752819fd87840ab9479a9
They happen all the time too, there is no resolution for this either.

Also see this thread:
https://groups.google.com/d/msg/syzkaller/-06_laheMF0/xqezy58kAwAJ

^ permalink raw reply

