Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] brcmfmac: Ensure pointer correctly set if skb data location changes
From: Arend van Spriel @ 2017-04-24 11:10 UTC (permalink / raw)
  To: James Hughes, Franky Lin, Hante Meuleman, Kalle Valo,
	linux-wireless, brcm80211-dev-list.pdl, netdev
In-Reply-To: <20170424105219.18797-1-james.hughes@raspberrypi.org>

On 4/24/2017 12:52 PM, James Hughes wrote:
> The incoming skb header may be resized if header space is
> insufficient, which might change the data adddress in the skb.
> Ensure that a cached pointer to that data is correctly set by
> moving assignment to after any possible changes.

Thanks, James

Minor nit below...

You may add my acknowledgement:

Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
> Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
> ---
>   drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c

[...]

> @@ -229,6 +229,8 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
>   		}
>   	}
>   
> +	eh = (struct ethhdr *)(skb->data);
> +

Please move after the length validation below.

Regards,
Arend

>   	/* validate length for ether packet */
>   	if (skb->len < sizeof(*eh)) {
>   		ret = -EINVAL;
> 

^ permalink raw reply

* Re: [PATCH net] bridge: shutdown bridge device before removing it
From: Nikolay Aleksandrov @ 2017-04-24 11:29 UTC (permalink / raw)
  To: Xin Long, network dev
  Cc: David S. Miller, Stephen Hemminger,
	bridge@lists.linux-foundation.org
In-Reply-To: <3d77b47d-c8c3-de57-946e-59096a88964e@cumulusnetworks.com>

On 24/04/17 14:01, Nikolay Aleksandrov wrote:
> On 24/04/17 10:25, Xin Long wrote:
>> During removing a bridge device, if the bridge is still up, a new mdb entry
>> still can be added in br_multicast_add_group() after all mdb entries are
>> removed in br_multicast_dev_del(). Like the path:
>>
>>   mld_ifc_timer_expire ->
>>     mld_sendpack -> ...
>>       br_multicast_rcv ->
>>         br_multicast_add_group
>>
>> The new mp's timer will be set up. If the timer expires after the bridge
>> is freed, it may cause use-after-free panic in br_multicast_group_expired.
>> This can happen when ip link remove a bridge or destroy a netns with a
>> bridge device inside.
>>
>> As we can see in br_del_bridge, brctl is also supposed to remove a bridge
>> device after it's shutdown.
>>
>> This patch is to call dev_close at the beginning of br_dev_delete so that
>> netif_running check in br_multicast_add_group can avoid this issue. But
>> to keep consistent with before, it will not remove the IFF_UP check in
>> br_del_bridge for brctl.
>>
>> Reported-by: Jianwen Ji <jiji@redhat.com>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> ---
>>  net/bridge/br_if.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
> 
> +CC bridge maintainers
> 
> I can see how this could happen, could you also provide the traceback ?
> 
> The patch looks good to me, actually I think it fixes another issue with
> mcast stats where the percpu pointer can be accessed after it's freed if
> an mcast packet can get sent via br->dev after the br_multicast_dev_del() call.

Never mind the another issue part, Ido's recent ndo_uninit() patch fixed it.

> This is definitely stable material, if I'm not mistaken the issue is there since
> the introduction of br_dev_delete:
> commit e10177abf842
> Author: Satish Ashok <sashok@cumulusnetworks.com>
> Date:   Wed Jul 15 07:16:51 2015 -0700
> 
>     bridge: multicast: fix handling of temp and perm entries
> 
> 
> 
> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
> 
> 
> 
> 
> 
> 

^ permalink raw reply

* [PATCH v2] brcmfmac: Ensure pointer correctly set if skb data location changes
From: James Hughes @ 2017-04-24 11:40 UTC (permalink / raw)
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Kalle Valo,
	linux-wireless, brcm80211-dev-list.pdl, netdev
  Cc: James Hughes

The incoming skb header may be resized if header space is
insufficient, which might change the data adddress in the skb.
Ensure that a cached pointer to that data is correctly set by
moving assignment to after any possible changes.

Signed-off-by: James Hughes <james.hughes@raspberrypi.org>

Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
---
Changes in v2

Moved the assignment below the len check - saves a few cycles 
if the length check fails. 


 drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 5eaac13e2317..9b7c19a508ac 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -198,7 +198,7 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
 	int ret;
 	struct brcmf_if *ifp = netdev_priv(ndev);
 	struct brcmf_pub *drvr = ifp->drvr;
-	struct ethhdr *eh = (struct ethhdr *)(skb->data);
+	struct ethhdr *eh;
 
 	brcmf_dbg(DATA, "Enter, bsscfgidx=%d\n", ifp->bsscfgidx);
 
@@ -236,6 +236,8 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
 		goto done;
 	}
 
+	eh = (struct ethhdr *)(skb->data);
+
 	if (eh->h_proto == htons(ETH_P_PAE))
 		atomic_inc(&ifp->pend_8021x_cnt);
 
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH RFC] ptr_ring: add ptr_ring_unconsume
From: Jason Wang @ 2017-04-24 11:54 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, netdev
In-Reply-To: <20170424022711-mutt-send-email-mst@kernel.org>



On 2017年04月24日 07:28, Michael S. Tsirkin wrote:
> On Tue, Apr 18, 2017 at 11:07:42AM +0800, Jason Wang wrote:
>>
>> On 2017年04月17日 07:19, Michael S. Tsirkin wrote:
>>> Applications that consume a batch of entries in one go
>>> can benefit from ability to return some of them back
>>> into the ring.
>>>
>>> Add an API for that - assuming there's space. If there's no space
>>> naturally we can't do this and have to drop entries, but this implies
>>> ring is full so we'd likely drop some anyway.
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>>
>>> Jason, in my mind the biggest issue with your batching patchset is the
>>> backet drops on disconnect.  This API will help avoid that in the common
>>> case.
>> Ok, I will rebase the series on top of this. (Though I don't think we care
>> the packet loss).
> E.g. I care - I often start sending packets to VM before it's
> fully booted. Several vhost resets might follow.

Ok.

>
>>> I would still prefer that we understand what's going on,
>> I try to reply in another thread, does it make sense?
>>
>>>    and I would
>>> like to know what's the smallest batch size that's still helpful,
>> Yes, I've replied in another thread, the result is:
>>
>>
>> no batching   1.88Mpps
>> RX_BATCH=1    1.93Mpps
>> RX_BATCH=4    2.11Mpps
>> RX_BATCH=16   2.14Mpps
>> RX_BATCH=64   2.25Mpps
>> RX_BATCH=256  2.18Mpps
> Essentially 4 is enough, other stuf looks more like noise
> to me. What about 2?

The numbers are pretty stable, so probably not noise. Retested on top of 
batch zeroing:

no  1.97Mpps
1   2.09Mpps
2   2.11Mpps
4   2.16Mpps
8   2.19Mpps
16  2.21Mpps
32  2.25Mpps
64  2.30Mpps
128 2.21Mpps
256 2.21Mpps

64 performs best.

Thanks

>
>>>    but
>>> I'm not going to block the patch on these grounds assuming packet drops
>>> are fixed.
>> Thanks a lot.
>>
>>> Lightly tested - this is on top of consumer batching patches.
>>>
>>> Thanks!
>>>
>>>    include/linux/ptr_ring.h | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 57 insertions(+)
>>>
>>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>>> index 783e7f5..5fbeab4 100644
>>> --- a/include/linux/ptr_ring.h
>>> +++ b/include/linux/ptr_ring.h
>>> @@ -457,6 +457,63 @@ static inline int ptr_ring_init(struct ptr_ring *r, int size, gfp_t gfp)
>>>    	return 0;
>>>    }
>>> +/*
>>> + * Return entries into ring. Destroy entries that don't fit.
>>> + *
>>> + * Note: this is expected to be a rare slow path operation.
>>> + *
>>> + * Note: producer lock is nested within consumer lock, so if you
>>> + * resize you must make sure all uses nest correctly.
>>> + * In particular if you consume ring in interrupt or BH context, you must
>>> + * disable interrupts/BH when doing so.
>>> + */
>>> +static inline void ptr_ring_unconsume(struct ptr_ring *r, void **batch, int n,
>>> +				      void (*destroy)(void *))
>>> +{
>>> +	unsigned long flags;
>>> +	int head;
>>> +
>>> +	spin_lock_irqsave(&(r)->consumer_lock, flags);
>>> +	spin_lock(&(r)->producer_lock);
>>> +
>>> +	if (!r->size)
>>> +		goto done;
>>> +
>>> +	/*
>>> +	 * Clean out buffered entries (for simplicity). This way following code
>>> +	 * can test entries for NULL and if not assume they are valid.
>>> +	 */
>>> +	head = r->consumer_head - 1;
>>> +	while (likely(head >= r->consumer_tail))
>>> +		r->queue[head--] = NULL;
>>> +	r->consumer_tail = r->consumer_head;
>>> +
>>> +	/*
>>> +	 * Go over entries in batch, start moving head back and copy entries.
>>> +	 * Stop when we run into previously unconsumed entries.
>>> +	 */
>>> +	while (n--) {
>>> +		head = r->consumer_head - 1;
>>> +		if (head < 0)
>>> +			head = r->size - 1;
>>> +		if (r->queue[head]) {
>>> +			/* This batch entry will have to be destroyed. */
>>> +			++n;
>>> +			goto done;
>>> +		}
>>> +		r->queue[head] = batch[n];
>>> +		r->consumer_tail = r->consumer_head = head;
>>> +	}
>>> +
>>> +done:
>>> +	/* Destroy all entries left in the batch. */
>>> +	while (n--) {
>>> +		destroy(batch[n]);
>>> +	}
>>> +	spin_unlock(&(r)->producer_lock);
>>> +	spin_unlock_irqrestore(&(r)->consumer_lock, flags);
>>> +}
>>> +
>>>    static inline void **__ptr_ring_swap_queue(struct ptr_ring *r, void **queue,
>>>    					   int size, gfp_t gfp,
>>>    					   void (*destroy)(void *))

^ permalink raw reply

* Re: net: cleanup_net is slow
From: Andrey Konovalov @ 2017-04-24 11:58 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Eric Dumazet, Cong Wang, netdev, LKML, Dmitry Vyukov,
	Kostya Serebryany, syzkaller
In-Reply-To: <20170421194515.GB8853@breakpoint.cc>

On Fri, Apr 21, 2017 at 9:45 PM, Florian Westphal <fw@strlen.de> wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> Indeed.  Setting net.netfilter.nf_conntrack_default_on=0 cuts time
>> cleanup time by 2/3 ...
>>
>> nf unregister is way too happy to issue synchronize_net(), I'll work on
>> a fix.
>
> I'll test this patch as a start.  Maybe we can also leverage exit_batch
> more on netfilter side.

Hi Florian,

Your patch improves fuzzing speed at least twice, which is a great start!

Thanks!

>
> diff --git a/net/netfilter/core.c b/net/netfilter/core.c
> index a87a6f8a74d8..08fe1f526265 100644
> --- a/net/netfilter/core.c
> +++ b/net/netfilter/core.c
> @@ -126,14 +126,15 @@ int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)
>  }
>  EXPORT_SYMBOL(nf_register_net_hook);
>
> -void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)
> +static struct nf_hook_entry *
> +__nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)
>  {
>         struct nf_hook_entry __rcu **pp;
>         struct nf_hook_entry *p;
>
>         pp = nf_hook_entry_head(net, reg);
>         if (WARN_ON_ONCE(!pp))
> -               return;
> +               return NULL;
>
>         mutex_lock(&nf_hook_mutex);
>         for (; (p = nf_entry_dereference(*pp)) != NULL; pp = &p->next) {
> @@ -145,7 +146,7 @@ void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)
>         mutex_unlock(&nf_hook_mutex);
>         if (!p) {
>                 WARN(1, "nf_unregister_net_hook: hook not found!\n");
> -               return;
> +               return NULL;
>         }
>  #ifdef CONFIG_NETFILTER_INGRESS
>         if (reg->pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_INGRESS)
> @@ -154,6 +155,17 @@ void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)
>  #ifdef HAVE_JUMP_LABEL
>         static_key_slow_dec(&nf_hooks_needed[reg->pf][reg->hooknum]);
>  #endif
> +
> +       return p;
> +}
> +
> +void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)
> +{
> +       struct nf_hook_entry *p = __nf_unregister_net_hook(net, reg);
> +
> +       if (!p)
> +               return;
> +
>         synchronize_net();
>         nf_queue_nf_hook_drop(net, p);
>         /* other cpu might still process nfqueue verdict that used reg */
> @@ -183,10 +195,36 @@ int nf_register_net_hooks(struct net *net, const struct nf_hook_ops *reg,
>  EXPORT_SYMBOL(nf_register_net_hooks);
>
>  void nf_unregister_net_hooks(struct net *net, const struct nf_hook_ops *reg,
> -                            unsigned int n)
> +                            unsigned int hookcount)
>  {
> -       while (n-- > 0)
> -               nf_unregister_net_hook(net, &reg[n]);
> +       struct nf_hook_entry *to_free[16];
> +       unsigned int i, n;
> +
> +       WARN_ON_ONCE(hookcount > ARRAY_SIZE(to_free));
> +
> + next_round:
> +       n = min_t(unsigned int, hookcount, ARRAY_SIZE(to_free));
> +
> +       for (i = 0; i < n; i++)
> +               to_free[i] = __nf_unregister_net_hook(net, &reg[i]);
> +
> +       synchronize_net();
> +
> +       for (i = 0; i < n; i++) {
> +               if (to_free[i])
> +                       nf_queue_nf_hook_drop(net, to_free[i]);
> +       }
> +
> +       synchronize_net();
> +
> +       for (i = 0; i < n; i++)
> +               kfree(to_free[i]);
> +
> +       if (n < hookcount) {
> +               hookcount -= n;
> +               reg += n;
> +               goto next_round;
> +       }
>  }
>  EXPORT_SYMBOL(nf_unregister_net_hooks);
>

^ permalink raw reply

* Re: [PATCH RFC] ptr_ring: add ptr_ring_unconsume
From: Michael S. Tsirkin @ 2017-04-24 12:00 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, netdev
In-Reply-To: <21a19608-40be-38d4-9843-088a273fd71a@redhat.com>

On Mon, Apr 24, 2017 at 07:54:18PM +0800, Jason Wang wrote:
> 
> 
> On 2017年04月24日 07:28, Michael S. Tsirkin wrote:
> > On Tue, Apr 18, 2017 at 11:07:42AM +0800, Jason Wang wrote:
> > > 
> > > On 2017年04月17日 07:19, Michael S. Tsirkin wrote:
> > > > Applications that consume a batch of entries in one go
> > > > can benefit from ability to return some of them back
> > > > into the ring.
> > > > 
> > > > Add an API for that - assuming there's space. If there's no space
> > > > naturally we can't do this and have to drop entries, but this implies
> > > > ring is full so we'd likely drop some anyway.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > > 
> > > > Jason, in my mind the biggest issue with your batching patchset is the
> > > > backet drops on disconnect.  This API will help avoid that in the common
> > > > case.
> > > Ok, I will rebase the series on top of this. (Though I don't think we care
> > > the packet loss).
> > E.g. I care - I often start sending packets to VM before it's
> > fully booted. Several vhost resets might follow.
> 
> Ok.
> 
> > 
> > > > I would still prefer that we understand what's going on,
> > > I try to reply in another thread, does it make sense?
> > > 
> > > >    and I would
> > > > like to know what's the smallest batch size that's still helpful,
> > > Yes, I've replied in another thread, the result is:
> > > 
> > > 
> > > no batching   1.88Mpps
> > > RX_BATCH=1    1.93Mpps
> > > RX_BATCH=4    2.11Mpps
> > > RX_BATCH=16   2.14Mpps
> > > RX_BATCH=64   2.25Mpps
> > > RX_BATCH=256  2.18Mpps
> > Essentially 4 is enough, other stuf looks more like noise
> > to me. What about 2?
> 
> The numbers are pretty stable, so probably not noise. Retested on top of
> batch zeroing:
> 
> no  1.97Mpps
> 1   2.09Mpps
> 2   2.11Mpps
> 4   2.16Mpps
> 8   2.19Mpps
> 16  2.21Mpps
> 32  2.25Mpps
> 64  2.30Mpps
> 128 2.21Mpps
> 256 2.21Mpps
> 
> 64 performs best.
> 
> Thanks

OK but it might be e.g. a function of the ring size, host cache size or
whatever. As we don't really understand the why, if we just optimize for
your setup we risk regressions in others.  64 entries is a lot, it
increases the queue size noticeably.  Could this be part of the effect?
Could you try changing the queue size to see what happens?

> > 
> > > >    but
> > > > I'm not going to block the patch on these grounds assuming packet drops
> > > > are fixed.
> > > Thanks a lot.
> > > 
> > > > Lightly tested - this is on top of consumer batching patches.
> > > > 
> > > > Thanks!
> > > > 
> > > >    include/linux/ptr_ring.h | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > >    1 file changed, 57 insertions(+)
> > > > 
> > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > > index 783e7f5..5fbeab4 100644
> > > > --- a/include/linux/ptr_ring.h
> > > > +++ b/include/linux/ptr_ring.h
> > > > @@ -457,6 +457,63 @@ static inline int ptr_ring_init(struct ptr_ring *r, int size, gfp_t gfp)
> > > >    	return 0;
> > > >    }
> > > > +/*
> > > > + * Return entries into ring. Destroy entries that don't fit.
> > > > + *
> > > > + * Note: this is expected to be a rare slow path operation.
> > > > + *
> > > > + * Note: producer lock is nested within consumer lock, so if you
> > > > + * resize you must make sure all uses nest correctly.
> > > > + * In particular if you consume ring in interrupt or BH context, you must
> > > > + * disable interrupts/BH when doing so.
> > > > + */
> > > > +static inline void ptr_ring_unconsume(struct ptr_ring *r, void **batch, int n,
> > > > +				      void (*destroy)(void *))
> > > > +{
> > > > +	unsigned long flags;
> > > > +	int head;
> > > > +
> > > > +	spin_lock_irqsave(&(r)->consumer_lock, flags);
> > > > +	spin_lock(&(r)->producer_lock);
> > > > +
> > > > +	if (!r->size)
> > > > +		goto done;
> > > > +
> > > > +	/*
> > > > +	 * Clean out buffered entries (for simplicity). This way following code
> > > > +	 * can test entries for NULL and if not assume they are valid.
> > > > +	 */
> > > > +	head = r->consumer_head - 1;
> > > > +	while (likely(head >= r->consumer_tail))
> > > > +		r->queue[head--] = NULL;
> > > > +	r->consumer_tail = r->consumer_head;
> > > > +
> > > > +	/*
> > > > +	 * Go over entries in batch, start moving head back and copy entries.
> > > > +	 * Stop when we run into previously unconsumed entries.
> > > > +	 */
> > > > +	while (n--) {
> > > > +		head = r->consumer_head - 1;
> > > > +		if (head < 0)
> > > > +			head = r->size - 1;
> > > > +		if (r->queue[head]) {
> > > > +			/* This batch entry will have to be destroyed. */
> > > > +			++n;
> > > > +			goto done;
> > > > +		}
> > > > +		r->queue[head] = batch[n];
> > > > +		r->consumer_tail = r->consumer_head = head;
> > > > +	}
> > > > +
> > > > +done:
> > > > +	/* Destroy all entries left in the batch. */
> > > > +	while (n--) {
> > > > +		destroy(batch[n]);
> > > > +	}
> > > > +	spin_unlock(&(r)->producer_lock);
> > > > +	spin_unlock_irqrestore(&(r)->consumer_lock, flags);
> > > > +}
> > > > +
> > > >    static inline void **__ptr_ring_swap_queue(struct ptr_ring *r, void **queue,
> > > >    					   int size, gfp_t gfp,
> > > >    					   void (*destroy)(void *))

^ permalink raw reply

* Re: [PATCH net] bridge: shutdown bridge device before removing it
From: Nikolay Aleksandrov @ 2017-04-24 12:07 UTC (permalink / raw)
  To: Xin Long, network dev
  Cc: David S. Miller, Stephen Hemminger,
	bridge@lists.linux-foundation.org
In-Reply-To: <3d77b47d-c8c3-de57-946e-59096a88964e@cumulusnetworks.com>

On 24/04/17 14:01, Nikolay Aleksandrov wrote:
> On 24/04/17 10:25, Xin Long wrote:
>> During removing a bridge device, if the bridge is still up, a new mdb entry
>> still can be added in br_multicast_add_group() after all mdb entries are
>> removed in br_multicast_dev_del(). Like the path:
>>
>>   mld_ifc_timer_expire ->
>>     mld_sendpack -> ...
>>       br_multicast_rcv ->
>>         br_multicast_add_group
>>
>> The new mp's timer will be set up. If the timer expires after the bridge
>> is freed, it may cause use-after-free panic in br_multicast_group_expired.
>> This can happen when ip link remove a bridge or destroy a netns with a
>> bridge device inside.
>>
>> As we can see in br_del_bridge, brctl is also supposed to remove a bridge
>> device after it's shutdown.
>>
>> This patch is to call dev_close at the beginning of br_dev_delete so that
>> netif_running check in br_multicast_add_group can avoid this issue. But
>> to keep consistent with before, it will not remove the IFF_UP check in
>> br_del_bridge for brctl.
>>
>> Reported-by: Jianwen Ji <jiji@redhat.com>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> ---
>>  net/bridge/br_if.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
> 
> +CC bridge maintainers
> 
> I can see how this could happen, could you also provide the traceback ?
> 
> The patch looks good to me, actually I think it fixes another issue with
> mcast stats where the percpu pointer can be accessed after it's freed if
> an mcast packet can get sent via br->dev after the br_multicast_dev_del() call.
> This is definitely stable material, if I'm not mistaken the issue is there since
> the introduction of br_dev_delete:
> commit e10177abf842
> Author: Satish Ashok <sashok@cumulusnetworks.com>
> Date:   Wed Jul 15 07:16:51 2015 -0700
> 
>     bridge: multicast: fix handling of temp and perm entries
> 
> 
> 
> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 

Actually I have a better idea for a fix because dev_close() for a single device is rather heavy.
Why don't you move the mdb flush logic in the bridge's ndo_uninit() callback ?
That should have the same effect and be much faster.

By the way I just noticed that there's also a memory leak - the mdb hash is reallocated
and not freed due to the mdb rehash, here's also kmemleak's object:

unreferenced object 0xffff8800540ba800 (size 2048):
  comm "softirq", pid 0, jiffies 4520588901 (age 5787.284s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0
    [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0
    [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge]
    [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge]
    [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge]
    [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge]
    [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge]
    [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0
    [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10
    [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20
    [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6]
    [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6]
    [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6]
    [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
    [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6]
    [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]

^ permalink raw reply

* Re: net: cleanup_net is slow
From: Florian Westphal @ 2017-04-24 12:08 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: Florian Westphal, Eric Dumazet, Cong Wang, netdev, LKML,
	Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <CAAeHK+zwsdzVrgwLccbXs7QUEsFkeTL13_AfNKMD1G2BkOr0eg@mail.gmail.com>

Andrey Konovalov <andreyknvl@google.com> wrote:
> On Fri, Apr 21, 2017 at 9:45 PM, Florian Westphal <fw@strlen.de> wrote:
> > Florian Westphal <fw@strlen.de> wrote:
> >> Indeed.  Setting net.netfilter.nf_conntrack_default_on=0 cuts time
> >> cleanup time by 2/3 ...
> >>
> >> nf unregister is way too happy to issue synchronize_net(), I'll work on
> >> a fix.
> >
> > I'll test this patch as a start.  Maybe we can also leverage exit_batch
> > more on netfilter side.
> 
> Hi Florian,
> 
> Your patch improves fuzzing speed at least twice, which is a great start!

Great.  I'll try to push the patches i have to nf-next ASAP.

^ permalink raw reply

* Re: [PATCH] macsec: avoid heap overflow in skb_to_sgvec
From: Jason A. Donenfeld @ 2017-04-24 12:15 UTC (permalink / raw)
  To: David Laight
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net, stable@vger.kernel.org, security@kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DCFFD9487@AcuExch.aculab.com>

On Mon, Apr 24, 2017 at 1:02 PM, David Laight <David.Laight@aculab.com> wrote:
> ...
>
> Shouldn't skb_to_sgvec() be checking the number of fragments against
> the size of the sg list?
> The callers would then all need auditing to allow for failure.

This has never been done before, since this is one of those operations
that simply _shouldn't fail_ this late in the driver's path. There's
an easy way to use a fixed size array of MAX_SKB_FRAGS+1, and then
just not specify FRAGLIST as a device feature. Then the function
succeeds every time, rather than dropping packets. Alternatively, if
the array is being allocated dynamically (kmalloc), a call to
skb_cow_data returns the number of fragments needed; since usually
people using scattergather are going to be modifying the skb anyway, I
believe this function should be being called anyway...

It would be possible to do as you suggest, though, by using sg_is_last
in skb_to_sgvec. In this case we'd need to change every call site of
skb_to_sgvec to ensure the return value is being checked as well as
making sure that the sglist is initialized with sg_init_table to
ensure the last frag is properly marked. I wouldn't be opposed to
this, though it is potentially error prone work.

In any case, this patch here follows the pattern of the entire rest of
the present-day kernel, so it ought to be merged as-is.

^ permalink raw reply

* [PATCH net-next 0/2] l2tp: add informations about l2tpeth interfaces in /sys
From: Guillaume Nault @ 2017-04-24 12:16 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman

Patch #1 lets userspace retrieve the naming scheme of an l2tpeth
interface, using /sys/class/net/<iface>/name_assign_type.

Patch #2 adds the DEVTYPE field in /sys/class/net/<iface>/uevent so
that userspace can reliably know if a device is an l2tpeth interface.


Guillaume Nault (2):
  l2tp: set name_assign_type for devices created by l2tp_eth.c
  l2tp: define "l2tpeth" device type

 net/l2tp/l2tp_eth.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

-- 
2.11.0

^ permalink raw reply

* [PATCH net-next 2/2] l2tp: define "l2tpeth" device type
From: Guillaume Nault @ 2017-04-24 12:16 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1493035407.git.g.nault@alphalink.fr>

Export type of l2tpeth interfaces to userspace
(/sys/class/net/<iface>/uevent).

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_eth.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 5e44b3cc1212..59aba8aeac03 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -130,8 +130,13 @@ static const struct net_device_ops l2tp_eth_netdev_ops = {
 	.ndo_set_mac_address	= eth_mac_addr,
 };
 
+static struct device_type l2tpeth_type = {
+	.name = "l2tpeth",
+};
+
 static void l2tp_eth_dev_setup(struct net_device *dev)
 {
+	SET_NETDEV_DEVTYPE(dev, &l2tpeth_type);
 	ether_setup(dev);
 	dev->priv_flags		&= ~IFF_TX_SKB_SHARING;
 	dev->features		|= NETIF_F_LLTX;
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next 1/2] l2tp: set name_assign_type for devices created by l2tp_eth.c
From: Guillaume Nault @ 2017-04-24 12:16 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1493035407.git.g.nault@alphalink.fr>

Export naming scheme used when creating l2tpeth interfaces
(/sys/class/net/<iface>/name_assign_type). This let userspace know if
the device's name has been generated automatically or defined manually.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_eth.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index b722d559c544..5e44b3cc1212 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -258,6 +258,7 @@ static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
 
 static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 peer_session_id, struct l2tp_session_cfg *cfg)
 {
+	unsigned char name_assign_type;
 	struct net_device *dev;
 	char name[IFNAMSIZ];
 	struct l2tp_tunnel *tunnel;
@@ -281,8 +282,11 @@ static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 p
 			goto out;
 		}
 		strlcpy(name, cfg->ifname, IFNAMSIZ);
-	} else
+		name_assign_type = NET_NAME_USER;
+	} else {
 		strcpy(name, L2TP_ETH_DEV_NAME);
+		name_assign_type = NET_NAME_ENUM;
+	}
 
 	session = l2tp_session_create(sizeof(*spriv), tunnel, session_id,
 				      peer_session_id, cfg);
@@ -291,7 +295,7 @@ static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 p
 		goto out;
 	}
 
-	dev = alloc_netdev(sizeof(*priv), name, NET_NAME_UNKNOWN,
+	dev = alloc_netdev(sizeof(*priv), name, name_assign_type,
 			   l2tp_eth_dev_setup);
 	if (!dev) {
 		rc = -ENOMEM;
-- 
2.11.0

^ permalink raw reply related

* [PATCH net] ipv6: move stub initialization after ipv6 setup completion
From: Paolo Abeni @ 2017-04-24 12:18 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang, Hannes Frederic Sowa

The ipv6 stub pointer is currently initialized before the ipv6
routing subsystem: a 3rd party can access and use such stub
before the routing data is ready.
Moreover, such pointer is not cleared in case of initialization
error, possibly leading to dangling pointers usage.

This change addresses the above moving the stub initialization
at the end of ipv6 init code.

Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/af_inet6.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index a9a9553..e82e59f 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -933,8 +933,6 @@ static int __init inet6_init(void)
 	if (err)
 		goto igmp_fail;
 
-	ipv6_stub = &ipv6_stub_impl;
-
 	err = ipv6_netfilter_init();
 	if (err)
 		goto netfilter_fail;
@@ -1010,6 +1008,10 @@ static int __init inet6_init(void)
 	if (err)
 		goto sysctl_fail;
 #endif
+
+	/* ensure that ipv6 stubs are visible only after ipv6 is ready */
+	wmb();
+	ipv6_stub = &ipv6_stub_impl;
 out:
 	return err;
 
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH v4 13/18] arm64: allwinner: sun50i-a64: add dwmac-sun8i Ethernet driver
From: Corentin Labbe @ 2017-04-24 12:24 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A, mark.rutland-5wv7dgnIgG8,
	wens-jdAy2FN1RRM, linux-I+IVW8TIWO2tmTQ+vhA3Yw,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	peppe.cavallaro-qxv4g6HH51o, alexandre.torgue-qxv4g6HH51o,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <20170412124153.q6zvdvqkroizaxgb@lukather>

On Wed, Apr 12, 2017 at 02:41:53PM +0200, Maxime Ripard wrote:
> On Wed, Apr 12, 2017 at 01:13:55PM +0200, Corentin Labbe wrote:
> > The dwmac-sun8i is an Ethernet MAC that supports 10/100/1000 Mbit
> > connections. It is very similar to the device found in the Allwinner
> > H3, but lacks the internal 100 Mbit PHY and its associated control
> > bits.
> > This adds the necessary bits to the Allwinner A64 SoC .dtsi, but keeps
> > it disabled at this level.
> > 
> > Signed-off-by: Corentin Labbe <clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > ---
> >  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 37 +++++++++++++++++++++++++++
> >  1 file changed, 37 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> > index 0b0f4ab..2569827 100644
> > --- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> > +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> > @@ -287,6 +287,23 @@
> >  				bias-pull-up;
> >  			};
> >  
> > +			rmii_pins: rmii_pins {
> > +				pins = "PD10", "PD11", "PD13", "PD14",
> > +						"PD17", "PD18", "PD19", "PD20",
> > +						"PD22", "PD23";
> 
> Please align the wrapped lines on the first pin.
> 

OK

> > +				function = "emac";
> > +				drive-strength = <40>;
> 
> Do you actually need that for all the boards, or only a few of them?

I have tried to use lower value without success on some boards. (opipc/pine64 in my memory)

Regards
Corentin Labbe

^ permalink raw reply

* Re: [PATCH net-next v5 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-24 12:49 UTC (permalink / raw)
  To: Simon Horman, Jiri Pirko
  Cc: davem, xiyou.wangcong, eric.dumazet, netdev, Tom Herbert,
	Pablo Neira Ayuso
In-Reply-To: <20170424091455.GA25218@vergenet.net>

On 17-04-24 05:14 AM, Simon Horman wrote:
[..]

> Jamal, I am confused about why are you so concerned about the space
> consumed by this attribute, it's per-message, right? Is it the bigger
> picture you are worried about - a similar per-entry flag at some point in
> the future?


To me the two worries are one and the same.

Jiri strongly believes (from a big picture view) we must use
TLVs for extensibility.
While I agree with him in general i have strong reservations
in this case because i can get both extensibility and
build for performance with using a flag bitmask as the
content of the TLV.

A TLV consumes 64 bits minimum. It doesnt matter if we decide
to use a u8 or a u16, we are still sending 64 bits on that
TLV with the rest being PADding. Not to be melodramatic, but
the worst case scenario of putting everything in a TLV for 32
flags is using about 30x more space than using a bitmask.

Yes, space is important and if i can express upto 32 flags
with one TLV rather than 32 TLVs i choose one TLV.
I am always looking for ways to filter out crap i dont need
when i do stats collection. I have numerous wounds from fdb
entries which decided to use a TLV per flag.

The design approach we have used in netlink is: flags start
as a bitmap (whether they are on main headers or TLVs); they may be
complemented with a bitmask/selector (refer to IFLINK messages).

Lets look at this specific patch I have sending. I have already
changed it 3 times and involved a churn of 3 different flags.
If you asked me in the beggining i wouldve scratched my head
thinking for a near term use for bit #3, #4 etc,

I am fine with the counter-Postel view of having the kernel
validate that appropriate bits are set as long as we dont make
user space to now start learning how to play acrobatics.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net-next v4 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-24 12:54 UTC (permalink / raw)
  To: Simon Horman; +Cc: David Miller, eric.dumazet, jiri, netdev, xiyou.wangcong
In-Reply-To: <20170424092754.GB25218@vergenet.net>

On 17-04-24 05:27 AM, Simon Horman wrote:
> On Fri, Apr 21, 2017 at 02:11:00PM -0400, Jamal Hadi Salim wrote:
>> On 17-04-21 12:12 PM, David Miller wrote:

>
> From my PoV, for #d user-space has to either get smart or fail.
>
> Creating new tc involved work in order to support the new feature.
> Part of that work would be a decision weather or not to provide
> compatibility for old kernel or to bail out gracefully.
>

But not that much work as is being ascribed now.
This is a big change to user space code. Are we planning to
change all netlink apps (everything in iproute2) to now discover
by testing whether their flags are accepted or not? One extra
crossing to the kernel just to test the waters.

I think there's a middle ground and see which idea takes off.
Refer to the last patch I sent which implements the idea below.

cheers,
jamal

>> There is a simpler approach that would work going forward.
>> How about letting the user choose their fate? Set something maybe
>> in the netlink header to tell the kernel "if you dont understand
>> something I am asking for - please ignore it and do what you can".
>> This would maintain current behavior but would force the user to
>> explicitly state so.
>>
>> cheers,
>> jamal
>>

^ permalink raw reply

* Re: [PATCH v4 13/18] arm64: allwinner: sun50i-a64: add dwmac-sun8i Ethernet driver
From: Chen-Yu Tsai @ 2017-04-24 12:58 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Maxime Ripard, Rob Herring, Mark Rutland, Chen-Yu Tsai,
	Russell King, Catalin Marinas, Will Deacon, Giuseppe Cavallaro,
	alexandre.torgue-qxv4g6HH51o, linux-sunxi, devicetree,
	linux-kernel, netdev, linux-arm-kernel
In-Reply-To: <20170424122411.GA9349@Red>

On Mon, Apr 24, 2017 at 8:24 PM, Corentin Labbe
<clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Wed, Apr 12, 2017 at 02:41:53PM +0200, Maxime Ripard wrote:
>> On Wed, Apr 12, 2017 at 01:13:55PM +0200, Corentin Labbe wrote:
>> > The dwmac-sun8i is an Ethernet MAC that supports 10/100/1000 Mbit
>> > connections. It is very similar to the device found in the Allwinner
>> > H3, but lacks the internal 100 Mbit PHY and its associated control
>> > bits.
>> > This adds the necessary bits to the Allwinner A64 SoC .dtsi, but keeps
>> > it disabled at this level.
>> >
>> > Signed-off-by: Corentin Labbe <clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> > ---
>> >  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 37 +++++++++++++++++++++++++++
>> >  1 file changed, 37 insertions(+)
>> >
>> > diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> > index 0b0f4ab..2569827 100644
>> > --- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> > +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> > @@ -287,6 +287,23 @@
>> >                             bias-pull-up;
>> >                     };
>> >
>> > +                   rmii_pins: rmii_pins {
>> > +                           pins = "PD10", "PD11", "PD13", "PD14",
>> > +                                           "PD17", "PD18", "PD19", "PD20",
>> > +                                           "PD22", "PD23";
>>
>> Please align the wrapped lines on the first pin.
>>
>
> OK
>
>> > +                           function = "emac";
>> > +                           drive-strength = <40>;
>>
>> Do you actually need that for all the boards, or only a few of them?
>
> I have tried to use lower value without success on some boards. (opipc/pine64 in my memory)

FYI we need them for all the boards that use RGMII.
The signals at gigabit speed run at 125 MHz DDR.

For RMII we probably don't need it. Even at 100 Mbps,
it's only 50 MHz SDR. drive-strength = <30> should be
enough.

ChenYu

^ permalink raw reply

* [PATCH v2] net/packet: initialize val in packet_getsockopt()
From: Alexander Potapenko @ 2017-04-24 12:59 UTC (permalink / raw)
  To: dvyukov, kcc, edumazet, davem, kuznet; +Cc: linux-kernel, netdev

In the case getsockopt() is called with PACKET_HDRLEN and optlen < 4
|val| remains uninitialized and the syscall may behave differently
depending on its value. This doesn't have security consequences (as the
uninit bytes aren't copied back), but it's still cleaner to initialize
|val| and ensure optlen is not less than sizeof(int).

This bug has been detected with KMSAN.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2: - if len < sizeof(int), make it 0

KMSAN report below:

==================================================================
BUG: KMSAN: use of unitialized memory in packet_getsockopt+0xb9b/0xbe0
inter: 0
CPU: 0 PID: 1036 Comm: probe Tainted: G    B           4.11.0-rc5+ #2444
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x143/0x1b0 lib/dump_stack.c:52
 kmsan_report+0x16b/0x1e0 mm/kmsan/kmsan.c:1078
 __kmsan_warning_32+0x5c/0xa0 mm/kmsan/kmsan_instr.c:510
 packet_getsockopt+0xb9b/0xbe0 net/packet/af_packet.c:3839
 SYSC_getsockopt+0x495/0x540 net/socket.c:1829
 SyS_getsockopt+0xb0/0xd0 net/socket.c:1811
 entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
RIP: 0033:0x436d8a
RSP: 002b:00007ffce54e52c8 EFLAGS: 00000203 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000436d8a
RDX: 000000000000000b RSI: 0000000000000107 RDI: 0000000000000003
RBP: 00007ffce54e52b0 R08: 00007ffce54e52d8 R09: 0000000000000004
R10: 00007ffce54e52d4 R11: 0000000000000203 R12: 00007ffce54e53c8
R13: 00007ffce54e53d8 R14: 0000000000000002 R15: 0000000000000000
origin description: ----val@packet_getsockopt (origin=00000000f6600052)
local variable created at:
 packet_getsockopt+0xcd/0xbe0 net/packet/af_packet.c:3789
 SYSC_getsockopt+0x495/0x540 net/socket.c:1829
==================================================================
---
 net/packet/af_packet.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8489beff5c25..dfc762df5da9 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3787,7 +3787,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 			     char __user *optval, int __user *optlen)
 {
 	int len;
-	int val, lv = sizeof(val);
+	int val = 0, lv = sizeof(val);
 	struct sock *sk = sock->sk;
 	struct packet_sock *po = pkt_sk(sk);
 	void *data = &val;
@@ -3836,6 +3836,8 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 	case PACKET_HDRLEN:
 		if (len > sizeof(int))
 			len = sizeof(int);
+		if (len < sizeof(int))
+			len = 0;
 		if (copy_from_user(&val, optval, len))
 			return -EFAULT;
 		switch (val) {
-- 
2.12.2.816.g2cccc81164-goog

^ permalink raw reply related

* [PATCH net v1 2/2] tipc: fix socket flow control accounting error at tipc_recv_stream
From: Parthasarathy Bhuvaragan @ 2017-04-24 13:00 UTC (permalink / raw)
  To: netdev; +Cc: tipc-discussion
In-Reply-To: <1493038843-30621-1-git-send-email-parthasarathy.bhuvaragan@ericsson.com>

Until now in tipc_recv_stream(), we update the received
unacknowledged bytes based on a stack variable and not based on the
actual message size.
If the user buffer passed at tipc_recv_stream() is smaller than the
received skb, the size variable in stack differs from the actual
message size in the skb. This leads to a flow control accounting
error causing permanent congestion.

In this commit, we fix this accounting error by always using the
size of the incoming message.

Fixes: 10724cc7bb78 ("tipc: redesign connection-level flow control")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index b28e94f1c739..566906795c8c 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1484,7 +1484,7 @@ static int tipc_recv_stream(struct socket *sock, struct msghdr *m,
 	if (unlikely(flags & MSG_PEEK))
 		goto exit;
 
-	tsk->rcv_unacked += tsk_inc(tsk, hlen + sz);
+	tsk->rcv_unacked += tsk_inc(tsk, hlen + msg_data_sz(msg));
 	if (unlikely(tsk->rcv_unacked >= (tsk->rcv_win / 4)))
 		tipc_sk_send_ack(tsk);
 	tsk_advance_rx_queue(sk);
-- 
2.1.4


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

^ permalink raw reply related

* [PATCH net v1 1/2] tipc: fix socket flow control accounting error at tipc_send_stream
From: Parthasarathy Bhuvaragan @ 2017-04-24 13:00 UTC (permalink / raw)
  To: netdev; +Cc: tipc-discussion

Until now in tipc_send_stream(), we return -1 when the socket
encounters link congestion even if the socket had successfully
sent partial data. This is incorrect as the application resends
the same the partial data leading to data corruption at
receiver's end.

In this commit, we return the partially sent bytes as the return
value at link congestion.

Fixes: 10724cc7bb78 ("tipc: redesign connection-level flow control")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 7130e73bd42c..b28e94f1c739 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1083,7 +1083,7 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen)
 		}
 	} while (sent < dlen && !rc);
 
-	return rc ? rc : sent;
+	return sent ? sent : rc;
 }
 
 /**
-- 
2.1.4

^ permalink raw reply related

* [PATCH v2] brcmfmac: Make skb header writable before use
From: James Hughes @ 2017-04-24 13:03 UTC (permalink / raw)
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Kalle Valo,
	linux-wireless, brcm80211-dev-list.pdl, netdev
  Cc: James Hughes

The driver was making changes to the skb_header without
ensuring it was writable (i.e. uncloned).
This patch also removes some boiler plate header size
checking/adjustment code as that is also handled by the
skb_cow_header function used to make header writable.

This patch depends on
brcmfmac: Ensure pointer correctly set if skb data location changes

Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
---
Changes in v2
  Makes the _cow_ call at the entry point of the skb in to the
  stack, means only needs to be done once, and error handling
  is easier.
  Split a separate minor bug fix off to a separate patch (which
  this patch depends on)



 .../net/wireless/broadcom/brcm80211/brcmfmac/core.c   | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 9b7c19a508ac..88f8675a94c2 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -211,22 +211,11 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
 		goto done;
 	}
 
-	/* Make sure there's enough room for any header */
-	if (skb_headroom(skb) < drvr->hdrlen) {
-		struct sk_buff *skb2;
-
-		brcmf_dbg(INFO, "%s: insufficient headroom\n",
-			  brcmf_ifname(ifp));
-		drvr->bus_if->tx_realloc++;
-		skb2 = skb_realloc_headroom(skb, drvr->hdrlen);
+	/* Make sure there's enough room for any header, and make it writable */
+	if (skb_cow_head(skb, drvr->hdrlen)) {
 		dev_kfree_skb(skb);
-		skb = skb2;
-		if (skb == NULL) {
-			brcmf_err("%s: skb_realloc_headroom failed\n",
-				  brcmf_ifname(ifp));
-			ret = -ENOMEM;
-			goto done;
-		}
+		ret = -ENOMEM;
+		goto done;
 	}
 
 	/* validate length for ether packet */
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH v4 net-next RFC] net: Generic XDP
From: Jesper Dangaard Brouer @ 2017-04-24 13:18 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: David Miller, alexei.starovoitov, michael.chan, netdev,
	xdp-newbies, brouer
In-Reply-To: <20170420163034.053ec42c@redhat.com>


On Thu, 20 Apr 2017 16:30:34 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Wed, 19 Apr 2017 10:29:03 -0400
> Andy Gospodarek <andy@greyhouse.net> wrote:
> 
> > I ran this on top of a card that uses the bnxt_en driver on a desktop
> > class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of
> > UDP traffic with flow control disabled and saw the following (all stats
> > in Million PPS).
> > 
> >                 xdp1                xdp2            xdp_tx_tunnel
> > Generic XDP      7.8    5.5 (1.3 actual)         4.6 (1.1 actual)
> > Optimized XDP   11.7		     9.7                      4.6
> > 
> > One thing to note is that the Generic XDP case shows some different
> > results for reported by the application vs actual (seen on the wire).  I
> > did not debug where the drops are happening and what counter needs to be
> > incremented to note this -- I'll add that to my TODO list.  The
> > Optimized XDP case does not have a difference in reported vs actual
> > frames on the wire.  
> 
> The reported application vs actual (seen on the wire) number sound scary.
> How do you evaluate/measure "seen on the wire"?
> 
> Perhaps you could use ethtool -S stats to see if anything is fishy?
> I recommend using my tool[1] like:
> 
>  ~/git/network-testing/bin/ethtool_stats.pl --dev mlx5p2 --sec 2
> 
> [1] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
> 
> I'm evaluating this patch on a mlx5 NIC, and something is not right...
> I'm seeing:
> 
>  Ethtool(mlx5p2) stat:     349599 (        349,599) <= tx_multicast_phy /sec
>  Ethtool(mlx5p2) stat:    4940185 (      4,940,185) <= tx_packets /sec
>  Ethtool(mlx5p2) stat:     349596 (        349,596) <= tx_packets_phy /sec
>  [...]
>  Ethtool(mlx5p2) stat:      36898 (         36,898) <= rx_cache_busy /sec
>  Ethtool(mlx5p2) stat:      36898 (         36,898) <= rx_cache_full /sec
>  Ethtool(mlx5p2) stat:    4903287 (      4,903,287) <= rx_cache_reuse /sec
>  Ethtool(mlx5p2) stat:    4940185 (      4,940,185) <= rx_csum_complete /sec
>  Ethtool(mlx5p2) stat:    4940185 (      4,940,185) <= rx_packets /sec
> 
> Something is wrong... when I tcpdump on the generator machine, I see
> garbled packets with IPv6 multicast addresses.
> 
> And it looks like I'm only sending 349,596 tx_packets_phy/sec on the "wire".
> 

Not seeing packets on the TX wire was caused by the NIC HW dropping the
packets, because the ethernet MAC-addr were not changed/swapped.

Fixed this XDP_TX bug in my test program xdp_bench01_mem_access_cost.
https://github.com/netoptimizer/prototype-kernel/commit/85f7ba2f0ea2

Even added a new option --swapmac for creating another test option for
modifying the packet.
https://github.com/netoptimizer/prototype-kernel/commit/fe080e6f3ccf

I will shortly publish a full report of testing this patch.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH net-next 1/3] samples/bpf: add -Wno-unknown-warning-option to clang
From: Alexander Alemayhu @ 2017-04-24 13:31 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Alemayhu, daniel, ast
In-Reply-To: <20170424133108.31595-1-alexander@alemayhu.com>

I was initially going to remove '-Wno-address-of-packed-member' because I
thought it was not supposed to be there but Daniel suggested using
'-Wno-unknown-warning-option'. 

This silences several warnings similiar to the one below

warning: unknown warning option '-Wno-address-of-packed-member' [-Wunknown-warning-option]
1 warning generated.
clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I./include
 -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h  \
        -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
        -Wno-compare-distinct-pointer-types \
        -Wno-gnu-variable-sized-type-not-at-end \
        -Wno-address-of-packed-member -Wno-tautological-compare \
        -O2 -emit-llvm -c samples/bpf/xdp_tx_iptunnel_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/xdp_tx_iptunnel_kern.o

$ clang --version

 clang version 3.9.1 (tags/RELEASE_391/final)
 Target: x86_64-unknown-linux-gnu
 Thread model: posix
 InstalledDir: /usr/bin

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
---
 samples/bpf/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index d42b495b0992..6c7468eb3684 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -189,4 +189,5 @@ $(obj)/%.o: $(src)/%.c
 		-Wno-compare-distinct-pointer-types \
 		-Wno-gnu-variable-sized-type-not-at-end \
 		-Wno-address-of-packed-member -Wno-tautological-compare \
+		-Wno-unknown-warning-option \
 		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 0/3] Misc BPF cleanup
From: Alexander Alemayhu @ 2017-04-24 13:31 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Alemayhu, daniel, ast

Hei,

while looking into making the Makefile in samples/bpf better handle O= I saw
several warnings when running `make clean && make samples/bpf/`. This series
reduces those warnings.

Thanks.

Alexander Alemayhu (3):
  samples/bpf: add -Wno-unknown-warning-option to clang
  samples/bpf: add static to function with no prototype
  samples/bpf: check before defining offsetof

 samples/bpf/Makefile                    | 1 +
 samples/bpf/cookie_uid_helper_example.c | 2 +-
 samples/bpf/test_lru_dist.c             | 4 +++-
 3 files changed, 5 insertions(+), 2 deletions(-)

-- 
2.9.3

^ permalink raw reply

* [PATCH net-next 2/3] samples/bpf: add static to function with no prototype
From: Alexander Alemayhu @ 2017-04-24 13:31 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Alemayhu, daniel, ast
In-Reply-To: <20170424133108.31595-1-alexander@alemayhu.com>

Fixes the following warning

samples/bpf/cookie_uid_helper_example.c: At top level:
samples/bpf/cookie_uid_helper_example.c:276:6: warning: no previous prototype for ‘finish’ [-Wmissing-prototypes]
 void finish(int ret)
      ^~~~~~
  HOSTLD  samples/bpf/per_socket_stats_example

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
---
 samples/bpf/cookie_uid_helper_example.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/cookie_uid_helper_example.c b/samples/bpf/cookie_uid_helper_example.c
index ad5afedf2e70..9ce55840d61d 100644
--- a/samples/bpf/cookie_uid_helper_example.c
+++ b/samples/bpf/cookie_uid_helper_example.c
@@ -273,7 +273,7 @@ static int usage(void)
 	return 1;
 }
 
-void finish(int ret)
+static void finish(int ret)
 {
 	test_finish = true;
 }
-- 
2.9.3

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox