Netdev List
 help / color / mirror / Atom feed
* [PATCH net] sfc: remove incorrect EFX_BUG_ON_PARANOID check
From: Edward Cree @ 2014-10-21 13:50 UTC (permalink / raw)
  To: davem, netdev; +Cc: sshah, jcooper, linux-net-drivers

From: Jon Cooper <jcooper@solarflare.com>

write_count and insert_count can wrap around, making > check invalid.

Fixes: 70b33fb0ddec827cbbd14cdc664fc27b2ef4a6b6 ("sfc: add support for
 skb->xmit_more").

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
Compile tested only.

 drivers/net/ethernet/sfc/tx.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index ee84a90..aaf2987 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -343,8 +343,6 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 	unsigned short dma_flags;
 	int i = 0;
 
-	EFX_BUG_ON_PARANOID(tx_queue->write_count > tx_queue->insert_count);
-
 	if (skb_shinfo(skb)->gso_size)
 		return efx_enqueue_skb_tso(tx_queue, skb);
 
@@ -1258,8 +1256,6 @@ static int efx_enqueue_skb_tso(struct efx_tx_queue *tx_queue,
 	/* Find the packet protocol and sanity-check it */
 	state.protocol = efx_tso_check_protocol(skb);
 
-	EFX_BUG_ON_PARANOID(tx_queue->write_count > tx_queue->insert_count);
-
 	rc = tso_start(&state, efx, skb);
 	if (rc)
 		goto mem_err;
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH] netfilter: Fix wastful cleanup check for unconfirmed conn in get_next_corpse
From: Feng Gao @ 2014-10-21 13:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Patrick McHardy, kadlec, davem
  Cc: Netfilter Developer Mailing List, coreteam, netdev, linux-kernel
In-Reply-To: <CA+6hz4oodWXXyWhcR=Zt4Esf1Riwatvy0xYMhFnQ8CkTaqV=Pg@mail.gmail.com>

Sorry. I get it is not an issue after read the codes again.
The unconfirmed conn list check is only checked once in the current codes.
Because it will be checked only when no matched conntracks found in
function get_next_corpse.

Then I think current codes may confuse the reader. I am an example.
So could my changes be as the enhancement ?

Best Regards
Feng

On Tue, Oct 21, 2014 at 10:47 AM, Feng Gao <gfree.wind@gmail.com> wrote:
> Paste my changes directly instead of the attachment.
>
> Subject: [PATCH 1/1] netfilter: Fix wastful cleanup check for unconfirmed
> conn in get_next_corpse
>
> The function get_next_corpse is used to iterate the conntracks.
> It will check the per cpu unconfirmed list of every cpu too.
> Now it is only invoked by nf_ct_iterate_cleanup in one while loop.
> Actually the unconfirmed list could be accessed completely by one call, then
> the others are wastful.
>
> So move the unconfirmed list check outside the function get_next_corpse and
> create one new function
> Let the nf_ct_iterate_cleanup invokes the new function
> clean_up_unconfirmed_conntracks once after the loops.
>
> Signed-off-by: fgao <gfree.wind@gmail.com>
> ---
>  net/netfilter/nf_conntrack_core.c | 36 ++++++++++++++++++++++++------------
>  1 file changed, 24 insertions(+), 12 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_core.c
> b/net/netfilter/nf_conntrack_core.c
> index 5016a69..ace7c2c2 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1348,6 +1348,28 @@ static void nf_conntrack_attach(struct sk_buff *nskb,
> const struct sk_buff *skb)
>   nf_conntrack_get(nskb->nfct);
>  }
>
> +static void clean_up_unconfirmed_conntracks(struct net *net,
> + int (*iter)(struct nf_conn *i, void *data),
> + void *data)
> +{
> + struct nf_conntrack_tuple_hash *h;
> + struct nf_conn *ct;
> + struct hlist_nulls_node *n;
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
> +
> + spin_lock_bh(&pcpu->lock);
> + hlist_nulls_for_each_entry(h, n, &pcpu->unconfirmed, hnnode) {
> + ct = nf_ct_tuplehash_to_ctrack(h);
> + if (iter(ct, data))
> + set_bit(IPS_DYING_BIT, &ct->status);
> + }
> + spin_unlock_bh(&pcpu->lock);
> + }
> +}
> +
>  /* Bring out ya dead! */
>  static struct nf_conn *
>  get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void
> *data),
> @@ -1356,7 +1378,6 @@ get_next_corpse(struct net *net, int (*iter)(struct
> nf_conn *i, void *data),
>   struct nf_conntrack_tuple_hash *h;
>   struct nf_conn *ct;
>   struct hlist_nulls_node *n;
> - int cpu;
>   spinlock_t *lockp;
>
>   for (; *bucket < net->ct.htable_size; (*bucket)++) {
> @@ -1376,17 +1397,6 @@ get_next_corpse(struct net *net, int (*iter)(struct
> nf_conn *i, void *data),
>   local_bh_enable();
>   }
>
> - for_each_possible_cpu(cpu) {
> - struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
> -
> - spin_lock_bh(&pcpu->lock);
> - hlist_nulls_for_each_entry(h, n, &pcpu->unconfirmed, hnnode) {
> - ct = nf_ct_tuplehash_to_ctrack(h);
> - if (iter(ct, data))
> - set_bit(IPS_DYING_BIT, &ct->status);
> - }
> - spin_unlock_bh(&pcpu->lock);
> - }
>   return NULL;
>  found:
>   atomic_inc(&ct->ct_general.use);
> @@ -1411,6 +1421,8 @@ void nf_ct_iterate_cleanup(struct net *net,
>
>   nf_ct_put(ct);
>   }
> +
> + clean_up_unconfirmed_conntracks(net, iter, data);
>  }
>  EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup);
>
> --
> 1.9.1
>
>
> On Tue, Oct 21, 2014 at 8:31 AM, Feng Gao <gfree.wind@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am sorry to send the patch commit again because the last email is
>> not plain text and is rejected by some servers.
>>
>> This is the patch to branch master of kernel.
>>
>> The function get_next_corpse is only invoked by nf_ct_iterate_cleanup
>> in one while loop, and it will check the per cpu unconfirmed conntrack
>> list every time.
>>
>> I think the whole list of unconfirmed conntracks could be accessed by
>> one call, so the others are not necessary.
>>
>> So I move the checks outside the get_next_corpse, and create one new
>> function clean_up_unconfirmed_conntracks.
>> Let the nf_ct_iterate_cleanup invokes the
>> clean_up_unconfirmed_conntracks after the while loop.
>>
>> These codes have already exist for a long time. Firstly I think maybe
>> there is some reason, but I fail to get it.
>>
>>
>> Best Regards
>> Feng
>
>

^ permalink raw reply

* Re: [PATCH] Documentation: ptp: Fix build failure on MIPS cross builds
From: Richard Cochran @ 2014-10-21 13:39 UTC (permalink / raw)
  To: Markos Chandras
  Cc: linux-mips, Jonathan Corbet, netdev, linux-doc, linux-kernel,
	Peter Foley
In-Reply-To: <544659B1.6070509@imgtec.com>

On Tue, Oct 21, 2014 at 02:03:45PM +0100, Markos Chandras wrote:
> 
> Hmm I can't see this testptp.mk file in the mainline. What tree are you
> referring to?

Sorry, I have net-next open in front of me. The same guy who added
the buggy Makefile deleted my working makefile...

Thanks,
Richard

^ permalink raw reply

* [PATCH net-next] Removed unused function sctp_addr_is_valid()
From: Sébastien Barré @ 2014-10-21 13:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: Sébastien Barré, Vlad Yasevich, Neil Horman, linux-sctp,
	netdev

sctp_addr_is_valid() only appeared in its definition.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>

---

Note: originally sent on linux-sctp (11/10/2014, 16:29).
Sorry, I should probably have sent immediately to you and netdev as well.
Neil Horman had acked the patch, so I included
the ack here.

 include/net/sctp/structs.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 4ff3f67..806e3b5 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1116,7 +1116,6 @@ int sctp_raw_to_bind_addrs(struct sctp_bind_addr *bp, __u8 *raw, int len,
 sctp_scope_t sctp_scope(const union sctp_addr *);
 int sctp_in_scope(struct net *net, const union sctp_addr *addr, const sctp_scope_t scope);
 int sctp_is_any(struct sock *sk, const union sctp_addr *addr);
-int sctp_addr_is_valid(const union sctp_addr *addr);
 int sctp_is_ep_boundall(struct sock *sk);
 
 
-- 
tg: (61ed53d..) net-next/removed-unused-sctp-function (depends on: net-next/master)

^ permalink raw reply related

* Re: [PATCH] Documentation: ptp: Fix build failure on MIPS cross builds
From: Markos Chandras @ 2014-10-21 13:03 UTC (permalink / raw)
  To: Richard Cochran
  Cc: linux-mips, Jonathan Corbet, netdev, linux-doc, linux-kernel,
	Peter Foley
In-Reply-To: <20141021125240.GB16479@netboy>

On 10/21/2014 01:52 PM, Richard Cochran wrote:
> (adding Peter Foley to CC ...)
> 
> On Tue, Oct 21, 2014 at 01:11:22PM +0100, Markos Chandras wrote:
>> On 10/21/2014 12:07 PM, Richard Cochran wrote:
>>> On Mon, Oct 20, 2014 at 09:42:18AM +0100, Markos Chandras wrote:
>>>> diff --git a/Documentation/ptp/Makefile b/Documentation/ptp/Makefile
>>>> index 293d6c09a11f..397c1cd2eda7 100644
>>>> --- a/Documentation/ptp/Makefile
>>>> +++ b/Documentation/ptp/Makefile
>>>> @@ -1,5 +1,15 @@
>>>>  # List of programs to build
>>>> +ifndef CROSS_COMPILE
>>>>  hostprogs-y := testptp
>>>> +else
>>>> +# MIPS system calls are defined based on the -mabi that is passed
>>>> +# to the toolchain which may or may not be a valid option
>>>> +# for the host toolchain. So disable testptp if target architecture
>>>> +# is MIPS but the host isn't.
>>>> +ifndef CONFIG_MIPS
>>>> +hostprogs-y := testptp
>>>> +endif
>>>> +endif
>>>
>>> It seems like a shame to simply give up and not compile this at all.
>>> Is there no way to correctly cross compile this for MIPS?
>>>
>>> Thanks,
>>> Richard
>>>
>>
>> As far as I can see you don't cross-compile the file. You use the host
>> toolchain.
> 
> Look at Documentation/ptp/testptp.mk. There I do use $CROSS_COMPILE.

Hmm I can't see this testptp.mk file in the mainline. What tree are you
referring to?

markos linux (master) $ grep -r CROSS_COMPILE Documentation/ptp/*
markos linux (master) $

-- 
markos

^ permalink raw reply

* Re: [PATCH] Documentation: ptp: Fix build failure on MIPS cross builds
From: Richard Cochran @ 2014-10-21 12:52 UTC (permalink / raw)
  To: Markos Chandras
  Cc: linux-mips, Jonathan Corbet, netdev, linux-doc, linux-kernel,
	Peter Foley
In-Reply-To: <54464D6A.5000501@imgtec.com>

(adding Peter Foley to CC ...)

On Tue, Oct 21, 2014 at 01:11:22PM +0100, Markos Chandras wrote:
> On 10/21/2014 12:07 PM, Richard Cochran wrote:
> > On Mon, Oct 20, 2014 at 09:42:18AM +0100, Markos Chandras wrote:
> >> diff --git a/Documentation/ptp/Makefile b/Documentation/ptp/Makefile
> >> index 293d6c09a11f..397c1cd2eda7 100644
> >> --- a/Documentation/ptp/Makefile
> >> +++ b/Documentation/ptp/Makefile
> >> @@ -1,5 +1,15 @@
> >>  # List of programs to build
> >> +ifndef CROSS_COMPILE
> >>  hostprogs-y := testptp
> >> +else
> >> +# MIPS system calls are defined based on the -mabi that is passed
> >> +# to the toolchain which may or may not be a valid option
> >> +# for the host toolchain. So disable testptp if target architecture
> >> +# is MIPS but the host isn't.
> >> +ifndef CONFIG_MIPS
> >> +hostprogs-y := testptp
> >> +endif
> >> +endif
> > 
> > It seems like a shame to simply give up and not compile this at all.
> > Is there no way to correctly cross compile this for MIPS?
> > 
> > Thanks,
> > Richard
> > 
> 
> As far as I can see you don't cross-compile the file. You use the host
> toolchain.

Look at Documentation/ptp/testptp.mk. There I do use $CROSS_COMPILE.

> There is no clean way to build it for host if you have your
> kernel configured for MIPS. Perhaps maybe you could define
> __MIPS_SIM_{ABI64, ABI32, NABI32} in the gcc command line (-D...) but
> this is a bit ugly. Or maybe use the host headers instead of the ones in
> the kernel source.

Your patch is for the file, Documentation/ptp/Makefile. I did not
write that file. Maybe Peter knows how to fix it?

Thanks,
Richard


^ permalink raw reply

* BUSINESS PARTNERSHIP PROPOSAL IN INDONESIA
From: Wayne Spencer @ 2014-10-21 12:33 UTC (permalink / raw)


Hello,
Compliment, Am Spencer Wayne, I work with a pharmaceutical company
here in Canada and I got your  contact from Indo Canada Chamber of
Commerce in Canada and I decided to contact you directly for an
investment with my company if you can understand English, Bahasa.
production of Pharmaceutical products and Animal Vaccines. There are
some agricultural seeds that my company needs from Indonesia for the
production of our Animal Vaccines and Anti-viral drugs.

We have been purchasing the materials from Pakistan and Sri Lanka but
it is very scarce now and we got information that it also exist in
Indonesia but we are having problem with the owner of the products due
to language barrier.

Please take a moment out of your very busy schedule to respond back to
me at my private e mail address for more detail.
Regards,
Wayne

^ permalink raw reply

* Re: [PATCH] Documentation: ptp: Fix build failure on MIPS cross builds
From: Markos Chandras @ 2014-10-21 12:11 UTC (permalink / raw)
  To: Richard Cochran
  Cc: linux-mips, Jonathan Corbet, netdev, linux-doc, linux-kernel
In-Reply-To: <20141021110724.GA16479@netboy>

On 10/21/2014 12:07 PM, Richard Cochran wrote:
> On Mon, Oct 20, 2014 at 09:42:18AM +0100, Markos Chandras wrote:
>> diff --git a/Documentation/ptp/Makefile b/Documentation/ptp/Makefile
>> index 293d6c09a11f..397c1cd2eda7 100644
>> --- a/Documentation/ptp/Makefile
>> +++ b/Documentation/ptp/Makefile
>> @@ -1,5 +1,15 @@
>>  # List of programs to build
>> +ifndef CROSS_COMPILE
>>  hostprogs-y := testptp
>> +else
>> +# MIPS system calls are defined based on the -mabi that is passed
>> +# to the toolchain which may or may not be a valid option
>> +# for the host toolchain. So disable testptp if target architecture
>> +# is MIPS but the host isn't.
>> +ifndef CONFIG_MIPS
>> +hostprogs-y := testptp
>> +endif
>> +endif
> 
> It seems like a shame to simply give up and not compile this at all.
> Is there no way to correctly cross compile this for MIPS?
> 
> Thanks,
> Richard
> 

As far as I can see you don't cross-compile the file. You use the host
toolchain. There is no clean way to build it for host if you have your
kernel configured for MIPS. Perhaps maybe you could define
__MIPS_SIM_{ABI64, ABI32, NABI32} in the gcc command line (-D...) but
this is a bit ugly. Or maybe use the host headers instead of the ones in
the kernel source.

-- 
markos

^ permalink raw reply

* Re: [PATCH 8/9] ARM: berlin: Add phy-connection-type to BG2Q PHY
From: Sergei Shtylyov @ 2014-10-21 12:08 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: Antoine Ténart, David S. Miller, Florian Fainelli, Eric Miao,
	Haojian Zhuang, linux-arm-kernel, netdev, devicetree,
	linux-kernel
In-Reply-To: <54464796.90804@gmail.com>

On 10/21/2014 3:46 PM, Sebastian Hesselbarth wrote:

>>> From: Antoine Ténart <antoine.tenart@free-electrons.com>

>>> Internal FastEthernet PHY on BG2Q is connected via MII, add a
>>> corresponding phy-connection-type property.

>>> Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
>>> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
>>> ---
>>> Cc: "David S. Miller" <davem@davemloft.net>
>>> Cc: "Antoine Ténart" <antoine.tenart@free-electrons.com>
>>> Cc: Florian Fainelli <f.fainelli@gmail.com>
>>> Cc: Eric Miao <eric.y.miao@gmail.com>
>>> Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: netdev@vger.kernel.org
>>> Cc: devicetree@vger.kernel.org
>>> Cc: linux-kernel@vger.kernel.org
>>> ---
>>>   arch/arm/boot/dts/berlin2q.dtsi | 1 +
>>>   1 file changed, 1 insertion(+)

>>> diff --git a/arch/arm/boot/dts/berlin2q.dtsi
>>> b/arch/arm/boot/dts/berlin2q.dtsi
>>> index 891d56b03922..6dbc520bddc1 100644
>>> --- a/arch/arm/boot/dts/berlin2q.dtsi
>>> +++ b/arch/arm/boot/dts/berlin2q.dtsi
>>> @@ -127,6 +127,7 @@
>>>               status = "disabled";
>>>
>>>               ethphy0: ethernet-phy@0 {
>>> +                phy-connection-type = "mii";

>>     You're adding this prop to the PHY node? That's very weird...
>> normally, it's a property of a MDIO bus node.

> Sergei,

> How can this be a property of the MDIO bus node? Just think of an MDIO
> bus with two PHYs where one is connected via GMII and the other via
> RGMII? How should this work?

    Hm, this is an unexpected case...

> But you are right that the property should not be part of the PHY node
> but the controller node instead. I'll rework and send an update, thanks
> for the hint.

    Oh, not at all. :-)

> Sebastian

WBR, Sergei

^ permalink raw reply

* Re: Queue with wait-free enqueue, blocking dequeue, splice
From: Mathieu Desnoyers @ 2014-10-21 12:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Paul E. McKenney, netdev, Jamal Hadi Salim
In-Reply-To: <285747581.12566.1413849850921.JavaMail.zimbra@efficios.com>



----- Original Message -----
> From: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> To: "Jesper Dangaard Brouer" <brouer@redhat.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>, netdev@vger.kernel.org, "Jamal Hadi Salim" <jhs@mojatatu.com>
> Sent: Monday, October 20, 2014 8:04:10 PM
> Subject: Re: Queue with wait-free enqueue, blocking dequeue, splice
> 
> ----- Original Message -----
> > From: "Jesper Dangaard Brouer" <brouer@redhat.com>
> > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > Cc: brouer@redhat.com, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
> > netdev@vger.kernel.org, "Jamal Hadi Salim"
> > <jhs@mojatatu.com>
> > Sent: Monday, October 20, 2014 10:02:37 AM
> > Subject: Re: Queue with wait-free enqueue, blocking dequeue, splice
> > 

[...]

> > 
> > AFAIK your queue implementation is a CAS-based, Wait-Free on enqueue,
> > but Lock-Free on dequeue with the potential for waiting/blocking on
> > a enqueue processes.
> >  I'm not 100% sure, that we want this behavior for the qdisc system.
> 
> It's actually xchg-based (not CAS-based). It is indeed wait-free
> in the strictest sense of the term on enqueue (at least on x86,
> some other arch implement xchg using ll/sc, which is not strictly
> wait-free).
> 
> On dequeue, it can busy-wait for a short while that the enqueue
> completes. Note that in kernel, since we disable preemption during
> enqueue, the dequeue does not have to ever block, just busy-looping
> is OK, since the longest thing that could nest over the enqueue
> is possibly an interrupt and softirq. So yes, I guess the dequeue
> would qualify as lock-free.

Scratch this last part about dequeue lock-freedom: dequeue can
busy-wait endlessly long if an enqueue is, for instance, interrupted
by a dequeue operation that would sit within an interrupt handler.
Dequeue is therefore not "lock-free". It is not even obstruction-free.

This just means that you need to be aware of this if you use dequeue
in an execution context that can interrupt enqueue.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: RCU stall in af_unix.c, should use spin_lock_irqsave?
From: Thomas Petazzoni @ 2014-10-21 11:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Eric Dumazet, netdev, linux-kernel,
	Alexandre FOURNIER, Ezequiel Garcia, Marcin Wojtas,
	Gregory Clément
In-Reply-To: <1413887300.23173.14.camel@edumazet-glaptop2.roam.corp.google.com>

Dear Eric Dumazet,

On Tue, 21 Oct 2014 03:28:20 -0700, Eric Dumazet wrote:

> > Ok. So it's actually safe to mix spin_lock() and spin_lock_irqsave() on
> > the same lock, if you know that this lock will never ever be taken in
> > an interrupt context?
> 
> Sure.

Ok, thanks.

> > > mvpp2 is seriously brain damaged : on_each_cpu() cannot be used from
> > > a bottom half handler.
> > 
> > That's what I thought. Back to the drawing board then, to fix mvpp2.
> > 
> > Do you think there is a place where we can write down those
> > assumptions? It isn't easy to spot whether on_each_cpu() is safe to use
> > in a bottom half or not.
> > 
> 
> Really ? kernel/smp.c is full of comments.
> 
> Too many comments and people seem to not read them ;)
> 
> Take a look at smp_call_function(), which is called from on_each_cpu()

Indeed, it's written black on white on smp_call_function(). I guess
we'll have to dig into the details of the mvpp2 hardware and its
per-CPU registers and see how to handle things properly.

Thanks a lot for your input!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH 2/2][V2] xfrm6: fix a potential use after free in xfrm6_policy.c
From: Sergei Shtylyov @ 2014-10-21 11:57 UTC (permalink / raw)
  To: roy.qing.li, netdev
In-Reply-To: <1413851652-22553-1-git-send-email-roy.qing.li@gmail.com>

Hello.

On 10/21/2014 4:34 AM, roy.qing.li@gmail.com wrote:

> From: Li RongQing <roy.qing.li@gmail.com>

> pskb_may_pull() maybe change skb->data and make nh and exthdr pointer
> oboslete, so recompute the nd and exthdr

> V2: insert a space between date type(like __be16) and * as suggested by
> Sergei Shtylyov

    This passage should preferably go under the -- tear line.

> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

    You should re-post the whole series anew.

WBR, Sergei

^ permalink raw reply

* Re: Queue with wait-free enqueue, blocking dequeue, splice
From: Jesper Dangaard Brouer @ 2014-10-21 11:48 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Paul E. McKenney, netdev, Jamal Hadi Salim, brouer
In-Reply-To: <285747581.12566.1413849850921.JavaMail.zimbra@efficios.com>

On Tue, 21 Oct 2014 00:04:10 +0000 (UTC) Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> > From: "Jesper Dangaard Brouer" <brouer@redhat.com>
> > 
[...]
> > I can certainly use the wfcq_empty() check,
> 
> Not sure why you would want to use it, considering that the dequeue
> operation implies it. If there is nothing to dequeue, we just return
> immediately. Dequeue operation does not block on empty queue. It
> just busy waits if it happen to see the queue in an intermediate
> state of the enqueue operation (which is very short, few instructions
> at most, with preemption disabled).
> 
> > but I guess I need to
> > maintain a separate counter to maintain the qdisc limit, right?
> > (I would use the approximate/split counter API percpu_counter to keep
> > this scalable, and wfcq_empty() would provide an accurate empty check)
> 
> Yes for split counters, not sure why you need the empty check explicitly
> in your use-case though.

In case the qdisc is empty, we avoid/bypass the enqueue + dequeue phase
and instead transmit the packet directly.

Iif the flag TCQ_F_CAN_BYPASS is set. 
 https://github.com/torvalds/linux/blob/master/net/core/dev.c#L2799
But I'm not 100% sure that we can set this flag on a lock-less qdisc.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH 8/9] ARM: berlin: Add phy-connection-type to BG2Q PHY
From: Sebastian Hesselbarth @ 2014-10-21 11:46 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Antoine Ténart, David S. Miller, Florian Fainelli, Eric Miao,
	Haojian Zhuang, linux-arm-kernel, netdev, devicetree,
	linux-kernel
In-Reply-To: <54464401.20900@cogentembedded.com>

On 21.10.2014 13:31, Sergei Shtylyov wrote:
> Hello.
>
> On 10/21/2014 12:53 PM, Sebastian Hesselbarth wrote:
>
>> From: Antoine Ténart <antoine.tenart@free-electrons.com>
>
>> Internal FastEthernet PHY on BG2Q is connected via MII, add a
>> corresponding phy-connection-type property.
>
>> Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
>> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
>> ---
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: "Antoine Ténart" <antoine.tenart@free-electrons.com>
>> Cc: Florian Fainelli <f.fainelli@gmail.com>
>> Cc: Eric Miao <eric.y.miao@gmail.com>
>> Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: netdev@vger.kernel.org
>> Cc: devicetree@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>   arch/arm/boot/dts/berlin2q.dtsi | 1 +
>>   1 file changed, 1 insertion(+)
>
>> diff --git a/arch/arm/boot/dts/berlin2q.dtsi
>> b/arch/arm/boot/dts/berlin2q.dtsi
>> index 891d56b03922..6dbc520bddc1 100644
>> --- a/arch/arm/boot/dts/berlin2q.dtsi
>> +++ b/arch/arm/boot/dts/berlin2q.dtsi
>> @@ -127,6 +127,7 @@
>>               status = "disabled";
>>
>>               ethphy0: ethernet-phy@0 {
>> +                phy-connection-type = "mii";
>
>     You're adding this prop to the PHY node? That's very weird...
> normally, it's a property of a MDIO bus node.

Sergei,

How can this be a property of the MDIO bus node? Just think of an MDIO
bus with two PHYs where one is connected via GMII and the other via
RGMII? How should this work?

But you are right that the property should not be part of the PHY node
but the controller node instead. I'll rework and send an update, thanks
for the hint.

Sebastian

^ permalink raw reply

* Re: [PATCH 8/9] ARM: berlin: Add phy-connection-type to BG2Q PHY
From: Sergei Shtylyov @ 2014-10-21 11:31 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: Antoine Ténart, David S. Miller, Florian Fainelli, Eric Miao,
	Haojian Zhuang, linux-arm-kernel, netdev, devicetree,
	linux-kernel
In-Reply-To: <1413881627-21639-9-git-send-email-sebastian.hesselbarth@gmail.com>

Hello.

On 10/21/2014 12:53 PM, Sebastian Hesselbarth wrote:

> From: Antoine Ténart <antoine.tenart@free-electrons.com>

> Internal FastEthernet PHY on BG2Q is connected via MII, add a
> corresponding phy-connection-type property.

> Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: "Antoine Ténart" <antoine.tenart@free-electrons.com>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: Eric Miao <eric.y.miao@gmail.com>
> Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: netdev@vger.kernel.org
> Cc: devicetree@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>   arch/arm/boot/dts/berlin2q.dtsi | 1 +
>   1 file changed, 1 insertion(+)

> diff --git a/arch/arm/boot/dts/berlin2q.dtsi b/arch/arm/boot/dts/berlin2q.dtsi
> index 891d56b03922..6dbc520bddc1 100644
> --- a/arch/arm/boot/dts/berlin2q.dtsi
> +++ b/arch/arm/boot/dts/berlin2q.dtsi
> @@ -127,6 +127,7 @@
>   			status = "disabled";
>
>   			ethphy0: ethernet-phy@0 {
> +				phy-connection-type = "mii";

    You're adding this prop to the PHY node? That's very weird... normally, 
it's a property of a MDIO bus node.

[...]

WBR, Sergei

^ permalink raw reply

* Re: [PATCH] Documentation: ptp: Fix build failure on MIPS cross builds
From: Richard Cochran @ 2014-10-21 11:07 UTC (permalink / raw)
  To: Markos Chandras
  Cc: linux-mips, Jonathan Corbet, netdev, linux-doc, linux-kernel
In-Reply-To: <1413794538-28465-1-git-send-email-markos.chandras@imgtec.com>

On Mon, Oct 20, 2014 at 09:42:18AM +0100, Markos Chandras wrote:
> diff --git a/Documentation/ptp/Makefile b/Documentation/ptp/Makefile
> index 293d6c09a11f..397c1cd2eda7 100644
> --- a/Documentation/ptp/Makefile
> +++ b/Documentation/ptp/Makefile
> @@ -1,5 +1,15 @@
>  # List of programs to build
> +ifndef CROSS_COMPILE
>  hostprogs-y := testptp
> +else
> +# MIPS system calls are defined based on the -mabi that is passed
> +# to the toolchain which may or may not be a valid option
> +# for the host toolchain. So disable testptp if target architecture
> +# is MIPS but the host isn't.
> +ifndef CONFIG_MIPS
> +hostprogs-y := testptp
> +endif
> +endif

It seems like a shame to simply give up and not compile this at all.
Is there no way to correctly cross compile this for MIPS?

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH net] bpf: fix bug in eBPF verifier
From: Hannes Frederic Sowa @ 2014-10-21 10:35 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David S. Miller, Daniel Borkmann, netdev, linux-kernel
In-Reply-To: <1413842097-4380-1-git-send-email-ast@plumgrid.com>

On Mo, 2014-10-20 at 14:54 -0700, Alexei Starovoitov wrote:
> while comparing for verifier state equivalency the comparison
> was missing a check for uninitialized register.
> Make sure it does so and add a testcase.
> 
> Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier")
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
> 
> while we were staring at the verifier code with Hannes during LPC
> something felt odd in this spot. Yes. It was a bug. Fix it.
> 
>  kernel/bpf/verifier.c       |    3 ++-
>  samples/bpf/test_verifier.c |   11 +++++++++++
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 801f5f3..9f81818 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1409,7 +1409,8 @@ static bool states_equal(struct verifier_state *old, struct verifier_state *cur)
>  		if (memcmp(&old->regs[i], &cur->regs[i],
>  			   sizeof(old->regs[0])) != 0) {
>  			if (old->regs[i].type == NOT_INIT ||
> -			    old->regs[i].type == UNKNOWN_VALUE)
> +			    (old->regs[i].type == UNKNOWN_VALUE &&
> +			     cur->regs[i].type != NOT_INIT))
>  				continue;
>  			return false;
>  		}

That makes sense.

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks,
Hannes

^ permalink raw reply

* Re: RCU stall in af_unix.c, should use spin_lock_irqsave?
From: Eric Dumazet @ 2014-10-21 10:28 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: David S. Miller, Eric Dumazet, netdev, linux-kernel,
	Alexandre FOURNIER, Ezequiel Garcia, Marcin Wojtas,
	Gregory Clément
In-Reply-To: <20141021121011.53686d5f@free-electrons.com>

On Tue, 2014-10-21 at 12:10 +0200, Thomas Petazzoni wrote:

> Ok. So it's actually safe to mix spin_lock() and spin_lock_irqsave() on
> the same lock, if you know that this lock will never ever be taken in
> an interrupt context?

Sure.

> 
> > mvpp2 is seriously brain damaged : on_each_cpu() cannot be used from
> > a bottom half handler.
> 
> That's what I thought. Back to the drawing board then, to fix mvpp2.
> 
> Do you think there is a place where we can write down those
> assumptions? It isn't easy to spot whether on_each_cpu() is safe to use
> in a bottom half or not.
> 

Really ? kernel/smp.c is full of comments.

Too many comments and people seem to not read them ;)

Take a look at smp_call_function(), which is called from on_each_cpu()

^ permalink raw reply

* Re: RCU stall in af_unix.c, should use spin_lock_irqsave?
From: Thomas Petazzoni @ 2014-10-21 10:13 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David S. Miller, Eric Dumazet, netdev, linux-kernel,
	Alexandre FOURNIER, Ezequiel Garcia, Marcin Wojtas,
	Gregory Clément
In-Reply-To: <1413886132.32553.14.camel@localhost>

Dear Hannes Frederic Sowa,

On Tue, 21 Oct 2014 12:08:52 +0200, Hannes Frederic Sowa wrote:
> On Di, 2014-10-21 at 10:03 +0200, Thomas Petazzoni wrote:
> > So, the question is: is this patch the correct solution (but then other
> > usage of spin_lock in af_unix.c might also need fixing) ? Or is the
> > network driver at fault?
> 
> It feels like a false positive. Do you see one core spinning tightly on
> a lock? Does the system get unusable?

Interrupts are still enabled (for example, sysrq are still working),
but scheduling no longer takes place (all processes are blocked).

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* Re: RCU stall in af_unix.c, should use spin_lock_irqsave?
From: Thomas Petazzoni @ 2014-10-21 10:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Eric Dumazet, netdev, linux-kernel,
	Alexandre FOURNIER, Ezequiel Garcia, Marcin Wojtas,
	Gregory Clément
In-Reply-To: <1413885874.23173.11.camel@edumazet-glaptop2.roam.corp.google.com>

Dear Eric Dumazet,

On Tue, 21 Oct 2014 03:04:34 -0700, Eric Dumazet wrote:

> > So, the question is: is this patch the correct solution (but then other
> > usage of spin_lock in af_unix.c might also need fixing) ? Or is the
> > network driver at fault?
> > 
> > Thanks for your input,
> > 
> > Thomas
> 
> Locks in af_unix do not need to mask irqs. Ever.
> 
> skb_queue_tail() uses an irqsave variant because its a generic function,
> and _some_ skb list might be manipulated from hard irq handlers in pre
> NAPI drivers. But af_unix does not have an interrupt handler that could
> potentially try to lock sk_receive_queue.lock

Ok. So it's actually safe to mix spin_lock() and spin_lock_irqsave() on
the same lock, if you know that this lock will never ever be taken in
an interrupt context?

> mvpp2 is seriously brain damaged : on_each_cpu() cannot be used from
> a bottom half handler.

That's what I thought. Back to the drawing board then, to fix mvpp2.

Do you think there is a place where we can write down those
assumptions? It isn't easy to spot whether on_each_cpu() is safe to use
in a bottom half or not.

Anyway, thanks a lot for your quick feedback!

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* Re: RCU stall in af_unix.c, should use spin_lock_irqsave?
From: Hannes Frederic Sowa @ 2014-10-21 10:08 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: David S. Miller, Eric Dumazet, netdev, linux-kernel,
	Alexandre FOURNIER, Ezequiel Garcia, Marcin Wojtas,
	Gregory Clément
In-Reply-To: <20141021100313.397f4962@free-electrons.com>

On Di, 2014-10-21 at 10:03 +0200, Thomas Petazzoni wrote:
> So, the question is: is this patch the correct solution (but then other
> usage of spin_lock in af_unix.c might also need fixing) ? Or is the
> network driver at fault?

It feels like a false positive. Do you see one core spinning tightly on
a lock? Does the system get unusable?

Bye,
Hannes

^ permalink raw reply

* Re: RCU stall in af_unix.c, should use spin_lock_irqsave?
From: Eric Dumazet @ 2014-10-21 10:04 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: David S. Miller, Eric Dumazet, netdev, linux-kernel,
	Alexandre FOURNIER, Ezequiel Garcia, Marcin Wojtas,
	Gregory Clément
In-Reply-To: <20141021100313.397f4962@free-electrons.com>

On Tue, 2014-10-21 at 10:03 +0200, Thomas Petazzoni wrote:
> Hello,
> 
> I stumbled across a reproducible RCU stall related to the AF_UNIX code
> (on 3.17, on an ARM SMP system), and I'm not sure whether the problem
> is caused by:
> 
>  * The af_unix.c code using spin_lock() on sk->sk_receive_queue.lock
>    while it should be using spin_lock_irqsave().
> 
> OR
> 
>  * The mvpp2 Ethernet driver using on_each_cpu() in a softirq context.
> 
> At least, switching to use spin_lock_irqsave() instead of spin_lock()
> has proven to fix the issue (see patch below). The spinlock validator
> complains that a lockup has been detected, which might indicate that
> something is indeed wrong with the spinlock handling.
> 
> Now, to the gory details. When stress-testing a lighttpd web server
> that does a lot of CGI calls and therefore a lot of Unix socket
> traffic, I get typically after couple of minutes the following RCU
> stall:
> 
> INFO: rcu_sched self-detected stall on CPU { 0}  (t=2100 jiffies g=14665 c=14664 q=1352)
> Task dump for CPU 0:
> lighttpd        R running      0  1549      1 0x00000002
> [<c0014f94>] (unwind_backtrace) from [<c001130c>] (show_stack+0x10/0x14)
> [<c001130c>] (show_stack) from [<c0059688>] (rcu_dump_cpu_stacks+0x98/0xd4)
> [<c0059688>] (rcu_dump_cpu_stacks) from [<c005c3ec>] (rcu_check_callbacks+0x424/0x740)
> [<c005c3ec>] (rcu_check_callbacks) from [<c005e7c8>] (update_process_times+0x40/0x60)
> [<c005e7c8>] (update_process_times) from [<c006ce70>] (tick_sched_timer+0x70/0x210)
> [<c006ce70>] (tick_sched_timer) from [<c005efc4>] (__run_hrtimer.isra.35+0x6c/0x128)
> [<c005efc4>] (__run_hrtimer.isra.35) from [<c005f600>] (hrtimer_interrupt+0x11c/0x2d0)
> [<c005f600>] (hrtimer_interrupt) from [<c00148f8>] (twd_handler+0x34/0x44)
> [<c00148f8>] (twd_handler) from [<c00557ec>] (handle_percpu_devid_irq+0x6c/0x84)
> [<c00557ec>] (handle_percpu_devid_irq) from [<c0051c80>] (generic_handle_irq+0x2c/0x3c)
> [<c0051c80>] (generic_handle_irq) from [<c000eafc>] (handle_IRQ+0x40/0x90)
> [<c000eafc>] (handle_IRQ) from [<c00086d0>] (gic_handle_irq+0x2c/0x5c)
> [<c00086d0>] (gic_handle_irq) from [<c0011e40>] (__irq_svc+0x40/0x54)
> Exception stack(0xde0c1ce8 to 0xde0c1d30)
> 1ce0:                   c073a684 20010113 c06e30fc 00000003 de0c1d30 00000001
> 1d00: 00000001 0000012c dfbdcc40 ffffe70c dfbdcc48 df5bd800 00000002 de0c1d30
> 1d20: c00712d4 c00712bc 20010113 ffffffff
> [<c0011e40>] (__irq_svc) from [<c00712bc>] (generic_exec_single+0x10c/0x180)
> [<c00712bc>] (generic_exec_single) from [<c00713d0>] (smp_call_function_single+0xa0/0xcc)
> [<c00713d0>] (smp_call_function_single) from [<c0071818>] (on_each_cpu+0x2c/0x48)
> [<c0071818>] (on_each_cpu) from [<c03312dc>] (mvpp2_poll+0x30/0x594)
> [<c03312dc>] (mvpp2_poll) from [<c041d024>] (net_rx_action+0xb0/0x170)
> [<c041d024>] (net_rx_action) from [<c00220c4>] (__do_softirq+0x120/0x274)
> [<c00220c4>] (__do_softirq) from [<c0022468>] (irq_exit+0x78/0xb0)
> [<c0022468>] (irq_exit) from [<c000eb00>] (handle_IRQ+0x44/0x90)
> [<c000eb00>] (handle_IRQ) from [<c00086d0>] (gic_handle_irq+0x2c/0x5c)
> [<c00086d0>] (gic_handle_irq) from [<c0011e40>] (__irq_svc+0x40/0x54)
> Exception stack(0xde0c1eb8 to 0xde0c1f00)
> 1ea0:                                                       de1b986c 00000000
> 1ec0: 00000420 de1b986c de1b9800 d761c080 be913a34 be913a34 00000007 de0c0000
> 1ee0: d761c0a0 be913a34 00000010 de0c1f00 c0491898 c0491918 60010013 ffffffff
> [<c0011e40>] (__irq_svc) from [<c0491918>] (unix_inq_len+0x9c/0xa8)
> [<c0491918>] (unix_inq_len) from [<c049194c>] (unix_ioctl+0x28/0x88)
> [<c049194c>] (unix_ioctl) from [<c0407ccc>] (sock_ioctl+0x124/0x280)
> [<c0407ccc>] (sock_ioctl) from [<c00c11bc>] (do_vfs_ioctl+0x3fc/0x5c0)
> [<c00c11bc>] (do_vfs_ioctl) from [<c00c13b4>] (SyS_ioctl+0x34/0x5c)
> [<c00c13b4>] (SyS_ioctl) from [<c000e220>] (ret_fast_syscall+0x0/0x30)
> Task dump for CPU 1:
> kiplink_admin.f R running      0  1932   1549 0x00000000
> [<c0513a04>] (__schedule) from [<00000007>] (0x7)
> 
> If my analysis is correct, what happens on CPU0 is that:
> 
>  * lighttpd does an ioctl() on a socket, which ends up calling
>    unix_inq_len(), which tries to get a spinlock using spin_lock(). The
>    lock is probably taken.
> 
>  * while waiting for this lock, we get a network RX interrupt, which
>    schedules the network RX softirq, which ends up calling the ->poll()
>    function of the network driver, in our case mvpp2_poll().
> 
>  * since the network hardware has some per-CPU registers that we need
>    to read on all CPUs, the network driver does a on_each_cpu() call.
>    This apparently leads nowhere, as after a while, the timer interrupt
>    kicks in and decides we're not making progress anymore.
> 
> After enabling spinlock debugging, I get the following right before the
> RCU stall (note how the RCU stall happens on CPU0, while the spinlock
> lockup suspected happens on CPU1) :
> 
> BUG: spinlock lockup suspected on CPU#1, kiplink_admin.f/1938
>  lock: 0xde4998c0, .magic: dead4ead, .owner: lighttpd/1910, .owner_cpu: 0
> CPU: 1 PID: 1938 Comm: kiplink_admin.f Tainted: G        W  O   3.17.0-00017-g53fa061 #2
> [<c00154d8>] (unwind_backtrace) from [<c001183c>] (show_stack+0x10/0x14)
> [<c001183c>] (show_stack) from [<c053f560>] (dump_stack+0x9c/0xbc)
> [<c053f560>] (dump_stack) from [<c0057338>] (do_raw_spin_lock+0x118/0x18c)
> [<c0057338>] (do_raw_spin_lock) from [<c05466fc>] (_raw_spin_lock_irqsave+0x60/0x6c)
> [<c05466fc>] (_raw_spin_lock_irqsave) from [<c042a7d4>] (skb_queue_tail+0x18/0x48)
> [<c042a7d4>] (skb_queue_tail) from [<c04b9f58>] (unix_stream_sendmsg+0x1c8/0x36c)
> [<c04b9f58>] (unix_stream_sendmsg) from [<c0422eb8>] (sock_aio_write+0xcc/0xec)
> [<c0422eb8>] (sock_aio_write) from [<c00bf414>] (do_sync_write+0x80/0xa8)
> [<c00bf414>] (do_sync_write) from [<c00bfe60>] (vfs_write+0x108/0x1b0)
> [<c00bfe60>] (vfs_write) from [<c00c0418>] (SyS_write+0x40/0x94)
> [<c00c0418>] (SyS_write) from [<c000e3a0>] (ret_fast_syscall+0x0/0x48)
> 
> And interestingly, skb_queue_tail() is also taking the same spinlock as
> unix_inq_len(), except that it does so with spin_lock_irqsave(). And
> this is causing the issue: since this spin_lock_irqsave() takes place
> on CPU1, the interupts are disabled, and therefore we're not getting
> the IPI that allows the on_each_cpu() call coming from CPU0 to make
> progress, causing the lockup.
> 
> The patch below has proven to fix the issue: I was able to reproduce
> the issue in maximum 5 to 10 minutes, and with the patch the system has
> survived an entire night of testing.
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index e968843..c60205a 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -2124,11 +2124,12 @@ long unix_inq_len(struct sock *sk)
>  {
>         struct sk_buff *skb;
>         long amount = 0;
> +       unsigned long flags;
>  
>         if (sk->sk_state == TCP_LISTEN)
>                 return -EINVAL;
>  
> -       spin_lock(&sk->sk_receive_queue.lock);
> +       spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
>         if (sk->sk_type == SOCK_STREAM ||
>             sk->sk_type == SOCK_SEQPACKET) {
>                 skb_queue_walk(&sk->sk_receive_queue, skb)
> @@ -2138,7 +2139,7 @@ long unix_inq_len(struct sock *sk)
>                 if (skb)
>                         amount = skb->len;
>         }
> -       spin_unlock(&sk->sk_receive_queue.lock);
> +       spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
>  
>         return amount;
>  }
> 
> So, the question is: is this patch the correct solution (but then other
> usage of spin_lock in af_unix.c might also need fixing) ? Or is the
> network driver at fault?
> 
> Thanks for your input,
> 
> Thomas

Locks in af_unix do not need to mask irqs. Ever.

skb_queue_tail() uses an irqsave variant because its a generic function,
and _some_ skb list might be manipulated from hard irq handlers in pre
NAPI drivers. But af_unix does not have an interrupt handler that could
potentially try to lock sk_receive_queue.lock

mvpp2 is seriously brain damaged : on_each_cpu() cannot be used from
a bottom half handler.

^ permalink raw reply

* [PATCH net] net: sched: initialize bstats syncp
From: Sabrina Dubroca @ 2014-10-21  9:23 UTC (permalink / raw)
  To: davem; +Cc: netdev, john.r.fastabend

Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp.

Fixes: 22e0f8b9322c "net: sched: make bstats per cpu and estimator RCU safe"
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 net/sched/sch_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2cf61b3e633c..76f402e05bd6 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -947,7 +947,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 	if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) {
 		if (qdisc_is_percpu_stats(sch)) {
 			sch->cpu_bstats =
-				alloc_percpu(struct gnet_stats_basic_cpu);
+				netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
 			if (!sch->cpu_bstats)
 				goto err_out4;
 
-- 
2.1.2

^ permalink raw reply related

* [PATCH nf] netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation
From: Sabrina Dubroca @ 2014-10-21  9:08 UTC (permalink / raw)
  To: pablo, kaber, kadlec; +Cc: netfilter-devel, coreteam, netdev

alloc_percpu returns NULL on failure, not a negative error code.

Fixes: ff3cd7b3c922 ("netfilter: nf_tables: refactor chain statistic routines")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 net/netfilter/nf_tables_api.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 556a0dfa4abc..eb123676e1ef 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1328,10 +1328,10 @@ static int nf_tables_newchain(struct sock *nlsk, struct sk_buff *skb,
 			basechain->stats = stats;
 		} else {
 			stats = netdev_alloc_pcpu_stats(struct nft_stats);
-			if (IS_ERR(stats)) {
+			if (stats == NULL) {
 				module_put(type->owner);
 				kfree(basechain);
-				return PTR_ERR(stats);
+				return -ENOMEM;
 			}
 			rcu_assign_pointer(basechain->stats, stats);
 		}
-- 
2.1.2

^ permalink raw reply related

* [PATCH 6/9] ARM: berlin: Add BG2 ethernet DT nodes
From: Sebastian Hesselbarth @ 2014-10-21  8:53 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: David S. Miller, Antoine Ténart, Florian Fainelli, Eric Miao,
	Haojian Zhuang, linux-arm-kernel, netdev, devicetree,
	linux-kernel
In-Reply-To: <1413881627-21639-1-git-send-email-sebastian.hesselbarth@gmail.com>

Marvell BG2 has two fast ethernet controllers with internal PHY,
add the corresponding nodes to SoC dtsi.

Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
---
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Antoine Ténart" <antoine.tenart@free-electrons.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm/boot/dts/berlin2.dtsi | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/arch/arm/boot/dts/berlin2.dtsi b/arch/arm/boot/dts/berlin2.dtsi
index 9d7c810ebd0b..31d5922263d7 100644
--- a/arch/arm/boot/dts/berlin2.dtsi
+++ b/arch/arm/boot/dts/berlin2.dtsi
@@ -79,11 +79,47 @@
 			clocks = <&chip CLKID_TWD>;
 		};
 
+		eth1: ethernet@b90000 {
+			compatible = "marvell,pxa168-eth";
+			reg = <0xb90000 0x10000>;
+			clocks = <&chip CLKID_GETH1>;
+			interrupts = <GIC_SPI 24 IRQ_TYPE_LEVEL_HIGH>;
+			/* set by bootloader */
+			local-mac-address = [00 00 00 00 00 00];
+			#address-cells = <1>;
+			#size-cells = <0>;
+			phy-handle = <&ethphy1>;
+			status = "disabled";
+
+			ethphy1: ethernet-phy@0 {
+				phy-connection-type = "mii";
+				reg = <0>;
+			};
+		};
+
 		cpu-ctrl@dd0000 {
 			compatible = "marvell,berlin-cpu-ctrl";
 			reg = <0xdd0000 0x10000>;
 		};
 
+		eth0: ethernet@e50000 {
+			compatible = "marvell,pxa168-eth";
+			reg = <0xe50000 0x10000>;
+			clocks = <&chip CLKID_GETH0>;
+			interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>;
+			/* set by bootloader */
+			local-mac-address = [00 00 00 00 00 00];
+			#address-cells = <1>;
+			#size-cells = <0>;
+			phy-handle = <&ethphy0>;
+			status = "disabled";
+
+			ethphy0: ethernet-phy@0 {
+				phy-connection-type = "mii";
+				reg = <0>;
+			};
+		};
+
 		apb@e80000 {
 			compatible = "simple-bus";
 			#address-cells = <1>;
-- 
2.1.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox