Netdev List
* RE: [patch v1 1/2] dt-bindings: net: add binding documentation for mlxsw thermal control
From: Vadim Pasternak @ 2017-08-29 17:57 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org,
	ivecera-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20170829172254.GG8235-g2DYL2Zd6BY@public.gmane.org>



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew-g2DYL2Zd6BY@public.gmane.org]
> Sent: Tuesday, August 29, 2017 8:23 PM
> To: Vadim Pasternak <vadimp-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org; davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org; jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org;
> ivecera-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v1 1/2] dt-bindings: net: add binding documentation for
> mlxsw thermal control
> 
> > +- compatible		: "mellanox,mlxsw_minimal"
> 
> Interesting product name. Is there a mlxsw_maximal planned?
> 

Hi Andrew,

Thank you very much for the review.

No plans for such a product. We just have fully functional drivers for different
kinds of Mellanox switch devices, like spectrum, switchib and switchx2. All of them
work over the PCI bus. The minimal driver is meant for chassis
management, and we use it on the BMC side. It works over the I2C bus and doesn't
depend on the switch type. It has only minimal functionality, hence the name "minimal".

> > +- reg			: The I2C address of the device.
> > +
> > +Optional properties:
> > +- cooling-phandle	: phandle of the cooling device, which is to be used
> > +			  for the zone thermal control.
> > +			  If absent, cooling device controlled internally by
> > +			  the ASIC may be used.
> > +
> > +- trips			: the nodes to describe a point in the temperature
> > +			  domain with key temperatures at which cooling is
> > +			  recommended. Each node must contain the next values:
> > +			  - type: the trip type. Expected values are:
> > +			    0 - a trip point to enable active cooling;
> > +			    1 - a trip point to enable passive cooling;
> > +			    2 - a trip point to notify emergency;
> > +			  - temperature: unsigned integer indicating the trip
> > +			    temperature level in millicelsius;
> > +			  - minimum cooling state allowed within the trip node;
> > +			  - maximum cooling state allowed within the trip node;
> > +
> > +Example:
> > +	asic_thermal: mlxsw_minimal@48 {
> > +		compatible = "mlxsw_minimal";
> 
> You missed the vendor part.

Acked.

> 
> > +		reg = <0x48>;
> > +		status = "disabled";
> 
> An example with it disabled?

That is just how we use it on the BMC side. It's disabled by default, and the
device driver is connected upon an event indicating good health of the device.
I can remove it from the example, but for BMC it's actually the default state.

> 
> > +		cooling-phandle = <&cooling>;
> > +
> > +		trips {
> > +			trip@0 {
> > +				trip = <0 75000 0 0>;
> > +			};
> 
> I don't know much about the thermal subsystem. But looking at other
> example binding documents, you seem to do something different here to
> other drivers. Why do you not use what seems to be the common format:

In the mlxsw_thermal driver we have a definition for the thermal trips, which contains
the type, like "active" or "passive", the temperature in millicelsius and the min/max states
for the cooling device. These vectors define the thermal trip points.
The hysteresis parameter is not relevant.

For example, ASIC thermal sensor is associated with the cooling device like:
&pwm_tacho {
...
	cooling: fan@0 {
		reg = <0x00>;
		cooling-levels = /bits/ 8 <125 151 177 203 229 255>;
		aspeed,fan-tach-ch = /bits/ 8 <0x00>;
	};

And the below sub-nodes
			trip@0 {
				trip = <0 75000 0 0>;
			};
			trip@1 {
				trip = <2 85000 1 5>;
			};
			trip@3 {
				trip = <2 105000 5 5>;
			};

define that PWM should be at the default speed (125) while the temperature is
below 75000, at max speed (255) while the temperature is above
105000, and should step according to the temperature trend in between.
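
As a rough sketch of that intent (not the driver code), the mapping can be
modelled in plain C. The struct, the helper pwm_for() and the
"requested_state" argument are made up for illustration; in practice the
thermal governor decides how to step, and here it is reduced to a value
that gets clamped to the range of the highest trip reached:

```c
#include <assert.h>
#include <stddef.h>

/* One entry per "trip = <type temperature min max>" vector from the example. */
struct trip {
	int type;	/* trip type value as given in the vector */
	int temp_mc;	/* trip temperature in millicelsius */
	int min_state;	/* minimum cooling state allowed for this trip */
	int max_state;	/* maximum cooling state allowed for this trip */
};

/* Must be sorted by ascending temperature. */
static const struct trip trips[] = {
	{ 0,  75000, 0, 0 },
	{ 2,  85000, 1, 5 },
	{ 2, 105000, 5, 5 },
};

/* cooling-levels from the fan node: cooling state -> PWM value. */
static const unsigned char cooling_levels[] = { 125, 151, 177, 203, 229, 255 };

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical helper: clamp the requested cooling state to the range of
 * the highest trip whose temperature has been reached, then look up the
 * PWM value. Below the first trip the fan stays at state 0.
 */
static unsigned char pwm_for(int temp_mc, int requested_state)
{
	int min_state = 0, max_state = 0;
	size_t i;

	for (i = 0; i < ARRAY_LEN(trips); i++) {
		if (temp_mc >= trips[i].temp_mc) {
			min_state = trips[i].min_state;
			max_state = trips[i].max_state;
		}
	}

	if (requested_state < min_state)
		requested_state = min_state;
	if (requested_state > max_state)
		requested_state = max_state;

	assert((size_t)requested_state < ARRAY_LEN(cooling_levels));
	return cooling_levels[requested_state];
}
```

With these tables, any request below 75000 ends up at 125, any request at or
above 105000 ends up at 255, and requests in the 85000..105000 window are
kept within states 1..5.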

Thanks,
Vadim.

> 
>                trips {
>                         cpu_alert0: cpu-alert0 {
>                                 temperature = <90000>; /* millicelsius */
>                                 hysteresis = <2000>; /* millicelsius */
>                                 type = "active";
>                         };
>                         cpu_alert1: cpu-alert1 {
>                                 temperature = <100000>; /* millicelsius */
>                                 hysteresis = <2000>; /* millicelsius */
>                                 type = "passive";
>                         };
>                         cpu_crit: cpu-crit {
>                                 temperature = <125000>; /* millicelsius */
>                                 hysteresis = <2000>; /* millicelsius */
>                                 type = "critical";
>                         };
>                 };
> 
> 	Andrew


* [PATCH net] sch_htb: fix crash on init failure
From: Nikolay Aleksandrov @ 2017-08-29 17:58 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, jhs, xiyou.wangcong, jiri, roopa, Nikolay Aleksandrov

The commit below added a call to the ->destroy() callback for all qdiscs
which failed in their ->init(), but some were not prepared for such
change and can't handle partially initialized qdisc. HTB is one of them
and if any error occurs before the qdisc watchdog timer and qdisc work are
initialized then we can hit either a null ptr deref (timer->base) when
canceling in ->destroy or lockdep error info about trying to register
a non-static key and a stack dump. So to fix these two move the watchdog
timer and workqueue init before anything that can err out.
To reproduce userspace needs to send broken htb qdisc create request,
tested with a modified tc (q_htb.c).

Trace log:
[ 2710.897602] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2710.897977] IP: hrtimer_active+0x17/0x8a
[ 2710.898174] PGD 58fab067
[ 2710.898175] P4D 58fab067
[ 2710.898353] PUD 586c0067
[ 2710.898531] PMD 0
[ 2710.898710]
[ 2710.899045] Oops: 0000 [#1] SMP
[ 2710.899232] Modules linked in:
[ 2710.899419] CPU: 1 PID: 950 Comm: tc Not tainted 4.13.0-rc6+ #54
[ 2710.899646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 2710.900035] task: ffff880059ed2700 task.stack: ffff88005ad4c000
[ 2710.900262] RIP: 0010:hrtimer_active+0x17/0x8a
[ 2710.900467] RSP: 0018:ffff88005ad4f960 EFLAGS: 00010246
[ 2710.900684] RAX: 0000000000000000 RBX: ffff88003701e298 RCX: 0000000000000000
[ 2710.900933] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003701e298
[ 2710.901177] RBP: ffff88005ad4f980 R08: 0000000000000001 R09: 0000000000000001
[ 2710.901419] R10: ffff88005ad4f800 R11: 0000000000000400 R12: 0000000000000000
[ 2710.901663] R13: ffff88003701e298 R14: ffffffff822a4540 R15: ffff88005ad4fac0
[ 2710.901907] FS:  00007f2f5e90f740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000
[ 2710.902277] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2710.902500] CR2: 0000000000000000 CR3: 0000000058ca3000 CR4: 00000000000406e0
[ 2710.902744] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2710.902977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2710.903180] Call Trace:
[ 2710.903332]  hrtimer_try_to_cancel+0x1a/0x93
[ 2710.903504]  hrtimer_cancel+0x15/0x20
[ 2710.903667]  qdisc_watchdog_cancel+0x12/0x14
[ 2710.903866]  htb_destroy+0x2e/0xf7
[ 2710.904097]  qdisc_create+0x377/0x3fd
[ 2710.904330]  tc_modify_qdisc+0x4d2/0x4fd
[ 2710.904511]  rtnetlink_rcv_msg+0x188/0x197
[ 2710.904682]  ? rcu_read_unlock+0x3e/0x5f
[ 2710.904849]  ? rtnl_newlink+0x729/0x729
[ 2710.905017]  netlink_rcv_skb+0x6c/0xce
[ 2710.905183]  rtnetlink_rcv+0x23/0x2a
[ 2710.905345]  netlink_unicast+0x103/0x181
[ 2710.905511]  netlink_sendmsg+0x326/0x337
[ 2710.905679]  sock_sendmsg_nosec+0x14/0x3f
[ 2710.905847]  sock_sendmsg+0x29/0x2e
[ 2710.906010]  ___sys_sendmsg+0x209/0x28b
[ 2710.906176]  ? do_raw_spin_unlock+0xcd/0xf8
[ 2710.906346]  ? _raw_spin_unlock+0x27/0x31
[ 2710.906514]  ? __handle_mm_fault+0x651/0xdb1
[ 2710.906685]  ? check_chain_key+0xb0/0xfd
[ 2710.906855]  __sys_sendmsg+0x45/0x63
[ 2710.907018]  ? __sys_sendmsg+0x45/0x63
[ 2710.907185]  SyS_sendmsg+0x19/0x1b
[ 2710.907344]  entry_SYSCALL_64_fastpath+0x23/0xc2

Note that probably this bug goes further back because the default qdisc
handling always calls ->destroy on init failure too.

Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
Always calling qdisc destroy on init failure in the default qdisc handling
was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
one as fixes tag.

There're more fixes to come, some are much easier to trigger without
modifications to tc.
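
For readers who want the ordering problem in isolation, here is a userspace
sketch of it. It is only a model under stated assumptions: every toy_* name
is hypothetical, calloc() stands in for the zeroed qdisc allocation, and a
NULL "base" pointer stands in for the uninitialized timer that the cancel
path dereferences in the trace above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the watchdog timer; "base" models timer->base. */
struct toy_timer {
	int *base;
};

struct toy_qdisc {
	struct toy_timer watchdog;
};

static int timer_base_storage;

static void toy_timer_init(struct toy_timer *t)
{
	t->base = &timer_base_storage;
}

/* Returns false where a real cancel would dereference a NULL base. */
static bool toy_timer_cancel_is_safe(const struct toy_timer *t)
{
	return t->base != NULL;
}

/* Old order: bail out on missing options before the timer is set up. */
static int toy_init_old(struct toy_qdisc *q, const void *opt)
{
	if (!opt)
		return -EINVAL;
	toy_timer_init(&q->watchdog);
	return 0;
}

/* New order: set up everything destroy touches, then validate. */
static int toy_init_new(struct toy_qdisc *q, const void *opt)
{
	toy_timer_init(&q->watchdog);
	if (!opt)
		return -EINVAL;
	return 0;
}

/*
 * Models the create path: zeroed allocation, init with no options, and a
 * destroy step on failure. Reports whether that destroy step was safe.
 */
static bool toy_create_destroy_is_safe(int (*init)(struct toy_qdisc *,
						    const void *))
{
	struct toy_qdisc *q = calloc(1, sizeof(*q));
	bool safe = true;

	assert(q);
	if (init(q, NULL) != 0)
		safe = toy_timer_cancel_is_safe(&q->watchdog);
	free(q);
	return safe;
}
```

Calling toy_create_destroy_is_safe() with toy_init_old reports an unsafe
destroy, while toy_init_new reports a safe one, which is the whole point of
moving the two init calls up.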

 net/sched/sch_htb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 5d65ec5207e9..5bf5177b2bd3 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1017,6 +1017,9 @@ static int htb_init(struct Qdisc *sch, struct nlattr *opt)
 	int err;
 	int i;
 
+	qdisc_watchdog_init(&q->watchdog, sch);
+	INIT_WORK(&q->work, htb_work_func);
+
 	if (!opt)
 		return -EINVAL;
 
@@ -1041,8 +1044,6 @@ static int htb_init(struct Qdisc *sch, struct nlattr *opt)
 	for (i = 0; i < TC_HTB_NUMPRIO; i++)
 		INIT_LIST_HEAD(q->drops + i);
 
-	qdisc_watchdog_init(&q->watchdog, sch);
-	INIT_WORK(&q->work, htb_work_func);
 	qdisc_skb_head_init(&q->direct_queue);
 
 	if (tb[TCA_HTB_DIRECT_QLEN])
-- 
2.1.4


* Re: UDP sockets oddities
From: Eric Dumazet @ 2017-08-29 18:01 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: David Miller, netdev, pabeni, willemb
In-Reply-To: <353aa37f-c62b-f5af-8c89-f67af6509497@gmail.com>

On Tue, 2017-08-29 at 10:53 -0700, Florian Fainelli wrote:
> On 08/26/2017 11:56 AM, Florian Fainelli wrote:
> > 
> > 
> > On 08/26/2017 05:47 AM, Eric Dumazet wrote:
> >> On Fri, 2017-08-25 at 21:19 -0700, David Miller wrote:
> >>
> >>> Agreed, but the ARP resolution queue really needs to scale it's backlog
> >>> to the physical technology it is attached to.
> >> Yes, last time (in 2011) we increased the old limit of 3 packets :/
> >>
> >> We probably should match sysctl_wmem_max so that a single socket
> >> provider would hit its sk_sndbuf limit
> 
> Eric, do you want to post this as a formal patch? I don't think I
> understand these tunables enough to provide a good commit message
> anyways. Thanks!

I will post it today.

I was out of the office yesterday, rafting on the south fork of American
River ;)


This will target net-next.

Thanks.


* Re: pull-request: wireless-drivers-next 2017-08-28
From: David Miller @ 2017-08-29 18:05 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <87val8ujvp.fsf@kamboji.qca.qualcomm.com>

From: Kalle Valo <kvalo@codeaurora.org>
Date: Mon, 28 Aug 2017 12:22:34 +0300

> here's a pull request to net-next for 4.14. Because I pulled
> wireless-drivers (at least that's my suspicion) the diffstat was wrong
> again and I created it manually. I recall Linus somewhere saying that in
> certain cases this is normal and it's ok to create the diffstat
> manually, so I don't worry about this anymore.

Yeah, that's fine.

> In this pull request we also add SDIO_DEVICE_ID_CYPRESS_4373 to
> include/linux/mmc/sdio_ids.h which stands out in the diffstat.
> 
> Please let me know if there are any problems.

Pulled, thanks!


* Re: [PATCH net] sch_htb: fix crash on init failure
From: Eric Dumazet @ 2017-08-29 18:09 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, edumazet, jhs, xiyou.wangcong, jiri, roopa
In-Reply-To: <1504029487-7085-1-git-send-email-nikolay@cumulusnetworks.com>

On Tue, 2017-08-29 at 20:58 +0300, Nikolay Aleksandrov wrote:
> The commit below added a call to the ->destroy() callback for all qdiscs
> which failed in their ->init(), but some were not prepared for such
> change and can't handle partially initialized qdisc. HTB is one of them
> and if any error occurs before the qdisc watchdog timer and qdisc work are
> initialized then we can hit either a null ptr deref (timer->base) when
> canceling in ->destroy or lockdep error info about trying to register
> a non-static key and a stack dump. So to fix these two move the watchdog
> timer and workqueue init before anything that can err out.
> To reproduce userspace needs to send broken htb qdisc create request,
> tested with a modified tc (q_htb.c).

> Note that probably this bug goes further back because the default qdisc
> handling always calls ->destroy on init failure too.
> 
> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
> Always calling qdisc destroy on init failure in the default qdisc handling
> was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
> one as fixes tag.

Well, we probably need to audit init/destroy not only in net/sched, but
other parts of networking stack.

What about the qdisc_skb_head_init(&q->direct_queue) call ?

I am surprised you do not crash in __skb_queue_purge(&q->direct_queue);


* Re: [PATCH net] sch_htb: fix crash on init failure
From: Cong Wang @ 2017-08-29 18:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Nikolay Aleksandrov, Linux Kernel Network Developers,
	Eric Dumazet, Jamal Hadi Salim, Jiri Pirko, Roopa Prabhu
In-Reply-To: <1504030196.11498.82.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, Aug 29, 2017 at 11:09 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2017-08-29 at 20:58 +0300, Nikolay Aleksandrov wrote:
>> The commit below added a call to the ->destroy() callback for all qdiscs
>> which failed in their ->init(), but some were not prepared for such
>> change and can't handle partially initialized qdisc. HTB is one of them
>> and if any error occurs before the qdisc watchdog timer and qdisc work are
>> initialized then we can hit either a null ptr deref (timer->base) when
>> canceling in ->destroy or lockdep error info about trying to register
>> a non-static key and a stack dump. So to fix these two move the watchdog
>> timer and workqueue init before anything that can err out.
>> To reproduce userspace needs to send broken htb qdisc create request,
>> tested with a modified tc (q_htb.c).
>
>> Note that probably this bug goes further back because the default qdisc
>> handling always calls ->destroy on init failure too.
>>
>> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> ---
>> Always calling qdisc destroy on init failure in the default qdisc handling
>> was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
>> one as fixes tag.
>
> Well, we probably need to audit init/destroy not only in net/sched, but
> other parts of networking stack.
>
> What about the qdisc_skb_head_init(&q->direct_queue) call ?

It just zeroes the pointers:

static inline void qdisc_skb_head_init(struct qdisc_skb_head *qh)
{
        qh->head = NULL;
        qh->tail = NULL;
        qh->qlen = 0;
}

And qdisc is already kzalloc()'ed.
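
A quick userspace check of that argument, as a sketch only: toy_skb_head is
a simplified stand-in with just the three fields shown above, and calloc()
plays the role of kzalloc(). It compares a zero-filled head with one that
went through the init helper.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the queue head; only the fields shown above. */
struct toy_skb_head {
	void *head;
	void *tail;
	unsigned int qlen;
};

static void toy_skb_head_init(struct toy_skb_head *qh)
{
	qh->head = NULL;
	qh->tail = NULL;
	qh->qlen = 0;
}

/* Returns 1 if a zero-filled head already looks like an initialized one. */
static int zeroed_head_matches_init(void)
{
	struct toy_skb_head *zeroed = calloc(1, sizeof(*zeroed));
	struct toy_skb_head inited;
	int same;

	assert(zeroed);
	toy_skb_head_init(&inited);
	same = zeroed->head == inited.head &&
	       zeroed->tail == inited.tail &&
	       zeroed->qlen == inited.qlen;
	free(zeroed);
	return same;
}
```

So skipping the init call on an early error leaves this particular field in
the same state the destroy path would see after a full init.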


* Re: [PATCH net] sch_htb: fix crash on init failure
From: Eric Dumazet @ 2017-08-29 18:14 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, edumazet, jhs, xiyou.wangcong, jiri, roopa
In-Reply-To: <1504030196.11498.82.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2017-08-29 at 11:09 -0700, Eric Dumazet wrote:
> On Tue, 2017-08-29 at 20:58 +0300, Nikolay Aleksandrov wrote:
> > The commit below added a call to the ->destroy() callback for all qdiscs
> > which failed in their ->init(), but some were not prepared for such
> > change and can't handle partially initialized qdisc. HTB is one of them
> > and if any error occurs before the qdisc watchdog timer and qdisc work are
> > initialized then we can hit either a null ptr deref (timer->base) when
> > canceling in ->destroy or lockdep error info about trying to register
> > a non-static key and a stack dump. So to fix these two move the watchdog
> > timer and workqueue init before anything that can err out.
> > To reproduce userspace needs to send broken htb qdisc create request,
> > tested with a modified tc (q_htb.c).
> 
> > Note that probably this bug goes further back because the default qdisc
> > handling always calls ->destroy on init failure too.
> > 
> > Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
> > Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> > ---
> > Always calling qdisc destroy on init failure in the default qdisc handling
> > was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
> > one as fixes tag.
> 
> Well, we probably need to audit init/destroy not only in net/sched, but
> other parts of networking stack.
> 
> What about the qdisc_skb_head_init(&q->direct_queue) call ?
> 
> I am surprised you do not crash in __skb_queue_purge(&q->direct_queue);

Oh, this is because skb_peek() is happy if queue->next is NULL.


* Re: [PATCH net] sch_htb: fix crash on init failure
From: Nikolay Aleksandrov @ 2017-08-29 18:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, edumazet, jhs, xiyou.wangcong, jiri, roopa
In-Reply-To: <1504030196.11498.82.camel@edumazet-glaptop3.roam.corp.google.com>

On 29/08/17 21:09, Eric Dumazet wrote:
> On Tue, 2017-08-29 at 20:58 +0300, Nikolay Aleksandrov wrote:
>> The commit below added a call to the ->destroy() callback for all qdiscs
>> which failed in their ->init(), but some were not prepared for such
>> change and can't handle partially initialized qdisc. HTB is one of them
>> and if any error occurs before the qdisc watchdog timer and qdisc work are
>> initialized then we can hit either a null ptr deref (timer->base) when
>> canceling in ->destroy or lockdep error info about trying to register
>> a non-static key and a stack dump. So to fix these two move the watchdog
>> timer and workqueue init before anything that can err out.
>> To reproduce userspace needs to send broken htb qdisc create request,
>> tested with a modified tc (q_htb.c).
> 
>> Note that probably this bug goes further back because the default qdisc
>> handling always calls ->destroy on init failure too.
>>
>> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> ---
>> Always calling qdisc destroy on init failure in the default qdisc handling
>> was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
>> one as fixes tag.
> 
> Well, we probably need to audit init/destroy not only in net/sched, but
> other parts of networking stack.
> 

I'm not sure I follow. I hit this while working on a net/sched/ patch and had to error
out in the init() function.

> What about the qdisc_skb_head_init(&q->direct_queue) call ?
> 
> I am surprised you do not crash in __skb_queue_purge(&q->direct_queue);
> 

Hm, do you mean in __qdisc_reset_queue()?
I have only tried/seen the crash happen on qdisc add.

A much simpler and easier bug is sch_multiq (ethX non-multiq):
$ tc qdisc add dev ethX root multiq
(error EOPNOTSUPP + double free of ->queues due to free in init())
and e.g. $ ip l add dumdum type dummy
to see a crash due to the corrupted memory.
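
The pattern behind that double free can be modelled in userspace roughly
like this. It is a sketch with hypothetical toy_* names, and it counts
release calls instead of really freeing twice, so it stays well-defined;
the "free_on_error" flag stands for the free in init().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_multiq {
	void **queues;
	int frees;	/* how many times ->queues was released */
};

static void toy_release_queues(struct toy_multiq *q)
{
	/* Only the first call really frees; later calls are just counted. */
	if (q->frees++ == 0)
		free(q->queues);
}

static int toy_init(struct toy_multiq *q, int free_on_error)
{
	int err;

	q->queues = calloc(4, sizeof(*q->queues));
	if (!q->queues)
		return -ENOBUFS;

	err = -EOPNOTSUPP;	/* pretend tuning failed: device not multiqueue */
	if (err && free_on_error)
		toy_release_queues(q);	/* the free in init() */
	return err;
}

static void toy_destroy(struct toy_multiq *q)
{
	toy_release_queues(q);
}

/* Models create: destroy is called whenever init fails. */
static int toy_create_count_frees(int free_on_error)
{
	struct toy_multiq q = { 0 };

	if (toy_init(&q, free_on_error))
		toy_destroy(&q);
	return q.frees;
}
```

With the free left in init() the counter reaches 2 for a single allocation;
without it, destroy is the only place that releases ->queues and the count
is 1.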


* Re: [PATCH net] sch_htb: fix crash on init failure
From: Cong Wang @ 2017-08-29 18:20 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Linux Kernel Network Developers, Eric Dumazet, Jamal Hadi Salim,
	Jiri Pirko, Roopa Prabhu
In-Reply-To: <1504029487-7085-1-git-send-email-nikolay@cumulusnetworks.com>

On Tue, Aug 29, 2017 at 10:58 AM, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> The commit below added a call to the ->destroy() callback for all qdiscs
> which failed in their ->init(), but some were not prepared for such
> change and can't handle partially initialized qdisc. HTB is one of them
> and if any error occurs before the qdisc watchdog timer and qdisc work are
> initialized then we can hit either a null ptr deref (timer->base) when
> canceling in ->destroy or lockdep error info about trying to register
> a non-static key and a stack dump. So to fix these two move the watchdog
> timer and workqueue init before anything that can err out.
> To reproduce userspace needs to send broken htb qdisc create request,
> tested with a modified tc (q_htb.c).
>
> Trace log:
> [ 2710.897602] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 2710.897977] IP: hrtimer_active+0x17/0x8a
> [ 2710.898174] PGD 58fab067
> [ 2710.898175] P4D 58fab067
> [ 2710.898353] PUD 586c0067
> [ 2710.898531] PMD 0
> [ 2710.898710]
> [ 2710.899045] Oops: 0000 [#1] SMP
> [ 2710.899232] Modules linked in:
> [ 2710.899419] CPU: 1 PID: 950 Comm: tc Not tainted 4.13.0-rc6+ #54
> [ 2710.899646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [ 2710.900035] task: ffff880059ed2700 task.stack: ffff88005ad4c000
> [ 2710.900262] RIP: 0010:hrtimer_active+0x17/0x8a
> [ 2710.900467] RSP: 0018:ffff88005ad4f960 EFLAGS: 00010246
> [ 2710.900684] RAX: 0000000000000000 RBX: ffff88003701e298 RCX: 0000000000000000
> [ 2710.900933] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003701e298
> [ 2710.901177] RBP: ffff88005ad4f980 R08: 0000000000000001 R09: 0000000000000001
> [ 2710.901419] R10: ffff88005ad4f800 R11: 0000000000000400 R12: 0000000000000000
> [ 2710.901663] R13: ffff88003701e298 R14: ffffffff822a4540 R15: ffff88005ad4fac0
> [ 2710.901907] FS:  00007f2f5e90f740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000
> [ 2710.902277] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2710.902500] CR2: 0000000000000000 CR3: 0000000058ca3000 CR4: 00000000000406e0
> [ 2710.902744] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2710.902977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2710.903180] Call Trace:
> [ 2710.903332]  hrtimer_try_to_cancel+0x1a/0x93
> [ 2710.903504]  hrtimer_cancel+0x15/0x20
> [ 2710.903667]  qdisc_watchdog_cancel+0x12/0x14
> [ 2710.903866]  htb_destroy+0x2e/0xf7
> [ 2710.904097]  qdisc_create+0x377/0x3fd
> [ 2710.904330]  tc_modify_qdisc+0x4d2/0x4fd
> [ 2710.904511]  rtnetlink_rcv_msg+0x188/0x197
> [ 2710.904682]  ? rcu_read_unlock+0x3e/0x5f
> [ 2710.904849]  ? rtnl_newlink+0x729/0x729
> [ 2710.905017]  netlink_rcv_skb+0x6c/0xce
> [ 2710.905183]  rtnetlink_rcv+0x23/0x2a
> [ 2710.905345]  netlink_unicast+0x103/0x181
> [ 2710.905511]  netlink_sendmsg+0x326/0x337
> [ 2710.905679]  sock_sendmsg_nosec+0x14/0x3f
> [ 2710.905847]  sock_sendmsg+0x29/0x2e
> [ 2710.906010]  ___sys_sendmsg+0x209/0x28b
> [ 2710.906176]  ? do_raw_spin_unlock+0xcd/0xf8
> [ 2710.906346]  ? _raw_spin_unlock+0x27/0x31
> [ 2710.906514]  ? __handle_mm_fault+0x651/0xdb1
> [ 2710.906685]  ? check_chain_key+0xb0/0xfd
> [ 2710.906855]  __sys_sendmsg+0x45/0x63
> [ 2710.907018]  ? __sys_sendmsg+0x45/0x63
> [ 2710.907185]  SyS_sendmsg+0x19/0x1b
> [ 2710.907344]  entry_SYSCALL_64_fastpath+0x23/0xc2
>
> Note that probably this bug goes further back because the default qdisc
> handling always calls ->destroy on init failure too.
>
> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
> Always calling qdisc destroy on init failure in the default qdisc handling
> was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
> one as fixes tag.
>
> There're more fixes to come, some are much easier to trigger without
> modifications to tc.

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>


* Re: [PATCH net] sch_htb: fix crash on init failure
From: Eric Dumazet @ 2017-08-29 18:23 UTC (permalink / raw)
  To: Cong Wang
  Cc: Nikolay Aleksandrov, Linux Kernel Network Developers,
	Eric Dumazet, Jamal Hadi Salim, Jiri Pirko, Roopa Prabhu
In-Reply-To: <CAM_iQpUh2LjwuzNQ1OqRRzKMJ1uARp0ouOQ42HEsAC5XJm_BWg@mail.gmail.com>

On Tue, 2017-08-29 at 11:13 -0700, Cong Wang wrote:
> On Tue, Aug 29, 2017 at 11:09 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2017-08-29 at 20:58 +0300, Nikolay Aleksandrov wrote:
> >> The commit below added a call to the ->destroy() callback for all qdiscs
> >> which failed in their ->init(), but some were not prepared for such
> >> change and can't handle partially initialized qdisc. HTB is one of them
> >> and if any error occurs before the qdisc watchdog timer and qdisc work are
> >> initialized then we can hit either a null ptr deref (timer->base) when
> >> canceling in ->destroy or lockdep error info about trying to register
> >> a non-static key and a stack dump. So to fix these two move the watchdog
> >> timer and workqueue init before anything that can err out.
> >> To reproduce userspace needs to send broken htb qdisc create request,
> >> tested with a modified tc (q_htb.c).
> >
> >> Note that probably this bug goes further back because the default qdisc
> >> handling always calls ->destroy on init failure too.
> >>
> >> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
> >> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >> ---
> >> Always calling qdisc destroy on init failure in the default qdisc handling
> >> was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
> >> one as fixes tag.
> >
> > Well, we probably need to audit init/destroy not only in net/sched, but
> > other parts of networking stack.
> >
> > What about the qdisc_skb_head_init(&q->direct_queue) call ?
> 
> It just zero the pointers:
> 
> static inline void qdisc_skb_head_init(struct qdisc_skb_head *qh)
> {
>         qh->head = NULL;
>         qh->tail = NULL;
>         qh->qlen = 0;
> }
> 
> And qdisc is already kzalloc()'ed.

Yeah, I was looking at some old tree, not having the
__qdisc_reset_queue() yet.


* [PATCH net] sch_multiq: fix double free on init failure
From: Nikolay Aleksandrov @ 2017-08-29 18:26 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, jhs, xiyou.wangcong, jiri, roopa, Nikolay Aleksandrov

The below commit added a call to ->destroy() on init failure, but multiq
still frees ->queues on error in init, but ->queues is also freed by
->destroy() thus we get double free and corrupted memory.

Very easy to reproduce (eth0 not multiqueue):
$ tc qdisc add dev eth0 root multiq
RTNETLINK answers: Operation not supported
$ ip l add dumdum type dummy
(crash)

Trace log:
[ 3929.467747] general protection fault: 0000 [#1] SMP
[ 3929.468083] Modules linked in:
[ 3929.468302] CPU: 3 PID: 967 Comm: ip Not tainted 4.13.0-rc6+ #56
[ 3929.468625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 3929.469124] task: ffff88003716a700 task.stack: ffff88005872c000
[ 3929.469449] RIP: 0010:__kmalloc_track_caller+0x117/0x1be
[ 3929.469746] RSP: 0018:ffff88005872f6a0 EFLAGS: 00010246
[ 3929.470042] RAX: 00000000000002de RBX: 0000000058a59000 RCX: 00000000000002df
[ 3929.470406] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff821f7020
[ 3929.470770] RBP: ffff88005872f6e8 R08: 000000000001f010 R09: 0000000000000000
[ 3929.471133] R10: ffff88005872f730 R11: 0000000000008cdd R12: ff006d75646d7564
[ 3929.471496] R13: 00000000014000c0 R14: ffff88005b403c00 R15: ffff88005b403c00
[ 3929.471869] FS:  00007f0b70480740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
[ 3929.472286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3929.472677] CR2: 00007ffcee4f3000 CR3: 0000000059d45000 CR4: 00000000000406e0
[ 3929.473209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3929.474109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3929.474873] Call Trace:
[ 3929.475337]  ? kstrdup_const+0x23/0x25
[ 3929.475863]  kstrdup+0x2e/0x4b
[ 3929.476338]  kstrdup_const+0x23/0x25
[ 3929.478084]  __kernfs_new_node+0x28/0xbc
[ 3929.478478]  kernfs_new_node+0x35/0x55
[ 3929.478929]  kernfs_create_link+0x23/0x76
[ 3929.479478]  sysfs_do_create_link_sd.isra.2+0x85/0xd7
[ 3929.480096]  sysfs_create_link+0x33/0x35
[ 3929.480649]  device_add+0x200/0x589
[ 3929.481184]  netdev_register_kobject+0x7c/0x12f
[ 3929.481711]  register_netdevice+0x373/0x471
[ 3929.482174]  rtnl_newlink+0x614/0x729
[ 3929.482610]  ? rtnl_newlink+0x17f/0x729
[ 3929.483080]  rtnetlink_rcv_msg+0x188/0x197
[ 3929.483533]  ? rcu_read_unlock+0x3e/0x5f
[ 3929.483984]  ? rtnl_newlink+0x729/0x729
[ 3929.484420]  netlink_rcv_skb+0x6c/0xce
[ 3929.484858]  rtnetlink_rcv+0x23/0x2a
[ 3929.485291]  netlink_unicast+0x103/0x181
[ 3929.485735]  netlink_sendmsg+0x326/0x337
[ 3929.486181]  sock_sendmsg_nosec+0x14/0x3f
[ 3929.486614]  sock_sendmsg+0x29/0x2e
[ 3929.486973]  ___sys_sendmsg+0x209/0x28b
[ 3929.487340]  ? do_raw_spin_unlock+0xcd/0xf8
[ 3929.487719]  ? _raw_spin_unlock+0x27/0x31
[ 3929.488092]  ? __handle_mm_fault+0x651/0xdb1
[ 3929.488471]  ? check_chain_key+0xb0/0xfd
[ 3929.488847]  __sys_sendmsg+0x45/0x63
[ 3929.489206]  ? __sys_sendmsg+0x45/0x63
[ 3929.489576]  SyS_sendmsg+0x19/0x1b
[ 3929.489901]  entry_SYSCALL_64_fastpath+0x23/0xc2
[ 3929.490172] RIP: 0033:0x7f0b6fb93690
[ 3929.490423] RSP: 002b:00007ffcee4ed588 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3929.490881] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f0b6fb93690
[ 3929.491198] RDX: 0000000000000000 RSI: 00007ffcee4ed5d0 RDI: 0000000000000003
[ 3929.491521] RBP: ffff88005872ff98 R08: 0000000000000001 R09: 0000000000000000
[ 3929.491801] R10: 00007ffcee4ed350 R11: 0000000000000246 R12: 0000000000000002
[ 3929.492075] R13: 000000000066f1a0 R14: 00007ffcee4f5680 R15: 0000000000000000
[ 3929.492352]  ? trace_hardirqs_off_caller+0xa7/0xcf
[ 3929.492590] Code: 8b 45 c0 48 8b 45 b8 74 17 48 8b 4d c8 83 ca ff 44
89 ee 4c 89 f7 e8 83 ca ff ff 49 89 c4 eb 49 49 63 56 20 48 8d 48 01 4d
8b 06 <49> 8b 1c 14 48 89 c2 4c 89 e0 65 49 0f c7 08 0f 94 c0 83 f0 01
[ 3929.493335] RIP: __kmalloc_track_caller+0x117/0x1be RSP: ffff88005872f6a0

Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 net/sched/sch_multiq.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index f143b7bbaa0d..b07f8b01aa07 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -259,9 +259,6 @@ static int multiq_init(struct Qdisc *sch, struct nlattr *opt)
 
 	err = multiq_tune(sch, opt);
 
-	if (err)
-		kfree(q->queues);
-
 	return err;
 }
 
-- 
2.1.4


* Re: [PATCH net] sch_multiq: fix double free on init failure
From: Eric Dumazet @ 2017-08-29 18:40 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, edumazet, jhs, xiyou.wangcong, jiri, roopa
In-Reply-To: <1504031188-11434-1-git-send-email-nikolay@cumulusnetworks.com>

On Tue, 2017-08-29 at 21:26 +0300, Nikolay Aleksandrov wrote:
> The below commit added a call to ->destroy() on init failure, but multiq
> still frees ->queues on error in init, but ->queues is also freed by
> ->destroy() thus we get double free and corrupted memory.
> 
> Very easy to reproduce (eth0 not multiqueue):
> $ tc qdisc add dev eth0 root multiq
> RTNETLINK answers: Operation not supported
> $ ip l add dumdum type dummy
> (crash)

> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
>  net/sched/sch_multiq.c | 3 ---
>  1 file changed, 3 deletions(-)
> 

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH] rtlwifi: rtl8822be: Add firmware for new driver/device
From: Kyle McMartin @ 2017-08-29 18:42 UTC (permalink / raw)
  To: Larry Finger
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-firmware-DgEjT+Ai2ygdnm+yROfE0A
In-Reply-To: <20170825142340.11646-1-Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>

On Fri, Aug 25, 2017 at 09:23:40AM -0500, Larry Finger wrote:
> A driver for the RTL8822BE has been added to staging. This commit supplies
> the firmware for it.
> 
> Signed-off-by: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
> ---

applied. thanks Larry.

--kyle

^ permalink raw reply

* [patch v1 0/2] add support for the external thermal zone and cooling device binding for Mellanox network devices
From: Vadim Pasternak @ 2017-08-29 18:45 UTC (permalink / raw)
  To: robh+dt-DgEjT+Ai2ygdnm+yROfE0A, davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: jiri-rHqAuBHg3fBzbRFIqnYvSA, ivecera-H+wXaHxf7aLQT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Vadim Pasternak

It allows binding the ASIC thermal sensor to externally defined thermal
zones and cooling devices. Such definitions can be provided from DTS.

Vadim Pasternak (2):
  dt-bindings: net: add binding documentation for mlxsw thermal control
  mlxsw: core: add support for the external thermal zone setting (by
    DTS)

 .../devicetree/bindings/net/mellanox,mlxsw.txt     |  46 +++++++++
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 107 ++++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlxsw/minimal.c      |   6 ++
 3 files changed, 155 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/mellanox,mlxsw.txt

-- 
2.1.4


^ permalink raw reply

* [patch v1 1/2] dt-bindings: net: add binding documentation for mlxsw thermal control
From: Vadim Pasternak @ 2017-08-29 18:45 UTC (permalink / raw)
  To: robh+dt-DgEjT+Ai2ygdnm+yROfE0A, davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: jiri-rHqAuBHg3fBzbRFIqnYvSA, ivecera-H+wXaHxf7aLQT0dZR+AlfA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Vadim Pasternak
In-Reply-To: <1504032311-195988-1-git-send-email-vadimp-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add binding document for Mellanox switch devices.

Signed-off-by: Vadim Pasternak <vadimp-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 .../devicetree/bindings/net/mellanox,mlxsw.txt     | 46 ++++++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/mellanox,mlxsw.txt

diff --git a/Documentation/devicetree/bindings/net/mellanox,mlxsw.txt b/Documentation/devicetree/bindings/net/mellanox,mlxsw.txt
new file mode 100644
index 0000000..55de5ff
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mellanox,mlxsw.txt
@@ -0,0 +1,46 @@
+Mellanox Technologies Switch ASICs
+
+This file describes the contents of the device node
+for the Switch ASIC interface.
+
+Required properties:
+- compatible		: "mellanox,mlxsw_minimal"
+- reg			: The I2C address of the device.
+
+Optional properties:
+- cooling-phandle	: phandle of the cooling device, which is to be used
+			  for the zone thermal control.
+			  If absent, cooling device controlled internally by
+			  the ASIC may be used.
+
+- trips			: the nodes to describe a point in the temperature
+			  domain with key temperatures at which cooling is
+			  recommended. Each node must contain the next values:
+			  - type: the trip type. Expected values are:
+			    0 - a trip point to enable active cooling;
+			    1 - a trip point to enable passive cooling;
+			    2 - a trip point to notify emergency;
+			  - temperature: unsigned integer indicating the trip
+			    temperature level in millicelsius;
+			  - minimum cooling state allowed within the trip node;
+			  - maximum cooling state allowed within the trip node;
+
+Example:
+	asic_thermal: mlxsw_minimal@48 {
+		compatible = "mellanox,mlxsw_minimal";
+		reg = <0x48>;
+		status = "disabled";
+		cooling-phandle = <&cooling>;
+
+		trips {
+			trip@0 {
+				trip = <0 75000 0 0>;
+			};
+			trip@1 {
+				trip = <2 85000 1 5>;
+			};
+			trip@3 {
+				trip = <2 105000 5 5>;
+			};
+		};
+	};
-- 
2.1.4


^ permalink raw reply related

* [PATCH net-next 0/4] Endian fixes for SYSTEMPORT/SF2/MDIO
From: Florian Fainelli @ 2017-08-29 18:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, opendmb, andrew, vivien.didelot, Florian Fainelli

Hi David,

While testing an ARM BE kernel for kinks, the 3 drivers below stopped
working. The reason quickly became obvious: the register space remains LE
(hardwired), except on Broadcom MIPS, where it follows the CPU's native
endianness (let's call that a feature).

Thanks!

Florian Fainelli (4):
  net: systemport: Use correct I/O accessors
  net: dsa: bcm_sf2: Use correct I/O accessors
  net: systemport: Set correct RSB endian bits based on host
  net: phy: mdio-bcm-unimac: Use correct I/O accessors

 drivers/net/dsa/bcm_sf2.h                  | 12 +++++------
 drivers/net/ethernet/broadcom/bcmsysport.c | 21 ++++++++++++--------
 drivers/net/phy/mdio-bcm-unimac.c          | 32 ++++++++++++++++++++++++------
 3 files changed, 45 insertions(+), 20 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH net-next 1/4] net: systemport: Use correct I/O accessors
From: Florian Fainelli @ 2017-08-29 18:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, opendmb, andrew, vivien.didelot, Florian Fainelli
In-Reply-To: <1504031985-52808-1-git-send-email-f.fainelli@gmail.com>

The SYSTEMPORT driver currently uses __raw_{read,write}l which means
native I/O endian. This works correctly for an ARM LE kernel (default)
but fails miserably on an ARM BE (BE8) kernel where registers are kept
little endian, so replace uses with {read,write}l_relaxed here which is
what we want because this is all performance sensitive code.
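
To illustrate the difference, here is a minimal userspace sketch (not the
kernel implementation). The helpers load_native32() and load_le32() are
hypothetical stand-ins for __raw_readl() and readl_relaxed(); real
accessors also deal with volatile I/O memory, which is ignored here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __raw_readl(): the bytes are interpreted in
 * the host's native byte order, with no conversion.
 */
static uint32_t load_native32(const uint8_t *reg)
{
	uint32_t v;

	memcpy(&v, reg, sizeof(v));
	return v;
}

/* Hypothetical stand-in for readl_relaxed(): the register is defined as
 * little endian, so the value is assembled byte by byte. The result is
 * the same on LE and BE hosts.
 */
static uint32_t load_le32(const uint8_t *reg)
{
	return (uint32_t)reg[0] |
	       ((uint32_t)reg[1] << 8) |
	       ((uint32_t)reg[2] << 16) |
	       ((uint32_t)reg[3] << 24);
}
```

On an LE host both helpers return the same value; on a BE host only
load_le32() returns what the hardware meant.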

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index b3a21418f511..a7e84292af50 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -32,13 +32,13 @@
 #define BCM_SYSPORT_IO_MACRO(name, offset) \
 static inline u32 name##_readl(struct bcm_sysport_priv *priv, u32 off)	\
 {									\
-	u32 reg = __raw_readl(priv->base + offset + off);		\
+	u32 reg = readl_relaxed(priv->base + offset + off);		\
 	return reg;							\
 }									\
 static inline void name##_writel(struct bcm_sysport_priv *priv,		\
 				  u32 val, u32 off)			\
 {									\
-	__raw_writel(val, priv->base + offset + off);			\
+	writel_relaxed(val, priv->base + offset + off);			\
 }									\
 
 BCM_SYSPORT_IO_MACRO(intrl2_0, SYS_PORT_INTRL2_0_OFFSET);
@@ -59,14 +59,14 @@ static inline u32 rdma_readl(struct bcm_sysport_priv *priv, u32 off)
 {
 	if (priv->is_lite && off >= RDMA_STATUS)
 		off += 4;
-	return __raw_readl(priv->base + SYS_PORT_RDMA_OFFSET + off);
+	return readl_relaxed(priv->base + SYS_PORT_RDMA_OFFSET + off);
 }
 
 static inline void rdma_writel(struct bcm_sysport_priv *priv, u32 val, u32 off)
 {
 	if (priv->is_lite && off >= RDMA_STATUS)
 		off += 4;
-	__raw_writel(val, priv->base + SYS_PORT_RDMA_OFFSET + off);
+	writel_relaxed(val, priv->base + SYS_PORT_RDMA_OFFSET + off);
 }
 
 static inline u32 tdma_control_bit(struct bcm_sysport_priv *priv, u32 bit)
@@ -110,10 +110,10 @@ static inline void dma_desc_set_addr(struct bcm_sysport_priv *priv,
 				     dma_addr_t addr)
 {
 #ifdef CONFIG_PHYS_ADDR_T_64BIT
-	__raw_writel(upper_32_bits(addr) & DESC_ADDR_HI_MASK,
+	writel_relaxed(upper_32_bits(addr) & DESC_ADDR_HI_MASK,
 		     d + DESC_ADDR_HI_STATUS_LEN);
 #endif
-	__raw_writel(lower_32_bits(addr), d + DESC_ADDR_LO);
+	writel_relaxed(lower_32_bits(addr), d + DESC_ADDR_LO);
 }
 
 static inline void tdma_port_write_desc_addr(struct bcm_sysport_priv *priv,
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 2/4] net: dsa: bcm_sf2: Use correct I/O accessors
From: Florian Fainelli @ 2017-08-29 18:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, opendmb, andrew, vivien.didelot, Florian Fainelli
In-Reply-To: <1504031985-52808-1-git-send-email-f.fainelli@gmail.com>

The Starfighter 2 driver currently uses __raw_{read,write}l which means
native I/O endian. This works correctly for an ARM LE kernel (default)
but fails miserably on an ARM BE (BE8) kernel where registers are kept
little endian, so replace uses with {read,write}l_relaxed here which is
what we want because this is all performance sensitive code.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/bcm_sf2.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h
index 7d3030e04f11..d9c96b281fc0 100644
--- a/drivers/net/dsa/bcm_sf2.h
+++ b/drivers/net/dsa/bcm_sf2.h
@@ -130,12 +130,12 @@ static inline u32 bcm_sf2_mangle_addr(struct bcm_sf2_priv *priv, u32 off)
 #define SF2_IO_MACRO(name) \
 static inline u32 name##_readl(struct bcm_sf2_priv *priv, u32 off)	\
 {									\
-	return __raw_readl(priv->name + off);				\
+	return readl_relaxed(priv->name + off);				\
 }									\
 static inline void name##_writel(struct bcm_sf2_priv *priv,		\
 				  u32 val, u32 off)			\
 {									\
-	__raw_writel(val, priv->name + off);				\
+	writel_relaxed(val, priv->name + off);				\
 }									\
 
 /* Accesses to 64-bits register requires us to latch the hi/lo pairs
@@ -179,23 +179,23 @@ static inline u32 bcm_sf2_mangle_addr(struct bcm_sf2_priv *priv, u32 off)
 static inline u32 core_readl(struct bcm_sf2_priv *priv, u32 off)
 {
 	u32 tmp = bcm_sf2_mangle_addr(priv, off);
-	return __raw_readl(priv->core + tmp);
+	return readl_relaxed(priv->core + tmp);
 }
 
 static inline void core_writel(struct bcm_sf2_priv *priv, u32 val, u32 off)
 {
 	u32 tmp = bcm_sf2_mangle_addr(priv, off);
-	__raw_writel(val, priv->core + tmp);
+	writel_relaxed(val, priv->core + tmp);
 }
 
 static inline u32 reg_readl(struct bcm_sf2_priv *priv, u16 off)
 {
-	return __raw_readl(priv->reg + priv->reg_offsets[off]);
+	return readl_relaxed(priv->reg + priv->reg_offsets[off]);
 }
 
 static inline void reg_writel(struct bcm_sf2_priv *priv, u32 val, u16 off)
 {
-	__raw_writel(val, priv->reg + priv->reg_offsets[off]);
+	writel_relaxed(val, priv->reg + priv->reg_offsets[off]);
 }
 
 SF2_IO64_MACRO(core);
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 3/4] net: systemport: Set correct RSB endian bits based on host
From: Florian Fainelli @ 2017-08-29 18:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, opendmb, andrew, vivien.didelot, Florian Fainelli
In-Reply-To: <1504031985-52808-1-git-send-email-f.fainelli@gmail.com>

LE CPU:
* set RSB_SWAP0 (both SYSTEMPORT and SYSTEMPORT Lite)
* clear RSB_SWAP1 (SYSTEMPORT Lite only)

BE CPU:
* clear RSB_SWAP0 (both SYSTEMPORT and SYSTEMPORT Lite)
* set RSB_SWAP1 (SYSTEMPORT Lite only)

With these settings, we have the Receive Status Block always match the
host endian and we do not need to perform any conversion.
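
The rules above can be sketched as a pure function, which makes the four
cases easy to check in isolation. This is only an illustration: the bit
positions below are made up (the real definitions live in the driver's
header), and rsb_swap_bits() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; the real values may differ. */
#define RSB_SWAP0	(1u << 0)
#define RSB_SWAP1	(1u << 1)

static uint32_t rsb_swap_bits(uint32_t reg, bool host_big_endian, bool is_lite)
{
	if (!host_big_endian) {
		/* LE CPU: set SWAP0; on Lite, also clear SWAP1 */
		if (is_lite)
			reg &= ~RSB_SWAP1;
		reg |= RSB_SWAP0;
	} else {
		/* BE CPU: clear SWAP0; on Lite, also set SWAP1 */
		if (is_lite)
			reg |= RSB_SWAP1;
		reg &= ~RSB_SWAP0;
	}
	return reg;
}
```

Note that RSB_SWAP1 is left untouched when is_lite is false, matching the
"SYSTEMPORT Lite only" qualifier above.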

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index a7e84292af50..7c7558a6b720 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1762,9 +1762,14 @@ static void rbuf_init(struct bcm_sysport_priv *priv)
 	reg = rbuf_readl(priv, RBUF_CONTROL);
 	reg |= RBUF_4B_ALGN | RBUF_RSB_EN;
 	/* Set a correct RSB format on SYSTEMPORT Lite */
-	if (priv->is_lite) {
-		reg &= ~RBUF_RSB_SWAP1;
+	if (!IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) {
+		if (priv->is_lite)
+			reg &= ~RBUF_RSB_SWAP1;
 		reg |= RBUF_RSB_SWAP0;
+	} else {
+		if (priv->is_lite)
+			reg |= RBUF_RSB_SWAP1;
+		reg &= ~RBUF_RSB_SWAP0;
 	}
 	rbuf_writel(priv, reg, RBUF_CONTROL);
 }
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 4/4] net: phy: mdio-bcm-unimac: Use correct I/O accessors
From: Florian Fainelli @ 2017-08-29 18:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, opendmb, andrew, vivien.didelot, Florian Fainelli
In-Reply-To: <1504031985-52808-1-git-send-email-f.fainelli@gmail.com>

The driver currently uses __raw_{read,write}l which works for all
platforms supported: Broadcom MIPS LE/BE (native endian), ARM LE (native
endian) but not ARM BE (registers are still LE). Switch to using the
proper accessors for all platforms and explain why Broadcom MIPS BE is
special here. In doing so, we introduce a couple of helper functions to
abstract these differences.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/mdio-bcm-unimac.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/net/phy/mdio-bcm-unimac.c b/drivers/net/phy/mdio-bcm-unimac.c
index 73c5267a11fd..08e0647b85e2 100644
--- a/drivers/net/phy/mdio-bcm-unimac.c
+++ b/drivers/net/phy/mdio-bcm-unimac.c
@@ -47,18 +47,38 @@ struct unimac_mdio_priv {
 	void			*wait_func_data;
 };
 
+static inline u32 unimac_mdio_readl(struct unimac_mdio_priv *priv, u32 offset)
+{
+	/* MIPS chips strapped for BE will automagically configure the
+	 * peripheral registers for CPU-native byte order.
+	 */
+	if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
+		return __raw_readl(priv->base + offset);
+	else
+		return readl_relaxed(priv->base + offset);
+}
+
+static inline void unimac_mdio_writel(struct unimac_mdio_priv *priv, u32 val,
+				      u32 offset)
+{
+	if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
+		__raw_writel(val, priv->base + offset);
+	else
+		writel_relaxed(val, priv->base + offset);
+}
+
 static inline void unimac_mdio_start(struct unimac_mdio_priv *priv)
 {
 	u32 reg;
 
-	reg = __raw_readl(priv->base + MDIO_CMD);
+	reg = unimac_mdio_readl(priv, MDIO_CMD);
 	reg |= MDIO_START_BUSY;
-	__raw_writel(reg, priv->base + MDIO_CMD);
+	unimac_mdio_writel(priv, reg, MDIO_CMD);
 }
 
 static inline unsigned int unimac_mdio_busy(struct unimac_mdio_priv *priv)
 {
-	return __raw_readl(priv->base + MDIO_CMD) & MDIO_START_BUSY;
+	return unimac_mdio_readl(priv, MDIO_CMD) & MDIO_START_BUSY;
 }
 
 static int unimac_mdio_poll(void *wait_func_data)
@@ -87,7 +107,7 @@ static int unimac_mdio_read(struct mii_bus *bus, int phy_id, int reg)
 
 	/* Prepare the read operation */
 	cmd = MDIO_RD | (phy_id << MDIO_PMD_SHIFT) | (reg << MDIO_REG_SHIFT);
-	__raw_writel(cmd, priv->base + MDIO_CMD);
+	unimac_mdio_writel(priv, cmd, MDIO_CMD);
 
 	/* Start MDIO transaction */
 	unimac_mdio_start(priv);
@@ -96,7 +116,7 @@ static int unimac_mdio_read(struct mii_bus *bus, int phy_id, int reg)
 	if (ret)
 		return ret;
 
-	cmd = __raw_readl(priv->base + MDIO_CMD);
+	cmd = unimac_mdio_readl(priv, MDIO_CMD);
 
 	/* Some broken devices are known not to release the line during
 	 * turn-around, e.g: Broadcom BCM53125 external switches, so check for
@@ -118,7 +138,7 @@ static int unimac_mdio_write(struct mii_bus *bus, int phy_id,
 	/* Prepare the write operation */
 	cmd = MDIO_WR | (phy_id << MDIO_PMD_SHIFT) |
 		(reg << MDIO_REG_SHIFT) | (0xffff & val);
-	__raw_writel(cmd, priv->base + MDIO_CMD);
+	unimac_mdio_writel(priv, cmd, MDIO_CMD);
 
 	unimac_mdio_start(priv);
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH] rsi: remove memset before memcpy
From: Himanshu Jha @ 2017-08-29 18:54 UTC (permalink / raw)
  To: kvalo; +Cc: amit.karwar, linux-wireless, netdev, linux-kernel, Himanshu Jha

Calling memcpy immediately after memset with the same region of memory
makes memset redundant.
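
A small userspace sketch of the equivalence, assuming (as in the two hunks
below) that both calls cover exactly the same length. The helper names are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Old pattern: zero the block, then overwrite every byte of it. */
static void copy_block_with_memset(unsigned char *dst,
				   const unsigned char *src, size_t len)
{
	memset(dst, 0, len);
	memcpy(dst, src, len);
}

/* New pattern: memcpy alone writes the same len bytes. */
static void copy_block(unsigned char *dst, const unsigned char *src,
		       size_t len)
{
	memcpy(dst, src, len);
}
```

The memset would only matter if the copy were shorter than the cleared
region, which is not the case here.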

Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
---
 drivers/net/wireless/rsi/rsi_91x_sdio.c | 1 -
 drivers/net/wireless/rsi/rsi_91x_usb.c  | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio.c b/drivers/net/wireless/rsi/rsi_91x_sdio.c
index 742f6cd..8d3a483 100644
--- a/drivers/net/wireless/rsi/rsi_91x_sdio.c
+++ b/drivers/net/wireless/rsi/rsi_91x_sdio.c
@@ -584,7 +584,6 @@ static int rsi_sdio_load_data_master_write(struct rsi_hw *adapter,
 	}
 
 	for (offset = 0, i = 0; i < num_blocks; i++, offset += block_size) {
-		memset(temp_buf, 0, block_size);
 		memcpy(temp_buf, ta_firmware + offset, block_size);
 		lsb_address = (u16)base_address;
 		status = rsi_sdio_write_register_multiple
diff --git a/drivers/net/wireless/rsi/rsi_91x_usb.c b/drivers/net/wireless/rsi/rsi_91x_usb.c
index 9097f7e..81df09d 100644
--- a/drivers/net/wireless/rsi/rsi_91x_usb.c
+++ b/drivers/net/wireless/rsi/rsi_91x_usb.c
@@ -439,7 +439,6 @@ static int rsi_usb_load_data_master_write(struct rsi_hw *adapter,
 	rsi_dbg(INFO_ZONE, "num_blocks: %d\n", num_blocks);
 
 	for (cur_indx = 0, i = 0; i < num_blocks; i++, cur_indx += block_size) {
-		memset(temp_buf, 0, block_size);
 		memcpy(temp_buf, ta_firmware + cur_indx, block_size);
 		status = rsi_usb_write_register_multiple(adapter, base_address,
 							 (u8 *)(temp_buf),
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net] sch_multiq: fix double free on init failure
From: Cong Wang @ 2017-08-29 18:59 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Linux Kernel Network Developers, Eric Dumazet, Jamal Hadi Salim,
	Jiri Pirko, Roopa Prabhu
In-Reply-To: <1504031188-11434-1-git-send-email-nikolay@cumulusnetworks.com>

On Tue, Aug 29, 2017 at 11:26 AM, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
> index f143b7bbaa0d..b07f8b01aa07 100644
> --- a/net/sched/sch_multiq.c
> +++ b/net/sched/sch_multiq.c
> @@ -259,9 +259,6 @@ static int multiq_init(struct Qdisc *sch, struct nlattr *opt)
>
>         err = multiq_tune(sch, opt);
>
> -       if (err)
> -               kfree(q->queues);
> -
>         return err;

You can fold them to:

    return multiq_tune(sch, opt);

Other than this,

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
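
For readers following along, a minimal userspace sketch of the ownership
rule behind this fix: once the core calls ->destroy() after a failed
->init(), only ->destroy() may free the allocation. All toy_* names are
hypothetical, and free_calls exists only to make the behaviour observable.

```c
#include <stdlib.h>

/* Hypothetical miniature of a qdisc's private data. */
struct toy_qdisc {
	void *queues;
	int free_calls;	/* counts frees, for illustration only */
};

/* Runs after a failed init and on normal teardown; the only place that
 * frees q->queues.
 */
static void toy_destroy(struct toy_qdisc *q)
{
	free(q->queues);
	q->queues = NULL;
	q->free_calls++;
}

/* Allocates, then reports the (simulated) tune result. It must not free
 * q->queues on error, since toy_destroy() will do that.
 */
static int toy_init(struct toy_qdisc *q, int tune_err)
{
	q->free_calls = 0;
	q->queues = calloc(4, sizeof(void *));
	if (!q->queues)
		return -1;
	return tune_err;	/* folded form: no local err variable */
}

/* Mirrors the core's error path: destroy runs when init reports failure. */
static int toy_create(struct toy_qdisc *q, int tune_err)
{
	int err = toy_init(q, tune_err);

	if (err)
		toy_destroy(q);
	return err;
}
```

With a free left in the init error path, the same pointer would be
released twice on failure, which is the double free the patch removes.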

^ permalink raw reply

* [PATCH net] sch_hhf: fix null pointer dereference on init failure
From: Nikolay Aleksandrov @ 2017-08-29 19:02 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, jhs, xiyou.wangcong, jiri, roopa, Nikolay Aleksandrov

If sch_hhf fails in its ->init() function (either due to wrong
user-space arguments as below or memory alloc failure of hh_flows) it
will do a null pointer deref of q->hh_flows in its ->destroy() function.
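
As a rough userspace sketch of the idea (not the kernel code): a destroy
routine that can run after init failed early has to tolerate a table that
was never allocated. The toy_* names and TOY_FLOWS_CNT are hypothetical,
and a plain singly linked list stands in for the kernel's list_head.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_FLOWS_CNT 4	/* illustrative; not the kernel's HH_FLOWS_CNT */

struct toy_flow {
	struct toy_flow *next;
};

struct toy_hhf {
	struct toy_flow **flows;	/* NULL until init allocates the table */
};

/* Safe to call on a zeroed struct, i.e. after init failed early.
 * Returns the number of flow entries freed.
 */
static int toy_hhf_destroy(struct toy_hhf *q)
{
	int freed = 0;

	if (q->flows) {
		for (size_t i = 0; i < TOY_FLOWS_CNT; i++) {
			struct toy_flow *f = q->flows[i];

			while (f) {
				struct toy_flow *next = f->next;

				free(f);
				freed++;
				f = next;
			}
		}
	}
	free(q->flows);	/* free(NULL) is a no-op, like kvfree(NULL) */
	q->flows = NULL;
	return freed;
}
```

Only the walk over the table needs the guard; freeing the table pointer
itself is already safe when it is NULL.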

To reproduce the crash:
$ tc qdisc add dev eth0 root hhf quantum 2000000 non_hh_weight 10000000

Crash log:
[  690.654882] BUG: unable to handle kernel NULL pointer dereference at (null)
[  690.655565] IP: hhf_destroy+0x48/0xbc
[  690.655944] PGD 37345067
[  690.655948] P4D 37345067
[  690.656252] PUD 58402067
[  690.656554] PMD 0
[  690.656857]
[  690.657362] Oops: 0000 [#1] SMP
[  690.657696] Modules linked in:
[  690.658032] CPU: 3 PID: 920 Comm: tc Not tainted 4.13.0-rc6+ #57
[  690.658525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[  690.659255] task: ffff880058578000 task.stack: ffff88005acbc000
[  690.659747] RIP: 0010:hhf_destroy+0x48/0xbc
[  690.660146] RSP: 0018:ffff88005acbf9e0 EFLAGS: 00010246
[  690.660601] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
[  690.661155] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff821f63f0
[  690.661710] RBP: ffff88005acbfa08 R08: ffffffff81b10a90 R09: 0000000000000000
[  690.662267] R10: 00000000f42b7019 R11: ffff880058578000 R12: 00000000ffffffea
[  690.662820] R13: ffff8800372f6400 R14: 0000000000000000 R15: 0000000000000000
[  690.663769] FS:  00007f8ae5e8b740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
[  690.667069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  690.667965] CR2: 0000000000000000 CR3: 0000000058523000 CR4: 00000000000406e0
[  690.668918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  690.669945] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  690.671003] Call Trace:
[  690.671743]  qdisc_create+0x377/0x3fd
[  690.672534]  tc_modify_qdisc+0x4d2/0x4fd
[  690.673324]  rtnetlink_rcv_msg+0x188/0x197
[  690.674204]  ? rcu_read_unlock+0x3e/0x5f
[  690.675091]  ? rtnl_newlink+0x729/0x729
[  690.675877]  netlink_rcv_skb+0x6c/0xce
[  690.676648]  rtnetlink_rcv+0x23/0x2a
[  690.677405]  netlink_unicast+0x103/0x181
[  690.678179]  netlink_sendmsg+0x326/0x337
[  690.678958]  sock_sendmsg_nosec+0x14/0x3f
[  690.679743]  sock_sendmsg+0x29/0x2e
[  690.680506]  ___sys_sendmsg+0x209/0x28b
[  690.681283]  ? __handle_mm_fault+0xc7d/0xdb1
[  690.681915]  ? check_chain_key+0xb0/0xfd
[  690.682449]  __sys_sendmsg+0x45/0x63
[  690.682954]  ? __sys_sendmsg+0x45/0x63
[  690.683471]  SyS_sendmsg+0x19/0x1b
[  690.683974]  entry_SYSCALL_64_fastpath+0x23/0xc2
[  690.684516] RIP: 0033:0x7f8ae529d690
[  690.685016] RSP: 002b:00007fff26d2d6b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  690.685931] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f8ae529d690
[  690.686573] RDX: 0000000000000000 RSI: 00007fff26d2d700 RDI: 0000000000000003
[  690.687047] RBP: ffff88005acbff98 R08: 0000000000000001 R09: 0000000000000000
[  690.687519] R10: 00007fff26d2d480 R11: 0000000000000246 R12: 0000000000000002
[  690.687996] R13: 0000000001258070 R14: 0000000000000001 R15: 0000000000000000
[  690.688475]  ? trace_hardirqs_off_caller+0xa7/0xcf
[  690.688887] Code: 00 00 e8 2a 02 ae ff 49 8b bc 1d 60 02 00 00 48 83
c3 08 e8 19 02 ae ff 48 83 fb 20 75 dc 45 31 f6 4d 89 f7 4d 03 bd 20 02
00 00 <49> 8b 07 49 39 c7 75 24 49 83 c6 10 49 81 fe 00 40 00 00 75 e1
[  690.690200] RIP: hhf_destroy+0x48/0xbc RSP: ffff88005acbf9e0
[  690.690636] CR2: 0000000000000000

Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
First I did it with the check in the for () conditional, but this is more
visible and explicit. Let me know if you'd like the shorter version. :-)

 net/sched/sch_hhf.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 51d3ba682af9..931c6cc23ac2 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -477,15 +477,17 @@ static void hhf_destroy(struct Qdisc *sch)
 		kvfree(q->hhf_valid_bits[i]);
 	}
 
-	for (i = 0; i < HH_FLOWS_CNT; i++) {
-		struct hh_flow_state *flow, *next;
-		struct list_head *head = &q->hh_flows[i];
-
-		if (list_empty(head))
-			continue;
-		list_for_each_entry_safe(flow, next, head, flowchain) {
-			list_del(&flow->flowchain);
-			kfree(flow);
+	if (q->hh_flows) {
+		for (i = 0; i < HH_FLOWS_CNT; i++) {
+			struct hh_flow_state *flow, *next;
+			struct list_head *head = &q->hh_flows[i];
+
+			if (list_empty(head))
+				continue;
+			list_for_each_entry_safe(flow, next, head, flowchain) {
+				list_del(&flow->flowchain);
+				kfree(flow);
+			}
 		}
 	}
 	kvfree(q->hh_flows);
-- 
2.1.4

^ permalink raw reply related

* Re: XDP redirect measurements, gotchas and tracepoints
From: Andy Gospodarek @ 2017-08-29 19:02 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jesper Dangaard Brouer, Michael Chan, John Fastabend,
	Duyck, Alexander H, pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, borkmann@iogearbox.net
In-Reply-To: <CAKgT0Uebz0dX9wotygOo59UTKt-bQ29Oyd6=MkJYE0a+y2dR2A@mail.gmail.com>

On Tue, Aug 29, 2017 at 09:23:49AM -0700, Alexander Duyck wrote:
> On Tue, Aug 29, 2017 at 6:26 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Mon, 28 Aug 2017 09:11:25 -0700 Alexander Duyck <alexander.duyck@gmail.com> wrote:
> >
> >> My advice would be to not over complicate this. My big concern with
> >> all this buffer recycling is what happens the first time somebody
> >> introduces something like mirroring? Are you going to copy the data to
> >> a new page which would be quite expensive or just have to introduce
> >> reference counts? You are going to have to deal with stuff like
> >> reference counts eventually so you might as well bite that bullet now.
> >> My advice would be to not bother with optimizing for performance right
> >> now and instead focus on just getting functionality. The approach we
> >> took in ixgbe for the transmit path should work for almost any other
> >> driver since all you are looking at is having to free the page
> >> reference which takes care of reference counting already.
> >
> > This return API is not about optimizing performance right now.  It is
> > actually about allowing us to change the underlying memory model per RX
> > queue for XDP.
> 
> I would disagree. To me this is an obvious case of premature optimization.
> 

I'm with Jesper on this.  It may seem to you that this is an
optimization, but that is not the goal.

> > If a RX-ring is used for both SKBs and XDP, then the refcnt model is
> > still enforced.  Although a driver using the 1-packet-per-page model,
> > should be able to reuse refcnt==1 pages when returned from XDP.
> 
> Isn't this the case for all Rx on XDP enabled rings. Last I knew there
> was an option to pass packets up via an SKB if XDP_PASS is returned.
> Are you saying we need to do a special allocation path if an XDP
> program doesn't make use of XDP_PASS?

I am not proposing that a special allocation path is needed depending on the
return code from the XDP program.  I'm proposing that in a case where
the return code is XDP_REDIRECT (or really anytime the ndo_xdp_xmit
operation is called), that there should be:

(1) notification back to the driver/resource/etc that allocated the page
that resources are no longer in use.

or 

(2) common alloc/free framework used by drivers that operate on
xdp->data so that framework takes care of refcounting, etc.

My preference is (1) since it provides drivers the most flexibility in
the event that some hardware resource (rx ring buffer pointer) or
software resource (page or other chunk of memory) can be freed.
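
To make (1) a bit more concrete, here is a very rough userspace sketch of
what such a notification could look like. None of these names exist in the
kernel; toy_xdp_buf, return_cb and toy_rx_ring are purely hypothetical and
only show the shape of the idea.

```c
#include <stddef.h>

/* Hypothetical: per-buffer record of who allocated it and how to tell
 * them it is no longer in use.
 */
struct toy_xdp_buf {
	void *data;
	void (*return_cb)(struct toy_xdp_buf *buf, void *owner);
	void *owner;
};

/* Hypothetical RX-side owner that tracks buffers still in flight. */
struct toy_rx_ring {
	int in_flight;
};

static void toy_rx_return(struct toy_xdp_buf *buf, void *owner)
{
	struct toy_rx_ring *ring = owner;

	(void)buf;
	ring->in_flight--;	/* driver decides: recycle, free, advance ring */
}

/* Called by the transmitting side once it is done with the buffer. */
static void toy_xmit_done(struct toy_xdp_buf *buf)
{
	if (buf->return_cb)
		buf->return_cb(buf, buf->owner);
}
```

The point is that the transmitting side does not need to know what the
resource is; the allocating driver keeps that decision.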

> > If a RX-ring is _ONLY_ used for XDP, then the driver has the freedom to
> > implement another memory model, with the return-API.  We need to
> > experiment with the most optimal memory model.  The 1-packet-per-page
> > model is actually not the fastest, because of PCI-e bottlenecks.  With
> > HW support for packing descriptors and packets over the PCI-e bus, much
> > higher rates can be achieved.  Mellanox mlx5-Lx already have the needed HW
> > support.  And companies like NetCope also have 100G HW that does
> > similar tricks, and they even have a whitepaper[1][2] how they are
> > faster than DPDK with their NDP (Netcope Data Plane) API.
> >
> > We do need the ability/flexibility to change the RX memory model, to
> > take advantage of this new NIC hardware.
> 
> Looking over the white paper I see nothing that prevents us from using
> the same memory model we do with the Intel NICs. If anything I think
> the Intel drivers in "legacy-rx" mode could support something like
> this now, even if the hardware doesn't simply because we can get away
> with keeping the memory pseudo-pinned. My bigger concern is that we
> keep coming back to this idea that we need to have the network stack
> taking care of the 1 page per packet recycling when I really think it
> has no business being there. We either need to look at implementing
> this in the way we did in the Intel drivers where we use the reference
> counts or implement our own memory handling API like SLUB or something
> similar based on compound page destructors. I would much rather see us
> focus on getting this going with an agnostic memory model where we
> don't have to make the stack aware of where the memory came from or
> where it has to be returned to.
> 
> > [1] https://www.netcope.com/en/resources/improving-dpdk-performance
> > [2] https://www.netcope.com/en/company/press-center/press-releases/read-new-netcope-whitepaper-on-dpdk-acceleration
> 
> My only concern with something like this is the fact that it is
> optimized for a setup where the data is left in place and nothing
> extra is added. Trying to work with something like this gets more
> expensive when you have to deal with the full stack as you have to
> copy out the headers and still deal with all the skb metadata. I fully
> agree with the basic premise that writing in large blocks provides
> significant gains in throughput, specifically with small packets. The
> only gotcha you would have to deal with is SKB allocation and data
> copying overhead to make room and fill in metadata for the frame and
> any extra headers needed.
> 
> - Alex

^ permalink raw reply

* Re: [PATCH v2 net-next] irda: fix link order if IRDA is built into the kernel
From: Greg KH @ 2017-08-29 19:05 UTC (permalink / raw)
  To: David Miller; +Cc: devel, samuel, netdev, linux-kernel, geert, fengguang.wu
In-Reply-To: <20170829.104945.1786969849973428586.davem@davemloft.net>

On Tue, Aug 29, 2017 at 10:49:45AM -0700, David Miller wrote:
> From: Greg KH <gregkh@linuxfoundation.org>
> Date: Tue, 29 Aug 2017 19:46:22 +0200
> 
> > When moving the IRDA code out of net/ into drivers/staging/irda/net, the
> > link order changes when IRDA is built into the kernel.  That causes a
> > kernel crash at boot time as netfilter isn't initialized yet.
> > 
> > To fix this, build and link the irda networking code in the same exact
> > order that it was previously before the move.
> > 
> > Reported-by: kernel test robot <fengguang.wu@intel.com>
> > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> Greg, just change the initializer in IRDA so that it will run
> after subsys_init() when built statically.
> 
> IRDA is definitely not the first potentially statically built
> thing that needs netlink up and available.

Ok, will do that tomorrow and test it and send you the patch.

thanks,

greg k-h

^ permalink raw reply

