Netdev List


* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Ido Schimmel @ 2019-09-12  9:05 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Robert Beckett, Florian Fainelli, netdev@vger.kernel.org,
	Vivien Didelot, David S. Miller, Jiri Pirko
In-Reply-To: <20190911225841.GB5710@lunn.ch>

On Thu, Sep 12, 2019 at 12:58:41AM +0200, Andrew Lunn wrote:
> So think about how you can model the Marvell switch capabilities
> using TC, and implement offload support for it.

+1 :)

^ permalink raw reply

* Re: [PATCH v3 2/2] tcp: Add rcv_wnd to TCP_INFO
From: Dave Taht @ 2019-09-12  9:14 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Thomas Higdon, netdev@vger.kernel.org, Jonathan Lemon, Dave Jones,
	Eric Dumazet, Yuchung Cheng, Soheil Hassas Yeganeh
In-Reply-To: <CADVnQynNiTEAmA-++JL7kMeht+dzfh2b==R_UJnEdnX3W=3k8g@mail.gmail.com>

On Thu, Sep 12, 2019 at 1:59 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Sep 11, 2019 at 6:32 PM Thomas Higdon <tph@fb.com> wrote:
> >
> > Neal Cardwell mentioned that rcv_wnd would be useful for helping
> > diagnose whether a flow is receive-window-limited at a given instant.
> >
> > This serves the purpose of adding an additional __u32 to avoid the
> > would-be hole caused by the addition of the tcpi_rcvi_ooopack field.
> >
> > Signed-off-by: Thomas Higdon <tph@fb.com>
> > ---
>
> Thanks, Thomas.
>
> I know that when I mentioned this before I mentioned the idea of both
> tp->snd_wnd (send-side receive window) and tp->rcv_wnd (receive-side
> receive window) in tcp_info, and did not express a preference between
> the two. Now that we are faced with a decision between the two,
> personally I think it would be a little more useful to start with
> tp->snd_wnd. :-)
>
> Two main reasons:
>
> (1) Usually when we're diagnosing TCP performance problems, we do so
> from the sender, since the sender makes most of the
> performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> From the sender-side the thing that would be most useful is to see
> tp->snd_wnd, the receive window that the receiver has advertised to
> the sender.

I am under the impression that, particularly in the mobile space,
network behavior is often governed by rcv_wnd. At least, there have been
so many papers on this that I'd tended to assume so.

Given a desire to do both vars, is there a *third* u32 we could add to
fill in the next hole? :)
ecn marks?

>
> (2) From the receiver side, "ss" can already show a fair amount of
> info about receive-side buffer/window limits, like:
> info->tcpi_rcv_ssthresh, info->tcpi_rcv_space,
> skmeminfo[SK_MEMINFO_RMEM_ALLOC], skmeminfo[SK_MEMINFO_RCVBUF]. Often
> the rwin can be approximated by combining those.
>
> Hopefully Eric, Yuchung, and Soheil can weigh in on the question of
> snd_wnd vs rcv_wnd. Or we can perhaps think of another field, and add
> the tcpi_rcvi_ooopack, snd_wnd, rcv_wnd, and that final field, all
> together.
>
> thanks,
> neal



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply

* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Andrew Lunn @ 2019-09-12  9:21 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Robert Beckett, Florian Fainelli, netdev@vger.kernel.org,
	Vivien Didelot, David S. Miller, Jiri Pirko
In-Reply-To: <20190912090339.GA16311@splinter>

> 2. Scheduling: How to schedule between the different transmission queues
> 
> Where the port from which the packets should egress is the CPU port,
> before they cross the PCI towards the imx6.

Hi Ido

This is DSA, so the switch is connected via Ethernet to the IMX6, not
PCI. Minor detail, but that really is the core of what makes DSA DSA.

     Andrew

^ permalink raw reply

* Re: [PATCH net v2 01/11] net: core: limit nested device depth
From: David Miller @ 2019-09-12  9:38 UTC (permalink / raw)
  To: ap420073
  Cc: netdev, j.vosburgh, vfalico, andy, jiri, sd, roopa, saeedm,
	manishc, rahulv, kys, haiyangz, sthemmin, sashal, hare, varun,
	ubraun, kgraul, jay.vosburgh
In-Reply-To: <CAMArcTV-Qvfd7xA0huCh_dbtr7P4LA+cQ7CpnaBBhdq-tq5fZQ@mail.gmail.com>

From: Taehee Yoo <ap420073@gmail.com>
Date: Thu, 12 Sep 2019 12:56:19 +0900

> I tested with this reproducer commands without lockdep.
> 
>     ip link add dummy0 type dummy
>     ip link add link dummy0 name vlan1 type vlan id 1
>     ip link set vlan1 up
> 
>     for i in {2..200}
>     do
>             let A=$i-1
> 
>             ip link add name vlan$i link vlan$A type vlan id $i
>     done
>     ip link del vlan1 <-- this command is added.

Is there any other device type which allows arbitrary nesting depth
in this manner other than VLAN?  Perhaps it is the VLAN nesting
depth that we should limit instead of all of this extra code.

^ permalink raw reply

* RE: [RFC PATCH 3/3] Enable ptp_kvm for arm64
From: Jianyong Wu (Arm Technology China) @ 2019-09-12  9:40 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: netdev@vger.kernel.org, pbonzini@redhat.com,
	sean.j.christopherson@intel.com, richardcochran@gmail.com,
	Mark Rutland, Will Deacon, Suzuki Poulose,
	linux-kernel@vger.kernel.org, Steve Capper,
	Kaly Xin (Arm Technology China), Justin He (Arm Technology China)
In-Reply-To: <86ftl3rrxg.wl-maz@kernel.org>

Hi Marc,

> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: Wednesday, September 11, 2019 7:31 PM
> To: Jianyong Wu (Arm Technology China) <Jianyong.Wu@arm.com>
> Cc: netdev@vger.kernel.org; pbonzini@redhat.com;
> sean.j.christopherson@intel.com; richardcochran@gmail.com; Mark Rutland
> <Mark.Rutland@arm.com>; Will Deacon <Will.Deacon@arm.com>; Suzuki
> Poulose <Suzuki.Poulose@arm.com>; linux-kernel@vger.kernel.org; Steve
> Capper <Steve.Capper@arm.com>; Kaly Xin (Arm Technology China)
> <Kaly.Xin@arm.com>; Justin He (Arm Technology China)
> <Justin.He@arm.com>
> Subject: Re: [RFC PATCH 3/3] Enable ptp_kvm for arm64
>
> On Wed, 11 Sep 2019 11:06:18 +0100,
> "Jianyong Wu (Arm Technology China)" <Jianyong.Wu@arm.com> wrote:
> >
> > Hi Marc,
> >
> > I think there are three points for the migration issue of ptp_kvm,
> > where a VM using ptp_kvm migrates to a host without ptp_kvm support.
> >
> > First: how does it impact the VM having migrated?
> > I ran a VM with ptp_kvm support in the guest but no support in the host.
> > ptp0 returns 0 when the time is read from it, which fails chrony's
> > check, so chrony chooses another clock source.
> > From this point on, the VM will only lose precision of time sync.
>
> "only" is a bit of an understatement. Once the guest has started relying on a
> service, it seems rather harsh to pretend this service doesn't exist anymore.
> It could well be that the VM cannot perform its function if the precision is not
> good enough.
>
> The analogy is the Spectre-v2 mitigation, which is implemented as a hypercall.
> Nothing will break if you migrate to a host that doesn't support the mitigation,
> but the guest will now be unsafe. Is that acceptable? the answer is of course
> "no".
>
> > Second: how to check the failure of the ptp kvm service when there is
> > no ptp kvm service, hypercall will go into default ops, so we can
> > check the return value which can inform us the failure.
>
> Sure. But that's still an issue. The VM relied on the service, and the service
> isn't available anymore.
>
> > Third: how to inform VMM
> > There is ioctl cmd call "KVM_CHECK_EXTENSION" in kvm, which may do
> > that thing. Accordingly, qemu should be offered the support which will
> > block us.  We can try to add this support in kvm but we are not sure
> > the response from qemu side.
>
> It doesn't matter whether QEMU implements that check or not. The important
> thing is that we give userspace a way to check for this, and having a
> capability that can be checked against is probably the right thing to do.

Ok, I agree.
Adding a new capability item under "KVM_CHECK_EXTENSION" in kvm for ptp_kvm will do; userspace can then use the ioctl to check whether the ptp service is available.
I will add this patch to the series.

Thanks
Jianyong Wu

>
> Thanks,
>
>       M.
>
> --
> Jazz is not dead, it just smells funny.

^ permalink raw reply

* Re: [PATCH 04/11] net: phylink: switch to using fwnode_gpiod_get_index()
From: Linus Walleij @ 2019-09-12  9:41 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Andy Shevchenko, Russell King - ARM Linux admin, Mika Westerberg,
	linux-kernel@vger.kernel.org, open list:GPIO SUBSYSTEM,
	Andrew Lunn, David S. Miller, Florian Fainelli, Heiner Kallweit,
	netdev
In-Reply-To: <20190911095149.GA108334@dtor-ws>

On Wed, Sep 11, 2019 at 10:51 AM Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:

> If we are willing to sacrifice the custom label for the GPIO that
> fwnode_gpiod_get_index() allows us to set, then there are several
> drivers that could actually use gpiod_get() API.

We have:
gpiod_set_consumer_name(gpiod, "name");
to deal with that so no sacrifice is needed.

Yours,
Linus Walleij

^ permalink raw reply

* Re: [PATCH 00/11] Add support for software nodes to gpiolib
From: Linus Walleij @ 2019-09-12  9:55 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Andy Shevchenko, Mika Westerberg, linux-kernel@vger.kernel.org,
	open list:GPIO SUBSYSTEM, Andrew Lunn, Andrzej Hajda,
	Bartosz Golaszewski, Daniel Vetter, David Airlie, David S. Miller,
	Florian Fainelli, Heiner Kallweit, Jernej Skrabec, Jonas Karlman,
	Laurent Pinchart, Neil Armstrong, Russell King,
	open list:DRM PANEL DRIVERS, ACPI Devel Maling List, netdev
In-Reply-To: <20190911075215.78047-1-dmitry.torokhov@gmail.com>

On Wed, Sep 11, 2019 at 8:52 AM Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:

> If we agree in principle, I would like to have the very first 3 patches
> in an immutable branch off maybe -rc8 so that it can be pulled into
> individual subsystems so that patches switching various drivers to
> fwnode_gpiod_get_index() could be applied.

I think it seems a bit enthusiastic to have non-GPIO subsystems
pick up these changes this close to the merge window so my plan
is to merge patches 1, 2 and 3 (1 already merged) and then you could
massage the other subsystems in v5.4-rc1.

But if other subsystems say "hey we want to fix this in like 3 days"
then I'm game for an immutable branch as well.

Yours,
Linus Walleij

^ permalink raw reply

* [PATCH net] net/sched: fix race between deactivation and dequeue for NOLOCK qdisc
From: Paolo Abeni @ 2019-09-12 10:02 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Davide Caratti, Li Shuang

The test implemented by some_qdisc_is_busy() is somewhat loose for
NOLOCK qdisc, as we may hit the following scenario:

CPU1						CPU2
// in net_tx_action()
clear_bit(__QDISC_STATE_SCHED...);
						// in some_qdisc_is_busy()
						val = (qdisc_is_running(q) ||
						       test_bit(__QDISC_STATE_SCHED,
								&q->state));
						// here val is 0 but...
qdisc_run(q)
// ... CPU1 is going to run the qdisc next

As a consequence qdisc_run() in net_tx_action() can race with qdisc_reset()
in dev_qdisc_reset(). Such race is not possible for !NOLOCK qdisc as
both the above bit operations are under the root qdisc lock().

After commit 021a17ed796b ("pfifo_fast: drop unneeded additional lock on dequeue") 
the race can cause use after free and/or null ptr dereference, but the root 
cause is likely older.

This patch addresses the issue explicitly checking for deactivation under
the seqlock for NOLOCK qdisc, so that the qdisc_run() in the critical
scenario becomes a no-op.

Note that the enqueue() op can still execute concurrently with dev_qdisc_reset(),
but that is safe due to the skb_array() locking, and we can't avoid that
for NOLOCK qdiscs.

Fixes: 021a17ed796b ("pfifo_fast: drop unneeded additional lock on dequeue")
Reported-by: Li Shuang <shuali@redhat.com>
Reported-and-tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/pkt_sched.h |  7 ++++++-
 net/core/dev.c          | 16 ++++++++++------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index a16fbe9a2a67..aa99c73c3fbd 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -118,7 +118,12 @@ void __qdisc_run(struct Qdisc *q);
 static inline void qdisc_run(struct Qdisc *q)
 {
 	if (qdisc_run_begin(q)) {
-		__qdisc_run(q);
+		/* NOLOCK qdisc must check 'state' under the qdisc seqlock
+		 * to avoid racing with dev_qdisc_reset()
+		 */
+		if (!(q->flags & TCQ_F_NOLOCK) ||
+		    likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
+			__qdisc_run(q);
 		qdisc_run_end(q);
 	}
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index 0891f499c1bb..ef8f2f002e09 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3467,18 +3467,22 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	qdisc_calculate_pkt_len(skb, q);

 	if (q->flags & TCQ_F_NOLOCK) {
-		if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
-			__qdisc_drop(skb, &to_free);
-			rc = NET_XMIT_DROP;
-		} else if ((q->flags & TCQ_F_CAN_BYPASS) && q->empty &&
-			   qdisc_run_begin(q)) {
+		if ((q->flags & TCQ_F_CAN_BYPASS) && q->empty &&
+		    qdisc_run_begin(q)) {
+			if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED,
+					      &q->state))) {
+				__qdisc_drop(skb, &to_free);
+				rc = NET_XMIT_DROP;
+				goto end_run;
+			}
 			qdisc_bstats_cpu_update(q, skb);

+			rc = NET_XMIT_SUCCESS;
 			if (sch_direct_xmit(skb, q, dev, txq, NULL, true))
 				__qdisc_run(q);

+end_run:
 			qdisc_run_end(q);
-			rc = NET_XMIT_SUCCESS;
 		} else {
 			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
 			qdisc_run(q);
-- 
2.21.0

^ permalink raw reply related
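
The check described in the commit message above can be modeled in user space. The sketch below is not kernel code: `toy_qdisc`, `toy_run()` and the other `toy_*` names are hypothetical, an `atomic_flag` stands in for the qdisc seqlock, and an `atomic_bool` stands in for `__QDISC_STATE_DEACTIVATED`. It only illustrates testing the deactivation flag after the run lock has been taken, so that a run starting after deactivation does no work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_qdisc {
	atomic_flag running;     /* stand-in for the qdisc seqlock */
	atomic_bool deactivated; /* stand-in for __QDISC_STATE_DEACTIVATED */
	int dequeued;            /* counts how often the dequeue path ran */
};

static void toy_init(struct toy_qdisc *q)
{
	assert(q != NULL);
	atomic_flag_clear(&q->running);
	atomic_init(&q->deactivated, false);
	q->dequeued = 0;
}

/* Returns true if the caller became the (single) runner. */
static bool toy_run_begin(struct toy_qdisc *q)
{
	return !atomic_flag_test_and_set(&q->running);
}

static void toy_run_end(struct toy_qdisc *q)
{
	atomic_flag_clear(&q->running);
}

/* The flag is tested only after the run lock is held, mirroring the
 * idea of checking 'state' under the seqlock. */
static void toy_run(struct toy_qdisc *q)
{
	if (toy_run_begin(q)) {
		if (!atomic_load(&q->deactivated))
			q->dequeued++;
		toy_run_end(q);
	}
}

static void toy_deactivate(struct toy_qdisc *q)
{
	atomic_store(&q->deactivated, true);
}
```

In this model, calling `toy_run()` after `toy_deactivate()` still takes and releases the run lock but leaves `dequeued` untouched. How the real reset path serializes against the runner is outside what this sketch shows.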

* Re: [PATCH v2 net-next 3/7] net: dsa: sja1105: Switch to hardware operations for PTP
From: David Miller @ 2019-09-12 10:12 UTC (permalink / raw)
  To: olteanv; +Cc: f.fainelli, vivien.didelot, andrew, richardcochran, netdev
In-Reply-To: <20190910013501.3262-4-olteanv@gmail.com>

From: Vladimir Oltean <olteanv@gmail.com>
Date: Tue, 10 Sep 2019 04:34:57 +0300

>  static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
>  {
>  	struct sja1105_private *priv = ptp_to_sja1105(ptp);
> +	const struct sja1105_regs *regs = priv->info->regs;
>  	s64 clkrate;
> +	int rc;
 ..
> -static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
> -{
> -	struct sja1105_private *priv = ptp_to_sja1105(ptp);
> +	rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkrate,
> +				  &clkrate, 4);

You're sending an arbitrary 4 bytes of a 64-bit value.  This works on little endian
but will not on big endian.

Please properly copy this clkrate into a "u32" variable and pass that into
sja1105_spi_send_int().

It also seems to suggest that you want to use abs() to perform that weird
centering around 1 << 31 calculation.

Thank you.
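
The general concern raised above can be reproduced in user space. This is a minimal sketch, not the sja1105 driver: `send_bytes()` is a hypothetical stand-in for a transfer routine that copies the first `len` bytes found at an address, and the other helper names are likewise made up for illustration. It shows that taking the first 4 bytes of a 64-bit variable yields a host-dependent result, while narrowing to a 32-bit value first does not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transfer routine: "sends" the first len bytes at src
 * by copying them into the wire buffer. */
static void send_bytes(uint8_t *wire, const void *src, size_t len)
{
	assert(wire != NULL && src != NULL);
	memcpy(wire, src, len);
}

/* Fragile: which half of the 64-bit value is sent depends on host byte
 * order (low half on little endian, high half on big endian). */
static void send_first4_of_s64(uint8_t wire[4], int64_t clkrate)
{
	send_bytes(wire, &clkrate, 4);
}

/* Portable: narrow to 32 bits first, so the same value is chosen on
 * every host. */
static uint32_t narrow_to_u32(int64_t clkrate)
{
	return (uint32_t)clkrate;
}

/* Runtime probe of the host byte order. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

The byte order in which the narrowed value then goes onto the wire is a separate question that the transfer routine still has to settle.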

^ permalink raw reply

* Re: [PATCH 1/7] net/dsa: configure autoneg for CPU port
From: Robert Beckett @ 2019-09-12 10:14 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, netdev, Vivien Didelot, David S. Miller,
	bob.beckett
In-Reply-To: <20190911225252.GA5710@lunn.ch>

On Thu, 2019-09-12 at 00:52 +0200, Andrew Lunn wrote:
> > It is not just for broadcast storm protection. The original issue
> > that
> > made me look in to all of this turned out to be rx descriptor ring
> > buffer exhaustion due to the CPU not being able to keep up with
> > packet
> > reception.
> 
> Pause frames does not really solve this problem. The switch will at
> some point fill its buffers, and start throwing packets away. Or it
> needs to send pause packets to its peers. And then your whole switch
> throughput goes down. Packets will always get thrown away, so you
> need
> QoS in your network to give the network hints about which frames it
> should throw away first.
> 

Indeed. This is the understanding I was working with.
This patch series enables pause frames, output queue priority and
strict scheduling to egress the high priority queues first.
This means that when the switch starts dropping frames, it drops from
the lowest priority as the highest ones are delivered at line speed
without issue.

> ..
> 
> > Fundamentally, with a phy to phy CPU connection, the CPU MAC may
> > well
> > wish to enable pause frames for various reasons, so we should
> > strive to
> > handle that I think.
> 
> It actually has nothing to do with PHY to PHY connections. You can
> use
> pause frames with direct MAC to MAC connections. PHY auto-negotiation
> is one way to indicate both ends support it, but there are also other
> ways. e.g.
> 
> ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off]
> 
> on the SoC you could do
> 
> ethtool --pause eth0 autoneg off rx on tx on
> 
> to force the SoC to send and process pause frames. Ideally i would
> prefer a solution like this, since it is not a change of behaviour
> for
> everybody else.

Good point, well made.
The reason for using autoneg in this series was due to having no netdev
to run ethtool against for the CPU port.
If we go down the route of creating a netdev for the CPU port, then we
could indeed force pause frames at both ends.

However, given that the phy on the marvell switch is capable of autoneg,
is it not reasonable to set up the advertisement and let autoneg take
care of it if using phy to phy connection?

> 
>    Andrew

^ permalink raw reply

* Re: [PATCH net v2 01/11] net: core: limit nested device depth
From: Taehee Yoo @ 2019-09-12 10:14 UTC (permalink / raw)
  To: David Miller
  Cc: Netdev, j.vosburgh, vfalico, Andy Gospodarek,
	Jiří Pírko, sd, Roopa Prabhu, saeedm, manishc,
	rahulv, kys, haiyangz, sthemmin, sashal, hare, varun, ubraun,
	kgraul, Jay Vosburgh
In-Reply-To: <20190912.113807.52193745382103083.davem@davemloft.net>

On Thu, 12 Sep 2019 at 18:38, David Miller <davem@davemloft.net> wrote:
>
> From: Taehee Yoo <ap420073@gmail.com>
> Date: Thu, 12 Sep 2019 12:56:19 +0900
>
> > I tested with this reproducer commands without lockdep.
> >
> >     ip link add dummy0 type dummy
> >     ip link add link dummy0 name vlan1 type vlan id 1
> >     ip link set vlan1 up
> >
> >     for i in {2..200}
> >     do
> >             let A=$i-1
> >
> >             ip link add name vlan$i link vlan$A type vlan id $i
> >     done
> >     ip link del vlan1 <-- this command is added.
>
> Is there any other device type which allows arbitrary nesting depth
> in this manner other than VLAN?  Perhaps it is the VLAN nesting
> depth that we should limit instead of all of this extra code.

Below device types have the same problem.
VLAN, BONDING, TEAM, VXLAN, MACVLAN, and MACSEC.
All the below test commands reproduce a panic.

BONDING test commands:
    ip link add bond0 type bond
    for i in {1..200}
    do
            let A=$i-1
            ip link add bond$i type bond
            ip link set bond$i master bond$A
    done
    ip link set bond5 master bond0

TEAM test commands:
    ip link add team0 type team
    for i in {1..200}
    do
            let A=$i-1
            ip link add team$i type team
            ip link set team$i master team$A
    done

MACSEC test commands:
    ip link add link lo macsec0 type macsec
    for i in {1..100}
    do
            let A=$i-1
            ip link add link macsec$A macsec$i type macsec
    done
    ip link del macsec0

MACVLAN test commands:
    ip link add dummy0 type dummy
    ip link add macvlan1 link dummy0 type macvlan
    ip link add vlan2 link macvlan1 type vlan id 2
    let i=3
    for j in {1..100}
    do
            let A=$i-1
            ip link add macvlan$i link vlan$A type macvlan
            let i=$i+1
            let A=$i-1
            ip link add vlan$i link macvlan$A type vlan id $i
            let i=$i+1
    done
    ip link del dummy0

VXLAN test commands:
    ip link add vxlan1 type vxlan dev lo id 1 dstport 1
    for i in {2..100}
    do
            let A=$i-1
            ip link add vxlan$i type vxlan dev vxlan$A id $i dstport $i
    done
    ip link del vxlan1

^ permalink raw reply

* Re: [PATCH v2 net-next 3/7] net: dsa: sja1105: Switch to hardware operations for PTP
From: Vladimir Oltean @ 2019-09-12 10:17 UTC (permalink / raw)
  To: David Miller; +Cc: f.fainelli, vivien.didelot, andrew, richardcochran, netdev
In-Reply-To: <20190912.121203.1106283271122334199.davem@davemloft.net>

Hi Dave,

On 12/09/2019, David Miller <davem@davemloft.net> wrote:
> From: Vladimir Oltean <olteanv@gmail.com>
> Date: Tue, 10 Sep 2019 04:34:57 +0300
>
>>  static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long
>> scaled_ppm)
>>  {
>>  	struct sja1105_private *priv = ptp_to_sja1105(ptp);
>> +	const struct sja1105_regs *regs = priv->info->regs;
>>  	s64 clkrate;
>> +	int rc;
>  ..
>> -static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
>> -{
>> -	struct sja1105_private *priv = ptp_to_sja1105(ptp);
>> +	rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkrate,
>> +				  &clkrate, 4);
>
> You're sending an arbitrary 4 bytes of a 64-bit value.  This works on little
> endian
> but will not on big endian.
>
> Please properly copy this clkrate into a "u32" variable and pass that into
> sja1105_spi_send_int().
>
> It also seems to suggest that you want to use abs() to perform that weird
> centering around 1 << 31 calculation.
>
> Thank you.
>

It looks 'wrong' but it isn't. The driver uses the 'packing' framework
(lib/packing.c) which is endian-agnostic (converts between CPU and
peripheral endianness) and operates on u64 as the CPU word size. On
the contrary, u32 would not work with the 'packing' API in its current
form, but I don't see yet any reasons to extend it (packing64,
packing32 etc).

Thanks,
-Vladimir
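
To illustrate what "endian-agnostic" means here, the sketch below packs the low bytes of a 64-bit value into a byte buffer using shifts only. It is not the `lib/packing.c` API: `pack_be()` and `unpack_be()` are hypothetical helpers, and big-endian output is chosen arbitrarily for the example. Because only arithmetic on the value is used, the buffer contents are the same on little- and big-endian hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the low len bytes of val into buf, most
 * significant byte first. No pointer casts, so host byte order never
 * enters the picture. */
static void pack_be(uint8_t *buf, uint64_t val, size_t len)
{
	assert(len <= sizeof(val));
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)(val >> (8 * (len - 1 - i)));
}

/* Inverse of pack_be(): rebuild the value from len big-endian bytes. */
static uint64_t unpack_be(const uint8_t *buf, size_t len)
{
	uint64_t val = 0;

	assert(len <= sizeof(val));
	for (size_t i = 0; i < len; i++)
		val = (val << 8) | buf[i];
	return val;
}
```

With a helper of this shape, passing a 64-bit CPU word together with a length of 4 selects the low 32 bits by value rather than by memory position.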

^ permalink raw reply

* Re: [PATCH v4 2/2] net: phy: dp83867: Add SGMII mode type switching
From: Vitaly Gaiduk @ 2019-09-12 10:17 UTC (permalink / raw)
  To: David Miller
  Cc: robh+dt, f.fainelli, mark.rutland, andrew, hkallweit1, tpiepho,
	netdev, devicetree, linux-kernel
In-Reply-To: <20190912.003754.663480494374990855.davem@davemloft.net>

Hello, David.

Should I update the commit as Trent Piepho suggested? He wrote about using
phy_modify_mmd() instead.

Vitaly.

On 12.09.2019 1:37, David Miller wrote:
> From: Vitaly Gaiduk <vitaly.gaiduk@cloudbear.ru>
> Date: Mon,  9 Sep 2019 20:19:24 +0300
>
>> This patch adds the ability to switch between two PHY SGMII modes.
>> Some hardware, for example, FPGA IP designs may use 6-wire mode
>> which enables differential SGMII clock to MAC.
>>
>> Signed-off-by: Vitaly Gaiduk <vitaly.gaiduk@cloudbear.ru>
> Applied.

^ permalink raw reply

* Re: [PATCH v4] tun: fix use-after-free when register netdev failed
From: David Miller @ 2019-09-12 10:18 UTC (permalink / raw)
  To: yangyingliang; +Cc: netdev, jasowang, eric.dumazet, xiyou.wangcong, weiyongjun1
In-Reply-To: <1568113017-79840-1-git-send-email-yangyingliang@huawei.com>

From: Yang Yingliang <yangyingliang@huawei.com>
Date: Tue, 10 Sep 2019 18:56:57 +0800

> I got a UAF report in the tun driver when doing fuzz testing:
 ...
> tun_chr_read_iter() accessed the memory which was freed by free_netdev()
> called by tun_set_iff():
> 
>         CPUA                                           CPUB
>   tun_set_iff()
>     alloc_netdev_mqs()
>     tun_attach()
>                                                   tun_chr_read_iter()
>                                                     tun_get()
>                                                     tun_do_read()
>                                                       tun_ring_recv()
>     register_netdevice() <-- inject error
>     goto err_detach
>     tun_detach_all() <-- set RCV_SHUTDOWN
>     free_netdev() <-- called from
>                      err_free_dev path
>       netdev_freemem() <-- free the memory
>                         without check refcount
>       (In this path, the refcount cannot prevent
>        freeing the memory of dev, and the memory
>        will be used by dev_put() called by
>        tun_chr_read_iter() on CPUB.)
>                                                      (Break from tun_ring_recv(),
>                                                      because RCV_SHUTDOWN is set)
>                                                    tun_put()
>                                                      dev_put() <-- use the memory
>                                                                    freed by netdev_freemem()
> 
> Put the publishing of tfile->tun after register_netdevice(),
> so tun_get() won't get the tun pointer that was freed by the
> err_detach path if register_netdevice() failed.
> 
> Fixes: eb0fb363f920 ("tuntap: attach queue 0 before registering netdevice")
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
From: David Miller @ 2019-09-12 10:21 UTC (permalink / raw)
  To: christophe.jaillet
  Cc: kuznet, yoshfuji, netdev, linux-kernel, kernel-janitors
In-Reply-To: <20190910112959.9222-1-christophe.jaillet@wanadoo.fr>

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date: Tue, 10 Sep 2019 13:29:59 +0200

> The '.exit' functions from 'pernet_operations' structure should be marked
> as __net_exit, not __net_init.
> 
> Fixes: d862e5461423 ("net: ipv6: Implement /proc/net/icmp6.")
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---
> Untested, but using __net_exit looks consistent with other
> pernet_operations.exit use case.

Looks good, applied.

^ permalink raw reply

* [PATCH v2] net: stmmac: socfpga: re-use the `interface` parameter from platform data
From: Alexandru Ardelean @ 2019-09-12 13:28 UTC (permalink / raw)
  To: netdev, linux-stm32, linux-arm-kernel, linux-kernel
  Cc: peppe.cavallaro, alexandre.torgue, joabreu, mcoquelin.stm32,
	davem, Alexandru Ardelean

The socfpga sub-driver defines an `interface` field in the `socfpga_dwmac`
struct and parses it on init.

The shared `stmmac_probe_config_dt()` function also parses this from the
device-tree and makes it available on the returned `plat_data` (which is
the same data available via `netdev_priv()`).

All that's needed now is to dig that information out, via some
`dev_get_drvdata()` && `netdev_priv()` calls and re-use it.

Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
---

Changelog v1 -> v2:
* initially, this patch was developed on a 4.14 kernel, and adapted (badly)
  to `net-next`, so it did not build ; the v2 has been fixed and adapted
  correctly

 .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c   | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
index c141fe783e87..5b6213207c43 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
@@ -46,7 +46,6 @@ struct socfpga_dwmac_ops {
 };
 
 struct socfpga_dwmac {
-	int	interface;
 	u32	reg_offset;
 	u32	reg_shift;
 	struct	device *dev;
@@ -110,8 +109,6 @@ static int socfpga_dwmac_parse_data(struct socfpga_dwmac *dwmac, struct device *
 	struct resource res_tse_pcs;
 	struct resource res_sgmii_adapter;
 
-	dwmac->interface = of_get_phy_mode(np);
-
 	sys_mgr_base_addr =
 		altr_sysmgr_regmap_lookup_by_phandle(np, "altr,sysmgr-syscon");
 	if (IS_ERR(sys_mgr_base_addr)) {
@@ -231,6 +228,14 @@ static int socfpga_dwmac_parse_data(struct socfpga_dwmac *dwmac, struct device *
 	return ret;
 }
 
+static inline int socfpga_get_plat_phymode(struct socfpga_dwmac *dwmac)
+{
+	struct net_device *ndev = dev_get_drvdata(dwmac->dev);
+	struct stmmac_priv *priv = netdev_priv(ndev);
+
+	return priv->plat->interface;
+}
+
 static int socfpga_set_phy_mode_common(int phymode, u32 *val)
 {
 	switch (phymode) {
@@ -255,7 +260,7 @@ static int socfpga_set_phy_mode_common(int phymode, u32 *val)
 static int socfpga_gen5_set_phy_mode(struct socfpga_dwmac *dwmac)
 {
 	struct regmap *sys_mgr_base_addr = dwmac->sys_mgr_base_addr;
-	int phymode = dwmac->interface;
+	int phymode = socfpga_get_plat_phymode(dwmac);
 	u32 reg_offset = dwmac->reg_offset;
 	u32 reg_shift = dwmac->reg_shift;
 	u32 ctrl, val, module;
@@ -314,7 +319,7 @@ static int socfpga_gen5_set_phy_mode(struct socfpga_dwmac *dwmac)
 static int socfpga_gen10_set_phy_mode(struct socfpga_dwmac *dwmac)
 {
 	struct regmap *sys_mgr_base_addr = dwmac->sys_mgr_base_addr;
-	int phymode = dwmac->interface;
+	int phymode = socfpga_get_plat_phymode(dwmac);
 	u32 reg_offset = dwmac->reg_offset;
 	u32 reg_shift = dwmac->reg_shift;
 	u32 ctrl, val, module;
-- 
2.20.1


^ permalink raw reply related

* Re: [Patch net] sch_sfb: fix a crash in sfb_destroy()
From: Linus Torvalds @ 2019-09-12 10:31 UTC (permalink / raw)
  To: Cong Wang
  Cc: Eric Dumazet, Linux Kernel Network Developers, syzbot,
	Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <CAM_iQpVP6qVbWmV+kA8UGXG6r1LJftyV32UjUbqryGrX5Ud8Nw@mail.gmail.com>

On Thu, Sep 12, 2019 at 2:10 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Wed, Sep 11, 2019 at 2:36 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > It seems a similar fix would be needed in net/sched/sch_dsmark.c ?
> >
>
> Yeah, or just add a NULL check in dsmark_destroy().

Well, this was why one of my suggestions was to just make
"qdisc_put()" be happy with a NULL pointer (or even an ERR_PTR()).

That would have fixed not just sfb, but also dsmark with a single patch.

We tend to have that kind of pattern in a lot of places, where we can
free unallocated structures (and ERR_PTR() pointers) without errors,
so that

  destroy_fn(alloc_fn());

is defined to always work, even when alloc_fn() returns NULL or an
error. That, and allowing the "it was never allocated at all" case (as
long as it's initialized to NULL, of course) tends to make various
error cases simpler.

The obvious one is kfree(kmalloc()), of course, but we've done it in
other places too. So you find things like

  void i2c_unregister_device(struct i2c_client *client)
  {
        if (IS_ERR_OR_NULL(client))
                return;

in various subsystems and drivers. So one of my suggestions was to
just do that to qdisc_put().

It depends on what you want to do, of course. Do you want to make sure
each user is being very careful? Or do you want to make the interfaces
easy to use without _having_ to be careful? There are arguments both
ways, but we've tended to move more towards a "easy to use" model than
the "be careful" one.

               Linus
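
The "easy to use" pattern described above can be sketched in user space. The helpers below only imitate the kernel's `ERR_PTR()` / `IS_ERR_OR_NULL()` in simplified form, and `struct widget`, `widget_alloc()` and `widget_destroy()` are hypothetical names used purely for illustration. The point is that the destroy function accepts NULL and error pointers, so `destroy_fn(alloc_fn())` is always safe.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified user-space imitations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct widget { int id; };	/* hypothetical object */

static int live_widgets;	/* number of widgets currently allocated */

/* Returns a widget, NULL on allocation failure, or an error pointer
 * for an invalid id. */
static struct widget *widget_alloc(int id)
{
	struct widget *w;

	if (id < 0)
		return err_ptr(-EINVAL);
	w = malloc(sizeof(*w));
	if (!w)
		return NULL;
	w->id = id;
	live_widgets++;
	return w;
}

/* Tolerates NULL and error pointers, so callers need no checks on
 * their error paths. */
static void widget_destroy(struct widget *w)
{
	if (is_err_or_null(w))
		return;
	assert(live_widgets > 0);
	live_widgets--;
	free(w);
}
```

With this shape, `widget_destroy(widget_alloc(-1))` and `widget_destroy(NULL)` are both no-ops, which is the property the message argues for in `qdisc_put()`.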

^ permalink raw reply

* Re: [PATCH net-next v2 0/3] add ksz9567 with I2C support to ksz9477 driver
From: David Miller @ 2019-09-12 10:39 UTC (permalink / raw)
  To: george.mccollister
  Cc: netdev, woojung.huh, andrew, f.fainelli, Tristram.Ha, marex,
	linux-kernel
In-Reply-To: <20190910131836.114058-1-george.mccollister@gmail.com>

From: George McCollister <george.mccollister@gmail.com>
Date: Tue, 10 Sep 2019 08:18:33 -0500

> Resurrect KSZ9477 I2C driver support patch originally sent to the list
> by Tristram Ha and resolve outstanding issues. It now works as similarly to
> the ksz9477 SPI driver as possible, using the same regmap macros.
> 
> Add support for ksz9567 to the ksz9477 driver (tested on a board with
> ksz9567 connected via I2C).
> 
> Remove NET_DSA_TAG_KSZ_COMMON since it's not needed.
> 
> Changes since v1:
> Put ksz9477_i2c.c includes in alphabetical order.
> Added Reviewed-Bys.

Series applied.

Please follow up with Andrew about the macros.

Thanks.

^ permalink raw reply

* Re: [PATCH 1/7] net/dsa: configure autoneg for CPU port
From: Andrew Lunn @ 2019-09-12 10:43 UTC (permalink / raw)
  To: Robert Beckett
  Cc: Florian Fainelli, netdev, Vivien Didelot, David S. Miller,
	bob.beckett
In-Reply-To: <8d63d4dbd9d075b5c238fd8933673b95b2fa96e9.camel@collabora.com>

> > It actually has nothing to do with PHY to PHY connections. You can
> > use
> > pause frames with direct MAC to MAC connections. PHY auto-negotiation
> > is one way to indicate both ends support it, but there are also other
> > ways. e.g.
> > 
> > ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off]
> > 
> > on the SoC you could do
> > 
> > ethtool --pause eth0 autoneg off rx on tx on
> > 
> > to force the SoC to send and process pause frames. Ideally i would
> > prefer a solution like this, since it is not a change of behaviour
> > for
> > everybody else.
> 
> Good point, well made.
> The reason for using autoneg in this series was due to having no netdev
> to run ethtool against for the CPU port.

Do you need one? It is the IMX which is the bottleneck. It is the one
which needs to send pause frames. You have a netdev for that. Have you
checked if the switch will react to pause frames without your
change? Play with the command I gave above on the master interface. It
looks like the FEC driver fully supports synchronous pause
configuration.

> However, given that the phy on the marvell switch is capable of
> autoneg, is it not reasonable to set up the advertisement and let
> autoneg take care of it if using phy to phy connection?

Most designs don't use back to back PHYs for the CPU port. They save
the cost and connect MACs back to back using RGMII, or maybe SERDES.
If we are going for a method which can configure pause between the CPU
and the switch, it needs to be generic and work for both setups.

    Andrew

^ permalink raw reply

* Re: [PATCH] bpf: validate bpf_func when BPF_JIT is enabled
From: Toke Høiland-Jørgensen @ 2019-09-12 10:46 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Björn Töpel, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, Kees Cook, Martin Lau, Song Liu,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jesper Dangaard Brouer
In-Reply-To: <CABCJKufCwjXQ6a4oLjywDmxY2apUZ1yop-5+qty82bfwV-QTAA@mail.gmail.com>

Sami Tolvanen <samitolvanen@google.com> writes:

> On Wed, Sep 11, 2019 at 5:09 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Björn Töpel <bjorn.topel@intel.com> writes:
>> > I ran the "xdp_rxq_info" sample with and without Sami's patch:
>>
>> Thanks for doing this!
>
> Yes, thanks for testing this Björn!
>
>> Or (1/22998700 - 1/23923874) * 10**9 == 1.7 nanoseconds of overhead.
>>
>> I guess that is not *too* bad; but it's still chipping away at
>> performance; anything we could do to lower the overhead?
>
> The check is already rather minimal, but I could move this to a static
> inline function to help ensure the compiler doesn't generate an
> additional function call for this. I'm also fine with gating this
> behind a separate config option, but I'm not sure if that's worth it.
> Any thoughts?

I think it would be good if you do both. I'm a bit worried that XDP
performance will end up in a "death by a thousand paper cuts" situation,
so I'd rather push back on even relatively small overheads like this; so
being able to turn it off in the config would be good.
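
The 1.7 nanosecond figure quoted above can be reproduced with a few
lines of C. This is only a sketch: it assumes the two numbers in the
quoted formula are packets per second and that the lower rate was
measured with the patch applied; per_packet_ns() and overhead_ns() are
names made up for the example.

```c
/* Time spent per packet, in nanoseconds, at a given packet rate. */
double per_packet_ns(double packets_per_second)
{
	return 1e9 / packets_per_second;
}

/*
 * Extra per-packet cost when the rate drops from pps_without to
 * pps_with (both in packets per second).
 */
double overhead_ns(double pps_with, double pps_without)
{
	return per_packet_ns(pps_with) - per_packet_ns(pps_without);
}
```

Calling overhead_ns(22998700.0, 23923874.0) gives roughly 1.68 ns, which
rounds to the 1.7 ns mentioned in the thread.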

Can you share more details about what the "future CFI checking" is
likely to look like?

-Toke

^ permalink raw reply

* Re: [PATCH net] net: Fix null de-reference of device refcount
From: David Miller @ 2019-09-12 10:56 UTC (permalink / raw)
  To: subashab; +Cc: dlezcano, eric.dumazet, netdev, stranche
In-Reply-To: <1568145777-29480-1-git-send-email-subashab@codeaurora.org>

From: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Date: Tue, 10 Sep 2019 14:02:57 -0600

> In event of failure during register_netdevice, free_netdev is
> invoked immediately. free_netdev assumes that all the netdevice
> refcounts have been dropped prior to it being called and as a
> result frees and clears out the refcount pointer.
> 
> However, this is not necessarily true as some of the operations
> in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for
> invocation after a grace period. The IPv4 callback in_dev_rcu_put
> tries to access the refcount after free_netdev is called which
> leads to a null de-reference-
 ...
> Fix this by waiting for the completion of the call_rcu() in
> case of register_netdevice errors.
> 
> Fixes: 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.")
> Cc: Sean Tranchetti <stranche@codeaurora.org>
> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH v2] net: qrtr: fix memort leak in qrtr_tun_write_iter
From: David Miller @ 2019-09-12 10:59 UTC (permalink / raw)
  To: navid.emamdoost; +Cc: emamd001, smccaman, kjlu, netdev, linux-kernel
In-Reply-To: <20190911150907.18251-1-navid.emamdoost@gmail.com>

From: Navid Emamdoost <navid.emamdoost@gmail.com>
Date: Wed, 11 Sep 2019 10:09:02 -0500

> In qrtr_tun_write_iter the allocated kbuf should be released in case of
> error or success return.
> 
> v2 Update: Thanks to David Miller for pointing out the release on success
> path as well.
> 
> Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>

Applied, thanks.

^ permalink raw reply

* [PATCH] ixgbe: Fix secpath usage for IPsec TX offload.
From: Steffen Klassert @ 2019-09-12 11:01 UTC (permalink / raw)
  To: Jeff Kirsher, intel-wired-lan; +Cc: Michael Marley, Shannon Nelson, netdev

The ixgbe driver currently does IPsec TX offloading
based on an existing secpath. However, the secpath
can also come from the RX side, in which case it is
misinterpreted as a TX offload request and the packets are
dropped with a "bad sa_idx" error. Fix this by using
the xfrm_offload() function to test for TX offload.

Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
Reported-by: Michael Marley <michael@michaelmarley.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 9bcae44e9883..ae31bd57127c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -36,6 +36,7 @@
 #include <net/vxlan.h>
 #include <net/mpls.h>
 #include <net/xdp_sock.h>
+#include <net/xfrm.h>

 #include "ixgbe.h"
 #include "ixgbe_common.h"
@@ -8696,7 +8697,7 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 #endif /* IXGBE_FCOE */

 #ifdef CONFIG_IXGBE_IPSEC
-	if (secpath_exists(skb) &&
+	if (xfrm_offload(skb) &&
 	    !ixgbe_ipsec_tx(tx_ring, first, &ipsec_tx))
 		goto out_drop;
 #endif
-- 
2.17.1

^ permalink raw reply related

* Re: ANNOUNCE: rpld an another RPL implementation for Linux
From: Stefan Schmidt @ 2019-09-12 11:14 UTC (permalink / raw)
  To: Alexander Aring, open list:NETWORKING [GENERAL]
  Cc: Michael Richardson, Jamal Hadi Salim, Robert Kaiser,
	Martin Gergeleit, Kai Beckmann, koen, linux-wpan - ML, reubenhwk,
	BlueZ development, sebastian.meiling, Marcel Holtmann,
	Werner Almesberger, Jukka Rissanen
In-Reply-To: <CAB_54W7h9ca0UJAZtk=ApPX-2ZCvzu4774BTFTaB5mtkobWCtw@mail.gmail.com>

Hello Alex.

On 29.08.19 23:57, Alexander Aring wrote:
> Hi,
> 
> I had some free time, I wanted to know how RPL [0] works so I did a
> implementation. It's _very_ basic as it only gives you a "routable"
> (is that a word?) thing afterwards in a very constrained setup of RPL
> messages.
> 
> Took ~1 month to implement it and I reused some great code from radvd
> [1]. I released it under the same license (BSD?). Anyway, I know there
> exists a lot of memory leaks and the parameters are just crazy as not
> practical in a real environment BUT it works.
> 
> I changed a little bit the dependencies from radvd (because fancy new things):
> 
> - lua for config handling
> - libev for event loop handling
> - libmnl for netlink handling
> 
> The code is available at:
> 
> https://github.com/linux-wpan/rpld

I finally had a first look at it and played around a little bit.

How do you want to review patches for this? Pull requests on the github
repo or patches sent to the linux-wpan list?

So far just some basic stuff I stumbled over when playing with it: build
fixes (SCOPE_ID and different lua pkgconfig namings), leak fixes to
config.c, and a travis setup to get building on CI as well as
submitting to the Coverity scan service (the latter two are already tested
in practice with some dev branches I pushed to the github repo, hope you
don't mind).

regards
Stefan Schmidt

^ permalink raw reply

* [patch iproute2-next v4 0/2] devlink: couple forgotten flash patches
From: Jiri Pirko @ 2019-09-12 11:29 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern, jakub.kicinski, saeedm, mlxsw, f.fainelli

From: Jiri Pirko <jiri@mellanox.com>

I was under the impression they were already merged, but apparently they
are not. I just rebased them on top of the current iproute2 net-next tree.

Jiri Pirko (2):
  devlink: implement flash update status monitoring
  devlink: implement flash status monitoring

 devlink/devlink.c      | 258 ++++++++++++++++++++++++++++++++++++++++-
 devlink/mnlg.c         |   5 +
 devlink/mnlg.h         |   1 +
 man/man8/devlink-dev.8 |  11 ++
 4 files changed, 271 insertions(+), 4 deletions(-)

-- 
2.20.1


^ permalink raw reply
