Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v2 4/4] net: mvpp2: 2500baseX support
From: Antoine Tenart @ 2018-01-03 15:32 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Antoine Tenart, davem, kishon, gregory.clement, linux, mw,
	stefanc, ymarkman, thomas.petazzoni, miquel.raynal, nadavh,
	netdev, linux-kernel
In-Reply-To: <20180103152036.GC3401@lunn.ch>

Hi Andrew,

On Wed, Jan 03, 2018 at 04:20:36PM +0100, Andrew Lunn wrote:
> > @@ -4612,6 +4616,9 @@ static int mvpp22_comphy_init(struct mvpp2_port *port)
> >  	case PHY_INTERFACE_MODE_1000BASEX:
> >  		mode = PHY_MODE_SGMII;
> >  		break;
> > +	case PHY_INTERFACE_MODE_2500BASEX:
> > +		mode = PHY_MODE_2500SGMII;
> > +		break;
> 
> I think this is the source of confusion with linux/phy.h and
> linux/phy/phy.h.
> 
> What would PHY_INTERFACE_MODE_2500SGMII use?
> 
> Where is this all getting confused? Should the caller to
> mvpp22_comphy_init() actually be passing PHY_INTERFACE_MODE_2500SGMII?
> What is the MAC actually doing at this point? 2500BASEX or 2500SGMII?

PHY_INTERFACE_MODE_2500BASEX is the PHY mode whereas PHY_MODE_2500SGMII
is the mode used by the common PHY driver (i.e. the one configuring the
serdes lanes).

There's no PHY_INTERFACE_MODE_2500SGMII mode.

> At minimum there needs to be a comment that this is not a typ0,
> otherwise you are going to get patches submitted to 'fix' this.

Sure, I can add a comment to state this function is a translation
between the net PHY mode and the generic PHY mode (it's a n-to-1
translation).

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* Re: [RFC PATCH net-next 03/19] ipv6: Clear nexthop flags upon netdev up
From: David Ahern @ 2018-01-03 15:32 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Ido Schimmel, netdev, davem, roopa, nicolas.dichtel, mlxsw
In-Reply-To: <20180103074418.GA761@splinter>

On 1/3/18 12:44 AM, Ido Schimmel wrote:
>>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>>> index ed06b1190f05..b6405568ed7b 100644
>>> --- a/net/ipv6/addrconf.c
>>> +++ b/net/ipv6/addrconf.c
>>> @@ -3484,6 +3484,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
>>>  			if (run_pending)
>>>  				addrconf_dad_run(idev);
>>>  
>>> +			/* Device has an address by now */
>>> +			rt6_sync_up(dev, RTNH_F_DEAD);
>>> +
>>
>> Seems like this should be in the NETDEV_UP section, say after
>> addrconf_permanent_addr.
> 
> Unless the `keep_addr_on_down` sysctl is set, then at this stage the
> netdev doesn't have an IP address and we shouldn't clear the dead flag
> just yet.
> 
> This is consistent with IPv4 that clears the dead flag from nexthops in
> a multipath route only if the nexthop device has an IP address. When the
> last IPv4 address is removed from a netdev all the routes using it are
> flushed and there's nothing to clear upon NETDEV_UP.

I have a bug about that IPv4 handling from the FRR team:

$ ip link add dummy1 type dummy
$ ip li set dummy1 up
$ ip route add 1.1.1.0/24 dev dummy1

$ ip addr add dev dummy1 2.2.2.1/24
$ ip ro ls | grep dummy1
1.1.1.0/24 dev dummy1 scope link
2.2.2.0/24 dev dummy1 proto kernel scope link src 2.2.2.1

$ ip addr del dev dummy1 2.2.2.1/24
$ ip ro ls | grep dummy1
<no outpu>

The 1.1.1.0/24 route was removed as well the 2.2.2.0 connected route.

^ permalink raw reply

* Re: [PATCH 0/2] Kill redundant checks in the Renesas Ethernet drivers
From: Sergei Shtylyov @ 2018-01-03 15:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-renesas-soc
In-Reply-To: <20180103.102155.745876383952341968.davem@davemloft.net>

Hello!

On 01/03/2018 06:21 PM, David Miller wrote:

> From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> Date: Sun, 31 Dec 2017 21:41:34 +0300
> 
>> Here's a set of 2 patches against DaveM's 'net-next.git' repo removing
>> redundant checks in the driver probe() methods.
> 
> Series applied with the "disassembly" typo fixed.

    Thank you. Expect a fix for a 'sh_eth' bug added in 2013 soonish. :-)

MBR, Sergei

^ permalink raw reply

* Re: [PATCHv5 2/3] net: socionext: Add Synquacer NetSec driver
From: David Miller @ 2018-01-03 15:35 UTC (permalink / raw)
  To: jassisinghbrar
  Cc: netdev, devicetree, arnd.bergmann, andrew, ard.biesheuvel,
	robh+dt, mark.rutland, masami.hiramatsu, jaswinder.singh
In-Reply-To: <1514783689-4352-1-git-send-email-jassisinghbrar@gmail.com>

From: jassisinghbrar@gmail.com
Date: Mon,  1 Jan 2018 10:44:49 +0530

> +#define DRING_TAIL(r)		((r)->tail)
> +
> +#define DRING_HEAD(r)		((r)->head)

These macros do not help readability at all.

> +#define MOVE_TAIL(r)		do { \
> +					if (++(r)->tail == DESC_NUM) \
> +						(r)->tail = 0; \
> +				} while (0)
> +
> +#define MOVE_HEAD(r)		do { \
> +					if (++(r)->head == DESC_NUM) \
> +						(r)->head = 0; \
> +				} while (0)
> +
> +#define JUMP_HEAD(r, n)	do { \
> +					int i; \
> +					for (i = 0; i < (n); i++) \
> +						MOVE_HEAD(r); \
> +				} while (0)

Neither do these.

And JUMP_HEAD is so inefficient, it's a constant time calculation:

	r->head += n;
	if (r->head >= DESC_NUM)
		r->head -= DESC_NUM;

All of this stuff can be done inline without all of these CPP macros
which are discouraged, have multiple evaluation issues, and decrease
the amount of type checking going on.

If you absolutely must have helpers, use static functions (without
the inline keyword, let the compiler device).

> +static inline int available_descs(struct netsec_desc_ring *r)

No inline functions in foo.c files, let the compiler device.

> +/*************************************************************/
> +/*********************** NETDEV_OPS **************************/
> +/*************************************************************/

Please, comments are not billboards or Star Wars openning sequences.
Simplify this, thank you.

> +
> +static void netsec_set_tx_de(struct netsec_priv *priv,
> +			     struct netsec_desc_ring *dring,
> +			     const struct netsec_tx_pkt_ctrl *tx_ctrl,
> +			     const struct netsec_desc *desc,
> +			     struct sk_buff *skb)
> +{
> +	struct netsec_de *de;
> +	int idx = DRING_HEAD(dring);
> +	u32 attr;

Please order local variables from longest to shortest line.

Please audit your entire submission for this issue.

Thank you.

^ permalink raw reply

* Re: [PATCH 00/10] Convert mvneta to phylink
From: David Miller @ 2018-01-03 15:39 UTC (permalink / raw)
  To: linux; +Cc: andrew, f.fainelli, thomas.petazzoni, netdev
In-Reply-To: <20180102172242.GE28752@n2100.armlinux.org.uk>

From: Russell King - ARM Linux <linux@armlinux.org.uk>
Date: Tue, 2 Jan 2018 17:22:42 +0000

> This series converts mvneta to use phylink, which is necessary to
> support the SFP cages on SolidRun's Clearfog platform.  This series just
> converts mvneta without adding the DT parts - having discussed with
> Andrew, we believe we're too close to the merge window to submit that
> patch.
> 
> I've split the "net: mvneta: convert to phylink" patch up to make it
> easier to review, and in doing so, spotted some minor corner cases that
> needed to be fixed along the way.
> 
> This series depends on the previously merged phylink patches in netdev,
> along with the recently reviewed 7 patch series "Resolve races in phy
> accessors" without which, the race described in patch 5 of that series
> is very evident when triggering a dummy hibernate cycle.
> 
> This series also illustrates how to convert mvpp2 to phylink.
> 
> mvneta is the only user of the fixed_phy_update_state() API, and this
> becomes redundant with the conversion.
> 
> It would be good to get this series not only reviewed, but also
> independently tested to ensure that I haven't missed anything - I only
> have the Clearfog platform to test on, and that doesn't support all the
> different interface modes that mvneta supports.
> 
> A particularly interesting side effect of this series is that DSA
> switches no longer need the "CPU" port and DSA facing MAC ethernet
> instance to be marked as a fixed link anymore with mvneta - we can use
> 1000BaseX mode, and the DSA to CPU link will use the 802.3z negotiation
> to determine the link properties without needing the link parameters to
> be explicitly stated in DT - that is a subject of a future patch.

This looks good to me, series applied, thanks Russell.

^ permalink raw reply

* Re: [PATCH net-next 2/5] net: dsa: mv88e6xxx: Hold mutex while doing stats operations
From: Vivien Didelot @ 2018-01-03 15:43 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: David Miller, Florian Fainelli, netdev, Russell King
In-Reply-To: <20180103150207.GA3401@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

> On Wed, Jan 03, 2018 at 09:32:42AM -0500, Vivien Didelot wrote:
>> Hi Andrew,
>> 
>> Andrew Lunn <andrew@lunn.ch> writes:
>> 
>> > -static int mv88e6xxx_get_sset_count(struct dsa_switch *ds, int port)
>> > +static int _mv88e6xxx_get_sset_count(struct dsa_switch *ds, int port)
>> >  {
>> >  	struct mv88e6xxx_chip *chip = ds->priv;
>> >  
>> > @@ -702,6 +706,19 @@ static int mv88e6xxx_get_sset_count(struct dsa_switch *ds, int port)
>> >  	return 0;
>> >  }
>> 
>> We worked to remove the old underscore prefix convention. Please don't
>> add it back... Simply rework the return statements of
>> mv88e6xxx_get_sset_count to lock/unlock there.
>
> That makes mv88e6xxx_get_sset_count quite complex, making it error
> prone. Doing the locking in a separate function makes is very clear
> the lock is held and then correctly released. So i will just rename
> _mv88e6xxx_get_sset_count() to mv88e6xxx_get_sset_count_locked()

     static int mv88e6xxx_get_sset_count(struct dsa_switch *ds)
     {
            struct mv88e6xxx_chip *chip = ds->priv;
    +       int err;
     
    -       if (chip->info->ops->stats_get_sset_count)
    -               return chip->info->ops->stats_get_sset_count(chip);
    +       if (!chip->info->ops->stats_get_sset_count)
    +               return 0;
     
    -       return 0;
    +       mutex_lock(&chip->reg_lock);
    +       err = chip->info->ops->stats_get_sset_count(chip);
    +       mutex_unlock(&chip->reg_lock);
    +
    +       return err;
     }


This is quite complex and error prone, seriously?


     Vivien

^ permalink raw reply

* [PATCH net] net: stmmac: enable EEE in MII, GMII or RGMII only
From: Jerome Brunet @ 2018-01-03 15:46 UTC (permalink / raw)
  To: netdev, Giuseppe Cavallaro, Alexandre Torgue
  Cc: Jerome Brunet, linux-kernel, linux-amlogic

Note in the databook - Section 4.4 - EEE :
" The EEE feature is not supported when the MAC is configured to use the
TBI, RTBI, SMII, RMII or SGMII single PHY interface. Even if the MAC
supports multiple PHY interfaces, you should activate the EEE mode only
when the MAC is operating with GMII, MII, or RGMII interface."

Applying this restriction solves a stability issue observed on Amlogic
gxl platforms operating with RMII interface and the internal PHY.

Fixes: 83bf79b6bb64 ("stmmac: disable at run-time the EEE if not supported")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Arnaud Patard <arnaud.patard@rtp-net.org>
---

Hi David,

As quickly explained in the log above, we are having some stability issue on
Amlogic gxl platform: synopsys MAC + internal PHY over RMII. We found out that
these issues are caused by EEE. Forcefully disabling this feature solves our
problem.

I only had access to a snip of databook. I have no idea if the restriction
explained above applies to all the revision of Synopsys MAC controller.

Anyway, this change should be safe to apply. If we get more information from
Synopsys later on, it will be easy to revert, if necessary.

Thanks
Jerome

Change since RFC [0]:
* use phy_interface_mode_is_rgmii() as suggested by Andrew

[0]: http://lkml.kernel.org/r/20171205102809.4347-1-jbrunet@baylibre.com

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c52a9963c19d..023938166821 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -364,9 +364,15 @@ static void stmmac_eee_ctrl_timer(struct timer_list *t)
 bool stmmac_eee_init(struct stmmac_priv *priv)
 {
 	struct net_device *ndev = priv->dev;
+	int interface = priv->plat->interface;
 	unsigned long flags;
 	bool ret = false;
 
+	if ((interface != PHY_INTERFACE_MODE_MII) &&
+	    (interface != PHY_INTERFACE_MODE_GMII) &&
+	    !phy_interface_mode_is_rgmii(interface))
+		goto out;
+
 	/* Using PCS we cannot dial with the phy registers at this stage
 	 * so we do not support extra feature like EEE.
 	 */
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH net] tipc: fix missing rtnl lock protection during setting link properties
From: David Miller @ 2018-01-03 15:48 UTC (permalink / raw)
  To: ying.xue; +Cc: netdev, jon.maloy, syzkaller-bugs, tipc-discussion
In-Reply-To: <1514802241-7896-1-git-send-email-ying.xue@windriver.com>

From: Ying Xue <ying.xue@windriver.com>
Date: Mon, 1 Jan 2018 18:24:01 +0800

> diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c
> index e48f0b2..0fb12a4 100644
> --- a/net/tipc/netlink_compat.c
> +++ b/net/tipc/netlink_compat.c
> @@ -720,17 +720,21 @@ static int tipc_nl_compat_link_set(struct tipc_nl_compat_cmd_doit *cmd,
>  
>  	lc = (struct tipc_link_config *)TLV_DATA(msg->req);
>  
> +	rtnl_lock();
>  	media = tipc_media_find(lc->name);
>  	if (media) {
>  		cmd->doit = &tipc_nl_media_set;
> +		rtnl_unlock();
>  		return tipc_nl_compat_media_set(skb, msg);
>  	}
>  
>  	bearer = tipc_bearer_find(msg->net, lc->name);
>  	if (bearer) {
>  		cmd->doit = &tipc_nl_bearer_set;
> +		rtnl_unlock();
>  		return tipc_nl_compat_bearer_set(skb, msg);
>  	}
> +	rtnl_unlock();
>  
>  	return __tipc_nl_compat_link_set(skb, msg);
>  }

As soon as you drop the RTNL lock, the media or bearer entry can be
removed from the tables.

This invalidates what you do next, whether it's
tipc_nl_compat_media_set(), tipc_nl_compat_bearer_set(), etc.

Therefore, you have to lock down the tipc configuration state around
this entire operation, from media/bearer probe to the building of the
netlink message(s).

Either this entire code path must execute with the bearer/media entry
present, or without.  If you drop the RTNL mutex in the middle, this
invariant is not held.

^ permalink raw reply

* Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
From: Michael S. Tsirkin @ 2018-01-03 15:50 UTC (permalink / raw)
  To: John Fastabend; +Cc: David Miller, jakub.kicinski, xiyou.wangcong, jiri, netdev
In-Reply-To: <457d384f-5cd1-d52d-f6e9-4c9ebb1f8d09@gmail.com>

On Tue, Jan 02, 2018 at 04:25:03PM -0800, John Fastabend wrote:
> > 
> > More generally, what makes this usage safe?
> > Is there a way to formalize it at the API level?
> > 
> 
> Right I think these are good questions. I think the ptr_ring API should
> allow a peek operation to be used without a lock. The user has to ensure
> they only use it as a hint and if its dereferenced the user needs to
> ensure the object is not free'd from some other codepath while it is
> being dereferenced. The existing API seems to match this.
> 
> This is how I used it in pfifo_fast expecting the above to be true. The
> API allows for false negatives which _should_ be OK if the user is expecting
> this. Alternatively, we could make it false positives if you want and
> that would also work for me considering this case is hit very rarely.

By now I'm confused by which are positives and which are negatives :)
OK so the guarantees we want would be:

- empty can return false if ring is empty.
  caller must re-check with consumer lock taken
- if multiple threads call it, only one thread
  is guaranteed that empty will not return true
  if ring is non empty.
  after detecting that ring is not empty,
  this thread shall ....

can you help me fill in the blank please?





> >>> John, others, could you pls confirm it's not too bad performance-wise?
> >>> I'll then split it up properly and re-post.
> >>>
> >>
> >> I haven't benchmarked it but in the dequeue case taking a lock for
> >> every priority even when its empty seems unneeded.
> > 
> > Well it does seem to make the code more comprehensible and more robust.
> > 
> 
> Its a trade-off between performance and robustness.
> 
> > But OK -  I was looking at fixing the unlocked empty API to make sure it
> > actually does what it's supposed to. I posted a draft earlier in this
> > thread, it needs to be looked at in depth to figure out whether it can
> > ever give false negatives or positives, and document the results.
> > 
> > 
> 
> I'll look at it. But I still think keeping a lockless version makes sense
> for many use cases.

Fine. Just let's try to document what are the guarantees.

> > 
> >>> -->
> >>>
> >>> net: don't misuse ptr_ring_peek
> >>>
> >>> ptr_ring_peek only claims to be safe if the result is never
> >>> dereferenced, which isn't the case for its use in sch_generic.
> >>> Add locked API variants and use the bh one here.
> >>>
> >>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>
> >>> ---
> >>>
> >>
> >> [...]
> >>
> >>> --- a/net/sched/sch_generic.c
> >>> +++ b/net/sched/sch_generic.c
> >>> @@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
> >>>  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
> >>>  		struct skb_array *q = band2list(priv, band);
> >>>  
> >>> -		skb = __skb_array_peek(q);
> >>> +		skb = skb_array_peek_bh(q);
> >>
> >> Ah I should have added a comment here. For now peek() is only used from
> >> locking qdiscs. So peek and consume/produce operations will never happen
> >> in parallel. In this case we should never hit the false negative case with
> >> my patch or the out of bounds reference without my patch.

OK so this is the part I missed. Can you add a comment please?


> >> Doing a peek() op without qdisc lock is a bit problematic anyways. With
> >> current code another cpu could consume the skb and free it. Either we
> >> can ensure a single consumer runs at a time on an array (not the same as
> >> qdisc maybe) or just avoid peek operations in this case. My current plan
> >> was to just avoid peek() ops altogether, they seem unnecessary with the
> >> types of qdiscs I want to be build.
> >>
> >> Thanks,
> >> John
> > 
> > For the lockless qdisc, for net-next, do we need the patch above until you fix that though?
> > 
> 
> No, I think after this patch (net: ptr_ring: otherwise safe empty checks...) is
> applied we do not need any additional fixes in net-next. Future work will
> require the above patch (the one you provided) though so its useful work.

The one that avoids allocating extra memory?


> I'll do another review of the false positive case though to be sure the
> current code is OK wrt handling false positives and any potential stalls.


Thanks!

> >>>  	}
> >>>  
> >>>  	return skb;
> >>>

^ permalink raw reply

* RE: [EXT] Re: [PATCH net-next v2 4/4] net: mvpp2: 2500baseX support
From: Stefan Chulski @ 2018-01-03 15:50 UTC (permalink / raw)
  To: Antoine Tenart, Andrew Lunn
  Cc: davem@davemloft.net, kishon@ti.com,
	gregory.clement@free-electrons.com, linux@armlinux.org.uk,
	mw@semihalf.com, Yan Markman, thomas.petazzoni@free-electrons.com,
	miquel.raynal@free-electrons.com, Nadav Haklai,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20180103153227.GA9227@kwain>



> -----Original Message-----
> From: Antoine Tenart [mailto:antoine.tenart@free-electrons.com]
> Sent: Wednesday, January 03, 2018 5:32 PM
> To: Andrew Lunn <andrew@lunn.ch>
> Cc: Antoine Tenart <antoine.tenart@free-electrons.com>;
> davem@davemloft.net; kishon@ti.com; gregory.clement@free-electrons.com;
> linux@armlinux.org.uk; mw@semihalf.com; Stefan Chulski
> <stefanc@marvell.com>; Yan Markman <ymarkman@marvell.com>;
> thomas.petazzoni@free-electrons.com; miquel.raynal@free-electrons.com;
> Nadav Haklai <nadavh@marvell.com>; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: [EXT] Re: [PATCH net-next v2 4/4] net: mvpp2: 2500baseX support
> 
> External Email
> 
> ----------------------------------------------------------------------
> Hi Andrew,
> 
> On Wed, Jan 03, 2018 at 04:20:36PM +0100, Andrew Lunn wrote:
> > > @@ -4612,6 +4616,9 @@ static int mvpp22_comphy_init(struct
> mvpp2_port *port)
> > >  	case PHY_INTERFACE_MODE_1000BASEX:
> > >  		mode = PHY_MODE_SGMII;
> > >  		break;
> > > +	case PHY_INTERFACE_MODE_2500BASEX:
> > > +		mode = PHY_MODE_2500SGMII;
> > > +		break;
> >
> > I think this is the source of confusion with linux/phy.h and
> > linux/phy/phy.h.
> >
> > What would PHY_INTERFACE_MODE_2500SGMII use?
> >
> > Where is this all getting confused? Should the caller to
> > mvpp22_comphy_init() actually be passing
> PHY_INTERFACE_MODE_2500SGMII?
> > What is the MAC actually doing at this point? 2500BASEX or 2500SGMII?
> 
> PHY_INTERFACE_MODE_2500BASEX is the PHY mode whereas
> PHY_MODE_2500SGMII is the mode used by the common PHY driver (i.e. the
> one configuring the serdes lanes).
> 
> There's no PHY_INTERFACE_MODE_2500SGMII mode.
> 
> > At minimum there needs to be a comment that this is not a typ0,
> > otherwise you are going to get patches submitted to 'fix' this.
> 
> Sure, I can add a comment to state this function is a translation between the
> net PHY mode and the generic PHY mode (it's a n-to-1 translation).
> 

Maybe we should rename enum phy_mode to comphy_mode and PHY_MODE_2500SGMII to COMPHY_MODE_2500SGMII.
Since this enum set MAC to PHY serdes communication mode, not PHY to PHY communication mode.

Stefan.

^ permalink raw reply

* Re: [PATCH v4] uapi libc compat: add fallback for unsupported libcs
From: David Miller @ 2018-01-03 15:53 UTC (permalink / raw)
  To: hauke
  Cc: netdev, linux-kernel, linux-api, musl, felix.janda, f.fainelli,
	carlos, ldv
In-Reply-To: <20180101183320.24409-1-hauke@hauke-m.de>

From: Hauke Mehrtens <hauke@hauke-m.de>
Date: Mon,  1 Jan 2018 19:33:20 +0100

> From: Felix Janda <felix.janda@posteo.de>
> 
> libc-compat.h aims to prevent symbol collisions between uapi and libc
> headers for each supported libc. This requires continuous coordination
> between them.
> 
> The goal of this commit is to improve the situation for libcs (such as
> musl) which are not yet supported and/or do not wish to be explicitly
> supported, while not affecting supported libcs. More precisely, with
> this commit, unsupported libcs can request the suppression of any
> specific uapi definition by defining the correspondings _UAPI_DEF_*
> macro as 0. This can fix symbol collisions for them, as long as the
> libc headers are included before the uapi headers. Inclusion in the
> other order is outside the scope of this commit.
> 
> All infrastructure in order to enable this fallback for unsupported
> libcs is already in place, except that libc-compat.h unconditionally
> defines all _UAPI_DEF_* macros to 1 for all unsupported libcs so that
> any previous definitions are ignored. In order to fix this, this commit
> merely makes these definitions conditional.
> 
> This commit together with the musl libc commit
> 
> http://git.musl-libc.org/cgit/musl/commit/?id=04983f2272382af92eb8f8838964ff944fbb8258
> 
> fixes for example the following compiler errors when <linux/in6.h> is
> included after musl's <netinet/in.h>:
> 
> ./linux/in6.h:32:8: error: redefinition of 'struct in6_addr'
> ./linux/in6.h:49:8: error: redefinition of 'struct sockaddr_in6'
> ./linux/in6.h:59:8: error: redefinition of 'struct ipv6_mreq'
> 
> The comments referencing glibc are still correct, but this file is not
> only used for glibc any more.
> 
> Signed-off-by: Felix Janda <felix.janda@posteo.de>
> Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>

Ok, applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next v2 4/4] net: mvpp2: 2500baseX support
From: Andrew Lunn @ 2018-01-03 15:53 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, kishon, gregory.clement, linux, mw, stefanc, ymarkman,
	thomas.petazzoni, miquel.raynal, nadavh, netdev, linux-kernel
In-Reply-To: <20180103153227.GA9227@kwain>

On Wed, Jan 03, 2018 at 04:32:27PM +0100, Antoine Tenart wrote:
> Hi Andrew,
> 
> On Wed, Jan 03, 2018 at 04:20:36PM +0100, Andrew Lunn wrote:
> > > @@ -4612,6 +4616,9 @@ static int mvpp22_comphy_init(struct mvpp2_port *port)
> > >  	case PHY_INTERFACE_MODE_1000BASEX:
> > >  		mode = PHY_MODE_SGMII;
> > >  		break;
> > > +	case PHY_INTERFACE_MODE_2500BASEX:
> > > +		mode = PHY_MODE_2500SGMII;
> > > +		break;
> > 
> > I think this is the source of confusion with linux/phy.h and
> > linux/phy/phy.h.
> > 
> > What would PHY_INTERFACE_MODE_2500SGMII use?
> > 
> > Where is this all getting confused? Should the caller to
> > mvpp22_comphy_init() actually be passing PHY_INTERFACE_MODE_2500SGMII?
> > What is the MAC actually doing at this point? 2500BASEX or 2500SGMII?
> 
> PHY_INTERFACE_MODE_2500BASEX is the PHY mode whereas PHY_MODE_2500SGMII
> is the mode used by the common PHY driver (i.e. the one configuring the
> serdes lanes).

> There's no PHY_INTERFACE_MODE_2500SGMII mode.

Hi Antoine

At the moment there is no PHY_INTERFACE_MODE_2500SGMII. However,
there are some devices which can do 2.5G SGMII. So it will appear
sometime. This piece of code then looks even stranger.

> Sure, I can add a comment to state this function is a translation
> between the net PHY mode and the generic PHY mode (it's a n-to-1
> translation).

I think from an API design point of view, passing PHY_MODE_2500BASEX
to comphy makes more sense. That is what the MAC wants to do. How the
comphy achieves that should be internal to the comphy.

       Andrew

^ permalink raw reply

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
From: David Ahern @ 2018-01-03 15:57 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel
In-Reply-To: <20180103094025.GA2067@nanopsycho.orion>

On 1/3/18 2:40 AM, Jiri Pirko wrote:
> Wed, Jan 03, 2018 at 03:07:36AM CET, dsahern@gmail.com wrote:
>> On 1/2/18 12:49 PM, Jiri Pirko wrote:
>>> DaveA, please consider following example:
>>>
>>> $ tc qdisc add dev ens7 ingress
>>> $ tc qdisc
>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
>>>
>>> Now I have one device with one qdisc attached.
>>>
>>> I will add some filters, for example:
>>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>
>>> No sharing is happening. The user is doing what he is used to do.
>>>
>>> Now user decides to share this filters with another device. As you can
>>> see above, the block created for ens7 qdisc instance has id "1".
>>> User can simply do:
>>>
>>> tc qdisc add dev ens8 ingress block 1
>>>
>>> And the block gets shared among ens7 ingress qdisc instance and ens8
>>> ingress qdisc instance.
>>>
>>> What is wrong with this? The approach you suggest would disallow this
>>
>> Conceptually, absolutely nothing. We all agree that a shared block
>> feature is needed. So no argument on sharing the filters across devices.
>>
>> The disagreement is in how they should be managed. I think my last
>> response concisely captures my concerns -- the principle of least surprise.
>>
>> So with the initial commands above, all is fine. Then someone is
>> debugging a problem or wants to add another filter to ens8, so they run:
>>
>> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>> 192.168.1.0/16 action drop
>>
>> Then traffic flows through ens7 break and some other user is struggling
>> to understand what just happened. That the new filter magically appears
>> on ens7 when the user operated on ens8 is a surprise. Nothing about that
>> last command acknowledges that it is changing a shared resource.
> 
> Given the fact that user configured sharing between ens7 and ens8 and he
> can easily see that by "$ tc qdisc show" I don't see anything wrong
> about it, no surprise. Either the user knows what is he doing or not.

tc is one of the most difficult commands for users to understand and get
right. The API behind the command even more so. There seems to be a
general agreement on this.

To someone like you who is well versed in tc semantics this may seem
obvious, but I contend that even you would slip up here at some point.
There is too much distance between the filter management and the qdisc
listing a part of which shows a block id - not that it is shared or
anything else, just of the many words in the output there is 'block N'.

>>
>> Consider the commands being run by different people, and a time span
>> between. Allowing the shared block to be configured by any device using
>> the block is just setting up users for errors and confusion.
> 
> No confusion. Everything is visible, all info is in the manpage. The
> same story as always.
> 
> 
>>
>>> forcing user to explicitly create some block entity and then to attach
>>> it to qdisc instances. I don't really see good reason for it. Could you
>>> please clear this up for me?
>>
>> It forces the user to acknowledge it is changing a resource that may be
>> shared by more than one device.
>>
>> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>> 192.168.1.0/16 action drop
>> Error: This qdisc is a shared block. Use the block API to configure.
>>
>> $ tc qdisc show dev ens8
>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
>>
>> $ tc filter add block 1 protocol ip pref 25 flower dst_ip 192.168.1.0/16
>> action drop
>>
>> Now there are no surprises. I have to know that ens8 is using block 1,
>> and I have to specify that block when adding a filter.
> 
> On contrary. This is surprising! Consider my original example extended
> by your approach and limitations:

Nope, I was not extending your approach; I was using your examples to
show why I disagree with the approach. As I mentioned in past responses,
I believe the block lifecycle should be independent of any device.

$ tc qdisc add block 1
$ tc filter add block 1 ....

$ tc qdisc add dev ens7 ingress block 1
$ tc qdisc add dev ens8 ingress block 1

$ tc filter show block 1
(filters listed)

$ tc filter show dev ens7 ingress
Info: This qdisc is shared. Use the block api to list filters.

(This is very similar to how I am handling nexthop objects for routes. I
can show some examples if desired, but don't want this tangent to go too
far.)

> 
> $ tc qdisc add dev ens7 ingress
> $ tc qdisc
> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
> 
> So far, everything is good. Now I add qdisc with block 1 to ens8:
> $ tc qdisc add dev ens8 ingress block 1
> 
> And I do:
> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
> Should it Error out or pass by your limitations?
> 
> Assume it should pass.
> I do:
> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip 192.168.1.0/16 action drop
> Error: This qdisc is a shared block. Use the block API to configure.
> 
> This will error out as you wrote. Now I do show:
> 
> $ tc qdisc show dev ens8                                                
> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
> 
> As you wrote, there is "ens7" in output of ens8 qdisc. That is
> surprising.

that's a typo on my part with copy-paste-modify of commands while
writing an email; that was not intentional to show ens7 on an ens8 device.


> 
> What would following commands show with your limitations:
> $ tc filter show dev ens7 ingress
> $ tc filter show dev ens8 ingress

see above

> 
> All filters should be listed under ens7 and ens8 should be blank? I
> cannot add filters to ens8 with your limitations so I guess the show for
> it should be blank. But there are actually rules there! That is another
> surprise and breakage!
> 
> Now I continue and remove the qdisc from ens7:
> $ tc qdisc add dev ens7 ingress
> 
> The block 1 is still there for ens8. So what happens now? What is the
> output of "filter show dev ens8 ingress" and "qdisc show dev ens8"?
> Will "add dev ens8 ingress" magically start to work now? This is another
> set of surprises and breakages.
> 
> So as I see it with your limitations, there is a lot of surprises
> introduced.
> 
> Note that I gave a lot of thoughts to all this. The approach I suggest
> is the cleanest and does not break anything. Also, it is easily
> extendable by adding the block handle to add/del/list the filters.
> But the current commands should not be broken. Please.
> 
> If you want, I can implement the block handle extension as a part of this
> patchset. I wanted to do it as a follow-up to limit the number of
> patches in the set so DaveM would not have reason to hate me :)
> 
> 
>>
>>
>> BTW, is there an option to list all devices using the same shared block
>> - short of listing all and grepping?
> 
> $ tc qdisc show
> 
> 

^ permalink raw reply

* Re: [PATCH net-next] cxgb4: collect TX rate limit info in UP CIM logs
From: David Miller @ 2018-01-03 15:58 UTC (permalink / raw)
  To: rahul.lakkireddy; +Cc: netdev, ganeshgr, nirranjan, indranil
In-Reply-To: <1514877538-26024-1-git-send-email-rahul.lakkireddy@chelsio.com>

From: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Date: Tue,  2 Jan 2018 12:48:58 +0530

> Collect TX rate limiting related information in UP CIM logs.
> 
> Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>

Applied, thanks.

^ permalink raw reply

* [tip:core/rcu] drivers/net/ethernet/qlogic/qed: Fix __qed_spq_block() ordering
From: tip-bot for Paul E. McKenney @ 2018-01-03 16:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: netdev, mingo, Ariel.Elior, hpa, tglx, everest-linux-l2, paulmck

Commit-ID:  cb7e125e03274cffa97d74433c876765efffaf6a
Gitweb:     https://git.kernel.org/tip/cb7e125e03274cffa97d74433c876765efffaf6a
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Mon, 9 Oct 2017 09:26:25 -0700
Committer:  Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CommitDate: Mon, 4 Dec 2017 10:52:52 -0800

drivers/net/ethernet/qlogic/qed: Fix __qed_spq_block() ordering

The __qed_spq_block() function expects an smp_read_barrier_depends()
to order a prior READ_ONCE() against a later load that does not depend
on the prior READ_ONCE(), an expectation that can fail to be met.
This commit therefore replaces the READ_ONCE() with smp_load_acquire()
and removes the smp_read_barrier_depends().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: <everest-linux-l2@cavium.com>
Cc: <netdev@vger.kernel.org>
---
 drivers/net/ethernet/qlogic/qed/qed_spq.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_spq.c b/drivers/net/ethernet/qlogic/qed/qed_spq.c
index be48d9a..c1237ec 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_spq.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c
@@ -97,9 +97,7 @@ static int __qed_spq_block(struct qed_hwfn *p_hwfn,
 
 	while (iter_cnt--) {
 		/* Validate we receive completion update */
-		if (READ_ONCE(comp_done->done) == 1) {
-			/* Read updated FW return value */
-			smp_read_barrier_depends();
+		if (smp_load_acquire(&comp_done->done) == 1) { /* ^^^ */
 			if (p_fw_ret)
 				*p_fw_ret = comp_done->fw_return_code;
 			return 0;

^ permalink raw reply related

* Re: [PATCH] ethernet: mlx4: Delete an error message for a failed memory allocation in five functions
From: Jason Gunthorpe @ 2018-01-03 16:03 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Julia Lawall, SF Markus Elfring, linux-rdma, netdev, LKML,
	kernel-janitors
In-Reply-To: <adfc3229-d3f9-afb9-00e6-9f765af0cbe8@mellanox.com>

On Wed, Jan 03, 2018 at 01:24:59PM +0200, Tariq Toukan wrote:

> >Strings take up space.  Since there is a backtrace on an out of memory
> >problem, if the string does not provide any more information than the
> >position of the call, then there is not much added value.  I don't know
> >what was the string in this case.  If it provides some additional
> >information, then it would be reasonable to keep it.
> 
> I don't really accept this claim...

The standard we are moving to is to rely on the backtrace print for
debugging. It is so huge it is unlikely a single print from the driver
will make much difference to the user's view.

Most users think backtrace == oops == bug report.

> In addition, some out-of-memory errors are recoverable, even though their
> backtrace is also printed. For example, in function mlx4_en_create_cq
> (appears in patch) we have a first allocation attempt (kzalloc_node) and a
> fallback (kzalloc).

Please send a patch fixing this as other have suggested, it is clearly
a bug.

Jason

^ permalink raw reply

* Re: [PATCH net-next v2 0/7] Resolve races in phy accessors
From: David Miller @ 2018-01-03 16:04 UTC (permalink / raw)
  To: linux; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <20180102105218.GB21998@n2100.armlinux.org.uk>

From: Russell King - ARM Linux <linux@armlinux.org.uk>
Date: Tue, 2 Jan 2018 10:52:18 +0000

> This series resolves races with various accesses to PHY registers.
> The first five patches are necessary before we add phylink support
> to mvneta, the remaining three are merely cleanups for unobserved
> races, and hence are less critical.

Series applied.

^ permalink raw reply

* Re: [PATCH net-next] net: mdio: Only perform gpio reset for PHYs
From: David Miller @ 2018-01-03 16:08 UTC (permalink / raw)
  To: andrew; +Cc: f.fainelli, sean.wang, sergei.shtylyov, geert+renesas, netdev
In-Reply-To: <1514911226-24622-1-git-send-email-andrew@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Tue,  2 Jan 2018 17:40:26 +0100

> Ethernet switch on the MDIO bus have historically performed their own
> handling of the GPIO reset line. The resent patch to have the MDIO
> core handle the reset has broken the switch drivers, in that they
> cannot claim the GPIO. Some switch drivers need more control over the
> GPIO line than what the MDIO core provides. So restore the historical
> behaviour by only performing a reset of PHYs, not switches.
> 
> Fixes: bafbdd527d56 ("phylib: Add device reset GPIO support")
> Reported-by: Sean Wang <sean.wang@mediatek.com>
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>

Applied, thanks Andrew.

^ permalink raw reply

* Re: [PATCH 0/5] VSOCK: add vsock_test test suite
From: Jorgen S. Hansen @ 2018-01-03 16:09 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev@vger.kernel.org, Dexuan Cui
In-Reply-To: <20180102120509.GB831@stefanha-x1.localdomain>


> On Jan 2, 2018, at 1:05 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Wed, Dec 20, 2017 at 02:48:43PM +0000, Jorgen S. Hansen wrote:
>> 
>>> On Dec 13, 2017, at 3:49 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> 
>>> The vsock_diag.ko module already has a test suite but the core AF_VSOCK
>>> functionality has no tests.  This patch series adds several test cases that
>>> exercise AF_VSOCK SOCK_STREAM socket semantics (send/recv, connect/accept,
>>> half-closed connections, simultaneous connections).
>>> 
>>> The test suite is modest but I hope to cover additional cases in the future.
>>> My goal is to have a shared test suite so VMCI, Hyper-V, and KVM can ensure
>>> that our transports behave the same.
>>> 
>>> I have tested virtio-vsock.
>>> 
>>> Jorgen: Please test the VMCI transport and let me know if anything needs to be
>>> adjusted.  See tools/testing/vsock/README for information on how to run the
>>> test suite.
>>> 
>> 
>> I tried running the vsock_test on VMCI, and all the tests failed in one way or
>> another:
> 
> Great, thank you for testing and looking into the failures!
> 
>> 1) connection reset test: when the guest tries to connect to the host, we
>>  get EINVAL as the error instead of ECONNRESET. I’ll fix that.
> 
> Yay, the tests found a bug!
> 
>> 2) client close and server close tests: On the host side, VMCI doesn’t
>>  support reading data from a socket that has been closed by the
>>  guest. When the guest closes a connection, all data is gone, and
>>  we return EOF on the host side. So the tests that try to read data
>>  after close, should not attempt that on VMCI host side. I got the
>>  tests to pass by adding a getsockname call to determine if
>>  the local CID was the host CID, and then skip the read attempt
>>  in that case. We could add a vmci flag, that would enable
>>  this behavior.
> 
> Interesting behavior.  Is there a reason for disallowing half-closed
> sockets on the host side?

This is a consequence of the way the underlying VMCI queue pairs are
implemented. When the guest side closes a connection, it signals this
to the peer by detaching from the VMCI queue pair used for the data
transfer (the detach will result in an event being generated on the
peer side). However, the VMCI queue pair is allocated as part of the
guest memory, so when the guest detaches, that memory is reclaimed.
So the host side would need to create a copy of the contents of that
queue pair in kernel memory as part of the detach operation. When
this was implemented, it was decided that it was better to avoid
a potential large kernel memory allocation and the data copy at
detach time than to maintain the half close behavior of INET.

> 
>> 5) multiple connections tests: with the standard socket sizes,
>>  VMCI is only able to support about 100 concurrent stream
>>  connections so this test passes with MULTICONN_NFDS
>>  set to 100.
> 
> The 1000 magic number is because many distros have a maximum number of
> file descriptors ulimit of 1024.  But it's an arbitrary number and we
> could lower it to 100.
> 
> Is this VMCI concurrent stream limit a denial of service vector?  Can an
> unprivileged guest userspace process open many sockets to prevent
> legitimate connections from other users within the same guest?

vSocket uses VMCI queue pairs for the stream, and the VMCI device
only allows a limited amount of memory to be used for queue pairs
per VM. So it is possible to exhaust this shared resource. The queue
pairs are created as part of the connection establishment process, so
it would require the user process to both create and connect the sockets
to a host side endpoint (connections between guest processes will not
allocate VMCI queue pairs).

Thanks,
Jorgen

^ permalink raw reply

* [tip:core/rcu] drivers/vhost: Remove now-redundant read_barrier_depends()
From: tip-bot for Paul E. McKenney @ 2018-01-03 16:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mst, tglx, virtualization, paulmck, kvm, jasowang, netdev, mingo,
	hpa

Commit-ID:  3a5db0b108e0a40f08c2bcff6a675dbf632b91e0
Gitweb:     https://git.kernel.org/tip/3a5db0b108e0a40f08c2bcff6a675dbf632b91e0
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Mon, 27 Nov 2017 09:45:10 -0800
Committer:  Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CommitDate: Tue, 5 Dec 2017 11:57:55 -0800

drivers/vhost: Remove now-redundant read_barrier_depends()

Because READ_ONCE() now implies read_barrier_depends(), the
read_barrier_depends() in next_desc() is now redundant.  This commit
therefore removes it and the related comments.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: <kvm@vger.kernel.org>
Cc: <virtualization@lists.linux-foundation.org>
Cc: <netdev@vger.kernel.org>
---
 drivers/vhost/vhost.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 33ac2b1..78b5940 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1877,12 +1877,7 @@ static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
 		return -1U;
 
 	/* Check they're not leading us off end of descriptors. */
-	next = vhost16_to_cpu(vq, desc->next);
-	/* Make sure compiler knows to grab that: we don't want it changing! */
-	/* We will use the result as an index in an array, so most
-	 * architectures only need a compiler barrier here. */
-	read_barrier_depends();
-
+	next = vhost16_to_cpu(vq, READ_ONCE(desc->next));
 	return next;
 }
 

^ permalink raw reply related

* [tip:core/rcu] netfilter: Remove now-redundant smp_read_barrier_depends()
From: tip-bot for Paul E. McKenney @ 2018-01-03 16:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: davem, netfilter-devel, coreteam, kadlec, tglx, fw, mingo, hpa,
	netdev, pablo, paulmck

Commit-ID:  4be2b04e43fd3d8164d7aeb1808e47fbeb0c0de0
Gitweb:     https://git.kernel.org/tip/4be2b04e43fd3d8164d7aeb1808e47fbeb0c0de0
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Mon, 9 Oct 2017 12:09:04 -0700
Committer:  Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CommitDate: Mon, 4 Dec 2017 10:52:57 -0800

netfilter: Remove now-redundant smp_read_barrier_depends()

READ_ONCE() now implies smp_read_barrier_depends(), which means that
the instances in arpt_do_table(), ipt_do_table(), and ip6t_do_table()
are now redundant.  This commit removes them and adjusts the comments.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <netfilter-devel@vger.kernel.org>
Cc: <coreteam@netfilter.org>
Cc: <netdev@vger.kernel.org>
---
 net/ipv4/netfilter/arp_tables.c | 7 +------
 net/ipv4/netfilter/ip_tables.c  | 7 +------
 net/ipv6/netfilter/ip6_tables.c | 7 +------
 3 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index f88221a..d242c2d 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -202,13 +202,8 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 
 	local_bh_disable();
 	addend = xt_write_recseq_begin();
-	private = table->private;
+	private = READ_ONCE(table->private); /* Address dependency. */
 	cpu     = smp_processor_id();
-	/*
-	 * Ensure we load private-> members after we've fetched the base
-	 * pointer.
-	 */
-	smp_read_barrier_depends();
 	table_base = private->entries;
 	jumpstack  = (struct arpt_entry **)private->jumpstack[cpu];
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 4cbe5e8..46866cc 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -260,13 +260,8 @@ ipt_do_table(struct sk_buff *skb,
 	WARN_ON(!(table->valid_hooks & (1 << hook)));
 	local_bh_disable();
 	addend = xt_write_recseq_begin();
-	private = table->private;
+	private = READ_ONCE(table->private); /* Address dependency. */
 	cpu        = smp_processor_id();
-	/*
-	 * Ensure we load private-> members after we've fetched the base
-	 * pointer.
-	 */
-	smp_read_barrier_depends();
 	table_base = private->entries;
 	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index f06e250..ac1db84 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -282,12 +282,7 @@ ip6t_do_table(struct sk_buff *skb,
 
 	local_bh_disable();
 	addend = xt_write_recseq_begin();
-	private = table->private;
-	/*
-	 * Ensure we load private-> members after we've fetched the base
-	 * pointer.
-	 */
-	smp_read_barrier_depends();
+	private = READ_ONCE(table->private); /* Address dependency. */
 	cpu        = smp_processor_id();
 	table_base = private->entries;
 	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];

^ permalink raw reply related

* RE: [PATCH net-next v2 4/4] net: mvpp2: 2500baseX support
From: Stefan Chulski @ 2018-01-03 16:11 UTC (permalink / raw)
  To: Andrew Lunn, Antoine Tenart
  Cc: davem@davemloft.net, kishon@ti.com,
	gregory.clement@free-electrons.com, linux@armlinux.org.uk,
	mw@semihalf.com, Yan Markman, thomas.petazzoni@free-electrons.com,
	miquel.raynal@free-electrons.com, Nadav Haklai,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20180103155311.GD3401@lunn.ch>



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Wednesday, January 03, 2018 5:53 PM
> To: Antoine Tenart <antoine.tenart@free-electrons.com>
> Cc: davem@davemloft.net; kishon@ti.com; gregory.clement@free-
> electrons.com; linux@armlinux.org.uk; mw@semihalf.com; Stefan Chulski
> <stefanc@marvell.com>; Yan Markman <ymarkman@marvell.com>;
> thomas.petazzoni@free-electrons.com; miquel.raynal@free-electrons.com;
> Nadav Haklai <nadavh@marvell.com>; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH net-next v2 4/4] net: mvpp2: 2500baseX support
> 
> On Wed, Jan 03, 2018 at 04:32:27PM +0100, Antoine Tenart wrote:
> > Hi Andrew,
> >
> > On Wed, Jan 03, 2018 at 04:20:36PM +0100, Andrew Lunn wrote:
> > > > @@ -4612,6 +4616,9 @@ static int mvpp22_comphy_init(struct
> mvpp2_port *port)
> > > >  	case PHY_INTERFACE_MODE_1000BASEX:
> > > >  		mode = PHY_MODE_SGMII;
> > > >  		break;
> > > > +	case PHY_INTERFACE_MODE_2500BASEX:
> > > > +		mode = PHY_MODE_2500SGMII;
> > > > +		break;
> > >
> > > I think this is the source of confusion with linux/phy.h and
> > > linux/phy/phy.h.
> > >
> > > What would PHY_INTERFACE_MODE_2500SGMII use?
> > >
> > > Where is this all getting confused? Should the caller to
> > > mvpp22_comphy_init() actually be passing
> PHY_INTERFACE_MODE_2500SGMII?
> > > What is the MAC actually doing at this point? 2500BASEX or 2500SGMII?
> >
> > PHY_INTERFACE_MODE_2500BASEX is the PHY mode whereas
> > PHY_MODE_2500SGMII is the mode used by the common PHY driver (i.e. the
> > one configuring the serdes lanes).
> 
> > There's no PHY_INTERFACE_MODE_2500SGMII mode.
> 
> Hi Antoine
> 
> At the moment there is no PHY_INTERFACE_MODE_2500SGMII. However,
> there are some devices which can do 2.5G SGMII. So it will appear sometime.
> This piece of code then looks even stranger.
> 
> > Sure, I can add a comment to state this function is a translation
> > between the net PHY mode and the generic PHY mode (it's a n-to-1
> > translation).
> 
> I think from an API design point of view, passing PHY_MODE_2500BASEX to
> comphy makes more sense. That is what the MAC wants to do. How the
> comphy achieves that should be internal to the comphy.
> 
>        Andrew

COMPHY driver configure SERDES lanes to high speed SGMII(2500SGMII) mode and doesn't
care about mode used by physical layer(2500BASEX in this case).
There are possibility that MAC SERDES lane is routed on a backplane to another SERDES. 
I think we should use SGMII naming.

Stefan.

^ permalink raw reply

* Re: [PATCH v3 00/27] kill devm_ioremap_nocache
From: Arnd Bergmann @ 2018-01-03 16:14 UTC (permalink / raw)
  To: christophe leroy
  Cc: Yisheng Xie, open list:RALINK MIPS ARCHITECTURE, Ulf Hansson,
	Jakub Kicinski, Platform Driver, David Airlie, linux-wireless,
	alsa-devel, dri-devel, Linux Kernel Mailing List, David Howells,
	IDE-ML, Wim Van Sebroeck, Networking, linux-mtd, Daniel Vetter,
	Dan Williams, Jason Cooper, linux-rtc, Boris Brezillon,
	Mauro Carvalho Chehab
In-Reply-To: <c28ac0bc-8bd2-3dce-3167-8c0f80ec601e@c-s.fr>

On Sun, Dec 24, 2017 at 9:55 AM, christophe leroy
<christophe.leroy@c-s.fr> wrote:
> Le 23/12/2017 à 16:57, Guenter Roeck a écrit :
>>
>> On 12/23/2017 05:48 AM, Greg KH wrote:
>>>
>>> On Sat, Dec 23, 2017 at 06:55:25PM +0800, Yisheng Xie wrote:
>>>>
>>>> Hi all,
>>>>
>>>> When I tried to use devm_ioremap function and review related code, I
>>>> found
>>>> devm_ioremap and devm_ioremap_nocache is almost the same with each
>>>> other,
>>>> except one use ioremap while the other use ioremap_nocache.
>>>
>>>
>>> For all arches?  Really?  Look at MIPS, and x86, they have different
>>> functions.
>>>
>>
>> Both mips and x86 end up mapping the same function, but other arches
>> don't.
>> mn10300 is one where ioremap and ioremap_nocache are definitely different.
>
>
> alpha: identical
> arc: identical
> arm: identical
> arm64: identical
> cris: different        <==
> frv: identical
> hexagone: identical
> ia64: different        <==
> m32r: identical
> m68k: identical
> metag: identical
> microblaze: identical
> mips: identical
> mn10300: different     <==
> nios: identical
> openrisc: different    <==
> parisc: identical
> riscv: identical
> s390: identical
> sh: identical
> sparc: identical
> tile: identical
> um: rely on asm/generic
> unicore32: identical
> x86: identical
> asm/generic (no mmu): identical
>
> So 4 among all arches seems to have ioremap() and ioremap_nocache() being
> different.
>
> Could we have a define set by the 4 arches on which ioremap() and
> ioremap_nocache() are different, something like
> HAVE_DIFFERENT_IOREMAP_NOCACHE ?

I wonder if those are actually correct or not. What I found looking at
those architectures:

- openrisc only has one driver using ioremap (drivers/net/ethernet/ethoc.c)
  and that calls ioremap_nocache(). Presumably the authors went with the
  implementation for ioremap that made sense (using default attributes)
  rather than the one that actually works (using uncached).

- On ia64, ioremap() checks the attributes for the physical
  address based on firmware tables and then picks either cached
  or uncached mappings. ioremap_nocache() does the same but
  returns NULL instead of a cached mapping for anything that is
  not an MMIO address. Presumably it would just work to always
  call ioremap().

- mn10300 appears to be wrong, broken by David Howells in
  commit 83c2dc15ce82 ("MN10300: Handle cacheable PCI regions
  in pci_iomap()") for any driver calling ioremap() by to get uncached
  memory, if I understand the comment for commit 34f1bdee1910
   ("mn10300: switch to GENERIC_PCI_IOMAP") correctly: it
  seems that PCI addresses include the 'uncached' bit by default
  to get the right behavior, but dropping that bit breaks it.

- cris seems similar to mn10300 in hardware, using an phys address
  bit for uncached access. There are two callers in arch code that
  appear to rely on the cachable output of ioremap()
arch/cris/arch-v32/kernel/signal.c:
__ioremap_prot(virt_to_phys(data), PAGE_SIZE, PAGE_SIGNAL_TRAMPOLINE);
arch/cris/arch-v32/mm/intmem.c:         intmem_virtual =
ioremap(MEM_INTMEM_START + RESERVED_SIZE,
  It's unclear whether ioremap_nocache() actually has any users
  on cris, or whether it was only added for compile-time testing,
  and calling plain ioremap() would always work too (assuming we
  pass the phys address with the uncached-bit set).

       Arnd
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: general protection fault in fib6_add (2)
From: David Ahern @ 2018-01-03 16:16 UTC (permalink / raw)
  To: syzbot, davem, kuznet, linux-kernel, netdev, syzkaller-bugs,
	yoshfuji, weiwan
In-Reply-To: <001a1145e85edcbf2d0561d3075c@google.com>

[ +weiwan@google.com ]

On 1/2/18 3:58 PM, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 61233580f1f33c50e159c50e24d80ffd2ba2e06b
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0693adff3f83403dc5da@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> audit: type=1400 audit(1514594846.496:7): avc:  denied  { map } for 
> pid=3201 comm="syzkaller001778" path="/root/syzkaller001778299"
> dev="sda1" ino=16481
> scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> IPv6: Can't replace route, no match found
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 3201 Comm: syzkaller001778 Not tainted 4.15.0-rc5+ #151
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:fib6_add+0x736/0x15a0 net/ipv6/ip6_fib.c:1244
> RSP: 0018:ffff8801c7626a70 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: 0000000000000020 RCX: ffffffff84794465
> RDX: 0000000000000004 RSI: ffff8801d38935f0 RDI: 0000000000000282
> RBP: ffff8801c7626da0 R08: 1ffff10038ec4c35 R09: 0000000000000000
> R10: ffff8801c7626c68 R11: 0000000000000000 R12: 00000000fffffffe
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000009
> FS:  0000000000000000(0000) GS:ffff8801db200000(0063)
> knlGS:0000000009b70840
> CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> CR2: 0000000020be1000 CR3: 00000001d585a006 CR4: 00000000001606f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1006
>  ip6_route_multipath_add+0xd14/0x16c0 net/ipv6/route.c:3833
>  inet6_rtm_newroute+0xdc/0x160 net/ipv6/route.c:3957
>  rtnetlink_rcv_msg+0x733/0x1020 net/core/rtnetlink.c:4411
>  netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2408
>  rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4423
>  netlink_unicast_kernel net/netlink/af_netlink.c:1275 [inline]
>  netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1301
>  netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1864
>  sock_sendmsg_nosec net/socket.c:636 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:646
>  sock_write_iter+0x31a/0x5d0 net/socket.c:915
>  call_write_iter include/linux/fs.h:1772 [inline]
>  do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
>  do_iter_write+0x154/0x540 fs/read_write.c:932
>  compat_writev+0x225/0x420 fs/read_write.c:1246
>  do_compat_writev+0x115/0x220 fs/read_write.c:1267
>  C_SYSC_writev fs/read_write.c:1278 [inline]
>  compat_SyS_writev+0x26/0x30 fs/read_write.c:1274
>  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
>  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
>  entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:125
> RIP: 0023:0xf7f1fc79
> RSP: 002b:00000000ffb61bfc EFLAGS: 00000203 ORIG_RAX: 0000000000000092
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000204aaff0
> RDX: 0000000000000001 RSI: 0000000000000167 RDI: 0000000000000010
> RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> Code: f1 a9 f6 fc e8 2c f2 e2 fc 85 c0 0f 85 d5 03 00 00 49 8d 5e 20 e8
> db a9 f6 fc 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> 02 00 0f 85 5a 0c 00 00 4d 39 ee 4d 8b 7e 20 0f 95 c0 4c
> RIP: fib6_add+0x736/0x15a0 net/ipv6/ip6_fib.c:1244 RSP: ffff8801c7626a70
> ---[ end trace 956c65133fcfff88 ]---
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* Re: [PATCH] RDS: Heap OOB write in rds_message_alloc_sgs()
From: David Miller @ 2018-01-03 16:24 UTC (permalink / raw)
  To: simo.ghannam; +Cc: netdev
In-Reply-To: <5a4be135.1296df0a.90c9.baea@mx.google.com>

From: simo.ghannam@gmail.com
Date: Tue,  2 Jan 2018 19:44:34 +0000

> From: Mohamed Ghannam <simo.ghannam@gmail.com>
> 
> When args->nr_local is 0, nr_pages gets also 0 due some size
> calculation via rds_rm_size(), which is later used to allocate
> pages for DMA, this bug produces a heap Out-Of-Bound write access
> to a specific memory region.
> 
> Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox