* [PATCH net v3] tipc: check minimum bearer MTU
From: Michal Kubecek @ 2016-12-02 8:33 UTC (permalink / raw)
To: Jon Maloy, Ying Xue
Cc: David S. Miller, tipc-discussion, netdev, linux-kernel,
Ben Hutchings, Qian Zhang
Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.
As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.
Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
---
changes v2 to v3:
- rename tipc_check_mtu() helper to tipc_mtu_bad() and make the comment
about the function less confusing
changes v1 to v2:
- add missing "static" to tipc_check_mtu() helper declaration
- rather than blocking device MTU change to too low value, disable tipc
bearer if that happens (suggested by Ben Hutchings)
---
net/tipc/bearer.c | 11 +++++++++--
net/tipc/bearer.h | 13 +++++++++++++
net/tipc/udp_media.c | 5 +++++
3 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 975dbeb60ab0..52d74760fb68 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -421,6 +421,10 @@ int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b,
dev = dev_get_by_name(net, driver_name);
if (!dev)
return -ENODEV;
+ if (tipc_mtu_bad(dev, 0)) {
+ dev_put(dev);
+ return -EINVAL;
+ }
/* Associate TIPC bearer with L2 bearer */
rcu_assign_pointer(b->media_ptr, dev);
@@ -610,8 +614,6 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
if (!b)
return NOTIFY_DONE;
- b->mtu = dev->mtu;
-
switch (evt) {
case NETDEV_CHANGE:
if (netif_carrier_ok(dev))
@@ -624,6 +626,11 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
tipc_reset_bearer(net, b);
break;
case NETDEV_CHANGEMTU:
+ if (tipc_mtu_bad(dev, 0)) {
+ bearer_disable(net, b);
+ break;
+ }
+ b->mtu = dev->mtu;
tipc_reset_bearer(net, b);
break;
case NETDEV_CHANGEADDR:
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 78892e2f53e3..278ff7f616f9 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -39,6 +39,7 @@
#include "netlink.h"
#include "core.h"
+#include "msg.h"
#include <net/genetlink.h>
#define MAX_MEDIA 3
@@ -59,6 +60,9 @@
#define TIPC_MEDIA_TYPE_IB 2
#define TIPC_MEDIA_TYPE_UDP 3
+/* minimum bearer MTU */
+#define TIPC_MIN_BEARER_MTU (MAX_H_SIZE + INT_H_SIZE)
+
/**
* struct tipc_media_addr - destination address used by TIPC bearers
* @value: address info (format defined by media)
@@ -215,4 +219,13 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id,
void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
struct sk_buff_head *xmitq);
+/* check if device MTU is too low for tipc headers */
+static inline bool tipc_mtu_bad(struct net_device *dev, unsigned int reserve)
+{
+ if (dev->mtu >= TIPC_MIN_BEARER_MTU + reserve)
+ return false;
+ netdev_warn(dev, "MTU too low for tipc bearer\n");
+ return true;
+}
+
#endif /* _TIPC_BEARER_H */
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 78cab9c5a445..b58dc95f3d35 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -697,6 +697,11 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
udp_conf.use_udp_checksums = false;
ub->ifindex = dev->ifindex;
+ if (tipc_mtu_bad(dev, sizeof(struct iphdr) +
+ sizeof(struct udphdr))) {
+ err = -EINVAL;
+ goto err;
+ }
b->mtu = dev->mtu - sizeof(struct iphdr)
- sizeof(struct udphdr);
#if IS_ENABLED(CONFIG_IPV6)
--
2.10.2
^ permalink raw reply related
* Re: [PATCH iproute2 net-next 2/4] tc: flower: document SCTP ip_proto
From: Simon Horman @ 2016-12-02 8:36 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20161201105010.1f9ea04f@xeon-e3>
On Thu, Dec 01, 2016 at 10:50:10AM -0800, Stephen Hemminger wrote:
> On Tue, 29 Nov 2016 16:51:31 +0100
> Simon Horman <simon.horman@netronome.com> wrote:
>
> > Add SCTP ip_proto to help text and man page.
> >
> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
>
> Sorry doesn't apply to current net-next branch in iproute2 git.
> Probably some of the other changes modified formatting.
Sorry about that, this file seems a bit busy these days.
I will rebase and repost.
^ permalink raw reply
* Re: stmmac: turn coalescing / NAPI off in stmmac
From: Giuseppe CAVALLARO @ 2016-12-02 8:39 UTC (permalink / raw)
To: Pavel Machek, David Miller, alexandre.torgue; +Cc: netdev, linux-kernel
In-Reply-To: <20161201224827.GA26301@amd>
On 12/1/2016 11:48 PM, Pavel Machek wrote:
>
>>> @@ -2771,12 +2771,8 @@ static netdev_features_t stmmac_fix_features(struct net_device *dev,
>>> features &= ~NETIF_F_CSUM_MASK;
>>>
>>> /* Disable tso if asked by ethtool */
>>> - if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
>>> - if (features & NETIF_F_TSO)
>>> - priv->tso = true;
>>> - else
>>> - priv->tso = false;
>>> - }
>>> + if ((priv->plat->tso_en) && (priv->dma_cap.tsoen))
>>> + priv->tso = !!(features & NETIF_F_TSO);
>>>
>>
>> Pavel, this really seems arbitrary.
>>
>> Whilst I really appreciate you're looking into this driver a bit because
>> of some issues you are trying to resolve, I'd like to ask that you not
>> start bombarding me with nit-pick cleanups here and there and instead
>> concentrate on the real bug or issue.
>
> Well, fixing clean code is easier than fixing strange code... Plus I
> was hoping to make the mainainers to talk. The driver is listed as
> supported after all.
Absolutely, I am available to support you, better I can.
So no problem to clarify strange or complex parts of the driver
and find/try new solutions to enhance it.
> Anyway... since you asked. I belive I have way to disable NAPI / tx
> coalescing in the driver. Unfortunately, locking is missing on the rx
> path, and needs to be extended to _irqsave variant on tx path.
I have just replied to a previous thread about that...
To be honest, I have in the box just a patch to fix lock on lpi
as I had discussed in this mailing list some week ago.
I will provide it asap.
>
> So patch currently looks like this (hand edited, can't be
> applied, got it working few hours ago). Does it look acceptable?
>
> I'd prefer this to go after the patch that pulls common code to single
> place, so that single place needs to be patched. Plus I guess I should
> add ifdefs, so that more advanced NAPI / tx coalescing code can be
> reactivated when it is fixed. Trivial fixes can go on top. Does that
> sound like a plan?
Hmm, what I find strange is that, just this code is running since a
long time on several platforms and Chip versions. No raise condition
have been found or lock protection problems (also proving look
mechanisms).
I'd like to avoid to break old compatibilities and having the same
performances but if there are some bugs I can support to review
and test. Indeed, this year we have added the 4.x but some parts
of the code (for TSO) should be self-contained. So I cannot image
regressions on common part of the code... I let Alex to do a double
check.
Pavel, I ask you sorry if I missed some problems so, if you can
(as D. Miller asked) to send us a cover letter + all patches
I will try to reply soon. I can do also some tests if you ask
me that! I could run on 3.x and 4.x but I cannot promise you
benchmarks.
> Which tree do you want patches against?
>
> https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/ ?
I think that bug fixing should be on top of net.git but I let Miller
to decide.
Best Regards
Peppe
>
> Best regards,
> Pavel
>
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 0b706a7..c0016c8 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1395,9 +1397,10 @@ static void __stmmac_tx_clean(struct stmmac_priv *priv)
>
> static void stmmac_tx_clean(struct stmmac_priv *priv)
> {
> - spin_lock(&priv->tx_lock);
> + unsigned long flags;
> + spin_lock_irqsave(&priv->tx_lock, flags);
> __stmmac_tx_clean(priv);
> - spin_unlock(&priv->tx_lock);
> + spin_unlock_irqrestore(&priv->tx_lock, flags);
> }
>
> static inline void stmmac_enable_dma_irq(struct stmmac_priv *priv)
> @@ -1441,6 +1444,8 @@ static void stmmac_tx_err(struct stmmac_priv *priv)
> netif_wake_queue(priv->dev);
> }
>
> +static int stmmac_rx(struct stmmac_priv *priv, int limit);
> +
> /**
> * stmmac_dma_interrupt - DMA ISR
> * @priv: driver private structure
> @@ -1452,10 +1457,17 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
> {
> int status;
> int rxfifosz = priv->plat->rx_fifo_size;
> + unsigned long flags;
>
> status = priv->hw->dma->dma_interrupt(priv->ioaddr, &priv->xstats);
> if (likely((status & handle_rx)) || (status & handle_tx)) {
> + int r;
> + spin_lock_irqsave(&priv->tx_lock, flags);
> + r = stmmac_rx(priv, 999);
> + spin_unlock_irqrestore(&priv->tx_lock, flags);
> +#if 0
> if (likely(napi_schedule_prep(&priv->napi))) {
> //pr_err("napi: schedule\n");
> stmmac_disable_dma_irq(priv);
> @@ -1463,7 +1475,8 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
> } else
> pr_err("napi: schedule failed\n");
> #endif
> + stmmac_tx_clean(priv);
> }
> if (unlikely(status & tx_hard_error_bump_tc)) {
> /* Try to bump up the dma threshold on this failure */
> @@ -1638,7 +1651,7 @@ static void stmmac_tx_timer(unsigned long data)
> {
> struct stmmac_priv *priv = (struct stmmac_priv *)data;
>
> - stmmac_tx_clean(priv);
> + //stmmac_tx_clean(priv);
> }
>
> /**
> @@ -1990,6 +2003,8 @@ static void stmmac_xmit_common(struct sk_buff *skb, struct net_device *dev, int
> if (likely(priv->tx_coal_frames > priv->tx_count_frames)) {
> mod_timer(&priv->txtimer,
> STMMAC_COAL_TIMER(priv->tx_coal_timer));
> + priv->hw->desc->set_tx_ic(desc);
> + priv->xstats.tx_set_ic_bit++;
> } else {
> priv->tx_count_frames = 0;
> priv->hw->desc->set_tx_ic(desc);
> @@ -2038,8 +2053,9 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
> struct dma_desc *desc, *first, *mss_desc = NULL;
> u8 proto_hdr_len;
> int i;
> + unsigned long flags;
>
> - spin_lock(&priv->tx_lock);
> + spin_lock_irqsave(&priv->tx_lock, flags);
>
> /* Compute header lengths */
> proto_hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
> @@ -2052,7 +2068,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
> /* This is a hard error, log it. */
> pr_err("%s: Tx Ring full when queue awake\n", __func__);
> }
> - spin_unlock(&priv->tx_lock);
> + spin_unlock_irqrestore(&priv->tx_lock, flags);
> return NETDEV_TX_BUSY;
> }
>
> @@ -2168,11 +2184,11 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
> priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, priv->tx_tail_addr,
> STMMAC_CHAN0);
>
> - spin_unlock(&priv->tx_lock);
> + spin_unlock_irqrestore(&priv->tx_lock, flags);
> return NETDEV_TX_OK;
>
> dma_map_err:
> - spin_unlock(&priv->tx_lock);
> + spin_unlock_irqrestore(&priv->tx_lock, flags);
> dev_err(priv->device, "Tx dma map failed\n");
> dev_kfree_skb(skb);
> priv->dev->stats.tx_dropped++;
> @@ -2197,6 +2213,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
> struct dma_desc *desc, *first;
> unsigned int enh_desc;
> unsigned int des;
> + unsigned int flags;
>
> /* Manage oversized TCP frames for GMAC4 device */
> if (skb_is_gso(skb) && priv->tso) {
> @@ -2204,7 +2221,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
> return stmmac_tso_xmit(skb, dev);
> }
>
> - spin_lock(&priv->tx_lock);
> + spin_lock_irqsave(&priv->tx_lock, flags);
>
> if (unlikely(stmmac_tx_avail(priv) < nfrags + 1)) {
> if (!netif_queue_stopped(dev)) {
> @@ -2212,7 +2229,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
> /* This is a hard error, log it. */
> pr_err("%s: Tx Ring full when queue awake\n", __func__);
> }
> - spin_unlock(&priv->tx_lock);
> + spin_unlock_irqrestore(&priv->tx_lock, flags);
> return NETDEV_TX_BUSY;
> }
>
> @@ -2347,11 +2364,11 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
> priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, priv->tx_tail_addr,
> STMMAC_CHAN0);
>
> - spin_unlock(&priv->tx_lock);
> + spin_unlock_irqrestore(&priv->tx_lock, flags);
> return NETDEV_TX_OK;
>
> dma_map_err:
> - spin_unlock(&priv->tx_lock);
> + spin_unlock_irqrestore(&priv->tx_lock, flags);
> dev_err(priv->device, "Tx dma map failed\n");
> dev_kfree_skb(skb);
> priv->dev->stats.tx_dropped++;
> @@ -2634,7 +2651,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit)
> else
> skb->ip_summed = CHECKSUM_UNNECESSARY;
>
> - napi_gro_receive(&priv->napi, skb);
> + //napi_gro_receive(&priv->napi, skb);
> + netif_rx(skb);
>
> priv->dev->stats.rx_packets++;
> priv->dev->stats.rx_bytes += frame_len;
> @@ -2662,6 +2680,7 @@ static int stmmac_poll(struct napi_struct *napi, int budget)
> struct stmmac_priv *priv = container_of(napi, struct stmmac_priv, napi);
> int work_done;
>
> + BUG();
> priv->xstats.napi_poll++;
> stmmac_tx_clean(priv);
>
>
>
>
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Giuseppe CAVALLARO @ 2016-12-02 8:41 UTC (permalink / raw)
To: Pavel Machek, David Miller; +Cc: netdev, linux-kernel, lsanfil
In-Reply-To: <b6ac7a3c-6774-9568-1cc0-97074eaa928f@st.com>
+ Lino
On 12/2/2016 9:24 AM, Giuseppe CAVALLARO wrote:
>
> Hello
>
>
> On 11/24/2016 10:25 PM, Pavel Machek wrote:
>> Hi!
>>
>>>>> I'm debugging strange delays during transmit in stmmac driver. They
>>>>> seem to be present in 4.4 kernel (and older kernels, too). Workload is
>>>>> burst of udp packets being sent, pause, burst of udp packets, ...
>> ...
>>>> 4.9-rc6 still has the delays. With the
>>>>
>>>> #define STMMAC_COAL_TX_TIMER 1000
>>>> #define STMMAC_TX_MAX_FRAMES 2
>>>>
>>>> settings, delays go away, and driver still works. (It fails fairly
>>>> fast in 4.4). Good news. But the question still is: what is going on
>>>> there?
>>>
>>> 256 packets looks way too large for being a trigger for aborting the
>>> TX coalescing timer.
>>>
>>> Looking more deeply into this, the driver is using non-highres timers
>>> to implement the TX coalescing. This simply cannot work.
>>>
>>> 1 HZ, which is the lowest granularity of non-highres timers in the
>>> kernel, is variable as well as already too large of a delay for
>>> effective TX coalescing.
>>>
>>> I seriously think that the TX coalescing support should be ripped out
>>> or disabled entirely until it is implemented properly in this
>>> driver.
>>
>> Ok, I'd disable coalescing, but could not figure it out till. What is
>> generic way to do that?
>>
>> It seems only thing stmmac_tx_timer() does is calling
>> stmmac_tx_clean(), which reclaims tx_skbuff[] entries. It should be
>> possible to do that explicitely, without delay, but it stops working
>> completely if I attempt to do that.
>>
>> On a side note, stmmac_poll() does stmmac_enable_dma_irq() while
>> stmmac_dma_interrupt() disables interrupts. But I don't see any
>> protection between the two, so IMO it could race and we'd end up
>> without polling or interrupts...
>
>
> the idea behind the TX mitigation is to mix the interrupt and
> timer and this approach gave us real benefit in terms
> of performances and CPU usage (especially on SH4-200/SH4-300 platforms
> based).
> In the ring, some descriptors can raise the irq (according to a
> threshold) and set the IC bit. In this path, the NAPI poll will be
> scheduled.
> But there is a timer that can run (and we experimented that no high
> resolution is needed) to clear the tx resources.
> Concerning the lock protection, we had reviewed long time ago and
> IIRC, no raise condition should be present. Open to review it, again!
>
> So, welcome any other schema and testing on platforms supported.
>
>
> Hoping this summary can help.
>
> Peppe
>
>
>>
>> Thanks and best regards,
>> Pavel
>>
>
>
^ permalink raw reply
* Re: [PATCH 1/2] net: stmmac: avoid Camelcase naming
From: Giuseppe CAVALLARO @ 2016-12-02 8:44 UTC (permalink / raw)
To: Corentin Labbe, alexandre.torgue; +Cc: netdev, linux-kernel
In-Reply-To: <1480605581-13350-1-git-send-email-clabbe.montjoie@gmail.com>
Hello Corentin
patches look ok, I just wonder if you tested it in case of
the stmmac is connected to a transceiver. Let me consider it
a critical part of the driver to properly work.
Regards
Peppe
On 12/1/2016 4:19 PM, Corentin Labbe wrote:
> This patch simply rename regValue to value, like it was named in other
> mdio functions.
>
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> ---
> drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> index e3216e5..6796c28 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> @@ -83,14 +83,14 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg)
> unsigned int mii_data = priv->hw->mii.data;
>
> int data;
> - u16 regValue = (((phyaddr << 11) & (0x0000F800)) |
> + u16 value = (((phyaddr << 11) & (0x0000F800)) |
> ((phyreg << 6) & (0x000007C0)));
> - regValue |= MII_BUSY | ((priv->clk_csr & 0xF) << 2);
> + value |= MII_BUSY | ((priv->clk_csr & 0xF) << 2);
>
> if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address))
> return -EBUSY;
>
> - writel(regValue, priv->ioaddr + mii_address);
> + writel(value, priv->ioaddr + mii_address);
>
> if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address))
> return -EBUSY;
>
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Pavel Machek @ 2016-12-02 8:45 UTC (permalink / raw)
To: Giuseppe CAVALLARO, alexandre.torgue; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <b6ac7a3c-6774-9568-1cc0-97074eaa928f@st.com>
[-- Attachment #1: Type: text/plain, Size: 2436 bytes --]
Hi!
> >>1 HZ, which is the lowest granularity of non-highres timers in the
> >>kernel, is variable as well as already too large of a delay for
> >>effective TX coalescing.
> >>
> >>I seriously think that the TX coalescing support should be ripped out
> >>or disabled entirely until it is implemented properly in this
> >>driver.
> >
> >Ok, I'd disable coalescing, but could not figure it out till. What is
> >generic way to do that?
> >
> >It seems only thing stmmac_tx_timer() does is calling
> >stmmac_tx_clean(), which reclaims tx_skbuff[] entries. It should be
> >possible to do that explicitely, without delay, but it stops working
> >completely if I attempt to do that.
> >
> >On a side note, stmmac_poll() does stmmac_enable_dma_irq() while
> >stmmac_dma_interrupt() disables interrupts. But I don't see any
> >protection between the two, so IMO it could race and we'd end up
> >without polling or interrupts...
>
>
> the idea behind the TX mitigation is to mix the interrupt and
> timer and this approach gave us real benefit in terms
> of performances and CPU usage (especially on SH4-200/SH4-300 platforms
> based).
Well, if you have a workload that sends and receive packets, it tends
to work ok, as you do tx_clean() in stmmac_poll(). My workload is not
like that -- it is "sending packets at 3MB/sec, receiving none". So
the stmmac_tx_timer() is rescheduled and rescheduled and rescheduled,
and then we run out of transmit descriptors, and then 40msec passes,
and then we clean them. Bad.
And that's why low-res timers do not cut it.
> In the ring, some descriptors can raise the irq (according to a
> threshold) and set the IC bit. In this path, the NAPI poll will be
> scheduled.
Not NAPI poll but stmmac_tx_timer(), right?
> But there is a timer that can run (and we experimented that no high
> resolution is needed) to clear the tx resources.
> Concerning the lock protection, we had reviewed long time ago and
> IIRC, no raise condition should be present. Open to review it,
> again!
Well, I certainly like the fact that we are talking :-).
And yes, I have some questions.
There's nothing that protect stmmac_poll() from running concurently
with stmmac_dma_interrupt(), right?
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* [[PATCH iproute2/net-next v2] 0/4] tc: flower: SCTP and other port fixes
From: Simon Horman @ 2016-12-02 8:45 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Simon Horman
Hi Stephen,
this short series:
* Makes some improvements to the documentation of flower;
A follow-up to recent work by Paul Blakey and myself.
* Corrects some errors introduced when SCTP port matching support was
recently added.
Changes since v2:
* Rebase
Simon Horman (4):
tc: flower: remove references to eth_type in manpage
tc: flower: document SCTP ip_proto
tc: flower: correct name of ip_proto parameter to flower_parse_port()
tc: flower: make use of flower_port_attr_type() safe and silent
man/man8/tc-flower.8 | 37 ++++++++++++++++++-------------------
tc/f_flower.c | 32 +++++++++++++++++---------------
2 files changed, 35 insertions(+), 34 deletions(-)
--
2.7.0.rc3.207.g0ac5344
^ permalink raw reply
* [[PATCH iproute2/net-next v2] 1/4] tc: flower: remove references to eth_type in manpage
From: Simon Horman @ 2016-12-02 8:45 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Simon Horman, Paul Blakey
In-Reply-To: <1480668321-20875-1-git-send-email-simon.horman@netronome.com>
Remove references to eth_type and ether_type (spelling error) in
the tc flower manpage.
Also correct formatting of boldface text with whitespace.
Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
man/man8/tc-flower.8 | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 16ef261797ab..56db42f983c1 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -103,8 +103,8 @@ or an unsigned 8bit value in hexadecimal format.
Match on source or destination IP address.
.I ADDRESS
must be a valid IPv4 or IPv6 address, depending on
-.BR ether_type ,
-which has to be specified in beforehand.
+.BR protocol
+option of tc filter.
.TP
.BI dst_port " NUMBER"
.TQ
@@ -114,16 +114,15 @@ Match on layer 4 protocol source or destination port number. Only available for
which has to be specified in beforehand.
.SH NOTES
As stated above where applicable, matches of a certain layer implicitly depend
-on the matches of the next lower layer. Precisely, layer one and two matches (
-.BR indev , dst_mac , src_mac " and " eth_type )
-have no dependency, layer three matches (
-.BR ip_proto , dst_ip " and " src_ip )
-require
-.B eth_type
-being set to either
-.BR ipv4 " or " ipv6 ,
-and finally layer four matches (
-.BR dst_port " and " src_port )
+on the matches of the next lower layer. Precisely, layer one and two matches
+(\fBindev\fR, \fBdst_mac\fR and \fBsrc_mac\fR)
+have no dependency, layer three matches
+(\fBip_proto\fR, \fBdst_ip\fR and \fBsrc_ip\fR)
+depend on the
+.B protocol
+option of tc filter
+and finally layer four matches
+(\fBdst_port\fR and \fBsrc_port\fR)
depend on
.B ip_proto
being set to either
--
2.7.0.rc3.207.g0ac5344
^ permalink raw reply related
* [[PATCH iproute2/net-next v2] 2/4] tc: flower: document SCTP ip_proto
From: Simon Horman @ 2016-12-02 8:45 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Simon Horman
In-Reply-To: <1480668321-20875-1-git-send-email-simon.horman@netronome.com>
Add SCTP ip_proto to help text and man page.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
man/man8/tc-flower.8 | 14 +++++++-------
tc/f_flower.c | 2 +-
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 56db42f983c1..a401293fed50 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -29,7 +29,7 @@ flower \- flow based traffic control filter
.IR PRIORITY " | "
.BR vlan_eth_type " { " ipv4 " | " ipv6 " | "
.IR ETH_TYPE " } | "
-.BR ip_proto " { " tcp " | " udp " | "
+.BR ip_proto " { " tcp " | " udp " | " sctp " | "
.IR IP_PROTO " } | { "
.BR dst_ip " | " src_ip " } { "
.IR ipv4_address " | " ipv6_address " } | { "
@@ -93,8 +93,8 @@ or an unsigned 16bit value in hexadecimal format.
.BI ip_proto " IP_PROTO"
Match on layer four protocol.
.I IP_PROTO
-may be either
-.BR tcp , udp
+may be
+.BR tcp ", " udp ", " sctp
or an unsigned 8bit value in hexadecimal format.
.TP
.BI dst_ip " ADDRESS"
@@ -110,8 +110,8 @@ option of tc filter.
.TQ
.BI src_port " NUMBER"
Match on layer 4 protocol source or destination port number. Only available for
-.BR ip_proto " values " udp " and " tcp ,
-which has to be specified in beforehand.
+.BR ip_proto " values " udp ", " tcp " and " sctp
+which have to be specified in beforehand.
.SH NOTES
As stated above where applicable, matches of a certain layer implicitly depend
on the matches of the next lower layer. Precisely, layer one and two matches
@@ -125,8 +125,8 @@ and finally layer four matches
(\fBdst_port\fR and \fBsrc_port\fR)
depend on
.B ip_proto
-being set to either
-.BR tcp " or " udp .
+being set to
+.BR tcp ", " udp " or " sctp.
.P
There can be only used one mask per one prio. If user needs to specify different
mask, he has to use different prio.
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 1555764b9996..dacf24faf00e 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -37,7 +37,7 @@ static void explain(void)
" vlan_ethtype [ ipv4 | ipv6 | ETH-TYPE ] |\n"
" dst_mac MAC-ADDR |\n"
" src_mac MAC-ADDR |\n"
- " ip_proto [tcp | udp | IP-PROTO ] |\n"
+ " ip_proto [tcp | udp | sctp | IP-PROTO ] |\n"
" dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n"
" src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n"
" dst_port PORT-NUMBER |\n"
--
2.7.0.rc3.207.g0ac5344
^ permalink raw reply related
* [[PATCH iproute2/net-next v2] 3/4] tc: flower: correct name of ip_proto parameter to flower_parse_port()
From: Simon Horman @ 2016-12-02 8:45 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Simon Horman
In-Reply-To: <1480668321-20875-1-git-send-email-simon.horman@netronome.com>
This corrects a typo.
Fixes: a1fb0d484237 ("tc: flower: Support matching on SCTP ports")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
tc/f_flower.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/tc/f_flower.c b/tc/f_flower.c
index dacf24faf00e..d86ccdc3d3f0 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -160,15 +160,15 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type,
return 0;
}
-static int flower_port_attr_type(__u8 ip_port, bool is_src)
+static int flower_port_attr_type(__u8 ip_proto, bool is_src)
{
- if (ip_port == IPPROTO_TCP) {
+ if (ip_proto == IPPROTO_TCP) {
return is_src ? TCA_FLOWER_KEY_TCP_SRC :
TCA_FLOWER_KEY_TCP_DST;
- } else if (ip_port == IPPROTO_UDP) {
+ } else if (ip_proto == IPPROTO_UDP) {
return is_src ? TCA_FLOWER_KEY_UDP_SRC :
TCA_FLOWER_KEY_UDP_DST;
- } else if (ip_port == IPPROTO_SCTP) {
+ } else if (ip_proto == IPPROTO_SCTP) {
return is_src ? TCA_FLOWER_KEY_SCTP_SRC :
TCA_FLOWER_KEY_SCTP_DST;
} else {
@@ -177,14 +177,14 @@ static int flower_port_attr_type(__u8 ip_port, bool is_src)
}
}
-static int flower_parse_port(char *str, __u8 ip_port, bool is_src,
+static int flower_parse_port(char *str, __u8 ip_proto, bool is_src,
struct nlmsghdr *n)
{
int ret;
int type;
__be16 port;
- type = flower_port_attr_type(ip_port, is_src);
+ type = flower_port_attr_type(ip_proto, is_src);
if (type < 0)
return -1;
--
2.7.0.rc3.207.g0ac5344
^ permalink raw reply related
* [[PATCH iproute2/net-next v2] 4/4] tc: flower: make use of flower_port_attr_type() safe and silent
From: Simon Horman @ 2016-12-02 8:45 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Simon Horman
In-Reply-To: <1480668321-20875-1-git-send-email-simon.horman@netronome.com>
Make use of flower_port_attr_type() safe:
* flower_port_attr_type() may return a valid index into tb[] or -1.
Only access tb[] in the case of the former.
* Do not access null entries in tb[]
Also make usage silent - it is valid for ip_proto to be invalid,
for example if it is not specified as part of the filter.
Fixes: a1fb0d484237 ("tc: flower: Support matching on SCTP ports")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
tc/f_flower.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/tc/f_flower.c b/tc/f_flower.c
index d86ccdc3d3f0..615e8f27bed2 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -162,19 +162,17 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type,
static int flower_port_attr_type(__u8 ip_proto, bool is_src)
{
- if (ip_proto == IPPROTO_TCP) {
+ if (ip_proto == IPPROTO_TCP)
return is_src ? TCA_FLOWER_KEY_TCP_SRC :
TCA_FLOWER_KEY_TCP_DST;
- } else if (ip_proto == IPPROTO_UDP) {
+ else if (ip_proto == IPPROTO_UDP)
return is_src ? TCA_FLOWER_KEY_UDP_SRC :
TCA_FLOWER_KEY_UDP_DST;
- } else if (ip_proto == IPPROTO_SCTP) {
+ else if (ip_proto == IPPROTO_SCTP)
return is_src ? TCA_FLOWER_KEY_SCTP_SRC :
TCA_FLOWER_KEY_SCTP_DST;
- } else {
- fprintf(stderr, "Illegal \"ip_proto\" for port\n");
+ else
return -1;
- }
}
static int flower_parse_port(char *str, __u8 ip_proto, bool is_src,
@@ -511,7 +509,8 @@ static void flower_print_ip_addr(FILE *f, char *name, __be16 eth_type,
static void flower_print_port(FILE *f, char *name, struct rtattr *attr)
{
- fprintf(f, "\n %s %d", name, ntohs(rta_getattr_u16(attr)));
+ if (attr)
+ fprintf(f, "\n %s %d", name, ntohs(rta_getattr_u16(attr)));
}
static int flower_print_opt(struct filter_util *qu, FILE *f,
@@ -520,6 +519,7 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
struct rtattr *tb[TCA_FLOWER_MAX + 1];
__be16 eth_type = 0;
__u8 ip_proto = 0xff;
+ int nl_type;
if (!opt)
return 0;
@@ -574,10 +574,12 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
tb[TCA_FLOWER_KEY_IPV6_SRC],
tb[TCA_FLOWER_KEY_IPV6_SRC_MASK]);
- flower_print_port(f, "dst_port",
- tb[flower_port_attr_type(ip_proto, false)]);
- flower_print_port(f, "src_port",
- tb[flower_port_attr_type(ip_proto, true)]);
+ nl_type = flower_port_attr_type(ip_proto, false);
+ if (nl_type >= 0)
+ flower_print_port(f, "dst_port", tb[nl_type]);
+ nl_type = flower_port_attr_type(ip_proto, true);
+ if (nl_type >= 0)
+ flower_print_port(f, "src_port", tb[nl_type]);
if (tb[TCA_FLOWER_FLAGS]) {
__u32 flags = rta_getattr_u32(tb[TCA_FLOWER_FLAGS]);
--
2.7.0.rc3.207.g0ac5344
^ permalink raw reply related
* Re: [PATCH net 0/7] net: stmmac: fix probe error handling and phydev leaks
From: Giuseppe CAVALLARO @ 2016-12-02 8:46 UTC (permalink / raw)
To: Johan Hovold, David S. Miller
Cc: Alexandre Torgue, Joachim Eastwood, Carlo Caione, Kevin Hilman,
Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai, netdev,
linux-kernel
In-Reply-To: <1480516195-27696-1-git-send-email-johan@kernel.org>
On 11/30/2016 3:29 PM, Johan Hovold wrote:
> This series fixes a number of issues with the stmmac-driver probe error
> handling, which for example left clocks enabled after probe failures.
>
> The final patch fixes a failure to deregister and free any fixed-link
> PHYs that were registered during probe on probe errors and on driver
> unbind. It also fixes a related of-node leak on late probe errors.
>
> This series depends on the of_phy_deregister_fixed_link() helper that
> was just merged to net.
>
> As mentioned earlier, one staging driver also suffers from a similar
> leak and can be fixed up once the above mentioned helper hits mainline.
>
> Note that these patches have only been compile tested.
For common and STi part:
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
thx
peppe
>
> Johan
>
>
> Johan Hovold (7):
> net: ethernet: stmmac: dwmac-socfpga: fix use-after-free on probe
> errors
> net: ethernet: stmmac: dwmac-sti: fix probe error path
> net: ethernet: stmmac: dwmac-rk: fix probe error path
> net: ethernet: stmmac: dwmac-generic: fix probe error path
> net: ethernet: stmmac: dwmac-meson8b: fix probe error path
> net: ethernet: stmmac: platform: fix outdated function header
> net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
>
> .../net/ethernet/stmicro/stmmac/dwmac-generic.c | 17 ++++++++--
> .../net/ethernet/stmicro/stmmac/dwmac-ipq806x.c | 25 ++++++++++----
> .../net/ethernet/stmicro/stmmac/dwmac-lpc18xx.c | 17 ++++++++--
> drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c | 23 ++++++++++---
> .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 32 +++++++++++++-----
> drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 21 +++++++++---
> .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 39 ++++++++++++++--------
> drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 23 ++++++++++---
> drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c | 19 ++++++++---
> drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c | 26 +++++++++++----
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 -
> .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 33 +++++++++++++++---
> .../net/ethernet/stmicro/stmmac/stmmac_platform.h | 2 ++
> 13 files changed, 215 insertions(+), 63 deletions(-)
>
^ permalink raw reply
* Re: [PATCH 1/4] bindings: net: stmmac: correct note about TSO
From: Giuseppe CAVALLARO @ 2016-12-02 8:48 UTC (permalink / raw)
To: Niklas Cassel, Rob Herring, Mark Rutland, David S. Miller,
Alexandre TORGUE, Phil Reid, Niklas Cassel, Eric Engestrom
Cc: netdev, devicetree, linux-kernel
In-Reply-To: <1479911066-19752-1-git-send-email-niklass@axis.com>
On 11/23/2016 3:24 PM, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@axis.com>
>
> snps,tso was previously placed under AXI BUS Mode parameters,
> suggesting that the property should be in the stmmac-axi-config node.
>
> TSO (TCP Segmentation Offloading) has nothing to do with AXI BUS Mode
> parameters, and the parser actually expects it to be in the root node,
> not in the stmmac-axi-config.
>
> Also added a note about snps,tso only being available on GMAC4 and newer.
>
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> ---
> Documentation/devicetree/bindings/net/stmmac.txt | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
> index 41b49e6075f5..b95ff998ba73 100644
> --- a/Documentation/devicetree/bindings/net/stmmac.txt
> +++ b/Documentation/devicetree/bindings/net/stmmac.txt
> @@ -1,7 +1,7 @@
> * STMicroelectronics 10/100/1000 Ethernet driver (GMAC)
>
> Required properties:
> -- compatible: Should be "snps,dwmac-<ip_version>" "snps,dwmac"
> +- compatible: Should be "snps,dwmac-<ip_version>", "snps,dwmac"
> For backwards compatibility: "st,spear600-gmac" is also supported.
> - reg: Address and length of the register set for the device
> - interrupt-parent: Should be the phandle for the interrupt controller
> @@ -50,6 +50,8 @@ Optional properties:
> - snps,ps-speed: port selection speed that can be passed to the core when
> PCS is supported. For example, this is used in case of SGMII
> and MAC2MAC connection.
> +- snps,tso: this enables the TSO feature otherwise it will be managed by
> + MAC HW capability register. Only for GMAC4 and newer.
> - AXI BUS Mode parameters: below the list of all the parameters to program the
> AXI register inside the DMA module:
> - snps,lpi_en: enable Low Power Interface
> @@ -62,8 +64,6 @@ Optional properties:
> - snps,fb: fixed-burst
> - snps,mb: mixed-burst
> - snps,rb: rebuild INCRx Burst
> - - snps,tso: this enables the TSO feature otherwise it will be managed by
> - MAC HW capability register.
> - mdio: with compatible = "snps,dwmac-mdio", create and register mdio bus.
>
> Examples:
>
^ permalink raw reply
* Re: pull request (net-next): ipsec-next 2016-12-01
From: Steffen Klassert @ 2016-12-02 8:54 UTC (permalink / raw)
To: David Miller; +Cc: herbert, netdev
In-Reply-To: <20161201.114516.1572943956105738518.davem@davemloft.net>
On Thu, Dec 01, 2016 at 11:45:16AM -0500, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Thu, 1 Dec 2016 12:48:04 +0100
>
> > git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master
> >
> > for you to fetch changes up to 2258d927a691ddd2ab585adb17ea9f96e89d0638:
> >
> > xfrm: remove unused helper (2016-09-30 08:20:56 +0200)
>
> Hmmm, when I try to pull I don't get anything:
>
> [davem@dhcp-10-15-49-210 net-next]$ git pull --no-ff git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master
> >From git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
> * branch master -> FETCH_HEAD
> Already up-to-date.
Oh yes, my bad. You've got it already with my last pull request.
Sorry.
^ permalink raw reply
* Re: [PATCH 1/2] net: stmmac: avoid Camelcase naming
From: Corentin Labbe @ 2016-12-02 8:58 UTC (permalink / raw)
To: Giuseppe CAVALLARO; +Cc: alexandre.torgue, netdev, linux-kernel
In-Reply-To: <83a25cf8-907b-074e-754c-faa9e641c896@st.com>
On Fri, Dec 02, 2016 at 09:44:48AM +0100, Giuseppe CAVALLARO wrote:
> Hello Corentin
>
> patches look ok, I just wonder if you tested it in case of
> the stmmac is connected to a transceiver. Let me consider it
> a critical part of the driver to properly work.
>
> Regards
> Peppe
>
I tested it on a Cubieboard 2 (dwmac-sunxi).
What do you mean by "connected to a transceiver" ? an external PHY ?
Regards
^ permalink raw reply
* Re: [PATCH v2] cpsw: ethtool: add support for getting/setting EEE registers
From: Giuseppe CAVALLARO @ 2016-12-02 9:11 UTC (permalink / raw)
To: Florian Fainelli, Andrew Lunn, Yegor Yefremov
Cc: netdev, linux-omap@vger.kernel.org, Grygorii Strashko,
N, Mugunthan V, Rami Rosen, Fabrice GASNIER
In-Reply-To: <11549d93-4b51-25c8-e693-509048475bef@gmail.com>
Hi Florian
sorry for my delay.
On 11/24/2016 7:23 PM, Florian Fainelli wrote:
> +Peppe,
>
> Le 24/11/2016 à 07:38, Andrew Lunn a écrit :
>>> As for enabling advertising and correct working of cpsw do you mean it
>>> would be better to disable EEE in any PHY on cpsw initialization as
>>> long as cpsw doesn't provide support for EEE?
>>>
>>> We observe some strange behavior with our gigabit PHYs and a link
>>> partner in a EEE-capable unmanaged NetGear switch. Disabling
>>> advertising seems to help. Though we're still investigating the issue.
>>
>> Hi Florian
>>
>> Am i right in saying, a PHY should not advertise EEE until the MAC
>> driver calls phy_init_eee(), indicating the MAC supports EEE?
>
> You would think so, but I don't see how this could possibly work if that
> was not the case already, see below.
>
>>
>> If so, it looks like we need to change a few of the PHY drivers, in
>> particular, the bcm-*.c.
>
> The first part that bcm-phy-lib.c does is make sure that EEE is enabled
> such that this gets reflected in MDIO_PCS_EEE_ABLE, without this, we
> won't be able to pass the first test in phy_init_eee(). The second part
> is to advertise EEE such that this gets reflected in MDIO_AN_EEE_ADV,
> also to make sure that we can pass the second check in phy_init_eee().
>
> Now, looking at phy_init_eee(), and what stmmac does (and bcmgenet,
> copied after stmmac), we need to somehow, have EEE advertised for
> phy_init_eee() to succeed, prepare the MAC to support EEE, and finally
> conclude with a call to phy_ethtool_set_eee(), which writes to the
> MDIO_AN_EEE_ADV register, and concludes the EEE auto-negotiated process.
> Since we already have EEE advertised, we are essentially just checking
> that the EEE advertised settings and the LP advertised settings actually
> do match, so it sounds like the final call to phy_ethtool_set_eee() is
> potentially useless if the resolved advertised and link partner
> advertised settings already match...
>
> So it sounds like at least, the first time you try to initialize EEE, we
> should start with EEE not advertised, and then, if we have EEE enabled
> at some point, and we re-negotiate the link parameters, somehow
> phy_init_eee() does a right job for that.
>
> Peppe, any thoughts on this?
I share what you say.
In sum, the EEE management inside the stmmac is:
- the driver looks at own HW cap register if EEE is supported
(indeed the user could keep disable EEE if bugged on some HW
+ Alex, Fabrice: we had some patches for this to propose where we
called the phy_ethtool_set_eee to disable feature at phy
level
- then the stmmac asks PHY layer to understand if transceiver and
partners are EEE capable.
- If all matches the EEE is actually initialized.
the logic above should be respected when use ethtool, hmm, I will
check the stmmac_ethtool_op_set_eee asap.
Hoping this is useful
Regards
Peppe
^ permalink raw reply
* Re: [PATCH v3 net-next 2/3] openvswitch: Use is_skb_forwardable() for length check.
From: Jiri Benc @ 2016-12-02 9:25 UTC (permalink / raw)
To: Pravin Shelar
Cc: Jarno Rajahalme, Linux Kernel Network Developers, Eric Garver
In-Reply-To: <CAOrHB_BEQBbvDa2W0dJmmANsjZEiwx7GbTxo-7yirDB=5d9cqQ@mail.gmail.com>
On Thu, 1 Dec 2016 11:50:00 -0800, Pravin Shelar wrote:
> This is not changing any behavior compared to current OVS vlan checks.
> Single vlan header is not considered for MTU check.
It is changing it.
Consider the case when there's an interface with MTU 1500 forwarding to
an interface with MTU 1496. Obviously, full-sized vlan frames
ingressing on the first interface are not forwardable to the second
one. Yet, if the vlan tag is accelerated (and thus not counted in
skb->len), is_skb_forwardable happily returns true because of the check
len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
if (skb->len <= len)
Jiri
^ permalink raw reply
* Re: [PATCH 1/2] net: stmmac: avoid Camelcase naming
From: Giuseppe CAVALLARO @ 2016-12-02 9:25 UTC (permalink / raw)
To: Corentin Labbe; +Cc: alexandre.torgue, netdev, linux-kernel
In-Reply-To: <20161202085846.GB20128@Red>
On 12/2/2016 9:58 AM, Corentin Labbe wrote:
> On Fri, Dec 02, 2016 at 09:44:48AM +0100, Giuseppe CAVALLARO wrote:
>> Hello Corentin
>>
>> patches look ok, I just wonder if you tested it in case of
>> the stmmac is connected to a transceiver. Let me consider it
>> a critical part of the driver to properly work.
>>
>> Regards
>> Peppe
>>
>
> I tested it on a Cubieboard 2 (dwmac-sunxi).
> What do you mean by "connected to a transceiver" ? an external PHY ?
yes an external PHY. AFAIK, users have, sometime, a switch
with fixed link enabled.
Thx for your support and patches
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
>
> Regards
>
^ permalink raw reply
* Re: [RFC PATCH net-next v2] ipv6: implement consistent hashing for equal-cost multipath routing
From: David Lebrun @ 2016-12-02 9:29 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers
In-Reply-To: <CALx6S37NF6rqbAeu=9o8nFFT3Z88=OqcMPsm0rc9-vEPGTCYbw@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 516 bytes --]
On 11/30/2016 05:04 AM, Tom Herbert wrote:
> This is a lot of code to make ECMP work better. Can you be more
> specific as to what the "issues" are? Assuming this is just the
> transient packet reorder that happens in one link flap I am wondering
> if this complexity is justified.
Unconsistent hashing is an issue when the load balancer is in front of
stateful backends, keeping per-flow state. Also, if neighbors are
constantly being added and removed, flows are constantly changing nexthops.
David
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 163 bytes --]
^ permalink raw reply
* Re: [patch net-next v3 11/12] mlxsw: spectrum_router: Request a dump of FIB tables during init
From: Ido Schimmel @ 2016-12-02 9:34 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, nogahf, arkadis,
ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
f.fainelli, alexander.h.duyck, kaber
In-Reply-To: <75d6f272-9320-8aed-4de2-3f69d9b3ea3c@stressinduktion.org>
On Fri, Dec 02, 2016 at 12:27:25AM +0100, Hannes Frederic Sowa wrote:
> I really like that. Would you mind adding this?
Yes. I'll send another version to Jiri today after testing and hopefully
we can submit today / tomorrow. I think Linus is still undecided about
-rc8 and I would like to get this in 4.10.
> >> Quick follow-up question: How can I quickly find out the hw limitations
> >> via the kernel api?
> >
> > That's a good question. Currently, you can't. However, we already have a
> > mechanism in place to read device's capabilities from the firmware and
> > we can (and should) expose some of them to the user. The best API for
> > that would be devlink, as it can represent the entire device as opposed
> > to only a port netdev like other tools.
> >
> > We're also working on making the pipeline more visible to the user, so
> > that it would be easier for users to understand and debug their
> > networks. I believe a colleague of mine (Matty) presented this during
> > the last netdev conference.
>
> Thanks, I will look it up!
Found it:
https://www.youtube.com/watch?v=gwzaKXWIelc&feature=youtu.be&list=PLrninrcyMo3IkTvpvM2LK6gn4NdbFhI0G&t=6892
^ permalink raw reply
* Re: [PATCH v3 net-next 3/3] openvswitch: Fix skb->protocol for vlan frames.
From: Jiri Benc @ 2016-12-02 9:42 UTC (permalink / raw)
To: Pravin Shelar
Cc: Jarno Rajahalme, Linux Kernel Network Developers, Eric Garver
In-Reply-To: <CAOrHB_CAR8G=BF6X_+HgkRRnb=Erc6CYpCL4EnJfDRe7qBR2vg@mail.gmail.com>
On Thu, 1 Dec 2016 12:31:09 -0800, Pravin Shelar wrote:
> On Wed, Nov 30, 2016 at 6:30 AM, Jiri Benc <jbenc@redhat.com> wrote:
> > I'm not opposed to changing this but I'm afraid it needs much deeper
> > review. Because with this in place, no core kernel functions that
> > depend on skb->protocol may be called from within openvswitch.
> >
> Can you give specific example where it does not work?
I can't, I haven't reviewed the usage. I'm just saying that the stack
does not expect skb->protocol being ETH_P_8021Q for e.g. IPv4 packets.
It may not be relevant for the calls used by openvswitch but we should
be sure about that. Especially defragmentation and conntrack is worth
looking at.
Again, I'm not saying this is wrong nor that there is an actual
problem. I'm just pointing out that openvswitch has different
expectations about skb wrt. vlans than the rest of the kernel and we
should be reasonably sure the behavior is correct when passing between
the two.
> skb-protocol value is set by the caller, so it should not be
> arbitrary. is it missing in any case?
It's not set exactly by the caller, because that's what this patch is
removing. It is set by whoever handed over the packet to openvswitch.
The point is we don't know *what* it is set to. It may as well be
ETH_P_8021Q, breaking the conditions here. It should not happen in
practice but still, it seems weird to depend on the fact that the
packet coming to ovs has never skb->protocol equal to ETH_P_8021Q nor
ETH_P_8021AD.
Jiri
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Giuseppe CAVALLARO @ 2016-12-02 9:43 UTC (permalink / raw)
To: Pavel Machek, alexandre.torgue; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20161202084511.GA32294@amd>
Hi Pavel
On 12/2/2016 9:45 AM, Pavel Machek wrote:
> Hi!
>
>>>> 1 HZ, which is the lowest granularity of non-highres timers in the
>>>> kernel, is variable as well as already too large of a delay for
>>>> effective TX coalescing.
>>>>
>>>> I seriously think that the TX coalescing support should be ripped out
>>>> or disabled entirely until it is implemented properly in this
>>>> driver.
>>>
>>> Ok, I'd disable coalescing, but could not figure it out till. What is
>>> generic way to do that?
>>>
>>> It seems only thing stmmac_tx_timer() does is calling
>>> stmmac_tx_clean(), which reclaims tx_skbuff[] entries. It should be
>>> possible to do that explicitely, without delay, but it stops working
>>> completely if I attempt to do that.
>>>
>>> On a side note, stmmac_poll() does stmmac_enable_dma_irq() while
>>> stmmac_dma_interrupt() disables interrupts. But I don't see any
>>> protection between the two, so IMO it could race and we'd end up
>>> without polling or interrupts...
>>
>>
>> the idea behind the TX mitigation is to mix the interrupt and
>> timer and this approach gave us real benefit in terms
>> of performances and CPU usage (especially on SH4-200/SH4-300 platforms
>> based).
>
> Well, if you have a workload that sends and receive packets, it tends
> to work ok, as you do tx_clean() in stmmac_poll(). My workload is not
> like that -- it is "sending packets at 3MB/sec, receiving none". So
> the stmmac_tx_timer() is rescheduled and rescheduled and rescheduled,
> and then we run out of transmit descriptors, and then 40msec passes,
> and then we clean them. Bad.
>
> And that's why low-res timers do not cut it.
in that case, I expect that the tuning of the driver could help you.
I mean, by using ethtool, it could be enough to set the IC bit on all
the descriptors. You should touch the tx_coal_frames.
Then you can use ethtool -S to monitor the status.
We had experimented this tuning on STB IP where just datagrams
had to send externally. To be honest, although we had seen
better results w/o any timer, we kept this approach enabled
because the timer was fast enough to cover our tests on SH4 boxes.
FYI, stmmac doesn't implement adaptive algo.
>
>> In the ring, some descriptors can raise the irq (according to a
>> threshold) and set the IC bit. In this path, the NAPI poll will be
>> scheduled.
>
> Not NAPI poll but stmmac_tx_timer(), right?
in the xmit according the the threshold the timer is started or the
interrupt is set inside the descriptor.
Then stmmac_tx_clean will be always called and, if you see the flow,
no irqlock protection is needed!
>
>> But there is a timer that can run (and we experimented that no high
>> resolution is needed) to clear the tx resources.
>> Concerning the lock protection, we had reviewed long time ago and
>> IIRC, no raise condition should be present. Open to review it,
>> again!
>
> Well, I certainly like the fact that we are talking :-).
>
> And yes, I have some questions.
>
> There's nothing that protect stmmac_poll() from running concurently
> with stmmac_dma_interrupt(), right?
This is not necessary.
Best Regards
peppe
>
> Best regards,
> Pavel
>
^ permalink raw reply
* RE: [PATCH] net:phy fix driver reference count error when attach and detach phy device
From: David Laight @ 2016-12-02 9:45 UTC (permalink / raw)
To: 'Mao Wenan', netdev@vger.kernel.org, f.fainelli@gmail.com,
dingtianhong@huawei.com
In-Reply-To: <1480501378-10172-1-git-send-email-maowenan@huawei.com>
From: Mao Wenan
> Sent: 30 November 2016 10:23
> The nic in my board use the phy dev from marvell, and the system will
> load the marvell phy driver automatically, but when I remove the phy
> drivers, the system immediately panic:
> Call trace:
> [ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438 [
> 2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428 [
> 2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0 [
> 2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110
>
> there should be proper reference counting in place to avoid that.
> I found that phy_attach_direct() forgets to add phy device driver
> reference count, and phy_detach() forgets to subtract reference count.
> This patch is to fix this bug, after that panic is disappeared when remove
> marvell.ko
>
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
> drivers/net/phy/phy_device.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 1a4bf8a..a7ec7c2 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> return -EIO;
> }
>
> + if (!try_module_get(d->driver->owner)) {
> + dev_err(&dev->dev, "failed to get the device driver module\n");
> + return -EIO;
> + }
If this is the phy code, what stops the phy driver being unloaded
before the try_module_get() obtains a reference.
If it isn't the phy driver then there ought to be a reference count obtained
when the phy driver is located (by whatever decides which phy driver to use).
Even if that code later releases its reference (it probably shouldn't on success)
then you can't fail to get an extra reference here.
> +
> get_device(d);
>
> /* Assume that if there is no driver, that it doesn't
> @@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>
> error:
> put_device(d);
> + module_put(d->driver->owner);
Are those two in the wrong order ?
> module_put(bus->owner);
> return err;
> }
> @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
> bus = phydev->mdio.bus;
>
> put_device(&phydev->mdio.dev);
> + module_put(phydev->mdio.dev.driver->owner);
> module_put(bus->owner);
Where is this code called from?
You can't call it from the phy driver because the driver can be unloaded
as soon as the last reference is removed.
At that point the code memory is freed.
> }
> EXPORT_SYMBOL(phy_detach);
> --
> 2.7.0
>
^ permalink raw reply
* [PATCH v2] netfilter: avoid warn and OOM killer on vmalloc call
From: Marcelo Ricardo Leitner @ 2016-12-02 9:46 UTC (permalink / raw)
To: andreyknvl; +Cc: fw, nhorman, netfilter-devel, netdev, linux-kernel
Andrey Konovalov reported that this vmalloc call is based on an
userspace request and that it's spewing traces, which may flood the logs
and cause DoS if abused.
Florian Westphal also mentioned that this call should not trigger OOM
killer.
This patch brings the vmalloc call in sync to kmalloc and disables the
warn trace on allocation failure and also disable OOM killer invocation.
Note, however, that under such stress situation, other places may
trigger OOM killer invocation.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
net/netfilter/x_tables.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index fc4977456c30e098197b4f987b758072c9cf60d9..dece525bf83a0098dad607fce665cd0bde228362 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -958,7 +958,9 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
if (!info) {
- info = vmalloc(sz);
+ info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN |
+ __GFP_NORETRY | __GFP_HIGHMEM,
+ PAGE_KERNEL);
if (!info)
return NULL;
}
--
2.9.3
^ permalink raw reply related
* Re: [patch] net: renesas: ravb: unintialized return value
From: Johan Hovold @ 2016-12-02 9:46 UTC (permalink / raw)
To: Dan Carpenter
Cc: Sergei Shtylyov, Johan Hovold, David S. Miller, Yoshihiro Kaneko,
Kazuya Mizuguchi, Simon Horman, Wolfram Sang, Andrew Lunn,
Philippe Reynes, Niklas Söderlund, Arnd Bergmann, netdev,
linux-renesas-soc, kernel-janitors
In-Reply-To: <20161201205744.GB10701@mwanda>
On Thu, Dec 01, 2016 at 11:57:44PM +0300, Dan Carpenter wrote:
> We want to set the other "err" variable here so that we can return it
> later. My version of GCC misses this issue but I caught it with a
> static checker.
> Fixes: 9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Thanks for catching this.
Reviewed-by: Johan Hovold <johan@kernel.org>
Johan
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox