Netdev List
 help / color / mirror / Atom feed
* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: Christoph Lameter @ 2016-12-13 16:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: John Fastabend, Mike Rapoport, netdev@vger.kernel.org, linux-mm,
	Willem de Bruijn, Björn Töpel, Karlsson, Magnus,
	Alexander Duyck, Mel Gorman, Tom Herbert, Brenden Blanco,
	Tariq Toukan, Saeed Mahameed, Jesse Brandeburg, Kalman Meth,
	Vladislav Yasevich
In-Reply-To: <20161213171028.24dbf519@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3223 bytes --]

On Tue, 13 Dec 2016, Jesper Dangaard Brouer wrote:

> This is the early demux problem.  With the push-mode of registering
> memory, you need hardware steering support, for zero-copy support, as
> the software step happens after DMA engine have written into the memory.

Right. But we could fall back to software. Transfer to a kernel buffer and
then move stuff over. Not much of an improvment but it will make things
work.

> > The discussion here is a bit amusing since these issues have been
> > resolved a long time ago with the design of the RDMA subsystem. Zero
> > copy is already in wide use. Memory registration is used to pin down
> > memory areas. Work requests can be filed with the RDMA subsystem that
> > then send and receive packets from the registered memory regions.
> > This is not strictly remote memory access but this is a basic mode of
> > operations supported  by the RDMA subsystem. The mlx5 driver quoted
> > here supports all of that.
>
> I hear what you are saying.  I will look into a push-model, as it might
> be a better solution.
>  I will read up on RDMA + verbs and learn more about their API model.  I
> even plan to write a small sample program to get a feeling for the API,
> and maybe we can use that as a baseline for the performance target we
> can obtain on the same HW. (Thanks to Björn for already giving me some
> pointer here)

Great.

> > What is bad about RDMA is that it is a separate kernel subsystem.
> > What I would like to see is a deeper integration with the network
> > stack so that memory regions can be registred with a network socket
> > and work requests then can be submitted and processed that directly
> > read and write in these regions. The network stack should provide the
> > services that the hardware of the NIC does not suppport as usual.
>
> Interesting.  So you even imagine sockets registering memory regions
> with the NIC.  If we had a proper NIC HW filter API across the drivers,
> to register the steering rule (like ibv_create_flow), this would be
> doable, but we don't (DPDK actually have an interesting proposal[1])

Well doing this would mean adding some features and that also would at
best allow general support for zero copy direct to user space with a
fallback to software if the hardware is missing some feature.

> > The RX/TX ring in user space should be an additional mode of
> > operation of the socket layer. Once that is in place the "Remote
> > memory acces" can be trivially implemented on top of that and the
> > ugly RDMA sidecar subsystem can go away.
>
> I cannot follow that 100%, but I guess you are saying we also need a
> more efficient mode of handing over pages/packet to userspace (than
> going through the normal socket API calls).

A work request contains the user space address of the data to be sent
and/or received. The address must be in a registered memory region. This
is different from copying the packet into kernel data structures.

I think this can easily be generalized. We need support for registering
memory regions, submissions of work request and the processing of
completion requets. QP (queue-pair) processing is probably the basis for
the whole scheme that is used in multiple context these days.

^ permalink raw reply

* Re: [PATCHv2 1/5] sh_eth: add generic wake-on-lan support via magic packet
From: Niklas Söderlund @ 2016-12-13 16:27 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Sergei Shtylyov, Simon Horman, netdev@vger.kernel.org,
	Linux-Renesas, linux-pm
In-Reply-To: <CAMuHMdV6xrzjBQeVAWK3B_c077O6keWrH7Ndi5CX+2JLqjsL5A@mail.gmail.com>

Hi Geert,

Thanks for feedback and testing!

On 2016-12-13 14:03:52 +0100, Geert Uytterhoeven wrote:
> CC linux-pm

I think you forgot to CC linux-pm :-)

> 
> On Mon, Dec 12, 2016 at 5:09 PM, Niklas Söderlund
> <niklas.soderlund+renesas@ragnatech.se> wrote:
> > Add generic functionality to support Wake-on-Lan using MagicPacket which
> > are supported by at least a few versions of sh_eth. Only add
> > functionality for WoL, no specific sh_eth version are marked to support
> > WoL yet.
> >
> > WoL is enabled in the suspend callback by setting MagicPacket detection
> > and disabling all interrupts expect MagicPacket. In the resume path the
> > driver needs to reset the hardware to rearm the WoL logic, this prevents
> > the driver from simply restoring the registers and to take advantage of
> > that sh_eth was not suspended to reduce resume time. To reset the
> > hardware the driver close and reopens the device just like it would do
> > in a normal suspend/resume scenario without WoL enabled, but it both
> > close and open the device in the resume callback since the device needs
> > to be open for WoL to work.
> >
> > One quirk needed for WoL is that the module clock needs to be prevented
> > from being switched off by Runtime PM. To keep the clock alive the
> > suspend callback need to call clk_enable() directly to increase the
> > usage count of the clock. Then when Runtime PM decreases the clock usage
> > count it won't reach 0 and be switched off.
> >
> > Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
> 
> Thanks for the update!
> 
> I've verified WoL is working on r8a7791/koelsch and r8a7740/armadillo.
> 
> However, if I look at /sys/kernel/debug/wakeup_sources, "active_count" and
> "event_count" for the Ethernet device do not increase when using WoL, while
> they do for the GPIO keyboard when using the keyboard for wake up.
> So something seems to be missing from a bookkeeping point of view.

Cool, now I know why some net drivers call pm_wakeup_event() if the 
wakeup source was WoL :-) I added this to sh_eth and now the bookkeeping 
seems to work as you describe, "active_count" and "event_count" are 
incremented while waking up from WoL. I will include this in the next 
version, thanks for reporting I had no idea to check for this.

> 
> Interestingly, "wakeup_count" does not increase for both?
> Has this been broken recently?

I had a quick look at this and the 'wakeup_count' is increased in 
wakeup_source_report_event() which is in the call path from 
pm_wakeup_event().

pm_wakeup_event()
  __pm_wakeup_event()
    wakeup_source_report_event()

static void wakeup_source_report_event(struct wakeup_source *ws) 
{
        ws->event_count++;
        /* This is racy, but the counter is approximate anyway. */
        if (events_check_enabled)
                ws->wakeup_count++;

        if (!ws->active)
                wakeup_source_activate(ws);
}

So maybe 'wakeup_count' is not incremented due to the race with 
events_check_enabled? But then again I'm new to PM so I might miss 
something obvious. I'm also not sure if I can do anything in this series 
to improve the behavior of 'wakeup_count' for sh_eth?

> 
> > ---
> >  drivers/net/ethernet/renesas/sh_eth.c | 112 +++++++++++++++++++++++++++++++---
> >  drivers/net/ethernet/renesas/sh_eth.h |   3 +
> >  2 files changed, 107 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
> > index 05b0dc5..87640b9 100644
> > --- a/drivers/net/ethernet/renesas/sh_eth.c
> > +++ b/drivers/net/ethernet/renesas/sh_eth.c
> > @@ -2199,6 +2199,33 @@ static int sh_eth_set_ringparam(struct net_device *ndev,
> >         return 0;
> >  }
> >
> > +static void sh_eth_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
> > +{
> > +       struct sh_eth_private *mdp = netdev_priv(ndev);
> > +
> > +       wol->supported = 0;
> > +       wol->wolopts = 0;
> > +
> > +       if (mdp->cd->magic && mdp->clk) {
> > +               wol->supported = WAKE_MAGIC;
> > +               wol->wolopts = mdp->wol_enabled ? WAKE_MAGIC : 0;
> > +       }
> > +}
> > +
> > +static int sh_eth_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
> > +{
> > +       struct sh_eth_private *mdp = netdev_priv(ndev);
> > +
> > +       if (!mdp->cd->magic || !mdp->clk || wol->wolopts & ~WAKE_MAGIC)
> > +               return -EOPNOTSUPP;
> > +
> > +       mdp->wol_enabled = !!(wol->wolopts & WAKE_MAGIC);
> > +
> > +       device_set_wakeup_enable(&mdp->pdev->dev, mdp->wol_enabled);
> > +
> > +       return 0;
> > +}
> > +
> >  static const struct ethtool_ops sh_eth_ethtool_ops = {
> >         .get_regs_len   = sh_eth_get_regs_len,
> >         .get_regs       = sh_eth_get_regs,
> > @@ -2213,6 +2240,8 @@ static const struct ethtool_ops sh_eth_ethtool_ops = {
> >         .set_ringparam  = sh_eth_set_ringparam,
> >         .get_link_ksettings = sh_eth_get_link_ksettings,
> >         .set_link_ksettings = sh_eth_set_link_ksettings,
> > +       .get_wol        = sh_eth_get_wol,
> > +       .set_wol        = sh_eth_set_wol,
> >  };
> >
> >  /* network device open function */
> > @@ -3017,6 +3046,11 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
> >                 goto out_release;
> >         }
> >
> > +       /* Get clock, if not found that's OK but Wake-On-Lan is unavailable */
> > +       mdp->clk = devm_clk_get(&pdev->dev, NULL);
> > +       if (IS_ERR(mdp->clk))
> > +               mdp->clk = NULL;
> > +
> >         ndev->base_addr = res->start;
> >
> >         spin_lock_init(&mdp->lock);
> > @@ -3111,6 +3145,9 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
> >         if (ret)
> >                 goto out_napi_del;
> >
> > +       if (mdp->cd->magic && mdp->clk)
> > +               device_set_wakeup_capable(&pdev->dev, 1);
> > +
> >         /* print device information */
> >         netdev_info(ndev, "Base address at 0x%x, %pM, IRQ %d.\n",
> >                     (u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
> > @@ -3150,15 +3187,67 @@ static int sh_eth_drv_remove(struct platform_device *pdev)
> >
> >  #ifdef CONFIG_PM
> >  #ifdef CONFIG_PM_SLEEP
> > +static int sh_eth_wol_setup(struct net_device *ndev)
> > +{
> > +       struct sh_eth_private *mdp = netdev_priv(ndev);
> > +
> > +       /* Only allow ECI interrupts */
> > +       synchronize_irq(ndev->irq);
> > +       napi_disable(&mdp->napi);
> > +       sh_eth_write(ndev, DMAC_M_ECI, EESIPR);
> > +
> > +       /* Enable MagicPacket */
> > +       sh_eth_modify(ndev, ECMR, 0, ECMR_PMDE);
> > +
> > +       /* Increased clock usage so device won't be suspended */
> > +       clk_enable(mdp->clk);
> > +
> > +       return enable_irq_wake(ndev->irq);
> > +}
> > +
> > +static int sh_eth_wol_restore(struct net_device *ndev)
> > +{
> > +       struct sh_eth_private *mdp = netdev_priv(ndev);
> > +       int ret;
> > +
> > +       napi_enable(&mdp->napi);
> > +
> > +       /* Disable MagicPacket */
> > +       sh_eth_modify(ndev, ECMR, ECMR_PMDE, 0);
> > +
> > +       /* The device need to be reset to restore MagicPacket logic
> > +        * for next wakeup. If we close and open the device it will
> > +        * both be reset and all registers restored. This is what
> > +        * happens during suspend and resume without WoL enabled.
> > +        */
> > +       ret = sh_eth_close(ndev);
> > +       if (ret < 0)
> > +               return ret;
> > +       ret = sh_eth_open(ndev);
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       /* Restore clock usage count */
> > +       clk_disable(mdp->clk);
> > +
> > +       return disable_irq_wake(ndev->irq);
> > +}
> > +
> >  static int sh_eth_suspend(struct device *dev)
> >  {
> >         struct net_device *ndev = dev_get_drvdata(dev);
> > +       struct sh_eth_private *mdp = netdev_priv(ndev);
> >         int ret = 0;
> >
> > -       if (netif_running(ndev)) {
> > -               netif_device_detach(ndev);
> > +       if (!netif_running(ndev))
> > +               return 0;
> > +
> > +       netif_device_detach(ndev);
> > +
> > +       if (mdp->wol_enabled)
> > +               ret = sh_eth_wol_setup(ndev);
> > +       else
> >                 ret = sh_eth_close(ndev);
> > -       }
> >
> >         return ret;
> >  }
> > @@ -3166,14 +3255,21 @@ static int sh_eth_suspend(struct device *dev)
> >  static int sh_eth_resume(struct device *dev)
> >  {
> >         struct net_device *ndev = dev_get_drvdata(dev);
> > +       struct sh_eth_private *mdp = netdev_priv(ndev);
> >         int ret = 0;
> >
> > -       if (netif_running(ndev)) {
> > +       if (!netif_running(ndev))
> > +               return 0;
> > +
> > +       if (mdp->wol_enabled)
> > +               ret = sh_eth_wol_restore(ndev);
> > +       else
> >                 ret = sh_eth_open(ndev);
> > -               if (ret < 0)
> > -                       return ret;
> > -               netif_device_attach(ndev);
> > -       }
> > +
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       netif_device_attach(ndev);
> >
> >         return ret;
> >  }
> > diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
> > index d050f37..4ceed00 100644
> > --- a/drivers/net/ethernet/renesas/sh_eth.h
> > +++ b/drivers/net/ethernet/renesas/sh_eth.h
> > @@ -493,6 +493,7 @@ struct sh_eth_cpu_data {
> >         unsigned shift_rd0:1;   /* shift Rx descriptor word 0 right by 16 */
> >         unsigned rmiimode:1;    /* EtherC has RMIIMODE register */
> >         unsigned rtrate:1;      /* EtherC has RTRATE register */
> > +       unsigned magic:1;       /* EtherC has ECMR.PMDE and ECSR.MPD */
> >  };
> >
> >  struct sh_eth_private {
> > @@ -501,6 +502,7 @@ struct sh_eth_private {
> >         const u16 *reg_offset;
> >         void __iomem *addr;
> >         void __iomem *tsu_addr;
> > +       struct clk *clk;
> >         u32 num_rx_ring;
> >         u32 num_tx_ring;
> >         dma_addr_t rx_desc_dma;
> > @@ -529,6 +531,7 @@ struct sh_eth_private {
> >         unsigned no_ether_link:1;
> >         unsigned ether_link_active_low:1;
> >         unsigned is_opened:1;
> > +       unsigned wol_enabled:1;
> >  };
> >
> >  static inline void sh_eth_soft_swap(char *src, int len)
> > --
> > 2.10.2
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

-- 
Regards,
Niklas Söderlund

^ permalink raw reply

* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: Jesper Dangaard Brouer @ 2016-12-13 16:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: John Fastabend, Mike Rapoport, netdev@vger.kernel.org, linux-mm,
	Willem de Bruijn, Björn Töpel, Karlsson, Magnus,
	Alexander Duyck, Mel Gorman, Tom Herbert, Brenden Blanco,
	Tariq Toukan, Saeed Mahameed, Jesse Brandeburg, Kalman Meth,
	Vladislav Yasevich, brouer
In-Reply-To: <alpine.DEB.2.20.1612121200280.13607@east.gentwo.org>


On Mon, 12 Dec 2016 12:06:59 -0600 (CST) Christoph Lameter <cl@linux.com> wrote:
> On Mon, 12 Dec 2016, Jesper Dangaard Brouer wrote:
> 
> > Hmmm. If you can rely on hardware setup to give you steering and
> > dedicated access to the RX rings.  In those cases, I guess, the "push"
> > model could be a more direct API approach.  
> 
> If the hardware does not support steering then one should be able to
> provide those services in software.

This is the early demux problem.  With the push-mode of registering
memory, you need hardware steering support, for zero-copy support, as
the software step happens after DMA engine have written into the memory.

My model pre-VMA map all the pages in the RX ring (if zero-copy gets
enabled, by a single user).  The software step can filter and zero-copy
send packet-pages to the application/socket that requested this. The
disadvantage is all zero-copy application need to share this VMA
mapping.  This is solved by configuring HW filters into a RX-queue, and
then only attach your zero-copy application to that queue.


> > I was shooting for a model that worked without hardware support.
> > And then transparently benefit from HW support by configuring a HW
> > filter into a specific RX queue and attaching/using to that queue.  
> 
> The discussion here is a bit amusing since these issues have been
> resolved a long time ago with the design of the RDMA subsystem. Zero
> copy is already in wide use. Memory registration is used to pin down
> memory areas. Work requests can be filed with the RDMA subsystem that
> then send and receive packets from the registered memory regions.
> This is not strictly remote memory access but this is a basic mode of
> operations supported  by the RDMA subsystem. The mlx5 driver quoted
> here supports all of that.

I hear what you are saying.  I will look into a push-model, as it might
be a better solution.
 I will read up on RDMA + verbs and learn more about their API model.  I
even plan to write a small sample program to get a feeling for the API,
and maybe we can use that as a baseline for the performance target we
can obtain on the same HW. (Thanks to Björn for already giving me some
pointer here)


> What is bad about RDMA is that it is a separate kernel subsystem.
> What I would like to see is a deeper integration with the network
> stack so that memory regions can be registred with a network socket
> and work requests then can be submitted and processed that directly
> read and write in these regions. The network stack should provide the
> services that the hardware of the NIC does not suppport as usual.

Interesting.  So you even imagine sockets registering memory regions
with the NIC.  If we had a proper NIC HW filter API across the drivers,
to register the steering rule (like ibv_create_flow), this would be
doable, but we don't (DPDK actually have an interesting proposal[1])

 
> The RX/TX ring in user space should be an additional mode of
> operation of the socket layer. Once that is in place the "Remote
> memory acces" can be trivially implemented on top of that and the
> ugly RDMA sidecar subsystem can go away.
 
I cannot follow that 100%, but I guess you are saying we also need a
more efficient mode of handing over pages/packet to userspace (than
going through the normal socket API calls).


Appreciate your input, it challenged my thinking.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[1] https://rawgit.com/6WIND/rte_flow/master/rte_flow.html

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH net v3] ibmveth: set correct gso_size and gso_type
From: Thomas Falcon @ 2016-12-13 15:49 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, brking, marcelo.leitner, pradeeps, jmaxwell37, zdai,
	eric.dumazet
In-Reply-To: <20161210.155647.1988856288960520451.davem@davemloft.net>

On 12/10/2016 02:56 PM, David Miller wrote:
> From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
> Date: Sat, 10 Dec 2016 12:39:48 -0600
>
>> v3: include a check for non-zero mss when calculating gso_segs
>>
>> v2: calculate gso_segs after Eric Dumazet's comments on the earlier patch
>>     and make sure everyone is included on CC
> I already applied v1 which made it all the way even to Linus's
> tree.  So you'll have to send me relative fixups if there are
> things to fix or change since v1.
>
> You must always generate patches against the current 'net' tree.
>
Sorry about that.  Thank you for applying it on short notice.  I agree that using the TCP checksum field is not ideal, but it was a compromise with the VIOS team.  Hopefully, there will be a better way in the future.

Thanks again,

Tom 

^ permalink raw reply

* Re: [PATCH net] virtio-net: correctly enable multiqueue
From: David Miller @ 2016-12-13 15:39 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, virtualization, tytso, nhorman, mst
In-Reply-To: <1481610185-12183-1-git-send-email-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Tue, 13 Dec 2016 14:23:05 +0800

> Commit 4490001029012539937ff02778fe6180613fa949 ("virtio-net: enable
> multiqueue by default") blindly set the affinity instead of queues
> during probe which can cause a mismatch of #queues between guest and
> host. This patch fixes it by setting queues.
> 
> Reported-by: Theodore Ts'o <tytso@mit.edu>
> Tested-by: Theodore Ts'o <tytso@mit.edu>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 49000102901 ("virtio-net: enable multiqueue by default")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Applied, thanks Jason.

^ permalink raw reply

* [PATCH/replace net-next 17/27] OVS: remove use of VLAN_TAG_PRESENT
From: Michał Mirosław @ 2016-12-13 15:31 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: open list:OPENVSWITCH, Jiri Benc
In-Reply-To: <e44219bc56d3e44aa0711c83c626adabf4c4ecd8.1481586602.git.mirq-linux-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>

This is a minimal change to allow removing of VLAN_TAG_PRESENT.
It leaves OVS unable to use CFI bit, as fixing this would need
a deeper surgery involving userspace interface.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 net/openvswitch/actions.c      | 13 +++++++++----
 net/openvswitch/flow.c         |  2 +-
 net/openvswitch/flow.h         |  2 +-
 net/openvswitch/flow_netlink.c | 22 +++++++++++-----------
 4 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 514f7bc..fb41833 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -278,7 +278,7 @@ static int push_vlan(struct sk_buff *skb, struct sw_flow_key *key,
 		key->eth.vlan.tpid = vlan->vlan_tpid;
 	}
 	return skb_vlan_push(skb, vlan->vlan_tpid,
-			     ntohs(vlan->vlan_tci) & ~VLAN_TAG_PRESENT);
+			     ntohs(vlan->vlan_tci) & ~VLAN_CFI_MASK);
 }
 
 /* 'src' is already properly masked. */
@@ -704,8 +704,10 @@ static int ovs_vport_output(struct net *net, struct sock *sk, struct sk_buff *sk
 	__skb_dst_copy(skb, data->dst);
 	*OVS_CB(skb) = data->cb;
 	skb->inner_protocol = data->inner_protocol;
-	skb->vlan_tci = data->vlan_tci;
-	skb->vlan_proto = data->vlan_proto;
+	if (data->vlan_tci & VLAN_CFI_MASK)
+		__vlan_hwaccel_put_tag(skb, data->vlan_proto, data->vlan_tci & ~VLAN_CFI_MASK);
+	else
+		__vlan_hwaccel_clear_tag(skb);
 
 	/* Reconstruct the MAC header.  */
 	skb_push(skb, data->l2_len);
@@ -749,7 +751,10 @@ static void prepare_frag(struct vport *vport, struct sk_buff *skb,
 	data->cb = *OVS_CB(skb);
 	data->inner_protocol = skb->inner_protocol;
 	data->network_offset = orig_network_offset;
-	data->vlan_tci = skb->vlan_tci;
+	if (skb_vlan_tag_present(skb))
+		data->vlan_tci = skb_vlan_tag_get(skb) | VLAN_CFI_MASK;
+	else
+		data->vlan_tci = 0;
 	data->vlan_proto = skb->vlan_proto;
 	data->mac_proto = mac_proto;
 	data->l2_len = hlen;
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 08aa926..2bd4eac 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -347,7 +347,7 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 	int res;
 
 	if (skb_vlan_tag_present(skb)) {
-		key->eth.vlan.tci = htons(skb->vlan_tci);
+		key->eth.vlan.tci = htons(skb->vlan_tci) | htons(VLAN_CFI_MASK);
 		key->eth.vlan.tpid = skb->vlan_proto;
 	} else {
 		/* Parse outer vlan tag in the non-accelerated case. */
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index f61cae7..c8724ca 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -58,7 +58,7 @@ struct ovs_tunnel_info {
 
 struct vlan_head {
 	__be16 tpid; /* Vlan type. Generally 802.1q or 802.1ad.*/
-	__be16 tci;  /* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
+	__be16 tci;  /* 0 if no VLAN, VLAN_CFI_MASK set otherwise. */
 };
 
 #define OVS_SW_FLOW_KEY_METADATA_SIZE			\
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d19044f..b72fcbd 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -853,9 +853,9 @@ static int validate_vlan_from_nlattrs(const struct sw_flow_match *match,
 	if (a[OVS_KEY_ATTR_VLAN])
 		tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
 
-	if (!(tci & htons(VLAN_TAG_PRESENT))) {
+	if (!(tci & htons(VLAN_CFI_MASK))) {
 		if (tci) {
-			OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
+			OVS_NLERR(log, "%s TCI does not have VLAN_CFI_MASK bit set.",
 				  (inner) ? "C-VLAN" : "VLAN");
 			return -EINVAL;
 		} else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
@@ -876,9 +876,9 @@ static int validate_vlan_mask_from_nlattrs(const struct sw_flow_match *match,
 	__be16 tci = 0;
 	__be16 tpid = 0;
 	bool encap_valid = !!(match->key->eth.vlan.tci &
-			      htons(VLAN_TAG_PRESENT));
+			      htons(VLAN_CFI_MASK));
 	bool i_encap_valid = !!(match->key->eth.cvlan.tci &
-				htons(VLAN_TAG_PRESENT));
+				htons(VLAN_CFI_MASK));
 
 	if (!(key_attrs & (1 << OVS_KEY_ATTR_ENCAP))) {
 		/* Not a VLAN. */
@@ -902,8 +902,8 @@ static int validate_vlan_mask_from_nlattrs(const struct sw_flow_match *match,
 			  (inner) ? "C-VLAN" : "VLAN", ntohs(tpid));
 		return -EINVAL;
 	}
-	if (!(tci & htons(VLAN_TAG_PRESENT))) {
-		OVS_NLERR(log, "%s TCI mask does not have exact match for VLAN_TAG_PRESENT bit.",
+	if (!(tci & htons(VLAN_CFI_MASK))) {
+		OVS_NLERR(log, "%s TCI mask does not have exact match for VLAN_CFI_MASK bit.",
 			  (inner) ? "C-VLAN" : "VLAN");
 		return -EINVAL;
 	}
@@ -958,7 +958,7 @@ static int parse_vlan_from_nlattrs(struct sw_flow_match *match,
 	if (err)
 		return err;
 
-	encap_valid = !!(match->key->eth.vlan.tci & htons(VLAN_TAG_PRESENT));
+	encap_valid = !!(match->key->eth.vlan.tci & htons(VLAN_CFI_MASK));
 	if (encap_valid) {
 		err = __parse_vlan_from_nlattrs(match, key_attrs, true, a,
 						is_mask, log);
@@ -2444,7 +2444,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			vlan = nla_data(a);
 			if (!eth_type_vlan(vlan->vlan_tpid))
 				return -EINVAL;
-			if (!(vlan->vlan_tci & htons(VLAN_TAG_PRESENT)))
+			if (!(vlan->vlan_tci & htons(VLAN_CFI_MASK)))
 				return -EINVAL;
 			vlan_tci = vlan->vlan_tci;
 			break;
@@ -2460,7 +2460,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			/* Prohibit push MPLS other than to a white list
 			 * for packets that have a known tag order.
 			 */
-			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
+			if (vlan_tci & htons(VLAN_CFI_MASK) ||
 			    (eth_type != htons(ETH_P_IP) &&
 			     eth_type != htons(ETH_P_IPV6) &&
 			     eth_type != htons(ETH_P_ARP) &&
@@ -2472,7 +2472,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		}
 
 		case OVS_ACTION_ATTR_POP_MPLS:
-			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
+			if (vlan_tci & htons(VLAN_CFI_MASK) ||
 			    !eth_p_mpls(eth_type))
 				return -EINVAL;
 
@@ -2530,7 +2530,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		case OVS_ACTION_ATTR_POP_ETH:
 			if (mac_proto != MAC_PROTO_ETHERNET)
 				return -EINVAL;
-			if (vlan_tci & htons(VLAN_TAG_PRESENT))
+			if (vlan_tci & htons(VLAN_CFI_MASK))
 				return -EINVAL;
 			mac_proto = MAC_PROTO_ETHERNET;
 			break;
-- 
2.10.2

_______________________________________________
dev mailing list
dev@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

^ permalink raw reply related

* Re: [PATCH v2 2/2] net: rfkill: Add rfkill-any LED trigger
From: Johannes Berg @ 2016-12-13 15:18 UTC (permalink / raw)
  To: Michał Kępień, David S . Miller
  Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20161208073052.12988-2-kernel@kempniu.pl>

On Thu, 2016-12-08 at 08:30 +0100, Michał Kępień wrote:
> Add a new "global" (i.e. not per-rfkill device) LED trigger, rfkill-
> any,
> which may be useful on laptops with a single "radio LED" and multiple
> radio transmitters.  The trigger is meant to turn a LED on whenever
> there is at least one radio transmitter active and turn it off
> otherwise.
> 

Also applied, but I moved the discussion of the mutex into the recorded
commit log.

johannes

^ permalink raw reply

* Re: [PATCH v2 1/2] net: rfkill: Cleanup error handling in rfkill_init()
From: Johannes Berg @ 2016-12-13 15:16 UTC (permalink / raw)
  To: Michał Kępień, David S . Miller
  Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20161208073052.12988-1-kernel@kempniu.pl>

On Thu, 2016-12-08 at 08:30 +0100, Michał Kępień wrote:
> Use a separate label per error condition in rfkill_init() to make it
> a bit cleaner and easier to extend.

applied.

johannes

^ permalink raw reply

* Re: [PATCH net-next 17/27] OVS: remove assumptions about VLAN_TAG_PRESENT bit
From: Michał Mirosław @ 2016-12-13 15:14 UTC (permalink / raw)
  To: Jiri Benc; +Cc: open list:OPENVSWITCH, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161213114010.0fa2ddb6@griffin>

On Tue, Dec 13, 2016 at 11:40:10AM +0100, Jiri Benc wrote:
> On Tue, 13 Dec 2016 01:12:38 +0100 (CET), Michał Mirosław wrote:
> > @@ -850,20 +848,11 @@ static int validate_vlan_from_nlattrs(const struct sw_flow_match *match,
> >  		return -EINVAL;
> >  	}
> >  
> > -	if (a[OVS_KEY_ATTR_VLAN])
> > -		tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
> > -
> > -	if (!(tci & htons(VLAN_TAG_PRESENT))) {
> > -		if (tci) {
> > -			OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
> > -				  (inner) ? "C-VLAN" : "VLAN");
> > -			return -EINVAL;
> > -		} else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> > -			/* Corner case for truncated VLAN header. */
> > -			OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
> > -				  (inner) ? "C-VLAN" : "VLAN");
> > -			return -EINVAL;
> > -		}
> > +	if (!a[OVS_KEY_ATTR_VLAN] && nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> > +		/* Corner case for truncated VLAN header. */
> > +		OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
> > +			(inner) ? "C-VLAN" : "VLAN");
> > +		return -EINVAL;
> 
> This looks wrong. The zero value of nla_get_be16(a[OVS_KEY_ATTR_VLAN])
> together with empty a[OVS_KEY_ATTR_ENCAP] means truncated VLAN header
> and this needs to be preserved.
> 
> In other words, you need to check also !nla_get_be16(a[OVS_KEY_ATTR_VLAN])
> here.
> 
> 
> Overall, this whole patch looks very suspicious from the very
> beginning. The xors used are strong indication that something is not
> right.
> 
> And indeed, you're breaking uAPI compatibility. Previously, the
> VLAG_TAG_PRESENT bit set in OVS_KEY_ATTR_VLAN caused the flow to match
> on packets with VLAN tag present. After this patch, it causes the flow
> to match only on those packets that have a certain value of
> VLAN_CFI_MASK in their VLAN tag (I haven't bother deciphering what
> value is checked after all these xors, it's as well possible that it
> will also match on packets _without_ any VLAN header).
> 
> You have to preserve the old meaning of VLAN_TAG_PRESENT in
> OVS_KEY_ATTR_VLAN. When doing flow lookups, VLAN_TAG_PRESENT must match
> on and only on packets that have VLAN tag present, irrespective of their
> VLAN_CFI_MASK.
> 
> If you want to introduce support for lookups on VLAN_CFI_MASK, you'll
> have to do that by introducing a new netlink attribute.

Hmm. In that case I'll just mask the CFI bit on entry to OVS.
Otherwise it's for me like to stab in a dark at a large beast
I know nothing about.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH net-next 13/27] bridge: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2016-12-13 15:11 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: netdev, moderated list:ETHERNET BRIDGE,
	open list:NETFILTER ({IP, IP6, ARP, EB , NF}TABLES),
	Jozsef Kadlecsik (supporter:NETFILTER ({IP, IP6, ARP, EB, NF}TABLES)),
	Patrick McHardy (supporter:NETFILTER ({IP, IP6, ARP , EB, NF}TABLES)),
	Pablo Neira Ayuso (supporter:NETFILTER ({IP , IP6, ARP, EB, NF}TABLES))
In-Reply-To: <f5078470-0fc0-cfcd-36df-3ebb380b547a@cogentembedded.com>

On Tue, Dec 13, 2016 at 03:59:46PM +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 12/13/2016 3:12 AM, Michał Mirosław wrote:
> 
> > This removes assumption than vlan_tci != 0 when tag is present.
> > 
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > ---
> >  net/bridge/br_netfilter_hooks.c | 14 ++++++++------
> >  net/bridge/br_private.h         |  2 +-
> >  net/bridge/br_vlan.c            |  6 +++---
> >  3 files changed, 12 insertions(+), 10 deletions(-)
> > 
> > diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
> > index b12501a..2cc0747 100644
> > --- a/net/bridge/br_netfilter_hooks.c
> > +++ b/net/bridge/br_netfilter_hooks.c
> [...]
> > @@ -749,8 +747,12 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
> > 
> >  		data = this_cpu_ptr(&brnf_frag_data_storage);
> > 
> > -		data->vlan_tci = skb->vlan_tci;
> > -		data->vlan_proto = skb->vlan_proto;
> > +		if (skb_vlan_tag_present(skb)) {
> > +			data->vlan_tci = skb->vlan_tci;
> > +			data->vlan_proto = skb->vlan_proto;
> > +		} else
> > +			data->vlan_proto = 0;
> 
>    CodingStyle: should use {} in all branches of *if* if at least one branch
> has {}.
> 
> [...]
> > diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> > index b6de4f4..ef94664 100644
> > --- a/net/bridge/br_vlan.c
> > +++ b/net/bridge/br_vlan.c
> 
> > @@ -444,8 +444,8 @@ static bool __allowed_ingress(const struct net_bridge *br,
> >  			__vlan_hwaccel_put_tag(skb, br->vlan_proto, pvid);
> >  		else
> >  			/* Priority-tagged Frame.
> > -			 * At this point, We know that skb->vlan_tci had
> > -			 * VLAN_TAG_PRESENT bit and its VID field was 0x000.
> > +			 * At this point, We know that skb->vlan_tci VID
> 
>    s/We/we/.
> 
> > +			 * field was 0x000.
> 
>    Simply 0, maybe?

Thanks, fixed.
-- Michał Mirosław

^ permalink raw reply

* Re: [PATCH net-next 1/3] net:dsa:mv88e6xxx: use hashtable to store multicast entries
From: Andrew Lunn @ 2016-12-13 15:09 UTC (permalink / raw)
  To: Volodymyr Bendiuga
  Cc: Florian Fainelli, Vivien Didelot, Volodymyr Bendiuga, netdev,
	Volodymyr Bendiuga
In-Reply-To: <CAMr9LbqyvrRYe_m34Ufzxkf2Vef3LbMQiuoiwVX0mK=hW6euyA@mail.gmail.com>

> Another interesting thing is that fdb dump operation is slow
> too. The more entries you have the slower it gets. We have
> implemented caching with timing there as well (I have not submitted
> such patch yet), which makes fdb dump much more pleasant in user
> space, the operation happens instantly, where regular fdb dump op
> might take few seconds.

I know some people won't like the added MDIO transactions, because
they are doing PTP and want low jitter when talking to the switch and
PHYs. As things are now, once the system is configured, the MDIO bus
is idle. Hence there is very low jitter for PTP. For your cache to be
useful, i assume you are refreshing it at regular intervals. So you
are adding a constant load, which i guess 99.9% of the timer is never
used because there is no user looking at the table, where as the PTP
load is used to keep the clocks synchronised.

   Andrew

^ permalink raw reply

* [RFC PATCH v3] audit: use proper refcount locking on audit_sock
From: Richard Guy Briggs @ 2016-12-13 15:03 UTC (permalink / raw)
  To: netdev, linux-kernel, linux-audit
  Cc: Richard Guy Briggs, dvyukov, xiyou.wangcong, edumazet, eparis,
	pmoore, sgrubb
In-Reply-To: <20161212100215.GA1305@madcap2.tricolour.ca>

Resetting audit_sock appears to be racy.

audit_sock was being copied and dereferenced without using a refcount on
the source sock.

Bump the refcount on the underlying sock when we store a refrence in
audit_sock and release it when we reset audit_sock.  audit_sock
modification needs the audit_cmd_mutex.

See: https://lkml.org/lkml/2016/11/26/232

Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang
<xiyou.wangcong@gmail.com> on ideas how to fix it.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
There has been a lot of change in the audit code that is about to go
upstream to address audit queue issues.  This patch is based on the
source tree: git://git.infradead.org/users/pcmoore/audit#next
---
 kernel/audit.c |   28 +++++++++++++++++++++++-----
 1 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index f20eee0..3bb4126 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -446,14 +446,19 @@ static void kauditd_retry_skb(struct sk_buff *skb)
  * Description:
  * Break the auditd/kauditd connection and move all the records in the retry
  * queue into the hold queue in case auditd reconnects.
+ * The audit_cmd_mutex must be held when calling this function.
  */
 static void auditd_reset(void)
 {
 	struct sk_buff *skb;
 
 	/* break the connection */
+	if (audit_sock) {
+		sock_put(audit_sock);
+		audit_sock = NULL;
+	}
 	audit_pid = 0;
-	audit_sock = NULL;
+	audit_nlk_portid = 0;
 
 	/* flush all of the retry queue to the hold queue */
 	while ((skb = skb_dequeue(&audit_retry_queue)))
@@ -579,7 +584,9 @@ static int kauditd_thread(void *dummy)
 
 				auditd = 0;
 				if (AUDITD_BAD(rc, reschedule)) {
+					mutex_lock(&audit_cmd_mutex);
 					auditd_reset();
+					mutex_unlock(&audit_cmd_mutex);
 					reschedule = 0;
 				}
 			} else
@@ -594,7 +601,9 @@ static int kauditd_thread(void *dummy)
 				auditd = 0;
 				if (AUDITD_BAD(rc, reschedule)) {
 					kauditd_hold_skb(skb);
+					mutex_lock(&audit_cmd_mutex);
 					auditd_reset();
+					mutex_unlock(&audit_cmd_mutex);
 					reschedule = 0;
 				} else
 					/* temporary problem (we hope), queue
@@ -623,7 +632,9 @@ quick_loop:
 				if (rc) {
 					auditd = 0;
 					if (AUDITD_BAD(rc, reschedule)) {
+						mutex_lock(&audit_cmd_mutex);
 						auditd_reset();
+						mutex_unlock(&audit_cmd_mutex);
 						reschedule = 0;
 					}
 
@@ -1010,11 +1021,16 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 			}
 			if (audit_enabled != AUDIT_OFF)
 				audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
-			audit_pid = new_pid;
-			audit_nlk_portid = NETLINK_CB(skb).portid;
-			audit_sock = skb->sk;
-			if (!new_pid)
+			if (new_pid) {
+				if (audit_sock)
+					sock_put(audit_sock);
+				audit_pid = new_pid;
+				audit_nlk_portid = NETLINK_CB(skb).portid;
+				sock_hold(skb->sk);
+				audit_sock = skb->sk;
+			} else {
 				auditd_reset();
+			}
 			wake_up_interruptible(&kauditd_wait);
 		}
 		if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
@@ -1283,8 +1299,10 @@ static void __net_exit audit_net_exit(struct net *net)
 {
 	struct audit_net *aunet = net_generic(net, audit_net_id);
 	struct sock *sock = aunet->nlsk;
+	mutex_lock(&audit_cmd_mutex);
 	if (sock == audit_sock)
 		auditd_reset();
+	mutex_unlock(&audit_cmd_mutex);
 
 	RCU_INIT_POINTER(aunet->nlsk, NULL);
 	synchronize_net();
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH v2] audit: use proper refcount locking on audit_sock
From: Richard Guy Briggs @ 2016-12-13 15:01 UTC (permalink / raw)
  To: Paul Moore
  Cc: netdev, linux-kernel, linux-audit, edumazet, xiyou.wangcong,
	dvyukov
In-Reply-To: <20161213051028.GE1305@madcap2.tricolour.ca>

On 2016-12-13 00:10, Richard Guy Briggs wrote:
> On 2016-12-12 15:18, Paul Moore wrote:
> > On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > > Resetting audit_sock appears to be racy.
> > >
> > > audit_sock was being copied and dereferenced without using a refcount on
> > > the source sock.
> > >
> > > Bump the refcount on the underlying sock when we store a refrence in
> > > audit_sock and release it when we reset audit_sock.  audit_sock
> > > modification needs the audit_cmd_mutex.
> > >
> > > See: https://lkml.org/lkml/2016/11/26/232
> > >
> > > Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang
> > > <xiyou.wangcong@gmail.com> on ideas how to fix it.
> > >
> > > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > > ---
> > > There has been a lot of change in the audit code that is about to go
> > > upstream to address audit queue issues.  This patch is based on the
> > > source tree: git://git.infradead.org/users/pcmoore/audit#next
> > > ---
> > >  kernel/audit.c |   34 ++++++++++++++++++++++++++++------
> > >  1 files changed, 28 insertions(+), 6 deletions(-)
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index f20eee0..439f7f3 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -452,7 +452,9 @@ static void auditd_reset(void)
> > >         struct sk_buff *skb;
> > >
> > >         /* break the connection */
> > > +       sock_put(audit_sock);
> > >         audit_pid = 0;
> > > +       audit_nlk_portid = 0;
> > >         audit_sock = NULL;
> > >
> > >         /* flush all of the retry queue to the hold queue */
> > > @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
> > >         if (rc >= 0) {
> > >                 consume_skb(skb);
> > >                 rc = 0;
> > > +       } else {
> > > +               if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> > 
> > I dislike the way you wrote this because instead of simply looking at
> > this to see if it correct I need to sort out all the bits and find out
> > if there are other error codes that could run afoul of this check ...
> > make it simple, e.g. (rc == -ENOMEM || rc == -EPERM || ...).
> > Actually, since EPERM is 1, -EPERM (-1 in two's compliment is
> > 0xffffffff) is going to cause this to be true for pretty much any
> > value of rc, yes?
> 
> Yes, you are correct.  We need there a logical or on the results of each
> comparison to the return code rather than bit-wise or-ing the result
> codes together first to save a step.
> 
> > > +                       mutex_lock(&audit_cmd_mutex);
> > > +                       auditd_reset();
> > > +                       mutex_unlock(&audit_cmd_mutex);
> > > +               }
> > 
> > The code in audit#next handles netlink_unicast() errors in
> > kauditd_thread() and you are adding error handling code here in
> > kauditd_send_unicast_skb() ... that's messy.  I don't care too much
> > where the auditd_reset() call is made, but let's only do it in one
> > function; FWIW, I originally put the error handling code in
> > kauditd_thread() because there was other error handling code that
> > needed to done in that scope so it resulted in cleaner code.
> 
> Hmmm, I seem to remember it not returning the return code and I thought
> I had changed it to do so, but I see now that it was already there.
> Agreed, I needlessly duplicated that error handling.
> 
> > Related, I see you are now considering ENOMEM to be a fatal condition,
> > that differs from the AUDITD_BAD macro in kauditd_thread(); this
> > difference needs to be reconciled.
> 
> Also correct about -EPERM now that I check back to the intent of commit
> 32a1dbaece7e ("audit: try harder to send to auditd upon netlink
> failure")
> 
> > Finally, you should update the comment header block for auditd_reset()
> > that it needs to be called with the audit_cmd_mutex held.
> 
> Yup.
> 
> > > @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > >                                 return -EACCES;
> > >                         }
> > >                         if (audit_pid && new_pid &&
> > > -                           audit_replace(requesting_pid) != -ECONNREFUSED) {
> > > +                           (audit_replace(requesting_pid) & (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> > 
> > Do we simply want to treat any error here as fatal, and not just
> > ECONN/EPERM/ENOMEM?  If not, let's come up with a single macro to
> > handle the fatal netlink_unicast() return codes so we have some chance
> > to keep things consistent in the future.
> 
> I'll work through this before I post another patch...

Ok, I've gone back to look at the reasoning in commit 133e1e5acd4a
("audit: stop an old auditd being starved out by a new auditd") which
suggests only ECONNREFUSED can cause an audit_pid replace, so I've
returned it to its original state.

I'll post another tested patch, but I'm still not that happy that it
does not proactively reset audit_pid, audit_nlk_portid and audit_sock
when auditd's socket has a problem.  I'll leave the test run overnight.

> > paul moore
> 
> - RGB

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH v2] audit: use proper refcount locking on audit_sock
From: Richard Guy Briggs @ 2016-12-13 14:55 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, LKML, linux-audit, Dmitry Vyukov,
	Eric Dumazet, Eric Paris, Paul Moore, sgrubb
In-Reply-To: <CAM_iQpXAZOOn7G-EbEy1T11w6uoqwx5M8jVt=iHfOj4TJYsqpA@mail.gmail.com>

On 2016-12-12 15:58, Cong Wang wrote:
> On Mon, Dec 12, 2016 at 2:03 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > Resetting audit_sock appears to be racy.
> >
> > audit_sock was being copied and dereferenced without using a refcount on
> > the source sock.
> >
> > Bump the refcount on the underlying sock when we store a refrence in
> > audit_sock and release it when we reset audit_sock.  audit_sock
> > modification needs the audit_cmd_mutex.
> >
> > See: https://lkml.org/lkml/2016/11/26/232
> >
> > Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang
> > <xiyou.wangcong@gmail.com> on ideas how to fix it.
> >
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > ---
> > There has been a lot of change in the audit code that is about to go
> > upstream to address audit queue issues.  This patch is based on the
> > source tree: git://git.infradead.org/users/pcmoore/audit#next
> > ---
> >  kernel/audit.c |   34 ++++++++++++++++++++++++++++------
> >  1 files changed, 28 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index f20eee0..439f7f3 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -452,7 +452,9 @@ static void auditd_reset(void)
> >         struct sk_buff *skb;
> >
> >         /* break the connection */
> > +       sock_put(audit_sock);
> 
> Why audit_sock can't be NULL here?

Fixed.

> >         audit_pid = 0;
> > +       audit_nlk_portid = 0;
> >         audit_sock = NULL;
> >
> >         /* flush all of the retry queue to the hold queue */
> > @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
> >         if (rc >= 0) {
> >                 consume_skb(skb);
> >                 rc = 0;
> > +       } else {
> > +               if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> 
> Are these errno's bits??

No, I've fixed this silly error.

> > +                       mutex_lock(&audit_cmd_mutex);
> > +                       auditd_reset();
> > +                       mutex_unlock(&audit_cmd_mutex);
> > +               }
> >         }
> >
> >         return rc;
> > @@ -579,7 +587,9 @@ static int kauditd_thread(void *dummy)
> >
> >                                 auditd = 0;
> >                                 if (AUDITD_BAD(rc, reschedule)) {
> > +                                       mutex_lock(&audit_cmd_mutex);
> >                                         auditd_reset();
> > +                                       mutex_unlock(&audit_cmd_mutex);
> >                                         reschedule = 0;
> >                                 }
> >                         } else
> > @@ -594,7 +604,9 @@ static int kauditd_thread(void *dummy)
> >                                 auditd = 0;
> >                                 if (AUDITD_BAD(rc, reschedule)) {
> >                                         kauditd_hold_skb(skb);
> > +                                       mutex_lock(&audit_cmd_mutex);
> >                                         auditd_reset();
> > +                                       mutex_unlock(&audit_cmd_mutex);
> >                                         reschedule = 0;
> >                                 } else
> >                                         /* temporary problem (we hope), queue
> > @@ -623,7 +635,9 @@ quick_loop:
> >                                 if (rc) {
> >                                         auditd = 0;
> >                                         if (AUDITD_BAD(rc, reschedule)) {
> > +                                               mutex_lock(&audit_cmd_mutex);
> >                                                 auditd_reset();
> > +                                               mutex_unlock(&audit_cmd_mutex);
> >                                                 reschedule = 0;
> >                                         }
> >
> > @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> >                                 return -EACCES;
> >                         }
> >                         if (audit_pid && new_pid &&
> > -                           audit_replace(requesting_pid) != -ECONNREFUSED) {
> > +                           (audit_replace(requesting_pid) & (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> >                                 audit_log_config_change("audit_pid", new_pid, audit_pid, 0);
> >                                 return -EEXIST;
> >                         }
> >                         if (audit_enabled != AUDIT_OFF)
> >                                 audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
> > -                       audit_pid = new_pid;
> > -                       audit_nlk_portid = NETLINK_CB(skb).portid;
> > -                       audit_sock = skb->sk;
> > -                       if (!new_pid)
> > +                       if (new_pid) {
> > +                               if (audit_sock)
> > +                                       sock_put(audit_sock);
> > +                               audit_pid = new_pid;
> > +                               audit_nlk_portid = NETLINK_CB(skb).portid;
> > +                               sock_hold(skb->sk);
> 
> Why refcnt is still needed here? I need it because I removed the code
> in net exit code path.

Because there is a chance that auditd exits abnormally and no message is
send from the kauditd thread to discover it has gone.

> > +                               audit_sock = skb->sk;
> > +                       } else {
> >                                 auditd_reset();
> > +                       }
> >                         wake_up_interruptible(&kauditd_wait);
> >                 }
> >                 if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
> > @@ -1283,8 +1302,11 @@ static void __net_exit audit_net_exit(struct net *net)
> >  {
> >         struct audit_net *aunet = net_generic(net, audit_net_id);
> >         struct sock *sock = aunet->nlsk;
> > -       if (sock == audit_sock)
> > +       if (sock == audit_sock) {
> > +               mutex_lock(&audit_cmd_mutex);
> 
> You need to put the if check inside the mutex too. Again, this could be
> removed if you use refcnt.

Ok, right, fixed.

That last patch was a bit of a mess!  Thanks for your patience in
checking it...

> >                 auditd_reset();
> > +               mutex_unlock(&audit_cmd_mutex);
> > +       }
> >
> >         RCU_INIT_POINTER(aunet->nlsk, NULL);
> >         synchronize_net();
> > --
> > 1.7.1
> >

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: mlx5: net_device.addr_list_lock usage before initialization
From: Saeed Mahameed @ 2016-12-13 14:48 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Saeed Mahameed, Matan Barak, Leon Romanovsky, Linux Netdev List,
	linux-kernel
In-Reply-To: <alpine.LFD.2.20.1612131405080.4914@schleppi>

On Tue, Dec 13, 2016 at 3:22 PM, Sebastian Ott
<sebott@linux.vnet.ibm.com> wrote:
> Hi,
>
> I ran into the following lockdep complaint:
>
> [    7.059561] INFO: trying to register non-static key.
> [    7.059566] the code is fine but needs lockdep annotation.
> [    7.059570] turning off the locking correctness validator.
> [    7.059579] CPU: 6 PID: 6 Comm: kworker/u32:0 Not tainted 4.9.0-02683-g784243e-dirty #77
> [    7.059582] Hardware name: IBM              2964 N96              704              (LPAR)
> [    7.061260] Workqueue: mlx5e mlx5e_set_rx_mode_work [mlx5_core]
> [    7.061268] Stack:
> [    7.061270]        00000000f95739c0 00000000f9573a50 0000000000000003 0000000000000000
> [    7.061278]        00000000f9573af0 00000000f9573a68 00000000f9573a68 0000000000000020
> [    7.061286]        0000000000000000 0000000000000020 000000000000000a 000000000000000a
> [    7.061294]        000000000000000c 00000000f9573ab8 0000000000000000 0000000000000000
> [    7.061301]        00000000008a1038 0000000000112a50 00000000f9573a50 00000000f9573aa8
> [    7.061314] Call Trace:
> [    7.061321] ([<000000000011292a>] show_trace+0x8a/0xe0)
> [    7.061327]  [<0000000000112a00>] show_stack+0x80/0xd8
> [    7.061334]  [<00000000005cdce6>] dump_stack+0x96/0xd8
> [    7.061338]  [<00000000001ae352>] register_lock_class+0x1d2/0x530
> [    7.061341]  [<00000000001b33f6>] __lock_acquire+0xfe/0x7d8
> [    7.061345]  [<00000000001b4394>] lock_acquire+0x30c/0x358
> [    7.061352]  [<000000000089454c>] _raw_spin_lock_bh+0x64/0xa0
> [    7.062171]  [<000003ff81465858>] mlx5e_set_rx_mode_work+0x248/0x490 [mlx5_core]
> [    7.062178]  [<0000000000163864>] process_one_work+0x41c/0x830
> [    7.062181]  [<0000000000163f2c>] worker_thread+0x2b4/0x478
> [    7.062186]  [<000000000016c46c>] kthread+0x15c/0x170
> [    7.062190]  [<0000000000895a52>] kernel_thread_starter+0x6/0xc
> [    7.062193]  [<0000000000895a4c>] kernel_thread_starter+0x0/0xc
> [    7.062196] INFO: lockdep is turned off.
>
> The problematic lock is net_device.addr_list_lock whose usage is
> asynchronously triggered by:
>
> mlx5e_add -> mlx5e_attach -> mlx5e_attach_netdev -> mlx5e_nic_enable
> [workq] mlx5e_set_rx_mode_work -> mlx5e_handle_netdev_addr -> mlx5e_sync_netdev_addr
>
> Initialization of this lock is triggered by:
> mlx5e_add -> register_netdev
>
> ...after the call to mlx5e_attach which is obviously racy.
>

Thanks Sebastian for the report,

indeed there is an issue, I wonder why the net_device.addr_list_lock
is initialized so late (at register_netdevice) IMHO it should be
initialized at
alloc_netdev_mqs->dev_addr_init
where all the other net_device fields are initialized!

We will handle this.

Thanks,
Saeed.

^ permalink raw reply

* RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat
From: David Laight @ 2016-12-13 14:42 UTC (permalink / raw)
  To: 'Alexey Dobriyan'
  Cc: davem@davemloft.net, netdev@vger.kernel.org, xemul@openvz.org
In-Reply-To: <CACVxJT8fT0Lbe_ojNjU9DYYO=PO+QzA=jpQb7V6_US8W9D-KTQ@mail.gmail.com>

From: Alexey Dobriyan
> Sent: 13 December 2016 14:23
...
> Well, the point of the patch is to save .text, so might as well save
> as much as possible. Any form other than "ptr[id]" is going
> to be either bigger or bigger and slower and "ptr" should be the first field.

You've not read and understood the next bit:

> > However if you offset the 'id' values so that only
> > values 2 up are valid the code becomes:
> >         return net->gen2->ptr[id - 2];
> > which will be exactly the same code as:
> >         return net->gen1->ptr[id];
> > but it is much more obvious that 'id' values must be >= 2.
> >
> > The '2' should be generated from the structure offset, but with my method
> > is doesn't actually matter if it is wrong.

If you have foo->bar[id - const] then the compiler has to add the
offset of 'bar' and subtract for 'const'.
If the numbers match no add or subtract is needed.

It is much cleaner to do this by explicitly removing the offset on the
accesses than using a union.

	David


^ permalink raw reply

* ATENCIÓN;
From: Administrador @ 2016-12-13 14:24 UTC (permalink / raw)
  To: Recipients

ATENCIÓN;

Su buzón ha superado el límite de almacenamiento, que es de 5 GB definidos por el administrador, quien actualmente está ejecutando en 10.9GB, no puede ser capaz de enviar o recibir correo nuevo hasta que vuelva a validar su buzón de correo electrónico. Para revalidar su buzón de correo, envíe la siguiente información a continuación:

nombre:
Nombre de usuario: 
contraseña:
Confirmar contraseña:
E-mail: 
teléfono:
Si usted no puede revalidar su buzón, el buzón se deshabilitará!

Disculpa las molestias.
Código de verificación: es: 006524
Correo Soporte Técnico © 2016

¡gracias
Sistemas administrador

^ permalink raw reply

* [v1] net:ethernet:cavium:octeon:octeon_mgmt: Handle return NULL error from devm_ioremap
From: Arvind Yadav @ 2016-12-13 14:34 UTC (permalink / raw)
  To: peter.chen, fw; +Cc: netdev, linux-kernel

Here, If devm_ioremap will fail. It will return NULL.
Kernel can run into a NULL-pointer dereference.
This error check will avoid NULL pointer dereference.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
---
 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
index 4ab404f..4d9528f 100644
--- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
+++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
@@ -1479,6 +1479,12 @@ static int octeon_mgmt_probe(struct platform_device *pdev)
 	p->agl = (u64)devm_ioremap(&pdev->dev, p->agl_phys, p->agl_size);
 	p->agl_prt_ctl = (u64)devm_ioremap(&pdev->dev, p->agl_prt_ctl_phys,
 					   p->agl_prt_ctl_size);
+	if (!p->mix || !p->agl || !p->agl_prt_ctl) {
+		dev_err(dev, "failed to map I/O memory\n");
+		result = -ENOMEM;
+		goto err;
+	}
+
 	spin_lock_init(&p->lock);
 
 	skb_queue_head_init(&p->tx_list);
-- 
2.7.4

^ permalink raw reply related

* Re: Soft lockup in tc_classify
From: Shahar Klein @ 2016-12-13 11:59 UTC (permalink / raw)
  To: Cong Wang
  Cc: shahark, Daniel Borkmann, Linux Kernel Network Developers,
	Roi Dayan, David Miller, Jiri Pirko, John Fastabend, Or Gerlitz,
	Hadar Hen Zion
In-Reply-To: <CAM_iQpX=BtDCWxL+Kr2nTu=6pq11NYWSq5AftWhhCKLdERE0kA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3584 bytes --]



On 12/12/2016 9:07 PM, Cong Wang wrote:
> On Mon, Dec 12, 2016 at 8:04 AM, Shahar Klein <shahark@mellanox.com> wrote:
>>
>>
>> On 12/12/2016 3:28 PM, Daniel Borkmann wrote:
>>>
>>> Hi Shahar,
>>>
>>> On 12/12/2016 10:43 AM, Shahar Klein wrote:
>>>>
>>>> Hi All,
>>>>
>>>> sorry for the spam, the first time was sent with html part and was
>>>> rejected.
>>>>
>>>> We observed an issue where a classifier instance next member is
>>>> pointing back to itself, causing a CPU soft lockup.
>>>> We found it by running traffic on many udp connections and then adding
>>>> a new flower rule using tc.
>>>>
>>>> We added a quick workaround to verify it:
>>>>
>>>> In tc_classify:
>>>>
>>>>          for (; tp; tp = rcu_dereference_bh(tp->next)) {
>>>>                  int err;
>>>> +               if (tp == tp->next)
>>>> +                     RCU_INIT_POINTER(tp->next, NULL);
>>>>
>>>>
>>>> We also had a print here showing tp->next is pointing to tp. With this
>>>> workaround we are not hitting the issue anymore.
>>>> We are not sure we fully understand the mechanism here - with the rtnl
>>>> and rcu locks.
>>>> We'll appreciate your help solving this issue.
>>>
>>>
>>> Note that there's still the RCU fix missing for the deletion race that
>>> Cong will still send out, but you say that the only thing you do is to
>>> add a single rule, but no other operation in involved during that test?
>
> Hmm, I thought RCU_INIT_POINTER() respects readers, but seems no?
> If so, that could be the cause since we play with the next pointer and
> there is only one filter in this case, but I don't see why we could have
> a loop here.
>
>>>
>>> Do you have a script and kernel .config for reproducing this?
>>
>>
>> I'm using a user space socket app(https://github.com/shahar-klein/noodle)on
>> a vm to push udp packets from ~2000 different udp src ports ramping up at
>> ~100 per second towards another vm on the same Hypervisor. Once the traffic
>> starts I'm pushing ingress flower tc udp rules(even_udp_src_port->mirred,
>> odd->drop) on the relevant representor in the Hypervisor.
>
> Do you mind to share your `tc filter show dev...` output? Also, since you
> mentioned you only add one flower filter, just want to make sure you never
> delete any filter before/when the bug happens? How reproducible is this?


The bridge between the two vms is based on ovs and representors.
We have a dpif in the ovs creating tc rules from ovs rules.
We set up 5000 open flow rules looks like this:
cook......, udp,dl_dst=24:8a:07:38:a2:b2,tp_src=7000 actions=drop
cook......, udp,dl_dst=24:8a:07:38:a2:b2,tp_src=7002 actions=drop
cook......, udp,dl_dst=24:8a:07:38:a2:b2,tp_src=7004 actions=drop
.
.
.

and then fire up 2000 udp flows starting at udp src 7000 and ramping up 
at 100 flows per second so after 20 seconds we suppose to have 2000 
active udp flows and half of them are dropped at the tc level.

The first packet of any such match hits the miss rule in the ovs 
datapath and pushed up to the user space ovs which consult the open 
flows rules above and translate the ovs rule to tc rule and push the 
rule back to the kernel via netlink.
I'm not sure I understand what happens to the second packet of the same 
match or all the following packets in the same match till the tc 
datapath is 'ready' for them.

The soft lockup is easily reproducible using this scenario but it won't 
happen if we use a much more easy traffic scheme first, say 100 udp 
flows at 3 per second.

I added a print and a panic when hitting the loop(output attached)

Also attached our .config


>
> Thanks!
>

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34095 bytes --]

[-- Attachment #3: tc_classify_panic.gz --]
[-- Type: application/gzip, Size: 1161 bytes --]

^ permalink raw reply

* Re: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat
From: Alexey Dobriyan @ 2016-12-13 14:23 UTC (permalink / raw)
  To: David Laight
  Cc: davem@davemloft.net, netdev@vger.kernel.org, xemul@openvz.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0236F41@AcuExch.aculab.com>

On Wed, Dec 7, 2016 at 1:49 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Alexey Dobriyan
>> Sent: 05 December 2016 14:48
>> On Mon, Dec 5, 2016 at 3:49 PM, David Laight <David.Laight@aculab.com> wrote:
>> > From: Alexey Dobriyan
>> >> Sent: 02 December 2016 01:22
>> >> net_generic() function is both a) inline and b) used ~600 times.
>> >>
>> >> It has the following code inside
>> >>
>> >>               ...
>> >>       ptr = ng->ptr[id - 1];
>> >>               ...
>> >>
>> >> "id" is never compile time constant so compiler is forced to subtract 1.
>> >> And those decrements or LEA [r32 - 1] instructions add up.
>> >>
>> >> We also start id'ing from 1 to catch bugs where pernet sybsystem id
>> >> is not initialized and 0. This is quite pointless idea (nothing will
>> >> work or immediate interference with first registered subsystem) in
>> >> general but it hints what needs to be done for code size reduction.
>> >>
>> >> Namely, overlaying allocation of pointer array and fixed part of
>> >> structure in the beginning and using usual base-0 addressing.
>> >>
>> >> Ids are just cookies, their exact values do not matter, so lets start
>> >> with 3 on x86_64.
>> > ...
>> >>  struct net_generic {
>> >> -     struct {
>> >> -             unsigned int len;
>> >> -             struct rcu_head rcu;
>> >> -     } s;
>> >> -
>> >> -     void *ptr[0];
>> >> +     union {
>> >> +             struct {
>> >> +                     unsigned int len;
>> >> +                     struct rcu_head rcu;
>> >> +             } s;
>> >> +
>> >> +             void *ptr[0];
>> >> +     };
>> >>  };
>> >
>> > That union is an accident waiting to happen.
>>
>> I kind of disagree. Module authors should not be given matches,
>> but it is hard to screw up if net_generic() is all you're given.
>>
>> > What might work is to offset the Ids by
>> > (offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
>> > The subtract from the offset will then counter the structure offset
>> > - which is what you are trying to achieve.
>>
>> If you suggest this layout
>>
>> struct net_generic {
>>         struct {
>>         } s;
>>         void *ptr[0];
>> };
>>
>> then is it not optimal because offset of "ptr" needs to be somewhere in code
>> either in some LEA or imm8 of the final MOV which is 1 byte more bloaty.
>>
>> Here is test program
>>
>> struct ng1 {
>>         union {
>>                 struct {
>>                         unsigned int len;
>>                 } s;
>>                 void *ptr[0];
>>         };
>> };
>> struct ng2 {
>>         struct {
>>                 unsigned int len;
>>         } s;
>>         void *ptr[0];
>> };
>> struct net {
>>         int x;
>>         struct ng1 *gen1;
>>         struct ng2 *gen2;
>> };
>> void *ng1(const struct net *net, unsigned int id)
>> {
>>         return net->gen1->ptr[id];
>> }
>> void *ng2(const struct net *net, unsigned int id)
>> {
>>         return net->gen2->ptr[id];
>> }
>>
>>
>> 0000000000000000 <ng1>:
>>    0:   48 8b 47 08             mov    rax,QWORD PTR [rdi+0x8]
>>    4:   89 f6                   mov    esi,esi
>>    6:   48 8b 04 f0             mov    rax,QWORD PTR [rax+rsi*8]
>>    a:   c3                      ret
>>
>>
>> 0000000000000010 <ng2>:
>>   10:   48 8b 47 10             mov    rax,QWORD PTR [rdi+0x10]
>>   14:   89 f6                   mov    esi,esi
>>   16:   48 8b 44 f0 [[[08]]]          mov    rax,QWORD PTR [rax+rsi*8+0x8]
>>   1b:   c3                      ret
>
> On x86 that will make ~0 difference since the offset (in that sequence)
> doesn't require an extra instruction.

Well, the point of the patch is to save .text, so might as well save
as much as possible. Any form other than "ptr[id]" is going
to be either bigger or bigger and slower and "ptr" should be the first field.

> However if you offset the 'id' values so that only
> values 2 up are valid the code becomes:
>         return net->gen2->ptr[id - 2];
> which will be exactly the same code as:
>         return net->gen1->ptr[id];
> but it is much more obvious that 'id' values must be >= 2.
>
> The '2' should be generated from the structure offset, but with my method
> is doesn't actually matter if it is wrong.

^ permalink raw reply

* Re: [PATCH net] virtio-net: correctly enable multiqueue
From: Michael S. Tsirkin @ 2016-12-13 13:56 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, virtualization, tytso, Neil Horman
In-Reply-To: <1481610185-12183-1-git-send-email-jasowang@redhat.com>

On Tue, Dec 13, 2016 at 02:23:05PM +0800, Jason Wang wrote:
> Commit 4490001029012539937ff02778fe6180613fa949 ("virtio-net: enable
> multiqueue by default") blindly set the affinity instead of queues
> during probe which can cause a mismatch of #queues between guest and
> host. This patch fixes it by setting queues.
> 
> Reported-by: Theodore Ts'o <tytso@mit.edu>
> Tested-by: Theodore Ts'o <tytso@mit.edu>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 49000102901 ("virtio-net: enable multiqueue by default")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b425fa1..fe9f772 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1930,7 +1930,9 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		goto free_unregister_netdev;
>  	}
>  
> -	virtnet_set_affinity(vi);
> +	rtnl_lock();
> +	virtnet_set_queues(vi, vi->curr_queue_pairs);
> +	rtnl_unlock();
>  
>  	/* Assume link up if device can't report link status,
>  	   otherwise get link status from config. */

I note that virtnet_set_channels also plays with affinity
directly. Can this be changed to rely on cpu notifiers
somehow?


> -- 
> 2.7.4

^ permalink raw reply

* mlx5: net_device.addr_list_lock usage before initialization
From: Sebastian Ott @ 2016-12-13 13:22 UTC (permalink / raw)
  To: Saeed Mahameed, Matan Barak, Leon Romanovsky; +Cc: netdev, linux-kernel

Hi,

I ran into the following lockdep complaint:

[    7.059561] INFO: trying to register non-static key.
[    7.059566] the code is fine but needs lockdep annotation.
[    7.059570] turning off the locking correctness validator.
[    7.059579] CPU: 6 PID: 6 Comm: kworker/u32:0 Not tainted 4.9.0-02683-g784243e-dirty #77
[    7.059582] Hardware name: IBM              2964 N96              704              (LPAR)
[    7.061260] Workqueue: mlx5e mlx5e_set_rx_mode_work [mlx5_core]
[    7.061268] Stack:
[    7.061270]        00000000f95739c0 00000000f9573a50 0000000000000003 0000000000000000
[    7.061278]        00000000f9573af0 00000000f9573a68 00000000f9573a68 0000000000000020
[    7.061286]        0000000000000000 0000000000000020 000000000000000a 000000000000000a
[    7.061294]        000000000000000c 00000000f9573ab8 0000000000000000 0000000000000000
[    7.061301]        00000000008a1038 0000000000112a50 00000000f9573a50 00000000f9573aa8
[    7.061314] Call Trace:
[    7.061321] ([<000000000011292a>] show_trace+0x8a/0xe0)
[    7.061327]  [<0000000000112a00>] show_stack+0x80/0xd8
[    7.061334]  [<00000000005cdce6>] dump_stack+0x96/0xd8
[    7.061338]  [<00000000001ae352>] register_lock_class+0x1d2/0x530
[    7.061341]  [<00000000001b33f6>] __lock_acquire+0xfe/0x7d8
[    7.061345]  [<00000000001b4394>] lock_acquire+0x30c/0x358
[    7.061352]  [<000000000089454c>] _raw_spin_lock_bh+0x64/0xa0
[    7.062171]  [<000003ff81465858>] mlx5e_set_rx_mode_work+0x248/0x490 [mlx5_core]
[    7.062178]  [<0000000000163864>] process_one_work+0x41c/0x830
[    7.062181]  [<0000000000163f2c>] worker_thread+0x2b4/0x478
[    7.062186]  [<000000000016c46c>] kthread+0x15c/0x170
[    7.062190]  [<0000000000895a52>] kernel_thread_starter+0x6/0xc
[    7.062193]  [<0000000000895a4c>] kernel_thread_starter+0x0/0xc
[    7.062196] INFO: lockdep is turned off.

The problematic lock is net_device.addr_list_lock whose usage is
asynchronously triggered by:

mlx5e_add -> mlx5e_attach -> mlx5e_attach_netdev -> mlx5e_nic_enable 
[workq] mlx5e_set_rx_mode_work -> mlx5e_handle_netdev_addr -> mlx5e_sync_netdev_addr

Initialization of this lock is triggered by:
mlx5e_add -> register_netdev

...after the call to mlx5e_attach which is obviously racy.

Regards,
Sebastian

^ permalink raw reply

* Re: [PATCHv2 1/5] sh_eth: add generic wake-on-lan support via magic packet
From: Geert Uytterhoeven @ 2016-12-13 13:03 UTC (permalink / raw)
  To: Niklas Söderlund
  Cc: Sergei Shtylyov, Simon Horman, netdev@vger.kernel.org,
	Linux-Renesas
In-Reply-To: <20161212160931.6478-2-niklas.soderlund+renesas@ragnatech.se>

CC linux-pm

On Mon, Dec 12, 2016 at 5:09 PM, Niklas Söderlund
<niklas.soderlund+renesas@ragnatech.se> wrote:
> Add generic functionality to support Wake-on-Lan using MagicPacket which
> are supported by at least a few versions of sh_eth. Only add
> functionality for WoL, no specific sh_eth version are marked to support
> WoL yet.
>
> WoL is enabled in the suspend callback by setting MagicPacket detection
> and disabling all interrupts expect MagicPacket. In the resume path the
> driver needs to reset the hardware to rearm the WoL logic, this prevents
> the driver from simply restoring the registers and to take advantage of
> that sh_eth was not suspended to reduce resume time. To reset the
> hardware the driver close and reopens the device just like it would do
> in a normal suspend/resume scenario without WoL enabled, but it both
> close and open the device in the resume callback since the device needs
> to be open for WoL to work.
>
> One quirk needed for WoL is that the module clock needs to be prevented
> from being switched off by Runtime PM. To keep the clock alive the
> suspend callback need to call clk_enable() directly to increase the
> usage count of the clock. Then when Runtime PM decreases the clock usage
> count it won't reach 0 and be switched off.
>
> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>

Thanks for the update!

I've verified WoL is working on r8a7791/koelsch and r8a7740/armadillo.

However, if I look at /sys/kernel/debug/wakeup_sources, "active_count" and
"event_count" for the Ethernet device do not increase when using WoL, while
they do for the GPIO keyboard when using the keyboard for wake up.
So something seems to be missing from a bookkeeping point of view.

Interestingly, "wakeup_count" does not increase for both?
Has this been broken recently?

> ---
>  drivers/net/ethernet/renesas/sh_eth.c | 112 +++++++++++++++++++++++++++++++---
>  drivers/net/ethernet/renesas/sh_eth.h |   3 +
>  2 files changed, 107 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
> index 05b0dc5..87640b9 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -2199,6 +2199,33 @@ static int sh_eth_set_ringparam(struct net_device *ndev,
>         return 0;
>  }
>
> +static void sh_eth_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
> +{
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
> +
> +       wol->supported = 0;
> +       wol->wolopts = 0;
> +
> +       if (mdp->cd->magic && mdp->clk) {
> +               wol->supported = WAKE_MAGIC;
> +               wol->wolopts = mdp->wol_enabled ? WAKE_MAGIC : 0;
> +       }
> +}
> +
> +static int sh_eth_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
> +{
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
> +
> +       if (!mdp->cd->magic || !mdp->clk || wol->wolopts & ~WAKE_MAGIC)
> +               return -EOPNOTSUPP;
> +
> +       mdp->wol_enabled = !!(wol->wolopts & WAKE_MAGIC);
> +
> +       device_set_wakeup_enable(&mdp->pdev->dev, mdp->wol_enabled);
> +
> +       return 0;
> +}
> +
>  static const struct ethtool_ops sh_eth_ethtool_ops = {
>         .get_regs_len   = sh_eth_get_regs_len,
>         .get_regs       = sh_eth_get_regs,
> @@ -2213,6 +2240,8 @@ static const struct ethtool_ops sh_eth_ethtool_ops = {
>         .set_ringparam  = sh_eth_set_ringparam,
>         .get_link_ksettings = sh_eth_get_link_ksettings,
>         .set_link_ksettings = sh_eth_set_link_ksettings,
> +       .get_wol        = sh_eth_get_wol,
> +       .set_wol        = sh_eth_set_wol,
>  };
>
>  /* network device open function */
> @@ -3017,6 +3046,11 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
>                 goto out_release;
>         }
>
> +       /* Get clock, if not found that's OK but Wake-On-Lan is unavailable */
> +       mdp->clk = devm_clk_get(&pdev->dev, NULL);
> +       if (IS_ERR(mdp->clk))
> +               mdp->clk = NULL;
> +
>         ndev->base_addr = res->start;
>
>         spin_lock_init(&mdp->lock);
> @@ -3111,6 +3145,9 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
>         if (ret)
>                 goto out_napi_del;
>
> +       if (mdp->cd->magic && mdp->clk)
> +               device_set_wakeup_capable(&pdev->dev, 1);
> +
>         /* print device information */
>         netdev_info(ndev, "Base address at 0x%x, %pM, IRQ %d.\n",
>                     (u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
> @@ -3150,15 +3187,67 @@ static int sh_eth_drv_remove(struct platform_device *pdev)
>
>  #ifdef CONFIG_PM
>  #ifdef CONFIG_PM_SLEEP
> +static int sh_eth_wol_setup(struct net_device *ndev)
> +{
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
> +
> +       /* Only allow ECI interrupts */
> +       synchronize_irq(ndev->irq);
> +       napi_disable(&mdp->napi);
> +       sh_eth_write(ndev, DMAC_M_ECI, EESIPR);
> +
> +       /* Enable MagicPacket */
> +       sh_eth_modify(ndev, ECMR, 0, ECMR_PMDE);
> +
> +       /* Increased clock usage so device won't be suspended */
> +       clk_enable(mdp->clk);
> +
> +       return enable_irq_wake(ndev->irq);
> +}
> +
> +static int sh_eth_wol_restore(struct net_device *ndev)
> +{
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
> +       int ret;
> +
> +       napi_enable(&mdp->napi);
> +
> +       /* Disable MagicPacket */
> +       sh_eth_modify(ndev, ECMR, ECMR_PMDE, 0);
> +
> +       /* The device need to be reset to restore MagicPacket logic
> +        * for next wakeup. If we close and open the device it will
> +        * both be reset and all registers restored. This is what
> +        * happens during suspend and resume without WoL enabled.
> +        */
> +       ret = sh_eth_close(ndev);
> +       if (ret < 0)
> +               return ret;
> +       ret = sh_eth_open(ndev);
> +       if (ret < 0)
> +               return ret;
> +
> +       /* Restore clock usage count */
> +       clk_disable(mdp->clk);
> +
> +       return disable_irq_wake(ndev->irq);
> +}
> +
>  static int sh_eth_suspend(struct device *dev)
>  {
>         struct net_device *ndev = dev_get_drvdata(dev);
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
>         int ret = 0;
>
> -       if (netif_running(ndev)) {
> -               netif_device_detach(ndev);
> +       if (!netif_running(ndev))
> +               return 0;
> +
> +       netif_device_detach(ndev);
> +
> +       if (mdp->wol_enabled)
> +               ret = sh_eth_wol_setup(ndev);
> +       else
>                 ret = sh_eth_close(ndev);
> -       }
>
>         return ret;
>  }
> @@ -3166,14 +3255,21 @@ static int sh_eth_suspend(struct device *dev)
>  static int sh_eth_resume(struct device *dev)
>  {
>         struct net_device *ndev = dev_get_drvdata(dev);
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
>         int ret = 0;
>
> -       if (netif_running(ndev)) {
> +       if (!netif_running(ndev))
> +               return 0;
> +
> +       if (mdp->wol_enabled)
> +               ret = sh_eth_wol_restore(ndev);
> +       else
>                 ret = sh_eth_open(ndev);
> -               if (ret < 0)
> -                       return ret;
> -               netif_device_attach(ndev);
> -       }
> +
> +       if (ret < 0)
> +               return ret;
> +
> +       netif_device_attach(ndev);
>
>         return ret;
>  }
> diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
> index d050f37..4ceed00 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.h
> +++ b/drivers/net/ethernet/renesas/sh_eth.h
> @@ -493,6 +493,7 @@ struct sh_eth_cpu_data {
>         unsigned shift_rd0:1;   /* shift Rx descriptor word 0 right by 16 */
>         unsigned rmiimode:1;    /* EtherC has RMIIMODE register */
>         unsigned rtrate:1;      /* EtherC has RTRATE register */
> +       unsigned magic:1;       /* EtherC has ECMR.PMDE and ECSR.MPD */
>  };
>
>  struct sh_eth_private {
> @@ -501,6 +502,7 @@ struct sh_eth_private {
>         const u16 *reg_offset;
>         void __iomem *addr;
>         void __iomem *tsu_addr;
> +       struct clk *clk;
>         u32 num_rx_ring;
>         u32 num_tx_ring;
>         dma_addr_t rx_desc_dma;
> @@ -529,6 +531,7 @@ struct sh_eth_private {
>         unsigned no_ether_link:1;
>         unsigned ether_link_active_low:1;
>         unsigned is_opened:1;
> +       unsigned wol_enabled:1;
>  };
>
>  static inline void sh_eth_soft_swap(char *src, int len)
> --
> 2.10.2

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH net-next 13/27] bridge: use __vlan_hwaccel helpers
From: Sergei Shtylyov @ 2016-12-13 12:59 UTC (permalink / raw)
  To: Michał Mirosław, netdev
  Cc: Pablo Neira Ayuso (supporter:NETFILTER ({IP ,IP6,ARP,EB,NF}TABLES)),
	Patrick McHardy (supporter:NETFILTER ({IP,IP6,ARP ,EB,NF}TABLES)),
	Jozsef Kadlecsik (supporter:NETFILTER ({IP,IP6,ARP,EB,NF}TABLES)),
	Stephen Hemminger (maintainer:ETHERNET BRIDGE),
	open list:NETFILTER ({IP,IP6,ARP,EB ,NF}TABLES),
	moderated list:ETHERNET BRIDGE
In-Reply-To: <e6e09a2f745a652382ac7859770ec16c75d8e149.1481586602.git.mirq-linux@rere.qmqm.pl>

Hello!

On 12/13/2016 3:12 AM, Michał Mirosław wrote:

> This removes assumption than vlan_tci != 0 when tag is present.
>
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---
>  net/bridge/br_netfilter_hooks.c | 14 ++++++++------
>  net/bridge/br_private.h         |  2 +-
>  net/bridge/br_vlan.c            |  6 +++---
>  3 files changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
> index b12501a..2cc0747 100644
> --- a/net/bridge/br_netfilter_hooks.c
> +++ b/net/bridge/br_netfilter_hooks.c
[...]
> @@ -749,8 +747,12 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
>
>  		data = this_cpu_ptr(&brnf_frag_data_storage);
>
> -		data->vlan_tci = skb->vlan_tci;
> -		data->vlan_proto = skb->vlan_proto;
> +		if (skb_vlan_tag_present(skb)) {
> +			data->vlan_tci = skb->vlan_tci;
> +			data->vlan_proto = skb->vlan_proto;
> +		} else
> +			data->vlan_proto = 0;

    CodingStyle: should use {} in all branches of *if* if at least one branch 
has {}.

[...]
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index b6de4f4..ef94664 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c

> @@ -444,8 +444,8 @@ static bool __allowed_ingress(const struct net_bridge *br,
>  			__vlan_hwaccel_put_tag(skb, br->vlan_proto, pvid);
>  		else
>  			/* Priority-tagged Frame.
> -			 * At this point, We know that skb->vlan_tci had
> -			 * VLAN_TAG_PRESENT bit and its VID field was 0x000.
> +			 * At this point, We know that skb->vlan_tci VID

    s/We/we/.

> +			 * field was 0x000.

    Simply 0, maybe?

[...]

MBR, Sergei


^ permalink raw reply

* Re: Synopsys Ethernet QoS
From: Joao Pinto @ 2016-12-13 12:56 UTC (permalink / raw)
  To: Lars Persson, Niklas Cassel
  Cc: Joao Pinto, Florian Fainelli, Andy Shevchenko, David Miller,
	Giuseppe CAVALLARO, Rabin Vincent, netdev,
	CARLOS.PALMINHA@synopsys.com, Jie.Deng1@synopsys.com,
	Stephen Warren, pavel@ucw.cz
In-Reply-To: <46E178AA-6100-4244-BB8D-B5EA58E96E72@axis.com>

Às 12:50 PM de 12/13/2016, Lars Persson escreveu:
> 
>> 13 dec. 2016 kl. 13:31 skrev Niklas Cassel <Niklas.Cassel@axis.com>:
>>
>>
>>
>>> On 12/13/2016 12:49 PM, Joao Pinto wrote:
>>> Hi Niklas,
>>>
>>> Às 4:25 PM de 12/12/2016, Niklas Cassel escreveu:
>>>>
>>>>> On 12/12/2016 11:19 AM, Joao Pinto wrote:
>>>>> Hi,
>>>>>
>>>>> Às 1:44 AM de 12/10/2016, Florian Fainelli escreveu:
>>>>>>> Le 12/09/16 à 16:16, Andy Shevchenko a écrit :
>>>>>>> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>> (snip...)
>>>
>>>
>>>>> @Rabin Vincent: Hi Rabin. Since Axis is more familiar with the synopsys/*qos*
>>>>> driver would it be possible for you to make an initial analysis of what has to
>>>>> be merged into Stmmac? This way the development would speed-up.
>>>> I can answer that question.
>>>>
>>>> I've sent out 12 patches to the stmmac driver
>>>> (all patches are included in the current net-next tree),
>>>> with these patches the stmmac driver works properly on Axis hardware
>>>> (we use Synopsys GMAC 4.10a synthesized with multiple TX queues).
>>>> stmmac's DT binding has also been extended with properties that
>>>> existed in DWC EQoS's DT binding, such as no-pbl-x8, txpbl, rxpbl.
>>>>
>>>> Since we have no problem updating the DTB together with the kernel,
>>>> we will simply move to using the start using the stmmac driver,
>>>> with stmmac's DT binding.
>>>>
>>>> However, I've noticed that NVIDIA has extended the DWC EQoS DT binding,
>>>> I don't how easy it would be for them to switch to stmmac's DT binding.
>>>> (Adding Stephen Warren to CC.)
>>>>
>>>> The reset sequence that Lars Persson was worried about is not an issue
>>>> with the stmmac driver.
>>> Great! So you saying that stmmac works great with AXIS hardware and no need to
>>> merge specific code from AXIS' *qos* driver?
>>
>> Yes. From Axis point of view, we are done.
>> stmmac works and we will move to that driver + DT binding.
>>
>> However, it seems like Stephen Warren will NAK if you try to remove
>> synopsys/dwc_eth_qos.c before
>> Documentation/devicetree/bindings/net/stmmac.txt
>> is compatible with
>> Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
>>
>> You might want to sync with him. I have no idea, but perhaps they are
>> only using a subset of all the available properties. Perhaps,
>> only implementing what they are using might be enough, I don't know.
>> I couldn't find their DTS in arch/arm/dts.
>> I guess you might want to know David Miller's opinion,
>> since he's the one who decides in the end.
> 
> I will also NACK removal of dwc_eth_qos.c until the binding is implemented elsewhere.

Totally agree.
@Niklas: David Miller is informed of what we are planning to do. Can you
estimate the effort of merging the axis driver device tree bindings? If there
was anyone from axis to do the merge would be better since you are familiar with
it. What do you think?

> 
> 
>>
>>>>
>>>>
>>>>
>>>> There are some performance problems with the stmmac driver though:
>>>>
>>>> When running iperf3 with 3 streams:
>>>> iperf3 -c 192.168.0.90 -P 3 -t 30
>>>> iperf3 -c 192.168.0.90 -P 3 -t 30 -R
>>>>
>>>> I get really bad fairness between the streams.
>>>>
>>>> This appears to be an issue with how TX IRQ coalescing is implemented in stmmac.
>>>> Disabling TX IRQ coalescing in the stmmac driver makes the problem go away.
>>>> We have a local patch that implements TX IRQ coalescing in the dwceqos driver,
>>>> and we don't see the same problem.
>>>>
>>>> Also netperf TCP_RR and UDP_RR gives really bad results compared to the
>>>> dwceqos driver (without IRQ coalescing).
>>>> 2000 transactions/sec vs 9000 transactions/sec.
>>>> Turning TX IRQ coalescing off and RX interrupt watchdog off in stmmac
>>>> gives the same performance. I guess it's a trade off, low CPU usage
>>>> vs low latency, so I don't know how important TCP_RR/UDP_RR really is.
>>>>
>>>> The best thing would be to get a good working TX IRQ coalesce
>>>> implementation with HR timers in stmmac.
>>>> Perhaps it should also be investigated if the RX interrupt watchdog
>>>> timeout should have a lower default value.
>>>>
>>>>
>>>>
>>>>> Thanks to all.
>>>>>
>>>>> Joao
>>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox