* Re: [PATCH net 0/3] net: stmmac: Misc fixes
From: Corentin Labbe @ 2019-01-30 15:17 UTC (permalink / raw)
To: Jose Abreu
Cc: netdev, Joao Pinto, David S . Miller, Giuseppe Cavallaro,
Alexandre Torgue
In-Reply-To: <cover.1548859967.git.joabreu@synopsys.com>
On Wed, Jan 30, 2019 at 03:54:18PM +0100, Jose Abreu wrote:
> Some misc fixes for stmmac targeting -net.
>
> Cc: Joao Pinto <jpinto@synopsys.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Cc: Alexandre Torgue <alexandre.torgue@st.com>
>
> Jose Abreu (3):
> net: stmmac: Fallback to Platform Data clock in Watchdog conversion
> net: stmmac: Send TSO packets always from Queue 0
> net: stmmac: Disable EEE mode earlier in XMIT callback
>
> drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 14 ++++++++++----
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 17 +++++++++++++----
> include/linux/stmmac.h | 1 +
> 3 files changed, 24 insertions(+), 8 deletions(-)
>
> --
> 2.7.4
>
Hello
Could you CC linux-kernel@vger.kernel.org when you send patches (as asked by get_maintainer.pl)?
If you omit it, your patchset is neither stored on lore nor handled by the checkbots reading lkml.
Thanks
Regards
^ permalink raw reply
* Re: [PATCH net-next v6 1/2] net: add support for Cavium PTP coprocessor
From: Bjorn Helgaas @ 2019-01-30 15:06 UTC (permalink / raw)
To: Aleksey Makarov
Cc: netdev, Robert Richter, Goutham, Sunil, Richard Cochran,
Radoslaw Biernacki, linux-kernel, David Daney,
Philippe Ombredanne, Joe Perches, linux-arm-kernel
In-Reply-To: <20180115124500.14872-2-aleksey.makarov@cavium.com>
On Mon, Jan 15, 2018 at 06:44:56PM +0600, Aleksey Makarov wrote:
> +++ b/drivers/net/ethernet/cavium/common/cavium_ptp.c
> @@ -0,0 +1,353 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* cavium_ptp.c - PTP 1588 clock on Cavium hardware
> + * Copyright (c) 2003-2015, 2017 Cavium, Inc.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/timecounter.h>
> +#include <linux/pci.h>
> +
> +#include "cavium_ptp.h"
> +
> +#define DRV_NAME "Cavium PTP Driver"
This is also unconventional and looks funny, e.g., here:
$ ls /sys/bus/pci/drivers/
8250_mid/ exar_serial/ mei_me/ snd_hda_intel/
agpgart-intel/ i801_smbus/ parport_pc/ snd_soc_skl/
agpgart-sis/ i915/ pcieport/ xen-platform-pci/
agpgart-via/ intel_ish_ipc/ rtsx_pci/ xhci_hcd/
ahci/ intel_pch_thermal/ serial/
'Cavium PTP Driver'/ iosf_mbi_pci/ shpchp/
e1000e/ iwlwifi/ skl_uncore/
^ permalink raw reply
* Re: [PATCH net-next v2 1/2] net: dsa: mv88e6xxx: Save switch rules
From: Andrew Lunn @ 2019-01-30 14:54 UTC (permalink / raw)
To: Miquel Raynal
Cc: Vivien Didelot, Florian Fainelli, David S. Miller, netdev,
linux-kernel, Thomas Petazzoni, Gregory Clement, Antoine Tenart,
Maxime Chevallier, Nadav Haklai
In-Reply-To: <20190130104606.31639abb@xps13>
> So your proposal is to refuse suspending when using a mv88e6xxx switch.
Hi Miquèl
That is the first step. It makes the mv88e6xxx suspend compliant, in
that it currently does not support suspend.
> What about the current situation where suspending is allowed, but all
> the configuration is gone?
That is broken. The whole point of suspending is that you resume back
to the original state.
Andrew
^ permalink raw reply
* [PATCH net 3/3] net: stmmac: Disable EEE mode earlier in XMIT callback
From: Jose Abreu @ 2019-01-30 14:54 UTC (permalink / raw)
To: netdev
Cc: Jose Abreu, Joao Pinto, David S . Miller, Giuseppe Cavallaro,
Alexandre Torgue
In-Reply-To: <cover.1548859967.git.joabreu@synopsys.com>
In the stmmac xmit callback we use a different flow for TSO packets, but the
TSO xmit callback does not disable EEE mode.
Fix this by disabling EEE mode earlier, i.e. before calling the TSO
xmit callback.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 6656008068de..685d20472358 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3023,6 +3023,9 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
tx_q = &priv->tx_queue[queue];
+ if (priv->tx_path_in_lpi_mode)
+ stmmac_disable_eee_mode(priv);
+
/* Manage oversized TCP frames for GMAC4 device */
if (skb_is_gso(skb) && priv->tso) {
if (skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)) {
@@ -3050,9 +3053,6 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_BUSY;
}
- if (priv->tx_path_in_lpi_mode)
- stmmac_disable_eee_mode(priv);
-
entry = tx_q->cur_tx;
first_entry = entry;
WARN_ON(tx_q->tx_skbuff[first_entry]);
--
2.7.4
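The ordering change in this patch can be illustrated with a small user-space model. This is only a sketch: struct tx_state, disable_eee() and xmit_path() are hypothetical names made up for illustration and are not the stmmac code. It assumes only what the commit message states, namely that the TSO path returns early and therefore must come after the LPI exit.

```c
#include <stdbool.h>

/* Hypothetical, simplified Tx state; not the driver's struct stmmac_priv. */
struct tx_state {
	bool in_lpi_mode;	/* models priv->tx_path_in_lpi_mode */
	int lpi_exits;		/* counts how often EEE/LPI was left */
};

enum tx_path { TX_PATH_TSO, TX_PATH_NORMAL };

/* Models stmmac_disable_eee_mode(): leave LPI before transmitting. */
static void disable_eee(struct tx_state *s)
{
	s->in_lpi_mode = false;
	s->lpi_exits++;
}

/*
 * Models the patched order: the LPI check runs first, so both the
 * early-returning TSO path and the normal path see EEE disabled.
 */
static enum tx_path xmit_path(struct tx_state *s, bool is_tso)
{
	if (s->in_lpi_mode)
		disable_eee(s);

	if (is_tso)
		return TX_PATH_TSO;	/* early return, as in the TSO flow */

	return TX_PATH_NORMAL;
}
```

With the check placed after the TSO branch instead, a TSO packet would return before disable_eee() is reached, which is the situation the commit message describes.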
^ permalink raw reply related
* [PATCH net 2/3] net: stmmac: Send TSO packets always from Queue 0
From: Jose Abreu @ 2019-01-30 14:54 UTC (permalink / raw)
To: netdev
Cc: Jose Abreu, Joao Pinto, David S . Miller, Giuseppe Cavallaro,
Alexandre Torgue
In-Reply-To: <cover.1548859967.git.joabreu@synopsys.com>
The number of TSO enabled channels in HW can be different from the
number of total channels. There is no way to determine, at runtime, the
number of TSO capable channels, and it's safe to assume that if TSO is
enabled then at least channel 0 will be TSO capable.
Let's always send TSO packets from Queue 0.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 5afba69981cf..6656008068de 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3025,8 +3025,17 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
/* Manage oversized TCP frames for GMAC4 device */
if (skb_is_gso(skb) && priv->tso) {
- if (skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))
+ if (skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)) {
+ /*
+ * There is no way to determine the number of TSO
+ * capable Queues. Let's use always the Queue 0
+ * because if TSO is supported then at least this
+ * one will be capable.
+ */
+ skb_set_queue_mapping(skb, 0);
+
return stmmac_tso_xmit(skb, dev);
+ }
}
if (unlikely(stmmac_tx_avail(priv, queue) < nfrags + 1)) {
--
2.7.4
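The queue selection rule from this patch, reduced to a stand-alone sketch. struct pkt and pick_tx_queue() are hypothetical names used only for illustration; the real change calls skb_set_queue_mapping(skb, 0) on the skb before handing it to the TSO xmit path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified packet descriptor; not the kernel's struct sk_buff. */
struct pkt {
	bool is_tcp_gso;	/* models skb_is_gso() with a TCPv4/TCPv6 gso_type */
	uint16_t queue;		/* models the skb queue mapping */
};

/*
 * Returns the Tx queue the packet will use. TSO packets are forced to
 * queue 0 because, per the commit message, the number of TSO capable
 * queues cannot be discovered at runtime.
 */
static uint16_t pick_tx_queue(struct pkt *p, bool tso_enabled)
{
	if (p->is_tcp_gso && tso_enabled)
		p->queue = 0;	/* models skb_set_queue_mapping(skb, 0) */

	return p->queue;
}
```

Non-TSO traffic keeps whatever queue the stack selected, so only segmentation-offload packets are affected by the remapping.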
^ permalink raw reply related
* [PATCH net 1/3] net: stmmac: Fallback to Platform Data clock in Watchdog conversion
From: Jose Abreu @ 2019-01-30 14:54 UTC (permalink / raw)
To: netdev
Cc: Jose Abreu, Joao Pinto, David S . Miller, Giuseppe Cavallaro,
Alexandre Torgue
In-Reply-To: <cover.1548859967.git.joabreu@synopsys.com>
If we don't have DT then stmmac_clk will not be available. Let's add a
new Platform Data field so that we can specify the refclk by this means.
This way we can still use the coalesce command in PCI based setups.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 14 ++++++++++----
include/linux/stmmac.h | 1 +
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index d1f61c25d82b..5d85742a2be0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -721,8 +721,11 @@ static u32 stmmac_usec2riwt(u32 usec, struct stmmac_priv *priv)
{
unsigned long clk = clk_get_rate(priv->plat->stmmac_clk);
- if (!clk)
- return 0;
+ if (!clk) {
+ clk = priv->plat->clk_ref_rate;
+ if (!clk)
+ return 0;
+ }
return (usec * (clk / 1000000)) / 256;
}
@@ -731,8 +734,11 @@ static u32 stmmac_riwt2usec(u32 riwt, struct stmmac_priv *priv)
{
unsigned long clk = clk_get_rate(priv->plat->stmmac_clk);
- if (!clk)
- return 0;
+ if (!clk) {
+ clk = priv->plat->clk_ref_rate;
+ if (!clk)
+ return 0;
+ }
return (riwt * 256) / (clk / 1000000);
}
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 7ddfc65586b0..4335bd771ce5 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -184,6 +184,7 @@ struct plat_stmmacenet_data {
struct clk *pclk;
struct clk *clk_ptp_ref;
unsigned int clk_ptp_rate;
+ unsigned int clk_ref_rate;
struct reset_control *stmmac_rst;
struct stmmac_axi *axi;
int has_gmac4;
--
2.7.4
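The fallback and the two conversions in this patch can be tried out in user space with the sketch below. struct clk_cfg and effective_clk() are hypothetical stand-ins for clk_get_rate(priv->plat->stmmac_clk) and the new clk_ref_rate field. The extra guard against a clock below 1 MHz is an assumption added here to keep the sketch free of a division by zero; it is not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two clock sources the patch consults. */
struct clk_cfg {
	unsigned long clk_rate;		/* models clk_get_rate(stmmac_clk); 0 if absent */
	unsigned int clk_ref_rate;	/* models the new platform-data field */
};

/* Prefer the clock framework rate, fall back to platform data. */
static unsigned long effective_clk(const struct clk_cfg *cfg)
{
	return cfg->clk_rate ? cfg->clk_rate : cfg->clk_ref_rate;
}

/* Microseconds to RX watchdog units: one unit is 256 clock cycles. */
static uint32_t usec2riwt(uint32_t usec, const struct clk_cfg *cfg)
{
	unsigned long mhz = effective_clk(cfg) / 1000000;

	if (!mhz)	/* no usable clock (assumption: also covers < 1 MHz) */
		return 0;

	return (uint32_t)(((unsigned long)usec * mhz) / 256);
}

/* RX watchdog units back to microseconds. */
static uint32_t riwt2usec(uint32_t riwt, const struct clk_cfg *cfg)
{
	unsigned long mhz = effective_clk(cfg) / 1000000;

	if (!mhz)
		return 0;

	return (uint32_t)(((unsigned long)riwt * 256) / mhz);
}
```

For example, with no clock from the clock framework and clk_ref_rate set to 250000000, a request of 1024 usec works out to 1024 * 250 / 256 = 1000 watchdog units, and converting 1000 units back gives 1024 usec. With both rates at zero the helpers return 0, matching the behaviour before the patch.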
^ permalink raw reply related
* [PATCH net 0/3] net: stmmac: Misc fixes
From: Jose Abreu @ 2019-01-30 14:54 UTC (permalink / raw)
To: netdev
Cc: Jose Abreu, Joao Pinto, David S . Miller, Giuseppe Cavallaro,
Alexandre Torgue
Some misc fixes for stmmac targeting -net.
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Jose Abreu (3):
net: stmmac: Fallback to Platform Data clock in Watchdog conversion
net: stmmac: Send TSO packets always from Queue 0
net: stmmac: Disable EEE mode earlier in XMIT callback
drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 14 ++++++++++----
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 17 +++++++++++++----
include/linux/stmmac.h | 1 +
3 files changed, 24 insertions(+), 8 deletions(-)
--
2.7.4
^ permalink raw reply
* Re: general protection fault in __xfrm_policy_bysel_ctx
From: Dmitry Vyukov @ 2019-01-30 14:30 UTC (permalink / raw)
To: Florian Westphal
Cc: syzbot, David Miller, Herbert Xu, LKML, netdev, Steffen Klassert,
syzkaller-bugs
In-Reply-To: <20190130142050.ilcccckfatn2zj6o@breakpoint.cc>
On Wed, Jan 30, 2019 at 3:20 PM Florian Westphal <fw@strlen.de> wrote:
>
> Dmitry Vyukov <dvyukov@google.com> wrote:
> > > syzbot <syzbot+e6e1fe9148cffa18cf97@syzkaller.appspotmail.com> wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit: 085c4c7dd2b6 net: lmc: remove -I. header search path
> > > > git tree: net-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12347128c00000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=505743eba4e4f68
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=e6e1fe9148cffa18cf97
> > > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > > >
> > > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > net-next doesn't contain the fixes for the rbtree fallout yet, so
> > > this might already be fixed (fingers crossed).
> >
> > Hi Florian,
> >
> > What is that fix for the record?
>
> I don't know. I managed to add every bug class imaginable in that series 8-(
>
> The last (most recent) fix from the 'fallout cleanup' is:
> 12750abad517a991c4568969bc748db302ab52cd
> ("xfrm: policy: fix infinite loop when merging src-nodes")
>
> so if syzkaller can generate a splat with that change present
> something is still broken.
>
> > We will need to close this later. Or perhaps we can already mark this
> > as fixed by that patch with "#syz fix:" command?
>
> There are a lot of open xfrm related splats that could all be explained
> by the rbtree bugs (one had a reproducer, the fix has appropriate
> reported-by tag).
>
> It would be great if there was a way to tell syzkaller to report those
> again if they still appear.
That's exactly what "#syz fix:" will do.
syzbot will wait until the fixing commit appears in all builds/trees
it tests, then close this bug, and then any new similar-looking
crash will produce a new bug report. So if the patch indeed fixes the
bug, then the bug will be closed and we are done. If it does not fix
this bug, then we will get another report, but this time on a tree
that includes the commit.
> I could pretend and claim above commit as "sys-fix", but it seems fishy.
>
> Let me know and I can tag all of them.
It's "safe" to mark these crashes as fixed even when we are not 100% sure,
in the sense that we won't lose the bug (it will be reported again
later if it's not fixed).
It's also useful to keep the list of open/active bugs shorter and more
precise (don't leave too many obsolete open bugs). What has happened
multiple times is that a bug was fixed but left open, and then
similar-looking crashes started happening again (a new bug), but they
weren't reported by syzbot because to syzbot they looked like the old,
still unfixed bug.
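The bug lifecycle described in this message can be written down as a toy state machine. This is only a model of the behaviour explained above, not syzbot's implementation; every name in it (struct bug, mark_fixed(), fix_seen_in_tree(), crash_creates_new_report()) is made up for illustration.

```c
#include <stdbool.h>

/* Toy model of a tracked bug; hypothetical, not syzbot's data model. */
enum bug_state { BUG_OPEN, BUG_FIX_PENDING, BUG_CLOSED };

struct bug {
	enum bug_state state;
	int trees_total;	/* builds/trees being tested */
	int trees_with_fix;	/* trees where the fixing commit was seen */
};

/* Models sending "#syz fix:" for an open bug. */
static void mark_fixed(struct bug *b)
{
	if (b->state == BUG_OPEN)
		b->state = BUG_FIX_PENDING;
}

/* Models the fixing commit showing up in one more tested tree. */
static void fix_seen_in_tree(struct bug *b)
{
	if (b->state != BUG_FIX_PENDING)
		return;
	if (++b->trees_with_fix >= b->trees_total)
		b->state = BUG_CLOSED;	/* fix is everywhere: close the bug */
}

/*
 * A similar-looking crash is folded into the existing bug while that
 * bug is still open or waiting for the fix; only after the bug has
 * been closed does it produce a fresh report.
 */
static bool crash_creates_new_report(const struct bug *b)
{
	return b->state == BUG_CLOSED;
}
```

In this model a bug that was fixed but never marked stays in BUG_OPEN, so crash_creates_new_report() keeps returning false, which corresponds to the missed reports mentioned above.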
^ permalink raw reply
* Re: [PATCH] include/linux: Remove duplicate header
From: Brajeswar Ghosh @ 2019-01-30 14:30 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
Ariel.Elior, davem, jasowang, john fastabend, Andrea Parri,
dhowells, ebiederm, james.morris, Paul Moore, zohar, mjg59, viro,
Richard Haines, Sargun Dhillon, Souptick Joarder, linux-kernel,
netdev, Sabyasachi Gupta
In-Reply-To: <20190110141930-mutt-send-email-mst@kernel.org>
On Fri, Jan 11, 2019 at 12:50 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jan 10, 2019 at 09:31:14PM +0530, Brajeswar Ghosh wrote:
> > Remove duplicate headers which are included more than once
> >
> > Signed-off-by: Brajeswar Ghosh <brajeswar.linux@gmail.com>
> > ---
> > include/linux/perf_event.h | 1 -
> > include/linux/ptr_ring.h | 1 -
> > include/linux/qed/qed_if.h | 1 -
> > include/linux/security.h | 1 -
> > 4 files changed, 4 deletions(-)
> >
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 53c500f0ca79..8d7a2b65514b 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -53,7 +53,6 @@ struct perf_guest_info_callbacks {
> > #include <linux/atomic.h>
> > #include <linux/sysfs.h>
> > #include <linux/perf_regs.h>
> > -#include <linux/workqueue.h>
> > #include <linux/cgroup.h>
> > #include <asm/local.h>
> >
> > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > index 6894976b54e3..627b7aa067e0 100644
> > --- a/include/linux/ptr_ring.h
> > +++ b/include/linux/ptr_ring.h
> > @@ -26,7 +26,6 @@
> > #include <linux/cache.h>
> > #include <linux/types.h>
> > #include <linux/compiler.h>
> > -#include <linux/cache.h>
> > #include <linux/slab.h>
> > #include <asm/errno.h>
> > #endif
>
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
>
> Pls merge through the trivial tree or similar.
If there are no further comments, can we get this patch queued for 5.1?
>
>
> > diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
> > index a47321a0d572..9ebdb60f605a 100644
> > --- a/include/linux/qed/qed_if.h
> > +++ b/include/linux/qed/qed_if.h
> > @@ -38,7 +38,6 @@
> > #include <linux/netdevice.h>
> > #include <linux/pci.h>
> > #include <linux/skbuff.h>
> > -#include <linux/types.h>
> > #include <asm/byteorder.h>
> > #include <linux/io.h>
> > #include <linux/compiler.h>
> > diff --git a/include/linux/security.h b/include/linux/security.h
> > index ac0ef467d25e..2915ea6a5b77 100644
> > --- a/include/linux/security.h
> > +++ b/include/linux/security.h
> > @@ -30,7 +30,6 @@
> > #include <linux/err.h>
> > #include <linux/string.h>
> > #include <linux/mm.h>
> > -#include <linux/fs.h>
> >
> > struct linux_binprm;
> > struct cred;
> > --
> > 2.17.1
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH net] ixgbe: fix potential RX buffer starvation for AF_XDP
From: Magnus Karlsson @ 2019-01-30 14:25 UTC (permalink / raw)
To: Björn Töpel
Cc: Jakub Kicinski, Network Development, Björn Töpel,
intel-wired-lan, Magnus Karlsson
In-Reply-To: <CAJ+HfNg=vUqUszLX+C6LvYRq8J_X_UiU4x63FCaXWqdRjV5DGA@mail.gmail.com>
On Wed, Jan 30, 2019 at 3:16 PM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> Den ons 30 jan. 2019 kl 10:35 skrev Magnus Karlsson <magnus.karlsson@gmail.com>:
> >
> > On Tue, Jan 29, 2019 at 8:15 PM Jakub Kicinski
> > <jakub.kicinski@netronome.com> wrote:
> [...]
> > >
> > > I may not understand the problem fully, but isn't it kind of normal
> > > that if you create a ring empty you'll never receive packets? And it
> > > should be reasonably easy to catch while writing an app from scratch
> > > (i.e. it behaves deterministically).
> >
> > Agree that this should be the normal behavior for a NIC. The question
> > is how to get out of this situation. There are two options: punt this
> > to the application writer or fix this in the driver. I chose to fix
> > the driver since this removes complexity in the application.
> >
>
> Magnus' fix addresses a race/timing issue. At zero-copy initialization
> point, if the fill ring was empty, the driver (both i40e and ixgbe)
> would stop retrying to "allocate" zero-copy frames from the fill
> ring. So, frames would never be received, even if the fill ring was
> filled at a later point.
>
> If the driver runs dry of Rx buffers after one or more frames have
> been received, the driver will retry polling the fill ring. However, at
> initialization, if the fill ring was empty, the driver would
> just give up and never retry.
>
> As Magnus stated, there is no "notify the kernel that items have
> appeared in the fill ring" mechanism (other than a HW mechanism where the
> tail pointer is a doorbell) on the Rx side, so for the Intel drivers it's
> up to the driver to solve this.
>
> > > Plus user space can already do the kick manually: create the ring empty,
> > > attach prog, put packets on the ring, and kick xmit - that would work,
> > > no?
> >
> > Unfortunately there is no way such a kick could be guaranteed to work
> > since it only guarantees to wake up Tx. If the Rx is in another NAPI
> > context, then this will not work. On i40e and ixgbe, the Rx and Tx
> > processing is in the same NAPI so this would indeed work. But can I
> > guarantee that this is true for all current and future NICs?
> > Personally I would like to have Tx in a separate NAPI as I could get
> > better performance that way, at least in the i40e driver. You also do
> > not need to have any Tx at all in your socket, as it could be Rx only.
> > These arguments are why I went with the approach to fix this in the
> > driver, instead of a hard-to-explain extra step in the application
> > that might or might not work depending on your driver.
> >
> > I believe the crux of the problem is that we need to tell the driver
> > to take buffers from the fill ring then put references to them in the
> > HW Rx ring so that packets can be received from the wire. In this
> > patch I went for installing an XDP program and calling bind() as "the"
> > two places. But there could be a scenario where I do bind then install
> > the XDP program and after that populate my fill queue and not call any
> > other syscall ever! This would not be covered by this patch :-(.
> >
> > So you might be correct that the only way to fix this is from the
> > application. But how to do it in a simple and elegant way? I do not
> > think sendto() is the right answer, but what about poll()? Can we wire
> > that up to kick Rx through the busy poll mechanism? I would like the
> > solution to be simple and not complicate things for the user, if
> > possible. Another idea would be to use a timer in the driver to
> > periodically check the fill queue in the case it is empty. This would
> > be easy for application writers but add complexity for people
> > implementing AF_XDP ZC support in their drivers. Any ideas? What do
> > you prefer?
> >
>
> Hmm, actually your patch *does* cover the Rx starvation scenarios. One
> downside, which is related to the fact that the AF_XDP ZC ndo doesn't
> state whether there actually is a socket that needs Rx (we'll just enable
> Rx zero-copy unconditionally), is that the driver will try to allocate
> buffers from the fill ring even if there are no sockets with Rx
> queues. In this case, the fill ring will probably never be filled from
> userland. :-) But fixing that would go into another patch IMO!
>
> I think we can leave the NAPI discussion aside for now. Currently,
> the SPSC queue setup requires a single NAPI context. Let's relax that
> later (if ever). (Or did I misunderstand your points?)
OK, good. Thanks. And yes, the NAPI discussion was only for a possible
future state, if ever.
/Magnus
> Cheers,
> Björn
>
>
> > Thanks: Magnus
> >
> > > Putting the kick in ixgbe_xdp_setup() seems a tiny bit random to me,
> > > but perhaps I'm not seeing the rationale clearly.
> > _______________________________________________
> > Intel-wired-lan mailing list
> > Intel-wired-lan@osuosl.org
> > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
^ permalink raw reply
* Re: general protection fault in __xfrm_policy_bysel_ctx
From: Florian Westphal @ 2019-01-30 14:20 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Florian Westphal, syzbot, David Miller, Herbert Xu, LKML, netdev,
Steffen Klassert, syzkaller-bugs
In-Reply-To: <CACT4Y+YB_icD2B6Bj6=5KjAvfamyCUeLnYHm5i5PQKO35=RMFA@mail.gmail.com>
Dmitry Vyukov <dvyukov@google.com> wrote:
> > syzbot <syzbot+e6e1fe9148cffa18cf97@syzkaller.appspotmail.com> wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit: 085c4c7dd2b6 net: lmc: remove -I. header search path
> > > git tree: net-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=12347128c00000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=505743eba4e4f68
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=e6e1fe9148cffa18cf97
> > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > net-next doesn't contain the fixes for the rbtree fallout yet, so
> > this might already be fixed (fingers crossed).
>
> Hi Florian,
>
> What is that fix for the record?
I don't know. I managed to add every bug class imaginable in that series 8-(
The last (most recent) fix from the 'fallout cleanup' is:
12750abad517a991c4568969bc748db302ab52cd
("xfrm: policy: fix infinite loop when merging src-nodes")
so if syzkaller can generate a splat with that change present
something is still broken.
> We will need to close this later. Or perhaps we can already mark this
> as fixed by that patch with "#syz fix:" command?
There are a lot of open xfrm related splats that could all be explained
by the rbtree bugs (one had a reproducer, the fix has appropriate
reported-by tag).
It would be great if there was a way to tell syzkaller to report those
again if they still appear.
I could pretend and claim above commit as "sys-fix", but it seems fishy.
Let me know and I can tag all of them.
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH net] ixgbe: fix potential RX buffer starvation for AF_XDP
From: Björn Töpel @ 2019-01-30 14:16 UTC (permalink / raw)
To: Magnus Karlsson
Cc: Jakub Kicinski, Network Development, Björn Töpel,
intel-wired-lan, Magnus Karlsson
In-Reply-To: <CAJ8uoz1GJBmC0GFbURvEzY4kDZZ6C7O9+1F+gV0y=GOMGLobUQ@mail.gmail.com>
Den ons 30 jan. 2019 kl 10:35 skrev Magnus Karlsson <magnus.karlsson@gmail.com>:
>
> On Tue, Jan 29, 2019 at 8:15 PM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
[...]
> >
> > I may not understand the problem fully, but isn't it kind of normal
> > that if you create a ring empty you'll never receive packets? And it
> > should be reasonably easy to catch while writing an app from scratch
> > (i.e. it behaves deterministically).
>
> Agree that this should be the normal behavior for a NIC. The question
> is how to get out of this situation. There are two options: punt this
> to the application writer or fix this in the driver. I chose to fix
> the driver since this removes complexity in the application.
>
Magnus' fix addresses a race/timing issue. At zero-copy initialization
point, if the fill ring was empty, the driver (both i40e and ixgbe)
would stop retrying to "allocate" zero-copy frames from the fill
ring. So, frames would never be received, even if the fill ring was
filled at a later point.
If the driver runs dry of Rx buffers after one or more frames have
been received, the driver will retry polling the fill ring. However, at
initialization, if the fill ring was empty, the driver would
just give up and never retry.
As Magnus stated, there is no "notify the kernel that items have
appeared in the fill ring" mechanism (other than a HW mechanism where the
tail pointer is a doorbell) on the Rx side, so for the Intel drivers it's
up to the driver to solve this.
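The difference between "give up at initialization" and "retry later" can be shown with a small user-space model. This is a sketch under stated assumptions, not the ixgbe or i40e code: struct fill_ring, struct rx_queue, RX_RING_SIZE, rx_refill() and rx_poll() are all hypothetical names, and the model simply records a shortfall in a flag so that a later poll tries again.

```c
#include <stdbool.h>

#define RX_RING_SIZE 4u		/* hypothetical HW Rx ring size */

/* Buffers that user space has made available; hypothetical model. */
struct fill_ring {
	unsigned int avail;
};

/* Driver-side Rx state; hypothetical model. */
struct rx_queue {
	unsigned int posted;	/* buffers currently placed on the HW Rx ring */
	bool need_refill;	/* set when the last attempt fell short */
};

/* Move as many buffers as possible from the fill ring to the Rx ring. */
static unsigned int rx_refill(struct rx_queue *rxq, struct fill_ring *fr)
{
	unsigned int want = RX_RING_SIZE - rxq->posted;
	unsigned int n = fr->avail < want ? fr->avail : want;

	fr->avail -= n;
	rxq->posted += n;
	/* Remember the shortfall instead of giving up for good. */
	rxq->need_refill = rxq->posted < RX_RING_SIZE;
	return n;
}

/* Models a later poll: retry only if an earlier attempt fell short. */
static unsigned int rx_poll(struct rx_queue *rxq, struct fill_ring *fr)
{
	return rxq->need_refill ? rx_refill(rxq, fr) : 0;
}
```

In this model an empty fill ring at setup time leaves need_refill set, so buffers added afterwards are picked up by the next poll. Something still has to trigger that poll, and where that trigger should come from is the question the rest of the thread discusses.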
> > Plus user space can already do the kick manually: create the ring empty,
> > attach prog, put packets on the ring, and kick xmit - that would work,
> > no?
>
> Unfortunately there is no way such a kick could be guaranteed to work
> since it only guarantees to wake up Tx. If the Rx is in another NAPI
> context, then this will not work. On i40e and ixgbe, the Rx and Tx
> processing is in the same NAPI so this would indeed work. But can I
> guarantee that this is true for all current and future NICs?
> Personally I would like to have Tx in a separate NAPI as I could get
> better performance that way, at least in the i40e driver. You also do
> not need to have any Tx at all in your socket, as it could be Rx only.
> These arguments are why I went with the approach to fix this in the
> driver, instead of a hard-to-explain extra step in the application
> that might or might not work depending on your driver.
>
> I believe the crux of the problem is that we need to tell the driver
> to take buffers from the fill ring then put references to them in the
> HW Rx ring so that packets can be received from the wire. In this
> patch I went for installing an XDP program and calling bind() as "the"
> two places. But there could be a scenario where I do bind then install
> the XDP program and after that populate my fill queue and not call any
> other syscall ever! This would not be covered by this patch :-(.
>
> So you might be correct that the only way to fix this is from the
> application. But how to do it in a simple and elegant way? I do not
> think sendto() is the right answer, but what about poll()? Can we wire
> that up to kick Rx through the busy poll mechanism? I would like the
> solution to be simple and not complicate things for the user, if
> possible. Another idea would be to use a timer in the driver to
> periodically check the fill queue in the case it is empty. This would
> be easy for application writers but add complexity for people
> implementing AF_XDP ZC support in their drivers. Any ideas? What do
> you prefer?
>
Hmm, actually your patch *does* cover the Rx starvation scenarios. One
downside, which is related to the fact that the AF_XDP ZC ndo doesn't
state whether there actually is a socket that needs Rx (we'll just enable
Rx zero-copy unconditionally), is that the driver will try to allocate
buffers from the fill ring even if there are no sockets with Rx
queues. In this case, the fill ring will probably never be filled from
userland. :-) But fixing that would go into another patch IMO!
I think we can leave the NAPI discussion aside for now. Currently,
the SPSC queue setup requires a single NAPI context. Let's relax that
later (if ever). (Or did I misunderstand your points?)
Cheers,
Björn
> Thanks: Magnus
>
> > Putting the kick in ixgbe_xdp_setup() seems a tiny bit random to me,
> > but perhaps I'm not seeing the rationale clearly.
^ permalink raw reply
* Re: [PATCH net-next v6 1/2] net: add support for Cavium PTP coprocessor
From: Bjorn Helgaas @ 2019-01-30 14:09 UTC (permalink / raw)
To: Aleksey Makarov
Cc: netdev, Robert Richter, Goutham, Sunil, Richard Cochran,
Radoslaw Biernacki, linux-kernel, David Daney,
Philippe Ombredanne, Joe Perches, linux-arm-kernel
In-Reply-To: <20180115124500.14872-2-aleksey.makarov@cavium.com>
On Mon, Jan 15, 2018 at 06:44:56PM +0600, Aleksey Makarov wrote:
> From: Radoslaw Biernacki <rad@semihalf.com>
>
> This patch adds support for the Precision Time Protocol
> Clocks and Timestamping hardware found on Cavium ThunderX
> processors.
>
> Signed-off-by: Radoslaw Biernacki <rad@semihalf.com>
> Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
> Acked-by: Philippe Ombredanne <pombredanne@nexb.com>
> ---
> drivers/net/ethernet/cavium/Kconfig | 12 +
> drivers/net/ethernet/cavium/Makefile | 1 +
> drivers/net/ethernet/cavium/common/Makefile | 1 +
> drivers/net/ethernet/cavium/common/cavium_ptp.c | 353 ++++++++++++++++++++++++
> drivers/net/ethernet/cavium/common/cavium_ptp.h | 70 +++++
> 5 files changed, 437 insertions(+)
> create mode 100644 drivers/net/ethernet/cavium/common/Makefile
> create mode 100644 drivers/net/ethernet/cavium/common/cavium_ptp.c
> create mode 100644 drivers/net/ethernet/cavium/common/cavium_ptp.h
>
> diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig
> index 63be75eb34d2..96586c0b4490 100644
> --- a/drivers/net/ethernet/cavium/Kconfig
> +++ b/drivers/net/ethernet/cavium/Kconfig
> @@ -50,6 +50,18 @@ config THUNDER_NIC_RGX
> This driver supports configuring XCV block of RGX interface
> present on CN81XX chip.
>
> +config CAVIUM_PTP
> + tristate "Cavium PTP coprocessor as PTP clock"
> + depends on 64BIT
> + imply PTP_1588_CLOCK
> + default y
Why is this "default y"?
It looks like this is a PCI driver and probably should be loaded only
when the PCI device is present.
> + ---help---
> + This driver adds support for the Precision Time Protocol Clocks and
> + Timestamping coprocessor (PTP) found on Cavium processors.
> + PTP provides timestamping mechanism that is suitable for use in IEEE 1588
> + Precision Time Protocol or other purposes. Timestamps can be used in
> + BGX, TNS, GTI, and NIC blocks.
^ permalink raw reply
* Re: general protection fault in __xfrm_policy_bysel_ctx
From: Dmitry Vyukov @ 2019-01-30 14:03 UTC (permalink / raw)
To: Florian Westphal
Cc: syzbot, David Miller, Herbert Xu, LKML, netdev, Steffen Klassert,
syzkaller-bugs
In-Reply-To: <20190129094144.bln4dnirr5kz3dl4@breakpoint.cc>
On Tue, Jan 29, 2019 at 10:41 AM Florian Westphal <fw@strlen.de> wrote:
>
> syzbot <syzbot+e6e1fe9148cffa18cf97@syzkaller.appspotmail.com> wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: 085c4c7dd2b6 net: lmc: remove -I. header search path
> > git tree: net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12347128c00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=505743eba4e4f68
> > dashboard link: https://syzkaller.appspot.com/bug?extid=e6e1fe9148cffa18cf97
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
>
> net-next doesn't contain the fixes for the rbtree fallout yet, so
> this might already be fixed (fingers crossed).
Hi Florian,
What is that fix for the record?
We will need to close this later. Or perhaps we can already mark this
as fixed by that patch with "#syz fix:" command?
^ permalink raw reply
* general protection fault in __sock_release (2)
From: syzbot @ 2019-01-30 14:03 UTC (permalink / raw)
To: davem, linux-kernel, netdev, syzkaller-bugs
Hello,
syzbot found the following crash on:
HEAD commit: 30e5c2c6bf28 net: Revert devlink health changes.
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1249a2b8c00000
kernel config: https://syzkaller.appspot.com/x/.config?x=505743eba4e4f68
dashboard link: https://syzkaller.appspot.com/bug?extid=38b29941610a1cc735dc
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+38b29941610a1cc735dc@syzkaller.appspotmail.com
bond0 (unregistering): Releasing backup interface bond_slave_1
bond0 (unregistering): Releasing backup interface bond_slave_0
bond0 (unregistering): Released all slaves
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8762 Comm: syz-executor0 Not tainted 5.0.0-rc3+ #21
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__sock_release+0x2d/0x250 net/socket.c:574
Code: e5 41 57 41 56 41 55 41 54 49 89 f4 53 48 89 fb e8 38 d4 6c fb 4c 8d
73 28 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f
85 b1 01 00 00 4c 8b 6b 28 4d 85 ed 0f 84 c7 00 00
RSP: 0018:ffff8880a53bf8f8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90005ff4000
RDX: 0000000000000005 RSI: ffffffff86152ab8 RDI: 0000000000000000
RBP: ffff8880a53bf920 R08: ffff8880a51e2500 R09: fffffbfff1462b55
R10: fffffbfff1462b54 R11: ffffffff8a315aa7 R12: 0000000000000000
R13: 0000000000000007 R14: 0000000000000028 R15: ffff88809fc0eb88
FS: 00007fabf4208700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000402e9d CR3: 00000000a585c000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
sock_release+0x18/0x20 net/socket.c:598
inet_ctl_sock_destroy include/net/inet_common.h:56 [inline]
icmp_sk_exit+0x11f/0x1f0 net/ipv4/icmp.c:1187
ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
setup_net+0x4b2/0x8c0 net/core/net_namespace.c:331
copy_net_ns+0x2ae/0x4b0 net/core/net_namespace.c:437
create_new_namespaces+0x4ce/0x930 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x6d7/0xfb0 kernel/fork.c:2550
__do_sys_unshare kernel/fork.c:2618 [inline]
__se_sys_unshare kernel/fork.c:2616 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2616
do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458099
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fabf4207c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000458099
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000000
RBP: 000000000073bfa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fabf42086d4
R13: 00000000004c6de4 R14: 00000000004dc408 R15: 00000000ffffffff
Modules linked in:
---[ end trace 320ccc001dc11318 ]---
RIP: 0010:__sock_release+0x2d/0x250 net/socket.c:574
Code: e5 41 57 41 56 41 55 41 54 49 89 f4 53 48 89 fb e8 38 d4 6c fb 4c 8d
73 28 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f
85 b1 01 00 00 4c 8b 6b 28 4d 85 ed 0f 84 c7 00 00
RSP: 0018:ffff8880a53bf8f8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90005ff4000
RDX: 0000000000000005 RSI: ffffffff86152ab8 RDI: 0000000000000000
RBP: ffff8880a53bf920 R08: ffff8880a51e2500 R09: fffffbfff1462b55
R10: fffffbfff1462b54 R11: ffffffff8a315aa7 R12: 0000000000000000
R13: 0000000000000007 R14: 0000000000000028 R15: ffff88809fc0eb88
FS: 00007fabf4208700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000402e9d CR3: 00000000a585c000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
^ permalink raw reply
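A note for readers decoding the report above: the faulting instruction (the byte marked <80> in the Code: line) appears to be the KASAN shadow-byte check for the address held in R14, which is 0x28, i.e. a field at offset 0x28 of the NULL pointer in RBX. A minimal sketch of that arithmetic follows, assuming the shadow offset is the value seen in RAX and a scale shift of 3 (matching RDX = 5 in the dump). The helper names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Assumed from the register dump: RAX holds the KASAN shadow offset. */
#define DEMO_SHADOW_OFFSET      0xdffffc0000000000ULL
#define DEMO_SHADOW_SCALE_SHIFT 3

/* Hypothetical helper: address of the shadow byte checked for addr. */
static uint64_t demo_shadow_addr(uint64_t addr)
{
	return (addr >> DEMO_SHADOW_SCALE_SHIFT) + DEMO_SHADOW_OFFSET;
}

/*
 * Hypothetical helper: an address is canonical for 48-bit virtual
 * addressing when bits 63..47 are all zeros or all ones.
 */
static int demo_is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return top == 0 || top == 0x1ffff;
}
```

With these definitions, demo_shadow_addr(0x28) yields 0xdffffc0000000005, which is not a canonical address. That is consistent with the report showing a general protection fault rather than a page fault, and with the "GPF could be caused by NULL-ptr deref" hint printed by KASAN.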
* [PATCH bpf-next] bpf: add optional memory accounting for maps
From: Martynas Pumputis @ 2019-01-30 14:02 UTC (permalink / raw)
To: netdev; +Cc: ast, daniel, m
Previously, memory allocated for a map was not accounted. Therefore,
this memory could not be taken into consideration by the cgroups
memory controller.
This patch introduces the "BPF_F_ACCOUNT_MEM" flag which enables
the memory accounting for a map, and it can be set during
the map creation ("BPF_MAP_CREATE") in "map_flags".
When enabled, we account only that amount of memory which is charged
against the "RLIMIT_MEMLOCK" limit.
To validate the change, first we create the memory cgroup "test-map":
# mkdir /sys/fs/cgroup/memory/test-map
And then we run the following program against the cgroup:
$ cat test_map.c
<..>
int main() {
usleep(3 * 1000000);
assert(bpf_create_map(BPF_MAP_TYPE_HASH, 8, 16, 65536, 0) > 0);
usleep(3 * 1000000);
}
# cgexec -g memory:test-map ./test_map &
# cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
397312
258048
<after 3 sec the map has been created>
# bpftool map list
19: hash flags 0x0
key 8B value 16B max_entries 65536 memlock 5771264B
# cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
401408
262144
As we can see, the memory allocated for the map is not accounted, as
397312B + 5771264B > 401408B.
Next, we enable the accounting and re-run the test:
$ cat test_map.c
<..>
int main() {
usleep(3 * 1000000);
assert(bpf_create_map(BPF_MAP_TYPE_HASH, 8, 16, 65536, BPF_F_ACCOUNT_MEM) > 0);
usleep(3 * 1000000);
}
# cgexec -g memory:test-map ./test_map &
# cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
450560
307200
<after 3 sec the map has been created>
# bpftool map list
20: hash flags 0x80
key 8B value 16B max_entries 65536 memlock 5771264B
# cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
6221824
6078464
This time, the memory (including kmem) is accounted, as
450560B + 5771264B <= 6221824B
Signed-off-by: Martynas Pumputis <m@lambda.lt>
---
include/linux/bpf.h | 5 +++--
include/uapi/linux/bpf.h | 2 ++
kernel/bpf/arraymap.c | 14 +++++++++-----
kernel/bpf/bpf_lru_list.c | 11 +++++++++--
kernel/bpf/bpf_lru_list.h | 1 +
kernel/bpf/cpumap.c | 12 +++++++++---
kernel/bpf/devmap.c | 10 ++++++++--
kernel/bpf/hashtab.c | 19 ++++++++++++++-----
kernel/bpf/lpm_trie.c | 19 +++++++++++++------
kernel/bpf/queue_stack_maps.c | 5 +++--
kernel/bpf/reuseport_array.c | 3 ++-
kernel/bpf/stackmap.c | 12 ++++++++----
kernel/bpf/syscall.c | 12 ++++++++----
kernel/bpf/xskmap.c | 9 +++++++--
net/core/sock_map.c | 13 +++++++++----
tools/include/uapi/linux/bpf.h | 3 +++
16 files changed, 108 insertions(+), 42 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e734f163bd0b..353a3f4304fe 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -79,7 +79,8 @@ struct bpf_map {
u32 btf_value_type_id;
struct btf *btf;
bool unpriv_array;
- /* 55 bytes hole */
+ bool account_mem;
+ /* 54 bytes hole */
/* The 3rd and 4th cacheline with misc members to avoid false sharing
* particularly with refcounting.
@@ -506,7 +507,7 @@ void bpf_map_put(struct bpf_map *map);
int bpf_map_precharge_memlock(u32 pages);
int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages);
-void *bpf_map_area_alloc(size_t size, int numa_node);
+void *bpf_map_area_alloc(size_t size, int numa_node, bool account_mem);
void bpf_map_area_free(void *base);
void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 91c43884f295..a374ccbaa51b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -278,6 +278,8 @@ enum bpf_attach_type {
#define BPF_F_NO_COMMON_LRU (1U << 1)
/* Specify numa node during map creation */
#define BPF_F_NUMA_NODE (1U << 2)
+/* Enable memory accounting for map */
+#define BPF_F_ACCOUNT_MEM (1U << 7)
#define BPF_OBJ_NAME_LEN 16U
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 25632a75d630..86417f2e6f1b 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -34,14 +34,17 @@ static void bpf_array_free_percpu(struct bpf_array *array)
}
}
-static int bpf_array_alloc_percpu(struct bpf_array *array)
+static int bpf_array_alloc_percpu(struct bpf_array *array, bool account_mem)
{
void __percpu *ptr;
+ gfp_t gfp = GFP_USER | __GFP_NOWARN;
int i;
+ if (account_mem)
+ gfp |= __GFP_ACCOUNT;
+
for (i = 0; i < array->map.max_entries; i++) {
- ptr = __alloc_percpu_gfp(array->elem_size, 8,
- GFP_USER | __GFP_NOWARN);
+ ptr = __alloc_percpu_gfp(array->elem_size, 8, gfp);
if (!ptr) {
bpf_array_free_percpu(array);
return -ENOMEM;
@@ -82,6 +85,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
u32 elem_size, index_mask, max_entries;
bool unpriv = !capable(CAP_SYS_ADMIN);
u64 cost, array_size, mask64;
+ bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
struct bpf_array *array;
elem_size = round_up(attr->value_size, 8);
@@ -129,7 +133,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
return ERR_PTR(ret);
/* allocate all map elements and zero-initialize them */
- array = bpf_map_area_alloc(array_size, numa_node);
+ array = bpf_map_area_alloc(array_size, numa_node, account_mem);
if (!array)
return ERR_PTR(-ENOMEM);
array->index_mask = index_mask;
@@ -140,7 +144,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
array->map.pages = cost;
array->elem_size = elem_size;
- if (percpu && bpf_array_alloc_percpu(array)) {
+ if (percpu && bpf_array_alloc_percpu(array, account_mem)) {
bpf_map_area_free(array);
return ERR_PTR(-ENOMEM);
}
diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
index e6ef4401a138..4d58537e0af2 100644
--- a/kernel/bpf/bpf_lru_list.c
+++ b/kernel/bpf/bpf_lru_list.c
@@ -7,6 +7,7 @@
#include <linux/cpumask.h>
#include <linux/spinlock.h>
#include <linux/percpu.h>
+#include <linux/gfp.h>
#include "bpf_lru_list.h"
@@ -646,12 +647,17 @@ static void bpf_lru_list_init(struct bpf_lru_list *l)
}
int bpf_lru_init(struct bpf_lru *lru, bool percpu, u32 hash_offset,
+ bool account_mem,
del_from_htab_func del_from_htab, void *del_arg)
{
+ gfp_t gfp = GFP_KERNEL;
int cpu;
+ if (account_mem)
+ gfp |= __GFP_ACCOUNT;
+
if (percpu) {
- lru->percpu_lru = alloc_percpu(struct bpf_lru_list);
+ lru->percpu_lru = alloc_percpu_gfp(struct bpf_lru_list, gfp);
if (!lru->percpu_lru)
return -ENOMEM;
@@ -665,7 +671,8 @@ int bpf_lru_init(struct bpf_lru *lru, bool percpu, u32 hash_offset,
} else {
struct bpf_common_lru *clru = &lru->common_lru;
- clru->local_list = alloc_percpu(struct bpf_lru_locallist);
+ clru->local_list = alloc_percpu_gfp(struct bpf_lru_locallist,
+ gfp);
if (!clru->local_list)
return -ENOMEM;
diff --git a/kernel/bpf/bpf_lru_list.h b/kernel/bpf/bpf_lru_list.h
index 7d4f89b7cb84..89566665592b 100644
--- a/kernel/bpf/bpf_lru_list.h
+++ b/kernel/bpf/bpf_lru_list.h
@@ -74,6 +74,7 @@ static inline void bpf_lru_node_set_ref(struct bpf_lru_node *node)
}
int bpf_lru_init(struct bpf_lru *lru, bool percpu, u32 hash_offset,
+ bool account_mem,
del_from_htab_func del_from_htab, void *delete_arg);
void bpf_lru_populate(struct bpf_lru *lru, void *buf, u32 node_offset,
u32 elem_size, u32 nr_elems);
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 8974b3755670..1e84bf78716e 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -81,6 +81,8 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
struct bpf_cpu_map *cmap;
int err = -ENOMEM;
u64 cost;
+ bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
+ gfp_t gfp = GFP_KERNEL;
int ret;
if (!capable(CAP_SYS_ADMIN))
@@ -117,16 +119,20 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
goto free_cmap;
}
+ if (account_mem)
+ gfp |= __GFP_ACCOUNT;
/* A per cpu bitfield with a bit per possible CPU in map */
- cmap->flush_needed = __alloc_percpu(cpu_map_bitmap_size(attr),
- __alignof__(unsigned long));
+ cmap->flush_needed = __alloc_percpu_gfp(cpu_map_bitmap_size(attr),
+ __alignof__(unsigned long),
+ gfp);
if (!cmap->flush_needed)
goto free_cmap;
/* Alloc array for possible remote "destination" CPUs */
cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
sizeof(struct bpf_cpu_map_entry *),
- cmap->map.numa_node);
+ cmap->map.numa_node,
+ account_mem);
if (!cmap->cpu_map)
goto free_percpu;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 191b79948424..acfc1b35aa51 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -90,6 +90,8 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
struct bpf_dtab *dtab;
int err = -EINVAL;
u64 cost;
+ bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
+ gfp_t gfp;
if (!capable(CAP_NET_ADMIN))
return ERR_PTR(-EPERM);
@@ -120,16 +122,20 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
err = -ENOMEM;
+ gfp = GFP_KERNEL | __GFP_NOWARN;
+ if (account_mem)
+ gfp |= __GFP_ACCOUNT;
/* A per cpu bitfield with a bit per possible net device */
dtab->flush_needed = __alloc_percpu_gfp(dev_map_bitmap_size(attr),
__alignof__(unsigned long),
- GFP_KERNEL | __GFP_NOWARN);
+ gfp);
if (!dtab->flush_needed)
goto free_dtab;
dtab->netdev_map = bpf_map_area_alloc(dtab->map.max_entries *
sizeof(struct bpf_dtab_netdev *),
- dtab->map.numa_node);
+ dtab->map.numa_node,
+ account_mem);
if (!dtab->netdev_map)
goto free_dtab;
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 4b7c76765d9d..fc2f44451256 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -23,6 +23,7 @@
#define HTAB_CREATE_FLAG_MASK \
(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \
+ BPF_F_ACCOUNT_MEM | \
BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_ZERO_SEED)
struct bucket {
@@ -139,27 +140,32 @@ static struct htab_elem *prealloc_lru_pop(struct bpf_htab *htab, void *key,
return NULL;
}
-static int prealloc_init(struct bpf_htab *htab)
+static int prealloc_init(struct bpf_htab *htab, bool account_mem)
{
u32 num_entries = htab->map.max_entries;
+ gfp_t gfp = GFP_USER | __GFP_NOWARN;
int err = -ENOMEM, i;
if (!htab_is_percpu(htab) && !htab_is_lru(htab))
num_entries += num_possible_cpus();
htab->elems = bpf_map_area_alloc(htab->elem_size * num_entries,
- htab->map.numa_node);
+ htab->map.numa_node,
+ account_mem);
if (!htab->elems)
return -ENOMEM;
if (!htab_is_percpu(htab))
goto skip_percpu_elems;
+ if (account_mem)
+ gfp |= __GFP_ACCOUNT;
+
for (i = 0; i < num_entries; i++) {
u32 size = round_up(htab->map.value_size, 8);
void __percpu *pptr;
- pptr = __alloc_percpu_gfp(size, 8, GFP_USER | __GFP_NOWARN);
+ pptr = __alloc_percpu_gfp(size, 8, gfp);
if (!pptr)
goto free_elems;
htab_elem_set_ptr(get_htab_elem(htab, i), htab->map.key_size,
@@ -173,6 +179,7 @@ static int prealloc_init(struct bpf_htab *htab)
htab->map.map_flags & BPF_F_NO_COMMON_LRU,
offsetof(struct htab_elem, hash) -
offsetof(struct htab_elem, lru_node),
+ account_mem,
htab_lru_map_delete_node,
htab);
else
@@ -313,6 +320,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
*/
bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
+ bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
struct bpf_htab *htab;
int err, i;
u64 cost;
@@ -374,7 +382,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
err = -ENOMEM;
htab->buckets = bpf_map_area_alloc(htab->n_buckets *
sizeof(struct bucket),
- htab->map.numa_node);
+ htab->map.numa_node,
+ account_mem);
if (!htab->buckets)
goto free_htab;
@@ -389,7 +398,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
}
if (prealloc) {
- err = prealloc_init(htab);
+ err = prealloc_init(htab, account_mem);
if (err)
goto free_buckets;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index abf1002080df..8421fdb816f3 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -277,16 +277,19 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
}
static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
- const void *value)
+ const void *value,
+ bool account_mem)
{
struct lpm_trie_node *node;
size_t size = sizeof(struct lpm_trie_node) + trie->data_size;
+ gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
if (value)
size += trie->map.value_size;
- node = kmalloc_node(size, GFP_ATOMIC | __GFP_NOWARN,
- trie->map.numa_node);
+ if (account_mem)
+ gfp |= __GFP_ACCOUNT;
+ node = kmalloc_node(size, gfp, trie->map.numa_node);
if (!node)
return NULL;
@@ -327,7 +330,7 @@ static int trie_update_elem(struct bpf_map *map,
goto out;
}
- new_node = lpm_trie_node_alloc(trie, value);
+ new_node = lpm_trie_node_alloc(trie, value, map->account_mem);
if (!new_node) {
ret = -ENOMEM;
goto out;
@@ -394,7 +397,7 @@ static int trie_update_elem(struct bpf_map *map,
goto out;
}
- im_node = lpm_trie_node_alloc(trie, NULL);
+ im_node = lpm_trie_node_alloc(trie, NULL, map->account_mem);
if (!im_node) {
ret = -ENOMEM;
goto out;
@@ -542,6 +545,8 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
static struct bpf_map *trie_alloc(union bpf_attr *attr)
{
struct lpm_trie *trie;
+ bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
+ gfp_t gfp = GFP_USER | __GFP_NOWARN;
u64 cost = sizeof(*trie), cost_per_node;
int ret;
@@ -558,7 +563,9 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
attr->value_size > LPM_VAL_SIZE_MAX)
return ERR_PTR(-EINVAL);
- trie = kzalloc(sizeof(*trie), GFP_USER | __GFP_NOWARN);
+ if (account_mem)
+ gfp |= __GFP_ACCOUNT;
+ trie = kzalloc(sizeof(*trie), gfp);
if (!trie)
return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index b384ea9f3254..040ec350af3d 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -11,7 +11,7 @@
#include "percpu_freelist.h"
#define QUEUE_STACK_CREATE_FLAG_MASK \
- (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_ACCOUNT_MEM)
struct bpf_queue_stack {
@@ -69,6 +69,7 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
int ret, numa_node = bpf_map_attr_numa_node(attr);
struct bpf_queue_stack *qs;
u64 size, queue_size, cost;
+ bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
size = (u64) attr->max_entries + 1;
cost = queue_size = sizeof(*qs) + size * attr->value_size;
@@ -81,7 +82,7 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
if (ret < 0)
return ERR_PTR(ret);
- qs = bpf_map_area_alloc(queue_size, numa_node);
+ qs = bpf_map_area_alloc(queue_size, numa_node, account_mem);
if (!qs)
return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 18e225de80ff..a9a2709c7507 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -152,6 +152,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
int err, numa_node = bpf_map_attr_numa_node(attr);
struct reuseport_array *array;
u64 cost, array_size;
+ bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
if (!capable(CAP_SYS_ADMIN))
return ERR_PTR(-EPERM);
@@ -170,7 +171,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
return ERR_PTR(err);
/* allocate all map elements and zero-initialize them */
- array = bpf_map_area_alloc(array_size, numa_node);
+ array = bpf_map_area_alloc(array_size, numa_node, account_mem);
if (!array)
return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index d43b14535827..46d37c7e09a2 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -61,13 +61,15 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}
-static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
+static int prealloc_elems_and_freelist(struct bpf_stack_map *smap,
+ bool account_mem)
{
u32 elem_size = sizeof(struct stack_map_bucket) + smap->map.value_size;
int err;
smap->elems = bpf_map_area_alloc(elem_size * smap->map.max_entries,
- smap->map.numa_node);
+ smap->map.numa_node,
+ account_mem);
if (!smap->elems)
return -ENOMEM;
@@ -90,6 +92,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
u32 value_size = attr->value_size;
struct bpf_stack_map *smap;
u64 cost, n_buckets;
+ bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
int err;
if (!capable(CAP_SYS_ADMIN))
@@ -119,7 +122,8 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
if (cost >= U32_MAX - PAGE_SIZE)
return ERR_PTR(-E2BIG);
- smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr));
+ smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr),
+ account_mem);
if (!smap)
return ERR_PTR(-ENOMEM);
@@ -141,7 +145,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
if (err)
goto free_smap;
- err = prealloc_elems_and_freelist(smap);
+ err = prealloc_elems_and_freelist(smap, account_mem);
if (err)
goto put_buffers;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b155cd17c1bd..13f2e1731a47 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -131,25 +131,29 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
return map;
map->ops = ops;
map->map_type = type;
+ map->account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
return map;
}
-void *bpf_map_area_alloc(size_t size, int numa_node)
+void *bpf_map_area_alloc(size_t size, int numa_node, bool account_mem)
{
/* We definitely need __GFP_NORETRY, so OOM killer doesn't
* trigger under memory pressure as we really just want to
* fail instead.
*/
- const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
+ gfp_t gfp = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
void *area;
+ if (account_mem)
+ gfp |= __GFP_ACCOUNT;
+
if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
- area = kmalloc_node(size, GFP_USER | flags, numa_node);
+ area = kmalloc_node(size, GFP_USER | gfp, numa_node);
if (area != NULL)
return area;
}
- return __vmalloc_node_flags_caller(size, numa_node, GFP_KERNEL | flags,
+ return __vmalloc_node_flags_caller(size, numa_node, GFP_KERNEL | gfp,
__builtin_return_address(0));
}
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index 686d244e798d..bbc1f142326f 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -20,6 +20,8 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
int cpu, err = -EINVAL;
struct xsk_map *m;
u64 cost;
+ bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
+ gfp_t gfp = GFP_KERNEL;
if (!capable(CAP_NET_ADMIN))
return ERR_PTR(-EPERM);
@@ -49,7 +51,9 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
err = -ENOMEM;
- m->flush_list = alloc_percpu(struct list_head);
+ if (account_mem)
+ gfp |= __GFP_ACCOUNT;
+ m->flush_list = alloc_percpu_gfp(struct list_head, gfp);
if (!m->flush_list)
goto free_m;
@@ -58,7 +62,8 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
m->xsk_map = bpf_map_area_alloc(m->map.max_entries *
sizeof(struct xdp_sock *),
- m->map.numa_node);
+ m->map.numa_node,
+ account_mem);
if (!m->xsk_map)
goto free_percpu;
return &m->map;
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index be6092ac69f8..eefcfd1294c0 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -18,13 +18,15 @@ struct bpf_stab {
raw_spinlock_t lock;
};
-#define SOCK_CREATE_FLAG_MASK \
- (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+#define SOCK_CREATE_FLAG_MASK \
+ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY | \
+ BPF_F_ACCOUNT_MEM)
static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
{
struct bpf_stab *stab;
u64 cost;
+ bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
int err;
if (!capable(CAP_NET_ADMIN))
@@ -56,7 +58,8 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
stab->sks = bpf_map_area_alloc(stab->map.max_entries *
sizeof(struct sock *),
- stab->map.numa_node);
+ stab->map.numa_node,
+ account_mem);
if (stab->sks)
return &stab->map;
err = -ENOMEM;
@@ -788,6 +791,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
struct bpf_htab *htab;
int i, err;
u64 cost;
+ bool account = (attr->map_flags & BPF_F_ACCOUNT_MEM);
if (!capable(CAP_NET_ADMIN))
return ERR_PTR(-EPERM);
@@ -823,7 +827,8 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
htab->buckets = bpf_map_area_alloc(htab->buckets_num *
sizeof(struct bpf_htab_bucket),
- htab->map.numa_node);
+ htab->map.numa_node,
+ account);
if (!htab->buckets) {
err = -ENOMEM;
goto free_htab;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 91c43884f295..dc490e3fdce3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -278,6 +278,9 @@ enum bpf_attach_type {
#define BPF_F_NO_COMMON_LRU (1U << 1)
/* Specify numa node during map creation */
#define BPF_F_NUMA_NODE (1U << 2)
+/* Enable memory accounting for map */
+#define BPF_F_ACCOUNT_MEM (1U << 7)
+
#define BPF_OBJ_NAME_LEN 16U
--
2.20.1
^ permalink raw reply related
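The pattern repeated throughout the bpf patch above, and the check used in its commit message, can be sketched in userspace as follows. This is only an illustration: the DEMO_GFP_* values are placeholders rather than the kernel's real GFP bits, gfp_t is redefined locally as a stand-in, and both helper names are hypothetical. Only BPF_F_ACCOUNT_MEM (1U << 7) is taken from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

typedef unsigned int gfp_t;		/* local stand-in for the kernel type */

#define DEMO_GFP_USER     0x1u		/* placeholder value, not the kernel's */
#define DEMO_GFP_NOWARN   0x2u		/* placeholder value, not the kernel's */
#define DEMO_GFP_ACCOUNT  0x4u		/* placeholder value, not the kernel's */
#define BPF_F_ACCOUNT_MEM (1U << 7)	/* flag value proposed by the patch */

/* Hypothetical helper: add the accounting bit only when the map asks for it. */
static gfp_t demo_map_gfp(unsigned int map_flags)
{
	gfp_t gfp = DEMO_GFP_USER | DEMO_GFP_NOWARN;

	if (map_flags & BPF_F_ACCOUNT_MEM)
		gfp |= DEMO_GFP_ACCOUNT;
	return gfp;
}

/*
 * Hypothetical helper: the map counts as accounted when the cgroup usage
 * grew by at least the memlock cost reported by bpftool.
 */
static bool demo_map_is_accounted(uint64_t usage_before, uint64_t usage_after,
				  uint64_t memlock)
{
	return usage_after >= usage_before + memlock;
}
```

Plugging in the numbers from the commit message: demo_map_is_accounted(397312, 401408, 5771264) is false for the first run, and demo_map_is_accounted(450560, 6221824, 5771264) is true for the run with the flag set. The "flags 0x80" shown by bpftool in the second run is the same value as 1U << 7.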
* Re: Kernel memory corruption in CIPSO labeled TCP packets processing.
From: Nazarov Sergey @ 2019-01-30 13:11 UTC (permalink / raw)
To: Paul Moore
Cc: linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
netdev@vger.kernel.org, Casey Schaufler
In-Reply-To: <CAHC9VhRXc=UAfNidPjYsG561_kc_j2CkD6Db=Kb=Gjyhk3ko6Q@mail.gmail.com>
30.01.2019, 01:42, "Paul Moore" <paul@paul-moore.com>:
> There are several cases where the stack ends up calling icmp_send()
> after the skb has been through ip_options_compile(), that should be
> okay.
>
> --
> paul moore
> www.paul-moore.com
In those cases a precompiled ip_options struct is used, so there is no need to call ip_options_compile again.
I think that for an error ICMP packet we can discard all options except CIPSO. That is better than
sending a packet that contains wrong option data. Modified patch 2:
---
net/ipv4/cipso_ipv4.c | 24 ++++++++++++++++++++++--
1 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 777fa3b..797826c 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1735,13 +1735,33 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
*/
void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
{
+ struct ip_options opt;
+ unsigned char *optptr;
+
if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES)
return;
+ /*
+ * We might be called above the IP layer,
+ * so we can not use icmp_send and IPCB here.
+ *
+ * For the generated ICMP packet, we create a
+ * temporary ip_options structure that contains
+ * only the CIPSO option, since the other option data
+ * may have been modified while receiving the original packet.
+ */
+
+ memset(&opt, 0, sizeof(struct ip_options));
+ optptr = cipso_v4_optptr(skb);
+ if (optptr) {
+ opt.optlen = optptr[1];
+ opt.cipso = optptr - skb_network_header(skb);
+ }
+
if (gateway)
- icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0);
+ __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, &opt);
else
- icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0);
+ __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, &opt);
}
/**
^ permalink raw reply related
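To illustrate what the modified patch above puts into the temporary options structure, here is a small userspace sketch that walks the options of a raw IPv4 header, locates the CIPSO option (type 134) and records its length and its offset from the start of the header. It is not the kernel's cipso_v4_optptr(): the struct, the function names and the extra bounds checks are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_IPOPT_END   0	/* end of option list */
#define DEMO_IPOPT_NOOP  1	/* single-byte padding option */
#define DEMO_IPOPT_CIPSO 134	/* CIPSO option type */

/* Hypothetical stand-in for the two ip_options fields the patch fills in. */
struct demo_ip_options {
	uint8_t optlen;	/* length of the CIPSO option in bytes */
	uint8_t cipso;	/* offset of the CIPSO option from the header start */
};

/* Return a pointer to the CIPSO option inside the header, or NULL. */
static const uint8_t *demo_cipso_optptr(const uint8_t *iph, size_t buf_len)
{
	size_t hdr_len, off = 20;	/* options start after the fixed header */

	if (buf_len < 20)
		return NULL;
	hdr_len = (size_t)(iph[0] & 0x0f) * 4;	/* IHL is in 32-bit words */
	if (hdr_len < 20 || hdr_len > buf_len)
		return NULL;

	while (off < hdr_len) {
		uint8_t type = iph[off];
		size_t len;

		if (type == DEMO_IPOPT_END)
			return NULL;
		if (type == DEMO_IPOPT_NOOP) {
			off++;
			continue;
		}
		if (off + 1 >= hdr_len)
			return NULL;	/* no room for the length byte */
		len = iph[off + 1];
		if (len < 2 || off + len > hdr_len)
			return NULL;	/* malformed option length */
		if (type == DEMO_IPOPT_CIPSO)
			return iph + off;
		off += len;
	}
	return NULL;
}

/* Build an options summary that keeps only CIPSO, as the patch proposes. */
static struct demo_ip_options demo_build_opt(const uint8_t *iph, size_t buf_len)
{
	struct demo_ip_options opt = { 0, 0 };
	const uint8_t *optptr = demo_cipso_optptr(iph, buf_len);

	if (optptr) {
		opt.optlen = optptr[1];
		opt.cipso = (uint8_t)(optptr - iph);
	}
	return opt;
}
```

For a 32-byte header whose options are NOOP, NOOP, then a 10-byte CIPSO option, demo_build_opt() reports offset 22 and length 10. When no CIPSO option is present, both fields stay zero, which mirrors the memset() in the patch.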
* [PATCH net] ipvlan, l3mdev: fix broken l3s mode wrt local routes
From: Daniel Borkmann @ 2019-01-30 11:49 UTC (permalink / raw)
To: davem
Cc: netdev, Daniel Borkmann, Mahesh Bandewar, David Ahern,
Florian Westphal, Martynas Pumputis
While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
I ran into the issue that while l3 mode is working fine, l3s mode
does not have any connectivity to kube-apiserver and hence all pods
end up in Error state as well. The ipvlan master device sits on
top of a bond device and hostns traffic to kube-apiserver (also running
in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
where the latter is the address of the bond0. While in l3 mode, a
curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
works fine from hostns, neither of them do in case of l3s. In the
latter only a curl to https://127.0.0.1:37573 appeared to work where
for local addresses of bond0 I saw kernel suddenly starting to emit
ARP requests to query HW address of bond0 which remained unanswered
and neighbor entries in INCOMPLETE state. These ARP requests only
happen while in l3s.
Debugging this further, I found the issue is that l3s mode is piggy-
backing on l3 master device, and in this case local routes are using
l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev
if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be
a loopback"). I found that reverting them back into using the
net->loopback_dev fixed ipvlan l3s connectivity and got everything
working for the CNI.
Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the
l3mdev paper in [0], the sole reason why ipvlan l3s is relying
on l3 master device is to get the l3mdev_ip_rcv() receive hook for
setting the dst entry of the input route without adding its own
ipvlan specific hacks into the receive path, however, any l3 domain
semantics beyond just that are breaking l3s operation. Note that
ipvlan also has the ability to dynamically switch its internal
operation from l3 to l3s for all ports via ipvlan_set_port_mode()
at runtime. In any case, l3 vs l3s solely distinguishes itself by
'de-confusing' netfilter through switching skb->dev to ipvlan slave
device late in NF_INET_LOCAL_IN before handing the skb to L4.
The minimal fix taken here is to add an IFF_L3MDEV_RX_HANDLER flag which,
if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
without any additional l3mdev semantics on top. This should also have
minimal impact since dev->priv_flags is already hot in cache. With
this set, l3s mode is working fine and I also get things like
masquerading pod traffic on the ipvlan master properly working.
[0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback")
Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Martynas Pumputis <m@lambda.lt>
---
drivers/net/ipvlan/ipvlan_main.c | 6 +++---
include/linux/netdevice.h | 8 ++++++++
include/net/l3mdev.h | 3 ++-
3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 19bdde60680c..7cdac77d0c68 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -100,12 +100,12 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval,
err = ipvlan_register_nf_hook(read_pnet(&port->pnet));
if (!err) {
mdev->l3mdev_ops = &ipvl_l3mdev_ops;
- mdev->priv_flags |= IFF_L3MDEV_MASTER;
+ mdev->priv_flags |= IFF_L3MDEV_RX_HANDLER;
} else
goto fail;
} else if (port->mode == IPVLAN_MODE_L3S) {
/* Old mode was L3S */
- mdev->priv_flags &= ~IFF_L3MDEV_MASTER;
+ mdev->priv_flags &= ~IFF_L3MDEV_RX_HANDLER;
ipvlan_unregister_nf_hook(read_pnet(&port->pnet));
mdev->l3mdev_ops = NULL;
}
@@ -167,7 +167,7 @@ static void ipvlan_port_destroy(struct net_device *dev)
struct sk_buff *skb;
if (port->mode == IPVLAN_MODE_L3S) {
- dev->priv_flags &= ~IFF_L3MDEV_MASTER;
+ dev->priv_flags &= ~IFF_L3MDEV_RX_HANDLER;
ipvlan_unregister_nf_hook(dev_net(dev));
dev->l3mdev_ops = NULL;
}
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1377d085ef99..86dbb3e29139 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1483,6 +1483,7 @@ struct net_device_ops {
* @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
* @IFF_FAILOVER: device is a failover master device
* @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
+ * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device
*/
enum netdev_priv_flags {
IFF_802_1Q_VLAN = 1<<0,
@@ -1514,6 +1515,7 @@ enum netdev_priv_flags {
IFF_NO_RX_HANDLER = 1<<26,
IFF_FAILOVER = 1<<27,
IFF_FAILOVER_SLAVE = 1<<28,
+ IFF_L3MDEV_RX_HANDLER = 1<<29,
};
#define IFF_802_1Q_VLAN IFF_802_1Q_VLAN
@@ -1544,6 +1546,7 @@ enum netdev_priv_flags {
#define IFF_NO_RX_HANDLER IFF_NO_RX_HANDLER
#define IFF_FAILOVER IFF_FAILOVER
#define IFF_FAILOVER_SLAVE IFF_FAILOVER_SLAVE
+#define IFF_L3MDEV_RX_HANDLER IFF_L3MDEV_RX_HANDLER
/**
* struct net_device - The DEVICE structure.
@@ -4549,6 +4552,11 @@ static inline bool netif_supports_nofcs(struct net_device *dev)
return dev->priv_flags & IFF_SUPP_NOFCS;
}
+static inline bool netif_has_l3_rx_handler(const struct net_device *dev)
+{
+ return dev->priv_flags & IFF_L3MDEV_RX_HANDLER;
+}
+
static inline bool netif_is_l3_master(const struct net_device *dev)
{
return dev->priv_flags & IFF_L3MDEV_MASTER;
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 78fa0ac4613c..5175fd63cd82 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -153,7 +153,8 @@ struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
if (netif_is_l3_slave(skb->dev))
master = netdev_master_upper_dev_get_rcu(skb->dev);
- else if (netif_is_l3_master(skb->dev))
+ else if (netif_is_l3_master(skb->dev) ||
+ netif_has_l3_rx_handler(skb->dev))
master = skb->dev;
if (master && master->l3mdev_ops->l3mdev_l3_rcv)
--
2.17.1
^ permalink raw reply related
* Re: [PATCH] esp: Skip TX bytes accounting when sending from a request socket
From: Steffen Klassert @ 2019-01-30 11:33 UTC (permalink / raw)
To: Martin Willi; +Cc: netdev
In-Reply-To: <20190128083535.18006-1-martin@strongswan.org>
On Mon, Jan 28, 2019 at 09:35:35AM +0100, Martin Willi wrote:
> On ESP output, sk_wmem_alloc is incremented for the added padding if a
> socket is associated to the skb. When replying with TCP SYNACKs over
> IPsec, the associated sk is a casted request socket, only. Increasing
> sk_wmem_alloc on a request socket results in a write at an arbitrary
> struct offset. In the best case, this produces the following WARNING:
>
> WARNING: CPU: 1 PID: 0 at lib/refcount.c:102 esp_output_head+0x2e4/0x308 [esp4]
> refcount_t: addition on 0; use-after-free.
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc3 #2
> Hardware name: Marvell Armada 380/385 (Device Tree)
> [...]
> [<bf0ff354>] (esp_output_head [esp4]) from [<bf1006a4>] (esp_output+0xb8/0x180 [esp4])
> [<bf1006a4>] (esp_output [esp4]) from [<c05dee64>] (xfrm_output_resume+0x558/0x664)
> [<c05dee64>] (xfrm_output_resume) from [<c05d07b0>] (xfrm4_output+0x44/0xc4)
> [<c05d07b0>] (xfrm4_output) from [<c05956bc>] (tcp_v4_send_synack+0xa8/0xe8)
> [<c05956bc>] (tcp_v4_send_synack) from [<c0586ad8>] (tcp_conn_request+0x7f4/0x948)
> [<c0586ad8>] (tcp_conn_request) from [<c058c404>] (tcp_rcv_state_process+0x2a0/0xe64)
> [<c058c404>] (tcp_rcv_state_process) from [<c05958ac>] (tcp_v4_do_rcv+0xf0/0x1f4)
> [<c05958ac>] (tcp_v4_do_rcv) from [<c0598a4c>] (tcp_v4_rcv+0xdb8/0xe20)
> [<c0598a4c>] (tcp_v4_rcv) from [<c056eb74>] (ip_protocol_deliver_rcu+0x2c/0x2dc)
> [<c056eb74>] (ip_protocol_deliver_rcu) from [<c056ee6c>] (ip_local_deliver_finish+0x48/0x54)
> [<c056ee6c>] (ip_local_deliver_finish) from [<c056eecc>] (ip_local_deliver+0x54/0xec)
> [<c056eecc>] (ip_local_deliver) from [<c056efac>] (ip_rcv+0x48/0xb8)
> [<c056efac>] (ip_rcv) from [<c0519c2c>] (__netif_receive_skb_one_core+0x50/0x6c)
> [...]
>
> The issue triggers only when not using TCP syncookies, as for syncookies
> no socket is associated.
>
> Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
> Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")
> Signed-off-by: Martin Willi <martin@strongswan.org>
Applied, thanks Martin!
^ permalink raw reply
* [PATCH 1/1] netdev/phy: add MDIO bus multiplexer driven by a regmap
From: Pankaj Bansal @ 2019-01-30 11:22 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli; +Cc: netdev@vger.kernel.org, Pankaj Bansal
In-Reply-To: <20190130164644.3948-1-pankaj.bansal@nxp.com>
Add support for an MDIO bus multiplexer controlled by a regmap
device, like an FPGA.
Tested on an NXP LX2160AQDS board which uses the "QIXIS" FPGA
attached to the i2c bus.
Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
---
drivers/net/phy/Makefile | 2 +-
drivers/net/phy/mdio-mux-regmap.c | 170 ++++++++++++++++++++++++++++
include/linux/mdio-mux.h | 20 ++++
3 files changed, 191 insertions(+), 1 deletion(-)
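
For illustration only (not part of the patch): mdio_mux_regmap_switch_fn() in
the diff below relies on a masked read-modify-write of the mux register. A
minimal user-space sketch of that operation, assuming the usual semantics
(only the bits selected by the mask are replaced, and the caller is told
whether the value changed), could look like this. The sketch_* names are
hypothetical and no real regmap is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one FPGA register behind a regmap. */
struct sketch_reg {
	uint32_t value;
};

/*
 * Replace only the bits selected by mask and leave the others untouched.
 * *changed reports whether the stored value actually differed, which is
 * what the driver uses to decide whether to log a mux switch.
 */
void sketch_update_bits_check(struct sketch_reg *reg, uint32_t mask,
			      uint32_t val, bool *changed)
{
	uint32_t old = reg->value;
	uint32_t updated = (old & ~mask) | (val & mask);

	*changed = (updated != old);
	if (*changed)
		reg->value = updated;
}
```

With a register holding 0x45, mask 0xf8 and desired child 0xc8, the low three
bits (0x05) are preserved and the result is 0xcd; repeating the same call
reports no change.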
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index f41b14115fde..16145973a42f 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -25,7 +25,7 @@ obj-$(CONFIG_PHYLIB) += libphy.o
obj-$(CONFIG_MDIO_BCM_IPROC) += mdio-bcm-iproc.o
obj-$(CONFIG_MDIO_BCM_UNIMAC) += mdio-bcm-unimac.o
obj-$(CONFIG_MDIO_BITBANG) += mdio-bitbang.o
-obj-$(CONFIG_MDIO_BUS_MUX) += mdio-mux.o
+obj-$(CONFIG_MDIO_BUS_MUX) += mdio-mux.o mdio-mux-regmap.o
obj-$(CONFIG_MDIO_BUS_MUX_BCM_IPROC) += mdio-mux-bcm-iproc.o
obj-$(CONFIG_MDIO_BUS_MUX_GPIO) += mdio-mux-gpio.o
obj-$(CONFIG_MDIO_BUS_MUX_MMIOREG) += mdio-mux-mmioreg.o
diff --git a/drivers/net/phy/mdio-mux-regmap.c b/drivers/net/phy/mdio-mux-regmap.c
new file mode 100644
index 000000000000..ca63a44da25f
--- /dev/null
+++ b/drivers/net/phy/mdio-mux-regmap.c
@@ -0,0 +1,170 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+/* Simple regmap based MDIO MUX driver
+ *
+ * Copyright 2018 NXP
+ *
+ * Based on mdio-mux-mmioreg.c by Timur Tabi
+ *
+ * Author:
+ * Pankaj Bansal <pankaj.ban...@nxp.com>
+ */
+
+#include <linux/platform_device.h>
+#include <linux/device.h>
+#include <linux/of_mdio.h>
+#include <linux/module.h>
+#include <linux/phy.h>
+#include <linux/mdio-mux.h>
+#include <linux/regmap.h>
+
+struct mdio_mux_regmap_state {
+ void *mux_handle;
+ struct device *dev;
+ struct regmap *regmap;
+ u32 mux_reg;
+ u32 mask;
+};
+
+/* MDIO multiplexing switch function
+ *
+ * This function is called by the mdio-mux layer when it thinks the mdio bus
+ * multiplexer needs to switch.
+ *
+ * 'current_child' is the current value of the mux register (masked via
+ * s->mask).
+ *
+ * 'desired_child' is the value of the 'reg' property of the target child MDIO
+ * node.
+ *
+ * The first time this function is called, current_child == -1.
+ *
+ * If current_child == desired_child, then the mux is already set to the
+ * correct bus.
+ */
+static int mdio_mux_regmap_switch_fn(int current_child, int desired_child,
+ void *data)
+{
+ struct mdio_mux_regmap_state *s = data;
+ bool change;
+ int ret;
+
+ ret = regmap_update_bits_check(s->regmap,
+ s->mux_reg,
+ s->mask,
+ desired_child,
+ &change);
+
+ if (ret)
+ return ret;
+ if (change)
+ dev_dbg(s->dev, "%s %d -> %d\n", __func__, current_child,
+ desired_child);
+ return ret;
+}
+
+/**
+ * mdio_mux_regmap_init - control MDIO bus muxing using regmap constructs.
+ * @dev: device with which regmap construct is associated.
+ * @mux_node: mdio bus mux node that contains parent mdio bus phandle.
+ * This node also contains sub nodes, where each subnode denotes
+ * a child mdio bus. All the child mdio buses are muxed, i.e. at a
+ * time only one of the child mdio buses can be used.
+ * @data: to store the address of data allocated by this function
+ */
+int mdio_mux_regmap_init(struct device *dev,
+ struct device_node *mux_node,
+ void **data)
+{
+ struct device_node *child;
+ struct mdio_mux_regmap_state *s;
+ int ret;
+ u32 val;
+
+ dev_dbg(dev, "probing node %pOF\n", mux_node);
+
+ s = devm_kzalloc(dev, sizeof(*s), GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
+
+ s->regmap = dev_get_regmap(dev, NULL);
+ if (!s->regmap) { /* dev_get_regmap() returns NULL, not an ERR_PTR */
+ dev_err(dev, "Failed to get parent regmap\n");
+ return -ENODEV;
+ }
+
+ ret = of_property_read_u32(mux_node, "reg", &s->mux_reg);
+ if (ret) {
+ dev_err(dev, "missing or invalid reg property\n");
+ return -ENODEV;
+ }
+
+ /* Test Register read write */
+ ret = regmap_read(s->regmap, s->mux_reg, &val);
+ if (ret) {
+ dev_err(dev, "error while reading reg\n");
+ return ret;
+ }
+
+ ret = regmap_write(s->regmap, s->mux_reg, val);
+ if (ret) {
+ dev_err(dev, "error while writing reg\n");
+ return ret;
+ }
+
+ ret = of_property_read_u32(mux_node, "mux-mask", &s->mask);
+ if (ret) {
+ dev_err(dev, "missing or invalid mux-mask property\n");
+ return -ENODEV;
+ }
+
+ /* Verify that the 'reg' property of each child MDIO bus does not
+ * set any bits outside of the 'mask'.
+ */
+ for_each_available_child_of_node(mux_node, child) {
+ ret = of_property_read_u32(child, "reg", &val);
+ if (ret) {
+ dev_err(dev, "mdio-mux child node %pOF is missing a 'reg' property\n", child);
+ of_node_put(child);
+ return -ENODEV;
+ }
+ if (val & ~s->mask) {
+ dev_err(dev, "mdio-mux child node %pOF has a 'reg' value with unmasked bits\n", child);
+ of_node_put(child);
+ return -ENODEV;
+ }
+ }
+
+ ret = mdio_mux_init(dev, mux_node, mdio_mux_regmap_switch_fn,
+ &s->mux_handle, s, NULL);
+ if (ret) {
+ if (ret != -EPROBE_DEFER)
+ dev_err(dev, "failed to register mdio-mux bus %pOF\n", mux_node);
+ return ret;
+ }
+
+ *data = s;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mdio_mux_regmap_init);
+
+/**
+ * mdio_mux_regmap_uninit - relinquish the control of MDIO bus muxing using
+ * regmap constructs.
+ * @data: address of data allocated by mdio_mux_regmap_init
+ */
+int mdio_mux_regmap_uninit(void *data)
+{
+ struct mdio_mux_regmap_state *s = data;
+
+ mdio_mux_uninit(s->mux_handle);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mdio_mux_regmap_uninit);
+
+MODULE_AUTHOR("Pankaj Bansal <pankaj.bansal@nxp.com>");
+MODULE_DESCRIPTION("regmap based MDIO MUX driver");
+MODULE_LICENSE("GPL v2");
+
diff --git a/include/linux/mdio-mux.h b/include/linux/mdio-mux.h
index a5d58f221939..231cfa3ba429 100644
--- a/include/linux/mdio-mux.h
+++ b/include/linux/mdio-mux.h
@@ -29,4 +29,24 @@ int mdio_mux_init(struct device *dev,
void mdio_mux_uninit(void *mux_handle);
+/**
+ * mdio_mux_regmap_init - control MDIO bus muxing using regmap constructs.
+ * @dev: device with which regmap construct is associated.
+ * @mux_node: mdio bus mux node that contains parent mdio bus phandle.
+ * This node also contains sub nodes, where each subnode denotes
+ * a child mdio bus. All the child mdio buses are muxed, i.e. at a
+ * time only one of the child mdio buses can be used.
+ * @data: to store the address of data allocated by this function
+ */
+int mdio_mux_regmap_init(struct device *dev,
+ struct device_node *mux_node,
+ void **data);
+
+/**
+ * mdio_mux_regmap_uninit - relinquish the control of MDIO bus muxing using
+ * regmap constructs.
+ * @data: address of data allocated by mdio_mux_regmap_init
+ */
+int mdio_mux_regmap_uninit(void *data);
+
#endif /* __LINUX_MDIO_MUX_H */
--
2.17.1
^ permalink raw reply related
* [PATCH 0/1] add MDIO bus multiplexer driven by a regmap device
From: Pankaj Bansal @ 2019-01-30 11:21 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli
Cc: netdev@vger.kernel.org, Pankaj Bansal, Varun Sethi
Add support for an MDIO bus multiplexer controlled by a regmap device, like an
FPGA.
These APIs are an extension of the existing driver
drivers/net/phy/mdio-mux-mmioreg.c.
The problem with the mmioreg driver is that it can operate only on
memory-mapped devices.
If the device that controls MDIO muxing is itself accessed over I2C or SPI,
the mmioreg driver will not work.
Therefore, this patch adds APIs that a regmap device can use to control the
MDIO mux.
Tested on an NXP LX2160AQDS board which uses the "QIXIS" FPGA attached to the
i2c bus.
This is my second attempt at this.
In my previous approach I wrote a separate driver for the regmap APIs. But then
I realized that it is not meant to control a specific device.
It is meant to control some registers of the parent device. Therefore, IMO this
should not be a platform driver and there should not be any "compatible"
property with which this driver is associated.
The previous approach patches and discussion can be accessed here:
https://www.mail-archive.com/netdev@vger.kernel.org/msg252744.html
Cc: Varun Sethi <V.Sethi@nxp.com>
Pankaj Bansal (1):
netdev/phy: add MDIO bus multiplexer driven by a regmap
drivers/net/phy/Makefile | 2 +-
drivers/net/phy/mdio-mux-regmap.c | 170 ++++++++++++++++++++++++++++++
include/linux/mdio-mux.h | 20 ++++
3 files changed, 191 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/phy/mdio-mux-regmap.c
--
2.17.1
^ permalink raw reply
* Re: [PATCH iproute2-next] Introduce ip-brctl shell script
From: Stefano Brivio @ 2019-01-30 10:55 UTC (permalink / raw)
To: David Ahern
Cc: Phil Sutter, Eric Garver, Tomas Dolezal, Stephen Hemminger,
Lennert Buytenhek, netdev
In-Reply-To: <8b930eb8-6ffe-d03f-02a3-12a332edbba3@gmail.com>
Hi David,
On Tue, 29 Jan 2019 21:51:24 -0700
David Ahern <dsahern@gmail.com> wrote:
> On 1/18/19 10:00 AM, Stefano Brivio wrote:
> > This script wraps 'ip' and 'bridge' tools to provide a drop-in replacement
> > of the standalone 'brctl' utility.
> >
> > It's bug-to-bug compatible with brctl as of bridge-utils version 1.6,
> > has no dependencies other than a POSIX shell, and it's less than half
> > the binary size of brctl on x86_64.
> >
> > As many users (including myself) seem to find brctl usage vastly more
> > intuitive than ip-link, possibly due to habit, this might be a lightweight
> > approach to provide brctl syntax without the need to maintain bridge-utils
> > any longer.
> >
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > Acked-by: Phil Sutter <phil@nwl.cc>
> > ---
> > man/man8/Makefile | 5 +-
> > man/man8/ip-brctl.8 | 187 +++++++++++++++
> > misc/Makefile | 9 +-
> > misc/ip-brctl.in | 572 ++++++++++++++++++++++++++++++++++++++++++++
> > 4 files changed, 770 insertions(+), 3 deletions(-)
> > create mode 100644 man/man8/ip-brctl.8
> > create mode 100755 misc/ip-brctl.in
>
> I get your intent, but this seems more appropriate for you / Red Hat to
> carry than something we want to distribute as part of iproute2.
Sure, I could also do that, but:
- me creating another project: similar maintenance burden for
distribution maintainers as keeping bridge-utils around,
for something that won't have any active development
- carrying it in a single distribution downstream: I would have gone
that way if I thought it wouldn't be useful for others. I myself use
(also) distributions other than Fedora/RHEL and this would feel
just... wrong
Why do you think it's not appropriate to distribute this as part of
iproute2? Too ugly? Bloated? Anything I can improve?
I think it would be appropriate because it intimately depends on
ip-link -- it's really nothing more than a helper for iproute2 tools.
--
Stefano
^ permalink raw reply
* Re: [PATCH] rtlwifi: remove set but not used variable 'cmd_seq'
From: Kalle Valo @ 2019-01-30 10:47 UTC (permalink / raw)
To: Pkshih
Cc: yuehaibing@huawei.com, linux-wireless@vger.kernel.org,
netdev@vger.kernel.org
In-Reply-To: <1548811903.17195.1.camel@realtek.com>
Pkshih <pkshih@realtek.com> writes:
> On Tue, 2019-01-29 at 14:03 +0800, YueHaibing wrote:
>> +cc netdev@vger.kernel.org
>>
>> On 2019/1/29 13:57, YueHaibing wrote:
>> > ping...
>> >
>> > On 2018/9/11 20:12, YueHaibing wrote:
>> >> Fixes gcc '-Wunused-but-set-variable' warning:
>> >>
>> >> drivers/net/wireless/realtek/rtlwifi/base.c: In function
>> 'rtl_c2h_content_parsing':
>> >> drivers/net/wireless/realtek/rtlwifi/base.c:2313:13: warning:
>> >> variable 'cmd_seq' set but not used [-Wunused-but-set-variable]
>> >>
>> >> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>
> Thanks for your fix.
>
> Acked-by: Ping-Ke Shih <pkshih@realtek.com>
For some reason I couldn't find the original patch from patchwork.
I did find the version sent today though:
https://patchwork.kernel.org/patch/10787619/
BTW YueHaibing, you can check the linux-wireless patch status yourself from
the patchwork:
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#checking_state_of_patches_from_patchwork
--
Kalle Valo
^ permalink raw reply
* Re: [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
From: Peter Zijlstra @ 2019-01-30 10:37 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: davem, daniel, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130101058.GD2278@hirez.programming.kicks-ass.net>
On Wed, Jan 30, 2019 at 11:10:58AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 29, 2019 at 08:04:57PM -0800, Alexei Starovoitov wrote:
> > Lockdep warns about false positive:
>
> The report reads like:
>
> tracepoint_probe_register()
> #0 mutex_lock(&tracepoint_mutex)
> tracepoint_add_func()
> static_key_slow_inc()
> #1 cpus_read_lock();
>
>
> _cpu_up()
> #1 cpus_write_lock();
> ...
> perf_event_init_cpu()
> #2 mutex_lock(&pmus_lock);
> #3 mutex_lock(&ctx->mutex);
>
>
> perf_ioctl()
> #4 perf_event_ctx_lock();
Sorry, that's #3, and then do s/#5/#4/ on the rest of the text.
> _perf_ioctl(IOC_QUERY_BPF)
> perf_event_query_prog_array()
> #5 mutex_lock(&bpf_event_mutex);
>
>
> bpf_probe_register()
> #5 mutex_lock(&bpf_event_mutex);
> __bpf_probe_register()
> tracepoint_probe_register()
> #0 mutex_lock(&tracepoint_mutex);
>
> Which to me reads like an entirely valid deadlock scenario.
>
> And note that the first and last can be combined to give:
>
> bpf_probe_register()
> #5 mutex_lock(&bpf_event_mutex);
> __bpf_probe_register()
> tracepoint_probe_register()
> #0 mutex_lock(&tracepoint_mutex);
> tracepoint_add_func()
> static_key_slow_inc()
> #1 cpus_read_lock();
>
>
> Which generates a deadlock even without #0.
>
> Why do you say this is not possible? All you need is 3 CPUs, one doing a
> CPU online, one doing a perf ioctl() and one doing that
> bpf_probe_register().
>
>
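
For illustration only: the ordering in the report can be checked mechanically
by treating each "lock A held while taking lock B" as a directed edge and
looking for a cycle. The sketch below is plain user-space C, not lockdep; the
enum names are labels chosen for this example, numbered per the correction
above (#3 is ctx->mutex, #4 is bpf_event_mutex), and CPU_HOTPLUG_LOCK stands
for whatever cpus_read_lock()/cpus_write_lock() take.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NLOCKS 5

enum {
	TRACEPOINT_MUTEX,	/* #0 */
	CPU_HOTPLUG_LOCK,	/* #1 */
	PMUS_LOCK,		/* #2 */
	CTX_MUTEX,		/* #3 */
	BPF_EVENT_MUTEX,	/* #4 */
};

struct lock_graph {
	bool edge[NLOCKS][NLOCKS];	/* edge[a][b]: b taken while a is held */
};

void lock_graph_add(struct lock_graph *g, int held, int acquired)
{
	g->edge[held][acquired] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on stack, 2 = done. */
static bool visit(const struct lock_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NLOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: ordering cycle */
		if (state[next] == 0 && visit(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

bool lock_graph_has_cycle(const struct lock_graph *g)
{
	int state[NLOCKS] = { 0 };

	for (int i = 0; i < NLOCKS; i++)
		if (state[i] == 0 && visit(g, i, state))
			return true;
	return false;
}

/* Edges read off the four call chains in the report. */
void lock_graph_from_report(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
	lock_graph_add(g, TRACEPOINT_MUTEX, CPU_HOTPLUG_LOCK);
	lock_graph_add(g, CPU_HOTPLUG_LOCK, PMUS_LOCK);
	lock_graph_add(g, PMUS_LOCK, CTX_MUTEX);
	lock_graph_add(g, CTX_MUTEX, BPF_EVENT_MUTEX);
	lock_graph_add(g, BPF_EVENT_MUTEX, TRACEPOINT_MUTEX);
}
```

With all five edges present the check reports a cycle; dropping any one of
them (for example bpf_event_mutex -> tracepoint_mutex) breaks it.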
^ permalink raw reply
* RE: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Pankaj Bansal @ 2019-01-30 10:37 UTC (permalink / raw)
To: Pankaj Bansal, Shawn Guo, Leo Li, Andrew Lunn, Florian Fainelli
Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <20181115173752.22482-1-pankaj.bansal@nxp.com>
Hi Shawn/Leo,
Can you please review this patch and include it in your tree?
Regards,
Pankaj Bansal
-----Original Message-----
From: Pankaj Bansal [mailto:pankaj.bansal@nxp.com]
Sent: Thursday, 15 November, 2018 05:42 PM
To: Shawn Guo <shawnguo@kernel.org>; Leo Li <leoyang.li@nxp.com>; Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>
Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Pankaj Bansal <pankaj.bansal@nxp.com>
Subject: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
The two external MDIO buses, which are used to communicate with PHY devices external to the SoC, are muxed on the LX2160AQDS board.
These buses can be routed to any one of the eight IO slots on the LX2160AQDS board, depending on the value in FPGA register 0x54.
Additionally, the external MDIO1 is used to communicate with the onboard RGMII PHY devices.
mdio1 is controlled by bits 3-7 of the FPGA register (mux-mask 0xf8) and mdio2 is controlled by bits 0-2 (mux-mask 0x07).
Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
---
Notes:
This patch depends on following patches:
[1]https://patchwork.kernel.org/cover/10658863/
[2]https://patchwork.codeaurora.org/patch/637861/
.../boot/dts/freescale/fsl-lx2160a-qds.dts | 116 +++++++++++++++++
.../boot/dts/freescale/fsl-lx2160a.dtsi | 23 ++++
2 files changed, 139 insertions(+)
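
For illustration only (not part of the patch): a mux driver would typically
reject a child whose 'reg' value sets bits outside 'mux-mask'. The user-space
sketch below applies that check to the child values and masks from the
mdio-mux-1@54 and mdio-mux-2@54 nodes in the diff that follows. The sketch_*
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when every child 'reg' value fits inside the mux mask. */
bool sketch_children_fit_mask(uint32_t mask, const uint32_t *regs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (regs[i] & ~mask)
			return false;	/* a bit outside the mask is set */
	return true;
}

/* Child 'reg' values copied from the mdio-mux-1@54 node (mux-mask 0xf8). */
const uint32_t sketch_mux1_regs[] = {
	0x00, 0x40, 0xc0, 0xc8, 0xd0, 0xd8, 0xe0, 0xe8, 0xf0, 0xf8,
};

/* Child 'reg' values copied from the mdio-mux-2@54 node (mux-mask 0x07). */
const uint32_t sketch_mux2_regs[] = {
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
};
```

Both child sets pass with the masks given in the nodes. A narrower mask such
as 0xf0 would reject the first set, because values like 0xc8 use bit 3.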
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
index 8a0305a2b778..39aa2731ddfa 100644
--- a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
@@ -54,6 +54,121 @@
&i2c0 {
status = "okay";
+ fpga@66 {
+ compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
+ reg = <0x66>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ mdio-mux-1@54 {
+ mdio-parent-bus = <&emdio1>;
+ reg = <0x54>; /* BRDCFG4 */
+ mux-mask = <0xf8>; /* EMI1_MDIO */
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ mdio@0 {
+ reg = <0x00>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@40 {
+ reg = <0x40>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@c0 {
+ reg = <0xc0>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@c8 {
+ reg = <0xc8>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@d0 {
+ reg = <0xd0>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@d8 {
+ reg = <0xd8>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@e0 {
+ reg = <0xe0>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@e8 {
+ reg = <0xe8>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@f0 {
+ reg = <0xf0>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@f8 {
+ reg = <0xf8>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ };
+
+ mdio-mux-2@54 {
+ mdio-parent-bus = <&emdio2>;
+ reg = <0x54>; /* BRDCFG4 */
+ mux-mask = <0x07>; /* EMI2_MDIO */
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ mdio@0 {
+ reg = <0x00>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@1 {
+ reg = <0x01>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@2 {
+ reg = <0x02>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@3 {
+ reg = <0x03>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@4 {
+ reg = <0x04>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@5 {
+ reg = <0x05>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@6 {
+ reg = <0x06>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ mdio@7 {
+ reg = <0x07>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ };
+ };
+
i2c-mux@77 {
compatible = "nxp,pca9547";
reg = <0x77>;
@@ -118,3 +233,4 @@
&usb1 {
status = "okay";
};
+
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
index 6ce0677c3096..518882b05f03 100644
--- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
@@ -780,5 +780,28 @@
<GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
dma-coherent;
};
+ /* TODO: WRIOP (CCSR?) */
+ /* WRIOP0: 0x8B8_0000, E-MDIO1: 0x1_6000 */
+ emdio1: mdio@0x8B96000 {
+ compatible = "fsl,fman-memac-mdio";
+ reg = <0x0 0x8B96000 0x0 0x1000>;
+ device_type = "mdio"; /* TODO: is this necessary? */
+ little-endian; /* force the driver in LE mode */
+
+ /* Not necessary on the QDS, but needed on the RDB*/
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+ /* WRIOP0: 0x8B8_0000, E-MDIO2: 0x1_7000 */
+ emdio2: mdio@0x8B97000 {
+ compatible = "fsl,fman-memac-mdio";
+ reg = <0x0 0x8B97000 0x0 0x1000>;
+ device_type = "mdio"; /* TODO: is this necessary? */
+ little-endian; /* force the driver in LE mode */
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
};
};
+
--
2.17.1
^ permalink raw reply related