* Re: [PATCH] netronome: don't access real_num_rx_queues directly
From: Jakub Kicinski @ 2016-11-16 16:37 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: David S. Miller, oss-drivers, netdev
In-Reply-To: <20161116141118.1893244-1-arnd@arndb.de>
On Wed, Nov 16, 2016 at 6:10 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> The netdev->real_num_rx_queues setting is only available if CONFIG_SYSFS
> is enabled, so we now get a build failure when that is turned off:
>
> netronome/nfp/nfp_net_common.c: In function 'nfp_net_ring_swap_enable':
> netronome/nfp/nfp_net_common.c:2489:18: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?
>
> As far as I can tell, the check here is only used as an optimization that
> we can skip in order to fix the compilation. If sysfs is disabled,
> the following netif_set_real_num_rx_queues() has no effect.
>
> Fixes: 164d1e9e5d52 ("nfp: add support for ethtool .set_channels")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Thanks Arnd! This is needed in net-next only.
^ permalink raw reply
* Re: [patch net-next 0/8] ipv4: fib: Allow modules to dump FIB tables
From: Alexander Duyck @ 2016-11-16 16:38 UTC (permalink / raw)
To: Jiri Pirko
Cc: Netdev, David Miller, idosch, eladr, yotamg, nogahf, arkadis,
Or Gerlitz, roopa, David Ahern, nikolay, Andy Gospodarek,
vivien.didelot, Andrew Lunn, Florian Fainelli, Duyck, Alexander H,
Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
kaber@trash.net
In-Reply-To: <1479305343-13167-1-git-send-email-jiri@resnulli.us>
On Wed, Nov 16, 2016 at 6:08 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Ido says:
>
> In kernel 4.9 the switchdev-specific FIB offload mechanism was replaced
> by a new FIB notification chain to which modules could register in order
> to be notified about the addition and deletion of FIB entries. The
> motivation for this change was that switchdev drivers need to be able to
> reflect the entire FIB table and not only FIBs configured on top of the
> port netdevs themselves. This is useful in case of in-band management.
>
> The fundamental problem with this approach is that upon registration
> listeners lose all the information previously sent in the chain and
> thus have an incomplete view of the FIB tables, which can result in
> packet loss. This patchset fixes that by introducing a new API to dump
> the FIB tables.
>
> The entire dump process is done under RCU and thus the FIB notification
> chain is converted to be atomic. The listeners are modified accordingly.
> This is done in the first five patches.
>
> The sixth patch adds the dump callback itself and the next patches call
> it from current listeners of the FIB notification chain.
>
> Ido Schimmel (8):
> ipv4: fib: Export free_fib_info()
> mlxsw: spectrum_router: Implement FIB offload in delayed work
> rocker: Implement FIB offload in delayed work
> ipv4: fib: Convert FIB notification chain to be atomic
> net: ipv4: Send notifications only after removing FIB alias
> ipv4: fib: Add an API to request a FIB dump
> mlxsw: spectrum_router: Request a dump of FIB tables during init
> rocker: Request a dump of FIB tables during init
>
> drivers/net/ethernet/mellanox/mlxsw/core.c | 6 +
> drivers/net/ethernet/mellanox/mlxsw/core.h | 1 +
> .../net/ethernet/mellanox/mlxsw/spectrum_router.c | 73 ++++++++++--
> drivers/net/ethernet/rocker/rocker_main.c | 59 ++++++++--
> drivers/net/ethernet/rocker/rocker_ofdpa.c | 1 +
> include/net/ip_fib.h | 1 +
> net/ipv4/fib_semantics.c | 1 +
> net/ipv4/fib_trie.c | 128 +++++++++++++++++++--
> 8 files changed, 242 insertions(+), 28 deletions(-)
>
So for the pieces related to the fib_trie itself this patch series
looks good to me.
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
^ permalink raw reply
* RE: [patch net-next v2 5/8] Introduce sample tc action
From: Yotam Gigi @ 2016-11-16 16:26 UTC (permalink / raw)
To: Roopa Prabhu, Jiri Pirko
Cc: netdev@vger.kernel.org, davem@davemloft.net, Ido Schimmel,
Elad Raz, Nogah Frankel, Or Gerlitz, jhs@mojatatu.com,
geert+renesas@glider.be, stephen@networkplumber.org,
xiyou.wangcong@gmail.com, linux@roeck-us.net,
john.fastabend@gmail.com, simon.horman@netronome.com
In-Reply-To: <582C861B.8010709@cumulusnetworks.com>
>-----Original Message-----
>From: Roopa Prabhu [mailto:roopa@cumulusnetworks.com]
>Sent: Wednesday, November 16, 2016 6:15 PM
>To: Jiri Pirko <jiri@resnulli.us>
>Cc: netdev@vger.kernel.org; davem@davemloft.net; Yotam Gigi
><yotamg@mellanox.com>; Ido Schimmel <idosch@mellanox.com>; Elad Raz
><eladr@mellanox.com>; Nogah Frankel <nogahf@mellanox.com>; Or Gerlitz
><ogerlitz@mellanox.com>; jhs@mojatatu.com; geert+renesas@glider.be;
>stephen@networkplumber.org; xiyou.wangcong@gmail.com; linux@roeck-us.net;
>john.fastabend@gmail.com; simon.horman@netronome.com
>Subject: Re: [patch net-next v2 5/8] Introduce sample tc action
>
>On 11/14/16, 7:00 AM, Jiri Pirko wrote:
>> From: Yotam Gigi <yotamg@mellanox.com>
>>
>> This action allow the user to sample traffic matched by tc classifier.
>> The sampling consists of choosing packets randomly, truncating them,
>> adding some informative metadata regarding the interface and the original
>> packet size and mark them with specific mark, to allow further tc rules to
>> match and process. The marked sample packets are then injected into the
>> device ingress qdisc using netif_receive_skb.
>>
>> The packets metadata is packed using the ife encapsulation protocol, and
>> the outer packet's ethernet dest, source and eth_type, along with the
>> rate, mark and the optional truncation size can be configured from
>> userspace.
>>
>> Example:
>> To sample ingress traffic from interface eth1, and redirect the sampled
>> the sampled packets to interface dummy0, one may use the commands:
>>
>> tc qdisc add dev eth1 handle ffff: ingress
>>
>> tc filter add dev eth1 parent ffff: \
>> matchall action sample rate 12 mark 17
>
>Yotham, I am guessing in the future if one does not want to use mark,
>the sample api is extensible to allow for other actions to be added.
>This is from the general concern we had on using mark: some may not want to use
>mark.
>As long as the api is extensible to allow an alternate way in the future,
> we should be good. (We would prefer to not go down the path of having to
>introduce
>a new 'action sample' if this limits us in some way).
The code is extensible - if one does want to add another action to sample, he
totally can :)
By the way, one of the reasons we removed the patches for now is that we
consider adding mirroring functionality to it instead of heaving two tc-rules.
>
>>
>> tc filter add parent ffff: dev eth1 protocol all \
>> u32 match mark 17 0xff \
>> action mirred egress redirect dev dummy0
>>
>
>thanks.
^ permalink raw reply
* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Ido Schimmel @ 2016-11-16 16:42 UTC (permalink / raw)
To: David Miller
Cc: hannes, jiri, netdev, idosch, eladr, yotamg, nogahf, arkadis,
ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
f.fainelli, alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <20161116.112735.19078311822311561.davem@davemloft.net>
Hi Dave,
On Wed, Nov 16, 2016 at 11:27:35AM -0500, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Wed, 16 Nov 2016 15:51:01 +0100
>
> > I don't see a way around doing a journal like in filesystems somehow,
>
> We really just need a sequence counter incremented for each insert/remove,
> and restart the dump from the beginning if it changes mid-dump.
When you dump the FIB table your listener is already registered to the
chain, so any changes happening mid-dump will be propagated to it. I've
noted this in 5/8 and in my reply to Hannes.
^ permalink raw reply
* Re: [PATCH ipsec-next 6/8] xfrm: policy: only use rcu in xfrm_sk_policy_lookup
From: Nicolas Dichtel @ 2016-11-16 16:43 UTC (permalink / raw)
To: Florian Westphal, netdev, Steffen Klassert
In-Reply-To: <1470921479-25592-7-git-send-email-fw@strlen.de>
[-- Attachment #1: Type: text/plain, Size: 593 bytes --]
Le 11/08/2016 à 15:17, Florian Westphal a écrit :
> Don't acquire the readlock anymore and rely on rcu alone.
>
> In case writer on other CPU changed policy at the wrong moment (after we
> obtained sk policy pointer but before we could obtain the reference)
> just repeat the lookup.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Since this patch, our IKEv1 Transport tests (using charon) fail to establish the
connection. If I revert it, the IKE negociation is ok again.
charon logs are enclosed.
I didn't had time to investigate now, but any idea is welcomed ;-)
Regards,
Nicolas
[-- Attachment #2: charon.txt --]
[-- Type: text/plain, Size: 2343 bytes --]
/var/log/syslog:Nov 16 16:28:47 router charon: 04[CFG] received stroke: add connection 'mytunnel-17'
/var/log/syslog:Nov 16 16:28:47 router charon: 04[CFG] added configuration 'mytunnel-17'
/var/log/syslog:Nov 16 16:28:47 router charon: 08[CFG] received stroke: route 'mytunnel-17'
/var/log/syslog:Nov 16 16:28:52 router charon: 13[KNL] creating acquire job for policy 10.125.0.1/32[gre] === 10.125.0.2/32[gre] with reqid {1}
/var/log/syslog:Nov 16 16:28:52 router charon: 13[IKE] initiating Aggressive Mode IKE_SA mytunnel-17[1] to 10.125.0.2
/var/log/syslog:Nov 16 16:28:52 router charon: 13[ENC] generating AGGRESSIVE request 0 [ SA KE No ID V V V V V ]
/var/log/syslog:Nov 16 16:28:52 router charon: 13[NET] sending packet: from 10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:28:56 router charon: 07[IKE] sending retransmit 1 of request message ID 0, seq 1
/var/log/syslog:Nov 16 16:28:56 router charon: 07[NET] sending packet: from 10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:29:04 router charon: 08[IKE] sending retransmit 2 of request message ID 0, seq 1
/var/log/syslog:Nov 16 16:29:04 router charon: 08[NET] sending packet: from 10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:29:17 router charon: 09[IKE] sending retransmit 3 of request message ID 0, seq 1
/var/log/syslog:Nov 16 16:29:17 router charon: 09[NET] sending packet: from 10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:29:40 router charon: 11[IKE] sending retransmit 4 of request message ID 0, seq 1
/var/log/syslog:Nov 16 16:29:40 router charon: 11[NET] sending packet: from 10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:30:22 router charon: 05[IKE] sending retransmit 5 of request message ID 0, seq 1
/var/log/syslog:Nov 16 16:30:22 router charon: 05[NET] sending packet: from 10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:31:37 router charon: 15[KNL] creating delete job for CHILD_SA ESP/0x00000000/10.125.0.2
/var/log/syslog:Nov 16 16:31:37 router charon: 13[JOB] CHILD_SA ESP/0x00000000/10.125.0.2 not found for delete
/var/log/syslog:Nov 16 16:31:38 router charon: 09[IKE] giving up after 5 retransmits
/var/log/syslog:Nov 16 16:31:38 router charon: 09[IKE] establishing IKE_SA failed, peer not responding
^ permalink raw reply
* Re: [PATCH v2] net/phy/vitesse: Configure RGMII skew on VSC8601, if needed
From: Alex @ 2016-11-16 16:44 UTC (permalink / raw)
To: Andrew Lunn; +Cc: f.fainelli, netdev, linux-kernel, davem, Gokhan Cosgul
In-Reply-To: <20161116135053.GF19962@lunn.ch>
On 11/16/2016 05:50 AM, Andrew Lunn wrote:
> On Wed, Nov 16, 2016 at 01:02:33AM -0800, Alexandru Gagniuc wrote:
>> With RGMII, we need a 1.5 to 2ns skew between clock and data lines. The
>> VSC8601 can handle this internally. While the VSC8601 can set more
>> fine-grained delays, the standard skew settings work out of the box.
>> The same heuristic is used to determine when this skew should be enabled
>> as in vsc824x_config_init().
>>
>> +/* This adds a skew for both TX and RX clocks, so the skew should only be
>> + * applied to "rgmii-id" interfaces. It may not work as expected
>> + * on "rgmii-txid", "rgmii-rxid" or "rgmii" interfaces. */
>
> Hi Alexandru
>
> You should be able to make "rgmii" work as expected. If that is the
> phy mode, disable the skew.
And that's exactly the implemented behavior. See vsc8601_config_init()
below.
Alex
^ permalink raw reply
* Re: [PATCH v2] net/phy/vitesse: Configure RGMII skew on VSC8601, if needed
From: Andrew Lunn @ 2016-11-16 16:54 UTC (permalink / raw)
To: Alex; +Cc: f.fainelli, netdev, linux-kernel, davem, Gokhan Cosgul
In-Reply-To: <f05f5f5d-a010-def2-314a-09a859f9eda3@adaptrum.com>
On Wed, Nov 16, 2016 at 08:44:30AM -0800, Alex wrote:
>
>
> On 11/16/2016 05:50 AM, Andrew Lunn wrote:
> >On Wed, Nov 16, 2016 at 01:02:33AM -0800, Alexandru Gagniuc wrote:
> >>With RGMII, we need a 1.5 to 2ns skew between clock and data lines. The
> >>VSC8601 can handle this internally. While the VSC8601 can set more
> >>fine-grained delays, the standard skew settings work out of the box.
> >>The same heuristic is used to determine when this skew should be enabled
> >>as in vsc824x_config_init().
> >>
> >>+/* This adds a skew for both TX and RX clocks, so the skew should only be
> >>+ * applied to "rgmii-id" interfaces. It may not work as expected
> >>+ * on "rgmii-txid", "rgmii-rxid" or "rgmii" interfaces. */
> >
> >Hi Alexandru
> >
> >You should be able to make "rgmii" work as expected. If that is the
> >phy mode, disable the skew.
>
> And that's exactly the implemented behavior. See
> vsc8601_config_init() below.
I don't think so. vsc8601_config_init() will not cause the skew to be
cleared if the phy-mode is "rgmii" and something else like the
bootloader could of set the skew. So saying that "rgmii" might not
work as expected is true. But with a minor change, you can make it
work as expected.
Andrew
^ permalink raw reply
* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Hannes Frederic Sowa @ 2016-11-16 16:56 UTC (permalink / raw)
To: David Miller
Cc: jiri, netdev, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz,
roopa, dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <20161116.112735.19078311822311561.davem@davemloft.net>
On 16.11.2016 17:27, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Wed, 16 Nov 2016 15:51:01 +0100
>
>> I don't see a way around doing a journal like in filesystems somehow,
>
> We really just need a sequence counter incremented for each insert/remove,
> and restart the dump from the beginning if it changes mid-dump.
I thought about this at first, too, but became afraid if we could end up
looping endlessly because of routing changes happen constantly while
trying to upload the fib into the hardware.
^ permalink raw reply
* Re: [PATCH net 1/3] net: phy: realtek: add eee advertisement disable options
From: Florian Fainelli @ 2016-11-16 17:01 UTC (permalink / raw)
To: Jerome Brunet, Andrew Lunn
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
Carlo Caione, Kevin Hilman, Giuseppe Cavallaro, Alexandre TORGUE,
Martin Blumenstingl, Andre Roth, Neil Armstrong,
linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1479310731.17538.53.camel-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
On 11/16/2016 07:38 AM, Jerome Brunet wrote:
> On Wed, 2016-11-16 at 16:06 +0100, Andrew Lunn wrote:
>> On Wed, Nov 16, 2016 at 03:51:30PM +0100, Jerome Brunet wrote:
>>>
>>> On Wed, 2016-11-16 at 14:23 +0100, Andrew Lunn wrote:
>>>>
>>>>>
>>>>>
>>>>> There two kind of PHYs supporting eee, the one advertising eee
>>>>> by
>>>>> default (like realtek) and the one not advertising it (like
>>>>> micrel).
>>>
>>> This is just the default register value.
>>>
>>>>
>>>>
>>>> I don't know too much about EEE. So maybe a dumb question. Does
>>>> the
>>>> MAC need to be involved? Or is it just the PHY?
>>>>
>>>> If the MAC needs to be involved, the PHY should not be
>>>> advertising
>>>> EEE
>>>> unless the MAC asks for it by calling phy_init_eee(). If this is
>>>> true,
>>>> maybe we need to change the realtek driver, and others in that
>>>> class.
>>>
>>> As far I understand, the advertised capabilities are exchanged
>>> during
>>> the auto-negotiation.
>>>
>>> At this stage, if the advertisement is disabled (regarless of the
>>> actual support) on either side of the link, there will be no low
>>> power
>>> idle state on the Tx nor the Rx path.
>>>
>>> If the advertisement is enabled on both side but we don't call
>>> phy_init_eee, I suppose Tx won't enter LPI, but Rx could.
>>
>> What i was trying to find out is, if the MAC needs to support EEE as
>> well as the PHY, what happens when the MAC does not support EEE, but
>> the PHYs do negotiate EEE? Does it break?
>
> Interesting question. In a regular case, I suppose it should be fine.
> As you would have LPI only on the Rx path this should be transparent to
> the MAC. That's my understanding. Maybe people knowing EEE better than
> me could confirm (or not) ? Peppe? Alexandre?
EEE is a MAC and PHY feature, and both need to agree on what is enabled,
especially in the transmit path because the way packets may be
transmitted with or without EEE can be done differently at the HW level
(faster/slower return to idle, different clock source).
>
> I just checked with the OdroidC2, I disabled eee support by forcing
> "dma_cap.eee = 0" in stmmac_get_hw_features. As expected, no tx_LPI
> interrupts but plenty of rx_LPI interrupts.
>
> What was not expected is test failing like before.
> So in our case, having LPI on the Rx path is fine for receiving data,
> but not for sending.
OK, which really sounds like a potential interoperability problem, or
just the Realtek PHY with EEE enabled acting funky (irrespective of
being attached to stmmac).
--
Florian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net 1/3] net: phy: realtek: add eee advertisement disable options
From: Anand Moon @ 2016-11-16 17:06 UTC (permalink / raw)
To: Jerome Brunet
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree, Florian Fainelli,
Alexandre TORGUE, Neil Armstrong, Martin Blumenstingl,
Kevin Hilman, Linux Kernel, Andre Roth,
linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Carlo Caione,
Giuseppe Cavallaro, linux-arm-kernel
In-Reply-To: <1479220154-25851-2-git-send-email-jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
Hi Jerome.
On 15 November 2016 at 19:59, Jerome Brunet <jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org> wrote:
> On some platforms, energy efficient ethernet with rtl8211 devices is
> causing issue, like throughput drop or broken link.
>
> This was reported on the OdroidC2 (DWMAC + RTL8211F). While the issue root
> cause is not fully understood yet, disabling EEE advertisement prevent auto
> negotiation from enabling EEE.
>
> This patch provides options to disable 1000T and 100TX EEE advertisement
> individually for the realtek phys supporting this feature.
>
> Reported-by: Martin Blumenstingl <martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
> Cc: Giuseppe Cavallaro <peppe.cavallaro-qxv4g6HH51o@public.gmane.org>
> Cc: Alexandre TORGUE <alexandre.torgue-qxv4g6HH51o@public.gmane.org>
> Signed-off-by: Jerome Brunet <jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Neil Armstrong <narmstrong-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
> Tested-by: Andre Roth <neolynx-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
> drivers/net/phy/realtek.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 64 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
> index aadd6e9f54ad..77235fd5faaf 100644
> --- a/drivers/net/phy/realtek.c
> +++ b/drivers/net/phy/realtek.c
> @@ -15,6 +15,12 @@
> */
> #include <linux/phy.h>
> #include <linux/module.h>
> +#include <linux/of.h>
> +
> +struct rtl8211x_phy_priv {
> + bool eee_1000t_disable;
> + bool eee_100tx_disable;
> +};
>
> #define RTL821x_PHYSR 0x11
> #define RTL821x_PHYSR_DUPLEX 0x2000
> @@ -93,12 +99,44 @@ static int rtl8211f_config_intr(struct phy_device *phydev)
> return err;
> }
>
> +static void rtl8211x_clear_eee_adv(struct phy_device *phydev)
> +{
> + struct rtl8211x_phy_priv *priv = phydev->priv;
> + u16 val;
> +
> + if (priv->eee_1000t_disable || priv->eee_100tx_disable) {
> + val = phy_read_mmd_indirect(phydev, MDIO_AN_EEE_ADV,
> + MDIO_MMD_AN);
> +
> + if (priv->eee_1000t_disable)
> + val &= ~MDIO_AN_EEE_ADV_1000T;
> + if (priv->eee_100tx_disable)
> + val &= ~MDIO_AN_EEE_ADV_100TX;
> +
> + phy_write_mmd_indirect(phydev, MDIO_AN_EEE_ADV,
> + MDIO_MMD_AN, val);
> + }
> +}
> +
> +static int rtl8211x_config_init(struct phy_device *phydev)
> +{
> + int ret;
> +
> + ret = genphy_config_init(phydev);
> + if (ret < 0)
> + return ret;
> +
> + rtl8211x_clear_eee_adv(phydev);
> +
> + return 0;
> +}
> +
> static int rtl8211f_config_init(struct phy_device *phydev)
> {
> int ret;
> u16 reg;
>
> - ret = genphy_config_init(phydev);
> + ret = rtl8211x_config_init(phydev);
> if (ret < 0)
> return ret;
>
> @@ -115,6 +153,26 @@ static int rtl8211f_config_init(struct phy_device *phydev)
> return 0;
> }
>
> +static int rtl8211x_phy_probe(struct phy_device *phydev)
> +{
> + struct device *dev = &phydev->mdio.dev;
> + struct device_node *of_node = dev->of_node;
> + struct rtl8211x_phy_priv *priv;
> +
> + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> + if (!priv)
> + return -ENOMEM;
> +
> + priv->eee_1000t_disable =
> + of_property_read_bool(of_node, "realtek,disable-eee-1000t");
> + priv->eee_100tx_disable =
> + of_property_read_bool(of_node, "realtek,disable-eee-100tx");
> +
> + phydev->priv = priv;
> +
> + return 0;
> +}
> +
> static struct phy_driver realtek_drvs[] = {
> {
> .phy_id = 0x00008201,
> @@ -140,7 +198,9 @@ static struct phy_driver realtek_drvs[] = {
> .phy_id_mask = 0x001fffff,
> .features = PHY_GBIT_FEATURES,
> .flags = PHY_HAS_INTERRUPT,
> + .probe = &rtl8211x_phy_probe,
> .config_aneg = genphy_config_aneg,
> + .config_init = &rtl8211x_config_init,
> .read_status = genphy_read_status,
> .ack_interrupt = rtl821x_ack_interrupt,
> .config_intr = rtl8211e_config_intr,
> @@ -152,7 +212,9 @@ static struct phy_driver realtek_drvs[] = {
> .phy_id_mask = 0x001fffff,
> .features = PHY_GBIT_FEATURES,
> .flags = PHY_HAS_INTERRUPT,
> + .probe = &rtl8211x_phy_probe,
> .config_aneg = &genphy_config_aneg,
> + .config_init = &rtl8211x_config_init,
> .read_status = &genphy_read_status,
> .ack_interrupt = &rtl821x_ack_interrupt,
> .config_intr = &rtl8211e_config_intr,
> @@ -164,6 +226,7 @@ static struct phy_driver realtek_drvs[] = {
> .phy_id_mask = 0x001fffff,
> .features = PHY_GBIT_FEATURES,
> .flags = PHY_HAS_INTERRUPT,
> + .probe = &rtl8211x_phy_probe,
> .config_aneg = &genphy_config_aneg,
> .config_init = &rtl8211f_config_init,
> .read_status = &genphy_read_status,
> --
> 2.7.4
>
How about adding callback functionality for .soft_reset to handle BMCR
where we update the Auto-Negotiation for the phy,
as per the datasheet of the rtl8211f.
-Best Regard
Anand Moon
>
> _______________________________________________
> linux-amlogic mailing list
> linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> http://lists.infradead.org/mailman/listinfo/linux-amlogic
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] net: dsa: fix fixed-link-phy device leaks
From: Florian Fainelli @ 2016-11-16 17:06 UTC (permalink / raw)
To: Johan Hovold, Andrew Lunn
Cc: Vivien Didelot, David S. Miller, netdev, linux-kernel
In-Reply-To: <1479307664-32428-1-git-send-email-johan@kernel.org>
On 11/16/2016 06:47 AM, Johan Hovold wrote:
> Make sure to drop the reference taken by of_phy_find_device() when
> registering and deregistering the fixed-link PHY-device.
>
> Note that we need to put both references held at deregistration.
>
> Fixes: 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port
> speeds/duplex")
> Signed-off-by: Johan Hovold <johan@kernel.org>
> ---
>
> Hi,
>
> This is one has been compile tested only, but fixes a couple of leaks
> similar to one that was found in the cpsw driver for which I just posted
> a patch.
>
> It turns out all drivers but DSA fail to deregister the fixed-link PHYs
> registered by of_phy_register_fixed_link(). Due to the way this
> interface was designed, deregistering such a PHY is a bit cumbersome and
> looks like it would benefit from a common helper.
>
> However, perhaps the interface should instead be changed so that the PHY
> device is returned so that drivers do not need to use
> of_phy_find_device() when they need to access properties of the fixed
> link (e.g. as in dsu_cpu_dsa_setup below).
>
> Thoughts?
>
> Thanks,
> Johan
>
>
> net/dsa/dsa.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index a6902c1e2f28..798a6a776a5f 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -233,6 +233,8 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
> genphy_read_status(phydev);
> if (ds->ops->adjust_link)
> ds->ops->adjust_link(ds, port, phydev);
> +
> + phy_device_free(phydev); /* of_phy_find_device */
> }
>
> return 0;
> @@ -509,8 +511,12 @@ void dsa_cpu_dsa_destroy(struct device_node *port_dn)
> if (of_phy_is_fixed_link(port_dn)) {
> phydev = of_phy_find_device(port_dn);
> if (phydev) {
> - phy_device_free(phydev);
> fixed_phy_unregister(phydev);
> + /* Put references taken by of_phy_find_device() and
> + * of_phy_register_fixed_link().
> + */
> + phy_device_free(phydev);
> + phy_device_free(phydev);
Double free, this looks bogus here. Actually would not this mean a
triple free since you already free in dsa_cpu_dsa_setup() which is
paired with dsa_cpu_dsa_destroy()?
--
Florian
^ permalink raw reply
* Re: [PATCH v2] net/phy/vitesse: Configure RGMII skew on VSC8601, if needed
From: Alex @ 2016-11-16 17:10 UTC (permalink / raw)
To: Andrew Lunn; +Cc: f.fainelli, netdev, linux-kernel, davem, Gokhan Cosgul
In-Reply-To: <20161116165437.GK23231@lunn.ch>
On 11/16/2016 08:54 AM, Andrew Lunn wrote:
> On Wed, Nov 16, 2016 at 08:44:30AM -0800, Alex wrote:
>>
>>
>> On 11/16/2016 05:50 AM, Andrew Lunn wrote:
>>> On Wed, Nov 16, 2016 at 01:02:33AM -0800, Alexandru Gagniuc wrote:
>>>> With RGMII, we need a 1.5 to 2ns skew between clock and data lines. The
>>>> VSC8601 can handle this internally. While the VSC8601 can set more
>>>> fine-grained delays, the standard skew settings work out of the box.
>>>> The same heuristic is used to determine when this skew should be enabled
>>>> as in vsc824x_config_init().
>>>>
>>>> +/* This adds a skew for both TX and RX clocks, so the skew should only be
>>>> + * applied to "rgmii-id" interfaces. It may not work as expected
>>>> + * on "rgmii-txid", "rgmii-rxid" or "rgmii" interfaces. */
>>>
>>> Hi Alexandru
>>>
>>> You should be able to make "rgmii" work as expected. If that is the
>>> phy mode, disable the skew.
>>
>> And that's exactly the implemented behavior. See
>> vsc8601_config_init() below.
>
> I don't think so. vsc8601_config_init() will not cause the skew to be
> cleared if the phy-mode is "rgmii" and something else like the
> bootloader could of set the skew. So saying that "rgmii" might not
> work as expected is true. But with a minor change, you can make it
> work as expected.
That's not within the scope of this change. The scope is to make
rgmii-id work. Any additional changes would be untested.
Alex
^ permalink raw reply
* [PATCH net-next] udp: enable busy polling for all sockets
From: Eric Dumazet @ 2016-11-16 17:10 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Willem de Bruijn
From: Eric Dumazet <edumazet@google.com>
UDP busy polling is restricted to connected UDP sockets.
This is because sk_busy_loop() only takes care of one NAPI context.
There are cases where it could be extended.
1) Some hosts receive traffic on a single NIC, with one RX queue.
2) Some applications use SO_REUSEPORT and associated BPF filter
to split the incoming traffic on one UDP socket per RX
queue/thread/cpu
3) Some UDP sockets are used to send/receive traffic for one flow, but
they do not bother with connect()
This patch records the napi_id of first received skb, giving more
reach to busy polling.
Tested:
lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done
Before patch :
27867 28870 37324 41060 41215
36764 36838 44455 41282 43843
After patch :
73920 73213 70147 74845 71697
68315 68028 75219 70082 73707
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
---
include/net/busy_poll.h | 28 +++++++++++++++++++---------
net/ipv4/udp.c | 2 ++
net/ipv6/udp.c | 2 ++
3 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 2fbeb1313c0f4f78ac82ddf6c18d1016a901f99a..34c57a0ec040a58f09b12e3878f534433dc30015 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -81,11 +81,6 @@ static inline void skb_mark_napi_id(struct sk_buff *skb,
skb->napi_id = napi->napi_id;
}
-/* used in the protocol hanlder to propagate the napi_id to the socket */
-static inline void sk_mark_napi_id(struct sock *sk, struct sk_buff *skb)
-{
- sk->sk_napi_id = skb->napi_id;
-}
#else /* CONFIG_NET_RX_BUSY_POLL */
static inline unsigned long net_busy_loop_on(void)
@@ -108,10 +103,6 @@ static inline void skb_mark_napi_id(struct sk_buff *skb,
{
}
-static inline void sk_mark_napi_id(struct sock *sk, struct sk_buff *skb)
-{
-}
-
static inline bool busy_loop_timeout(unsigned long end_time)
{
return true;
@@ -123,4 +114,23 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
}
#endif /* CONFIG_NET_RX_BUSY_POLL */
+
+/* used in the protocol hanlder to propagate the napi_id to the socket */
+static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ sk->sk_napi_id = skb->napi_id;
+#endif
+}
+
+/* variant used for unconnected sockets */
+static inline void sk_mark_napi_id_once(struct sock *sk,
+ const struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ if (!sk->sk_napi_id)
+ sk->sk_napi_id = skb->napi_id;
+#endif
+}
+
#endif /* _LINUX_NET_BUSY_POLL_H */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9ae7c63a8b131f56a3c05e00a8f7d106a0362c54..e1fc0116e8d59d8185670c6e55d1219bde55610d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1569,6 +1569,8 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
sock_rps_save_rxhash(sk, skb);
sk_mark_napi_id(sk, skb);
sk_incoming_cpu_update(sk);
+ } else {
+ sk_mark_napi_id_once(sk, skb);
}
rc = __udp_enqueue_schedule_skb(sk, skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 86a8cacd333b64f40acc7350a45213cb422a9c1a..4f99417d9b401f2a65c7828e7d6b86d1d6161794 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -519,6 +519,8 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
sock_rps_save_rxhash(sk, skb);
sk_mark_napi_id(sk, skb);
sk_incoming_cpu_update(sk);
+ } else {
+ sk_mark_napi_id_once(sk, skb);
}
rc = __udp_enqueue_schedule_skb(sk, skb);
^ permalink raw reply related
* Re: [PATCH] net: dsa: fix fixed-link-phy device leaks
From: Johan Hovold @ 2016-11-16 17:11 UTC (permalink / raw)
To: Florian Fainelli
Cc: Johan Hovold, Andrew Lunn, Vivien Didelot, David S. Miller,
netdev, linux-kernel
In-Reply-To: <42af57ed-df84-f628-659d-c982d30da7d5@gmail.com>
On Wed, Nov 16, 2016 at 09:06:26AM -0800, Florian Fainelli wrote:
> On 11/16/2016 06:47 AM, Johan Hovold wrote:
> > Make sure to drop the reference taken by of_phy_find_device() when
> > registering and deregistering the fixed-link PHY-device.
> >
> > Note that we need to put both references held at deregistration.
> >
> > Fixes: 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port
> > speeds/duplex")
> > Signed-off-by: Johan Hovold <johan@kernel.org>
> > ---
> >
> > Hi,
> >
> > This is one has been compile tested only, but fixes a couple of leaks
> > similar to one that was found in the cpsw driver for which I just posted
> > a patch.
> >
> > It turns out all drivers but DSA fail to deregister the fixed-link PHYs
> > registered by of_phy_register_fixed_link(). Due to the way this
> > interface was designed, deregistering such a PHY is a bit cumbersome and
> > looks like it would benefit from a common helper.
> >
> > However, perhaps the interface should instead be changed so that the PHY
> > device is returned so that drivers do not need to use
> > of_phy_find_device() when they need to access properties of the fixed
> > link (e.g. as in dsu_cpu_dsa_setup below).
> >
> > Thoughts?
> >
> > Thanks,
> > Johan
> >
> >
> > net/dsa/dsa.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> > index a6902c1e2f28..798a6a776a5f 100644
> > --- a/net/dsa/dsa.c
> > +++ b/net/dsa/dsa.c
> > @@ -233,6 +233,8 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
> > genphy_read_status(phydev);
> > if (ds->ops->adjust_link)
> > ds->ops->adjust_link(ds, port, phydev);
> > +
> > + phy_device_free(phydev); /* of_phy_find_device */
> > }
> >
> > return 0;
> > @@ -509,8 +511,12 @@ void dsa_cpu_dsa_destroy(struct device_node *port_dn)
> > if (of_phy_is_fixed_link(port_dn)) {
> > phydev = of_phy_find_device(port_dn);
> > if (phydev) {
> > - phy_device_free(phydev);
> > fixed_phy_unregister(phydev);
> > + /* Put references taken by of_phy_find_device() and
> > + * of_phy_register_fixed_link().
> > + */
> > + phy_device_free(phydev);
> > + phy_device_free(phydev);
>
> Double free, this looks bogus here. Actually would not this mean a
> triple free since you already free in dsa_cpu_dsa_setup() which is
> paired with dsa_cpu_dsa_destroy()?
The naming of phy_device_free() is unfortunate when it's really a put():
void phy_device_free(struct phy_device *phydev)
{
put_device(&phydev->mdio.dev);
}
which may need to be called multiple times, specifically after a call to
of_phy_find_device() which takes another reference.
With this patch the refcounts are properly balanced.
Johan
^ permalink raw reply
* Re: [PATCH net v2] net: nsid cannot be allocated for a dead netns
From: Cong Wang @ 2016-11-16 17:29 UTC (permalink / raw)
To: Nicolas Dichtel
Cc: Andrey Wagin, David Miller, Linux Kernel Network Developers
In-Reply-To: <1479289775-1715-1-git-send-email-nicolas.dichtel@6wind.com>
On Wed, Nov 16, 2016 at 1:49 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index f61c0e02a413..63f65387f4e1 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -159,6 +159,9 @@ static int alloc_netid(struct net *net, struct net *peer, int reqid)
> max = reqid + 1;
> }
>
> + if (!atomic_read(&net->count) || !atomic_read(&peer->count))
> + return -EINVAL;
> +
> return idr_alloc(&net->netns_ids, peer, min, max, GFP_ATOMIC);
> }
There is already a check in peernet2id_alloc(), so why not just the following?
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f61c0e0..7001da9 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -219,6 +219,8 @@ int peernet2id_alloc(struct net *net, struct net *peer)
bool alloc;
int id;
+ if (atomic_read(&net->count) == 0)
+ return NETNSA_NSID_NOT_ASSIGNED;
spin_lock_irqsave(&net->nsid_lock, flags);
alloc = atomic_read(&peer->count) == 0 ? false : true;
id = __peernet2id_alloc(net, peer, &alloc);
^ permalink raw reply related
* Re: [PATCH ipsec-next 6/8] xfrm: policy: only use rcu in xfrm_sk_policy_lookup
From: Florian Westphal @ 2016-11-16 17:30 UTC (permalink / raw)
To: Nicolas Dichtel; +Cc: Florian Westphal, netdev, Steffen Klassert
In-Reply-To: <0e235471-cb63-2376-c42d-5872e78e4c98@6wind.com>
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> Le 11/08/2016 à 15:17, Florian Westphal a écrit :
> > Don't acquire the readlock anymore and rely on rcu alone.
> >
> > In case writer on other CPU changed policy at the wrong moment (after we
> > obtained sk policy pointer but before we could obtain the reference)
> > just repeat the lookup.
> >
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> Since this patch, our IKEv1 Transport tests (using charon) fail to establish the
> connection. If I revert it, the IKE negociation is ok again.
> charon logs are enclosed.
>
> I didn't had time to investigate now, but any idea is welcomed ;-)
I'm an idiot. Thanks for figuring out the faulty commit!
This should fix it (if we succeed grabbing the refcount, then if (err && !xfrm_pol_hold_rcu
evaluates to false and we hit last else branch and set pol to ERR_PTR(0)...
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index fd6986634e6f..5bf7e1bfeac7 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1268,12 +1268,14 @@ static struct xfrm_policy *xfrm_sk_policy_lookup(const struct sock *sk, int dir,
err = security_xfrm_policy_lookup(pol->security,
fl->flowi_secid,
policy_to_flow_dir(dir));
- if (!err && !xfrm_pol_hold_rcu(pol))
- goto again;
- else if (err == -ESRCH)
+ if (!err) {
+ if (!xfrm_pol_hold_rcu(pol))
+ goto again;
+ } else if (err == -ESRCH) {
pol = NULL;
- else
+ } else {
pol = ERR_PTR(err);
+ }
} else
^ permalink raw reply related
* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Hannes Frederic Sowa @ 2016-11-16 17:35 UTC (permalink / raw)
To: Ido Schimmel
Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, nogahf, arkadis,
ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
f.fainelli, alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <20161116151829.h6kapbjes7xxxcyi@splinter.mtl.com>
On 16.11.2016 16:18, Ido Schimmel wrote:
> Hi Hannes,
>
> On Wed, Nov 16, 2016 at 03:51:01PM +0100, Hannes Frederic Sowa wrote:
>> On 16.11.2016 15:09, Jiri Pirko wrote:
>>> From: Ido Schimmel <idosch@mellanox.com>
>>>
>>> Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
>>> introduced a new notification chain to notify listeners (f.e., switchdev
>>> drivers) about addition and deletion of routes.
>>>
>>> However, upon registration to the chain the FIB tables can already be
>>> populated, which means potential listeners will have an incomplete view
>>> of the tables.
>>>
>>> Solve that by adding an API to request a FIB dump. The dump itself it
>>> done using RCU in order not to starve consumers that need RTNL to make
>>> progress.
>>>
>>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>
>> Have you looked at potential inconsistencies resulting of RCU walking
>> the table and having concurrent inserts?
>
> Yes. I did try to think about situations in which this approach will
> fail, but I could only find problems with concurrent removals, which I
> addressed in 5/8. In case of concurrent insertions, even if you missed
> the node, you would still get the ENTRY_ADD event to your listener.
Theoretically a node could still be installed while the deletion event
fired before registering the notifier. E.g. a synchronize_net before
dumping could help here?
I don't know how you prepare the data structures for inserting in into
the hardware, but if ordering matters, the notifier for a delete event
can be called before the dump installed the fib entry? E.g. you also
don't check what events are already queued in the delayed work queue so
far. I hope I understand the patches correctly?
Thanks,
Hannes
^ permalink raw reply
* [PATCHv2 perf/core 0/2] libbpf: Synchronize implementations
From: Joe Stringer @ 2016-11-16 17:43 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev, wangnan0, ast, daniel, acme
Update tools/lib/bpf to provide more functionality and improve interoperation
with other tools that generate and use eBPF code:
* The kernel uapi headers are a bit newer than the version in the tools/
directory; synchronize those.
* samples/bpf/libbpf* has a bit more functionality than tools/lib/bpf, so
extend tools/lib/bpf/bpf* with these functions to bring them into parity.
I've got a separate series to update samples/bpf/* to rely on these libraries,
but there's a conflict with davem's tree at the moment so I suppose that the
way forward is to get these patches through first, then take the samples
through net-next at a later time.
---
v2: Don't shift non-bpf code into libbpf.
Drop the patch to synchronize ELF definitions with tc.
v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
First post.
Joe Stringer (2):
tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
tools lib bpf: Sync with samples/bpf/libbpf
tools/include/uapi/linux/bpf.h | 51 +++++++++++
tools/lib/bpf/bpf.c | 107 +++++++++++++++++-----
tools/lib/bpf/bpf.h | 202 +++++++++++++++++++++++++++++++++++++++--
tools/lib/bpf/libbpf.c | 3 +-
4 files changed, 330 insertions(+), 33 deletions(-)
--
2.9.3
^ permalink raw reply
* [PATCHv2 perf/core 1/2] tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
From: Joe Stringer @ 2016-11-16 17:43 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev, wangnan0, ast, daniel, acme
In-Reply-To: <20161116174324.29675-1-joe@ovn.org>
The tools version of this header is out of date; update it to the latest
version from the kernel headers.
Signed-off-by: Joe Stringer <joe@ovn.org>
---
v2: No change.
---
tools/include/uapi/linux/bpf.h | 51 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9e5fc168c8a3..f09c70b97eca 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -95,6 +95,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SCHED_ACT,
BPF_PROG_TYPE_TRACEPOINT,
BPF_PROG_TYPE_XDP,
+ BPF_PROG_TYPE_PERF_EVENT,
};
#define BPF_PSEUDO_MAP_FD 1
@@ -375,6 +376,56 @@ enum bpf_func_id {
*/
BPF_FUNC_probe_write_user,
+ /**
+ * bpf_current_task_under_cgroup(map, index) - Check cgroup2 membership of current task
+ * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
+ * @index: index of the cgroup in the bpf_map
+ * Return:
+ * == 0 current failed the cgroup2 descendant test
+ * == 1 current succeeded the cgroup2 descendant test
+ * < 0 error
+ */
+ BPF_FUNC_current_task_under_cgroup,
+
+ /**
+ * bpf_skb_change_tail(skb, len, flags)
+ * The helper will resize the skb to the given new size,
+ * to be used f.e. with control messages.
+ * @skb: pointer to skb
+ * @len: new skb length
+ * @flags: reserved
+ * Return: 0 on success or negative error
+ */
+ BPF_FUNC_skb_change_tail,
+
+ /**
+ * bpf_skb_pull_data(skb, len)
+ * The helper will pull in non-linear data in case the
+ * skb is non-linear and not all of len are part of the
+ * linear section. Only needed for read/write with direct
+ * packet access.
+ * @skb: pointer to skb
+ * @len: len to make read/writeable
+ * Return: 0 on success or negative error
+ */
+ BPF_FUNC_skb_pull_data,
+
+ /**
+ * bpf_csum_update(skb, csum)
+ * Adds csum into skb->csum in case of CHECKSUM_COMPLETE.
+ * @skb: pointer to skb
+ * @csum: csum to add
+ * Return: csum on success or negative error
+ */
+ BPF_FUNC_csum_update,
+
+ /**
+ * bpf_set_hash_invalid(skb)
+ * Invalidate current skb>hash.
+ * @skb: pointer to skb
+ */
+ BPF_FUNC_set_hash_invalid,
+
__BPF_FUNC_MAX_ID,
};
--
2.9.3
^ permalink raw reply related
* [PATCHv2 perf/core 2/2] tools lib bpf: Sync with samples/bpf/libbpf
From: Joe Stringer @ 2016-11-16 17:43 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev, wangnan0, ast, daniel, acme
In-Reply-To: <20161116174324.29675-1-joe@ovn.org>
Extend the tools/ version of libbpf to include all of the functionality
provided in the samples/bpf version.
Signed-off-by: Joe Stringer <joe@ovn.org>
---
v2: Don't shift non-bpf changes across.
Various type cleanups, removal of extraneous declarations
---
tools/lib/bpf/bpf.c | 107 ++++++++++++++++++++------
tools/lib/bpf/bpf.h | 202 +++++++++++++++++++++++++++++++++++++++++++++++--
tools/lib/bpf/libbpf.c | 3 +-
3 files changed, 279 insertions(+), 33 deletions(-)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 4212ed62235b..5e061851ac00 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -20,10 +20,17 @@
*/
#include <stdlib.h>
-#include <memory.h>
+#include <stdio.h>
#include <unistd.h>
#include <asm/unistd.h>
+#include <string.h>
+#include <linux/netlink.h>
#include <linux/bpf.h>
+#include <errno.h>
+#include <net/ethernet.h>
+#include <net/if.h>
+#include <linux/if_packet.h>
+#include <arpa/inet.h>
#include "bpf.h"
/*
@@ -53,24 +60,71 @@ static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
return syscall(__NR_bpf, cmd, attr, size);
}
-int bpf_create_map(enum bpf_map_type map_type, int key_size,
- int value_size, int max_entries)
+int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
+ int max_entries, int map_flags)
{
- union bpf_attr attr;
+ union bpf_attr attr = {
+ .map_type = map_type,
+ .key_size = key_size,
+ .value_size = value_size,
+ .max_entries = max_entries,
+ .map_flags = map_flags,
+ };
- memset(&attr, '\0', sizeof(attr));
+ return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+}
- attr.map_type = map_type;
- attr.key_size = key_size;
- attr.value_size = value_size;
- attr.max_entries = max_entries;
+int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags)
+{
+ union bpf_attr attr = {
+ .map_fd = fd,
+ .key = ptr_to_u64(key),
+ .value = ptr_to_u64(value),
+ .flags = flags,
+ };
- return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+ return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
+}
+
+int bpf_lookup_elem(int fd, void *key, void *value)
+{
+ union bpf_attr attr = {
+ .map_fd = fd,
+ .key = ptr_to_u64(key),
+ .value = ptr_to_u64(value),
+ };
+
+ return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
+}
+
+int bpf_delete_elem(int fd, void *key)
+{
+ union bpf_attr attr = {
+ .map_fd = fd,
+ .key = ptr_to_u64(key),
+ };
+
+ return sys_bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
}
-int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
- size_t insns_cnt, char *license,
- u32 kern_version, char *log_buf, size_t log_buf_sz)
+int bpf_get_next_key(int fd, void *key, void *next_key)
+{
+ union bpf_attr attr = {
+ .map_fd = fd,
+ .key = ptr_to_u64(key),
+ .next_key = ptr_to_u64(next_key),
+ };
+
+ return sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
+}
+
+#define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))
+
+
+int bpf_load_program(enum bpf_prog_type type,
+ const struct bpf_insn *insns, int insns_cnt,
+ const char *license, int kern_version,
+ char *log_buf, size_t log_buf_sz)
{
int fd;
union bpf_attr attr;
@@ -78,8 +132,8 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
bzero(&attr, sizeof(attr));
attr.prog_type = type;
attr.insn_cnt = (__u32)insns_cnt;
- attr.insns = ptr_to_u64(insns);
- attr.license = ptr_to_u64(license);
+ attr.insns = ptr_to_u64((void *)insns);
+ attr.license = ptr_to_u64((void *)license);
attr.log_buf = ptr_to_u64(NULL);
attr.log_size = 0;
attr.log_level = 0;
@@ -97,16 +151,21 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}
-int bpf_map_update_elem(int fd, void *key, void *value,
- u64 flags)
+int bpf_obj_pin(int fd, const char *pathname)
{
- union bpf_attr attr;
+ union bpf_attr attr = {
+ .pathname = ptr_to_u64((void *)pathname),
+ .bpf_fd = fd,
+ };
- bzero(&attr, sizeof(attr));
- attr.map_fd = fd;
- attr.key = ptr_to_u64(key);
- attr.value = ptr_to_u64(value);
- attr.flags = flags;
+ return sys_bpf(BPF_OBJ_PIN, &attr, sizeof(attr));
+}
- return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
+int bpf_obj_get(const char *pathname)
+{
+ union bpf_attr attr = {
+ .pathname = ptr_to_u64((void *)pathname),
+ };
+
+ return sys_bpf(BPF_OBJ_GET, &attr, sizeof(attr));
}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index e8ba54087497..4dba36995771 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -23,16 +23,202 @@
#include <linux/bpf.h>
+struct bpf_insn;
+
int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
- int max_entries);
+ int max_entries, int map_flags);
+int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags);
+int bpf_lookup_elem(int fd, void *key, void *value);
+int bpf_delete_elem(int fd, void *key);
+int bpf_get_next_key(int fd, void *key, void *next_key);
+
+int bpf_load_program(enum bpf_prog_type prog_type,
+ const struct bpf_insn *insns, int insn_len,
+ const char *license, int kern_version,
+ char *log_buf, size_t log_buf_sz);
+
+int bpf_obj_pin(int fd, const char *pathname);
+int bpf_obj_get(const char *pathname);
-/* Recommend log buffer size */
#define BPF_LOG_BUF_SIZE 65536
-int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
- size_t insns_cnt, char *license,
- u32 kern_version, char *log_buf,
- size_t log_buf_sz);
-int bpf_map_update_elem(int fd, void *key, void *value,
- u64 flags);
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define BPF_ALU64_REG(OP, DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU64_IMM(OP, DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_ALU32_IMM(OP, DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+#define BPF_MOV32_REG(DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV64_IMM(DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_MOV32_IMM(DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
+#define BPF_LD_IMM64(DST, IMM) \
+ BPF_LD_IMM64_RAW(DST, 0, IMM)
+
+#define BPF_LD_IMM64_RAW(DST, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_DW | BPF_IMM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = (__u32) (IMM) }), \
+ ((struct bpf_insn) { \
+ .code = 0, /* zero is reserved opcode */ \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = ((__u64) (IMM)) >> 32 })
+
+#ifndef BPF_PSEUDO_MAP_FD
+# define BPF_PSEUDO_MAP_FD 1
+#endif
+
+/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
+#define BPF_LD_MAP_FD(DST, MAP_FD) \
+ BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
+
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
+
+#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
+
+#define BPF_JMP_REG(OP, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
+ ((struct bpf_insn) { \
+ .code = CODE, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN() \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_EXIT, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = 0 })
+
#endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b699aea9a025..951e11023554 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -796,7 +796,8 @@ bpf_object__create_maps(struct bpf_object *obj)
*pfd = bpf_create_map(def->type,
def->key_size,
def->value_size,
- def->max_entries);
+ def->max_entries,
+ 0);
if (*pfd < 0) {
size_t j;
int err = *pfd;
--
2.9.3
^ permalink raw reply related
* Re: Netperf UDP issue with connected sockets
From: Rick Jones @ 2016-11-16 17:46 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: netdev@vger.kernel.org, Eric Dumazet
In-Reply-To: <20161116131609.4e5726b4@redhat.com>
On 11/16/2016 04:16 AM, Jesper Dangaard Brouer wrote:
> [1] Subj: High perf top ip_idents_reserve doing netperf UDP_STREAM
> - https://www.spinics.net/lists/netdev/msg294752.html
>
> Not fixed in version 2.7.0.
> - ftp://ftp.netperf.org/netperf/netperf-2.7.0.tar.gz
>
> Used extra netperf configure compile options:
> ./configure --enable-histogram --enable-demo
>
> It seems like some fix attempts exists in the SVN repository::
>
> svn checkout http://www.netperf.org/svn/netperf2/trunk/ netperf2-svn
> svn log -r709
> # A quick stab at getting remote connect going for UDP_STREAM
> svn diff -r708:709
>
> Testing with SVN version, still show __ip_select_ident() in top#1.
Indeed, there was a fix for getting the remote side connect()ed.
Looking at what I have for the top of trunk I do though see a connect()
call being made at the local end:
socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
getsockopt(4, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
getsockopt(4, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(4, {sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("0.0.0.0")}, 16) = 0
setsockopt(4, SOL_SOCKET, SO_DONTROUTE, [1], 4) = 0
setsockopt(4, SOL_IP, IP_RECVERR, [1], 4) = 0
brk(0xe53000) = 0xe53000
getsockname(4, {sa_family=AF_INET, sin_port=htons(59758),
sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
sendto(3,
"\0\0\0a\377\377\377\377\377\377\377\377\377\377\377\377\0\0\0\10\0\0\0\0\0\0\0\321\377\377\377\377"...,
656, 0, NULL, 0) = 656
select(1024, [3], NULL, NULL, {120, 0}) = 1 (in [3], left {119, 995630})
recvfrom(3,
"\0\0\0b\0\0\0\0\0\3@\0\0\3@\0\0\0\0\2\0\3@\0\377\377\377\377\0\0\0\321"...,
656, 0, NULL, NULL) = 656
write(1, "need to connect is 1\n", 21) = 21
rt_sigaction(SIGALRM, {0x402ea6, [ALRM], SA_RESTORER|SA_INTERRUPT,
0x7f2824eb2cb0}, NULL, 8) = 0
rt_sigaction(SIGINT, {0x402ea6, [INT], SA_RESTORER|SA_INTERRUPT,
0x7f2824eb2cb0}, NULL, 8) = 0
alarm(1) = 0
connect(4, {sa_family=AF_INET, sin_port=htons(34832),
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 1024, 0, NULL, 0) =
1024
sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 1024, 0, NULL, 0) =
1024
sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 1024, 0, NULL, 0) =
1024
the only difference there with top of trunk is that "need to connect"
write/printf I just put in the code to be a nice marker in the system
call trace.
It is a wild guess, but does setting SO_DONTROUTE affect whether or not
a connect() would have the desired effect? That is there to protect
people from themselves (long story about people using UDP_STREAM to
stress improperly air-gapped systems during link up/down testing....)
It can be disabled with a test-specific -R 1 option, so your netperf
command would become:
netperf -H 198.18.50.1 -t UDP_STREAM -l 120 -- -m 1472 -n -N -R 1
>
> (p.s. is netperf ever going to be converted from SVN to git?)
>
Well.... my git-fu could use some work (gentle, offlinetaps with a
clueful tutorial bat would be welcome), and at least in the past, going
to git was held back because there were a bunch of netperf users on
Windows and there wasn't (at the time) support for git under Windows.
But I am not against the idea in principle.
happy benchmarking,
rick jones
PS - rick.jones@hp.com no longer works. rick.jones2@hpe.com should be
used instead.
^ permalink raw reply
* Re: [PATCH net-next 0/5] net: busy-poll: allow preemption and other optimizations
From: John Fastabend @ 2016-11-16 18:03 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller
Cc: netdev, Willem de Bruijn, Adam Belay, Zach Brown, Tariq Toukan,
Yuval Mintz, Ariel Elior, Eric Dumazet
In-Reply-To: <1479233715-9905-1-git-send-email-edumazet@google.com>
On 16-11-15 10:15 AM, Eric Dumazet wrote:
> It is time to have preemption points in sk_busy_loop() and improve
> its scalability.
>
> Also napi_complete() and friends can tell drivers when it is safe to
> not re-enable device interrupts, saving some overhead under
> high busy polling.
>
> mlx4 and bnx2x are changed accordingly, to show how this busy polling
> status can be exploited by drivers.
>
> Next steps will implement Zach Brown suggestion, where NAPI polling
> would be enabled all the time for some chosen queues.
> This is needed for efficient epoll() support anyway.
Would you expect to make this a per queue option of the hardware
configured via ethtool/netlink/sysfs and like where users steer traffic
to particular queues using existing ntuple filters or 'tc' or infer it
from the socket layer?
So configuration would be (a) enable busy-polling on queues x,y,z and
then (b) use ntuple/RSS/etc to steer relative traffic to queues. In
this case traffic doesn't need to be bound to a socket in any way.
Seems like a useful generalization.
Thanks,
John
>
> Eric Dumazet (5):
> net: busy-poll: allow preemption in sk_busy_loop()
> net: busy-poll: remove need_resched() from sk_can_busy_loop()
> net: busy-poll: return busypolling status to drivers
> net/mlx4_en: use napi_complete_done() return value
> bnx2x: switch to napi_complete_done()
>
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 15 ++--
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 +-
> include/linux/netdevice.h | 17 +++-
> include/net/busy_poll.h | 5 +-
> net/core/dev.c | 110 +++++++++++++++++++-----
> 5 files changed, 113 insertions(+), 38 deletions(-)
>
^ permalink raw reply
* Re: linux-next: net->netns_ids is used after calling idr_destroy for it
From: Cong Wang @ 2016-11-16 18:04 UTC (permalink / raw)
To: Andrei Vagin; +Cc: Nicolas Dichtel, Linux Kernel Network Developers
In-Reply-To: <CANaxB-ypW4cFxNTAaS-n6ehNHxGyxe9x0G8SJap6PBp8UJX9gA@mail.gmail.com>
On Tue, Nov 15, 2016 at 2:45 PM, Andrei Vagin <avagin@gmail.com> wrote:
> On Tue, Nov 15, 2016 at 2:21 PM, Andrei Vagin <avagin@gmail.com> wrote:
>> I would like to agree with you here, but looks like sockets with
>> NETLINK_F_LISTEN_ALL_NSID are able to catch these messages.
>
> Actually I found that I was wrong.
>
> do_one_broadcast() sends a notification only if a device network
> namespace has an id in a socket netns. But cleanup_net() removes all
> id-s to a target namespace, so just ignore my last comment.
The point is all sockets in that netns are already gone at that point
because of refcount.
cleanup_net() also destroys net->netns_ids, so it should not be even
accessed after that point.
^ permalink raw reply
* Re: [PATCH] net: dsa: fix fixed-link-phy device leaks
From: Florian Fainelli @ 2016-11-16 18:14 UTC (permalink / raw)
To: Johan Hovold
Cc: Andrew Lunn, Vivien Didelot, David S. Miller, netdev,
linux-kernel
In-Reply-To: <20161116171145.GE4864@localhost>
On 11/16/2016 09:11 AM, Johan Hovold wrote:
> On Wed, Nov 16, 2016 at 09:06:26AM -0800, Florian Fainelli wrote:
>> On 11/16/2016 06:47 AM, Johan Hovold wrote:
>>> Make sure to drop the reference taken by of_phy_find_device() when
>>> registering and deregistering the fixed-link PHY-device.
>>>
>>> Note that we need to put both references held at deregistration.
>>>
>>> Fixes: 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port
>>> speeds/duplex")
>>> Signed-off-by: Johan Hovold <johan@kernel.org>
>>> ---
>>>
>>> Hi,
>>>
>>> This is one has been compile tested only, but fixes a couple of leaks
>>> similar to one that was found in the cpsw driver for which I just posted
>>> a patch.
>>>
>>> It turns out all drivers but DSA fail to deregister the fixed-link PHYs
>>> registered by of_phy_register_fixed_link(). Due to the way this
>>> interface was designed, deregistering such a PHY is a bit cumbersome and
>>> looks like it would benefit from a common helper.
>>>
>>> However, perhaps the interface should instead be changed so that the PHY
>>> device is returned so that drivers do not need to use
>>> of_phy_find_device() when they need to access properties of the fixed
>>> link (e.g. as in dsu_cpu_dsa_setup below).
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Johan
>>>
>>>
>>> net/dsa/dsa.c | 8 +++++++-
>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
>>> index a6902c1e2f28..798a6a776a5f 100644
>>> --- a/net/dsa/dsa.c
>>> +++ b/net/dsa/dsa.c
>>> @@ -233,6 +233,8 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
>>> genphy_read_status(phydev);
>>> if (ds->ops->adjust_link)
>>> ds->ops->adjust_link(ds, port, phydev);
>>> +
>>> + phy_device_free(phydev); /* of_phy_find_device */
>>> }
>>>
>>> return 0;
>>> @@ -509,8 +511,12 @@ void dsa_cpu_dsa_destroy(struct device_node *port_dn)
>>> if (of_phy_is_fixed_link(port_dn)) {
>>> phydev = of_phy_find_device(port_dn);
>>> if (phydev) {
>>> - phy_device_free(phydev);
>>> fixed_phy_unregister(phydev);
>>> + /* Put references taken by of_phy_find_device() and
>>> + * of_phy_register_fixed_link().
>>> + */
>>> + phy_device_free(phydev);
>>> + phy_device_free(phydev);
>>
>> Double free, this looks bogus here. Actually would not this mean a
>> triple free since you already free in dsa_cpu_dsa_setup() which is
>> paired with dsa_cpu_dsa_destroy()?
>
> The naming of phy_device_free() is unfortunate when it's really a put():
>
> void phy_device_free(struct phy_device *phydev)
> {
> put_device(&phydev->mdio.dev);
> }
Indeed, should have looked a little harder.
>
> which may need to be called multiple times, specifically after a call to
> of_phy_find_device() which takes another reference.
>
> With this patch the refcounts are properly balanced.
The intent of your patch is good, but it still feels like having to
double imbalance the refcount is symptomatic of a larger issue here, it
does not seem like having several refcounts are necessary, so we may
really want to rework the API.
--
Florian
^ permalink raw reply
* Re: [PATCH net][v2] bpf: fix range arithmetic for bpf map access
From: David Miller @ 2016-11-16 18:22 UTC (permalink / raw)
To: jbacik; +Cc: jannh, ast, daniel, netdev
In-Reply-To: <1479156336-6211-1-git-send-email-jbacik@fb.com>
From: Josef Bacik <jbacik@fb.com>
Date: Mon, 14 Nov 2016 15:45:36 -0500
> I made some invalid assumptions with BPF_AND and BPF_MOD that could result in
> invalid accesses to bpf map entries. Fix this up by doing a few things
>
> 1) Kill BPF_MOD support. This doesn't actually get used by the compiler in real
> life and just adds extra complexity.
>
> 2) Fix the logic for BPF_AND, don't allow AND of negative numbers and set the
> minimum value to 0 for positive AND's.
>
> 3) Don't do operations on the ranges if they are set to the limits, as they are
> by definition undefined, and allowing arithmetic operations on those values
> could make them appear valid when they really aren't.
>
> This fixes the testcase provided by Jann as well as a few other theoretical
> problems.
>
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Josef Bacik <jbacik@fb.com>
Applied, thanks.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox