Netdev List


* Re: [PATCH 2/2] net: gmii2rgmii: Switch priv field in mdio device structure
From: Harini Katakam @ 2019-08-13 15:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Harini Katakam, Florian Fainelli, Heiner Kallweit, David Miller,
	Michal Simek, netdev, linux-arm-kernel, linux-kernel,
	radhey.shyam.pandey
In-Reply-To: <20190813132321.GF15047@lunn.ch>

Hi Andrew,

On Tue, Aug 13, 2019 at 6:54 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Tue, Aug 13, 2019 at 04:46:40PM +0530, Harini Katakam wrote:
> > Hi Andrew,
> >
> > On Thu, Aug 1, 2019 at 9:36 AM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > On Wed, Jul 31, 2019 at 03:06:19PM +0530, Harini Katakam wrote:
> > > > Use the priv field in mdio device structure instead of the one in
> > > > phy device structure. The phy device priv field may be used by the
> > > > external phy driver and should not be overwritten.
> > >
> > > Hi Harini
> > >
> > > > I _think_ you could use dev_set_drvdata(&mdiodev->dev) in xgmiitorgmii_probe() and
> > > > dev_get_drvdata(&phydev->mdio.dev) in _read_status()
> >
> > Thanks for the review. This works if I do:
> > dev_set_drvdata(&priv->phy_dev->mdio.dev->dev) in probe
> > and then
> > dev_get_drvdata(&phydev->mdio.dev) in _read_status()
> >
> > i.e. mdiodev in gmii2rgmii probe and priv->phy_dev->mdio are not the same.
> >
> > If this is acceptable, I can send a v2.
>
> Hi Harini
>
> I think this is better, making use of the central driver
> infrastructure, rather than inventing something new.

Ok sure.

>
> The kernel does have a few helpers: spi_get_drvdata, pci_get_drvdata,
> hci_get_drvdata. So maybe add phydev_get_drvdata(struct phy_device
> *phydev)?

Maybe phydev_mdio_get_drvdata? Because the driver data member available is
phydev->mdio.dev.driver_data.

Regards,
Harini


* Re: kernel BUG at net/rxrpc/local_object.c:LINE!
From: David Howells @ 2019-08-13 15:29 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: dhowells, syzbot, Eric Biggers, David Miller, linux-afs, LKML,
	netdev, syzkaller-bugs
In-Reply-To: <CACT4Y+YVyaTrwpaZfpfi9LKA=5TOdKSL60pjAH04dMPNCZTMSQ@mail.gmail.com>

Dmitry Vyukov <dvyukov@google.com> wrote:

> It only collects console output. I don't know what is trace log. If
> the trace log is not console output, then it won't.

Assuming the system is still alive:

	cat /sys/kernel/debug/tracing/trace

David


* Re: [PATCH net] netdevsim: Restore per-network namespace accounting for fib entries
From: Jiri Pirko @ 2019-08-13 15:34 UTC (permalink / raw)
  To: David Ahern; +Cc: David Miller, dsahern, netdev
In-Reply-To: <9306e893-cd43-75a0-9a81-fd2ee0dd44c5@gmail.com>

Tue, Aug 13, 2019 at 04:41:18PM CEST, dsahern@gmail.com wrote:
>On 8/13/19 1:14 AM, Jiri Pirko wrote:
>> Mon, Aug 12, 2019 at 05:28:02PM CEST, davem@davemloft.net wrote:
>>> From: Jiri Pirko <jiri@resnulli.us>
>>> Date: Mon, 12 Aug 2019 10:36:35 +0200
>>>
>>>> I understand it with real devices, but for a dummy testing device, whose
>>>> purpose is just to test the API? Why?
>>>
>>> Because you'll break all of the wonderful testing infrastructure
>>> people like David have created.
>>  
>> Are you referring to selftests? There is no such test there :(
>
>I have one now and will be submitting it after net merges with net-next.
>
>> But if it would be, could implement the limitations
>> properly (like using cgroups), change the tests and remove this
>> code from netdevsim?
>
>The intent of this code and test is to have a s/w model similar to how
>mlxsw works - responding to notifiers and deciding to reject a change.
>You are currently adding (or trying to) more devlink based s/w tests, so
>you must see the value of netdevsim as a source of testing.

Sure I do. Not sure it makes sense to repeat myself again, but why not:
The way you use netdevsim with a network namespace limitation is nothing
like how it is done in hardware. Devlink resources should limit the
resources of the device, not of the network namespace. You abused
netdevsim and devlink for that. Not cool :(

To be in sync with mlxsw, netdevsim should track fibs added to the ports
and apply the resource limitations to that. That is the correct
behaviour. Exactly like mlxsw does.

Frankly I don't really understand why you keep pushing your broken
design. Why is a limitation applied only to fibs related to netdevsim
ports not enough for testing? Would that work for you? Please?

This is keeping me awake at night. Sigh :(


* Re: [PATCH 2/2] net: gmii2rgmii: Switch priv field in mdio device structure
From: Andrew Lunn @ 2019-08-13 15:38 UTC (permalink / raw)
  To: Harini Katakam
  Cc: Harini Katakam, Florian Fainelli, Heiner Kallweit, David Miller,
	Michal Simek, netdev, linux-arm-kernel, linux-kernel,
	radhey.shyam.pandey
In-Reply-To: <CAFcVECKipjD9atgEJSf8j78q_1aOAX77nD6vVeytZ-M00qBt6A@mail.gmail.com>

> > The kernel does have a few helpers: spi_get_drvdata, pci_get_drvdata,
> > hci_get_drvdata. So maybe add phydev_get_drvdata(struct phy_device
> > *phydev)?
> 
> Maybe phydev_mdio_get_drvdata? Because the driver data member available is
> phydev->mdio.dev.driver_data.

I still prefer phydev_get_drvdata(). It fits with the X_get_drvdata()
pattern, where X is the type of parameter passed to the call, spi,
pci, hci.

We can also add mdiodev_get_drvdata(mdiodev). A few DSA drivers could
use that.

   Andrew
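
A minimal userspace sketch of the X_get_drvdata() pattern discussed in
this message. The structures below are stripped-down stand-ins for the
kernel ones (only the fields needed here are modelled), and the
phydev_*/mdiodev_* helpers are written as proposed in the thread, not
copied from any kernel tree. It is meant only to show why the name follows
the type of the argument while the data itself lives in
phydev->mdio.dev.driver_data.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures (illustration only). */
struct device {
	void *driver_data;
};

struct mdio_device {
	struct device dev;
};

struct phy_device {
	struct mdio_device mdio;
};

/* Stand-ins for the generic driver-core accessors. */
static inline void dev_set_drvdata(struct device *dev, void *data)
{
	dev->driver_data = data;
}

static inline void *dev_get_drvdata(const struct device *dev)
{
	return dev->driver_data;
}

/* Helpers named after the type they take, as proposed in the thread. */
static inline void mdiodev_set_drvdata(struct mdio_device *mdio, void *data)
{
	assert(mdio != NULL);
	dev_set_drvdata(&mdio->dev, data);
}

static inline void *mdiodev_get_drvdata(const struct mdio_device *mdio)
{
	assert(mdio != NULL);
	return dev_get_drvdata(&mdio->dev);
}

static inline void phydev_set_drvdata(struct phy_device *phydev, void *data)
{
	assert(phydev != NULL);
	dev_set_drvdata(&phydev->mdio.dev, data);
}

static inline void *phydev_get_drvdata(const struct phy_device *phydev)
{
	assert(phydev != NULL);
	return dev_get_drvdata(&phydev->mdio.dev);
}
```

Both helper families end up at the same driver_data slot, so a value
stored through phydev_set_drvdata() is also visible through
mdiodev_get_drvdata(&phydev->mdio).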


* Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: fix RGMII-ID port setup
From: Marek Behún @ 2019-08-13 15:44 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Heiner Kallweit, Sebastian Reichel, Vivien Didelot,
	Florian Fainelli, David S . Miller
In-Reply-To: <20190811165108.GG14290@lunn.ch>

Hi Andrew,

> We should read the switch registers. I think you can set the defaults
> using strapping pins. And in general, the driver always reads state
> from the hardware rather than caching it.

hmm. The cmode is cached for each port, though. For example
mv88e6390x_port_set_cmode compares the new requested value with the
cached one and doesn't do anything if they are equal.

If mv88e6xxx_port_setup_mac can be called once per second by phylink as
you say, do we really want to read the value via MDIO every time? We
already have cmode cached (read from registers at mv88e6xxx_setup, and
then changed when cmode change is requested). From cmode we can already
differentiate mode in the terms of phy_interface_t, unless it is RGMII,
in which case we would have to read RX/TX timing.

Marek


* Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: fix RGMII-ID port setup
From: Marek Behún @ 2019-08-13 15:51 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Heiner Kallweit, Sebastian Reichel, Vivien Didelot,
	Florian Fainelli, David S . Miller
In-Reply-To: <20190813174416.5c57b08f@dellmb.labs.office.nic.cz>

On Tue, 13 Aug 2019 17:44:16 +0200
Marek Behún <marek.behun@nic.cz> wrote:

> Hi Andrew,
> 
> > We should read the switch registers. I think you can set the
> > defaults using strapping pins. And in general, the driver always
> > reads state from the hardware rather than caching it.  
> 
> hmm. The cmode is cached for each port, though. For example
> mv88e6390x_port_set_cmode compares the new requested value with the
> cached one and doesn't do anything if they are equal.
> 
> If mv88e6xxx_port_setup_mac can be called once per second by phylink
> as you say, do we really want to read the value via MDIO every time?
> We already have cmode cached (read from registers at mv88e6xxx_setup,
> and then changed when cmode change is requested). From cmode we can
> already differentiate mode in the terms of phy_interface_t, unless it
> is RGMII, in which case we would have to read RX/TX timing.
> 
> Marek

/o\ OK. I see now that mv88e6xxx_port_setup_mac already calls
->port_link_state(), which fills in a struct phylink_link_state, and
already does MDIO communication. Sorry :)
I will try to send a patch which adds the filling of the ->interface
member of the struct phylink_link_state in ->port_link_state() method.

Marek


* Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: fix RGMII-ID port setup
From: Andrew Lunn @ 2019-08-13 15:52 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Heiner Kallweit, Sebastian Reichel, Vivien Didelot,
	Florian Fainelli, David S . Miller
In-Reply-To: <20190813174416.5c57b08f@dellmb.labs.office.nic.cz>

On Tue, Aug 13, 2019 at 05:44:16PM +0200, Marek Behún wrote:
> Hi Andrew,
> 
> > We should read the switch registers. I think you can set the defaults
> > using strapping pins. And in general, the driver always reads state
> > from the hardware rather than caching it.
> 
> hmm. The cmode is cached for each port, though. For example
> mv88e6390x_port_set_cmode compares the new requested value with the
> cached one and doesn't do anything if they are equal.
> 
> If mv88e6xxx_port_setup_mac can be called once per second by phylink as
> you say, do we really want to read the value via MDIO every time? We
> already have cmode cached (read from registers at mv88e6xxx_setup, and
> then changed when cmode change is requested). From cmode we can already
> differentiate mode in the terms of phy_interface_t, unless it is RGMII,
> in which case we would have to read RX/TX timing.

Hi Marek

cmode gets used a lot, and in interrupt thread context. So I think it
was worth caching it. RGMII Rx/Tx timing is not used much, so I don't
think it is worth caching it. But as you say, using cmode to determine
if the registers actually need to be read does make sense. Most ports
don't use RGMII, they have internal PHYs.

      Andrew
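
The idea agreed on above (consult the cached cmode first, and touch the
RX/TX timing register only for RGMII ports) can be sketched in plain
userspace C. Everything prefixed sketch_/SKETCH_ is hypothetical: the
enum values, the delay bits and the simulated register are assumptions for
illustration and do not reflect the real mv88e6xxx register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings; real cmode values are chip-specific. */
enum sketch_cmode {
	SKETCH_CMODE_INTERNAL_PHY,
	SKETCH_CMODE_SGMII,
	SKETCH_CMODE_RGMII,
};

enum sketch_interface {
	SKETCH_IF_INTERNAL,
	SKETCH_IF_SGMII,
	SKETCH_IF_RGMII,
	SKETCH_IF_RGMII_RXID,
	SKETCH_IF_RGMII_TXID,
	SKETCH_IF_RGMII_ID,
};

#define SKETCH_RX_DELAY	0x1
#define SKETCH_TX_DELAY	0x2

struct sketch_port {
	enum sketch_cmode cmode;	/* cached copy, cheap to consult */
	uint16_t timing_reg;		/* simulated hardware register */
	unsigned int bus_reads;		/* counts simulated MDIO accesses */
};

/* Stand-in for a (slow) MDIO register read. */
static uint16_t sketch_read_timing(struct sketch_port *port)
{
	port->bus_reads++;
	return port->timing_reg;
}

static enum sketch_interface sketch_port_interface(struct sketch_port *port)
{
	uint16_t timing;

	assert(port != NULL);

	/* The cached cmode settles every non-RGMII case without bus traffic. */
	if (port->cmode == SKETCH_CMODE_INTERNAL_PHY)
		return SKETCH_IF_INTERNAL;
	if (port->cmode == SKETCH_CMODE_SGMII)
		return SKETCH_IF_SGMII;

	/* Only RGMII needs the delay bits, so only RGMII pays for a read. */
	timing = sketch_read_timing(port);
	if ((timing & SKETCH_RX_DELAY) && (timing & SKETCH_TX_DELAY))
		return SKETCH_IF_RGMII_ID;
	if (timing & SKETCH_RX_DELAY)
		return SKETCH_IF_RGMII_RXID;
	if (timing & SKETCH_TX_DELAY)
		return SKETCH_IF_RGMII_TXID;
	return SKETCH_IF_RGMII;
}
```

With this split, ports with internal PHYs never trigger a register read,
which matches the observation that most ports do not use RGMII.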


* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Igor Russkikh @ 2019-08-13 16:18 UTC (permalink / raw)
  To: Andrew Lunn, Antoine Tenart
  Cc: davem@davemloft.net, sd@queasysnail.net, f.fainelli@gmail.com,
	hkallweit1@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, thomas.petazzoni@bootlin.com,
	alexandre.belloni@bootlin.com, allan.nielsen@microchip.com,
	camelia.groza@nxp.com, Simon Edelhaus, Pavel Belous
In-Reply-To: <20190813131706.GE15047@lunn.ch>



On 13.08.2019 16:17, Andrew Lunn wrote:
> On Tue, Aug 13, 2019 at 10:58:17AM +0200, Antoine Tenart wrote:
>> I think this question is linked to the use of a MACsec virtual interface
>> when using h/w offloading. The starting point for me was that I wanted
>> to reuse the data structures and the API exposed to the userspace by the
>> s/w implementation of MACsec. I then had two choices: keeping the exact
>> same interface for the user (having a virtual MACsec interface), or
>> registering the MACsec genl ops onto the real net devices (and making
>> the s/w implementation a virtual net dev and a provider of the MACsec
>> "offloading" ops).
>>
>> The advantages of the first option were that nearly all the logic of the
>> s/w implementation could be kept and especially that it would be
>> transparent for the user to use both implementations of MACsec.
> 
> Hi Antoine
> 
> We have always talked about offloading operations to the hardware,
> accelerating what the linux stack can do by making use of hardware
> accelerators. The basic user API should not change because of
> acceleration. Those are the general guidelines.
> 
> It would however be interesting to get comments from those who did the
> software implementation and what they think of this architecture. I've
> no personal experience with MACSec, so it is hard for me to say if the
> current architecture makes sense when using accelerators.

In terms of overall concepts, I'd add the following:

1) With the current implementation it's impossible to install the SW
macsec engine onto a device which supports HW offload. That could be a
strong limitation in cases when the user sees that HW macsec offload is
broken or works differently, and he/she wants to replace it with the SW
one.
MACsec is a complex feature, and it may happen that something is missing
in HW. A trivial example is 256-bit encryption, which is not always a
must-have in HW implementations.

2) I think, Antoine, it's not totally true that otherwise the user macsec
API will be broken/changed. The netlink API is the same; the only thing we
may want to add is an optional parameter to force selection of the SW
macsec engine.

I'm also eager to hear from SW macsec users/devs on what's better here.


Regards,
  Igor



* [PATCH bpf-next v3 0/4] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-13 16:26 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
	Yonghong Song

Currently there is no way to propagate sk storage from the listener
socket to a newly accepted one. Consider the following use case:

        fd = socket();
        setsockopt(fd, SOL_IP, IP_TOS,...);
        /* ^^^ setsockopt BPF program triggers here and saves something
         * into sk storage of the listener.
         */
        listen(fd, ...);
        while (client = accept(fd)) {
                /* At this point all association between listener
                 * socket and newly accepted one is gone. New
                 * socket will not have any sk storage attached.
                 */
        }

Let's add new BPF_F_CLONE flag that can be specified when creating
a socket storage map. This new flag indicates that map contents
should be cloned when the socket is cloned.
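
As a rough illustration of how such a creation flag can be validated, the
following standalone sketch mirrors the rule stated in the v3 notes:
BPF_F_NO_PREALLOC must always be present, BPF_F_CLONE is optional, and any
other bit is rejected. The flag values are reproduced here on the
assumption that they match the uapi header touched by this series; the
function names are hypothetical and this is not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/bpf.h as of this series. */
#define BPF_F_NO_PREALLOC	(1U << 0)
#define BPF_F_CLONE		(1U << 9)

#define SK_STORAGE_CREATE_FLAG_MASK \
	(BPF_F_NO_PREALLOC | BPF_F_CLONE)

/* Hypothetical helper: true if map_flags is acceptable for sk storage. */
static bool sk_storage_flags_valid(uint32_t map_flags)
{
	/* Reject any bit outside the accepted mask. */
	if (map_flags & ~SK_STORAGE_CREATE_FLAG_MASK)
		return false;

	/* BPF_F_NO_PREALLOC is mandatory, BPF_F_CLONE is optional. */
	if (!(map_flags & BPF_F_NO_PREALLOC))
		return false;

	return true;
}

/* Small self-check covering each branch of the rule above. */
static void sk_storage_flags_selfcheck(void)
{
	assert(sk_storage_flags_valid(BPF_F_NO_PREALLOC));
	assert(sk_storage_flags_valid(BPF_F_NO_PREALLOC | BPF_F_CLONE));
	assert(!sk_storage_flags_valid(BPF_F_CLONE));	/* NO_PREALLOC missing */
	assert(!sk_storage_flags_valid(0));
	assert(!sk_storage_flags_valid(BPF_F_NO_PREALLOC | (1U << 1)));
}
```

Checking the mask first and the mandatory bit second keeps the two
failure reasons separate, which is the same shape as the check in patch
2/4.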

v3:
* make sure BPF_F_NO_PREALLOC is always present when creating
  a map (Martin KaFai Lau)
* don't call bpf_sk_storage_free explicitly, rely on
  sk_free_unlock_clone to do the cleanup (Martin KaFai Lau)

v2:
* remove spinlocks around selem_link_map/sk (Martin KaFai Lau)
* BPF_F_CLONE on a map, not selem (Martin KaFai Lau)
* hold a map while cloning (Martin KaFai Lau)
* use BTF maps in selftests (Yonghong Song)
* do proper cleanup selftests; don't call close(-1) (Yonghong Song)
* export bpf_map_inc_not_zero

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>

Stanislav Fomichev (4):
  bpf: export bpf_map_inc_not_zero
  bpf: support cloning sk storage on accept()
  bpf: sync bpf.h to tools/
  selftests/bpf: add sockopt clone/inheritance test

 include/linux/bpf.h                           |   2 +
 include/net/bpf_sk_storage.h                  |  10 +
 include/uapi/linux/bpf.h                      |   3 +
 kernel/bpf/syscall.c                          |  16 +-
 net/core/bpf_sk_storage.c                     | 103 ++++++-
 net/core/sock.c                               |   9 +-
 tools/include/uapi/linux/bpf.h                |   3 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../selftests/bpf/progs/sockopt_inherit.c     |  97 +++++++
 .../selftests/bpf/test_sockopt_inherit.c      | 253 ++++++++++++++++++
 11 files changed, 490 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/sockopt_inherit.c
 create mode 100644 tools/testing/selftests/bpf/test_sockopt_inherit.c

-- 
2.23.0.rc1.153.gdeed80330f-goog


* [PATCH bpf-next v3 1/4] bpf: export bpf_map_inc_not_zero
From: Stanislav Fomichev @ 2019-08-13 16:26 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
	Yonghong Song
In-Reply-To: <20190813162630.124544-1-sdf@google.com>

Rename existing bpf_map_inc_not_zero to __bpf_map_inc_not_zero to
indicate that it's caller's responsibility to do proper locking.
Create and export bpf_map_inc_not_zero wrapper that properly
locks map_idr_lock. Will be used in the next commit to
hold a map while cloning a socket.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h  |  2 ++
 kernel/bpf/syscall.c | 16 +++++++++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f9a506147c8a..15ae49862b82 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -647,6 +647,8 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
 struct bpf_map *__bpf_map_get(struct fd f);
 struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
+struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map,
+						   bool uref);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
 int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5d141f16f6fa..cf8052b016e7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -683,8 +683,8 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
 }
 
 /* map_idr_lock should have been held */
-static struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map,
-					    bool uref)
+static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map,
+					      bool uref)
 {
 	int refold;
 
@@ -704,6 +704,16 @@ static struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map,
 	return map;
 }
 
+struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
+{
+	spin_lock_bh(&map_idr_lock);
+	map = __bpf_map_inc_not_zero(map, uref);
+	spin_unlock_bh(&map_idr_lock);
+
+	return map;
+}
+EXPORT_SYMBOL_GPL(bpf_map_inc_not_zero);
+
 int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 {
 	return -ENOTSUPP;
@@ -2177,7 +2187,7 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 	spin_lock_bh(&map_idr_lock);
 	map = idr_find(&map_idr, id);
 	if (map)
-		map = bpf_map_inc_not_zero(map, true);
+		map = __bpf_map_inc_not_zero(map, true);
 	else
 		map = ERR_PTR(-ENOENT);
 	spin_unlock_bh(&map_idr_lock);
-- 
2.23.0.rc1.153.gdeed80330f-goog
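
The split made by this patch (a helper that expects the caller to hold the
lock, plus an exported wrapper that takes the lock itself) can be sketched
in userspace as follows. This is only an analogy: a C11 atomic_flag
spinlock stands in for map_idr_lock, a plain int stands in for the map
refcount, NULL stands in for the ERR_PTR() the kernel returns, and all
sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_map {
	int refcnt;	/* protected by sketch_idr_lock in this sketch */
};

/* Stand-in for map_idr_lock. */
static atomic_flag sketch_idr_lock = ATOMIC_FLAG_INIT;

static void sketch_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&sketch_idr_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void sketch_unlock(void)
{
	atomic_flag_clear_explicit(&sketch_idr_lock, memory_order_release);
}

/* Caller must hold sketch_idr_lock (the "__" variant in the patch). */
static struct sketch_map *
sketch_map_inc_not_zero_locked(struct sketch_map *map)
{
	if (map->refcnt == 0)
		return NULL;	/* map is already on its way out */

	map->refcnt++;
	return map;
}

/* Wrapper for callers that do not hold the lock yet. */
static struct sketch_map *sketch_map_inc_not_zero(struct sketch_map *map)
{
	assert(map != NULL);

	sketch_lock();
	map = sketch_map_inc_not_zero_locked(map);
	sketch_unlock();

	return map;
}
```

Code paths that already hold the lock keep calling the locked variant
directly, while new callers outside that file use the wrapper and never
see the lock.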



* [PATCH bpf-next v3 2/4] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-13 16:26 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
	Yonghong Song
In-Reply-To: <20190813162630.124544-1-sdf@google.com>

Add new helper bpf_sk_storage_clone which optionally clones sk storage
and call it from sk_clone_lock.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/net/bpf_sk_storage.h |  10 ++++
 include/uapi/linux/bpf.h     |   3 +
 net/core/bpf_sk_storage.c    | 103 ++++++++++++++++++++++++++++++++++-
 net/core/sock.c              |   9 ++-
 4 files changed, 119 insertions(+), 6 deletions(-)

diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index b9dcb02e756b..8e4f831d2e52 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -10,4 +10,14 @@ void bpf_sk_storage_free(struct sock *sk);
 extern const struct bpf_func_proto bpf_sk_storage_get_proto;
 extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
 
+#ifdef CONFIG_BPF_SYSCALL
+int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
+#else
+static inline int bpf_sk_storage_clone(const struct sock *sk,
+				       struct sock *newsk)
+{
+	return 0;
+}
+#endif
+
 #endif /* _BPF_SK_STORAGE_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4393bd4b2419..0ef594ac3899 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -337,6 +337,9 @@ enum bpf_attach_type {
 #define BPF_F_RDONLY_PROG	(1U << 7)
 #define BPF_F_WRONLY_PROG	(1U << 8)
 
+/* Clone map from listener for newly accepted socket */
+#define BPF_F_CLONE		(1U << 9)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 94c7f77ecb6b..1bc7de7e18ba 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -12,6 +12,9 @@
 
 static atomic_t cache_idx;
 
+#define SK_STORAGE_CREATE_FLAG_MASK					\
+	(BPF_F_NO_PREALLOC | BPF_F_CLONE)
+
 struct bucket {
 	struct hlist_head list;
 	raw_spinlock_t lock;
@@ -209,7 +212,6 @@ static void selem_unlink_sk(struct bpf_sk_storage_elem *selem)
 		kfree_rcu(sk_storage, rcu);
 }
 
-/* sk_storage->lock must be held and sk_storage->list cannot be empty */
 static void __selem_link_sk(struct bpf_sk_storage *sk_storage,
 			    struct bpf_sk_storage_elem *selem)
 {
@@ -509,7 +511,7 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
 	return 0;
 }
 
-/* Called by __sk_destruct() */
+/* Called by __sk_destruct() & bpf_sk_storage_clone() */
 void bpf_sk_storage_free(struct sock *sk)
 {
 	struct bpf_sk_storage_elem *selem;
@@ -557,6 +559,11 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
 
 	smap = (struct bpf_sk_storage_map *)map;
 
+	/* Note that this map might be concurrently cloned from
+	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
+	 * RCU read section to finish before proceeding. New RCU
+	 * read sections should be prevented via bpf_map_inc_not_zero.
+	 */
 	synchronize_rcu();
 
 	/* bpf prog and the userspace can no longer access this map
@@ -601,7 +608,9 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
 
 static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
 {
-	if (attr->map_flags != BPF_F_NO_PREALLOC || attr->max_entries ||
+	if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK ||
+	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
+	    attr->max_entries ||
 	    attr->key_size != sizeof(int) || !attr->value_size ||
 	    /* Enforce BTF for userspace sk dumping */
 	    !attr->btf_key_type_id || !attr->btf_value_type_id)
@@ -739,6 +748,94 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
 	return err;
 }
 
+static struct bpf_sk_storage_elem *
+bpf_sk_storage_clone_elem(struct sock *newsk,
+			  struct bpf_sk_storage_map *smap,
+			  struct bpf_sk_storage_elem *selem)
+{
+	struct bpf_sk_storage_elem *copy_selem;
+
+	copy_selem = selem_alloc(smap, newsk, NULL, true);
+	if (!copy_selem)
+		return NULL;
+
+	if (map_value_has_spin_lock(&smap->map))
+		copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
+				      SDATA(selem)->data, true);
+	else
+		copy_map_value(&smap->map, SDATA(copy_selem)->data,
+			       SDATA(selem)->data);
+
+	return copy_selem;
+}
+
+int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
+{
+	struct bpf_sk_storage *new_sk_storage = NULL;
+	struct bpf_sk_storage *sk_storage;
+	struct bpf_sk_storage_elem *selem;
+	int ret;
+
+	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
+
+	rcu_read_lock();
+	sk_storage = rcu_dereference(sk->sk_bpf_storage);
+
+	if (!sk_storage || hlist_empty(&sk_storage->list))
+		goto out;
+
+	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
+		struct bpf_sk_storage_elem *copy_selem;
+		struct bpf_sk_storage_map *smap;
+		struct bpf_map *map;
+
+		smap = rcu_dereference(SDATA(selem)->smap);
+		if (!(smap->map.map_flags & BPF_F_CLONE))
+			continue;
+
+		map = bpf_map_inc_not_zero(&smap->map, false);
+		if (IS_ERR(map))
+			continue;
+
+		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
+		if (!copy_selem) {
+			ret = -ENOMEM;
+			bpf_map_put(map);
+			goto err;
+		}
+
+		if (new_sk_storage) {
+			selem_link_map(smap, copy_selem);
+			__selem_link_sk(new_sk_storage, copy_selem);
+		} else {
+			ret = sk_storage_alloc(newsk, smap, copy_selem);
+			if (ret) {
+				kfree(copy_selem);
+				atomic_sub(smap->elem_size,
+					   &newsk->sk_omem_alloc);
+				bpf_map_put(map);
+				goto err;
+			}
+
+			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
+		}
+		bpf_map_put(map);
+	}
+
+out:
+	rcu_read_unlock();
+	return 0;
+
+err:
+	rcu_read_unlock();
+
+	/* Don't free anything explicitly here, caller is responsible to
+	 * call bpf_sk_storage_free in case of an error.
+	 */
+
+	return ret;
+}
+
 BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
 	   void *, value, u64, flags)
 {
diff --git a/net/core/sock.c b/net/core/sock.c
index d57b0cc995a0..f5e801a9cea4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 			goto out;
 		}
 		RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
-#ifdef CONFIG_BPF_SYSCALL
-		RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
-#endif
+
+		if (bpf_sk_storage_clone(sk, newsk)) {
+			sk_free_unlock_clone(newsk);
+			newsk = NULL;
+			goto out;
+		}
 
 		newsk->sk_err	   = 0;
 		newsk->sk_err_soft = 0;
-- 
2.23.0.rc1.153.gdeed80330f-goog
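
The core of the clone step in this patch is a filter: walk the listener's
storage list and copy only the elements whose map opted in with the clone
flag. A simplified userspace sketch of that loop follows. It deliberately
leaves out RCU, map refcounting and memory accounting, uses a plain singly
linked list, and every sketch_ name is hypothetical rather than taken from
the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_F_CLONE	(1U << 9)

struct sketch_map {
	uint32_t flags;
};

struct sketch_elem {
	const struct sketch_map *map;	/* map this element belongs to */
	uint8_t val;			/* the stored value */
	struct sketch_elem *next;
};

struct sketch_sock {
	struct sketch_elem *storage;	/* NULL when nothing is attached */
};

static void sketch_storage_free(struct sketch_sock *sk)
{
	struct sketch_elem *e = sk->storage;

	while (e) {
		struct sketch_elem *next = e->next;

		free(e);
		e = next;
	}
	sk->storage = NULL;
}

static int sketch_storage_add(struct sketch_sock *sk,
			      const struct sketch_map *map, uint8_t val)
{
	struct sketch_elem *e = malloc(sizeof(*e));

	if (!e)
		return -1;

	e->map = map;
	e->val = val;
	e->next = sk->storage;
	sk->storage = e;
	return 0;
}

static const struct sketch_elem *
sketch_storage_lookup(const struct sketch_sock *sk,
		      const struct sketch_map *map)
{
	const struct sketch_elem *e;

	for (e = sk->storage; e; e = e->next)
		if (e->map == map)
			return e;
	return NULL;
}

/* On failure the caller is expected to call sketch_storage_free(newsk). */
static int sketch_storage_clone(const struct sketch_sock *sk,
				struct sketch_sock *newsk)
{
	const struct sketch_elem *e;

	assert(sk != NULL && newsk != NULL);
	newsk->storage = NULL;

	for (e = sk->storage; e; e = e->next) {
		if (!(e->map->flags & SKETCH_F_CLONE))
			continue;	/* listener-only storage stays behind */
		if (sketch_storage_add(newsk, e->map, e->val))
			return -1;
	}
	return 0;
}
```

As in the patch, the clone function does not clean up after a partial
failure; it leaves that to the caller's normal teardown path.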



* [PATCH bpf-next v3 3/4] bpf: sync bpf.h to tools/
From: Stanislav Fomichev @ 2019-08-13 16:26 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
	Yonghong Song
In-Reply-To: <20190813162630.124544-1-sdf@google.com>

Sync new sk storage clone flag.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/include/uapi/linux/bpf.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4393bd4b2419..0ef594ac3899 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -337,6 +337,9 @@ enum bpf_attach_type {
 #define BPF_F_RDONLY_PROG	(1U << 7)
 #define BPF_F_WRONLY_PROG	(1U << 8)
 
+/* Clone map from listener for newly accepted socket */
+#define BPF_F_CLONE		(1U << 9)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
-- 
2.23.0.rc1.153.gdeed80330f-goog



* [PATCH bpf-next v3 4/4] selftests/bpf: add sockopt clone/inheritance test
From: Stanislav Fomichev @ 2019-08-13 16:26 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
	Yonghong Song
In-Reply-To: <20190813162630.124544-1-sdf@google.com>

Add a test that calls setsockopt on the listener socket which triggers
BPF program. This BPF program writes to the sk storage and sets
clone flag. Make sure that sk storage is cloned for a newly
accepted connection.

We have two cloned maps in the tests to make sure we hit both cases
in bpf_sk_storage_clone: first element (sk_storage_alloc) and
non-first element(s) (selem_link_map).

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../selftests/bpf/progs/sockopt_inherit.c     |  97 +++++++
 .../selftests/bpf/test_sockopt_inherit.c      | 253 ++++++++++++++++++
 4 files changed, 353 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/sockopt_inherit.c
 create mode 100644 tools/testing/selftests/bpf/test_sockopt_inherit.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 90f70d2c7c22..60c9338cd9b4 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -42,4 +42,5 @@ xdping
 test_sockopt
 test_sockopt_sk
 test_sockopt_multi
+test_sockopt_inherit
 test_tcp_rtt
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 3bd0f4a0336a..c875763a851a 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 	test_cgroup_storage test_select_reuseport test_section_names \
 	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
 	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
-	test_sockopt_multi test_tcp_rtt
+	test_sockopt_multi test_sockopt_inherit test_tcp_rtt
 
 BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
 TEST_GEN_FILES = $(BPF_OBJ_FILES)
@@ -110,6 +110,7 @@ $(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
 $(OUTPUT)/test_sockopt: cgroup_helpers.c
 $(OUTPUT)/test_sockopt_sk: cgroup_helpers.c
 $(OUTPUT)/test_sockopt_multi: cgroup_helpers.c
+$(OUTPUT)/test_sockopt_inherit: cgroup_helpers.c
 $(OUTPUT)/test_tcp_rtt: cgroup_helpers.c
 
 .PHONY: force
diff --git a/tools/testing/selftests/bpf/progs/sockopt_inherit.c b/tools/testing/selftests/bpf/progs/sockopt_inherit.c
new file mode 100644
index 000000000000..dede0fcd6102
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/sockopt_inherit.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
+
+#define SOL_CUSTOM			0xdeadbeef
+#define CUSTOM_INHERIT1			0
+#define CUSTOM_INHERIT2			1
+#define CUSTOM_LISTENER			2
+
+struct sockopt_inherit {
+	__u8 val;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
+	__type(key, int);
+	__type(value, struct sockopt_inherit);
+} cloned1_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
+	__type(key, int);
+	__type(value, struct sockopt_inherit);
+} cloned2_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, struct sockopt_inherit);
+} listener_only_map SEC(".maps");
+
+static __inline struct sockopt_inherit *get_storage(struct bpf_sockopt *ctx)
+{
+	if (ctx->optname == CUSTOM_INHERIT1)
+		return bpf_sk_storage_get(&cloned1_map, ctx->sk, 0,
+					  BPF_SK_STORAGE_GET_F_CREATE);
+	else if (ctx->optname == CUSTOM_INHERIT2)
+		return bpf_sk_storage_get(&cloned2_map, ctx->sk, 0,
+					  BPF_SK_STORAGE_GET_F_CREATE);
+	else
+		return bpf_sk_storage_get(&listener_only_map, ctx->sk, 0,
+					  BPF_SK_STORAGE_GET_F_CREATE);
+}
+
+SEC("cgroup/getsockopt")
+int _getsockopt(struct bpf_sockopt *ctx)
+{
+	__u8 *optval_end = ctx->optval_end;
+	struct sockopt_inherit *storage;
+	__u8 *optval = ctx->optval;
+
+	if (ctx->level != SOL_CUSTOM)
+		return 1; /* only interested in SOL_CUSTOM */
+
+	if (optval + 1 > optval_end)
+		return 0; /* EPERM, bounds check */
+
+	storage = get_storage(ctx);
+	if (!storage)
+		return 0; /* EPERM, couldn't get sk storage */
+
+	ctx->retval = 0; /* Reset system call return value to zero */
+
+	optval[0] = storage->val;
+	ctx->optlen = 1;
+
+	return 1;
+}
+
+SEC("cgroup/setsockopt")
+int _setsockopt(struct bpf_sockopt *ctx)
+{
+	__u8 *optval_end = ctx->optval_end;
+	struct sockopt_inherit *storage;
+	__u8 *optval = ctx->optval;
+
+	if (ctx->level != SOL_CUSTOM)
+		return 1; /* only interested in SOL_CUSTOM */
+
+	if (optval + 1 > optval_end)
+		return 0; /* EPERM, bounds check */
+
+	storage = get_storage(ctx);
+	if (!storage)
+		return 0; /* EPERM, couldn't get sk storage */
+
+	storage->val = optval[0];
+	ctx->optlen = -1;
+
+	return 1;
+}
diff --git a/tools/testing/selftests/bpf/test_sockopt_inherit.c b/tools/testing/selftests/bpf/test_sockopt_inherit.c
new file mode 100644
index 000000000000..1bf699815b9b
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sockopt_inherit.c
@@ -0,0 +1,253 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <error.h>
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <pthread.h>
+
+#include <linux/filter.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+
+#define CG_PATH				"/sockopt_inherit"
+#define SOL_CUSTOM			0xdeadbeef
+#define CUSTOM_INHERIT1			0
+#define CUSTOM_INHERIT2			1
+#define CUSTOM_LISTENER			2
+
+static int connect_to_server(int server_fd)
+{
+	struct sockaddr_storage addr;
+	socklen_t len = sizeof(addr);
+	int fd;
+
+	fd = socket(AF_INET, SOCK_STREAM, 0);
+	if (fd < 0) {
+		log_err("Failed to create client socket");
+		return -1;
+	}
+
+	if (getsockname(server_fd, (struct sockaddr *)&addr, &len)) {
+		log_err("Failed to get server addr");
+		goto out;
+	}
+
+	if (connect(fd, (const struct sockaddr *)&addr, len) < 0) {
+		log_err("Failed to connect to server");
+		goto out;
+	}
+
+	return fd;
+
+out:
+	close(fd);
+	return -1;
+}
+
+static int verify_sockopt(int fd, int optname, const char *msg, char expected)
+{
+	socklen_t optlen = 1;
+	char buf = 0;
+	int err;
+
+	err = getsockopt(fd, SOL_CUSTOM, optname, &buf, &optlen);
+	if (err) {
+		log_err("%s: failed to call getsockopt", msg);
+		return 1;
+	}
+
+	printf("%s %d: got=0x%x ? expected=0x%x\n", msg, optname, buf, expected);
+
+	if (buf != expected) {
+		log_err("%s: unexpected getsockopt value %d != %d", msg,
+			buf, expected);
+		return 1;
+	}
+
+	return 0;
+}
+
+static void *server_thread(void *arg)
+{
+	struct sockaddr_storage addr;
+	socklen_t len = sizeof(addr);
+	int fd = *(int *)arg;
+	int client_fd;
+	int err = 0;
+
+	if (listen(fd, 1) < 0)
+		error(1, errno, "Failed to listen on socket");
+
+	err += verify_sockopt(fd, CUSTOM_INHERIT1, "listen", 1);
+	err += verify_sockopt(fd, CUSTOM_INHERIT2, "listen", 1);
+	err += verify_sockopt(fd, CUSTOM_LISTENER, "listen", 1);
+
+	client_fd = accept(fd, (struct sockaddr *)&addr, &len);
+	if (client_fd < 0)
+		error(1, errno, "Failed to accept client");
+
+	err += verify_sockopt(client_fd, CUSTOM_INHERIT1, "accept", 1);
+	err += verify_sockopt(client_fd, CUSTOM_INHERIT2, "accept", 1);
+	err += verify_sockopt(client_fd, CUSTOM_LISTENER, "accept", 0);
+
+	close(client_fd);
+
+	return (void *)(long)err;
+}
+
+static int start_server(void)
+{
+	struct sockaddr_in addr = {
+		.sin_family = AF_INET,
+		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+	};
+	char buf;
+	int err;
+	int fd;
+	int i;
+
+	fd = socket(AF_INET, SOCK_STREAM, 0);
+	if (fd < 0) {
+		log_err("Failed to create server socket");
+		return -1;
+	}
+
+	for (i = CUSTOM_INHERIT1; i <= CUSTOM_LISTENER; i++) {
+		buf = 0x01;
+		err = setsockopt(fd, SOL_CUSTOM, i, &buf, 1);
+		if (err) {
+			log_err("Failed to call setsockopt(%d)", i);
+			close(fd);
+			return -1;
+		}
+	}
+
+	if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
+		log_err("Failed to bind socket");
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int prog_attach(struct bpf_object *obj, int cgroup_fd, const char *title)
+{
+	enum bpf_attach_type attach_type;
+	enum bpf_prog_type prog_type;
+	struct bpf_program *prog;
+	int err;
+
+	err = libbpf_prog_type_by_name(title, &prog_type, &attach_type);
+	if (err) {
+		log_err("Failed to deduce types for %s BPF program", title);
+		return -1;
+	}
+
+	prog = bpf_object__find_program_by_title(obj, title);
+	if (!prog) {
+		log_err("Failed to find %s BPF program", title);
+		return -1;
+	}
+
+	err = bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
+			      attach_type, 0);
+	if (err) {
+		log_err("Failed to attach %s BPF program", title);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int run_test(int cgroup_fd)
+{
+	struct bpf_prog_load_attr attr = {
+		.file = "./sockopt_inherit.o",
+	};
+	int server_fd = -1, client_fd;
+	struct bpf_object *obj;
+	void *server_err;
+	pthread_t tid;
+	int ignored;
+	int err;
+
+	err = bpf_prog_load_xattr(&attr, &obj, &ignored);
+	if (err) {
+		log_err("Failed to load BPF object");
+		return -1;
+	}
+
+	err = prog_attach(obj, cgroup_fd, "cgroup/getsockopt");
+	if (err)
+		goto close_bpf_object;
+
+	err = prog_attach(obj, cgroup_fd, "cgroup/setsockopt");
+	if (err)
+		goto close_bpf_object;
+
+	server_fd = start_server();
+	if (server_fd < 0) {
+		err = -1;
+		goto close_bpf_object;
+	}
+
+	pthread_create(&tid, NULL, server_thread, (void *)&server_fd);
+
+	client_fd = connect_to_server(server_fd);
+	if (client_fd < 0) {
+		err = -1;
+		goto close_server_fd;
+	}
+
+	err += verify_sockopt(client_fd, CUSTOM_INHERIT1, "connect", 0);
+	err += verify_sockopt(client_fd, CUSTOM_INHERIT2, "connect", 0);
+	err += verify_sockopt(client_fd, CUSTOM_LISTENER, "connect", 0);
+
+	pthread_join(tid, &server_err);
+
+	err += (int)(long)server_err;
+
+	close(client_fd);
+
+close_server_fd:
+	close(server_fd);
+close_bpf_object:
+	bpf_object__close(obj);
+	return err;
+}
+
+int main(int args, char **argv)
+{
+	int cgroup_fd;
+	int err = EXIT_FAILURE;
+
+	if (setup_cgroup_environment())
+		return err;
+
+	cgroup_fd = create_and_get_cgroup(CG_PATH);
+	if (cgroup_fd < 0)
+		goto cleanup_cgroup_env;
+
+	if (join_cgroup(CG_PATH))
+		goto cleanup_cgroup;
+
+	if (!run_test(cgroup_fd))
+		err = EXIT_SUCCESS;
+
+	printf("test_sockopt_inherit: %s\n",
+	       err == EXIT_SUCCESS ? "PASSED" : "FAILED");
+
+cleanup_cgroup:
+	close(cgroup_fd);
+cleanup_cgroup_env:
+	cleanup_cgroup_environment();
+	return err;
+}
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related

* Re: [PATCH net] sctp: fix the transport error_count check
From: Marcelo Ricardo Leitner @ 2019-08-13 16:27 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Neil Horman
In-Reply-To: <55b2fe3e5123958ccd7983e0892bc604aa717132.1565614152.git.lucien.xin@gmail.com>

On Mon, Aug 12, 2019 at 08:49:12PM +0800, Xin Long wrote:
> As the annotation says in sctp_do_8_2_transport_strike():
> 
>   "If the transport error count is greater than the pf_retrans
>    threshold, and less than pathmaxrtx ..."
> 
> It should be transport->error_count checked with pathmaxrxt,
> instead of asoc->pf_retrans.
> 
> Fixes: 5aa93bcf66f4 ("sctp: Implement quick failover draft from tsvwg")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Dave, please consider this one for stable. Thanks.
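
For anyone reading along, the difference between the old and the new
check can be sketched in plain userspace C. This is only an
illustration: struct pf_check and both function names are made up for
the example and do not exist in net/sctp; only error_count, pathmaxrxt
and pf_retrans are taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields the patch looks at. */
struct pf_check {
	bool pf_enable;   /* models net->sctp.pf_enable */
	bool active;      /* models transport->state == SCTP_ACTIVE */
	int error_count;  /* models transport->error_count */
	int pathmaxrxt;   /* models transport->pathmaxrxt */
	int pf_retrans;   /* models asoc->pf_retrans */
};

/* Old check: compares two thresholds with each other, so the
 * error count is never bounded by pathmaxrxt.
 */
static bool should_enter_pf_old(const struct pf_check *c)
{
	return c->pf_enable && c->active &&
	       c->pf_retrans < c->pathmaxrxt &&
	       c->error_count > c->pf_retrans;
}

/* New check: the error count must lie strictly between
 * pf_retrans and pathmaxrxt.
 */
static bool should_enter_pf(const struct pf_check *c)
{
	return c->pf_enable && c->active &&
	       c->error_count < c->pathmaxrxt &&
	       c->error_count > c->pf_retrans;
}
```

With error_count already above pathmaxrxt, the old form still
evaluates to true while the new one does not, which is the case the
fix is about.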

> ---
>  net/sctp/sm_sideeffect.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index a554d6d..1cf5bb5 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -546,7 +546,7 @@ static void sctp_do_8_2_transport_strike(struct sctp_cmd_seq *commands,
>  	 */
>  	if (net->sctp.pf_enable &&
>  	   (transport->state == SCTP_ACTIVE) &&
> -	   (asoc->pf_retrans < transport->pathmaxrxt) &&
> +	   (transport->error_count < transport->pathmaxrxt) &&
>  	   (transport->error_count > asoc->pf_retrans)) {
>  
>  		sctp_assoc_control_transport(asoc, transport,
> -- 
> 2.1.0
> 

^ permalink raw reply

* Re: [PATCH rdma-next 0/4] Add XRQ and SRQ support to DEVX interface
From: Doug Ledford @ 2019-08-13 16:28 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, RDMA mailing list, Edward Srouji, Saeed Mahameed,
	Yishai Hadas, linux-netdev
In-Reply-To: <20190813100642.GE29138@mtr-leonro.mtl.com>

[-- Attachment #1: Type: text/plain, Size: 1025 bytes --]

On Tue, 2019-08-13 at 10:06 +0000, Leon Romanovsky wrote:
> On Mon, Aug 12, 2019 at 11:43:58AM -0400, Doug Ledford wrote:
> > On Thu, 2019-08-08 at 10:11 +0000, Leon Romanovsky wrote:
> > > On Thu, Aug 08, 2019 at 11:43:54AM +0300, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@mellanox.com>
> > > > 
> > > > Hi,
> > > > 
> > > > This small series extends DEVX interface with SRQ and XRQ legacy
> > > > commands.
> > > 
> > > Sorry for typo in cover letter, there is no SRQ here.
> > 
> > Series looks fine to me.  Are you planning on the first two via
> > mlx5-
> > next and the remainder via RDMA tree?
> > 
> 
> Thanks, applied to mlx5-next
> 
> b1635ee6120c net/mlx5: Add XRQ legacy commands opcodes
> 647d58a989b3 net/mlx5: Use debug message instead of warn

Merged mlx5-next, then applied remaining two patches to for-next. 
Thanks.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Andrew Lunn @ 2019-08-13 16:28 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Antoine Tenart, davem@davemloft.net, sd@queasysnail.net,
	f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <2e3c2307-d414-a531-26cb-064e05fa01fc@aquantia.com>

> 1) With current implementation it's impossible to install SW macsec engine onto
> the device which supports HW offload. That could be a strong limitation in
> cases when user sees HW macsec offload is broken or work differently, and he/she
> wants to replace it with SW one.
> MACSec is a complex feature, and it may happen something is missing in HW.
> Trivial example is 256bit encryption, which is not always a musthave in HW
> implementations.

Ideally, we want the driver to return EOPNOTSUPP if it does not
support something, and the software implementation should be used instead.

If the offload is broken, we want a bug report! And if it works
differently, it suggests there is also a bug we need to fix, or the
standard is ambiguous.

It would also be nice to add extra information to the netlink API to
indicate if HW or SW is being used. In other places where we offload
to accelerators we have such additional information.
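
Roughly what I have in mind, as a userspace sketch only. None of these
names exist in the macsec code; hw_caps, hw_add_secy() and add_secy()
are invented for the example, and the capability check is a guess at
what a driver might do.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of what the hardware can do. */
struct hw_caps {
	bool has_offload;
	bool aes256;
};

/* Which engine ended up handling the request; this is the kind of
 * information that could be reported back to user space.
 */
enum engine { ENGINE_HW, ENGINE_SW };

/* Hypothetical driver hook: 0 on success, -EOPNOTSUPP when the
 * hardware cannot do what was asked, another errno on real errors.
 */
static int hw_add_secy(const struct hw_caps *hw, int key_bits)
{
	if (key_bits != 128 && key_bits != 256)
		return -EINVAL;
	if (!hw->has_offload)
		return -EOPNOTSUPP;
	if (key_bits == 256 && !hw->aes256)
		return -EOPNOTSUPP;
	return 0;
}

/* Try the hardware first and fall back to software only when the
 * driver explicitly says "not supported".
 */
static int add_secy(const struct hw_caps *hw, int key_bits,
		    enum engine *used)
{
	int err = hw_add_secy(hw, key_bits);

	if (!err) {
		*used = ENGINE_HW;
		return 0;
	}
	if (err != -EOPNOTSUPP)
		return err;	/* a real failure, do not hide it */

	*used = ENGINE_SW;
	return 0;
}
```

The point of keeping the fallback narrow is that any other error still
reaches the user, so a broken offload shows up as a bug report rather
than being papered over.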

   Andrew

^ permalink raw reply

* Re: [PATCH] sctp: fix memleak in sctp_send_reset_streams
From: Marcelo Ricardo Leitner @ 2019-08-13 16:29 UTC (permalink / raw)
  To: zhengbin; +Cc: vyasevich, nhorman, davem, linux-sctp, netdev, yi.zhang
In-Reply-To: <1565705150-17242-1-git-send-email-zhengbin13@huawei.com>

On Tue, Aug 13, 2019 at 10:05:50PM +0800, zhengbin wrote:
> If the stream outq is not empty, need to kfree nstr_list.
> 
> Fixes: d570a59c5b5f ("sctp: only allow the out stream reset when the stream outq is empty")
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: zhengbin <zhengbin13@huawei.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
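
For reference, the pattern being fixed looks like this in a small
userspace model. It is only a sketch: counted_alloc(), counted_free()
and send_reset_streams_model() are names invented here, and the
allocation counter exists purely so the leak becomes observable.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static int live_allocs;	/* number of buffers not yet freed */

static void *counted_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Model of the early-exit path: the temporary network-order list
 * must be released before bailing out with -EAGAIN.
 */
static int send_reset_streams_model(const uint16_t *str_list, size_t n,
				    bool outq_empty)
{
	uint16_t *nstr_list;
	size_t i;

	nstr_list = counted_alloc(n * sizeof(*nstr_list));
	if (!nstr_list)
		return -ENOMEM;

	for (i = 0; i < n; i++)
		nstr_list[i] = htons(str_list[i]);

	if (!outq_empty) {
		counted_free(nstr_list);	/* the line the patch adds */
		return -EAGAIN;
	}

	/* A real implementation would build the request here. */
	counted_free(nstr_list);
	return 0;
}
```

Without the free in the -EAGAIN branch, live_allocs would stay at one
after every rejected request.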

> ---
>  net/sctp/stream.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/sctp/stream.c b/net/sctp/stream.c
> index 2594660..e83cdaa 100644
> --- a/net/sctp/stream.c
> +++ b/net/sctp/stream.c
> @@ -316,6 +316,7 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
>  		nstr_list[i] = htons(str_list[i]);
> 
>  	if (out && !sctp_stream_outq_is_empty(stream, str_nums, nstr_list)) {
> +		kfree(nstr_list);
>  		retval = -EAGAIN;
>  		goto out;
>  	}
> --
> 2.7.4
> 

^ permalink raw reply

* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Terry Duncan @ 2019-08-13 16:31 UTC (permalink / raw)
  To: Tao Ren
  Cc: Jakub Kicinski, Andrew Lunn, netdev, openbmc, linux-kernel,
	Samuel Mendoza-Jonas, David S.Miller, William Kennington
In-Reply-To: <bc9da695-3fd3-6643-8e06-562cc08fbc62@linux.intel.com>

Tao, in your new patch will it be possible to disable the setting of the 
BMC MAC?  I would like to be able to send NCSI_OEM_GET_MAC perhaps with 
netlink (TBD) to get the system address without it affecting the BMC 
address.

I was about to send patches to add support for the Intel adapters when I 
saw this thread.

Thanks,

Terry

>>> 	After giving it more thought, I'm thinking about adding ncsi dt node
>>> with following structure (mac/ncsi similar to mac/mdio/phy):
>>>
>>> &mac0 {
>>>      /* MAC properties... */
>>>
>>>      use-ncsi;
>> This property seems to be specific to Faraday FTGMAC100. Are you going
>> to make it more generic?
> I'm also using ftgmac100 on my platform, and I don't have plan to change this property.
>
>>>      ncsi {
>>>          /* ncsi level properties if any */
>>>
>>>          package@0 {
>> You should get Rob Herring involved. This is not really describing
>> hardware, so it might get rejected by the device tree maintainer.
> Got it. Thank you for the sharing, and let me think it over :-)
>
>>> 1) mac driver doesn't need to parse "mac-offset" stuff: these
>>> ncsi-network-controller specific settings should be parsed in ncsi
>>> stack.
>>> 2) get_bmc_mac_address command is a channel specific command, and
>>> technically people can configure different offset/formula for
>>> different channels.
>> Does that mean the NCSA code puts the interface into promiscuous mode?
>> Or at least adds these unicast MAC addresses to the MAC receive
>> filter? Humm, ftgmac100 only seems to support multicast address
>> filtering, not unicast filters, so it must be using promisc mode, if
>> you expect to receive frames using this MAC address.
> Uhh, I actually didn't think too much about this: basically it's how to configure frame filtering when there are multiple packages/channels active: single BMC MAC or multiple BMC MAC is also allowed?
> I don't have the answer yet, but will talk to NCSI expert and figure it out.
>
>
> Thanks,
>
> Tao
>
>

^ permalink raw reply

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
From: Christoph Hellwig @ 2019-08-13 16:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jason Wang, Michael S. Tsirkin, kvm, virtualization, netdev,
	linux-kernel, linux-mm
In-Reply-To: <20190813115707.GC29508@ziepe.ca>

On Tue, Aug 13, 2019 at 08:57:07AM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
> 
> > What kind of issues do you see? Spinlock is to synchronize GUP with MMU
> > notifier in this series.
> 
> A GUP that can't sleep can't pagefault which makes it a really weird
> pattern

get_user_pages/get_user_pages_fast must not be called under a spinlock.
We have the somewhat misnamed __get_user_page_fast that just does a
lookup for existing pages and never faults for a few places that need
to do that lookup from contexts where we can't sleep.

^ permalink raw reply

* Re: tc - mirred ingress not supported at the moment
From: Cong Wang @ 2019-08-13 16:47 UTC (permalink / raw)
  To: Martin Olsson; +Cc: netdev
In-Reply-To: <CAAT+qEbOx8Jh3aFS-e7U6FyHo03sdcY6UoeGzwYQbO6WRjc3PQ@mail.gmail.com>

On Tue, Aug 13, 2019 at 4:05 AM Martin Olsson
<martin.olsson+netdev@sentorsecurity.com> wrote:
> Q1: Why was 'ingress' not implemented at the same time as 'egress'?

Because you are using an old iproute2.

ingress support is added by:
https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=5eca0a3701223619a513c7209f7d9335ca1b4cfa


> 2)
> Ok, so I have to use 'egress':
> # tc filter add dev eno2 parent ffff: prio 999  protocol all matchall
> action mirred egress redirect dev mon0


So you redirect packets from eno2's ingress to mon0's egress.


>
> Since the mirred action forces me to use 'egress' as the direction on
> the dest interface, all kinds of network statistics tools show
> incorrect counters. :-(
> eno2 is a pure sniffer interface (it is connected to the SPAN dest
> port of a switch).
> All packets (matchall) on eno2 are mirrored to mon0.
>
> # ip -s link show dev eno2
>     ...
>     ...
>     RX: bytes  packets  errors  dropped overrun mcast
>     13660757   16329    0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     0          0        0       0       0       0
> # ip -s link show dev mon0
>     ...
>     ...
>     RX: bytes  packets  errors  dropped overrun mcast
>     0          0        0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     13660757   16329    0       0       0       0
>
> eno2 and mon0 should be identical, but they are inverted.

Yes, this behavior is correct. The keyword "egress" in your cmdline
already says so.

>
> Q2: So... Can the 'ingress' option please be implemented? (I'm no
> programmer, so unfortunetly I can't do it myself).

It is completed, you need to update your iproute2 and kernel.

Thanks.

^ permalink raw reply

* Re: [PATCH bpf-next v3 2/4] bpf: support cloning sk storage on accept()
From: Yonghong Song @ 2019-08-13 16:58 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org
  Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	Martin Lau
In-Reply-To: <20190813162630.124544-3-sdf@google.com>



On 8/13/19 9:26 AM, Stanislav Fomichev wrote:
> Add new helper bpf_sk_storage_clone which optionally clones sk storage
> and call it from sk_clone_lock.
> 
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Yonghong Song <yhs@fb.com>
> Acked-by: Yonghong Song <yhs@fb.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>   include/net/bpf_sk_storage.h |  10 ++++
>   include/uapi/linux/bpf.h     |   3 +
>   net/core/bpf_sk_storage.c    | 103 ++++++++++++++++++++++++++++++++++-
>   net/core/sock.c              |   9 ++-
>   4 files changed, 119 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
> index b9dcb02e756b..8e4f831d2e52 100644
> --- a/include/net/bpf_sk_storage.h
> +++ b/include/net/bpf_sk_storage.h
> @@ -10,4 +10,14 @@ void bpf_sk_storage_free(struct sock *sk);
>   extern const struct bpf_func_proto bpf_sk_storage_get_proto;
>   extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
>   
> +#ifdef CONFIG_BPF_SYSCALL
> +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
> +#else
> +static inline int bpf_sk_storage_clone(const struct sock *sk,
> +				       struct sock *newsk)
> +{
> +	return 0;
> +}
> +#endif
> +
>   #endif /* _BPF_SK_STORAGE_H */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4393bd4b2419..0ef594ac3899 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -337,6 +337,9 @@ enum bpf_attach_type {
>   #define BPF_F_RDONLY_PROG	(1U << 7)
>   #define BPF_F_WRONLY_PROG	(1U << 8)
>   
> +/* Clone map from listener for newly accepted socket */
> +#define BPF_F_CLONE		(1U << 9)
> +
>   /* flags for BPF_PROG_QUERY */
>   #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
>   
> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index 94c7f77ecb6b..1bc7de7e18ba 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -12,6 +12,9 @@
>   
>   static atomic_t cache_idx;
>   
> +#define SK_STORAGE_CREATE_FLAG_MASK					\
> +	(BPF_F_NO_PREALLOC | BPF_F_CLONE)
> +
>   struct bucket {
>   	struct hlist_head list;
>   	raw_spinlock_t lock;
> @@ -209,7 +212,6 @@ static void selem_unlink_sk(struct bpf_sk_storage_elem *selem)
>   		kfree_rcu(sk_storage, rcu);
>   }
>   
> -/* sk_storage->lock must be held and sk_storage->list cannot be empty */
>   static void __selem_link_sk(struct bpf_sk_storage *sk_storage,
>   			    struct bpf_sk_storage_elem *selem)
>   {
> @@ -509,7 +511,7 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
>   	return 0;
>   }
>   
> -/* Called by __sk_destruct() */
> +/* Called by __sk_destruct() & bpf_sk_storage_clone() */
>   void bpf_sk_storage_free(struct sock *sk)
>   {
>   	struct bpf_sk_storage_elem *selem;
> @@ -557,6 +559,11 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
>   
>   	smap = (struct bpf_sk_storage_map *)map;
>   
> +	/* Note that this map might be concurrently cloned from
> +	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
> +	 * RCU read section to finish before proceeding. New RCU
> +	 * read sections should be prevented via bpf_map_inc_not_zero.
> +	 */
>   	synchronize_rcu();
>   
>   	/* bpf prog and the userspace can no longer access this map
> @@ -601,7 +608,9 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
>   
>   static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
>   {
> -	if (attr->map_flags != BPF_F_NO_PREALLOC || attr->max_entries ||
> +	if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK ||
> +	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
> +	    attr->max_entries ||
>   	    attr->key_size != sizeof(int) || !attr->value_size ||
>   	    /* Enforce BTF for userspace sk dumping */
>   	    !attr->btf_key_type_id || !attr->btf_value_type_id)
> @@ -739,6 +748,94 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
>   	return err;
>   }
>   
> +static struct bpf_sk_storage_elem *
> +bpf_sk_storage_clone_elem(struct sock *newsk,
> +			  struct bpf_sk_storage_map *smap,
> +			  struct bpf_sk_storage_elem *selem)
> +{
> +	struct bpf_sk_storage_elem *copy_selem;
> +
> +	copy_selem = selem_alloc(smap, newsk, NULL, true);
> +	if (!copy_selem)
> +		return NULL;
> +
> +	if (map_value_has_spin_lock(&smap->map))
> +		copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
> +				      SDATA(selem)->data, true);
> +	else
> +		copy_map_value(&smap->map, SDATA(copy_selem)->data,
> +			       SDATA(selem)->data);
> +
> +	return copy_selem;
> +}
> +
> +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
> +{
> +	struct bpf_sk_storage *new_sk_storage = NULL;
> +	struct bpf_sk_storage *sk_storage;
> +	struct bpf_sk_storage_elem *selem;
> +	int ret;
> +
> +	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> +
> +	rcu_read_lock();
> +	sk_storage = rcu_dereference(sk->sk_bpf_storage);
> +
> +	if (!sk_storage || hlist_empty(&sk_storage->list))
> +		goto out;
> +
> +	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> +		struct bpf_sk_storage_elem *copy_selem;
> +		struct bpf_sk_storage_map *smap;
> +		struct bpf_map *map;
> +
> +		smap = rcu_dereference(SDATA(selem)->smap);
> +		if (!(smap->map.map_flags & BPF_F_CLONE))
> +			continue;
> +
> +		map = bpf_map_inc_not_zero(&smap->map, false);
> +		if (IS_ERR(map))
> +			continue;
> +
> +		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
> +		if (!copy_selem) {
> +			ret = -ENOMEM;
> +			bpf_map_put(map);
> +			goto err;
> +		}
> +
> +		if (new_sk_storage) {
> +			selem_link_map(smap, copy_selem);
> +			__selem_link_sk(new_sk_storage, copy_selem);
> +		} else {
> +			ret = sk_storage_alloc(newsk, smap, copy_selem);
> +			if (ret) {
> +				kfree(copy_selem);
> +				atomic_sub(smap->elem_size,
> +					   &newsk->sk_omem_alloc);
> +				bpf_map_put(map);
> +				goto err;
> +			}
> +
> +			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
> +		}
> +		bpf_map_put(map);
> +	}
> +
> +out:
> +	rcu_read_unlock();
> +	return 0;
> +
> +err:
> +	rcu_read_unlock();
> +
> +	/* Don't free anything explicitly here, caller is responsible to
> +	 * call bpf_sk_storage_free in case of an error.
> +	 */
> +
> +	return ret;

A nit.
If you set ret = 0 initially, you do not need the above two 
rcu_read_unlock(). One "return ret" should be enough.
The comment can be changed to
	/* In case of an error, ... */
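
Something along these lines, shown as a userspace sketch rather than
the actual function. fake_rcu_read_lock(), fake_rcu_read_unlock() and
clone_all() are made-up stand-ins, and the fail_at parameter only
exists to simulate an allocation failure.

```c
#include <errno.h>

static int lock_depth;	/* stand-in for RCU read-side nesting */

static void fake_rcu_read_lock(void)
{
	lock_depth++;
}

static void fake_rcu_read_unlock(void)
{
	lock_depth--;
}

/* fail_at is the index of the element whose copy fails, or -1. */
static int clone_all(int nelems, int fail_at)
{
	int ret = 0;
	int i;

	fake_rcu_read_lock();

	for (i = 0; i < nelems; i++) {
		if (i == fail_at) {
			ret = -ENOMEM;
			goto out;
		}
		/* copy and link element i here */
	}

out:
	fake_rcu_read_unlock();

	/* In case of an error, the caller is responsible for
	 * freeing whatever was cloned so far.
	 */
	return ret;
}
```

Success and failure then share one unlock and one return.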


> +}
> +
>   BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
>   	   void *, value, u64, flags)
>   {
> diff --git a/net/core/sock.c b/net/core/sock.c
> index d57b0cc995a0..f5e801a9cea4 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
>   			goto out;
>   		}
>   		RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
> -#ifdef CONFIG_BPF_SYSCALL
> -		RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> -#endif
> +
> +		if (bpf_sk_storage_clone(sk, newsk)) {
> +			sk_free_unlock_clone(newsk);
> +			newsk = NULL;
> +			goto out;
> +		}
>   
>   		newsk->sk_err	   = 0;
>   		newsk->sk_err_soft = 0;
> 

^ permalink raw reply

* Re: [patch net-next v3 1/3] net: devlink: allow to change namespaces
From: Jakub Kicinski @ 2019-08-13 17:04 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, stephen, dsahern, mlxsw
In-Reply-To: <20190813061355.GF2428@nanopsycho>

On Tue, 13 Aug 2019 08:13:55 +0200, Jiri Pirko wrote:
> Tue, Aug 13, 2019 at 03:21:22AM CEST, jakub.kicinski@netronome.com wrote:
> >On Mon, 12 Aug 2019 15:47:49 +0200, Jiri Pirko wrote:  
> >> @@ -6953,9 +7089,33 @@ int devlink_compat_switch_id_get(struct net_device *dev,
> >>  	return 0;
> >>  }
> >>  
> >> +static void __net_exit devlink_pernet_exit(struct net *net)
> >> +{
> >> +	struct devlink *devlink;
> >> +
> >> +	mutex_lock(&devlink_mutex);
> >> +	list_for_each_entry(devlink, &devlink_list, list)
> >> +		if (net_eq(devlink_net(devlink), net))
> >> +			devlink_netns_change(devlink, &init_net);
> >> +	mutex_unlock(&devlink_mutex);
> >> +}  
> >
> >Just to be sure - this will not cause any locking issues?
> >Usually the locking order goes devlink -> rtnl  
> 
> rtnl is not taken. Do I miss something?

Probably not, just double checking.

^ permalink raw reply

* Re: [PATCH 12/16] arm64: prefer __section from compiler_attributes.h
From: Will Deacon @ 2019-08-13 17:08 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Nick Desaulniers, Andrew Morton, Sedat Dilek, Josh Poimboeuf, yhs,
	clang-built-linux, Catalin Marinas, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Andrey Konovalov,
	Greg Kroah-Hartman, Enrico Weigelt, Suzuki K Poulose,
	Thomas Gleixner, Masayoshi Mizuma, Shaokun Zhang, Alexios Zavras,
	Allison Randal, Linux ARM, linux-kernel, Network Development, bpf
In-Reply-To: <CANiq72mAfJ23PyWzZAELgbKQDCX2nvY0z+dmOMe14qz=wa6eFg@mail.gmail.com>

On Tue, Aug 13, 2019 at 02:36:06PM +0200, Miguel Ojeda wrote:
> On Tue, Aug 13, 2019 at 10:27 AM Will Deacon <will@kernel.org> wrote:
> > On Mon, Aug 12, 2019 at 02:50:45PM -0700, Nick Desaulniers wrote:
> > > GCC unescapes escaped string section names while Clang does not. Because
> > > __section uses the `#` stringification operator for the section name, it
> > > doesn't need to be escaped.
> > >
> > > This antipattern was found with:
> > > $ grep -e __section\(\" -e __section__\(\" -r
> > >
> > > Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> > > Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> > > ---
> > >  arch/arm64/include/asm/cache.h     | 2 +-
> > >  arch/arm64/kernel/smp_spin_table.c | 2 +-
> > >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > Does this fix a build issue, or is it just cosmetic or do we end up with
> > duplicate sections or something else?
> 
> This should be cosmetic -- basically we are trying to move all users
> of current available __attribute__s in compiler_attributes.h to the
> __attr forms. I am also adding (slowly) new attributes that are
> already used but we don't have them yet in __attr form.
> 
> > Happy to route it via arm64, just having trouble working out whether it's
> > 5.3 material!
> 
> As you prefer! Those that are not taken by a maintainer I will pick up
> and send via compiler-attributes.
> 
> I would go for 5.4, since there is no particular rush anyway.

Okey doke, I'll pick this one up for 5.4 then. Thanks for the explanation!

Will

^ permalink raw reply

* [RFC PATCH] bpf: handle 32-bit zext during constant blinding
From: Naveen N. Rao @ 2019-08-13 17:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Jiong Wang
  Cc: Michael Ellerman, bpf, linuxppc-dev, netdev, linux-kernel

Since BPF constant blinding is performed after the verifier pass, there
are certain ALU32 instructions inserted which don't have a corresponding
zext instruction inserted after. This is causing a kernel oops on
powerpc and can be reproduced by running 'test_cgroup_storage' with
bpf_jit_harden=2.

Fix this by emitting BPF_ZEXT during constant blinding if
prog->aux->verifier_zext is set.
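
To illustrate the problem outside the kernel, here is a small
userspace model. It is a sketch under one explicit assumption: a JIT
that relies on verifier-inserted zext and therefore leaves the upper
32 bits of the destination untouched on 32-bit operations. All
function names below are invented for the example.

```c
#include <stdint.h>

/* Blinding stores imm ^ rnd instead of the raw immediate. */
static uint32_t blind_imm(uint32_t imm, uint32_t rnd)
{
	return imm ^ rnd;
}

/* Assumed JIT behaviour: a 32-bit write that does not clear the
 * upper half of the 64-bit register.
 */
static uint64_t alu32_write_no_zext(uint64_t reg, uint32_t val)
{
	return (reg & 0xffffffff00000000ULL) | val;
}

/* Explicit zero extension of the lower 32 bits. */
static uint64_t zext32(uint64_t reg)
{
	return (uint32_t)reg;
}

/* Model of the blinded load into AX:
 *   MOV32 AX, rnd ^ imm
 *   XOR32 AX, rnd
 *   (optional) ZEXT AX
 */
static uint64_t load_blinded(uint64_t ax, uint32_t imm, uint32_t rnd,
			     int emit_zext)
{
	ax = alu32_write_no_zext(ax, blind_imm(imm, rnd));
	ax = alu32_write_no_zext(ax, (uint32_t)ax ^ rnd);
	if (emit_zext)
		ax = zext32(ax);
	return ax;
}
```

In this model the lower half of AX always ends up holding the original
immediate, but stale upper bits survive unless the explicit zext is
emitted, and a following 64-bit operation on AX would then see them.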

Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to analysis result")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
This approach (the location where zext is being introduced below, in 
particular) works for powerpc, but I am not entirely sure if this is 
sufficient for other architectures as well. This is broken on v5.3-rc4.

- Naveen


 kernel/bpf/core.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8191a7db2777..d84146e6fd9e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -890,7 +890,8 @@ int bpf_jit_get_func_addr(const struct bpf_prog *prog,
 
 static int bpf_jit_blind_insn(const struct bpf_insn *from,
 			      const struct bpf_insn *aux,
-			      struct bpf_insn *to_buff)
+			      struct bpf_insn *to_buff,
+			      bool emit_zext)
 {
 	struct bpf_insn *to = to_buff;
 	u32 imm_rnd = get_random_int();
@@ -939,6 +940,8 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
 		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
 		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
 		*to++ = BPF_ALU32_REG(from->code, from->dst_reg, BPF_REG_AX);
+		if (emit_zext)
+			*to++ = BPF_ZEXT_REG(from->dst_reg);
 		break;
 
 	case BPF_ALU64 | BPF_ADD | BPF_K:
@@ -992,6 +995,10 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
 			off -= 2;
 		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
 		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
+		if (emit_zext) {
+			*to++ = BPF_ZEXT_REG(BPF_REG_AX);
+			off--;
+		}
 		*to++ = BPF_JMP32_REG(from->code, from->dst_reg, BPF_REG_AX,
 				      off);
 		break;
@@ -1005,6 +1012,8 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
 	case 0: /* Part 2 of BPF_LD | BPF_IMM | BPF_DW. */
 		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ aux[0].imm);
 		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
+		if (emit_zext)
+			*to++ = BPF_ZEXT_REG(BPF_REG_AX);
 		*to++ = BPF_ALU64_REG(BPF_OR,  aux[0].dst_reg, BPF_REG_AX);
 		break;
 
@@ -1088,7 +1097,8 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
 		    insn[1].code == 0)
 			memcpy(aux, insn, sizeof(aux));
 
-		rewritten = bpf_jit_blind_insn(insn, aux, insn_buff);
+		rewritten = bpf_jit_blind_insn(insn, aux, insn_buff,
+						clone->aux->verifier_zext);
 		if (!rewritten)
 			continue;
 
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next] net: dsa: mv88e6xxx: check for mode change in port_setup_mac
From: Marek Behún @ 2019-08-13 17:12 UTC (permalink / raw)
  To: netdev; +Cc: Andrew Lunn, Vivien Didelot, Heiner Kallweit, Marek Behún

mv88e6xxx_port_setup_mac() checks whether the requested MAC settings are
different from the current ones, and if not, does nothing (since changing
them requires putting the link down).

In this check it only looks if the triplet [link, speed, duplex] is
being changed.

This patch adds support to also check if the mode parameter (of type
phy_interface_t) is requested to be changed. The current mode is
computed by the ->port_link_state() method, and if it is different from
PHY_INTERFACE_MODE_NA, we check for equality with the requested mode.

In the implementation of the mv88e6250_port_link_state() method we set
the current mode to PHY_INTERFACE_MODE_NA, so the code does not check
for a mode change on 6250.

In the mv88e6352_port_link_state() method, we use the cached cmode of
the port to determine the mode as phy_interface_t (and when that is not
enough, e.g. for RGMII, we also read the port MAC control register for
the RX/TX clock delay settings).
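
The resulting comparison can be sketched in userspace C as follows.
This is an illustration only: enum iface_mode and struct link_state
are simplified stand-ins for phy_interface_t and struct
phylink_link_state, and mac_settings_unchanged() is a name made up for
the example.

```c
#include <stdbool.h>

/* Simplified stand-in for phy_interface_t. */
enum iface_mode { MODE_NA, MODE_RGMII, MODE_SGMII };

/* Simplified stand-in for struct phylink_link_state. */
struct link_state {
	int link;
	int speed;
	int duplex;
	enum iface_mode interface;
};

/* True when nothing needs to be reprogrammed. MODE_NA means the
 * current mode could not be determined, so it never forces a change.
 */
static bool mac_settings_unchanged(const struct link_state *cur,
				   int link, int speed, int duplex,
				   enum iface_mode mode)
{
	return cur->link == link &&
	       cur->speed == speed &&
	       cur->duplex == duplex &&
	       (cur->interface == mode || cur->interface == MODE_NA);
}
```

Treating MODE_NA as a wildcard keeps the behaviour on chips that do
not report a mode the same as before the patch.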

Signed-off-by: Marek Behún <marek.behun@nic.cz>
---
 drivers/net/dsa/mv88e6xxx/chip.c |  4 +++-
 drivers/net/dsa/mv88e6xxx/port.c | 40 +++++++++++++++++++++++++++++++-
 drivers/net/dsa/mv88e6xxx/port.h |  1 +
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 818a83eb2dcb..9b3ad22a5b98 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -417,7 +417,9 @@ int mv88e6xxx_port_setup_mac(struct mv88e6xxx_chip *chip, int port, int link,
 	 */
 	if (state.link == link &&
 	    state.speed == speed &&
-	    state.duplex == duplex)
+	    state.duplex == duplex &&
+	    (state.interface == mode ||
+	     state.interface == PHY_INTERFACE_MODE_NA))
 		return 0;
 
 	/* Port's MAC control must not be changed unless the link is down */
diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index 04309ef0a1cc..304e5b118b08 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -590,6 +590,7 @@ int mv88e6250_port_link_state(struct mv88e6xxx_chip *chip, int port,
 	state->link = !!(reg & MV88E6250_PORT_STS_LINK);
 	state->an_enabled = 1;
 	state->an_complete = state->link;
+	state->interface = PHY_INTERFACE_MODE_NA;
 
 	return 0;
 }
@@ -598,12 +599,49 @@ int mv88e6352_port_link_state(struct mv88e6xxx_chip *chip, int port,
 			      struct phylink_link_state *state)
 {
 	int err;
-	u16 reg;
+	u16 reg, mac;
 
 	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_STS, &reg);
 	if (err)
 		return err;
 
+	switch (chip->ports[port].cmode) {
+	case MV88E6XXX_PORT_STS_CMODE_RGMII:
+		err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_MAC_CTL,
+					  &mac);
+		if (err)
+			return err;
+
+		if ((mac & MV88E6XXX_PORT_MAC_CTL_RGMII_DELAY_RXCLK) &&
+		    (mac & MV88E6XXX_PORT_MAC_CTL_RGMII_DELAY_TXCLK))
+			state->interface = PHY_INTERFACE_MODE_RGMII_ID;
+		else if (mac & MV88E6XXX_PORT_MAC_CTL_RGMII_DELAY_RXCLK)
+			state->interface = PHY_INTERFACE_MODE_RGMII_RXID;
+		else if (mac & MV88E6XXX_PORT_MAC_CTL_RGMII_DELAY_TXCLK)
+			state->interface = PHY_INTERFACE_MODE_RGMII_TXID;
+		else
+			state->interface = PHY_INTERFACE_MODE_RGMII;
+		break;
+	case MV88E6XXX_PORT_STS_CMODE_1000BASE_X:
+		state->interface = PHY_INTERFACE_MODE_1000BASEX;
+		break;
+	case MV88E6XXX_PORT_STS_CMODE_SGMII:
+		state->interface = PHY_INTERFACE_MODE_SGMII;
+		break;
+	case MV88E6XXX_PORT_STS_CMODE_2500BASEX:
+		state->interface = PHY_INTERFACE_MODE_2500BASEX;
+		break;
+	case MV88E6XXX_PORT_STS_CMODE_XAUI:
+		state->interface = PHY_INTERFACE_MODE_XAUI;
+		break;
+	case MV88E6XXX_PORT_STS_CMODE_RXAUI:
+		state->interface = PHY_INTERFACE_MODE_RXAUI;
+		break;
+	default:
+		/* we do not support other cmode values here */
+		state->interface = PHY_INTERFACE_MODE_NA;
+	}
+
 	switch (reg & MV88E6XXX_PORT_STS_SPEED_MASK) {
 	case MV88E6XXX_PORT_STS_SPEED_10:
 		state->speed = SPEED_10;
diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
index ceec771f8bfc..1abf5ea033e2 100644
--- a/drivers/net/dsa/mv88e6xxx/port.h
+++ b/drivers/net/dsa/mv88e6xxx/port.h
@@ -42,6 +42,7 @@
 #define MV88E6XXX_PORT_STS_TX_PAUSED		0x0020
 #define MV88E6XXX_PORT_STS_FLOW_CTL		0x0010
 #define MV88E6XXX_PORT_STS_CMODE_MASK		0x000f
+#define MV88E6XXX_PORT_STS_CMODE_RGMII		0x0007
 #define MV88E6XXX_PORT_STS_CMODE_100BASE_X	0x0008
 #define MV88E6XXX_PORT_STS_CMODE_1000BASE_X	0x0009
 #define MV88E6XXX_PORT_STS_CMODE_SGMII		0x000a
-- 
2.21.0


