Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 0/2] TLS TX HW offload for Bond
From: Tariq Toukan @ 2020-11-24 15:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jarod Wilson, Tariq Toukan, David S. Miller, netdev,
	Saeed Mahameed, Moshe Shemesh, Maor Gottlieb
In-Reply-To: <20201123102040.338f32c7@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>



On 11/23/2020 8:20 PM, Jakub Kicinski wrote:
> On Sun, 22 Nov 2020 14:48:04 +0200 Tariq Toukan wrote:
>> On 11/19/2020 6:38 PM, Jakub Kicinski wrote:
>>> On Thu, 19 Nov 2020 17:59:38 +0200 Tariq Toukan wrote:
>>>> On 11/19/2020 2:02 AM, Jakub Kicinski wrote:
>>>>> On Sun, 15 Nov 2020 15:42:49 +0200 Tariq Toukan wrote:
>>>>>> This series opens TLS TX HW offload for bond interfaces.
>>>>>> This allows bond interfaces to benefit from capable slave devices.
>>>>>>
>>>>>> The first patch adds real_dev field in TLS context structure, and aligns
>>>>>> usages in TLS module and supporting drivers.
>>>>>> The second patch opens the offload for bond interfaces.
>>>>>>
>>>>>> For the configuration above, SW kTLS keeps picking the same slave
>>>>>> To keep simple track of the HW and SW TLS contexts, we bind each socket to
>>>>>> a specific slave for the socket's whole lifetime. This is logically valid
>>>>>> (and similar to the SW kTLS behavior) in the following bond configuration,
>>>>>> so we restrict the offload support to it:
>>>>>>
>>>>>> ((mode == balance-xor) or (mode == 802.3ad))
>>>>>> and xmit_hash_policy == layer3+4.
>>>>>
>>>>> This does not feel extremely clean, maybe you can convince me otherwise.
>>>>>
>>>>> Can we extend netdev_get_xmit_slave() and figure out the output dev
>>>>> (and if it's "stable") in a more generic way? And just feed that dev
>>>>> into TLS handling?
>>>>
>>>> I don't see we go through netdev_get_xmit_slave(), but through
>>>> .ndo_start_xmit (bond_start_xmit).
>>>
>>> I may be misunderstanding the purpose of netdev_get_xmit_slave(),
>>> please correct me if I'm wrong. AFAIU it's supposed to return a
>>> lower netdev that the skb should then be xmited on.
>>
>> That's true. It was recently added and used by the RDMA team. Not used
>> or integrated in the Eth networking stack.
>>
>>> So what I was thinking was either construct an skb or somehow reshuffle
>>> the netdev_get_xmit_slave() code to take a flow dissector output or
>>> ${insert other ideas}. Then add a helper in the core that would drill
>>> down from the socket netdev to the "egress" netdev. Have TLS call
>>> that helper, and talk to the "egress" netdev from the start, rather
>>> than the socket's netdev. Then loosen the checks on software devices.
>>
>> As I understand it, best if we can even generalize this to apply to all
>> kinds of traffic: bond driver won't do the xmit itself anymore, it just
>> picks an egress dev and returns it. The core infrastructure will call
>> the xmit function for the egress dev.
> 
> I think you went way further than I was intending :) I was only
> considering the control path. Leave the datapath unchanged.
> 
> AFAIK you're making 3 changes:
>   - forwarding tls ops
>   - pinning flows
>   - handling features
> 
> Pinning of the TLS device to a leg of the bond looks like ~15LoC.
> I think we can live with that.
> 

Good.
You mean the changes under __bond_start_xmit ?

> It's the 150 LoC of forwarding TLS ops and duplicating dev selection
> logic in bond_sk_hash_l34() that I'd rather avoid.
> 

I see.
But there are several issues with this:

1. The .ndo_get_xmit_slave acts on an SKB, not a socket. Hence, it 
doesn't fit for the stage of calling tls_dev_add, unless the ndo goes 
through some refactoring before the feature itself.

2. Existing hash logic acts on an SKB. We must have one that acts on a 
socket to be used inside get_slave(sk). Hence, I don't really see how 
the logic under bond_sk_hash_l34() are going to disappear, maybe just 
move around to a new place.


> Handling features is probably fine, too, I haven't thought about that
> much.
> 

Good.

>> I like the idea, it can generalize code structures for all kinds of
>> upper-devices and sockets, taking them into a common place in core,
>> which reduces code duplications.
>>
>> If we go only half the way, i.e. keep xmit logic in bond for
>> non-TLS-offloaded traffic, then we have to let TLS module (and others in
>> the future) act deferentially for different kinds of devs (upper/lower)
>> which IMHO reduces generality.
> 
> How so? I was expecting TLS to just do something like:
> 
> 	netdev = sk_get_xmit_dev_lowest(sk);
> 
> which would recursively call get_xmit_slave(CONST) until it reaches
> a device which doesn't resolve further.
> 
> BTW is the flow pinning to bond legs actually a must-do? I don't know
> much about bonding but wouldn't that mean that if the selected leg goes
> down we'd lose connectivity, rather than falling back to SW crypto?
> 

Right. As long as we don't have logic for un-offloading.
Currently in TLS, the device-offloaded connections has some 
"independence" once they are created, it's hard to modify them and apply 
configuration modifications to them (example: interaction with tx csum 
offload).
So I think there is a missing un-offloading mechanism in TLS that should 
address all of these together.

This fits with your comments below.

>> I'm in favor of the deeper change. It will be on a larger scale, and
>> totally orthogonal to the current TLS offload support in bond.
>>
>> If we decide to apply the idea only to TLS sockets (or any subset of
>> sockets) we're actually taking a generic one-flow (the xmit patch of a
>> bond dev) and turning it into two (or potentially more) flows, depending
>> on the socket type. This also reduces generality.
> 
> I don't follow this part.
> 
>>> I'm probably missing the problem you're trying to explain to me :S
>>
>> I kept the patch minimal, and kept the TLS offload logic internal to the
>> bond driver, just like it is internal to the device drivers (mlx5e, and
>> others), with no core infrastructure modification.
>>
>>> Side note - Jarod, I'd be happy to take a patch renaming
>>> netdev_get_xmit_slave() and the ndo, if you have the cycles to send
>>> a patch. It's a recent addition, and in the core we should make more
>>> of an effort to avoid sensitive terms.
>>>    
>>>> Currently I have my check there to
>>>> catch all skbs belonging to offloaded TLS sockets.
>>>>
>>>> The TLS offload get_slave() logic decision is per socket, so the result
>>>> cannot be saved in the bond memory. Currently I save the real_dev field
>>>> in the TLS context structure.
>>>
>>> Right, but we could just have ctx->netdev be the "egress" netdev
>>> always, right? Do we expect somewhere that it's going to be matching
>>> the socket's dst?
>>
>> So once the offload context is established we totally bypass the bond
>> dev? and lose interaction or reference to it?
> 
> Yup, I don't think we need it.
> 
>> What if the egress dev is detached form the bond? We must then be
>> notified somehow.
> 
> Do we notify TLS when routing changes? I think it's a separate topic.
> 
> If we have the code to "un-offload" a flow we could handle clearing
> features better and notify from sk_validate_xmit_skb that the flow
> started hitting unexpected dev, hence it should be re-offloaded.
> 
> I don't think we need an explicit invalidation from the particular
> drivers here.
> 

Agree.

^ permalink raw reply

* Re: [PATCH bpf-next v2 5/5] selftests/bpf: xsk selftests - Bi-directional Sockets - SKB, DRV
From: Weqaar Janjua @ 2020-11-24 15:11 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, netdev, Daniel Borkmann, ast, Magnus Karlsson,
	Björn Töpel, Weqaar Janjua, shuah, skhan,
	linux-kselftest, Anders Roxell, jonathan.lemon
In-Reply-To: <CAPLEeBY_p_0QsZeqvrr0P+uf1jkL_eFGgawc=KD6Rkuh_177NA@mail.gmail.com>

On Sat, 21 Nov 2020 at 20:14, Weqaar Janjua <weqaar.janjua@gmail.com> wrote:
>
> On Fri, 20 Nov 2020 at 20:45, Yonghong Song <yhs@fb.com> wrote:
> >
> >
> >
> > On 11/20/20 5:00 AM, Weqaar Janjua wrote:
> > > Adds following tests:
> > >
> > > 1. AF_XDP SKB mode
> > >     d. Bi-directional Sockets
> > >        Configure sockets as bi-directional tx/rx sockets, sets up fill
> > >        and completion rings on each socket, tx/rx in both directions.
> > >        Only nopoll mode is used
> > >
> > > 2. AF_XDP DRV/Native mode
> > >     d. Bi-directional Sockets
> > >     * Only copy mode is supported because veth does not currently support
> > >       zero-copy mode
> > >
> > > Signed-off-by: Weqaar Janjua <weqaar.a.janjua@intel.com>
> > > ---
> > >   tools/testing/selftests/bpf/Makefile          |   4 +-
> > >   .../bpf/test_xsk_drv_bidirectional.sh         |  23 ++++
> > >   .../selftests/bpf/test_xsk_drv_teardown.sh    |   3 -
> > >   .../bpf/test_xsk_skb_bidirectional.sh         |  20 ++++
> > >   tools/testing/selftests/bpf/xdpxceiver.c      | 100 +++++++++++++-----
> > >   tools/testing/selftests/bpf/xdpxceiver.h      |   4 +
> > >   6 files changed, 126 insertions(+), 28 deletions(-)
> > >   create mode 100755 tools/testing/selftests/bpf/test_xsk_drv_bidirectional.sh
> > >   create mode 100755 tools/testing/selftests/bpf/test_xsk_skb_bidirectional.sh
> > >
> > > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > > index 515b29d321d7..258bd72812e0 100644
> > > --- a/tools/testing/selftests/bpf/Makefile
> > > +++ b/tools/testing/selftests/bpf/Makefile
> > > @@ -78,7 +78,9 @@ TEST_PROGS := test_kmod.sh \
> > >       test_xsk_drv_nopoll.sh \
> > >       test_xsk_drv_poll.sh \
> > >       test_xsk_skb_teardown.sh \
> > > -     test_xsk_drv_teardown.sh
> > > +     test_xsk_drv_teardown.sh \
> > > +     test_xsk_skb_bidirectional.sh \
> > > +     test_xsk_drv_bidirectional.sh
> > >
> > >   TEST_PROGS_EXTENDED := with_addr.sh \
> > >       with_tunnels.sh \
> > > diff --git a/tools/testing/selftests/bpf/test_xsk_drv_bidirectional.sh b/tools/testing/selftests/bpf/test_xsk_drv_bidirectional.sh
> > > new file mode 100755
> > > index 000000000000..d3a7e2934d83
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/test_xsk_drv_bidirectional.sh
> > > @@ -0,0 +1,23 @@
> > > +#!/bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright(c) 2020 Intel Corporation.
> > > +
> > > +# See test_xsk_prerequisites.sh for detailed information on tests
> > > +
> > > +. xsk_prereqs.sh
> > > +. xsk_env.sh
> > > +
> > > +TEST_NAME="DRV BIDIRECTIONAL SOCKETS"
> > > +
> > > +vethXDPnative ${VETH0} ${VETH1} ${NS1}
> > > +
> > > +params=("-N" "-B")
> > > +execxdpxceiver params
> > > +
> > > +retval=$?
> > > +test_status $retval "${TEST_NAME}"
> > > +
> > > +# Must be called in the last test to execute
> > > +cleanup_exit ${VETH0} ${VETH1} ${NS1}
> >
> > This also makes hard to run tests as users will not know this unless
> > they are familiar with the details of the tests.
> >
> > How about you have another scripts test_xsk.sh which includes all these
> > individual tests and pull the above cleanup_exit into test_xsk.sh?
> > User just need to run test_xsk.sh will be able to run all tests you
> > implemented here.
> >
> This works, test_xsk_* >> test_xsk.sh, will ship out as v3.
>
An issue with merging all tests in a single test_xsk.sh is reporting
number of test failures, with this approach a single test status is
printed by kselftest:

# PREREQUISITES: [ PASS ]
# SKB NOPOLL: [ FAIL ]
# SKB POLL: [ PASS ]
ok 1 selftests: xsk-patch2: test_xsk.sh

This is due to the fact Makefile has one TEST_PROGS = test_xsk.sh
(thus kselftest considers it one test?), where in the original
approach all tests have separate TEST_PROGS .sh which makes reporting
match each test and status. This can be a problem for automation.

An alternative would be to exit each test with failure status but then
the tests will stop execution at the failed test without executing the
rest of xsk tests, which we probably wouldn't want.

Suggestions please?

> > > +
> > > +test_exit $retval 0
> > > diff --git a/tools/testing/selftests/bpf/test_xsk_drv_teardown.sh b/tools/testing/selftests/bpf/test_xsk_drv_teardown.sh
> > [...]

^ permalink raw reply

* Re: [PATCH] net: phy: fix auto-negotiation in case of 'down-shift'
From: Russell King - ARM Linux admin @ 2020-11-24 15:17 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Antonio Borneo, Andrew Lunn, David S. Miller, Jakub Kicinski,
	netdev, Yonglong Liu, stable, linuxarm, Salil Mehta, linux-stm32,
	linux-kernel
In-Reply-To: <4684304a-37f5-e0cd-91cf-3f86318979c3@gmail.com>

On Tue, Nov 24, 2020 at 04:03:40PM +0100, Heiner Kallweit wrote:
> Am 24.11.2020 um 15:38 schrieb Antonio Borneo:
> > If the auto-negotiation fails to establish a gigabit link, the phy
> > can try to 'down-shift': it resets the bits in MII_CTRL1000 to
> > stop advertising 1Gbps and retries the negotiation at 100Mbps.
> > 
> I see that Russell answered already. My 2cts:
> 
> Are you sure all PHY's supporting downshift adjust the
> advertisement bits? IIRC an Aquantia PHY I dealt with does not.
> And if a PHY does so I'd consider this problematic:
> Let's say you have a broken cable and the PHY downshifts to
> 100Mbps. If you change the cable then the PHY would still negotiate
> 100Mbps only.

From what I've seen, that is not how downshift works, at least on
the PHYs I've seen.

When the PHY downshifts, it modifies the advertisement registers,
but it also remembers the original value. When the cable is
unplugged, it restores the setting to what was previously set.

It is _far_ from nice, but the fact is that your patch that Antonio
identified has broken previously working support, something that I
brought up when I patched one of the PHY drivers that was broken by
this very same problem by your patch.

That said, _if_ the PHY has a way to read the resolved state rather
than reading the advertisement registers, that is what should be
used (as I said previously) rather than trying to decode the
advertisement registers ourselves. That is normally more reliable
for speed and duplex.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* [PATCH net-next v4 0/4] remove compat_alloc_user_space()
From: Arnd Bergmann @ 2020-11-24 15:18 UTC (permalink / raw)
  To: netdev
  Cc: Arnd Bergmann, David S. Miller, Jakub Kicinski, Christoph Hellwig,
	Jiri Pirko, Taehee Yoo, Eric Dumazet, Alexei Starovoitov,
	Andrew Lunn, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

This is the fourth version of my series, now spanning four patches
instead of two, with a new approach for handling struct ifreq
compatibility after I realized that my earlier approach introduces
additional problems.

The idea here is to always push down the compat conversion
deeper into the call stack: rather than pretending to be
native mode with a modified copy of the original data on
the user space stack, have the code that actually works on
the data understand the difference between native and compat
versions.

I have spent a long time looking at all drivers that implement
an ndo_do_ioctl callback to verify that my assumptions are
correct. This has led to a series of 29 additional patches
that I am not including here but will post separately, fixing
a number of bugs in SIOCDEVPRIVATE ioctls, removing dead
code, and splitting ndo_do_ioctl into two new ndo callbacks
for private and ethernet specific commands.

The patches are identical to v3 that I sent a few weeks ago,
except this avoids the build failure when CONFIG_COMPAT is
disabled.

      Arnd

Changes in v4:
 - build fix without CONFIG_INET
 - build fix without CONFIG_COMPAT
 - style fixes pointed out by hch

Changes in v3:
 - complete rewrite of the series

Arnd Bergmann (4):
  ethtool: improve compat ioctl handling
  net: socket: rework SIOC?IFMAP ioctls
  net: socket: simplify dev_ifconf handling
  net: socket: rework compat_ifreq_ioctl()

 include/linux/compat.h     |  82 +++++------
 include/linux/ethtool.h    |   4 -
 include/linux/inetdevice.h |   9 ++
 include/linux/netdevice.h  |  12 +-
 net/appletalk/ddp.c        |   4 +-
 net/core/dev_ioctl.c       | 152 ++++++++++---------
 net/ethtool/ioctl.c        | 143 ++++++++++++++++--
 net/ieee802154/socket.c    |   4 +-
 net/ipv4/af_inet.c         |   6 +-
 net/ipv4/devinet.c         |   4 +-
 net/qrtr/qrtr.c            |   4 +-
 net/socket.c               | 289 ++++++++-----------------------------
 12 files changed, 338 insertions(+), 375 deletions(-)

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

-- 
2.27.0

^ permalink raw reply

* [PATCH v4 1/4] ethtool: improve compat ioctl handling
From: Arnd Bergmann @ 2020-11-24 15:18 UTC (permalink / raw)
  To: netdev; +Cc: Arnd Bergmann, Christoph Hellwig
In-Reply-To: <20201124151828.169152-1-arnd@kernel.org>

From: Arnd Bergmann <arnd@arndb.de>

The ethtool compat ioctl handling is hidden away in net/socket.c,
which introduces a couple of minor oddities:

- The implementation may end up diverging, as seen in the RXNFC
  extension in commit 84a1d9c48200 ("net: ethtool: extend RXNFC
  API to support RSS spreading of filter matches") that does not work
  in compat mode.

- Most architectures do not need the compat handling at all
  because u64 and compat_u64 have the same alignment.

- On x86, the conversion is done for both x32 and i386 user space,
  but it's actually wrong to do it for x32 and cannot work there.

- On 32-bit Arm, it never worked for compat oabi user space, since
  that needs to do the same conversion but does not.

- It would be nice to get rid of both compat_alloc_user_space()
  and copy_in_user() throughout the kernel.

None of these actually seems to be a serious problem that real
users are likely to encounter, but fixing all of them actually
leads to code that is both shorter and more readable.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
Changes in v2:
 - remove extraneous 'inline' keyword (davem)
 - split helper functions into smaller units (hch)
 - remove arm oabi check with missing dependency (0day bot)
---
 include/linux/ethtool.h |   4 --
 net/ethtool/ioctl.c     | 143 +++++++++++++++++++++++++++++++++++-----
 net/socket.c            | 125 +----------------------------------
 3 files changed, 128 insertions(+), 144 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index e3da25b51ae4..abeae15bf2d6 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -17,8 +17,6 @@
 #include <linux/compat.h>
 #include <uapi/linux/ethtool.h>
 
-#ifdef CONFIG_COMPAT
-
 struct compat_ethtool_rx_flow_spec {
 	u32		flow_type;
 	union ethtool_flow_union h_u;
@@ -38,8 +36,6 @@ struct compat_ethtool_rxnfc {
 	u32				rule_locs[];
 };
 
-#endif /* CONFIG_COMPAT */
-
 #include <linux/rculist.h>
 
 /**
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 771688e1b0da..bebd68e97078 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -7,6 +7,7 @@
  * the information ethtool needs.
  */
 
+#include <linux/compat.h>
 #include <linux/module.h>
 #include <linux/types.h>
 #include <linux/capability.h>
@@ -807,6 +808,127 @@ static noinline_for_stack int ethtool_get_sset_info(struct net_device *dev,
 	return ret;
 }
 
+static bool ethtool_translate_compat(void)
+{
+#ifdef CONFIG_X86_64
+	/* On x86, translation is needed for i386 but not x32 */
+	return in_ia32_syscall();
+#else
+	BUILD_BUG_ON(sizeof(struct compat_ethtool_rxnfc) !=
+		     sizeof(struct ethtool_rxnfc));
+#endif
+	return false;
+}
+
+static int ethtool_rxnfc_copy_from_compat(struct ethtool_rxnfc *rxnfc,
+					  const struct compat_ethtool_rxnfc __user *useraddr,
+					  size_t size)
+{
+	struct compat_ethtool_rxnfc crxnfc = {};
+
+	/* We expect there to be holes between fs.m_ext and
+	 * fs.ring_cookie and at the end of fs, but nowhere else.
+	 */
+	BUILD_BUG_ON(offsetof(struct compat_ethtool_rxnfc, fs.m_ext) +
+		     sizeof(useraddr->fs.m_ext) !=
+		     offsetof(struct ethtool_rxnfc, fs.m_ext) +
+		     sizeof(rxnfc->fs.m_ext));
+	BUILD_BUG_ON(offsetof(struct compat_ethtool_rxnfc, fs.location) -
+		     offsetof(struct compat_ethtool_rxnfc, fs.ring_cookie) !=
+		     offsetof(struct ethtool_rxnfc, fs.location) -
+		     offsetof(struct ethtool_rxnfc, fs.ring_cookie));
+
+	if (copy_from_user(&crxnfc, useraddr, min(size, sizeof(crxnfc))))
+		return -EFAULT;
+
+	*rxnfc = (struct ethtool_rxnfc) {
+		.cmd		= crxnfc.cmd,
+		.flow_type	= crxnfc.flow_type,
+		.data		= crxnfc.data,
+		.fs		= {
+			.flow_type	= crxnfc.fs.flow_type,
+			.h_u		= crxnfc.fs.h_u,
+			.h_ext		= crxnfc.fs.h_ext,
+			.m_u		= crxnfc.fs.m_u,
+			.m_ext		= crxnfc.fs.m_ext,
+			.ring_cookie	= crxnfc.fs.ring_cookie,
+			.location	= crxnfc.fs.location,
+		},
+		.rule_cnt	= crxnfc.rule_cnt,
+	};
+
+	return 0;
+}
+
+static int ethtool_rxnfc_copy_from_user(struct ethtool_rxnfc *rxnfc,
+					const void __user *useraddr,
+					size_t size)
+{
+	if (ethtool_translate_compat())
+		return ethtool_rxnfc_copy_from_compat(rxnfc, useraddr, size);
+
+	if (copy_from_user(&rxnfc, useraddr, size))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int ethtool_rxnfc_copy_to_compat(void __user *useraddr,
+					const struct ethtool_rxnfc *rxnfc,
+					size_t size, const u32 *rule_buf)
+{
+	struct compat_ethtool_rxnfc crxnfc;
+
+	memset(&crxnfc, 0, sizeof(crxnfc));
+	crxnfc = (struct compat_ethtool_rxnfc) {
+		.cmd		= rxnfc->cmd,
+		.flow_type	= rxnfc->flow_type,
+		.data		= rxnfc->data,
+		.fs		= {
+			.flow_type	= rxnfc->fs.flow_type,
+			.h_u		= rxnfc->fs.h_u,
+			.h_ext		= rxnfc->fs.h_ext,
+			.m_u		= rxnfc->fs.m_u,
+			.m_ext		= rxnfc->fs.m_ext,
+			.ring_cookie	= rxnfc->fs.ring_cookie,
+			.location	= rxnfc->fs.location,
+		},
+		.rule_cnt	= rxnfc->rule_cnt,
+	};
+
+	if (copy_to_user(useraddr, &crxnfc, min(size, sizeof(crxnfc))))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int ethtool_rxnfc_copy_to_user(void __user *useraddr,
+				      const struct ethtool_rxnfc *rxnfc,
+				      size_t size, const u32 *rule_buf)
+{
+	int ret;
+
+	if (ethtool_translate_compat()) {
+		ret = ethtool_rxnfc_copy_to_compat(useraddr, rxnfc, size,
+						   rule_buf);
+		useraddr += offsetof(struct compat_ethtool_rxnfc, rule_locs);
+	} else {
+		ret = copy_to_user(useraddr, &rxnfc, size);
+		useraddr += offsetof(struct ethtool_rxnfc, rule_locs);
+	}
+
+	if (ret)
+		return -EFAULT;
+
+	if (rule_buf) {
+		if (copy_to_user(useraddr, rule_buf,
+				 rxnfc->rule_cnt * sizeof(u32)))
+			return -EFAULT;
+	}
+
+	return 0;
+}
+
 static noinline_for_stack int ethtool_set_rxnfc(struct net_device *dev,
 						u32 cmd, void __user *useraddr)
 {
@@ -825,7 +947,7 @@ static noinline_for_stack int ethtool_set_rxnfc(struct net_device *dev,
 		info_size = (offsetof(struct ethtool_rxnfc, data) +
 			     sizeof(info.data));
 
-	if (copy_from_user(&info, useraddr, info_size))
+	if (ethtool_rxnfc_copy_from_user(&info, useraddr, info_size))
 		return -EFAULT;
 
 	rc = dev->ethtool_ops->set_rxnfc(dev, &info);
@@ -833,7 +955,7 @@ static noinline_for_stack int ethtool_set_rxnfc(struct net_device *dev,
 		return rc;
 
 	if (cmd == ETHTOOL_SRXCLSRLINS &&
-	    copy_to_user(useraddr, &info, info_size))
+	    ethtool_rxnfc_copy_to_user(useraddr, &info, info_size, NULL))
 		return -EFAULT;
 
 	return 0;
@@ -859,7 +981,7 @@ static noinline_for_stack int ethtool_get_rxnfc(struct net_device *dev,
 		info_size = (offsetof(struct ethtool_rxnfc, data) +
 			     sizeof(info.data));
 
-	if (copy_from_user(&info, useraddr, info_size))
+	if (ethtool_rxnfc_copy_from_user(&info, useraddr, info_size))
 		return -EFAULT;
 
 	/* If FLOW_RSS was requested then user-space must be using the
@@ -867,7 +989,7 @@ static noinline_for_stack int ethtool_get_rxnfc(struct net_device *dev,
 	 */
 	if (cmd == ETHTOOL_GRXFH && info.flow_type & FLOW_RSS) {
 		info_size = sizeof(info);
-		if (copy_from_user(&info, useraddr, info_size))
+		if (ethtool_rxnfc_copy_from_user(&info, useraddr, info_size))
 			return -EFAULT;
 		/* Since malicious users may modify the original data,
 		 * we need to check whether FLOW_RSS is still requested.
@@ -893,18 +1015,7 @@ static noinline_for_stack int ethtool_get_rxnfc(struct net_device *dev,
 	if (ret < 0)
 		goto err_out;
 
-	ret = -EFAULT;
-	if (copy_to_user(useraddr, &info, info_size))
-		goto err_out;
-
-	if (rule_buf) {
-		useraddr += offsetof(struct ethtool_rxnfc, rule_locs);
-		if (copy_to_user(useraddr, rule_buf,
-				 info.rule_cnt * sizeof(u32)))
-			goto err_out;
-	}
-	ret = 0;
-
+	ret = ethtool_rxnfc_copy_to_user(useraddr, &info, info_size, rule_buf);
 err_out:
 	kfree(rule_buf);
 
diff --git a/net/socket.c b/net/socket.c
index 372e6c065d22..60df84a91051 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -3109,128 +3109,6 @@ static int compat_dev_ifconf(struct net *net, struct compat_ifconf __user *uifc3
 	return 0;
 }
 
-static int ethtool_ioctl(struct net *net, struct compat_ifreq __user *ifr32)
-{
-	struct compat_ethtool_rxnfc __user *compat_rxnfc;
-	bool convert_in = false, convert_out = false;
-	size_t buf_size = 0;
-	struct ethtool_rxnfc __user *rxnfc = NULL;
-	struct ifreq ifr;
-	u32 rule_cnt = 0, actual_rule_cnt;
-	u32 ethcmd;
-	u32 data;
-	int ret;
-
-	if (get_user(data, &ifr32->ifr_ifru.ifru_data))
-		return -EFAULT;
-
-	compat_rxnfc = compat_ptr(data);
-
-	if (get_user(ethcmd, &compat_rxnfc->cmd))
-		return -EFAULT;
-
-	/* Most ethtool structures are defined without padding.
-	 * Unfortunately struct ethtool_rxnfc is an exception.
-	 */
-	switch (ethcmd) {
-	default:
-		break;
-	case ETHTOOL_GRXCLSRLALL:
-		/* Buffer size is variable */
-		if (get_user(rule_cnt, &compat_rxnfc->rule_cnt))
-			return -EFAULT;
-		if (rule_cnt > KMALLOC_MAX_SIZE / sizeof(u32))
-			return -ENOMEM;
-		buf_size += rule_cnt * sizeof(u32);
-		fallthrough;
-	case ETHTOOL_GRXRINGS:
-	case ETHTOOL_GRXCLSRLCNT:
-	case ETHTOOL_GRXCLSRULE:
-	case ETHTOOL_SRXCLSRLINS:
-		convert_out = true;
-		fallthrough;
-	case ETHTOOL_SRXCLSRLDEL:
-		buf_size += sizeof(struct ethtool_rxnfc);
-		convert_in = true;
-		rxnfc = compat_alloc_user_space(buf_size);
-		break;
-	}
-
-	if (copy_from_user(&ifr.ifr_name, &ifr32->ifr_name, IFNAMSIZ))
-		return -EFAULT;
-
-	ifr.ifr_data = convert_in ? rxnfc : (void __user *)compat_rxnfc;
-
-	if (convert_in) {
-		/* We expect there to be holes between fs.m_ext and
-		 * fs.ring_cookie and at the end of fs, but nowhere else.
-		 */
-		BUILD_BUG_ON(offsetof(struct compat_ethtool_rxnfc, fs.m_ext) +
-			     sizeof(compat_rxnfc->fs.m_ext) !=
-			     offsetof(struct ethtool_rxnfc, fs.m_ext) +
-			     sizeof(rxnfc->fs.m_ext));
-		BUILD_BUG_ON(
-			offsetof(struct compat_ethtool_rxnfc, fs.location) -
-			offsetof(struct compat_ethtool_rxnfc, fs.ring_cookie) !=
-			offsetof(struct ethtool_rxnfc, fs.location) -
-			offsetof(struct ethtool_rxnfc, fs.ring_cookie));
-
-		if (copy_in_user(rxnfc, compat_rxnfc,
-				 (void __user *)(&rxnfc->fs.m_ext + 1) -
-				 (void __user *)rxnfc) ||
-		    copy_in_user(&rxnfc->fs.ring_cookie,
-				 &compat_rxnfc->fs.ring_cookie,
-				 (void __user *)(&rxnfc->fs.location + 1) -
-				 (void __user *)&rxnfc->fs.ring_cookie))
-			return -EFAULT;
-		if (ethcmd == ETHTOOL_GRXCLSRLALL) {
-			if (put_user(rule_cnt, &rxnfc->rule_cnt))
-				return -EFAULT;
-		} else if (copy_in_user(&rxnfc->rule_cnt,
-					&compat_rxnfc->rule_cnt,
-					sizeof(rxnfc->rule_cnt)))
-			return -EFAULT;
-	}
-
-	ret = dev_ioctl(net, SIOCETHTOOL, &ifr, NULL);
-	if (ret)
-		return ret;
-
-	if (convert_out) {
-		if (copy_in_user(compat_rxnfc, rxnfc,
-				 (const void __user *)(&rxnfc->fs.m_ext + 1) -
-				 (const void __user *)rxnfc) ||
-		    copy_in_user(&compat_rxnfc->fs.ring_cookie,
-				 &rxnfc->fs.ring_cookie,
-				 (const void __user *)(&rxnfc->fs.location + 1) -
-				 (const void __user *)&rxnfc->fs.ring_cookie) ||
-		    copy_in_user(&compat_rxnfc->rule_cnt, &rxnfc->rule_cnt,
-				 sizeof(rxnfc->rule_cnt)))
-			return -EFAULT;
-
-		if (ethcmd == ETHTOOL_GRXCLSRLALL) {
-			/* As an optimisation, we only copy the actual
-			 * number of rules that the underlying
-			 * function returned.  Since Mallory might
-			 * change the rule count in user memory, we
-			 * check that it is less than the rule count
-			 * originally given (as the user buffer size),
-			 * which has been range-checked.
-			 */
-			if (get_user(actual_rule_cnt, &rxnfc->rule_cnt))
-				return -EFAULT;
-			if (actual_rule_cnt < rule_cnt)
-				rule_cnt = actual_rule_cnt;
-			if (copy_in_user(&compat_rxnfc->rule_locs[0],
-					 &rxnfc->rule_locs[0],
-					 rule_cnt * sizeof(u32)))
-				return -EFAULT;
-		}
-	}
-
-	return 0;
-}
-
 static int compat_siocwandev(struct net *net, struct compat_ifreq __user *uifr32)
 {
 	compat_uptr_t uptr32;
@@ -3385,8 +3263,6 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
 		return old_bridge_ioctl(argp);
 	case SIOCGIFCONF:
 		return compat_dev_ifconf(net, argp);
-	case SIOCETHTOOL:
-		return ethtool_ioctl(net, argp);
 	case SIOCWANDEV:
 		return compat_siocwandev(net, argp);
 	case SIOCGIFMAP:
@@ -3399,6 +3275,7 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
 		return sock->ops->gettstamp(sock, argp, cmd == SIOCGSTAMP_OLD,
 					    !COMPAT_USE_64BIT_TIME);
 
+	case SIOCETHTOOL:
 	case SIOCBONDSLAVEINFOQUERY:
 	case SIOCBONDINFOQUERY:
 	case SIOCSHWTSTAMP:
-- 
2.27.0


^ permalink raw reply related

* [PATCH v4 2/4] net: socket: rework SIOC?IFMAP ioctls
From: Arnd Bergmann @ 2020-11-24 15:18 UTC (permalink / raw)
  To: netdev; +Cc: Arnd Bergmann
In-Reply-To: <20201124151828.169152-1-arnd@kernel.org>

From: Arnd Bergmann <arnd@arndb.de>

SIOCGIFMAP and SIOCSIFMAP currently require compat_alloc_user_space()
and copy_in_user() for compat mode.

Move the compat handling into the location where the structures are
actually used, to avoid using those interfaces and get a clearer
implementation.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
changes in v3:
 - complete rewrite

changes in v2:
 - fix building with CONFIG_COMPAT disabled (0day bot)
 - split up dev_ifmap() into more readable helpers (hch)
 - move rcu_read_unlock() for readability (hch)
---
 include/linux/compat.h | 18 ++++++------
 net/core/dev_ioctl.c   | 64 +++++++++++++++++++++++++++++++++---------
 net/socket.c           | 39 ++-----------------------
 3 files changed, 62 insertions(+), 59 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 08dbd34bb7a5..47496c5eb5eb 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -96,6 +96,15 @@ struct compat_iovec {
 	compat_size_t	iov_len;
 };
 
+struct compat_ifmap {
+	compat_ulong_t mem_start;
+	compat_ulong_t mem_end;
+	unsigned short base_addr;
+	unsigned char irq;
+	unsigned char dma;
+	unsigned char port;
+};
+
 #ifdef CONFIG_COMPAT
 
 #ifndef compat_user_stack_pointer
@@ -314,15 +323,6 @@ typedef struct compat_sigevent {
 	} _sigev_un;
 } compat_sigevent_t;
 
-struct compat_ifmap {
-	compat_ulong_t mem_start;
-	compat_ulong_t mem_end;
-	unsigned short base_addr;
-	unsigned char irq;
-	unsigned char dma;
-	unsigned char port;
-};
-
 struct compat_if_settings {
 	unsigned int type;	/* Type of physical device or protocol */
 	unsigned int size;	/* Size of the data allocated by the caller */
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index db8a0ff86f36..de3df6fe65fe 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -98,6 +98,55 @@ int dev_ifconf(struct net *net, struct ifconf *ifc, int size)
 	return 0;
 }
 
+static int dev_getifmap(struct net_device *dev, struct ifreq *ifr)
+{
+	struct ifmap *ifmap = &ifr->ifr_map;
+	struct compat_ifmap *cifmap = (struct compat_ifmap *)&ifr->ifr_map;
+
+	if (in_compat_syscall()) {
+		cifmap->mem_start = dev->mem_start;
+		cifmap->mem_end   = dev->mem_end;
+		cifmap->base_addr = dev->base_addr;
+		cifmap->irq       = dev->irq;
+		cifmap->dma       = dev->dma;
+		cifmap->port      = dev->if_port;
+
+		return 0;
+	}
+
+	ifmap->mem_start  = dev->mem_start;
+	ifmap->mem_end    = dev->mem_end;
+	ifmap->base_addr  = dev->base_addr;
+	ifmap->irq        = dev->irq;
+	ifmap->dma        = dev->dma;
+	ifmap->port       = dev->if_port;
+
+	return 0;
+}
+
+static int dev_setifmap(struct net_device *dev, struct ifreq *ifr)
+{
+	struct compat_ifmap *cifmap = (struct compat_ifmap *)&ifr->ifr_map;
+
+	if (!dev->netdev_ops->ndo_set_config)
+		return -EOPNOTSUPP;
+
+	if (in_compat_syscall()) {
+		struct ifmap ifmap = {
+			.mem_start  = cifmap->mem_start,
+			.mem_end    = cifmap->mem_end,
+			.base_addr  = cifmap->base_addr,
+			.irq        = cifmap->irq,
+			.dma        = cifmap->dma,
+			.port       = cifmap->port,
+		};
+
+		return dev->netdev_ops->ndo_set_config(dev, &ifmap);
+	}
+
+	return dev->netdev_ops->ndo_set_config(dev, &ifr->ifr_map);
+}
+
 /*
  *	Perform the SIOCxIFxxx calls, inside rcu_read_lock()
  */
@@ -139,13 +188,7 @@ static int dev_ifsioc_locked(struct net *net, struct ifreq *ifr, unsigned int cm
 		break;
 
 	case SIOCGIFMAP:
-		ifr->ifr_map.mem_start = dev->mem_start;
-		ifr->ifr_map.mem_end   = dev->mem_end;
-		ifr->ifr_map.base_addr = dev->base_addr;
-		ifr->ifr_map.irq       = dev->irq;
-		ifr->ifr_map.dma       = dev->dma;
-		ifr->ifr_map.port      = dev->if_port;
-		return 0;
+		return dev_getifmap(dev, ifr);
 
 	case SIOCGIFINDEX:
 		ifr->ifr_ifindex = dev->ifindex;
@@ -286,12 +329,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd)
 		return 0;
 
 	case SIOCSIFMAP:
-		if (ops->ndo_set_config) {
-			if (!netif_device_present(dev))
-				return -ENODEV;
-			return ops->ndo_set_config(dev, &ifr->ifr_map);
-		}
-		return -EOPNOTSUPP;
+		return dev_setifmap(dev, ifr);
 
 	case SIOCADDMULTI:
 		if (!ops->ndo_set_rx_mode ||
diff --git a/net/socket.c b/net/socket.c
index 60df84a91051..2d32c8576e95 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -3198,40 +3198,6 @@ static int compat_ifreq_ioctl(struct net *net, struct socket *sock,
 	return err;
 }
 
-static int compat_sioc_ifmap(struct net *net, unsigned int cmd,
-			struct compat_ifreq __user *uifr32)
-{
-	struct ifreq ifr;
-	struct compat_ifmap __user *uifmap32;
-	int err;
-
-	uifmap32 = &uifr32->ifr_ifru.ifru_map;
-	err = copy_from_user(&ifr, uifr32, sizeof(ifr.ifr_name));
-	err |= get_user(ifr.ifr_map.mem_start, &uifmap32->mem_start);
-	err |= get_user(ifr.ifr_map.mem_end, &uifmap32->mem_end);
-	err |= get_user(ifr.ifr_map.base_addr, &uifmap32->base_addr);
-	err |= get_user(ifr.ifr_map.irq, &uifmap32->irq);
-	err |= get_user(ifr.ifr_map.dma, &uifmap32->dma);
-	err |= get_user(ifr.ifr_map.port, &uifmap32->port);
-	if (err)
-		return -EFAULT;
-
-	err = dev_ioctl(net, cmd, &ifr, NULL);
-
-	if (cmd == SIOCGIFMAP && !err) {
-		err = copy_to_user(uifr32, &ifr, sizeof(ifr.ifr_name));
-		err |= put_user(ifr.ifr_map.mem_start, &uifmap32->mem_start);
-		err |= put_user(ifr.ifr_map.mem_end, &uifmap32->mem_end);
-		err |= put_user(ifr.ifr_map.base_addr, &uifmap32->base_addr);
-		err |= put_user(ifr.ifr_map.irq, &uifmap32->irq);
-		err |= put_user(ifr.ifr_map.dma, &uifmap32->dma);
-		err |= put_user(ifr.ifr_map.port, &uifmap32->port);
-		if (err)
-			err = -EFAULT;
-	}
-	return err;
-}
-
 /* Since old style bridge ioctl's endup using SIOCDEVPRIVATE
  * for some operations; this forces use of the newer bridge-utils that
  * use compatible ioctls
@@ -3265,9 +3231,6 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
 		return compat_dev_ifconf(net, argp);
 	case SIOCWANDEV:
 		return compat_siocwandev(net, argp);
-	case SIOCGIFMAP:
-	case SIOCSIFMAP:
-		return compat_sioc_ifmap(net, cmd, argp);
 	case SIOCGSTAMP_OLD:
 	case SIOCGSTAMPNS_OLD:
 		if (!sock->ops->gettstamp)
@@ -3297,6 +3260,8 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
 
 	case SIOCGIFFLAGS:
 	case SIOCSIFFLAGS:
+	case SIOCGIFMAP:
+	case SIOCSIFMAP:
 	case SIOCGIFMETRIC:
 	case SIOCSIFMETRIC:
 	case SIOCGIFMTU:
-- 
2.27.0


^ permalink raw reply related

* [PATCH v4 4/4] net: socket: rework compat_ifreq_ioctl()
From: Arnd Bergmann @ 2020-11-24 15:18 UTC (permalink / raw)
  To: netdev; +Cc: Arnd Bergmann
In-Reply-To: <20201124151828.169152-1-arnd@kernel.org>

From: Arnd Bergmann <arnd@arndb.de>

compat_ifreq_ioctl() is one of the last users of copy_in_user() and
compat_alloc_user_space(), as it attempts to convert the 'struct ifreq'
arguments from 32-bit to 64-bit format as used by dev_ioctl() and a
couple of socket family specific interpretations.

The current implementation works correctly when calling dev_ioctl(),
inet_ioctl(), ieee802154_sock_ioctl(), atalk_ioctl(), qrtr_ioctl()
and packet_ioctl(). The ioctl handlers for x25, netrom, rose and x25 do
not interpret the arguments and only block the corresponding commands,
so they do not care.

For af_inet6 and af_decnet however, the compat conversion is slightly
incorrect, as it will copy more data than the native handler accesses,
both of them use a structure that is shorter than ifreq.

Replace the copy_in_user() conversion with a pair of accessor functions
to read and write the ifreq data in place with the correct length where
needed, while leaving the other ones to copy the (already compatible)
structures directly.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/compat.h    |  53 ++++++++++----------
 include/linux/netdevice.h |   2 +
 net/appletalk/ddp.c       |   4 +-
 net/ieee802154/socket.c   |   4 +-
 net/ipv4/af_inet.c        |   6 +--
 net/qrtr/qrtr.c           |   4 +-
 net/socket.c              | 100 ++++++++++++++++++++++++--------------
 7 files changed, 101 insertions(+), 72 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index a97f80b704ab..db722ba9ec58 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -110,6 +110,33 @@ struct compat_ifconf {
 	compat_uptr_t	ifcbuf;
 };
 
+struct compat_if_settings {
+	unsigned int type;	/* Type of physical device or protocol */
+	unsigned int size;	/* Size of the data allocated by the caller */
+	compat_uptr_t ifs_ifsu;	/* union of pointers */
+};
+
+struct compat_ifreq {
+	union {
+		char	ifrn_name[IFNAMSIZ];    /* if name, e.g. "en0" */
+	} ifr_ifrn;
+	union {
+		struct	sockaddr ifru_addr;
+		struct	sockaddr ifru_dstaddr;
+		struct	sockaddr ifru_broadaddr;
+		struct	sockaddr ifru_netmask;
+		struct	sockaddr ifru_hwaddr;
+		short	ifru_flags;
+		compat_int_t	ifru_ivalue;
+		compat_int_t	ifru_mtu;
+		struct	compat_ifmap ifru_map;
+		char	ifru_slave[IFNAMSIZ];   /* Just fits the size */
+		char	ifru_newname[IFNAMSIZ];
+		compat_uptr_t	ifru_data;
+		struct	compat_if_settings ifru_settings;
+	} ifr_ifru;
+};
+
 #ifdef CONFIG_COMPAT
 
 #ifndef compat_user_stack_pointer
@@ -328,32 +355,6 @@ typedef struct compat_sigevent {
 	} _sigev_un;
 } compat_sigevent_t;
 
-struct compat_ifreq {
-	union {
-		char	ifrn_name[IFNAMSIZ];    /* if name, e.g. "en0" */
-	} ifr_ifrn;
-	union {
-		struct	sockaddr ifru_addr;
-		struct	sockaddr ifru_dstaddr;
-		struct	sockaddr ifru_broadaddr;
-		struct	sockaddr ifru_netmask;
-		struct	sockaddr ifru_hwaddr;
-		short	ifru_flags;
-		compat_int_t	ifru_ivalue;
-		compat_int_t	ifru_mtu;
-		struct	compat_ifmap ifru_map;
-		char	ifru_slave[IFNAMSIZ];   /* Just fits the size */
-		char	ifru_newname[IFNAMSIZ];
-		compat_caddr_t	ifru_data;
-		struct	compat_if_settings ifru_settings;
-	} ifr_ifru;
-};
-
-struct compat_ifconf {
-	compat_int_t	ifc_len;                /* size of buffer */
-	compat_caddr_t  ifcbuf;
-};
-
 struct compat_robust_list {
 	compat_uptr_t			next;
 };
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e99450a60d8e..c7935c3e5e05 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3860,6 +3860,8 @@ int netdev_rx_handler_register(struct net_device *dev,
 void netdev_rx_handler_unregister(struct net_device *dev);
 
 bool dev_valid_name(const char *name);
+int get_user_ifreq(struct ifreq *ifr, void __user **ifrdata, void __user *arg);
+int put_user_ifreq(struct ifreq *ifr, void __user *arg);
 int dev_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr,
 		bool *need_copyout);
 int dev_ifconf(struct net *net, struct ifconf __user *ifc);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index ca1a0d07a087..d598e2e57633 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -666,7 +666,7 @@ static int atif_ioctl(int cmd, void __user *arg)
 	struct rtentry rtdef;
 	int add_route;
 
-	if (copy_from_user(&atreq, arg, sizeof(atreq)))
+	if (get_user_ifreq(&atreq, NULL, arg))
 		return -EFAULT;
 
 	dev = __dev_get_by_name(&init_net, atreq.ifr_name);
@@ -865,7 +865,7 @@ static int atif_ioctl(int cmd, void __user *arg)
 		return 0;
 	}
 
-	return copy_to_user(arg, &atreq, sizeof(atreq)) ? -EFAULT : 0;
+	return put_user_ifreq(&atreq, arg);
 }
 
 static int atrtr_ioctl_addrt(struct rtentry *rt)
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index a45a0401adc5..f5077de3619e 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -129,7 +129,7 @@ static int ieee802154_dev_ioctl(struct sock *sk, struct ifreq __user *arg,
 	int ret = -ENOIOCTLCMD;
 	struct net_device *dev;
 
-	if (copy_from_user(&ifr, arg, sizeof(struct ifreq)))
+	if (get_user_ifreq(&ifr, NULL, arg))
 		return -EFAULT;
 
 	ifr.ifr_name[IFNAMSIZ-1] = 0;
@@ -143,7 +143,7 @@ static int ieee802154_dev_ioctl(struct sock *sk, struct ifreq __user *arg,
 	if (dev->type == ARPHRD_IEEE802154 && dev->netdev_ops->ndo_do_ioctl)
 		ret = dev->netdev_ops->ndo_do_ioctl(dev, &ifr, cmd);
 
-	if (!ret && copy_to_user(arg, &ifr, sizeof(struct ifreq)))
+	if (!ret && put_user_ifreq(&ifr, arg))
 		ret = -EFAULT;
 	dev_put(dev);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index b7260c8cef2e..fb66d3f17a17 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -949,10 +949,10 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 	case SIOCGIFNETMASK:
 	case SIOCGIFDSTADDR:
 	case SIOCGIFPFLAGS:
-		if (copy_from_user(&ifr, p, sizeof(struct ifreq)))
+		if (get_user_ifreq(&ifr, NULL, p))
 			return -EFAULT;
 		err = devinet_ioctl(net, cmd, &ifr);
-		if (!err && copy_to_user(p, &ifr, sizeof(struct ifreq)))
+		if (!err && put_user_ifreq(&ifr, p))
 			err = -EFAULT;
 		break;
 
@@ -962,7 +962,7 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 	case SIOCSIFDSTADDR:
 	case SIOCSIFPFLAGS:
 	case SIOCSIFFLAGS:
-		if (copy_from_user(&ifr, p, sizeof(struct ifreq)))
+		if (get_user_ifreq(&ifr, NULL, p))
 			return -EFAULT;
 		err = devinet_ioctl(net, cmd, &ifr);
 		break;
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
index f4ab3ca6d73b..ea56026457f7 100644
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -1157,14 +1157,14 @@ static int qrtr_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 		rc = put_user(len, (int __user *)argp);
 		break;
 	case SIOCGIFADDR:
-		if (copy_from_user(&ifr, argp, sizeof(ifr))) {
+		if (get_user_ifreq(&ifr, NULL, argp)) {
 			rc = -EFAULT;
 			break;
 		}
 
 		sq = (struct sockaddr_qrtr *)&ifr.ifr_addr;
 		*sq = ipc->us;
-		if (copy_to_user(argp, &ifr, sizeof(ifr))) {
+		if (put_user_ifreq(&ifr, argp)) {
 			rc = -EFAULT;
 			break;
 		}
diff --git a/net/socket.c b/net/socket.c
index 41158e459f0b..4515c5f42bb0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -3078,6 +3078,54 @@ void socket_seq_show(struct seq_file *seq)
 }
 #endif				/* CONFIG_PROC_FS */
 
+/* Handle the fact that while struct ifreq has the same *layout* on
+ * 32/64 for everything but ifreq::ifru_ifmap and ifreq::ifru_data,
+ * which are handled elsewhere, it still has different *size* due to
+ * ifreq::ifru_ifmap (which is 16 bytes on 32 bit, 24 bytes on 64-bit,
+ * resulting in struct ifreq being 32 and 40 bytes respectively).
+ * As a result, if the struct happens to be at the end of a page and
+ * the next page isn't readable/writable, we get a fault. To prevent
+ * that, copy back and forth to the full size.
+ */
+int get_user_ifreq(struct ifreq *ifr, void __user **ifrdata, void __user *arg)
+{
+	if (in_compat_syscall()) {
+		struct compat_ifreq *ifr32 = (struct compat_ifreq *)ifr;
+
+		memset(ifr, 0, sizeof(*ifr));
+		if (copy_from_user(ifr32, arg, sizeof(*ifr32)))
+			return -EFAULT;
+
+		if (ifrdata)
+			*ifrdata = compat_ptr(ifr32->ifr_data);
+
+		return 0;
+	}
+
+	if (copy_from_user(ifr, arg, sizeof(*ifr)))
+		return -EFAULT;
+
+	if (ifrdata)
+		*ifrdata = ifr->ifr_data;
+
+	return 0;
+}
+EXPORT_SYMBOL(get_user_ifreq);
+
+int put_user_ifreq(struct ifreq *ifr, void __user *arg)
+{
+	size_t size = sizeof(*ifr);
+
+	if (in_compat_syscall())
+		size = sizeof(struct compat_ifreq);
+
+	if (copy_to_user(arg, ifr, size))
+		return -EFAULT;
+
+	return 0;
+}
+EXPORT_SYMBOL(put_user_ifreq);
+
 #ifdef CONFIG_COMPAT
 static int compat_siocwandev(struct net *net, struct compat_ifreq __user *uifr32)
 {
@@ -3086,7 +3134,7 @@ static int compat_siocwandev(struct net *net, struct compat_ifreq __user *uifr32
 	void __user *saved;
 	int err;
 
-	if (copy_from_user(&ifr, uifr32, sizeof(struct compat_ifreq)))
+	if (get_user_ifreq(&ifr, NULL, uifr32))
 		return -EFAULT;
 
 	if (get_user(uptr32, &uifr32->ifr_settings.ifs_ifsu))
@@ -3098,7 +3146,7 @@ static int compat_siocwandev(struct net *net, struct compat_ifreq __user *uifr32
 	err = dev_ioctl(net, SIOCWANDEV, &ifr, NULL);
 	if (!err) {
 		ifr.ifr_settings.ifs_ifsu.raw_hdlc = saved;
-		if (copy_to_user(uifr32, &ifr, sizeof(struct compat_ifreq)))
+		if (put_user_ifreq(&ifr, uifr32))
 			err = -EFAULT;
 	}
 	return err;
@@ -3124,47 +3172,25 @@ static int compat_ifreq_ioctl(struct net *net, struct socket *sock,
 			      unsigned int cmd,
 			      struct compat_ifreq __user *uifr32)
 {
-	struct ifreq __user *uifr;
+	struct ifreq ifr;
+	bool need_copyout;
 	int err;
 
-	/* Handle the fact that while struct ifreq has the same *layout* on
-	 * 32/64 for everything but ifreq::ifru_ifmap and ifreq::ifru_data,
-	 * which are handled elsewhere, it still has different *size* due to
-	 * ifreq::ifru_ifmap (which is 16 bytes on 32 bit, 24 bytes on 64-bit,
-	 * resulting in struct ifreq being 32 and 40 bytes respectively).
-	 * As a result, if the struct happens to be at the end of a page and
-	 * the next page isn't readable/writable, we get a fault. To prevent
-	 * that, copy back and forth to the full size.
+	err = sock->ops->ioctl(sock, cmd, arg);
+
+	/* If this ioctl is unknown try to hand it down
+	 * to the NIC driver.
 	 */
+	if (err != -ENOIOCTLCMD)
+		return err;
 
-	uifr = compat_alloc_user_space(sizeof(*uifr));
-	if (copy_in_user(uifr, uifr32, sizeof(*uifr32)))
+	if (get_user_ifreq(&ifr, NULL, uifr32))
 		return -EFAULT;
+	err = dev_ioctl(net, cmd, &ifr, &need_copyout);
+	if (!err && need_copyout)
+		if (put_user_ifreq(&ifr, uifr32))
+			return -EFAULT;
 
-	err = sock_do_ioctl(net, sock, cmd, (unsigned long)uifr);
-
-	if (!err) {
-		switch (cmd) {
-		case SIOCGIFFLAGS:
-		case SIOCGIFMETRIC:
-		case SIOCGIFMTU:
-		case SIOCGIFMEM:
-		case SIOCGIFHWADDR:
-		case SIOCGIFINDEX:
-		case SIOCGIFADDR:
-		case SIOCGIFBRDADDR:
-		case SIOCGIFDSTADDR:
-		case SIOCGIFNETMASK:
-		case SIOCGIFPFLAGS:
-		case SIOCGIFTXQLEN:
-		case SIOCGMIIPHY:
-		case SIOCGMIIREG:
-		case SIOCGIFNAME:
-			if (copy_in_user(uifr32, uifr, sizeof(*uifr32)))
-				err = -EFAULT;
-			break;
-		}
-	}
 	return err;
 }
 
-- 
2.27.0


^ permalink raw reply related

* [PATCH v4 3/4] net: socket: simplify dev_ifconf handling
From: Arnd Bergmann @ 2020-11-24 15:18 UTC (permalink / raw)
  To: netdev; +Cc: Arnd Bergmann
In-Reply-To: <20201124151828.169152-1-arnd@kernel.org>

From: Arnd Bergmann <arnd@arndb.de>

The dev_ifconf() calling conventions make compat handling
more complicated than necessary, simplify this by moving
the in_compat_syscall() check into the function.
The implementation can be simplified further, based on the
knowledge that the dynamic registration is only ever used
for IPv4.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/compat.h     | 11 +++--
 include/linux/inetdevice.h |  9 ++++
 include/linux/netdevice.h  | 10 +----
 net/core/dev_ioctl.c       | 92 +++++++++++++++-----------------------
 net/ipv4/devinet.c         |  4 +-
 net/socket.c               | 59 ++++++------------------
 6 files changed, 66 insertions(+), 119 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 47496c5eb5eb..a97f80b704ab 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -105,6 +105,11 @@ struct compat_ifmap {
 	unsigned char port;
 };
 
+struct compat_ifconf {
+	compat_int_t	ifc_len;                /* size of buffer */
+	compat_uptr_t	ifcbuf;
+};
+
 #ifdef CONFIG_COMPAT
 
 #ifndef compat_user_stack_pointer
@@ -323,12 +328,6 @@ typedef struct compat_sigevent {
 	} _sigev_un;
 } compat_sigevent_t;
 
-struct compat_if_settings {
-	unsigned int type;	/* Type of physical device or protocol */
-	unsigned int size;	/* Size of the data allocated by the caller */
-	compat_uptr_t ifs_ifsu;	/* union of pointers */
-};
-
 struct compat_ifreq {
 	union {
 		char	ifrn_name[IFNAMSIZ];    /* if name, e.g. "en0" */
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 53aa0343bf69..67e042932681 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -178,6 +178,15 @@ static inline struct net_device *ip_dev_find(struct net *net, __be32 addr)
 
 int inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b);
 int devinet_ioctl(struct net *net, unsigned int cmd, struct ifreq *);
+#ifdef CONFIG_INET
+int inet_gifconf(struct net_device *dev, char __user *buf, int len, int size);
+#else
+static inline int inet_gifconf(struct net_device *dev, char __user *buf,
+			       int len, int size)
+{
+	return 0;
+}
+#endif
 void devinet_init(void);
 struct in_device *inetdev_by_index(struct net *, int);
 __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 03433a4c929e..e99450a60d8e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3163,14 +3163,6 @@ static inline bool dev_validate_header(const struct net_device *dev,
 	return false;
 }
 
-typedef int gifconf_func_t(struct net_device * dev, char __user * bufptr,
-			   int len, int size);
-int register_gifconf(unsigned int family, gifconf_func_t *gifconf);
-static inline int unregister_gifconf(unsigned int family)
-{
-	return register_gifconf(family, NULL);
-}
-
 #ifdef CONFIG_NET_FLOW_LIMIT
 #define FLOW_LIMIT_HISTORY	(1 << 7)  /* must be ^2 and !overflow buckets */
 struct sd_flow_limit {
@@ -3870,7 +3862,7 @@ void netdev_rx_handler_unregister(struct net_device *dev);
 bool dev_valid_name(const char *name);
 int dev_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr,
 		bool *need_copyout);
-int dev_ifconf(struct net *net, struct ifconf *, int);
+int dev_ifconf(struct net *net, struct ifconf __user *ifc);
 int dev_ethtool(struct net *net, struct ifreq *);
 unsigned int dev_get_flags(const struct net_device *);
 int __dev_change_flags(struct net_device *dev, unsigned int flags,
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index de3df6fe65fe..de5478b2ef77 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/kmod.h>
 #include <linux/netdevice.h>
+#include <linux/inetdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/net_tstamp.h>
@@ -25,77 +26,56 @@ static int dev_ifname(struct net *net, struct ifreq *ifr)
 	return netdev_get_name(net, ifr->ifr_name, ifr->ifr_ifindex);
 }
 
-static gifconf_func_t *gifconf_list[NPROTO];
-
-/**
- *	register_gifconf	-	register a SIOCGIF handler
- *	@family: Address family
- *	@gifconf: Function handler
- *
- *	Register protocol dependent address dumping routines. The handler
- *	that is passed must not be freed or reused until it has been replaced
- *	by another handler.
- */
-int register_gifconf(unsigned int family, gifconf_func_t *gifconf)
-{
-	if (family >= NPROTO)
-		return -EINVAL;
-	gifconf_list[family] = gifconf;
-	return 0;
-}
-EXPORT_SYMBOL(register_gifconf);
-
 /*
  *	Perform a SIOCGIFCONF call. This structure will change
  *	size eventually, and there is nothing I can do about it.
  *	Thus we will need a 'compatibility mode'.
  */
-
-int dev_ifconf(struct net *net, struct ifconf *ifc, int size)
+int dev_ifconf(struct net *net, struct ifconf __user *uifc)
 {
 	struct net_device *dev;
-	char __user *pos;
-	int len;
-	int total;
-	int i;
+	void __user *pos;
+	size_t size;
+	int len, total = 0, done;
 
-	/*
-	 *	Fetch the caller's info block.
-	 */
+	/* both the ifconf and the ifreq structures are slightly different */
+	if (in_compat_syscall()) {
+		struct compat_ifconf ifc32;
 
-	pos = ifc->ifc_buf;
-	len = ifc->ifc_len;
+		if (copy_from_user(&ifc32, uifc, sizeof(struct compat_ifconf)))
+			return -EFAULT;
 
-	/*
-	 *	Loop over the interfaces, and write an info block for each.
-	 */
+		pos = compat_ptr(ifc32.ifcbuf);
+		len = ifc32.ifc_len;
+		size = sizeof(struct compat_ifreq);
+	} else {
+		struct ifconf ifc;
 
-	total = 0;
+		if (copy_from_user(&ifc, uifc, sizeof(struct ifconf)))
+			return -EFAULT;
+
+		pos = ifc.ifc_buf;
+		len = ifc.ifc_len;
+		size = sizeof(struct ifreq);
+	}
+
+	/* Loop over the interfaces, and write an info block for each. */
+	rtnl_lock();
 	for_each_netdev(net, dev) {
-		for (i = 0; i < NPROTO; i++) {
-			if (gifconf_list[i]) {
-				int done;
-				if (!pos)
-					done = gifconf_list[i](dev, NULL, 0, size);
-				else
-					done = gifconf_list[i](dev, pos + total,
-							       len - total, size);
-				if (done < 0)
-					return -EFAULT;
-				total += done;
-			}
+		if (!pos)
+			done = inet_gifconf(dev, NULL, 0, size);
+		else
+			done = inet_gifconf(dev, pos + total,
+					    len - total, size);
+		if (done < 0) {
+			rtnl_unlock();
+			return -EFAULT;
 		}
+		total += done;
 	}
+	rtnl_unlock();
 
-	/*
-	 *	All done.  Write the updated control block back to the caller.
-	 */
-	ifc->ifc_len = total;
-
-	/*
-	 * 	Both BSD and Solaris return 0 here, so we do too.
-	 */
-	return 0;
+	return put_user(total, &uifc->ifc_len);
 }
 
 static int dev_getifmap(struct net_device *dev, struct ifreq *ifr)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 75f67994fc85..3c51395a49c8 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1243,7 +1243,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr)
 	return ret;
 }
 
-static int inet_gifconf(struct net_device *dev, char __user *buf, int len, int size)
+int inet_gifconf(struct net_device *dev, char __user *buf, int len, int size)
 {
 	struct in_device *in_dev = __in_dev_get_rtnl(dev);
 	const struct in_ifaddr *ifa;
@@ -2761,8 +2761,6 @@ void __init devinet_init(void)
 		INIT_HLIST_HEAD(&inet_addr_lst[i]);
 
 	register_pernet_subsys(&devinet_ops);
-
-	register_gifconf(PF_INET, inet_gifconf);
 	register_netdevice_notifier(&ip_netdev_notifier);
 
 	queue_delayed_work(system_power_efficient_wq, &check_lifetime_work, 0);
diff --git a/net/socket.c b/net/socket.c
index 2d32c8576e95..41158e459f0b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1029,6 +1029,8 @@ EXPORT_SYMBOL(vlan_ioctl_set);
 static long sock_do_ioctl(struct net *net, struct socket *sock,
 			  unsigned int cmd, unsigned long arg)
 {
+	struct ifreq ifr;
+	bool need_copyout;
 	int err;
 	void __user *argp = (void __user *)arg;
 
@@ -1041,25 +1043,13 @@ static long sock_do_ioctl(struct net *net, struct socket *sock,
 	if (err != -ENOIOCTLCMD)
 		return err;
 
-	if (cmd == SIOCGIFCONF) {
-		struct ifconf ifc;
-		if (copy_from_user(&ifc, argp, sizeof(struct ifconf)))
-			return -EFAULT;
-		rtnl_lock();
-		err = dev_ifconf(net, &ifc, sizeof(struct ifreq));
-		rtnl_unlock();
-		if (!err && copy_to_user(argp, &ifc, sizeof(struct ifconf)))
-			err = -EFAULT;
-	} else {
-		struct ifreq ifr;
-		bool need_copyout;
-		if (copy_from_user(&ifr, argp, sizeof(struct ifreq)))
+	if (copy_from_user(&ifr, argp, sizeof(struct ifreq)))
+		return -EFAULT;
+	err = dev_ioctl(net, cmd, &ifr, &need_copyout);
+	if (!err && need_copyout)
+		if (copy_to_user(argp, &ifr, sizeof(struct ifreq)))
 			return -EFAULT;
-		err = dev_ioctl(net, cmd, &ifr, &need_copyout);
-		if (!err && need_copyout)
-			if (copy_to_user(argp, &ifr, sizeof(struct ifreq)))
-				return -EFAULT;
-	}
+
 	return err;
 }
 
@@ -1171,6 +1161,11 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 						   cmd == SIOCGSTAMP_NEW,
 						   false);
 			break;
+
+		case SIOCGIFCONF:
+			err = dev_ifconf(net, argp);
+			break;
+
 		default:
 			err = sock_do_ioctl(net, sock, cmd, arg);
 			break;
@@ -3084,31 +3079,6 @@ void socket_seq_show(struct seq_file *seq)
 #endif				/* CONFIG_PROC_FS */
 
 #ifdef CONFIG_COMPAT
-static int compat_dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32)
-{
-	struct compat_ifconf ifc32;
-	struct ifconf ifc;
-	int err;
-
-	if (copy_from_user(&ifc32, uifc32, sizeof(struct compat_ifconf)))
-		return -EFAULT;
-
-	ifc.ifc_len = ifc32.ifc_len;
-	ifc.ifc_req = compat_ptr(ifc32.ifcbuf);
-
-	rtnl_lock();
-	err = dev_ifconf(net, &ifc, sizeof(struct compat_ifreq));
-	rtnl_unlock();
-	if (err)
-		return err;
-
-	ifc32.ifc_len = ifc.ifc_len;
-	if (copy_to_user(uifc32, &ifc32, sizeof(struct compat_ifconf)))
-		return -EFAULT;
-
-	return 0;
-}
-
 static int compat_siocwandev(struct net *net, struct compat_ifreq __user *uifr32)
 {
 	compat_uptr_t uptr32;
@@ -3227,8 +3197,6 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
 	case SIOCSIFBR:
 	case SIOCGIFBR:
 		return old_bridge_ioctl(argp);
-	case SIOCGIFCONF:
-		return compat_dev_ifconf(net, argp);
 	case SIOCWANDEV:
 		return compat_siocwandev(net, argp);
 	case SIOCGSTAMP_OLD:
@@ -3256,6 +3224,7 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
 	case SIOCGSKNS:
 	case SIOCGSTAMP_NEW:
 	case SIOCGSTAMPNS_NEW:
+	case SIOCGIFCONF:
 		return sock_ioctl(file, cmd, arg);
 
 	case SIOCGIFFLAGS:
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH] net: phy: fix auto-negotiation in case of 'down-shift'
From: Antonio Borneo @ 2020-11-24 15:17 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	netdev, Yonglong Liu, stable, linuxarm, Salil Mehta, linux-stm32,
	linux-kernel
In-Reply-To: <20201124145647.GF1551@shell.armlinux.org.uk>

On Tue, 2020-11-24 at 14:56 +0000, Russell King - ARM Linux admin wrote:
> On Tue, Nov 24, 2020 at 03:38:48PM +0100, Antonio Borneo wrote:
> > If the auto-negotiation fails to establish a gigabit link, the phy
> > can try to 'down-shift': it resets the bits in MII_CTRL1000 to
> > stop advertising 1Gbps and retries the negotiation at 100Mbps.
> > 
> > From commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode
> > in genphy_read_status") the content of MII_CTRL1000 is not checked
> > anymore at the end of the negotiation, preventing the detection of
> > phy 'down-shift'.
> > In case of 'down-shift' phydev->advertising gets out-of-sync wrt
> > MII_CTRL1000 and still includes modes that the phy have already
> > dropped. The link partner could still advertise higher speeds,
> > while the link is established at one of the common lower speeds.
> > The logic 'and' in phy_resolve_aneg_linkmode() between
> > phydev->advertising and phydev->lp_advertising will report an
> > incorrect mode.
> > 
> > Issue detected with a local phy rtl8211f connected with a gigabit
> > capable router through a two-pairs network cable.
> > 
> > After auto-negotiation, read back MII_CTRL1000 and mask-out from
> > phydev->advertising the modes that have been eventually discarded
> > due to the 'down-shift'.
> 
> Sorry, but no. While your solution will appear to work, in
> introduces unexpected changes to the user visible APIs.
> 
> >  	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
> > +		if (phydev->is_gigabit_capable) {
> > +			adv = phy_read(phydev, MII_CTRL1000);
> > +			if (adv < 0)
> > +				return adv;
> > +			/* update advertising in case of 'down-shift' */
> > +			mii_ctrl1000_mod_linkmode_adv_t(phydev->advertising,
> > +							adv);
> 
> If a down-shift occurs, this will cause the configured advertising
> mask to lose the 1G speed, which will be visible to userspace.

You are right, it gets propagated to user that 1Gbps is not advertised

> Userspace doesn't expect the advertising mask to change beneath it.
> Since updates from userspace are done using a read-modify-write of
> the ksettings, this can have the undesired effect of removing 1G
> from the configured advertising mask.
> 
> We've had other PHYs have this behaviour; the correct solution is for
> the PHY driver to implement reading the resolution from the PHY rather
> than relying on the generic implementation if it can down-shift

If it's already upstream, could you please point to one of the phy driver
that already implements this properly?

Thanks
Antonio


^ permalink raw reply

* Re: [PATCH net-next] net: DSCP in IPv4 routing v2
From: Guillaume Nault @ 2020-11-24 15:22 UTC (permalink / raw)
  To: Russell Strong; +Cc: netdev
In-Reply-To: <20201124124149.11fe991e@192-168-1-16.tpgi.com.au>

On Tue, Nov 24, 2020 at 12:41:49PM +1000, Russell Strong wrote:
> On Mon, 23 Nov 2020 23:55:05 +0100 Guillaume Nault <gnault@redhat.com> wrote:
> > On Sat, Nov 21, 2020 at 06:24:46PM +1000, Russell Strong wrote:
> 
> I was wondering if one patch would be acceptable, or should it be broken
> up?  If broken up. It would not make sense to apply 1/2 of them.

A patch series would be applied in its entirety or not applied at all.
However, it's not acceptable to temporarily bring regressions in one
patch and fix it later in the series. The tree has to remain
bisectable.

Anyway, I believe there's no need to replace all the TOS macros in the
same patch series. DSCP doesn't have to be enabled everywhere at once.
Small, targeted, patch series are much easier to review.

> > RT_TOS didn't clear the second lowest bit, while the new IP_DSCP does.
> > Therefore, there's no guarantee that such a blanket replacement isn't
> > going to change existing behaviours. Replacements have to be done
> > step by step and accompanied by an explanation of why they're safe.
> 
> Original TOS did not use this bit until it was added in RFC1349 as "lowcost".
> The DSCP change (RFC2474) marked these as currently unused, but worse than that,
> with the introduction of ECN, both of those now "unused" bits are for ECN.
> Other parts of the kernel are using those bits for ECN, so bit 1 probably
> shouldn't be used in routing anymore as congestion could create unexpected
> routing behaviour, i.e. fib_rules

The IETF meaning and history of these bits are well understood. But we
can't write patches based on assumptions like "bit 1 probably shouldn't
be used". The actual code is what matters. That's why, again, changes
have to be done incrementally and in a reviewable manner.

> > For example some of the ip6_make_flowinfo() calls can probably
> > erroneously mark some packets with ECT(0). Instead of masking the
> > problem in this patch, I think it'd be better to have an explicit fix
> > that'd mask the ECN bits in ip6_make_flowinfo() and drop the buggy
> > RT_TOS() in the callers.
> > 
> > Another example is inet_rtm_getroute(). It calls
> > ip_route_output_key_hash_rcu() without masking the tos field first.
> 
> Should rtm->tos be checked for validity in inet_rtm_valid_getroute_req? Seems
> like it was missed.

Well, I don't think so. inet_rtm_valid_getroute_req() is supposed to
return an error if a parameter is wrong. Verifying ->tos should have
been done since day 1, yes. However, in practice, we've been accepting
any value for years. That's the kind of user space behaviour that we
can't really change. The only solution I can see is to mask the ECN
bits silently. That way, users can still pass whatever they like (we
won't break any script), but the result will be right (that is,
consistent with what routing does).

> > Therefore it can return a different route than what the routing code
> > would actually use. Like for the ip6_make_flowinfo() case, it might
> > be better to stop relying on the callers to mask ECN bits and do that
> > in ip_route_output_key_hash_rcu() instead.
> 
> In this context one of the ECN bits is not an ECN bit, as can be seen by
> 
> #define RT_FL_TOS(oldflp4) \
>         ((oldflp4)->flowi4_tos & (IP_DSCP_MASK | RTO_ONLINK))

The RTO_ONLINK flag would have to be passed in a different way. Not a
trivial task (many places to audit), but that looks feasible.

> It's all a bit messy and spread about.  Reducing the distributed nature of
> the masking would be good.

Yes, that's why I'd like to stop sprinkling RT_TOS everywhere and mask
the bits in central places when possible. Once the RT_TOS situation
improves, adding DSCP support will be much easier.

> > I'll verify that these two problems can actually happen in practice
> > and will send patches if necessary.
> 
> Thanks
> 

^ permalink raw reply

* Re: [PATCH][next] mwifiex: Fix fall-through warnings for Clang
From: Gustavo A. R. Silva @ 2020-11-24 15:22 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Amitkumar Karwar, Ganapathi Bhat, Xinming Hu, David S. Miller,
	Jakub Kicinski, linux-wireless, netdev, linux-kernel,
	linux-hardening
In-Reply-To: <20201124150614.68C3EC43461@smtp.codeaurora.org>

On Tue, Nov 24, 2020 at 03:06:14PM +0000, Kalle Valo wrote:
> "Gustavo A. R. Silva" <gustavoars@kernel.org> wrote:
> 
> > In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
> > warnings by explicitly adding multiple break statements instead of
> > letting the code fall through to the next case.
> > 
> > Link: https://github.com/KSPP/linux/issues/115
> > Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> 
> Patch applied to wireless-drivers-next.git, thanks.
> 
> 003317581372 mwifiex: Fix fall-through warnings for Clang

Thank you, Kalle.
--
Gustavo

^ permalink raw reply

* [net-next v2 0/5] Add CHACHA20-POLY1305 cipher to Kernel TLS
From: Vadim Fedorenko @ 2020-11-24 15:24 UTC (permalink / raw)
  To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel
  Cc: Vadim Fedorenko, netdev, linux-crypto

RFC 7905 defines usage of ChaCha20-Poly1305 in TLS connections. This
cipher is widely used nowadays and it's good to have a support for it
in TLS connections in kernel.
Changes v2: 
  nit fixes suggested by Jakub Kicinski
  add linux-crypto to review patch set

Vadim Fedorenko (5):
  net/tls: make inline helpers protocol-aware
  net/tls: add CHACHA20-POLY1305 specific defines and structures
  net/tls: add CHACHA20-POLY1305 specific behavior
  net/tls: add CHACHA20-POLY1305 configuration
  selftests/tls: add CHACHA20-POLY1305 to tls selftests

 include/net/tls.h                 | 32 ++++++++++++++++---------------
 include/uapi/linux/tls.h          | 15 +++++++++++++++
 net/tls/tls_device.c              |  2 +-
 net/tls/tls_device_fallback.c     | 13 +++++++------
 net/tls/tls_main.c                |  3 +++
 net/tls/tls_sw.c                  | 34 ++++++++++++++++++++++++---------
 tools/testing/selftests/net/tls.c | 40 ++++++++++++++++++++++++++++++++-------
 7 files changed, 101 insertions(+), 38 deletions(-)

-- 
1.8.3.1


^ permalink raw reply

* [net-next v2 1/5] net/tls: make inline helpers protocol-aware
From: Vadim Fedorenko @ 2020-11-24 15:24 UTC (permalink / raw)
  To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel
  Cc: Vadim Fedorenko, netdev, linux-crypto
In-Reply-To: <1606231490-653-1-git-send-email-vfedorenko@novek.ru>

Inline functions defined in tls.h have a lot of AES-specific
constants. Remove these constants and change argument to struct
tls_prot_info to have an access to cipher type in later patches

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
 include/net/tls.h             | 26 ++++++++++++--------------
 net/tls/tls_device.c          |  2 +-
 net/tls/tls_device_fallback.c | 13 +++++++------
 net/tls/tls_sw.c              | 12 +++++-------
 4 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index cf14730..d04ce73 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -502,31 +502,30 @@ static inline void tls_advance_record_sn(struct sock *sk,
 		tls_err_abort(sk, EBADMSG);
 
 	if (prot->version != TLS_1_3_VERSION)
-		tls_bigint_increment(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
+		tls_bigint_increment(ctx->iv + prot->salt_size,
 				     prot->iv_size);
 }
 
 static inline void tls_fill_prepend(struct tls_context *ctx,
 			     char *buf,
 			     size_t plaintext_len,
-			     unsigned char record_type,
-			     int version)
+			     unsigned char record_type)
 {
 	struct tls_prot_info *prot = &ctx->prot_info;
 	size_t pkt_len, iv_size = prot->iv_size;
 
 	pkt_len = plaintext_len + prot->tag_size;
-	if (version != TLS_1_3_VERSION) {
+	if (prot->version != TLS_1_3_VERSION) {
 		pkt_len += iv_size;
 
 		memcpy(buf + TLS_NONCE_OFFSET,
-		       ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv_size);
+		       ctx->tx.iv + prot->salt_size, iv_size);
 	}
 
 	/* we cover nonce explicit here as well, so buf should be of
 	 * size KTLS_DTLS_HEADER_SIZE + KTLS_DTLS_NONCE_EXPLICIT_SIZE
 	 */
-	buf[0] = version == TLS_1_3_VERSION ?
+	buf[0] = prot->version == TLS_1_3_VERSION ?
 		   TLS_RECORD_TYPE_DATA : record_type;
 	/* Note that VERSION must be TLS_1_2 for both TLS1.2 and TLS1.3 */
 	buf[1] = TLS_1_2_VERSION_MINOR;
@@ -539,18 +538,17 @@ static inline void tls_fill_prepend(struct tls_context *ctx,
 static inline void tls_make_aad(char *buf,
 				size_t size,
 				char *record_sequence,
-				int record_sequence_size,
 				unsigned char record_type,
-				int version)
+				struct tls_prot_info *prot)
 {
-	if (version != TLS_1_3_VERSION) {
-		memcpy(buf, record_sequence, record_sequence_size);
+	if (prot->version != TLS_1_3_VERSION) {
+		memcpy(buf, record_sequence, prot->rec_seq_size);
 		buf += 8;
 	} else {
-		size += TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+		size += prot->tag_size;
 	}
 
-	buf[0] = version == TLS_1_3_VERSION ?
+	buf[0] = prot->version == TLS_1_3_VERSION ?
 		  TLS_RECORD_TYPE_DATA : record_type;
 	buf[1] = TLS_1_2_VERSION_MAJOR;
 	buf[2] = TLS_1_2_VERSION_MINOR;
@@ -558,11 +556,11 @@ static inline void tls_make_aad(char *buf,
 	buf[4] = size & 0xFF;
 }
 
-static inline void xor_iv_with_seq(int version, char *iv, char *seq)
+static inline void xor_iv_with_seq(struct tls_prot_info *prot, char *iv, char *seq)
 {
 	int i;
 
-	if (version == TLS_1_3_VERSION) {
+	if (prot->version == TLS_1_3_VERSION) {
 		for (i = 0; i < 8; i++)
 			iv[i + 4] ^= seq[i];
 	}
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 54d3e16..6f93ad5 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -327,7 +327,7 @@ static int tls_device_record_close(struct sock *sk,
 	/* fill prepend */
 	tls_fill_prepend(ctx, skb_frag_address(&record->frags[0]),
 			 record->len - prot->overhead_size,
-			 record_type, prot->version);
+			 record_type);
 	return ret;
 }
 
diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c
index 2889533..d946817 100644
--- a/net/tls/tls_device_fallback.c
+++ b/net/tls/tls_device_fallback.c
@@ -49,7 +49,8 @@ static int tls_enc_record(struct aead_request *aead_req,
 			  struct crypto_aead *aead, char *aad,
 			  char *iv, __be64 rcd_sn,
 			  struct scatter_walk *in,
-			  struct scatter_walk *out, int *in_len)
+			  struct scatter_walk *out, int *in_len,
+			  struct tls_prot_info *prot)
 {
 	unsigned char buf[TLS_HEADER_SIZE + TLS_CIPHER_AES_GCM_128_IV_SIZE];
 	struct scatterlist sg_in[3];
@@ -73,8 +74,7 @@ static int tls_enc_record(struct aead_request *aead_req,
 	len -= TLS_CIPHER_AES_GCM_128_IV_SIZE;
 
 	tls_make_aad(aad, len - TLS_CIPHER_AES_GCM_128_TAG_SIZE,
-		(char *)&rcd_sn, sizeof(rcd_sn), buf[0],
-		TLS_1_2_VERSION);
+		(char *)&rcd_sn, buf[0], prot);
 
 	memcpy(iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, buf + TLS_HEADER_SIZE,
 	       TLS_CIPHER_AES_GCM_128_IV_SIZE);
@@ -140,7 +140,7 @@ static struct aead_request *tls_alloc_aead_request(struct crypto_aead *aead,
 static int tls_enc_records(struct aead_request *aead_req,
 			   struct crypto_aead *aead, struct scatterlist *sg_in,
 			   struct scatterlist *sg_out, char *aad, char *iv,
-			   u64 rcd_sn, int len)
+			   u64 rcd_sn, int len, struct tls_prot_info *prot)
 {
 	struct scatter_walk out, in;
 	int rc;
@@ -150,7 +150,7 @@ static int tls_enc_records(struct aead_request *aead_req,
 
 	do {
 		rc = tls_enc_record(aead_req, aead, aad, iv,
-				    cpu_to_be64(rcd_sn), &in, &out, &len);
+				    cpu_to_be64(rcd_sn), &in, &out, &len, prot);
 		rcd_sn++;
 
 	} while (rc == 0 && len);
@@ -348,7 +348,8 @@ static struct sk_buff *tls_enc_skb(struct tls_context *tls_ctx,
 		    payload_len, sync_size, dummy_buf);
 
 	if (tls_enc_records(aead_req, ctx->aead_send, sg_in, sg_out, aad, iv,
-			    rcd_sn, sync_size + payload_len) < 0)
+			    rcd_sn, sync_size + payload_len,
+			    &tls_ctx->prot_info) < 0)
 		goto free_nskb;
 
 	complete_skb(nskb, skb, tcp_payload_offset);
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2fe9e2c..6bc757a 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -505,7 +505,7 @@ static int tls_do_encryption(struct sock *sk,
 	memcpy(&rec->iv_data[iv_offset], tls_ctx->tx.iv,
 	       prot->iv_size + prot->salt_size);
 
-	xor_iv_with_seq(prot->version, rec->iv_data, tls_ctx->tx.rec_seq);
+	xor_iv_with_seq(prot, rec->iv_data, tls_ctx->tx.rec_seq);
 
 	sge->offset += prot->prepend_size;
 	sge->length -= prot->prepend_size;
@@ -748,14 +748,13 @@ static int tls_push_record(struct sock *sk, int flags,
 	sg_chain(rec->sg_aead_out, 2, &msg_en->sg.data[i]);
 
 	tls_make_aad(rec->aad_space, msg_pl->sg.size + prot->tail_size,
-		     tls_ctx->tx.rec_seq, prot->rec_seq_size,
-		     record_type, prot->version);
+		     tls_ctx->tx.rec_seq, record_type, prot);
 
 	tls_fill_prepend(tls_ctx,
 			 page_address(sg_page(&msg_en->sg.data[i])) +
 			 msg_en->sg.data[i].offset,
 			 msg_pl->sg.size + prot->tail_size,
-			 record_type, prot->version);
+			 record_type);
 
 	tls_ctx->pending_open_record_frags = false;
 
@@ -1471,13 +1470,12 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
 	else
 		memcpy(iv + iv_offset, tls_ctx->rx.iv, prot->salt_size);
 
-	xor_iv_with_seq(prot->version, iv, tls_ctx->rx.rec_seq);
+	xor_iv_with_seq(prot, iv, tls_ctx->rx.rec_seq);
 
 	/* Prepare AAD */
 	tls_make_aad(aad, rxm->full_len - prot->overhead_size +
 		     prot->tail_size,
-		     tls_ctx->rx.rec_seq, prot->rec_seq_size,
-		     ctx->control, prot->version);
+		     tls_ctx->rx.rec_seq, ctx->control, prot);
 
 	/* Prepare sgin */
 	sg_init_table(sgin, n_sgin);
-- 
1.8.3.1


^ permalink raw reply related

* [net-next v2 3/5] net/tls: add CHACHA20-POLY1305 specific behavior
From: Vadim Fedorenko @ 2020-11-24 15:24 UTC (permalink / raw)
  To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel
  Cc: Vadim Fedorenko, netdev, linux-crypto
In-Reply-To: <1606231490-653-1-git-send-email-vfedorenko@novek.ru>

RFC 7905 defines special behavior for ChaCha-Poly TLS sessions.
The differences are in the calculation of nonce and the absence
of explicit IV. This behavior is like TLSv1.3 partly.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
 include/net/tls.h | 9 ++++++---
 net/tls/tls_sw.c  | 6 ++++--
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index e4e9c2a..b2637ed 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -502,7 +502,8 @@ static inline void tls_advance_record_sn(struct sock *sk,
 	if (tls_bigint_increment(ctx->rec_seq, prot->rec_seq_size))
 		tls_err_abort(sk, EBADMSG);
 
-	if (prot->version != TLS_1_3_VERSION)
+	if (prot->version != TLS_1_3_VERSION &&
+	    prot->cipher_type != TLS_CIPHER_CHACHA20_POLY1305)
 		tls_bigint_increment(ctx->iv + prot->salt_size,
 				     prot->iv_size);
 }
@@ -516,7 +517,8 @@ static inline void tls_fill_prepend(struct tls_context *ctx,
 	size_t pkt_len, iv_size = prot->iv_size;
 
 	pkt_len = plaintext_len + prot->tag_size;
-	if (prot->version != TLS_1_3_VERSION) {
+	if (prot->version != TLS_1_3_VERSION &&
+	    prot->cipher_type != TLS_CIPHER_CHACHA20_POLY1305) {
 		pkt_len += iv_size;
 
 		memcpy(buf + TLS_NONCE_OFFSET,
@@ -561,7 +563,8 @@ static inline void xor_iv_with_seq(struct tls_prot_info *prot, char *iv, char *s
 {
 	int i;
 
-	if (prot->version == TLS_1_3_VERSION) {
+	if (prot->version == TLS_1_3_VERSION ||
+	    prot->cipher_type == TLS_CIPHER_CHACHA20_POLY1305) {
 		for (i = 0; i < 8; i++)
 			iv[i + 4] ^= seq[i];
 	}
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6bc757a..b4eefdb 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1464,7 +1464,8 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
 		kfree(mem);
 		return err;
 	}
-	if (prot->version == TLS_1_3_VERSION)
+	if (prot->version == TLS_1_3_VERSION ||
+	    prot->cipher_type == TLS_CIPHER_CHACHA20_POLY1305)
 		memcpy(iv + iv_offset, tls_ctx->rx.iv,
 		       crypto_aead_ivsize(ctx->aead_recv));
 	else
@@ -2068,7 +2069,8 @@ static int tls_read_size(struct strparser *strp, struct sk_buff *skb)
 	data_len = ((header[4] & 0xFF) | (header[3] << 8));
 
 	cipher_overhead = prot->tag_size;
-	if (prot->version != TLS_1_3_VERSION)
+	if (prot->version != TLS_1_3_VERSION &&
+	    prot->cipher_type != TLS_CIPHER_CHACHA20_POLY1305)
 		cipher_overhead += prot->iv_size;
 
 	if (data_len > TLS_MAX_PAYLOAD_SIZE + cipher_overhead +
-- 
1.8.3.1


^ permalink raw reply related

* [net-next v2 2/5] net/tls: add CHACHA20-POLY1305 specific defines and structures
From: Vadim Fedorenko @ 2020-11-24 15:24 UTC (permalink / raw)
  To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel
  Cc: Vadim Fedorenko, netdev, linux-crypto
In-Reply-To: <1606231490-653-1-git-send-email-vfedorenko@novek.ru>

To provide support for ChaCha-Poly cipher we need to define
specific constants and structures.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
 include/net/tls.h        |  1 +
 include/uapi/linux/tls.h | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/net/tls.h b/include/net/tls.h
index d04ce73..e4e9c2a 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -211,6 +211,7 @@ struct cipher_context {
 	union {
 		struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
 		struct tls12_crypto_info_aes_gcm_256 aes_gcm_256;
+		struct tls12_crypto_info_chacha20_poly1305 chacha20_poly1305;
 	};
 };
 
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index bcd2869..0d54bae 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -77,6 +77,13 @@
 #define TLS_CIPHER_AES_CCM_128_TAG_SIZE		16
 #define TLS_CIPHER_AES_CCM_128_REC_SEQ_SIZE		8
 
+#define TLS_CIPHER_CHACHA20_POLY1305			54
+#define TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE		12
+#define TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE	32
+#define TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE		0
+#define TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE	16
+#define TLS_CIPHER_CHACHA20_POLY1305_REC_SEQ_SIZE	8
+
 #define TLS_SET_RECORD_TYPE	1
 #define TLS_GET_RECORD_TYPE	2
 
@@ -109,6 +116,14 @@ struct tls12_crypto_info_aes_ccm_128 {
 	unsigned char rec_seq[TLS_CIPHER_AES_CCM_128_REC_SEQ_SIZE];
 };
 
+struct tls12_crypto_info_chacha20_poly1305 {
+	struct tls_crypto_info info;
+	unsigned char iv[TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE];
+	unsigned char key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+	unsigned char salt[TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE];
+	unsigned char rec_seq[TLS_CIPHER_CHACHA20_POLY1305_REC_SEQ_SIZE];
+};
+
 enum {
 	TLS_INFO_UNSPEC,
 	TLS_INFO_VERSION,
-- 
1.8.3.1


^ permalink raw reply related

* [net-next v2 5/5] selftests/tls: add CHACHA20-POLY1305 to tls selftests
From: Vadim Fedorenko @ 2020-11-24 15:24 UTC (permalink / raw)
  To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel
  Cc: Vadim Fedorenko, netdev, linux-crypto
In-Reply-To: <1606231490-653-1-git-send-email-vfedorenko@novek.ru>

Add new cipher as a variant of standard tls selftests

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
 tools/testing/selftests/net/tls.c | 40 ++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index b599f1f..cb0d189 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -103,32 +103,58 @@
 
 FIXTURE_VARIANT(tls)
 {
-	unsigned int tls_version;
+	u16 tls_version;
+	u16 cipher_type;
 };
 
-FIXTURE_VARIANT_ADD(tls, 12)
+FIXTURE_VARIANT_ADD(tls, 12_gcm)
 {
 	.tls_version = TLS_1_2_VERSION,
+	.cipher_type = TLS_CIPHER_AES_GCM_128,
 };
 
-FIXTURE_VARIANT_ADD(tls, 13)
+FIXTURE_VARIANT_ADD(tls, 13_gcm)
 {
 	.tls_version = TLS_1_3_VERSION,
+	.cipher_type = TLS_CIPHER_AES_GCM_128,
+};
+
+FIXTURE_VARIANT_ADD(tls, 12_chacha)
+{
+	.tls_version = TLS_1_2_VERSION,
+	.cipher_type = TLS_CIPHER_CHACHA20_POLY1305,
+};
+
+FIXTURE_VARIANT_ADD(tls, 13_chacha)
+{
+	.tls_version = TLS_1_3_VERSION,
+	.cipher_type = TLS_CIPHER_CHACHA20_POLY1305,
 };
 
 FIXTURE_SETUP(tls)
 {
-	struct tls12_crypto_info_aes_gcm_128 tls12;
+	union tls_crypto_context tls12;
 	struct sockaddr_in addr;
 	socklen_t len;
 	int sfd, ret;
+	size_t tls12_sz;
 
 	self->notls = false;
 	len = sizeof(addr);
 
 	memset(&tls12, 0, sizeof(tls12));
 	tls12.info.version = variant->tls_version;
-	tls12.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	tls12.info.cipher_type = variant->cipher_type;
+	switch (variant->cipher_type) {
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		tls12_sz = sizeof(tls12_crypto_info_chacha20_poly1305);
+		break;
+	case TLS_CIPHER_AES_GCM_128:
+		tls12_sz = sizeof(tls12_crypto_info_aes_gcm_128);
+		break;
+	default:
+		tls12_sz = 0;
+	}
 
 	addr.sin_family = AF_INET;
 	addr.sin_addr.s_addr = htonl(INADDR_ANY);
@@ -156,7 +182,7 @@
 
 	if (!self->notls) {
 		ret = setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12,
-				 sizeof(tls12));
+				 tls12_sz);
 		ASSERT_EQ(ret, 0);
 	}
 
@@ -169,7 +195,7 @@
 		ASSERT_EQ(ret, 0);
 
 		ret = setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12,
-				 sizeof(tls12));
+				 tls12_sz);
 		ASSERT_EQ(ret, 0);
 	}
 
-- 
1.8.3.1


^ permalink raw reply related

* [net-next v2 4/5] net/tls: add CHACHA20-POLY1305 configuration
From: Vadim Fedorenko @ 2020-11-24 15:24 UTC (permalink / raw)
  To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel
  Cc: Vadim Fedorenko, netdev, linux-crypto
In-Reply-To: <1606231490-653-1-git-send-email-vfedorenko@novek.ru>

Add ChaCha-Poly specific configuration code.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
 net/tls/tls_main.c |  3 +++
 net/tls/tls_sw.c   | 16 ++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 8d93cea..47b7c53 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -521,6 +521,9 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
 	case TLS_CIPHER_AES_CCM_128:
 		optsize = sizeof(struct tls12_crypto_info_aes_ccm_128);
 		break;
+	case TLS_CIPHER_CHACHA20_POLY1305:
+		optsize = sizeof(struct tls12_crypto_info_chacha20_poly1305);
+		break;
 	default:
 		rc = -EINVAL;
 		goto err_crypto_info;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index b4eefdb..53106f0 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2290,6 +2290,7 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
 	struct tls12_crypto_info_aes_gcm_128 *gcm_128_info;
 	struct tls12_crypto_info_aes_gcm_256 *gcm_256_info;
 	struct tls12_crypto_info_aes_ccm_128 *ccm_128_info;
+	struct tls12_crypto_info_chacha20_poly1305 *chacha20_poly1305_info;
 	struct tls_sw_context_tx *sw_ctx_tx = NULL;
 	struct tls_sw_context_rx *sw_ctx_rx = NULL;
 	struct cipher_context *cctx;
@@ -2402,6 +2403,21 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
 		cipher_name = "ccm(aes)";
 		break;
 	}
+	case TLS_CIPHER_CHACHA20_POLY1305: {
+		chacha20_poly1305_info = (void *)crypto_info;
+		nonce_size = 0;
+		tag_size = TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE;
+		iv_size = TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE;
+		iv = chacha20_poly1305_info->iv;
+		rec_seq_size = TLS_CIPHER_CHACHA20_POLY1305_REC_SEQ_SIZE;
+		rec_seq = chacha20_poly1305_info->rec_seq;
+		keysize = TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE;
+		key = chacha20_poly1305_info->key;
+		salt = chacha20_poly1305_info->salt;
+		salt_size = TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE;
+		cipher_name = "rfc7539(chacha20,poly1305)";
+		break;
+	}
 	default:
 		rc = -EINVAL;
 		goto free_priv;
-- 
1.8.3.1


^ permalink raw reply related

* Re: [EXT] Re: [PATCH v3] aquantia: Remove the build_skb path
From: Igor Russkikh @ 2020-11-24 15:26 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Ramsay, Lincoln, Florian Westphal, David S. Miller,
	Jakub Kicinski, netdev, Dmitry Bogdanov [C]
In-Reply-To: <20201123192817.GA11618@ranger.igk.intel.com>


>>
>> Since normal layout is 1400 packets - we do use 2K (half page) for each
> packet.
> 
> What is 'normal layout is 1400 packets' ? Didn't you mean the 1500 byte
> standard MTU? So this is what you've been trying to tell me - that for
> 1500 byte mtu and 1k HW granularity you need to provide to HW 2k of
> contiguous space, correct?

Thats right.
Sorry for confusion, of course I meant 1500 standard MTU.

> 
>> This way we reuse each allocated page for at least two packets (and
> putting
>> skb_shared into the remaining 512b).
> 
> I don't think I follow that. I thought that 2k needs to be exclusive for
> HW and now you're saying that for remaining 512 bytes you can do whatever
> you want.

As soon as we've got packet we know its length. IF its less than 2K minus
skb_shared_info - we put that halfpage directly into skb, and placing the tail
for shared_info. This is what fast path is doing now.

> If that's true then I think you can have build_skb support and I don't see
> that 1k granularity as a limitation.

Thats true, but we can't use build_skb exactly because of the reason Ramsay
discovered. We need extra headspace always.

>> I know many customers do consider AQC chips in near embedded
> environments
>> (routers, etc). They really do care about memories. So that could be
> risky.
> 
> We have a knob that is controlled by ethtool's priv flag so you can change
> the memory model and pull the build_skb out of the picture. Just FYI.

Priv flags are considered harmful today...
But I agree in general we lack support of driver fastpath tuning.
Like changing page order for large jumbos or page reuse logic knobs.
May be devlink params could be considered for this?

>>> This issue would pop up again if this driver would like to support XDP
>>> where 256 byte headroom will have to be provided.
>>
>> Actually it already popped. Thats one of the reasons I'm delaying with
> xdp
>> patch series for this driver.
>>
>> I think the best tradeoff here would be allocating order 1 or 2 pages
> (i.e. 8K
>> or 16K), and reuse the page for multiple placements of 2K XDP packets:
>>
>> (256+2048)*3 = 6912 (1K overhead for each 3 packets)
>>
>> (256+2048)*7 = 16128 (200b overhead over 7 packets)
> 
> And for XDP_PASS you would use build_skb? Then tailroom needs to be
> provided.

For efficient PASS - I think both tail and head room should somehow be
reserved. Then yes, build_skb could be used..

Thanks
  Igor


^ permalink raw reply

* Re: [PATCH] net: phy: fix auto-negotiation in case of 'down-shift'
From: Heiner Kallweit @ 2020-11-24 15:26 UTC (permalink / raw)
  To: Antonio Borneo, Russell King - ARM Linux admin
  Cc: Andrew Lunn, David S. Miller, Jakub Kicinski, netdev,
	Yonglong Liu, stable, linuxarm, Salil Mehta, linux-stm32,
	linux-kernel
In-Reply-To: <bd83b9c15f6cfed5df90da4f6b50d1a3f479b831.camel@st.com>

Am 24.11.2020 um 16:17 schrieb Antonio Borneo:
> On Tue, 2020-11-24 at 14:56 +0000, Russell King - ARM Linux admin wrote:
>> On Tue, Nov 24, 2020 at 03:38:48PM +0100, Antonio Borneo wrote:
>>> If the auto-negotiation fails to establish a gigabit link, the phy
>>> can try to 'down-shift': it resets the bits in MII_CTRL1000 to
>>> stop advertising 1Gbps and retries the negotiation at 100Mbps.
>>>
>>> From commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode
>>> in genphy_read_status") the content of MII_CTRL1000 is not checked
>>> anymore at the end of the negotiation, preventing the detection of
>>> phy 'down-shift'.
>>> In case of 'down-shift' phydev->advertising gets out-of-sync wrt
>>> MII_CTRL1000 and still includes modes that the phy have already
>>> dropped. The link partner could still advertise higher speeds,
>>> while the link is established at one of the common lower speeds.
>>> The logic 'and' in phy_resolve_aneg_linkmode() between
>>> phydev->advertising and phydev->lp_advertising will report an
>>> incorrect mode.
>>>
>>> Issue detected with a local phy rtl8211f connected with a gigabit
>>> capable router through a two-pairs network cable.
>>>
>>> After auto-negotiation, read back MII_CTRL1000 and mask-out from
>>> phydev->advertising the modes that have been eventually discarded
>>> due to the 'down-shift'.
>>
>> Sorry, but no. While your solution will appear to work, in
>> introduces unexpected changes to the user visible APIs.
>>
>>>  	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
>>> +		if (phydev->is_gigabit_capable) {
>>> +			adv = phy_read(phydev, MII_CTRL1000);
>>> +			if (adv < 0)
>>> +				return adv;
>>> +			/* update advertising in case of 'down-shift' */
>>> +			mii_ctrl1000_mod_linkmode_adv_t(phydev->advertising,
>>> +							adv);
>>
>> If a down-shift occurs, this will cause the configured advertising
>> mask to lose the 1G speed, which will be visible to userspace.
> 
> You are right, it gets propagated to user that 1Gbps is not advertised
> 
>> Userspace doesn't expect the advertising mask to change beneath it.
>> Since updates from userspace are done using a read-modify-write of
>> the ksettings, this can have the undesired effect of removing 1G
>> from the configured advertising mask.
>>
>> We've had other PHYs have this behaviour; the correct solution is for
>> the PHY driver to implement reading the resolution from the PHY rather
>> than relying on the generic implementation if it can down-shift
> 
> If it's already upstream, could you please point to one of the phy driver
> that already implements this properly?
> 

See e.g. aqr107_read_rate(), used by aqr107_read_status().

> Thanks
> Antonio
> 


^ permalink raw reply

* Re: [PATCH] net: phy: fix auto-negotiation in case of 'down-shift'
From: Antonio Borneo @ 2020-11-24 15:31 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Heiner Kallweit
  Cc: Andrew Lunn, David S. Miller, Jakub Kicinski, netdev,
	Yonglong Liu, stable, linuxarm, Salil Mehta, linux-stm32,
	linux-kernel
In-Reply-To: <20201124151716.GG1551@shell.armlinux.org.uk>

On Tue, 2020-11-24 at 15:17 +0000, Russell King - ARM Linux admin wrote:
> On Tue, Nov 24, 2020 at 04:03:40PM +0100, Heiner Kallweit wrote:
> > Am 24.11.2020 um 15:38 schrieb Antonio Borneo:
> > > If the auto-negotiation fails to establish a gigabit link, the phy
> > > can try to 'down-shift': it resets the bits in MII_CTRL1000 to
> > > stop advertising 1Gbps and retries the negotiation at 100Mbps.
> > > 
> > I see that Russell answered already. My 2cts:
> > 
> > Are you sure all PHY's supporting downshift adjust the
> > advertisement bits? IIRC an Aquantia PHY I dealt with does not.
> > And if a PHY does so I'd consider this problematic:
> > Let's say you have a broken cable and the PHY downshifts to
> > 100Mbps. If you change the cable then the PHY would still negotiate
> > 100Mbps only.
> 
> From what I've seen, that is not how downshift works, at least on
> the PHYs I've seen.
> 
> When the PHY downshifts, it modifies the advertisement registers,
> but it also remembers the original value. When the cable is
> unplugged, it restores the setting to what was previously set.

In fact, at least rtl8211f is able to recover the original settings and
returns to 1Gbps once a decent cable gets plugged-in.

> 
> It is _far_ from nice, but the fact is that your patch that Antonio
> identified has broken previously working support, something that I
> brought up when I patched one of the PHY drivers that was broken by
> this very same problem by your patch.

The idea to fix it for a general case was indeed triggered by the fact that
before commit 5502b218e001 this was the norm. I considered it as a
regression.

> 
> That said, _if_ the PHY has a way to read the resolved state rather
> than reading the advertisement registers, that is what should be
> used (as I said previously) rather than trying to decode the
> advertisement registers ourselves. That is normally more reliable
> for speed and duplex.
> 

Wrt rtl8211f I don't have info other then the public datasheet, and there I
didn't found any way other than reading the advertisement register.

I have read the latest comment from Heiner. I will check aqr107!

Thanks
Antonio


^ permalink raw reply

* Re: [PATCH] net: phy: fix auto-negotiation in case of 'down-shift'
From: Russell King - ARM Linux admin @ 2020-11-24 15:37 UTC (permalink / raw)
  To: Antonio Borneo
  Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	netdev, Yonglong Liu, stable, linuxarm, Salil Mehta, linux-stm32,
	linux-kernel
In-Reply-To: <bd83b9c15f6cfed5df90da4f6b50d1a3f479b831.camel@st.com>

On Tue, Nov 24, 2020 at 04:17:42PM +0100, Antonio Borneo wrote:
> On Tue, 2020-11-24 at 14:56 +0000, Russell King - ARM Linux admin wrote:
> > Userspace doesn't expect the advertising mask to change beneath it.
> > Since updates from userspace are done using a read-modify-write of
> > the ksettings, this can have the undesired effect of removing 1G
> > from the configured advertising mask.
> > 
> > We've had other PHYs have this behaviour; the correct solution is for
> > the PHY driver to implement reading the resolution from the PHY rather
> > than relying on the generic implementation if it can down-shift
> 
> If it's already upstream, could you please point to one of the phy driver
> that already implements this properly?

Reading the resolved information is PHY specific as it isn't
standardised.

Marvell PHYs have read the resolved information for a very long time.
I added support for it to at803x.c:

06d5f3441b2e net: phy: at803x: use operating parameters from PHY-specific status

after it broke for exactly the reason you're reporting for your PHY.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH] bus: mhi: Remove auto-start option
From: Kalle Valo @ 2020-11-24 15:45 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Loic Poulain, linux-arm-msm, linux-wireless, bbhatt, netdev,
	hemantk, ath11k
In-Reply-To: <20201118093107.GC3286@work>

Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:

> On Wed, Nov 18, 2020 at 07:43:48AM +0200, Kalle Valo wrote:
>> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
>> 
>> > From: Loic Poulain <loic.poulain@linaro.org>
>> >
>> > There is really no point having an auto-start for channels.
>> > This is confusing for the device drivers, some have to enable the
>> > channels, others don't have... and waste resources (e.g. pre allocated
>> > buffers) that may never be used.
>> >
>> > This is really up to the MHI device(channel) driver to manage the state
>> > of its channels.
>> >
>> > While at it, let's also remove the auto-start option from ath11k mhi
>> > controller.
>> >
>> > Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
>> > Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
>> > [mani: clubbed ath11k change]
>> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
>> 
>> Thanks and feel free to take this to the immutable branch:
>> 
>> Acked-by: Kalle Valo <kvalo@codeaurora.org>
>
> Patch applied to mhi-ath11k-immutable branch and merged into mhi-next.

Tested on QCA6390 hw2.0 and pulled also to ath-next, thanks.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* RE: [PATCH] net: phy: fix auto-negotiation in case of 'down-shift'
From: David Laight @ 2020-11-24 15:46 UTC (permalink / raw)
  To: 'Russell King - ARM Linux admin', Heiner Kallweit
  Cc: Antonio Borneo, Andrew Lunn, David S. Miller, Jakub Kicinski,
	netdev@vger.kernel.org, Yonglong Liu, stable@vger.kernel.org,
	linuxarm@huawei.com, Salil Mehta,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-kernel@vger.kernel.org
In-Reply-To: <20201124151716.GG1551@shell.armlinux.org.uk>

From: Russell King
> Sent: 24 November 2020 15:17
...
> That said, _if_ the PHY has a way to read the resolved state rather
> than reading the advertisement registers, that is what should be
> used (as I said previously) rather than trying to decode the
> advertisement registers ourselves. That is normally more reliable
> for speed and duplex.

Determining the speed and duplux from the ANAR and ANRR (I can't
remember the name of the response register) has always been
completely broken.

The problems arise when you connect to either a 10M hub or
a 10/100M autodetecting hub (these are a 10M hub and a 100M hub
connected by a bridge).
The PHY will either see single link test pulses (10M hub) or
a simple burst of link test pulses (10/100 hub) and fall back
to 10M HDX or 100M HDX.
Both the 10M hub and 10/100 hub are happy with the link test
pulse stream that contains the ANAR.
However the ANRR register will (typically) contain the value
from the last system that sent it one.
So if you unplug from something that does 100M FDX and plug into
a hub the MAC unit is likely to be misconfigured and do FDX.

Of course, there is no generic way to get the actual mode.
I'm not sure the PHY I was using (a long time ago) even had
any private register that could tell you.

For one system (which was never going to do anything fast)
I just removed the FDX modes from the ANAR.
The MAC didn't care whether it was 10M or 100M.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply

* Re: [PATCH 1/1] tools/bpftool: fix error return value in build_btf_type_table()
From: Yonghong Song @ 2020-11-24 16:00 UTC (permalink / raw)
  To: Zhen Lei, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh, netdev, bpf,
	linux-kernel
In-Reply-To: <20201124104100.491-1-thunder.leizhen@huawei.com>



On 11/24/20 2:41 AM, Zhen Lei wrote:
> An appropriate return value should be set on the failed path.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

LGTM.

Acked-by: Yonghong Song <yhs@fb.com>

Also this is a bug fix. It should probably be targeted to bpf tree. So,

Fixes: 4d374ba0bf30 ("tools: bpftool: implement "bpftool btf show|list"")

> ---
>   tools/bpf/bpftool/btf.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
> index 8ab142ff5eac..2afb7d5b1aca 100644
> --- a/tools/bpf/bpftool/btf.c
> +++ b/tools/bpf/bpftool/btf.c
> @@ -693,6 +693,7 @@ build_btf_type_table(struct btf_attach_table *tab, enum bpf_obj_type type,
>   		obj_node = calloc(1, sizeof(*obj_node));
>   		if (!obj_node) {
>   			p_err("failed to allocate memory: %s", strerror(errno));
> +			err = -ENOMEM;
>   			goto err_free;
>   		}
>   
> 

^ permalink raw reply

* [PATCH v2 net] ptp: clockmatrix: bug fix for idtcm_strverscmp
From: min.li.xe @ 2020-11-24 16:01 UTC (permalink / raw)
  To: richardcochran; +Cc: netdev, linux-kernel, Min Li

From: Min Li <min.li.xe@renesas.com>

Feed kstrtou8 with NULL terminated string.

Changes since v1:
-Use sscanf to get rid of adhoc string parse.

Signed-off-by: Min Li <min.li.xe@renesas.com>
---
 drivers/ptp/ptp_clockmatrix.c | 53 +++++++++++++++----------------------------
 1 file changed, 18 insertions(+), 35 deletions(-)

diff --git a/drivers/ptp/ptp_clockmatrix.c b/drivers/ptp/ptp_clockmatrix.c
index e020faf..12d939f 100644
--- a/drivers/ptp/ptp_clockmatrix.c
+++ b/drivers/ptp/ptp_clockmatrix.c
@@ -103,43 +103,26 @@ static int timespec_to_char_array(struct timespec64 const *ts,
 	return 0;
 }
 
-static int idtcm_strverscmp(const char *ver1, const char *ver2)
+static int idtcm_strverscmp(const char *version1, const char *version2)
 {
-	u8 num1;
-	u8 num2;
-	int result = 0;
-
-	/* loop through each level of the version string */
-	while (result == 0) {
-		/* extract leading version numbers */
-		if (kstrtou8(ver1, 10, &num1) < 0)
-			return -1;
-
-		if (kstrtou8(ver2, 10, &num2) < 0)
-			return -1;
-
-		/* if numbers differ, then set the result */
-		if (num1 < num2)
-			result = -1;
-		else if (num1 > num2)
-			result = 1;
-		else {
-			/* if numbers are the same, go to next level */
-			ver1 = strchr(ver1, '.');
-			ver2 = strchr(ver2, '.');
-			if (!ver1 && !ver2)
-				break;
-			else if (!ver1)
-				result = -1;
-			else if (!ver2)
-				result = 1;
-			else {
-				ver1++;
-				ver2++;
-			}
-		}
+	u8 ver1[3], ver2[3];
+	int i;
+
+	if (sscanf(version1, "%hhu.%hhu.%hhu",
+		   &ver1[0], &ver1[1], &ver1[2]) < 0)
+		return -1;
+	if (sscanf(version2, "%hhu.%hhu.%hhu",
+		   &ver2[0], &ver2[1], &ver2[2]) < 0)
+		return -1;
+
+	for (i = 0; i < 3; i++) {
+		if (ver1[i] > ver2[i])
+			return 1;
+		if (ver1[i] < ver2[i])
+			return -1;		
 	}
-	return result;
+
+	return 0;		
 }
 
 static int idtcm_xfer_read(struct idtcm *idtcm,
-- 
2.7.4


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox