* Re: [PATCH net] geneve: add ttl inherit support
From: David Miller @ 2018-09-13 3:38 UTC (permalink / raw)
To: liuhangbin; +Cc: netdev, pshelar, jbenc, sbrivio
In-Reply-To: <1536717861-21590-1-git-send-email-liuhangbin@gmail.com>
From: Hangbin Liu <liuhangbin@gmail.com>
Date: Wed, 12 Sep 2018 10:04:21 +0800
> Similar with commit 72f6d71e491e6 ("vxlan: add ttl inherit support"),
> currently ttl == 0 means "use whatever default value" on geneve instead
> of inherit inner ttl. To respect compatibility with old behavior, let's
> add a new IFLA_GENEVE_TTL_INHERIT for geneve ttl inherit support.
>
> Reported-by: Jianlin Shi <jishi@redhat.com>
> Suggested-by: Jiri Benc <jbenc@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Applied to net-next.
^ permalink raw reply
* [PATCH 1/2] netlink: add NLA_REJECT policy type
From: Johannes Berg @ 2018-09-13 8:46 UTC (permalink / raw)
To: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA
Cc: Michal Kubecek, Johannes Berg
From: Johannes Berg <johannes.berg-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
In some situations some netlink attributes may be used for output
only (kernel->userspace) or may be reserved for future use. It's
then helpful to be able to prevent userspace from using them in
messages sent to the kernel, since they'd otherwise be ignored and
any future will become impossible if this happens.
Add NLA_REJECT to the policy which does nothing but reject (with
EINVAL) validation of any messages containing this attribute.
Allow for returning a specific extended ACK error message in the
validation_data pointer.
While at it fix the indentation of NLA_BITFIELD32 and describe the
validation_data pointer for it.
The specific case I have in mind now is a shared nested attribute
containing request/response data, and it would be pointless and
potentially confusing to have userspace include response data in
the messages that actually contain a request.
Signed-off-by: Johannes Berg <johannes.berg-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
include/net/netlink.h | 6 +++++-
lib/nlattr.c | 22 +++++++++++++++-------
2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/include/net/netlink.h b/include/net/netlink.h
index 0c154f98e987..04e40fcc70d6 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -180,6 +180,7 @@ enum {
NLA_S32,
NLA_S64,
NLA_BITFIELD32,
+ NLA_REJECT,
__NLA_TYPE_MAX,
};
@@ -208,7 +209,10 @@ enum {
* NLA_MSECS Leaving the length field zero will verify the
* given type fits, using it verifies minimum length
* just like "All other"
- * NLA_BITFIELD32 A 32-bit bitmap/bitselector attribute
+ * NLA_BITFIELD32 A 32-bit bitmap/bitselector attribute, validation
+ * data must point to a u32 value of valid flags
+ * NLA_REJECT Reject this attribute, validation data may point
+ * to a string to report as the error in extended ACK.
* All other Minimum length of attribute payload
*
* Example:
diff --git a/lib/nlattr.c b/lib/nlattr.c
index e335bcafa9e4..56e0aae5cf23 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -69,7 +69,8 @@ static int validate_nla_bitfield32(const struct nlattr *nla,
}
static int validate_nla(const struct nlattr *nla, int maxtype,
- const struct nla_policy *policy)
+ const struct nla_policy *policy,
+ struct netlink_ext_ack *extack)
{
const struct nla_policy *pt;
int minlen = 0, attrlen = nla_len(nla), type = nla_type(nla);
@@ -87,6 +88,11 @@ static int validate_nla(const struct nlattr *nla, int maxtype,
}
switch (pt->type) {
+ case NLA_REJECT:
+ if (pt->validation_data && extack)
+ extack->_msg = pt->validation_data;
+ return -EINVAL;
+
case NLA_FLAG:
if (attrlen > 0)
return -ERANGE;
@@ -180,11 +186,10 @@ int nla_validate(const struct nlattr *head, int len, int maxtype,
int rem;
nla_for_each_attr(nla, head, len, rem) {
- int err = validate_nla(nla, maxtype, policy);
+ int err = validate_nla(nla, maxtype, policy, extack);
if (err < 0) {
- if (extack)
- extack->bad_attr = nla;
+ NL_SET_BAD_ATTR(extack, nla);
return err;
}
}
@@ -251,10 +256,13 @@ int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head,
if (type > 0 && type <= maxtype) {
if (policy) {
- err = validate_nla(nla, maxtype, policy);
+ err = validate_nla(nla, maxtype, policy,
+ extack);
if (err < 0) {
- NL_SET_ERR_MSG_ATTR(extack, nla,
- "Attribute failed policy validation");
+ NL_SET_BAD_ATTR(extack, nla);
+ if (extack && !extack->_msg)
+ NL_SET_ERR_MSG(extack,
+ "Attribute failed policy validation");
goto errout;
}
}
--
2.14.4
^ permalink raw reply related
* Re: [PATCH] net: dsa: mv88e6xxx: Make sure to configure ports with external PHYs
From: David Miller @ 2018-09-13 3:36 UTC (permalink / raw)
To: marex; +Cc: netdev, andrew
In-Reply-To: <20180911221524.12938-1-marex@denx.de>
From: Marek Vasut <marex@denx.de>
Date: Wed, 12 Sep 2018 00:15:24 +0200
> The MV88E6xxx can have external PHYs attached to certain ports and those
> PHYs could even be on different MDIO bus than the one within the switch.
> This patch makes sure that ports with such PHYs are configured correctly
> according to the information provided by the PHY.
>
> Signed-off-by: Marek Vasut <marex@denx.de>
Applied to net-next.
^ permalink raw reply
* Re: [PATCH] net: ethernet: Use DIV_ROUND_UP instead of reimplementing its function
From: David Miller @ 2018-09-13 3:35 UTC (permalink / raw)
To: zhongjiang
Cc: claudiu.manoil, tariqt, saeedm, derek.chickles, leon, jdmason,
netdev, linux-kernel
In-Reply-To: <1536671295-14432-1-git-send-email-zhongjiang@huawei.com>
From: zhong jiang <zhongjiang@huawei.com>
Date: Tue, 11 Sep 2018 21:08:15 +0800
> DIV_ROUND_UP has implemented the code-opened function. Therefore, just
> replace the implementation with DIV_ROUND_UP.
>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH net-next] qlcnic: Remove set but not used variables 'fw_mbx' and 'hdr_size'
From: David Miller @ 2018-09-13 3:33 UTC (permalink / raw)
To: yuehaibing
Cc: harish.patil, manish.chopra, Dept-GELinuxNICDev, netdev,
kernel-janitors
In-Reply-To: <1536666689-3906-1-git-send-email-yuehaibing@huawei.com>
From: YueHaibing <yuehaibing@huawei.com>
Date: Tue, 11 Sep 2018 11:51:29 +0000
> From: Yue Haibing <yuehaibing@huawei.com>
>
> Fixes gcc '-Wunused-but-set-variable' warning:
>
> drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c: In function 'qlcnic_sriov_pull_bc_msg':
> drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c:907:6: warning:
> variable 'fw_mbx' set but not used [-Wunused-but-set-variable]
>
> drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c: In function 'qlcnic_sriov_issue_bc_post':
> drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c:939:16: warning:
> variable 'hdr_size' set but not used [-Wunused-but-set-variable]
>
> Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH v3 net-next 00/12] Preparing for phylib limkmodes
From: David Miller @ 2018-09-13 3:24 UTC (permalink / raw)
To: andrew; +Cc: netdev, f.fainelli
In-Reply-To: <1536709999-13420-1-git-send-email-andrew@lunn.ch>
From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 12 Sep 2018 01:53:07 +0200
> phylib currently makes us of a u32 bitmap for advertising, supported,
> and link partner capabilities. For a long time, this has been
> sufficient, for devices up to 1Gbps. With more MAC/PHY combinations
> now supporting speeds greater than 1Gbps, we have run out of
> bits. There is the need to replace this u32 with an
> __ETHTOOL_DECLARE_LINK_MODE_MASK, which makes use of linux's generic
> bitmaps.
>
> This patchset does some of the work preparing for this change. A few
> cleanups are applied to PHY drivers. Some MAC drivers directly access
> members of phydev which are going to change type. These patches adds
> some helpers and swaps MAC drivers to use them, mostly dealing with
> Pause configuration.
>
> v3:
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> Add missing at in commit message
> Change Subject of patch 5
> Fix return in from phy_set_asym_pause
> Fix kerneldoc in phy_set_pause
>
> v2:
> Fixup bad indentation in tg3.c
> Rename phy_support_pause() to phy_support_sym_pause()
> Also trigger autoneg if the advertising settings have changed.
> Rename phy_set_pause() to phy_set_sym_pause()
> Use the bcm63xx_enet.c logic, not fec_main.c for validating pause
Series applied, thanks Andrew.
^ permalink raw reply
* Re: [PATCH net-next 5/5] ebpf: Add sample ebpf program for SOCKET_SG_FILTER
From: Alexei Starovoitov @ 2018-09-13 2:07 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: Tushar Dave, ast, daniel, davem, santosh.shilimkar,
jakub.kicinski, quentin.monnet, jiong.wang, sandipan,
john.fastabend, kafai, rdna, yhs, netdev, rds-devel
In-Reply-To: <20180913005954.GA30305@oracle.com>
On Wed, Sep 12, 2018 at 08:59:54PM -0400, Sowmini Varadhan wrote:
> > On 09/11/2018 09:00 PM, Alexei Starovoitov wrote:
> > >please no samples.
> > >Add this as proper test to tools/testing/selftests/bpf
> > >that reports PASS/FAIL and can be run automatically.
> > >samples/bpf is effectively dead code.
>
> Just a second.
>
> You do realize that RDS is doing real networking, so it needs
> RDMA capable hardware to test the rds_rdma paths? Also, when we
> "talk to ourselves" we default to the rds_loop transport, so
> we would even bypass the rds-tcp module.
>
> I dont think this can be tested with some academic "test it
> over lo0" exercise.. I suppose you can add example code in
> sefltests for this, but asking for a "proper test" may be
> a litte unrealistic here- a proper test needs proper hardware
> in this case.
I didn't know that. The way I understand your statement that
this new program type, new sg logic, and all the complexity
are only applicable to RDMA capable hw and RDS.
In such case, I think, such BPF extensions do not belong
in the upstream kernel. I don't want BPF to support niche technologies,
since the maintenance overhead makes it prohibitive long term.
If the kernel had a way to deprecate the api it would have been
different story. Every new feature and bpf extension lands once
and then being maintained forever. Hence we have to be very
selective weighting the benefits vs long term maintenance.
I'm happy to review the code further and suggest improvements,
but it's not going to be applied.
^ permalink raw reply
* Re: [PATCH v3] PCI: Reprogram bridge prefetch registers on resume
From: Rafael J. Wysocki @ 2018-09-13 6:52 UTC (permalink / raw)
To: Daniel Drake
Cc: andy.shevchenko-VuQAYsv1563Yd54FQh9/CA, Linux PM, Linux PCI,
Rafael Wysocki, nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ, Keith Busch,
netdev, nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Bjorn Helgaas,
rchang-eYqpPyKDWXRBDgjK7y7TUQ, Linux Upstreaming Team,
David Miller, jonathan.derrick-ral2JQCrhuEAvxtiuMwx3w,
Heiner Kallweit
In-Reply-To: <20180913033745.11178-1-drake-6IF/jdPJHihWk0Htik3J/w@public.gmane.org>
On Thu, Sep 13, 2018 at 5:37 AM Daniel Drake <drake@endlessm.com> wrote:
>
> On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
> after S3 suspend/resume. The affected products include multiple
> generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> many errors such as:
>
> fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04
> [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
> DRM: failed to idle channel 0 [DRM]
>
> Similarly, the nvidia proprietary driver also fails after resume
> (black screen, 100% CPU usage in Xorg process). We shipped a sample
> to Nvidia for diagnosis, and their response indicated that it's a
> problem with the parent PCI bridge (on the Intel SoC), not the GPU.
>
> Runtime suspend/resume works fine, only S3 suspend is affected.
>
> We found a workaround: on resume, rewrite the Intel PCI bridge
> 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
> the cases that I checked, this register has value 0 and we just have to
> rewrite that value.
>
> Linux already saves and restores PCI config space during suspend/resume,
> but this register was being skipped because upon resume, it already
> has value 0 (the correct, pre-suspend value).
>
> Intel appear to have previously acknowledged this behaviour and the
> requirement to rewrite this register.
> https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
>
> Based on that, rewrite the prefetch register values even when that
> appears unnecessary.
>
> We have confirmed this solution on all the affected models we have
> in-hands (X542UQ, UX533FD, X530UN, V272UN).
>
> Additionally, this solves an issue where r8169 MSI-X interrupts were
> broken after S3 suspend/resume on Asus X441UAR. This issue was recently
> worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on
> RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an
> Aimfor-tech laptop that we had not yet patched. I suspect it will also
> fix the issue that was worked around in commit 7c53a722459c ("r8169:
> don't use MSI-X on RTL8168g").
>
> Thomas Martitz reports that this change also solves an issue where
> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> after S3 suspend/resume.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
> Signed-off-by: Daniel Drake <drake@endlessm.com>
Still
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/pci/pci.c | 25 +++++++++++++++++--------
> 1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 29ff9619b5fa..5d58220b6997 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1289,12 +1289,12 @@ int pci_save_state(struct pci_dev *dev)
> EXPORT_SYMBOL(pci_save_state);
>
> static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> - u32 saved_val, int retry)
> + u32 saved_val, int retry, bool force)
> {
> u32 val;
>
> pci_read_config_dword(pdev, offset, &val);
> - if (val == saved_val)
> + if (!force && val == saved_val)
> return;
>
> for (;;) {
> @@ -1313,25 +1313,34 @@ static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> }
>
> static void pci_restore_config_space_range(struct pci_dev *pdev,
> - int start, int end, int retry)
> + int start, int end, int retry,
> + bool force)
> {
> int index;
>
> for (index = end; index >= start; index--)
> pci_restore_config_dword(pdev, 4 * index,
> pdev->saved_config_space[index],
> - retry);
> + retry, force);
> }
>
> static void pci_restore_config_space(struct pci_dev *pdev)
> {
> if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
> - pci_restore_config_space_range(pdev, 10, 15, 0);
> + pci_restore_config_space_range(pdev, 10, 15, 0, false);
> /* Restore BARs before the command register. */
> - pci_restore_config_space_range(pdev, 4, 9, 10);
> - pci_restore_config_space_range(pdev, 0, 3, 0);
> + pci_restore_config_space_range(pdev, 4, 9, 10, false);
> + pci_restore_config_space_range(pdev, 0, 3, 0, false);
> + } else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> + pci_restore_config_space_range(pdev, 12, 15, 0, false);
> + /* Force rewriting of prefetch registers to avoid
> + * S3 resume issues on Intel PCI bridges that occur when
> + * these registers are not explicitly written.
> + */
> + pci_restore_config_space_range(pdev, 9, 11, 0, true);
> + pci_restore_config_space_range(pdev, 0, 8, 0, false);
> } else {
> - pci_restore_config_space_range(pdev, 0, 15, 0);
> + pci_restore_config_space_range(pdev, 0, 15, 0, false);
> }
> }
>
> --
> 2.17.1
>
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
^ permalink raw reply
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Milan Broz @ 2018-09-13 6:39 UTC (permalink / raw)
To: Andy Lutomirski, Ard Biesheuvel
Cc: Jason A. Donenfeld, LKML, Netdev, David Miller,
Greg Kroah-Hartman, Samuel Neves, Jean-Philippe Aumasson,
Linux Crypto Mailing List
In-Reply-To: <CALCETrU2qkLVaxC=cpU5iAeT5B7xGsR+m2ZWtLVK37jMMWtcAA@mail.gmail.com>
On 13/09/18 01:45, Andy Lutomirski wrote:
> On Wed, Sep 12, 2018 at 3:56 PM, Ard Biesheuvel
...
> b) Crypto that is used dynamically. This includes dm-crypt
> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
> lot of IPSEC stuff, possibly KCM, and probably many more. These will
> get comparatively little benefit from being converted to a zinc-like
> interface. For some of these cases, it wouldn't make any sense at all
> to convert them. Certainly the ones that do async hardware crypto
> using DMA engines will never look at all like zinc, even under the
> hood.
Please note, that dm-crypt now uses not only block ciphers and modes,
but also authenticated encryption and hashes (for ESSIV and HMAC
in authenticated composed modes) and RNG (for random IV).
We use crypto API, including async variants (I hope correctly :)
There is a long time battle to move initialization vectors generators
from dm-crypt to crypto API. If there are any plans to use a new library,
this issue should be discussed as well.
(Some dm-crypt IV generators are disk encryption specific, some do more
that just IV so porting is not straightforward etc).
Related problem here is an optimization of chain of sectors encryption -
if we have new crypto API, it would be nice if can take chain of sectors
so possible implementation can process this chain in one batch
(every sector need to be tweaked by differently generated IV - and we
are back in problem above).
I think filesystem encryption uses the same pattern.
And btw, we use the same algorithms through AF_ALG in userspace (cryptsetup).
So please, if you mention dm-crypt, note that it is very complex
crypto API consumer :) And everything is dynamic, configurable through
dm-crypt options.
That said, I would be more than happy to help in experiments to porting dm-crypt
to any other crypto library, but if it doesn't not help with problems
mentioned above, I do not see any compelling reason for the new library for dm-crypt...
Milan
^ permalink raw reply
* Re: [PATCH net-next 5/5] ebpf: Add sample ebpf program for SOCKET_SG_FILTER
From: Sowmini Varadhan @ 2018-09-13 0:59 UTC (permalink / raw)
To: Tushar Dave
Cc: Alexei Starovoitov, ast, daniel, davem, santosh.shilimkar,
jakub.kicinski, quentin.monnet, jiong.wang, sandipan,
john.fastabend, kafai, rdna, yhs, netdev, rds-devel
In-Reply-To: <74f959c3-27ef-a67c-6a54-599d84cde90b@oracle.com>
> On 09/11/2018 09:00 PM, Alexei Starovoitov wrote:
> >please no samples.
> >Add this as proper test to tools/testing/selftests/bpf
> >that reports PASS/FAIL and can be run automatically.
> >samples/bpf is effectively dead code.
Just a second.
You do realize that RDS is doing real networking, so it needs
RDMA capable hardware to test the rds_rdma paths? Also, when we
"talk to ourselves" we default to the rds_loop transport, so
we would even bypass the rds-tcp module.
I dont think this can be tested with some academic "test it
over lo0" exercise.. I suppose you can add example code in
sefltests for this, but asking for a "proper test" may be
a litte unrealistic here- a proper test needs proper hardware
in this case.
--Sowmini
^ permalink raw reply
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Ard Biesheuvel @ 2018-09-13 5:41 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Jason A. Donenfeld, LKML, Netdev, David Miller,
Greg Kroah-Hartman, Samuel Neves, Jean-Philippe Aumasson,
Linux Crypto Mailing List
In-Reply-To: <CALCETrU2qkLVaxC=cpU5iAeT5B7xGsR+m2ZWtLVK37jMMWtcAA@mail.gmail.com>
On 13 September 2018 at 01:45, Andy Lutomirski <luto@kernel.org> wrote:
> On Wed, Sep 12, 2018 at 3:56 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> I realize you've put a lot of good and hard work into the existing
>> I am also concerned about your claim that all software algorithms will
>> be moved into this crypto library. You are not specific about whose
>> responsibility it will be that this is going to happen in a timely
>> fashion. But more importantly, it is not clear at all how you expect
>> this to work for, e.g., h/w instruction based SHAxxx or AES in various
>> chaining modes, which should be used only on cores that implement
>> those instructions (note that on arm64, we have optional instructions
>> for AES, PMULL, SHA1, SHA256, SHA512, SHA3, SM3 and SM4). Are all
>> those implementations (only few of which will be used on a certain
>> core) going to be part of the monolithic library? What are the APIs
>> going to look like for block ciphers, taking chaining modes into
>> account?
>
> I'm not convinced that there's any real need for *all* crypto
> algorithms to move into lib/zinc or to move at all. As I see it,
> there are two classes of crypto algorithms in the kernel:
>
> a) Crypto that is used by code that chooses its algorithm statically
> and wants synchronous operations. These include everything in
> drivers/char/random.c, but also a bunch of various networking things
> that are hardcoded and basically everything that uses stack buffers.
> (This means it includes all the code that I broke when I did
> VMAP_STACK. Sign.)
>
> b) Crypto that is used dynamically. This includes dm-crypt
> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
> lot of IPSEC stuff, possibly KCM, and probably many more. These will
> get comparatively little benefit from being converted to a zinc-like
> interface. For some of these cases, it wouldn't make any sense at all
> to convert them. Certainly the ones that do async hardware crypto
> using DMA engines will never look at all like zinc, even under the
> hood.
>
> I think that, as a short-term goal, it makes a lot of sense to have
> implementations of the crypto that *new* kernel code (like Wireguard)
> wants to use in style (a) that live in /lib, and it obviously makes
> sense to consolidate their implementations with the crypto/
> implementations in a timely manner. As a medium-term goal, adding
> more algorithms as needed for things that could use the simpler APIs
> (Bluetooth, perhaps) would make sense.
>
> But I see no reason at all that /lib should ever contain a grab-bag of
> crypto implementations just for the heck of it. They should have real
> in-kernel users IMO. And this means that there will probably always
> be some crypto implementations in crypto/ for things like aes-xts.
>
But one of the supposed selling points of this crypto library is that
it gives engineers who are frightened of crypto in general and the
crypto API in particular simple and easy to use crypto primitives
rather than having to jump through the crypto API's hoops.
A crypto library whose only encryption algorithm is a stream cipher
does *not* deliver on that promise, since it is only suitable for
cases where IVs are guaranteed not to be reused. You yourself were
bitten by the clunkiness of the crypto API when attempting to use the
SHA26 code, right? So shouldn't we move that into this crypto library
as well?
I think it is reasonable for WireGuard to standardize on
ChaCha20/Poly1305 only, although I have my concerns about the flag day
that will be required if this 'one true cipher' ever does turn out to
be compromised (either that, or we will have to go back in time and
add some kind of protocol versioning to existing deployments of
WireGuard)
And frankly, if the code were as good as the prose, we wouldn't be
having this discussion. Zinc adds its own clunky ways to mix arch and
generic code, involving GCC -include command line arguments and
#ifdefs everywhere. My review comments on this were completely ignored
by Jason.
^ permalink raw reply
* Re: [PATCH iproute2] iproute2: fix use-after-free
From: Stephen Hemminger @ 2018-09-13 0:33 UTC (permalink / raw)
To: Mahesh Bandewar; +Cc: netdev, Mahesh Bandewar
In-Reply-To: <20180912232928.166085-1-mahesh@bandewar.net>
On Wed, 12 Sep 2018 16:29:28 -0700
Mahesh Bandewar <mahesh@bandewar.net> wrote:
> From: Mahesh Bandewar <maheshb@google.com>
>
> A local program using iproute2 lib pointed out the issue and looking
> at the code it is pretty obvious -
>
> a = (struct nlmsghdr *)b;
> ...
> free(b);
> if (a->nlmsg_seq == seq)
> ...
>
> Fixes: 86bf43c7c2fd ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Yes, this is a real problem.
Maybe a minimal patch like this would be enough:
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 928de1dd16d8..ab2d8452e4a1 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -661,6 +661,8 @@ next:
if (l < sizeof(struct nlmsgerr)) {
fprintf(stderr, "ERROR truncated\n");
} else if (!err->error) {
+ __u32 err_seq = h->nlmsg_seq;
+
/* check messages from kernel */
nl_dump_ext_ack(h, errfn);
@@ -668,7 +670,8 @@ next:
*answer = (struct nlmsghdr *)buf;
else
free(buf);
- if (h->nlmsg_seq == seq)
+
+ if (err_seq == seq)
return 0;
else if (i < iovlen)
goto next;
^ permalink raw reply related
* [PATCH net-next V2] virtio_net: ethtool tx napi configuration
From: Jason Wang @ 2018-09-13 5:35 UTC (permalink / raw)
To: mst, jasowang
Cc: netdev, Willem de Bruijn, davem, linux-kernel, virtualization
Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
Interrupt moderation is currently not supported, so these accept and
display the default settings of 0 usec and 1 frame.
Toggle tx napi through a bit in tx-frames. So as to not interfere
with possible future interrupt moderation, value 1 means tx napi while
value 0 means not.
To properly synchronize with the data path, tx napi is disabled and
tx lock is held when changing the value of napi weight. And two more
places that can access tx napi weight:
- speculative tx polling in rx napi, we can leave it as is since it
not a must for correctness.
- skb_xmit_done(), one more check of napi weight is added before
trying to enable tx to avoid tx to be disabled forever if napi is
disabled after skb_xmit_done() but before the napi
Link: https://patchwork.ozlabs.org/patch/948149/
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Changes from V1:
- try to synchronize with datapath to allow changing mode when
interface is up.
- use tx-frames 0 as to disable tx napi while tx-frames 1 to enable tx napi
---
drivers/net/virtio_net.c | 64 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 63 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 765920905226..6e70864f5899 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -66,6 +66,8 @@ DECLARE_EWMA(pkt_len, 0, 64)
#define VIRTNET_DRIVER_VERSION "1.0.0"
+static const u32 ethtool_coalesce_napi_mask = (1UL << 10);
+
static const unsigned long guest_offloads[] = {
VIRTIO_NET_F_GUEST_TSO4,
VIRTIO_NET_F_GUEST_TSO6,
@@ -1444,7 +1446,10 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
virtqueue_napi_complete(napi, sq->vq, 0);
- if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+ /* Check napi.weight to avoid tx stall since it could be set
+ * to zero by ethtool after skb_xmit_done().
+ */
+ if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS || !sq->napi.weight)
netif_tx_wake_queue(txq);
return 0;
@@ -2181,6 +2186,61 @@ static int virtnet_get_link_ksettings(struct net_device *dev,
return 0;
}
+static int virtnet_set_coalesce(struct net_device *dev,
+ struct ethtool_coalesce *ec)
+{
+ struct ethtool_coalesce ec_default = {
+ .cmd = ETHTOOL_SCOALESCE,
+ .rx_max_coalesced_frames = 1,
+ };
+ struct virtnet_info *vi = netdev_priv(dev);
+ int i, napi_weight;
+
+ if (ec->tx_max_coalesced_frames > 1)
+ return -EINVAL;
+
+ ec_default.tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
+ napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
+
+ /* disallow changes to fields not explicitly tested above */
+ if (memcmp(ec, &ec_default, sizeof(ec_default)))
+ return -EINVAL;
+
+ if (napi_weight ^ vi->sq[0].napi.weight) {
+ for (i = 0; i < vi->max_queue_pairs; i++) {
+ struct netdev_queue *txq =
+ netdev_get_tx_queue(vi->dev, i);
+
+ virtnet_napi_tx_disable(&vi->sq[i].napi);
+ __netif_tx_lock_bh(txq);
+ vi->sq[i].napi.weight = napi_weight;
+ __netif_tx_unlock_bh(txq);
+ virtnet_napi_tx_enable(vi, vi->sq[i].vq,
+ &vi->sq[i].napi);
+ }
+ }
+
+ return 0;
+}
+
+static int virtnet_get_coalesce(struct net_device *dev,
+ struct ethtool_coalesce *ec)
+{
+ struct ethtool_coalesce ec_default = {
+ .cmd = ETHTOOL_GCOALESCE,
+ .rx_max_coalesced_frames = 1,
+ .tx_max_coalesced_frames = 0,
+ };
+ struct virtnet_info *vi = netdev_priv(dev);
+
+ memcpy(ec, &ec_default, sizeof(ec_default));
+
+ if (vi->sq[0].napi.weight)
+ ec->tx_max_coalesced_frames = 1;
+
+ return 0;
+}
+
static void virtnet_init_settings(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
@@ -2219,6 +2279,8 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
.get_ts_info = ethtool_op_get_ts_info,
.get_link_ksettings = virtnet_get_link_ksettings,
.set_link_ksettings = virtnet_set_link_ksettings,
+ .set_coalesce = virtnet_set_coalesce,
+ .get_coalesce = virtnet_get_coalesce,
};
static void virtnet_freeze_down(struct virtio_device *vdev)
--
2.17.1
^ permalink raw reply related
* Re: [PATCH bpf-next 11/11] Documentation: Describe bpf reference tracking
From: Alexei Starovoitov @ 2018-09-13 0:12 UTC (permalink / raw)
To: Joe Stringer
Cc: daniel, netdev, ast, john.fastabend, tgraf, kafai, nitin.hande,
mauricio.vasquez
In-Reply-To: <20180912003640.28316-12-joe@wand.net.nz>
On Tue, Sep 11, 2018 at 05:36:40PM -0700, Joe Stringer wrote:
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
just few words in commit log would be better than nothing.
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply
* Re: [PATCH bpf-next 10/11] selftests/bpf: Add C tests for reference tracking
From: Alexei Starovoitov @ 2018-09-13 0:11 UTC (permalink / raw)
To: Joe Stringer
Cc: daniel, netdev, ast, john.fastabend, tgraf, kafai, nitin.hande,
mauricio.vasquez
In-Reply-To: <20180912003640.28316-11-joe@wand.net.nz>
On Tue, Sep 11, 2018 at 05:36:39PM -0700, Joe Stringer wrote:
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
really nice set of tests.
please describe them briefly in commit log.
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply
* Re: [PATCH bpf-next 09/11] libbpf: Support loading individual progs
From: Alexei Starovoitov @ 2018-09-13 0:10 UTC (permalink / raw)
To: Joe Stringer
Cc: daniel, netdev, ast, john.fastabend, tgraf, kafai, nitin.hande,
mauricio.vasquez
In-Reply-To: <20180912003640.28316-10-joe@wand.net.nz>
On Tue, Sep 11, 2018 at 05:36:38PM -0700, Joe Stringer wrote:
> Allow the individual program load to be invoked. This will help with
> testing, where a single ELF may contain several sections, some of which
> denote subprograms that are expected to fail verification, along with
> some which are expected to pass verification. By allowing programs to be
> iterated and individually loaded, each program can be independently
> checked against its expected verification result.
>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply
* Re: [PATCH bpf-next 08/11] selftests/bpf: Add tests for reference tracking
From: Alexei Starovoitov @ 2018-09-13 0:09 UTC (permalink / raw)
To: Joe Stringer
Cc: daniel, netdev, ast, john.fastabend, tgraf, kafai, nitin.hande,
mauricio.vasquez
In-Reply-To: <20180912003640.28316-9-joe@wand.net.nz>
On Tue, Sep 11, 2018 at 05:36:37PM -0700, Joe Stringer wrote:
> reference tracking: leak potential reference
> reference tracking: leak potential reference on stack
> reference tracking: leak potential reference on stack 2
> reference tracking: zero potential reference
> reference tracking: copy and zero potential references
> reference tracking: release reference without check
> reference tracking: release reference
> reference tracking: release reference twice
> reference tracking: release reference twice inside branch
> reference tracking: alloc, check, free in one subbranch
> reference tracking: alloc, check, free in both subbranches
> reference tracking in call: free reference in subprog
> reference tracking in call: free reference in subprog and outside
> reference tracking in call: alloc & leak reference in subprog
> reference tracking in call: alloc in subprog, release outside
> reference tracking in call: sk_ptr leak into caller stack
> reference tracking in call: sk_ptr spill into caller stack
>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
...
> + "reference tracking in call: alloc in subprog, release outside",
> + .insns = {
> + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
> + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 5),
> + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
> + BPF_MOV64_IMM(BPF_REG_2, 0),
> + BPF_EMIT_CALL(BPF_FUNC_sk_release),
> + BPF_EXIT_INSN(),
> +
> + /* subprog 1 */
> + BPF_SK_LOOKUP,
> + BPF_EXIT_INSN(), /* return sk */
> + },
Thanks a lot for adding subprog tests and checking that return to the caller
and spill works too.
Awesome stuff!
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply
* Re: [PATCH bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
From: Alexei Starovoitov @ 2018-09-13 0:06 UTC (permalink / raw)
To: Joe Stringer
Cc: daniel, netdev, ast, john.fastabend, tgraf, kafai, nitin.hande,
mauricio.vasquez
In-Reply-To: <20180912003640.28316-8-joe@wand.net.nz>
On Tue, Sep 11, 2018 at 05:36:36PM -0700, Joe Stringer wrote:
> This patch adds new BPF helper functions, bpf_sk_lookup_tcp() and
> bpf_sk_lookup_udp() which allows BPF programs to find out if there is a
> socket listening on this host, and returns a socket pointer which the
> BPF program can then access to determine, for instance, whether to
> forward or drop traffic. bpf_sk_lookup_xxx() may take a reference on the
> socket, so when a BPF program makes use of this function, it must
> subsequently pass the returned pointer into the newly added sk_release()
> to return the reference.
>
> By way of example, the following pseudocode would filter inbound
> connections at XDP if there is no corresponding service listening for
> the traffic:
>
> struct bpf_sock_tuple tuple;
> struct bpf_sock_ops *sk;
>
> populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
> sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof tuple, netns, 0);
...
> +struct bpf_sock_tuple {
> + union {
> + __be32 ipv6[4];
> + __be32 ipv4;
> + } saddr;
> + union {
> + __be32 ipv6[4];
> + __be32 ipv4;
> + } daddr;
> + __be16 sport;
> + __be16 dport;
> + __u8 family;
> +};
since we can pass ptr_to_packet into map lookup and other helpers now,
can you move 'family' out of bpf_sock_tuple and combine with netns_id arg?
then progs wouldn't need to copy bytes from the packet into tuple
to do a lookup.
The rest looks great to me.
^ permalink raw reply
* Re: [PATCH net-next v3 01/17] asm: simd context helper API
From: Kevin Easton @ 2018-09-13 5:03 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: LKML, Netdev, David Miller, Greg Kroah-Hartman, Andrew Lutomirski,
Thomas Gleixner, Samuel Neves, linux-arch
In-Reply-To: <CAHmME9pmgu_xWJ0rvdS4RhzwjKPiNdkQ9nc6mqTNeb6A8xT8Eg@mail.gmail.com>
On Wed, Sep 12, 2018 at 08:10:41PM +0200, Jason A. Donenfeld wrote:
> On Wed, Sep 12, 2018 at 8:14 AM Kevin Easton <kevin@guarana.org> wrote:
> > Given that it's always supposed to be used like that, mightn't it be
> > better if simd_relax() took a pointer to the context, so the call is
> > just
> >
> > simd_relax(&simd_context);
> >
> > ?
> >
> > The inlining means that there won't actually be a pointer dereference in
> > the emitted code.
> >
> > If simd_put() also took a pointer then it could set the context back to
> > HAVE_NO_SIMD as well?
>
> That's sort of a neat idea. I guess in this scheme, you'd envision:
>
> simd_context_t simd_context;
>
> simd_get(&simd_context);
> simd_relax(&simd_context);
> simd_put(&simd_context);
>
> And this way, if simd_context ever becomes a heavier struct, it can be
> modified in place rather than returned by value from the function. On
> the other hand, it's a little bit more annoying to type and makes it
> harder to do declaration and initialization on the same line.
Yes. It's also how most get/put APIs already work in the kernel, eg
kref_get/put (mostly because they tend to be 'getting/putting' an
already-initialized object, though).
- Kevin
^ permalink raw reply
* [PATCH iproute2] iproute2: fix use-after-free
From: Mahesh Bandewar @ 2018-09-12 23:29 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Mahesh Bandewar
From: Mahesh Bandewar <maheshb@google.com>
A local program using iproute2 lib pointed out the issue and looking
at the code it is pretty obvious -
a = (struct nlmsghdr *)b;
...
free(b);
if (a->nlmsg_seq == seq)
...
Fixes: 86bf43c7c2fd ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
lib/libnetlink.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 928de1dd16d8..016a5f0bcfb6 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -661,17 +661,24 @@ next:
if (l < sizeof(struct nlmsgerr)) {
fprintf(stderr, "ERROR truncated\n");
} else if (!err->error) {
+ unsigned int tmp_seq;
+
/* check messages from kernel */
nl_dump_ext_ack(h, errfn);
- if (answer)
+ tmp_seq = h->nlmsg_seq;
+ if (answer) {
*answer = (struct nlmsghdr *)buf;
- else
+ } else {
free(buf);
- if (h->nlmsg_seq == seq)
+ buf = NULL;
+ }
+ if (tmp_seq == seq) {
return 0;
- else if (i < iovlen)
+ } else if (i < iovlen) {
+ free(buf);
goto next;
+ }
return 0;
}
--
2.19.0.rc2.392.g5ba43deb5a-goog
^ permalink raw reply related
* [PATCH net-next RFC] virtio_net: ethtool tx napi configuration
From: Willem de Bruijn @ 2018-09-12 23:29 UTC (permalink / raw)
To: netdev; +Cc: jasowang, mst, f.fainelli, Willem de Bruijn
From: Willem de Bruijn <willemb@google.com>
Implement ethtool .set_priv_flags and .get_priv_flags handlers
and use ethtool private flags to toggle transmit napi:
ethtool --set-priv-flags eth0 tx-napi on
ethtool --show-priv-flags eth0
Link: https://patchwork.ozlabs.org/patch/948149/
Suggested-by: Jason Wang <jasowang@redhat.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
drivers/net/virtio_net.c | 49 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 765920905226..9ca7e0a0f0d9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -73,6 +73,10 @@ static const unsigned long guest_offloads[] = {
VIRTIO_NET_F_GUEST_UFO
};
+static const char virtnet_ethtool_priv_flags[][ETH_GSTRING_LEN] = {
+ "tx-napi",
+};
+
struct virtnet_stat_desc {
char desc[ETH_GSTRING_LEN];
size_t offset;
@@ -2059,6 +2063,9 @@ static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *data)
}
}
break;
+ case ETH_SS_PRIV_FLAGS:
+ memcpy(data, virtnet_ethtool_priv_flags,
+ sizeof(virtnet_ethtool_priv_flags));
}
}
@@ -2070,6 +2077,9 @@ static int virtnet_get_sset_count(struct net_device *dev, int sset)
case ETH_SS_STATS:
return vi->curr_queue_pairs * (VIRTNET_RQ_STATS_LEN +
VIRTNET_SQ_STATS_LEN);
+ case ETH_SS_PRIV_FLAGS:
+ return ARRAY_SIZE(virtnet_ethtool_priv_flags);
+
default:
return -EOPNOTSUPP;
}
@@ -2181,6 +2191,43 @@ static int virtnet_get_link_ksettings(struct net_device *dev,
return 0;
}
+static int virtnet_set_priv_flags(struct net_device *dev, u32 priv_flags)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ int i, napi_weight;
+
+ napi_weight = priv_flags & 0x1 ? NAPI_POLL_WEIGHT : 0;
+
+ if (napi_weight ^ vi->sq[0].napi.weight) {
+ for (i = 0; i < vi->max_queue_pairs; i++) {
+ struct netdev_queue *txq =
+ netdev_get_tx_queue(vi->dev, i);
+
+ virtnet_napi_tx_disable(&vi->sq[i].napi);
+ __netif_tx_lock_bh(txq);
+ vi->sq[i].napi.weight = napi_weight;
+ if (!napi_weight)
+ virtqueue_enable_cb(vi->sq[i].vq);
+ __netif_tx_unlock_bh(txq);
+ virtnet_napi_tx_enable(vi, vi->sq[i].vq,
+ &vi->sq[i].napi);
+ }
+ }
+
+ return 0;
+}
+
+static u32 virtnet_get_priv_flags(struct net_device *dev)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ int priv_flags = 0;
+
+ if (vi->sq[0].napi.weight)
+ priv_flags |= 0x1;
+
+ return priv_flags;
+}
+
static void virtnet_init_settings(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
@@ -2219,6 +2266,8 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
.get_ts_info = ethtool_op_get_ts_info,
.get_link_ksettings = virtnet_get_link_ksettings,
.set_link_ksettings = virtnet_set_link_ksettings,
+ .set_priv_flags = virtnet_set_priv_flags,
+ .get_priv_flags = virtnet_get_priv_flags,
};
static void virtnet_freeze_down(struct virtio_device *vdev)
--
2.19.0.rc2.392.g5ba43deb5a-goog
^ permalink raw reply related
* Re: [PATCH net-next] virtio_net: ethtool tx napi configuration
From: Willem de Bruijn @ 2018-09-12 23:27 UTC (permalink / raw)
To: f.fainelli
Cc: Network Development, David Miller, caleb.raitto, Jason Wang,
Michael S. Tsirkin, Jon Olson (Google Drive), Willem de Bruijn
In-Reply-To: <CAF=yD-+Aqe=9GbKGo_n748D97W2rJHdsYL+cay1gyR4eA2Hc=w@mail.gmail.com>
On Wed, Sep 12, 2018 at 3:11 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Wed, Sep 12, 2018 at 2:16 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> >
> >
> >
> > On 9/12/2018 11:07 AM, Willem de Bruijn wrote:
> > > On Wed, Sep 12, 2018 at 1:42 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > >>
> > >>
> > >>
> > >> On 9/9/2018 3:44 PM, Willem de Bruijn wrote:
> > >>> From: Willem de Bruijn <willemb@google.com>
> > >>>
> > >>> Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
> > >>> Interrupt moderation is currently not supported, so these accept and
> > >>> display the default settings of 0 usec and 1 frame.
> > >>>
> > >>> Toggle tx napi through a bit in tx-frames. So as to not interfere
> > >>> with possible future interrupt moderation, use bit 10, well outside
> > >>> the reasonable range of real interrupt moderation values.
> > >>>
> > >>> Changes are not atomic. The tx IRQ, napi BH and transmit path must
> > >>> be quiesced when switching modes. Only allow changing this setting
> > >>> when the device is down.
> > >>
> > >> Humm, would not a private ethtool flag to switch TX NAPI on/off be more
> > >> appropriate rather than use the coalescing configuration API here?
> > >
> > > What do you mean by private ethtool flag? A new field in ethtool
> > > --features (-k)?
> >
> > I meant using ethtool_drvinfo::n_priv_flags, ETH_SS_PRIV_FLAGS and then
> > ETHTOOL_GFPFLAGS and ETHTOOL_SPFLAGS to control the toggling of that
> > private flag. mlx5 has a number of privates flags for instance.
>
> Interesting, thanks! I was not at all aware of those ethtool flags.
> Am having a look. It definitely looks promising.
Okay, I made that change. That is indeed much cleaner, thanks.
Let me send the patch, initially as RFC.
I've observed one issue where if we toggle the flag before bringing
up the device, it hits a kernel BUG at include/linux/netdevice.h:515
BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
^ permalink raw reply
* Re: [PATCH bpf-next 06/11] bpf: Add reference tracking to verifier
From: Alexei Starovoitov @ 2018-09-12 23:17 UTC (permalink / raw)
To: Joe Stringer
Cc: daniel, netdev, ast, john.fastabend, tgraf, kafai, nitin.hande,
mauricio.vasquez
In-Reply-To: <20180912003640.28316-7-joe@wand.net.nz>
On Tue, Sep 11, 2018 at 05:36:35PM -0700, Joe Stringer wrote:
> Allow helper functions to acquire a reference and return it into a
> register. Specific pointer types such as the PTR_TO_SOCKET will
> implicitly represent such a reference. The verifier must ensure that
> these references are released exactly once in each path through the
> program.
>
> To achieve this, this commit assigns an id to the pointer and tracks it
> in the 'bpf_func_state', then when the function or program exits,
> verifies that all of the acquired references have been freed. When the
> pointer is passed to a function that frees the reference, it is removed
> from the 'bpf_func_state` and all existing copies of the pointer in
> registers are marked invalid.
>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ---
> include/linux/bpf_verifier.h | 24 ++-
> kernel/bpf/verifier.c | 303 ++++++++++++++++++++++++++++++++---
> 2 files changed, 306 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 23a2b17bfd75..23f222e0cb0b 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -104,6 +104,17 @@ struct bpf_stack_state {
> u8 slot_type[BPF_REG_SIZE];
> };
>
> +struct bpf_reference_state {
> + /* Track each reference created with a unique id, even if the same
> + * instruction creates the reference multiple times (eg, via CALL).
> + */
> + int id;
> + /* Instruction where the allocation of this reference occurred. This
> + * is used purely to inform the user of a reference leak.
> + */
> + int insn_idx;
> +};
> +
> /* state of the program:
> * type of all registers and stack info
> */
> @@ -121,7 +132,9 @@ struct bpf_func_state {
> */
> u32 subprogno;
>
> - /* should be second to last. See copy_func_state() */
> + /* The following fields should be last. See copy_func_state() */
> + int acquired_refs;
> + struct bpf_reference_state *refs;
> int allocated_stack;
> struct bpf_stack_state *stack;
> };
> @@ -217,11 +230,16 @@ __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
> __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
> const char *fmt, ...);
>
> -static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
> +static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
> {
> struct bpf_verifier_state *cur = env->cur_state;
>
> - return cur->frame[cur->curframe]->regs;
> + return cur->frame[cur->curframe];
> +}
> +
> +static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
> +{
> + return cur_func(env)->regs;
> }
>
> int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index faa83b3d7011..67c62ef67d37 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1,5 +1,6 @@
> /* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
> * Copyright (c) 2016 Facebook
> + * Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
> *
> * This program is free software; you can redistribute it and/or
> * modify it under the terms of version 2 of the GNU General Public
> @@ -140,6 +141,18 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
> *
> * After the call R0 is set to return type of the function and registers R1-R5
> * are set to NOT_INIT to indicate that they are no longer readable.
> + *
> + * The following reference types represent a potential reference to a kernel
> + * resource which, after first being allocated, must be checked and freed by
> + * the BPF program:
> + * - PTR_TO_SOCKET_OR_NULL, PTR_TO_SOCKET
> + *
> + * When the verifier sees a helper call return a reference type, it allocates a
> + * pointer id for the reference and stores it in the current function state.
> + * Similar to the way that PTR_TO_MAP_VALUE_OR_NULL is converted into
> + * PTR_TO_MAP_VALUE, PTR_TO_SOCKET_OR_NULL becomes PTR_TO_SOCKET when the type
> + * passes through a NULL-check conditional. For the branch wherein the state is
> + * changed to CONST_IMM, the verifier releases the reference.
> */
>
> /* verifier_state + insn_idx are pushed to stack when branch is encountered */
> @@ -189,6 +202,7 @@ struct bpf_call_arg_meta {
> int access_size;
> s64 msize_smax_value;
> u64 msize_umax_value;
> + int ptr_id;
> };
>
> static DEFINE_MUTEX(bpf_verifier_lock);
> @@ -251,7 +265,42 @@ static bool type_is_pkt_pointer(enum bpf_reg_type type)
>
> static bool reg_type_may_be_null(enum bpf_reg_type type)
> {
> - return type == PTR_TO_MAP_VALUE_OR_NULL;
> + return type == PTR_TO_MAP_VALUE_OR_NULL ||
> + type == PTR_TO_SOCKET_OR_NULL;
> +}
> +
> +static bool type_is_refcounted(enum bpf_reg_type type)
> +{
> + return type == PTR_TO_SOCKET;
> +}
> +
> +static bool type_is_refcounted_or_null(enum bpf_reg_type type)
> +{
> + return type == PTR_TO_SOCKET || type == PTR_TO_SOCKET_OR_NULL;
> +}
> +
> +static bool reg_is_refcounted(const struct bpf_reg_state *reg)
> +{
> + return type_is_refcounted(reg->type);
> +}
> +
> +static bool reg_is_refcounted_or_null(const struct bpf_reg_state *reg)
> +{
> + return type_is_refcounted_or_null(reg->type);
> +}
> +
> +static bool arg_type_is_refcounted(enum bpf_arg_type type)
> +{
> + return type == ARG_PTR_TO_SOCKET;
> +}
> +
> +/* Determine whether the function releases some resources allocated by another
> + * function call. The first reference type argument will be assumed to be
> + * released by release_reference().
> + */
> +static bool is_release_function(enum bpf_func_id func_id)
> +{
> + return false;
> }
>
> /* string representation of 'enum bpf_reg_type' */
> @@ -384,6 +433,12 @@ static void print_verifier_state(struct bpf_verifier_env *env,
> else
> verbose(env, "=%s", types_buf);
> }
> + if (state->acquired_refs && state->refs[0].id) {
> + verbose(env, " refs=%d", state->refs[0].id);
> + for (i = 1; i < state->acquired_refs; i++)
> + if (state->refs[i].id)
> + verbose(env, ",%d", state->refs[i].id);
> + }
> verbose(env, "\n");
> }
>
> @@ -402,6 +457,8 @@ static int copy_##NAME##_state(struct bpf_func_state *dst, \
> sizeof(*src->FIELD) * (src->COUNT / SIZE)); \
> return 0; \
> }
> +/* copy_reference_state() */
> +COPY_STATE_FN(reference, acquired_refs, refs, 1)
> /* copy_stack_state() */
> COPY_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
> #undef COPY_STATE_FN
> @@ -440,6 +497,8 @@ static int realloc_##NAME##_state(struct bpf_func_state *state, int size, \
> state->FIELD = new_##FIELD; \
> return 0; \
> }
> +/* realloc_reference_state() */
> +REALLOC_STATE_FN(reference, acquired_refs, refs, 1)
> /* realloc_stack_state() */
> REALLOC_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
> #undef REALLOC_STATE_FN
> @@ -451,16 +510,89 @@ REALLOC_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE)
> * which realloc_stack_state() copies over. It points to previous
> * bpf_verifier_state which is never reallocated.
> */
> -static int realloc_func_state(struct bpf_func_state *state, int size,
> - bool copy_old)
> +static int realloc_func_state(struct bpf_func_state *state, int stack_size,
> + int refs_size, bool copy_old)
> {
> - return realloc_stack_state(state, size, copy_old);
> + int err = realloc_reference_state(state, refs_size, copy_old);
> + if (err)
> + return err;
> + return realloc_stack_state(state, stack_size, copy_old);
> +}
> +
> +/* Acquire a pointer id from the env and update the state->refs to include
> + * this new pointer reference.
> + * On success, returns a valid pointer id to associate with the register
> + * On failure, returns a negative errno.
> + */
> +static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
> +{
> + struct bpf_func_state *state = cur_func(env);
> + int new_ofs = state->acquired_refs;
> + int id, err;
> +
> + err = realloc_reference_state(state, state->acquired_refs + 1, true);
> + if (err)
> + return err;
> + id = ++env->id_gen;
> + state->refs[new_ofs].id = id;
> + state->refs[new_ofs].insn_idx = insn_idx;
> +
> + return id;
> +}
> +
> +/* release function corresponding to acquire_reference_state(). Idempotent. */
> +static int __release_reference_state(struct bpf_func_state *state, int ptr_id)
> +{
> + int i, last_idx;
> +
> + if (!ptr_id)
> + return 0;
Is this defensive programming or this condition can actually happen?
As far as I can see all callers suppose to pass valid ptr_id into it.
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Ard Biesheuvel @ 2018-09-12 22:56 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: LKML, Netdev, David Miller, Greg Kroah-Hartman, Andrew Lutomirski,
Samuel Neves, Jean-Philippe Aumasson, Linux Crypto Mailing List
In-Reply-To: <CAHmME9rTEFLaYmzFrc719LW2DWS2Ua_vL6by1UXZ9uDvMpbhYw@mail.gmail.com>
On 11 September 2018 at 23:22, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hello Ard,
>
> I realize you've put a lot of good and hard work into the existing
> crypto API, and that my writing in these commit messages might be a
> bit too bristly and dismissive of that hard work. So I think in a
> sense it's understandable that you've responded as such here. But
> hopefully I can address your concerns. One thing to keep in mind is
> that Zinc endeavors to provide the basis for simple and direct
> functions to software algorithms. This is fairly modest goal. Just
> some functions that do some stuff in software. Around these you'll
> still be able to have complicated and impressive dynamic dispatch and
> asynchronous mechanisms such as the present crypto API. Zinc is merely
> getting the software implementation side done in a very simple and
> direct way. So I don't think there's a good reason for so much
> antagonism, despite a perhaps overbearing tone of my commit messages.
> Rather, I expect that we'll wind up working together on this quite a
> bit down the line.
>
No worries, it takes more than this to piss me off. I do take pride in
my work, but I am perfectly aware that I am not a rare talent like
Andy Polyakov, which is actually why I have worked extensively with
him in the past to get kernel specific changes to the ARM/arm64 NEON
code into upstream OpenSSL, so that we could use the upstream code
unmodified and benefit from upstream review. [1] [2]
In this series, you are dumping a huge volume of unannotated,
generated asm into the kernel which has been modified [by you] to
[among other things?] adhere to the kernel API (without documenting
what the changes are exactly). How does that live up to the promise of
better, peer reviewed code?
Then there is the performance claim. We know for instance that the
OpenSSL ARM NEON code for ChaCha20 is faster on cores that happen to
possess a micro-architectural property that ALU instructions are
essentially free when they are interleaved with SIMD instructions. But
we also know that a) Cortex-A7, which is a relevant target, is not one
of those cores, and b) that chip designers are not likely to optimize
for that particular usage pattern so relying on it in generic code is
unwise in general.
I am also concerned about your claim that all software algorithms will
be moved into this crypto library. You are not specific about whose
responsibility it will be that this is going to happen in a timely
fashion. But more importantly, it is not clear at all how you expect
this to work for, e.g., h/w instruction based SHAxxx or AES in various
chaining modes, which should be used only on cores that implement
those instructions (note that on arm64, we have optional instructions
for AES, PMULL, SHA1, SHA256, SHA512, SHA3, SM3 and SM4). Are all
those implementations (only few of which will be used on a certain
core) going to be part of the monolithic library? What are the APIs
going to look like for block ciphers, taking chaining modes into
account?
I'm sure it is rather simple to port the crypto API implementation of
ChaCha20 to use your library. I am more concerned about how your
library is going to expand to cover all other software algorithms that
we currently use in the kernel.
>> In spite of the wall of text, you fail to point out exactly why the
>> existing AEAD API in unsuitable, and why fixing it is not an option.
>
> I thought I had addressed this. Firstly, there's a need for more than
> just AEAD, but ignoring that, the AEAD API is a big full API that does
> lots of things, makes allocations, parses descriptors, and so forth.
> I'm sure this kind of highly-engineered approach will continue to
> improve over time in that highly engineered direction. Zinc is doing
> something a bit different: it's providing a series of simple functions
> for various cryptographic routines. This is a considerably different
> goal -- perhaps even a complementary one -- to the AEAD API.
>
I understand how your crypto library is different from the crypto API.
But the question was why WireGuard cannot make use of the crypto APIs
AEAD interface. And note that this is an honest question: I know that
the crypto API is cumbersome to use in general, but it would be very
helpful to understand where exactly the impedance mismatch is between
what your code needs and what the crypto API provides.
>> I don't think you have
>> convinced anyone else yet either.
>
> Please only speak for yourself and refrain from rhetoric like this,
> which is patently false.
>
Fair enough. You have clearly convinced Greg :-)
>> Please refrain from sending a v4 with just a couple of more tweaks on
>> top
>
> Sorry, no, I'm not going to stop working hard on this because you're
> wary of a new approach. I will continue to improve the submission
> until it is mergable, and I do not intend to stop.
>
Of course. But please respond to all the concerns, not just the low
hanging fruit. I have already mentioned a couple of times that I
object to dumping large volumes of generated assembly into the kernel
tree without any annotation whatsoever, or any documentation about how
the generated code was hand tweaked before submission. You have not
responded to that concern yet.
> Anyway, it sounds like this whole thing may have ruffled your feathers
> a bit. Will you be at Linux Plumbers Conference in November? I'm
> planning on attending, and perhaps we could find some time there to
> sit down and talk one on one a bit.
>
That would be good, yes. I will be there.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4e7f10bfc4069925e99cc4b428c3434e30b6c3f
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7918ecef073fe80eeb399a37d8d48561864eedf1
^ permalink raw reply
* Re: [bpf-next, v2 1/3] flow_dissector: implements flow dissector BPF hook
From: Willem de Bruijn @ 2018-09-12 22:53 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Petar Penkov, Network Development, David Miller,
Alexei Starovoitov, Daniel Borkmann, simon.horman, ecree,
songliubraving, Tom Herbert, Petar Penkov, Willem de Bruijn
In-Reply-To: <20180912222500.57wnc7sjofgatuxu@ast-mbp>
On Wed, Sep 12, 2018 at 6:25 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Sep 12, 2018 at 02:43:37PM -0400, Willem de Bruijn wrote:
> > On Tue, Sep 11, 2018 at 11:47 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Sep 07, 2018 at 05:11:08PM -0700, Petar Penkov wrote:
> > > > From: Petar Penkov <ppenkov@google.com>
> > > >
> > > > Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and
> > > > attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector
> > > > path. The BPF program is per-network namespace.
> > > >
> > > > Signed-off-by: Petar Penkov <ppenkov@google.com>
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > ---
> > > > include/linux/bpf.h | 1 +
> > > > include/linux/bpf_types.h | 1 +
> > > > include/linux/skbuff.h | 7 ++
> > > > include/net/net_namespace.h | 3 +
> > > > include/net/sch_generic.h | 12 ++-
> > > > include/uapi/linux/bpf.h | 25 ++++++
> > > > kernel/bpf/syscall.c | 8 ++
> > > > kernel/bpf/verifier.c | 32 ++++++++
> > > > net/core/filter.c | 67 ++++++++++++++++
> > > > net/core/flow_dissector.c | 136 +++++++++++++++++++++++++++++++++
> > > > tools/bpf/bpftool/prog.c | 1 +
> > > > tools/include/uapi/linux/bpf.h | 25 ++++++
> > > > tools/lib/bpf/libbpf.c | 2 +
> > >
> > > please split up update to tools/include/uapi/linux/bpf.h as a separate patch 2.
> > > We often have conflicts in there, so best to have a separate.
> > > Also please split tools/lib and tools/bpf chnages into patch 3.
> >
> > Will do in v3.
> >
> > > > struct qdisc_skb_cb {
> > > > - unsigned int pkt_len;
> > > > - u16 slave_dev_queue_mapping;
> > > > - u16 tc_classid;
> > > > + union {
> > > > + struct {
> > > > + unsigned int pkt_len;
> > > > + u16 slave_dev_queue_mapping;
> > > > + u16 tc_classid;
> > > > + };
> > > > + struct bpf_flow_keys *flow_keys;
> > > > + };
> > >
> > > is this magic really necessary? flow_dissector runs very early in recv path.
> > > There is no qdisc or conflicts with tcp/ip use of cb.
> > > I think the whole cb block can be used.
> >
> > The flow dissector also runs in the context of TC, from flower.
> > But that is not a reason to use this struct.
> >
> > We need both (a) data shared with the BPF application and between
> > applications using tail-calls, to pass along the parse offset (nhoff),
> > and (b) data not accessible by the program, to store the flow_keys
> > pointer.
> >
> > qdisc_skb_cb already has this split, exposing only the 20B .data
> > field to the application. Flow dissection currently reuses the existing
> > bpf_convert_ctx_access logic for this field.
> >
> > We could create a separate flowdissect_skb_cb struct with the
> > same split setup, but a second constraint is that relevant internal
> > BPF interfaces already expect qdisc_skb_cb, notably
> > bkf_skb_data_end. So the union was easier.
>
> got it. all makes sense.
>
> >
> > There is another way to pass nhoff besides cb[] (see below). But
> > I don't immediately see another place to store the flow_keys ptr.
> >
> > At least, if we pass skb as context. One larger change would
> > be to introduce another ctx struct, similar to sk_reuseport_(kern|md).
>
> yeah. thought about this too, but your approach looks easier and faster.
> Accesses to skb have one less dereference.
>
> > > > @@ -2333,6 +2335,7 @@ struct __sk_buff {
> > > > /* ... here. */
> > > >
> > > > __u32 data_meta;
> > > > + __u32 flow_keys;
> > >
> > > please use
> > > struct bpf_flow_keys *flow_keys;
> > > instead.
> > >
> > > See what we did in 'struct sk_msg_md' and in 'struct sk_reuseport_md'.
> > > There is no need to hide pointers in u32.
> > >
> >
> > Will do in v3.
> >
> > > > @@ -658,6 +754,46 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> > > > FLOW_DISSECTOR_KEY_BASIC,
> > > > target_container);
> > > >
> > > > + rcu_read_lock();
> > > > + attached = skb ? rcu_dereference(dev_net(skb->dev)->flow_dissector_prog)
> > > > + : NULL;
> > > > + if (attached) {
> > > > + /* Note that even though the const qualifier is discarded
> > > > + * throughout the execution of the BPF program, all changes(the
> > > > + * control block) are reverted after the BPF program returns.
> > > > + * Therefore, __skb_flow_dissect does not alter the skb.
> > > > + */
> > > > + struct bpf_flow_keys flow_keys = {};
> > > > + struct qdisc_skb_cb cb_saved;
> > > > + struct qdisc_skb_cb *cb;
> > > > + u16 *pseudo_cb;
> > > > + u32 result;
> > > > +
> > > > + cb = qdisc_skb_cb(skb);
> > > > + pseudo_cb = (u16 *)bpf_skb_cb((struct sk_buff *)skb);
> > > > +
> > > > + /* Save Control Block */
> > > > + memcpy(&cb_saved, cb, sizeof(cb_saved));
> > > > + memset(cb, 0, sizeof(cb_saved));
> > > > +
> > > > + /* Pass parameters to the BPF program */
> > > > + cb->flow_keys = &flow_keys;
> > > > + *pseudo_cb = nhoff;
> > >
> > > I don't understand this bit.
> > > What is this pseudo_cb and why nhoff goes in there?
> > > Some odd way to pass it into the prog?
> >
> > Yes. nhoff passes the offset to the program to start parsing from.
> > Applications also pass this during tail calls.
> >
> > Alternatively we can just add a new field to struct bpf_flow_keys.
>
> I think that certainly will be cleaner and easier to use from
> bpf prog pov. Since flow_keys stay constant any change to nhoff
> between tail_calls will be preserved too. I see no cons to such approach.
Yes, it's definitely simpler. We'll do that.
Thanks!
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox