Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] netlabel: remove redundant assignment to pointer iter
From: Christophe JAILLET @ 2019-09-01 17:20 UTC (permalink / raw)
  To: Paul Moore, Colin King
  Cc: David S . Miller, netdev, linux-security-module, kernel-janitors,
	linux-kernel
In-Reply-To: <CAHC9VhSVKEJ-EBAry5fVN3Ux22occGQ5jxbFBecMsR+q7+UT5A@mail.gmail.com>

Le 01/09/2019 à 18:04, Paul Moore a écrit :
> On Sun, Sep 1, 2019 at 11:52 AM Colin King <colin.king@canonical.com> wrote:
>> From: Colin Ian King <colin.king@canonical.com>
>>
>> Pointer iter is being initialized with a value that is never read and
>> is being re-assigned a little later on. The assignment is redundant
>> and hence can be removed.
>>
>> Addresses-Coverity: ("Unused value")
>> Signed-off-by: Colin Ian King <colin.king@canonical.com>
>> ---
>>   net/netlabel/netlabel_kapi.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
> This patch doesn't seem correct to me, at least not in current form.
> At the top of _netlbl_catmap_getnode() is a check to see if iter is
> NULL (as well as a few other checks on iter after that); this patch
> would break that code.
>
> Perhaps we can get rid of the iter/catmap assignment when we define
> iter, but I don't think this patch is the right way to do it.
>
>> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
>> index 2b0ef55cf89e..409a3ae47ce2 100644
>> --- a/net/netlabel/netlabel_kapi.c
>> +++ b/net/netlabel/netlabel_kapi.c
>> @@ -607,7 +607,7 @@ static struct netlbl_lsm_catmap *_netlbl_catmap_getnode(
>>    */
>>   int netlbl_catmap_walk(struct netlbl_lsm_catmap *catmap, u32 offset)
>>   {
>> -       struct netlbl_lsm_catmap *iter = catmap;
>> +       struct netlbl_lsm_catmap *iter;
>>          u32 idx;
>>          u32 bit;
>>          NETLBL_CATMAP_MAPTYPE bitmap;
>> --
>> 2.20.1
>

Hi,

'iter' is reassigned a value between the declaration and the NULL test, so removing the first initialization looks good to me.
int  netlbl_catmap_walk(struct  netlbl_lsm_catmap  *catmap,  u32  offset)
{
|

	struct  netlbl_lsm_catmap  *iter  =  catmap;
	u32  idx;
	u32  bit;
	NETLBL_CATMAP_MAPTYPE  bitmap;

	iter  =  _netlbl_catmap_getnode(&catmap,  offset,  _CM_F_WALK,  0);			<-- Here
	if  (iter  ==  NULL)
		return  -ENOENT; This is dead code since commit d960a6184a92 ("netlabel: fix the catmap walking functions") where the call to _netlbl_catmap_getnode has been introduced.

Just my 2c,

CJ|


^ permalink raw reply

* Re: [PATCH] netlabel: remove redundant assignment to pointer iter
From: Paul Moore @ 2019-09-01 16:04 UTC (permalink / raw)
  To: Colin King
  Cc: David S . Miller, netdev, linux-security-module, kernel-janitors,
	linux-kernel
In-Reply-To: <20190901155205.16877-1-colin.king@canonical.com>

On Sun, Sep 1, 2019 at 11:52 AM Colin King <colin.king@canonical.com> wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> Pointer iter is being initialized with a value that is never read and
> is being re-assigned a little later on. The assignment is redundant
> and hence can be removed.
>
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  net/netlabel/netlabel_kapi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This patch doesn't seem correct to me, at least not in current form.
At the top of _netlbl_catmap_getnode() is a check to see if iter is
NULL (as well as a few other checks on iter after that); this patch
would break that code.

Perhaps we can get rid of the iter/catmap assignment when we define
iter, but I don't think this patch is the right way to do it.

> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 2b0ef55cf89e..409a3ae47ce2 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -607,7 +607,7 @@ static struct netlbl_lsm_catmap *_netlbl_catmap_getnode(
>   */
>  int netlbl_catmap_walk(struct netlbl_lsm_catmap *catmap, u32 offset)
>  {
> -       struct netlbl_lsm_catmap *iter = catmap;
> +       struct netlbl_lsm_catmap *iter;
>         u32 idx;
>         u32 bit;
>         NETLBL_CATMAP_MAPTYPE bitmap;
> --
> 2.20.1

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* [PATCH] netlabel: remove redundant assignment to pointer iter
From: Colin King @ 2019-09-01 15:52 UTC (permalink / raw)
  To: Paul Moore, David S . Miller, netdev, linux-security-module
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Pointer iter is being initialized with a value that is never read and
is being re-assigned a little later on. The assignment is redundant
and hence can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 net/netlabel/netlabel_kapi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
index 2b0ef55cf89e..409a3ae47ce2 100644
--- a/net/netlabel/netlabel_kapi.c
+++ b/net/netlabel/netlabel_kapi.c
@@ -607,7 +607,7 @@ static struct netlbl_lsm_catmap *_netlbl_catmap_getnode(
  */
 int netlbl_catmap_walk(struct netlbl_lsm_catmap *catmap, u32 offset)
 {
-	struct netlbl_lsm_catmap *iter = catmap;
+	struct netlbl_lsm_catmap *iter;
 	u32 idx;
 	u32 bit;
 	NETLBL_CATMAP_MAPTYPE bitmap;
-- 
2.20.1


^ permalink raw reply related

* [PATCH net] isdn/capi: check message length in capi_write()
From: Eric Biggers @ 2019-09-01 14:32 UTC (permalink / raw)
  To: netdev, David S . Miller; +Cc: Karsten Keil, syzkaller-bugs
In-Reply-To: <0000000000000e35f00581a579cd@google.com>

From: Eric Biggers <ebiggers@google.com>

syzbot reported:

    BUG: KMSAN: uninit-value in capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700
    CPU: 0 PID: 10025 Comm: syz-executor379 Not tainted 4.20.0-rc7+ #2
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
      __dump_stack lib/dump_stack.c:77 [inline]
      dump_stack+0x173/0x1d0 lib/dump_stack.c:113
      kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
      __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
      capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700
      do_loop_readv_writev fs/read_write.c:703 [inline]
      do_iter_write+0x83e/0xd80 fs/read_write.c:961
      vfs_writev fs/read_write.c:1004 [inline]
      do_writev+0x397/0x840 fs/read_write.c:1039
      __do_sys_writev fs/read_write.c:1112 [inline]
      __se_sys_writev+0x9b/0xb0 fs/read_write.c:1109
      __x64_sys_writev+0x4a/0x70 fs/read_write.c:1109
      do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
      entry_SYSCALL_64_after_hwframe+0x63/0xe7
    [...]

The problem is that capi_write() is reading past the end of the message.
Fix it by checking the message's length in the needed places.

Reported-and-tested-by: syzbot+0849c524d9c634f5ae66@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 drivers/isdn/capi/capi.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/isdn/capi/capi.c b/drivers/isdn/capi/capi.c
index 3c3ad42f22bf..a016d8c3c410 100644
--- a/drivers/isdn/capi/capi.c
+++ b/drivers/isdn/capi/capi.c
@@ -688,6 +688,9 @@ capi_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos
 	if (!cdev->ap.applid)
 		return -ENODEV;
 
+	if (count < CAPIMSG_BASELEN)
+		return -EINVAL;
+
 	skb = alloc_skb(count, GFP_USER);
 	if (!skb)
 		return -ENOMEM;
@@ -698,7 +701,8 @@ capi_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos
 	}
 	mlen = CAPIMSG_LEN(skb->data);
 	if (CAPIMSG_CMD(skb->data) == CAPI_DATA_B3_REQ) {
-		if ((size_t)(mlen + CAPIMSG_DATALEN(skb->data)) != count) {
+		if (count < 18 ||
+		    (size_t)(mlen + CAPIMSG_DATALEN(skb->data)) != count) {
 			kfree_skb(skb);
 			return -EINVAL;
 		}
@@ -711,6 +715,10 @@ capi_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos
 	CAPIMSG_SETAPPID(skb->data, cdev->ap.applid);
 
 	if (CAPIMSG_CMD(skb->data) == CAPI_DISCONNECT_B3_RESP) {
+		if (count < 12) {
+			kfree_skb(skb);
+			return -EINVAL;
+		}
 		mutex_lock(&cdev->lock);
 		capincci_free(cdev, CAPIMSG_NCCI(skb->data));
 		mutex_unlock(&cdev->lock);
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH] Fix a double free bug in rsi_91x_deinit
From: Guenter Roeck @ 2019-09-01 14:08 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Hui Peng, security, Mathias Payer, David S. Miller,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <87k1asqw87.fsf@kamboji.qca.qualcomm.com>

On 9/1/19 1:03 AM, Kalle Valo wrote:
> Guenter Roeck <linux@roeck-us.net> writes:
> 
>> On Mon, Aug 19, 2019 at 06:02:29PM -0400, Hui Peng wrote:
>>> `dev` (struct rsi_91x_usbdev *) field of adapter
>>> (struct rsi_91x_usbdev *) is allocated  and initialized in
>>> `rsi_init_usb_interface`. If any error is detected in information
>>> read from the device side,  `rsi_init_usb_interface` will be
>>> freed. However, in the higher level error handling code in
>>> `rsi_probe`, if error is detected, `rsi_91x_deinit` is called
>>> again, in which `dev` will be freed again, resulting double free.
>>>
>>> This patch fixes the double free by removing the free operation on
>>> `dev` in `rsi_init_usb_interface`, because `rsi_91x_deinit` is also
>>> used in `rsi_disconnect`, in that code path, the `dev` field is not
>>>   (and thus needs to be) freed.
>>>
>>> This bug was found in v4.19, but is also present in the latest version
>>> of kernel.
>>>
>>> Reported-by: Hui Peng <benquike@gmail.com>
>>> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
>>> Signed-off-by: Hui Peng <benquike@gmail.com>
>>
>> FWIW:
>>
>> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
>>
>> This patch is listed as fix for CVE-2019-15504, which has a CVSS 2.0 score
>> of 10.0 (high) and CVSS 3.0 score of 9.8 (critical).
> 
> A double free in error path is considered as a critical CVE issue? I'm
> very curious, why is that?
> 

You'd have to ask the people assigning CVSS scores. However, if the memory
was reallocated, that reallocated memory (which is still in use) is freed.
Then all kinds of bad things can happen.

Guenter

>> Are there any plans to apply this patch to the upstream kernel anytime
>> soon ?
> 
> I was on vacation last week and hence I have not been able to apply any
> wireless patches. I should be able to catch up next week.
> 


^ permalink raw reply

* [PATCH V2 3/4] firmware: xilinx: Add ZynqMP aes API for AES functionality
From: Kalyani Akula @ 2019-09-01 13:54 UTC (permalink / raw)
  To: herbert, kstewart, gregkh, tglx, pombredanne, linux-crypto,
	linux-kernel, netdev
  Cc: Kalyani Akula, Kalyani Akula
In-Reply-To: <1567346098-27927-1-git-send-email-kalyani.akula@xilinx.com>

Add ZynqMP firmware AES API to perform encryption/decryption
of given data.

Signed-off-by: Kalyani Akula <kalyani.akula@xilinx.com>
---
 drivers/firmware/xilinx/zynqmp.c     | 23 +++++++++++++++++++++++
 include/linux/firmware/xlnx-zynqmp.h |  2 ++
 2 files changed, 25 insertions(+)

diff --git a/drivers/firmware/xilinx/zynqmp.c b/drivers/firmware/xilinx/zynqmp.c
index fd3d837..74f3354 100644
--- a/drivers/firmware/xilinx/zynqmp.c
+++ b/drivers/firmware/xilinx/zynqmp.c
@@ -663,6 +663,28 @@ static int zynqmp_pm_set_requirement(const u32 node, const u32 capabilities,
 	return zynqmp_pm_invoke_fn(PM_SET_REQUIREMENT, node, capabilities,
 				   qos, ack, NULL);
 }
+/**
+ * zynqmp_pm_aes - Access AES hardware to encrypt/decrypt the data using
+ * AES-GCM core.
+ * @address:	Address of the AesParams structure.
+ * @out:	Returned output value
+ *
+ * Return:	Returns status, either success or error code.
+ */
+static int zynqmp_pm_aes_engine(const u64 address, u32 *out)
+{
+	u32 ret_payload[PAYLOAD_ARG_CNT];
+	int ret;
+
+	if (!out)
+		return -EINVAL;
+
+	ret = zynqmp_pm_invoke_fn(PM_SECURE_AES, upper_32_bits(address),
+				  lower_32_bits(address),
+				  0, 0, ret_payload);
+	*out = ret_payload[1];
+	return ret;
+}
 
 static const struct zynqmp_eemi_ops eemi_ops = {
 	.get_api_version = zynqmp_pm_get_api_version,
@@ -687,6 +709,7 @@ static int zynqmp_pm_set_requirement(const u32 node, const u32 capabilities,
 	.set_requirement = zynqmp_pm_set_requirement,
 	.fpga_load = zynqmp_pm_fpga_load,
 	.fpga_get_status = zynqmp_pm_fpga_get_status,
+	.aes = zynqmp_pm_aes_engine,
 };
 
 /**
diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h
index 778abbb..508edd7 100644
--- a/include/linux/firmware/xlnx-zynqmp.h
+++ b/include/linux/firmware/xlnx-zynqmp.h
@@ -77,6 +77,7 @@ enum pm_api_id {
 	PM_CLOCK_GETRATE,
 	PM_CLOCK_SETPARENT,
 	PM_CLOCK_GETPARENT,
+	PM_SECURE_AES = 47,
 };
 
 /* PMU-FW return status codes */
@@ -294,6 +295,7 @@ struct zynqmp_eemi_ops {
 			       const u32 capabilities,
 			       const u32 qos,
 			       const enum zynqmp_pm_request_ack ack);
+	int (*aes)(const u64 address, u32 *out);
 };
 
 int zynqmp_pm_invoke_fn(u32 pm_api_id, u32 arg0, u32 arg1,
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH V2 1/4] dt-bindings: crypto: Add bindings for ZynqMP AES driver
From: Kalyani Akula @ 2019-09-01 13:54 UTC (permalink / raw)
  To: herbert, kstewart, gregkh, tglx, pombredanne, linux-crypto,
	linux-kernel, netdev
  Cc: Kalyani Akula, Kalyani Akula
In-Reply-To: <1567346098-27927-1-git-send-email-kalyani.akula@xilinx.com>

Add documentation to describe Xilinx ZynqMP AES driver
bindings.

Signed-off-by: Kalyani Akula <kalyani.akula@xilinx.com>
---
 Documentation/devicetree/bindings/crypto/xlnx,zynqmp-aes.txt | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/xlnx,zynqmp-aes.txt

diff --git a/Documentation/devicetree/bindings/crypto/xlnx,zynqmp-aes.txt b/Documentation/devicetree/bindings/crypto/xlnx,zynqmp-aes.txt
new file mode 100644
index 0000000..226bfb9
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/xlnx,zynqmp-aes.txt
@@ -0,0 +1,12 @@
+Xilinx ZynqMP AES hw acceleration support
+
+The ZynqMP PS-AES hw accelerator is used to encrypt/decrypt
+the given user data.
+
+Required properties:
+- compatible: should contain "xlnx,zynqmp-aes"
+
+Example:
+	zynqmp_aes {
+		compatible = "xlnx,zynqmp-aes";
+	};
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH V2 2/4] ARM64: zynqmp: Add Xilinix AES node.
From: Kalyani Akula @ 2019-09-01 13:54 UTC (permalink / raw)
  To: herbert, kstewart, gregkh, tglx, pombredanne, linux-crypto,
	linux-kernel, netdev
  Cc: Kalyani Akula, Kalyani Akula
In-Reply-To: <1567346098-27927-1-git-send-email-kalyani.akula@xilinx.com>

This patch adds a AES DT node for Xilinx ZynqMP SoC.

Signed-off-by: Kalyani Akula <kalyani.akula@xilinx.com>
---
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index 9aa6734..b41febc 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -124,6 +124,10 @@
 			     <1 10 0xf08>;
 	};
 
+	xlnx_aes: zynqmp_aes {
+		compatible = "xlnx,zynqmp-aes";
+	};
+
 	amba_apu: amba-apu@0 {
 		compatible = "simple-bus";
 		#address-cells = <2>;
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH V2 4/4] crypto: Add Xilinx AES driver
From: Kalyani Akula @ 2019-09-01 13:54 UTC (permalink / raw)
  To: herbert, kstewart, gregkh, tglx, pombredanne, linux-crypto,
	linux-kernel, netdev
  Cc: Kalyani Akula, Kalyani Akula
In-Reply-To: <1567346098-27927-1-git-send-email-kalyani.akula@xilinx.com>

This patch adds AES driver support for the Xilinx
ZynqMP SoC.

Signed-off-by: Kalyani Akula <kalyani.akula@xilinx.com>
---
 drivers/crypto/Kconfig          |  11 ++
 drivers/crypto/Makefile         |   1 +
 drivers/crypto/zynqmp-aes-gcm.c | 297 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 309 insertions(+)
 create mode 100644 drivers/crypto/zynqmp-aes-gcm.c

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 603413f..a0d058a 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -677,6 +677,17 @@ config CRYPTO_DEV_ROCKCHIP
 	  This driver interfaces with the hardware crypto accelerator.
 	  Supporting cbc/ecb chainmode, and aes/des/des3_ede cipher mode.
 
+config CRYPTO_DEV_ZYNQMP_AES
+	tristate "Support for Xilinx ZynqMP AES hw accelerator"
+	depends on ARCH_ZYNQMP || COMPILE_TEST
+	select CRYPTO_AES
+	select CRYPTO_SKCIPHER
+	help
+	  Xilinx ZynqMP has AES-GCM engine used for symmetric key
+	  encryption and decryption. This driver interfaces with AES hw
+	  accelerator. Select this if you want to use the ZynqMP module
+	  for AES algorithms.
+
 config CRYPTO_DEV_MEDIATEK
 	tristate "MediaTek's EIP97 Cryptographic Engine driver"
 	depends on (ARM && ARCH_MEDIATEK) || COMPILE_TEST
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index afc4753..c99663a 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -48,3 +48,4 @@ obj-$(CONFIG_CRYPTO_DEV_BCM_SPU) += bcm/
 obj-$(CONFIG_CRYPTO_DEV_SAFEXCEL) += inside-secure/
 obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) += axis/
 obj-y += hisilicon/
+obj-$(CONFIG_CRYPTO_DEV_ZYNQMP_AES) += zynqmp-aes-gcm.o
diff --git a/drivers/crypto/zynqmp-aes-gcm.c b/drivers/crypto/zynqmp-aes-gcm.c
new file mode 100644
index 0000000..d65f038
--- /dev/null
+++ b/drivers/crypto/zynqmp-aes-gcm.c
@@ -0,0 +1,297 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Xilinx ZynqMP AES Driver.
+ * Copyright (c) 2019 Xilinx Inc.
+ */
+
+#include <crypto/aes.h>
+#include <crypto/scatterwalk.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/scatterlist.h>
+#include <linux/firmware/xlnx-zynqmp.h>
+
+#define ZYNQMP_AES_IV_SIZE			12
+#define ZYNQMP_AES_GCM_SIZE			16
+#define ZYNQMP_AES_KEY_SIZE			32
+
+#define ZYNQMP_AES_DECRYPT			0
+#define ZYNQMP_AES_ENCRYPT			1
+
+#define ZYNQMP_AES_KUP_KEY			0
+#define ZYNQMP_AES_DEVICE_KEY			1
+#define ZYNQMP_AES_PUF_KEY			2
+
+#define ZYNQMP_AES_GCM_TAG_MISMATCH_ERR		0x01
+#define ZYNQMP_AES_SIZE_ERR			0x06
+#define ZYNQMP_AES_WRONG_KEY_SRC_ERR		0x13
+#define ZYNQMP_AES_PUF_NOT_PROGRAMMED		0xE300
+
+#define ZYNQMP_AES_BLOCKSIZE			0x04
+
+static const struct zynqmp_eemi_ops *eemi_ops;
+struct zynqmp_aes_dev *aes_dd;
+
+struct zynqmp_aes_dev {
+	struct device *dev;
+};
+
+struct zynqmp_aes_op {
+	struct zynqmp_aes_dev *dd;
+	void *src;
+	void *dst;
+	int len;
+	u8 key[ZYNQMP_AES_KEY_SIZE];
+	u8 *iv;
+	u32 keylen;
+	u32 keytype;
+};
+
+struct zynqmp_aes_data {
+	u64 src;
+	u64 iv;
+	u64 key;
+	u64 dst;
+	u64 size;
+	u64 optype;
+	u64 keysrc;
+};
+
+static int zynqmp_setkey_blk(struct crypto_tfm *tfm, const u8 *key,
+			     unsigned int len)
+{
+	struct zynqmp_aes_op *op = crypto_tfm_ctx(tfm);
+
+	if (((len != 1) && (len !=  ZYNQMP_AES_KEY_SIZE)) || (!key))
+		return -EINVAL;
+
+	if (len == 1) {
+		op->keytype = *key;
+
+		if ((op->keytype < ZYNQMP_AES_KUP_KEY) ||
+			(op->keytype > ZYNQMP_AES_PUF_KEY))
+			return -EINVAL;
+
+	} else if (len == ZYNQMP_AES_KEY_SIZE) {
+		op->keytype = ZYNQMP_AES_KUP_KEY;
+		op->keylen = len;
+		memcpy(op->key, key, len);
+	}
+
+	return 0;
+}
+
+static int zynqmp_aes_xcrypt(struct blkcipher_desc *desc,
+			     struct scatterlist *dst,
+			     struct scatterlist *src,
+			     unsigned int nbytes,
+			     unsigned int flags)
+{
+	struct zynqmp_aes_op *op = crypto_blkcipher_ctx(desc->tfm);
+	struct zynqmp_aes_dev *dd = aes_dd;
+	int err, ret, copy_bytes, src_data = 0, dst_data = 0;
+	dma_addr_t dma_addr, dma_addr_buf;
+	struct zynqmp_aes_data *abuf;
+	struct blkcipher_walk walk;
+	unsigned int data_size;
+	size_t dma_size;
+	char *kbuf;
+
+	if (!eemi_ops->aes)
+		return -ENOTSUPP;
+
+	if (op->keytype == ZYNQMP_AES_KUP_KEY)
+		dma_size = nbytes + ZYNQMP_AES_KEY_SIZE
+			+ ZYNQMP_AES_IV_SIZE;
+	else
+		dma_size = nbytes + ZYNQMP_AES_IV_SIZE;
+
+	kbuf = dma_alloc_coherent(dd->dev, dma_size, &dma_addr, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	abuf = dma_alloc_coherent(dd->dev, sizeof(struct zynqmp_aes_data),
+				  &dma_addr_buf, GFP_KERNEL);
+	if (!abuf) {
+		dma_free_coherent(dd->dev, dma_size, kbuf, dma_addr);
+		return -ENOMEM;
+	}
+
+	data_size = nbytes;
+	blkcipher_walk_init(&walk, dst, src, data_size);
+	err = blkcipher_walk_virt(desc, &walk);
+	op->iv = walk.iv;
+
+	while ((nbytes = walk.nbytes)) {
+		op->src = walk.src.virt.addr;
+		memcpy(kbuf + src_data, op->src, nbytes);
+		src_data = src_data + nbytes;
+		nbytes &= (ZYNQMP_AES_BLOCKSIZE - 1);
+		err = blkcipher_walk_done(desc, &walk, nbytes);
+	}
+	memcpy(kbuf + data_size, op->iv, ZYNQMP_AES_IV_SIZE);
+	abuf->src = dma_addr;
+	abuf->dst = dma_addr;
+	abuf->iv = abuf->src + data_size;
+	abuf->size = data_size - ZYNQMP_AES_GCM_SIZE;
+	abuf->optype = flags;
+	abuf->keysrc = op->keytype;
+
+	if (op->keytype == ZYNQMP_AES_KUP_KEY) {
+		memcpy(kbuf + data_size + ZYNQMP_AES_IV_SIZE,
+		       op->key, ZYNQMP_AES_KEY_SIZE);
+
+		abuf->key = abuf->src + data_size + ZYNQMP_AES_IV_SIZE;
+	} else {
+		abuf->key = 0;
+	}
+	eemi_ops->aes(dma_addr_buf, &ret);
+
+	if (ret != 0) {
+		switch (ret) {
+		case ZYNQMP_AES_GCM_TAG_MISMATCH_ERR:
+			dev_err(dd->dev, "ERROR: Gcm Tag mismatch\n\r");
+			break;
+		case ZYNQMP_AES_SIZE_ERR:
+			dev_err(dd->dev, "ERROR : Non word aligned data\n\r");
+			break;
+		case ZYNQMP_AES_WRONG_KEY_SRC_ERR:
+			dev_err(dd->dev, "ERROR: Wrong KeySrc, enable secure mode\n\r");
+			break;
+		case ZYNQMP_AES_PUF_NOT_PROGRAMMED:
+			dev_err(dd->dev, "ERROR: PUF is not registered\r\n");
+			break;
+		default:
+			dev_err(dd->dev, "ERROR: Invalid");
+			break;
+		}
+		goto END;
+	}
+	if (flags)
+		copy_bytes = data_size;
+	else
+		copy_bytes = data_size - ZYNQMP_AES_GCM_SIZE;
+
+	blkcipher_walk_init(&walk, dst, src, copy_bytes);
+	err = blkcipher_walk_virt(desc, &walk);
+
+	while ((nbytes = walk.nbytes)) {
+		memcpy(walk.dst.virt.addr, kbuf + dst_data, nbytes);
+		dst_data = dst_data + nbytes;
+		nbytes &= (ZYNQMP_AES_BLOCKSIZE - 1);
+		err = blkcipher_walk_done(desc, &walk, nbytes);
+	}
+END:
+	memset(kbuf, 0, dma_size);
+	memset(abuf, 0, sizeof(struct zynqmp_aes_data));
+	dma_free_coherent(dd->dev, dma_size, kbuf, dma_addr);
+	dma_free_coherent(dd->dev, sizeof(struct zynqmp_aes_data),
+			  abuf, dma_addr_buf);
+	return err;
+}
+
+static int zynqmp_aes_decrypt(struct blkcipher_desc *desc,
+			      struct scatterlist *dst,
+			      struct scatterlist *src,
+			      unsigned int nbytes)
+{
+	return zynqmp_aes_xcrypt(desc, dst, src, nbytes, ZYNQMP_AES_DECRYPT);
+}
+
+static int zynqmp_aes_encrypt(struct blkcipher_desc *desc,
+			      struct scatterlist *dst,
+			      struct scatterlist *src,
+			      unsigned int nbytes)
+{
+	return zynqmp_aes_xcrypt(desc, dst, src, nbytes, ZYNQMP_AES_ENCRYPT);
+}
+
+static struct crypto_alg zynqmp_alg = {
+	.cra_name		=	"xilinx-zynqmp-aes",
+	.cra_driver_name	=	"zynqmp-aes-gcm",
+	.cra_priority		=	400,
+	.cra_flags		=	CRYPTO_ALG_TYPE_BLKCIPHER |
+					CRYPTO_ALG_KERN_DRIVER_ONLY,
+	.cra_blocksize		=	ZYNQMP_AES_BLOCKSIZE,
+	.cra_ctxsize		=	sizeof(struct zynqmp_aes_op),
+	.cra_alignmask		=	15,
+	.cra_type		=	&crypto_blkcipher_type,
+	.cra_module		=	THIS_MODULE,
+	.cra_u			=	{
+	.blkcipher	=	{
+			.min_keysize	=	0,
+			.max_keysize	=	ZYNQMP_AES_KEY_SIZE,
+			.setkey		=	zynqmp_setkey_blk,
+			.encrypt	=	zynqmp_aes_encrypt,
+			.decrypt	=	zynqmp_aes_decrypt,
+			.ivsize		=	ZYNQMP_AES_IV_SIZE,
+		}
+	}
+};
+
+static const struct of_device_id zynqmp_aes_dt_ids[] = {
+	{ .compatible = "xlnx,zynqmp-aes" },
+	{ /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, zynqmp_aes_dt_ids);
+
+static int zynqmp_aes_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	int ret;
+
+	eemi_ops = zynqmp_pm_get_eemi_ops();
+
+	if (IS_ERR(eemi_ops))
+		return PTR_ERR(eemi_ops);
+
+	aes_dd = devm_kzalloc(dev, sizeof(*aes_dd), GFP_KERNEL);
+	if (!aes_dd)
+		return -ENOMEM;
+
+	aes_dd->dev = dev;
+	platform_set_drvdata(pdev, aes_dd);
+
+	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(44));
+	if (ret < 0) {
+		dev_err(dev, "no usable DMA configuration");
+		return ret;
+	}
+
+	ret = crypto_register_alg(&zynqmp_alg);
+	if (ret)
+		goto err_algs;
+
+	dev_info(dev, "AES Successfully Registered\n\r");
+	return 0;
+
+err_algs:
+	return ret;
+}
+
+static int zynqmp_aes_remove(struct platform_device *pdev)
+{
+	aes_dd = platform_get_drvdata(pdev);
+	if (!aes_dd)
+		return -ENODEV;
+	crypto_unregister_alg(&zynqmp_alg);
+
+	return 0;
+}
+
+static struct platform_driver xilinx_aes_driver = {
+	.probe = zynqmp_aes_probe,
+	.remove = zynqmp_aes_remove,
+	.driver = {
+		.name = "zynqmp_aes",
+		.of_match_table = of_match_ptr(zynqmp_aes_dt_ids),
+	},
+};
+
+module_platform_driver(xilinx_aes_driver);
+
+MODULE_DESCRIPTION("Xilinx ZynqMP AES hw acceleration support.");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Nava kishore Manne <nava.manne@xilinx.com>");
+MODULE_AUTHOR("Kalyani Akula <kalyani.akula@xilinx.com>");
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH V2 0/4] Add Xilinx's ZynqMP AES driver support
From: Kalyani Akula @ 2019-09-01 13:54 UTC (permalink / raw)
  To: herbert, kstewart, gregkh, tglx, pombredanne, linux-crypto,
	linux-kernel, netdev
  Cc: Kalyani Akula, Kalyani Akula

This patch set adds support for
- dt-binding docs for Xilinx ZynqMP AES driver
- Adds device tree node for ZynqMP AES driver
- Adds communication layer support for aes in zynqmp.c
- Adds Xilinx ZynqMP driver for AES Algorithm

V2 Changes :
- Converted RFC PATCH to PATCH
- Removed ALG_SET_KEY_TYPE that was added to support keytype
  attribute. Taken using setkey interface.
- Removed deprecated BLKCIPHER in Kconfig
- Erased Key/IV from the buffer.
- Renamed zynqmp-aes driver to zynqmp-aes-gcm.
- Addressed few other review comments

Kalyani Akula (4):
  dt-bindings: crypto: Add bindings for ZynqMP AES driver
  ARM64: zynqmp: Add Xilinix AES node.
  firmware: xilinx: Add ZynqMP aes API for AES functionality
  crypto: Add Xilinx AES driver

 .../devicetree/bindings/crypto/xlnx,zynqmp-aes.txt |  12 +
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi             |   4 +
 drivers/crypto/Kconfig                             |  11 +
 drivers/crypto/Makefile                            |   1 +
 drivers/crypto/zynqmp-aes-gcm.c                    | 297 +++++++++++++++++++++
 drivers/firmware/xilinx/zynqmp.c                   |  23 ++
 include/linux/firmware/xlnx-zynqmp.h               |   2 +
 7 files changed, 350 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/xlnx,zynqmp-aes.txt
 create mode 100644 drivers/crypto/zynqmp-aes-gcm.c

-- 
1.8.3.1


^ permalink raw reply

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Michael S. Tsirkin @ 2019-09-01 10:17 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190901025815-mutt-send-email-mst@kernel.org>

On Sun, Sep 01, 2019 at 04:26:19AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> > On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > > > 
> > > > (...)
> > > > 
> > > > > > 
> > > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > > of 4K, there might be issues.
> > > > > 
> > > > > Shouldn't be if they are following the spec. If not let's fix
> > > > > the broken parts.
> > > > > 
> > > > > > 
> > > > > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > > > > 
> > > > > > Thanks,
> > > > > > Stefano
> > > > > 
> > > > > Why would a remote care about buffer sizes?
> > > > > 
> > > > > Let's first see what the issues are. If they exist
> > > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > > > 
> > > > 
> > > > The vhost_transport '.stream_enqueue' callback
> > > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > > passing the user message. This function allocates a new packet, copying
> > > > the user message, but (before this series) it limits the packet size to
> > > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > > > 
> > > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > 					  struct virtio_vsock_pkt_info *info)
> > > > {
> > > >  ...
> > > > 	/* we can send less than pkt_len bytes */
> > > > 	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > > 		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > > 
> > > > 	/* virtio_transport_get_credit might return less than pkt_len credit */
> > > > 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > 
> > > > 	/* Do not send zero length OP_RW pkt */
> > > > 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > 		return pkt_len;
> > > >  ...
> > > > }
> > > > 
> > > > then it queues the packet for the TX worker calling .send_pkt()
> > > > [vhost_transport_send_pkt() in the vhost_transport case]
> > > > 
> > > > The main function executed by the TX worker is
> > > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > > and it tries to copy the packet (up to 4K) on it.  If the buffer
> > > > allocated from the guest will be smaller then 4K, I think here it will
> > > > be discarded with an error:
> > > > 
> > 
> > I'm adding more lines to explain better.
> > 
> > > > static void
> > > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > > 				struct vhost_virtqueue *vq)
> > > > {
> > 		...
> > 
> > 		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> > 					 &out, &in, NULL, NULL);
> > 
> > 		...
> > 
> > 		len = iov_length(&vq->iov[out], in);
> > 		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> > 
> > 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> > 		if (nbytes != sizeof(pkt->hdr)) {
> > 			virtio_transport_free_pkt(pkt);
> > 			vq_err(vq, "Faulted on copying pkt hdr\n");
> > 			break;
> > 		}
> > 
> > > >  ...
> > > > 		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> > > 
> > > isn't pck len the actual length though?
> > > 
> > 
> > It is the length of the packet that we are copying in the guest RX
> > buffers pointed by the iov_iter. The guest allocates an iovec with 2
> > buffers, one for the header and one for the payload (4KB).
> 
> BTW at the moment that forces another kmalloc within virtio core. Maybe
> vsock needs a flag to skip allocation in this case.  Worth benchmarking.
> See virtqueue_use_indirect which just does total_sg > 1.
> 
> > 
> > > > 		if (nbytes != pkt->len) {
> > > > 			virtio_transport_free_pkt(pkt);
> > > > 			vq_err(vq, "Faulted on copying pkt buf\n");
> > > > 			break;
> > > > 		}
> > > >  ...
> > > > }
> > > > 
> > > > 
> > > > This series changes this behavior since now we will split the packet in
> > > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > > virtqueue.
> > > > 
> > > > We didn't change the buffer size in this series, so we still backward
> > > > compatible, but if we will use buffers smaller than 4K, we should
> > > > encounter the error described above.
> 
> So that's an implementation bug then? It made an assumption
> of a 4K sized buffer? Or even PAGE_SIZE sized buffer?

Assuming we miss nothing and buffers < 4K are broken,
I think we need to add this to the spec, possibly with
a feature bit to relax the requirement that all buffers
are at least 4k in size.

> 
> > > > 
> > > > How do you suggest we proceed if we want to change the buffer size?
> > > > Maybe adding a feature to "support any buffer size"?
> > > > 
> > > > Thanks,
> > > > Stefano
> > > 
> > > 
> > 
> > -- 

^ permalink raw reply

* [PATCH net-next] r8169: don't set bit RxVlan on RTL8125
From: Heiner Kallweit @ 2019-09-01  8:42 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org

RTL8125 uses a different register for VLAN offloading config,
therefore don't set bit RxVlan.

Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 7d1094ffd..74f81fe03 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -7163,8 +7163,10 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		NETIF_F_HIGHDMA;
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
 
-	tp->cp_cmd |= RxChkSum | RxVlan;
-
+	tp->cp_cmd |= RxChkSum;
+	/* RTL8125 uses register RxConfig for VLAN offloading config */
+	if (!rtl_is_8125(tp))
+		tp->cp_cmd |= RxVlan;
 	/*
 	 * Pretend we are using VLANs; This bypasses a nasty bug where
 	 * Interrupts stop flowing on high load on 8110SCd controllers.
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Michael S. Tsirkin @ 2019-09-01  8:26 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190801133616.sik5drn6ecesukbb@steredhat>

On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > > 
> > > (...)
> > > 
> > > > > 
> > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > of 4K, there might be issues.
> > > > 
> > > > Shouldn't be if they are following the spec. If not let's fix
> > > > the broken parts.
> > > > 
> > > > > 
> > > > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > > > 
> > > > > Thanks,
> > > > > Stefano
> > > > 
> > > > Why would a remote care about buffer sizes?
> > > > 
> > > > Let's first see what the issues are. If they exist
> > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > > 
> > > 
> > > The vhost_transport '.stream_enqueue' callback
> > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > passing the user message. This function allocates a new packet, copying
> > > the user message, but (before this series) it limits the packet size to
> > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > > 
> > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > 					  struct virtio_vsock_pkt_info *info)
> > > {
> > >  ...
> > > 	/* we can send less than pkt_len bytes */
> > > 	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > 		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > 
> > > 	/* virtio_transport_get_credit might return less than pkt_len credit */
> > > 	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > 
> > > 	/* Do not send zero length OP_RW pkt */
> > > 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > 		return pkt_len;
> > >  ...
> > > }
> > > 
> > > then it queues the packet for the TX worker calling .send_pkt()
> > > [vhost_transport_send_pkt() in the vhost_transport case]
> > > 
> > > The main function executed by the TX worker is
> > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > and it tries to copy the packet (up to 4K) on it.  If the buffer
> > > allocated from the guest will be smaller then 4K, I think here it will
> > > be discarded with an error:
> > > 
> 
> I'm adding more lines to explain better.
> 
> > > static void
> > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > 				struct vhost_virtqueue *vq)
> > > {
> 		...
> 
> 		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> 					 &out, &in, NULL, NULL);
> 
> 		...
> 
> 		len = iov_length(&vq->iov[out], in);
> 		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> 
> 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> 		if (nbytes != sizeof(pkt->hdr)) {
> 			virtio_transport_free_pkt(pkt);
> 			vq_err(vq, "Faulted on copying pkt hdr\n");
> 			break;
> 		}
> 
> > >  ...
> > > 		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> > 
> > isn't pck len the actual length though?
> > 
> 
> It is the length of the packet that we are copying in the guest RX
> buffers pointed by the iov_iter. The guest allocates an iovec with 2
> buffers, one for the header and one for the payload (4KB).

BTW at the moment that forces another kmalloc within virtio core. Maybe
vsock needs a flag to skip allocation in this case.  Worth benchmarking.
See virtqueue_use_indirect which just does total_sg > 1.

> 
> > > 		if (nbytes != pkt->len) {
> > > 			virtio_transport_free_pkt(pkt);
> > > 			vq_err(vq, "Faulted on copying pkt buf\n");
> > > 			break;
> > > 		}
> > >  ...
> > > }
> > > 
> > > 
> > > This series changes this behavior since now we will split the packet in
> > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > virtqueue.
> > > 
> > > We didn't change the buffer size in this series, so we still backward
> > > compatible, but if we will use buffers smaller than 4K, we should
> > > encounter the error described above.

So that's an implementation bug then? It made an assumption
of a 4K sized buffer? Or even PAGE_SIZE sized buffer?


> > > 
> > > How do you suggest we proceed if we want to change the buffer size?
> > > Maybe adding a feature to "support any buffer size"?
> > > 
> > > Thanks,
> > > Stefano
> > 
> > 
> 
> -- 

^ permalink raw reply

* Re: [PATCH 2/2] Fix a NULL-ptr-deref bug in ath10k_usb_alloc_urb_from_pipe
From: Kalle Valo @ 2019-09-01  8:06 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Hui Peng, davem, Mathias Payer, ath10k, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <20190831213139.GA32507@roeck-us.net>

Guenter Roeck <linux@roeck-us.net> writes:

> Hi,
>
> On Sat, Aug 03, 2019 at 08:31:01PM -0400, Hui Peng wrote:
>> The `ar_usb` field of `ath10k_usb_pipe_usb_pipe` objects
>> are initialized to point to the containing `ath10k_usb` object
>> according to endpoint descriptors read from the device side, as shown
>> below in `ath10k_usb_setup_pipe_resources`:
>> 
>> for (i = 0; i < iface_desc->desc.bNumEndpoints; ++i) {
>>         endpoint = &iface_desc->endpoint[i].desc;
>> 
>>         // get the address from endpoint descriptor
>>         pipe_num = ath10k_usb_get_logical_pipe_num(ar_usb,
>>                                                 endpoint->bEndpointAddress,
>>                                                 &urbcount);
>>         ......
>>         // select the pipe object
>>         pipe = &ar_usb->pipes[pipe_num];
>> 
>>         // initialize the ar_usb field
>>         pipe->ar_usb = ar_usb;
>> }
>> 
>> The driver assumes that the addresses reported in endpoint
>> descriptors from device side  to be complete. If a device is
>> malicious and does not report complete addresses, it may trigger
>> NULL-ptr-deref `ath10k_usb_alloc_urb_from_pipe` and
>> `ath10k_usb_free_urb_to_pipe`.
>> 
>> This patch fixes the bug by preventing potential NULL-ptr-deref.
>> 
>> Signed-off-by: Hui Peng <benquike@gmail.com>
>> Reported-by: Hui Peng <benquike@gmail.com>
>> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
>
> This patch fixes CVE-2019-15099, which has CVSS scores of 7.5 (CVSS 3.0)
> and 7.8 (CVSS 2.0). Yet, I don't find it in the upstream kernel or in Linux
> next.
>
> Is the patch going to be applied to the upstream kernel anytime soon ?

Same answer as in patch 1:

https://patchwork.kernel.org/patch/11074655/

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH] Fix a double free bug in rsi_91x_deinit
From: Kalle Valo @ 2019-09-01  8:03 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Hui Peng, security, Mathias Payer, David S. Miller,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <20190831181852.GA22160@roeck-us.net>

Guenter Roeck <linux@roeck-us.net> writes:

> On Mon, Aug 19, 2019 at 06:02:29PM -0400, Hui Peng wrote:
>> `dev` (struct rsi_91x_usbdev *) field of adapter
>> (struct rsi_91x_usbdev *) is allocated  and initialized in
>> `rsi_init_usb_interface`. If any error is detected in information
>> read from the device side,  `rsi_init_usb_interface` will be
>> freed. However, in the higher level error handling code in
>> `rsi_probe`, if error is detected, `rsi_91x_deinit` is called
>> again, in which `dev` will be freed again, resulting double free.
>> 
>> This patch fixes the double free by removing the free operation on
>> `dev` in `rsi_init_usb_interface`, because `rsi_91x_deinit` is also
>> used in `rsi_disconnect`, in that code path, the `dev` field is not
>>  (and thus needs to be) freed.
>> 
>> This bug was found in v4.19, but is also present in the latest version
>> of kernel.
>> 
>> Reported-by: Hui Peng <benquike@gmail.com>
>> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
>> Signed-off-by: Hui Peng <benquike@gmail.com>
>
> FWIW:
>
> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
>
> This patch is listed as fix for CVE-2019-15504, which has a CVSS 2.0 score
> of 10.0 (high) and CVSS 3.0 score of 9.8 (critical).

A double free in error path is considered as a critical CVE issue? I'm
very curious, why is that?

> Are there any plans to apply this patch to the upstream kernel anytime
> soon ?

I was on vacation last week and hence I have not been able to apply any
wireless patches. I should be able to catch up next week.

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH 1/2] Fix a NULL-ptr-deref bug in ath6kl_usb_alloc_urb_from_pipe
From: Kalle Valo @ 2019-09-01  7:58 UTC (permalink / raw)
  To: Hui Peng
  Cc: Guenter Roeck, David S. Miller, Mathias Payer, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <CAKpmkkXyhuTviRfJG9dG-=Pt0KKdoHaxhXdvW9tSadOoKfnP1w@mail.gmail.com>

Hui Peng <benquike@gmail.com> writes:

> The reason that this patch is still in the pending state is that it
> has not reviewed by maintainers (they are not responding).
> @Greg: can we apply it?

Who is "we" in this case? But anyway, I'll review the patch and if it's
ok I'll take it through my ath.git tree, as normal.

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH 1/2] Fix a NULL-ptr-deref bug in ath6kl_usb_alloc_urb_from_pipe
From: Kalle Valo @ 2019-09-01  7:55 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Hui Peng, davem, Mathias Payer, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <20190831180219.GA20860@roeck-us.net>

Guenter Roeck <linux@roeck-us.net> writes:

> On Sat, Aug 03, 2019 at 08:29:04PM -0400, Hui Peng wrote:
>> The `ar_usb` field of `ath6kl_usb_pipe_usb_pipe` objects
>> are initialized to point to the containing `ath6kl_usb` object
>> according to endpoint descriptors read from the device side, as shown
>> below in `ath6kl_usb_setup_pipe_resources`:
>> 
>> for (i = 0; i < iface_desc->desc.bNumEndpoints; ++i) {
>> 	endpoint = &iface_desc->endpoint[i].desc;
>> 
>> 	// get the address from endpoint descriptor
>> 	pipe_num = ath6kl_usb_get_logical_pipe_num(ar_usb,
>> 						endpoint->bEndpointAddress,
>> 						&urbcount);
>> 	......
>> 	// select the pipe object
>> 	pipe = &ar_usb->pipes[pipe_num];
>> 
>> 	// initialize the ar_usb field
>> 	pipe->ar_usb = ar_usb;
>> }
>> 
>> The driver assumes that the addresses reported in endpoint
>> descriptors from device side  to be complete. If a device is
>> malicious and does not report complete addresses, it may trigger
>> NULL-ptr-deref `ath6kl_usb_alloc_urb_from_pipe` and
>> `ath6kl_usb_free_urb_to_pipe`.
>> 
>> This patch fixes the bug by preventing potential NULL-ptr-deref.
>> 
>> Signed-off-by: Hui Peng <benquike@gmail.com>
>> Reported-by: Hui Peng <benquike@gmail.com>
>> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net>
>
> I don't see this patch in the upstream kernel or in -next.
>
> At the same time, it is supposed to fix CVE-2019-15098, which has
> a CVSS v2.0 score of 7.8 (high).
>
> Is this patch going to be applied to the upstream kernel ?

Lately I have been very busy and I have not had a chance to apply ath6kl
nor ath10k patches. This patch is on my queue and my plan is to go
through my patch queue next week.

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [patch net-next v2] mlx5: Add missing init_net check in FIB notifier
From: Roi Dayan @ 2019-09-01  7:38 UTC (permalink / raw)
  To: Jiri Pirko, netdev@vger.kernel.org
  Cc: davem@davemloft.net, Saeed Mahameed, leon@kernel.org, mlxsw
In-Reply-To: <20190830082530.8958-1-jiri@resnulli.us>



On 2019-08-30 11:25 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Take only FIB events that are happening in init_net into account. No other
> namespaces are supported.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
> v1->v2:
> - no change, just cced maintainers (fat finger made me avoid them in v1)
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
> index e69766393990..5d20d615663e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
> @@ -248,6 +248,9 @@ static int mlx5_lag_fib_event(struct notifier_block *nb,
>  	struct net_device *fib_dev;
>  	struct fib_info *fi;
>  
> +	if (!net_eq(info->net, &init_net))
> +		return NOTIFY_DONE;
> +
>  	if (info->family != AF_INET)
>  		return NOTIFY_DONE;
>  
> 

thanks

Acked-by: Roi Dayan <roid@mellanox.com>

^ permalink raw reply

* [PATCHv2 1/1] forcedeth: use per cpu to collect xmit/recv statistics
From: Zhu Yanjun @ 2019-09-01  7:26 UTC (permalink / raw)
  To: eric.dumazet, davem, netdev
In-Reply-To: <1567322773-5183-1-git-send-email-yanjun.zhu@oracle.com>

When testing with a background iperf pushing 1Gbit/sec traffic and running
both ifconfig and netstat to collect statistics, some deadlocks occurred.

Ifconfig and netstat will call nv_get_stats64 to get software xmit/recv
statistics. In the commit f5d827aece36 ("forcedeth: implement
ndo_get_stats64() API"), the normal tx/rx variables is to collect tx/rx
statistics. The fix is to replace normal tx/rx variables with per
cpu 64-bit variable to collect xmit/recv statistics. The per cpu variable
will avoid deadlocks and provide fast efficient statistics updates.

In nv_probe, the per cpu variable is initialized. In nv_remove, this
per cpu variable is freed.

In xmit/recv process, this per cpu variable will be updated.

In nv_get_stats64, this per cpu variable on each cpu is added up. Then
the driver can get xmit/recv packets statistics.

A test runs for several days with this commit, the deadlocks disappear
and the performance is better.

Tested:
   - iperf SMP x86_64 ->
   Client connecting to 1.1.1.108, TCP port 5001
   TCP window size: 85.0 KByte (default)
   ------------------------------------------------------------
   [  3] local 1.1.1.105 port 38888 connected with 1.1.1.108 port 5001
   [ ID] Interval       Transfer     Bandwidth
   [  3]  0.0-10.0 sec  1.10 GBytes   943 Mbits/sec

   ifconfig results:

   enp0s9 Link encap:Ethernet  HWaddr 00:21:28:6f:de:0f
          inet addr:1.1.1.105  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5774764531 errors:0 dropped:0 overruns:0 frame:0
          TX packets:633534193 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7646159340904 (7.6 TB) TX bytes:11425340407722 (11.4 TB)

   netstat results:

   Kernel Interface table
   Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
   ...
   enp0s9 1500 0  5774764531 0    0 0      633534193      0      0  0 BMRU
   ...

Fixes: f5d827aece36 ("forcedeth: implement ndo_get_stats64() API")
CC: Joe Jin <joe.jin@oracle.com>
CC: JUNXIAO_BI <junxiao.bi@oracle.com>
Reported-and-tested-by: Nan san <nan.1986san@gmail.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
V1->V2: Following Eric's advice fix the problem "If the loops are ever
	 restarted, the storage->fields will have been modified multiple
	 times."
---
 drivers/net/ethernet/nvidia/forcedeth.c | 143 ++++++++++++++++++++++----------
 1 file changed, 99 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index b327b29..07dd017 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -713,6 +713,21 @@ struct nv_skb_map {
 	struct nv_skb_map *next_tx_ctx;
 };
 
+struct nv_txrx_stats {
+	u64 stat_rx_packets;
+	u64 stat_rx_bytes; /* not always available in HW */
+	u64 stat_rx_missed_errors;
+	u64 stat_rx_dropped;
+	u64 stat_tx_packets; /* not always available in HW */
+	u64 stat_tx_bytes;
+	u64 stat_tx_dropped;
+};
+
+#define nv_txrx_stats_inc(member) \
+		__this_cpu_inc(np->txrx_stats->member)
+#define nv_txrx_stats_add(member, count) \
+		__this_cpu_add(np->txrx_stats->member, (count))
+
 /*
  * SMP locking:
  * All hardware access under netdev_priv(dev)->lock, except the performance
@@ -797,10 +812,7 @@ struct fe_priv {
 
 	/* RX software stats */
 	struct u64_stats_sync swstats_rx_syncp;
-	u64 stat_rx_packets;
-	u64 stat_rx_bytes; /* not always available in HW */
-	u64 stat_rx_missed_errors;
-	u64 stat_rx_dropped;
+	struct nv_txrx_stats __percpu *txrx_stats;
 
 	/* media detection workaround.
 	 * Locking: Within irq hander or disable_irq+spin_lock(&np->lock);
@@ -826,9 +838,6 @@ struct fe_priv {
 
 	/* TX software stats */
 	struct u64_stats_sync swstats_tx_syncp;
-	u64 stat_tx_packets; /* not always available in HW */
-	u64 stat_tx_bytes;
-	u64 stat_tx_dropped;
 
 	/* msi/msi-x fields */
 	u32 msi_flags;
@@ -1721,6 +1730,39 @@ static void nv_update_stats(struct net_device *dev)
 	}
 }
 
+static inline void nv_get_stats(int cpu, struct fe_priv *np,
+				struct rtnl_link_stats64 *storage)
+{
+	struct nv_txrx_stats *src = per_cpu_ptr(np->txrx_stats, cpu);
+	unsigned int syncp_start;
+	u64 rx_packets, rx_bytes, rx_dropped, rx_missed_errors;
+	u64 tx_packets, tx_bytes, tx_dropped;
+
+	do {
+		syncp_start = u64_stats_fetch_begin_irq(&np->swstats_rx_syncp);
+		rx_packets       = src->stat_rx_packets;
+		rx_bytes         = src->stat_rx_bytes;
+		rx_dropped       = src->stat_rx_dropped;
+		rx_missed_errors = src->stat_rx_missed_errors;
+	} while (u64_stats_fetch_retry_irq(&np->swstats_rx_syncp, syncp_start));
+
+	storage->rx_packets       += rx_packets;
+	storage->rx_bytes         += rx_bytes;
+	storage->rx_dropped       += rx_dropped;
+	storage->rx_missed_errors += rx_missed_errors;
+
+	do {
+		syncp_start = u64_stats_fetch_begin_irq(&np->swstats_tx_syncp);
+		tx_packets  = src->stat_tx_packets;
+		tx_bytes    = src->stat_tx_bytes;
+		tx_dropped  = src->stat_tx_dropped;
+	} while (u64_stats_fetch_retry_irq(&np->swstats_tx_syncp, syncp_start));
+
+	storage->tx_packets += tx_packets;
+	storage->tx_bytes   += tx_bytes;
+	storage->tx_dropped += tx_dropped;
+}
+
 /*
  * nv_get_stats64: dev->ndo_get_stats64 function
  * Get latest stats value from the nic.
@@ -1733,7 +1775,7 @@ nv_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *storage)
 	__releases(&netdev_priv(dev)->hwstats_lock)
 {
 	struct fe_priv *np = netdev_priv(dev);
-	unsigned int syncp_start;
+	int cpu;
 
 	/*
 	 * Note: because HW stats are not always available and for
@@ -1746,20 +1788,8 @@ nv_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *storage)
 	 */
 
 	/* software stats */
-	do {
-		syncp_start = u64_stats_fetch_begin_irq(&np->swstats_rx_syncp);
-		storage->rx_packets       = np->stat_rx_packets;
-		storage->rx_bytes         = np->stat_rx_bytes;
-		storage->rx_dropped       = np->stat_rx_dropped;
-		storage->rx_missed_errors = np->stat_rx_missed_errors;
-	} while (u64_stats_fetch_retry_irq(&np->swstats_rx_syncp, syncp_start));
-
-	do {
-		syncp_start = u64_stats_fetch_begin_irq(&np->swstats_tx_syncp);
-		storage->tx_packets = np->stat_tx_packets;
-		storage->tx_bytes   = np->stat_tx_bytes;
-		storage->tx_dropped = np->stat_tx_dropped;
-	} while (u64_stats_fetch_retry_irq(&np->swstats_tx_syncp, syncp_start));
+	for_each_online_cpu(cpu)
+		nv_get_stats(cpu, np, storage);
 
 	/* If the nic supports hw counters then retrieve latest values */
 	if (np->driver_data & DEV_HAS_STATISTICS_V123) {
@@ -1827,7 +1857,7 @@ static int nv_alloc_rx(struct net_device *dev)
 		} else {
 packet_dropped:
 			u64_stats_update_begin(&np->swstats_rx_syncp);
-			np->stat_rx_dropped++;
+			nv_txrx_stats_inc(stat_rx_dropped);
 			u64_stats_update_end(&np->swstats_rx_syncp);
 			return 1;
 		}
@@ -1869,7 +1899,7 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 		} else {
 packet_dropped:
 			u64_stats_update_begin(&np->swstats_rx_syncp);
-			np->stat_rx_dropped++;
+			nv_txrx_stats_inc(stat_rx_dropped);
 			u64_stats_update_end(&np->swstats_rx_syncp);
 			return 1;
 		}
@@ -2013,7 +2043,7 @@ static void nv_drain_tx(struct net_device *dev)
 		}
 		if (nv_release_txskb(np, &np->tx_skb[i])) {
 			u64_stats_update_begin(&np->swstats_tx_syncp);
-			np->stat_tx_dropped++;
+			nv_txrx_stats_inc(stat_tx_dropped);
 			u64_stats_update_end(&np->swstats_tx_syncp);
 		}
 		np->tx_skb[i].dma = 0;
@@ -2227,7 +2257,7 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			/* on DMA mapping error - drop the packet */
 			dev_kfree_skb_any(skb);
 			u64_stats_update_begin(&np->swstats_tx_syncp);
-			np->stat_tx_dropped++;
+			nv_txrx_stats_inc(stat_tx_dropped);
 			u64_stats_update_end(&np->swstats_tx_syncp);
 			return NETDEV_TX_OK;
 		}
@@ -2273,7 +2303,7 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
 				dev_kfree_skb_any(skb);
 				np->put_tx_ctx = start_tx_ctx;
 				u64_stats_update_begin(&np->swstats_tx_syncp);
-				np->stat_tx_dropped++;
+				nv_txrx_stats_inc(stat_tx_dropped);
 				u64_stats_update_end(&np->swstats_tx_syncp);
 				return NETDEV_TX_OK;
 			}
@@ -2384,7 +2414,7 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
 			/* on DMA mapping error - drop the packet */
 			dev_kfree_skb_any(skb);
 			u64_stats_update_begin(&np->swstats_tx_syncp);
-			np->stat_tx_dropped++;
+			nv_txrx_stats_inc(stat_tx_dropped);
 			u64_stats_update_end(&np->swstats_tx_syncp);
 			return NETDEV_TX_OK;
 		}
@@ -2431,7 +2461,7 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
 				dev_kfree_skb_any(skb);
 				np->put_tx_ctx = start_tx_ctx;
 				u64_stats_update_begin(&np->swstats_tx_syncp);
-				np->stat_tx_dropped++;
+				nv_txrx_stats_inc(stat_tx_dropped);
 				u64_stats_update_end(&np->swstats_tx_syncp);
 				return NETDEV_TX_OK;
 			}
@@ -2560,9 +2590,12 @@ static int nv_tx_done(struct net_device *dev, int limit)
 					    && !(flags & NV_TX_RETRYCOUNT_MASK))
 						nv_legacybackoff_reseed(dev);
 				} else {
+					unsigned int len;
+
 					u64_stats_update_begin(&np->swstats_tx_syncp);
-					np->stat_tx_packets++;
-					np->stat_tx_bytes += np->get_tx_ctx->skb->len;
+					nv_txrx_stats_inc(stat_tx_packets);
+					len = np->get_tx_ctx->skb->len;
+					nv_txrx_stats_add(stat_tx_bytes, len);
 					u64_stats_update_end(&np->swstats_tx_syncp);
 				}
 				bytes_compl += np->get_tx_ctx->skb->len;
@@ -2577,9 +2610,12 @@ static int nv_tx_done(struct net_device *dev, int limit)
 					    && !(flags & NV_TX2_RETRYCOUNT_MASK))
 						nv_legacybackoff_reseed(dev);
 				} else {
+					unsigned int len;
+
 					u64_stats_update_begin(&np->swstats_tx_syncp);
-					np->stat_tx_packets++;
-					np->stat_tx_bytes += np->get_tx_ctx->skb->len;
+					nv_txrx_stats_inc(stat_tx_packets);
+					len = np->get_tx_ctx->skb->len;
+					nv_txrx_stats_add(stat_tx_bytes, len);
 					u64_stats_update_end(&np->swstats_tx_syncp);
 				}
 				bytes_compl += np->get_tx_ctx->skb->len;
@@ -2627,9 +2663,12 @@ static int nv_tx_done_optimized(struct net_device *dev, int limit)
 						nv_legacybackoff_reseed(dev);
 				}
 			} else {
+				unsigned int len;
+
 				u64_stats_update_begin(&np->swstats_tx_syncp);
-				np->stat_tx_packets++;
-				np->stat_tx_bytes += np->get_tx_ctx->skb->len;
+				nv_txrx_stats_inc(stat_tx_packets);
+				len = np->get_tx_ctx->skb->len;
+				nv_txrx_stats_add(stat_tx_bytes, len);
 				u64_stats_update_end(&np->swstats_tx_syncp);
 			}
 
@@ -2806,6 +2845,15 @@ static int nv_getlen(struct net_device *dev, void *packet, int datalen)
 	}
 }
 
+static inline void rx_missing_handler(u32 flags, struct fe_priv *np)
+{
+	if (flags & NV_RX_MISSEDFRAME) {
+		u64_stats_update_begin(&np->swstats_rx_syncp);
+		nv_txrx_stats_inc(stat_rx_missed_errors);
+		u64_stats_update_end(&np->swstats_rx_syncp);
+	}
+}
+
 static int nv_rx_process(struct net_device *dev, int limit)
 {
 	struct fe_priv *np = netdev_priv(dev);
@@ -2848,11 +2896,7 @@ static int nv_rx_process(struct net_device *dev, int limit)
 					}
 					/* the rest are hard errors */
 					else {
-						if (flags & NV_RX_MISSEDFRAME) {
-							u64_stats_update_begin(&np->swstats_rx_syncp);
-							np->stat_rx_missed_errors++;
-							u64_stats_update_end(&np->swstats_rx_syncp);
-						}
+						rx_missing_handler(flags, np);
 						dev_kfree_skb(skb);
 						goto next_pkt;
 					}
@@ -2896,8 +2940,8 @@ static int nv_rx_process(struct net_device *dev, int limit)
 		skb->protocol = eth_type_trans(skb, dev);
 		napi_gro_receive(&np->napi, skb);
 		u64_stats_update_begin(&np->swstats_rx_syncp);
-		np->stat_rx_packets++;
-		np->stat_rx_bytes += len;
+		nv_txrx_stats_inc(stat_rx_packets);
+		nv_txrx_stats_add(stat_rx_bytes, len);
 		u64_stats_update_end(&np->swstats_rx_syncp);
 next_pkt:
 		if (unlikely(np->get_rx.orig++ == np->last_rx.orig))
@@ -2982,8 +3026,8 @@ static int nv_rx_process_optimized(struct net_device *dev, int limit)
 			}
 			napi_gro_receive(&np->napi, skb);
 			u64_stats_update_begin(&np->swstats_rx_syncp);
-			np->stat_rx_packets++;
-			np->stat_rx_bytes += len;
+			nv_txrx_stats_inc(stat_rx_packets);
+			nv_txrx_stats_add(stat_rx_bytes, len);
 			u64_stats_update_end(&np->swstats_rx_syncp);
 		} else {
 			dev_kfree_skb(skb);
@@ -5651,6 +5695,12 @@ static int nv_probe(struct pci_dev *pci_dev, const struct pci_device_id *id)
 	SET_NETDEV_DEV(dev, &pci_dev->dev);
 	u64_stats_init(&np->swstats_rx_syncp);
 	u64_stats_init(&np->swstats_tx_syncp);
+	np->txrx_stats = alloc_percpu(struct nv_txrx_stats);
+	if (!np->txrx_stats) {
+		pr_err("np->txrx_stats, alloc memory error.\n");
+		err = -ENOMEM;
+		goto out_alloc_percpu;
+	}
 
 	timer_setup(&np->oom_kick, nv_do_rx_refill, 0);
 	timer_setup(&np->nic_poll, nv_do_nic_poll, 0);
@@ -6060,6 +6110,8 @@ static int nv_probe(struct pci_dev *pci_dev, const struct pci_device_id *id)
 out_disable:
 	pci_disable_device(pci_dev);
 out_free:
+	free_percpu(np->txrx_stats);
+out_alloc_percpu:
 	free_netdev(dev);
 out:
 	return err;
@@ -6105,6 +6157,9 @@ static void nv_restore_mac_addr(struct pci_dev *pci_dev)
 static void nv_remove(struct pci_dev *pci_dev)
 {
 	struct net_device *dev = pci_get_drvdata(pci_dev);
+	struct fe_priv *np = netdev_priv(dev);
+
+	free_percpu(np->txrx_stats);
 
 	unregister_netdev(dev);
 
-- 
2.7.4


^ permalink raw reply related

* [PATCHv2 0/1] Fix deadlock problem and make performance better
From: Zhu Yanjun @ 2019-09-01  7:26 UTC (permalink / raw)
  To: eric.dumazet, davem, netdev

When running with about 1Gbit/ses for very long time, running ifconfig
and netstat causes dead lock. These symptoms are similar to the
commit 5f6b4e14cada ("net: dsa: User per-cpu 64-bit statistics"). After
replacing network devices statistics with per-cpu 64-bit statistics,
the dead locks disappear even after very long time running with 1Gbit/sec.

Based on Eric's advice, "If the loops are ever restarted, the
storage->fields will have been modified multiple times.".

A similar change in the commit 5f6b4e14cada ("net: dsa: User per-cpu
64-bit statistics") is borrowed to fix the above problem.

Zhu Yanjun (1):
  forcedeth: use per cpu to collect xmit/recv statistics

 drivers/net/ethernet/nvidia/forcedeth.c | 143 ++++++++++++++++++++++----------
 1 file changed, 99 insertions(+), 44 deletions(-)

-- 
2.7.4


^ permalink raw reply

* Re: [PATCH net] rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]
From: David Howells @ 2019-09-01  7:14 UTC (permalink / raw)
  To: Hillf Danton; +Cc: dhowells, netdev, marc.dionne, linux-afs, linux-kernel
In-Reply-To: <20190901020519.2392-1-hdanton@sina.com>

Hillf Danton <hdanton@sina.com> wrote:

> > It's certainly possible that that can happen.  The reaper is per
> > network-namespace.
> > 
> > conn->params.local holds a ref on the local endpoint.
> > 
> Then local endpoint can not become dead without connection reaper
> running first, because of the ref held by connection. When it is
> dead, however, there is no need to run reaper directly (rather than
> through a workqueue).

The reaper is per-net_ns, not per-local.  There may be more than one local
endpoint in a net_ns and they share the list of service connections.

David

^ permalink raw reply

* Re: [PATCH net 7/7] rxrpc: Use skb_unshare() rather than skb_cow_data()
From: David Howells @ 2019-09-01  7:11 UTC (permalink / raw)
  To: Hillf Danton; +Cc: dhowells, netdev, linux-afs, linux-kernel
In-Reply-To: <20190901065603.432-1-hdanton@sina.com>

Hillf Danton <hdanton@sina.com> wrote:

> > +		/* Unshare the packet so that it can be modified for in-place
> > +		 * decryption.
> > +		 */
> > +		if (sp->hdr.securityIndex != 0) {
> > +			struct sk_buff *nskb = skb_unshare(skb, GFP_ATOMIC);
> > +			if (!nskb) {
> > +				rxrpc_eaten_skb(skb, rxrpc_skb_unshared_nomem);
> > +				goto out;
> > +			}
> > +
> > +			if (nskb != skb) {
> > +				rxrpc_eaten_skb(skb, rxrpc_skb_received);
> > +				rxrpc_new_skb(skb, rxrpc_skb_unshared);
> > +				skb = nskb;
> > +				sp = rxrpc_skb(skb);
> > +			}
> > +		}
> 
> Unsharing skb makes it perilous to take a peep at it afterwards.

Ah, good point.  rxrpc_new_skb() should be after the assignment.

David

^ permalink raw reply

* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Michael S. Tsirkin @ 2019-09-01  6:56 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, netdev, linux-kernel, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190830094059.c7qo5cxrp2nkrncd@steredhat>

On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > Since virtio-vsock was introduced, the buffers filled by the host
> > > and pushed to the guest using the vring, are directly queued in
> > > a per-socket list. These buffers are preallocated by the guest
> > > with a fixed size (4 KB).
> > > 
> > > The maximum amount of memory used by each socket should be
> > > controlled by the credit mechanism.
> > > The default credit available per-socket is 256 KB, but if we use
> > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > guest will continue to fill the vring with new 4 KB free buffers
> > > to avoid starvation of other sockets.
> > > 
> > > This patch mitigates this issue copying the payload of small
> > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > order to avoid wasting memory.
> > > 
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > 
> > This is good enough for net-next, but for net I think we
> > should figure out how to address the issue completely.
> > Can we make the accounting precise? What happens to
> > performance if we do?
> > 
> 
> Since I'm back from holidays, I'm restarting this thread to figure out
> how to address the issue completely.
> 
> I did a better analysis of the credit mechanism that we implemented in
> virtio-vsock to get a clearer view and I'd share it with you:
> 
>     This issue affect only the "host->guest" path. In this case, when the
>     host wants to send a packet to the guest, it uses a "free" buffer
>     allocated by the guest (4KB).
>     The "free" buffers available for the host are shared between all
>     sockets, instead, the credit mechanism is per-socket, I think to
>     avoid the starvation of others sockets.
>     The guests re-fill the "free" queue when the available buffers are
>     less than half.
> 
>     Each peer have these variables in the per-socket state:
>        /* local vars */
>        buf_alloc        /* max bytes usable by this socket
>                            [exposed to the other peer] */
>        fwd_cnt          /* increased when RX packet is consumed by the
>                            user space [exposed to the other peer] */
>        tx_cnt 	        /* increased when TX packet is sent to the other peer */
> 
>        /* remote vars  */
>        peer_buf_alloc   /* peer's buf_alloc */
>        peer_fwd_cnt     /* peer's fwd_cnt */
> 
>     When a peer sends a packet, it increases the 'tx_cnt'; when the
>     receiver consumes the packet (copy it to the user-space buffer), it
>     increases the 'fwd_cnt'.
>     Note: increments are made considering the payload length and not the
>     buffer length.
> 
>     The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
>     all packet headers or with an explicit CREDIT_UPDATE packet.
> 
>     The local 'buf_alloc' value can be modified by the user space using
>     setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> 
>     Before to send a packet, the peer checks the space available:
>     	credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
>     and it will send up to credit_available bytes to the other peer.
> 
> Possible solutions considering Michael's advice:
> 1. Use the buffer length instead of the payload length when we increment
>    the counters:
>   - This approach will account precisely the memory used per socket.
>   - This requires changes in both guest and host.
>   - It is not compatible with old drivers, so a feature should be negotiated.
> 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
>    the socket queue but not used. (e.g. 256 byte used on 4K available in
>    the buffer)
>   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
>   - This should be compatible also with old drivers.
> 
> Maybe the second is less invasive, but will it be too tricky?
> Any other advice or suggestions?
> 
> Thanks in advance,
> Stefano

OK let me try to clarify.  The idea is this:

Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
means that in the worst case (128 byte packets), each byte of credit in
the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
to also account for the virtio_vsock_pkt since I think it's kept around
until userspace consumes it.

Thus given X buf alloc allowed in the socket, we should publish X/16
credits to the other side. This will ensure the other side does not send
more than X/16 bytes for a given socket and thus we won't need to
allocate more than X bytes to hold the data.

We can play with the copy break value to tweak this.




^ permalink raw reply

* Re: [PATCH net] tc-testing: don't hardcode 'ip' in nsPlugin.py
From: David Miller @ 2019-09-01  6:47 UTC (permalink / raw)
  To: dcaratti
  Cc: liuhangbin, mrv, vladbu, lucasb, nicolas.dichtel, netdev,
	marcelo.leitner
In-Reply-To: <8ade839e21c5231d2d6b8690b39587f802642306.1567180765.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Fri, 30 Aug 2019 18:51:47 +0200

> the following tdc test fails on Fedora:
> 
>  # ./tdc.py -e 2638
>   -- ns/SubPlugin.__init__
>  Test 2638: Add matchall and try to get it
>  -----> prepare stage *** Could not execute: "$TC qdisc add dev $DEV1 clsact"
>  -----> prepare stage *** Error message: "/bin/sh: ip: command not found"
>  returncode 127; expected [0]
>  -----> prepare stage *** Aborting test run.
> 
> Let nsPlugin.py use the 'IP' variable introduced with commit 92c1a19e2fb9
> ("tc-tests: added path to ip command in tdc"), so that the path to 'ip' is
> correctly resolved to the value we have in tdc_config.py.
> 
>  # ./tdc.py -e 2638
>   -- ns/SubPlugin.__init__
>  Test 2638: Add matchall and try to get it
>  All test results:
>  1..1
>  ok 1 2638 - Add matchall and try to get it
> 
> Fixes: 489ce2f42514 ("tc-testing: Restore original behaviour for namespaces in tdc")
> Reported-by: Hangbin Liu <liuhangbin@gmail.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/2] Minor cleanup in devlink
From: David Miller @ 2019-09-01  6:46 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: parav, netdev, jiri
In-Reply-To: <20190830231436.34f221b2@cakuba.netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Fri, 30 Aug 2019 23:14:36 -0700

> On Fri, 30 Aug 2019 05:39:43 -0500, Parav Pandit wrote:
>> Two minor cleanup in devlink.
>> 
>> Patch-1 Explicitly defines devlink port index as unsigned int
>> Patch-2 Uses switch-case to handle different port flavours attributes
> 
> Always nice to see one's comment addressed, even if it takes a while :)
> 
> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Series applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox