Netdev List
* Re: [PATCH net-next 04/19] net: usb: aqc111: Various callbacks implementation
From: Bjørn Mork @ 2018-10-09 13:27 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Igor Russkikh, David S . Miller, Dmitry Bezrukov,
	linux-usb@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <1539006434.10342.14.camel@suse.com>

Oliver Neukum <oneukum@suse.com> writes:

> On Fr, 2018-10-05 at 10:24 +0000, Igor Russkikh wrote:
>> From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
>> 
>> Reset, stop callbacks, driver unbind callback.
>> More register defines required for these callbacks.
>> 
>> Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
>> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
>> ---
>>  drivers/net/usb/aqc111.c |  48 ++++++++++++++++++++++
>>  drivers/net/usb/aqc111.h | 101 +++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 149 insertions(+)
>> 
>> diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
>> index 7f3e5a615750..22bb259d71fb 100644
>> --- a/drivers/net/usb/aqc111.c
>> +++ b/drivers/net/usb/aqc111.c
>> @@ -169,12 +169,60 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf)
>>  
>>  static void aqc111_unbind(struct usbnet *dev, struct usb_interface *intf)
>>  {
>> +	u8 reg8;
>> +	u16 reg16;
>> +
>> +	/* Force bz */
>> +	reg16 = SFR_PHYPWR_RSTCTL_BZ;
>> +	aqc111_write_cmd_nopm(dev, AQ_ACCESS_MAC, SFR_PHYPWR_RSTCTL,
>> +			      2, 2, &reg16);
>
> No, I am sorry, you are doing DMA on the kernel stack. That is not
> allowed. These functions will all have to be fixed.

Huh?  No, he doesn't.  That's the whole point with
usbnet_read_cmd_nopm(), isn't it?
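
For context: the usbnet command helpers copy the caller's data into a
heap buffer before issuing the control transfer, so the caller's stack
variable is never handed to the USB core.  A minimal user-space sketch
of that bounce-buffer pattern follows.  The names fake_hw,
fake_control_msg() and write_cmd_bounced() are hypothetical and exist
only for illustration, and malloc() stands in for the kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a device behind a control endpoint. */
struct fake_hw {
	uint8_t   regs[16];
	uintptr_t last_buf_addr;	/* address the "transfer" was given */
};

/* Hypothetical stand-in for the control transfer itself. */
static int fake_control_msg(struct fake_hw *hw, size_t index,
			    const void *buf, size_t size)
{
	if (size == 0 || index > sizeof(hw->regs) ||
	    size > sizeof(hw->regs) - index)
		return -1;
	hw->last_buf_addr = (uintptr_t)buf;
	memcpy(&hw->regs[index], buf, size);
	return (int)size;
}

/* Bounce-buffer step: duplicate the caller's data on the heap, pass only
 * the heap copy to the transfer, then free the copy again.
 */
static int write_cmd_bounced(struct fake_hw *hw, size_t index,
			     const void *data, size_t size)
{
	void *buf;
	int ret;

	if (size == 0)
		return -1;
	buf = malloc(size);
	if (!buf)
		return -1;
	memcpy(buf, data, size);

	ret = fake_control_msg(hw, index, buf, size);
	free(buf);
	return ret;
}
```

With this shape a caller can pass the address of an on-stack u16, as
the unbind callback above does, and the transfer still only ever sees
the heap copy.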



Bjørn


* Re: [PATCH net-next 03/19] net: usb: aqc111: Add implementation of read and write commands
From: Bjørn Mork @ 2018-10-09 13:33 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: David S . Miller, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org, Dmitry Bezrukov
In-Reply-To: <ebf0867900ae849581fbd20b52ee9e855e6345c8.1538734658.git.igor.russkikh@aquantia.com>

Igor Russkikh <Igor.Russkikh@aquantia.com> writes:

> +static int __aqc111_read_cmd(struct usbnet *dev, u8 cmd, u16 value,
> +			     u16 index, u16 size, void *data, int nopm)
> +{
> +	int ret;
> +	int (*fn)(struct usbnet *dev, u8 cmd, u8 reqtype, u16 value,
> +		  u16 index, void *data, u16 size);
> +
> +	if (nopm)
> +		fn = usbnet_read_cmd_nopm;
> +	else
> +		fn = usbnet_read_cmd;
> +
> +	ret = fn(dev, cmd, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE,
> +		 value, index, data, size);
> +	if (size == 2)
> +		le16_to_cpus(data);
> +
> +	if (unlikely(ret < 0))
> +		netdev_warn(dev->net,
> +			    "Failed to read(0x%x) reg index 0x%04x: %d\n",
> +			    cmd, index, ret);
> +	return ret;
> +}
> +
> +static int aqc111_read_cmd_nopm(struct usbnet *dev, u8 cmd, u16 value,
> +				u16 index, u16 size, void *data)
> +{
> +	return __aqc111_read_cmd(dev, cmd, value, index, size, data, 1);
> +}
> +
> +static int aqc111_read_cmd(struct usbnet *dev, u8 cmd, u16 value,
> +			   u16 index, u16 size, void *data)
> +{
> +	return __aqc111_read_cmd(dev, cmd, value, index, size, data, 0);
> +}
> +

Why would you want to do something like this instead of simply
implementing aqc111_read_cmd_nopm() and aqc111_read_cmd() as separate
functions?  The function pointer stuff is incredibly ugly, as Oliver
pointed out.  It wasn't done like that in usbnet.c, so why should we do
it like that here?

And the "if (size == 2) le16_to_cpus(data)" looks like something that
will come back and haunt you.  Will this code never read larger
integers?  Maybe add some sanity checks then, just in case...

Or simply add more helpers.  An additional pair of helpers for reading
16bit integers might simplify your code quite a bit.
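
A rough sketch of what such a 16-bit helper could look like, written as
user-space C so it can be compiled on its own.  fake_dev,
fake_read_cmd() and fake_read16_cmd() are hypothetical names, not part
of the patch; the register file is assumed to hold little-endian
values, as the le16_to_cpus() call in the quoted code suggests.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device model: a small little-endian register file. */
struct fake_dev {
	uint8_t regs[8];
};

/* Raw read: copies bytes exactly as the device stores them. */
static int fake_read_cmd(struct fake_dev *dev, uint16_t index,
			 void *data, uint16_t size)
{
	if ((size_t)index + size > sizeof(dev->regs))
		return -1;
	memcpy(data, &dev->regs[index], size);
	return size;
}

/* Typed helper: the size is fixed by the function, not guessed from a
 * length argument, and the value is converted only after a successful
 * read, so *val is left untouched on error.
 */
static int fake_read16_cmd(struct fake_dev *dev, uint16_t index,
			   uint16_t *val)
{
	uint8_t raw[2];
	int ret;

	ret = fake_read_cmd(dev, index, raw, sizeof(raw));
	if (ret < 0)
		return ret;

	*val = (uint16_t)(raw[0] | (raw[1] << 8));
	return ret;
}
```

Note that the quoted __aqc111_read_cmd() converts the buffer before it
checks ret; the sketch deliberately swaps that order.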


Bjørn


* Re: [PATCH net-next 01/19] net: usb: aqc111: Driver skeleton for Aquantia AQtion USB to 5GbE
From: Bjørn Mork @ 2018-10-09 13:37 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: David S . Miller, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org, Dmitry Bezrukov
In-Reply-To: <d7ac7ff4c2fc479021286d0f7549d9b2e0aac803.1538734658.git.igor.russkikh@aquantia.com>

Igor Russkikh <Igor.Russkikh@aquantia.com> writes:

> +static const struct driver_info aqc111_info = {
> +	.description	= "Aquantia AQtion USB to 5GbE Controller",
> +};
> +
> +#define AQC111_USB_ETH_DEV(vid, pid, table) \
> +	.match_flags = USB_DEVICE_ID_MATCH_DEVICE | \
> +			USB_DEVICE_ID_MATCH_INT_CLASS, \
> +	USB_DEVICE(vid, pid), \
> +	.bInterfaceClass = USB_CLASS_VENDOR_SPEC, \
> +	.driver_info = (unsigned long)&table, \
> +}, \
> +{ \
> +	.match_flags = USB_DEVICE_ID_MATCH_DEVICE | \
> +			USB_DEVICE_ID_MATCH_INT_INFO, \
> +	USB_DEVICE(vid, pid), \
> +	.bInterfaceClass = USB_CLASS_COMM, \
> +	.bInterfaceSubClass = USB_CDC_SUBCLASS_ETHERNET, \
> +	.bInterfaceProtocol = USB_CDC_PROTO_NONE
> +

Is the missing .driver_info for the CDC class intentional?  If so, then
why include it at all?



Bjørn


* Re: [PATCH net-next 09/19] net: usb: aqc111: Implement RX data path
From: Bjørn Mork @ 2018-10-09 13:39 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: David S . Miller, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org, Dmitry Bezrukov
In-Reply-To: <81aef6c50a93d22c6b91e966c89d7520dcbaaf87.1538734658.git.igor.russkikh@aquantia.com>

Igor Russkikh <Igor.Russkikh@aquantia.com> writes:

> From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
>
> Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
> ---

You'd want some description here.



Bjørn


* Re: linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2018-10-09 20:58 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David Miller, Networking, Linux-Next Mailing List,
	Linux Kernel Mailing List, Al Viro
In-Reply-To: <7cb98153-14df-96e4-edee-518775f49ec5@mojatatu.com>


Hi Jamal,

On Tue, 9 Oct 2018 06:02:25 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> Attached should fix it. Al, please double check.

OK, I will use that resolution from today.

Thanks.

-- 
Cheers,
Stephen Rothwell



* [PATCH net-next] cxgb4: Add thermal zone support
From: Ganesh Goudar @ 2018-10-09 13:44 UTC (permalink / raw)
  To: netdev, davem; +Cc: nirranjan, indranil, dt, Ganesh Goudar

Add thermal zone support to monitor ASIC's temperature.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/Makefile        |   1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |  18 ++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |   8 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c | 114 +++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h      |   1 +
 5 files changed, 142 insertions(+)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c

diff --git a/drivers/net/ethernet/chelsio/cxgb4/Makefile b/drivers/net/ethernet/chelsio/cxgb4/Makefile
index bea6a05..91d8a88 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/Makefile
+++ b/drivers/net/ethernet/chelsio/cxgb4/Makefile
@@ -12,3 +12,4 @@ cxgb4-objs := cxgb4_main.o l2t.o smt.o t4_hw.o sge.o clip_tbl.o cxgb4_ethtool.o
 cxgb4-$(CONFIG_CHELSIO_T4_DCB) +=  cxgb4_dcb.o
 cxgb4-$(CONFIG_CHELSIO_T4_FCOE) +=  cxgb4_fcoe.o
 cxgb4-$(CONFIG_DEBUG_FS) += cxgb4_debugfs.o
+cxgb4-$(CONFIG_THERMAL) += cxgb4_thermal.o
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index b5010bd..95909f0 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -52,6 +52,7 @@
 #include <linux/ptp_clock_kernel.h>
 #include <linux/ptp_classify.h>
 #include <linux/crash_dump.h>
+#include <linux/thermal.h>
 #include <asm/io.h>
 #include "t4_chip_type.h"
 #include "cxgb4_uld.h"
@@ -890,6 +891,14 @@ struct mps_encap_entry {
 	atomic_t refcnt;
 };
 
+#ifdef CONFIG_THERMAL
+struct ch_thermal {
+	struct thermal_zone_device *tzdev;
+	int trip_temp;
+	int trip_type;
+};
+#endif
+
 struct adapter {
 	void __iomem *regs;
 	void __iomem *bar2;
@@ -1008,6 +1017,9 @@ struct adapter {
 
 	/* Dump buffer for collecting logs in kdump kernel */
 	struct vmcoredd_data vmcoredd;
+#ifdef CONFIG_THERMAL
+	struct ch_thermal ch_thermal;
+#endif
 };
 
 /* Support for "sched-class" command to allow a TX Scheduling Class to be
@@ -1862,4 +1874,10 @@ void cxgb4_ring_tx_db(struct adapter *adap, struct sge_txq *q, int n);
 int t4_set_vlan_acl(struct adapter *adap, unsigned int mbox, unsigned int vf,
 		    u16 vlan);
 int cxgb4_dcb_enabled(const struct net_device *dev);
+
+#ifdef CONFIG_THERMAL
+int cxgb4_thermal_init(struct adapter *adap);
+int cxgb4_thermal_remove(struct adapter *adap);
+#endif /* CONFIG_THERMAL */
+
 #endif /* __CXGB4_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 1a93efa..03cc073 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5864,6 +5864,11 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (!is_t4(adapter->params.chip))
 		cxgb4_ptp_init(adapter);
 
+#ifdef CONFIG_THERMAL
+	if (!is_t4(adapter->params.chip) && (adapter->flags & FW_OK))
+		cxgb4_thermal_init(adapter);
+#endif /* CONFIG_THERMAL */
+
 	print_adapter_info(adapter);
 	return 0;
 
@@ -5929,6 +5934,9 @@ static void remove_one(struct pci_dev *pdev)
 
 		if (!is_t4(adapter->params.chip))
 			cxgb4_ptp_stop(adapter);
+#ifdef CONFIG_THERMAL
+		cxgb4_thermal_remove(adapter);
+#endif
 
 		/* If we allocated filters, free up state associated with any
 		 * valid filters ...
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c
new file mode 100644
index 0000000..28052e750
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c
@@ -0,0 +1,114 @@
+/*
+ *  Copyright (C) 2017 Chelsio Communications.  All rights reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms and conditions of the GNU General Public License,
+ *  version 2, as published by the Free Software Foundation.
+ *
+ *  This program is distributed in the hope it will be useful, but WITHOUT
+ *  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ *  more details.
+ *
+ *  The full GNU General Public License is included in this distribution in
+ *  the file called "COPYING".
+ *
+ *  Written by: Ganesh Goudar (ganeshgr@chelsio.com)
+ */
+
+#include "cxgb4.h"
+
+#define CXGB4_NUM_TRIPS 1
+
+static int cxgb4_thermal_get_temp(struct thermal_zone_device *tzdev,
+				  int *temp)
+{
+	struct adapter *adap = tzdev->devdata;
+	u32 param, val;
+	int ret;
+
+	param = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_DEV) |
+		 FW_PARAMS_PARAM_X_V(FW_PARAMS_PARAM_DEV_DIAG) |
+		 FW_PARAMS_PARAM_Y_V(FW_PARAM_DEV_DIAG_TMP));
+
+	ret = t4_query_params(adap, adap->mbox, adap->pf, 0, 1,
+			      &param, &val);
+	if (ret < 0 || val == 0)
+		return -1;
+
+	*temp = val * 1000;
+	return 0;
+}
+
+static int cxgb4_thermal_get_trip_type(struct thermal_zone_device *tzdev,
+				       int trip, enum thermal_trip_type *type)
+{
+	struct adapter *adap = tzdev->devdata;
+
+	if (!adap->ch_thermal.trip_temp)
+		return -EINVAL;
+
+	*type = adap->ch_thermal.trip_type;
+	return 0;
+}
+
+static int cxgb4_thermal_get_trip_temp(struct thermal_zone_device *tzdev,
+				       int trip, int *temp)
+{
+	struct adapter *adap = tzdev->devdata;
+
+	if (!adap->ch_thermal.trip_temp)
+		return -EINVAL;
+
+	*temp = adap->ch_thermal.trip_temp;
+	return 0;
+}
+
+static struct thermal_zone_device_ops cxgb4_thermal_ops = {
+	.get_temp = cxgb4_thermal_get_temp,
+	.get_trip_type = cxgb4_thermal_get_trip_type,
+	.get_trip_temp = cxgb4_thermal_get_trip_temp,
+};
+
+int cxgb4_thermal_init(struct adapter *adap)
+{
+	struct ch_thermal *ch_thermal = &adap->ch_thermal;
+	int num_trip = CXGB4_NUM_TRIPS;
+	u32 param, val;
+	int ret;
+
+	/* on older firmwares we may not get the trip temperature,
+	 * set the num of trips to 0.
+	 */
+	param = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_DEV) |
+		 FW_PARAMS_PARAM_X_V(FW_PARAMS_PARAM_DEV_DIAG) |
+		 FW_PARAMS_PARAM_Y_V(FW_PARAM_DEV_DIAG_MAXTMPTHRESH));
+
+	ret = t4_query_params(adap, adap->mbox, adap->pf, 0, 1,
+			      &param, &val);
+	if (ret < 0) {
+		num_trip = 0; /* could not get trip temperature */
+	} else {
+		ch_thermal->trip_temp = val * 1000;
+		ch_thermal->trip_type = THERMAL_TRIP_CRITICAL;
+	}
+
+	ch_thermal->tzdev = thermal_zone_device_register("cxgb4", num_trip,
+							 0, adap,
+							 &cxgb4_thermal_ops,
+							 NULL, 0, 0);
+	if (IS_ERR(ch_thermal->tzdev)) {
+		ret = PTR_ERR(ch_thermal->tzdev);
+		dev_err(adap->pdev_dev, "Failed to register thermal zone\n");
+		ch_thermal->tzdev = NULL;
+		return ret;
+	}
+	return 0;
+}
+
+int cxgb4_thermal_remove(struct adapter *adap)
+{
+	if (adap->ch_thermal.tzdev)
+		thermal_zone_device_unregister(adap->ch_thermal.tzdev);
+	return 0;
+}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index 6d2bc87..57584ab 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -1332,6 +1332,7 @@ enum fw_params_param_dev_phyfw {
 enum fw_params_param_dev_diag {
 	FW_PARAM_DEV_DIAG_TMP		= 0x00,
 	FW_PARAM_DEV_DIAG_VDD		= 0x01,
+	FW_PARAM_DEV_DIAG_MAXTMPTHRESH	= 0x02,
 };
 
 enum fw_params_param_dev_fwcache {
-- 
2.1.0


* Re: [PATCH net-next 08/19] net: usb: aqc111: Implement TX data path
From: Bjørn Mork @ 2018-10-09 13:50 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: David S . Miller, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org, Dmitry Bezrukov
In-Reply-To: <65674cc0cd1b025d859b1b0a5410b21ca9f88176.1538734658.git.igor.russkikh@aquantia.com>

Igor Russkikh <Igor.Russkikh@aquantia.com> writes:

> +struct aq_tx_packet_desc {
> +	struct {
> +		u32 length:21;
> +		u32 checksum:7;
> +		u32 drop_padding:1;
> +		u32 vlan_tag:1;
> +		u32 cphi:1;
> +		u32 dicf:1;
> +	};
> +	struct {
> +		u32 max_seg_size:15;
> +		u32 reserved:1;
> +		u32 vlan_info:16;
> +	};
> +};


You might want to shift and mask instead to avoid going insane when
trying to use this header on a BE system...
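
A minimal sketch of the shift-and-mask alternative.  The bit positions
are an assumption: they mirror what the quoted bitfields would produce
on a little-endian GCC build (first member in the lowest bits), and the
descriptor is assumed to go out on the wire as two little-endian 32-bit
words.  The macro and function names are hypothetical, and only some of
the fields are shown.

```c
#include <stdint.h>

/* Word 0 (assumed layout, see above) */
#define TX_LEN_MASK	0x1FFFFFu	/* bits 0..20: length */
#define TX_DROP_PADDING	(1u << 28)
#define TX_VLAN_TAG	(1u << 29)

/* Word 1 (assumed layout, see above) */
#define TX_MSS_MASK	0x7FFFu		/* bits 0..14: max_seg_size */
#define TX_VLAN_SHIFT	16		/* bits 16..31: vlan_info */

/* Store a 32-bit value as little-endian bytes on any host. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Build the 8-byte descriptor with explicit shifts and masks, so the
 * result does not depend on compiler bitfield ordering or host
 * endianness.
 */
static void tx_desc_pack(uint8_t out[8], uint32_t length,
			 int drop_padding, int vlan_tag,
			 uint16_t mss, uint16_t vlan_info)
{
	uint32_t w0 = length & TX_LEN_MASK;
	uint32_t w1 = (mss & TX_MSS_MASK) |
		      ((uint32_t)vlan_info << TX_VLAN_SHIFT);

	if (drop_padding)
		w0 |= TX_DROP_PADDING;
	if (vlan_tag)
		w0 |= TX_VLAN_TAG;

	put_le32(out, w0);
	put_le32(out + 4, w1);
}
```

The same bytes come out on BE and LE hosts, which is the property the
bitfield version cannot guarantee.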


Bjørn


* Re: [RFC PATCH 00/11] net: ethernet: ti: cpsw: replace cpsw-phy-sel with phy driver
From: Grygorii Strashko @ 2018-10-09 21:12 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: David S. Miller, netdev, Rob Herring, Kishon Vijay Abraham I,
	Sekhar Nori, linux-kernel, linux-omap, devicetree
In-Reply-To: <20181009143631.GK5662@atomide.com>



On 10/09/2018 09:36 AM, Tony Lindgren wrote:
> * Grygorii Strashko <grygorii.strashko@ti.com> [181008 23:54]:
>> 2) introduce new PHY API for network interface mode selection which will use
>> already defined set of modes from phy_interface_t.
>>
>> Option 2 was selected for this series.
> 
> Looks good to me :) The dts files will cause merge conflicts with
> what I have pending for the ti-sysc changes so please send the dts
> changes in a separate series when posting without RFC.

Expected. I did it on top of master, but will rebase and resend if approved.

-- 
regards,
-grygorii


* [PATCH v2] rxrpc: use correct kvec num while send response packet in rxrpc_reject_packets
From: YueHaibing @ 2018-10-09 14:15 UTC (permalink / raw)
  To: David Howells, davem; +Cc: YueHaibing, linux-afs, netdev, kernel-janitors
In-Reply-To: <1539052273-34824-1-git-send-email-yuehaibing@huawei.com>

Fixes gcc '-Wunused-but-set-variable' warning:

net/rxrpc/output.c: In function 'rxrpc_reject_packets':
net/rxrpc/output.c:527:11: warning:
 variable 'ioc' set but not used [-Wunused-but-set-variable]

'ioc' holds the correct kvec count to use when sending the response packet.

Fixes: ece64fec164f ("rxrpc: Emit BUSY packets when supposed to rather than ABORTs")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 net/rxrpc/output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index e8fb892..a141ee3 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -572,7 +572,8 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
 			whdr.flags	^= RXRPC_CLIENT_INITIATED;
 			whdr.flags	&= RXRPC_CLIENT_INITIATED;
 
-			ret = kernel_sendmsg(local->socket, &msg, iov, 2, size);
+			ret = kernel_sendmsg(local->socket, &msg,
+					     iov, ioc, size);
 			if (ret < 0)
 				trace_rxrpc_tx_fail(local->debug_id, 0, ret,
 						    rxrpc_tx_point_reject);


* Re: [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path
From: Eric Dumazet @ 2018-10-09 14:12 UTC (permalink / raw)
  To: Yafang Shao; +Cc: David Miller, netdev, LKML
In-Reply-To: <1539086718-4119-2-git-send-email-laoar.shao@gmail.com>

On Tue, Oct 9, 2018 at 5:05 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> By default, sk->sk_allocation is GFP_KERNEL, which means that if there is
> not enough memory it will do both direct reclaim and background reclaim.
> If the system has a large amount of memory, direct reclaim may cause a large
> latency spike.
>
> When we set MSG_DONTWAIT in send syscalls, we really don't want to be
> blocked, so we'd better clear __GFP_DIRECT_RECLAIM when allocating the skb in
> the send path. Then it will return immediately if there is not enough memory
> to be allocated, and the application has a chance to do other work
> instead of being blocked here.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  net/ipv4/tcp.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 43ef83b..fe4f5ce 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1182,6 +1182,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>         bool process_backlog = false;
>         bool zc = false;
>         long timeo;
> +       gfp_t gfp;
>
>         flags = msg->msg_flags;
>
> @@ -1255,6 +1256,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>         /* Ok commence sending. */
>         copied = 0;
>
> +       gfp = flags & MSG_DONTWAIT ? sk->sk_allocation & ~__GFP_DIRECT_RECLAIM :
> +             sk->sk_allocation;
> +
>  restart:
>         mss_now = tcp_send_mss(sk, &size_goal, flags);
>
> @@ -1283,8 +1287,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                         }
>                         first_skb = tcp_rtx_and_write_queues_empty(sk);
>                         linear = select_size(first_skb, zc);
> -                       skb = sk_stream_alloc_skb(sk, linear, sk->sk_allocation,
> -                                                 first_skb);
> +                       skb = sk_stream_alloc_skb(sk, linear, gfp, first_skb);
>                         if (!skb)
>                                 goto wait_for_memory;


How exactly have you tested this patch?

Most TCP payloads are added in page fragments, and you have not
changed the page fragment allocations.

Also, I do not see how an application will get future notifications
that it can retry the failed system call.
How are you really going to deal with this in high performance applications?

I would prefer a setsockopt() that can flip __GFP_DIRECT_RECLAIM in
sk->sk_allocation, so that we do not add all these tests in the fast
path, but honestly I do not see how applications can really make use
of this.


* [PATCH  v2 1/2] net: if_arp: Fix incorrect indents
From: Håkon Bugge @ 2018-10-09 14:27 UTC (permalink / raw)
  To: netdev
  Cc: stephen, David S . Miller, Kate Stewart, Thomas Gleixner,
	Greg Kroah-Hartman, Philippe Ombredanne, linux-kernel
In-Reply-To: <20181009142724.2213012-1-Haakon.Bugge@oracle.com>

Fix incorrect indents and align comments.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 include/uapi/linux/if_arp.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h
index 4605527ca41b..b68b4b3d9172 100644
--- a/include/uapi/linux/if_arp.h
+++ b/include/uapi/linux/if_arp.h
@@ -114,18 +114,18 @@
 
 /* ARP ioctl request. */
 struct arpreq {
-  struct sockaddr	arp_pa;		/* protocol address		*/
-  struct sockaddr	arp_ha;		/* hardware address		*/
-  int			arp_flags;	/* flags			*/
-  struct sockaddr       arp_netmask;    /* netmask (only for proxy arps) */
-  char			arp_dev[16];
+	struct sockaddr	arp_pa;		/* protocol address		 */
+	struct sockaddr	arp_ha;		/* hardware address		 */
+	int		arp_flags;	/* flags			 */
+	struct sockaddr arp_netmask;    /* netmask (only for proxy arps) */
+	char		arp_dev[16];
 };
 
 struct arpreq_old {
-  struct sockaddr	arp_pa;		/* protocol address		*/
-  struct sockaddr	arp_ha;		/* hardware address		*/
-  int			arp_flags;	/* flags			*/
-  struct sockaddr       arp_netmask;    /* netmask (only for proxy arps) */
+	struct sockaddr	arp_pa;		/* protocol address		 */
+	struct sockaddr	arp_ha;		/* hardware address		 */
+	int		arp_flags;	/* flags			 */
+	struct sockaddr	arp_netmask;    /* netmask (only for proxy arps) */
 };
 
 /* ARP Flag values. */
-- 
2.14.3


* Re: [PATCH net-next 07/19] net: usb: aqc111: Add support for getting and setting of MAC address
From: Igor Russkikh @ 2018-10-09 14:34 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S . Miller, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org, Dmitry Bezrukov
In-Reply-To: <20181006010346.GA32455@lunn.ch>

Hi Andrew,

>> +	if (ret < 0)
>> +		goto out;
>> +
>> +	memcpy(dev->net->dev_addr, buf, ETH_ALEN);
>> +	memcpy(dev->net->perm_addr, dev->net->dev_addr, ETH_ALEN);
> 
> Is this really the permanent address? If i call aqc111_set_mac_addr()
> followed by aqc111_get_mac() i still get what is in the OTP EEPROM?

That's actually a confusion with the function name here.
I think it's better to name it aqc111_init_mac(), since it gets called
only once on bind.

It really initializes perm_addr once, so the standard ndev callback will give
the permanent MAC you want.

> 
> You initialized it above as {0}. You don't need to memset it here.
> 
>> +	ret = aqc111_get_mac(dev, buf);
> 
> Do you even need to zero it? If aqc111_get_mac() fails, it will be
> left undefined, but you fail the bind anyway.

We don't even need this `buf` here at all. We'll move it into the
init_mac function above.

BR, Igor


* Re: [net-next,v2,2/4] net/smc: ipv6 support for smc_diag.c
From: Ursula Braun @ 2018-10-09 14:41 UTC (permalink / raw)
  To: Eugene Syromiatnikov
  Cc: davem, netdev, linux-s390, schwidefsky, heiko.carstens, raspl,
	kgraul
In-Reply-To: <20181007011152.GA11112@asgard.redhat.com>



On 10/07/2018 03:11 AM, Eugene Syromiatnikov wrote:
> On Wed, May 02, 2018 at 04:56:45PM +0200, Ursula Braun wrote:
>> From: Karsten Graul <kgraul@linux.ibm.com>
>>
>> Update smc_diag.c to support ipv6 addresses on the diagnosis interface.
>>
>> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
>> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
>> ---
>>  net/smc/smc_diag.c | 39 ++++++++++++++++++++++++++++++---------
>>  1 file changed, 30 insertions(+), 9 deletions(-)
>>
>> diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
>> index 427b91c1c964..05dd7e6d314d 100644
>> --- a/net/smc/smc_diag.c
>> +++ b/net/smc/smc_diag.c
>> @@ -38,17 +38,27 @@ static void smc_diag_msg_common_fill(struct smc_diag_msg *r, struct sock *sk)
>>  {
>>  	struct smc_sock *smc = smc_sk(sk);
>>  
>> -	r->diag_family = sk->sk_family;
>>  	if (!smc->clcsock)
>>  		return;
>>  	r->id.idiag_sport = htons(smc->clcsock->sk->sk_num);
>>  	r->id.idiag_dport = smc->clcsock->sk->sk_dport;
>>  	r->id.idiag_if = smc->clcsock->sk->sk_bound_dev_if;
>>  	sock_diag_save_cookie(sk, r->id.idiag_cookie);
>> -	memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
>> -	memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
>> -	r->id.idiag_src[0] = smc->clcsock->sk->sk_rcv_saddr;
>> -	r->id.idiag_dst[0] = smc->clcsock->sk->sk_daddr;
>> +	if (sk->sk_protocol == SMCPROTO_SMC) {
>> +		r->diag_family = PF_INET;
>> +		memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
>> +		memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
>> +		r->id.idiag_src[0] = smc->clcsock->sk->sk_rcv_saddr;
>> +		r->id.idiag_dst[0] = smc->clcsock->sk->sk_daddr;
>> +#if IS_ENABLED(CONFIG_IPV6)
>> +	} else if (sk->sk_protocol == SMCPROTO_SMC6) {
>> +		r->diag_family = PF_INET6;
>> +		memcpy(&r->id.idiag_src, &smc->clcsock->sk->sk_v6_rcv_saddr,
>> +		       sizeof(smc->clcsock->sk->sk_v6_rcv_saddr));
>> +		memcpy(&r->id.idiag_dst, &smc->clcsock->sk->sk_v6_daddr,
>> +		       sizeof(smc->clcsock->sk->sk_v6_daddr));
>> +#endif
>> +	}
> 
> This change makes it impossible to distinguish an inet_sock_diag
> response message from an SMC sock_diag response (previously it reported
> AF_SMC in diag_family, which allowed deciding whether it is part of a
> struct smc_diag_msg or a struct inet_diag_msg).
> 

Eugene,

we are considering the following patch:

---
 net/smc/smc_diag.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index dbf64a93d68a..371b4cf31fcd 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -38,6 +38,7 @@ static void smc_diag_msg_common_fill(struct smc_diag_msg *r, struct sock *sk)
 {
 	struct smc_sock *smc = smc_sk(sk);
 
+	r->diag_family = sk->sk_family;
 	if (!smc->clcsock)
 		return;
 	r->id.idiag_sport = htons(smc->clcsock->sk->sk_num);
@@ -45,14 +46,12 @@ static void smc_diag_msg_common_fill(struct smc_diag_msg *r, struct sock *sk)
 	r->id.idiag_if = smc->clcsock->sk->sk_bound_dev_if;
 	sock_diag_save_cookie(sk, r->id.idiag_cookie);
 	if (sk->sk_protocol == SMCPROTO_SMC) {
-		r->diag_family = PF_INET;
 		memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
 		memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
 		r->id.idiag_src[0] = smc->clcsock->sk->sk_rcv_saddr;
 		r->id.idiag_dst[0] = smc->clcsock->sk->sk_daddr;
 #if IS_ENABLED(CONFIG_IPV6)
 	} else if (sk->sk_protocol == SMCPROTO_SMC6) {
-		r->diag_family = PF_INET6;
 		memcpy(&r->id.idiag_src, &smc->clcsock->sk->sk_v6_rcv_saddr,
 		       sizeof(smc->clcsock->sk->sk_v6_rcv_saddr));
 		memcpy(&r->id.idiag_dst, &smc->clcsock->sk->sk_v6_daddr,


* Re: [PATCH net] net/sched: cls_api: add missing validation of netlink attributes
From: David Ahern @ 2018-10-09 14:46 UTC (permalink / raw)
  To: Davide Caratti, David S. Miller, Jamal Hadi Salim; +Cc: netdev
In-Reply-To: <05f98d2d220d443c157fc797fecc22692eeaa0da.1539090183.git.dcaratti@redhat.com>

On 10/9/18 7:10 AM, Davide Caratti wrote:
> Similarly to what has been done in 8b4c3cdd9dd8 ("net: sched: Add policy
> validation for tc attributes"), add validation for TCA_CHAIN and TCA_KIND
> netlink attributes.
> 
> tested with:
>  # ./tdc.py -c filter
> 
> Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
>  net/sched/cls_api.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 0a75cb2e5e7b..fb1afc0e130d 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -37,6 +37,11 @@ static LIST_HEAD(tcf_proto_base);
>  /* Protects list of registered TC modules. It is pure SMP lock. */
>  static DEFINE_RWLOCK(cls_mod_lock);
>  
> +const struct nla_policy cls_tca_policy[TCA_MAX + 1] = {
> +	[TCA_KIND]	= { .type = NLA_STRING },
> +	[TCA_CHAIN]	= { .type = NLA_U32 },
> +};
> +

That should be static since it cannot be used outside this module.

It would be nice to have a tc_common module so this stuff does not have to be
defined multiple times.


* Re: [PATCH net-next 07/19] net: usb: aqc111: Add support for getting and setting of MAC address
From: Andrew Lunn @ 2018-10-09 14:46 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: David S . Miller, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org, Dmitry Bezrukov
In-Reply-To: <082aefef-5927-181e-e505-685c1ca51492@aquantia.com>

On Tue, Oct 09, 2018 at 02:34:36PM +0000, Igor Russkikh wrote:
> Hi Andrew,
> 
> >> +	if (ret < 0)
> >> +		goto out;
> >> +
> >> +	memcpy(dev->net->dev_addr, buf, ETH_ALEN);
> >> +	memcpy(dev->net->perm_addr, dev->net->dev_addr, ETH_ALEN);
> > 
> > Is this really the permanent address? If i call aqc111_set_mac_addr()
> > followed by aqc111_get_mac() i still get what is in the OTP EEPROM?
> 
> That's actually a confusion with the function name here.
> I think it's better to name it aqc111_init_mac(), since it gets called
> only once on bind.

Hi Igor

Or aqc111_get_otp_mac()?

   Andrew


* Re: [RFC PATCH 02/11] dt-bindings: phy: add cpsw port interface mode selection phy bindings
From: Grygorii Strashko @ 2018-10-09 22:04 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: David S. Miller, netdev, Rob Herring, Kishon Vijay Abraham I,
	Sekhar Nori, linux-kernel, linux-omap, devicetree
In-Reply-To: <20181009203017.GM5662@atomide.com>



On 10/09/2018 03:30 PM, Tony Lindgren wrote:
> * Grygorii Strashko <grygorii.strashko@ti.com> [181009 20:10]:
>>
>>
>> On 10/09/2018 09:40 AM, Tony Lindgren wrote:
>>> * Grygorii Strashko <grygorii.strashko@ti.com> [181008 23:54]:
>>>> +Examples:
>>>> +	phy_gmii_sel: phy-gmii-sel {
>>>> +		compatible = "ti,am3352-phy-gmii-sel";
>>>> +		syscon-scm = <&scm_conf>;
>>>> +		#phy-cells = <2>;
>>>> +	};
>>>
>>> Now that this driver can live in its proper place in the
>>
>> right
>>
>>> dts, you may want to consider just using standard reg
>>> property for it instead of the syscon-scm. And also get
>>> rid of the syscon reads and writes.
>>
>> Could you help clarify how to get syscon in this case?
>> syscon_node_to_regmap(dev->parent->of_node)?
> 
> Hmm I don't think you need syscon at all now. You can just
> ioremap the register(s) and use readl/writel and that's it.
> Or use regmap without syscon if you prefer that.

It will overlap with the already remapped SCM syscon and I'd like to avoid this.
Also, it seems to be common practice to use syscon for devices/drivers which are
children of the SCM node, which makes the overall system more consistent.

> 
> The ioremap in this case should be hitting cached ranges
> anyways, so no extra overhead there.
> 
>> Also, there could be more than one gmii_sel register in SCM in the future,
>> so I have hidden the offsets in of_match data.
>> As a result, "reg" is not needed at all now.
> 
> But then you have to patch driver for various SoCs
> instead of just configuring the standard reg property
> in the dts file :)

The problem is that they are not guaranteed to be standard between SoC families
(number of regs and field placement), so it might be necessary to change the
driver anyway for various SoCs to properly handle new field placement.

I prefer to fix the driver than fight with DT ;) as it's static for a SoC family
and needs to be changed only once when a new SoC family is introduced.
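
To illustrate the approach of keeping per-SoC offsets in match data
rather than in a dts "reg" property, here is a small stand-alone
sketch.  It is not the driver code: the structure, the lookup function,
the second compatible string and all offset and port values are
hypothetical placeholders.  Only "ti,am3352-phy-gmii-sel" is taken from
the binding example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-SoC description, selected by compatible string. */
struct gmii_sel_soc_data {
	const char   *compatible;
	uint32_t      reg_offset;	/* placeholder, not a datasheet value */
	unsigned int  num_ports;	/* placeholder */
};

static const struct gmii_sel_soc_data soc_table[] = {
	{ "ti,am3352-phy-gmii-sel",		  0x100, 2 },
	{ "vendor,hypothetical-soc-phy-gmii-sel", 0x200, 4 },
};

/* Look up the SoC data for a compatible string; NULL if unsupported. */
static const struct gmii_sel_soc_data *gmii_sel_match(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(soc_table) / sizeof(soc_table[0]); i++) {
		if (strcmp(soc_table[i].compatible, compatible) == 0)
			return &soc_table[i];
	}
	return NULL;
}
```

Supporting a new SoC family then means adding one table entry, at the
cost Tony points out: a driver patch instead of a dts-only change.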

-- 
regards,
-grygorii


* Re: [RFC PATCH 02/11] dt-bindings: phy: add cpsw port interface mode selection phy bindings
From: Tony Lindgren @ 2018-10-09 22:07 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Rob Herring, Kishon Vijay Abraham I,
	Sekhar Nori, linux-kernel, linux-omap, devicetree
In-Reply-To: <f9dfd775-0679-f82b-b53b-065e9f7be7c7@ti.com>

* Grygorii Strashko <grygorii.strashko@ti.com> [181009 22:04]:
> 
> 
> On 10/09/2018 03:30 PM, Tony Lindgren wrote:
> > * Grygorii Strashko <grygorii.strashko@ti.com> [181009 20:10]:
> >>
> >>
> >> On 10/09/2018 09:40 AM, Tony Lindgren wrote:
> >>> * Grygorii Strashko <grygorii.strashko@ti.com> [181008 23:54]:
> >>>> +Examples:
> >>>> +	phy_gmii_sel: phy-gmii-sel {
> >>>> +		compatible = "ti,am3352-phy-gmii-sel";
> >>>> +		syscon-scm = <&scm_conf>;
> >>>> +		#phy-cells = <2>;
> >>>> +	};
> >>>
> >>> Now that this driver can live in its proper place in the
> >>
> >> right
> >>
> >>> dts, you may want to consider just using standard reg
> >>> property for it instead of the syscon-scm. And also get
> >>> rid of the syscon reads and writes.
> >>
> >> Could you help clarify how to get syscon in this case?
> >> syscon_node_to_regmap(dev->parent->of_node)?
> > 
> > Hmm I don't think you need syscon at all now. You can just
> > ioremap the register(s) and use readl/writel and that's it.
> > Or use regmap without syscon if you prefer that.
> 
> It will overlap with the already remapped SCM syscon, and I'd like to avoid this.
> Also, it seems to be common practice to use syscon for devices/drivers which are
> children of the SCM node; it makes the overall system more consistent.

Well, it was just set up with syscon in desperation earlier, with
drivers just blindly mapping registers outside of their
range.

> > The ioremap in this case should be hitting cached ranges
> > anyways, so no extra overhead there.
> > 
> >> Also, there could be more than one gmii_sel register in SCM in the future,
> >> so I hid the offsets in of_match data.
> >> As a result, "reg" is not needed at all now.
> > 
> > But then you have to patch driver for various SoCs
> > instead of just configuring the standard reg property
> > in the dts file :)
> 
> The problem is that they are not guaranteed to be standard between SoC families
> (number of regs and field placement); as a result, the driver might need to
> change anyway for various SoCs to properly handle new field placement.
> 
> I prefer to fix the driver rather than fight with DT ;) as it's static for an SoC
> family and needs to be changed only once, when a new SoC family is introduced.

Fine with me, that can be changed later too no problem.

Regards,

Tony

^ permalink raw reply

* [PATCH 1/7] fore200e: simplify fore200e_bus usage
From: Christoph Hellwig @ 2018-10-09 14:57 UTC (permalink / raw)
  To: Chas Williams, netdev; +Cc: linux-atm-general, linux-kernel
In-Reply-To: <20181009145720.32578-1-hch@lst.de>

There is no need to have a global array of the ops, instead PCI and sbus
can have their own instances assigned in *_probe.  Also switch to C99
initializers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/atm/fore200e.c | 121 +++++++++++++++++++----------------------
 1 file changed, 56 insertions(+), 65 deletions(-)

diff --git a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c
index 99a38115b0a8..008bd8541c61 100644
--- a/drivers/atm/fore200e.c
+++ b/drivers/atm/fore200e.c
@@ -106,7 +106,6 @@
 
 
 static const struct atmdev_ops   fore200e_ops;
-static const struct fore200e_bus fore200e_bus[];
 
 static LIST_HEAD(fore200e_boards);
 
@@ -664,9 +663,31 @@ fore200e_pca_proc_read(struct fore200e* fore200e, char *page)
 		   pci_dev->bus->number, PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
 }
 
+static const struct fore200e_bus fore200e_pci_ops = {
+	.model_name		= "PCA-200E",
+	.proc_name		= "pca200e",
+	.descr_alignment	= 32,
+	.buffer_alignment	= 4,
+	.status_alignment	= 32,
+	.read			= fore200e_pca_read,
+	.write			= fore200e_pca_write,
+	.dma_map		= fore200e_pca_dma_map,
+	.dma_unmap		= fore200e_pca_dma_unmap,
+	.dma_sync_for_cpu	= fore200e_pca_dma_sync_for_cpu,
+	.dma_sync_for_device	= fore200e_pca_dma_sync_for_device,
+	.dma_chunk_alloc	= fore200e_pca_dma_chunk_alloc,
+	.dma_chunk_free		= fore200e_pca_dma_chunk_free,
+	.configure		= fore200e_pca_configure,
+	.map			= fore200e_pca_map,
+	.reset			= fore200e_pca_reset,
+	.prom_read		= fore200e_pca_prom_read,
+	.unmap			= fore200e_pca_unmap,
+	.irq_check		= fore200e_pca_irq_check,
+	.irq_ack		= fore200e_pca_irq_ack,
+	.proc_read		= fore200e_pca_proc_read,
+};
 #endif /* CONFIG_PCI */
 
-
 #ifdef CONFIG_SBUS
 
 static u32 fore200e_sba_read(volatile u32 __iomem *addr)
@@ -855,8 +876,32 @@ static int fore200e_sba_proc_read(struct fore200e *fore200e, char *page)
 	return sprintf(page, "   SBUS slot/device:\t\t%d/'%s'\n",
 		       (regs ? regs->which_io : 0), op->dev.of_node->name);
 }
-#endif /* CONFIG_SBUS */
 
+static const struct fore200e_bus fore200e_sbus_ops = {
+	.model_name		= "SBA-200E",
+	.proc_name		= "sba200e",
+	.descr_alignment	= 32,
+	.buffer_alignment	= 64,
+	.status_alignment	= 32,
+	.read			= fore200e_sba_read,
+	.write			= fore200e_sba_write,
+	.dma_map		= fore200e_sba_dma_map,
+	.dma_unmap		= fore200e_sba_dma_unmap,
+	.dma_sync_for_cpu	= fore200e_sba_dma_sync_for_cpu,
+	.dma_sync_for_device	= fore200e_sba_dma_sync_for_device,
+	.dma_chunk_alloc	= fore200e_sba_dma_chunk_alloc,
+	.dma_chunk_free		= fore200e_sba_dma_chunk_free,
+	.configure		= fore200e_sba_configure,
+	.map			= fore200e_sba_map,
+	.reset			= fore200e_sba_reset,
+	.prom_read		= fore200e_sba_prom_read,
+	.unmap			= fore200e_sba_unmap,
+	.irq_enable		= fore200e_sba_irq_enable,
+	.irq_check		= fore200e_sba_irq_check,
+	.irq_ack		= fore200e_sba_irq_ack,
+	.proc_read		= fore200e_sba_proc_read,
+};
+#endif /* CONFIG_SBUS */
 
 static void
 fore200e_tx_irq(struct fore200e* fore200e)
@@ -2631,7 +2676,6 @@ static const struct of_device_id fore200e_sba_match[];
 static int fore200e_sba_probe(struct platform_device *op)
 {
 	const struct of_device_id *match;
-	const struct fore200e_bus *bus;
 	struct fore200e *fore200e;
 	static int index = 0;
 	int err;
@@ -2639,18 +2683,17 @@ static int fore200e_sba_probe(struct platform_device *op)
 	match = of_match_device(fore200e_sba_match, &op->dev);
 	if (!match)
 		return -EINVAL;
-	bus = match->data;
 
 	fore200e = kzalloc(sizeof(struct fore200e), GFP_KERNEL);
 	if (!fore200e)
 		return -ENOMEM;
 
-	fore200e->bus = bus;
+	fore200e->bus = &fore200e_sbus_ops;
 	fore200e->bus_dev = op;
 	fore200e->irq = op->archdata.irqs[0];
 	fore200e->phys_base = op->resource[0].start;
 
-	sprintf(fore200e->name, "%s-%d", bus->model_name, index);
+	sprintf(fore200e->name, "SBA-200E-%d", index);
 
 	err = fore200e_init(fore200e, &op->dev);
 	if (err < 0) {
@@ -2678,7 +2721,6 @@ static int fore200e_sba_remove(struct platform_device *op)
 static const struct of_device_id fore200e_sba_match[] = {
 	{
 		.name = SBA200E_PROM_NAME,
-		.data = (void *) &fore200e_bus[1],
 	},
 	{},
 };
@@ -2698,7 +2740,6 @@ static struct platform_driver fore200e_sba_driver = {
 static int fore200e_pca_detect(struct pci_dev *pci_dev,
 			       const struct pci_device_id *pci_ent)
 {
-    const struct fore200e_bus* bus = (struct fore200e_bus*) pci_ent->driver_data;
     struct fore200e* fore200e;
     int err = 0;
     static int index = 0;
@@ -2719,20 +2760,19 @@ static int fore200e_pca_detect(struct pci_dev *pci_dev,
 	goto out_disable;
     }
 
-    fore200e->bus       = bus;
+    fore200e->bus       = &fore200e_pci_ops;
     fore200e->bus_dev   = pci_dev;    
     fore200e->irq       = pci_dev->irq;
     fore200e->phys_base = pci_resource_start(pci_dev, 0);
 
-    sprintf(fore200e->name, "%s-%d", bus->model_name, index - 1);
+    sprintf(fore200e->name, "PCA-200E-%d", index - 1);
 
     pci_set_master(pci_dev);
 
-    printk(FORE200E "device %s found at 0x%lx, IRQ %s\n",
-	   fore200e->bus->model_name, 
+    printk(FORE200E "device PCA-200E found at 0x%lx, IRQ %s\n",
 	   fore200e->phys_base, fore200e_irq_itoa(fore200e->irq));
 
-    sprintf(fore200e->name, "%s-%d", bus->model_name, index);
+    sprintf(fore200e->name, "PCA-200E-%d", index);
 
     err = fore200e_init(fore200e, &pci_dev->dev);
     if (err < 0) {
@@ -2767,8 +2807,7 @@ static void fore200e_pca_remove_one(struct pci_dev *pci_dev)
 
 
 static const struct pci_device_id fore200e_pca_tbl[] = {
-    { PCI_VENDOR_ID_FORE, PCI_DEVICE_ID_FORE_PCA200E, PCI_ANY_ID, PCI_ANY_ID,
-      0, 0, (unsigned long) &fore200e_bus[0] },
+    { PCI_VENDOR_ID_FORE, PCI_DEVICE_ID_FORE_PCA200E, PCI_ANY_ID, PCI_ANY_ID },
     { 0, }
 };
 
@@ -3108,8 +3147,7 @@ module_init(fore200e_module_init);
 module_exit(fore200e_module_cleanup);
 
 
-static const struct atmdev_ops fore200e_ops =
-{
+static const struct atmdev_ops fore200e_ops = {
 	.open       = fore200e_open,
 	.close      = fore200e_close,
 	.ioctl      = fore200e_ioctl,
@@ -3121,53 +3159,6 @@ static const struct atmdev_ops fore200e_ops =
 	.owner      = THIS_MODULE
 };
 
-
-static const struct fore200e_bus fore200e_bus[] = {
-#ifdef CONFIG_PCI
-    { "PCA-200E", "pca200e", 32, 4, 32, 
-      fore200e_pca_read,
-      fore200e_pca_write,
-      fore200e_pca_dma_map,
-      fore200e_pca_dma_unmap,
-      fore200e_pca_dma_sync_for_cpu,
-      fore200e_pca_dma_sync_for_device,
-      fore200e_pca_dma_chunk_alloc,
-      fore200e_pca_dma_chunk_free,
-      fore200e_pca_configure,
-      fore200e_pca_map,
-      fore200e_pca_reset,
-      fore200e_pca_prom_read,
-      fore200e_pca_unmap,
-      NULL,
-      fore200e_pca_irq_check,
-      fore200e_pca_irq_ack,
-      fore200e_pca_proc_read,
-    },
-#endif
-#ifdef CONFIG_SBUS
-    { "SBA-200E", "sba200e", 32, 64, 32,
-      fore200e_sba_read,
-      fore200e_sba_write,
-      fore200e_sba_dma_map,
-      fore200e_sba_dma_unmap,
-      fore200e_sba_dma_sync_for_cpu,
-      fore200e_sba_dma_sync_for_device,
-      fore200e_sba_dma_chunk_alloc,
-      fore200e_sba_dma_chunk_free,
-      fore200e_sba_configure,
-      fore200e_sba_map,
-      fore200e_sba_reset,
-      fore200e_sba_prom_read,
-      fore200e_sba_unmap,
-      fore200e_sba_irq_enable,
-      fore200e_sba_irq_check,
-      fore200e_sba_irq_ack,
-      fore200e_sba_proc_read,
-    },
-#endif
-    {}
-};
-
 MODULE_LICENSE("GPL");
 #ifdef CONFIG_PCI
 #ifdef __LITTLE_ENDIAN__
-- 
2.19.0
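
For readers following the switch to C99 initializers, here is a minimal user-space sketch of the difference. struct bus_ops, demo_irq_check() and ops_equal() are hypothetical, cut-down stand-ins and not the driver's real definitions. The point it illustrates: with designated initializers an optional hook such as irq_enable can simply be left out and is zero-initialized, whereas the positional form needs an explicit NULL in the right slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct fore200e_bus. */
struct bus_ops {
	const char *model_name;
	int buffer_alignment;
	int (*irq_enable)(void);	/* optional hook, unused by this bus */
	int (*irq_check)(void);
};

static int demo_irq_check(void)
{
	return 1;
}

/* Positional form: every slot is listed in declaration order,
 * including an explicit NULL for the unused hook. */
static const struct bus_ops positional_ops = {
	"PCA-200E", 4, NULL, demo_irq_check,
};

/* Designated form: members are named, order is free, and any
 * member left out (irq_enable here) is zero-initialized. */
static const struct bus_ops designated_ops = {
	.model_name		= "PCA-200E",
	.buffer_alignment	= 4,
	.irq_check		= demo_irq_check,
};

/* Returns 1 when both tables describe the same operations. */
static int ops_equal(const struct bus_ops *a, const struct bus_ops *b)
{
	assert(a && b);
	return strcmp(a->model_name, b->model_name) == 0 &&
	       a->buffer_alignment == b->buffer_alignment &&
	       a->irq_enable == b->irq_enable &&
	       a->irq_check == b->irq_check;
}
```

Both tables compare equal, but only the positional one breaks silently if a member is later added or reordered in the struct.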

^ permalink raw reply related

* Re: [PATCH v3 1/2] Driver core: add bus_find_device_by_fwnode
From: Mark Brown @ 2018-10-09 15:15 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: Silesh C V, Greg Kroah-Hartman, Rafael J. Wysocki, linux-kernel,
	linux-arm-kernel, linux-i2c, linux-rdma, netdev, devicetree,
	linux-spi, Mathieu Poirier, Lijun Ou, Wei Hu(Xavier),
	Yisen Zhuang, Salil Mehta, Srinivas Kandagatla, Andrew Lunn,
	Florian Fainelli, Rob Herring, Frank Rowand, David S. Miller
In-Reply-To: <20181009110210.i6xphyuy5jkcfaug@katana>

On Tue, Oct 09, 2018 at 01:02:10PM +0200, Wolfram Sang wrote:

> We recently had this discussion in I2C world about using the parent if
> the (logical) device has a NULL fw_node [1]. I don't know if the other
> subsystems you modify use logical devices as well? If no, it seems we
> need an additional check for the parent in the I2C core only. If yes,
> this might be considered in your patchset?

SPI has a logical controller device as well.

^ permalink raw reply

* Re: [PATCH net-next v2] net: core: change bool members of struct net_device to bitfield members
From: David Ahern @ 2018-10-09 15:20 UTC (permalink / raw)
  To: Heiner Kallweit, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <e0150624-a317-eb4f-126e-92c99db8ce13@gmail.com>

On 10/8/18 2:17 PM, Heiner Kallweit wrote:
> bool is good as parameter type or function return type, but if used
> for struct members it consumes more memory than needed.
> Changing the bool members of struct net_device to bitfield members
> allows to decrease the memory footprint of this struct.

What does pahole show for the size of the struct before and after? I
suspect you have not really changed the size and certainly not the
actual memory allocated.
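
A small user-space sketch of why the size may not move. The two structs below are hypothetical and far smaller than struct net_device, and the result depends on the ABI, so treat this as an illustration rather than a measurement. On common ABIs, a few bool members that follow a pointer already sit in what would otherwise be tail padding, so turning them into bitfields leaves sizeof() unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical struct using plain bool members. */
struct with_bools {
	void *ptr;
	bool a;
	bool b;
	bool c;
};

/* Same flags expressed as single-bit bitfields. */
struct with_bits {
	void *ptr;
	unsigned int a:1;
	unsigned int b:1;
	unsigned int c:1;
};

/* Bytes gained by the conversion, as the compiler sees it. */
static size_t bytes_saved(void)
{
	/* The bitfield variant is assumed not to be larger. */
	assert(sizeof(struct with_bits) <= sizeof(struct with_bools));
	return sizeof(struct with_bools) - sizeof(struct with_bits);
}
```

Both variants are rounded up to the struct's alignment, which is why pahole output before and after is the useful evidence here.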

^ permalink raw reply

* [PATCH stable 4.9 00/29] backport of IP fragmentation fixes
From: Florian Fainelli @ 2018-10-09 22:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, gregkh, stable, edumazet, sthemmin, Florian Fainelli

This is based on Stephen's v4.14 patches, with the necessary merge
conflicts resolved and adjustments for the lack of timer_setup() on the
4.9 baseline.

Perf results on a gigabit-capable system, before and after patching, are below.

Series can also be found here:

https://github.com/ffainelli/linux/commits/fragment-stack-v4.9


   PerfTop:     457 irqs/sec  kernel:74.4%  exact:  0.0% [4000Hz cycles],  (all, 4 CPUs)
-------------------------------------------------------------------------------

    29.62%  [kernel]       [k] ip_defrag                  
     6.57%  [kernel]       [k] arch_cpu_idle              
     1.72%  [kernel]       [k] v7_dma_inv_range           
     1.68%  [kernel]       [k] __netif_receive_skb_core   
     1.43%  [kernel]       [k] fib_table_lookup           
     1.30%  [kernel]       [k] finish_task_switch         
     1.08%  [kernel]       [k] ip_rcv                     
     1.01%  [kernel]       [k] skb_release_data           
     0.99%  [kernel]       [k] __slab_free                
     0.96%  [kernel]       [k] bcm_sysport_poll           
     0.88%  [kernel]       [k] __netdev_alloc_skb         
     0.87%  [kernel]       [k] tick_nohz_idle_enter       
     0.86%  [kernel]       [k] dev_gro_receive            
     0.85%  [kernel]       [k] _raw_spin_unlock_irqrestore
     0.84%  [kernel]       [k] __memzero                  
     0.74%  [kernel]       [k] tick_nohz_idle_exit        
     0.73%  ld-2.24.so     [.] do_lookup_x                
     0.66%  [kernel]       [k] kmem_cache_free            
     0.66%  [kernel]       [k] bcm_sysport_rx_refill      
     0.65%  [kernel]       [k] eth_type_trans             


After patching:

  PerfTop:     170 irqs/sec  kernel:86.5%  exact:  0.0% [4000Hz cycles],  (all, 4 CPUs)
-------------------------------------------------------------------------------

     7.79%  [kernel]       [k] arch_cpu_idle              
     5.14%  [kernel]       [k] v7_dma_inv_range           
     4.20%  [kernel]       [k] ip_defrag                  
     3.89%  [kernel]       [k] __netif_receive_skb_core   
     3.65%  [kernel]       [k] fib_table_lookup           
     2.16%  [kernel]       [k] finish_task_switch         
     1.93%  [kernel]       [k] _raw_spin_unlock_irqrestore
     1.90%  [kernel]       [k] ip_rcv                     
     1.84%  [kernel]       [k] bcm_sysport_poll           
     1.83%  [kernel]       [k] __memzero                  
     1.65%  [kernel]       [k] __netdev_alloc_skb         
     1.60%  [kernel]       [k] __slab_free                
     1.49%  [kernel]       [k] __do_softirq               
     1.49%  [kernel]       [k] bcm_sysport_rx_refill      
     1.31%  [kernel]       [k] dma_cache_maint_page       
     1.25%  [kernel]       [k] tick_nohz_idle_enter       
     1.24%  [kernel]       [k] ip_route_input_noref       
     1.17%  [kernel]       [k] eth_type_trans             
     1.06%  [kernel]       [k] fib_validate_source        
     1.03%  [kernel]       [k] inet_frag_find    

Dan Carpenter (1):
  ipv4: frags: precedence bug in ip_expire()

Eric Dumazet (22):
  inet: frags: change inet_frags_init_net() return value
  inet: frags: add a pointer to struct netns_frags
  inet: frags: refactor ipfrag_init()
  inet: frags: refactor ipv6_frag_init()
  inet: frags: refactor lowpan_net_frag_init()
  ipv6: export ip6 fragments sysctl to unprivileged users
  rhashtable: add schedule points
  inet: frags: use rhashtables for reassembly units
  inet: frags: remove some helpers
  inet: frags: get rif of inet_frag_evicting()
  inet: frags: remove inet_frag_maybe_warn_overflow()
  inet: frags: break the 2GB limit for frags storage
  inet: frags: do not clone skb in ip_expire()
  ipv6: frags: rewrite ip6_expire_frag_queue()
  rhashtable: reorganize struct rhashtable layout
  inet: frags: reorganize struct netns_frags
  inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
  inet: frags: fix ip6frag_low_thresh boundary
  net: speed up skb_rbtree_purge()
  net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
  net: add rb_to_skb() and other rb tree helpers
  net: sk_buff rbnode reorg

Florian Westphal (1):
  ipv6: defrag: drop non-last frags smaller than min mtu

Peter Oskolkov (4):
  ip: discard IPv4 datagrams with overlapping segments.
  net: modify skb_rbtree_purge to return the truesize of all purged
    skbs.
  ip: add helpers to process in-order fragments faster.
  ip: process in-order fragments efficiently

Taehee Yoo (1):
  ip: frags: fix crash in ip_do_fragment()

 Documentation/networking/ip-sysctl.txt  |  13 +-
 include/linux/rhashtable.h              |   4 +-
 include/linux/skbuff.h                  |  48 +-
 include/net/inet_frag.h                 | 133 +++---
 include/net/ip.h                        |   1 -
 include/net/ipv6.h                      |  26 +-
 include/uapi/linux/snmp.h               |   1 +
 lib/rhashtable.c                        |   5 +-
 net/core/skbuff.c                       |  31 +-
 net/ieee802154/6lowpan/6lowpan_i.h      |  26 +-
 net/ieee802154/6lowpan/reassembly.c     | 148 +++---
 net/ipv4/inet_fragment.c                | 379 ++++------------
 net/ipv4/ip_fragment.c                  | 573 +++++++++++++-----------
 net/ipv4/proc.c                         |   7 +-
 net/ipv4/tcp_input.c                    |  33 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c | 100 ++---
 net/ipv6/proc.c                         |   5 +-
 net/ipv6/reassembly.c                   | 212 ++++-----
 18 files changed, 785 insertions(+), 960 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH stable 4.9 04/29] inet: frags: refactor ipv6_frag_init()
From: Florian Fainelli @ 2018-10-09 22:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, gregkh, stable, edumazet, sthemmin
In-Reply-To: <20181009224924.30151-1-f.fainelli@gmail.com>

From: Eric Dumazet <edumazet@google.com>

We want to call inet_frags_init() earlier.

This is a prereq to "inet: frags: use rhashtables for reassembly units"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5b975bab23615cd0fdf67af6c9298eb01c4b9f61)
---
 net/ipv6/reassembly.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 436e6d594f25..9440bb9bdab7 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -740,10 +740,21 @@ int __init ipv6_frag_init(void)
 {
 	int ret;
 
-	ret = inet6_add_protocol(&frag_protocol, IPPROTO_FRAGMENT);
+	ip6_frags.hashfn = ip6_hashfn;
+	ip6_frags.constructor = ip6_frag_init;
+	ip6_frags.destructor = NULL;
+	ip6_frags.qsize = sizeof(struct frag_queue);
+	ip6_frags.match = ip6_frag_match;
+	ip6_frags.frag_expire = ip6_frag_expire;
+	ip6_frags.frags_cache_name = ip6_frag_cache_name;
+	ret = inet_frags_init(&ip6_frags);
 	if (ret)
 		goto out;
 
+	ret = inet6_add_protocol(&frag_protocol, IPPROTO_FRAGMENT);
+	if (ret)
+		goto err_protocol;
+
 	ret = ip6_frags_sysctl_register();
 	if (ret)
 		goto err_sysctl;
@@ -752,16 +763,6 @@ int __init ipv6_frag_init(void)
 	if (ret)
 		goto err_pernet;
 
-	ip6_frags.hashfn = ip6_hashfn;
-	ip6_frags.constructor = ip6_frag_init;
-	ip6_frags.destructor = NULL;
-	ip6_frags.qsize = sizeof(struct frag_queue);
-	ip6_frags.match = ip6_frag_match;
-	ip6_frags.frag_expire = ip6_frag_expire;
-	ip6_frags.frags_cache_name = ip6_frag_cache_name;
-	ret = inet_frags_init(&ip6_frags);
-	if (ret)
-		goto err_pernet;
 out:
 	return ret;
 
@@ -769,6 +770,8 @@ int __init ipv6_frag_init(void)
 	ip6_frags_sysctl_unregister();
 err_sysctl:
 	inet6_del_protocol(&frag_protocol, IPPROTO_FRAGMENT);
+err_protocol:
+	inet_frags_fini(&ip6_frags);
 	goto out;
 }
 
-- 
2.17.1
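
The reordered init path above relies on the usual goto-based unwinding: each error label undoes the steps that succeeded before the failure, in reverse order. A minimal user-space sketch of that control flow follows. step(), record() and the STEP_* values are hypothetical helpers that only log a letter per call (uppercase for setup, lowercase for teardown); nothing here calls real kernel functions.

```c
#include <assert.h>
#include <string.h>

enum { STEP_FRAGS, STEP_PROTOCOL, STEP_SYSCTL, STEP_PERNET, STEP_NONE };

static int fail_at = STEP_NONE;	/* step that should fail, if any */
static char trace[16];		/* call log: setup upper, teardown lower */
static int trace_len;

static void record(char tag)
{
	assert(trace_len < (int)sizeof(trace) - 1);
	trace[trace_len++] = tag;
	trace[trace_len] = '\0';
}

/* Logs the step and fails it when it matches fail_at. */
static int step(int id, char tag)
{
	record(tag);
	return id == fail_at ? -1 : 0;
}

/* Mirrors the label layout of the patched ipv6_frag_init(). */
static int frag_init_sketch(int fail)
{
	int ret;

	fail_at = fail;
	trace_len = 0;
	trace[0] = '\0';

	ret = step(STEP_FRAGS, 'F');	/* stands in for inet_frags_init() */
	if (ret)
		goto out;
	ret = step(STEP_PROTOCOL, 'P');	/* stands in for inet6_add_protocol() */
	if (ret)
		goto err_protocol;
	ret = step(STEP_SYSCTL, 'S');
	if (ret)
		goto err_sysctl;
	ret = step(STEP_PERNET, 'N');
	if (ret)
		goto err_pernet;
out:
	return ret;

err_pernet:
	record('s');	/* undo sysctl registration */
err_sysctl:
	record('p');	/* undo protocol registration */
err_protocol:
	record('f');	/* undo frags init */
	goto out;
}

/* Returns 1 when the recorded call order matches the expectation. */
static int trace_is(const char *expected)
{
	return strcmp(trace, expected) == 0;
}
```

Because the frags setup now runs first, its teardown sits at the last label, so every later failure falls through to it.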

^ permalink raw reply related

* [PATCH stable 4.9 05/29] inet: frags: refactor lowpan_net_frag_init()
From: Florian Fainelli @ 2018-10-09 22:49 UTC (permalink / raw)
  To: netdev; +Cc: davem, gregkh, stable, edumazet, sthemmin
In-Reply-To: <20181009224924.30151-1-f.fainelli@gmail.com>

From: Eric Dumazet <edumazet@google.com>

We want to call lowpan_net_frag_init() earlier.
Similar to commit "inet: frags: refactor ipv6_frag_init()"

This is a prereq to "inet: frags: use rhashtables for reassembly units"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 807f1844df4ac23594268fa9f41902d0549e92aa)
---
 net/ieee802154/6lowpan/reassembly.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 9ccb8458b5c3..977b4ed58112 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -614,14 +614,6 @@ int __init lowpan_net_frag_init(void)
 {
 	int ret;
 
-	ret = lowpan_frags_sysctl_register();
-	if (ret)
-		return ret;
-
-	ret = register_pernet_subsys(&lowpan_frags_ops);
-	if (ret)
-		goto err_pernet;
-
 	lowpan_frags.hashfn = lowpan_hashfn;
 	lowpan_frags.constructor = lowpan_frag_init;
 	lowpan_frags.destructor = NULL;
@@ -631,11 +623,21 @@ int __init lowpan_net_frag_init(void)
 	lowpan_frags.frags_cache_name = lowpan_frags_cache_name;
 	ret = inet_frags_init(&lowpan_frags);
 	if (ret)
-		goto err_pernet;
+		goto out;
 
+	ret = lowpan_frags_sysctl_register();
+	if (ret)
+		goto err_sysctl;
+
+	ret = register_pernet_subsys(&lowpan_frags_ops);
+	if (ret)
+		goto err_pernet;
+out:
 	return ret;
 err_pernet:
 	lowpan_frags_sysctl_unregister();
+err_sysctl:
+	inet_frags_fini(&lowpan_frags);
 	return ret;
 }
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH stable 4.9 11/29] inet: frags: remove inet_frag_maybe_warn_overflow()
From: Florian Fainelli @ 2018-10-09 22:49 UTC (permalink / raw)
  To: netdev; +Cc: davem, gregkh, stable, edumazet, sthemmin
In-Reply-To: <20181009224924.30151-1-f.fainelli@gmail.com>

From: Eric Dumazet <edumazet@google.com>

This function is obsolete, after rhashtable addition to inet defrag.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2d44ed22e607f9a285b049de2263e3840673a260)
---
 include/net/inet_frag.h                 |  2 --
 net/ieee802154/6lowpan/reassembly.c     |  5 ++---
 net/ipv4/inet_fragment.c                | 11 -----------
 net/ipv4/ip_fragment.c                  |  5 ++---
 net/ipv6/netfilter/nf_conntrack_reasm.c |  5 ++---
 net/ipv6/reassembly.c                   |  5 ++---
 6 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 7e984045b2b7..23161bf5d899 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -109,8 +109,6 @@ void inet_frags_exit_net(struct netns_frags *nf);
 void inet_frag_kill(struct inet_frag_queue *q);
 void inet_frag_destroy(struct inet_frag_queue *q);
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key);
-void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
-				   const char *prefix);
 
 static inline void inet_frag_put(struct inet_frag_queue *q)
 {
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index a63360a05108..b54015981af9 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -83,10 +83,9 @@ fq_find(struct net *net, const struct lowpan_802154_cb *cb,
 	struct inet_frag_queue *q;
 
 	q = inet_frag_find(&ieee802154_lowpan->frags, &key);
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
+	if (!q)
 		return NULL;
-	}
+
 	return container_of(q, struct lowpan_frag_queue, q);
 }
 
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index a50ac25878aa..47c240f50b99 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -217,14 +217,3 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key)
 	return inet_frag_create(nf, key);
 }
 EXPORT_SYMBOL(inet_frag_find);
-
-void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
-				   const char *prefix)
-{
-	static const char msg[] = "inet_frag_find: Fragment hash bucket"
-		" list length grew over limit. Dropping fragment.\n";
-
-	if (PTR_ERR(q) == -ENOBUFS)
-		net_dbg_ratelimited("%s%s", prefix, msg);
-}
-EXPORT_SYMBOL(inet_frag_maybe_warn_overflow);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 308592a8ba97..696bfef06caa 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -219,10 +219,9 @@ static struct ipq *ip_find(struct net *net, struct iphdr *iph,
 	struct inet_frag_queue *q;
 
 	q = inet_frag_find(&net->ipv4.frags, &key);
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
+	if (!q)
 		return NULL;
-	}
+
 	return container_of(q, struct ipq, q);
 }
 
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 314568d8b84a..267f2ae2d05c 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -177,10 +177,9 @@ static struct frag_queue *fq_find(struct net *net, __be32 id, u32 user,
 	struct inet_frag_queue *q;
 
 	q = inet_frag_find(&net->nf_frag.frags, &key);
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
+	if (!q)
 		return NULL;
-	}
+
 	return container_of(q, struct frag_queue, q);
 }
 
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 629a45a4c79f..6de4cec69054 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -154,10 +154,9 @@ fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif)
 		key.iif = 0;
 
 	q = inet_frag_find(&net->ipv6.frags, &key);
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
+	if (!q)
 		return NULL;
-	}
+
 	return container_of(q, struct frag_queue, q);
 }
 
-- 
2.17.1
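
The callers above change from IS_ERR_OR_NULL(q) to a plain !q check. A simplified user-space sketch of the two conventions is below. err_ptr(), ptr_err() and is_err() imitate the kernel's ERR_PTR helpers but are re-implemented here for illustration, and find_old()/find_new() are hypothetical lookups, not the real inet_frag_find(). It assumes the old lookup could return either NULL or an encoded -ENOBUFS, while the new one returns only NULL or a valid pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified imitations of the kernel's error-pointer helpers. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int is_err_or_null(const void *ptr)
{
	return !ptr || is_err(ptr);
}

struct queue { int id; };
static struct queue the_queue = { 1 };

enum { LOOKUP_FOUND, LOOKUP_NOMEM, LOOKUP_OVERFLOW };

/* Old convention: valid pointer, NULL, or an encoded errno. */
static struct queue *find_old(int outcome)
{
	if (outcome == LOOKUP_OVERFLOW)
		return err_ptr(-ENOBUFS);
	return outcome == LOOKUP_FOUND ? &the_queue : NULL;
}

/* New convention: valid pointer or NULL, nothing else. */
static struct queue *find_new(int outcome)
{
	return outcome == LOOKUP_FOUND ? &the_queue : NULL;
}

/* Old caller: must decode the pointer before it may touch it. */
static struct queue *caller_old(int outcome, int *warned)
{
	struct queue *q = find_old(outcome);

	if (is_err_or_null(q)) {
		if (ptr_err(q) == -ENOBUFS)
			*warned = 1;	/* where the overflow warning was */
		return NULL;
	}
	return q;
}

/* New caller: a NULL test is sufficient. */
static struct queue *caller_new(int outcome)
{
	struct queue *q = find_new(outcome);

	assert(!is_err(q));	/* no encoded errors in this convention */
	if (!q)
		return NULL;
	return q;
}
```

Once no encoded errors can come back, the warning helper has nothing left to report, which matches the commit's reason for dropping it.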

^ permalink raw reply related

* [PATCH stable 4.9 13/29] inet: frags: do not clone skb in ip_expire()
From: Florian Fainelli @ 2018-10-09 22:49 UTC (permalink / raw)
  To: netdev; +Cc: davem, gregkh, stable, edumazet, sthemmin
In-Reply-To: <20181009224924.30151-1-f.fainelli@gmail.com>

From: Eric Dumazet <edumazet@google.com>

An skb_clone() was added in commit ec4fbd64751d ("inet: frag: release
spinlock before calling icmp_send()")

While fixing the bug at that time, it also added a very high cost
for DDOS frags, as the ICMP rate limit is applied after this
expensive operation (skb_clone() + consume_skb(), implying memory
allocations, copy, and freeing)

We can use skb_get(head) here, all we want is to make sure skb wont
be freed by another cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1eec5d5670084ee644597bd26c25e22c69b9f748)
---
 net/ipv4/ip_fragment.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 3dd19bebeb55..e235f62dab58 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -141,8 +141,8 @@ static bool frag_expire_skip_icmp(u32 user)
  */
 static void ip_expire(unsigned long arg)
 {
-	struct sk_buff *clone, *head;
 	const struct iphdr *iph;
+	struct sk_buff *head;
 	struct net *net;
 	struct ipq *qp;
 	int err;
@@ -185,16 +185,12 @@ static void ip_expire(unsigned long arg)
 	    (skb_rtable(head)->rt_type != RTN_LOCAL))
 		goto out;
 
-	clone = skb_clone(head, GFP_ATOMIC);
+	skb_get(head);
+	spin_unlock(&qp->q.lock);
+	icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
+	kfree_skb(head);
+	goto out_rcu_unlock;
 
-	/* Send an ICMP "Fragment Reassembly Timeout" message. */
-	if (clone) {
-		spin_unlock(&qp->q.lock);
-		icmp_send(clone, ICMP_TIME_EXCEEDED,
-			  ICMP_EXC_FRAGTIME, 0);
-		consume_skb(clone);
-		goto out_rcu_unlock;
-	}
 out:
 	spin_unlock(&qp->q.lock);
 out_rcu_unlock:
-- 
2.17.1
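
To make the cost difference concrete, here is a deliberately simplified user-space model of "take a reference" versus "clone". struct buf, struct payload and the buf_*() helpers are hypothetical; they are not the sk_buff API, and the counters are plain ints rather than atomics. The sketch assumes a clone needs a freshly allocated header that shares the payload, while a reference only bumps a counter on the existing header.

```c
#include <assert.h>
#include <stdlib.h>

struct payload { int refs; unsigned char bytes[64]; };
struct buf { int users; struct payload *data; };

static int allocations;	/* counts successful allocations */

static void *counted_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		allocations++;
	return p;
}

/* Allocates a header plus payload: two allocations. */
static struct buf *buf_alloc(void)
{
	struct buf *b = counted_alloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = counted_alloc(sizeof(*b->data));
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->users = 1;
	b->data->refs = 1;
	return b;
}

/* Reference: no allocation, just keeps the header alive. */
static struct buf *buf_get(struct buf *b)
{
	assert(b->users > 0);
	b->users++;
	return b;
}

/* Clone: new header (one allocation) sharing the payload. */
static struct buf *buf_clone(const struct buf *b)
{
	struct buf *c = counted_alloc(sizeof(*c));

	if (!c)
		return NULL;
	*c = *b;
	c->users = 1;
	c->data->refs++;
	return c;
}

/* Drops one user; frees header and payload when unused. */
static void buf_put(struct buf *b)
{
	if (--b->users > 0)
		return;
	if (--b->data->refs == 0)
		free(b->data);
	free(b);
}
```

In this model the reference path cannot fail and allocates nothing, which is the property the patch relies on when it drops the NULL check that the clone needed.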

^ permalink raw reply related

