* Re: [PATCH net-next] liquidio: 'imply' ptp instead of 'select'
From: David Miller @ 2016-12-04 4:39 UTC (permalink / raw)
To: arnd
Cc: felix.manlunas, tglx, david.daney, satananda.burla, rvatsavayi,
nicolas.pitre, sgoutham, netdev, linux-kernel
In-Reply-To: <20161202230451.1639318-1-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Sat, 3 Dec 2016 00:04:32 +0100
> ptp now depends on the optional POSIX_TIMERS setting and fails to build
> if we select it without that:
>
> warning: (LIQUIDIO_VF && TI_CPTS) selects PTP_1588_CLOCK which has unmet direct dependencies (NET && POSIX_TIMERS)
> warning: (LIQUIDIO_VF && TI_CPTS) selects PTP_1588_CLOCK which has unmet direct dependencies (NET && POSIX_TIMERS)
> ERROR: "posix_clock_unregister" [drivers/ptp/ptp.ko] undefined!
> ERROR: "posix_clock_register" [drivers/ptp/ptp.ko] undefined!
> ERROR: "pps_unregister_source" [drivers/ptp/ptp.ko] undefined!
> ERROR: "pps_event" [drivers/ptp/ptp.ko] undefined!
> ERROR: "pps_register_source" [drivers/ptp/ptp.ko] undefined!
>
> It seems that two patches have collided here, the build failure
> is a result of the combination. Changing the new option to 'imply'
> as well fixes it.
>
> Fixes: 111fc64a237f ("liquidio CN23XX: VF registration")
> Fixes: d1cbfd771ce8 ("ptp_clock: Allow for it to be optional")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Like the kbuild robot, when I apply this it complains about 'imply' being
an unknown option.
I guess it worked for you because support for 'imply' exists in the -next
tree and gets pulled in from somewhere else.
In any event, as-is I cannot apply this.
^ permalink raw reply
* Re: [PATCH 1/1] net: dcb: set error code on failures
From: David Miller @ 2016-12-04 4:55 UTC (permalink / raw)
To: bianpan201602; +Cc: netdev, linux-kernel, bianpan2016
In-Reply-To: <1480772948-7087-1-git-send-email-bianpan201602@163.com>
From: Pan Bian <bianpan201602@163.com>
Date: Sat, 3 Dec 2016 21:49:08 +0800
> From: Pan Bian <bianpan2016@163.com>
>
> In function dcbnl_cee_fill(), returns the value of variable err on
> errors. However, on some error paths (e.g. nla put fails), its value may
> be 0. It may be better to explicitly set a negative errno to variable
> err before returning.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188881
>
> Signed-off-by: Pan Bian <bianpan2016@163.com>
Applied, thanks.
^ permalink raw reply
* [PATCH net v2] ipv6: Allow IPv4-mapped address as next-hop
From: Erik Nordmark @ 2016-12-04 4:57 UTC (permalink / raw)
To: davem, nordmark; +Cc: netdev, Bob Gilligan
Made kernel accept IPv6 routes with IPv4-mapped address as next-hop.
It is possible to configure IP interfaces with IPv4-mapped addresses, and
one can add IPv6 routes for IPv4-mapped destinations/prefixes, yet prior
to this fix the kernel returned an EINVAL when attempting to add an IPv6
route with an IPv4-mapped address as a nexthop/gateway.
RFC 4798 (a proposed standard RFC) uses IPv4-mapped addresses as nexthops,
thus in order to support that type of address configuration the kernel
needs to allow IPv4-mapped addresses as nexthops.
Signed-off-by: Erik Nordmark <nordmark@arista.com>
Signed-off-by: Bob Gilligan <gilligan@arista.com>
---
v2 honoring minimum 1000 ft vertical separation between Thunderbird and patches (fixed whitespace issues)
net/ipv6/route.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1b57e11..86bdb02 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1995,8 +1995,11 @@ static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg)
It is very good, but in some (rare!) circumstances
(SIT, PtP, NBMA NOARP links) it is handy to allow
some exceptions. --ANK
+ We allow IPv4-mapped nexthops to support RFC4798-type
+ addressing
*/
- if (!(gwa_type & IPV6_ADDR_UNICAST))
+ if (!(gwa_type & (IPV6_ADDR_UNICAST |
+ IPV6_ADDR_MAPPED)))
goto out;
if (cfg->fc_table) {
--
1.8.1.4
^ permalink raw reply related
* [PATCH 1/1] isdn: hisax: set error code on failure
From: Pan Bian @ 2016-12-04 5:15 UTC (permalink / raw)
To: Karsten Keil, netdev; +Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
In function hfc4s8s_probe(), the value of return variable err should be
negative on failures. However, when the call to request_region() returns
NULL, the value of err is 0. This patch fixes the bug, assiging
"-ENOMEM" to err on the path that request_region() fails.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188931
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/isdn/hisax/hfc4s8s_l1.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/isdn/hisax/hfc4s8s_l1.c b/drivers/isdn/hisax/hfc4s8s_l1.c
index 9600cd7..3172cee 100644
--- a/drivers/isdn/hisax/hfc4s8s_l1.c
+++ b/drivers/isdn/hisax/hfc4s8s_l1.c
@@ -1499,6 +1499,7 @@ struct hfc4s8s_l1 {
printk(KERN_INFO
"HFC-4S/8S: failed to request address space at 0x%04x\n",
hw->iobase);
+ err = -ENOMEM;
goto out;
}
--
1.9.1
^ permalink raw reply related
* [PATCH 1/1] net: irda: set error code on failures
From: Pan Bian @ 2016-12-04 5:27 UTC (permalink / raw)
To: Samuel Ortiz, netdev; +Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
When the calls to kzalloc() fail, the value of return variable ret may
be 0. 0 means success in this context. This patch fixes the bug,
assigning "-ENOMEM" to ret before calling kzalloc().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188971
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/net/irda/irda-usb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/irda/irda-usb.c b/drivers/net/irda/irda-usb.c
index a198946..8716b8c 100644
--- a/drivers/net/irda/irda-usb.c
+++ b/drivers/net/irda/irda-usb.c
@@ -1723,6 +1723,7 @@ static int irda_usb_probe(struct usb_interface *intf,
/* Don't change this buffer size and allocation without doing
* some heavy and complete testing. Don't ask why :-(
* Jean II */
+ ret = -ENOMEM;
self->speed_buff = kzalloc(IRDA_USB_SPEED_MTU, GFP_KERNEL);
if (!self->speed_buff)
goto err_out_3;
--
1.9.1
^ permalink raw reply related
* Re: [net-next PATCH v4 1/6] net: virtio dynamically disable/enable LRO
From: Michael S. Tsirkin @ 2016-12-04 5:36 UTC (permalink / raw)
To: John Fastabend
Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
netdev, bblanco, brouer
In-Reply-To: <20161202204945.4331.2419.stgit@john-Precision-Tower-5810>
On Fri, Dec 02, 2016 at 12:49:45PM -0800, John Fastabend wrote:
> This adds support for dynamically setting the LRO feature flag. The
> message to control guest features in the backend uses the
> CTRL_GUEST_OFFLOADS msg type.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
> drivers/net/virtio_net.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a21d93a..d814e7cb 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1419,6 +1419,41 @@ static void virtnet_init_settings(struct net_device *dev)
> .set_settings = virtnet_set_settings,
> };
>
> +static int virtnet_set_features(struct net_device *netdev,
> + netdev_features_t features)
> +{
> + struct virtnet_info *vi = netdev_priv(netdev);
> + struct virtio_device *vdev = vi->vdev;
> + struct scatterlist sg;
> + u64 offloads = 0;
> +
> + if (features & NETIF_F_LRO)
> + offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
> + (1 << VIRTIO_NET_F_GUEST_TSO6);
> +
> + if (features & NETIF_F_RXCSUM)
> + offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
> +
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
> + sg_init_one(&sg, &offloads, sizeof(uint64_t));
> + if (!virtnet_send_command(vi,
> + VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> + VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
> + &sg)) {
> + dev_warn(&netdev->dev,
> + "Failed to set guest offloads by virtnet command.\n");
> + return -EINVAL;
> + }
> + } else if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) &&
> + !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
No need for VIRTIO_F_VERSION_1 here.
> + dev_warn(&netdev->dev,
> + "No support for setting offloads pre version_1.\n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> static const struct net_device_ops virtnet_netdev = {
> .ndo_open = virtnet_open,
> .ndo_stop = virtnet_close,
> @@ -1435,6 +1470,7 @@ static void virtnet_init_settings(struct net_device *dev)
> #ifdef CONFIG_NET_RX_BUSY_POLL
> .ndo_busy_poll = virtnet_busy_poll,
> #endif
> + .ndo_set_features = virtnet_set_features,
> };
>
> static void virtnet_config_changed_work(struct work_struct *work)
> @@ -1815,6 +1851,12 @@ static int virtnet_probe(struct virtio_device *vdev)
> if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> dev->features |= NETIF_F_RXCSUM;
>
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
> + virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
> + dev->features |= NETIF_F_LRO;
> + dev->hw_features |= NETIF_F_LRO;
> + }
> +
> dev->vlan_features = dev->features;
>
> /* MTU range: 68 - 65535 */
> @@ -2057,7 +2099,8 @@ static int virtnet_restore(struct virtio_device *vdev)
> VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
> VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> VIRTIO_NET_F_CTRL_MAC_ADDR, \
> - VIRTIO_NET_F_MTU
> + VIRTIO_NET_F_MTU, \
> + VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>
> static unsigned int features[] = {
> VIRTNET_FEATURES,
^ permalink raw reply
* [PATCH 1/1] atm: fix improper return value
From: Pan Bian @ 2016-12-04 5:45 UTC (permalink / raw)
To: Chas Williams, linux-atm-general, netdev; +Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
It returns variable "error" when ioremap_nocache() returns a NULL
pointer. The value of "error" is 0 then, which will mislead the callers
to believe that there is no error. This patch fixes the bug, returning
"-ENOMEM".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189021
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/atm/eni.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/atm/eni.c b/drivers/atm/eni.c
index f2aaf9e..40c2d56 100644
--- a/drivers/atm/eni.c
+++ b/drivers/atm/eni.c
@@ -1727,7 +1727,7 @@ static int eni_do_init(struct atm_dev *dev)
printk("\n");
printk(KERN_ERR DEV_LABEL "(itf %d): can't set up page "
"mapping\n",dev->number);
- return error;
+ return -ENOMEM;
}
eni_dev->ioaddr = base;
eni_dev->base_diff = real_base - (unsigned long) base;
--
1.9.1
^ permalink raw reply related
* [PATCH 1/1] net: ethernet: qlogic: set error code on failure
From: Pan Bian @ 2016-12-04 5:53 UTC (permalink / raw)
To: Yuval Mintz, Ariel Elior, everest-linux-l2, netdev; +Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
When calling dma_mapping_error(), the value of return variable rc is 0.
And when the call returns an unexpected value, rc is not set to a
negative errno. Thus, it will return 0 on the error path, and its
callers cannot detect the bug. This patch fixes the bug, assigning
"-ENOMEM" to err.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189041
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/net/ethernet/qlogic/qed/qed_ll2.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index f95385c..62ae55b 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -1730,6 +1730,7 @@ static int qed_ll2_start_xmit(struct qed_dev *cdev, struct sk_buff *skb)
mapping))) {
DP_NOTICE(cdev,
"Unable to map frag - dropping packet\n");
+ rc = -ENOMEM;
goto err;
}
} else {
--
1.9.1
^ permalink raw reply related
* Re: [PATCH v3 07/13] net: ethernet: ti: cpts: clean up event list if event pool is empty
From: kbuild test robot @ 2016-12-04 6:09 UTC (permalink / raw)
To: Grygorii Strashko
Cc: kbuild-all-JC7UmRfGjtg, David S. Miller,
netdev-u79uwXL29TY76Z2rM5mHXA, Mugunthan V N, Richard Cochran,
Sekhar Nori, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-omap-u79uwXL29TY76Z2rM5mHXA, Rob Herring,
devicetree-u79uwXL29TY76Z2rM5mHXA, Murali Karicheri, Wingman Kwok,
Thomas Gleixner, Grygorii Strashko
In-Reply-To: <20161202203023.25526-8-grygorii.strashko-l0cyMroinI0@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 3916 bytes --]
Hi WingMan,
[auto build test ERROR on net/master]
[also build test ERROR on v4.9-rc7 next-20161202]
[cannot apply to net-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Grygorii-Strashko/net-ethernet-ti-cpts-switch-to-readl-writel_relaxed/20161204-010355
config: arm-omap2plus_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm
Note: the linux-review/Grygorii-Strashko/net-ethernet-ti-cpts-switch-to-readl-writel_relaxed/20161204-010355 HEAD f938805dce662197c057c75ac67849f60da87c9f builds fine.
It only hurts bisectibility.
All errors (new ones prefixed by >>):
In file included from include/linux/dma-mapping.h:6:0,
from include/linux/skbuff.h:34,
from include/linux/ip.h:20,
from include/linux/ptp_classify.h:26,
from drivers/net/ethernet/ti/cpts.c:25:
drivers/net/ethernet/ti/cpts.c: In function 'cpts_purge_events':
>> drivers/net/ethernet/ti/cpts.c:76:15: error: 'struct cpts' has no member named 'dev'
dev_dbg(cpts->dev, "cpts: event pool cleaned up %d\n", removed);
^
include/linux/device.h:1209:26: note: in definition of macro 'dev_dbg'
dev_printk(KERN_DEBUG, dev, format, ##arg); \
^~~
drivers/net/ethernet/ti/cpts.c: In function 'cpts_fifo_read':
drivers/net/ethernet/ti/cpts.c:94:16: error: 'struct cpts' has no member named 'dev'
dev_err(cpts->dev, "cpts: event pool empty\n");
^~
vim +76 drivers/net/ethernet/ti/cpts.c
19 */
20 #include <linux/err.h>
21 #include <linux/if.h>
22 #include <linux/hrtimer.h>
23 #include <linux/module.h>
24 #include <linux/net_tstamp.h>
> 25 #include <linux/ptp_classify.h>
26 #include <linux/time.h>
27 #include <linux/uaccess.h>
28 #include <linux/workqueue.h>
29 #include <linux/if_ether.h>
30 #include <linux/if_vlan.h>
31
32 #include "cpts.h"
33
34 #define cpts_read32(c, r) readl_relaxed(&c->reg->r)
35 #define cpts_write32(c, v, r) writel_relaxed(v, &c->reg->r)
36
37 static int event_expired(struct cpts_event *event)
38 {
39 return time_after(jiffies, event->tmo);
40 }
41
42 static int event_type(struct cpts_event *event)
43 {
44 return (event->high >> EVENT_TYPE_SHIFT) & EVENT_TYPE_MASK;
45 }
46
47 static int cpts_fifo_pop(struct cpts *cpts, u32 *high, u32 *low)
48 {
49 u32 r = cpts_read32(cpts, intstat_raw);
50
51 if (r & TS_PEND_RAW) {
52 *high = cpts_read32(cpts, event_high);
53 *low = cpts_read32(cpts, event_low);
54 cpts_write32(cpts, EVENT_POP, event_pop);
55 return 0;
56 }
57 return -1;
58 }
59
60 static int cpts_purge_events(struct cpts *cpts)
61 {
62 struct list_head *this, *next;
63 struct cpts_event *event;
64 int removed = 0;
65
66 list_for_each_safe(this, next, &cpts->events) {
67 event = list_entry(this, struct cpts_event, list);
68 if (event_expired(event)) {
69 list_del_init(&event->list);
70 list_add(&event->list, &cpts->pool);
71 ++removed;
72 }
73 }
74
75 if (removed)
> 76 dev_dbg(cpts->dev, "cpts: event pool cleaned up %d\n", removed);
77 return removed ? 0 : -1;
78 }
79
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28475 bytes --]
^ permalink raw reply
* [PATCH 1/1] net: ethernet: qlogic: fix improper return value
From: Pan Bian @ 2016-12-04 6:15 UTC (permalink / raw)
To: Harish Patil, Manish Chopra, Dept-GELinuxNICDev, netdev
Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
When the call to qlcnic_alloc_mbx_args() fails, returning variable "err"
seems improper. With reference to the context, returing variable
"config" may be better.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189101
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
index bdbcd2b..21c4aca 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -3189,7 +3189,7 @@ int qlcnic_83xx_test_link(struct qlcnic_adapter *adapter)
err = qlcnic_alloc_mbx_args(&cmd, adapter, QLCNIC_CMD_GET_LINK_STATUS);
if (err)
- return err;
+ return config;
err = qlcnic_issue_cmd(adapter, &cmd);
if (err) {
--
1.9.1
^ permalink raw reply related
* [PATCH 1/1] net: ethernet: broadcom: fix improper return value
From: Pan Bian @ 2016-12-04 6:29 UTC (permalink / raw)
To: Yuval Mintz, Ariel Elior, everest-linux-l2, netdev; +Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
Marco BNX2X_ALLOC_AND_SET(arr, lbl, func) calls kmalloc() to allocate
memory, and jumps to label "lbl" if the allocation fails. Label "lbl"
first cleans memory and then returns variable rc. Before calling the
macro, the value of variable rc is 0. Because 0 means no error, the
callers of bnx2x_init_firmware() may be misled. This patch fixes the bug,
assigning "-ENOMEM" to rc before calling macro NX2X_ALLOC_AND_SET().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189141
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 0cee4c0..6f9fc20 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13505,6 +13505,7 @@ static int bnx2x_init_firmware(struct bnx2x *bp)
/* Initialize the pointers to the init arrays */
/* Blob */
+ rc = -ENOMEM;
BNX2X_ALLOC_AND_SET(init_data, request_firmware_exit, be32_to_cpu_n);
/* Opcodes */
--
1.9.1
^ permalink raw reply related
* Re: [PATCN net-next] net_sched: gen_estimator: complete rewrite of rate estimators
From: Eric Dumazet @ 2016-12-04 7:12 UTC (permalink / raw)
To: David Miller; +Cc: netdev, netfilter-devel
In-Reply-To: <1480835273.18162.457.camel@edumazet-glaptop3.roam.corp.google.com>
On Sat, 2016-12-03 at 23:07 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> 1) Old code was hard to maintain, due to complex lock chains.
> (We probably will be able to remove some kfree_rcu() in callers)
>
> 2) Using a single timer to update all estimators does not scale.
>
> 3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
> is not supposed to work well)
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index c7adcb57654ea57d1ba6702c91743cb7d2c74d28..859b60bfa86712031186fffc09c65bc43aa065dd 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2081,6 +2081,14 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
> limit <<= factor;
>
> if (atomic_read(&sk->sk_wmem_alloc) > limit) {
> + /* Special case where TX completion is delayed too much :
> + * If the skb we try to send is the first skb in write queue,
> + * then send it !
> + * No need to wait for TX completion to call us back.
> + */
> + if (skb == sk->sk_write_queue.next)
> + return false;
> +
Oups, this has nothing to do here. I will send a v2, sorry.
^ permalink raw reply
* [PATCN v2 net-next] net_sched: gen_estimator: complete rewrite of rate estimators
From: Eric Dumazet @ 2016-12-04 7:18 UTC (permalink / raw)
To: David Miller; +Cc: netdev, netfilter-devel
In-Reply-To: <1480835273.18162.457.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <edumazet@google.com>
1) Old code was hard to maintain, due to complex lock chains.
(We probably will be able to remove some kfree_rcu() in callers)
2) Using a single timer to update all estimators does not scale.
3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
is not supposed to work well)
In this rewrite :
- I removed the RB tree that had to be scanned in
gen_estimator_active(). qdisc dumps should be much faster.
- Each estimator has its own timer.
- Estimations are maintained in net_rate_estimator structure,
instead of dirtying the qdisc. Minor, but part of the simplification.
- Reading the estimator uses RCU and a seqcount to provide proper
support for 32bit kernels.
- We reduce memory need when estimators are not used, since
we store a pointer, instead of the bytes/packets counters.
- xt_rateest_mt() no longer has to grab a spinlock.
(In the future, xt_rateest_tg() could be switched to per cpu counters)
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
v2: removed unwanted changes to tcp_output.c
Renamed some parameters to please htmldoc
include/net/act_api.h | 2
include/net/gen_stats.h | 17 -
include/net/netfilter/xt_rateest.h | 10
include/net/sch_generic.h | 2
net/core/gen_estimator.c | 299 +++++++++------------------
net/core/gen_stats.c | 17 -
net/netfilter/xt_RATEEST.c | 4
net/netfilter/xt_rateest.c | 28 +-
net/sched/act_api.c | 9
net/sched/act_police.c | 21 +
net/sched/sch_api.c | 2
net/sched/sch_cbq.c | 6
net/sched/sch_drr.c | 6
net/sched/sch_generic.c | 2
net/sched/sch_hfsc.c | 6
net/sched/sch_htb.c | 6
net/sched/sch_qfq.c | 8
17 files changed, 181 insertions(+), 264 deletions(-)
diff --git a/include/net/act_api.h b/include/net/act_api.h
index 9dddf77a69ccbcb003cfa66bcc0de337f78f3dae..1d716449209e4753a297c61a287077a1eb96e6d8 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -36,7 +36,7 @@ struct tc_action {
struct tcf_t tcfa_tm;
struct gnet_stats_basic_packed tcfa_bstats;
struct gnet_stats_queue tcfa_qstats;
- struct gnet_stats_rate_est64 tcfa_rate_est;
+ struct net_rate_estimator __rcu *tcfa_rate_est;
spinlock_t tcfa_lock;
struct rcu_head tcfa_rcu;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 231e121cc7d9c72075e7e6dde3655d631f64a1c4..8b7aa370e7a4af61fcb71ed751dba72ebead6143 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -11,6 +11,8 @@ struct gnet_stats_basic_cpu {
struct u64_stats_sync syncp;
};
+struct net_rate_estimator;
+
struct gnet_dump {
spinlock_t * lock;
struct sk_buff * skb;
@@ -42,8 +44,7 @@ void __gnet_stats_copy_basic(const seqcount_t *running,
struct gnet_stats_basic_cpu __percpu *cpu,
struct gnet_stats_basic_packed *b);
int gnet_stats_copy_rate_est(struct gnet_dump *d,
- const struct gnet_stats_basic_packed *b,
- struct gnet_stats_rate_est64 *r);
+ struct net_rate_estimator __rcu **ptr);
int gnet_stats_copy_queue(struct gnet_dump *d,
struct gnet_stats_queue __percpu *cpu_q,
struct gnet_stats_queue *q, __u32 qlen);
@@ -53,16 +54,16 @@ int gnet_stats_finish_copy(struct gnet_dump *d);
int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu_bstats,
- struct gnet_stats_rate_est64 *rate_est,
+ struct net_rate_estimator __rcu **rate_est,
spinlock_t *stats_lock,
seqcount_t *running, struct nlattr *opt);
-void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
- struct gnet_stats_rate_est64 *rate_est);
+void gen_kill_estimator(struct net_rate_estimator __rcu **ptr);
int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu_bstats,
- struct gnet_stats_rate_est64 *rate_est,
+ struct net_rate_estimator __rcu **ptr,
spinlock_t *stats_lock,
seqcount_t *running, struct nlattr *opt);
-bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
- const struct gnet_stats_rate_est64 *rate_est);
+bool gen_estimator_active(struct net_rate_estimator __rcu **ptr);
+bool gen_estimator_read(struct net_rate_estimator __rcu **ptr,
+ struct gnet_stats_rate_est64 *sample);
#endif
diff --git a/include/net/netfilter/xt_rateest.h b/include/net/netfilter/xt_rateest.h
index 79f45e19f31eaa54deb8437ef4b883bec78def84..130e58361f998ecdf361910b196e69778224d955 100644
--- a/include/net/netfilter/xt_rateest.h
+++ b/include/net/netfilter/xt_rateest.h
@@ -1,19 +1,23 @@
#ifndef _XT_RATEEST_H
#define _XT_RATEEST_H
+#include <net/gen_stats.h>
+
struct xt_rateest {
/* keep lock and bstats on same cache line to speedup xt_rateest_tg() */
struct gnet_stats_basic_packed bstats;
spinlock_t lock;
- /* keep rstats and lock on same cache line to speedup xt_rateest_mt() */
- struct gnet_stats_rate_est64 rstats;
+
/* following fields not accessed in hot path */
+ unsigned int refcnt;
struct hlist_node list;
char name[IFNAMSIZ];
- unsigned int refcnt;
struct gnet_estimator params;
struct rcu_head rcu;
+
+ /* keep this field far away to speedup xt_rateest_mt() */
+ struct net_rate_estimator __rcu *rate_est;
};
struct xt_rateest *xt_rateest_lookup(const char *name);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e6aa0a249672a802dce583166f4f808154b77a96..498f81b229a4565e140f1a054ce931e80361e8b2 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -76,7 +76,7 @@ struct Qdisc {
struct netdev_queue *dev_queue;
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
struct gnet_stats_queue __percpu *cpu_qstats;
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 0993844faeeaefb221ea619fd9d796600b82fbb9..101b5d0e2142e6a813f6b524a59945379f02bcb3 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -7,6 +7,7 @@
* 2 of the License, or (at your option) any later version.
*
* Authors: Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
+ * Eric Dumazet <edumazet@google.com>
*
* Changes:
* Jamal Hadi Salim - moved it to net/core and reshulfed
@@ -30,171 +31,79 @@
#include <linux/skbuff.h>
#include <linux/rtnetlink.h>
#include <linux/init.h>
-#include <linux/rbtree.h>
#include <linux/slab.h>
+#include <linux/seqlock.h>
#include <net/sock.h>
#include <net/gen_stats.h>
-/*
- This code is NOT intended to be used for statistics collection,
- its purpose is to provide a base for statistical multiplexing
- for controlled load service.
- If you need only statistics, run a user level daemon which
- periodically reads byte counters.
-
- Unfortunately, rate estimation is not a very easy task.
- F.e. I did not find a simple way to estimate the current peak rate
- and even failed to formulate the problem 8)8)
-
- So I preferred not to built an estimator into the scheduler,
- but run this task separately.
- Ideally, it should be kernel thread(s), but for now it runs
- from timers, which puts apparent top bounds on the number of rated
- flows, has minimal overhead on small, but is enough
- to handle controlled load service, sets of aggregates.
-
- We measure rate over A=(1<<interval) seconds and evaluate EWMA:
-
- avrate = avrate*(1-W) + rate*W
-
- where W is chosen as negative power of 2: W = 2^(-ewma_log)
-
- The resulting time constant is:
-
- T = A/(-ln(1-W))
-
-
- NOTES.
-
- * avbps and avpps are scaled by 2^5.
- * both values are reported as 32 bit unsigned values. bps can
- overflow for fast links : max speed being 34360Mbit/sec
- * Minimal interval is HZ/4=250msec (it is the greatest common divisor
- for HZ=100 and HZ=1024 8)), maximal interval
- is (HZ*2^EST_MAX_INTERVAL)/4 = 8sec. Shorter intervals
- are too expensive, longer ones can be implemented
- at user level painlessly.
+/* This code is NOT intended to be used for statistics collection,
+ * its purpose is to provide a base for statistical multiplexing
+ * for controlled load service.
+ * If you need only statistics, run a user level daemon which
+ * periodically reads byte counters.
*/
-#define EST_MAX_INTERVAL 5
-
-struct gen_estimator {
- struct list_head list;
+struct net_rate_estimator {
struct gnet_stats_basic_packed *bstats;
- struct gnet_stats_rate_est64 *rate_est;
spinlock_t *stats_lock;
seqcount_t *running;
- int ewma_log;
+ struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+ u8 ewma_log;
+ u8 intvl_log; /* period : (250ms << intvl_log) */
+
+ seqcount_t seq;
u32 last_packets;
- unsigned long avpps;
u64 last_bytes;
+
+ u64 avpps;
u64 avbps;
- struct rcu_head e_rcu;
- struct rb_node node;
- struct gnet_stats_basic_cpu __percpu *cpu_bstats;
- struct rcu_head head;
-};
-struct gen_estimator_head {
- unsigned long next_jiffies;
- struct timer_list timer;
- struct list_head list;
+ unsigned long next_jiffies;
+ struct timer_list timer;
+ struct rcu_head rcu;
};
-static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
-
-/* Protects against NULL dereference */
-static DEFINE_RWLOCK(est_lock);
-
-/* Protects against soft lockup during large deletion */
-static struct rb_root est_root = RB_ROOT;
-static DEFINE_SPINLOCK(est_tree_lock);
-
-static void est_timer(unsigned long arg)
+static void est_fetch_counters(struct net_rate_estimator *e,
+ struct gnet_stats_basic_packed *b)
{
- int idx = (int)arg;
- struct gen_estimator *e;
+ if (e->stats_lock)
+ spin_lock(e->stats_lock);
- rcu_read_lock();
- list_for_each_entry_rcu(e, &elist[idx].list, list) {
- struct gnet_stats_basic_packed b = {0};
- unsigned long rate;
- u64 brate;
-
- if (e->stats_lock)
- spin_lock(e->stats_lock);
- read_lock(&est_lock);
- if (e->bstats == NULL)
- goto skip;
-
- __gnet_stats_copy_basic(e->running, &b, e->cpu_bstats, e->bstats);
-
- brate = (b.bytes - e->last_bytes)<<(7 - idx);
- e->last_bytes = b.bytes;
- e->avbps += (brate >> e->ewma_log) - (e->avbps >> e->ewma_log);
- WRITE_ONCE(e->rate_est->bps, (e->avbps + 0xF) >> 5);
-
- rate = b.packets - e->last_packets;
- rate <<= (7 - idx);
- e->last_packets = b.packets;
- e->avpps += (rate >> e->ewma_log) - (e->avpps >> e->ewma_log);
- WRITE_ONCE(e->rate_est->pps, (e->avpps + 0xF) >> 5);
-skip:
- read_unlock(&est_lock);
- if (e->stats_lock)
- spin_unlock(e->stats_lock);
- }
+ __gnet_stats_copy_basic(e->running, b, e->cpu_bstats, e->bstats);
- if (!list_empty(&elist[idx].list)) {
- elist[idx].next_jiffies += ((HZ/4) << idx);
+ if (e->stats_lock)
+ spin_unlock(e->stats_lock);
- if (unlikely(time_after_eq(jiffies, elist[idx].next_jiffies))) {
- /* Ouch... timer was delayed. */
- elist[idx].next_jiffies = jiffies + 1;
- }
- mod_timer(&elist[idx].timer, elist[idx].next_jiffies);
- }
- rcu_read_unlock();
}
-static void gen_add_node(struct gen_estimator *est)
+static void est_timer(unsigned long arg)
{
- struct rb_node **p = &est_root.rb_node, *parent = NULL;
+ struct net_rate_estimator *est = (struct net_rate_estimator *)arg;
+ struct gnet_stats_basic_packed b;
+ u64 rate, brate;
- while (*p) {
- struct gen_estimator *e;
+ est_fetch_counters(est, &b);
+ brate = (b.bytes - est->last_bytes) << (8 - est->ewma_log);
+ brate -= (est->avbps >> est->ewma_log);
- parent = *p;
- e = rb_entry(parent, struct gen_estimator, node);
+ rate = (u64)(b.packets - est->last_packets) << (8 - est->ewma_log);
+ rate -= (est->avpps >> est->ewma_log);
- if (est->bstats > e->bstats)
- p = &parent->rb_right;
- else
- p = &parent->rb_left;
- }
- rb_link_node(&est->node, parent, p);
- rb_insert_color(&est->node, &est_root);
-}
-
-static
-struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats,
- const struct gnet_stats_rate_est64 *rate_est)
-{
- struct rb_node *p = est_root.rb_node;
+ write_seqcount_begin(&est->seq);
+ est->avbps += brate;
+ est->avpps += rate;
+ write_seqcount_end(&est->seq);
- while (p) {
- struct gen_estimator *e;
+ est->last_bytes = b.bytes;
+ est->last_packets = b.packets;
- e = rb_entry(p, struct gen_estimator, node);
+ est->next_jiffies += ((HZ/4) << est->intvl_log);
- if (bstats > e->bstats)
- p = p->rb_right;
- else if (bstats < e->bstats || rate_est != e->rate_est)
- p = p->rb_left;
- else
- return e;
+ if (unlikely(time_after_eq(jiffies, est->next_jiffies))) {
+ /* Ouch... timer was delayed. */
+ est->next_jiffies = jiffies + 1;
}
- return NULL;
+ mod_timer(&est->timer, est->next_jiffies);
}
/**
@@ -217,84 +126,76 @@ struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats
*/
int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu_bstats,
- struct gnet_stats_rate_est64 *rate_est,
+ struct net_rate_estimator __rcu **rate_est,
spinlock_t *stats_lock,
seqcount_t *running,
struct nlattr *opt)
{
- struct gen_estimator *est;
struct gnet_estimator *parm = nla_data(opt);
- struct gnet_stats_basic_packed b = {0};
- int idx;
+ struct net_rate_estimator *old, *est;
+ struct gnet_stats_basic_packed b;
+ int intvl_log;
if (nla_len(opt) < sizeof(*parm))
return -EINVAL;
+ /* allowed timer periods are :
+ * -2 : 250ms, -1 : 500ms, 0 : 1 sec
+ * 1 : 2 sec, 2 : 4 sec, 3 : 8 sec
+ */
if (parm->interval < -2 || parm->interval > 3)
return -EINVAL;
est = kzalloc(sizeof(*est), GFP_KERNEL);
- if (est == NULL)
+ if (!est)
return -ENOBUFS;
- __gnet_stats_copy_basic(running, &b, cpu_bstats, bstats);
-
- idx = parm->interval + 2;
+ seqcount_init(&est->seq);
+ intvl_log = parm->interval + 2;
est->bstats = bstats;
- est->rate_est = rate_est;
est->stats_lock = stats_lock;
est->running = running;
est->ewma_log = parm->ewma_log;
- est->last_bytes = b.bytes;
- est->avbps = rate_est->bps<<5;
- est->last_packets = b.packets;
- est->avpps = rate_est->pps<<10;
+ est->intvl_log = intvl_log;
est->cpu_bstats = cpu_bstats;
- spin_lock_bh(&est_tree_lock);
- if (!elist[idx].timer.function) {
- INIT_LIST_HEAD(&elist[idx].list);
- setup_timer(&elist[idx].timer, est_timer, idx);
+ est_fetch_counters(est, &b);
+ est->last_bytes = b.bytes;
+ est->last_packets = b.packets;
+ old = rcu_dereference_protected(*rate_est, 1);
+ if (old) {
+ del_timer_sync(&old->timer);
+ est->avbps = old->avbps;
+ est->avpps = old->avpps;
}
- if (list_empty(&elist[idx].list)) {
- elist[idx].next_jiffies = jiffies + ((HZ/4) << idx);
- mod_timer(&elist[idx].timer, elist[idx].next_jiffies);
- }
- list_add_rcu(&est->list, &elist[idx].list);
- gen_add_node(est);
- spin_unlock_bh(&est_tree_lock);
+ est->next_jiffies = jiffies + ((HZ/4) << intvl_log);
+ setup_timer(&est->timer, est_timer, (unsigned long)est);
+ mod_timer(&est->timer, est->next_jiffies);
+ rcu_assign_pointer(*rate_est, est);
+ if (old)
+ kfree_rcu(old, rcu);
return 0;
}
EXPORT_SYMBOL(gen_new_estimator);
/**
* gen_kill_estimator - remove a rate estimator
- * @bstats: basic statistics
- * @rate_est: rate estimator statistics
+ * @rate_est: rate estimator
*
- * Removes the rate estimator specified by &bstats and &rate_est.
+ * Removes the rate estimator.
*
- * Note : Caller should respect an RCU grace period before freeing stats_lock
*/
-void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
- struct gnet_stats_rate_est64 *rate_est)
+void gen_kill_estimator(struct net_rate_estimator __rcu **rate_est)
{
- struct gen_estimator *e;
-
- spin_lock_bh(&est_tree_lock);
- while ((e = gen_find_node(bstats, rate_est))) {
- rb_erase(&e->node, &est_root);
+ struct net_rate_estimator *est;
- write_lock(&est_lock);
- e->bstats = NULL;
- write_unlock(&est_lock);
-
- list_del_rcu(&e->list);
- kfree_rcu(e, e_rcu);
+ est = xchg((__force struct net_rate_estimator **)rate_est, NULL);
+ if (est) {
+ del_timer_sync(&est->timer);
+ kfree_rcu(est, rcu);
}
- spin_unlock_bh(&est_tree_lock);
}
EXPORT_SYMBOL(gen_kill_estimator);
@@ -314,33 +215,47 @@ EXPORT_SYMBOL(gen_kill_estimator);
*/
int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu_bstats,
- struct gnet_stats_rate_est64 *rate_est,
+ struct net_rate_estimator __rcu **rate_est,
spinlock_t *stats_lock,
seqcount_t *running, struct nlattr *opt)
{
- gen_kill_estimator(bstats, rate_est);
- return gen_new_estimator(bstats, cpu_bstats, rate_est, stats_lock, running, opt);
+ return gen_new_estimator(bstats, cpu_bstats, rate_est,
+ stats_lock, running, opt);
}
EXPORT_SYMBOL(gen_replace_estimator);
/**
* gen_estimator_active - test if estimator is currently in use
- * @bstats: basic statistics
- * @rate_est: rate estimator statistics
+ * @rate_est: rate estimator
*
* Returns true if estimator is active, and false if not.
*/
-bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
- const struct gnet_stats_rate_est64 *rate_est)
+bool gen_estimator_active(struct net_rate_estimator __rcu **rate_est)
{
- bool res;
+ return !!rcu_access_pointer(*rate_est);
+}
+EXPORT_SYMBOL(gen_estimator_active);
- ASSERT_RTNL();
+bool gen_estimator_read(struct net_rate_estimator __rcu **rate_est,
+ struct gnet_stats_rate_est64 *sample)
+{
+ struct net_rate_estimator *est;
+ unsigned seq;
+
+ rcu_read_lock();
+ est = rcu_dereference(*rate_est);
+ if (!est) {
+ rcu_read_unlock();
+ return false;
+ }
- spin_lock_bh(&est_tree_lock);
- res = gen_find_node(bstats, rate_est) != NULL;
- spin_unlock_bh(&est_tree_lock);
+ do {
+ seq = read_seqcount_begin(&est->seq);
+ sample->bps = est->avbps >> 8;
+ sample->pps = est->avpps >> 8;
+ } while (read_seqcount_retry(&est->seq, seq));
- return res;
+ rcu_read_unlock();
+ return true;
}
-EXPORT_SYMBOL(gen_estimator_active);
+EXPORT_SYMBOL(gen_estimator_read);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 508e051304fb62627e61b5065b2325edd1b84f2e..55fd980568d855d9558afcdcf37d3d316ae20acc 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -205,18 +205,17 @@ EXPORT_SYMBOL(gnet_stats_copy_basic);
*/
int
gnet_stats_copy_rate_est(struct gnet_dump *d,
- const struct gnet_stats_basic_packed *b,
- struct gnet_stats_rate_est64 *r)
+ struct net_rate_estimator __rcu **ptr)
{
+ struct gnet_stats_rate_est64 sample;
struct gnet_stats_rate_est est;
int res;
- if (b && !gen_estimator_active(b, r))
+ if (!gen_estimator_read(ptr, &sample))
return 0;
-
- est.bps = min_t(u64, UINT_MAX, r->bps);
+ est.bps = min_t(u64, UINT_MAX, sample.bps);
/* we have some time before reaching 2^32 packets per second */
- est.pps = r->pps;
+ est.pps = sample.pps;
if (d->compat_tc_stats) {
d->tc_stats.bps = est.bps;
@@ -226,11 +225,11 @@ gnet_stats_copy_rate_est(struct gnet_dump *d,
if (d->tail) {
res = gnet_stats_copy(d, TCA_STATS_RATE_EST, &est, sizeof(est),
TCA_STATS_PAD);
- if (res < 0 || est.bps == r->bps)
+ if (res < 0 || est.bps == sample.bps)
return res;
/* emit 64bit stats only if needed */
- return gnet_stats_copy(d, TCA_STATS_RATE_EST64, r, sizeof(*r),
- TCA_STATS_PAD);
+ return gnet_stats_copy(d, TCA_STATS_RATE_EST64, &sample,
+ sizeof(sample), TCA_STATS_PAD);
}
return 0;
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index dbd6c4a12b97435d2937c92fd79a035fd5828965..91a373a3f534de8d8641341c33c08d8fc49cbd29 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -63,7 +63,7 @@ void xt_rateest_put(struct xt_rateest *est)
mutex_lock(&xt_rateest_mutex);
if (--est->refcnt == 0) {
hlist_del(&est->list);
- gen_kill_estimator(&est->bstats, &est->rstats);
+ gen_kill_estimator(&est->rate_est);
/*
* gen_estimator est_timer() might access est->lock or bstats,
* wait a RCU grace period before freeing 'est'
@@ -132,7 +132,7 @@ static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par)
cfg.est.interval = info->interval;
cfg.est.ewma_log = info->ewma_log;
- ret = gen_new_estimator(&est->bstats, NULL, &est->rstats,
+ ret = gen_new_estimator(&est->bstats, NULL, &est->rate_est,
&est->lock, NULL, &cfg.opt);
if (ret < 0)
goto err2;
diff --git a/net/netfilter/xt_rateest.c b/net/netfilter/xt_rateest.c
index 7720b036d76a84c949080c8c32827b604c80da64..1db02f6fca54d7eb5d60c2d8c5cb8ab11656fbcb 100644
--- a/net/netfilter/xt_rateest.c
+++ b/net/netfilter/xt_rateest.c
@@ -18,35 +18,33 @@ static bool
xt_rateest_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
const struct xt_rateest_match_info *info = par->matchinfo;
- struct gnet_stats_rate_est64 *r;
+ struct gnet_stats_rate_est64 sample = {0};
u_int32_t bps1, bps2, pps1, pps2;
bool ret = true;
- spin_lock_bh(&info->est1->lock);
- r = &info->est1->rstats;
+ gen_estimator_read(&info->est1->rate_est, &sample);
+
if (info->flags & XT_RATEEST_MATCH_DELTA) {
- bps1 = info->bps1 >= r->bps ? info->bps1 - r->bps : 0;
- pps1 = info->pps1 >= r->pps ? info->pps1 - r->pps : 0;
+ bps1 = info->bps1 >= sample.bps ? info->bps1 - sample.bps : 0;
+ pps1 = info->pps1 >= sample.pps ? info->pps1 - sample.pps : 0;
} else {
- bps1 = r->bps;
- pps1 = r->pps;
+ bps1 = sample.bps;
+ pps1 = sample.pps;
}
- spin_unlock_bh(&info->est1->lock);
if (info->flags & XT_RATEEST_MATCH_ABS) {
bps2 = info->bps2;
pps2 = info->pps2;
} else {
- spin_lock_bh(&info->est2->lock);
- r = &info->est2->rstats;
+ gen_estimator_read(&info->est2->rate_est, &sample);
+
if (info->flags & XT_RATEEST_MATCH_DELTA) {
- bps2 = info->bps2 >= r->bps ? info->bps2 - r->bps : 0;
- pps2 = info->pps2 >= r->pps ? info->pps2 - r->pps : 0;
+ bps2 = info->bps2 >= sample.bps ? info->bps2 - sample.bps : 0;
+ pps2 = info->pps2 >= sample.pps ? info->pps2 - sample.pps : 0;
} else {
- bps2 = r->bps;
- pps2 = r->pps;
+ bps2 = sample.bps;
+ pps2 = sample.pps;
}
- spin_unlock_bh(&info->est2->lock);
}
switch (info->mode) {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index f893d180da1caa3b6dd1cc8773920beb1885f9b0..2095c83ce7730d49550103a8efad943d28815617 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -41,8 +41,7 @@ static void tcf_hash_destroy(struct tcf_hashinfo *hinfo, struct tc_action *p)
spin_lock_bh(&hinfo->lock);
hlist_del(&p->tcfa_head);
spin_unlock_bh(&hinfo->lock);
- gen_kill_estimator(&p->tcfa_bstats,
- &p->tcfa_rate_est);
+ gen_kill_estimator(&p->tcfa_rate_est);
/*
* gen_estimator est_timer() might access p->tcfa_lock
* or bstats, wait a RCU grace period before freeing p
@@ -237,8 +236,7 @@ EXPORT_SYMBOL(tcf_hash_check);
void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est)
{
if (est)
- gen_kill_estimator(&a->tcfa_bstats,
- &a->tcfa_rate_est);
+ gen_kill_estimator(&a->tcfa_rate_est);
call_rcu(&a->tcfa_rcu, free_tcf);
}
EXPORT_SYMBOL(tcf_hash_cleanup);
@@ -670,8 +668,7 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *p,
goto errout;
if (gnet_stats_copy_basic(NULL, &d, p->cpu_bstats, &p->tcfa_bstats) < 0 ||
- gnet_stats_copy_rate_est(&d, &p->tcfa_bstats,
- &p->tcfa_rate_est) < 0 ||
+ gnet_stats_copy_rate_est(&d, &p->tcfa_rate_est) < 0 ||
gnet_stats_copy_queue(&d, p->cpu_qstats,
&p->tcfa_qstats,
p->tcfa_qstats.qlen) < 0)
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index c990b73a6c85151862c30971bd3787d20dbcecd1..0ba91d1ce99480c4817684dbe0b4fde61811e03e 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -142,8 +142,7 @@ static int tcf_act_police_init(struct net *net, struct nlattr *nla,
goto failure_unlock;
} else if (tb[TCA_POLICE_AVRATE] &&
(ret == ACT_P_CREATED ||
- !gen_estimator_active(&police->tcf_bstats,
- &police->tcf_rate_est))) {
+ !gen_estimator_active(&police->tcf_rate_est))) {
err = -EINVAL;
goto failure_unlock;
}
@@ -216,13 +215,17 @@ static int tcf_act_police(struct sk_buff *skb, const struct tc_action *a,
bstats_update(&police->tcf_bstats, skb);
tcf_lastuse_update(&police->tcf_tm);
- if (police->tcfp_ewma_rate &&
- police->tcf_rate_est.bps >= police->tcfp_ewma_rate) {
- police->tcf_qstats.overlimits++;
- if (police->tcf_action == TC_ACT_SHOT)
- police->tcf_qstats.drops++;
- spin_unlock(&police->tcf_lock);
- return police->tcf_action;
+ if (police->tcfp_ewma_rate) {
+ struct gnet_stats_rate_est64 sample;
+
+ if (!gen_estimator_read(&police->tcf_rate_est, &sample) ||
+ sample.bps >= police->tcfp_ewma_rate) {
+ police->tcf_qstats.overlimits++;
+ if (police->tcf_action == TC_ACT_SHOT)
+ police->tcf_qstats.drops++;
+ spin_unlock(&police->tcf_lock);
+ return police->tcf_action;
+ }
}
if (qdisc_pkt_len(skb) <= police->tcfp_mtu) {
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f337f1bdd1d4a4cac738f9fe1fbd42e5984174fe..d7b93429f0cca61ff489536b4247f31f3690fdd8 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1395,7 +1395,7 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(q),
&d, cpu_bstats, &q->bstats) < 0 ||
- gnet_stats_copy_rate_est(&d, &q->bstats, &q->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(&d, &q->rate_est) < 0 ||
gnet_stats_copy_queue(&d, cpu_qstats, &q->qstats, qlen) < 0)
goto nla_put_failure;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index beb554aa8cfbb4acb5d89511b5e0030b4c9cf26e..9ffe1c220b02848032474fa7b9b2c2f09d40b110 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -122,7 +122,7 @@ struct cbq_class {
psched_time_t penalized;
struct gnet_stats_basic_packed bstats;
struct gnet_stats_queue qstats;
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
struct tc_cbq_xstats xstats;
struct tcf_proto __rcu *filter_list;
@@ -1346,7 +1346,7 @@ cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
d, NULL, &cl->bstats) < 0 ||
- gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
gnet_stats_copy_queue(d, NULL, &cl->qstats, cl->q->q.qlen) < 0)
return -1;
@@ -1405,7 +1405,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl)
tcf_destroy_chain(&cl->filter_list);
qdisc_destroy(cl->q);
qdisc_put_rtab(cl->R_tab);
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
if (cl != &q->link)
kfree(cl);
}
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 8af5c59eef848db47cc33a878296693dabb44955..bb4cbdf7500482b170eef6e7923cf2f2259e52b5 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -25,7 +25,7 @@ struct drr_class {
struct gnet_stats_basic_packed bstats;
struct gnet_stats_queue qstats;
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
struct list_head alist;
struct Qdisc *qdisc;
@@ -142,7 +142,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
static void drr_destroy_class(struct Qdisc *sch, struct drr_class *cl)
{
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
qdisc_destroy(cl->qdisc);
kfree(cl);
}
@@ -283,7 +283,7 @@ static int drr_dump_class_stats(struct Qdisc *sch, unsigned long arg,
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
d, NULL, &cl->bstats) < 0 ||
- gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
gnet_stats_copy_queue(d, NULL, &cl->qdisc->qstats, qlen) < 0)
return -1;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6cfb6e9038c28187cc888cebf004e61270f00871..6eb9c8e88519a36fc3ef82be7c7f1dbe0ef1e13c 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -709,7 +709,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
qdisc_put_stab(rtnl_dereference(qdisc->stab));
#endif
- gen_kill_estimator(&qdisc->bstats, &qdisc->rate_est);
+ gen_kill_estimator(&qdisc->rate_est);
if (ops->reset)
ops->reset(qdisc);
if (ops->destroy)
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 000f1d36128e1abfc0657bce779a7df852f66384..3ffaa6fb0990f0aa31487a2f1829b1f1accf8b21 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -114,7 +114,7 @@ struct hfsc_class {
struct gnet_stats_basic_packed bstats;
struct gnet_stats_queue qstats;
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
struct tcf_proto __rcu *filter_list; /* filter list */
unsigned int filter_cnt; /* filter count */
unsigned int level; /* class level in hierarchy */
@@ -1091,7 +1091,7 @@ hfsc_destroy_class(struct Qdisc *sch, struct hfsc_class *cl)
tcf_destroy_chain(&cl->filter_list);
qdisc_destroy(cl->qdisc);
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
if (cl != &q->root)
kfree(cl);
}
@@ -1348,7 +1348,7 @@ hfsc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
xstats.rtwork = cl->cl_cumul;
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), d, NULL, &cl->bstats) < 0 ||
- gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
gnet_stats_copy_queue(d, NULL, &cl->qstats, cl->qdisc->q.qlen) < 0)
return -1;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 9926fe4f3b6f7db9aeba22fbbfae3d47c4e2af60..760f39e7caeeb91b6c34d561f64f1ed97c884a32 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -111,7 +111,7 @@ struct htb_class {
unsigned int children;
struct htb_class *parent; /* parent class */
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
/*
* Written often fields
@@ -1145,7 +1145,7 @@ htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d)
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
d, NULL, &cl->bstats) < 0 ||
- gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
gnet_stats_copy_queue(d, NULL, &qs, qlen) < 0)
return -1;
@@ -1228,7 +1228,7 @@ static void htb_destroy_class(struct Qdisc *sch, struct htb_class *cl)
WARN_ON(!cl->un.leaf.q);
qdisc_destroy(cl->un.leaf.q);
}
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
tcf_destroy_chain(&cl->filter_list);
kfree(cl);
}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index ca0516e6f74356f47dffcc2262f233ee50f0c47c..f9e712ce2d15ce9280c31d2f75d62b84034ae51d 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -137,7 +137,7 @@ struct qfq_class {
struct gnet_stats_basic_packed bstats;
struct gnet_stats_queue qstats;
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
struct Qdisc *qdisc;
struct list_head alist; /* Link for active-classes list. */
struct qfq_aggregate *agg; /* Parent aggregate. */
@@ -508,7 +508,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
new_agg = kzalloc(sizeof(*new_agg), GFP_KERNEL);
if (new_agg == NULL) {
err = -ENOBUFS;
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
goto destroy_class;
}
sch_tree_lock(sch);
@@ -533,7 +533,7 @@ static void qfq_destroy_class(struct Qdisc *sch, struct qfq_class *cl)
struct qfq_sched *q = qdisc_priv(sch);
qfq_rm_from_agg(q, cl);
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
qdisc_destroy(cl->qdisc);
kfree(cl);
}
@@ -667,7 +667,7 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
d, NULL, &cl->bstats) < 0 ||
- gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
gnet_stats_copy_queue(d, NULL,
&cl->qdisc->qstats, cl->qdisc->q.qlen) < 0)
return -1;
^ permalink raw reply related
* [PATCN net-next] net_sched: gen_estimator: complete rewrite of rate estimators
From: Eric Dumazet @ 2016-12-04 7:07 UTC (permalink / raw)
To: David Miller; +Cc: netdev, netfilter-devel
From: Eric Dumazet <edumazet@google.com>
1) Old code was hard to maintain, due to complex lock chains.
(We probably will be able to remove some kfree_rcu() in callers)
2) Using a single timer to update all estimators does not scale.
3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
is not supposed to work well)
In this rewrite :
- I removed the RB tree that had to be scanned in
gen_estimator_active(). qdisc dumps should be much faster.
- Each estimator has its own timer.
- Estimations are maintained in net_rate_estimator structure,
instead of dirtying the qdisc. Minor, but part of the simplification.
- Reading the estimator uses RCU and a seqcount to provide proper
support for 32bit kernels.
- We reduce memory need when estimators are not used, since
we store a pointer, instead of the bytes/packets counters.
- xt_rateest_mt() no longer has to grab a spinlock.
(In the future, xt_rateest_tg() could be switched to per cpu counters)
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/act_api.h | 2
include/net/gen_stats.h | 17 -
include/net/netfilter/xt_rateest.h | 10
include/net/sch_generic.h | 2
net/core/gen_estimator.c | 298 +++++++++------------------
net/core/gen_stats.c | 17 -
net/ipv4/tcp_output.c | 8
net/netfilter/xt_RATEEST.c | 4
net/netfilter/xt_rateest.c | 28 +-
net/sched/act_api.c | 9
net/sched/act_police.c | 21 +
net/sched/sch_api.c | 2
net/sched/sch_cbq.c | 6
net/sched/sch_drr.c | 6
net/sched/sch_generic.c | 2
net/sched/sch_hfsc.c | 6
net/sched/sch_htb.c | 6
net/sched/sch_qfq.c | 8
18 files changed, 188 insertions(+), 264 deletions(-)
diff --git a/include/net/act_api.h b/include/net/act_api.h
index 9dddf77a69ccbcb003cfa66bcc0de337f78f3dae..1d716449209e4753a297c61a287077a1eb96e6d8 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -36,7 +36,7 @@ struct tc_action {
struct tcf_t tcfa_tm;
struct gnet_stats_basic_packed tcfa_bstats;
struct gnet_stats_queue tcfa_qstats;
- struct gnet_stats_rate_est64 tcfa_rate_est;
+ struct net_rate_estimator __rcu *tcfa_rate_est;
spinlock_t tcfa_lock;
struct rcu_head tcfa_rcu;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 231e121cc7d9c72075e7e6dde3655d631f64a1c4..8b7aa370e7a4af61fcb71ed751dba72ebead6143 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -11,6 +11,8 @@ struct gnet_stats_basic_cpu {
struct u64_stats_sync syncp;
};
+struct net_rate_estimator;
+
struct gnet_dump {
spinlock_t * lock;
struct sk_buff * skb;
@@ -42,8 +44,7 @@ void __gnet_stats_copy_basic(const seqcount_t *running,
struct gnet_stats_basic_cpu __percpu *cpu,
struct gnet_stats_basic_packed *b);
int gnet_stats_copy_rate_est(struct gnet_dump *d,
- const struct gnet_stats_basic_packed *b,
- struct gnet_stats_rate_est64 *r);
+ struct net_rate_estimator __rcu **ptr);
int gnet_stats_copy_queue(struct gnet_dump *d,
struct gnet_stats_queue __percpu *cpu_q,
struct gnet_stats_queue *q, __u32 qlen);
@@ -53,16 +54,16 @@ int gnet_stats_finish_copy(struct gnet_dump *d);
int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu_bstats,
- struct gnet_stats_rate_est64 *rate_est,
+ struct net_rate_estimator __rcu **rate_est,
spinlock_t *stats_lock,
seqcount_t *running, struct nlattr *opt);
-void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
- struct gnet_stats_rate_est64 *rate_est);
+void gen_kill_estimator(struct net_rate_estimator __rcu **ptr);
int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu_bstats,
- struct gnet_stats_rate_est64 *rate_est,
+ struct net_rate_estimator __rcu **ptr,
spinlock_t *stats_lock,
seqcount_t *running, struct nlattr *opt);
-bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
- const struct gnet_stats_rate_est64 *rate_est);
+bool gen_estimator_active(struct net_rate_estimator __rcu **ptr);
+bool gen_estimator_read(struct net_rate_estimator __rcu **ptr,
+ struct gnet_stats_rate_est64 *sample);
#endif
diff --git a/include/net/netfilter/xt_rateest.h b/include/net/netfilter/xt_rateest.h
index 79f45e19f31eaa54deb8437ef4b883bec78def84..130e58361f998ecdf361910b196e69778224d955 100644
--- a/include/net/netfilter/xt_rateest.h
+++ b/include/net/netfilter/xt_rateest.h
@@ -1,19 +1,23 @@
#ifndef _XT_RATEEST_H
#define _XT_RATEEST_H
+#include <net/gen_stats.h>
+
struct xt_rateest {
/* keep lock and bstats on same cache line to speedup xt_rateest_tg() */
struct gnet_stats_basic_packed bstats;
spinlock_t lock;
- /* keep rstats and lock on same cache line to speedup xt_rateest_mt() */
- struct gnet_stats_rate_est64 rstats;
+
/* following fields not accessed in hot path */
+ unsigned int refcnt;
struct hlist_node list;
char name[IFNAMSIZ];
- unsigned int refcnt;
struct gnet_estimator params;
struct rcu_head rcu;
+
+ /* keep this field far away to speedup xt_rateest_mt() */
+ struct net_rate_estimator __rcu *rate_est;
};
struct xt_rateest *xt_rateest_lookup(const char *name);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e6aa0a249672a802dce583166f4f808154b77a96..498f81b229a4565e140f1a054ce931e80361e8b2 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -76,7 +76,7 @@ struct Qdisc {
struct netdev_queue *dev_queue;
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
struct gnet_stats_queue __percpu *cpu_qstats;
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 0993844faeeaefb221ea619fd9d796600b82fbb9..d9cc2e0c1b8f18b417c91dd2f57e646b7c602a63 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -7,6 +7,7 @@
* 2 of the License, or (at your option) any later version.
*
* Authors: Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
+ * Eric Dumazet <edumazet@google.com>
*
* Changes:
* Jamal Hadi Salim - moved it to net/core and reshulfed
@@ -30,171 +31,79 @@
#include <linux/skbuff.h>
#include <linux/rtnetlink.h>
#include <linux/init.h>
-#include <linux/rbtree.h>
#include <linux/slab.h>
+#include <linux/seqlock.h>
#include <net/sock.h>
#include <net/gen_stats.h>
-/*
- This code is NOT intended to be used for statistics collection,
- its purpose is to provide a base for statistical multiplexing
- for controlled load service.
- If you need only statistics, run a user level daemon which
- periodically reads byte counters.
-
- Unfortunately, rate estimation is not a very easy task.
- F.e. I did not find a simple way to estimate the current peak rate
- and even failed to formulate the problem 8)8)
-
- So I preferred not to built an estimator into the scheduler,
- but run this task separately.
- Ideally, it should be kernel thread(s), but for now it runs
- from timers, which puts apparent top bounds on the number of rated
- flows, has minimal overhead on small, but is enough
- to handle controlled load service, sets of aggregates.
-
- We measure rate over A=(1<<interval) seconds and evaluate EWMA:
-
- avrate = avrate*(1-W) + rate*W
-
- where W is chosen as negative power of 2: W = 2^(-ewma_log)
-
- The resulting time constant is:
-
- T = A/(-ln(1-W))
-
-
- NOTES.
-
- * avbps and avpps are scaled by 2^5.
- * both values are reported as 32 bit unsigned values. bps can
- overflow for fast links : max speed being 34360Mbit/sec
- * Minimal interval is HZ/4=250msec (it is the greatest common divisor
- for HZ=100 and HZ=1024 8)), maximal interval
- is (HZ*2^EST_MAX_INTERVAL)/4 = 8sec. Shorter intervals
- are too expensive, longer ones can be implemented
- at user level painlessly.
+/* This code is NOT intended to be used for statistics collection,
+ * its purpose is to provide a base for statistical multiplexing
+ * for controlled load service.
+ * If you need only statistics, run a user level daemon which
+ * periodically reads byte counters.
*/
-#define EST_MAX_INTERVAL 5
-
-struct gen_estimator {
- struct list_head list;
+struct net_rate_estimator {
struct gnet_stats_basic_packed *bstats;
- struct gnet_stats_rate_est64 *rate_est;
spinlock_t *stats_lock;
seqcount_t *running;
- int ewma_log;
+ struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+ u8 ewma_log;
+ u8 intvl_log; /* period : (250ms << intvl_log) */
+
+ seqcount_t seq;
u32 last_packets;
- unsigned long avpps;
u64 last_bytes;
+
+ u64 avpps;
u64 avbps;
- struct rcu_head e_rcu;
- struct rb_node node;
- struct gnet_stats_basic_cpu __percpu *cpu_bstats;
- struct rcu_head head;
-};
-struct gen_estimator_head {
- unsigned long next_jiffies;
- struct timer_list timer;
- struct list_head list;
+ unsigned long next_jiffies;
+ struct timer_list timer;
+ struct rcu_head rcu;
};
-static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
-
-/* Protects against NULL dereference */
-static DEFINE_RWLOCK(est_lock);
-
-/* Protects against soft lockup during large deletion */
-static struct rb_root est_root = RB_ROOT;
-static DEFINE_SPINLOCK(est_tree_lock);
-
-static void est_timer(unsigned long arg)
+static void est_fetch_counters(struct net_rate_estimator *e,
+ struct gnet_stats_basic_packed *b)
{
- int idx = (int)arg;
- struct gen_estimator *e;
+ if (e->stats_lock)
+ spin_lock(e->stats_lock);
- rcu_read_lock();
- list_for_each_entry_rcu(e, &elist[idx].list, list) {
- struct gnet_stats_basic_packed b = {0};
- unsigned long rate;
- u64 brate;
-
- if (e->stats_lock)
- spin_lock(e->stats_lock);
- read_lock(&est_lock);
- if (e->bstats == NULL)
- goto skip;
-
- __gnet_stats_copy_basic(e->running, &b, e->cpu_bstats, e->bstats);
-
- brate = (b.bytes - e->last_bytes)<<(7 - idx);
- e->last_bytes = b.bytes;
- e->avbps += (brate >> e->ewma_log) - (e->avbps >> e->ewma_log);
- WRITE_ONCE(e->rate_est->bps, (e->avbps + 0xF) >> 5);
-
- rate = b.packets - e->last_packets;
- rate <<= (7 - idx);
- e->last_packets = b.packets;
- e->avpps += (rate >> e->ewma_log) - (e->avpps >> e->ewma_log);
- WRITE_ONCE(e->rate_est->pps, (e->avpps + 0xF) >> 5);
-skip:
- read_unlock(&est_lock);
- if (e->stats_lock)
- spin_unlock(e->stats_lock);
- }
+ __gnet_stats_copy_basic(e->running, b, e->cpu_bstats, e->bstats);
- if (!list_empty(&elist[idx].list)) {
- elist[idx].next_jiffies += ((HZ/4) << idx);
+ if (e->stats_lock)
+ spin_unlock(e->stats_lock);
- if (unlikely(time_after_eq(jiffies, elist[idx].next_jiffies))) {
- /* Ouch... timer was delayed. */
- elist[idx].next_jiffies = jiffies + 1;
- }
- mod_timer(&elist[idx].timer, elist[idx].next_jiffies);
- }
- rcu_read_unlock();
}
-static void gen_add_node(struct gen_estimator *est)
+static void est_timer(unsigned long arg)
{
- struct rb_node **p = &est_root.rb_node, *parent = NULL;
+ struct net_rate_estimator *est = (struct net_rate_estimator *)arg;
+ struct gnet_stats_basic_packed b;
+ u64 rate, brate;
- while (*p) {
- struct gen_estimator *e;
+ est_fetch_counters(est, &b);
+ brate = (b.bytes - est->last_bytes) << (8 - est->ewma_log);
+ brate -= (est->avbps >> est->ewma_log);
- parent = *p;
- e = rb_entry(parent, struct gen_estimator, node);
+ rate = (u64)(b.packets - est->last_packets) << (8 - est->ewma_log);
+ rate -= (est->avpps >> est->ewma_log);
- if (est->bstats > e->bstats)
- p = &parent->rb_right;
- else
- p = &parent->rb_left;
- }
- rb_link_node(&est->node, parent, p);
- rb_insert_color(&est->node, &est_root);
-}
-
-static
-struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats,
- const struct gnet_stats_rate_est64 *rate_est)
-{
- struct rb_node *p = est_root.rb_node;
+ write_seqcount_begin(&est->seq);
+ est->avbps += brate;
+ est->avpps += rate;
+ write_seqcount_end(&est->seq);
- while (p) {
- struct gen_estimator *e;
+ est->last_bytes = b.bytes;
+ est->last_packets = b.packets;
- e = rb_entry(p, struct gen_estimator, node);
+ est->next_jiffies += ((HZ/4) << est->intvl_log);
- if (bstats > e->bstats)
- p = p->rb_right;
- else if (bstats < e->bstats || rate_est != e->rate_est)
- p = p->rb_left;
- else
- return e;
+ if (unlikely(time_after_eq(jiffies, est->next_jiffies))) {
+ /* Ouch... timer was delayed. */
+ est->next_jiffies = jiffies + 1;
}
- return NULL;
+ mod_timer(&est->timer, est->next_jiffies);
}
/**
@@ -217,84 +126,76 @@ struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats
*/
int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu_bstats,
- struct gnet_stats_rate_est64 *rate_est,
+ struct net_rate_estimator __rcu **rate_est,
spinlock_t *stats_lock,
seqcount_t *running,
struct nlattr *opt)
{
- struct gen_estimator *est;
struct gnet_estimator *parm = nla_data(opt);
- struct gnet_stats_basic_packed b = {0};
- int idx;
+ struct net_rate_estimator *old, *est;
+ struct gnet_stats_basic_packed b;
+ int intvl_log;
if (nla_len(opt) < sizeof(*parm))
return -EINVAL;
+ /* allowed timer periods are :
+ * -2 : 250ms, -1 : 500ms, 0 : 1 sec
+ * 1 : 2 sec, 2 : 4 sec, 3 : 8 sec
+ */
if (parm->interval < -2 || parm->interval > 3)
return -EINVAL;
est = kzalloc(sizeof(*est), GFP_KERNEL);
- if (est == NULL)
+ if (!est)
return -ENOBUFS;
- __gnet_stats_copy_basic(running, &b, cpu_bstats, bstats);
-
- idx = parm->interval + 2;
+ seqcount_init(&est->seq);
+ intvl_log = parm->interval + 2;
est->bstats = bstats;
- est->rate_est = rate_est;
est->stats_lock = stats_lock;
est->running = running;
est->ewma_log = parm->ewma_log;
- est->last_bytes = b.bytes;
- est->avbps = rate_est->bps<<5;
- est->last_packets = b.packets;
- est->avpps = rate_est->pps<<10;
+ est->intvl_log = intvl_log;
est->cpu_bstats = cpu_bstats;
- spin_lock_bh(&est_tree_lock);
- if (!elist[idx].timer.function) {
- INIT_LIST_HEAD(&elist[idx].list);
- setup_timer(&elist[idx].timer, est_timer, idx);
+ est_fetch_counters(est, &b);
+ est->last_bytes = b.bytes;
+ est->last_packets = b.packets;
+ old = rcu_dereference_protected(*rate_est, 1);
+ if (old) {
+ del_timer_sync(&old->timer);
+ est->avbps = old->avbps;
+ est->avpps = old->avpps;
}
- if (list_empty(&elist[idx].list)) {
- elist[idx].next_jiffies = jiffies + ((HZ/4) << idx);
- mod_timer(&elist[idx].timer, elist[idx].next_jiffies);
- }
- list_add_rcu(&est->list, &elist[idx].list);
- gen_add_node(est);
- spin_unlock_bh(&est_tree_lock);
+ est->next_jiffies = jiffies + ((HZ/4) << intvl_log);
+ setup_timer(&est->timer, est_timer, (unsigned long)est);
+ mod_timer(&est->timer, est->next_jiffies);
+ rcu_assign_pointer(*rate_est, est);
+ if (old)
+ kfree_rcu(old, rcu);
return 0;
}
EXPORT_SYMBOL(gen_new_estimator);
/**
* gen_kill_estimator - remove a rate estimator
- * @bstats: basic statistics
- * @rate_est: rate estimator statistics
+ * @est: rate estimator
*
- * Removes the rate estimator specified by &bstats and &rate_est.
+ * Removes the rate estimator.
*
- * Note : Caller should respect an RCU grace period before freeing stats_lock
*/
-void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
- struct gnet_stats_rate_est64 *rate_est)
+void gen_kill_estimator(struct net_rate_estimator __rcu **ptr)
{
- struct gen_estimator *e;
-
- spin_lock_bh(&est_tree_lock);
- while ((e = gen_find_node(bstats, rate_est))) {
- rb_erase(&e->node, &est_root);
+ struct net_rate_estimator *est;
- write_lock(&est_lock);
- e->bstats = NULL;
- write_unlock(&est_lock);
-
- list_del_rcu(&e->list);
- kfree_rcu(e, e_rcu);
+ est = xchg((__force struct net_rate_estimator **)ptr, NULL);
+ if (est) {
+ del_timer_sync(&est->timer);
+ kfree_rcu(est, rcu);
}
- spin_unlock_bh(&est_tree_lock);
}
EXPORT_SYMBOL(gen_kill_estimator);
@@ -314,33 +215,46 @@ EXPORT_SYMBOL(gen_kill_estimator);
*/
int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu_bstats,
- struct gnet_stats_rate_est64 *rate_est,
+ struct net_rate_estimator __rcu **ptr,
spinlock_t *stats_lock,
seqcount_t *running, struct nlattr *opt)
{
- gen_kill_estimator(bstats, rate_est);
- return gen_new_estimator(bstats, cpu_bstats, rate_est, stats_lock, running, opt);
+ return gen_new_estimator(bstats, cpu_bstats, ptr, stats_lock, running, opt);
}
EXPORT_SYMBOL(gen_replace_estimator);
/**
* gen_estimator_active - test if estimator is currently in use
- * @bstats: basic statistics
- * @rate_est: rate estimator statistics
+ * @rate_est: rate estimator
*
* Returns true if estimator is active, and false if not.
*/
-bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
- const struct gnet_stats_rate_est64 *rate_est)
+bool gen_estimator_active(struct net_rate_estimator __rcu **ptr)
{
- bool res;
+ return !!rcu_access_pointer(*ptr);
+}
+EXPORT_SYMBOL(gen_estimator_active);
- ASSERT_RTNL();
+bool gen_estimator_read(struct net_rate_estimator __rcu **ptr,
+ struct gnet_stats_rate_est64 *sample)
+{
+ struct net_rate_estimator *est;
+ unsigned seq;
+
+ rcu_read_lock();
+ est = rcu_dereference(*ptr);
+ if (!est) {
+ rcu_read_unlock();
+ return false;
+ }
- spin_lock_bh(&est_tree_lock);
- res = gen_find_node(bstats, rate_est) != NULL;
- spin_unlock_bh(&est_tree_lock);
+ do {
+ seq = read_seqcount_begin(&est->seq);
+ sample->bps = est->avbps >> 8;
+ sample->pps = est->avpps >> 8;
+ } while (read_seqcount_retry(&est->seq, seq));
- return res;
+ rcu_read_unlock();
+ return true;
}
-EXPORT_SYMBOL(gen_estimator_active);
+EXPORT_SYMBOL(gen_estimator_read);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 508e051304fb62627e61b5065b2325edd1b84f2e..55fd980568d855d9558afcdcf37d3d316ae20acc 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -205,18 +205,17 @@ EXPORT_SYMBOL(gnet_stats_copy_basic);
*/
int
gnet_stats_copy_rate_est(struct gnet_dump *d,
- const struct gnet_stats_basic_packed *b,
- struct gnet_stats_rate_est64 *r)
+ struct net_rate_estimator __rcu **ptr)
{
+ struct gnet_stats_rate_est64 sample;
struct gnet_stats_rate_est est;
int res;
- if (b && !gen_estimator_active(b, r))
+ if (!gen_estimator_read(ptr, &sample))
return 0;
-
- est.bps = min_t(u64, UINT_MAX, r->bps);
+ est.bps = min_t(u64, UINT_MAX, sample.bps);
/* we have some time before reaching 2^32 packets per second */
- est.pps = r->pps;
+ est.pps = sample.pps;
if (d->compat_tc_stats) {
d->tc_stats.bps = est.bps;
@@ -226,11 +225,11 @@ gnet_stats_copy_rate_est(struct gnet_dump *d,
if (d->tail) {
res = gnet_stats_copy(d, TCA_STATS_RATE_EST, &est, sizeof(est),
TCA_STATS_PAD);
- if (res < 0 || est.bps == r->bps)
+ if (res < 0 || est.bps == sample.bps)
return res;
/* emit 64bit stats only if needed */
- return gnet_stats_copy(d, TCA_STATS_RATE_EST64, r, sizeof(*r),
- TCA_STATS_PAD);
+ return gnet_stats_copy(d, TCA_STATS_RATE_EST64, &sample,
+ sizeof(sample), TCA_STATS_PAD);
}
return 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c7adcb57654ea57d1ba6702c91743cb7d2c74d28..859b60bfa86712031186fffc09c65bc43aa065dd 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2081,6 +2081,14 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
limit <<= factor;
if (atomic_read(&sk->sk_wmem_alloc) > limit) {
+ /* Special case where TX completion is delayed too much :
+ * If the skb we try to send is the first skb in write queue,
+ * then send it !
+ * No need to wait for TX completion to call us back.
+ */
+ if (skb == sk->sk_write_queue.next)
+ return false;
+
set_bit(TSQ_THROTTLED, &tcp_sk(sk)->tsq_flags);
/* It is possible TX completion already happened
* before we set TSQ_THROTTLED, so we must
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index dbd6c4a12b97435d2937c92fd79a035fd5828965..91a373a3f534de8d8641341c33c08d8fc49cbd29 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -63,7 +63,7 @@ void xt_rateest_put(struct xt_rateest *est)
mutex_lock(&xt_rateest_mutex);
if (--est->refcnt == 0) {
hlist_del(&est->list);
- gen_kill_estimator(&est->bstats, &est->rstats);
+ gen_kill_estimator(&est->rate_est);
/*
* gen_estimator est_timer() might access est->lock or bstats,
* wait a RCU grace period before freeing 'est'
@@ -132,7 +132,7 @@ static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par)
cfg.est.interval = info->interval;
cfg.est.ewma_log = info->ewma_log;
- ret = gen_new_estimator(&est->bstats, NULL, &est->rstats,
+ ret = gen_new_estimator(&est->bstats, NULL, &est->rate_est,
&est->lock, NULL, &cfg.opt);
if (ret < 0)
goto err2;
diff --git a/net/netfilter/xt_rateest.c b/net/netfilter/xt_rateest.c
index 7720b036d76a84c949080c8c32827b604c80da64..1db02f6fca54d7eb5d60c2d8c5cb8ab11656fbcb 100644
--- a/net/netfilter/xt_rateest.c
+++ b/net/netfilter/xt_rateest.c
@@ -18,35 +18,33 @@ static bool
xt_rateest_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
const struct xt_rateest_match_info *info = par->matchinfo;
- struct gnet_stats_rate_est64 *r;
+ struct gnet_stats_rate_est64 sample = {0};
u_int32_t bps1, bps2, pps1, pps2;
bool ret = true;
- spin_lock_bh(&info->est1->lock);
- r = &info->est1->rstats;
+ gen_estimator_read(&info->est1->rate_est, &sample);
+
if (info->flags & XT_RATEEST_MATCH_DELTA) {
- bps1 = info->bps1 >= r->bps ? info->bps1 - r->bps : 0;
- pps1 = info->pps1 >= r->pps ? info->pps1 - r->pps : 0;
+ bps1 = info->bps1 >= sample.bps ? info->bps1 - sample.bps : 0;
+ pps1 = info->pps1 >= sample.pps ? info->pps1 - sample.pps : 0;
} else {
- bps1 = r->bps;
- pps1 = r->pps;
+ bps1 = sample.bps;
+ pps1 = sample.pps;
}
- spin_unlock_bh(&info->est1->lock);
if (info->flags & XT_RATEEST_MATCH_ABS) {
bps2 = info->bps2;
pps2 = info->pps2;
} else {
- spin_lock_bh(&info->est2->lock);
- r = &info->est2->rstats;
+ gen_estimator_read(&info->est2->rate_est, &sample);
+
if (info->flags & XT_RATEEST_MATCH_DELTA) {
- bps2 = info->bps2 >= r->bps ? info->bps2 - r->bps : 0;
- pps2 = info->pps2 >= r->pps ? info->pps2 - r->pps : 0;
+ bps2 = info->bps2 >= sample.bps ? info->bps2 - sample.bps : 0;
+ pps2 = info->pps2 >= sample.pps ? info->pps2 - sample.pps : 0;
} else {
- bps2 = r->bps;
- pps2 = r->pps;
+ bps2 = sample.bps;
+ pps2 = sample.pps;
}
- spin_unlock_bh(&info->est2->lock);
}
switch (info->mode) {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index f893d180da1caa3b6dd1cc8773920beb1885f9b0..2095c83ce7730d49550103a8efad943d28815617 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -41,8 +41,7 @@ static void tcf_hash_destroy(struct tcf_hashinfo *hinfo, struct tc_action *p)
spin_lock_bh(&hinfo->lock);
hlist_del(&p->tcfa_head);
spin_unlock_bh(&hinfo->lock);
- gen_kill_estimator(&p->tcfa_bstats,
- &p->tcfa_rate_est);
+ gen_kill_estimator(&p->tcfa_rate_est);
/*
* gen_estimator est_timer() might access p->tcfa_lock
* or bstats, wait a RCU grace period before freeing p
@@ -237,8 +236,7 @@ EXPORT_SYMBOL(tcf_hash_check);
void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est)
{
if (est)
- gen_kill_estimator(&a->tcfa_bstats,
- &a->tcfa_rate_est);
+ gen_kill_estimator(&a->tcfa_rate_est);
call_rcu(&a->tcfa_rcu, free_tcf);
}
EXPORT_SYMBOL(tcf_hash_cleanup);
@@ -670,8 +668,7 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *p,
goto errout;
if (gnet_stats_copy_basic(NULL, &d, p->cpu_bstats, &p->tcfa_bstats) < 0 ||
- gnet_stats_copy_rate_est(&d, &p->tcfa_bstats,
- &p->tcfa_rate_est) < 0 ||
+ gnet_stats_copy_rate_est(&d, &p->tcfa_rate_est) < 0 ||
gnet_stats_copy_queue(&d, p->cpu_qstats,
&p->tcfa_qstats,
p->tcfa_qstats.qlen) < 0)
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index c990b73a6c85151862c30971bd3787d20dbcecd1..0ba91d1ce99480c4817684dbe0b4fde61811e03e 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -142,8 +142,7 @@ static int tcf_act_police_init(struct net *net, struct nlattr *nla,
goto failure_unlock;
} else if (tb[TCA_POLICE_AVRATE] &&
(ret == ACT_P_CREATED ||
- !gen_estimator_active(&police->tcf_bstats,
- &police->tcf_rate_est))) {
+ !gen_estimator_active(&police->tcf_rate_est))) {
err = -EINVAL;
goto failure_unlock;
}
@@ -216,13 +215,17 @@ static int tcf_act_police(struct sk_buff *skb, const struct tc_action *a,
bstats_update(&police->tcf_bstats, skb);
tcf_lastuse_update(&police->tcf_tm);
- if (police->tcfp_ewma_rate &&
- police->tcf_rate_est.bps >= police->tcfp_ewma_rate) {
- police->tcf_qstats.overlimits++;
- if (police->tcf_action == TC_ACT_SHOT)
- police->tcf_qstats.drops++;
- spin_unlock(&police->tcf_lock);
- return police->tcf_action;
+ if (police->tcfp_ewma_rate) {
+ struct gnet_stats_rate_est64 sample;
+
+ if (!gen_estimator_read(&police->tcf_rate_est, &sample) ||
+ sample.bps >= police->tcfp_ewma_rate) {
+ police->tcf_qstats.overlimits++;
+ if (police->tcf_action == TC_ACT_SHOT)
+ police->tcf_qstats.drops++;
+ spin_unlock(&police->tcf_lock);
+ return police->tcf_action;
+ }
}
if (qdisc_pkt_len(skb) <= police->tcfp_mtu) {
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f337f1bdd1d4a4cac738f9fe1fbd42e5984174fe..d7b93429f0cca61ff489536b4247f31f3690fdd8 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1395,7 +1395,7 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(q),
&d, cpu_bstats, &q->bstats) < 0 ||
- gnet_stats_copy_rate_est(&d, &q->bstats, &q->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(&d, &q->rate_est) < 0 ||
gnet_stats_copy_queue(&d, cpu_qstats, &q->qstats, qlen) < 0)
goto nla_put_failure;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index beb554aa8cfbb4acb5d89511b5e0030b4c9cf26e..9ffe1c220b02848032474fa7b9b2c2f09d40b110 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -122,7 +122,7 @@ struct cbq_class {
psched_time_t penalized;
struct gnet_stats_basic_packed bstats;
struct gnet_stats_queue qstats;
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
struct tc_cbq_xstats xstats;
struct tcf_proto __rcu *filter_list;
@@ -1346,7 +1346,7 @@ cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
d, NULL, &cl->bstats) < 0 ||
- gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
gnet_stats_copy_queue(d, NULL, &cl->qstats, cl->q->q.qlen) < 0)
return -1;
@@ -1405,7 +1405,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl)
tcf_destroy_chain(&cl->filter_list);
qdisc_destroy(cl->q);
qdisc_put_rtab(cl->R_tab);
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
if (cl != &q->link)
kfree(cl);
}
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 8af5c59eef848db47cc33a878296693dabb44955..bb4cbdf7500482b170eef6e7923cf2f2259e52b5 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -25,7 +25,7 @@ struct drr_class {
struct gnet_stats_basic_packed bstats;
struct gnet_stats_queue qstats;
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
struct list_head alist;
struct Qdisc *qdisc;
@@ -142,7 +142,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
static void drr_destroy_class(struct Qdisc *sch, struct drr_class *cl)
{
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
qdisc_destroy(cl->qdisc);
kfree(cl);
}
@@ -283,7 +283,7 @@ static int drr_dump_class_stats(struct Qdisc *sch, unsigned long arg,
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
d, NULL, &cl->bstats) < 0 ||
- gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
gnet_stats_copy_queue(d, NULL, &cl->qdisc->qstats, qlen) < 0)
return -1;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6cfb6e9038c28187cc888cebf004e61270f00871..6eb9c8e88519a36fc3ef82be7c7f1dbe0ef1e13c 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -709,7 +709,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
qdisc_put_stab(rtnl_dereference(qdisc->stab));
#endif
- gen_kill_estimator(&qdisc->bstats, &qdisc->rate_est);
+ gen_kill_estimator(&qdisc->rate_est);
if (ops->reset)
ops->reset(qdisc);
if (ops->destroy)
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 000f1d36128e1abfc0657bce779a7df852f66384..3ffaa6fb0990f0aa31487a2f1829b1f1accf8b21 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -114,7 +114,7 @@ struct hfsc_class {
struct gnet_stats_basic_packed bstats;
struct gnet_stats_queue qstats;
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
struct tcf_proto __rcu *filter_list; /* filter list */
unsigned int filter_cnt; /* filter count */
unsigned int level; /* class level in hierarchy */
@@ -1091,7 +1091,7 @@ hfsc_destroy_class(struct Qdisc *sch, struct hfsc_class *cl)
tcf_destroy_chain(&cl->filter_list);
qdisc_destroy(cl->qdisc);
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
if (cl != &q->root)
kfree(cl);
}
@@ -1348,7 +1348,7 @@ hfsc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
xstats.rtwork = cl->cl_cumul;
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), d, NULL, &cl->bstats) < 0 ||
- gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
gnet_stats_copy_queue(d, NULL, &cl->qstats, cl->qdisc->q.qlen) < 0)
return -1;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 9926fe4f3b6f7db9aeba22fbbfae3d47c4e2af60..760f39e7caeeb91b6c34d561f64f1ed97c884a32 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -111,7 +111,7 @@ struct htb_class {
unsigned int children;
struct htb_class *parent; /* parent class */
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
/*
* Written often fields
@@ -1145,7 +1145,7 @@ htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d)
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
d, NULL, &cl->bstats) < 0 ||
- gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
gnet_stats_copy_queue(d, NULL, &qs, qlen) < 0)
return -1;
@@ -1228,7 +1228,7 @@ static void htb_destroy_class(struct Qdisc *sch, struct htb_class *cl)
WARN_ON(!cl->un.leaf.q);
qdisc_destroy(cl->un.leaf.q);
}
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
tcf_destroy_chain(&cl->filter_list);
kfree(cl);
}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index ca0516e6f74356f47dffcc2262f233ee50f0c47c..f9e712ce2d15ce9280c31d2f75d62b84034ae51d 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -137,7 +137,7 @@ struct qfq_class {
struct gnet_stats_basic_packed bstats;
struct gnet_stats_queue qstats;
- struct gnet_stats_rate_est64 rate_est;
+ struct net_rate_estimator __rcu *rate_est;
struct Qdisc *qdisc;
struct list_head alist; /* Link for active-classes list. */
struct qfq_aggregate *agg; /* Parent aggregate. */
@@ -508,7 +508,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
new_agg = kzalloc(sizeof(*new_agg), GFP_KERNEL);
if (new_agg == NULL) {
err = -ENOBUFS;
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
goto destroy_class;
}
sch_tree_lock(sch);
@@ -533,7 +533,7 @@ static void qfq_destroy_class(struct Qdisc *sch, struct qfq_class *cl)
struct qfq_sched *q = qdisc_priv(sch);
qfq_rm_from_agg(q, cl);
- gen_kill_estimator(&cl->bstats, &cl->rate_est);
+ gen_kill_estimator(&cl->rate_est);
qdisc_destroy(cl->qdisc);
kfree(cl);
}
@@ -667,7 +667,7 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
d, NULL, &cl->bstats) < 0 ||
- gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+ gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
gnet_stats_copy_queue(d, NULL,
&cl->qdisc->qstats, cl->qdisc->q.qlen) < 0)
return -1;
^ permalink raw reply related
* Re: [Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of __E1000_DOWN
From: Neftin, Sasha @ 2016-12-04 7:35 UTC (permalink / raw)
To: Baicar, Tyler, jeffrey.t.kirsher, intel-wired-lan, netdev,
linux-kernel, okaya, timur
In-Reply-To: <a5e8ac37-9769-960b-8134-c90c5c061f12@codeaurora.org>
On 12/2/2016 7:02 PM, Baicar, Tyler wrote:
> Hello Sasha,
>
> Were you able to reproduce this issue?
>
> Do you have a patch fixing the close function inconsistencies that you
> mentioned which I could try out?
>
> Thanks,
> Tyler
>
> On 11/21/2016 1:40 PM, Baicar, Tyler wrote:
>> On 11/17/2016 6:31 AM, Neftin, Sasha wrote:
>>> On 11/13/2016 10:34 AM, Neftin, Sasha wrote:
>>>> On 11/11/2016 12:35 AM, Baicar, Tyler wrote:
>>>>> Hello Sasha,
>>>>>
>>>>> On 11/9/2016 11:19 PM, Neftin, Sasha wrote:
>>>>>> On 11/9/2016 11:41 PM, Tyler Baicar wrote:
>>>>>>> Move IRQ free code so that it will happen regardless of the
>>>>>>> __E1000_DOWN bit. Currently the e1000e driver only releases its IRQ
>>>>>>> if the __E1000_DOWN bit is cleared. This is not sufficient because
>>>>>>> it is possible for __E1000_DOWN to be set without releasing the IRQ.
>>>>>>> In such a situation, we will hit a kernel bug later in e1000_remove
>>>>>>> because the IRQ still has action since it was never freed. A
>>>>>>> secondary bus reset can cause this case to happen.
>>>>>>>
>>>>>>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>>>>>>> ---
>>>>>>> drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>>> index 7017281..36cfcb0 100644
>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>>> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>>>>>>> if (!test_bit(__E1000_DOWN, &adapter->state)) {
>>>>>>> e1000e_down(adapter, true);
>>>>>>> - e1000_free_irq(adapter);
>>>>>>> /* Link status message must follow this format */
>>>>>>> pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>>>>>>> }
>>>>>>> + e1000_free_irq(adapter);
>>>>>>> +
>>>>>>> napi_disable(&adapter->napi);
>>>>>>> e1000e_free_tx_resources(adapter->tx_ring);
>>>>>>>
>>>>>> I would like not recommend insert this change. This change related
>>>>>> driver state machine, we afraid from lot of synchronization
>>>>>> problem and
>>>>>> issues.
>>>>>> We need keep e1000_free_irq in loop and check for 'test_bit' ready.
>>>>> What do you mean here? There is no loop. If __E1000_DOWN is set
>>>>> then we
>>>>> will never free the IRQ.
>>>>>
>>>>>> Another point, does before execute secondary bus reset your SW
>>>>>> back up
>>>>>> pcie configuration space as properly?
>>>>> After a secondary bus reset, the link needs to recover and go back
>>>>> to a
>>>>> working state after 1 second.
>>>>>
>>>>> From the callstack, the issue is happening while removing the
>>>>> endpoint
>>>>> from the system, before applying the secondary bus reset.
>>>>>
>>>>> The order of events is
>>>>> 1. remove the drivers
>>>>> 2. cause a secondary bus reset
>>>>> 3. wait 1 second
>>>> Actually, this is too much, usually link up in less than 100ms.You can
>>>> check Data Link Layer indication.
>>>>> 4. recover the link
>>>>>
>>>>> callstack:
>>>>> free_msi_irqs+0x6c/0x1a8
>>>>> pci_disable_msi+0xb0/0x148
>>>>> e1000e_reset_interrupt_capability+0x60/0x78
>>>>> e1000_remove+0xc8/0x180
>>>>> pci_device_remove+0x48/0x118
>>>>> __device_release_driver+0x80/0x108
>>>>> device_release_driver+0x2c/0x40
>>>>> pci_stop_bus_device+0xa0/0xb0
>>>>> pci_stop_bus_device+0x3c/0xb0
>>>>> pci_stop_root_bus+0x54/0x80
>>>>> acpi_pci_root_remove+0x28/0x64
>>>>> acpi_bus_trim+0x6c/0xa4
>>>>> acpi_device_hotplug+0x19c/0x3f4
>>>>> acpi_hotplug_work_fn+0x28/0x3c
>>>>> process_one_work+0x150/0x460
>>>>> worker_thread+0x50/0x4b8
>>>>> kthread+0xd4/0xe8
>>>>> ret_from_fork+0x10/0x50
>>>>>
>>>>> Thanks,
>>>>> Tyler
>>>>>
>>>> Hello Tyler,
>>>> Okay, we need consult more about this suggestion.
>>>> May I ask what is setup you run? Is there NIC or on board LAN? I would
>>>> like try reproduce this issue in our lab's too.
>>>> Also, is same issue observed with same scenario and others NIC's too?
>>>> Sasha
>>>> _______________________________________________
>>>> Intel-wired-lan mailing list
>>>> Intel-wired-lan@lists.osuosl.org
>>>> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan
>>>>
>>> Hello Tyler,
>>> I see some in consistent implementation of __*_close methods in our
>>> drivers. Do you have any igb NIC to check if same problem persist there?
>>> Thanks,
>>> Sasha
>> Hello Sasha,
>>
>> I couldn't find an igb NIC to test with, but I did find another e1000e
>> card that does not cause the same issue. That card is:
>>
>> 0004:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit
>> Network Connection
>> Subsystem: Intel Corporation Gigabit CT Desktop Adapter
>> Physical Slot: 5-1
>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR+ FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>> Latency: 0, Cache Line Size: 64 bytes
>> Interrupt: pin A routed to IRQ 299
>> Region 0: Memory at c01001c0000 (32-bit, non-prefetchable)
>> [size=128K]
>> Region 1: Memory at c0100100000 (32-bit, non-prefetchable)
>> [size=512K]
>> Region 2: I/O ports at 1000 [size=32]
>> Region 3: Memory at c01001e0000 (32-bit, non-prefetchable)
>> [size=16K]
>> Expansion ROM at c0100180000 [disabled] [size=256K]
>> Capabilities: [c8] Power Management version 2
>> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> Address: 00000000397f0040 Data: 0000
>> Capabilities: [e0] Express (v1) Endpoint, MSI 00
>> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
>> <512ns, L1 <64us
>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>> DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+
>> Unsupported+
>> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>> MaxPayload 256 bytes, MaxReadReq 512 bytes
>> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
>> AuxPwr+ TransPend-
>> LnkCap: Port #8, Speed 2.5GT/s, Width x1, ASPM L0s L1,
>> Exit Latency L0s <128ns, L1 <64us
>> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
>> SlotClk+ DLActive- BWMgmt- ABWMgmt-
>> Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
>> Vector table: BAR=3 offset=00000000
>> PBA: BAR=3 offset=00002000
>> Capabilities: [100 v1] Advanced Error Reporting
>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
>> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
>> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt-
>> UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
>> NonFatalErr-
>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
>> NonFatalErr+
>> AERCap: First Error Pointer: 00, GenCap- CGenEn-
>> ChkCap- ChkEn-
>> Capabilities: [140 v1] Device Serial Number
>> 68-05-ca-ff-ff-29-47-34
>> Kernel driver in use: e1000e
>>
>> Here are the kernel logs from first running the test on this card and
>> then running the test on the Intel 82571EB card that I originally
>> found the issue with (you can see the issue doesn't happen on this
>> card but does on the other):
>>
>> [ 44.146749] ACPI: \_SB_.PCI0: Device has suffered a power fault
>> [ 44.155238] pcieport 0000:00:00.0: PCIe Bus Error:
>> severity=Uncorrected (Non-Fatal), type=Transaction Layer,
>> id=0000(Requester ID)
>> [ 44.166111] pcieport 0000:00:00.0: device [17cb:0400] error
>> status/mask=00004000/00400000
>> [ 44.174420] pcieport 0000:00:00.0: [14] Completion Timeout (First)
>> [ 44.401943] e1000e 0000:01:00.0 eth0: Timesync Tx Control register
>> not set as expected
>> [ 82.445586] pcieport 0002:00:00.0: PCIe Bus Error:
>> severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
>> [ 82.454851] pcieport 0002:00:00.0: device [17cb:0400] error
>> status/mask=00000001/00006000
>> [ 82.463209] pcieport 0002:00:00.0: [ 0] Receiver Error
>> [ 82.469355] pcieport 0002:00:00.0: PCIe Bus Error:
>> severity=Uncorrected (Non-Fatal), type=Transaction Layer,
>> id=0000(Requester ID)
>> [ 82.481026] pcieport 0002:00:00.0: device [17cb:0400] error
>> status/mask=00004000/00400000
>> [ 82.489343] pcieport 0002:00:00.0: [14] Completion Timeout (First)
>> [ 82.504573] ACPI: \_SB_.PCI2: Device has suffered a power fault
>> [ 84.528036] kernel BUG at drivers/pci/msi.c:369!
>>
>> I'm not sure why it reproduces on the 82571EB card and not the 82574L
>> card. The only obvious difference is there is no Reciever Error on the
>> 82574L card.
>>
>> If you have a patch fixing the inconsistencies you mentioned with the
>> __*_close methods I would certainly be willing to test it out!
>>
>> Thanks,
>> Tyler
>>
>
Hello Tyler,
We do not tried reproduce issues yet. As I know hit on BUG_ON happen on
very old NIC. Our more new NIC do not experienced such problem. I would
like recommend do not change kernel code in this moment. This is very
sensitive for a driver state machine and we afraid from opening lot of
issues. I will investigate more depth this problem later(hope have a
time for it) and let you know if we have suggest fixes for this problem.
Thanks,
Sasha
^ permalink raw reply
* RE: [PATCH 1/1] net: ethernet: qlogic: set error code on failure
From: Mintz, Yuval @ 2016-12-04 7:29 UTC (permalink / raw)
To: Pan Bian, netdev@vger.kernel.org; +Cc: Elior, Ariel
In-Reply-To: <1480830833-5084-1-git-send-email-bianpan201603@163.com>
> From: Pan Bian <bianpan2016@163.com>
>
> When calling dma_mapping_error(), the value of return variable rc is 0.
> And when the call returns an unexpected value, rc is not set to a negative
> errno. Thus, it will return 0 on the error path, and its callers cannot detect
> the bug. This patch fixes the bug, assigning "-ENOMEM" to err.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189041
>
> Signed-off-by: Pan Bian <bianpan2016@163.com>
The title should have been "[PATCH net 1/1] qed: Set error code on failure".
But the fix itself is sound. Thanks.
BTW, is -ENOMEM the right return code in case of DMA mapping errors?
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
^ permalink raw reply
* RE: [PATCH 1/1] net: ethernet: broadcom: fix improper return value
From: Kalderon, Michal @ 2016-12-04 7:59 UTC (permalink / raw)
To: Pan Bian, Mintz, Yuval, Elior, Ariel, netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Pan Bian
In-Reply-To: <1480832969-5888-1-git-send-email-bianpan201604@163.com>
> From: Pan Bian <bianpan2016@163.com>
>
> Marco BNX2X_ALLOC_AND_SET(arr, lbl, func) calls kmalloc() to allocate
> memory, and jumps to label "lbl" if the allocation fails. Label "lbl"
> first cleans memory and then returns variable rc. Before calling the macro, the
> value of variable rc is 0. Because 0 means no error, the callers of
> bnx2x_init_firmware() may be misled. This patch fixes the bug, assigning "-
> ENOMEM" to rc before calling macro NX2X_ALLOC_AND_SET().
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189141
>
> Signed-off-by: Pan Bian <bianpan2016@163.com>
The title is wrong from net-next; It should have been "bnx2x: Fix improper return value".
Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
^ permalink raw reply
* [PATCH 1/1] net: bnx2x: fix improper return value
From: Pan Bian @ 2016-12-04 8:39 UTC (permalink / raw)
To: Michal Kalderon, Yuval Mintz, Ariel Elior, everest-linux-l2,
netdev
Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
Marco BNX2X_ALLOC_AND_SET(arr, lbl, func) calls kmalloc() to allocate
memory, and jumps to label "lbl" if the allocation fails. Label "lbl"
first cleans memory and then returns variable rc. Before calling the
macro, the value of variable rc is 0. Because 0 means no error, the
callers of bnx2x_init_firmware() may be misled. This patch fixes the bug,
assigning "-ENOMEM" to rc before calling macro NX2X_ALLOC_AND_SET().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189141
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 0cee4c0..6f9fc20 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13505,6 +13505,7 @@ static int bnx2x_init_firmware(struct bnx2x *bp)
/* Initialize the pointers to the init arrays */
/* Blob */
+ rc = -ENOMEM;
BNX2X_ALLOC_AND_SET(init_data, request_firmware_exit, be32_to_cpu_n);
/* Opcodes */
--
1.9.1
^ permalink raw reply related
* Re: [PATCH 1/1] isdn: hisax: set error code on failure
From: Sergei Shtylyov @ 2016-12-04 10:14 UTC (permalink / raw)
To: Pan Bian, Karsten Keil, netdev; +Cc: linux-kernel, Pan Bian
In-Reply-To: <1480828511-4251-1-git-send-email-bianpan201603@163.com>
Hello.
On 12/4/2016 8:15 AM, Pan Bian wrote:
> From: Pan Bian <bianpan2016@163.com>
>
> In function hfc4s8s_probe(), the value of return variable err should be
> negative on failures. However, when the call to request_region() returns
> NULL, the value of err is 0. This patch fixes the bug, assiging
> "-ENOMEM" to err on the path that request_region() fails.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188931
>
> Signed-off-by: Pan Bian <bianpan2016@163.com>
> ---
> drivers/isdn/hisax/hfc4s8s_l1.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/isdn/hisax/hfc4s8s_l1.c b/drivers/isdn/hisax/hfc4s8s_l1.c
> index 9600cd7..3172cee 100644
> --- a/drivers/isdn/hisax/hfc4s8s_l1.c
> +++ b/drivers/isdn/hisax/hfc4s8s_l1.c
> @@ -1499,6 +1499,7 @@ struct hfc4s8s_l1 {
> printk(KERN_INFO
> "HFC-4S/8S: failed to request address space at 0x%04x\n",
> hw->iobase);
> + err = -ENOMEM;
-EBUSY fits request_region() better.
[..]
MBR, Sergei
^ permalink raw reply
* Re: [PATCH 1/1] net: bnx2x: fix improper return value
From: Sergei Shtylyov @ 2016-12-04 10:24 UTC (permalink / raw)
To: Pan Bian, Michal Kalderon, Yuval Mintz, Ariel Elior,
everest-linux-l2, netdev
Cc: linux-kernel, Pan Bian
In-Reply-To: <1480840746-7498-1-git-send-email-bianpan201604@163.com>
On 12/4/2016 11:39 AM, Pan Bian wrote:
> From: Pan Bian <bianpan2016@163.com>
>
> Marco BNX2X_ALLOC_AND_SET(arr, lbl, func) calls kmalloc() to allocate
Macro.
> memory, and jumps to label "lbl" if the allocation fails. Label "lbl"
> first cleans memory and then returns variable rc. Before calling the
> macro, the value of variable rc is 0. Because 0 means no error, the
> callers of bnx2x_init_firmware() may be misled. This patch fixes the bug,
> assigning "-ENOMEM" to rc before calling macro NX2X_ALLOC_AND_SET().
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189141
>
> Signed-off-by: Pan Bian <bianpan2016@163.com>
[...]
MBR, Sergei
^ permalink raw reply
* [PATCH 1/1] isdn: hisax: set error code on failure
From: Pan Bian @ 2016-12-04 10:33 UTC (permalink / raw)
To: Sergei Shtylyov, Karsten Keil, netdev; +Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
In function hfc4s8s_probe(), the value of return variable err should be
negative on failures. However, when the call to request_region() returns
NULL, the value of err is 0. This patch fixes the bug, assiging
"-EBUSY" to err on the path that request_region() fails.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188931
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/isdn/hisax/hfc4s8s_l1.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/isdn/hisax/hfc4s8s_l1.c b/drivers/isdn/hisax/hfc4s8s_l1.c
index 9600cd7..3172cee 100644
--- a/drivers/isdn/hisax/hfc4s8s_l1.c
+++ b/drivers/isdn/hisax/hfc4s8s_l1.c
@@ -1499,6 +1499,7 @@ struct hfc4s8s_l1 {
printk(KERN_INFO
"HFC-4S/8S: failed to request address space at 0x%04x\n",
hw->iobase);
+ err = -EBUSY;
goto out;
}
--
1.9.1
^ permalink raw reply related
* Re: [PATCH 1/1] isdn: hisax: set error code on failure
From: Sergei Shtylyov @ 2016-12-04 10:40 UTC (permalink / raw)
To: Pan Bian, Karsten Keil, netdev; +Cc: linux-kernel, Pan Bian
In-Reply-To: <1480847603-15520-1-git-send-email-bianpan201603@163.com>
On 12/4/2016 1:33 PM, Pan Bian wrote:
You now need to indicate the patch version in hte subject, like this:
[PATCH 1/1 v2] isdn:...
> From: Pan Bian <bianpan2016@163.com>
>
> In function hfc4s8s_probe(), the value of return variable err should be
> negative on failures. However, when the call to request_region() returns
> NULL, the value of err is 0. This patch fixes the bug, assiging
Assigning.
> "-EBUSY" to err on the path that request_region() fails.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188931
>
> Signed-off-by: Pan Bian <bianpan2016@163.com>
> ---
And here you're supposed to describe changes in the different patch
versions...
[...]
MBR, Sergei
^ permalink raw reply
* [PATCH 1/1] net: bnx2x: fix improper return value
From: Pan Bian @ 2016-12-04 10:40 UTC (permalink / raw)
To: Sergei Shtylyov, Michal Kalderon, Yuval Mintz, Ariel Elior,
everest-linux-l2, netdev
Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
Macro BNX2X_ALLOC_AND_SET(arr, lbl, func) calls kmalloc() to allocate
memory, and jumps to label "lbl" if the allocation fails. Label "lbl"
first cleans memory and then returns variable rc. Before calling the
macro, the value of variable rc is 0. Because 0 means no error, the
callers of bnx2x_init_firmware() may be misled. This patch fixes the bug,
assigning "-ENOMEM" to rc before calling macro NX2X_ALLOC_AND_SET().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189141
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 0cee4c0..6f9fc20 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13505,6 +13505,7 @@ static int bnx2x_init_firmware(struct bnx2x *bp)
/* Initialize the pointers to the init arrays */
/* Blob */
+ rc = -ENOMEM;
BNX2X_ALLOC_AND_SET(init_data, request_firmware_exit, be32_to_cpu_n);
/* Opcodes */
--
1.9.1
^ permalink raw reply related
* [PATCH 1/1 v2] isdn: hisax: set error code on failure
From: Pan Bian @ 2016-12-04 10:43 UTC (permalink / raw)
To: Sergei Shtylyov, Karsten Keil, netdev; +Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
In function hfc4s8s_probe(), the value of return variable err should be
negative on failures. However, when the call to request_region() returns
NULL, the value of err is 0. This patch fixes the bug, assigning
"-EBUSY" to err on the path that request_region() fails.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188931
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/isdn/hisax/hfc4s8s_l1.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/isdn/hisax/hfc4s8s_l1.c b/drivers/isdn/hisax/hfc4s8s_l1.c
index 9600cd7..3172cee 100644
--- a/drivers/isdn/hisax/hfc4s8s_l1.c
+++ b/drivers/isdn/hisax/hfc4s8s_l1.c
@@ -1499,6 +1499,7 @@ struct hfc4s8s_l1 {
printk(KERN_INFO
"HFC-4S/8S: failed to request address space at 0x%04x\n",
hw->iobase);
+ err = -EBUSY;
goto out;
}
--
1.9.1
^ permalink raw reply related
* [PATCH 1/1 v2] net: bnx2x: fix improper return value
From: Pan Bian @ 2016-12-04 10:46 UTC (permalink / raw)
To: Sergei Shtylyov, Michal Kalderon, Yuval Mintz, Ariel Elior,
everest-linux-l2, netdev
Cc: linux-kernel, Pan Bian
From: Pan Bian <bianpan2016@163.com>
Macro BNX2X_ALLOC_AND_SET(arr, lbl, func) calls kmalloc() to allocate
memory, and jumps to label "lbl" if the allocation fails. Label "lbl"
first cleans memory and then returns variable rc. Before calling the
macro, the value of variable rc is 0. Because 0 means no error, the
callers of bnx2x_init_firmware() may be misled. This patch fixes the bug,
assigning "-ENOMEM" to rc before calling macro NX2X_ALLOC_AND_SET().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189141
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 0cee4c0..6f9fc20 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13505,6 +13505,7 @@ static int bnx2x_init_firmware(struct bnx2x *bp)
/* Initialize the pointers to the init arrays */
/* Blob */
+ rc = -ENOMEM;
BNX2X_ALLOC_AND_SET(init_data, request_firmware_exit, be32_to_cpu_n);
/* Opcodes */
--
1.9.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox