* Re: problems with SCTP GSO
From: Marcelo Ricardo Leitner @ 2018-06-12 17:05 UTC (permalink / raw)
To: David Miller; +Cc: lucien.xin, edumazet, netdev
In-Reply-To: <20180611.202905.1954825345357429286.davem@davemloft.net>
On Mon, Jun 11, 2018 at 08:29:05PM -0700, David Miller wrote:
>
> I would like to bring up some problems with the current GSO
> implementation in SCTP.
>
> The most important for me right now is that SCTP uses
> "skb_gro_receive()" to build "GSO" frames :-(
>
> Really it just ends up using the slow path (basically, label 'merge'
> and onwards).
>
> So, using a GRO helper to build GSO packets is not great.
Okay.
>
> I want to make major surgery here and the only way I can is if
> it is exactly the GRO demuxing path that uses skb_gro_receive().
>
> Those paths pass in the list head from the NAPI struct that initiated
> the GRO code paths. That makes it easy for me to change this to use a
> list_head or a hash chain.
>
> Probably in the short term SCTP should just have a private helper that
> builds the frag list, appending 'skb' to 'head'.
>
> In the long term, SCTP should use the page frags just like TCP to
> append the data when building GSO frames. Then it could actually be
> offloaded and passed into drivers without linearizing.
Sounds like a plan. Shouldn't be too hard to do it.
(I'm out on PTO, btw)
Thanks,
Marcelo
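For reference, the short-term suggestion quoted above (a private helper
that builds the frag list, appending 'skb' to 'head') can be sketched in
userspace roughly as follows. This is only an illustrative model under
stated assumptions: struct toy_skb, its frag_tail field and
toy_frag_list_append() are made-up names, not the real sk_buff layout or
any existing kernel helper.

```c
#include <stddef.h>

/* Toy stand-in for struct sk_buff; only the fields this sketch needs. */
struct toy_skb {
	struct toy_skb *next;      /* next buffer in a frag list */
	struct toy_skb *frag_list; /* first chained buffer (used on head only) */
	struct toy_skb *frag_tail; /* last chained buffer (hypothetical field) */
	size_t len;                /* total bytes, including chained buffers */
	size_t data_len;           /* bytes held outside the linear area */
};

/* Append 'skb' to the frag list of 'head' and update the byte counts. */
static void toy_frag_list_append(struct toy_skb *head, struct toy_skb *skb)
{
	skb->next = NULL;
	if (!head->frag_list)
		head->frag_list = skb;
	else
		head->frag_tail->next = skb;
	head->frag_tail = skb;

	head->len += skb->len;
	head->data_len += skb->len;
}
```

Keeping a tail pointer makes each append O(1); the actual helper may
track the tail differently.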
* [PATCH] Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets"
From: Bart Van Assche @ 2018-06-12 17:05 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, Bart Van Assche, Maciej Żenczykowski, Eric Dumazet
Revert the patch mentioned in the subject because it breaks at least
the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi
daemon to fail to start:
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot().
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities.
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services.
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting.
Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a
Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'.
Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.
Fixes: f396922d862a ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
---
net/core/sock.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c
index f333d75ef1a9..bcc41829a16d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -728,22 +728,9 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 			sock_valbool_flag(sk, SOCK_DBG, valbool);
 		break;
 	case SO_REUSEADDR:
-		val = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
-		if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
-		    inet_sk(sk)->inet_num &&
-		    (sk->sk_reuse != val)) {
-			ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN;
-			break;
-		}
-		sk->sk_reuse = val;
+		sk->sk_reuse = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
 		break;
 	case SO_REUSEPORT:
-		if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
-		    inet_sk(sk)->inet_num &&
-		    (sk->sk_reuseport != valbool)) {
-			ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN;
-			break;
-		}
 		sk->sk_reuseport = valbool;
 		break;
 	case SO_TYPE:
--
2.17.0
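
The behavioural difference removed by this revert can be modelled in
userspace as below. This is a hedged sketch, not kernel code: struct
toy_sock and both functions are hypothetical names, and only the
SO_REUSEADDR case is modelled. It shows why a daemon that sets the
option after bind() saw -EUCLEAN, which glibc reports as "Structure
needs cleaning".

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of the socket state the reverted check looked at. */
struct toy_sock {
	bool inet;          /* family is PF_INET or PF_INET6 */
	unsigned short num; /* bound local port, 0 if unbound */
	bool established;   /* state is TCP_ESTABLISHED */
	int reuse;          /* current SO_REUSEADDR value (0 or 1) */
};

/* Behaviour with the reverted patch applied: refuse to change the
 * flag once an inet socket is bound to a port. */
static int toy_set_reuse_restricted(struct toy_sock *sk, int val)
{
	val = !!val;
	if (sk->inet && sk->num && sk->reuse != val)
		return sk->established ? -EISCONN : -EUCLEAN;
	sk->reuse = val;
	return 0;
}

/* Behaviour after the revert: always store the new value. */
static int toy_set_reuse(struct toy_sock *sk, int val)
{
	sk->reuse = !!val;
	return 0;
}
```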
* Re: [PATCH] Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets"
From: Eric Dumazet @ 2018-06-12 17:13 UTC (permalink / raw)
To: Bart Van Assche, David S . Miller
Cc: netdev, Maciej Żenczykowski, Eric Dumazet
In-Reply-To: <20180612170555.11733-1-bart.vanassche@wdc.com>
On 06/12/2018 10:05 AM, Bart Van Assche wrote:
> Revert the patch mentioned in the subject because it breaks at least
> the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi
> daemon to fail to start:
>
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot().
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities.
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services.
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting.
> Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a
> Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'.
> Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.
>
> Fixes: f396922d862a ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets")
> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> ---
> net/core/sock.c | 15 +--------------
> 1 file changed, 1 insertion(+), 14 deletions(-)
Yes, this change probably broke a lot of applications, unfortunately.
Acked-by: Eric Dumazet <edumazet@google.com>
* Re: problems with SCTP GSO
From: Marcelo Ricardo Leitner @ 2018-06-12 17:30 UTC (permalink / raw)
To: David Miller; +Cc: lucien.xin, edumazet, netdev
In-Reply-To: <20180612170506.GF3877@localhost.localdomain>
On Tue, Jun 12, 2018 at 02:05:06PM -0300, Marcelo Ricardo Leitner wrote:
> On Mon, Jun 11, 2018 at 08:29:05PM -0700, David Miller wrote:
> >
> > I would like to bring up some problems with the current GSO
> > implementation in SCTP.
> >
> > The most important for me right now is that SCTP uses
> > "skb_gro_receive()" to build "GSO" frames :-(
> >
> > Really it just ends up using the slow path (basically, label 'merge'
> > and onwards).
> >
> > So, using a GRO helper to build GSO packets is not great.
>
> Okay.
>
> >
> > I want to make major surgery here and the only way I can is if
> > it is exactly the GRO demuxing path that uses skb_gro_receive().
> >
> > Those paths pass in the list head from the NAPI struct that initiated
> > the GRO code paths. That makes it easy for me to change this to use a
> > list_head or a hash chain.
> >
> > Probably in the short term SCTP should just have a private helper that
> > builds the frag list, appending 'skb' to 'head'.
> >
> > In the long term, SCTP should use the page frags just like TCP to
> > append the data when building GSO frames. Then it could actually be
> > offloaded and passed into drivers without linearizing.
>
> Sounds like a plan. Shouldn't be too hard to do it.
> (I'm out on PTO, btw)
Xin will work on this in the meantime, at least. Thanks, Xin.
>
> Thanks,
> Marcelo
>
* Fw: [Bug 200033] New: stack-out-of-bounds in __xfrm_dst_hash net/xfrm/xfrm_hash.h
From: Stephen Hemminger @ 2018-06-12 17:38 UTC (permalink / raw)
To: netdev
Begin forwarded message:
Date: Tue, 12 Jun 2018 01:44:36 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 200033] New: stack-out-of-bounds in __xfrm_dst_hash net/xfrm/xfrm_hash.h
https://bugzilla.kernel.org/show_bug.cgi?id=200033
Bug ID: 200033
Summary: stack-out-of-bounds in __xfrm_dst_hash
net/xfrm/xfrm_hash.h
Product: Networking
Version: 2.5
Kernel Version: v4.17
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Other
Assignee: stephen@networkplumber.org
Reporter: icytxw@gmail.com
Regression: No
Created attachment 276483
--> https://bugzilla.kernel.org/attachment.cgi?id=276483&action=edit
Found this bug with modified syzkaller
==================================================================
BUG: KASAN: stack-out-of-bounds in __xfrm_dst_hash net/xfrm/xfrm_hash.h:96 [inline]
BUG: KASAN: stack-out-of-bounds in xfrm_dst_hash net/xfrm/xfrm_state.c:61 [inline]
BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x24ab/0x26e0 net/xfrm/xfrm_state.c:953
Read of size 4 at addr ffff880054b17b70 by task syz-executor0/13697
CPU: 0 PID: 13697 Comm: syz-executor0 Not tainted 4.17.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1
04/01/2014
Call Trace:
The buggy address belongs to the page:
page:ffffea000152c5c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x100000000000000()
raw: 0100000000000000 0000000000000000 ffffea000152c5c8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880054b17a00: 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 f2
ffff880054b17a80: f2 f2 f2 00 00 00 00 f2 f2 f2 f2 00 00 00 00 00
>ffff880054b17b00: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2 f2
^
ffff880054b17b80: f2 f2 f2 00 00 00 00 00 00 00 00 00 f2 f2 f2 f3
ffff880054b17c00: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 13697 Comm: syz-executor0 Tainted: G B 4.17.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1
04/01/2014
Call Trace:
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..
--
You are receiving this mail because:
You are the assignee for the bug.
* Re: [jkirsher/next-queue PATCH v2 2/7] net: Add support for subordinate device traffic classes
From: Florian Fainelli @ 2018-06-12 17:49 UTC (permalink / raw)
To: Alexander Duyck, intel-wired-lan, jeffrey.t.kirsher, netdev
In-Reply-To: <20180612151835.86792.93718.stgit@ahduyck-green-test.jf.intel.com>
On 06/12/2018 08:18 AM, Alexander Duyck wrote:
> This patch is meant to provide the basic tools needed to allow us to create
> subordinate device traffic classes. The general idea here is to allow
> subdividing the queues of a device into queue groups accessible through an
> upper device such as a macvlan.
>
> The idea here is to enforce the idea that an upper device has to be a
> single queue device, ideally with IFF_NO_QUQUE set. With that being the
> case we can pretty much guarantee that the tc_to_txq mappings and XPS maps
> for the upper device are unused. As such we could reuse those in order to
> support subdividing the lower device and distributing those queues between
> the subordinate devices.
This is not necessarily a valid paradigm to work with. For instance in
DSA we have IFF_NO_QUEUE devices, but we still expose multiple egress
queues because that is how an application can choose how it wants to get
packets transmitted at the switch level. We have a 1:1 representation
between a queue at the net_device level and an egress queue at the
switch level, so things like buffer reservation etc. can be configured.
I think you should consider that an upper device might want to have a
1:1 mapping to the lower device's queues and make that permissible.
Thoughts?
>
> In order to distinguish between a regular set of traffic classes and if a
> device is carrying subordinate traffic classes I changed num_tc from a u8
> to a s16 value and use the negative values to represent the suboordinate
> pool values. So starting at -1 and running to -32768 we can encode those as
> pool values, and the existing values of 0 to 15 can be maintained.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> include/linux/netdevice.h | 16 ++++++++
> net/core/dev.c | 89 +++++++++++++++++++++++++++++++++++++++++++++
> net/core/net-sysfs.c | 21 ++++++++++-
> 3 files changed, 124 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3ec9850..41b4660 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -569,6 +569,9 @@ struct netdev_queue {
> * (/sys/class/net/DEV/Q/trans_timeout)
> */
> unsigned long trans_timeout;
> +
> + /* Suboordinate device that the queue has been assigned to */
> + struct net_device *sb_dev;
> /*
> * write-mostly part
> */
> @@ -1978,7 +1981,7 @@ struct net_device {
> #ifdef CONFIG_DCB
> const struct dcbnl_rtnl_ops *dcbnl_ops;
> #endif
> - u8 num_tc;
> + s16 num_tc;
> struct netdev_tc_txq tc_to_txq[TC_MAX_QUEUE];
> u8 prio_tc_map[TC_BITMASK + 1];
>
> @@ -2032,6 +2035,17 @@ int netdev_get_num_tc(struct net_device *dev)
> return dev->num_tc;
> }
>
> +void netdev_unbind_sb_channel(struct net_device *dev,
> + struct net_device *sb_dev);
> +int netdev_bind_sb_channel_queue(struct net_device *dev,
> + struct net_device *sb_dev,
> + u8 tc, u16 count, u16 offset);
> +int netdev_set_sb_channel(struct net_device *dev, u16 channel);
> +static inline int netdev_get_sb_channel(struct net_device *dev)
> +{
> + return max_t(int, -dev->num_tc, 0);
> +}
> +
> static inline
> struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev,
> unsigned int index)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6e18242..27fe4f2 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2068,11 +2068,13 @@ int netdev_txq_to_tc(struct net_device *dev, unsigned int txq)
> struct netdev_tc_txq *tc = &dev->tc_to_txq[0];
> int i;
>
> + /* walk through the TCs and see if it falls into any of them */
> for (i = 0; i < TC_MAX_QUEUE; i++, tc++) {
> if ((txq - tc->offset) < tc->count)
> return i;
> }
>
> + /* didn't find it, just return -1 to indicate no match */
> return -1;
> }
>
> @@ -2215,7 +2217,14 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
> bool active = false;
>
> if (dev->num_tc) {
> + /* Do not allow XPS on subordinate device directly */
> num_tc = dev->num_tc;
> + if (num_tc < 0)
> + return -EINVAL;
> +
> + /* If queue belongs to subordinate dev use its map */
> + dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
> +
> tc = netdev_txq_to_tc(dev, index);
> if (tc < 0)
> return -EINVAL;
> @@ -2366,11 +2375,25 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
> EXPORT_SYMBOL(netif_set_xps_queue);
>
> #endif
> +static void netdev_unbind_all_sb_channels(struct net_device *dev)
> +{
> + struct netdev_queue *txq = &dev->_tx[dev->num_tx_queues];
> +
> + /* Unbind any subordinate channels */
> + while (txq-- != &dev->_tx[0]) {
> + if (txq->sb_dev)
> + netdev_unbind_sb_channel(dev, txq->sb_dev);
> + }
> +}
> +
> void netdev_reset_tc(struct net_device *dev)
> {
> #ifdef CONFIG_XPS
> netif_reset_xps_queues_gt(dev, 0);
> #endif
> + netdev_unbind_all_sb_channels(dev);
> +
> + /* Reset TC configuration of device */
> dev->num_tc = 0;
> memset(dev->tc_to_txq, 0, sizeof(dev->tc_to_txq));
> memset(dev->prio_tc_map, 0, sizeof(dev->prio_tc_map));
> @@ -2399,11 +2422,77 @@ int netdev_set_num_tc(struct net_device *dev, u8 num_tc)
> #ifdef CONFIG_XPS
> netif_reset_xps_queues_gt(dev, 0);
> #endif
> + netdev_unbind_all_sb_channels(dev);
> +
> dev->num_tc = num_tc;
> return 0;
> }
> EXPORT_SYMBOL(netdev_set_num_tc);
>
> +void netdev_unbind_sb_channel(struct net_device *dev,
> + struct net_device *sb_dev)
> +{
> + struct netdev_queue *txq = &dev->_tx[dev->num_tx_queues];
> +
> +#ifdef CONFIG_XPS
> + netif_reset_xps_queues_gt(sb_dev, 0);
> +#endif
> + memset(sb_dev->tc_to_txq, 0, sizeof(sb_dev->tc_to_txq));
> + memset(sb_dev->prio_tc_map, 0, sizeof(sb_dev->prio_tc_map));
> +
> + while (txq-- != &dev->_tx[0]) {
> + if (txq->sb_dev == sb_dev)
> + txq->sb_dev = NULL;
> + }
> +}
> +EXPORT_SYMBOL(netdev_unbind_sb_channel);
> +
> +int netdev_bind_sb_channel_queue(struct net_device *dev,
> + struct net_device *sb_dev,
> + u8 tc, u16 count, u16 offset)
> +{
> + /* Make certain the sb_dev and dev are already configured */
> + if (sb_dev->num_tc >= 0 || tc >= dev->num_tc)
> + return -EINVAL;
> +
> + /* We cannot hand out queues we don't have */
> + if ((offset + count) > dev->real_num_tx_queues)
> + return -EINVAL;
> +
> + /* Record the mapping */
> + sb_dev->tc_to_txq[tc].count = count;
> + sb_dev->tc_to_txq[tc].offset = offset;
> +
> + /* Provide a way for Tx queue to find the tc_to_txq map or
> + * XPS map for itself.
> + */
> + while (count--)
> + netdev_get_tx_queue(dev, count + offset)->sb_dev = sb_dev;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(netdev_bind_sb_channel_queue);
> +
> +int netdev_set_sb_channel(struct net_device *dev, u16 channel)
> +{
> + /* Do not use a multiqueue device to represent a subordinate channel */
> + if (netif_is_multiqueue(dev))
> + return -ENODEV;
> +
> + /* We allow channels 1 - 32767 to be used for subordinate channels.
> + * Channel 0 is meant to be "native" mode and used only to represent
> + * the main root device. We allow writing 0 to reset the device back
> + * to normal mode after being used as a subordinate channel.
> + */
> + if (channel > S16_MAX)
> + return -EINVAL;
> +
> + dev->num_tc = -channel;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(netdev_set_sb_channel);
> +
> /*
> * Routine to help set real_num_tx_queues. To avoid skbs mapped to queues
> * greater than real_num_tx_queues stale skbs on the qdisc must be flushed.
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 335c6a4..bd067b1 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1054,11 +1054,23 @@ static ssize_t traffic_class_show(struct netdev_queue *queue,
> return -ENOENT;
>
> index = get_netdev_queue_index(queue);
> +
> + /* If queue belongs to subordinate dev use its tc mapping */
> + dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
> +
> tc = netdev_txq_to_tc(dev, index);
> if (tc < 0)
> return -EINVAL;
>
> - return sprintf(buf, "%u\n", tc);
> + /* We can report the traffic class one of two ways:
> + * Subordinate device traffic classes are reported with the traffic
> + * class first, and then the subordinate class so for example TC0 on
> + * subordinate device 2 will be reported as "0-2". If the queue
> + * belongs to the root device it will be reported with just the
> + * traffic class, so just "0" for TC 0 for example.
> + */
> + return dev->num_tc < 0 ? sprintf(buf, "%u%d\n", tc, dev->num_tc) :
> + sprintf(buf, "%u\n", tc);
> }
>
> #ifdef CONFIG_XPS
> @@ -1225,7 +1237,14 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
> index = get_netdev_queue_index(queue);
>
> if (dev->num_tc) {
> + /* Do not allow XPS on subordinate device directly */
> num_tc = dev->num_tc;
> + if (num_tc < 0)
> + return -EINVAL;
> +
> + /* If queue belongs to subordinate dev use its map */
> + dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
> +
> tc = netdev_txq_to_tc(dev, index);
> if (tc < 0)
> return -EINVAL;
>
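
To make the encoding in the quoted patch easier to follow, here is a
small userspace model of it. It is a sketch under stated assumptions:
the toy_* names are made up, the kernel types are replaced by stdint
ones, and only the sign convention of num_tc, the tc_to_txq range
lookup and the traffic_class string format are reproduced.

```c
#include <stdint.h>
#include <stdio.h>

#define TOY_TC_MAX 16 /* mirrors TC_MAX_QUEUE in the quoted patch */

struct toy_tc_txq {
	uint16_t count;
	uint16_t offset;
};

struct toy_dev {
	int16_t num_tc; /* > 0: traffic classes, < 0: -(subordinate channel) */
	struct toy_tc_txq tc_to_txq[TOY_TC_MAX];
};

/* Channel number of a subordinate device, 0 for a regular device. */
static int toy_get_sb_channel(const struct toy_dev *dev)
{
	return dev->num_tc < 0 ? -dev->num_tc : 0;
}

/* Find the traffic class whose [offset, offset + count) range holds txq.
 * The unsigned subtraction wraps for txq < offset, so one compare is enough.
 */
static int toy_txq_to_tc(const struct toy_dev *dev, unsigned int txq)
{
	for (int i = 0; i < TOY_TC_MAX; i++) {
		const struct toy_tc_txq *tc = &dev->tc_to_txq[i];

		if ((txq - tc->offset) < tc->count)
			return i;
	}
	return -1;
}

/* Format the traffic_class string: "TC" for a root device, "TC-channel"
 * for a subordinate one. The '-' comes from printing the negative num_tc.
 */
static int toy_traffic_class_show(const struct toy_dev *dev, unsigned int tc,
				  char *buf, size_t len)
{
	return dev->num_tc < 0 ?
		snprintf(buf, len, "%u%d\n", tc, dev->num_tc) :
		snprintf(buf, len, "%u\n", tc);
}
```

With num_tc set to -2 and TC 0, the formatted string is "0-2", matching
the comment in traffic_class_show().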
--
Florian
* Re: [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue
From: Stephen Hemminger @ 2018-06-12 17:50 UTC (permalink / raw)
To: Alexander Duyck; +Cc: intel-wired-lan, jeffrey.t.kirsher, netdev
In-Reply-To: <20180612151322.86792.97587.stgit@ahduyck-green-test.jf.intel.com>
On Tue, 12 Jun 2018 11:18:25 -0400
Alexander Duyck <alexander.h.duyck@intel.com> wrote:
> This patch series is meant to allow support for the L2 forward offload, aka
> MACVLAN offload without the need for using ndo_select_queue.
>
> The existing solution currently requires that we use ndo_select_queue in
> the transmit path if we want to associate specific Tx queues with a given
> MACVLAN interface. In order to get away from this we need to repurpose the
> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
> a means of accessing the queues on the lower device. As a result we cannot
> offload a device that is configured as multiqueue, however it doesn't
> really make sense to configure a macvlan interfaced as being multiqueue
> anyway since it doesn't really have a qdisc of its own in the first place.
>
> I am submitting this as an RFC for the netdev mailing list, and officially
> submitting it for testing to Jeff Kirsher's next-queue in order to validate
> the ixgbe specific bits.
>
> The big changes in this set are:
> Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
> Disable XPS for single queue devices
> Replace accel_priv with sb_dev in ndo_select_queue
> Add sb_dev parameter to fallback function for ndo_select_queue
> Consolidated ndo_select_queue functions that appeared to be duplicates
>
> v2: Implement generic "select_queue" functions instead of "fallback" functions.
> Tweak last two patches to account for changes in dev_pick_tx_xxx functions.
>
> ---
>
> Alexander Duyck (7):
> net-sysfs: Drop support for XPS and traffic_class on single queue device
> net: Add support for subordinate device traffic classes
> ixgbe: Add code to populate and use macvlan tc to Tx queue map
> net: Add support for subordinate traffic classes to netdev_pick_tx
> net: Add generic ndo_select_queue functions
> net: allow ndo_select_queue to pass netdev
> net: allow fallback function to pass netdev
>
>
> drivers/infiniband/hw/hfi1/vnic_main.c | 2
> drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c | 4 -
> drivers/net/bonding/bond_main.c | 3
> drivers/net/ethernet/amazon/ena/ena_netdev.c | 5 -
> drivers/net/ethernet/broadcom/bcmsysport.c | 6 -
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 6 +
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 5 -
> drivers/net/ethernet/hisilicon/hns/hns_enet.c | 5 -
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 62 ++++++--
> drivers/net/ethernet/lantiq_etop.c | 10 -
> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 7 +
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3
> drivers/net/ethernet/mellanox/mlx5/core/en.h | 3
> drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 5 -
> drivers/net/ethernet/renesas/ravb_main.c | 3
> drivers/net/ethernet/sun/ldmvsw.c | 3
> drivers/net/ethernet/sun/sunvnet.c | 3
> drivers/net/ethernet/ti/netcp_core.c | 9 -
> drivers/net/hyperv/netvsc_drv.c | 6 -
> drivers/net/macvlan.c | 10 -
> drivers/net/net_failover.c | 7 +
> drivers/net/team/team.c | 3
> drivers/net/tun.c | 3
> drivers/net/wireless/marvell/mwifiex/main.c | 3
> drivers/net/xen-netback/interface.c | 4 -
> drivers/net/xen-netfront.c | 3
> drivers/staging/netlogic/xlr_net.c | 9 -
> drivers/staging/rtl8188eu/os_dep/os_intfs.c | 3
> drivers/staging/rtl8723bs/os_dep/os_intfs.c | 7 -
> include/linux/netdevice.h | 34 ++++-
> net/core/dev.c | 156 ++++++++++++++++++---
> net/core/net-sysfs.c | 36 ++++-
> net/mac80211/iface.c | 4 -
> net/packet/af_packet.c | 7 +
> 35 files changed, 312 insertions(+), 130 deletions(-)
>
> --
This makes sense. I thought you were hoping to get rid of select queue in the future?
* Re: [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue
From: Florian Fainelli @ 2018-06-12 17:56 UTC (permalink / raw)
To: Alexander Duyck, intel-wired-lan, jeffrey.t.kirsher, netdev
In-Reply-To: <20180612151322.86792.97587.stgit@ahduyck-green-test.jf.intel.com>
On 06/12/2018 08:18 AM, Alexander Duyck wrote:
> This patch series is meant to allow support for the L2 forward offload, aka
> MACVLAN offload without the need for using ndo_select_queue.
>
> The existing solution currently requires that we use ndo_select_queue in
> the transmit path if we want to associate specific Tx queues with a given
> MACVLAN interface. In order to get away from this we need to repurpose the
> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
> a means of accessing the queues on the lower device. As a result we cannot
> offload a device that is configured as multiqueue, however it doesn't
> really make sense to configure a macvlan interfaced as being multiqueue
> anyway since it doesn't really have a qdisc of its own in the first place.
Interesting, so at some point I had come up with the following for
mapping queues between the DSA slave network devices and the DSA master
network device (doing the actual transmission). The DSA master network
device driver is just a normal network device driver.
The set-up is as follows: 4 external Ethernet switch ports, each with 8
egress queues, and the DSA master (bcmsysport.c), aka the CPU Ethernet
controller, has 32 output queues, so you can do a 1:1 mapping of those,
which is actually what we want. A subsequent hardware generation only
provides 16 output queues, so we can still do a 2:1 mapping.
The implementation is done like this:
- DSA slave network devices are always created after the DSA master
network device so we can leverage that
- a specific notifier is running from the DSA core and tells the DSA
master about the switch position in the tree (position 0 = directly
attached), and the switch port number and a pointer to the slave network
device
- we establish the mapping between the queues within the bcmsysport
driver as a simple array
- when transmitting, DSA slave network devices set a specific queue/port
number within the 16-bits that skb->queue_mapping permits
- this gets re-used by bcmsysport.c to extract the correct queue number
during ndo_select_queue such that the appropriate queue number gets used
and congestion works end-to-end.
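
As a rough illustration of the steps above, the port/queue encoding and
the 1:1 versus 2:1 mapping could look like the sketch below. Everything
here is an assumption made for illustration: the toy_* names, the
"port in the upper byte, queue in the lower byte" layout and the
arithmetic folding are hypothetical, whereas the driver keeps the
mapping in a simple array.

```c
#include <stdint.h>

#define TOY_NUM_PORTS       4 /* external switch ports in the example */
#define TOY_QUEUES_PER_PORT 8 /* egress queues per switch port */

/* Hypothetical layout: port in the upper byte, queue in the lower byte
 * of the 16-bit queue_mapping value.
 */
static uint16_t toy_set_port_queue(unsigned int port, unsigned int queue)
{
	return (uint16_t)((port & 0xff) << 8 | (queue & 0xff));
}

static unsigned int toy_get_port(uint16_t v)
{
	return v >> 8;
}

static unsigned int toy_get_queue(uint16_t v)
{
	return v & 0xff;
}

/* Map (port, queue) to a master TX queue. With 32 master queues this is
 * 1:1 for 4 ports x 8 queues; with 16 it folds two switch queues onto one.
 */
static unsigned int toy_master_txq(uint16_t v, unsigned int num_master_queues)
{
	unsigned int flat = toy_get_port(v) * TOY_QUEUES_PER_PORT +
			    toy_get_queue(v);
	unsigned int total = TOY_NUM_PORTS * TOY_QUEUES_PER_PORT;

	return flat * num_master_queues / total;
}
```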
The reason why we do that is because there is some out of band HW that
monitors the queue depth of the switch port's egress queue and
back-pressures the Ethernet controller directly when trying to transmit
to a congested queue.
I had initially considered establishing the mapping using tc and some
custom "bind" argument of some kind, but ended up doing things the way
they are, which is more automatic, though it leaves less configuration
to a user. This has a number of caveats though:
- this is made generic within the context of DSA in that nothing is
switch driver or Ethernet MAC driver specific and the notifier
represents the contract between these two seemingly independent subsystems
- the queue indicated between DSA slave and master is unfortunately
switch driver/controller specific (BRCM_TAG_SET_PORT_QUEUE,
BRCM_TAG_GET_PORT, BRCM_TAG_GET_QUEUE)
What I like about your patchset is the mapping establishment, but as you
will read from my reply in patch 2, I think the (upper) 1:N (lower)
mapping might not work for my specific use case.
Anyhow, not intended to be blocking this, as it seems to be going in the
right direction anyway.
>
> I am submitting this as an RFC for the netdev mailing list, and officially
> submitting it for testing to Jeff Kirsher's next-queue in order to validate
> the ixgbe specific bits.
>
> The big changes in this set are:
> Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
> Disable XPS for single queue devices
> Replace accel_priv with sb_dev in ndo_select_queue
> Add sb_dev parameter to fallback function for ndo_select_queue
> Consolidated ndo_select_queue functions that appeared to be duplicates
Interesting, turns out I had a possibly similar use case with DSA with
the slave network devices need to select an outgoing queue number for
>
> v2: Implement generic "select_queue" functions instead of "fallback" functions.
> Tweak last two patches to account for changes in dev_pick_tx_xxx functions.
>
> ---
>
> Alexander Duyck (7):
> net-sysfs: Drop support for XPS and traffic_class on single queue device
> net: Add support for subordinate device traffic classes
> ixgbe: Add code to populate and use macvlan tc to Tx queue map
> net: Add support for subordinate traffic classes to netdev_pick_tx
> net: Add generic ndo_select_queue functions
> net: allow ndo_select_queue to pass netdev
> net: allow fallback function to pass netdev
>
>
> drivers/infiniband/hw/hfi1/vnic_main.c | 2
> drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c | 4 -
> drivers/net/bonding/bond_main.c | 3
> drivers/net/ethernet/amazon/ena/ena_netdev.c | 5 -
> drivers/net/ethernet/broadcom/bcmsysport.c | 6 -
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 6 +
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 5 -
> drivers/net/ethernet/hisilicon/hns/hns_enet.c | 5 -
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 62 ++++++--
> drivers/net/ethernet/lantiq_etop.c | 10 -
> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 7 +
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3
> drivers/net/ethernet/mellanox/mlx5/core/en.h | 3
> drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 5 -
> drivers/net/ethernet/renesas/ravb_main.c | 3
> drivers/net/ethernet/sun/ldmvsw.c | 3
> drivers/net/ethernet/sun/sunvnet.c | 3
> drivers/net/ethernet/ti/netcp_core.c | 9 -
> drivers/net/hyperv/netvsc_drv.c | 6 -
> drivers/net/macvlan.c | 10 -
> drivers/net/net_failover.c | 7 +
> drivers/net/team/team.c | 3
> drivers/net/tun.c | 3
> drivers/net/wireless/marvell/mwifiex/main.c | 3
> drivers/net/xen-netback/interface.c | 4 -
> drivers/net/xen-netfront.c | 3
> drivers/staging/netlogic/xlr_net.c | 9 -
> drivers/staging/rtl8188eu/os_dep/os_intfs.c | 3
> drivers/staging/rtl8723bs/os_dep/os_intfs.c | 7 -
> include/linux/netdevice.h | 34 ++++-
> net/core/dev.c | 156 ++++++++++++++++++---
> net/core/net-sysfs.c | 36 ++++-
> net/mac80211/iface.c | 4 -
> net/packet/af_packet.c | 7 +
> 35 files changed, 312 insertions(+), 130 deletions(-)
>
> --
>
--
Florian
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 18:08 UTC (permalink / raw)
To: Bart Van Assche
Cc: jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
kvm@vger.kernel.org, netdev@vger.kernel.org,
linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
target-devel@vger.kernel.org, qla2xxx-upstream@qlogic.com,
linux1394-devel@lists.sourceforge.net, kent.overstreet@gmail.com
In-Reply-To: <0c93c72a3a339f3479f82de04223315671e07863.camel@wdc.com>
On Tue, Jun 12, 2018 at 04:32:03PM +0000, Bart Van Assche wrote:
> On Tue, 2018-06-12 at 09:15 -0700, Matthew Wilcox wrote:
> > On Tue, Jun 12, 2018 at 03:22:42PM +0000, Bart Van Assche wrote:
> > > Please introduce functions in the target core for allocating and freeing a tag
> > > instead of spreading the knowledge of how to allocate and free tags over all
> > > target drivers.
> >
> > I can't without doing an unreasonably large amount of work on drivers that
> > I have no way to test. Some of the drivers have the se_cmd already; some
> > of them don't. I'd be happy to introduce a common function for freeing
> > a tag.
>
> Which target drivers are you referring to? If you are referring to the sbp driver:
> I think that driver is dead and can be removed from the kernel tree. I even don't
> know whether that driver ever has had any users other than the developer of that
> driver.
For example tcm_fc:
	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
	if (tag < 0)
		goto busy;
	cmd = &((struct ft_cmd *)se_sess->sess_cmd_map)[tag];
or qla2xxx:
	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
	if (tag < 0)
		return NULL;
	cmd = &((struct qla_tgt_cmd *)se_sess->sess_cmd_map)[tag];
The core doesn't know at what offset from the pointer to store the tag
& cpu. Only the individual drivers know their cmd layout.
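
A toy userspace model of the point, for illustration only: the core
owns an untyped command map and a tag pool, while each driver casts the
map to its own command type and stores the tag at its own offset. All
toy_* names are hypothetical, and the plain bitmap merely stands in for
sbitmap.

```c
#include <stddef.h>

#define TOY_NUM_TAGS 8

/* Toy session: the core owns an untyped command map and a tag bitmap. */
struct toy_session {
	void *cmd_map;           /* array of driver-specific command structs */
	unsigned int tag_bitmap; /* bit n set: tag n is in use */
};

/* Return the lowest free tag, or -1 when the pool is exhausted. */
static int toy_tag_get(struct toy_session *sess)
{
	for (int tag = 0; tag < TOY_NUM_TAGS; tag++) {
		if (!(sess->tag_bitmap & (1u << tag))) {
			sess->tag_bitmap |= 1u << tag;
			return tag;
		}
	}
	return -1;
}

/* Release a tag so it can be handed out again. */
static void toy_tag_put(struct toy_session *sess, int tag)
{
	sess->tag_bitmap &= ~(1u << tag);
}

/* Two drivers with different command layouts: the tag lives at a
 * different offset in each, which the core cannot know.
 */
struct toy_cmd_a { int tag; char payload[32]; };
struct toy_cmd_b { char header[64]; int tag; };

/* Driver A's allocation path: index the map with its own element type. */
static struct toy_cmd_a *toy_driver_a_alloc(struct toy_session *sess)
{
	int tag = toy_tag_get(sess);
	struct toy_cmd_a *cmd;

	if (tag < 0)
		return NULL;
	cmd = &((struct toy_cmd_a *)sess->cmd_map)[tag];
	cmd->tag = tag;
	return cmd;
}
```

Freeing only needs the tag number, which is why a common free function
is easy to provide while a common allocation function is not.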
* Re: [PATCH] Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets"
From: David Miller @ 2018-06-12 18:10 UTC (permalink / raw)
To: bart.vanassche; +Cc: netdev, maze, edumazet
In-Reply-To: <20180612170555.11733-1-bart.vanassche@wdc.com>
From: Bart Van Assche <bart.vanassche@wdc.com>
Date: Tue, 12 Jun 2018 10:05:55 -0700
> Revert the patch mentioned in the subject because it breaks at least
> the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi
> daemon to fail to start:
>
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot().
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities.
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services.
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available
> Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting.
> Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a
> Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'.
> Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.
>
> Fixes: f396922d862a ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets")
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Applied, thanks.
I held off on submitting the reverted patch to -stable, and have now
thus removed it from my -stable queue.
^ permalink raw reply
* Re: [PATCH] optoe: driver to read/write SFP/QSFP EEPROMs
From: Andrew Lunn @ 2018-06-12 18:11 UTC (permalink / raw)
To: Tom Lendacky
Cc: Don Bollinger, Arnd Bergmann, Greg Kroah-Hartman, linux-kernel,
brandon_chuang, wally_wang, roy_lee, rick_burchett, quentin.chang,
jeffrey.townsend, scotte, roopa, David Ahern, luke.williams,
Guohan Lu, Russell King, netdev@vger.kernel.org
In-Reply-To: <496e06b9-9f02-c4ae-4156-ab6221ba23fd@amd.com>
> There's an SFP driver under drivers/net/phy. Can that driver be extended
> to provide this support? Adding Russell King, who developed sfp.c, as well
> as the netdev mailing list.
I agree, the current SFP code should be used.
My observation is that {Q}SFP modules are used in two different ways:
1) The Linux kernel has full control, as assumed by the devlink/SFP
framework. We parse the SFP data to find the capabilities of the SFP
and use them to program the MAC to use the correct mode. The MAC can be
a NIC, but it can also be a switch. DSA is gaining support for
PHYLINK, so SFP modules should just work with most switches which DSA
supports. And there is no reason a plain switchdev switch cannot use
PHYLINK.
2) Firmware is in control of the PHY layer, but there is a wish to
expose some of the data which is available via i2c from the {Q}SFP to
Linux.
It appears that optoe supports this second case. It does not appear to
provide any in-kernel API to actually make use of the SFP data in the
kernel.
We should not be duplicating code. We should share the SFP code for
both use cases above. There is also a standard Linux API for getting
access to this information: ethtool -m/--module-info. Anything which
exports {Q}SFP data needs to use this API.
Andrew
^ permalink raw reply
* Re: [PATCH bpf v3] tools/bpftool: fix a bug in bpftool perf
From: Jakub Kicinski @ 2018-06-12 18:15 UTC (permalink / raw)
To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180612053548.901931-1-yhs@fb.com>
On Mon, 11 Jun 2018 22:35:48 -0700, Yonghong Song wrote:
> Commit b04df400c302 ("tools/bpftool: add perf subcommand")
> introduced bpftool subcommand perf to query bpf program
> kuprobe and tracepoint attachments.
>
> The perf subcommand will first test whether bpf subcommand
> BPF_TASK_FD_QUERY is supported in kernel or not. It does it
> by opening a file with argv[0] and feeds the file descriptor
> and current task pid to the kernel for querying.
>
> Such an approach won't work if the argv[0] cannot be opened
> successfully in the current directory. This is especially
> true when bpftool is accessible through PATH env variable.
> The error below reflects the open failure for file argv[0]
> at home directory.
>
> [yhs@localhost ~]$ which bpftool
> /usr/local/sbin/bpftool
> [yhs@localhost ~]$ bpftool perf
> Error: perf_query_support: No such file or directory
>
> To fix the issue, let us open root directory ("/")
> which exists in every linux system. With the fix, the
> error message will correctly reflect the permission issue.
>
> [yhs@localhost ~]$ which bpftool
> /usr/local/sbin/bpftool
> [yhs@localhost ~]$ bpftool perf
> Error: perf_query_support: Operation not permitted
> HINT: non root or kernel doesn't support TASK_FD_QUERY
>
> Fixes: b04df400c302 ("tools/bpftool: add perf subcommand")
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
FWIW :)
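
For anyone skimming the thread, the point of the fix can be shown with a
small userspace sketch. It assumes only what the commit message says: the
probe needs some valid file descriptor, not bpftool's own binary. The
helper name open_probe_fd() is hypothetical and is not the function used
in bpftool.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Return a read-only descriptor for `path`, or -1 on failure.
 * The probe only needs the descriptor to be valid; it never reads from it.
 */
static int open_probe_fd(const char *path)
{
	return open(path, O_RDONLY);
}
```

open_probe_fd("/") succeeds regardless of the current directory, whereas
passing argv[0] ("bpftool" when found via PATH) is resolved relative to
the current directory and fails with ENOENT, which is the misleading
error shown in the report.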
^ permalink raw reply
* Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
From: Tobias Hommel @ 2018-06-12 18:39 UTC (permalink / raw)
To: Kristian Evensen; +Cc: Steffen Klassert, Markus Berner, Network Development
In-Reply-To: <CAKfDRXiq2c2ruvT8XoXGQntHYccAOp0zUZ3uH4iJM3cSAQkNVw@mail.gmail.com>
On Fri, Jun 08, 2018 at 10:41:37AM +0200, Kristian Evensen wrote:
> Hi,
>
> On Wed, Jun 6, 2018 at 6:03 PM, Tobias Hommel <netdev-list@genoetigt.de> wrote:
> > Sorry, no progress so far; I currently do not have time to take a deeper look
> > into that. We're back to 4.1.6 right now.
>
> Thanks for letting me know. In the project I am currently involved in,
> we unfortunately don't have the option of reverting the kernel, so we
> are finding ways to live with the error. We have been looking into the
> error a bit more, and have made the following observations:
>
> * First of all, as discussed earlier in the thread, the error is
> triggered by dst_orig being NULL. Our current work-around is just to
> return from xfrm_lookup if dst_orig is NULL and this seems to work
> fine, the error doesn't happen that often (in our use-cases at least).
> * The machine we use for testing (and where we first saw the error) is
> used as initiator.
The machine where I encountered the bug is a "roadwarrior gateway", so it only
serves as a responder.
> * When we compare the logs from Strongswan with the ones from the
> kernel, it seems that the error is typically triggered when a tunnel
> is torn down/about to come up. We need quite a lot of tunnels for
> the error to trigger, usually around 30+. I guess this might point to
> some race or some condition not being met when packets are
> sent/received.
> * We see the error much more frequently when hardware encryption is enabled.
> * Yesterday, we upgraded the kernel from 4.14.34 to 4.14.48, and the
> error happens much less frequently. I see that 4.14.48 includes
> several IPsec fixes (for example the previously mentioned ("xfrm: Fix
> a race in the xdst pcpu cache.")).
>
> BR,
> Kristian
^ permalink raw reply
* [PATCH iproute2-next v2] ip-xfrm: Add support for OUTPUT_MARK
From: Subash Abhinov Kasiviswanathan @ 2018-06-12 18:48 UTC (permalink / raw)
To: lorenzo, netdev, stephen, dsahern, steffen.klassert
Cc: Subash Abhinov Kasiviswanathan
This patch adds support for OUTPUT_MARK in xfrm state to exercise the
functionality added by kernel commit 077fbac405bf
("net: xfrm: support setting an output mark.").
Sample output with output-mark -
src 192.168.1.1 dst 192.168.1.2
proto esp spi 0x00004321 reqid 0 mode tunnel
replay-window 0 flag af-unspec
mark 0x10000/0x3ffff
output-mark 0x20000
auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
v1->v2: Moved the XFRMA_OUTPUT_MARK print after XFRMA_MARK in
xfrm_xfrma_print() as mentioned by Lorenzo
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
ip/ipxfrm.c | 6 ++++++
ip/xfrm_state.c | 9 +++++++++
man/man8/ip-xfrm.8 | 2 ++
3 files changed, 17 insertions(+)
diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index 12c2f72..8b88c8f 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -681,6 +681,12 @@ void xfrm_xfrma_print(struct rtattr *tb[], __u16 family,
fprintf(fp, "%s", _SL_);
}
+ if (tb[XFRMA_OUTPUT_MARK]) {
+ __u32 output_mark = rta_getattr_u32(tb[XFRMA_OUTPUT_MARK]);
+
+ fprintf(fp, "\toutput-mark 0x%x %s", output_mark, _SL_);
+ }
+
if (tb[XFRMA_ALG_AUTH] && !tb[XFRMA_ALG_AUTH_TRUNC]) {
struct rtattr *rta = tb[XFRMA_ALG_AUTH];
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 85d959c..d005802 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -61,6 +61,7 @@ static void usage(void)
fprintf(stderr, " [ flag FLAG-LIST ] [ sel SELECTOR ] [ LIMIT-LIST ] [ encap ENCAP ]\n");
fprintf(stderr, " [ coa ADDR[/PLEN] ] [ ctx CTX ] [ extra-flag EXTRA-FLAG-LIST ]\n");
fprintf(stderr, " [ offload [dev DEV] dir DIR ]\n");
+ fprintf(stderr, " [ output-mark OUTPUT-MARK]\n");
fprintf(stderr, "Usage: ip xfrm state allocspi ID [ mode MODE ] [ mark MARK [ mask MASK ] ]\n");
fprintf(stderr, " [ reqid REQID ] [ seq SEQ ] [ min SPI max SPI ]\n");
fprintf(stderr, "Usage: ip xfrm state { delete | get } ID [ mark MARK [ mask MASK ] ]\n");
@@ -322,6 +323,7 @@ static int xfrm_state_modify(int cmd, unsigned int flags, int argc, char **argv)
struct xfrm_user_sec_ctx sctx;
char str[CTX_BUF_SIZE];
} ctx = {};
+ __u32 output_mark = 0;
while (argc > 0) {
if (strcmp(*argv, "mode") == 0) {
@@ -437,6 +439,10 @@ static int xfrm_state_modify(int cmd, unsigned int flags, int argc, char **argv)
invarg("value after \"offload dir\" is invalid", *argv);
is_offload = false;
}
+ } else if (strcmp(*argv, "output-mark") == 0) {
+ NEXT_ARG();
+ if (get_u32(&output_mark, *argv, 0))
+ invarg("value after \"output-mark\" is invalid", *argv);
} else {
/* try to assume ALGO */
int type = xfrm_algotype_getbyname(*argv);
@@ -720,6 +726,9 @@ static int xfrm_state_modify(int cmd, unsigned int flags, int argc, char **argv)
}
}
+ if (output_mark != 0)
+ addattr32(&req.n, sizeof(req.buf), XFRMA_OUTPUT_MARK, output_mark);
+
if (rtnl_open_byproto(&rth, 0, NETLINK_XFRM) < 0)
exit(1);
diff --git a/man/man8/ip-xfrm.8 b/man/man8/ip-xfrm.8
index 988cc6a..e001596 100644
--- a/man/man8/ip-xfrm.8
+++ b/man/man8/ip-xfrm.8
@@ -59,6 +59,8 @@ ip-xfrm \- transform configuration
.IR CTX " ]"
.RB "[ " extra-flag
.IR EXTRA-FLAG-LIST " ]"
+.RB "[ " output-mark
+.IR OUTPUT-MARK " ]"
.ti -8
.B "ip xfrm state allocspi"
--
1.9.1
^ permalink raw reply related
* Re: [PATCH iproute2-next] ip-xfrm: Add support for OUTPUT_MARK
From: Subash Abhinov Kasiviswanathan @ 2018-06-12 18:51 UTC (permalink / raw)
To: Lorenzo Colitti; +Cc: netdev, Stephen Hemminger, David Ahern, Steffen Klassert
In-Reply-To: <CAKD1Yr0Z8ZgyE=b2MXtGOaJSRm0Y8spnU2pDxuWLd5FFgfx=eQ@mail.gmail.com>
> Have you considered putting this earlier up in the output, where the
> mark is printed as well?
>
>> + if (tb[XFRMA_OUTPUT_MARK]) {
>> + __u32 output_mark =
>> rta_getattr_u32(tb[XFRMA_OUTPUT_MARK]);
>> +
>> + fprintf(fp, "\toutput-mark 0x%x %s", output_mark,
>> _SL_);
>> + }
>> }
>
> If you wanted to implement the suggestion above, I think you could do
> that by moving this code into xfrm_xfrma_print.
>
Hi Lorenzo
I have updated it now in v2.
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply
* [PATCH 0/3] Use sbitmap instead of percpu_ida
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
linux-usb, kvm, virtualization, netdev, Juergen Gross,
qla2xxx-upstream, Kent Overstreet, Jens Axboe
Cc: Matthew Wilcox
Removing the percpu_ida code nets over 400 lines of removal. It's not
as spectacular as deleting an entire architecture, but it's still a
worthy reduction in lines of code.
Untested due to lack of hardware and not understanding how to set up a
target platform.
Changes from v1:
- Fixed bugs pointed out by Jens in iscsit_wait_for_tag()
- Abstracted out tag freeing as requested by Bart
- Made iscsit_wait_for_tag static as pointed out by 0day
Matthew Wilcox (3):
target: Abstract tag freeing
Convert target drivers to use sbitmap
Remove percpu_ida
drivers/scsi/qla2xxx/qla_target.c | 14 +-
drivers/target/iscsi/iscsi_target_util.c | 35 ++-
drivers/target/sbp/sbp_target.c | 7 +-
drivers/target/target_core_transport.c | 5 +-
drivers/target/tcm_fc/tfc_cmd.c | 10 +-
drivers/usb/gadget/function/f_tcm.c | 7 +-
drivers/vhost/scsi.c | 8 +-
drivers/xen/xen-scsiback.c | 9 +-
include/linux/percpu_ida.h | 83 -----
include/target/iscsi/iscsi_target_core.h | 1 +
include/target/target_core_base.h | 10 +-
lib/Makefile | 2 +-
lib/percpu_ida.c | 370 -----------------------
13 files changed, 73 insertions(+), 488 deletions(-)
delete mode 100644 include/linux/percpu_ida.h
delete mode 100644 lib/percpu_ida.c
--
2.17.1
^ permalink raw reply
* [PATCH 1/3] target: Abstract tag freeing
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
linux-usb, kvm, virtualization, netdev, Juergen Gross,
qla2xxx-upstream, Kent Overstreet, Jens Axboe
Cc: Matthew Wilcox
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>
Introduce target_free_tag() and convert all drivers to use it.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
---
drivers/scsi/qla2xxx/qla_target.c | 4 ++--
drivers/target/iscsi/iscsi_target_util.c | 2 +-
drivers/target/sbp/sbp_target.c | 2 +-
drivers/target/tcm_fc/tfc_cmd.c | 4 ++--
drivers/usb/gadget/function/f_tcm.c | 2 +-
drivers/vhost/scsi.c | 2 +-
drivers/xen/xen-scsiback.c | 4 +---
include/target/target_core_base.h | 5 +++++
8 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index b85c833099ff..05290966e630 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -3783,7 +3783,7 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
return;
}
cmd->jiffies_at_free = get_jiffies_64();
- percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+ target_free_tag(sess->se_sess, &cmd->se_cmd);
}
EXPORT_SYMBOL(qlt_free_cmd);
@@ -4146,7 +4146,7 @@ static void __qlt_do_work(struct qla_tgt_cmd *cmd)
qlt_send_term_exchange(qpair, NULL, &cmd->atio, 1, 0);
qlt_decr_num_pend_cmds(vha);
- percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+ target_free_tag(sess->se_sess, &cmd->se_cmd);
spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
spin_lock_irqsave(&ha->tgt.sess_lock, flags);
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 4435bf374d2d..7e98697cfb8e 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -711,7 +711,7 @@ void iscsit_release_cmd(struct iscsi_cmd *cmd)
kfree(cmd->iov_data);
kfree(cmd->text_in_ptr);
- percpu_ida_free(&sess->se_sess->sess_tag_pool, se_cmd->map_tag);
+ target_free_tag(sess->se_sess, se_cmd);
}
EXPORT_SYMBOL(iscsit_release_cmd);
diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
index fb1003921d85..679ae29d25ab 100644
--- a/drivers/target/sbp/sbp_target.c
+++ b/drivers/target/sbp/sbp_target.c
@@ -1460,7 +1460,7 @@ static void sbp_free_request(struct sbp_target_request *req)
kfree(req->pg_tbl);
kfree(req->cmd_buf);
- percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+ target_free_tag(se_sess, se_cmd);
}
static void sbp_mgt_agent_process(struct work_struct *work)
diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index ec372860106f..13e4efbe1ce7 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -92,7 +92,7 @@ static void ft_free_cmd(struct ft_cmd *cmd)
if (fr_seq(fp))
fc_seq_release(fr_seq(fp));
fc_frame_free(fp);
- percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+ target_free_tag(sess->se_sess, &cmd->se_cmd);
ft_sess_put(sess); /* undo get from lookup at recv */
}
@@ -461,7 +461,7 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
cmd->sess = sess;
cmd->seq = fc_seq_assign(lport, fp);
if (!cmd->seq) {
- percpu_ida_free(&se_sess->sess_tag_pool, tag);
+ target_free_tag(se_sess, &cmd->se_cmd);
goto busy;
}
cmd->req_frame = fp; /* hold frame during cmd */
diff --git a/drivers/usb/gadget/function/f_tcm.c b/drivers/usb/gadget/function/f_tcm.c
index d78dbb73bde8..9f670d9224b9 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1288,7 +1288,7 @@ static void usbg_release_cmd(struct se_cmd *se_cmd)
struct se_session *se_sess = se_cmd->se_sess;
kfree(cmd->data_buf);
- percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+ target_free_tag(se_sess, se_cmd);
}
static u32 usbg_sess_get_index(struct se_session *se_sess)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 7ad57094d736..70d35e696533 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -324,7 +324,7 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
}
vhost_scsi_put_inflight(tv_cmd->inflight);
- percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+ target_free_tag(se_sess, se_cmd);
}
static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index 7bc88fd43cfc..ec6635258ed8 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -1377,9 +1377,7 @@ static int scsiback_check_stop_free(struct se_cmd *se_cmd)
static void scsiback_release_cmd(struct se_cmd *se_cmd)
{
- struct se_session *se_sess = se_cmd->se_sess;
-
- percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+ target_free_tag(se_cmd->se_sess, se_cmd);
}
static u32 scsiback_sess_get_index(struct se_session *se_sess)
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 922a39f45abc..260c2f3e9460 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -934,4 +934,9 @@ static inline void atomic_dec_mb(atomic_t *v)
smp_mb__after_atomic();
}
+static inline void target_free_tag(struct se_session *sess, struct se_cmd *cmd)
+{
+ percpu_ida_free(&sess->sess_tag_pool, cmd->map_tag);
+}
+
#endif /* TARGET_CORE_BASE_H */
--
2.17.1
^ permalink raw reply related
* [PATCH 2/3] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
linux-usb, kvm, virtualization, netdev, Juergen Gross,
qla2xxx-upstream, Kent Overstreet, Jens Axboe
Cc: Matthew Wilcox
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>
The sbitmap and the percpu_ida perform essentially the same task,
allocating tags for commands. The sbitmap outperforms the percpu_ida
as documented here: https://lkml.org/lkml/2014/4/22/553
The sbitmap interface is a little harder to use, but being able to
remove the percpu_ida code and getting better performance justifies the
additional complexity.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com> # f_tcm
---
drivers/scsi/qla2xxx/qla_target.c | 10 ++++---
drivers/target/iscsi/iscsi_target_util.c | 33 +++++++++++++++++++++---
drivers/target/sbp/sbp_target.c | 5 ++--
drivers/target/target_core_transport.c | 5 ++--
drivers/target/tcm_fc/tfc_cmd.c | 6 ++---
drivers/usb/gadget/function/f_tcm.c | 5 ++--
drivers/vhost/scsi.c | 6 ++---
drivers/xen/xen-scsiback.c | 5 ++--
include/target/iscsi/iscsi_target_core.h | 1 +
include/target/target_core_base.h | 7 ++---
10 files changed, 59 insertions(+), 24 deletions(-)
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 05290966e630..a1725a054749 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -4277,9 +4277,9 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha,
{
struct se_session *se_sess = sess->se_sess;
struct qla_tgt_cmd *cmd;
- int tag;
+ int tag, cpu;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
return NULL;
@@ -4292,6 +4292,7 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha,
qlt_incr_num_pend_cmds(vha);
cmd->vha = vha;
cmd->se_cmd.map_tag = tag;
+ cmd->se_cmd.map_cpu = cpu;
cmd->sess = sess;
cmd->loop_id = sess->loop_id;
cmd->conf_compl_supported = sess->conf_compl_supported;
@@ -5294,7 +5295,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
struct fc_port *sess;
struct se_session *se_sess;
struct qla_tgt_cmd *cmd;
- int tag;
+ int tag, cpu;
unsigned long flags;
if (unlikely(tgt->tgt_stop)) {
@@ -5326,7 +5327,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
se_sess = sess->se_sess;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
return;
@@ -5357,6 +5358,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
cmd->reset_count = ha->base_qpair->chip_reset;
cmd->q_full = 1;
cmd->qpair = ha->base_qpair;
+ cmd->se_cmd.map_cpu = cpu;
if (qfull) {
cmd->q_full = 1;
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 7e98697cfb8e..8cfcf9033507 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -17,7 +17,7 @@
******************************************************************************/
#include <linux/list.h>
-#include <linux/percpu_ida.h>
+#include <linux/sched/signal.h>
#include <net/ipv6.h> /* ipv6_addr_equal() */
#include <scsi/scsi_tcq.h>
#include <scsi/iscsi_proto.h>
@@ -147,6 +147,30 @@ void iscsit_free_r2ts_from_list(struct iscsi_cmd *cmd)
spin_unlock_bh(&cmd->r2t_lock);
}
+static int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
+{
+ int tag = -1;
+ DEFINE_WAIT(wait);
+ struct sbq_wait_state *ws;
+
+ if (state == TASK_RUNNING)
+ return tag;
+
+ ws = &se_sess->sess_tag_pool.ws[0];
+ for (;;) {
+ prepare_to_wait_exclusive(&ws->wait, &wait, state);
+ if (signal_pending_state(state, current))
+ break;
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, cpup);
+ if (tag >= 0)
+ break;
+ schedule();
+ }
+
+ finish_wait(&ws->wait, &wait);
+ return tag;
+}
+
/*
* May be called from software interrupt (timer) context for allocating
* iSCSI NopINs.
@@ -155,9 +179,11 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, int state)
{
struct iscsi_cmd *cmd;
struct se_session *se_sess = conn->sess->se_sess;
- int size, tag;
+ int size, tag, cpu;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, state);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
+ if (tag < 0)
+ tag = iscsit_wait_for_tag(se_sess, state, &cpu);
if (tag < 0)
return NULL;
@@ -166,6 +192,7 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, int state)
memset(cmd, 0, size);
cmd->se_cmd.map_tag = tag;
+ cmd->se_cmd.map_cpu = cpu;
cmd->conn = conn;
cmd->data_direction = DMA_NONE;
INIT_LIST_HEAD(&cmd->i_conn_node);
diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
index 679ae29d25ab..42b21f2ac8b0 100644
--- a/drivers/target/sbp/sbp_target.c
+++ b/drivers/target/sbp/sbp_target.c
@@ -926,15 +926,16 @@ static struct sbp_target_request *sbp_mgt_get_req(struct sbp_session *sess,
{
struct se_session *se_sess = sess->se_sess;
struct sbp_target_request *req;
- int tag;
+ int tag, cpu;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
return ERR_PTR(-ENOMEM);
req = &((struct sbp_target_request *)se_sess->sess_cmd_map)[tag];
memset(req, 0, sizeof(*req));
req->se_cmd.map_tag = tag;
+ req->se_cmd.map_cpu = cpu;
req->se_cmd.tag = next_orb;
return req;
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index f0e8f0f4ccb4..18c53c5cdd3d 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -260,7 +260,8 @@ int transport_alloc_session_tags(struct se_session *se_sess,
}
}
- rc = percpu_ida_init(&se_sess->sess_tag_pool, tag_num);
+ rc = sbitmap_queue_init_node(&se_sess->sess_tag_pool, tag_num, -1,
+ false, GFP_KERNEL, NUMA_NO_NODE);
if (rc < 0) {
pr_err("Unable to init se_sess->sess_tag_pool,"
" tag_num: %u\n", tag_num);
@@ -547,7 +548,7 @@ void transport_free_session(struct se_session *se_sess)
target_put_nacl(se_nacl);
}
if (se_sess->sess_cmd_map) {
- percpu_ida_destroy(&se_sess->sess_tag_pool);
+ sbitmap_queue_free(&se_sess->sess_tag_pool);
kvfree(se_sess->sess_cmd_map);
}
kmem_cache_free(se_sess_cache, se_sess);
diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index 13e4efbe1ce7..a183d4da7db2 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -28,7 +28,6 @@
#include <linux/configfs.h>
#include <linux/ctype.h>
#include <linux/hash.h>
-#include <linux/percpu_ida.h>
#include <asm/unaligned.h>
#include <scsi/scsi_tcq.h>
#include <scsi/libfc.h>
@@ -448,9 +447,9 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
struct ft_cmd *cmd;
struct fc_lport *lport = sess->tport->lport;
struct se_session *se_sess = sess->se_sess;
- int tag;
+ int tag, cpu;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
goto busy;
@@ -458,6 +457,7 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
memset(cmd, 0, sizeof(struct ft_cmd));
cmd->se_cmd.map_tag = tag;
+ cmd->se_cmd.map_cpu = cpu;
cmd->sess = sess;
cmd->seq = fc_seq_assign(lport, fp);
if (!cmd->seq) {
diff --git a/drivers/usb/gadget/function/f_tcm.c b/drivers/usb/gadget/function/f_tcm.c
index 9f670d9224b9..5003e857dce7 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1071,15 +1071,16 @@ static struct usbg_cmd *usbg_get_cmd(struct f_uas *fu,
{
struct se_session *se_sess = tv_nexus->tvn_se_sess;
struct usbg_cmd *cmd;
- int tag;
+ int tag, cpu;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
return ERR_PTR(-ENOMEM);
cmd = &((struct usbg_cmd *)se_sess->sess_cmd_map)[tag];
memset(cmd, 0, sizeof(*cmd));
cmd->se_cmd.map_tag = tag;
+ cmd->se_cmd.map_cpu = cpu;
cmd->se_cmd.tag = cmd->tag = scsi_tag;
cmd->fu = fu;
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 70d35e696533..c9c5d6b291cc 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -46,7 +46,6 @@
#include <linux/virtio_scsi.h>
#include <linux/llist.h>
#include <linux/bitmap.h>
-#include <linux/percpu_ida.h>
#include "vhost.h"
@@ -567,7 +566,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
struct se_session *se_sess;
struct scatterlist *sg, *prot_sg;
struct page **pages;
- int tag;
+ int tag, cpu;
tv_nexus = tpg->tpg_nexus;
if (!tv_nexus) {
@@ -576,7 +575,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
}
se_sess = tv_nexus->tvn_se_sess;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0) {
pr_err("Unable to obtain tag for vhost_scsi_cmd\n");
return ERR_PTR(-ENOMEM);
@@ -591,6 +590,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
cmd->tvc_prot_sgl = prot_sg;
cmd->tvc_upages = pages;
cmd->tvc_se_cmd.map_tag = tag;
+ cmd->tvc_se_cmd.map_cpu = cpu;
cmd->tvc_tag = scsi_tag;
cmd->tvc_lun = lun;
cmd->tvc_task_attr = task_attr;
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index ec6635258ed8..764dd9aa0131 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -654,9 +654,9 @@ static struct vscsibk_pend *scsiback_get_pend_req(struct vscsiif_back_ring *ring
struct scsiback_nexus *nexus = tpg->tpg_nexus;
struct se_session *se_sess = nexus->tvn_se_sess;
struct vscsibk_pend *req;
- int tag, i;
+ int tag, cpu, i;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0) {
pr_err("Unable to obtain tag for vscsiif_request\n");
return ERR_PTR(-ENOMEM);
@@ -665,6 +665,7 @@ static struct vscsibk_pend *scsiback_get_pend_req(struct vscsiif_back_ring *ring
req = &((struct vscsibk_pend *)se_sess->sess_cmd_map)[tag];
memset(req, 0, sizeof(*req));
req->se_cmd.map_tag = tag;
+ req->se_cmd.map_cpu = cpu;
for (i = 0; i < VSCSI_MAX_GRANTS; i++)
req->grant_handles[i] = SCSIBACK_INVALID_HANDLE;
diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h
index cf5f3fff1f1a..f2e6abea8490 100644
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -4,6 +4,7 @@
#include <linux/dma-direction.h> /* enum dma_data_direction */
#include <linux/list.h> /* struct list_head */
+#include <linux/sched.h>
#include <linux/socket.h> /* struct sockaddr_storage */
#include <linux/types.h> /* u8 */
#include <scsi/iscsi_proto.h> /* itt_t */
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 260c2f3e9460..448f291125c2 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -4,7 +4,7 @@
#include <linux/configfs.h> /* struct config_group */
#include <linux/dma-direction.h> /* enum dma_data_direction */
-#include <linux/percpu_ida.h> /* struct percpu_ida */
+#include <linux/sbitmap.h>
#include <linux/percpu-refcount.h>
#include <linux/semaphore.h> /* struct semaphore */
#include <linux/completion.h>
@@ -455,6 +455,7 @@ struct se_cmd {
int sam_task_attr;
/* Used for se_sess->sess_tag_pool */
unsigned int map_tag;
+ int map_cpu;
/* Transport protocol dependent state, see transport_state_table */
enum transport_state_table t_state;
/* See se_cmd_flags_table */
@@ -608,7 +609,7 @@ struct se_session {
struct list_head sess_wait_list;
spinlock_t sess_cmd_lock;
void *sess_cmd_map;
- struct percpu_ida sess_tag_pool;
+ struct sbitmap_queue sess_tag_pool;
};
struct se_device;
@@ -936,7 +937,7 @@ static inline void atomic_dec_mb(atomic_t *v)
static inline void target_free_tag(struct se_session *sess, struct se_cmd *cmd)
{
- percpu_ida_free(&sess->sess_tag_pool, cmd->map_tag);
+ sbitmap_queue_clear(&sess->sess_tag_pool, cmd->map_tag, cmd->map_cpu);
}
#endif /* TARGET_CORE_BASE_H */
--
2.17.1
^ permalink raw reply related
* [PATCH 3/3] Remove percpu_ida
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
linux-usb, kvm, virtualization, netdev, Juergen Gross,
qla2xxx-upstream, Kent Overstreet, Jens Axboe
Cc: Matthew Wilcox
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>
With its one user gone, remove the library code.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
---
include/linux/percpu_ida.h | 83 ---------
lib/Makefile | 2 +-
lib/percpu_ida.c | 370 -------------------------------------
3 files changed, 1 insertion(+), 454 deletions(-)
delete mode 100644 include/linux/percpu_ida.h
delete mode 100644 lib/percpu_ida.c
diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h
deleted file mode 100644
index 07d78e4653bc..000000000000
--- a/include/linux/percpu_ida.h
+++ /dev/null
@@ -1,83 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __PERCPU_IDA_H__
-#define __PERCPU_IDA_H__
-
-#include <linux/types.h>
-#include <linux/bitops.h>
-#include <linux/init.h>
-#include <linux/sched.h>
-#include <linux/spinlock_types.h>
-#include <linux/wait.h>
-#include <linux/cpumask.h>
-
-struct percpu_ida_cpu;
-
-struct percpu_ida {
- /*
- * number of tags available to be allocated, as passed to
- * percpu_ida_init()
- */
- unsigned nr_tags;
- unsigned percpu_max_size;
- unsigned percpu_batch_size;
-
- struct percpu_ida_cpu __percpu *tag_cpu;
-
- /*
- * Bitmap of cpus that (may) have tags on their percpu freelists:
- * steal_tags() uses this to decide when to steal tags, and which cpus
- * to try stealing from.
- *
- * It's ok for a freelist to be empty when its bit is set - steal_tags()
- * will just keep looking - but the bitmap _must_ be set whenever a
- * percpu freelist does have tags.
- */
- cpumask_t cpus_have_tags;
-
- struct {
- spinlock_t lock;
- /*
- * When we go to steal tags from another cpu (see steal_tags()),
- * we want to pick a cpu at random. Cycling through them every
- * time we steal is a bit easier and more or less equivalent:
- */
- unsigned cpu_last_stolen;
-
- /* For sleeping on allocation failure */
- wait_queue_head_t wait;
-
- /*
- * Global freelist - it's a stack where nr_free points to the
- * top
- */
- unsigned nr_free;
- unsigned *freelist;
- } ____cacheline_aligned_in_smp;
-};
-
-/*
- * Number of tags we move between the percpu freelist and the global freelist at
- * a time
- */
-#define IDA_DEFAULT_PCPU_BATCH_MOVE 32U
-/* Max size of percpu freelist, */
-#define IDA_DEFAULT_PCPU_SIZE ((IDA_DEFAULT_PCPU_BATCH_MOVE * 3) / 2)
-
-int percpu_ida_alloc(struct percpu_ida *pool, int state);
-void percpu_ida_free(struct percpu_ida *pool, unsigned tag);
-
-void percpu_ida_destroy(struct percpu_ida *pool);
-int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
- unsigned long max_size, unsigned long batch_size);
-static inline int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags)
-{
- return __percpu_ida_init(pool, nr_tags, IDA_DEFAULT_PCPU_SIZE,
- IDA_DEFAULT_PCPU_BATCH_MOVE);
-}
-
-typedef int (*percpu_ida_cb)(unsigned, void *);
-int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn,
- void *data);
-
-unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu);
-#endif /* __PERCPU_IDA_H__ */
diff --git a/lib/Makefile b/lib/Makefile
index 84c6dcb31fbb..f4722a7fa62c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -40,7 +40,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
bsearch.o find_bit.o llist.o memweight.o kfifo.o \
- percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
+ percpu-refcount.o rhashtable.o reciprocal_div.o \
once.o refcount.o usercopy.o errseq.o bucket_locks.o
obj-$(CONFIG_STRING_SELFTEST) += test_string.o
obj-y += string_helpers.o
diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
deleted file mode 100644
index 9bbd9c5d375a..000000000000
--- a/lib/percpu_ida.c
+++ /dev/null
@@ -1,370 +0,0 @@
-/*
- * Percpu IDA library
- *
- * Copyright (C) 2013 Datera, Inc. Kent Overstreet
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2, or (at
- * your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- */
-
-#include <linux/mm.h>
-#include <linux/bitmap.h>
-#include <linux/bitops.h>
-#include <linux/bug.h>
-#include <linux/err.h>
-#include <linux/export.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/percpu.h>
-#include <linux/sched/signal.h>
-#include <linux/string.h>
-#include <linux/spinlock.h>
-#include <linux/percpu_ida.h>
-
-struct percpu_ida_cpu {
- /*
- * Even though this is percpu, we need a lock for tag stealing by remote
- * CPUs:
- */
- spinlock_t lock;
-
- /* nr_free/freelist form a stack of free IDs */
- unsigned nr_free;
- unsigned freelist[];
-};
-
-static inline void move_tags(unsigned *dst, unsigned *dst_nr,
- unsigned *src, unsigned *src_nr,
- unsigned nr)
-{
- *src_nr -= nr;
- memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
- *dst_nr += nr;
-}
-
-/*
- * Try to steal tags from a remote cpu's percpu freelist.
- *
- * We first check how many percpu freelists have tags
- *
- * Then we iterate through the cpus until we find some tags - we don't attempt
- * to find the "best" cpu to steal from, to keep cacheline bouncing to a
- * minimum.
- */
-static inline void steal_tags(struct percpu_ida *pool,
- struct percpu_ida_cpu *tags)
-{
- unsigned cpus_have_tags, cpu = pool->cpu_last_stolen;
- struct percpu_ida_cpu *remote;
-
- for (cpus_have_tags = cpumask_weight(&pool->cpus_have_tags);
- cpus_have_tags; cpus_have_tags--) {
- cpu = cpumask_next(cpu, &pool->cpus_have_tags);
-
- if (cpu >= nr_cpu_ids) {
- cpu = cpumask_first(&pool->cpus_have_tags);
- if (cpu >= nr_cpu_ids)
- BUG();
- }
-
- pool->cpu_last_stolen = cpu;
- remote = per_cpu_ptr(pool->tag_cpu, cpu);
-
- cpumask_clear_cpu(cpu, &pool->cpus_have_tags);
-
- if (remote == tags)
- continue;
-
- spin_lock(&remote->lock);
-
- if (remote->nr_free) {
- memcpy(tags->freelist,
- remote->freelist,
- sizeof(unsigned) * remote->nr_free);
-
- tags->nr_free = remote->nr_free;
- remote->nr_free = 0;
- }
-
- spin_unlock(&remote->lock);
-
- if (tags->nr_free)
- break;
- }
-}
-
-/*
- * Pop up to IDA_PCPU_BATCH_MOVE IDs off the global freelist, and push them onto
- * our percpu freelist:
- */
-static inline void alloc_global_tags(struct percpu_ida *pool,
- struct percpu_ida_cpu *tags)
-{
- move_tags(tags->freelist, &tags->nr_free,
- pool->freelist, &pool->nr_free,
- min(pool->nr_free, pool->percpu_batch_size));
-}
-
-/**
- * percpu_ida_alloc - allocate a tag
- * @pool: pool to allocate from
- * @state: task state for prepare_to_wait
- *
- * Returns a tag - an integer in the range [0..nr_tags) (passed to
- * tag_pool_init()), or otherwise -ENOSPC on allocation failure.
- *
- * Safe to be called from interrupt context (assuming it isn't passed
- * TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, of course).
- *
- * @gfp indicates whether or not to wait until a free id is available (it's not
- * used for internal memory allocations); thus if passed __GFP_RECLAIM we may sleep
- * however long it takes until another thread frees an id (same semantics as a
- * mempool).
- *
- * Will not fail if passed TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE.
- */
-int percpu_ida_alloc(struct percpu_ida *pool, int state)
-{
- DEFINE_WAIT(wait);
- struct percpu_ida_cpu *tags;
- unsigned long flags;
- int tag = -ENOSPC;
-
- tags = raw_cpu_ptr(pool->tag_cpu);
- spin_lock_irqsave(&tags->lock, flags);
-
- /* Fastpath */
- if (likely(tags->nr_free >= 0)) {
- tag = tags->freelist[--tags->nr_free];
- spin_unlock_irqrestore(&tags->lock, flags);
- return tag;
- }
- spin_unlock_irqrestore(&tags->lock, flags);
-
- while (1) {
- spin_lock_irqsave(&pool->lock, flags);
- tags = this_cpu_ptr(pool->tag_cpu);
-
- /*
- * prepare_to_wait() must come before steal_tags(), in case
- * percpu_ida_free() on another cpu flips a bit in
- * cpus_have_tags
- *
- * global lock held and irqs disabled, don't need percpu lock
- */
- if (state != TASK_RUNNING)
- prepare_to_wait(&pool->wait, &wait, state);
-
- if (!tags->nr_free)
- alloc_global_tags(pool, tags);
- if (!tags->nr_free)
- steal_tags(pool, tags);
-
- if (tags->nr_free) {
- tag = tags->freelist[--tags->nr_free];
- if (tags->nr_free)
- cpumask_set_cpu(smp_processor_id(),
- &pool->cpus_have_tags);
- }
-
- spin_unlock_irqrestore(&pool->lock, flags);
-
- if (tag >= 0 || state == TASK_RUNNING)
- break;
-
- if (signal_pending_state(state, current)) {
- tag = -ERESTARTSYS;
- break;
- }
-
- schedule();
- }
- if (state != TASK_RUNNING)
- finish_wait(&pool->wait, &wait);
-
- return tag;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_alloc);
-
-/**
- * percpu_ida_free - free a tag
- * @pool: pool @tag was allocated from
- * @tag: a tag previously allocated with percpu_ida_alloc()
- *
- * Safe to be called from interrupt context.
- */
-void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
-{
- struct percpu_ida_cpu *tags;
- unsigned long flags;
- unsigned nr_free;
-
- BUG_ON(tag >= pool->nr_tags);
-
- tags = raw_cpu_ptr(pool->tag_cpu);
-
- spin_lock_irqsave(&tags->lock, flags);
- tags->freelist[tags->nr_free++] = tag;
-
- nr_free = tags->nr_free;
-
- if (nr_free == 1) {
- cpumask_set_cpu(smp_processor_id(),
- &pool->cpus_have_tags);
- wake_up(&pool->wait);
- }
- spin_unlock_irqrestore(&tags->lock, flags);
-
- if (nr_free == pool->percpu_max_size) {
- spin_lock_irqsave(&pool->lock, flags);
- spin_lock(&tags->lock);
-
- if (tags->nr_free == pool->percpu_max_size) {
- move_tags(pool->freelist, &pool->nr_free,
- tags->freelist, &tags->nr_free,
- pool->percpu_batch_size);
-
- wake_up(&pool->wait);
- }
- spin_unlock(&tags->lock);
- spin_unlock_irqrestore(&pool->lock, flags);
- }
-}
-EXPORT_SYMBOL_GPL(percpu_ida_free);
-
-/**
- * percpu_ida_destroy - release a tag pool's resources
- * @pool: pool to free
- *
- * Frees the resources allocated by percpu_ida_init().
- */
-void percpu_ida_destroy(struct percpu_ida *pool)
-{
- free_percpu(pool->tag_cpu);
- free_pages((unsigned long) pool->freelist,
- get_order(pool->nr_tags * sizeof(unsigned)));
-}
-EXPORT_SYMBOL_GPL(percpu_ida_destroy);
-
-/**
- * percpu_ida_init - initialize a percpu tag pool
- * @pool: pool to initialize
- * @nr_tags: number of tags that will be available for allocation
- *
- * Initializes @pool so that it can be used to allocate tags - integers in the
- * range [0, nr_tags). Typically, they'll be used by driver code to refer to a
- * preallocated array of tag structures.
- *
- * Allocation is percpu, but sharding is limited by nr_tags - for best
- * performance, the workload should not span more cpus than nr_tags / 128.
- */
-int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
- unsigned long max_size, unsigned long batch_size)
-{
- unsigned i, cpu, order;
-
- memset(pool, 0, sizeof(*pool));
-
- init_waitqueue_head(&pool->wait);
- spin_lock_init(&pool->lock);
- pool->nr_tags = nr_tags;
- pool->percpu_max_size = max_size;
- pool->percpu_batch_size = batch_size;
-
- /* Guard against overflow */
- if (nr_tags > (unsigned) INT_MAX + 1) {
- pr_err("percpu_ida_init(): nr_tags too large\n");
- return -EINVAL;
- }
-
- order = get_order(nr_tags * sizeof(unsigned));
- pool->freelist = (void *) __get_free_pages(GFP_KERNEL, order);
- if (!pool->freelist)
- return -ENOMEM;
-
- for (i = 0; i < nr_tags; i++)
- pool->freelist[i] = i;
-
- pool->nr_free = nr_tags;
-
- pool->tag_cpu = __alloc_percpu(sizeof(struct percpu_ida_cpu) +
- pool->percpu_max_size * sizeof(unsigned),
- sizeof(unsigned));
- if (!pool->tag_cpu)
- goto err;
-
- for_each_possible_cpu(cpu)
- spin_lock_init(&per_cpu_ptr(pool->tag_cpu, cpu)->lock);
-
- return 0;
-err:
- percpu_ida_destroy(pool);
- return -ENOMEM;
-}
-EXPORT_SYMBOL_GPL(__percpu_ida_init);
-
-/**
- * percpu_ida_for_each_free - iterate free ids of a pool
- * @pool: pool to iterate
- * @fn: interate callback function
- * @data: parameter for @fn
- *
- * Note, this doesn't guarantee to iterate all free ids restrictly. Some free
- * ids might be missed, some might be iterated duplicated, and some might
- * be iterated and not free soon.
- */
-int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn,
- void *data)
-{
- unsigned long flags;
- struct percpu_ida_cpu *remote;
- unsigned cpu, i, err = 0;
-
- for_each_possible_cpu(cpu) {
- remote = per_cpu_ptr(pool->tag_cpu, cpu);
- spin_lock_irqsave(&remote->lock, flags);
- for (i = 0; i < remote->nr_free; i++) {
- err = fn(remote->freelist[i], data);
- if (err)
- break;
- }
- spin_unlock_irqrestore(&remote->lock, flags);
- if (err)
- goto out;
- }
-
- spin_lock_irqsave(&pool->lock, flags);
- for (i = 0; i < pool->nr_free; i++) {
- err = fn(pool->freelist[i], data);
- if (err)
- break;
- }
- spin_unlock_irqrestore(&pool->lock, flags);
-out:
- return err;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_for_each_free);
-
-/**
- * percpu_ida_free_tags - return free tags number of a specific cpu or global pool
- * @pool: pool related
- * @cpu: specific cpu or global pool if @cpu == nr_cpu_ids
- *
- * Note: this just returns a snapshot of free tags number.
- */
-unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu)
-{
- struct percpu_ida_cpu *remote;
- if (cpu == nr_cpu_ids)
- return pool->nr_free;
- remote = per_cpu_ptr(pool->tag_cpu, cpu);
- return remote->nr_free;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_free_tags);
--
2.17.1
^ permalink raw reply related
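Editorial note on the removed lib/percpu_ida.c: both the global and the per-CPU freelists are stacks, and move_tags() pops a batch off the top of one and pushes it onto the other. The sketch below reproduces that helper in userspace to make the stack behaviour visible. move_tags() is copied from the diff, with an added assert. struct demo_stack, DEMO_NR_TAGS and the demo_* helpers are hypothetical and exist only for this illustration. Because nr_free is declared unsigned, a test of the form nr_free >= 0 is always true in C, so the sketch compares against zero explicitly.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NR_TAGS 8	/* hypothetical pool size */

struct demo_stack {
	unsigned nr_free;			/* index one past the top */
	unsigned freelist[DEMO_NR_TAGS];	/* stack of free tags */
};

/* As in the removed file: pop nr tags from src and push them onto dst. */
static void move_tags(unsigned *dst, unsigned *dst_nr,
		      unsigned *src, unsigned *src_nr,
		      unsigned nr)
{
	assert(nr <= *src_nr);	/* added for the sketch */
	*src_nr -= nr;
	memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
	*dst_nr += nr;
}

/* Fill the stack with tags 0..nr-1, like the init loop in the diff. */
static void demo_init(struct demo_stack *s, unsigned nr)
{
	unsigned i;

	assert(nr <= DEMO_NR_TAGS);
	for (i = 0; i < nr; i++)
		s->freelist[i] = i;
	s->nr_free = nr;
}

/* Refill a local stack from the global one, at most batch tags at a time. */
static void demo_refill(struct demo_stack *local, struct demo_stack *global,
			unsigned batch)
{
	unsigned nr = global->nr_free < batch ? global->nr_free : batch;

	move_tags(local->freelist, &local->nr_free,
		  global->freelist, &global->nr_free, nr);
}

/* Pop one tag. Returns -1 when the stack is empty. */
static int demo_pop(struct demo_stack *s)
{
	if (s->nr_free == 0)	/* explicit test: nr_free is unsigned */
		return -1;
	return (int)s->freelist[--s->nr_free];
}
```

With a global stack holding tags 0..7 and a batch of 3, demo_refill() moves tags 5, 6 and 7 to the local stack. Successive demo_pop() calls then return 7, 6 and 5, followed by -1 once the local stack is empty.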
* Problems in tc-cbq-details.8, tc-cbq.8, tc-mqprio.8, tc-prio.8, tc-htb.8
From: esr @ 2018-06-12 19:16 UTC (permalink / raw)
To: netdev
[-- Attachment #1: Type: text/plain, Size: 708 bytes --]
This is automatically generated email about markup problems in a man
page for which you appear to be responsible. If you are not the right
person or list, please tell me so I can correct my database.
See http://catb.org/~esr/doclifter/bugs.html for details on how and
why these patches were generated. Feel free to email me with any
questions. Note: These patches do not change the modification date of
any manual page. You may wish to do that by hand.
I apologize if this message seems spammy or impersonal. The volume of
markup bugs I am tracking is over five hundred - there is no real
alternative to generating bugmail from a database and template.
--
Eric S. Raymond
[-- Attachment #2: Type: text/plain, Size: 379 bytes --]
Problems with tc-mqprio.8:
( ) notation for mandatory parts of command syntax should be { }.
--- tc-mqprio.8-unpatched 2018-05-18 17:15:30.026540775 -0400
+++ tc-mqprio.8 2018-05-18 17:15:41.762461016 -0400
@@ -4,9 +4,9 @@
.SH SYNOPSIS
.B tc qdisc ... dev
dev
-.B ( parent
+.B { parent
classid
-.B | root) [ handle
+.B | root } [ handle
major:
.B ] mqprio [ numtc
tcs
[-- Attachment #3: Type: text/plain, Size: 373 bytes --]
Problems with tc-prio.8:
( ) notation for mandatory parts of command syntax should be { }.
--- tc-prio.8-unpatched 2018-05-18 17:17:17.793808383 -0400
+++ tc-prio.8 2018-05-18 17:17:30.785720088 -0400
@@ -4,9 +4,9 @@
.SH SYNOPSIS
.B tc qdisc ... dev
dev
-.B ( parent
+.B { parent
classid
-.B | root) [ handle
+.B | root } [ handle
major:
.B ] prio [ bands
bands
[-- Attachment #4: Type: text/plain, Size: 374 bytes --]
Problems with tc-htb.8:
( ) notation for mandatory parts of command syntax should be { }.
--- tc-htb.8-unpatched 2018-05-18 17:05:31.142610823 -0400
+++ tc-htb.8 2018-05-18 17:05:42.262535252 -0400
@@ -4,9 +4,9 @@
.SH SYNOPSIS
.B tc qdisc ... dev
dev
-.B ( parent
+.B { parent
classid
-.B | root) [ handle
+.B | root } [ handle
major:
.B ] htb [ default
minor-id
[-- Attachment #5: Type: text/plain, Size: 390 bytes --]
Problems with tc-cbq-details.8:
( ) notation for mandatory parts of command syntax should be { }.
--- tc-cbq-details.8-unpatched 2018-05-18 17:00:55.116486712 -0400
+++ tc-cbq-details.8 2018-05-18 17:01:06.988406029 -0400
@@ -4,9 +4,9 @@
.SH SYNOPSIS
.B tc qdisc ... dev
dev
-.B ( parent
+.B { parent
classid
-.B | root) [ handle
+.B | root} [ handle
major:
.B ] cbq avpkt
bytes
[-- Attachment #6: Type: text/plain, Size: 369 bytes --]
Problems with tc-cbq.8:
( ) notation for mandatory parts of command syntax should be { }.
--- tc-cbq.8-unpatched 2018-05-18 17:03:33.087413133 -0400
+++ tc-cbq.8 2018-05-18 17:03:45.223330656 -0400
@@ -4,9 +4,9 @@
.SH SYNOPSIS
.B tc qdisc ... dev
dev
-.B ( parent
+.B { parent
classid
-.B | root) [ handle
+.B | root } [ handle
major:
.B ] cbq [ allot
bytes
^ permalink raw reply
* BUG: MAX_LOCK_DEPTH too low! (2)
From: syzbot @ 2018-06-12 19:23 UTC (permalink / raw)
To: davem, e, edumazet, jbenc, linux-kernel, netdev, pshelar,
syzkaller-bugs, yi.y.yang
Hello,
syzbot found the following crash on:
HEAD commit: ae40832e53c3 bpfilter: fix a build err
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17094ed7800000
kernel config: https://syzkaller.appspot.com/x/.config?x=7db1a87249322f90
dashboard link: https://syzkaller.appspot.com/bug?extid=802a5abb8abae86eb6de
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+802a5abb8abae86eb6de@syzkaller.appspotmail.com
BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48 max: 48!
48 locks held by syz-executor6/30643:
#0: 00000000ac2541d7 (rcu_read_lock_bh){....}, at:
__dev_queue_xmit+0x323/0x3900 net/core/dev.c:3521
#1: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#1: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#2: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#2: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#3: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#3: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#4: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#4: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#5: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#5: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#6: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#6: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#7: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#7: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#8: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#8: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#9: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#9: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
netlink: 8 bytes leftover after parsing attributes in process
`syz-executor5'.
#10: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#10: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#11: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#11: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#12: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#12: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#13: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#13: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#14: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#14: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#15: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#15: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#16: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#16: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
netlink: 8 bytes leftover after parsing attributes in process
`syz-executor5'.
#17: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#17: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#18: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#18: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#19: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#19: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#20: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#20: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#21: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#21: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#22: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#22: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#23: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#23: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#24: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#24: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#25: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#25: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#26: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#26: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#27: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#27: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#28: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#28: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#29: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#29: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#30: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#30: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#31: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#31: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#32: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#32: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#33: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#33: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#34: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#34: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#35: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#35: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#36: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#36: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#37: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#37: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#38: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#38: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#39: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#39: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#40: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#40: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#41: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#41: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#42: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#42: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#43: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#43: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#44: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#44: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#45: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#45: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#46: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#46: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
#47: 00000000c22902c0 (rcu_read_lock){....}, at: __skb_pull
include/linux/skbuff.h:2082 [inline]
#47: 00000000c22902c0 (rcu_read_lock){....}, at:
skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
INFO: lockdep is turned off.
CPU: 1 PID: 30643 Comm: syz-executor6 Not tainted 4.17.0-rc6+ #68
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
__lock_acquire+0x1788/0x5140 kernel/locking/lockdep.c:3449
lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
rcu_lock_acquire include/linux/rcupdate.h:246 [inline]
rcu_read_lock include/linux/rcupdate.h:632 [inline]
skb_mac_gso_segment+0x25b/0x720 net/core/dev.c:2789
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
nsh_gso_segment+0x470/0xb40 net/nsh/nsh.c:111
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
__skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865
skb_gso_segment include/linux/netdevice.h:4072 [inline]
validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3122
__dev_queue_xmit+0xc0c/0x3900 net/core/dev.c:3579
dev_queue_xmit+0x17/0x20 net/core/dev.c:3620
packet_snd net/packet/af_packet.c:2921 [inline]
packet_sendmsg+0x4275/0x6100 net/packet/af_packet.c:2946
sock_sendmsg_nosec net/socket.c:629 [inline]
sock_sendmsg+0xd5/0x120 net/socket.c:639
__sys_sendto+0x3d7/0x670 net/socket.c:1789
__do_sys_sendto net/socket.c:1801 [inline]
__se_sys_sendto net/socket.c:1797 [inline]
__x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455a09
RSP: 002b:00007f125b3edc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f125b3ee6d4 RCX: 0000000000455a09
RDX: 0000000000000176 RSI: 00000000200000c0 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000020000080 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000005d8 R14: 00000000006fdce0 R15: 0000000000000000
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 30643 Comm: syz-executor6 Not tainted 4.17.0-rc6+ #68
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:skb_reset_network_header include/linux/skbuff.h:2306 [inline]
RIP: 0010:nsh_gso_segment+0x3a/0xb40 net/nsh/nsh.c:87
RSP: 0018:ffff8801aa585f98 EFLAGS: 00010a06
RAX: dffffc0000000000 RBX: ffffffff897e00e0 RCX: 1fecbfffffde10cc
RDX: ffffed00354b0c22 RSI: ffffffff876a7afd RDI: ff65fffffef0858b
RBP: ffff8801aa586020 R08: ffff880198924600 R09: 0000000000000000
R10: fffffbfff12ea1dc R11: ffff880198924600 R12: ff65fffffef0858b
R13: dffffc0000000000 R14: 0000000000004f89 R15: 0000000000004f89
FS: 00007f125b3ee700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
kasan: CONFIG_KASAN_INLINE enabled
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200002c0 CR3: 00000001d8346000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
Code:
------------[ cut here ]------------
kasan: GPF could be caused by NULL-ptr deref or user memory access
Bad or missing usercopy whitelist? Kernel memory overwrite attempt detected to SLAB object 'vm_area_struct' (offset 119, size 1)!
WARNING: CPU: 1 PID: 30643 at mm/usercopy.c:81 usercopy_warn+0xf5/0x120 mm/usercopy.c:76
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
* Re: [PATCH 1/2] r8169: Don't disable ASPM in the driver
From: Heiner Kallweit @ 2018-06-12 19:30 UTC (permalink / raw)
To: Kai-Heng Feng, davem
Cc: ryankao, hayeswang, hau, romieu, bhelgaas, netdev, linux-pci,
linux-kernel
In-Reply-To: <20180612095759.6828-1-kai.heng.feng@canonical.com>
On 12.06.2018 11:57, Kai-Heng Feng wrote:
> Enabling or disabling ASPM should be done in the PCI core instead of in the
> device driver.
>
> Commit ba04c7c93bbc ("r8169: disable ASPM") uses
> pci_disable_link_state() to disable ASPM. This is incorrect, if the
> device really needs to disable ASPM, we should use a quirk in PCI core
> to prevent the PCI core from setting ASPM altogether.
>
I wouldn't call using pci_disable_link_state() in a driver incorrect
(as it works); there is just a better way which is more in line with
the PCI subsystem architecture.
> Let's remove pci_disable_link_state() for now. Use PCI core quirks if
> any regression happens.
>
The vendor driver disables ASPM unconditionally for chip version 25
(there it's METHOD_9), so I think ASPM support is broken in this chip
version. I'll cook a PCI quirk.
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Please note that netdev is closed currently. Once 4.18-rc1 is out it
will be re-opened. Then please re-submit, properly annotating PATCH
with "net-next" (I've forgotten this often enough myself).
> ---
> v2:
> - Remove module parameter.
> - Remove pci_disable_link_state().
>
> drivers/net/ethernet/realtek/r8169.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 75dfac0248f4..9b55ce513a36 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -25,7 +25,6 @@
> #include <linux/dma-mapping.h>
> #include <linux/pm_runtime.h>
> #include <linux/firmware.h>
> -#include <linux/pci-aspm.h>
> #include <linux/prefetch.h>
> #include <linux/ipv6.h>
> #include <net/ip6_checksum.h>
> @@ -7647,10 +7646,6 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> mii->reg_num_mask = 0x1f;
> mii->supports_gmii = cfg->has_gmii;
>
> - /* disable ASPM completely as that cause random device stop working
> - * problems as well as full system hangs for some PCIe devices users */
> - pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
> - PCIE_LINK_STATE_CLKPM);
>
> /* enable device (incl. PCI PM wakeup and hotplug setup) */
> rc = pcim_enable_device(pdev);
>
* Re: [PATCH 2/2] r8169: Reinstate ASPM Support
From: Heiner Kallweit @ 2018-06-12 19:35 UTC (permalink / raw)
To: Kai-Heng Feng, davem
Cc: ryankao, hayeswang, hau, romieu, bhelgaas, netdev, linux-pci,
linux-kernel
In-Reply-To: <20180612095759.6828-2-kai.heng.feng@canonical.com>
On 12.06.2018 11:57, Kai-Heng Feng wrote:
> On newer Intel platforms, ASPM support in r8169 is the last missing
> puzzle piece to let Package C-State reach PC8. Without ASPM support, the
> deepest Package C-State it can hit is PC3.
> PC8 can save an additional ~3W in comparison with PC3 on my testing
> platform.
>
Maybe we should replace PC8 with "beyond PC3". My system
(Haswell 2961Y) reaches 50% PC7 + 5% PC9 + 45% PC10 now.
It never seems to use PC8.
> The original patch is from Realtek.
>
Please add a link to this original patch.
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> v2:
> - Remove module parameter.
> - Remove pci_disable_link_state().
>
> drivers/net/ethernet/realtek/r8169.c | 41 +++++++++++++++++++---------
> 1 file changed, 28 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 9b55ce513a36..85f4e746b040 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -5289,6 +5289,18 @@ static void rtl_pcie_state_l2l3_enable(struct rtl8169_private *tp, bool enable)
> RTL_W8(tp, Config3, data);
> }
>
> +static void rtl_hw_internal_aspm_clkreq_enable(struct rtl8169_private *tp,
> + bool enable)
Do we need this hw_internal in the function name?
> +{
> + if (enable) {
> + RTL_W8(tp, Config2, RTL_R8(tp, Config2) | ClkReqEn);
> + RTL_W8(tp, Config5, RTL_R8(tp, Config5) | ASPM_en);
> + } else {
> + RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
> + RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
> + }
> +}
> +
> static void rtl_hw_start_8168bb(struct rtl8169_private *tp)
> {
> RTL_W8(tp, Config3, RTL_R8(tp, Config3) & ~Beacon_en);
> @@ -5645,9 +5657,9 @@ static void rtl_hw_start_8168g_1(struct rtl8169_private *tp)
> rtl_hw_start_8168g(tp);
>
> /* disable aspm and clock request before access ephy */
> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
> rtl_ephy_init(tp, e_info_8168g_1, ARRAY_SIZE(e_info_8168g_1));
> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
> }
>
> static void rtl_hw_start_8168g_2(struct rtl8169_private *tp)
> @@ -5680,9 +5692,9 @@ static void rtl_hw_start_8411_2(struct rtl8169_private *tp)
> rtl_hw_start_8168g(tp);
>
> /* disable aspm and clock request before access ephy */
> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
> rtl_ephy_init(tp, e_info_8411_2, ARRAY_SIZE(e_info_8411_2));
> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
> }
>
> static void rtl_hw_start_8168h_1(struct rtl8169_private *tp)
> @@ -5699,8 +5711,7 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp)
> };
>
> /* disable aspm and clock request before access ephy */
> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
> rtl_ephy_init(tp, e_info_8168h_1, ARRAY_SIZE(e_info_8168h_1));
>
> RTL_W32(tp, TxConfig, RTL_R32(tp, TxConfig) | TXCFG_AUTO_FIFO);
> @@ -5779,6 +5790,8 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp)
> r8168_mac_ocp_write(tp, 0xe63e, 0x0000);
> r8168_mac_ocp_write(tp, 0xc094, 0x0000);
> r8168_mac_ocp_write(tp, 0xc09e, 0x0000);
> +
> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
> }
>
> static void rtl_hw_start_8168ep(struct rtl8169_private *tp)
> @@ -5830,11 +5843,12 @@ static void rtl_hw_start_8168ep_1(struct rtl8169_private *tp)
> };
>
> /* disable aspm and clock request before access ephy */
> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
> rtl_ephy_init(tp, e_info_8168ep_1, ARRAY_SIZE(e_info_8168ep_1));
>
> rtl_hw_start_8168ep(tp);
> +
> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
> }
>
> static void rtl_hw_start_8168ep_2(struct rtl8169_private *tp)
> @@ -5846,14 +5860,15 @@ static void rtl_hw_start_8168ep_2(struct rtl8169_private *tp)
> };
>
> /* disable aspm and clock request before access ephy */
> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
> rtl_ephy_init(tp, e_info_8168ep_2, ARRAY_SIZE(e_info_8168ep_2));
>
> rtl_hw_start_8168ep(tp);
>
> RTL_W8(tp, DLLPR, RTL_R8(tp, DLLPR) & ~PFM_EN);
> RTL_W8(tp, MISC_1, RTL_R8(tp, MISC_1) & ~PFM_D3COLD_EN);
> +
> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
> }
>
> static void rtl_hw_start_8168ep_3(struct rtl8169_private *tp)
> @@ -5867,8 +5882,7 @@ static void rtl_hw_start_8168ep_3(struct rtl8169_private *tp)
> };
>
> /* disable aspm and clock request before access ephy */
> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
> rtl_ephy_init(tp, e_info_8168ep_3, ARRAY_SIZE(e_info_8168ep_3));
>
> rtl_hw_start_8168ep(tp);
> @@ -5888,6 +5902,8 @@ static void rtl_hw_start_8168ep_3(struct rtl8169_private *tp)
> data = r8168_mac_ocp_read(tp, 0xe860);
> data |= 0x0080;
> r8168_mac_ocp_write(tp, 0xe860, data);
> +
> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
> }
>
> static void rtl_hw_start_8168(struct rtl8169_private *tp)
> @@ -7646,7 +7662,6 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> mii->reg_num_mask = 0x1f;
> mii->supports_gmii = cfg->has_gmii;
>
> -
> /* enable device (incl. PCI PM wakeup and hotplug setup) */
> rc = pcim_enable_device(pdev);
> if (rc < 0) {
>
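For readers following the patch, the pattern it introduces (switch
ASPM/CLKREQ off, touch the ephy, switch them back on) can be modelled
in userspace. This is only a sketch under stated assumptions: struct
fake_nic, reg_read8()/reg_write8(), ephy_init_stub() and the bit values
are hypothetical stand-ins for illustration, not the driver's real
definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's Config2/Config5 registers. */
enum { CONFIG2, CONFIG5, NUM_REGS };

/* Placeholder bit values chosen for this sketch only. */
#define CLKREQ_EN 0x80u
#define ASPM_EN   0x01u

struct fake_nic {
	uint8_t regs[NUM_REGS];
	bool ephy_saw_aspm_off;	/* set by the stub below */
};

static uint8_t reg_read8(const struct fake_nic *nic, int reg)
{
	return nic->regs[reg];
}

static void reg_write8(struct fake_nic *nic, int reg, uint8_t val)
{
	nic->regs[reg] = val;
}

/* Read-modify-write: only the two enable bits change, other bits are kept. */
static void aspm_clkreq_enable(struct fake_nic *nic, bool enable)
{
	if (enable) {
		reg_write8(nic, CONFIG2, (uint8_t)(reg_read8(nic, CONFIG2) | CLKREQ_EN));
		reg_write8(nic, CONFIG5, (uint8_t)(reg_read8(nic, CONFIG5) | ASPM_EN));
	} else {
		reg_write8(nic, CONFIG2, (uint8_t)(reg_read8(nic, CONFIG2) & ~CLKREQ_EN));
		reg_write8(nic, CONFIG5, (uint8_t)(reg_read8(nic, CONFIG5) & ~ASPM_EN));
	}
}

/* Stub for the ephy access: records whether both bits were clear at the time. */
static void ephy_init_stub(struct fake_nic *nic)
{
	nic->ephy_saw_aspm_off = !(reg_read8(nic, CONFIG2) & CLKREQ_EN) &&
				 !(reg_read8(nic, CONFIG5) & ASPM_EN);
}

/* Shape of the patched hw_start paths: disable, access ephy, re-enable. */
static void hw_start_sketch(struct fake_nic *nic)
{
	aspm_clkreq_enable(nic, false);
	ephy_init_stub(nic);
	aspm_clkreq_enable(nic, true);
}
```

Calling hw_start_sketch() on a fake_nic leaves both enable bits set
afterwards, while the stub observes them cleared during the ephy
access. All other register bits are preserved by the read-modify-write.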
* Re: [PATCH net-next 0/10] xfrm: remove flow cache
From: Kristian Evensen @ 2018-06-12 19:42 UTC (permalink / raw)
To: David Miller
Cc: Florian Westphal, Network Development, Steffen Klassert, ilant
In-Reply-To: <20170718.111535.1186267705268802212.davem@davemloft.net>
Hello,
On Tue, Jul 18, 2017 at 8:15 PM, David Miller <davem@davemloft.net> wrote:
> Steffen, I know you have some level of trepidation about this because
> there is obviously some performance cost immediately for removing this
> DoS problem.
In a project I am involved in, we are running IPsec (strongSwan) on
different mt7621-based routers. Each router is configured as an
initiator and has around 30 tunnels to different responders (running
on misc. devices). Before the flow cache was removed (kernel 4.9), we
got a combined throughput of around 70Mbit/s for all tunnels on one
router. However, we recently switched to kernel 4.14 (4.14.48), and
the total throughput is now around 57Mbit/s (best case), i.e., a
drop of around 20%. Reverting the flow cache removal restores, as
expected, performance to the level of kernel 4.9.
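For reference, the relative drop can be worked out from the two
throughput figures quoted above. The helper below is a small sketch;
percent_drop() is a name made up for this illustration.

```c
/* Relative drop from `before` to `after`, in percent of `before`. */
static double percent_drop(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With before = 70 and after = 57 (Mbit/s) this gives 13/70, i.e. a
little under 19%, consistent with the "around 20%" estimate.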
Carrying around a fairly large revert patch is not something we want;
we are more interested in trying to fix at least some of the
performance problems. However, we are not very experienced when it
comes to profiling the kernel code or the xfrm code itself. Are there
any known areas we should take a special look at, or should we just
read up on different profiling tools and get started?
Also, the revert went very smoothly, which always makes me a bit
nervous. Are there any parts of the flow cache removal that should or
would require a bit of special care when reverted?
Thanks in advance for any help.
BR,
Kristian
* Re: [PATCH net-next 6/6] Documentation: networking: cpsw: add MQPRIO & CBS offload examples
From: Grygorii Strashko @ 2018-06-12 19:55 UTC (permalink / raw)
To: Ivan Khoronzhuk, davem
Cc: corbet, akpm, netdev, linux-doc, linux-kernel, linux-omap,
vinicius.gomes, henrik, jesus.sanchez-palencia, ilias.apalodimas,
p-varis, spatton, francois.ozog, yogeshs, nsekhar
In-Reply-To: <20180611133047.4818-7-ivan.khoronzhuk@linaro.org>
On 06/11/2018 08:30 AM, Ivan Khoronzhuk wrote:
> This document describes MQPRIO and CBS Qdisc offload configuration
> for cpsw driver based on examples. It potentially can be used in
> audio video bridging (AVB) and time sensitive networking (TSN).
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
> Documentation/networking/cpsw.txt | 540 ++++++++++++++++++++++++++++++
> 1 file changed, 540 insertions(+)
> create mode 100644 Documentation/networking/cpsw.txt
>
> diff --git a/Documentation/networking/cpsw.txt b/Documentation/networking/cpsw.txt
> new file mode 100644
> index 000000000000..f5d58f502e52
> --- /dev/null
> +++ b/Documentation/networking/cpsw.txt
Could you name it with a "ti" prefix, pls?
Like "ti-cpsw.txt" or "ti,cpsw.txt"
> @@ -0,0 +1,540 @@
> +* Texas Instruments CPSW ethernet driver
> +
> +Multiqueue & CBS & MQPRIO
> +=====================================================================
> +=====================================================================
[...]
--
regards,
-grygorii