Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next v2 09/14] devlink: Add generic packet traps and groups
From: Ido Schimmel @ 2019-08-13  7:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, jiri, toke, dsahern, roopa, nikolay,
	jakub.kicinski, andy, f.fainelli, andrew, vivien.didelot, mlxsw,
	Ido Schimmel
In-Reply-To: <20190813075400.11841-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

Add generic packet traps and groups that can report dropped packets as
well as exceptions such as TTL error.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/devlink.h | 40 ++++++++++++++++++++++++++++++++++++++++
 net/core/devlink.c    | 12 ++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 03b32e33e93e..fb02e0e89f9d 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -541,18 +541,58 @@ struct devlink_trap {
 };
 
 enum devlink_trap_generic_id {
+	DEVLINK_TRAP_GENERIC_ID_SMAC_MC,
+	DEVLINK_TRAP_GENERIC_ID_VLAN_TAG_MISMATCH,
+	DEVLINK_TRAP_GENERIC_ID_INGRESS_VLAN_FILTER,
+	DEVLINK_TRAP_GENERIC_ID_INGRESS_STP_FILTER,
+	DEVLINK_TRAP_GENERIC_ID_EMPTY_TX_LIST,
+	DEVLINK_TRAP_GENERIC_ID_PORT_LOOPBACK_FILTER,
+	DEVLINK_TRAP_GENERIC_ID_BLACKHOLE_ROUTE,
+	DEVLINK_TRAP_GENERIC_ID_TTL_ERROR,
+	DEVLINK_TRAP_GENERIC_ID_TAIL_DROP,
+
 	/* Add new generic trap IDs above */
 	__DEVLINK_TRAP_GENERIC_ID_MAX,
 	DEVLINK_TRAP_GENERIC_ID_MAX = __DEVLINK_TRAP_GENERIC_ID_MAX - 1,
 };
 
 enum devlink_trap_group_generic_id {
+	DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS,
+	DEVLINK_TRAP_GROUP_GENERIC_ID_L3_DROPS,
+	DEVLINK_TRAP_GROUP_GENERIC_ID_BUFFER_DROPS,
+
 	/* Add new generic trap group IDs above */
 	__DEVLINK_TRAP_GROUP_GENERIC_ID_MAX,
 	DEVLINK_TRAP_GROUP_GENERIC_ID_MAX =
 		__DEVLINK_TRAP_GROUP_GENERIC_ID_MAX - 1,
 };
 
+#define DEVLINK_TRAP_GENERIC_NAME_SMAC_MC \
+	"source_mac_is_multicast"
+#define DEVLINK_TRAP_GENERIC_NAME_VLAN_TAG_MISMATCH \
+	"vlan_tag_mismatch"
+#define DEVLINK_TRAP_GENERIC_NAME_INGRESS_VLAN_FILTER \
+	"ingress_vlan_filter"
+#define DEVLINK_TRAP_GENERIC_NAME_INGRESS_STP_FILTER \
+	"ingress_spanning_tree_filter"
+#define DEVLINK_TRAP_GENERIC_NAME_EMPTY_TX_LIST \
+	"port_list_is_empty"
+#define DEVLINK_TRAP_GENERIC_NAME_PORT_LOOPBACK_FILTER \
+	"port_loopback_filter"
+#define DEVLINK_TRAP_GENERIC_NAME_BLACKHOLE_ROUTE \
+	"blackhole_route"
+#define DEVLINK_TRAP_GENERIC_NAME_TTL_ERROR \
+	"ttl_value_is_too_small"
+#define DEVLINK_TRAP_GENERIC_NAME_TAIL_DROP \
+	"tail_drop"
+
+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \
+	"l2_drops"
+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_L3_DROPS \
+	"l3_drops"
+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_BUFFER_DROPS \
+	"buffer_drops"
+
 #define DEVLINK_TRAP_GENERIC(_type, _init_action, _id, _group, _metadata_cap) \
 	{								      \
 		.type = DEVLINK_TRAP_TYPE_##_type,			      \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 3a3ef2f8c133..6b15b9f744c7 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -7442,6 +7442,15 @@ EXPORT_SYMBOL_GPL(devlink_region_snapshot_create);
 	}
 
 static const struct devlink_trap devlink_trap_generic[] = {
+	DEVLINK_TRAP(SMAC_MC, DROP),
+	DEVLINK_TRAP(VLAN_TAG_MISMATCH, DROP),
+	DEVLINK_TRAP(INGRESS_VLAN_FILTER, DROP),
+	DEVLINK_TRAP(INGRESS_STP_FILTER, DROP),
+	DEVLINK_TRAP(EMPTY_TX_LIST, DROP),
+	DEVLINK_TRAP(PORT_LOOPBACK_FILTER, DROP),
+	DEVLINK_TRAP(BLACKHOLE_ROUTE, DROP),
+	DEVLINK_TRAP(TTL_ERROR, EXCEPTION),
+	DEVLINK_TRAP(TAIL_DROP, DROP),
 };
 
 #define DEVLINK_TRAP_GROUP(_id)						      \
@@ -7451,6 +7460,9 @@ static const struct devlink_trap devlink_trap_generic[] = {
 	}
 
 static const struct devlink_trap_group devlink_trap_group_generic[] = {
+	DEVLINK_TRAP_GROUP(L2_DROPS),
+	DEVLINK_TRAP_GROUP(L3_DROPS),
+	DEVLINK_TRAP_GROUP(BUFFER_DROPS),
 };
 
 static int devlink_trap_generic_verify(const struct devlink_trap *trap)
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH 13/16] include/asm-generic: prefer __section from compiler_attributes.h
From: Arnd Bergmann @ 2019-08-13  7:57 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Andrew Morton, sedat.dilek, Josh Poimboeuf, Yonghong Song,
	Miguel Ojeda Sandonis, clang-built-linux, Naveen N. Rao,
	Anil S Keshavamurthy, David S. Miller, Masami Hiramatsu,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	linux-arch, Linux Kernel Mailing List, Networking, bpf
In-Reply-To: <20190812215052.71840-13-ndesaulniers@google.com>

On Mon, Aug 12, 2019 at 11:52 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
>
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

The patch looks fine, but it looks like you forgot to add a description.

       Arnd

^ permalink raw reply

* Re: [RFC PATCH 0/5] PTP: add support for Intel's TGPIO controller
From: Felipe Balbi @ 2019-08-13  7:50 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Richard Cochran, netdev, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, x86, linux-kernel,
	Christopher S . Hall
In-Reply-To: <20190719132021.GC24930@lunn.ch>


Hi,

Andrew Lunn <andrew@lunn.ch> writes:
>> Andrew Lunn <andrew@lunn.ch> writes:
>> > On Tue, Jul 16, 2019 at 10:20:33AM +0300, Felipe Balbi wrote:
>> >> TGPIO is a new IP which allows for time synchronization between systems
>> >> without any other means of synchronization such as PTP or NTP. The
>> >> driver is implemented as part of the PTP framework since its features
>> >> covered most of what this controller can do.
>> >
>> > Hi Felipe
>> >
>> > Given the name TGPIO, can it also be used for plain old boring GPIO?
>> 
>> not really, no. This is a misnomer, IMHO :-) We can only assert output
>> pulses at specified intervals or capture a timestamp of an external
>> signal.
>
> Hi Felipe
>
> So i guess Intel Marketing wants to call it a GPIO, but between
> engineers can we give it a better name?

If we do that we make it difficult for those reading specification and
trying to find the matching driver.

>> > Also, is this always embedded into a SoC? Or could it actually be in a
>> > discrete NIC?
>> 
>> Technically, this could be done as a discrete, but it isn't. In any
>> case, why does that matter? From a linux-point of view, we have a device
>> driver either way.
>
> I've seen a lot of i210 used with ARM SoCs. How necessary is the tsc
> patch? Is there an architecture independent alternative?

Without the TSC patch, we don't get the timestamp we need. One can argue
that $this driver could call get_tsc_ns() directly instead of providing
a wrapper for it. But that's something else entirely.

-- 
balbi

^ permalink raw reply

* Re: tun: mark small packets as owned by the tap sock
From: Jack Wang @ 2019-08-13  7:58 UTC (permalink / raw)
  To: Dave Jones; +Cc: Alexis Bauvin, netdev, stable
In-Reply-To: <20190812221954.GA13314@codemonkey.org.uk>

Dave Jones <davej@codemonkey.org.uk> 于2019年8月13日周二 上午1:05写道：
>
> On Wed, Aug 07, 2019 at 12:30:07AM +0000, Linux Kernel wrote:
>  > Commit:     4b663366246be1d1d4b1b8b01245b2e88ad9e706
>  > Parent:     16b2084a8afa1432d14ba72b7c97d7908e178178
>  > Web:        https://git.kernel.org/torvalds/c/4b663366246be1d1d4b1b8b01245b2e88ad9e706
>  > Author:     Alexis Bauvin <abauvin@scaleway.com>
>  > AuthorDate: Tue Jul 23 16:23:01 2019 +0200
>  >
>  >     tun: mark small packets as owned by the tap sock
>  >
>  >     - v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
>  >
>  >     Small packets going out of a tap device go through an optimized code
>  >     path that uses build_skb() rather than sock_alloc_send_pskb(). The
>  >     latter calls skb_set_owner_w(), but the small packet code path does not.
>  >
>  >     The net effect is that small packets are not owned by the userland
>  >     application's socket (e.g. QEMU), while large packets are.
>  >     This can be seen with a TCP session, where packets are not owned when
>  >     the window size is small enough (around PAGE_SIZE), while they are once
>  >     the window grows (note that this requires the host to support virtio
>  >     tso for the guest to offload segmentation).
>  >     All this leads to inconsistent behaviour in the kernel, especially on
>  >     netfilter modules that uses sk->socket (e.g. xt_owner).
>  >
>  >     Fixes: 66ccbc9c87c2 ("tap: use build_skb() for small packet")
>  >     Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
>  >     Acked-by: Jason Wang <jasowang@redhat.com>
>
> This commit breaks ipv6 routing when I deployed on it a linode.
> It seems to work briefly after boot, and then silently all packets get
> dropped. (Presumably, it's dropping RA or ND packets)
>
> With this reverted, everything works as it did in rc3.
>
>         Dave
>
Thanks for reporting, Dave.

+cc stable
Just noticed, the patch has been backported to  4.14,4.19, 5.2

Regards,
Jack Wang

^ permalink raw reply

* [PATCH net-next] net/mvpp2: Replace tasklet with softirq hrtimer
From: Sebastian Andrzej Siewior @ 2019-08-13  8:00 UTC (permalink / raw)
  To: netdev; +Cc: Thomas Gleixner, David S. Miller, Maxime Chevallier

From: Thomas Gleixner <tglx@linutronix.de>

The tx_done_tasklet tasklet is used in invoke the hrtimer
(mvpp2_hr_timer_cb) in softirq context. This can be also achieved without
the tasklet but with HRTIMER_MODE_SOFT as hrtimer mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2.h    |  3 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 58 +++++++------------
 2 files changed, 23 insertions(+), 38 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index 4d9564ba68f69..ee3bab508ee8c 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -829,9 +829,8 @@ struct mvpp2_pcpu_stats {
 /* Per-CPU port control */
 struct mvpp2_port_pcpu {
 	struct hrtimer tx_done_timer;
+	struct net_device *dev;
 	bool timer_scheduled;
-	/* Tasklet for egress finalization */
-	struct tasklet_struct tx_done_tasklet;
 };
 
 struct mvpp2_queue_vector {
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 74fd9e1718654..12e799e99803c 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -2651,31 +2651,21 @@ static irqreturn_t mvpp2_link_status_isr(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-static void mvpp2_timer_set(struct mvpp2_port_pcpu *port_pcpu)
-{
-	ktime_t interval;
-
-	if (!port_pcpu->timer_scheduled) {
-		port_pcpu->timer_scheduled = true;
-		interval = MVPP2_TXDONE_HRTIMER_PERIOD_NS;
-		hrtimer_start(&port_pcpu->tx_done_timer, interval,
-			      HRTIMER_MODE_REL_PINNED);
-	}
-}
-
-static void mvpp2_tx_proc_cb(unsigned long data)
+static enum hrtimer_restart mvpp2_hr_timer_cb(struct hrtimer *timer)
 {
-	struct net_device *dev = (struct net_device *)data;
-	struct mvpp2_port *port = netdev_priv(dev);
+	struct net_device *dev;
+	struct mvpp2_port *port;
 	struct mvpp2_port_pcpu *port_pcpu;
 	unsigned int tx_todo, cause;
 
-	port_pcpu = per_cpu_ptr(port->pcpu,
-				mvpp2_cpu_to_thread(port->priv, smp_processor_id()));
+	port_pcpu = container_of(timer, struct mvpp2_port_pcpu, tx_done_timer);
+	dev = port_pcpu->dev;
 
 	if (!netif_running(dev))
-		return;
+		return HRTIMER_NORESTART;
+
 	port_pcpu->timer_scheduled = false;
+	port = netdev_priv(dev);
 
 	/* Process all the Tx queues */
 	cause = (1 << port->ntxqs) - 1;
@@ -2683,18 +2673,13 @@ static void mvpp2_tx_proc_cb(unsigned long data)
 				mvpp2_cpu_to_thread(port->priv, smp_processor_id()));
 
 	/* Set the timer in case not all the packets were processed */
-	if (tx_todo)
-		mvpp2_timer_set(port_pcpu);
-}
-
-static enum hrtimer_restart mvpp2_hr_timer_cb(struct hrtimer *timer)
-{
-	struct mvpp2_port_pcpu *port_pcpu = container_of(timer,
-							 struct mvpp2_port_pcpu,
-							 tx_done_timer);
-
-	tasklet_schedule(&port_pcpu->tx_done_tasklet);
+	if (tx_todo && !port_pcpu->timer_scheduled) {
+		port_pcpu->timer_scheduled = true;
+		hrtimer_forward_now(&port_pcpu->tx_done_timer,
+				    MVPP2_TXDONE_HRTIMER_PERIOD_NS);
 
+		return HRTIMER_RESTART;
+	}
 	return HRTIMER_NORESTART;
 }
 
@@ -3182,7 +3167,12 @@ static netdev_tx_t mvpp2_tx(struct sk_buff *skb, struct net_device *dev)
 	    txq_pcpu->count > 0) {
 		struct mvpp2_port_pcpu *port_pcpu = per_cpu_ptr(port->pcpu, thread);
 
-		mvpp2_timer_set(port_pcpu);
+		if (!port_pcpu->timer_scheduled) {
+			port_pcpu->timer_scheduled = true;
+			hrtimer_start(&port_pcpu->tx_done_timer,
+				      MVPP2_TXDONE_HRTIMER_PERIOD_NS,
+				      HRTIMER_MODE_REL_PINNED_SOFT);
+		}
 	}
 
 	if (test_bit(thread, &port->priv->lock_map))
@@ -3619,7 +3609,6 @@ static int mvpp2_stop(struct net_device *dev)
 
 			hrtimer_cancel(&port_pcpu->tx_done_timer);
 			port_pcpu->timer_scheduled = false;
-			tasklet_kill(&port_pcpu->tx_done_tasklet);
 		}
 	}
 	mvpp2_cleanup_rxqs(port);
@@ -5183,13 +5172,10 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 			port_pcpu = per_cpu_ptr(port->pcpu, thread);
 
 			hrtimer_init(&port_pcpu->tx_done_timer, CLOCK_MONOTONIC,
-				     HRTIMER_MODE_REL_PINNED);
+				     HRTIMER_MODE_REL_PINNED_SOFT);
 			port_pcpu->tx_done_timer.function = mvpp2_hr_timer_cb;
 			port_pcpu->timer_scheduled = false;
-
-			tasklet_init(&port_pcpu->tx_done_tasklet,
-				     mvpp2_tx_proc_cb,
-				     (unsigned long)dev);
+			port_pcpu->dev = dev;
 		}
 	}
 
-- 
2.23.0.rc1


^ permalink raw reply related

* Re: [PATCH v2 bpf-next] mm: mmap: increase sockets maximum memory size pgoff for 32bits
From: Magnus Karlsson @ 2019-08-13  8:02 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: Björn Töpel, linux-mm, Xdp, Network Development, bpf,
	linux-kernel, akpm, Alexei Starovoitov, Karlsson, Magnus
In-Reply-To: <20190812124326.32146-1-ivan.khoronzhuk@linaro.org>

On Mon, Aug 12, 2019 at 2:45 PM Ivan Khoronzhuk
<ivan.khoronzhuk@linaro.org> wrote:
>
> The AF_XDP sockets umem mapping interface uses XDP_UMEM_PGOFF_FILL_RING
> and XDP_UMEM_PGOFF_COMPLETION_RING offsets. The offsets seems like are
> established already and are part of configuration interface.
>
> But for 32-bit systems, while AF_XDP socket configuration, the values
> are to large to pass maximum allowed file size verification.
> The offsets can be tuned ofc, but instead of changing existent
> interface - extend max allowed file size for sockets.

Can you use mmap2() instead that takes a larger offset (2^44) even on
32-bit systems?

/Magnus

> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>
> Based on bpf-next/master
>
> v2..v1:
>         removed not necessarily #ifdev as ULL and UL for 64 has same size
>
>  mm/mmap.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 7e8c3e8ae75f..578f52812361 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1358,6 +1358,9 @@ static inline u64 file_mmap_size_max(struct file *file, struct inode *inode)
>         if (S_ISBLK(inode->i_mode))
>                 return MAX_LFS_FILESIZE;
>
> +       if (S_ISSOCK(inode->i_mode))
> +               return MAX_LFS_FILESIZE;
> +
>         /* Special "we do even unsigned file positions" case */
>         if (file->f_mode & FMODE_UNSIGNED_OFFSET)
>                 return 0;
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
From: Jason Wang @ 2019-08-13  8:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, linux-mm, jgg
In-Reply-To: <20190812054429-mutt-send-email-mst@kernel.org>


On 2019/8/12 下午5:49, Michael S. Tsirkin wrote:
> On Mon, Aug 12, 2019 at 10:44:51AM +0800, Jason Wang wrote:
>> On 2019/8/11 上午1:52, Michael S. Tsirkin wrote:
>>> On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:
>>>> Hi all:
>>>>
>>>> This series try to fix several issues introduced by meta data
>>>> accelreation series. Please review.
>>>>
>>>> Changes from V4:
>>>> - switch to use spinlock synchronize MMU notifier with accessors
>>>>
>>>> Changes from V3:
>>>> - remove the unnecessary patch
>>>>
>>>> Changes from V2:
>>>> - use seqlck helper to synchronize MMU notifier with vhost worker
>>>>
>>>> Changes from V1:
>>>> - try not use RCU to syncrhonize MMU notifier with vhost worker
>>>> - set dirty pages after no readers
>>>> - return -EAGAIN only when we find the range is overlapped with
>>>>     metadata
>>>>
>>>> Jason Wang (9):
>>>>     vhost: don't set uaddr for invalid address
>>>>     vhost: validate MMU notifier registration
>>>>     vhost: fix vhost map leak
>>>>     vhost: reset invalidate_count in vhost_set_vring_num_addr()
>>>>     vhost: mark dirty pages during map uninit
>>>>     vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
>>>>     vhost: do not use RCU to synchronize MMU notifier with worker
>>>>     vhost: correctly set dirty pages in MMU notifiers callback
>>>>     vhost: do not return -EAGAIN for non blocking invalidation too early
>>>>
>>>>    drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
>>>>    drivers/vhost/vhost.h |   6 +-
>>>>    2 files changed, 122 insertions(+), 86 deletions(-)
>>> This generally looks more solid.
>>>
>>> But this amounts to a significant overhaul of the code.
>>>
>>> At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
>>> for this release, and then re-apply a corrected version
>>> for the next one?
>>
>> If possible, consider we've actually disabled the feature. How about just
>> queued those patches for next release?
>>
>> Thanks
> Sorry if I was unclear. My idea is that
> 1. I revert the disabled code
> 2. You send a patch readding it with all the fixes squashed
> 3. Maybe optimizations on top right away?
> 4. We queue *that* for next and see what happens.
>
> And the advantage over the patchy approach is that the current patches
> are hard to review. E.g.  it's not reasonable to ask RCU guys to review
> the whole of vhost for RCU usage but it's much more reasonable to ask
> about a specific patch.


Ok. Then I agree to revert.

Thanks


^ permalink raw reply

* [PATCH] net/mlx5: Fix a memory leak bug
From: Wenwen Wang @ 2019-08-13  8:21 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller,
	open list:MELLANOX MLX5 core VPI driver,
	open list:MELLANOX MLX5 core VPI driver, open list

In mlx5_cmd_invoke(), 'ent' is allocated through kzalloc() in alloc_cmd().
After the work is queued, wait_func() is invoked to wait the completion of
the work. If wait_func() returns -ETIMEDOUT, the following execution will
be terminated. However, the allocated 'ent' is not deallocated on this
program path, leading to a memory leak bug.

To fix the above issue, free 'ent' before returning the error.

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 8cdd7e6..90cdb9a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1036,7 +1036,7 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in,
 
 	err = wait_func(dev, ent);
 	if (err == -ETIMEDOUT)
-		goto out;
+		goto out_free;
 
 	ds = ent->ts2 - ent->ts1;
 	op = MLX5_GET(mbox_in, in->first.data, opcode);
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH 12/16] arm64: prefer __section from compiler_attributes.h
From: Will Deacon @ 2019-08-13  8:27 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: akpm, sedat.dilek, jpoimboe, yhs, miguel.ojeda.sandonis,
	clang-built-linux, Catalin Marinas, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Andrey Konovalov,
	Greg Kroah-Hartman, Enrico Weigelt, Suzuki K Poulose,
	Thomas Gleixner, Masayoshi Mizuma, Shaokun Zhang, Alexios Zavras,
	Allison Randal, linux-arm-kernel, linux-kernel, netdev, bpf
In-Reply-To: <20190812215052.71840-12-ndesaulniers@google.com>

Hi Nick,

On Mon, Aug 12, 2019 at 02:50:45PM -0700, Nick Desaulniers wrote:
> GCC unescapes escaped string section names while Clang does not. Because
> __section uses the `#` stringification operator for the section name, it
> doesn't need to be escaped.
> 
> This antipattern was found with:
> $ grep -e __section\(\" -e __section__\(\" -r
> 
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
>  arch/arm64/include/asm/cache.h     | 2 +-
>  arch/arm64/kernel/smp_spin_table.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Does this fix a build issue, or is it just cosmetic or do we end up with
duplicate sections or something else?

Happy to route it via arm64, just having trouble working out whether it's
5.3 material!

Will

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain index
From: Paul Blakey @ 2019-08-13  8:29 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: Linux Kernel Network Developers, David S. Miller, Justin Pettit,
	Simon Horman, Marcelo Ricardo Leitner, Vlad Buslov, Jiri Pirko,
	Roi Dayan, Yossi Kuperman, Rony Efraim, Oz Shlomo
In-Reply-To: <CAOrHB_CDrau-jLycRYxRkn1tEXVrRhoSYSd8sAcGPiZ-bp+FEg@mail.gmail.com>


On 8/12/2019 7:18 PM, Pravin Shelar wrote:
> On Sun, Aug 11, 2019 at 3:46 AM Paul Blakey <paulb@mellanox.com> wrote:
>>
>> On 8/8/2019 11:53 PM, Pravin Shelar wrote:
>>> On Wed, Aug 7, 2019 at 5:08 AM Paul Blakey <paulb@mellanox.com> wrote:
>>>> Offloaded OvS datapath rules are translated one to one to tc rules,
>>>> for example the following simplified OvS rule:
>>>>
>>>> recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
>>>>
>>>> Will be translated to the following tc rule:
>>>>
>>>> $ tc filter add dev dev1 ingress \
>>>>               prio 1 chain 0 proto ip \
>>>>                   flower tcp ct_state -trk \
>>>>                   action ct pipe \
>>>>                   action goto chain 2
>>>>
>>>> Received packets will first travel though tc, and if they aren't stolen
>>>> by it, like in the above rule, they will continue to OvS datapath.
>>>> Since we already did some actions (action ct in this case) which might
>>>> modify the packets, and updated action stats, we would like to continue
>>>> the proccessing with the correct recirc_id in OvS (here recirc_id(2))
>>>> where we left off.
>>>>
>>>> To support this, introduce a new skb extension for tc, which
>>>> will be used for translating tc chain to ovs recirc_id to
>>>> handle these miss cases. Last tc chain index will be set
>>>> by tc goto chain action and read by OvS datapath.
>>>>
>>>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>>>> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>> Acked-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>>>>    include/linux/skbuff.h    | 13 +++++++++++++
>>>>    include/net/sch_generic.h |  5 ++++-
>>>>    net/core/skbuff.c         |  6 ++++++
>>>>    net/openvswitch/flow.c    |  9 +++++++++
>>>>    net/sched/Kconfig         | 13 +++++++++++++
>>>>    net/sched/act_api.c       |  1 +
>>>>    net/sched/cls_api.c       | 12 ++++++++++++
>>>>    7 files changed, 58 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 3aef8d8..fb2a792 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -279,6 +279,16 @@ struct nf_bridge_info {
>>>>    };
>>>>    #endif
>>>>
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> +/* Chain in tc_skb_ext will be used to share the tc chain with
>>>> + * ovs recirc_id. It will be set to the current chain by tc
>>>> + * and read by ovs to recirc_id.
>>>> + */
>>>> +struct tc_skb_ext {
>>>> +       __u32 chain;
>>>> +};
>>>> +#endif
>>>> +
>>>>    struct sk_buff_head {
>>>>           /* These two members must be first. */
>>>>           struct sk_buff  *next;
>>>> @@ -4050,6 +4060,9 @@ enum skb_ext_id {
>>>>    #ifdef CONFIG_XFRM
>>>>           SKB_EXT_SEC_PATH,
>>>>    #endif
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> +       TC_SKB_EXT,
>>>> +#endif
>>>>           SKB_EXT_NUM, /* must be last */
>>>>    };
>>>>
>>>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>>> index 6b6b012..871feea 100644
>>>> --- a/include/net/sch_generic.h
>>>> +++ b/include/net/sch_generic.h
>>>> @@ -275,7 +275,10 @@ struct tcf_result {
>>>>                           unsigned long   class;
>>>>                           u32             classid;
>>>>                   };
>>>> -               const struct tcf_proto *goto_tp;
>>>> +               struct {
>>>> +                       const struct tcf_proto *goto_tp;
>>>> +                       u32 goto_index;
>>>> +               };
>>>>
>>>>                   /* used in the skb_tc_reinsert function */
>>>>                   struct {
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index ea8e8d3..2b40b5a 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -4087,6 +4087,9 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>>>>    #ifdef CONFIG_XFRM
>>>>           [SKB_EXT_SEC_PATH] = SKB_EXT_CHUNKSIZEOF(struct sec_path),
>>>>    #endif
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> +       [TC_SKB_EXT] = SKB_EXT_CHUNKSIZEOF(struct tc_skb_ext),
>>>> +#endif
>>>>    };
>>>>
>>>>    static __always_inline unsigned int skb_ext_total_length(void)
>>>> @@ -4098,6 +4101,9 @@ static __always_inline unsigned int skb_ext_total_length(void)
>>>>    #ifdef CONFIG_XFRM
>>>>                   skb_ext_type_len[SKB_EXT_SEC_PATH] +
>>>>    #endif
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> +               skb_ext_type_len[TC_SKB_EXT] +
>>>> +#endif
>>>>                   0;
>>>>    }
>>>>
>>>> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>>>> index bc89e16..0287ead 100644
>>>> --- a/net/openvswitch/flow.c
>>>> +++ b/net/openvswitch/flow.c
>>>> @@ -816,6 +816,9 @@ static int key_extract_mac_proto(struct sk_buff *skb)
>>>>    int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>>>>                            struct sk_buff *skb, struct sw_flow_key *key)
>>>>    {
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> +       struct tc_skb_ext *tc_ext;
>>>> +#endif
>>>>           int res, err;
>>>>
>>>>           /* Extract metadata from packet. */
>>>> @@ -848,7 +851,13 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>>>>           if (res < 0)
>>>>                   return res;
>>>>           key->mac_proto = res;
>>>> +
>>>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>>>> +       tc_ext = skb_ext_find(skb, TC_SKB_EXT);
>>>> +       key->recirc_id = tc_ext ? tc_ext->chain : 0;
>>>> +#else
>>>>           key->recirc_id = 0;
>>>> +#endif
>>>>
>>> Most of cases the config would be turned on, so the ifdef is not that
>>> useful. Can you add static key to avoid searching the skb-ext in non
>>> offload cases.
>> Hi,
>>
>> What do you mean by a static key?
>>
> https://www.kernel.org/doc/Documentation/static-keys.txt
>
> Static key can be enabled when a flow is added to the tc filter.

Hi and thanks for the feedback,

The skb_ext_find() just checks a single bit on the 
skb->active_extensions, and if so returns an offset. Do you think it 
will impact performance much?


But to your suggestion, do you mean that the first tc goto action 
instance with the relevant ifdef (CONFIG_NET_TC_SKB_EXT) it will enable 
the OvS static key that guards this skb_ext_find()?

I guess calling it in tcf_action_set_ctrlact() if goto_chain != 0.

This will expose some OvS helper function (or static key) to 
net/sched/act_api.c right?


Thanks,

Paul.





^ permalink raw reply

* RE: [PATCH net-next v2 04/12] net: stmmac: Add Split Header support and enable it in XGMAC cores
From: Jose Abreu @ 2019-08-13  8:30 UTC (permalink / raw)
  To: David Miller, jakub.kicinski@netronome.com
  Cc: netdev@vger.kernel.org, Joao.Pinto@synopsys.com,
	peppe.cavallaro@st.com, alexandre.torgue@st.com,
	mcoquelin.stm32@gmail.com,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190812.140618.1288127671943783439.davem@davemloft.net>

++ Jakub

From: David Miller <davem@davemloft.net>
Date: Aug/12/2019, 22:06:18 (UTC+00:00)

> From: Jose Abreu <Jose.Abreu@synopsys.com>
> Date: Mon, 12 Aug 2019 11:44:03 +0200
> 
> > 	- Add performance info (David)
> 
> Ummm...
> 
> Whilst cpu utilization is interesting, I might be mainly interested in
> how this effects "networking" performance.  I find it very surprising
> that it isn't obvious that this is what I wanted.
> 
> Do you not do performance testing on the networking level when you
> make fundamental changes to how packets are processed by the
> hardware/driver?

I do.

In my HW this feature does not impact performance neither improves it as 
I'm already on max of theoretical bandwidth.

I do expect it to reduce CPU usage and memory footprint because we no 
longer have to memcpy the entire buffer to SKB and instead we just copy 
the header and then append payload as page which is passed directly to 
net layer.

This feature is already used in some drivers and is part of GRO 
offloading I think.

Jakub, as David is off can you please comment on how can we proceed with 
this series ? I can add more information in commit log for this patch 
...

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
From: Jason Wang @ 2019-08-13  8:31 UTC (permalink / raw)
  To: Jason Gunthorpe, Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, linux-mm
In-Reply-To: <20190812130252.GE24457@ziepe.ca>


On 2019/8/12 下午9:02, Jason Gunthorpe wrote:
> On Mon, Aug 12, 2019 at 05:49:08AM -0400, Michael S. Tsirkin wrote:
>> On Mon, Aug 12, 2019 at 10:44:51AM +0800, Jason Wang wrote:
>>> On 2019/8/11 上午1:52, Michael S. Tsirkin wrote:
>>>> On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:
>>>>> Hi all:
>>>>>
>>>>> This series try to fix several issues introduced by meta data
>>>>> accelreation series. Please review.
>>>>>
>>>>> Changes from V4:
>>>>> - switch to use spinlock synchronize MMU notifier with accessors
>>>>>
>>>>> Changes from V3:
>>>>> - remove the unnecessary patch
>>>>>
>>>>> Changes from V2:
>>>>> - use seqlck helper to synchronize MMU notifier with vhost worker
>>>>>
>>>>> Changes from V1:
>>>>> - try not use RCU to syncrhonize MMU notifier with vhost worker
>>>>> - set dirty pages after no readers
>>>>> - return -EAGAIN only when we find the range is overlapped with
>>>>>     metadata
>>>>>
>>>>> Jason Wang (9):
>>>>>     vhost: don't set uaddr for invalid address
>>>>>     vhost: validate MMU notifier registration
>>>>>     vhost: fix vhost map leak
>>>>>     vhost: reset invalidate_count in vhost_set_vring_num_addr()
>>>>>     vhost: mark dirty pages during map uninit
>>>>>     vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
>>>>>     vhost: do not use RCU to synchronize MMU notifier with worker
>>>>>     vhost: correctly set dirty pages in MMU notifiers callback
>>>>>     vhost: do not return -EAGAIN for non blocking invalidation too early
>>>>>
>>>>>    drivers/vhost/vhost.c | 202 +++++++++++++++++++++++++-----------------
>>>>>    drivers/vhost/vhost.h |   6 +-
>>>>>    2 files changed, 122 insertions(+), 86 deletions(-)
>>>> This generally looks more solid.
>>>>
>>>> But this amounts to a significant overhaul of the code.
>>>>
>>>> At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
>>>> for this release, and then re-apply a corrected version
>>>> for the next one?
>>>
>>> If possible, consider we've actually disabled the feature. How about just
>>> queued those patches for next release?
>>>
>>> Thanks
>> Sorry if I was unclear. My idea is that
>> 1. I revert the disabled code
>> 2. You send a patch readding it with all the fixes squashed
>> 3. Maybe optimizations on top right away?
>> 4. We queue *that* for next and see what happens.
>>
>> And the advantage over the patchy approach is that the current patches
>> are hard to review. E.g.  it's not reasonable to ask RCU guys to review
>> the whole of vhost for RCU usage but it's much more reasonable to ask
>> about a specific patch.
> I think there are other problems here too, I don't like that the use
> of mmu notifiers is so different from every other driver, or that GUP
> is called under spinlock.


What kind of issues do you see? Spinlock is to synchronize GUP with MMU 
notifier in this series.

Btw, back to the original question. May I know why synchronize_rcu() is 
not suitable? Consider:

- MMU notifier are allowed to sleep
- MMU notifier could be preempted

If you mean something that prevents RCU grace period from running. I'm 
afraid MMU notifier is not the only victim.  But it should be no more 
worse than some one is holding a lock for very long time. If the only 
concern is the preemption of vhost kthread, I can switch to use 
rcu_read_lock_bh() instead.

Thanks


>
> So I favor the revert and try again approach as well. It is hard to
> get a clear picture with these endless bug fix patches
>
> Jason


Ok.

Thanks


^ permalink raw reply

* Re: [PATCH 14/16] include/linux: prefer __section from compiler_attributes.h
From: Ard Biesheuvel @ 2019-08-13  8:31 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Andrew Morton, Sedat Dilek, Josh Poimboeuf, yhs, Miguel Ojeda,
	clang-built-linux, Luc Van Oostenryck, Lai Jiangshan,
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra (Intel), Nicholas Piggin, Jiri Kosina, Will Deacon,
	Michael Ellerman, Masahiro Yamada, Hans Liljestrand,
	Elena Reshetova, David Windsor, Marc Zyngier, Ming Lei,
	Dou Liyang, Julien Thierry, Mauro Carvalho Chehab, Jens Axboe,
	Linux Kernel Mailing List, Linux-Sparse, rcu,
	<netdev@vger.kernel.org>, bpf
In-Reply-To: <20190812215052.71840-14-ndesaulniers@google.com>

On Tue, 13 Aug 2019 at 00:53, Nick Desaulniers <ndesaulniers@google.com> wrote:
>

This patch needs a commit log that describes the reason for making this change.

> Link: https://github.com/ClangBuiltLinux/linux/issues/619
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
>  include/linux/cache.h       | 6 +++---
>  include/linux/compiler.h    | 8 ++++----
>  include/linux/cpu.h         | 2 +-
>  include/linux/export.h      | 2 +-
>  include/linux/init_task.h   | 4 ++--
>  include/linux/interrupt.h   | 5 ++---
>  include/linux/sched/debug.h | 2 +-
>  include/linux/srcutree.h    | 2 +-
>  8 files changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/cache.h b/include/linux/cache.h
> index 750621e41d1c..3f4df9eef1e1 100644
> --- a/include/linux/cache.h
> +++ b/include/linux/cache.h
> @@ -28,7 +28,7 @@
>   * but may get written to during init, so can't live in .rodata (via "const").
>   */
>  #ifndef __ro_after_init
> -#define __ro_after_init __attribute__((__section__(".data..ro_after_init")))
> +#define __ro_after_init __section(.data..ro_after_init)
>  #endif
>
>  #ifndef ____cacheline_aligned
> @@ -45,8 +45,8 @@
>
>  #ifndef __cacheline_aligned
>  #define __cacheline_aligned                                    \
> -  __attribute__((__aligned__(SMP_CACHE_BYTES),                 \
> -                __section__(".data..cacheline_aligned")))
> +       __aligned(SMP_CACHE_BYTES)                              \
> +       __section(.data..cacheline_aligned)
>  #endif /* __cacheline_aligned */
>
>  #ifndef __cacheline_aligned_in_smp
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index f0fd5636fddb..5e88e7e33abe 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -24,7 +24,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>                         long ______r;                                   \
>                         static struct ftrace_likely_data                \
>                                 __aligned(4)                            \
> -                               __section("_ftrace_annotated_branch")   \
> +                               __section(_ftrace_annotated_branch)     \
>                                 ______f = {                             \
>                                 .data.func = __func__,                  \
>                                 .data.file = __FILE__,                  \
> @@ -60,7 +60,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #define __trace_if_value(cond) ({                      \
>         static struct ftrace_branch_data                \
>                 __aligned(4)                            \
> -               __section("_ftrace_branch")             \
> +               __section(_ftrace_branch)               \
>                 __if_trace = {                          \
>                         .func = __func__,               \
>                         .file = __FILE__,               \
> @@ -118,7 +118,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         ".popsection\n\t"
>
>  /* Annotate a C jump table to allow objtool to follow the code flow */
> -#define __annotate_jump_table __section(".rodata..c_jump_table")
> +#define __annotate_jump_table __section(.rodata..c_jump_table)
>
>  #else
>  #define annotate_reachable()
> @@ -298,7 +298,7 @@ unsigned long read_word_at_a_time(const void *addr)
>   * visible to the compiler.
>   */
>  #define __ADDRESSABLE(sym) \
> -       static void * __section(".discard.addressable") __used \
> +       static void * __section(.discard.addressable) __used \
>                 __PASTE(__addressable_##sym, __LINE__) = (void *)&sym;
>
>  /**
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index fcb1386bb0d4..186bbd79d6ce 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -166,7 +166,7 @@ void cpu_startup_entry(enum cpuhp_state state);
>  void cpu_idle_poll_ctrl(bool enable);
>
>  /* Attach to any functions which should be considered cpuidle. */
> -#define __cpuidle      __attribute__((__section__(".cpuidle.text")))
> +#define __cpuidle      __section(.cpuidle.text)
>
>  bool cpu_in_idle(unsigned long pc);
>
> diff --git a/include/linux/export.h b/include/linux/export.h
> index fd8711ed9ac4..808c1a0c2ef9 100644
> --- a/include/linux/export.h
> +++ b/include/linux/export.h
> @@ -104,7 +104,7 @@ struct kernel_symbol {
>   * discarded in the final link stage.
>   */
>  #define __ksym_marker(sym)     \
> -       static int __ksym_marker_##sym[0] __section(".discard.ksym") __used
> +       static int __ksym_marker_##sym[0] __section(.discard.ksym) __used
>
>  #define __EXPORT_SYMBOL(sym, sec)                              \
>         __ksym_marker(sym);                                     \
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 6049baa5b8bc..50139505da34 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -51,12 +51,12 @@ extern struct cred init_cred;
>
>  /* Attach to the init_task data structure for proper alignment */
>  #ifdef CONFIG_ARCH_TASK_STRUCT_ON_STACK
> -#define __init_task_data __attribute__((__section__(".data..init_task")))
> +#define __init_task_data __section(.data..init_task)
>  #else
>  #define __init_task_data /**/
>  #endif
>
>  /* Attach to the thread_info data structure for proper alignment */
> -#define __init_thread_info __attribute__((__section__(".data..init_thread_info")))
> +#define __init_thread_info __section(.data..init_thread_info)
>
>  #endif
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 5b8328a99b2a..29debfe4dd0f 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -741,8 +741,7 @@ extern int arch_early_irq_init(void);
>  /*
>   * We want to know which function is an entrypoint of a hardirq or a softirq.
>   */
> -#define __irq_entry             __attribute__((__section__(".irqentry.text")))
> -#define __softirq_entry  \
> -       __attribute__((__section__(".softirqentry.text")))
> +#define __irq_entry    __section(.irqentry.text)
> +#define __softirq_entry        __section(.softirqentry.text)
>
>  #endif
> diff --git a/include/linux/sched/debug.h b/include/linux/sched/debug.h
> index 95fb9e025247..e17b66221fdd 100644
> --- a/include/linux/sched/debug.h
> +++ b/include/linux/sched/debug.h
> @@ -42,7 +42,7 @@ extern void proc_sched_set_task(struct task_struct *p);
>  #endif
>
>  /* Attach to any functions which should be ignored in wchan output. */
> -#define __sched                __attribute__((__section__(".sched.text")))
> +#define __sched                __section(.sched.text)
>
>  /* Linker adds these: start and end of __sched functions */
>  extern char __sched_text_start[], __sched_text_end[];
> diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> index 9cfcc8a756ae..9de652f4e1bd 100644
> --- a/include/linux/srcutree.h
> +++ b/include/linux/srcutree.h
> @@ -124,7 +124,7 @@ struct srcu_struct {
>  # define __DEFINE_SRCU(name, is_static)                                        \
>         is_static struct srcu_struct name;                              \
>         struct srcu_struct * const __srcu_struct_##name                 \
> -               __section("___srcu_struct_ptrs") = &name
> +               __section(___srcu_struct_ptrs) = &name
>  #else
>  # define __DEFINE_SRCU(name, is_static)                                        \
>         static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data);      \
> --
> 2.23.0.rc1.153.gdeed80330f-goog
>

^ permalink raw reply

* [PATCH iproute2-next v2 0/4] Add devlink-trap support
From: Ido Schimmel @ 2019-08-13  8:31 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, stephen, jiri, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

This patchset adds devlink-trap support in iproute2.

Patch #1 increases the number of options devlink can handle.

Patches #2-#3 gradually add support for all devlink-trap commands.

Patch #4 adds a man page for devlink-trap.

See individual commit messages for example usage and output.

Changes in v2:
* Remove report option and monitor command since monitoring is done
  using drop monitor

Ido Schimmel (4):
  devlink: Increase number of supported options
  devlink: Add devlink trap set and show commands
  devlink: Add devlink trap group set and show commands
  devlink: Add man page for devlink-trap

 devlink/devlink.c          | 448 +++++++++++++++++++++++++++++++++++--
 man/man8/devlink-monitor.8 |   3 +-
 man/man8/devlink-trap.8    | 138 ++++++++++++
 man/man8/devlink.8         |  11 +-
 4 files changed, 581 insertions(+), 19 deletions(-)
 create mode 100644 man/man8/devlink-trap.8

-- 
2.21.0


^ permalink raw reply

* [PATCH iproute2-next v2 1/4] devlink: Increase number of supported options
From: Ido Schimmel @ 2019-08-13  8:31 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, stephen, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190813083143.13509-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

Currently, the number of supported options is capped at 32 which is a
problem given we are about to add a few more and go over the limit.

Increase the limit to 64 options.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 91c85dc1de73..4ed240e251f5 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -235,7 +235,7 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER	BIT(28)
 
 struct dl_opts {
-	uint32_t present; /* flags of present items */
+	uint64_t present; /* flags of present items */
 	char *bus_name;
 	char *dev_name;
 	uint32_t port_index;
@@ -735,7 +735,7 @@ static int dl_argv_handle_port(struct dl *dl, char **p_bus_name,
 
 static int dl_argv_handle_both(struct dl *dl, char **p_bus_name,
 			       char **p_dev_name, uint32_t *p_port_index,
-			       uint32_t *p_handle_bit)
+			       uint64_t *p_handle_bit)
 {
 	char *str = dl_argv_next(dl);
 	unsigned int slash_count;
@@ -1015,7 +1015,7 @@ static int param_cmode_get(const char *cmodestr,
 }
 
 struct dl_args_metadata {
-	uint32_t o_flag;
+	uint64_t o_flag;
 	char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
 };
 
@@ -1042,10 +1042,10 @@ static const struct dl_args_metadata dl_args_required[] = {
 	{DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected."},
 };
 
-static int dl_args_finding_required_validate(uint32_t o_required,
-					     uint32_t o_found)
+static int dl_args_finding_required_validate(uint64_t o_required,
+					     uint64_t o_found)
 {
-	uint32_t o_flag;
+	uint64_t o_flag;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(dl_args_required); i++) {
@@ -1058,16 +1058,16 @@ static int dl_args_finding_required_validate(uint32_t o_required,
 	return 0;
 }
 
-static int dl_argv_parse(struct dl *dl, uint32_t o_required,
-			 uint32_t o_optional)
+static int dl_argv_parse(struct dl *dl, uint64_t o_required,
+			 uint64_t o_optional)
 {
 	struct dl_opts *opts = &dl->opts;
-	uint32_t o_all = o_required | o_optional;
-	uint32_t o_found = 0;
+	uint64_t o_all = o_required | o_optional;
+	uint64_t o_found = 0;
 	int err;
 
 	if (o_required & DL_OPT_HANDLE && o_required & DL_OPT_HANDLEP) {
-		uint32_t handle_bit;
+		uint64_t handle_bit;
 
 		err = dl_argv_handle_both(dl, &opts->bus_name, &opts->dev_name,
 					  &opts->port_index, &handle_bit);
@@ -1446,7 +1446,7 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 }
 
 static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
-			     uint32_t o_required, uint32_t o_optional)
+			     uint64_t o_required, uint64_t o_optional)
 {
 	int err;
 
-- 
2.21.0


^ permalink raw reply related

* [PATCH iproute2-next v2 2/4] devlink: Add devlink trap set and show commands
From: Ido Schimmel @ 2019-08-13  8:31 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, stephen, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190813083143.13509-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

The trap set command allows the user to set the action of an individual
trap. Example:

# devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap

The trap show command allows the user to get the current status of an
individual trap or a dump of all traps in case one is not specified.
When '-s' is specified the trap's statistics are shown. When '-v' is
specified the metadata types the trap can provide are shown. Example:

# devlink -jvps trap show netdevsim/netdevsim10 trap blackhole_route
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "blackhole_route",
                "type": "drop",
                "generic": true,
                "action": "trap",
                "group": "l3_drops",
                "metadata": [ "input_port" ],
                "stats": {
                    "rx": {
                        "bytes": 0,
                        "packets": 0
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 devlink/devlink.c | 293 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 288 insertions(+), 5 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 4ed240e251f5..81fff4429841 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -233,6 +233,8 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_HEALTH_REPORTER_NAME	BIT(27)
 #define DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD	BIT(27)
 #define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER	BIT(28)
+#define DL_OPT_TRAP_NAME		BIT(29)
+#define DL_OPT_TRAP_ACTION		BIT(30)
 
 struct dl_opts {
 	uint64_t present; /* flags of present items */
@@ -269,6 +271,8 @@ struct dl_opts {
 	const char *reporter_name;
 	uint64_t reporter_graceful_period;
 	bool reporter_auto_recover;
+	const char *trap_name;
+	enum devlink_trap_action trap_action;
 };
 
 struct dl {
@@ -282,6 +286,7 @@ struct dl {
 	bool json_output;
 	bool pretty_output;
 	bool verbose;
+	bool stats;
 	struct {
 		bool present;
 		char *bus_name;
@@ -436,6 +441,19 @@ static const enum mnl_attr_data_type devlink_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER_COUNT] = MNL_TYPE_U64,
 	[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS] = MNL_TYPE_U64,
 	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_STATS] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_TRAP_NAME] = MNL_TYPE_STRING,
+	[DEVLINK_ATTR_TRAP_ACTION] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_TRAP_TYPE] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_TRAP_GENERIC] = MNL_TYPE_FLAG,
+	[DEVLINK_ATTR_TRAP_METADATA] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_TRAP_GROUP_NAME] = MNL_TYPE_STRING,
+};
+
+static const enum mnl_attr_data_type
+devlink_stats_policy[DEVLINK_ATTR_STATS_MAX + 1] = {
+	[DEVLINK_ATTR_STATS_RX_PACKETS] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_STATS_RX_BYTES] = MNL_TYPE_U64,
 };
 
 static int attr_cb(const struct nlattr *attr, void *data)
@@ -454,6 +472,25 @@ static int attr_cb(const struct nlattr *attr, void *data)
 	return MNL_CB_OK;
 }
 
+static int attr_stats_cb(const struct nlattr *attr, void *data)
+{
+	const struct nlattr **tb = data;
+	int type;
+
+	/* Allow the tool to work on top of newer kernels that might contain
+	 * more attributes.
+	 */
+	if (mnl_attr_type_valid(attr, DEVLINK_ATTR_STATS_MAX) < 0)
+		return MNL_CB_OK;
+
+	type = mnl_attr_get_type(attr);
+	if (mnl_attr_validate(attr, devlink_stats_policy[type]) < 0)
+		return MNL_CB_ERROR;
+
+	tb[type] = attr;
+	return MNL_CB_OK;
+}
+
 static int ifname_map_cb(const struct nlmsghdr *nlh, void *data)
 {
 	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
@@ -1014,6 +1051,20 @@ static int param_cmode_get(const char *cmodestr,
 	return 0;
 }
 
+static int trap_action_get(const char *actionstr,
+			   enum devlink_trap_action *p_action)
+{
+	if (strcmp(actionstr, "drop") == 0) {
+		*p_action = DEVLINK_TRAP_ACTION_DROP;
+	} else if (strcmp(actionstr, "trap") == 0) {
+		*p_action = DEVLINK_TRAP_ACTION_TRAP;
+	} else {
+		pr_err("Unknown trap action \"%s\"\n", actionstr);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 struct dl_args_metadata {
 	uint64_t o_flag;
 	char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
@@ -1040,6 +1091,7 @@ static const struct dl_args_metadata dl_args_required[] = {
 	{DL_OPT_REGION_ADDRESS,	      "Region address value expected."},
 	{DL_OPT_REGION_LENGTH,	      "Region length value expected."},
 	{DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected."},
+	{DL_OPT_TRAP_NAME,            "Trap's name is expected."},
 };
 
 static int dl_args_finding_required_validate(uint64_t o_required,
@@ -1329,6 +1381,25 @@ static int dl_argv_parse(struct dl *dl, uint64_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_HEALTH_REPORTER_AUTO_RECOVER;
+		} else if (dl_argv_match(dl, "trap") &&
+			   (o_all & DL_OPT_TRAP_NAME)) {
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &opts->trap_name);
+			if (err)
+				return err;
+			o_found |= DL_OPT_TRAP_NAME;
+		} else if (dl_argv_match(dl, "action") &&
+			   (o_all & DL_OPT_TRAP_ACTION)) {
+			const char *actionstr;
+
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &actionstr);
+			if (err)
+				return err;
+			err = trap_action_get(actionstr, &opts->trap_action);
+			if (err)
+				return err;
+			o_found |= DL_OPT_TRAP_ACTION;
 		} else {
 			pr_err("Unknown option \"%s\"\n", dl_argv(dl));
 			return -EINVAL;
@@ -1442,6 +1513,12 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_HEALTH_REPORTER_AUTO_RECOVER)
 		mnl_attr_put_u8(nlh, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
 				opts->reporter_auto_recover);
+	if (opts->present & DL_OPT_TRAP_NAME)
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_TRAP_NAME,
+				  opts->trap_name);
+	if (opts->present & DL_OPT_TRAP_ACTION)
+		mnl_attr_put_u8(nlh, DEVLINK_ATTR_TRAP_ACTION,
+				opts->trap_action);
 
 }
 
@@ -1945,6 +2022,30 @@ static void pr_out_entry_end(struct dl *dl)
 		__pr_out_newline();
 }
 
+static void pr_out_stats(struct dl *dl, struct nlattr *nla_stats)
+{
+	struct nlattr *tb[DEVLINK_ATTR_STATS_MAX + 1] = {};
+	int err;
+
+	if (!dl->stats)
+		return;
+
+	err = mnl_attr_parse_nested(nla_stats, attr_stats_cb, tb);
+	if (err != MNL_CB_OK)
+		return;
+
+	pr_out_object_start(dl, "stats");
+	pr_out_object_start(dl, "rx");
+	if (tb[DEVLINK_ATTR_STATS_RX_BYTES])
+		pr_out_u64(dl, "bytes",
+			   mnl_attr_get_u64(tb[DEVLINK_ATTR_STATS_RX_BYTES]));
+	if (tb[DEVLINK_ATTR_STATS_RX_PACKETS])
+		pr_out_u64(dl, "packets",
+			   mnl_attr_get_u64(tb[DEVLINK_ATTR_STATS_RX_PACKETS]));
+	pr_out_object_end(dl);
+	pr_out_object_end(dl);
+}
+
 static const char *param_cmode_name(uint8_t cmode)
 {
 	switch (cmode) {
@@ -3764,6 +3865,10 @@ static const char *cmd_name(uint8_t cmd)
 	case DEVLINK_CMD_REGION_SET: return "set";
 	case DEVLINK_CMD_REGION_NEW: return "new";
 	case DEVLINK_CMD_REGION_DEL: return "del";
+	case DEVLINK_CMD_TRAP_GET: return "get";
+	case DEVLINK_CMD_TRAP_SET: return "set";
+	case DEVLINK_CMD_TRAP_NEW: return "new";
+	case DEVLINK_CMD_TRAP_DEL: return "del";
 	default: return "<unknown cmd>";
 	}
 }
@@ -3792,6 +3897,11 @@ static const char *cmd_obj(uint8_t cmd)
 	case DEVLINK_CMD_REGION_NEW:
 	case DEVLINK_CMD_REGION_DEL:
 		return "region";
+	case DEVLINK_CMD_TRAP_GET:
+	case DEVLINK_CMD_TRAP_SET:
+	case DEVLINK_CMD_TRAP_NEW:
+	case DEVLINK_CMD_TRAP_DEL:
+		return "trap";
 	default: return "<unknown obj>";
 	}
 }
@@ -3817,6 +3927,7 @@ static bool cmd_filter_check(struct dl *dl, uint8_t cmd)
 }
 
 static void pr_out_region(struct dl *dl, struct nlattr **tb);
+static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array);
 
 static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 {
@@ -3872,6 +3983,22 @@ static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 		pr_out_mon_header(genl->cmd);
 		pr_out_region(dl, tb);
 		break;
+	case DEVLINK_CMD_TRAP_GET: /* fall through */
+	case DEVLINK_CMD_TRAP_SET: /* fall through */
+	case DEVLINK_CMD_TRAP_NEW: /* fall through */
+	case DEVLINK_CMD_TRAP_DEL:
+		mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+		if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_TYPE] ||
+		    !tb[DEVLINK_ATTR_TRAP_ACTION] ||
+		    !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_METADATA] ||
+		    !tb[DEVLINK_ATTR_STATS])
+			return MNL_CB_ERROR;
+		pr_out_mon_header(genl->cmd);
+		pr_out_trap(dl, tb, false);
+		break;
 	}
 	return MNL_CB_OK;
 }
@@ -3885,7 +4012,8 @@ static int cmd_mon_show(struct dl *dl)
 	while ((cur_obj = dl_argv_index(dl, index++))) {
 		if (strcmp(cur_obj, "all") != 0 &&
 		    strcmp(cur_obj, "dev") != 0 &&
-		    strcmp(cur_obj, "port") != 0) {
+		    strcmp(cur_obj, "port") != 0 &&
+		    strcmp(cur_obj, "trap") != 0) {
 			pr_err("Unknown object \"%s\"\n", cur_obj);
 			return -EINVAL;
 		}
@@ -3902,7 +4030,7 @@ static int cmd_mon_show(struct dl *dl)
 static void cmd_mon_help(void)
 {
 	pr_err("Usage: devlink monitor [ all | OBJECT-LIST ]\n"
-	       "where  OBJECT-LIST := { dev | port }\n");
+	       "where  OBJECT-LIST := { dev | port | trap }\n");
 }
 
 static int cmd_mon(struct dl *dl)
@@ -6330,12 +6458,160 @@ static int cmd_health(struct dl *dl)
 	return -ENOENT;
 }
 
+static const char *trap_type_name(uint8_t type)
+{
+	switch (type) {
+	case DEVLINK_TRAP_TYPE_DROP:
+		return "drop";
+	case DEVLINK_TRAP_TYPE_EXCEPTION:
+		return "exception";
+	default:
+		return "<unknown type>";
+	}
+}
+
+static const char *trap_action_name(uint8_t action)
+{
+	switch (action) {
+	case DEVLINK_TRAP_ACTION_DROP:
+		return "drop";
+	case DEVLINK_TRAP_ACTION_TRAP:
+		return "trap";
+	default:
+		return "<unknown action>";
+	}
+}
+
+static const char *trap_metadata_name(const struct nlattr *attr)
+{
+	switch (attr->nla_type) {
+	case DEVLINK_ATTR_TRAP_METADATA_TYPE_IN_PORT:
+		return "input_port";
+	default:
+		return "<unknown metadata type>";
+	}
+}
+static void pr_out_trap_metadata(struct dl *dl, struct nlattr *attr)
+{
+	struct nlattr *attr_metadata;
+
+	pr_out_array_start(dl, "metadata");
+	mnl_attr_for_each_nested(attr_metadata, attr)
+		pr_out_str_value(dl, trap_metadata_name(attr_metadata));
+	pr_out_array_end(dl);
+}
+
+static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array)
+{
+	uint8_t action = mnl_attr_get_u8(tb[DEVLINK_ATTR_TRAP_ACTION]);
+	uint8_t type = mnl_attr_get_u8(tb[DEVLINK_ATTR_TRAP_TYPE]);
+
+	if (array)
+		pr_out_handle_start_arr(dl, tb);
+	else
+		__pr_out_handle_start(dl, tb, true, false);
+
+	pr_out_str(dl, "name", mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_NAME]));
+	pr_out_str(dl, "type", trap_type_name(type));
+	pr_out_bool(dl, "generic", !!tb[DEVLINK_ATTR_TRAP_GENERIC]);
+	pr_out_str(dl, "action", trap_action_name(action));
+	pr_out_str(dl, "group",
+		   mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_GROUP_NAME]));
+	if (dl->verbose)
+		pr_out_trap_metadata(dl, tb[DEVLINK_ATTR_TRAP_METADATA]);
+	pr_out_stats(dl, tb[DEVLINK_ATTR_STATS]);
+	pr_out_handle_end(dl);
+}
+
+static int cmd_trap_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct dl *dl = data;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_TRAP_NAME] || !tb[DEVLINK_ATTR_TRAP_TYPE] ||
+	    !tb[DEVLINK_ATTR_TRAP_ACTION] ||
+	    !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] ||
+	    !tb[DEVLINK_ATTR_TRAP_METADATA] || !tb[DEVLINK_ATTR_STATS])
+		return MNL_CB_ERROR;
+
+	pr_out_trap(dl, tb, true);
+
+	return MNL_CB_OK;
+}
+
+static void cmd_trap_help(void)
+{
+	pr_err("Usage: devlink trap set DEV trap TRAP [ action { trap | drop } ]\n");
+	pr_err("       devlink trap show [ DEV trap TRAP ]\n");
+}
+
+static int cmd_trap_show(struct dl *dl)
+{
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	struct nlmsghdr *nlh;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_GET, flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl,
+					DL_OPT_HANDLE | DL_OPT_TRAP_NAME, 0);
+		if (err)
+			return err;
+	}
+
+	pr_out_section_start(dl, "trap");
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_trap_show_cb, dl);
+	pr_out_section_end(dl);
+
+	return err;
+}
+
+static int cmd_trap_set(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE | DL_OPT_TRAP_NAME,
+				DL_OPT_TRAP_ACTION);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_trap(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_trap_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_trap_show(dl);
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_trap_set(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static void help(void)
 {
 	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "       devlink [ -f[orce] ] -b[atch] filename\n"
-	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health }\n"
-	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] }\n");
+	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health | trap }\n"
+	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] -s[tatistics] }\n");
 }
 
 static int dl_cmd(struct dl *dl, int argc, char **argv)
@@ -6370,6 +6646,9 @@ static int dl_cmd(struct dl *dl, int argc, char **argv)
 	} else if (dl_argv_match(dl, "health")) {
 		dl_arg_inc(dl);
 		return cmd_health(dl);
+	} else if (dl_argv_match(dl, "trap")) {
+		dl_arg_inc(dl);
+		return cmd_trap(dl);
 	}
 	pr_err("Object \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
@@ -6479,6 +6758,7 @@ int main(int argc, char **argv)
 		{ "json",		no_argument,		NULL, 'j' },
 		{ "pretty",		no_argument,		NULL, 'p' },
 		{ "verbose",		no_argument,		NULL, 'v' },
+		{ "statistics",		no_argument,		NULL, 's' },
 		{ NULL, 0, NULL, 0 }
 	};
 	const char *batch_file = NULL;
@@ -6494,7 +6774,7 @@ int main(int argc, char **argv)
 		return EXIT_FAILURE;
 	}
 
-	while ((opt = getopt_long(argc, argv, "Vfb:njpv",
+	while ((opt = getopt_long(argc, argv, "Vfb:njpvs",
 				  long_options, NULL)) >= 0) {
 
 		switch (opt) {
@@ -6520,6 +6800,9 @@ int main(int argc, char **argv)
 		case 'v':
 			dl->verbose = true;
 			break;
+		case 's':
+			dl->stats = true;
+			break;
 		default:
 			pr_err("Unknown option.\n");
 			help();
-- 
2.21.0


^ permalink raw reply related

* [PATCH iproute2-next v2 3/4] devlink: Add devlink trap group set and show commands
From: Ido Schimmel @ 2019-08-13  8:31 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, stephen, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190813083143.13509-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

These commands are similar to the trap set and show commands, but
operate on a trap group and not individual traps. Example:

# devlink trap group set netdevsim/netdevsim10 group l3_drops action trap
# devlink -jps trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true,
                "stats": {
                    "rx": {
                        "bytes": 0,
                        "packets": 0
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 devlink/devlink.c | 135 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 133 insertions(+), 2 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 81fff4429841..2f084c020765 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -235,6 +235,7 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER	BIT(28)
 #define DL_OPT_TRAP_NAME		BIT(29)
 #define DL_OPT_TRAP_ACTION		BIT(30)
+#define DL_OPT_TRAP_GROUP_NAME		BIT(31)
 
 struct dl_opts {
 	uint64_t present; /* flags of present items */
@@ -272,6 +273,7 @@ struct dl_opts {
 	uint64_t reporter_graceful_period;
 	bool reporter_auto_recover;
 	const char *trap_name;
+	const char *trap_group_name;
 	enum devlink_trap_action trap_action;
 };
 
@@ -1092,6 +1094,7 @@ static const struct dl_args_metadata dl_args_required[] = {
 	{DL_OPT_REGION_LENGTH,	      "Region length value expected."},
 	{DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected."},
 	{DL_OPT_TRAP_NAME,            "Trap's name is expected."},
+	{DL_OPT_TRAP_GROUP_NAME,      "Trap group's name is expected."},
 };
 
 static int dl_args_finding_required_validate(uint64_t o_required,
@@ -1388,6 +1391,13 @@ static int dl_argv_parse(struct dl *dl, uint64_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_TRAP_NAME;
+		} else if (dl_argv_match(dl, "group") &&
+			   (o_all & DL_OPT_TRAP_GROUP_NAME)) {
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &opts->trap_group_name);
+			if (err)
+				return err;
+			o_found |= DL_OPT_TRAP_GROUP_NAME;
 		} else if (dl_argv_match(dl, "action") &&
 			   (o_all & DL_OPT_TRAP_ACTION)) {
 			const char *actionstr;
@@ -1516,6 +1526,9 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_TRAP_NAME)
 		mnl_attr_put_strz(nlh, DEVLINK_ATTR_TRAP_NAME,
 				  opts->trap_name);
+	if (opts->present & DL_OPT_TRAP_GROUP_NAME)
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_TRAP_GROUP_NAME,
+				  opts->trap_group_name);
 	if (opts->present & DL_OPT_TRAP_ACTION)
 		mnl_attr_put_u8(nlh, DEVLINK_ATTR_TRAP_ACTION,
 				opts->trap_action);
@@ -3869,6 +3882,10 @@ static const char *cmd_name(uint8_t cmd)
 	case DEVLINK_CMD_TRAP_SET: return "set";
 	case DEVLINK_CMD_TRAP_NEW: return "new";
 	case DEVLINK_CMD_TRAP_DEL: return "del";
+	case DEVLINK_CMD_TRAP_GROUP_GET: return "get";
+	case DEVLINK_CMD_TRAP_GROUP_SET: return "set";
+	case DEVLINK_CMD_TRAP_GROUP_NEW: return "new";
+	case DEVLINK_CMD_TRAP_GROUP_DEL: return "del";
 	default: return "<unknown cmd>";
 	}
 }
@@ -3902,6 +3919,11 @@ static const char *cmd_obj(uint8_t cmd)
 	case DEVLINK_CMD_TRAP_NEW:
 	case DEVLINK_CMD_TRAP_DEL:
 		return "trap";
+	case DEVLINK_CMD_TRAP_GROUP_GET:
+	case DEVLINK_CMD_TRAP_GROUP_SET:
+	case DEVLINK_CMD_TRAP_GROUP_NEW:
+	case DEVLINK_CMD_TRAP_GROUP_DEL:
+		return "trap-group";
 	default: return "<unknown obj>";
 	}
 }
@@ -3928,6 +3950,7 @@ static bool cmd_filter_check(struct dl *dl, uint8_t cmd)
 
 static void pr_out_region(struct dl *dl, struct nlattr **tb);
 static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array);
+static void pr_out_trap_group(struct dl *dl, struct nlattr **tb, bool array);
 
 static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 {
@@ -3999,6 +4022,18 @@ static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 		pr_out_mon_header(genl->cmd);
 		pr_out_trap(dl, tb, false);
 		break;
+	case DEVLINK_CMD_TRAP_GROUP_GET: /* fall through */
+	case DEVLINK_CMD_TRAP_GROUP_SET: /* fall through */
+	case DEVLINK_CMD_TRAP_GROUP_NEW: /* fall through */
+	case DEVLINK_CMD_TRAP_GROUP_DEL:
+		mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+		if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] ||
+		    !tb[DEVLINK_ATTR_STATS])
+			return MNL_CB_ERROR;
+		pr_out_mon_header(genl->cmd);
+		pr_out_trap_group(dl, tb, false);
+		break;
 	}
 	return MNL_CB_OK;
 }
@@ -4013,7 +4048,8 @@ static int cmd_mon_show(struct dl *dl)
 		if (strcmp(cur_obj, "all") != 0 &&
 		    strcmp(cur_obj, "dev") != 0 &&
 		    strcmp(cur_obj, "port") != 0 &&
-		    strcmp(cur_obj, "trap") != 0) {
+		    strcmp(cur_obj, "trap") != 0 &&
+		    strcmp(cur_obj, "trap-group") != 0) {
 			pr_err("Unknown object \"%s\"\n", cur_obj);
 			return -EINVAL;
 		}
@@ -4030,7 +4066,7 @@ static int cmd_mon_show(struct dl *dl)
 static void cmd_mon_help(void)
 {
 	pr_err("Usage: devlink monitor [ all | OBJECT-LIST ]\n"
-	       "where  OBJECT-LIST := { dev | port | trap }\n");
+	       "where  OBJECT-LIST := { dev | port | trap | trap-group }\n");
 }
 
 static int cmd_mon(struct dl *dl)
@@ -6546,6 +6582,8 @@ static void cmd_trap_help(void)
 {
 	pr_err("Usage: devlink trap set DEV trap TRAP [ action { trap | drop } ]\n");
 	pr_err("       devlink trap show [ DEV trap TRAP ]\n");
+	pr_err("       devlink trap group set DEV group GROUP [ action { trap | drop } ]\n");
+	pr_err("       devlink trap group show [ DEV group GROUP ]\n");
 }
 
 static int cmd_trap_show(struct dl *dl)
@@ -6589,6 +6627,96 @@ static int cmd_trap_set(struct dl *dl)
 	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
 }
 
+static void pr_out_trap_group(struct dl *dl, struct nlattr **tb, bool array)
+{
+	if (array)
+		pr_out_handle_start_arr(dl, tb);
+	else
+		__pr_out_handle_start(dl, tb, true, false);
+
+	pr_out_str(dl, "name",
+		   mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_GROUP_NAME]));
+	pr_out_bool(dl, "generic", !!tb[DEVLINK_ATTR_TRAP_GENERIC]);
+	pr_out_stats(dl, tb[DEVLINK_ATTR_STATS]);
+	pr_out_handle_end(dl);
+}
+
+static int cmd_trap_group_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct dl *dl = data;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] || !tb[DEVLINK_ATTR_STATS])
+		return MNL_CB_ERROR;
+
+	pr_out_trap_group(dl, tb, true);
+
+	return MNL_CB_OK;
+}
+
+static int cmd_trap_group_show(struct dl *dl)
+{
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	struct nlmsghdr *nlh;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_GROUP_GET, flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl,
+					DL_OPT_HANDLE | DL_OPT_TRAP_GROUP_NAME,
+					0);
+		if (err)
+			return err;
+	}
+
+	pr_out_section_start(dl, "trap_group");
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_trap_group_show_cb, dl);
+	pr_out_section_end(dl);
+
+	return err;
+}
+
+static int cmd_trap_group_set(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_GROUP_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl,
+				DL_OPT_HANDLE | DL_OPT_TRAP_GROUP_NAME,
+				DL_OPT_TRAP_ACTION);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_trap_group(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_trap_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_trap_group_show(dl);
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_trap_group_set(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static int cmd_trap(struct dl *dl)
 {
 	if (dl_argv_match(dl, "help")) {
@@ -6601,6 +6729,9 @@ static int cmd_trap(struct dl *dl)
 	} else if (dl_argv_match(dl, "set")) {
 		dl_arg_inc(dl);
 		return cmd_trap_set(dl);
+	} else if (dl_argv_match(dl, "group")) {
+		dl_arg_inc(dl);
+		return cmd_trap_group(dl);
 	}
 	pr_err("Command \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
-- 
2.21.0


^ permalink raw reply related

* [PATCH iproute2-next v2 4/4] devlink: Add man page for devlink-trap
From: Ido Schimmel @ 2019-08-13  8:31 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, stephen, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190813083143.13509-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 man/man8/devlink-monitor.8 |   3 +-
 man/man8/devlink-trap.8    | 138 +++++++++++++++++++++++++++++++++++++
 man/man8/devlink.8         |  11 ++-
 3 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 man/man8/devlink-trap.8

diff --git a/man/man8/devlink-monitor.8 b/man/man8/devlink-monitor.8
index 13fe641dc8f5..fffab3a4ce88 100644
--- a/man/man8/devlink-monitor.8
+++ b/man/man8/devlink-monitor.8
@@ -21,7 +21,7 @@ command is the first in the command line and then the object list.
 .I OBJECT-LIST
 is the list of object types that we want to monitor.
 It may contain
-.BR dev ", " port ".
+.BR dev ", " port ", " trap ", " trap-group .
 
 .B devlink
 opens Devlink Netlink socket, listens on it and dumps state changes.
@@ -31,6 +31,7 @@ opens Devlink Netlink socket, listens on it and dumps state changes.
 .BR devlink-dev (8),
 .BR devlink-sb (8),
 .BR devlink-port (8),
+.BR devlink-trap (8),
 .br
 
 .SH AUTHOR
diff --git a/man/man8/devlink-trap.8 b/man/man8/devlink-trap.8
new file mode 100644
index 000000000000..4f079eb86d7b
--- /dev/null
+++ b/man/man8/devlink-trap.8
@@ -0,0 +1,138 @@
+.TH DEVLINK\-TRAP 8 "2 August 2019" "iproute2" "Linux"
+.SH NAME
+devlink-trap \- devlink trap configuration
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B devlink
+.RI "[ " OPTIONS " ]"
+.B trap
+.RI "{ " COMMAND " |"
+.BR help " }"
+.sp
+
+.ti -8
+.IR OPTIONS " := { "
+\fB\-v\fR[\fIerbose\fR] |
+\fB\-s\fR[\fItatistics\fR] }
+
+.ti -8
+.B "devlink trap show"
+.RI "[ " DEV
+.B trap
+.IR TRAP " ]"
+
+.ti -8
+.BI "devlink trap set " DEV " trap " TRAP
+.RB "[ " action " { " trap " | " drop " } ]"
+
+.ti -8
+.B "devlink trap group show"
+.RI "[ " DEV
+.B group
+.IR GROUP " ]"
+
+.ti -8
+.BI "devlink trap group set " DEV " group " GROUP
+.RB "[ " action " { " trap " | " drop " } ]"
+
+.ti -8
+.B devlink trap help
+
+.SH "DESCRIPTION"
+.SS devlink trap show - display available packet traps and their attributes
+
+.PP
+.I "DEV"
+- specifies the devlink device from which to show packet traps.
+If this argument is omitted all packet traps of all devices are listed.
+
+.PP
+.BI "trap " TRAP
+- specifies the packet trap.
+Only applicable if a devlink device is also specified.
+
+.SS devlink trap set - set attributes of a packet trap
+
+.PP
+.I "DEV"
+- specifies the devlink device the packet trap belongs to.
+
+.PP
+.BI "trap " TRAP
+- specifies the packet trap.
+
+.TP
+.BR action " { " trap " | " drop " } "
+packet trap action.
+
+.I trap
+- the sole copy of the packet is sent to the CPU.
+
+.I drop
+- the packet is dropped by the underlying device and a copy is not sent to the CPU.
+
+.SS devlink trap group show - display available packet trap groups and their attributes
+
+.PP
+.I "DEV"
+- specifies the devlink device from which to show packet trap groups.
+If this argument is omitted all packet trap groups of all devices are listed.
+
+.PP
+.BI "group " GROUP
+- specifies the packet trap group.
+Only applicable if a devlink device is also specified.
+
+.SS devlink trap group set - set attributes of a packet trap group
+
+.PP
+.I "DEV"
+- specifies the devlink device the packet trap group belongs to.
+
+.PP
+.BI "group " GROUP
+- specifies the packet trap group.
+
+.TP
+.BR action " { " trap " | " drop " } "
+packet trap action. The action is set for all the packet traps member in the
+trap group. The actions of non-drop traps cannot be changed and are thus
+skipped.
+
+.SH "EXAMPLES"
+.PP
+devlink trap show
+.RS 4
+List available packet traps.
+.RE
+.PP
+devlink trap group show
+.RS 4
+List available packet trap groups.
+.RE
+.PP
+devlink -vs trap show pci/0000:01:00.0 trap source_mac_is_multicast
+.RS 4
+Show attributes and statistics of a specific packet trap.
+.RE
+.PP
+devlink -s trap group show pci/0000:01:00.0 group l2_drops
+.RS 4
+Show attributes and statistics of a specific packet trap group.
+.RE
+.PP
+devlink trap set pci/0000:01:00.0 trap source_mac_is_multicast action trap
+.RS 4
+Set the action of a specific packet trap to 'trap'.
+
+.SH SEE ALSO
+.BR devlink (8),
+.BR devlink-dev (8),
+.BR devlink-monitor (8),
+.br
+
+.SH AUTHOR
+Ido Schimmel <idosch@mellanox.com>
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index 13d4dcd908b3..12d489440a3d 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -7,7 +7,7 @@ devlink \- Devlink tool
 .in +8
 .ti -8
 .B devlink
-.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region | health " } { " COMMAND " | "
+.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region | health | trap " } { " COMMAND " | "
 .BR help " }"
 .sp
 
@@ -51,6 +51,10 @@ When combined with -j generate a pretty JSON output.
 .BR "\-v" , " --verbose"
 Turn on verbose output.
 
+.TP
+.BR "\-s" , " --statistics"
+Output statistics.
+
 .SS
 .I OBJECT
 
@@ -82,6 +86,10 @@ Turn on verbose output.
 .B health
 - devlink reporting and recovery
 
+.TP
+.B trap
+- devlink trap configuration
+
 .SS
 .I COMMAND
 
@@ -114,6 +122,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR devlink-resource (8),
 .BR devlink-region (8),
 .BR devlink-health (8),
+.BR devlink-trap (8),
 .br
 
 .SH REPORTING BUGS
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH 14/16] include/linux: prefer __section from compiler_attributes.h
From: Will Deacon @ 2019-08-13  8:32 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: akpm, sedat.dilek, jpoimboe, yhs, miguel.ojeda.sandonis,
	clang-built-linux, Luc Van Oostenryck, Lai Jiangshan,
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra (Intel), Nicholas Piggin, Jiri Kosina,
	Ard Biesheuvel, Michael Ellerman, Masahiro Yamada,
	Hans Liljestrand, Elena Reshetova, David Windsor, Marc Zyngier,
	Ming Lei, Dou Liyang, Julien Thierry, Mauro Carvalho Chehab,
	Jens Axboe, linux-kernel, linux-sparse, rcu, netdev, bpf
In-Reply-To: <20190812215052.71840-14-ndesaulniers@google.com>

On Mon, Aug 12, 2019 at 02:50:47PM -0700, Nick Desaulniers wrote:
> Link: https://github.com/ClangBuiltLinux/linux/issues/619
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---

-ENOCOMMITMESSAGE

Otherwise, patch looks good to me.

Will

^ permalink raw reply

* Re: tun: mark small packets as owned by the tap sock
From: Jason Wang @ 2019-08-13  8:33 UTC (permalink / raw)
  To: Dave Jones, Alexis Bauvin; +Cc: netdev
In-Reply-To: <20190812221954.GA13314@codemonkey.org.uk>


On 2019/8/13 上午6:19, Dave Jones wrote:
> On Wed, Aug 07, 2019 at 12:30:07AM +0000, Linux Kernel wrote:
>   > Commit:     4b663366246be1d1d4b1b8b01245b2e88ad9e706
>   > Parent:     16b2084a8afa1432d14ba72b7c97d7908e178178
>   > Web:        https://git.kernel.org/torvalds/c/4b663366246be1d1d4b1b8b01245b2e88ad9e706
>   > Author:     Alexis Bauvin <abauvin@scaleway.com>
>   > AuthorDate: Tue Jul 23 16:23:01 2019 +0200
>   >
>   >     tun: mark small packets as owned by the tap sock
>   >
>   >     - v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
>   >
>   >     Small packets going out of a tap device go through an optimized code
>   >     path that uses build_skb() rather than sock_alloc_send_pskb(). The
>   >     latter calls skb_set_owner_w(), but the small packet code path does not.
>   >
>   >     The net effect is that small packets are not owned by the userland
>   >     application's socket (e.g. QEMU), while large packets are.
>   >     This can be seen with a TCP session, where packets are not owned when
>   >     the window size is small enough (around PAGE_SIZE), while they are once
>   >     the window grows (note that this requires the host to support virtio
>   >     tso for the guest to offload segmentation).
>   >     All this leads to inconsistent behaviour in the kernel, especially on
>   >     netfilter modules that uses sk->socket (e.g. xt_owner).
>   >
>   >     Fixes: 66ccbc9c87c2 ("tap: use build_skb() for small packet")
>   >     Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
>   >     Acked-by: Jason Wang <jasowang@redhat.com>
>
> This commit breaks ipv6 routing when I deployed on it a linode.
> It seems to work briefly after boot, and then silently all packets get
> dropped. (Presumably, it's dropping RA or ND packets)
>
> With this reverted, everything works as it did in rc3.
>
> 	Dave


Hi:

Two questions:

- Are you using XDP for TUN?

- Does it work before 66ccbc9c87c2? If yes, could you show us the result 
of net_dropmonitor?

Thanks


>

^ permalink raw reply

* Re: [PATCH iproute2-next v2 2/4] devlink: Add devlink trap set and show commands
From: Jiri Pirko @ 2019-08-13  8:44 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, dsahern, stephen, jiri, mlxsw, Ido Schimmel
In-Reply-To: <20190813083143.13509-3-idosch@idosch.org>

Tue, Aug 13, 2019 at 10:31:41AM CEST, idosch@idosch.org wrote:
>From: Ido Schimmel <idosch@mellanox.com>
>
>The trap set command allows the user to set the action of an individual
>trap. Example:
>
># devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap
>
>The trap show command allows the user to get the current status of an
>individual trap or a dump of all traps in case one is not specified.
>When '-s' is specified the trap's statistics are shown. When '-v' is
>specified the metadata types the trap can provide are shown. Example:
>
># devlink -jvps trap show netdevsim/netdevsim10 trap blackhole_route
>{
>    "trap": {
>        "netdevsim/netdevsim10": [ {
>                "name": "blackhole_route",
>                "type": "drop",
>                "generic": true,
>                "action": "trap",
>                "group": "l3_drops",
>                "metadata": [ "input_port" ],
>                "stats": {
>                    "rx": {
>                        "bytes": 0,
>                        "packets": 0
>                    }
>                }
>            } ]
>    }
>}
>
>Signed-off-by: Ido Schimmel <idosch@mellanox.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH v4 8/9] MIPS: SGI-IP27: fix readb/writeb addressing
From: Philippe Mathieu-Daudé @ 2019-08-13  8:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Andy Shevchenko
  Cc: Thomas Bogendoerfer, Ralf Baechle, Paul Burton, James Hogan,
	Dmitry Torokhov, Lee Jones, David S. Miller, Srinivas Kandagatla,
	Alessandro Zummo, Alexandre Belloni, Jiri Slaby, Evgeniy Polyakov,
	linux-mips, Linux Kernel Mailing List, linux-input, netdev,
	open list:REAL TIME CLOCK (RTC) SUBSYSTEM,
	open list:SERIAL DRIVERS
In-Reply-To: <20190811072907.GA1416@kroah.com>

Hi Thomas,

On 8/11/19 9:29 AM, Greg Kroah-Hartman wrote:
> On Sat, Aug 10, 2019 at 04:22:23PM +0300, Andy Shevchenko wrote:
>> On Fri, Aug 9, 2019 at 1:34 PM Thomas Bogendoerfer
>> <tbogendoerfer@suse.de> wrote:
>>>
>>> Our chosen byte swapping, which is what firmware already uses, is to
>>> do readl/writel by normal lw/sw intructions (data invariance). This
>>> also means we need to mangle addresses for u8 and u16 accesses. The
>>> mangling for 16bit has been done aready, but 8bit one was missing.
>>> Correcting this causes different addresses for accesses to the
>>> SuperIO and local bus of the IOC3 chip. This is fixed by changing
>>> byte order in ioc3 and m48rtc_rtc structs.
>>
>>>  /* serial port register map */
>>>  struct ioc3_serialregs {
>>> -       uint32_t        sscr;
>>> -       uint32_t        stpir;
>>> -       uint32_t        stcir;
>>> -       uint32_t        srpir;
>>> -       uint32_t        srcir;
>>> -       uint32_t        srtr;
>>> -       uint32_t        shadow;
>>> +       u32     sscr;
>>> +       u32     stpir;
>>> +       u32     stcir;
>>> +       u32     srpir;
>>> +       u32     srcir;
>>> +       u32     srtr;
>>> +       u32     shadow;
>>>  };
>>
>> Isn't it a churn? AFAIU kernel documentation the uint32_t is okay to
>> use, just be consistent inside one module / driver.
>> Am I mistaken?
> 
> No, but really it uint* shouldn't be used anywhere in the kernel source
> as it does not make sense.

If you respin your series, please send this cleanup as a separate patch.

Thanks,

Phil.

^ permalink raw reply

* Re: [PATCH 05/16] sh: prefer __section from compiler_attributes.h
From: Yoshinori Sato @ 2019-08-13  8:51 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: akpm, sedat.dilek, jpoimboe, yhs, miguel.ojeda.sandonis,
	clang-built-linux, Rich Felker, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, linux-sh,
	linux-kernel, netdev, bpf
In-Reply-To: <20190812215052.71840-5-ndesaulniers@google.com>

On Tue, 13 Aug 2019 06:50:38 +0900,
Nick Desaulniers wrote:
> 
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
>  arch/sh/include/asm/cache.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/sh/include/asm/cache.h b/arch/sh/include/asm/cache.h
> index 2408ac4873aa..07ddf31124a3 100644
> --- a/arch/sh/include/asm/cache.h
> +++ b/arch/sh/include/asm/cache.h
> @@ -15,7 +15,7 @@
>  
>  #define L1_CACHE_BYTES		(1 << L1_CACHE_SHIFT)
>  
> -#define __read_mostly __attribute__((__section__(".data..read_mostly")))
> +#define __read_mostly __section(.data..read_mostly)
>  
>  #ifndef __ASSEMBLY__
>  struct cache_info {
> -- 
> 2.23.0.rc1.153.gdeed80330f-goog
> 

Applied sh-next.
Thanks.

-- 
Yosinori Sato

^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-13  8:58 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Antoine Tenart, davem@davemloft.net, sd@queasysnail.net,
	andrew@lunn.ch, f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <e96fa4ae-1f2c-c1be-b2d8-060217d8e151@aquantia.com>

Hi Igor,

On Sat, Aug 10, 2019 at 01:20:32PM +0000, Igor Russkikh wrote:
> On 08.08.2019 17:05, Antoine Tenart wrote:
> 
> > The Rx and TX handlers are modified to take in account the special case
> > were the MACsec transformation happens in the hardware, whether in a PHY
> > or in a MAC, as the packets seen by the networking stack on both the
> 
> Don't you think we may eventually may need xmit / handle_frame ops to be
> a part of macsec_ops?
> 
> That way software macsec could be extract to just another type of offload.
> The drawback of current code is it doesn't show explicitly the path of
> offloaded packets. It is hidden in `handle_not_macsec` and in
> `macsec_start_xmit` branch. This makes incorrect counters to tick (see my below
> comment)
> 
> Another thing is that both xmit / macsec_handle_frame can't now be customized
> by device driver. But this may be required.
> We for example have usecases and HW features to allow specific flows to bypass
> macsec encryption. This is normally used for macsec key control protocols,
> identified by ethertype. Your phy is also capable on that as I see.

I think this question is linked to the use of a MACsec virtual interface
when using h/w offloading. The starting point for me was that I wanted
to reuse the data structures and the API exposed to the userspace by the
s/w implementation of MACsec. I then had two choices: keeping the exact
same interface for the user (having a virtual MACsec interface), or
registering the MACsec genl ops onto the real net devices (and making
the s/w implementation a virtual net dev and a provider of the MACsec
"offloading" ops).

The advantages of the first option were that nearly all the logic of the
s/w implementation could be kept and especially that it would be
transparent for the user to use both implementations of MACsec. But this
raised an issue as I had to modify the xmit / handle_frame ops to let
all the traffic pass. This is because we have no way of knowing if a
frame was handled by the MACsec h/w or not in ingress. So the virtual
interface here only serve as the entrypoint for the API...

The second option would have the advantage to better represent the actual
flow, but the way of configuring MACsec would be a bit different for the
user, whether he wants to use s/w or h/w MACsec. If we were to do this I
think we could extract the genl functions from the MACsec s/w
implementation, and let it implement the MACsec ops (exactly as the
offloading drivers).

I'm open to discussing this :)

As for the need for xmit / handle_frame ops (for a MAC w/ MACsec
offloading), I'd say the xmit / handle_frame ops of the real net device
driver could be used as the one of the MACsec virtual interface do not
do much (regardless of the implementation choice discussed above).

> > @@ -2546,11 +2814,15 @@ static netdev_tx_t macsec_start_xmit(struct sk_buff *skb,
> >  {
> >  	struct macsec_dev *macsec = netdev_priv(dev);
> >  	struct macsec_secy *secy = &macsec->secy;
> > +	struct macsec_tx_sc *tx_sc = &secy->tx_sc;
> >  	struct pcpu_secy_stats *secy_stats;
> > +	struct macsec_tx_sa *tx_sa;
> >  	int ret, len;
> >  
> > +	tx_sa = macsec_txsa_get(tx_sc->sa[tx_sc->encoding_sa]);
> 
> Declared, but not used?

I'll remove it then.

> >  	/* 10.5 */
> > -	if (!secy->protect_frames) {
> > +	if (!secy->protect_frames || macsec_get_ops(netdev_priv(dev), NULL)) {
> >  		secy_stats = this_cpu_ptr(macsec->stats);
> >  		u64_stats_update_begin(&secy_stats->syncp);
> >  		secy_stats->stats.OutPktsUntagged++;
> 
> Here you use same `if` for sw and hw flows, this making `OutPktsUntagged`
> counter invalid.

Right, I'll try to fix that.

> >  	struct macsec_dev *macsec = macsec_priv(dev);
> > -	struct net_device *real_dev;
> > +	struct net_device *real_dev, *loop_dev;
> > +	struct macsec_context ctx;
> > +	const struct macsec_ops *ops;
> > +	struct net *loop_net;
> 
> Reverse Christmas tree is normally a formatting requirement where possible.

Sure.

> > +	for_each_net(loop_net) {
> > +		for_each_netdev(loop_net, loop_dev) {
> > +			struct macsec_dev *priv;
> > +
> > +			if (!netif_is_macsec(loop_dev))
> > +				continue;
> > +
> > +			priv = macsec_priv(loop_dev);
> > +
> > +			/* A limitation of the MACsec h/w offloading is only a
> > +			 * single MACsec interface can be created for a given
> > +			 * real interface.
> > +			 */
> > +			if (macsec_get_ops(netdev_priv(dev), NULL) &&
> > +			    priv->real_dev == real_dev)
> > +				return -EBUSY;
> > +		}
> > +	}
> > +
> 
> There is no need to do this search loop if `macsec_get_ops(..) == NULL` ?
> So you can extract this check before `for_each_net` for SW macsec...

Right, I'll fix it!

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH v2 bpf-next] mm: mmap: increase sockets maximum memory size pgoff for 32bits
From: Ivan Khoronzhuk @ 2019-08-13  9:12 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Björn Töpel, linux-mm, Xdp, Network Development, bpf,
	linux-kernel, akpm, Alexei Starovoitov, Karlsson, Magnus
In-Reply-To: <CAJ8uoz0bBhdQSocQz8Y9tvrGCsCE9TDf3m1u6=sL4Eo5tZ17YQ@mail.gmail.com>

On Tue, Aug 13, 2019 at 10:02:54AM +0200, Magnus Karlsson wrote:
>On Mon, Aug 12, 2019 at 2:45 PM Ivan Khoronzhuk
><ivan.khoronzhuk@linaro.org> wrote:
>>
>> The AF_XDP sockets umem mapping interface uses XDP_UMEM_PGOFF_FILL_RING
>> and XDP_UMEM_PGOFF_COMPLETION_RING offsets. The offsets seems like are
>> established already and are part of configuration interface.
>>
>> But for 32-bit systems, while AF_XDP socket configuration, the values
>> are to large to pass maximum allowed file size verification.
>> The offsets can be tuned ofc, but instead of changing existent
>> interface - extend max allowed file size for sockets.
>
>Can you use mmap2() instead that takes a larger offset (2^44) even on
>32-bit systems?

That's for mmap2.

>
>/Magnus
>
>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>> ---
>>
>> Based on bpf-next/master
>>
>> v2..v1:
>>         removed not necessarily #ifdev as ULL and UL for 64 has same size
>>
>>  mm/mmap.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 7e8c3e8ae75f..578f52812361 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1358,6 +1358,9 @@ static inline u64 file_mmap_size_max(struct file *file, struct inode *inode)
>>         if (S_ISBLK(inode->i_mode))
>>                 return MAX_LFS_FILESIZE;
>>
>> +       if (S_ISSOCK(inode->i_mode))
>> +               return MAX_LFS_FILESIZE;
>> +
>>         /* Special "we do even unsigned file positions" case */
>>         if (file->f_mode & FMODE_UNSIGNED_OFFSET)
>>                 return 0;
>> --
>> 2.17.1
>>

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox