Netdev List
 help / color / mirror / Atom feed
* [RFC PATCH 04/12] soc: qcom: ipa: immediate commands
From: Alex Elder @ 2018-11-07  0:32 UTC (permalink / raw)
  To: davem, arnd, bjorn.andersson, ilias.apalodimas
  Cc: netdev, devicetree, linux-arm-msm, linux-soc, linux-arm-kernel,
	linux-kernel, syadagir, mjavid, robh+dt, mark.rutland
In-Reply-To: <20181107003250.5832-1-elder@linaro.org>

This patch contains (mostly) code implementing "immediate commands."
(The source files are still named "ipahal" for historical reasons.)

One channel (APPS CMD_PROD) is used for sending commands *to* the
IPA itself, rather than passing data through it.  These immediate
commands are issued to the IPA using the normal GSI queueing
mechanism.  And each command's completion is handled using the
normal GSI transfer completion mechanisms.

In addition to immediate commands, the "IPA HAL" includes code for
interpreting status packets that are supplied to the IPA on consumer
channels.

Signed-off-by: Alex Elder <elder@linaro.org>
---
 drivers/net/ipa/ipahal.c | 541 +++++++++++++++++++++++++++++++++++++++
 drivers/net/ipa/ipahal.h | 253 ++++++++++++++++++
 2 files changed, 794 insertions(+)
 create mode 100644 drivers/net/ipa/ipahal.c
 create mode 100644 drivers/net/ipa/ipahal.h

diff --git a/drivers/net/ipa/ipahal.c b/drivers/net/ipa/ipahal.c
new file mode 100644
index 000000000000..de00bcd54d4f
--- /dev/null
+++ b/drivers/net/ipa/ipahal.c
@@ -0,0 +1,541 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved.
+ * Copyright (C) 2018 Linaro Ltd.
+ */
+
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <asm/unaligned.h>
+
+#include "ipahal.h"
+#include "ipa_i.h"	/* ipa_err() */
+#include "ipa_dma.h"
+
+/**
+ * DOC:  IPA Immediate Commands
+ *
+ * The APPS_CMD_PROD channel is used to issue immediate commands to
+ * the IPA.  An immediate command is generally used to request the
+ * IPA do something other than data transfer.
+ *
+ * An immediate command is represented by a GSI transfer element.
+ * Each immediate command has a well-defined format, with a known
+ * length.  The transfer element's length field can therefore be
+ * used to hold a command's opcode.  The "payload" of an immediate
+ * command contains additional information required for the command.
+ * It resides in DRAM and is referred to using the DMA memory data
+ * pointer (the same one used to refer to the data in a "normal"
+ * transfer).
+ *
+ * Immediate commands are issued to the IPA through the APPS_CMD_PROD
+ * channel using the normal GSI queueing mechanism.  And each command's
+ * completion is handled using the normal GSI transfer completion
+ * mechanisms.
+ */
+
+/**
+ * struct ipahal_context - HAL global context data
+ * @empty_fltrt_tbl:	Empty table to be used for table initialization
+ */
+static struct ipahal_context {
+	struct ipa_dma_mem empty_fltrt_tbl;
+} ipahal_ctx_struct;
+static struct ipahal_context *ipahal_ctx = &ipahal_ctx_struct;
+
+/* enum ipa_pipeline_clear_option - Values for pipeline clear waiting options
+ * @IPAHAL_HPS_CLEAR: Wait for HPS clear. All queues except high priority queue
+ *  shall not be serviced until HPS is clear of packets or immediate commands.
+ *  The high priority Rx queue / Q6ZIP group shall still be serviced normally.
+ *
+ * @IPAHAL_SRC_GRP_CLEAR: Wait for originating source group to be clear
+ *  (for no packet contexts allocated to the originating source group).
+ *  The source group / Rx queue shall not be serviced until all previously
+ *  allocated packet contexts are released. All other source groups/queues shall
+ *  be serviced normally.
+ *
+ * @IPAHAL_FULL_PIPELINE_CLEAR: Wait for full pipeline to be clear.
+ *  All groups / Rx queues shall not be serviced until IPA pipeline is fully
+ *  clear. This should be used for debug only.
+ *
+ *  The values assigned to these are assumed by the REGISTER_WRITE
+ *  (struct ipa_imm_cmd_hw_register_write) and the DMA_SHARED_MEM
+ *  (struct ipa_imm_cmd_hw_dma_shared_mem) immediate commands for
+ *  IPA version 3 hardware.  They are also used to modify the opcode
+ *  used to implement these commands for IPA version 4 hardware.
+ */
+enum ipahal_pipeline_clear_option {
+	IPAHAL_HPS_CLEAR		= 0,
+	IPAHAL_SRC_GRP_CLEAR		= 1,
+	IPAHAL_FULL_PIPELINE_CLEAR	= 2,
+};
+
+/* Immediate commands H/W structures */
+
+/* struct ipa_imm_cmd_hw_ip_fltrt_init - IP_V*_FILTER_INIT/IP_V*_ROUTING_INIT
+ * command payload in H/W format.
+ * Inits IPv4/v6 routing or filter block.
+ * @hash_rules_addr: Addr in system mem where hashable flt/rt rules starts
+ * @hash_rules_size: Size in bytes of the hashable tbl to cpy to local mem
+ * @hash_local_addr: Addr in shared mem where hashable flt/rt tbl should
+ *  be copied to
+ * @nhash_rules_size: Size in bytes of the non-hashable tbl to cpy to local mem
+ * @nhash_local_addr: Addr in shared mem where non-hashable flt/rt tbl should
+ *  be copied to
+ * @rsvd: reserved
+ * @nhash_rules_addr: Addr in sys mem where non-hashable flt/rt tbl starts
+ */
+struct ipa_imm_cmd_hw_ip_fltrt_init {
+	u64 hash_rules_addr;
+	u64 hash_rules_size	: 12,
+	    hash_local_addr	: 16,
+	    nhash_rules_size	: 12,
+	    nhash_local_addr	: 16,
+	    rsvd		: 8;
+	u64 nhash_rules_addr;
+};
+
+/* struct ipa_imm_cmd_hw_hdr_init_local - HDR_INIT_LOCAL command payload
+ *  in H/W format.
+ * Inits hdr table within local mem with the hdrs and their length.
+ * @hdr_table_addr: Word address in sys mem where the table starts (SRC)
+ * @size_hdr_table: Size of the above (in bytes)
+ * @hdr_addr: header address in IPA sram (used as DST for memory copy)
+ * @rsvd: reserved
+ */
+struct ipa_imm_cmd_hw_hdr_init_local {
+	u64 hdr_table_addr;
+	u32 size_hdr_table	: 12,
+	    hdr_addr		: 16,
+	    rsvd		: 4;
+};
+
+/* struct ipa_imm_cmd_hw_dma_shared_mem - DMA_SHARED_MEM command payload
+ *  in H/W format.
+ * Perform mem copy into or out of the SW area of IPA local mem
+ * @sw_rsvd: Ignored by H/W. My be used by S/W
+ * @size: Size in bytes of data to copy. Expected size is up to 2K bytes
+ * @local_addr: Address in IPA local memory
+ * @direction: Read or write?
+ *	0: IPA write, Write to local address from system address
+ *	1: IPA read, Read from local address to system address
+ * @skip_pipeline_clear: 0 to wait until IPA pipeline is clear. 1 don't wait
+ * @pipeline_clear_options: options for pipeline to clear
+ *	0: HPS - no pkt inside HPS (not grp specific)
+ *	1: source group - The immediate cmd src grp does npt use any pkt ctxs
+ *	2: Wait until no pkt reside inside IPA pipeline
+ *	3: reserved
+ * @rsvd: reserved - should be set to zero
+ * @system_addr: Address in system memory
+ */
+struct ipa_imm_cmd_hw_dma_shared_mem {
+	u16 sw_rsvd;
+	u16 size;
+	u16 local_addr;
+	u16 direction			: 1,
+	    skip_pipeline_clear		: 1,
+	    pipeline_clear_options	: 2,
+	    rsvd			: 12;
+	u64 system_addr;
+};
+
+/* struct ipa_imm_cmd_hw_dma_task_32b_addr -
+ *	IPA_DMA_TASK_32B_ADDR command payload in H/W format.
+ * Used by clients using 32bit addresses. Used to perform DMA operation on
+ *  multiple descriptors.
+ *  The Opcode is dynamic, where it holds the number of buffer to process
+ * @sw_rsvd: Ignored by H/W. My be used by S/W
+ * @cmplt: Complete flag: When asserted IPA will interrupt SW when the entire
+ *  DMA related data was completely xfered to its destination.
+ * @eof: Enf Of Frame flag: When asserted IPA will assert the EOT to the
+ *  dest client. This is used used for aggr sequence
+ * @flsh: Flush flag: When asserted, pkt will go through the IPA blocks but
+ *  will not be xfered to dest client but rather will be discarded
+ * @lock: Lock endpoint flag: When asserted, IPA will stop processing
+ *  descriptors from other EPs in the same src grp (RX queue)
+ * @unlock: Unlock endpoint flag: When asserted, IPA will stop exclusively
+ *  servicing current EP out of the src EPs of the grp (RX queue)
+ * @size1: Size of buffer1 data
+ * @addr1: Pointer to buffer1 data
+ * @packet_size: Total packet size. If a pkt send using multiple DMA_TASKs,
+ *  only the first one needs to have this field set. It will be ignored
+ *  in subsequent DMA_TASKs until the packet ends (EOT). First DMA_TASK
+ *  must contain this field (2 or more buffers) or EOT.
+ */
+struct ipa_imm_cmd_hw_dma_task_32b_addr {
+	u16 sw_rsvd	: 11,
+	    cmplt	: 1,
+	    eof		: 1,
+	    flsh	: 1,
+	    lock	: 1,
+	    unlock	: 1;
+	u16 size1;
+	u32 addr1;
+	u16 packet_size;
+	u16 rsvd1;
+	u32 rsvd2;
+};
+
+/* IPA Status packet H/W structures and info */
+
+/* struct ipa_status_pkt_hw - IPA status packet payload in H/W format.
+ *  This structure describes the status packet H/W structure for the
+ *   following statuses: IPA_STATUS_PACKET, IPA_STATUS_DROPPED_PACKET,
+ *   IPA_STATUS_SUSPENDED_PACKET.
+ *  Other statuses types has different status packet structure.
+ * @status_opcode: The Type of the status (Opcode).
+ * @exception: (not bitmask) - the first exception that took place.
+ *  In case of exception, src endp and pkt len are always valid.
+ * @status_mask: Bit mask specifying on which H/W blocks the pkt was processed.
+ * @pkt_len: Pkt payload len including hdr, include retained hdr if used. Does
+ *  not include padding or checksum trailer len.
+ * @endp_src_idx: Source end point index.
+ * @rsvd1: reserved
+ * @endp_dest_idx: Destination end point index.
+ *  Not valid in case of exception
+ * @rsvd2: reserved
+ * @metadata: meta data value used by packet
+ * @flt_local: Filter table location flag: Does matching flt rule belongs to
+ *  flt tbl that resides in lcl memory? (if not, then system mem)
+ * @flt_hash: Filter hash hit flag: Does matching flt rule was in hash tbl?
+ * @flt_global: Global filter rule flag: Does matching flt rule belongs to
+ *  the global flt tbl? (if not, then the per endp tables)
+ * @flt_ret_hdr: Retain header in filter rule flag: Does matching flt rule
+ *  specifies to retain header?
+ * @flt_rule_id: The ID of the matching filter rule. This info can be combined
+ *  with endp_src_idx to locate the exact rule. ID=0x3ff reserved to specify
+ *  flt miss. In case of miss, all flt info to be ignored
+ * @rt_local: Route table location flag: Does matching rt rule belongs to
+ *  rt tbl that resides in lcl memory? (if not, then system mem)
+ * @rt_hash: Route hash hit flag: Does matching rt rule was in hash tbl?
+ * @ucp: UC Processing flag.
+ * @rt_tbl_idx: Index of rt tbl that contains the rule on which was a match
+ * @rt_rule_id: The ID of the matching rt rule. This info can be combined
+ *  with rt_tbl_idx to locate the exact rule. ID=0x3ff reserved to specify
+ *  rt miss. In case of miss, all rt info to be ignored
+ * @nat_hit: NAT hit flag: Was their NAT hit?
+ * @nat_entry_idx: Index of the NAT entry used of NAT processing
+ * @nat_type: Defines the type of the NAT operation (ignored for now)
+ * @tag_info: S/W defined value provided via immediate command
+ * @seq_num: Per source endp unique packet sequence number
+ * @time_of_day_ctr: running counter from IPA clock
+ * @hdr_local: Header table location flag: In header insertion, was the header
+ *  taken from the table resides in local memory? (If no, then system mem)
+ * @hdr_offset: Offset of used header in the header table
+ * @frag_hit: Frag hit flag: Was their frag rule hit in H/W frag table?
+ * @frag_rule: Frag rule index in H/W frag table in case of frag hit
+ * @hw_specific: H/W specific reserved value
+ */
+#define IPA_RULE_ID_BITS	10	/* See ipahal_is_rule_miss_id() */
+struct ipa_pkt_status_hw {
+	u8  status_opcode;
+	u8  exception;
+	u16 status_mask;
+	u16 pkt_len;
+	u8  endp_src_idx	: 5,
+	    rsvd1		: 3;
+	u8  endp_dest_idx	: 5,
+	    rsvd2		: 3;
+	u32 metadata;
+	u16 flt_local		: 1,
+	    flt_hash		: 1,
+	    flt_global		: 1,
+	    flt_ret_hdr		: 1,
+	    flt_rule_id		: IPA_RULE_ID_BITS,
+	    rt_local		: 1,
+	    rt_hash		: 1;
+	u16 ucp			: 1,
+	    rt_tbl_idx		: 5,
+	    rt_rule_id		: IPA_RULE_ID_BITS;
+	u64 nat_hit		: 1,
+	    nat_entry_idx	: 13,
+	    nat_type		: 2,
+	    tag_info		: 48;
+	u32 seq_num		: 8,
+	    time_of_day_ctr	: 24;
+	u16 hdr_local		: 1,
+	    hdr_offset		: 10,
+	    frag_hit		: 1,
+	    frag_rule		: 4;
+	u16 hw_specific;
+};
+
+void *ipahal_dma_shared_mem_write_pyld(struct ipa_dma_mem *mem, u32 offset)
+{
+	struct ipa_imm_cmd_hw_dma_shared_mem *data;
+
+	ipa_assert(mem->size < 1 << 16);	/* size is 16 bits wide */
+	ipa_assert(offset < 1 << 16);		/* local_addr is 16 bits wide */
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return NULL;
+
+	data->size = mem->size;
+	data->local_addr = offset;
+	data->direction = 0;	/* 0 = write to IPA; 1 = read from IPA */
+	data->skip_pipeline_clear = 0;
+	data->pipeline_clear_options = IPAHAL_HPS_CLEAR;
+	data->system_addr = mem->phys;
+
+	return data;
+}
+
+void *ipahal_hdr_init_local_pyld(struct ipa_dma_mem *mem, u32 offset)
+{
+	struct ipa_imm_cmd_hw_hdr_init_local *data;
+
+	ipa_assert(mem->size < 1 << 12);  /* size_hdr_table is 12 bits wide */
+	ipa_assert(offset < 1 << 16);		/* hdr_addr is 16 bits wide */
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return NULL;
+
+	data->hdr_table_addr = mem->phys;
+	data->size_hdr_table = mem->size;
+	data->hdr_addr = offset;
+
+	return data;
+}
+
+static void *fltrt_init_common(struct ipa_dma_mem *mem, u32 hash_offset,
+			       u32 nhash_offset)
+{
+	struct ipa_imm_cmd_hw_ip_fltrt_init *data;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return NULL;
+
+	data->hash_rules_addr = (u64)mem->phys;
+	data->hash_rules_size = (u32)mem->size;
+	data->hash_local_addr = hash_offset;
+	data->nhash_rules_addr = (u64)mem->phys;
+	data->nhash_rules_size = (u32)mem->size;
+	data->nhash_local_addr = nhash_offset;
+
+	return data;
+}
+
+void *ipahal_ip_v4_routing_init_pyld(struct ipa_dma_mem *mem, u32 hash_offset,
+			       u32 nhash_offset)
+{
+	return fltrt_init_common(mem, hash_offset, nhash_offset);
+}
+
+void *ipahal_ip_v6_routing_init_pyld(struct ipa_dma_mem *mem, u32 hash_offset,
+				     u32 nhash_offset)
+{
+	return fltrt_init_common(mem, hash_offset, nhash_offset);
+}
+
+void *ipahal_ip_v4_filter_init_pyld(struct ipa_dma_mem *mem, u32 hash_offset,
+				    u32 nhash_offset)
+{
+	return fltrt_init_common(mem, hash_offset, nhash_offset);
+}
+
+void *ipahal_ip_v6_filter_init_pyld(struct ipa_dma_mem *mem, u32 hash_offset,
+				    u32 nhash_offset)
+{
+	return fltrt_init_common(mem, hash_offset, nhash_offset);
+}
+
+void *ipahal_dma_task_32b_addr_pyld(struct ipa_dma_mem *mem)
+{
+	struct ipa_imm_cmd_hw_dma_task_32b_addr *data;
+
+	/* size1 and packet_size are both 16 bits wide */
+	ipa_assert(mem->size < 1 << 16);
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return NULL;
+
+	data->cmplt = 0;
+	data->eof = 0;
+	data->flsh = 1;
+	data->lock = 0;
+	data->unlock = 0;
+	data->size1 = mem->size;
+	data->addr1 = mem->phys;
+	data->packet_size = mem->size;
+
+	return data;
+}
+
+void ipahal_payload_free(void *payload)
+{
+	kfree(payload);
+}
+
+/* IPA Packet Status Logic */
+
+/* Maps an exception type returned in a ipa_pkt_status_hw structure
+ * to the ipahal_pkt_status_exception value that represents it in
+ * the exception field of a ipahal_pkt_status structure.  Returns
+ * IPAHAL_PKT_STATUS_EXCEPTION_MAX for an unrecognized value.
+ */
+static enum ipahal_pkt_status_exception
+exception_map(u8 exception, bool is_ipv6)
+{
+	switch (exception) {
+	case 0x00:	return IPAHAL_PKT_STATUS_EXCEPTION_NONE;
+	case 0x01:	return IPAHAL_PKT_STATUS_EXCEPTION_DEAGGR;
+	case 0x04:	return IPAHAL_PKT_STATUS_EXCEPTION_IPTYPE;
+	case 0x08:	return IPAHAL_PKT_STATUS_EXCEPTION_PACKET_LENGTH;
+	case 0x10:	return IPAHAL_PKT_STATUS_EXCEPTION_FRAG_RULE_MISS;
+	case 0x20:	return IPAHAL_PKT_STATUS_EXCEPTION_SW_FILT;
+	case 0x40:	return is_ipv6 ? IPAHAL_PKT_STATUS_EXCEPTION_IPV6CT
+				       : IPAHAL_PKT_STATUS_EXCEPTION_NAT;
+	default:	return IPAHAL_PKT_STATUS_EXCEPTION_MAX;
+	}
+}
+
+/* ipahal_pkt_status_get_size() - Get H/W size of packet status */
+u32 ipahal_pkt_status_get_size(void)
+{
+	return sizeof(struct ipa_pkt_status_hw);
+}
+
+/* ipahal_pkt_status_parse() - Parse Packet Status payload to abstracted form
+ * @unparsed_status: Pointer to H/W format of the packet status as read from H/W
+ * @status: Pointer to pre-allocated buffer where the parsed info will be stored
+ */
+void ipahal_pkt_status_parse(const void *unparsed_status,
+			     struct ipahal_pkt_status *status)
+{
+	const struct ipa_pkt_status_hw *hw_status = unparsed_status;
+	bool is_ipv6;
+
+	status->status_opcode =
+			(enum ipahal_pkt_status_opcode)hw_status->status_opcode;
+	is_ipv6 = hw_status->status_mask & BIT(7) ? false : true;
+	/* If hardware status values change we may have to re-map this */
+	status->status_mask =
+			(enum ipahal_pkt_status_mask)hw_status->status_mask;
+	status->exception = exception_map(hw_status->exception, is_ipv6);
+	status->pkt_len = hw_status->pkt_len;
+	status->endp_src_idx = hw_status->endp_src_idx;
+	status->endp_dest_idx = hw_status->endp_dest_idx;
+	status->metadata = hw_status->metadata;
+	status->rt_miss = ipahal_is_rule_miss_id(hw_status->rt_rule_id);
+}
+
+int ipahal_init(void)
+{
+	struct ipa_dma_mem *mem = &ipahal_ctx->empty_fltrt_tbl;
+
+	/* Set up an empty filter/route table entry in system
+	 * memory.  This will be used, for example, to delete a
+	 * route safely.
+	 */
+	if (ipa_dma_alloc(mem, IPA_HW_TBL_WIDTH, GFP_KERNEL)) {
+		ipa_err("error allocating empty filter/route table\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+void ipahal_exit(void)
+{
+	ipa_dma_free(&ipahal_ctx->empty_fltrt_tbl);
+}
+
+/* Does the given rule ID represent a routing or filter rule miss?
+ * A rule miss is indicated as an all-1's value in the rt_rule_id
+ * or flt_rule_id field of the ipahal_pkt_status structure.
+ */
+bool ipahal_is_rule_miss_id(u32 id)
+{
+	BUILD_BUG_ON(IPA_RULE_ID_BITS < 2);
+
+	return id == (1U << IPA_RULE_ID_BITS) - 1;
+}
+
+/**
+ * ipahal_rt_generate_empty_img() - Generate empty route table header
+ * @route_count:	Number of table entries
+ * @mem:		DMA memory object representing the header structure
+ *
+ * Allocates and fills an "empty" route table header having the given
+ * number of entries.  Each entry in the table contains the DMA address
+ * of a routing entry.
+ *
+ * This function initializes all entries to point at the preallocated
+ * empty routing entry in system RAM.
+ *
+ * Return:	0 if successful, or a negative error code otherwise
+ */
+int ipahal_rt_generate_empty_img(u32 route_count, struct ipa_dma_mem *mem)
+{
+	u64 addr;
+	int i;
+
+	BUILD_BUG_ON(!IPA_HW_TBL_HDR_WIDTH);
+
+	if (ipa_dma_alloc(mem, route_count * IPA_HW_TBL_HDR_WIDTH, GFP_KERNEL))
+		return -ENOMEM;
+
+	addr = (u64)ipahal_ctx->empty_fltrt_tbl.phys;
+	for (i = 0; i < route_count; i++)
+		put_unaligned(addr, mem->virt + i * IPA_HW_TBL_HDR_WIDTH);
+
+	return 0;
+}
+
+/**
+ * ipahal_flt_generate_empty_img() - Generate empty filter table header
+ * @filter_bitmap:	Bitmap representing which endpoints support filtering
+ * @mem:		DMA memory object representing the header structure
+ *
+ * Allocates and fills an "empty" filter table header based on the
+ * given filter bitmap.
+ *
+ * The first slot in a filter table header is a 64-bit bitmap whose
+ * set bits define which endpoints support filtering.  Following
+ * this, each set bit in the mask has the DMA address of the filter
+ * used for the corresponding endpoint.
+ *
+ * This function initializes all endpoints that support filtering to
+ * point at the preallocated empty filter in system RAM.
+ *
+ * Note:  the (software) bitmap here uses bit 0 to represent
+ * endpoint 0, bit 1 for endpoint 1, and so on.  This is different
+ * from the hardware (which uses bit 1 to represent filter 0, etc.).
+ *
+ * Return:	0 if successful, or a negative error code
+ */
+int ipahal_flt_generate_empty_img(u64 filter_bitmap, struct ipa_dma_mem *mem)
+{
+	u32 filter_count = hweight32(filter_bitmap) + 1;
+	u64 addr;
+	int i;
+
+	ipa_assert(filter_bitmap);
+
+	if (ipa_dma_alloc(mem, filter_count * IPA_HW_TBL_HDR_WIDTH, GFP_KERNEL))
+		return -ENOMEM;
+
+	/* Save the endpoint bitmap in the first slot of the table.
+	 * Convert it from software to hardware representation by
+	 * shifting it left one position.
+	 * XXX Does bit position 0 represent global?  At IPA3, global
+	 * XXX configuration is possible but not used.
+	 */
+	put_unaligned(filter_bitmap << 1, mem->virt);
+
+	/* Point every entry in the table at the empty filter */
+	addr = (u64)ipahal_ctx->empty_fltrt_tbl.phys;
+	for (i = 1; i < filter_count; i++)
+		put_unaligned(addr, mem->virt + i * IPA_HW_TBL_HDR_WIDTH);
+
+	return 0;
+}
+
+void ipahal_free_empty_img(struct ipa_dma_mem *mem)
+{
+	ipa_dma_free(mem);
+}
diff --git a/drivers/net/ipa/ipahal.h b/drivers/net/ipa/ipahal.h
new file mode 100644
index 000000000000..940254940d90
--- /dev/null
+++ b/drivers/net/ipa/ipahal.h
@@ -0,0 +1,253 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved.
+ * Copyright (C) 2018 Linaro Ltd.
+ */
+#ifndef _IPAHAL_H_
+#define _IPAHAL_H_
+
+#include <linux/types.h>
+
+#include "ipa_dma.h"
+
+/* The IPA implements offloaded packet filtering and routing
+ * capabilities.  This is managed by programming IPA-resident
+ * tables of rules that define the processing that should be
+ * performed by the IPA and the conditions under which they
+ * should be applied.  Aspects of these rules are constrained
+ * by things like table entry sizes and alignment requirements;
+ * all of these are in units of bytes.  These definitions are
+ * subject to some constraints:
+ * - IPA_HW_TBL_WIDTH must be non-zero
+ * - IPA_HW_TBL_SYSADDR_ALIGN must be a non-zero power of 2
+ * - IPA_HW_TBL_HDR_WIDTH must be non-zero
+ *
+ * Values could differ for different versions of IPA hardware.
+ * These values are for v3.5.1, found in the SDM845.
+ */
+#define IPA_HW_TBL_WIDTH		8
+#define IPA_HW_TBL_SYSADDR_ALIGN	128
+#define IPA_HW_TBL_HDR_WIDTH		8
+
+/**
+ * ipahal_dma_shared_mem_write_pyld() - Write to shared memory command payload
+ *
+ * Return a pointer to the payload for a DMA shared memory write immediate
+ * command, or null if one can't be allocated.  Result is dynamically
+ * allocated, and caller must ensure it gets released by providing it to
+ * ipahal_destroy_imm_cmd() when it is no longer needed.
+ *
+ * Return:	 Pointer to the immediate command payload, or NULL
+ */
+void *ipahal_dma_shared_mem_write_pyld(struct ipa_dma_mem *mem, u32 offset);
+
+/**
+ * ipahal_hdr_init_local_pyld() - Header initialization command payload
+ * mem:		DMA buffer containing data for initialization
+ * offset:	Where in location IPA local memory to write
+ *
+ * Return a pointer to the payload for a header init local immediate
+ * command, or null if one can't be allocated.  Caller must ensure result
+ * gets released by providing it to ipahal_destroy_imm_cmd().
+ *
+ * Return:	 Pointer to the immediate command payload, or NULL
+ */
+void *ipahal_hdr_init_local_pyld(struct ipa_dma_mem *mem, u32 offset);
+
+/**
+ * ipahal_ip_v4_routing_init_pyld() - IPv4 routing table initialization payload
+ * mem:		The IPv4 routing table data to be written
+ * hash_offset:	The location in IPA memory for a hashed routing table
+ * nhash_offset: The location in IPA memory for a non-hashed routing table
+ *
+ * Return a pointer to the payload for an IPv4 routing init immediate
+ * command, or null if one can't be allocated.  Caller must ensure result
+ * gets released by providing it to ipahal_destroy_imm_cmd().
+ *
+ * Return:	 Pointer to the immediate command payload, or NULL
+ */
+void *ipahal_ip_v4_routing_init_pyld(struct ipa_dma_mem *mem,
+				     u32 hash_offset, u32 nhash_offset);
+
+/**
+ * ipahal_ip_v6_routing_init_pyld() - IPv6 routing table initialization payload
+ * mem:		The IPv6 routing table data to be written
+ * hash_offset:	The location in IPA memory for a hashed routing table
+ * nhash_offset: The location in IPA memory for a non-hashed routing table
+ *
+ * Return a pointer to the payload for an IPv4 routing init immediate
+ * command, or null if one can't be allocated.  Caller must ensure result
+ * gets released by providing it to ipahal_destroy_imm_cmd().
+ *
+ * Return:	 Pointer to the immediate command payload, or NULL
+ */
+void *ipahal_ip_v6_routing_init_pyld(struct ipa_dma_mem *mem,
+				     u32 hash_offset, u32 nhash_offset);
+
+/**
+ * ipahal_ip_v4_filter_init_pyld() - IPv4 filter table initialization payload
+ * mem:		The IPv4 filter table data to be written
+ * hash_offset:	The location in IPA memory for a hashed filter table
+ * nhash_offset: The location in IPA memory for a non-hashed filter table
+ *
+ * Return a pointer to the payload for an IPv4 filter init immediate
+ * command, or null if one can't be allocated.  Caller must ensure result
+ * gets released by providing it to ipahal_destroy_imm_cmd().
+ *
+ * Return:	 Pointer to the immediate command payload, or NULL
+ */
+void *ipahal_ip_v4_filter_init_pyld(struct ipa_dma_mem *mem,
+				    u32 hash_offset, u32 nhash_offset);
+
+/**
+ * ipahal_ip_v6_filter_init_pyld() - IPv6 filter table initialization payload
+ * mem:		The IPv6 filter table data to be written
+ * hash_offset:	The location in IPA memory for a hashed filter table
+ * nhash_offset: The location in IPA memory for a non-hashed filter table
+ *
+ * Return a pointer to the payload for an IPv4 filter init immediate
+ * command, or null if one can't be allocated.  Caller must ensure result
+ * gets released by providing it to ipahal_destroy_imm_cmd().
+ *
+ * Return:	 Pointer to the immediate command payload, or NULL
+ */
+void *ipahal_ip_v6_filter_init_pyld(struct ipa_dma_mem *mem,
+				    u32 hash_offset, u32 nhash_offset);
+
+/**
+ * ipahal_dma_task_32b_addr_pyld() - 32-bit DMA task command payload
+ * mem:		DMA memory involved in the task
+ *
+ * Return a pointer to the payload for DMA task 32-bit address immediate
+ * command, or null if one can't be allocated.  Caller must ensure result
+ * gets released by providing it to ipahal_destroy_imm_cmd().
+ */
+void *ipahal_dma_task_32b_addr_pyld(struct ipa_dma_mem *mem);
+
+/**
+ * ipahal_payload_free() - Release an allocated immediate command payload
+ * @payload:	Payload to be released
+ */
+void ipahal_payload_free(void *payload);
+
+/**
+ * enum ipahal_pkt_status_opcode - Packet Status Opcode
+ * @IPAHAL_STATUS_OPCODE_PACKET_2ND_PASS: Packet Status generated as part of
+ *  IPA second processing pass for a packet (i.e. IPA XLAT processing for
+ *  the translated packet).
+ *
+ *  The values assigned here are assumed by ipa_pkt_status_parse()
+ *  to match values returned in the status_opcode field of a
+ *  ipa_pkt_status_hw structure inserted by the IPA in received
+ *  buffer.
+ */
+enum ipahal_pkt_status_opcode {
+	IPAHAL_PKT_STATUS_OPCODE_PACKET			= 0x01,
+	IPAHAL_PKT_STATUS_OPCODE_NEW_FRAG_RULE		= 0x02,
+	IPAHAL_PKT_STATUS_OPCODE_DROPPED_PACKET		= 0x04,
+	IPAHAL_PKT_STATUS_OPCODE_SUSPENDED_PACKET	= 0x08,
+	IPAHAL_PKT_STATUS_OPCODE_LOG			= 0x10,
+	IPAHAL_PKT_STATUS_OPCODE_DCMP			= 0x20,
+	IPAHAL_PKT_STATUS_OPCODE_PACKET_2ND_PASS	= 0x40,
+};
+
+/**
+ * enum ipahal_pkt_status_exception - Packet Status exception type
+ * @IPAHAL_PKT_STATUS_EXCEPTION_PACKET_LENGTH: formerly IHL exception.
+ *
+ * Note: IPTYPE, PACKET_LENGTH and PACKET_THRESHOLD exceptions means that
+ *  partial / no IP processing took place and corresponding Status Mask
+ *  fields should be ignored. Flt and rt info is not valid.
+ *
+ * NOTE:: Any change to this enum, need to change to
+ *	ipahal_pkt_status_exception_to_str array as well.
+ */
+enum ipahal_pkt_status_exception {
+	IPAHAL_PKT_STATUS_EXCEPTION_NONE = 0,
+	IPAHAL_PKT_STATUS_EXCEPTION_DEAGGR,
+	IPAHAL_PKT_STATUS_EXCEPTION_IPTYPE,
+	IPAHAL_PKT_STATUS_EXCEPTION_PACKET_LENGTH,
+	IPAHAL_PKT_STATUS_EXCEPTION_PACKET_THRESHOLD,
+	IPAHAL_PKT_STATUS_EXCEPTION_FRAG_RULE_MISS,
+	IPAHAL_PKT_STATUS_EXCEPTION_SW_FILT,
+	/* NAT and IPv6CT have the same value at HW.
+	 * NAT for IPv4 and IPv6CT for IPv6 exceptions
+	 */
+	IPAHAL_PKT_STATUS_EXCEPTION_NAT,
+	IPAHAL_PKT_STATUS_EXCEPTION_IPV6CT,
+	IPAHAL_PKT_STATUS_EXCEPTION_MAX,
+};
+
+/**
+ * enum ipahal_pkt_status_mask - Packet Status bitmask values of
+ *  the contained flags. This bitmask indicates flags on the properties of
+ *  the packet as well as IPA processing it may had.
+ * @TAG_VALID: Flag specifying if TAG and TAG info valid?
+ * @CKSUM_PROCESS: CSUM block processing flag: Was pkt processed by csum block?
+ *  If so, csum trailer exists
+ */
+enum ipahal_pkt_status_mask {
+	/* Other values are defined but are not specifically handled yet. */
+	IPAHAL_PKT_STATUS_MASK_CKSUM_PROCESS	= 0x0100,
+};
+
+/**
+ * struct ipahal_pkt_status - IPA status packet abstracted payload.
+ * @status_opcode: The type of status (Opcode).
+ * @exception: The first exception that took place.
+ *  In case of exception, endp_src_idx and pkt_len are always valid.
+ * @status_mask: Bit mask for flags on several properties on the packet
+ *  and processing it may passed at IPA.
+ * @pkt_len: Pkt pyld len including hdr and retained hdr if used. Does
+ *  not include padding or checksum trailer len.
+ * @endp_src_idx: Source end point index.
+ * @endp_dest_idx: Destination end point index.
+ *  Not valid in case of exception
+ * @metadata: meta data value used by packet
+ * @rt_miss: Routing miss flag: Was their a routing rule miss?
+ *
+ * This structure describes the status packet fields for the following
+ * status values: IPA_STATUS_PACKET, IPA_STATUS_DROPPED_PACKET,
+ * IPA_STATUS_SUSPENDED_PACKET.  Other status types have different status
+ * packet structure.  Note that the hardware supplies additional status
+ * information that is currently unused.
+ */
+struct ipahal_pkt_status {
+	enum ipahal_pkt_status_opcode status_opcode;
+	enum ipahal_pkt_status_exception exception;
+	enum ipahal_pkt_status_mask status_mask;
+	u32 pkt_len;
+	u8 endp_src_idx;
+	u8 endp_dest_idx;
+	u32 metadata;
+	bool rt_miss;
+};
+
+/**
+ * ipahal_pkt_status_get_size() - Get size of a hardware packet status
+ */
+u32 ipahal_pkt_status_get_size(void);
+
+/* ipahal_pkt_status_parse() - Parse packet status payload
+ * @unparsed_status:	Packet status read from hardware
+ * @status:		Buffer to hold parsed status information
+ */
+void ipahal_pkt_status_parse(const void *unparsed_status,
+			     struct ipahal_pkt_status *status);
+
+int ipahal_init(void);
+void ipahal_exit(void);
+
+/* Does the given ID represent rule miss? */
+bool ipahal_is_rule_miss_id(u32 id);
+
+int ipahal_rt_generate_empty_img(u32 route_count, struct ipa_dma_mem *mem);
+int ipahal_flt_generate_empty_img(u64 ep_bitmap, struct ipa_dma_mem *mem);
+
+/**
+ * ipahal_free_empty_img() - Free empty filter or route image
+ * @mem:	DMA memory containing filter/route data
+ */
+void ipahal_free_empty_img(struct ipa_dma_mem *mem);
+
+#endif /* _IPAHAL_H_ */
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH v2 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free()
From: Tariq Toukan @ 2018-11-07  9:59 UTC (permalink / raw)
  To: Aaron Lu, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
  Cc: Andrew Morton, Paweł Staszewski, Jesper Dangaard Brouer,
	Eric Dumazet, Tariq Toukan, Ilias Apalodimas, Yoel Caspersen,
	Mel Gorman, Saeed Mahameed, Michal Hocko, Vlastimil Babka,
	Dave Hansen, Alexander Duyck
In-Reply-To: <20181106052833.GC6203@intel.com>



On 06/11/2018 7:28 AM, Aaron Lu wrote:
> page_frag_free() calls __free_pages_ok() to free the page back to
> Buddy. This is OK for high order page, but for order-0 pages, it
> misses the optimization opportunity of using Per-Cpu-Pages and can
> cause zone lock contention when called frequently.
> 
> Paweł Staszewski recently shared his result of 'how Linux kernel
> handles normal traffic'[1] and from perf data, Jesper Dangaard Brouer
> found the lock contention comes from page allocator:
> 
>    mlx5e_poll_tx_cq
>    |
>     --16.34%--napi_consume_skb
>               |
>               |--12.65%--__free_pages_ok
>               |          |
>               |           --11.86%--free_one_page
>               |                     |
>               |                     |--10.10%--queued_spin_lock_slowpath
>               |                     |
>               |                      --0.65%--_raw_spin_lock
>               |
>               |--1.55%--page_frag_free
>               |
>                --1.44%--skb_release_data
> 
> Jesper explained how it happened: mlx5 driver RX-page recycle
> mechanism is not effective in this workload and pages have to go
> through the page allocator. The lock contention happens during
> mlx5 DMA TX completion cycle. And the page allocator cannot keep
> up at these speeds.[2]
> 
> I thought that __free_pages_ok() are mostly freeing high order
> pages and thought this is an lock contention for high order pages
> but Jesper explained in detail that __free_pages_ok() here are
> actually freeing order-0 pages because mlx5 is using order-0 pages
> to satisfy its page pool allocation request.[3]
> 

Thanks for your patch!
Acked-by: Tariq Toukan <tariqt@mellanox.com>

> The free path as pointed out by Jesper is:
> skb_free_head()
>    -> skb_free_frag()
>      -> page_frag_free()
> And the pages being freed on this path are order-0 pages.
> 
> Fix this by doing similar things as in __page_frag_cache_drain() -
> send the being freed page to PCP if it's an order-0 page, or
> directly to Buddy if it is a high order page.
> 
> With this change, Paweł hasn't noticed lock contention yet in
> his workload and Jesper has noticed a 7% performance improvement
> using a micro benchmark and lock contention is gone. Ilias' test
> on a 'low' speed 1Gbit interface on an cortex-a53 shows ~11%
> performance boost testing with 64byte packets and __free_pages_ok()
> disappeared from perf top.
> 
> [1]: https://www.spinics.net/lists/netdev/msg531362.html
> [2]: https://www.spinics.net/lists/netdev/msg531421.html
> [3]: https://www.spinics.net/lists/netdev/msg531556.html
> Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
> Analysed-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> ---
> v2: only changelog changes:
>      - remove the duplicated skb_free_frag() as pointed by Jesper;
>      - add Ilias' test result;
>      - add people's ack/test tag.
> 
>   mm/page_alloc.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ae31839874b8..91a9a6af41a2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4555,8 +4555,14 @@ void page_frag_free(void *addr)
>   {
>   	struct page *page = virt_to_head_page(addr);
>   
> -	if (unlikely(put_page_testzero(page)))
> -		__free_pages_ok(page, compound_order(page));
> +	if (unlikely(put_page_testzero(page))) {
> +		unsigned int order = compound_order(page);
> +
> +		if (order == 0)
> +			free_unref_page(page);
> +		else
> +			__free_pages_ok(page, order);
> +	}
>   }
>   EXPORT_SYMBOL(page_frag_free);
>   
> 

^ permalink raw reply

* Re: [PATCH bpf-next] bpf_load: add map name to load_maps error message
From: Song Liu @ 2018-11-07  0:26 UTC (permalink / raw)
  To: John Fastabend
  Cc: shannon.nelson, Alexei Starovoitov, Daniel Borkmann, Networking,
	shannon.lee.nelson
In-Reply-To: <ad307fb7-0ea3-3929-1dfe-38dbf281e206@gmail.com>

On Mon, Oct 29, 2018 at 3:12 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> On 10/29/2018 02:14 PM, Shannon Nelson wrote:
> > To help when debugging bpf/xdp load issues, have the load_map()
> > error message include the number and name of the map that
> > failed.
> >
> > Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
> > ---
> >  samples/bpf/bpf_load.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> > index 89161c9..5de0357 100644
> > --- a/samples/bpf/bpf_load.c
> > +++ b/samples/bpf/bpf_load.c
> > @@ -282,8 +282,8 @@ static int load_maps(struct bpf_map_data *maps, int nr_maps,
> >                                                       numa_node);
> >               }
> >               if (map_fd[i] < 0) {
> > -                     printf("failed to create a map: %d %s\n",
> > -                            errno, strerror(errno));
> > +                     printf("failed to create map %d (%s): %d %s\n",
> > +                            i, maps[i].name, errno, strerror(errno));
> >                       return 1;
> >               }
> >               maps[i].fd = map_fd[i];
> >
>
> LGTM
>
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
From: David Ahern @ 2018-11-07  0:23 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller
  Cc: Song Liu, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Kernel Team, ast@kernel.org, daniel@iogearbox.net,
	peterz@infradead.org, acme@kernel.org
In-Reply-To: <39fe6abc-5c3e-bac3-0c0b-cf68bea23ab0@fb.com>

On 11/6/18 5:13 PM, Alexei Starovoitov wrote:
> On 11/6/18 3:36 PM, David Miller wrote:
>> From: Alexei Starovoitov <ast@fb.com>
>> Date: Tue, 6 Nov 2018 23:29:07 +0000
>>
>>> I think concerns with perf overhead from collecting bpf events
>>> are unfounded.
>>> I would prefer for this flag to be on by default.
>>
>> I will sit in userspace looping over bpf load/unload and see how the
>> person trying to monitor something else with perf feels about that.
>>
>> Really, it is inappropriate to turn this on by default, I completely
>> agree with David Ahern.
>>
>> It's hard enough, _AS IS_, for me to fight back all of the bloat that
>> is in perf right now and get it back to being able to handle simple
>> full workloads without dropping events..
> 
> It's a separate perf thread and separate event with its own epoll.
> I don't see how it can affect main event collection.
> Let's put it this way. If it does affect somehow, then yes,
> it should not be on. If it is not, there is no downside to keep it on.
> Typical user expects to type 'perf record' and see everything that
> is happening on the system. Right now short lived bpf programs
> will not be seen. How user suppose to even know when to use the flag?

The default is profiling where perf record collects task events and
periodic samples. So for the default record/report, the bpf load /
unload events are not relevant.

> The only option is to always pass the flag 'just in case'
> which is unnecessary burden.

People who care about an event enable the event collection of the event.

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver
From: Florian Fainelli @ 2018-11-07  0:09 UTC (permalink / raw)
  To: Russell King - ARM Linux, David Miller; +Cc: netdev, andrew
In-Reply-To: <20181107000322.GP30658@n2100.armlinux.org.uk>

On 11/6/18 4:03 PM, Russell King - ARM Linux wrote:
> On Tue, Nov 06, 2018 at 03:38:44PM -0800, David Miller wrote:
>> From: Florian Fainelli <f.fainelli@gmail.com>
>> Date: Tue,  6 Nov 2018 15:29:10 -0800
>>
>>> This patch series allows warning an user that the generic PHY driver(s)
>>> are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
>>> likely not going to work at all.
>>>
>>> Let me know if you would want to do that differently.
>>
>> Is there ever a possibility that the generic PHY driver could work
>> in an SFP situation?
> 
> I don't yet see the reason for Florian's patch series - all the Marvell
> 88e1111 based modules I have, or have come across in information from
> manufacturers self-configure themselves and don't really need the
> Marvell 1G PHY driver.
> 
> For example, the Source Photonics were offering a range of 1GbaseT
> modules with the 88e1111 programmed in different modes, but published
> instructions for the register accesses to configure them differently
> (eg, SGMII vs 1000base-X interface facing the MAC).  Depending on
> the module part number determines which mode the PHY has been
> programmed to come up in.
> 
> So in theory, you don't need any PHY driver for these modules - but
> it's useful to have a functional PHY driver to be able to read out
> the negotiated flow control results.
> 
> I'd like more information from Florian about the reasoning behind
> this patch series before it's merged.
> 

The module that I am using [1] would not work, as in , no link up being
reported without turning on the Marvell PHY driver:

https://www.amazon.com/dp/B01LW2P72V/ref=twister_B07F3WQJQX?_encoding=UTF8&psc=1

this module uses a 88E1111 PHY as well (OUI: 0x01410cc2).
-- 
Florian

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver
From: Russell King - ARM Linux @ 2018-11-07  0:03 UTC (permalink / raw)
  To: David Miller; +Cc: f.fainelli, netdev, andrew
In-Reply-To: <20181106.153844.1612363235041286689.davem@davemloft.net>

On Tue, Nov 06, 2018 at 03:38:44PM -0800, David Miller wrote:
> From: Florian Fainelli <f.fainelli@gmail.com>
> Date: Tue,  6 Nov 2018 15:29:10 -0800
> 
> > This patch series allows warning an user that the generic PHY driver(s)
> > are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
> > likely not going to work at all.
> > 
> > Let me know if you would want to do that differently.
> 
> Is there ever a possibility that the generic PHY driver could work
> in an SFP situation?

I don't yet see the reason for Florian's patch series - all the Marvell
88e1111 based modules I have, or have come across in information from
manufacturers self-configure themselves and don't really need the
Marvell 1G PHY driver.

For example, the Source Photonics were offering a range of 1GbaseT
modules with the 88e1111 programmed in different modes, but published
instructions for the register accesses to configure them differently
(eg, SGMII vs 1000base-X interface facing the MAC).  Depending on
the module part number determines which mode the PHY has been
programmed to come up in.

So in theory, you don't need any PHY driver for these modules - but
it's useful to have a functional PHY driver to be able to read out
the negotiated flow control results.

I'd like more information from Florian about the reasoning behind
this patch series before it's merged.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver
From: Florian Fainelli @ 2018-11-06 23:42 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, andrew, rmk+kernel
In-Reply-To: <20181106.153844.1612363235041286689.davem@davemloft.net>

On 11/6/18 3:38 PM, David Miller wrote:
> From: Florian Fainelli <f.fainelli@gmail.com>
> Date: Tue,  6 Nov 2018 15:29:10 -0800
> 
>> This patch series allows warning an user that the generic PHY driver(s)
>> are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
>> likely not going to work at all.
>>
>> Let me know if you would want to do that differently.
> 
> Is there ever a possibility that the generic PHY driver could work
> in an SFP situation?

Given the PHY has to operate in SGMII mode, I doubt it could work
without a specialized driver, Andrew, Russell, would you concur?

> 
> If not, yes emit the message but also fail the load and registry too
> perhaps?
> 

I was not sure this would be acceptable, but it is definitively an easy
change.
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 0/3] net: systemport: Unmap queues upon DSA unregister event
From: David Miller @ 2018-11-06 23:40 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot
In-Reply-To: <20181106231518.16314-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue,  6 Nov 2018 15:15:15 -0800

> This patch series fixes the unbinding/binding of the bcm_sf2 switch
> driver along with bcmsysport which monitors the switch port queues.
> Because the driver was not processing the DSA_PORT_UNREGISTER event, we
> would not be unmapping switch port/queues, which could cause incorrect
> decisions to be made by the HW (e.g: queue always back-pressured).

Series applied, thanks Florian.

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver
From: David Miller @ 2018-11-06 23:38 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, rmk+kernel
In-Reply-To: <20181106232913.17216-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue,  6 Nov 2018 15:29:10 -0800

> This patch series allows warning an user that the generic PHY driver(s)
> are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
> likely not going to work at all.
> 
> Let me know if you would want to do that differently.

Is there ever a possibility that the generic PHY driver could work
in an SFP situation?

If not, yes emit the message but also fail the load and registry too
perhaps?

^ permalink raw reply

* [PATCH RFC net-next 3/3] net: phy: Default MARVELL_PHY to the value of SFP
From: Florian Fainelli @ 2018-11-06 23:29 UTC (permalink / raw)
  To: netdev; +Cc: andrew, rmk+kernel, davem, Florian Fainelli
In-Reply-To: <20181106232913.17216-1-f.fainelli@gmail.com>

Marvell PHYs are typically found in 1000BaseT SFP modules, so give a
chance for users to get the correct PHY driver when using SFP modules.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 3d187cd50eb0..cf7d44ba20c5 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -350,6 +350,7 @@ config LXT_PHY
 
 config MARVELL_PHY
 	tristate "Marvell PHYs"
+	default SFP
 	---help---
 	  Currently has a driver for the 88E1011S
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH RFC net-next 2/3] net: phy: sfp: Issue warning when using Generic PHY driver(s)
From: Florian Fainelli @ 2018-11-06 23:29 UTC (permalink / raw)
  To: netdev; +Cc: andrew, rmk+kernel, davem, Florian Fainelli
In-Reply-To: <20181106232913.17216-1-f.fainelli@gmail.com>

1000BaseT SFP modules typically include an Ethernet PHY device, and
while the Generic PHY driver will be able to bind to it, it usually will
not work at all without a specialized PHY driver. Issue a warning in
that case to help toubleshoot things.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/sfp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index fd8bb998ae52..228205d8ce84 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -1203,6 +1203,9 @@ static void sfp_sm_probe_phy(struct sfp *sfp)
 	}
 
 	sfp->mod_phy = phy;
+	if (phy_driver_is_genphy(phy) || phy_driver_is_genphy_10g(phy))
+		dev_warn(sfp->dev, "Using Generic PHY driver with a SFP!\n");
+
 	phy_start(phy);
 }
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH RFC net-next 1/3] net: phy: Add helpers to determine if PHY driver is generic
From: Florian Fainelli @ 2018-11-06 23:29 UTC (permalink / raw)
  To: netdev; +Cc: andrew, rmk+kernel, davem, Florian Fainelli
In-Reply-To: <20181106232913.17216-1-f.fainelli@gmail.com>

We are already checking in phy_detach() that the PHY driver is of
generic kind (1G or 10G) and we are going to make use of that in the SFP
layer as well for 1000BaseT SFP modules, so expose helper functions to
return that information.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/phy_device.c | 34 ++++++++++++++++++++++++++++++++--
 include/linux/phy.h          |  3 +++
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index ab33d1777132..15de7a3263bf 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1262,6 +1262,36 @@ struct phy_device *phy_attach(struct net_device *dev, const char *bus_id,
 }
 EXPORT_SYMBOL(phy_attach);
 
+static bool phy_driver_is_genphy_kind(struct phy_device *phydev,
+				      struct device_driver *driver)
+{
+	struct device *d = &phydev->mdio.dev;
+	bool ret = false;
+
+	if (!phydev->drv)
+		return ret;
+
+	get_device(d);
+	ret = d->driver == driver;
+	put_device(d);
+
+	return ret;
+}
+
+bool phy_driver_is_genphy(struct phy_device *phydev)
+{
+	return phy_driver_is_genphy_kind(phydev,
+					 &genphy_driver.mdiodrv.driver);
+}
+EXPORT_SYMBOL_GPL(phy_driver_is_genphy);
+
+bool phy_driver_is_genphy_10g(struct phy_device *phydev)
+{
+	return phy_driver_is_genphy_kind(phydev,
+					 &genphy_10g_driver.mdiodrv.driver);
+}
+EXPORT_SYMBOL_GPL(phy_driver_is_genphy_10g);
+
 /**
  * phy_detach - detach a PHY device from its network device
  * @phydev: target phy_device struct
@@ -1293,8 +1323,8 @@ void phy_detach(struct phy_device *phydev)
 	 * from the generic driver so that there's a chance a
 	 * real driver could be loaded
 	 */
-	if (phydev->mdio.dev.driver == &genphy_10g_driver.mdiodrv.driver ||
-	    phydev->mdio.dev.driver == &genphy_driver.mdiodrv.driver)
+	if (phy_driver_is_genphy(phydev) ||
+	    phy_driver_is_genphy_10g(phydev))
 		device_release_driver(&phydev->mdio.dev);
 
 	/*
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 3ea87f774a76..84a6c7efef60 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1192,4 +1192,7 @@ module_exit(phy_module_exit)
 #define module_phy_driver(__phy_drivers)				\
 	phy_module_driver(__phy_drivers, ARRAY_SIZE(__phy_drivers))
 
+bool phy_driver_is_genphy(struct phy_device *phydev);
+bool phy_driver_is_genphy_10g(struct phy_device *phydev);
+
 #endif /* __PHY_H */
-- 
2.17.1

^ permalink raw reply related

* [PATCH RFC net-next 0/3] net: phy: sfp: Warn when using generic PHY driver
From: Florian Fainelli @ 2018-11-06 23:29 UTC (permalink / raw)
  To: netdev; +Cc: andrew, rmk+kernel, davem, Florian Fainelli

Hi all,

This patch series allows warning an user that the generic PHY driver(s)
are used when a SFP incorporates a PHY (e.g: 1000BaseT SFP) which is
likely not going to work at all.

Let me know if you would want to do that differently.

Florian Fainelli (3):
  net: phy: Add helpers to determine if PHY driver is generic
  net: phy: sfp: Issue warning when using Generic PHY driver(s)
  net: phy: Default MARVELL_PHY to the value of SFP

 drivers/net/phy/Kconfig      |  1 +
 drivers/net/phy/phy_device.c | 34 ++++++++++++++++++++++++++++++++--
 drivers/net/phy/sfp.c        |  3 +++
 include/linux/phy.h          |  3 +++
 4 files changed, 39 insertions(+), 2 deletions(-)

-- 
2.17.1

^ permalink raw reply

* Re: [PATCH net-next 09/11] udp: Support for error handlers of tunnels with arbitrary destination port
From: David Miller @ 2018-11-06 23:26 UTC (permalink / raw)
  To: sbrivio; +Cc: sd, lucien.xin, netdev
In-Reply-To: <d4649d12308c6d51f36a809e63514a39c75564af.1541533786.git.sbrivio@redhat.com>

From: Stefano Brivio <sbrivio@redhat.com>
Date: Tue,  6 Nov 2018 22:39:05 +0100

> diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
> index 236e40ba06bf..7855966b4a19 100644
> --- a/include/net/ip6_tunnel.h
> +++ b/include/net/ip6_tunnel.h
> @@ -69,6 +69,8 @@ struct ip6_tnl_encap_ops {
>  	size_t (*encap_hlen)(struct ip_tunnel_encap *e);
>  	int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
>  			    u8 *protocol, struct flowi6 *fl6);
> +	int (*err_handler)(struct sk_buff *, struct inet6_skb_parm *opt,
> +			   u8 type, u8 code, int offset, __be32 info);
>  };

Please give names to all of the arguments in this new method.

...
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index b0d022ff6ea1..5980659312e5 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -311,6 +311,7 @@ struct ip_tunnel_encap_ops {
>  	size_t (*encap_hlen)(struct ip_tunnel_encap *e);
>  	int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
>  			    u8 *protocol, struct flowi4 *fl4);
> +	int (*err_handler)(struct sk_buff *, u32);

Likewise.

^ permalink raw reply

* Re: [PATCH net-next 01/11] udp: Handle ICMP errors for tunnels with same destination port on both endpoints
From: David Miller @ 2018-11-06 23:25 UTC (permalink / raw)
  To: sbrivio; +Cc: sd, lucien.xin, netdev
In-Reply-To: <9f1e659e3745bbc8f29a9debb9bb8c0d8918fa24.1541533786.git.sbrivio@redhat.com>

From: Stefano Brivio <sbrivio@redhat.com>
Date: Tue,  6 Nov 2018 22:38:57 +0100

> +	/* Network header needs to point to the outer IPv4 header inside ICMP */
> +	skb_reset_network_header(skb);
> +	iph = ip_hdr(skb);
> +	/* Transport header needs to point to the UDP header */
> +	skb_set_transport_header(skb, iph->ihl << 2);

Please put an empty line before the second comment.

^ permalink raw reply

* Re: [PATCH net-next 00/11] ICMP error handling for UDP tunnels
From: David Miller @ 2018-11-06 23:24 UTC (permalink / raw)
  To: sbrivio; +Cc: sd, lucien.xin, netdev
In-Reply-To: <cover.1541533786.git.sbrivio@redhat.com>

From: Stefano Brivio <sbrivio@redhat.com>
Date: Tue,  6 Nov 2018 22:38:56 +0100

> This series introduces ICMP error handling for UDP tunnels and
> encapsulations and related selftests. We need to handle ICMP errors to
> support PMTU discovery and route redirection -- this support is entirely
> missing right now:
> 
> - patch 1/11 adds a socket lookup for UDP tunnels that use, by design,
>   the same destination port on both endpoints -- i.e. VxLAN and GENEVE
> - patches 2/11 to 7/11 are specific to VxLAN and GENEVE
> - patches 8/11 and 9/11 add infrastructure for lookup of encapsulations
>   where sent packets cannot be matched via receiving socket lookup, i.e.
>   FoU and GUE
> - patches 10/11 and 11/11 are specific to FoU and GUE

I like this series, especially the testcases.

But I have a minor coding style issue or two I'd like you
to fixup before I apply this.

I'll reply to individual patches as needed.

^ permalink raw reply

* [PATCH net-next 3/3] net: systemport: Unmap queues upon DSA unregister event
From: Florian Fainelli @ 2018-11-06 23:15 UTC (permalink / raw)
  To: netdev; +Cc: davem, andrew, vivien.didelot, Florian Fainelli
In-Reply-To: <20181106231518.16314-1-f.fainelli@gmail.com>

Binding and unbinding the switch driver which creates the DSA slave
network devices for which we set-up inspection would lead to
undesireable effects since we were not clearing the port/queue mapping
to the SYSTEMPORT TX queue.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 56 +++++++++++++++++++---
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index f620c647bb86..f8f0a027b3ae 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -2371,17 +2371,61 @@ static int bcm_sysport_map_queues(struct notifier_block *nb,
 	return 0;
 }
 
+static int bcm_sysport_unmap_queues(struct notifier_block *nb,
+				    struct dsa_notifier_register_info *info)
+{
+	struct bcm_sysport_tx_ring *ring;
+	struct bcm_sysport_priv *priv;
+	struct net_device *slave_dev;
+	unsigned int num_tx_queues;
+	struct net_device *dev;
+	unsigned int q, port;
+
+	priv = container_of(nb, struct bcm_sysport_priv, dsa_notifier);
+	if (priv->netdev != info->master)
+		return 0;
+
+	dev = info->master;
+
+	if (dev->netdev_ops != &bcm_sysport_netdev_ops)
+		return 0;
+
+	port = info->port_number;
+	slave_dev = info->info.dev;
+
+	num_tx_queues = slave_dev->real_num_tx_queues;
+
+	for (q = 0; q < dev->num_tx_queues; q++) {
+		ring = &priv->tx_rings[q];
+
+		if (ring->switch_port != port)
+			continue;
+
+		if (!ring->inspect)
+			continue;
+
+		ring->inspect = false;
+		priv->ring_map[q + port * num_tx_queues] = NULL;
+	}
+
+	return 0;
+}
+
 static int bcm_sysport_dsa_notifier(struct notifier_block *nb,
 				    unsigned long event, void *ptr)
 {
-	struct dsa_notifier_register_info *info;
+	int ret = NOTIFY_DONE;
 
-	if (event != DSA_PORT_REGISTER)
-		return NOTIFY_DONE;
-
-	info = ptr;
+	switch (event) {
+	case DSA_PORT_REGISTER:
+		ret = bcm_sysport_map_queues(nb, ptr);
+		break;
+	case DSA_PORT_UNREGISTER:
+		ret = bcm_sysport_unmap_queues(nb, ptr);
+		break;
+	}
 
-	return notifier_from_errno(bcm_sysport_map_queues(nb, info));
+	return notifier_from_errno(ret);
 }
 
 #define REV_FMT	"v%2x.%02x"
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 2/3] net: systemport: Simplify queue mapping logic
From: Florian Fainelli @ 2018-11-06 23:15 UTC (permalink / raw)
  To: netdev; +Cc: davem, andrew, vivien.didelot, Florian Fainelli
In-Reply-To: <20181106231518.16314-1-f.fainelli@gmail.com>

The use of a bitmap speeds up the finding of the first available queue
to which we could start establishing the mapping for, but we still have
to loop over all slave network devices to set them up. Simplify the
logic to have a single loop, and use the fact that a correctly
configured ring has inspect set to true. This will make things simpler
to unwind during device unregistration.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 17 +++++++++--------
 drivers/net/ethernet/broadcom/bcmsysport.h |  1 -
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 0e2d99c737e3..f620c647bb86 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -2312,7 +2312,7 @@ static int bcm_sysport_map_queues(struct notifier_block *nb,
 	struct bcm_sysport_priv *priv;
 	struct net_device *slave_dev;
 	unsigned int num_tx_queues;
-	unsigned int q, start, port;
+	unsigned int q, qp, port;
 	struct net_device *dev;
 
 	priv = container_of(nb, struct bcm_sysport_priv, dsa_notifier);
@@ -2351,20 +2351,21 @@ static int bcm_sysport_map_queues(struct notifier_block *nb,
 
 	priv->per_port_num_tx_queues = num_tx_queues;
 
-	start = find_first_zero_bit(&priv->queue_bitmap, dev->num_tx_queues);
-	for (q = 0; q < num_tx_queues; q++) {
-		ring = &priv->tx_rings[q + start];
+	for (q = 0, qp = 0; q < dev->num_tx_queues && qp < num_tx_queues;
+	     q++) {
+		ring = &priv->tx_rings[q];
+
+		if (ring->inspect)
+			continue;
 
 		/* Just remember the mapping actual programming done
 		 * during bcm_sysport_init_tx_ring
 		 */
-		ring->switch_queue = q;
+		ring->switch_queue = qp;
 		ring->switch_port = port;
 		ring->inspect = true;
 		priv->ring_map[q + port * num_tx_queues] = ring;
-
-		/* Set all queues as being used now */
-		set_bit(q + start, &priv->queue_bitmap);
+		qp++;
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h b/drivers/net/ethernet/broadcom/bcmsysport.h
index a7a230884a87..94d64b203098 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -795,7 +795,6 @@ struct bcm_sysport_priv {
 	/* map information between switch port queues and local queues */
 	struct notifier_block	dsa_notifier;
 	unsigned int		per_port_num_tx_queues;
-	unsigned long		queue_bitmap;
 	struct bcm_sysport_tx_ring *ring_map[DSA_MAX_PORTS * 8];
 
 };
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 1/3] net: dsa: bcm_sf2: Turn on PHY to allow successful registration
From: Florian Fainelli @ 2018-11-06 23:15 UTC (permalink / raw)
  To: netdev; +Cc: davem, andrew, vivien.didelot, Florian Fainelli
In-Reply-To: <20181106231518.16314-1-f.fainelli@gmail.com>

We are binding to the PHY using the SF2 slave MDIO bus that we create,
binding involves reading the PHY's MII_PHYSID1/2 which won't be possible
if the PHY is turned off. Temporarily turn it on/off for the bus probing
to succeeed. This fixes unbind/bind problems where the port connecting
to that PHY would be in error since it could not connect to it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/bcm_sf2.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 2eb68769562c..2c664aac1e7b 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1090,12 +1090,16 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
 		return ret;
 	}
 
+	bcm_sf2_gphy_enable_set(priv->dev->ds, true);
+
 	ret = bcm_sf2_mdio_register(ds);
 	if (ret) {
 		pr_err("failed to register MDIO bus\n");
 		return ret;
 	}
 
+	bcm_sf2_gphy_enable_set(priv->dev->ds, false);
+
 	ret = bcm_sf2_cfp_rst(priv);
 	if (ret) {
 		pr_err("failed to reset CFP\n");
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 0/3] net: systemport: Unmap queues upon DSA unregister event
From: Florian Fainelli @ 2018-11-06 23:15 UTC (permalink / raw)
  To: netdev; +Cc: davem, andrew, vivien.didelot, Florian Fainelli

Hi all,

This patch series fixes the unbinding/binding of the bcm_sf2 switch
driver along with bcmsysport which monitors the switch port queues.
Because the driver was not processing the DSA_PORT_UNREGISTER event, we
would not be unmapping switch port/queues, which could cause incorrect
decisions to be made by the HW (e.g: queue always back-pressured).

Florian Fainelli (3):
  net: dsa: bcm_sf2: Turn on PHY to allow successful registration
  net: systemport: Simplify queue mapping logic
  net: systemport: Unmap queues upon DSA unregister event

 drivers/net/dsa/bcm_sf2.c                  |  4 ++
 drivers/net/ethernet/broadcom/bcmsysport.c | 71 ++++++++++++++++++----
 drivers/net/ethernet/broadcom/bcmsysport.h |  1 -
 3 files changed, 62 insertions(+), 14 deletions(-)

-- 
2.17.1

^ permalink raw reply

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
From: Peter Zijlstra @ 2018-11-07  8:40 UTC (permalink / raw)
  To: Song Liu; +Cc: netdev, linux-kernel, kernel-team, ast, daniel, acme
In-Reply-To: <20181106205246.567448-2-songliubraving@fb.com>

On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
> For better performance analysis of BPF programs, this patch introduces
> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
> load/unload information to user space.
> 
>         /*
>          * Record different types of bpf events:
>          *   enum perf_bpf_event_type {
>          *      PERF_BPF_EVENT_UNKNOWN          = 0,
>          *      PERF_BPF_EVENT_PROG_LOAD        = 1,
>          *      PERF_BPF_EVENT_PROG_UNLOAD      = 2,
>          *   };
>          *
>          * struct {
>          *      struct perf_event_header header;
>          *      u16 type;
>          *      u16 flags;
>          *      u32 id;  // prog_id or map_id
>          * };
>          */
>         PERF_RECORD_BPF_EVENT                   = 17,
> 
> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
> Perf utility (or other user space tools) should listen to this event and
> fetch more details about the event via BPF syscalls
> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).

Why !? You're failing to explain why it cannot provide the full
information there.

^ permalink raw reply

* Re: [PATCH v4 1/3] net: emac: implement 802.1Q VLAN TX tagging support
From: David Miller @ 2018-11-06 23:09 UTC (permalink / raw)
  To: chunkeey; +Cc: netdev, ivan, f.fainelli
In-Reply-To: <0f8f149bb526b911813cfe555f99ef00db2f1387.1541543231.git.chunkeey@gmail.com>

From: Christian Lamparter <chunkeey@gmail.com>
Date: Tue,  6 Nov 2018 23:27:49 +0100

> @@ -1435,6 +1436,22 @@ static inline netdev_tx_t emac_xmit_finish(struct emac_instance *dev, int len)
>  	return NETDEV_TX_OK;
>  }
>  
> +static inline u16 emac_tx_vlan(struct emac_instance *dev, struct sk_buff *skb)
> +{
> +	/* Handle VLAN TPID and TCI insert if this is a VLAN skb */
> +	if (emac_has_feature(dev, EMAC_FTR_HAS_VLAN_CTAG_TX) &&
> +	    skb_vlan_tag_present(skb)) {
> +		struct emac_regs __iomem *p = dev->emacp;
> +
> +		/* update the VLAN TCI */
> +		out_be32(&p->vtci, (u32)skb_vlan_tag_get(skb));

Hmmm, how does this vtci register work?

How can you have a global piece of register state controlling the VLAN
tag that will be used for the TX frame?

What happens if you queue up several TX SKBs, each one with a different
VLAN tci?

Normally the TCI state is implemented on a per-tx-descriptor basis.

^ permalink raw reply

* Re: [PATCH net-next 0/5] net: dsa: bcm_sf2: Store rules in lists
From: Florian Fainelli @ 2018-11-06 23:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, andrew, vivien.didelot, pablo
In-Reply-To: <20181106.150619.2176282733992923198.davem@davemloft.net>

On 11/6/18 3:06 PM, David Miller wrote:
> From: Florian Fainelli <f.fainelli@gmail.com>
> Date: Tue,  6 Nov 2018 12:58:36 -0800
> 
>> Hi all,
>>
>> This patch series changes the bcm-sf2 driver to keep a copy of the
>> inserted rules as opposed to using the HW as a storage area for a number
>> of reasons:
>>
>> - this helps us with doing duplicate rule detection in a faster way, it
>>   would have required a full rule read before
>>
>> - this helps with Pablo's on-going work to convert ethtool_rx_flow_spec
>>   to a more generic flow rule structure by having fewer code paths to
>>   convert to the new structure/helpers
>>
>> - we need to cache copies to restore them during drive resumption,
>>   because depending on the low power mode the system has entered, the
>>   switch may have lost all of its context
> 
> Looks good to me, series applied and build testing right now.
> 
> I will say that the ethtool flow spec comparison should probably
> eventually be broken out into a helper function places somewhere
> common.  It's very likely this approach, and thus the helper, can
> be used by other drivers in a similar situation.

Sure, that could be done, I will check with Pablo how he wants to
approach that as well since he is reworking how flow rules are
represented. Thanks!
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 0/5] net: dsa: bcm_sf2: Store rules in lists
From: David Miller @ 2018-11-06 23:06 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot, pablo
In-Reply-To: <20181106205841.14308-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue,  6 Nov 2018 12:58:36 -0800

> Hi all,
> 
> This patch series changes the bcm-sf2 driver to keep a copy of the
> inserted rules as opposed to using the HW as a storage area for a number
> of reasons:
> 
> - this helps us with doing duplicate rule detection in a faster way, it
>   would have required a full rule read before
> 
> - this helps with Pablo's on-going work to convert ethtool_rx_flow_spec
>   to a more generic flow rule structure by having fewer code paths to
>   convert to the new structure/helpers
> 
> - we need to cache copies to restore them during drive resumption,
>   because depending on the low power mode the system has entered, the
>   switch may have lost all of its context

Looks good to me, series applied and build testing right now.

I will say that the ethtool flow spec comparison should probably
eventually be broken out into a helper function places somewhere
common.  It's very likely this approach, and thus the helper, can
be used by other drivers in a similar situation.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/3] net: More extack messages
From: David Miller @ 2018-11-06 23:01 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, dsahern
In-Reply-To: <20181106205116.7718-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Tue,  6 Nov 2018 12:51:13 -0800

> From: David Ahern <dsahern@gmail.com>
> 
> Add more extack messages for several link create errors (e.g., invalid
> number of queues, unknown link kind) and invalid metrics argument.

Series applied, thanks David.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox