Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: iproute2 DDMMYY versioning - why?
From: Or Gerlitz @ 2020-07-31 13:23 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Linux Netdev List
In-Reply-To: <20200728125136.6c5b46e8@hermes.lan>

On Tue, Jul 28, 2020 at 10:51 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:

> It is only an historical leftover, because 15 yrs ago that is how Alexy did it

So how about putting behind the burden created by this historical leftover
and moving to use the kernel releases as the emitted version?

Or.

^ permalink raw reply

* Re: [PATCH v3 5/9] ethernet: ti: am65-cpts: Use generic helper function
From: Kurt Kanzenbach @ 2020-07-31 13:10 UTC (permalink / raw)
  To: Grygorii Strashko, Arnd Bergmann
  Cc: Richard Cochran, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, Jakub Kicinski, Jiri Pirko, Ido Schimmel,
	Heiner Kallweit, Russell King, Ivan Khoronzhuk, Samuel Zou,
	Networking, Petr Machata
In-Reply-To: <f30d3a4f-6cf3-1d46-397e-baa27b3c8ade@ti.com>

[-- Attachment #1: Type: text/plain, Size: 1258 bytes --]

On Fri Jul 31 2020, Grygorii Strashko wrote:
> On 31/07/2020 14:48, Kurt Kanzenbach wrote:
>> On Thu Jul 30 2020, Arnd Bergmann wrote:
>>> On Thu, Jul 30, 2020 at 11:41 AM Kurt Kanzenbach <kurt@linutronix.de> wrote:
>>>> On Thu Jul 30 2020, Grygorii Strashko wrote:
>>>>> On 30/07/2020 11:00, Kurt Kanzenbach wrote:
>>>>>> +    msgtype = ptp_get_msgtype(hdr, ptp_class);
>>>>>> +    seqid   = be16_to_cpu(hdr->sequence_id);
>>>>>
>>>>> Is there any reason to not use "ntohs()"?
>>>>
>>>> This is just my personal preference, because I think it's more
>>>> readable. Internally ntohs() uses be16_to_cpu(). There's no technical
>>>> reason for it.
>>>
>>> I think for traditional reasons, code in net/* tends to use ntohs()
>>> while code in drivers/*  tends to use be16_to_cpu().
>>>
>>> In drivers/net/* the two are used roughly the same, though I guess
>>> one could make the argument that be16_to_cpu() would be
>>> more appropriate for data structures exchanged with hardware
>>> while ntohs() makes sense on data structures sent over the
>>> network.
>> 
>> I see, makes sense. I could simply keep it the way it was, or?
>
>   I prefer ntohs() as this packet data.

OK. I'll change it in the next iteration.

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* [PATCH][next] vhost: Use flex_array_size() helper in copy_from_user()
From: Gustavo A. R. Silva @ 2020-07-31 13:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: kvm, virtualization, netdev, linux-kernel, Gustavo A. R. Silva

Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.

This helper offers defense-in-depth against potential integer
overflows, while at the same time makes it explicitly clear that
we are dealing with a flexible array member.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
---
 drivers/vhost/vhost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 74d135ee7e26..1a22a254abe4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1405,7 +1405,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)

 	memcpy(newmem, &mem, size);
 	if (copy_from_user(newmem->regions, m->regions,
-			   mem.nregions * sizeof *m->regions)) {
+			   flex_array_size(newmem, regions, mem.nregions))) {
 		kvfree(newmem);
 		return -EFAULT;
 	}
-- 
2.27.0

^ permalink raw reply related

* [PATCH v2 net-next 11/11] sfc_ef100: add nic-type for VFs, and bind to them
From: Edward Cree @ 2020-07-31 13:01 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

We don't yet have a .sriov_configure() to create them, though.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100.c     |  2 +
 drivers/net/ethernet/sfc/ef100_nic.c | 77 ++++++++++++++++++++++++++++
 drivers/net/ethernet/sfc/ef100_nic.h |  2 +
 3 files changed, 81 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef100.c b/drivers/net/ethernet/sfc/ef100.c
index de611c0f94e7..9729983f4840 100644
--- a/drivers/net/ethernet/sfc/ef100.c
+++ b/drivers/net/ethernet/sfc/ef100.c
@@ -527,6 +527,8 @@ static int ef100_pci_probe(struct pci_dev *pci_dev,
 static const struct pci_device_id ef100_pci_table[] = {
 	{PCI_DEVICE(PCI_VENDOR_ID_XILINX, 0x0100),  /* Riverhead PF */
 		.driver_data = (unsigned long) &ef100_pf_nic_type },
+	{PCI_DEVICE(PCI_VENDOR_ID_XILINX, 0x1100),  /* Riverhead VF */
+		.driver_data = (unsigned long) &ef100_vf_nic_type },
 	{0}                     /* end of list */
 };
 
diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index 0872ad0aec33..a3d01a28da2d 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -784,6 +784,78 @@ const struct efx_nic_type ef100_pf_nic_type = {
 
 };
 
+const struct efx_nic_type ef100_vf_nic_type = {
+	.revision = EFX_REV_EF100,
+	.is_vf = true,
+	.probe = ef100_probe_vf,
+	.offload_features = EF100_OFFLOAD_FEATURES,
+	.mcdi_max_ver = 2,
+	.mcdi_request = ef100_mcdi_request,
+	.mcdi_poll_response = ef100_mcdi_poll_response,
+	.mcdi_read_response = ef100_mcdi_read_response,
+	.mcdi_poll_reboot = ef100_mcdi_poll_reboot,
+	.mcdi_reboot_detected = ef100_mcdi_reboot_detected,
+	.irq_enable_master = efx_port_dummy_op_void,
+	.irq_test_generate = efx_ef100_irq_test_generate,
+	.irq_disable_non_ev = efx_port_dummy_op_void,
+	.push_irq_moderation = efx_channel_dummy_op_void,
+	.min_interrupt_mode = EFX_INT_MODE_MSIX,
+	.map_reset_reason = ef100_map_reset_reason,
+	.map_reset_flags = ef100_map_reset_flags,
+	.reset = ef100_reset,
+	.check_caps = ef100_check_caps,
+	.ev_probe = ef100_ev_probe,
+	.ev_init = ef100_ev_init,
+	.ev_fini = efx_mcdi_ev_fini,
+	.ev_remove = efx_mcdi_ev_remove,
+	.irq_handle_msi = ef100_msi_interrupt,
+	.ev_process = ef100_ev_process,
+	.ev_read_ack = ef100_ev_read_ack,
+	.ev_test_generate = efx_ef100_ev_test_generate,
+	.tx_probe = ef100_tx_probe,
+	.tx_init = ef100_tx_init,
+	.tx_write = ef100_tx_write,
+	.tx_enqueue = ef100_enqueue_skb,
+	.rx_probe = efx_mcdi_rx_probe,
+	.rx_init = efx_mcdi_rx_init,
+	.rx_remove = efx_mcdi_rx_remove,
+	.rx_write = ef100_rx_write,
+	.rx_packet = __ef100_rx_packet,
+	.fini_dmaq = efx_fini_dmaq,
+	.max_rx_ip_filters = EFX_MCDI_FILTER_TBL_ROWS,
+	.filter_table_probe = ef100_filter_table_up,
+	.filter_table_restore = efx_mcdi_filter_table_restore,
+	.filter_table_remove = ef100_filter_table_down,
+	.filter_insert = efx_mcdi_filter_insert,
+	.filter_remove_safe = efx_mcdi_filter_remove_safe,
+	.filter_get_safe = efx_mcdi_filter_get_safe,
+	.filter_clear_rx = efx_mcdi_filter_clear_rx,
+	.filter_count_rx_used = efx_mcdi_filter_count_rx_used,
+	.filter_get_rx_id_limit = efx_mcdi_filter_get_rx_id_limit,
+	.filter_get_rx_ids = efx_mcdi_filter_get_rx_ids,
+	.filter_rfs_expire_one = efx_mcdi_filter_rfs_expire_one,
+
+	.rx_prefix_size = ESE_GZ_RX_PKT_PREFIX_LEN,
+	.rx_hash_offset = ESF_GZ_RX_PREFIX_RSS_HASH_LBN / 8,
+	.rx_ts_offset = ESF_GZ_RX_PREFIX_PARTIAL_TSTAMP_LBN / 8,
+	.rx_hash_key_size = 40,
+	.rx_pull_rss_config = efx_mcdi_rx_pull_rss_config,
+	.rx_push_rss_config = efx_mcdi_pf_rx_push_rss_config,
+	.rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
+
+	.reconfigure_mac = ef100_reconfigure_mac,
+	.test_nvram = efx_new_mcdi_nvram_test_all,
+	.describe_stats = ef100_describe_stats,
+	.start_stats = efx_mcdi_mac_start_stats,
+	.update_stats = ef100_update_stats,
+	.pull_stats = efx_mcdi_mac_pull_stats,
+	.stop_stats = efx_mcdi_mac_stop_stats,
+
+	.mem_bar = NULL,
+	.mem_map_size = NULL,
+
+};
+
 static int compare_versions(const char *a, const char *b)
 {
 	int a_major, a_minor, a_point, a_patch;
@@ -1164,6 +1236,11 @@ int ef100_probe_pf(struct efx_nic *efx)
 	return rc;
 }
 
+int ef100_probe_vf(struct efx_nic *efx)
+{
+	return ef100_probe_main(efx);
+}
+
 void ef100_remove(struct efx_nic *efx)
 {
 	struct ef100_nic_data *nic_data = efx->nic_data;
diff --git a/drivers/net/ethernet/sfc/ef100_nic.h b/drivers/net/ethernet/sfc/ef100_nic.h
index 4a64c9438493..e799688d5264 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.h
+++ b/drivers/net/ethernet/sfc/ef100_nic.h
@@ -13,8 +13,10 @@
 #include "nic_common.h"
 
 extern const struct efx_nic_type ef100_pf_nic_type;
+extern const struct efx_nic_type ef100_vf_nic_type;
 
 int ef100_probe_pf(struct efx_nic *efx);
+int ef100_probe_vf(struct efx_nic *efx);
 void ef100_remove(struct efx_nic *efx);
 
 enum {

^ permalink raw reply related

* [PATCH v2 net-next 10/11] sfc_ef100: read pf_index at probe time
From: Edward Cree @ 2020-07-31 13:01 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

We'll need it later, for VF representors.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100_nic.c | 4 ++++
 drivers/net/ethernet/sfc/ef100_nic.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index 1c549bc01f61..0872ad0aec33 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -1082,6 +1082,10 @@ static int ef100_probe_main(struct efx_nic *efx)
 	if (rc)
 		goto fail;
 
+	rc = efx_get_pf_index(efx, &nic_data->pf_index);
+	if (rc)
+		goto fail;
+
 	rc = efx_ef100_init_datapath_caps(efx);
 	if (rc < 0)
 		goto fail;
diff --git a/drivers/net/ethernet/sfc/ef100_nic.h b/drivers/net/ethernet/sfc/ef100_nic.h
index 7c2d37490074..4a64c9438493 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.h
+++ b/drivers/net/ethernet/sfc/ef100_nic.h
@@ -63,6 +63,7 @@ struct ef100_nic_data {
 	u32 datapath_caps;
 	u32 datapath_caps2;
 	u32 datapath_caps3;
+	unsigned int pf_index;
 	u16 warm_boot_count;
 	u8 port_id[ETH_ALEN];
 	DECLARE_BITMAP(evq_phases, EFX_MAX_CHANNELS);


^ permalink raw reply related

* [PATCH v2 net-next 09/11] sfc_ef100: functions for selftests
From: Edward Cree @ 2020-07-31 13:00 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

Self-tests for event and interrupt reception and NVRAM.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100_nic.c | 47 ++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index 1c93091609bd..1c549bc01f61 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -633,6 +633,50 @@ static int efx_ef100_get_phys_port_id(struct efx_nic *efx,
 	return 0;
 }
 
+static int efx_ef100_irq_test_generate(struct efx_nic *efx)
+{
+	MCDI_DECLARE_BUF(inbuf, MC_CMD_TRIGGER_INTERRUPT_IN_LEN);
+
+	BUILD_BUG_ON(MC_CMD_TRIGGER_INTERRUPT_OUT_LEN != 0);
+
+	MCDI_SET_DWORD(inbuf, TRIGGER_INTERRUPT_IN_INTR_LEVEL, efx->irq_level);
+	return efx_mcdi_rpc_quiet(efx, MC_CMD_TRIGGER_INTERRUPT,
+				  inbuf, sizeof(inbuf), NULL, 0, NULL);
+}
+
+#define EFX_EF100_TEST 1
+
+static void efx_ef100_ev_test_generate(struct efx_channel *channel)
+{
+	MCDI_DECLARE_BUF(inbuf, MC_CMD_DRIVER_EVENT_IN_LEN);
+	struct efx_nic *efx = channel->efx;
+	efx_qword_t event;
+	int rc;
+
+	EFX_POPULATE_QWORD_2(event,
+			     ESF_GZ_E_TYPE, ESE_GZ_EF100_EV_DRIVER,
+			     ESF_GZ_DRIVER_DATA, EFX_EF100_TEST);
+
+	MCDI_SET_DWORD(inbuf, DRIVER_EVENT_IN_EVQ, channel->channel);
+
+	/* MCDI_SET_QWORD is not appropriate here since EFX_POPULATE_* has
+	 * already swapped the data to little-endian order.
+	 */
+	memcpy(MCDI_PTR(inbuf, DRIVER_EVENT_IN_DATA), &event.u64[0],
+	       sizeof(efx_qword_t));
+
+	rc = efx_mcdi_rpc(efx, MC_CMD_DRIVER_EVENT, inbuf, sizeof(inbuf),
+			  NULL, 0, NULL);
+	if (rc && (rc != -ENETDOWN))
+		goto fail;
+
+	return;
+
+fail:
+	WARN_ON(true);
+	netif_err(efx, hw, efx->net_dev, "%s: failed rc=%d\n", __func__, rc);
+}
+
 static unsigned int ef100_check_caps(const struct efx_nic *efx,
 				     u8 flag, u32 offset)
 {
@@ -669,6 +713,7 @@ const struct efx_nic_type ef100_pf_nic_type = {
 	.mcdi_poll_reboot = ef100_mcdi_poll_reboot,
 	.mcdi_reboot_detected = ef100_mcdi_reboot_detected,
 	.irq_enable_master = efx_port_dummy_op_void,
+	.irq_test_generate = efx_ef100_irq_test_generate,
 	.irq_disable_non_ev = efx_port_dummy_op_void,
 	.push_irq_moderation = efx_channel_dummy_op_void,
 	.min_interrupt_mode = EFX_INT_MODE_MSIX,
@@ -685,6 +730,7 @@ const struct efx_nic_type ef100_pf_nic_type = {
 	.irq_handle_msi = ef100_msi_interrupt,
 	.ev_process = ef100_ev_process,
 	.ev_read_ack = ef100_ev_read_ack,
+	.ev_test_generate = efx_ef100_ev_test_generate,
 	.tx_probe = ef100_tx_probe,
 	.tx_init = ef100_tx_init,
 	.tx_write = ef100_tx_write,
@@ -723,6 +769,7 @@ const struct efx_nic_type ef100_pf_nic_type = {
 	.rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
 
 	.reconfigure_mac = ef100_reconfigure_mac,
+	.test_nvram = efx_new_mcdi_nvram_test_all,
 	.describe_stats = ef100_describe_stats,
 	.start_stats = efx_mcdi_mac_start_stats,
 	.update_stats = ef100_update_stats,


^ permalink raw reply related

* [PATCH v2 net-next 08/11] sfc_ef100: statistics gathering
From: Edward Cree @ 2020-07-31 13:00 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

MAC stats work much the same as on EF10, with a periodic DMA to a region
 specified via an MCDI.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100_netdev.c |   6 +
 drivers/net/ethernet/sfc/ef100_nic.c    | 171 ++++++++++++++++++++++++
 drivers/net/ethernet/sfc/ef100_nic.h    |  41 ++++++
 3 files changed, 218 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef100_netdev.c b/drivers/net/ethernet/sfc/ef100_netdev.c
index 362a915c836a..63c311ba28b9 100644
--- a/drivers/net/ethernet/sfc/ef100_netdev.c
+++ b/drivers/net/ethernet/sfc/ef100_netdev.c
@@ -86,6 +86,7 @@ static int ef100_net_stop(struct net_device *net_dev)
 
 	netif_stop_queue(net_dev);
 	efx_stop_all(efx);
+	efx_mcdi_mac_fini_stats(efx);
 	efx_disable_interrupts(efx);
 	efx_clear_interrupt_affinity(efx);
 	efx_nic_fini_interrupt(efx);
@@ -157,6 +158,10 @@ static int ef100_net_open(struct net_device *net_dev)
 	 */
 	(void) efx_mcdi_poll_reboot(efx);
 
+	rc = efx_mcdi_mac_init_stats(efx);
+	if (rc)
+		goto fail;
+
 	efx_start_all(efx);
 
 	/* Link state detection is normally event-driven; we have
@@ -212,6 +217,7 @@ static const struct net_device_ops ef100_netdev_ops = {
 	.ndo_open               = ef100_net_open,
 	.ndo_stop               = ef100_net_stop,
 	.ndo_start_xmit         = ef100_hard_start_xmit,
+	.ndo_get_stats64        = efx_net_stats,
 	.ndo_validate_addr      = eth_validate_addr,
 	.ndo_set_rx_mode        = efx_set_rx_mode, /* Lookout */
 	.ndo_get_phys_port_id   = efx_get_phys_port_id,
diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index c96cbf3e0111..1c93091609bd 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -453,6 +453,172 @@ static int ef100_reset(struct efx_nic *efx, enum reset_type reset_type)
 	return rc;
 }
 
+static void ef100_common_stat_mask(unsigned long *mask)
+{
+	__set_bit(EF100_STAT_port_tx_bytes, mask);
+	__set_bit(EF100_STAT_port_tx_packets, mask);
+	__set_bit(EF100_STAT_port_tx_pause, mask);
+	__set_bit(EF100_STAT_port_tx_unicast, mask);
+	__set_bit(EF100_STAT_port_tx_multicast, mask);
+	__set_bit(EF100_STAT_port_tx_broadcast, mask);
+	__set_bit(EF100_STAT_port_tx_lt64, mask);
+	__set_bit(EF100_STAT_port_tx_64, mask);
+	__set_bit(EF100_STAT_port_tx_65_to_127, mask);
+	__set_bit(EF100_STAT_port_tx_128_to_255, mask);
+	__set_bit(EF100_STAT_port_tx_256_to_511, mask);
+	__set_bit(EF100_STAT_port_tx_512_to_1023, mask);
+	__set_bit(EF100_STAT_port_tx_1024_to_15xx, mask);
+	__set_bit(EF100_STAT_port_tx_15xx_to_jumbo, mask);
+	__set_bit(EF100_STAT_port_rx_bytes, mask);
+	__set_bit(EF100_STAT_port_rx_packets, mask);
+	__set_bit(EF100_STAT_port_rx_good, mask);
+	__set_bit(EF100_STAT_port_rx_bad, mask);
+	__set_bit(EF100_STAT_port_rx_pause, mask);
+	__set_bit(EF100_STAT_port_rx_unicast, mask);
+	__set_bit(EF100_STAT_port_rx_multicast, mask);
+	__set_bit(EF100_STAT_port_rx_broadcast, mask);
+	__set_bit(EF100_STAT_port_rx_lt64, mask);
+	__set_bit(EF100_STAT_port_rx_64, mask);
+	__set_bit(EF100_STAT_port_rx_65_to_127, mask);
+	__set_bit(EF100_STAT_port_rx_128_to_255, mask);
+	__set_bit(EF100_STAT_port_rx_256_to_511, mask);
+	__set_bit(EF100_STAT_port_rx_512_to_1023, mask);
+	__set_bit(EF100_STAT_port_rx_1024_to_15xx, mask);
+	__set_bit(EF100_STAT_port_rx_15xx_to_jumbo, mask);
+	__set_bit(EF100_STAT_port_rx_gtjumbo, mask);
+	__set_bit(EF100_STAT_port_rx_bad_gtjumbo, mask);
+	__set_bit(EF100_STAT_port_rx_align_error, mask);
+	__set_bit(EF100_STAT_port_rx_length_error, mask);
+	__set_bit(EF100_STAT_port_rx_overflow, mask);
+	__set_bit(EF100_STAT_port_rx_nodesc_drops, mask);
+	__set_bit(GENERIC_STAT_rx_nodesc_trunc, mask);
+	__set_bit(GENERIC_STAT_rx_noskb_drops, mask);
+}
+
+#define EF100_DMA_STAT(ext_name, mcdi_name)			\
+	[EF100_STAT_ ## ext_name] =				\
+	{ #ext_name, 64, 8 * MC_CMD_MAC_ ## mcdi_name }
+
+static const struct efx_hw_stat_desc ef100_stat_desc[EF100_STAT_COUNT] = {
+	EF100_DMA_STAT(port_tx_bytes, TX_BYTES),
+	EF100_DMA_STAT(port_tx_packets, TX_PKTS),
+	EF100_DMA_STAT(port_tx_pause, TX_PAUSE_PKTS),
+	EF100_DMA_STAT(port_tx_unicast, TX_UNICAST_PKTS),
+	EF100_DMA_STAT(port_tx_multicast, TX_MULTICAST_PKTS),
+	EF100_DMA_STAT(port_tx_broadcast, TX_BROADCAST_PKTS),
+	EF100_DMA_STAT(port_tx_lt64, TX_LT64_PKTS),
+	EF100_DMA_STAT(port_tx_64, TX_64_PKTS),
+	EF100_DMA_STAT(port_tx_65_to_127, TX_65_TO_127_PKTS),
+	EF100_DMA_STAT(port_tx_128_to_255, TX_128_TO_255_PKTS),
+	EF100_DMA_STAT(port_tx_256_to_511, TX_256_TO_511_PKTS),
+	EF100_DMA_STAT(port_tx_512_to_1023, TX_512_TO_1023_PKTS),
+	EF100_DMA_STAT(port_tx_1024_to_15xx, TX_1024_TO_15XX_PKTS),
+	EF100_DMA_STAT(port_tx_15xx_to_jumbo, TX_15XX_TO_JUMBO_PKTS),
+	EF100_DMA_STAT(port_rx_bytes, RX_BYTES),
+	EF100_DMA_STAT(port_rx_packets, RX_PKTS),
+	EF100_DMA_STAT(port_rx_good, RX_GOOD_PKTS),
+	EF100_DMA_STAT(port_rx_bad, RX_BAD_FCS_PKTS),
+	EF100_DMA_STAT(port_rx_pause, RX_PAUSE_PKTS),
+	EF100_DMA_STAT(port_rx_unicast, RX_UNICAST_PKTS),
+	EF100_DMA_STAT(port_rx_multicast, RX_MULTICAST_PKTS),
+	EF100_DMA_STAT(port_rx_broadcast, RX_BROADCAST_PKTS),
+	EF100_DMA_STAT(port_rx_lt64, RX_UNDERSIZE_PKTS),
+	EF100_DMA_STAT(port_rx_64, RX_64_PKTS),
+	EF100_DMA_STAT(port_rx_65_to_127, RX_65_TO_127_PKTS),
+	EF100_DMA_STAT(port_rx_128_to_255, RX_128_TO_255_PKTS),
+	EF100_DMA_STAT(port_rx_256_to_511, RX_256_TO_511_PKTS),
+	EF100_DMA_STAT(port_rx_512_to_1023, RX_512_TO_1023_PKTS),
+	EF100_DMA_STAT(port_rx_1024_to_15xx, RX_1024_TO_15XX_PKTS),
+	EF100_DMA_STAT(port_rx_15xx_to_jumbo, RX_15XX_TO_JUMBO_PKTS),
+	EF100_DMA_STAT(port_rx_gtjumbo, RX_GTJUMBO_PKTS),
+	EF100_DMA_STAT(port_rx_bad_gtjumbo, RX_JABBER_PKTS),
+	EF100_DMA_STAT(port_rx_align_error, RX_ALIGN_ERROR_PKTS),
+	EF100_DMA_STAT(port_rx_length_error, RX_LENGTH_ERROR_PKTS),
+	EF100_DMA_STAT(port_rx_overflow, RX_OVERFLOW_PKTS),
+	EF100_DMA_STAT(port_rx_nodesc_drops, RX_NODESC_DROPS),
+	EFX_GENERIC_SW_STAT(rx_nodesc_trunc),
+	EFX_GENERIC_SW_STAT(rx_noskb_drops),
+};
+
+static void ef100_get_stat_mask(struct efx_nic *efx, unsigned long *mask)
+{
+	ef100_common_stat_mask(mask);
+}
+
+static size_t ef100_describe_stats(struct efx_nic *efx, u8 *names)
+{
+	DECLARE_BITMAP(mask, EF100_STAT_COUNT);
+
+	ef100_get_stat_mask(efx, mask);
+	return efx_nic_describe_stats(ef100_stat_desc, EF100_STAT_COUNT,
+				      mask, names);
+}
+
+static size_t ef100_update_stats_common(struct efx_nic *efx, u64 *full_stats,
+					struct rtnl_link_stats64 *core_stats)
+{
+	struct ef100_nic_data *nic_data = efx->nic_data;
+	DECLARE_BITMAP(mask, EF100_STAT_COUNT);
+	size_t stats_count = 0, index;
+	u64 *stats = nic_data->stats;
+
+	ef100_get_stat_mask(efx, mask);
+
+	if (full_stats) {
+		for_each_set_bit(index, mask, EF100_STAT_COUNT) {
+			if (ef100_stat_desc[index].name) {
+				*full_stats++ = stats[index];
+				++stats_count;
+			}
+		}
+	}
+
+	if (!core_stats)
+		return stats_count;
+
+	core_stats->rx_packets = stats[EF100_STAT_port_rx_packets];
+	core_stats->tx_packets = stats[EF100_STAT_port_tx_packets];
+	core_stats->rx_bytes = stats[EF100_STAT_port_rx_bytes];
+	core_stats->tx_bytes = stats[EF100_STAT_port_tx_bytes];
+	core_stats->rx_dropped = stats[EF100_STAT_port_rx_nodesc_drops] +
+				 stats[GENERIC_STAT_rx_nodesc_trunc] +
+				 stats[GENERIC_STAT_rx_noskb_drops];
+	core_stats->multicast = stats[EF100_STAT_port_rx_multicast];
+	core_stats->rx_length_errors =
+			stats[EF100_STAT_port_rx_gtjumbo] +
+			stats[EF100_STAT_port_rx_length_error];
+	core_stats->rx_crc_errors = stats[EF100_STAT_port_rx_bad];
+	core_stats->rx_frame_errors =
+			stats[EF100_STAT_port_rx_align_error];
+	core_stats->rx_fifo_errors = stats[EF100_STAT_port_rx_overflow];
+	core_stats->rx_errors = (core_stats->rx_length_errors +
+				 core_stats->rx_crc_errors +
+				 core_stats->rx_frame_errors);
+
+	return stats_count;
+}
+
+static size_t ef100_update_stats(struct efx_nic *efx,
+				 u64 *full_stats,
+				 struct rtnl_link_stats64 *core_stats)
+{
+	struct ef100_nic_data *nic_data = efx->nic_data;
+	DECLARE_BITMAP(mask, EF100_STAT_COUNT);
+	__le64 *mc_stats = kmalloc(efx->num_mac_stats * sizeof(__le64),
+				   GFP_ATOMIC);
+	u64 *stats = nic_data->stats;
+
+	ef100_get_stat_mask(efx, mask);
+
+	efx_nic_copy_stats(efx, mc_stats);
+	efx_nic_update_stats(ef100_stat_desc, EF100_STAT_COUNT, mask,
+			     stats, mc_stats, false);
+
+	kfree(mc_stats);
+
+	return ef100_update_stats_common(efx, full_stats, core_stats);
+}
+
 static int efx_ef100_get_phys_port_id(struct efx_nic *efx,
 				      struct netdev_phys_item_id *ppid)
 {
@@ -557,6 +723,11 @@ const struct efx_nic_type ef100_pf_nic_type = {
 	.rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
 
 	.reconfigure_mac = ef100_reconfigure_mac,
+	.describe_stats = ef100_describe_stats,
+	.start_stats = efx_mcdi_mac_start_stats,
+	.update_stats = ef100_update_stats,
+	.pull_stats = efx_mcdi_mac_pull_stats,
+	.stop_stats = efx_mcdi_mac_stop_stats,
 
 	/* Per-type bar/size configuration not used on ef100. Location of
 	 * registers is defined by extended capabilities.
diff --git a/drivers/net/ethernet/sfc/ef100_nic.h b/drivers/net/ethernet/sfc/ef100_nic.h
index c8816bc6ae78..7c2d37490074 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.h
+++ b/drivers/net/ethernet/sfc/ef100_nic.h
@@ -17,6 +17,46 @@ extern const struct efx_nic_type ef100_pf_nic_type;
 int ef100_probe_pf(struct efx_nic *efx);
 void ef100_remove(struct efx_nic *efx);
 
+enum {
+	EF100_STAT_port_tx_bytes = GENERIC_STAT_COUNT,
+	EF100_STAT_port_tx_packets,
+	EF100_STAT_port_tx_pause,
+	EF100_STAT_port_tx_unicast,
+	EF100_STAT_port_tx_multicast,
+	EF100_STAT_port_tx_broadcast,
+	EF100_STAT_port_tx_lt64,
+	EF100_STAT_port_tx_64,
+	EF100_STAT_port_tx_65_to_127,
+	EF100_STAT_port_tx_128_to_255,
+	EF100_STAT_port_tx_256_to_511,
+	EF100_STAT_port_tx_512_to_1023,
+	EF100_STAT_port_tx_1024_to_15xx,
+	EF100_STAT_port_tx_15xx_to_jumbo,
+	EF100_STAT_port_rx_bytes,
+	EF100_STAT_port_rx_packets,
+	EF100_STAT_port_rx_good,
+	EF100_STAT_port_rx_bad,
+	EF100_STAT_port_rx_pause,
+	EF100_STAT_port_rx_unicast,
+	EF100_STAT_port_rx_multicast,
+	EF100_STAT_port_rx_broadcast,
+	EF100_STAT_port_rx_lt64,
+	EF100_STAT_port_rx_64,
+	EF100_STAT_port_rx_65_to_127,
+	EF100_STAT_port_rx_128_to_255,
+	EF100_STAT_port_rx_256_to_511,
+	EF100_STAT_port_rx_512_to_1023,
+	EF100_STAT_port_rx_1024_to_15xx,
+	EF100_STAT_port_rx_15xx_to_jumbo,
+	EF100_STAT_port_rx_gtjumbo,
+	EF100_STAT_port_rx_bad_gtjumbo,
+	EF100_STAT_port_rx_align_error,
+	EF100_STAT_port_rx_length_error,
+	EF100_STAT_port_rx_overflow,
+	EF100_STAT_port_rx_nodesc_drops,
+	EF100_STAT_COUNT
+};
+
 struct ef100_nic_data {
 	struct efx_nic *efx;
 	struct efx_buffer mcdi_buf;
@@ -26,6 +66,7 @@ struct ef100_nic_data {
 	u16 warm_boot_count;
 	u8 port_id[ETH_ALEN];
 	DECLARE_BITMAP(evq_phases, EFX_MAX_CHANNELS);
+	u64 stats[EF100_STAT_COUNT];
 	u16 tso_max_hdr_len;
 	u16 tso_max_payload_num_segs;
 	u16 tso_max_frames;


^ permalink raw reply related

* [PATCH v2 net-next 07/11] sfc_ef100: plumb in fini_dmaq
From: Edward Cree @ 2020-07-31 13:00 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

Bring down the TX and RX queues at ifdown, so that we can then fini the
 EVQs (otherwise the MC would return EBUSY because they're still in use).

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100_nic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index e299c62a5012..c96cbf3e0111 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -528,6 +528,7 @@ const struct efx_nic_type ef100_pf_nic_type = {
 	.rx_remove = efx_mcdi_rx_remove,
 	.rx_write = ef100_rx_write,
 	.rx_packet = __ef100_rx_packet,
+	.fini_dmaq = efx_fini_dmaq,
 	.max_rx_ip_filters = EFX_MCDI_FILTER_TBL_ROWS,
 	.filter_table_probe = ef100_filter_table_up,
 	.filter_table_restore = efx_mcdi_filter_table_restore,


^ permalink raw reply related

* [PATCH v2 net-next 06/11] sfc_ef100: RX path for EF100
From: Edward Cree @ 2020-07-31 12:59 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

Includes RSS spreading.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100_nic.c |  25 ++++-
 drivers/net/ethernet/sfc/ef100_rx.c  | 150 +++++++++++++++++++++++++--
 drivers/net/ethernet/sfc/ef100_rx.h  |   1 +
 3 files changed, 167 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index 95221b829ec4..e299c62a5012 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -260,6 +260,10 @@ static int ef100_ev_process(struct efx_channel *channel, int quota)
 		ev_type = EFX_QWORD_FIELD(*p_event, ESF_GZ_E_TYPE);
 
 		switch (ev_type) {
+		case ESE_GZ_EF100_EV_RX_PKTS:
+			efx_ef100_ev_rx(channel, p_event);
+			++spent;
+			break;
 		case ESE_GZ_EF100_EV_MCDI:
 			efx_mcdi_process_event(channel, p_event);
 			break;
@@ -482,9 +486,10 @@ static unsigned int ef100_check_caps(const struct efx_nic *efx,
 
 /*	NIC level access functions
  */
-#define EF100_OFFLOAD_FEATURES	(NETIF_F_HW_CSUM |			\
+#define EF100_OFFLOAD_FEATURES	(NETIF_F_HW_CSUM | NETIF_F_RXCSUM |	\
 	NETIF_F_HIGHDMA | NETIF_F_SG | NETIF_F_FRAGLIST |		\
-	NETIF_F_TSO_ECN | NETIF_F_TSO_MANGLEID | NETIF_F_HW_VLAN_CTAG_TX)
+	NETIF_F_RXHASH | NETIF_F_RXFCS | NETIF_F_TSO_ECN | NETIF_F_RXALL | \
+	NETIF_F_TSO_MANGLEID | NETIF_F_HW_VLAN_CTAG_TX)
 
 const struct efx_nic_type ef100_pf_nic_type = {
 	.revision = EFX_REV_EF100,
@@ -540,6 +545,16 @@ const struct efx_nic_type ef100_pf_nic_type = {
 
 	.get_phys_port_id = efx_ef100_get_phys_port_id,
 
+	.rx_prefix_size = ESE_GZ_RX_PKT_PREFIX_LEN,
+	.rx_hash_offset = ESF_GZ_RX_PREFIX_RSS_HASH_LBN / 8,
+	.rx_ts_offset = ESF_GZ_RX_PREFIX_PARTIAL_TSTAMP_LBN / 8,
+	.rx_hash_key_size = 40,
+	.rx_pull_rss_config = efx_mcdi_rx_pull_rss_config,
+	.rx_push_rss_config = efx_mcdi_pf_rx_push_rss_config,
+	.rx_push_rss_context_config = efx_mcdi_rx_push_rss_context_config,
+	.rx_pull_rss_context_config = efx_mcdi_rx_pull_rss_context_config,
+	.rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
+
 	.reconfigure_mac = ef100_reconfigure_mac,
 
 	/* Per-type bar/size configuration not used on ef100. Location of
@@ -888,6 +903,12 @@ static int ef100_probe_main(struct efx_nic *efx)
 	if (rc)
 		goto fail;
 
+	netdev_rss_key_fill(efx->rss_context.rx_hash_key,
+			    sizeof(efx->rss_context.rx_hash_key));
+
+	/* Don't fail init if RSS setup doesn't work. */
+	efx_mcdi_push_default_indir_table(efx, efx->n_rx_channels);
+
 	rc = ef100_register_netdev(efx);
 	if (rc)
 		goto fail;
diff --git a/drivers/net/ethernet/sfc/ef100_rx.c b/drivers/net/ethernet/sfc/ef100_rx.c
index 4223a38f46d3..13ba1a4f66fc 100644
--- a/drivers/net/ethernet/sfc/ef100_rx.c
+++ b/drivers/net/ethernet/sfc/ef100_rx.c
@@ -12,20 +12,156 @@
 #include "ef100_rx.h"
 #include "rx_common.h"
 #include "efx.h"
+#include "nic_common.h"
+#include "mcdi_functions.h"
+#include "ef100_regs.h"
+#include "ef100_nic.h"
+#include "io.h"
 
-/* RX stubs */
+/* Get the value of a field in the RX prefix */
+#define PREFIX_OFFSET_W(_f)	(ESF_GZ_RX_PREFIX_ ## _f ## _LBN / 32)
+#define PREFIX_OFFSET_B(_f)	(ESF_GZ_RX_PREFIX_ ## _f ## _LBN % 32)
+#define PREFIX_WIDTH_MASK(_f)	((1UL << ESF_GZ_RX_PREFIX_ ## _f ## _WIDTH) - 1)
+#define PREFIX_WORD(_p, _f)	le32_to_cpu((__force __le32)(_p)[PREFIX_OFFSET_W(_f)])
+#define PREFIX_FIELD(_p, _f)	((PREFIX_WORD(_p, _f) >> PREFIX_OFFSET_B(_f)) & \
+				 PREFIX_WIDTH_MASK(_f))
 
-void ef100_rx_write(struct efx_rx_queue *rx_queue)
+#define ESF_GZ_RX_PREFIX_NT_OR_INNER_L3_CLASS_LBN	\
+		(ESF_GZ_RX_PREFIX_CLASS_LBN + ESF_GZ_RX_PREFIX_HCLASS_NT_OR_INNER_L3_CLASS_LBN)
+#define ESF_GZ_RX_PREFIX_NT_OR_INNER_L3_CLASS_WIDTH	\
+		ESF_GZ_RX_PREFIX_HCLASS_NT_OR_INNER_L3_CLASS_WIDTH
+
+static bool check_fcs(struct efx_channel *channel, u32 *prefix)
 {
+	u16 rxclass;
+	u8 l2status;
+
+	rxclass = le16_to_cpu((__force __le16)PREFIX_FIELD(prefix, CLASS));
+	l2status = PREFIX_FIELD(&rxclass, HCLASS_L2_STATUS);
+
+	if (likely(l2status == ESE_GZ_RH_HCLASS_L2_STATUS_OK))
+		/* Everything is ok */
+		return 0;
+
+	if (l2status == ESE_GZ_RH_HCLASS_L2_STATUS_FCS_ERR)
+		channel->n_rx_eth_crc_err++;
+	return 1;
 }
 
 void __ef100_rx_packet(struct efx_channel *channel)
 {
-	/* Stub.  No RX path yet.  Discard the buffer. */
-	struct efx_rx_buffer *rx_buf = efx_rx_buffer(&channel->rx_queue,
-						     channel->rx_pkt_index);
-	struct efx_rx_queue *rx_queue = efx_channel_get_rx_queue(channel);
+	struct efx_rx_buffer *rx_buf = efx_rx_buffer(&channel->rx_queue, channel->rx_pkt_index);
+	struct efx_nic *efx = channel->efx;
+	u8 *eh = efx_rx_buf_va(rx_buf);
+	__wsum csum = 0;
+	u32 *prefix;
+
+	prefix = (u32 *)(eh - ESE_GZ_RX_PKT_PREFIX_LEN);
+
+	if (check_fcs(channel, prefix) &&
+	    unlikely(!(efx->net_dev->features & NETIF_F_RXALL)))
+		goto out;
+
+	rx_buf->len = le16_to_cpu((__force __le16)PREFIX_FIELD(prefix, LENGTH));
+	if (rx_buf->len <= sizeof(struct ethhdr)) {
+		if (net_ratelimit())
+			netif_err(channel->efx, rx_err, channel->efx->net_dev,
+				  "RX packet too small (%d)\n", rx_buf->len);
+		++channel->n_rx_frm_trunc;
+		goto out;
+	}
+
+	if (likely(efx->net_dev->features & NETIF_F_RXCSUM)) {
+		if (PREFIX_FIELD(prefix, NT_OR_INNER_L3_CLASS) == 1) {
+			++channel->n_rx_ip_hdr_chksum_err;
+		} else {
+			u16 sum = be16_to_cpu((__force __be16)PREFIX_FIELD(prefix, CSUM_FRAME));
 
-	efx_free_rx_buffers(rx_queue, rx_buf, 1);
+			csum = (__force __wsum) sum;
+		}
+	}
+
+	if (channel->type->receive_skb) {
+		struct efx_rx_queue *rx_queue =
+			efx_channel_get_rx_queue(channel);
+
+		/* no support for special channels yet, so just discard */
+		WARN_ON_ONCE(1);
+		efx_free_rx_buffers(rx_queue, rx_buf, 1);
+		goto out;
+	}
+
+	efx_rx_packet_gro(channel, rx_buf, channel->rx_pkt_n_frags, eh, csum);
+
+out:
 	channel->rx_pkt_n_frags = 0;
 }
+
+static void ef100_rx_packet(struct efx_rx_queue *rx_queue, unsigned int index)
+{
+	struct efx_rx_buffer *rx_buf = efx_rx_buffer(rx_queue, index);
+	struct efx_channel *channel = efx_rx_queue_channel(rx_queue);
+	struct efx_nic *efx = rx_queue->efx;
+
+	++rx_queue->rx_packets;
+
+	netif_vdbg(efx, rx_status, efx->net_dev,
+		   "RX queue %d received id %x\n",
+		   efx_rx_queue_index(rx_queue), index);
+
+	efx_sync_rx_buffer(efx, rx_buf, efx->rx_dma_len);
+
+	prefetch(efx_rx_buf_va(rx_buf));
+
+	rx_buf->page_offset += efx->rx_prefix_size;
+
+	efx_recycle_rx_pages(channel, rx_buf, 1);
+
+	efx_rx_flush_packet(channel);
+	channel->rx_pkt_n_frags = 1;
+	channel->rx_pkt_index = index;
+}
+
+void efx_ef100_ev_rx(struct efx_channel *channel, const efx_qword_t *p_event)
+{
+	struct efx_rx_queue *rx_queue = efx_channel_get_rx_queue(channel);
+	unsigned int n_packets =
+		EFX_QWORD_FIELD(*p_event, ESF_GZ_EV_RXPKTS_NUM_PKT);
+	int i;
+
+	WARN_ON_ONCE(!n_packets);
+	if (n_packets > 1)
+		++channel->n_rx_merge_events;
+
+	channel->irq_mod_score += 2 * n_packets;
+
+	for (i = 0; i < n_packets; ++i) {
+		ef100_rx_packet(rx_queue,
+				rx_queue->removed_count & rx_queue->ptr_mask);
+		++rx_queue->removed_count;
+	}
+}
+
+void ef100_rx_write(struct efx_rx_queue *rx_queue)
+{
+	struct efx_rx_buffer *rx_buf;
+	unsigned int idx;
+	efx_qword_t *rxd;
+	efx_dword_t rxdb;
+
+	while (rx_queue->notified_count != rx_queue->added_count) {
+		idx = rx_queue->notified_count & rx_queue->ptr_mask;
+		rx_buf = efx_rx_buffer(rx_queue, idx);
+		rxd = efx_rx_desc(rx_queue, idx);
+
+		EFX_POPULATE_QWORD_1(*rxd, ESF_GZ_RX_BUF_ADDR, rx_buf->dma_addr);
+
+		++rx_queue->notified_count;
+	}
+
+	wmb();
+	EFX_POPULATE_DWORD_1(rxdb, ERF_GZ_RX_RING_PIDX,
+			     rx_queue->added_count & rx_queue->ptr_mask);
+	efx_writed_page(rx_queue->efx, &rxdb,
+			ER_GZ_RX_RING_DOORBELL, efx_rx_queue_index(rx_queue));
+}
diff --git a/drivers/net/ethernet/sfc/ef100_rx.h b/drivers/net/ethernet/sfc/ef100_rx.h
index b5dadf741aa0..f2f266863966 100644
--- a/drivers/net/ethernet/sfc/ef100_rx.h
+++ b/drivers/net/ethernet/sfc/ef100_rx.h
@@ -14,6 +14,7 @@
 
 #include "net_driver.h"
 
+void efx_ef100_ev_rx(struct efx_channel *channel, const efx_qword_t *p_event);
 void ef100_rx_write(struct efx_rx_queue *rx_queue);
 void __ef100_rx_packet(struct efx_channel *channel);
 


^ permalink raw reply related

* [PATCH v2 net-next 05/11] sfc_ef100: RX filter table management and related gubbins
From: Edward Cree @ 2020-07-31 12:59 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100_netdev.c | 10 ++++
 drivers/net/ethernet/sfc/ef100_nic.c    | 67 +++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef100_netdev.c b/drivers/net/ethernet/sfc/ef100_netdev.c
index ec9ca81fed85..362a915c836a 100644
--- a/drivers/net/ethernet/sfc/ef100_netdev.c
+++ b/drivers/net/ethernet/sfc/ef100_netdev.c
@@ -89,6 +89,7 @@ static int ef100_net_stop(struct net_device *net_dev)
 	efx_disable_interrupts(efx);
 	efx_clear_interrupt_affinity(efx);
 	efx_nic_fini_interrupt(efx);
+	efx_remove_filters(efx);
 	efx_fini_napi(efx);
 	efx_remove_channels(efx);
 	efx_mcdi_free_vis(efx);
@@ -138,6 +139,10 @@ static int ef100_net_open(struct net_device *net_dev)
 
 	efx_init_napi(efx);
 
+	rc = efx_probe_filters(efx);
+	if (rc)
+		goto fail;
+
 	rc = efx_nic_init_interrupt(efx);
 	if (rc)
 		goto fail;
@@ -207,8 +212,13 @@ static const struct net_device_ops ef100_netdev_ops = {
 	.ndo_open               = ef100_net_open,
 	.ndo_stop               = ef100_net_stop,
 	.ndo_start_xmit         = ef100_hard_start_xmit,
+	.ndo_validate_addr      = eth_validate_addr,
+	.ndo_set_rx_mode        = efx_set_rx_mode, /* Lookout */
 	.ndo_get_phys_port_id   = efx_get_phys_port_id,
 	.ndo_get_phys_port_name = efx_get_phys_port_name,
+#ifdef CONFIG_RFS_ACCEL
+	.ndo_rx_flow_steer      = efx_filter_rfs,
+#endif
 };
 
 /*	Netdev registration
diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index 5d576bd99059..95221b829ec4 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -347,6 +347,37 @@ static int ef100_phy_probe(struct efx_nic *efx)
 	return 0;
 }
 
+static int ef100_filter_table_probe(struct efx_nic *efx)
+{
+	return efx_mcdi_filter_table_probe(efx, true);
+}
+
+static int ef100_filter_table_up(struct efx_nic *efx)
+{
+	int rc;
+
+	rc = efx_mcdi_filter_add_vlan(efx, EFX_FILTER_VID_UNSPEC);
+	if (rc) {
+		efx_mcdi_filter_table_down(efx);
+		return rc;
+	}
+
+	rc = efx_mcdi_filter_add_vlan(efx, 0);
+	if (rc) {
+		efx_mcdi_filter_del_vlan(efx, EFX_FILTER_VID_UNSPEC);
+		efx_mcdi_filter_table_down(efx);
+	}
+
+	return rc;
+}
+
+static void ef100_filter_table_down(struct efx_nic *efx)
+{
+	efx_mcdi_filter_del_vlan(efx, 0);
+	efx_mcdi_filter_del_vlan(efx, EFX_FILTER_VID_UNSPEC);
+	efx_mcdi_filter_table_down(efx);
+}
+
 /*	Other
  */
 static int ef100_reconfigure_mac(struct efx_nic *efx, bool mtu_only)
@@ -393,12 +424,24 @@ static int ef100_reset(struct efx_nic *efx, enum reset_type reset_type)
 		__clear_bit(reset_type, &efx->reset_pending);
 		rc = dev_open(efx->net_dev, NULL);
 	} else if (reset_type == RESET_TYPE_ALL) {
+		/* A RESET_TYPE_ALL will cause filters to be removed, so we remove filters
+		 * and reprobe after reset to avoid removing filters twice
+		 */
+		down_read(&efx->filter_sem);
+		ef100_filter_table_down(efx);
+		up_read(&efx->filter_sem);
 		rc = efx_mcdi_reset(efx, reset_type);
 		if (rc)
 			return rc;
 
 		netif_device_attach(efx->net_dev);
 
+		down_read(&efx->filter_sem);
+		rc = ef100_filter_table_up(efx);
+		up_read(&efx->filter_sem);
+		if (rc)
+			return rc;
+
 		rc = dev_open(efx->net_dev, NULL);
 	} else {
 		rc = 1;	/* Leave the device closed */
@@ -480,6 +523,20 @@ const struct efx_nic_type ef100_pf_nic_type = {
 	.rx_remove = efx_mcdi_rx_remove,
 	.rx_write = ef100_rx_write,
 	.rx_packet = __ef100_rx_packet,
+	.max_rx_ip_filters = EFX_MCDI_FILTER_TBL_ROWS,
+	.filter_table_probe = ef100_filter_table_up,
+	.filter_table_restore = efx_mcdi_filter_table_restore,
+	.filter_table_remove = ef100_filter_table_down,
+	.filter_insert = efx_mcdi_filter_insert,
+	.filter_remove_safe = efx_mcdi_filter_remove_safe,
+	.filter_get_safe = efx_mcdi_filter_get_safe,
+	.filter_clear_rx = efx_mcdi_filter_clear_rx,
+	.filter_count_rx_used = efx_mcdi_filter_count_rx_used,
+	.filter_get_rx_id_limit = efx_mcdi_filter_get_rx_id_limit,
+	.filter_get_rx_ids = efx_mcdi_filter_get_rx_ids,
+#ifdef CONFIG_RFS_ACCEL
+	.filter_rfs_expire_one = efx_mcdi_filter_rfs_expire_one,
+#endif
 
 	.get_phys_port_id = efx_ef100_get_phys_port_id,
 
@@ -825,6 +882,12 @@ static int ef100_probe_main(struct efx_nic *efx)
 	if (rc)
 		goto fail;
 
+	down_write(&efx->filter_sem);
+	rc = ef100_filter_table_probe(efx);
+	up_write(&efx->filter_sem);
+	if (rc)
+		goto fail;
+
 	rc = ef100_register_netdev(efx);
 	if (rc)
 		goto fail;
@@ -862,6 +925,10 @@ void ef100_remove(struct efx_nic *efx)
 	struct ef100_nic_data *nic_data = efx->nic_data;
 
 	ef100_unregister_netdev(efx);
+
+	down_write(&efx->filter_sem);
+	efx_mcdi_filter_table_remove(efx);
+	up_write(&efx->filter_sem);
 	efx_fini_channels(efx);
 	kfree(efx->phy_data);
 	efx->phy_data = NULL;


^ permalink raw reply related

* [PATCH v2 net-next 04/11] sfc_ef100: TX path for EF100 NICs
From: Edward Cree @ 2020-07-31 12:59 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

Includes checksum offload and TSO, so declare those in our netdev features.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100_nic.c  |   8 +
 drivers/net/ethernet/sfc/ef100_tx.c   | 368 +++++++++++++++++++++++++-
 drivers/net/ethernet/sfc/ef100_tx.h   |   4 +
 drivers/net/ethernet/sfc/net_driver.h |  21 ++
 drivers/net/ethernet/sfc/tx_common.c  |   1 +
 5 files changed, 397 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index 963d6c835cba..5d576bd99059 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -263,6 +263,9 @@ static int ef100_ev_process(struct efx_channel *channel, int quota)
 		case ESE_GZ_EF100_EV_MCDI:
 			efx_mcdi_process_event(channel, p_event);
 			break;
+		case ESE_GZ_EF100_EV_TX_COMPLETION:
+			ef100_ev_tx(channel, p_event);
+			break;
 		case ESE_GZ_EF100_EV_DRIVER:
 			netif_info(efx, drv, efx->net_dev,
 				   "Driver initiated event " EFX_QWORD_FMT "\n",
@@ -436,10 +439,15 @@ static unsigned int ef100_check_caps(const struct efx_nic *efx,
 
 /*	NIC level access functions
  */
+#define EF100_OFFLOAD_FEATURES	(NETIF_F_HW_CSUM |			\
+	NETIF_F_HIGHDMA | NETIF_F_SG | NETIF_F_FRAGLIST |		\
+	NETIF_F_TSO_ECN | NETIF_F_TSO_MANGLEID | NETIF_F_HW_VLAN_CTAG_TX)
+
 const struct efx_nic_type ef100_pf_nic_type = {
 	.revision = EFX_REV_EF100,
 	.is_vf = false,
 	.probe = ef100_probe_pf,
+	.offload_features = EF100_OFFLOAD_FEATURES,
 	.mcdi_max_ver = 2,
 	.mcdi_request = ef100_mcdi_request,
 	.mcdi_poll_response = ef100_mcdi_poll_response,
diff --git a/drivers/net/ethernet/sfc/ef100_tx.c b/drivers/net/ethernet/sfc/ef100_tx.c
index 15e646f8c3e0..176dbf2d761c 100644
--- a/drivers/net/ethernet/sfc/ef100_tx.c
+++ b/drivers/net/ethernet/sfc/ef100_tx.c
@@ -9,14 +9,24 @@
  * by the Free Software Foundation, incorporated herein by reference.
  */
 
+#include <net/ip6_checksum.h>
+
 #include "net_driver.h"
 #include "tx_common.h"
 #include "nic_common.h"
+#include "mcdi_functions.h"
+#include "ef100_regs.h"
+#include "io.h"
 #include "ef100_tx.h"
+#include "ef100_nic.h"
 
-/* TX queue stubs */
 int ef100_tx_probe(struct efx_tx_queue *tx_queue)
 {
+	/* Allocate an extra descriptor for the QMDA status completion entry */
+	return efx_nic_alloc_buffer(tx_queue->efx, &tx_queue->txd.buf,
+				    (tx_queue->ptr_mask + 2) *
+				    sizeof(efx_oword_t),
+				    GFP_KERNEL);
 	return 0;
 }
 
@@ -27,10 +37,287 @@ void ef100_tx_init(struct efx_tx_queue *tx_queue)
 		netdev_get_tx_queue(tx_queue->efx->net_dev,
 				    tx_queue->channel->channel -
 				    tx_queue->efx->tx_channel_offset);
+
+	if (efx_mcdi_tx_init(tx_queue, false))
+		netdev_WARN(tx_queue->efx->net_dev,
+			    "failed to initialise TXQ %d\n", tx_queue->queue);
+}
+
+static bool ef100_tx_can_tso(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
+{
+	struct efx_nic *efx = tx_queue->efx;
+	struct ef100_nic_data *nic_data;
+	struct efx_tx_buffer *buffer;
+	struct tcphdr *tcphdr;
+	struct iphdr *iphdr;
+	size_t header_len;
+	u32 mss;
+
+	nic_data = efx->nic_data;
+
+	if (!skb_is_gso_tcp(skb))
+		return false;
+	if (!(efx->net_dev->features & NETIF_F_TSO))
+		return false;
+
+	mss = skb_shinfo(skb)->gso_size;
+	if (unlikely(mss < 4)) {
+		WARN_ONCE(1, "MSS of %u is too small for TSO\n", mss);
+		return false;
+	}
+
+	header_len = efx_tx_tso_header_length(skb);
+	if (header_len > nic_data->tso_max_hdr_len)
+		return false;
+
+	if (skb_shinfo(skb)->gso_segs > nic_data->tso_max_payload_num_segs) {
+		/* net_dev->gso_max_segs should've caught this */
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	if (skb->data_len / mss > nic_data->tso_max_frames)
+		return false;
+
+	/* net_dev->gso_max_size should've caught this */
+	if (WARN_ON_ONCE(skb->data_len > nic_data->tso_max_payload_len))
+		return false;
+
+	/* Reserve an empty buffer for the TSO V3 descriptor.
+	 * Convey the length of the header since we already know it.
+	 */
+	buffer = efx_tx_queue_get_insert_buffer(tx_queue);
+	buffer->flags = EFX_TX_BUF_TSO_V3 | EFX_TX_BUF_CONT;
+	buffer->len = header_len;
+	buffer->unmap_len = 0;
+	buffer->skb = skb;
+	++tx_queue->insert_count;
+
+	/* Adjust the TCP checksum to exclude the total length, since we set
+	 * ED_INNER_IP_LEN in the descriptor.
+	 */
+	tcphdr = tcp_hdr(skb);
+	if (skb_is_gso_v6(skb)) {
+		tcphdr->check = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
+						 &ipv6_hdr(skb)->daddr,
+						 0, IPPROTO_TCP, 0);
+	} else {
+		iphdr = ip_hdr(skb);
+		tcphdr->check = ~csum_tcpudp_magic(iphdr->saddr, iphdr->daddr,
+						   0, IPPROTO_TCP, 0);
+	}
+	return true;
+}
+
+static inline efx_oword_t *ef100_tx_desc(struct efx_tx_queue *tx_queue,
+					 unsigned int index)
+{
+	if (likely(tx_queue->txd.buf.addr))
+		return ((efx_oword_t *)tx_queue->txd.buf.addr) + index;
+	else
+		return NULL;
+}
+
+void ef100_notify_tx_desc(struct efx_tx_queue *tx_queue)
+{
+	unsigned int write_ptr;
+	efx_dword_t reg;
+
+	if (unlikely(tx_queue->notify_count == tx_queue->write_count))
+		return;
+
+	write_ptr = tx_queue->write_count & tx_queue->ptr_mask;
+	/* The write pointer goes into the high word */
+	EFX_POPULATE_DWORD_1(reg, ERF_GZ_TX_RING_PIDX, write_ptr);
+	efx_writed_page(tx_queue->efx, &reg,
+			ER_GZ_TX_RING_DOORBELL, tx_queue->queue);
+	tx_queue->notify_count = tx_queue->write_count;
+	tx_queue->xmit_more_available = false;
+}
+
+static void ef100_tx_push_buffers(struct efx_tx_queue *tx_queue)
+{
+	ef100_notify_tx_desc(tx_queue);
+	++tx_queue->pushes;
+}
+
+static void ef100_set_tx_csum_partial(const struct sk_buff *skb,
+				      struct efx_tx_buffer *buffer, efx_oword_t *txd)
+{
+	efx_oword_t csum;
+	int csum_start;
+
+	if (!skb || skb->ip_summed != CHECKSUM_PARTIAL)
+		return;
+
+	/* skb->csum_start has the offset from head, but we need the offset
+	 * from data.
+	 */
+	csum_start = skb_checksum_start_offset(skb);
+	EFX_POPULATE_OWORD_3(csum,
+			     ESF_GZ_TX_SEND_CSO_PARTIAL_EN, 1,
+			     ESF_GZ_TX_SEND_CSO_PARTIAL_START_W,
+			     csum_start >> 1,
+			     ESF_GZ_TX_SEND_CSO_PARTIAL_CSUM_W,
+			     skb->csum_offset >> 1);
+	EFX_OR_OWORD(*txd, *txd, csum);
+}
+
+static void ef100_set_tx_hw_vlan(const struct sk_buff *skb, efx_oword_t *txd)
+{
+	u16 vlan_tci = skb_vlan_tag_get(skb);
+	efx_oword_t vlan;
+
+	EFX_POPULATE_OWORD_2(vlan,
+			     ESF_GZ_TX_SEND_VLAN_INSERT_EN, 1,
+			     ESF_GZ_TX_SEND_VLAN_INSERT_TCI, vlan_tci);
+	EFX_OR_OWORD(*txd, *txd, vlan);
+}
+
+static void ef100_make_send_desc(struct efx_nic *efx,
+				 const struct sk_buff *skb,
+				 struct efx_tx_buffer *buffer, efx_oword_t *txd,
+				 unsigned int segment_count)
+{
+	/* TX send descriptor */
+	EFX_POPULATE_OWORD_3(*txd,
+			     ESF_GZ_TX_SEND_NUM_SEGS, segment_count,
+			     ESF_GZ_TX_SEND_LEN, buffer->len,
+			     ESF_GZ_TX_SEND_ADDR, buffer->dma_addr);
+
+	if (likely(efx->net_dev->features & NETIF_F_HW_CSUM))
+		ef100_set_tx_csum_partial(skb, buffer, txd);
+	if (efx->net_dev->features & NETIF_F_HW_VLAN_CTAG_TX &&
+	    skb && skb_vlan_tag_present(skb))
+		ef100_set_tx_hw_vlan(skb, txd);
+}
+
+static void ef100_make_tso_desc(struct efx_nic *efx,
+				const struct sk_buff *skb,
+				struct efx_tx_buffer *buffer, efx_oword_t *txd,
+				unsigned int segment_count)
+{
+	u32 mangleid = (efx->net_dev->features & NETIF_F_TSO_MANGLEID) ||
+		skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID ?
+		ESE_GZ_TX_DESC_IP4_ID_NO_OP :
+		ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
+	u16 vlan_enable =  efx->net_dev->features & NETIF_F_HW_VLAN_CTAG_TX ?
+		skb_vlan_tag_present(skb) : 0;
+	unsigned int len, ip_offset, tcp_offset, payload_segs;
+	u16 vlan_tci = skb_vlan_tag_get(skb);
+	u32 mss = skb_shinfo(skb)->gso_size;
+
+	len = skb->len - buffer->len;
+	/* We use 1 for the TSO descriptor and 1 for the header */
+	payload_segs = segment_count - 2;
+	ip_offset =  skb_network_offset(skb);
+	tcp_offset = skb_transport_offset(skb);
+
+	EFX_POPULATE_OWORD_13(*txd,
+			      ESF_GZ_TX_DESC_TYPE, ESE_GZ_TX_DESC_TYPE_TSO,
+			      ESF_GZ_TX_TSO_MSS, mss,
+			      ESF_GZ_TX_TSO_HDR_NUM_SEGS, 1,
+			      ESF_GZ_TX_TSO_PAYLOAD_NUM_SEGS, payload_segs,
+			      ESF_GZ_TX_TSO_HDR_LEN_W, buffer->len >> 1,
+			      ESF_GZ_TX_TSO_PAYLOAD_LEN, len,
+			      ESF_GZ_TX_TSO_CSO_INNER_L4, 1,
+			      ESF_GZ_TX_TSO_INNER_L3_OFF_W, ip_offset >> 1,
+			      ESF_GZ_TX_TSO_INNER_L4_OFF_W, tcp_offset >> 1,
+			      ESF_GZ_TX_TSO_ED_INNER_IP4_ID, mangleid,
+			      ESF_GZ_TX_TSO_ED_INNER_IP_LEN, 1,
+			      ESF_GZ_TX_TSO_VLAN_INSERT_EN, vlan_enable,
+			      ESF_GZ_TX_TSO_VLAN_INSERT_TCI, vlan_tci
+		);
+}
+
+static void ef100_tx_make_descriptors(struct efx_tx_queue *tx_queue,
+				      const struct sk_buff *skb,
+				      unsigned int segment_count)
+{
+	unsigned int old_write_count = tx_queue->write_count;
+	unsigned int new_write_count = old_write_count;
+	struct efx_tx_buffer *buffer;
+	unsigned int next_desc_type;
+	unsigned int write_ptr;
+	efx_oword_t *txd;
+	unsigned int nr_descs = tx_queue->insert_count - old_write_count;
+
+	if (unlikely(nr_descs == 0))
+		return;
+
+	if (segment_count)
+		next_desc_type = ESE_GZ_TX_DESC_TYPE_TSO;
+	else
+		next_desc_type = ESE_GZ_TX_DESC_TYPE_SEND;
+
+	/* if it's a raw write (such as XDP) then always SEND single frames */
+	if (!skb)
+		nr_descs = 1;
+
+	do {
+		write_ptr = new_write_count & tx_queue->ptr_mask;
+		buffer = &tx_queue->buffer[write_ptr];
+		txd = ef100_tx_desc(tx_queue, write_ptr);
+		++new_write_count;
+
+		/* Create TX descriptor ring entry */
+		tx_queue->packet_write_count = new_write_count;
+
+		switch (next_desc_type) {
+		case ESE_GZ_TX_DESC_TYPE_SEND:
+			ef100_make_send_desc(tx_queue->efx, skb,
+					     buffer, txd, nr_descs);
+			break;
+		case ESE_GZ_TX_DESC_TYPE_TSO:
+			/* TX TSO descriptor */
+			WARN_ON_ONCE(!(buffer->flags & EFX_TX_BUF_TSO_V3));
+			ef100_make_tso_desc(tx_queue->efx, skb,
+					    buffer, txd, nr_descs);
+			break;
+		default:
+			/* TX segment descriptor */
+			EFX_POPULATE_OWORD_3(*txd,
+					     ESF_GZ_TX_DESC_TYPE, ESE_GZ_TX_DESC_TYPE_SEG,
+					     ESF_GZ_TX_SEG_LEN, buffer->len,
+					     ESF_GZ_TX_SEG_ADDR, buffer->dma_addr);
+		}
+		/* if it's a raw write (such as XDP) then always SEND */
+		next_desc_type = skb ? ESE_GZ_TX_DESC_TYPE_SEG :
+				       ESE_GZ_TX_DESC_TYPE_SEND;
+
+	} while (new_write_count != tx_queue->insert_count);
+
+	wmb(); /* Ensure descriptors are written before they are fetched */
+
+	tx_queue->write_count = new_write_count;
+
+	/* The write_count above must be updated before reading
+	 * channel->holdoff_doorbell to avoid a race with the
+	 * completion path, so ensure these operations are not
+	 * re-ordered.  This also flushes the update of write_count
+	 * back into the cache.
+	 */
+	smp_mb();
 }
 
 void ef100_tx_write(struct efx_tx_queue *tx_queue)
 {
+	ef100_tx_make_descriptors(tx_queue, NULL, 0);
+	ef100_tx_push_buffers(tx_queue);
+}
+
+void ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event)
+{
+	unsigned int tx_done =
+		EFX_QWORD_FIELD(*p_event, ESF_GZ_EV_TXCMPL_NUM_DESC);
+	unsigned int qlabel =
+		EFX_QWORD_FIELD(*p_event, ESF_GZ_EV_TXCMPL_Q_LABEL);
+	struct efx_tx_queue *tx_queue =
+		efx_channel_get_tx_queue(channel, qlabel);
+	unsigned int tx_index = (tx_queue->read_count + tx_done - 1) &
+				tx_queue->ptr_mask;
+
+	efx_xmit_done(tx_queue, tx_index);
 }
 
 /* Add a socket buffer to a TX queue
@@ -42,10 +329,81 @@ void ef100_tx_write(struct efx_tx_queue *tx_queue)
  */
 int ef100_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 {
-	/* Stub.  No TX path yet. */
+	unsigned int old_insert_count = tx_queue->insert_count;
 	struct efx_nic *efx = tx_queue->efx;
+	bool xmit_more = netdev_xmit_more();
+	unsigned int fill_level;
+	unsigned int segments;
+	int rc;
+
+	if (!tx_queue->buffer || !tx_queue->ptr_mask) {
+		netif_stop_queue(efx->net_dev);
+		dev_kfree_skb_any(skb);
+		return -ENODEV;
+	}
+
+	segments = skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 0;
+	if (segments == 1)
+		segments = 0;	/* Don't use TSO/GSO for a single segment. */
+	if (segments && !ef100_tx_can_tso(tx_queue, skb)) {
+		rc = efx_tx_tso_fallback(tx_queue, skb);
+		tx_queue->tso_fallbacks++;
+		if (rc)
+			goto err;
+		else
+			return 0;
+	}
+
+	/* Map for DMA and create descriptors */
+	rc = efx_tx_map_data(tx_queue, skb, segments);
+	if (rc)
+		goto err;
+	ef100_tx_make_descriptors(tx_queue, skb, segments);
+
+	fill_level = efx_channel_tx_fill_level(tx_queue->channel);
+	if (fill_level > efx->txq_stop_thresh) {
+		netif_tx_stop_queue(tx_queue->core_txq);
+		/* Re-read after a memory barrier in case we've raced with
+		 * the completion path. Otherwise there's a danger we'll never
+		 * restart the queue if all completions have just happened.
+		 */
+		smp_mb();
+		fill_level = efx_channel_tx_fill_level(tx_queue->channel);
+		if (fill_level < efx->txq_stop_thresh)
+			netif_tx_start_queue(tx_queue->core_txq);
+	}
+
+	if (__netdev_tx_sent_queue(tx_queue->core_txq, skb->len, xmit_more))
+		tx_queue->xmit_more_available = false; /* push doorbell */
+	else if (tx_queue->write_count - tx_queue->notify_count > 255)
+		/* Ensure we never push more than 256 packets at once */
+		tx_queue->xmit_more_available = false; /* push */
+	else
+		tx_queue->xmit_more_available = true; /* don't push yet */
+
+	if (!tx_queue->xmit_more_available)
+		ef100_tx_push_buffers(tx_queue);
+
+	if (segments) {
+		tx_queue->tso_bursts++;
+		tx_queue->tso_packets += segments;
+		tx_queue->tx_packets  += segments;
+	} else {
+		tx_queue->tx_packets++;
+	}
+	return 0;
+
+err:
+	efx_enqueue_unwind(tx_queue, old_insert_count);
+	if (!IS_ERR_OR_NULL(skb))
+		dev_kfree_skb_any(skb);
 
-	netif_stop_queue(efx->net_dev);
-	dev_kfree_skb_any(skb);
-	return -ENODEV;
+	/* If we're not expecting another transmit and we had something to push
+	 * on this queue then we need to push here to get the previous packets
+	 * out.  We only enter this branch from before the 'Update BQL' section
+	 * above, so xmit_more_available still refers to the old state.
+	 */
+	if (tx_queue->xmit_more_available && !xmit_more)
+		ef100_tx_push_buffers(tx_queue);
+	return rc;
 }
diff --git a/drivers/net/ethernet/sfc/ef100_tx.h b/drivers/net/ethernet/sfc/ef100_tx.h
index 9a472f7aff43..fa23e430bdd7 100644
--- a/drivers/net/ethernet/sfc/ef100_tx.h
+++ b/drivers/net/ethernet/sfc/ef100_tx.h
@@ -17,6 +17,10 @@
 int ef100_tx_probe(struct efx_tx_queue *tx_queue);
 void ef100_tx_init(struct efx_tx_queue *tx_queue);
 void ef100_tx_write(struct efx_tx_queue *tx_queue);
+void ef100_notify_tx_desc(struct efx_tx_queue *tx_queue);
+unsigned int ef100_tx_max_skb_descs(struct efx_nic *efx);
+
+void ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event);
 
 netdev_tx_t ef100_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
 #endif
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 9791fac0b649..7bb7ecb480ae 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -173,6 +173,7 @@ struct efx_tx_buffer {
 #define EFX_TX_BUF_MAP_SINGLE	8	/* buffer was mapped with dma_map_single() */
 #define EFX_TX_BUF_OPTION	0x10	/* empty buffer for option descriptor */
 #define EFX_TX_BUF_XDP		0x20	/* buffer was sent with XDP */
+#define EFX_TX_BUF_TSO_V3	0x40	/* empty buffer for a TSO_V3 descriptor */
 
 /**
  * struct efx_tx_queue - An Efx TX queue
@@ -245,6 +246,7 @@ struct efx_tx_buffer {
  * @pio_packets: Number of times the TX PIO feature has been used
  * @xmit_more_available: Are any packets waiting to be pushed to the NIC
  * @cb_packets: Number of times the TX copybreak feature has been used
+ * @notify_count: Count of notified descriptors to the NIC
  * @empty_read_count: If the completion path has seen the queue as empty
  *	and the transmission path has not yet checked this, the value of
  *	@read_count bitwise-added to %EFX_EMPTY_COUNT_VALID; otherwise 0.
@@ -292,6 +294,7 @@ struct efx_tx_queue {
 	unsigned int pio_packets;
 	bool xmit_more_available;
 	unsigned int cb_packets;
+	unsigned int notify_count;
 	/* Statistics to supplement MAC stats */
 	unsigned long tx_packets;
 
@@ -1669,6 +1672,24 @@ static inline void efx_xmit_hwtstamp_pending(struct sk_buff *skb)
 	skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 }
 
+/* Get the max fill level of the TX queues on this channel */
+static inline unsigned int
+efx_channel_tx_fill_level(struct efx_channel *channel)
+{
+	struct efx_tx_queue *tx_queue;
+	unsigned int fill_level = 0;
+
+	/* This function is currently only used by EF100, which maybe
+	 * could do something simpler and just compute the fill level
+	 * of the single TXQ that's really in use.
+	 */
+	efx_for_each_channel_tx_queue(tx_queue, channel)
+		fill_level = max(fill_level,
+				 tx_queue->insert_count - tx_queue->read_count);
+
+	return fill_level;
+}
+
 /* Get all supported features.
  * If a feature is not fixed, it is present in hw_features.
  * If a feature is fixed, it does not present in hw_features, but
diff --git a/drivers/net/ethernet/sfc/tx_common.c b/drivers/net/ethernet/sfc/tx_common.c
index 11b64c609550..793e234819a8 100644
--- a/drivers/net/ethernet/sfc/tx_common.c
+++ b/drivers/net/ethernet/sfc/tx_common.c
@@ -71,6 +71,7 @@ void efx_init_tx_queue(struct efx_tx_queue *tx_queue)
 		  "initialising TX queue %d\n", tx_queue->queue);
 
 	tx_queue->insert_count = 0;
+	tx_queue->notify_count = 0;
 	tx_queue->write_count = 0;
 	tx_queue->packet_write_count = 0;
 	tx_queue->old_write_count = 0;


^ permalink raw reply related

* [PATCH v2 net-next 03/11] sfc_ef100: read Design Parameters at probe time
From: Edward Cree @ 2020-07-31 12:58 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

Several parts of the EF100 architecture are parameterised (to allow
 varying capabilities on FPGAs according to resource constraints), and
 these parameters are exposed to the driver through a TLV-encoded
 region of the BAR.
For the most part we either don't care about these values at all or
 just need to sanity-check them against the driver's assumptions, but
 there are a number of TSO limits which we record so that we will be
 able to check against them in the TX path when handling GSO skbs.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100_nic.c | 201 +++++++++++++++++++++++++++
 drivers/net/ethernet/sfc/ef100_nic.h |   4 +
 2 files changed, 205 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index c2bec2bdbc1f..963d6c835cba 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -515,6 +515,193 @@ static int compare_versions(const char *a, const char *b)
 	return a_patch - b_patch;
 }
 
+enum ef100_tlv_state_machine {
+	EF100_TLV_TYPE,
+	EF100_TLV_TYPE_CONT,
+	EF100_TLV_LENGTH,
+	EF100_TLV_VALUE
+};
+
+struct ef100_tlv_state {
+	enum ef100_tlv_state_machine state;
+	u64 value;
+	u32 value_offset;
+	u16 type;
+	u8 len;
+};
+
+static int ef100_tlv_feed(struct ef100_tlv_state *state, u8 byte)
+{
+	switch (state->state) {
+	case EF100_TLV_TYPE:
+		state->type = byte & 0x7f;
+		state->state = (byte & 0x80) ? EF100_TLV_TYPE_CONT
+					     : EF100_TLV_LENGTH;
+		/* Clear ready to read in a new entry */
+		state->value = 0;
+		state->value_offset = 0;
+		return 0;
+	case EF100_TLV_TYPE_CONT:
+		state->type |= byte << 7;
+		state->state = EF100_TLV_LENGTH;
+		return 0;
+	case EF100_TLV_LENGTH:
+		state->len = byte;
+		/* We only handle TLVs that fit in a u64 */
+		if (state->len > sizeof(state->value))
+			return -EOPNOTSUPP;
+		/* len may be zero, implying a value of zero */
+		state->state = state->len ? EF100_TLV_VALUE : EF100_TLV_TYPE;
+		return 0;
+	case EF100_TLV_VALUE:
+		state->value |= ((u64)byte) << (state->value_offset * 8);
+		state->value_offset++;
+		if (state->value_offset >= state->len)
+			state->state = EF100_TLV_TYPE;
+		return 0;
+	default: /* state machine error, can't happen */
+		WARN_ON_ONCE(1);
+		return -EIO;
+	}
+}
+
+static int ef100_process_design_param(struct efx_nic *efx,
+				      const struct ef100_tlv_state *reader)
+{
+	struct ef100_nic_data *nic_data = efx->nic_data;
+
+	switch (reader->type) {
+	case ESE_EF100_DP_GZ_PAD: /* padding, skip it */
+		return 0;
+	case ESE_EF100_DP_GZ_PARTIAL_TSTAMP_SUB_NANO_BITS:
+		/* Driver doesn't support timestamping yet, so we don't care */
+		return 0;
+	case ESE_EF100_DP_GZ_EVQ_UNSOL_CREDIT_SEQ_BITS:
+		/* Driver doesn't support unsolicited-event credits yet, so
+		 * we don't care
+		 */
+		return 0;
+	case ESE_EF100_DP_GZ_NMMU_GROUP_SIZE:
+		/* Driver doesn't manage the NMMU (so we don't care) */
+		return 0;
+	case ESE_EF100_DP_GZ_RX_L4_CSUM_PROTOCOLS:
+		/* Driver uses CHECKSUM_COMPLETE, so we don't care about
+		 * protocol checksum validation
+		 */
+		return 0;
+	case ESE_EF100_DP_GZ_TSO_MAX_HDR_LEN:
+		nic_data->tso_max_hdr_len = min_t(u64, reader->value, 0xffff);
+		return 0;
+	case ESE_EF100_DP_GZ_TSO_MAX_HDR_NUM_SEGS:
+		/* We always put HDR_NUM_SEGS=1 in our TSO descriptors */
+		if (!reader->value) {
+			netif_err(efx, probe, efx->net_dev,
+				  "TSO_MAX_HDR_NUM_SEGS < 1\n");
+			return -EOPNOTSUPP;
+		}
+		return 0;
+	case ESE_EF100_DP_GZ_RXQ_SIZE_GRANULARITY:
+	case ESE_EF100_DP_GZ_TXQ_SIZE_GRANULARITY:
+		/* Our TXQ and RXQ sizes are always power-of-two and thus divisible by
+		 * EFX_MIN_DMAQ_SIZE, so we just need to check that
+		 * EFX_MIN_DMAQ_SIZE is divisible by GRANULARITY.
+		 * This is very unlikely to fail.
+		 */
+		if (EFX_MIN_DMAQ_SIZE % reader->value) {
+			netif_err(efx, probe, efx->net_dev,
+				  "%s size granularity is %llu, can't guarantee safety\n",
+				  reader->type == ESE_EF100_DP_GZ_RXQ_SIZE_GRANULARITY ? "RXQ" : "TXQ",
+				  reader->value);
+			return -EOPNOTSUPP;
+		}
+		return 0;
+	case ESE_EF100_DP_GZ_TSO_MAX_PAYLOAD_LEN:
+		nic_data->tso_max_payload_len = min_t(u64, reader->value, GSO_MAX_SIZE);
+		efx->net_dev->gso_max_size = nic_data->tso_max_payload_len;
+		return 0;
+	case ESE_EF100_DP_GZ_TSO_MAX_PAYLOAD_NUM_SEGS:
+		nic_data->tso_max_payload_num_segs = min_t(u64, reader->value, 0xffff);
+		efx->net_dev->gso_max_segs = nic_data->tso_max_payload_num_segs;
+		return 0;
+	case ESE_EF100_DP_GZ_TSO_MAX_NUM_FRAMES:
+		nic_data->tso_max_frames = min_t(u64, reader->value, 0xffff);
+		return 0;
+	case ESE_EF100_DP_GZ_COMPAT:
+		if (reader->value) {
+			netif_err(efx, probe, efx->net_dev,
+				  "DP_COMPAT has unknown bits %#llx, driver not compatible with this hw\n",
+				  reader->value);
+			return -EOPNOTSUPP;
+		}
+		return 0;
+	case ESE_EF100_DP_GZ_MEM2MEM_MAX_LEN:
+		/* Driver doesn't use mem2mem transfers */
+		return 0;
+	case ESE_EF100_DP_GZ_EVQ_TIMER_TICK_NANOS:
+		/* Driver doesn't currently use EVQ_TIMER */
+		return 0;
+	case ESE_EF100_DP_GZ_NMMU_PAGE_SIZES:
+		/* Driver doesn't manage the NMMU (so we don't care) */
+		return 0;
+	case ESE_EF100_DP_GZ_VI_STRIDES:
+		/* We never try to set the VI stride, and we don't rely on
+		 * being able to find VIs past VI 0 until after we've learned
+		 * the current stride from MC_CMD_GET_CAPABILITIES.
+		 * So the value of this shouldn't matter.
+		 */
+		if (reader->value != ESE_EF100_DP_GZ_VI_STRIDES_DEFAULT)
+			netif_dbg(efx, probe, efx->net_dev,
+				  "NIC has other than default VI_STRIDES (mask "
+				  "%#llx), early probing might use wrong one\n",
+				  reader->value);
+		return 0;
+	case ESE_EF100_DP_GZ_RX_MAX_RUNT:
+		/* Driver doesn't look at L2_STATUS:LEN_ERR bit, so we don't
+		 * care whether it indicates runt or overlength for any given
+		 * packet, so we don't care about this parameter.
+		 */
+		return 0;
+	default:
+		/* Host interface says "Drivers should ignore design parameters
+		 * that they do not recognise."
+		 */
+		netif_info(efx, probe, efx->net_dev,
+			   "Ignoring unrecognised design parameter %u\n",
+			   reader->type);
+		return 0;
+	}
+}
+
+static int ef100_check_design_params(struct efx_nic *efx)
+{
+	struct ef100_tlv_state reader = {};
+	u32 total_len, offset = 0;
+	efx_dword_t reg;
+	int rc = 0, i;
+	u32 data;
+
+	efx_readd(efx, &reg, ER_GZ_PARAMS_TLV_LEN);
+	total_len = EFX_DWORD_FIELD(reg, EFX_DWORD_0);
+	netif_dbg(efx, probe, efx->net_dev, "%u bytes of design parameters\n",
+		  total_len);
+	while (offset < total_len) {
+		efx_readd(efx, &reg, ER_GZ_PARAMS_TLV + offset);
+		data = EFX_DWORD_FIELD(reg, EFX_DWORD_0);
+		for (i = 0; i < sizeof(data); i++) {
+			rc = ef100_tlv_feed(&reader, data);
+			/* Got a complete value? */
+			if (!rc && reader.state == EF100_TLV_TYPE)
+				rc = ef100_process_design_param(efx, &reader);
+			if (rc)
+				goto out;
+			data >>= 8;
+			offset++;
+		}
+	}
+out:
+	return rc;
+}
+
 /*	NIC probe and remove
  */
 static int ef100_probe_main(struct efx_nic *efx)
@@ -536,6 +723,20 @@ static int ef100_probe_main(struct efx_nic *efx)
 	net_dev->features |= efx->type->offload_features;
 	net_dev->hw_features |= efx->type->offload_features;
 
+	/* Populate design-parameter defaults */
+	nic_data->tso_max_hdr_len = ESE_EF100_DP_GZ_TSO_MAX_HDR_LEN_DEFAULT;
+	nic_data->tso_max_frames = ESE_EF100_DP_GZ_TSO_MAX_NUM_FRAMES_DEFAULT;
+	nic_data->tso_max_payload_num_segs = ESE_EF100_DP_GZ_TSO_MAX_PAYLOAD_NUM_SEGS_DEFAULT;
+	nic_data->tso_max_payload_len = ESE_EF100_DP_GZ_TSO_MAX_PAYLOAD_LEN_DEFAULT;
+	net_dev->gso_max_segs = ESE_EF100_DP_GZ_TSO_MAX_HDR_NUM_SEGS_DEFAULT;
+	/* Read design parameters */
+	rc = ef100_check_design_params(efx);
+	if (rc) {
+		netif_err(efx, probe, efx->net_dev,
+			  "Unsupported design parameters\n");
+		goto fail;
+	}
+
 	/* we assume later that we can copy from this buffer in dwords */
 	BUILD_BUG_ON(MCDI_CTL_SDU_LEN_MAX_V2 % 4);
 
diff --git a/drivers/net/ethernet/sfc/ef100_nic.h b/drivers/net/ethernet/sfc/ef100_nic.h
index 6367bbb2c9b3..c8816bc6ae78 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.h
+++ b/drivers/net/ethernet/sfc/ef100_nic.h
@@ -26,6 +26,10 @@ struct ef100_nic_data {
 	u16 warm_boot_count;
 	u8 port_id[ETH_ALEN];
 	DECLARE_BITMAP(evq_phases, EFX_MAX_CHANNELS);
+	u16 tso_max_hdr_len;
+	u16 tso_max_payload_num_segs;
+	u16 tso_max_frames;
+	unsigned int tso_max_payload_len;
 };
 
 #define efx_ef100_has_cap(caps, flag) \


^ permalink raw reply related

* [PATCH v2 net-next 02/11] sfc_ef100: fail the probe if NIC uses unsol_ev credits
From: Edward Cree @ 2020-07-31 12:58 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

In the future, EF100 is planned to have a credit-based scheme for
 handling unsolicited events, which drivers will need to use in order
 to function correctly.  However, current EF100 hardware does not yet
 generate unsolicited events and the credit scheme has not yet been
 implemented in firmware.  To prevent compatibility problems later if
 the current driver is used with future firmware which does implement
 it, we check for the corresponding capability flag (which that
 future firmware will set), and if found, we refuse to probe.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100_nic.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index 75131bcf4f1a..c2bec2bdbc1f 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -602,6 +602,12 @@ static int ef100_probe_main(struct efx_nic *efx)
 		goto fail;
 	}

+	if (efx_has_cap(efx, UNSOL_EV_CREDIT_SUPPORTED)) {
+		netif_info(efx, drv, efx->net_dev, "Firmware uses unsolicited-event credits\n");
+		rc = -EINVAL;
+		goto fail;
+	}
+
 	rc = ef100_phy_probe(efx);
 	if (rc)
 		goto fail;

^ permalink raw reply related

* [PATCH v2 net-next 01/11] sfc_ef100: check firmware version at start-of-day
From: Edward Cree @ 2020-07-31 12:58 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <31de2e73-bce7-6c9d-0c20-49b32e2043cc@solarflare.com>

Early in EF100 development there was a different format of event
 descriptor; if the NIC is somehow running the very old firmware
 which will use that format, fail the probe.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef100_nic.c | 40 ++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index 6a00f2a2dc2b..75131bcf4f1a 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -485,6 +485,36 @@ const struct efx_nic_type ef100_pf_nic_type = {
 
 };
 
+static int compare_versions(const char *a, const char *b)
+{
+	int a_major, a_minor, a_point, a_patch;
+	int b_major, b_minor, b_point, b_patch;
+	int a_matched, b_matched;
+
+	a_matched = sscanf(a, "%d.%d.%d.%d", &a_major, &a_minor, &a_point, &a_patch);
+	b_matched = sscanf(b, "%d.%d.%d.%d", &b_major, &b_minor, &b_point, &b_patch);
+
+	if (a_matched == 4 && b_matched != 4)
+		return +1;
+
+	if (a_matched != 4 && b_matched == 4)
+		return -1;
+
+	if (a_matched != 4 && b_matched != 4)
+		return 0;
+
+	if (a_major != b_major)
+		return a_major - b_major;
+
+	if (a_minor != b_minor)
+		return a_minor - b_minor;
+
+	if (a_point != b_point)
+		return a_point - b_point;
+
+	return a_patch - b_patch;
+}
+
 /*	NIC probe and remove
  */
 static int ef100_probe_main(struct efx_nic *efx)
@@ -492,6 +522,7 @@ static int ef100_probe_main(struct efx_nic *efx)
 	unsigned int bar_size = resource_size(&efx->pci_dev->resource[efx->mem_bar]);
 	struct net_device *net_dev = efx->net_dev;
 	struct ef100_nic_data *nic_data;
+	char fw_version[32];
 	int i, rc;
 
 	if (WARN_ON(bar_size == 0))
@@ -562,6 +593,15 @@ static int ef100_probe_main(struct efx_nic *efx)
 		goto fail;
 	efx->port_num = rc;
 
+	efx_mcdi_print_fwver(efx, fw_version, sizeof(fw_version));
+	netif_dbg(efx, drv, efx->net_dev, "Firmware version %s\n", fw_version);
+
+	if (compare_versions(fw_version, "1.1.0.1000") < 0) {
+		netif_info(efx, drv, efx->net_dev, "Firmware uses old event descriptors\n");
+		rc = -EINVAL;
+		goto fail;
+	}
+
 	rc = ef100_phy_probe(efx);
 	if (rc)
 		goto fail;


^ permalink raw reply related

* Re: [PATCH v3 5/9] ethernet: ti: am65-cpts: Use generic helper function
From: Grygorii Strashko @ 2020-07-31 12:55 UTC (permalink / raw)
  To: Kurt Kanzenbach, Arnd Bergmann
  Cc: Richard Cochran, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, Jakub Kicinski, Jiri Pirko, Ido Schimmel,
	Heiner Kallweit, Russell King, Ivan Khoronzhuk, Samuel Zou,
	Networking, Petr Machata
In-Reply-To: <87lfiz29di.fsf@kurt>



On 31/07/2020 14:48, Kurt Kanzenbach wrote:
> On Thu Jul 30 2020, Arnd Bergmann wrote:
>> On Thu, Jul 30, 2020 at 11:41 AM Kurt Kanzenbach <kurt@linutronix.de> wrote:
>>> On Thu Jul 30 2020, Grygorii Strashko wrote:
>>>> On 30/07/2020 11:00, Kurt Kanzenbach wrote:
>>>>> +    msgtype = ptp_get_msgtype(hdr, ptp_class);
>>>>> +    seqid   = be16_to_cpu(hdr->sequence_id);
>>>>
>>>> Is there any reason to not use "ntohs()"?
>>>
>>> This is just my personal preference, because I think it's more
>>> readable. Internally ntohs() uses be16_to_cpu(). There's no technical
>>> reason for it.
>>
>> I think for traditional reasons, code in net/* tends to use ntohs()
>> while code in drivers/*  tends to use be16_to_cpu().
>>
>> In drivers/net/* the two are used roughly the same, though I guess
>> one could make the argument that be16_to_cpu() would be
>> more appropriate for data structures exchanged with hardware
>> while ntohs() makes sense on data structures sent over the
>> network.
> 
> I see, makes sense. I could simply keep it the way it was, or?

  I prefer ntohs() as this packet data.

-- 
Best regards,
grygorii

^ permalink raw reply

* [PATCH v2 net-next 00/11] sfc: driver for EF100 family NICs, part 2
From: Edward Cree @ 2020-07-31 12:55 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev

This series implements the data path and various other functionality
 for Xilinx/Solarflare EF100 NICs.

Changed from v1:
 * Fixed build errors on CONFIG_RFS_ACCEL=n (patch #5) and 32-bit
   (patch #8)
 * Dropped patch #10 (ethtool ops) as it's buggy and will need a
   bigger rework to fix.

Edward Cree (11):
  sfc_ef100: check firmware version at start-of-day
  sfc_ef100: fail the probe if NIC uses unsol_ev credits
  sfc_ef100: read Design Parameters at probe time
  sfc_ef100: TX path for EF100 NICs
  sfc_ef100: RX filter table management and related gubbins
  sfc_ef100: RX path for EF100
  sfc_ef100: plumb in fini_dmaq
  sfc_ef100: statistics gathering
  sfc_ef100: functions for selftests
  sfc_ef100: read pf_index at probe time
  sfc_ef100: add nic-type for VFs, and bind to them

 drivers/net/ethernet/sfc/ef100.c        |   2 +
 drivers/net/ethernet/sfc/ef100_netdev.c |  16 +
 drivers/net/ethernet/sfc/ef100_nic.c    | 643 ++++++++++++++++++++++++
 drivers/net/ethernet/sfc/ef100_nic.h    |  48 ++
 drivers/net/ethernet/sfc/ef100_rx.c     | 150 +++++-
 drivers/net/ethernet/sfc/ef100_rx.h     |   1 +
 drivers/net/ethernet/sfc/ef100_tx.c     | 368 +++++++++++++-
 drivers/net/ethernet/sfc/ef100_tx.h     |   4 +
 drivers/net/ethernet/sfc/net_driver.h   |  21 +
 drivers/net/ethernet/sfc/tx_common.c    |   1 +
 10 files changed, 1242 insertions(+), 12 deletions(-)


^ permalink raw reply

* [PATCH v4 12/12] PCI: Remove '*val = 0' from pcie_capability_read_*()
From: Saheed O. Bolarinwa @ 2020-07-31 11:43 UTC (permalink / raw)
  To: helgaas
  Cc: Saheed O. Bolarinwa, bjorn, skhan, linux-kernel-mentees,
	linux-pci, linux-kernel, linux-rdma, QCA ath9k Development,
	linux-wireless, netdev, linux-acpi, linuxppc-dev,
	Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
	Jason Gunthorpe, Arnd Bergmann, Greg Kroah-Hartman,
	David S. Miller, Kalle Valo, Jakub Kicinski, Rafael J. Wysocki,
	Len Brown, Russell Currey, Sam Bobroff, Oliver O'Halloran
In-Reply-To: <20200731114329.100848-1-refactormyself@gmail.com>

There are several reasons why a PCI capability read may fail whether the
device is present or not. If this happens, pcie_capability_read_*() will
return -EINVAL/PCIBIOS_BAD_REGISTER_NUMBER or PCIBIOS_DEVICE_NOT_FOUND
and *val is set to 0.

This behaviour if further ensured by this code inside
pcie_capability_read_*()

 ret = pci_read_config_dword(dev, pci_pcie_cap(dev) + pos, val);
 /*
  * Reset *val to 0 if pci_read_config_dword() fails, it may
  * have been written as 0xFFFFFFFF if hardware error happens
  * during pci_read_config_dword().
  */
 if (ret)
	 *val = 0;
 return ret;

a) Since all pci_generic_config_read() does is read a register value,
it may return success after reading a ~0 which *may* have been fabricated
by the PCI host bridge due to a read timeout. Hence pci_read_config_*()
will return success with a fabricated ~0 in *val, indicating a problem.
In this case, the assumed behaviour of  pcie_capability_read_*() will be
wrong. To avoid error slipping through, more checks are necessary.

b) pci_read_config_*() will return PCIBIOS_DEVICE_NOT_FOUND only if
dev->error_state = pci_channel_io_perm_failure (i.e.
pci_dev_is_disconnected()) or if pci_generic_config_read() can't find the
device. In both cases *val is initially set to ~0 but as shown in the code
above pcie_capability_read_*() resets it back to 0. Even with this effort,
drivers still have to perform validation checks more so if 0 is a valid
value.

Most drivers only consider the case (b) and in some cases, there is the
expectation that on timeout *val has a fabricated value of ~0, which *may*
not always be true as explained in (a).

In any case, checks need to be done to validate the value read and maybe
confirm which error has occurred. It is better left to the drivers to do.

Remove the reset of *val to 0 when pci_read_config_*() fails.

Suggested-by: Bjorn Helgaas <bjorn@helgaas.com>
Signed-off-by: Saheed O. Bolarinwa <refactormyself@gmail.com>
---
 drivers/pci/access.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 79c4a2ef269a..ec95edbb1ac8 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -413,13 +413,6 @@ int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val)

 	if (pcie_capability_reg_implemented(dev, pos)) {
 		ret = pci_read_config_word(dev, pci_pcie_cap(dev) + pos, val);
-		/*
-		 * Reset *val to 0 if pci_read_config_word() fails, it may
-		 * have been written as 0xFFFF if hardware error happens
-		 * during pci_read_config_word().
-		 */
-		if (ret)
-			*val = 0;
 		return ret;
 	}

@@ -448,13 +441,6 @@ int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val)

 	if (pcie_capability_reg_implemented(dev, pos)) {
 		ret = pci_read_config_dword(dev, pci_pcie_cap(dev) + pos, val);
-		/*
-		 * Reset *val to 0 if pci_read_config_dword() fails, it may
-		 * have been written as 0xFFFFFFFF if hardware error happens
-		 * during pci_read_config_dword().
-		 */
-		if (ret)
-			*val = 0;
 		return ret;
 	}

-- 
2.18.4

^ permalink raw reply related

* Re: [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case
From: Daniel Borkmann @ 2020-07-31 12:25 UTC (permalink / raw)
  To: John Fastabend, kafai, ast; +Cc: netdev, bpf
In-Reply-To: <159603977489.4454.16012925913901625071.stgit@john-Precision-5820-Tower>

On 7/29/20 6:22 PM, John Fastabend wrote:
> I had a sockmap program that after doing some refactoring started spewing
> this splat at me:
> 
> [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> [...]
> [18610.807359] Call Trace:
> [18610.807370]  ? 0xffffffffc114d0d5
> [18610.807382]  __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
> [18610.807391]  tcp_connect+0x895/0xd50
> [18610.807400]  tcp_v4_connect+0x465/0x4e0
> [18610.807407]  __inet_stream_connect+0xd6/0x3a0
> [18610.807412]  ? __inet_stream_connect+0x5/0x3a0
> [18610.807417]  inet_stream_connect+0x3b/0x60
> [18610.807425]  __sys_connect+0xed/0x120
> 
> After some debugging I was able to build this simple reproducer,
> 
>   __section("sockops/reproducer_bad")
>   int bpf_reproducer_bad(struct bpf_sock_ops *skops)
>   {
>          volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>          return 0;
>   }
> 
> And along the way noticed that below program ran without splat,
> 
> __section("sockops/reproducer_good")
> int bpf_reproducer_good(struct bpf_sock_ops *skops)
> {
>          volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>          volatile __maybe_unused __u32 family;
> 
>          compiler_barrier();
> 
>          family = skops->family;
>          return 0;
> }
> 
> So I decided to check out the code we generate for the above two
> programs and noticed each generates the BPF code you would expect,
> 
> 0000000000000000 <bpf_reproducer_bad>:
> ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>         0:       r1 = *(u32 *)(r1 + 96)
>         1:       *(u32 *)(r10 - 4) = r1
> ;       return 0;
>         2:       r0 = 0
>         3:       exit
> 
> 0000000000000000 <bpf_reproducer_good>:
> ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>         0:       r2 = *(u32 *)(r1 + 96)
>         1:       *(u32 *)(r10 - 4) = r2
> ;       family = skops->family;
>         2:       r1 = *(u32 *)(r1 + 20)
>         3:       *(u32 *)(r10 - 8) = r1
> ;       return 0;
>         4:       r0 = 0
>         5:       exit
> 
> So we get reasonable assembly, but still something was causing the null
> pointer dereference. So, we load the programs and dump the xlated version
> observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be
> translated by the skops access helpers.
> 
> int bpf_reproducer_bad(struct bpf_sock_ops * skops):
> ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>     0: (61) r1 = *(u32 *)(r1 +28)
>     1: (15) if r1 == 0x0 goto pc+2
>     2: (79) r1 = *(u64 *)(r1 +0)
>     3: (61) r1 = *(u32 *)(r1 +2340)
> ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>     4: (63) *(u32 *)(r10 -4) = r1
> ; return 0;
>     5: (b7) r0 = 0
>     6: (95) exit
> 
> int bpf_reproducer_good(struct bpf_sock_ops * skops):
> ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>     0: (61) r2 = *(u32 *)(r1 +28)
>     1: (15) if r2 == 0x0 goto pc+2
>     2: (79) r2 = *(u64 *)(r1 +0)
>     3: (61) r2 = *(u32 *)(r2 +2340)
> ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>     4: (63) *(u32 *)(r10 -4) = r2
> ; family = skops->family;
>     5: (79) r1 = *(u64 *)(r1 +0)
>     6: (69) r1 = *(u16 *)(r1 +16)
> ; family = skops->family;
>     7: (63) *(u32 *)(r10 -8) = r1
> ; return 0;
>     8: (b7) r0 = 0
>     9: (95) exit
> 
> Then we look at lines 0 and 2 above. In the good case we do the zero
> check in r2 and then load 'r1 + 0' at line 2. Do a quick cross-check
> into the bpf_sock_ops check and we can confirm that is the 'struct
> sock *sk' pointer field. But, in the bad case,
> 
>     0: (61) r1 = *(u32 *)(r1 +28)
>     1: (15) if r1 == 0x0 goto pc+2
>     2: (79) r1 = *(u64 *)(r1 +0)
> 
> Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in
> line 2 we read the 'r1 +0' as a pointer. Now jumping back to our spat,
> 
> [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> 
> The 0x01 makes sense because that is exactly the fullsock value. And
> its not a valid dereference so we splat.
> 
> To fix we need to guard the case when a program is doing a sock_ops field
> access with src_reg == dst_reg. This is already handled in the load case
> where the ctx_access handler uses a tmp register being careful to
> store the old value and restore it. To fix the get case test if
> src_reg == dst_reg and in this case do the is_fullsock test in the
> temporary register. Remembering to restore the temporary register before
> writing to either dst_reg or src_reg to avoid smashing the pointer into
> the struct holding the tmp variable.
> 
> Adding this inline code to test_tcpbpf_kern will now be generated
> correctly from,
> 
>    9: r2 = *(u32 *)(r2 + 96)
> 
> to xlated code,
> 
>    13: (61) r9 = *(u32 *)(r2 +28)
>    14: (15) if r9 == 0x0 goto pc+4
>    15: (79) r9 = *(u64 *)(r2 +32)
>    16: (79) r2 = *(u64 *)(r2 +0)
>    17: (61) r2 = *(u32 *)(r2 +2348)
>    18: (05) goto pc+1
>    19: (79) r9 = *(u64 *)(r2 +32)

The diff below looks good to me, but I'm confused on this one above. I'm probably
missing something, but given this is the dst == src case with the r2 register, where
in the dump do we first saves the content of r9 into the scratch tmp store?
Line 19 seems to restore it, but the save is missing, no?

Please double check whether this was just omitted, but I would really like to have
the commit message 100% correct as it otherwise causes confusion when we stare at it
again a month later wrt what was the original intention.

> And in the normal case we keep the original code, because really this
> is an edge case. From this,
> 
>    9: r2 = *(u32 *)(r6 + 96)
> 
> to xlated code,
> 
>    22: (61) r2 = *(u32 *)(r6 +28)
>    23: (15) if r2 == 0x0 goto pc+2
>    24: (79) r2 = *(u64 *)(r6 +0)
>    25: (61) r2 = *(u32 *)(r2 +2348)
> 
> So three additional instructions if dst == src register, but I scanned
> my current code base and did not see this pattern anywhere so should
> not be a big deal. Further, it seems no one else has hit this or at
> least reported it so it must a fairly rare pattern.
> 
> Fixes: 9b1f3d6e5af29 ("bpf: Refactor sock_ops_convert_ctx_access")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>   net/core/filter.c |   26 ++++++++++++++++++++++++--
>   1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 29e34551..15a0842 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8314,15 +8314,31 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
>   /* Helper macro for adding read access to tcp_sock or sock fields. */
>   #define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)			      \
>   	do {								      \
> +		int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;     \
>   		BUILD_BUG_ON(sizeof_field(OBJ, OBJ_FIELD) >		      \
>   			     sizeof_field(struct bpf_sock_ops, BPF_FIELD));   \
> +		if (si->dst_reg == reg || si->src_reg == reg)		      \
> +			reg--;						      \
> +		if (si->dst_reg == reg || si->src_reg == reg)		      \
> +			reg--;						      \
> +		if (si->dst_reg == si->src_reg) {			      \
> +			*insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,	      \
> +					  offsetof(struct bpf_sock_ops_kern,  \
> +					  temp));			      \
> +			fullsock_reg = reg;				      \
> +			jmp += 2;					      \
> +		}							      \
>   		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
>   						struct bpf_sock_ops_kern,     \
>   						is_fullsock),		      \
> -				      si->dst_reg, si->src_reg,		      \
> +				      fullsock_reg, si->src_reg,	      \
>   				      offsetof(struct bpf_sock_ops_kern,      \
>   					       is_fullsock));		      \
> -		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 2);	      \
> +		*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);	      \
> +		if (si->dst_reg == si->src_reg)				      \
> +			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,	      \
> +				      offsetof(struct bpf_sock_ops_kern,      \
> +				      temp));				      \
>   		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
>   						struct bpf_sock_ops_kern, sk),\
>   				      si->dst_reg, si->src_reg,		      \
> @@ -8331,6 +8347,12 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
>   						       OBJ_FIELD),	      \
>   				      si->dst_reg, si->dst_reg,		      \
>   				      offsetof(OBJ, OBJ_FIELD));	      \
> +		if (si->dst_reg == si->src_reg)	{			      \
> +			*insn++ = BPF_JMP_A(1);				      \
> +			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,	      \
> +				      offsetof(struct bpf_sock_ops_kern,      \
> +				      temp));				      \
> +		}							      \
>   	} while (0)
>   
>   #define SOCK_OPS_GET_TCP_SOCK_FIELD(FIELD) \
> 


^ permalink raw reply

* [PATCH net-next v4 2/2] net: openvswitch: make masks cache size configurable
From: Eelco Chaudron @ 2020-07-31 12:21 UTC (permalink / raw)
  To: netdev; +Cc: davem, dev, kuba, pabeni, pshelar, fw, xiangxia.m.yue
In-Reply-To: <159619801209.973760.834607259658375498.stgit@ebuild>

This patch makes the masks cache size configurable, or with
a size of 0, disable it.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---

Changes in v4:
 - Remove null check before calling free_percpu()
 - Make ovs_dp_change() return appropriate error codes

Changes in v3:
 - Use is_power_of_2() function
 - Use array_size() function
 - Fix remaining sparse errors

Changes in v2:
 - Fix sparse warnings
 - Fix netlink policy items reported by Florian Westphal

 include/uapi/linux/openvswitch.h |    1 
 net/openvswitch/datapath.c       |   17 ++++++
 net/openvswitch/flow_table.c     |  101 +++++++++++++++++++++++++++++++++-----
 net/openvswitch/flow_table.h     |   10 +++-
 4 files changed, 115 insertions(+), 14 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 7cb76e5ca7cf..8300cc29dec8 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -86,6 +86,7 @@ enum ovs_datapath_attr {
 	OVS_DP_ATTR_MEGAFLOW_STATS,	/* struct ovs_dp_megaflow_stats */
 	OVS_DP_ATTR_USER_FEATURES,	/* OVS_DP_F_*  */
 	OVS_DP_ATTR_PAD,
+	OVS_DP_ATTR_MASKS_CACHE_SIZE,
 	__OVS_DP_ATTR_MAX
 };
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index a54df1fe3ec4..b58604eca032 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1498,6 +1498,7 @@ static size_t ovs_dp_cmd_msg_size(void)
 	msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_stats));
 	msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_megaflow_stats));
 	msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_USER_FEATURES */
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_MASKS_CACHE_SIZE */
 
 	return msgsize;
 }
@@ -1535,6 +1536,10 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 	if (nla_put_u32(skb, OVS_DP_ATTR_USER_FEATURES, dp->user_features))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, OVS_DP_ATTR_MASKS_CACHE_SIZE,
+			ovs_flow_tbl_masks_cache_size(&dp->table)))
+		goto nla_put_failure;
+
 	genlmsg_end(skb, ovs_header);
 	return 0;
 
@@ -1599,6 +1604,16 @@ static int ovs_dp_change(struct datapath *dp, struct nlattr *a[])
 #endif
 	}
 
+	if (a[OVS_DP_ATTR_MASKS_CACHE_SIZE]) {
+		int err;
+		u32 cache_size;
+
+		cache_size = nla_get_u32(a[OVS_DP_ATTR_MASKS_CACHE_SIZE]);
+		err = ovs_flow_tbl_masks_cache_resize(&dp->table, cache_size);
+		if (err)
+			return err;
+	}
+
 	dp->user_features = user_features;
 
 	if (dp->user_features & OVS_DP_F_TC_RECIRC_SHARING)
@@ -1894,6 +1909,8 @@ static const struct nla_policy datapath_policy[OVS_DP_ATTR_MAX + 1] = {
 	[OVS_DP_ATTR_NAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 },
 	[OVS_DP_ATTR_UPCALL_PID] = { .type = NLA_U32 },
 	[OVS_DP_ATTR_USER_FEATURES] = { .type = NLA_U32 },
+	[OVS_DP_ATTR_MASKS_CACHE_SIZE] =  NLA_POLICY_RANGE(NLA_U32, 0,
+		PCPU_MIN_UNIT_SIZE / sizeof(struct mask_cache_entry)),
 };
 
 static const struct genl_ops dp_datapath_genl_ops[] = {
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index a5912ea05352..6527d84c3ea6 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -38,8 +38,8 @@
 #define MASK_ARRAY_SIZE_MIN	16
 #define REHASH_INTERVAL		(10 * 60 * HZ)
 
+#define MC_DEFAULT_HASH_ENTRIES	256
 #define MC_HASH_SHIFT		8
-#define MC_HASH_ENTRIES		(1u << MC_HASH_SHIFT)
 #define MC_HASH_SEGS		((sizeof(uint32_t) * 8) / MC_HASH_SHIFT)
 
 static struct kmem_cache *flow_cache;
@@ -341,15 +341,79 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
 	}
 }
 
+static void __mask_cache_destroy(struct mask_cache *mc)
+{
+	free_percpu(mc->mask_cache);
+	kfree(mc);
+}
+
+static void mask_cache_rcu_cb(struct rcu_head *rcu)
+{
+	struct mask_cache *mc = container_of(rcu, struct mask_cache, rcu);
+
+	__mask_cache_destroy(mc);
+}
+
+static struct mask_cache *tbl_mask_cache_alloc(u32 size)
+{
+	struct mask_cache_entry __percpu *cache = NULL;
+	struct mask_cache *new;
+
+	/* Only allow size to be 0, or a power of 2, and does not exceed
+	 * percpu allocation size.
+	 */
+	if ((!is_power_of_2(size) && size != 0) ||
+	    (size * sizeof(struct mask_cache_entry)) > PCPU_MIN_UNIT_SIZE)
+		return NULL;
+
+	new = kzalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return NULL;
+
+	new->cache_size = size;
+	if (new->cache_size > 0) {
+		cache = __alloc_percpu(array_size(sizeof(struct mask_cache_entry),
+						  new->cache_size),
+				       __alignof__(struct mask_cache_entry));
+		if (!cache) {
+			kfree(new);
+			return NULL;
+		}
+	}
+
+	new->mask_cache = cache;
+	return new;
+}
+int ovs_flow_tbl_masks_cache_resize(struct flow_table *table, u32 size)
+{
+	struct mask_cache *mc = rcu_dereference(table->mask_cache);
+	struct mask_cache *new;
+
+	if (size == mc->cache_size)
+		return 0;
+
+	if ((!is_power_of_2(size) && size != 0) ||
+	    (size * sizeof(struct mask_cache_entry)) > PCPU_MIN_UNIT_SIZE)
+		return -EINVAL;
+
+	new = tbl_mask_cache_alloc(size);
+	if (!new)
+		return -ENOMEM;
+
+	rcu_assign_pointer(table->mask_cache, new);
+	call_rcu(&mc->rcu, mask_cache_rcu_cb);
+
+	return 0;
+}
+
 int ovs_flow_tbl_init(struct flow_table *table)
 {
 	struct table_instance *ti, *ufid_ti;
+	struct mask_cache *mc;
 	struct mask_array *ma;
 
-	table->mask_cache = __alloc_percpu(sizeof(struct mask_cache_entry) *
-					   MC_HASH_ENTRIES,
-					   __alignof__(struct mask_cache_entry));
-	if (!table->mask_cache)
+	mc = tbl_mask_cache_alloc(MC_DEFAULT_HASH_ENTRIES);
+	if (!mc)
 		return -ENOMEM;
 
 	ma = tbl_mask_array_alloc(MASK_ARRAY_SIZE_MIN);
@@ -367,6 +431,7 @@ int ovs_flow_tbl_init(struct flow_table *table)
 	rcu_assign_pointer(table->ti, ti);
 	rcu_assign_pointer(table->ufid_ti, ufid_ti);
 	rcu_assign_pointer(table->mask_array, ma);
+	rcu_assign_pointer(table->mask_cache, mc);
 	table->last_rehash = jiffies;
 	table->count = 0;
 	table->ufid_count = 0;
@@ -377,7 +442,7 @@ int ovs_flow_tbl_init(struct flow_table *table)
 free_mask_array:
 	__mask_array_destroy(ma);
 free_mask_cache:
-	free_percpu(table->mask_cache);
+	__mask_cache_destroy(mc);
 	return -ENOMEM;
 }
 
@@ -453,9 +518,11 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
 {
 	struct table_instance *ti = rcu_dereference_raw(table->ti);
 	struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
+	struct mask_cache *mc = rcu_dereference(table->mask_cache);
+	struct mask_array *ma = rcu_dereference_ovsl(table->mask_array);
 
-	free_percpu(table->mask_cache);
-	call_rcu(&table->mask_array->rcu, mask_array_rcu_cb);
+	call_rcu(&mc->rcu, mask_cache_rcu_cb);
+	call_rcu(&ma->rcu, mask_array_rcu_cb);
 	table_instance_destroy(table, ti, ufid_ti, false);
 }
 
@@ -724,6 +791,7 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 					  u32 *n_mask_hit,
 					  u32 *n_cache_hit)
 {
+	struct mask_cache *mc = rcu_dereference(tbl->mask_cache);
 	struct mask_array *ma = rcu_dereference(tbl->mask_array);
 	struct table_instance *ti = rcu_dereference(tbl->ti);
 	struct mask_cache_entry *entries, *ce;
@@ -733,7 +801,7 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 
 	*n_mask_hit = 0;
 	*n_cache_hit = 0;
-	if (unlikely(!skb_hash)) {
+	if (unlikely(!skb_hash || mc->cache_size == 0)) {
 		u32 mask_index = 0;
 		u32 cache = 0;
 
@@ -749,11 +817,11 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 
 	ce = NULL;
 	hash = skb_hash;
-	entries = this_cpu_ptr(tbl->mask_cache);
+	entries = this_cpu_ptr(mc->mask_cache);
 
 	/* Find the cache entry 'ce' to operate on. */
 	for (seg = 0; seg < MC_HASH_SEGS; seg++) {
-		int index = hash & (MC_HASH_ENTRIES - 1);
+		int index = hash & (mc->cache_size - 1);
 		struct mask_cache_entry *e;
 
 		e = &entries[index];
@@ -867,6 +935,13 @@ int ovs_flow_tbl_num_masks(const struct flow_table *table)
 	return READ_ONCE(ma->count);
 }
 
+u32 ovs_flow_tbl_masks_cache_size(const struct flow_table *table)
+{
+	struct mask_cache *mc = rcu_dereference(table->mask_cache);
+
+	return READ_ONCE(mc->cache_size);
+}
+
 static struct table_instance *table_instance_expand(struct table_instance *ti,
 						    bool ufid)
 {
@@ -1095,8 +1170,8 @@ void ovs_flow_masks_rebalance(struct flow_table *table)
 	for (i = 0; i < masks_entries; i++) {
 		int index = masks_and_count[i].index;
 
-		new->masks[new->count++] =
-			rcu_dereference_ovsl(ma->masks[index]);
+		if (ovsl_dereference(ma->masks[index]))
+			new->masks[new->count++] = ma->masks[index];
 	}
 
 	rcu_assign_pointer(table->mask_array, new);
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index 325e939371d8..74ce48fecba9 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -27,6 +27,12 @@ struct mask_cache_entry {
 	u32 mask_index;
 };
 
+struct mask_cache {
+	struct rcu_head rcu;
+	u32 cache_size;  /* Must be ^2 value. */
+	struct mask_cache_entry __percpu *mask_cache;
+};
+
 struct mask_count {
 	int index;
 	u64 counter;
@@ -53,7 +59,7 @@ struct table_instance {
 struct flow_table {
 	struct table_instance __rcu *ti;
 	struct table_instance __rcu *ufid_ti;
-	struct mask_cache_entry __percpu *mask_cache;
+	struct mask_cache __rcu *mask_cache;
 	struct mask_array __rcu *mask_array;
 	unsigned long last_rehash;
 	unsigned int count;
@@ -77,6 +83,8 @@ int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
 			const struct sw_flow_mask *mask);
 void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow);
 int  ovs_flow_tbl_num_masks(const struct flow_table *table);
+u32  ovs_flow_tbl_masks_cache_size(const struct flow_table *table);
+int  ovs_flow_tbl_masks_cache_resize(struct flow_table *table, u32 size);
 struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *table,
 				       u32 *bucket, u32 *idx);
 struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *,


^ permalink raw reply related

* [PATCH net-next v4 1/2] net: openvswitch: add masks cache hit counter
From: Eelco Chaudron @ 2020-07-31 12:20 UTC (permalink / raw)
  To: netdev; +Cc: davem, dev, kuba, pabeni, pshelar, fw, xiangxia.m.yue
In-Reply-To: <159619801209.973760.834607259658375498.stgit@ebuild>

Add a counter that counts the number of masks cache hits, and
export it through the megaflow netlink statistics.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 include/uapi/linux/openvswitch.h |    2 +-
 net/openvswitch/datapath.c       |    5 ++++-
 net/openvswitch/datapath.h       |    3 +++
 net/openvswitch/flow_table.c     |   19 ++++++++++++++-----
 net/openvswitch/flow_table.h     |    3 ++-
 5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 9b14519e74d9..7cb76e5ca7cf 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -102,8 +102,8 @@ struct ovs_dp_megaflow_stats {
 	__u64 n_mask_hit;	 /* Number of masks used for flow lookups. */
 	__u32 n_masks;		 /* Number of masks for the datapath. */
 	__u32 pad0;		 /* Pad for future expension. */
+	__u64 n_cache_hit;       /* Number of cache matches for flow lookups. */
 	__u64 pad1;		 /* Pad for future expension. */
-	__u64 pad2;		 /* Pad for future expension. */
 };
 
 struct ovs_vport_stats {
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 95805f0e27bd..a54df1fe3ec4 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -225,13 +225,14 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
 	struct dp_stats_percpu *stats;
 	u64 *stats_counter;
 	u32 n_mask_hit;
+	u32 n_cache_hit;
 	int error;
 
 	stats = this_cpu_ptr(dp->stats_percpu);
 
 	/* Look up flow. */
 	flow = ovs_flow_tbl_lookup_stats(&dp->table, key, skb_get_hash(skb),
-					 &n_mask_hit);
+					 &n_mask_hit, &n_cache_hit);
 	if (unlikely(!flow)) {
 		struct dp_upcall_info upcall;
 
@@ -262,6 +263,7 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
 	u64_stats_update_begin(&stats->syncp);
 	(*stats_counter)++;
 	stats->n_mask_hit += n_mask_hit;
+	stats->n_cache_hit += n_cache_hit;
 	u64_stats_update_end(&stats->syncp);
 }
 
@@ -699,6 +701,7 @@ static void get_dp_stats(const struct datapath *dp, struct ovs_dp_stats *stats,
 		stats->n_missed += local_stats.n_missed;
 		stats->n_lost += local_stats.n_lost;
 		mega_stats->n_mask_hit += local_stats.n_mask_hit;
+		mega_stats->n_cache_hit += local_stats.n_cache_hit;
 	}
 }
 
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 697a2354194b..86d78613edb4 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -38,12 +38,15 @@
  * @n_mask_hit: Number of masks looked up for flow match.
  *   @n_mask_hit / (@n_hit + @n_missed)  will be the average masks looked
  *   up per packet.
+ * @n_cache_hit: The number of received packets that had their mask found using
+ * the mask cache.
  */
 struct dp_stats_percpu {
 	u64 n_hit;
 	u64 n_missed;
 	u64 n_lost;
 	u64 n_mask_hit;
+	u64 n_cache_hit;
 	struct u64_stats_sync syncp;
 };
 
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index af22c9ee28dd..a5912ea05352 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -667,6 +667,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
 				   struct mask_array *ma,
 				   const struct sw_flow_key *key,
 				   u32 *n_mask_hit,
+				   u32 *n_cache_hit,
 				   u32 *index)
 {
 	u64 *usage_counters = this_cpu_ptr(ma->masks_usage_cntr);
@@ -682,6 +683,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
 				u64_stats_update_begin(&ma->syncp);
 				usage_counters[*index]++;
 				u64_stats_update_end(&ma->syncp);
+				(*n_cache_hit)++;
 				return flow;
 			}
 		}
@@ -719,7 +721,8 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
 struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 					  const struct sw_flow_key *key,
 					  u32 skb_hash,
-					  u32 *n_mask_hit)
+					  u32 *n_mask_hit,
+					  u32 *n_cache_hit)
 {
 	struct mask_array *ma = rcu_dereference(tbl->mask_array);
 	struct table_instance *ti = rcu_dereference(tbl->ti);
@@ -729,10 +732,13 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 	int seg;
 
 	*n_mask_hit = 0;
+	*n_cache_hit = 0;
 	if (unlikely(!skb_hash)) {
 		u32 mask_index = 0;
+		u32 cache = 0;
 
-		return flow_lookup(tbl, ti, ma, key, n_mask_hit, &mask_index);
+		return flow_lookup(tbl, ti, ma, key, n_mask_hit, &cache,
+				   &mask_index);
 	}
 
 	/* Pre and post recirulation flows usually have the same skb_hash
@@ -753,7 +759,7 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 		e = &entries[index];
 		if (e->skb_hash == skb_hash) {
 			flow = flow_lookup(tbl, ti, ma, key, n_mask_hit,
-					   &e->mask_index);
+					   n_cache_hit, &e->mask_index);
 			if (!flow)
 				e->skb_hash = 0;
 			return flow;
@@ -766,10 +772,12 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 	}
 
 	/* Cache miss, do full lookup. */
-	flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, &ce->mask_index);
+	flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, n_cache_hit,
+			   &ce->mask_index);
 	if (flow)
 		ce->skb_hash = skb_hash;
 
+	*n_cache_hit = 0;
 	return flow;
 }
 
@@ -779,9 +787,10 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
 	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
 	struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
 	u32 __always_unused n_mask_hit;
+	u32 __always_unused n_cache_hit;
 	u32 index = 0;
 
-	return flow_lookup(tbl, ti, ma, key, &n_mask_hit, &index);
+	return flow_lookup(tbl, ti, ma, key, &n_mask_hit, &n_cache_hit, &index);
 }
 
 struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index 1f664b050e3b..325e939371d8 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -82,7 +82,8 @@ struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *table,
 struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *,
 					  const struct sw_flow_key *,
 					  u32 skb_hash,
-					  u32 *n_mask_hit);
+					  u32 *n_mask_hit,
+					  u32 *n_cache_hit);
 struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *,
 				    const struct sw_flow_key *);
 struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,


^ permalink raw reply related

* [PATCH net-next v4 0/2] net: openvswitch: masks cache enhancements
From: Eelco Chaudron @ 2020-07-31 12:20 UTC (permalink / raw)
  To: netdev; +Cc: davem, dev, kuba, pabeni, pshelar, fw, xiangxia.m.yue

This patchset adds two enhancements to the Open vSwitch masks cache.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

Changes in v4 [patch 2/2 only]:
 - Remove null check before calling free_percpu()
 - Make ovs_dp_change() return appropriate error codes

Changes in v3 [patch 2/2 only]:
 - Use is_power_of_2() function
 - Use array_size() function
 - Fix remaining sparse errors

Changes in v2 [patch 2/2 only]:
 - Fix sparse warnings
 - Fix netlink policy items reported by Florian Westphal

Eelco Chaudron (2):
      net: openvswitch: add masks cache hit counter
      net: openvswitch: make masks cache size configurable


 include/uapi/linux/openvswitch.h |    3 +
 net/openvswitch/datapath.c       |   22 +++++++
 net/openvswitch/datapath.h       |    3 +
 net/openvswitch/flow_table.c     |  120 ++++++++++++++++++++++++++++++++------
 net/openvswitch/flow_table.h     |   13 +++-
 5 files changed, 139 insertions(+), 22 deletions(-)


^ permalink raw reply

* Re: [Linux-kernel-mentees] [PATCH net] rds: Prevent kernel-infoleak in rds_notify_queue_get()
From: Håkon Bugge @ 2020-07-31 12:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Dan Carpenter, Leon Romanovsky, Peilin Ye, Santosh Shilimkar,
	David S. Miller, Jakub Kicinski, Arnd Bergmann,
	linux-kernel-mentees, netdev, OFED mailing list, rds-devel,
	linux-kernel
In-Reply-To: <20200731115909.GA1649637@kroah.com>



> On 31 Jul 2020, at 13:59, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> 
> On Fri, Jul 31, 2020 at 01:14:09PM +0200, Håkon Bugge wrote:
>> 
>> 
>>> On 31 Jul 2020, at 11:59, Dan Carpenter <dan.carpenter@oracle.com> wrote:
>>> 
>>> On Fri, Jul 31, 2020 at 07:53:01AM +0300, Leon Romanovsky wrote:
>>>> On Thu, Jul 30, 2020 at 03:20:26PM -0400, Peilin Ye wrote:
>>>>> rds_notify_queue_get() is potentially copying uninitialized kernel stack
>>>>> memory to userspace since the compiler may leave a 4-byte hole at the end
>>>>> of `cmsg`.
>>>>> 
>>>>> In 2016 we tried to fix this issue by doing `= { 0 };` on `cmsg`, which
>>>>> unfortunately does not always initialize that 4-byte hole. Fix it by using
>>>>> memset() instead.
>>>> 
>>>> Of course, this is the difference between "{ 0 }" and "{}" initializations.
>>>> 
>>> 
>>> No, there is no difference.  Even struct assignments like:
>>> 
>>> 	foo = *bar;
>>> 
>>> can leave struct holes uninitialized.  Depending on the compiler the
>>> assignment can be implemented as a memset() or as a series of struct
>>> member assignments.
>> 
>> What about:
>> 
>> struct rds_rdma_notify {
>> 	__u64                      user_token;
>> 	__s32                      status;
>> } __attribute__((packed));
> 
> Why is this still a discussion at all?
> 
> Try it and see, run pahole and see if there are holes in this structure
> (odds are no), you don't need us to say what is happening here...

An older posting had this:

$ pahole -C "rds_rdma_notify" net/rds/recv.o
struct rds_rdma_notify {
	__u64                      user_token;           /*     0     8 */
	__s32                      status;               /*     8     4 */

	/* size: 16, cachelines: 1, members: 2 */
	/* padding: 4 */
	/* last cacheline: 16 bytes */
};


Thxs, Håkon


^ permalink raw reply

* [PATCH v4 04/12] iwlegacy: Check if pcie_capability_read_*() reads ~0
From: Saheed O. Bolarinwa @ 2020-07-31 11:02 UTC (permalink / raw)
  To: helgaas, Kalle Valo, David S. Miller, Jakub Kicinski
  Cc: Saheed O. Bolarinwa, bjorn, skhan, linux-kernel-mentees,
	linux-pci, linux-kernel, Stanislaw Gruszka, linux-wireless,
	netdev
In-Reply-To: <20200731110240.98326-1-refactormyself@gmail.com>

On failure pcie_capability_read_dword() sets it's last parameter, val
to 0. However, with Patch 12/12, it is possible that val is set to ~0
on failure. This would introduce a bug because (x & x) == (~0 & x).

Since ~0 is an invalid value here,

Add an extra check for ~0 to the if condition to ensure success.

Suggested-by: Bjorn Helgaas <bjorn@helgaas.com>
Signed-off-by: Saheed O. Bolarinwa <refactormyself@gmail.com>
---
 drivers/net/wireless/intel/iwlegacy/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlegacy/common.c b/drivers/net/wireless/intel/iwlegacy/common.c
index 348c17ce72f5..659027563260 100644
--- a/drivers/net/wireless/intel/iwlegacy/common.c
+++ b/drivers/net/wireless/intel/iwlegacy/common.c
@@ -4287,7 +4287,7 @@ il_apm_init(struct il_priv *il)
 	 */
 	if (il->cfg->set_l0s) {
 		pcie_capability_read_word(il->pci_dev, PCI_EXP_LNKCTL, &lctl);
-		if (lctl & PCI_EXP_LNKCTL_ASPM_L1) {
+		if ((lctl != (u16)~0) && (lctl & PCI_EXP_LNKCTL_ASPM_L1)) {
 			/* L1-ASPM enabled; disable(!) L0S  */
 			il_set_bit(il, CSR_GIO_REG,
 				   CSR_GIO_REG_VAL_L0S_ENABLED);
-- 
2.18.4


^ permalink raw reply related

* [PATCH v4 03/12] ath9k: Check if pcie_capability_read_*() reads ~0
From: Saheed O. Bolarinwa @ 2020-07-31 11:02 UTC (permalink / raw)
  To: helgaas, David S. Miller, Kalle Valo, Jakub Kicinski
  Cc: Saheed O. Bolarinwa, bjorn, skhan, linux-kernel-mentees,
	linux-pci, linux-kernel, QCA ath9k Development, linux-wireless,
	netdev
In-Reply-To: <20200731110240.98326-1-refactormyself@gmail.com>

On failure pcie_capability_read_dword() sets it's last parameter, val
to 0. However, with Patch 12/12, it is possible that val is set to ~0
on failure. This would introduce a bug because (x & x) == (~0 & x).

Since ~0 is an invalid value here,

Add an extra check for ~0 to the if condition to ensure success.

Suggested-by: Bjorn Helgaas <bjorn@helgaas.com>
Signed-off-by: Saheed O. Bolarinwa <refactormyself@gmail.com>
---
 drivers/net/wireless/ath/ath9k/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/pci.c b/drivers/net/wireless/ath/ath9k/pci.c
index f3461b193c7a..f02b243befef 100644
--- a/drivers/net/wireless/ath/ath9k/pci.c
+++ b/drivers/net/wireless/ath/ath9k/pci.c
@@ -867,7 +867,8 @@ static void ath_pci_aspm_init(struct ath_common *common)
 		pci_read_config_dword(pdev, 0x70c, &ah->config.aspm_l1_fix);
 
 	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &aspm);
-	if (aspm & (PCI_EXP_LNKCTL_ASPM_L0S | PCI_EXP_LNKCTL_ASPM_L1)) {
+	if ((aspm != (u16)~0) &&
+	    (aspm & (PCI_EXP_LNKCTL_ASPM_L0S | PCI_EXP_LNKCTL_ASPM_L1))) {
 		ah->aspm_enabled = true;
 		/* Initialize PCIe PM and SERDES registers. */
 		ath9k_hw_configpcipowersave(ah, false);
-- 
2.18.4


^ permalink raw reply related

* [PATCH v4 00/12] PCI: Remove '*val = 0' from pcie_capability_read_*()
From: Saheed O. Bolarinwa @ 2020-07-31 11:02 UTC (permalink / raw)
  To: helgaas, Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
	Jason Gunthorpe, Arnd Bergmann, Greg Kroah-Hartman,
	David S. Miller, Kalle Valo, Jakub Kicinski, Rafael J. Wysocki,
	Len Brown, Russell Currey, Sam Bobroff, Oliver O'Halloran
  Cc: Saheed O. Bolarinwa, bjorn, skhan, linux-kernel-mentees,
	linux-pci, linux-kernel, linux-rdma, QCA ath9k Development,
	linux-wireless, netdev, linux-acpi, linuxppc-dev

v4 CHANGES:
- Drop uses of pcie_capability_read_*() return value. This related to
  [1] which is pointing towards making the accessors return void.
- Remove patches found to be unnecessary
- Reword some commit messages

v3 CHANGES:
- Split previous PATCH 6/13 into two : PATCH 6/14 and PATCH 7/14
- Fix commit message of PATCH 5/14
- Update Patch numbering and Commit messages
- Add 'Acked by Greg KH' to PATCH 2/14
- Add PATCH version

v2 CHANGES:
- Fix missing comma, causing the email cc error
- Fix typos and numbering errors in commit messages
- Add commit message to 13/13
- Add two more patches: PATCH 3/13 and PATCH 4/13

MERGING:
- Patch 6/12 depends on Patch 5/12. However Patch 5/12 has no dependency.
  Please, merge PATCH 6/12 only after Patch 5/12.
- Patch 12/12 depends on all preceding patches. Please merge Patch 12/12
  only after other patches in this series have been merged.
- All other patches have no dependencies besides those mentioned above and
  can be merge individually.

PATCH 5/12:
Set the default case in the switch-statement to set status
to "Power On".

PATCH 1/12 to 11/12:
Use the value read by pcie_capability_read_*() to determine success or
failure. This is done by checking if it is ~0, while maintaining the
functions' behaviour. This ensures that the changes in PATCH 12/12 does
not introduce any bug.

PATCH 12/12:
There are several reasons why a PCI capability read may fail whether the
device is present or not. If this happens, pcie_capability_read_*() will
return -EINVAL/PCIBIOS_BAD_REGISTER_NUMBER or PCIBIOS_DEVICE_NOT_FOUND
and *val is set to 0.

This behaviour if further ensured by this code inside
pcie_capability_read_*()

 ret = pci_read_config_dword(dev, pci_pcie_cap(dev) + pos, val);
 /*
  * Reset *val to 0 if pci_read_config_dword() fails, it may
  * have been written as 0xFFFFFFFF if hardware error happens
  * during pci_read_config_dword().
  */
 if (ret)
	 *val = 0;
 return ret;

a) Since all pci_generic_config_read() does is read a register value,
it may return success after reading a ~0 which *may* have been fabricated
by the PCI host bridge due to a read timeout. Hence pci_read_config_*() 
will return success with a fabricated ~0 in *val, indicating a problem.
In this case, the assumed behaviour of  pcie_capability_read_*() will be
wrong. To avoid error slipping through, more checks are necessary.

b) pci_read_config_*() will return PCIBIOS_DEVICE_NOT_FOUND only if 
dev->error_state = pci_channel_io_perm_failure (i.e. 
pci_dev_is_disconnected()) or if pci_generic_config_read() can't find the
device. In both cases *val is initially set to ~0 but as shown in the code
above pcie_capability_read_*() resets it back to 0. Even with this effort,
drivers still have to perform validation checks more so if 0 is a valid
value.

Most drivers only consider the case (b) and in some cases, there is the 
expectation that on timeout *val has a fabricated value of ~0, which *may*
not always be true as explained in (a).

In any case, checks need to be done to validate the value read and maybe
confirm which error has occurred. It is better left to the drivers to do.

Check the return value of pcie_capability_read_dword() to ensure success
and avoid bug as a result of Patch 14/14.
Remove the reset of *val to 0 when pci_read_config_*() fails.

[1] https://lore.kernel.org/linux-pci/20200714234625.GA428442@bjorn-Precision-5520/

Saheed O. Bolarinwa (12):
  IB/hfi1: Check if pcie_capability_read_*() reads ~0
  misc: rtsx: Check if pcie_capability_read_*() reads ~0
  ath9k: Check if pcie_capability_read_*() reads ~0
  iwlegacy: Check if pcie_capability_read_*() reads ~0
  PCI: pciehp: Set "Power On" as the default get_power_status
  PCI: pciehp: Check if pcie_capability_read_*() reads ~0
  PCI/ACPI: Check if pcie_capability_read_*() reads ~0
  PCI: Check if pcie_capability_read_*() reads ~0
  PCI/PM: Check if pcie_capability_read_*() reads ~0
  PCI/AER: Check if pcie_capability_read_*() reads ~0
  PCI/ASPM: Check if pcie_capability_read_*() reads ~0
  PCI: Remove '*val = 0' from pcie_capability_read_*()

 drivers/infiniband/hw/hfi1/aspm.c            |  6 ++--
 drivers/misc/cardreader/rts5227.c            |  2 +-
 drivers/misc/cardreader/rts5249.c            |  2 +-
 drivers/misc/cardreader/rts5260.c            |  2 +-
 drivers/misc/cardreader/rts5261.c            |  2 +-
 drivers/net/wireless/ath/ath9k/pci.c         |  3 +-
 drivers/net/wireless/intel/iwlegacy/common.c |  2 +-
 drivers/pci/access.c                         | 14 --------
 drivers/pci/hotplug/pciehp_hpc.c             | 13 +++++---
 drivers/pci/pci-acpi.c                       |  4 +--
 drivers/pci/pci.c                            | 34 ++++++++++++++------
 drivers/pci/pcie/aer.c                       |  2 +-
 drivers/pci/pcie/aspm.c                      | 10 +++---
 drivers/pci/probe.c                          | 12 +++----
 14 files changed, 56 insertions(+), 52 deletions(-)

-- 
2.18.4

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox